China Will Win at AI Because of Elsevier
Elsevier doesn’t just sell access to human readers. It aggressively enforces license terms that prohibit text and data mining for machine learning. Even universities that pay for journal access often find their AI research groups barred from using that content to train models. The terms are explicit: you can read the paper, but your model can’t.
Meanwhile, China ignores these restrictions. Its researchers operate with centralized access to nearly every major Western journal, whether through institutional mirrors, semi-legal repositories, or outright scraping. Tools like Sci-Hub are quietly tolerated or folded into internal systems. Legal or not, the outcome is clear: China’s models are learning from the full scientific corpus.
In the West, researchers are stuck paying Elsevier for access and are still told they can’t use it for machine learning unless they strike special deals, which are expensive, limited, or flatly denied.
Everyone talks about compute, but the real long-term advantage lies in training data. If China is feeding its models every scientific paper ever published while Western models train on Reddit, Wikipedia, and scraped blogs, who’s really ahead?
We’ve put up massive walls around our most valuable content and then told our own researchers to innovate with scraps. Elsevier’s copyright model was designed for print-era publishing, but it now acts as a self-imposed tax on Western AI development.
If AI is the new electricity, Elsevier is the dam. And China built a bypass.
The irony is that English-speaking countries should have a massive advantage here: nearly all of this corpus is published in English.