We give data to train AI models and get nothing in return

whooocareslol · 5/18/2025, 3:16:27 PM
I’m less worried about being replaced by AI and more frustrated that companies are taking our data to train AI models that they profit from, models that could make us less valuable over time.

Whether you’re:

- A coder writing clean, reusable functions or internal tooling,

- A UGC creator making tutorials or product demos,

- A data labeller doing precise annotations...

…all of that labor creates intellectual property that ends up training AI models.

But here’s the problem: we don’t own any of it, even though it wouldn’t exist without us.

They take our data—by hook or by crook—train a model, and extract massive value from it, while paying us nothing or, at best, a small one-time fee.

Yes, companies do play a valuable role. But they are using our work to replace us or to devalue what we do. So we have every right to ask for more.

If you really think about it, data mining is a lot like mineral mining. Just as companies extract valuable resources like gold or diamonds from the earth, often exploiting labor in poorly governed regions, data mining extracts value from a poorly managed pool of people and their data, frequently without their full knowledge of, or consent to, how it will be used.

I think now is the right time to build fairer data systems for everyone. Royalties? Data unions? Open ownership of internal contributions within companies?

This business model isn't new: some data sourcing and collection companies already charge not only a one-time fee but also a usage-based fee each time the data is used.

Doing this is necessary not only to make the data supply chain fair but also to improve AI. We all know that AI performance scales with compute, and the best way to make use of increasing compute is to apply it to new data. So if we want AI to keep improving, we need a proper data supply chain. And if we want high-quality data for more complex tasks, we must ensure that everyone is paid fairly.

Would love to hear your thoughts on this.

Comments (2)

babyent · 3h ago
Any code you write for your company as a contractor or W-2 employee is not “your” code. It isn’t yours; it belongs to the company.

The company benefits because your code makes the models better which makes engineers more productive.

airylizard · 3h ago
The data "supply chain" has already surged ahead of production elsewhere. Companies aren't just passively taking what's out there; they actively harvest highly curated content, and they benefit even further when we voluntarily correct and refine their models. Heck, some of us are even paying them for the privilege of training their AI. The best time to make this argument would have been when GPT was first released, but I think most people were too enamored with it to care, and the idea that it would be "open-source" made us assume we'd get it back at the end of the day.

Slightly unrelated, but this is exactly why I've been spending time building my AI framework (TSCE). The idea is to leverage open-weight LLMs, which are typically smaller and more accessible, to achieve accuracy and reliability comparable to larger models. It doesn't necessarily make the models "smarter" (the way retraining or fine-tuning might), but it lets everyday users build reliable agentic workflows or AI tools from multiple smaller LLM instances. Check it out: https://github.com/AutomationOptimization/tsce_demo