Ask HN: Is "ethical AI" possible, or is there a catch?

1 mrdependable 3 6/30/2025, 5:34:43 PM
I have been reading about companies like Asteria that say they only train their model on licensed work. Is it actually possible that they would license enough content to train their own model to compete with something like Veo, or am I misunderstanding what they mean? Are they just fine-tuning another model on their own dataset?

Comments (3)

PaulHoule · 4h ago
You could train reasoning models on 100% synthetic data.
mrdependable · 4h ago
That sounds like the data equivalent of money laundering.
PaulHoule · 4h ago
(1) Could be, doesn't have to be. If you want to teach it to do physics and math problems, it doesn't have to be.

(2) I'm kinda sick of how people never got excited about Google extracting 99% of the value generated by the web, but now that it's just a little more democratic, people start worrying that the horses left the barn... a decade ago.

Locking everything behind Cloudflare cements Google's monopoly and slams the door in front of any "exit" from the enshittification of the web.

I can't get that morally outraged that Elon Musk stole a trillion turds to train Grok. I mean, what does it buy him, something that can write the latest Kanye West song?

(3) From the current perspective, copyright is responsible for a "dark ages" that runs from the public domain horizon to around 2010 or so, when you can scrape lots of stuff on the web. To many people today the Roman Empire might be real, even Emily Brontë and Shakespeare, but Watergate never happened.