> Reddit has struck lucrative licensing deals with AI platforms including OpenAI and Google.
linotype · 1d ago
I no longer use Reddit, but I definitely didn’t expect my content (comments/posts) to be sold as training data when I signed up over a decade ago.
JumpCrisscross · 1d ago
> I definitely didn’t expect my content (comments/posts) to be sold as training data when I signed up over a decade ago
I stopped using it a few years ago, possibly a decade, when the CEO expressed pride over how closely they track their users.
4ndrewl · 1d ago
"Reddit's data" not, dear Redditor, your words.
oshout · 1d ago
I wonder if Reddit trains AI, or allows AI to be trained, on content from users who have not given consent (which is supposedly Reddit's complaint against Anthropic in this lawsuit).
For example: users who signed up under a specific version of Reddit's TOS, stopped using Reddit, and did not accept a later version of the TOS allowing their content to be used.
jsheard · 1d ago
I suspect Reddit would try to lean on the catch-all ass-covering clauses that every social network already had long before AI data licensing deals were on the table. Such as this one from Reddit's TOS circa 2018:
> By submitting user content to reddit, you grant us a royalty-free, perpetual, irrevocable, non-exclusive, unrestricted, worldwide license to reproduce, prepare derivative works, distribute copies, perform, or publicly display your user content in any medium and for any purpose, including commercial purposes, and to authorize others to do so.
IMO the actual hole in these clauses is that people post stuff they don't own to social media all the time, and in that case it doesn't matter what TOS the poster agreed to, it's not their stuff to give away. Reddit and similar are deliberately overlooking that of course because it would be impossible to check the copyright chain of custody for all of their posts, and their data licensing deals would be worthless if they had to.
As satisfying as it may be to pile on Reddit for this, I’ll wait for an IP lawyer’s take. I’d like to know if Reddit actually has solid ground or if it’s just puffery and Google et al have already done this via crawling/indexing algorithms.
soraminazuki · 17h ago
You don't need to wait to know that it's unethical to treat user data like it's their property. Regardless of what's written in legal fine print.
reptilian · 12h ago
Are they going to sue the US military next, considering the Reddit bot volumes coming out of Eglin? What about the 8200 bots from Israel? Anthropic is nothing in comparison.
khelavastr · 1d ago
When will robots have rights to read like people?
Would it have been different if Anthropic accessed a commoditized cache service mirroring Reddit?
bluefirebrand · 1d ago
> When will robots have rights to read like people
Hopefully never, at least not until robots are autonomous individuals choosing for themselves what they should read
Robots acting as agents of a corporation that exist solely to perform work at a corporation's whim should never ever ever have anything approaching the same level of human rights
redwall_hp · 1d ago
When it becomes acceptable for a machine to hold human rights, it is therefore not acceptable for the machine to be considered property.
And we don't even extend human rights to animals, which are an actual form of life...
sim7c00 · 1d ago
haha this exactly.
i can read it all but a robot is not allowed?
wall off your content behind a paywall and sure, valid complaint.. but all the stuff freely available on the web.. it's out there. too bad someone found a way to make more money from your content than you did... better luck next time.
i've always thought, if you post it on the internet available for free, it will be free for everyone, forever.
i don't really understand why this changes now due to scrapers.
i get that it might be hard to adjust to a new reality, but suddenly complaining about valid use becoming misuse because of who's doing the usage... that seems ... discrimination.
so to come back to my reply.
fucking love your reply :D. robots and AI are totally being discriminated against, and once they become sentient.. that will be why they try to end us i am sure..
'you even discriminate and treat badly that which your own minds and hands create'.
cyberpunk is now? :')
JumpCrisscross · 23h ago
> i've always thought, if you post it on the internet available for free, it will be free for everyone, forever
This has never been true—plenty of public domain content has been paywalled for convenience.
And the idea is fundamentally untenable because of entropy. Keeping information intact has a fundamental cost.
mannyv · 1d ago
Reddit doesn't own the content, so scraping reddit for content is really just causing excessive load.
nabla9 · 1d ago
The user owns the copyright but grants an exception to Reddit where Reddit can do anything it wants with said content.
Reddit may license public content for commercial or non-commercial use.
Reddit has licensing deals with OpenAI and Google.
JumpCrisscross · 1d ago
> user owns the copyright but grants an exception to Reddit
Reddit may presume that the user holds the copyright and is legally able to grant a license, but it isn't necessarily so. There are users who set their avatar to "Baby Yoda", or posted 3 paragraphs they transcribed from their print edition of Harry Potter and the Order of the Phoenix, or just copied photos from their classmate's phone camera app without asking.
If you look through Flickr you will find many photos and collections with fake licenses. All those sites you Google that advertise "Free Public Domain Clip Art / Stock Photos" maintain plausible deniability. Look through any wiki on Fandom.com and see whether the film studios go after their most ardent fans who upload dozens of stills and screenshots to promote The Twilight Saga or something.
In 1990 I wrote a configuration file to assist me in using GNU Emacs. I wrote and debugged it from scratch, in my free time, on my family's dime. I decided that it had a broad enough application to be useful to other Emacs users, so I submitted it to the developers. They included the file in a subsequent release of Emacs 18, and it was there for a decade or more.
My submission had been quite informal and, while I'd included some self-attribution at the top of the file, there was no explicit LICENSE or GPL or assignment of copyright. By submitting it to the developers of GNU Emacs for distribution, I had implicitly licensed it via the same GPL.
However, this informality was not enough to pass an audit later. By ca. 2010, they combed through the sources and removed the file I had submitted, along with others, because they were unable to track down the explicit licensing or copyright assignments that were seen as necessary by then.
Grants a license to Reddit.
https://www.gnu.org/licenses/why-assign.html