I remember the folks here who were dragging the Internet Archive for controlled digital lending, just trying to be a digital library, like it was infringing on authors by attempting to get back what was taken by publishers requiring libraries to buy licenses for ebooks that expired and had to be repurchased time and time again. “Universal access to all knowledge.”
Now, with judicial opinions on this fair use firming up, I am hopeful this will allow them to train on every book they’ve ever acquired and release those models to the world.
JumpCrisscross · 9h ago
> folks here who were dragging the Internet Archive for controlled digital lending, just trying to be a digital library
The problem with the Internet Archive was it jumped into uncontrolled lending. Basically, there was no practical reason to buy a book that the Internet Archive would "lend" you. That simply isn't true for an LLM citing a book, which may still cause me to read it.
harshreality · 8h ago
During the initial COVID lockdown, IA implemented their National Emergency Library lending program, which is the only uncontrolled (not 1:1) lending of works known to be under copyright that I know of:
1. Almost all physical libraries, including at universities, had shut down, and IA's efforts were to compensate for that. Supply of previously-purchased-for-public-use books had dried up overnight.
2. Normal 1:1 lending was not observed, but DRM was still applied. No books lent during that period were usable outside of a narrow window without the end user intentionally circumventing the DRM, which is a distinct crime and isn't trivial for the average computer user.
3. IA entered into dialogue with major academic publishers before implementing the emergency library. They published an opt-out address on their blog. That wasn't as visible to all publishers or independent authors as it probably should have been, but everyone knows if it had been opt-in instead, authors and publishers all would have reflexively refused, despite the unique situation. Mainstream publishers and most authors simply do not care if people can't access library books for a month or two. They would ban libraries if they could. They're so high on their intellectual property rights they think those can't be relaxed no matter what.
4. As far as I know, all NEL books were scanned PDFs, from donated library book collections. Casual readers don't read scanned books for entertainment. The idea that this significantly displaced kindle purchases, for instance, or lending of real ebooks from local libraries, needs a citation.
Disclosure: No affiliation besides sending them financial support as well as physical collections. I am a supporter, to put it mildly, because we need libraries that cannot be burned.
[section removed; refer to harshreality's context in sibling comment, which is more thorough than what I had commented]
As a reasonable person who believes one should be reasonable until they can no longer be reasonable, as a hacker, this is a green light to be unreasonable as it relates to onerous copyright regulations infringing on the commons. Copyright stakeholders took too much from the commons, and this enables people to take back (with a combination of judicial review and technology).
> While there was a brief period during COVID where lending restrictions were removed from the lending system due to quarantines and shutdowns (and you could quite literally not get into some physical libraries)
This is the moment their liability accelerated. It was stupid, impulsive and put--and may continue to put--the whole project at risk.
toomuchtodo · 9h ago
Well, they made a bold move to protect what they believe in as a bona fide library [1], and while it was not successful, I am more than happy to see a bigger bully come along and even things out. Copyright holders aren't just going to allow fair use, or let profits go uncaptured, ya know? They're going to keep taking until there is nothing left and everything is locked behind a paywall for ~150 years [2] (life + 70 years).
Edit (wrt your reply): Drop into a weekly lunch at the SF location and have a conversation on this. Additional context is always helpful, I find.
> Copyright holders aren't just going to allow fair use
What they allow doesn't matter. Under controlled lending, the Archive was operating within precedent. I'm not against launching a test case for uncontrolled lending. But doing it with the entire library, irrespective of publication date or jurisdiction, and through the main organisation was just stupid.
mistrial9 · 8h ago
People were in a panic under lockdown here in urban California, and elsewhere. The whole of the Internet Archive is experimental and out-of-the-box. It is beyond unjust to watch corporate data hoarders profit at large while IA gets excoriated by fellow "smarty" people. IA is wonderful in their weird ways.
JumpCrisscross · 8h ago
> beyond unjust to watch corporate data hoarders profit at large while IA gets excoriated by fellow "smarty" people. IA is wonderful in their weird ways
I love the Archive. And as I said above, I'm not fundamentally against uncontrolled lending as a legal test case. But doing it with the entire catalogue, instead of just e.g. older books, and doing it out of the main organisation such that there is no legal segregation between the experiment and the entire project struck me as incredibly impulsive. They're wonderful and weird. But perhaps not the best steward of what's pitched as an archive.
mosdl · 9h ago
There is a big difference between the public being able to read the books and using them to build a commercial product. Especially if that product will be used to generate commercial work that competes with your work.
jjk166 · 8h ago
Is there? Copyright is not supposed to prevent the generation of commercial work that might compete with your own.
jplusequalt · 9h ago
>I hope they train on every book they’ve ever acquired and release those models to the world
Yay, more AI slop to pollute the internet with.
toomuchtodo · 9h ago
Was Google Search slop?
Edit: I respect your position to not engage. We can just do things regardless of those who would disagree (which is good imho when the risk of harm is low and the benefit potentially high), in this case while still adhering to the law as set forth by a court. The judicial interpretation guides that you can train if you own a copy of the work; build accordingly.
I'm not going to engage with anyone who tries to force a weak comparison between an LLM and a different kind of technology. I think we as developers and scientists are well aware that these LLMs are too different from what's come before for such a comparison to work.
subscribed · 8h ago
Llama is open weight and they have been caught training on torrents of the copyrighted books, so you can use it already :)
criddell · 9h ago
Legally acquired copyrighted books.
> Despite siding with the AI company on fair use, Alsup wrote that Anthropic will still face trial for the pirated copies it used to create its massive central library of books used to train AI.
mmonaghan · 8h ago
It's the right move and authors + publishers should be rooting for it. Either your work lives in the corpus of human knowledge, which AIs will increasingly reflect more perfectly over time, or you're forgotten. You've also got precedent that they have to pay for access to your work like any other human.
As long as they're not violating copyright laws in output, it's fine and good.
bgwalter · 8h ago
You have a nice book there. It would be a shame if something happened to it!
phendrenad2 · 9h ago
While this may seem like a big issue, it's really to be expected. Asking judges to rule on new technology applied to old laws is like asking a bus driver to design an energy efficient bus motor. Judges are technicians, not scientists. They apply the law, they can't think creatively about new laws. For that, we have experts (political think-tanks). And I'm sure political think-tanks are kicking into overdrive as they realize the ramifications of this ruling. This will have an impact, but it's hard to determine what that will be. To some degree, it will disincentivize writing books. If this ruling only applies to SELLING books, then some people will make their books subscription-only, and will test this law against that (do AI companies need a perpetual subscription if they've trained an AI on a subscription-only book?). If AI companies are the primary consumers of books, and everyone just gets their information from AI, then direct-to-consumer books will cease to be a thing, and authors will sell their books directly to AI companies for $100,000 or $1,000,000.
JumpCrisscross · 9h ago
> judges are technicians, not scientists. They apply the law, they can't think creatively about new laws
You've never read an opinion that required creativity?
phendrenad2 · 8h ago
THAT's the part of my post you disagree with? lmao
JumpCrisscross · 8h ago
> THAT's the part of my post you disagree with?
Yes. Judges in a common-law system don't just "apply the law," they literally make law. Treating judges as automatons fundamentally misunderstands their role, which makes any predictions based on that misunderstanding likely specious.
phendrenad2 · 5h ago
I think we're getting hung up on a technicality here, so I'll concede that judges "make law" in whatever way you say they do.
And I don't just base my predictions on that understanding, it's more of a pass-through thing. I base my predictions on observed reality, where there is a legislative branch that makes laws and a judicial branch that interprets them within narrow confines (supposedly, they basically have free rein to go wild and say up is down and down is up, but that rarely happens because they don't want to be laughed at in public).
megaman821 · 9h ago
I don't think there should be many (if any at all) copyright restrictions on training. This doesn't give AIs the right to violate copyright laws in their output. Plus, stopping AIs from training may also stop people from just gathering metadata on materials, like word frequency, genre or protagonist age.
This might lead to some creators and publishers silo-ing off valuable content in tightly controlled environments. Tightly controlled in terms of both DRM used to prevent screen/web scraping and potential contractual obligations restricting use for training if granted access.
Think Blu-Ray DRM but for more than video, it's already happened with publishers and college textbooks.
DamnInteresting · 8h ago
I say this as an author who has definitely had a lot of my work slurped up by these machine-learning goblins: This was the right call. I learned to write by reading other authors' works, so I'd have to be quite the hypocrite to stop others from learning from me. Still, it makes me sad and tired to know that I'm unwittingly training my own replacement--one that will never be sad or tired itself.
In my view, one real gray area is in image/video generation, especially "x in the style of y" kinds of shenanigans. As a society we may need to consider some better protections for an artist's/studio's style, otherwise distinct and novel and interesting styles will become watered down into a sea of bland mimicry until the sweet release of the heat death of the universe.
123yawaworht456 · 7h ago
>In my view, one real gray area is in image/video generation, especially "x in the style of y" kinds of shenanigans.
it's not a gray area when humans do it, so it's not a gray area
well over 50% of those obnoxiously loud anti-ai artists make a living off fan art, and until 3 years ago, copyright concerns would be scoffed at
bgwalter · 8h ago
“Consistent with copyright’s purpose in enabling creativity and fostering scientific progress, ‘Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.’”
More smug nonsense emanating from Misanthropic. There is no creativity that is enabled. People tweak the prompts like children until something that was stolen from others emerges.
Most people working on "AI" have never created anything substantial. They rely on utilizing other people's creations. It is very sad that Alsup caves to big tech and issues a vibe ruling.
meristohm · 9h ago
Huh. Us meatbags are not just artificial intelligences, we're organic intelligences and thus more important than robots (who are not alive; even if we make golems and infuse them with "life", they are not human animals), and all of us are in training throughout our lives, so this means training on copyrighted material is fair use.
Edit: I see another commenter, presumably human, clarified: "legally-acquired copyrighted books"
Even with the arguments about AI being potentially helpful to disabled humans, one healthier route is to help each other out directly instead of dividing and conquering with technology, in the name of helping. Feels like one of the aims of Capitalists is to put us each into our Matrix (1999 movie) battery capsules and bleed us dry while we're distracted.
There's plenty of additional background here: https://blog.archive.org/national-emergency-library/
https://www.library.upenn.edu/news/hachette-v-internet-archi...
https://blog.archive.org/tag/controlled-digital-lending/
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
[1] https://blog.librarylaw.com/librarylaw/2007/07/internet-arch...
[2] https://www.copyright.gov/help/faq/faq-duration.html
When Search Engine Services meet Large Language Models: Visions and Challenges - https://arxiv.org/html/2407.00128v1