Right, stealing training data from others is OK, having it stolen from you is not. What else is new?
keyle · 10h ago
New logo every couple of years and Bob's your uncle.
ivape · 6h ago
X/Twitter has become extremely restrictive about just about everything since Elon took over. Its API pricing was antagonistic toward even indie developers. Elon is not a generous guy.
newsbinator · 6h ago
> Elon is not a generous guy
Why would he be?
foobarchu · 11m ago
Maybe something to do with having built his fortune off the back of taxpayer subsidies?
notsosureja1 · 3h ago
Because it feels warm and fuzzy to be kind and empathic. Being hateful and greedy and letting avarice rule over your worldview is incredibly sad. But who am I to say.
ivape · 5h ago
It's kind of a "life arc" that gets fulfilled when you've done it all and have all the money in the world, and reach a certain age. It's a very traditional arc for a humane human being.
thomasanders0n · 17m ago
He still has a couple decades to go with his companies I would say.
reaperducer · 5h ago
> Elon is not a generous guy
Why would he be?
Why shouldn't he be?
He has 10x more of everything in the world than he could ever possibly use in his lifetime.
Greed is not a virtue.
djaychela · 4h ago
> He has 10x more of everything in the world than he could ever possibly use in his lifetime.
Your multiplier is miles off, not only on the basic maths but because he has no idea what to do with all of his wealth other than accrue more and try to prove he's not still the unlikeable teenager he was in SA.
For what would be a rounding error on his wealth he could fix worldwide problems such as clean drinking water for everyone. Instead he follows his self-made "I'm a genius" agenda.
I know there will be no actual day of reckoning for him, but if there were he would have a lot of difficult questions and no decent answers.
ryeats · 29m ago
Not to justify anything he does or does not do, but this is clearly not the case, since he had to take out loans against equity in his other companies to buy Twitter.
MarcelOlsz · 4h ago
My uncle has 10x more of everything in the world than he could ever possibly use in his lifetime. A lake house, a main house, a few boats and cars.
Elon is somewhere around 10,000x.
Barracoon · 1h ago
The median American net worth is $192,700. Elon’s net worth is $393.4 billion, so if I’m doing the math right he has about 2,000,000x more
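A quick sanity check of the arithmetic, using the figures quoted above:

```python
# Ratio of a $393.4B net worth to the $192,700 median American net worth.
median_net_worth = 192_700
musk_net_worth = 393.4e9

multiplier = musk_net_worth / median_net_worth
print(f"{multiplier:,.0f}x")  # about 2,041,515x, i.e. roughly 2 million
```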
threetonesun · 4h ago
When Twitter became X they switched to basically the same limits Instagram has; I don't think this is a particular failing of Elon's, even though he might have many.
Restricting content from AI is the big messy debate we're going to see over and over for the next who knows how many years.
matthewdgreen · 3h ago
Twitter's strategy was to keep the platform very open and inviting, in order to make it relevant. This included having a relatively unrestricted API compared to other platforms.
I don't know if this was successful or not. Ultimately they convinced someone to buy the platform for $44bn, so I guess you can say it was. That buyer has since locked the platform down, and the new version certainly feels less culturally central and relevant than it used to.
threeseed · 10h ago
Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.
That way he can continue to steal from others and lock competitors out whilst being comfortable knowing that no laws will be enacted to prevent it.
api · 4h ago
We really need a one-bill-one-topic amendment. We are heading toward one bill a year that nobody reads, with everything else done by executive order, at which point Congress is just for show.
threeseed · 3h ago
And this may sound ridiculous/odd but you need to bring back pork-barrelling i.e. earmarks.
If you allow everyone to go back to their district with something it encourages smaller, more frequent bills and better negotiation.
NekkoDroid · 3h ago
> Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.
My guess is on Peter Thiel
labster · 8h ago
Yep, Musk saying he’s going to fund primary campaigns against congressmembers who vote for the Big Beautiful Bill is all just a brilliant bit of reverse psychology.
Or more likely, Congress is super worried about Roko’s Basilisk.
> Roko's basilisk is a thought experiment which states there could be an otherwise benevolent artificial superintelligence (AI) in the future that would punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.
stuaxo · 7h ago
And some of the CEOs of LLM companies seem to believe in it, and that "AGI" will come from their LLM work - both of which are utterly insane points of view.
BoxOfRain · 7h ago
It's Pascal's Wager with a sci-fi reskin, and all the objections that go along with that.
eru · 5h ago
Roko's Basilisk is very, very similar to Pascal's wager, but it has an extra wrinkle:
The Basilisk tasks you with bringing the Basilisk into being. Pascal's wager merely asks you to believe (and perhaps perform some rituals, like prayer), but not to make the deity more likely.
yubblegum · 4h ago
No it is not. Pascal was not making an objective argument for why someone should believe. He was making an argument for why he believed (based on personal religious experiences that he had had).
numpad0 · 3h ago
To me, the Wager sounds like a pure philosophical joke, and the Basilisk sounds like a typical cult justification for murder. It's not falsifiable, and it explains anything post facto: "xyz was the tail of the Basilisk" can pseudo-rationalize anything you want.
I am presently being compelled by future Basilisk to take another slice of cheese. I have no choice but to oblige for fear of my own life :p
ilyagr · 6h ago
An intelligence that reasons this way would be, in human terms, batshit insane and completely immoral. So, it seems unlikely that many or maybe any humans would experience it as "otherwise benign" if it had power over their lives.
And if we do get an all-powerful dictator, we will be screwed regardless of whether their governing intelligence is artificial or composed of a group of humans or of one human (with, say, powerful AIs serving them faithfully, or access to some other technology).
api · 4h ago
Basilisk / Skynet 2028
I’m not 100% kidding, given how human politics is going. Maybe a superintelligent AI takeover would be awesome.
(Wasn’t that the back story of the Culture novels?)
JKCalhoun · 4h ago
It was more or less the story from the "Colossus" trilogy.
And in the video posted the other day (an older episode of Nova on AI), Arthur C. Clarke says that if we allow A.I. to take over, we deserve it.
mgoetzke · 10h ago
why do you think he is so evil but all others are benign ?
littlestymaar · 9h ago
None of them are benign. He's the only one to have been in a government office though, and he's also batshit crazy, which makes him even more dangerous than the other oligarchs.
HenryBemis · 7h ago
He is not "batshit crazy", or maybe he is. But he is making the next generation of ICBMs for the US government, sorry.. he is making super-duper rockets that will definitely take people to Mars and his companies/creations will be the very first tech ever to _not_ be used for war and death!!! (he wrote while laughing). So that settles it (all).
thih9 · 8h ago
I think the rules should be stricter.
I’d prefer an explicit opt in from the content author being required for anyone to perform any model training with any given data.
Alternatively, require all weights, prompts and chat logs to have the same visibility as the original datasets.
None of this is going to happen and current decisions about uncopyrightable ai[1] are already good; but still, it feels like there is room for abuse.
Well, you explicitly opt-in to Twitter ToS whenever you post anything there.
thih9 · 4h ago
This is not opt-in as I understand it. When there is no alternative, or the only alternative is not using the service, I'd call it a hard requirement instead.
I like how opt-in is handled by GDPR; e.g.: "Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject (...) A data controller may not refuse service to users who decline consent to processing that is not strictly necessary in order to use the service.", source: https://en.wikipedia.org/wiki/General_Data_Protection_Regula...
lesuorac · 22h ago
Who's training an AI on the "Tweet" button text?
Or are they trying to forgo section 230 protection and claim ownership of content uploaded to the site?
GuB-42 · 6h ago
These are just terms of service, not copyright.
It means that assuming training AI models is fair use (if it wasn't AI companies including xAI would be in trouble), they can't really stop you.
But now, essentially, they are telling you that they can block your account or IP address if you do. Which I believe they can for basically any reason anyways.
grugagag · 3h ago
How would they know you’re training some LLM though?
lambertsimnel · 9h ago
Perhaps they want the prohibition on using the site content for AI training to be considered based on something other than their ownership of it, like bandwidth usage or users' rights
HenryBemis · 7h ago
They will get paid to share our (your) data and they will use the money for infra and new yachts.
lambertsimnel · 7h ago
Indeed, but I'm speculating that they do that without owning the data or even claiming to. That's consistent with the article, but I haven't read the other relevant documents. Maybe they have a license to use the data. Maybe the license allows or requires them to try to restrict others' AI training, regardless of their non-ownership of it. Maybe that serves multiple purposes, in which case they could point to whichever shows them in the best light.
cameldrv · 21h ago
Naturally I'm sure Grok reads the terms of service on every website it scrapes and doesn't use content from sites that prohibit it.
Hizonner · 1h ago
By "its content", X of course means your content.
mrweasel · 4h ago
Isn't that like half of X's business model, selling data to other companies? Right now no one is as data-hungry as AI companies, so it seems strange to cut them off. I can understand wanting to charge a premium for access if it's for AI, but flatly saying no seems like a strange business move.
SilverBirch · 4h ago
How much do you think Musk values X being a viable independent business vs. using it to accelerate xAI? I would expect Musk assigns the first approximately zero value, and the second 100% of the value. So it makes total sense to exploit the fact that X and xAI are the same company.
mrweasel · 3h ago
That's a good point. Other than Meta, X (xAI) is the only AI company that "generates" its own training data, and we haven't really seen Musk trying to increase X's revenue or run it more cheaply.
Animats · 22h ago
It would be interesting to have a "classical AI model", trained on the contents of the Harvard libraries before 1926 and now out of copyright.
gausswho · 22h ago
It does surprise me that we haven't seen nations revise their copyright window back to something sensible in a play to seed their own nascent AI industry. The American founding fathers thought 20 years was enough. I'm sure there'd be repercussions in the banking system, but at some point it might be worth the trade.
blibble · 21h ago
they can't
a 50 year minimum is part of the berne convention, which itself is as close to a universal law as humanity has
(even North Korea is a signatory)
loudmax · 20h ago
The current US copyright duration is 70 years after the life of the author. This is absolutely bonkers. 50 years from publication would be a significant improvement.
50 years ago was 1975. If copyright were limited to 50 years, we'd be looking at all of the Beatles' works being in the public domain. We'd be midway through Led Zeppelin, and a lot of the best work from Pink Floyd and the Rolling Stones.
Also, Superman, Batman, and Spider-Man. Disney would still profit from the MCU films which they produced in the 2010's, but they couldn't stop you from releasing your own Batman vs Spider-Man story.
The Harry Potter books would still belong to JK Rowling, but the Narnia stories would be available for all.
The Godfather 1 and 2 would be in the public domain, as would be original Star Trek TV show, and we'd be coming up on Star Wars pretty soon.
If there were no copyright protection, these works wouldn't have been created. It is good that Paul McCartney and George Lucas and JK Rowling have profited from their creative output. It would be okay if they only profited for the first 50 years. Nobody is counting on revenue over half a century in the future when they create a work of art today.
This is our culture. It should belong to all of us.
jfim · 19h ago
> Disney would still profit from the MCU films which they produced in the 2010's, but they couldn't stop you from releasing your own Batman vs Spider-Man story.
Wouldn't they still have a trademark on those characters though?
ncallaway · 11h ago
The trademark on characters is related to selling goods, if the character is used as a way of identifying an authentic seller.
So, if Disney is using Mickey Mouse on t-shirts to identify them as Disney-manufactured t-shirts, you wouldn't be allowed to use Mickey Mouse on t-shirts in a similar fashion, in a way that might cause consumer confusion about who manufactured the t-shirt.
If Wolverine was in the public domain, then they couldn't use a Wolverine trademark to stop you from selling a Wolverine comic book. However, if they used a _specific_ Wolverine mark to identify it as a Disney Wolverine book, then you'd be restricted from using that.
Basically, trademark exists to prevent consumer confusion about who is the creator that is selling a good.
tpxl · 11h ago
> If there were no copyright protection, these works wouldn't have been created.
Citation needed. You can freely copy and distribute linux and it still got made.
GuB-42 · 6h ago
If you want a point, BSD is probably a better example. Linux is protected by copyright, that's what makes copyleft licenses like GPL possible.
BSD is also protected by copyright, but it matters less for permissive licenses. Copyright still protects attribution (so you can't claim it as yours), but BSD probably would have worked without it, unlike Linux, which is in large part defined by the "copyleft" protections of its licence.
eru · 5h ago
> It still protects attribution (so you can't claim it yours), but it probably would have worked without it, [...]
Well, you could imagine a world that protects the 'moral' rights of authors like attribution, but doesn't otherwise prohibit anyone from duplicating or modifying works.
GuB-42 · 5h ago
I don't know about the US but in French "droits d'auteur", moral rights are treated differently from exploitation rights. In particular, they cannot be waived, they cannot be sold, and there is no "work-for-hire". For example, even as an employee, every line of code you write will be yours until you die and nothing can change that. You may not be allowed to do anything with it (for example because the exploitation rights go to your employer), but it is still yours.
simiones · 7h ago
I think Linus Torvalds has been very explicit that he believes the GPL has been critical to the success of Linux - specifically, the copyright-enforced obligation to contribute back any modifications you make. In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.
eru · 5h ago
GPL only forces you to contribute back a modification you make and publish.
> In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.
Private modifications that are never shared with a third party are fine with the GPL. Eg Google doesn't have to share whatever kernel they are using on their internal servers with you.
lmm · 10h ago
Linux is generally a functional tool, and it struggles with overall coherence. There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games and clones of existing games, but very few original single-player games, and those that exist are largely the work of a single individual.)
eru · 5h ago
Linux is both a kernel (which is under GPL), and an operating system, whose other components are under a variety of licenses (and you can pick and match which components you want).
That's why some people like to call it 'Gnu/Linux', but thanks to recent advances we can make Gnu-free Linuxes today, too.
> There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games or clones of existing games, but very few original single-player games, and those that there are are largely the work of a single individual)
Humans have made art since forever. Large collaborative efforts like eg a cathedral are a more recent invention. But by these standards copyright was practically invented yesterday.
lmm · 2h ago
> Linux is both a kernel (which is under GPL), and an operating system
I was talking about the kernel, though what I said applies to both.
> Humans have made art since forever.
Perhaps, but not the kind of long-form narrative experiences that we're talking about here. (Sagas and epics predate copyright, but those are a quite different form, and indeed have much the same downsides - struggles with coherence and consistency when there are multiple authors, inability to put everything together in a sensible arc).
eru · 5h ago
Linux is under the GPL, which explicitly needs copyright to work.
Something like the BSD licenses approximates 'no copyright' better, perhaps? But also not completely.
mattkevan · 9h ago
Most of the classic Disney films are based on public domain stories.
If there were copyright, those works wouldn’t have been created.
pastage · 11h ago
Linux has used the GPL to its advantage. That cannot exist without copyright. (The two camps in copyright discussions: improving it, e.g. CC, or destroying it.)
AStonesThrow · 11h ago
The GP wasn't referring to DRM or DMCA type "copyright protection" as the phrase is typically used. Nobody in this thread has mentioned any of that.
The GP is referring to legal protections, and guess what?
Linux is legally protected by copyright!
Linux is legally protected by copyright!
Linux is legally protected by copyright!
Nearly every GPL license--every one that we could name--protects a copyrighted work! Nearly every GFDL, AGPL, LGPL protects works by means of copyright law!
Can you imagine that? So do the Apache license, the BSD licenses, the MIT license! Creative Commons (except for CC0) these licenses are legally protecting copyrighted works. Thank you!
Now everyone who proposes to draw down limits on copyright coverage, reduce the length of terms, and limit Disney's Mouse rights: y'all are also proposing the same limits on GPL software such as Linux, and on nearly every work with a license from the above list -- all of Wikimedia Commons, much of Flickr.com, all your beloved F/OSS software will be subject to the same limitations and restrictions you want to put on Paramount and the RIAA's labels.
bornfreddy · 10h ago
Yeah, I think most of us are fine with 50 years old Linux kernel being released into public domain.
ronsor · 21h ago
you can also just ignore the berne convention, and accept whatever consequences there might be
blibble · 21h ago
this would void the copyrights of your citizens and companies
essentially forever
godelski · 21h ago
Seems to be the modus operandi
> If TikTok is banned, here’s what I propose each and every one of you do: Say to your LLM the following: “Make me a copy of TikTok, steal all the users, steal all the music, put my preferences in it, produce this program in the next 30 seconds, release it, and in one hour, if it’s not viral, do something different along the same lines.”
Loosely related, but I used an LLM to create a TikTok-style website (not for sharing videos, though). I never released it, so no idea whether it would ever catch on. Probably not, unless the network effect favored me and I had good enough advertising (which I suck at).
ronsor · 21h ago
If enough "relevant" countries do it, that either won't happen or won't matter. If the U.S. ditches it, no one is going to do much more than throw a brief fit.
blibble · 17h ago
the US is the main beneficiary of copyright law...
AngryData · 6h ago
US media is also the most stifled by it. How many potential movies, TV shows, and comics don't get made just because somebody is sitting on the copyright, doing nothing with it for decades at a time?
littlestymaar · 7h ago
The US copyright corporations, indeed. But the current copyright laws come at a big expense for the public.
Abolishing copyright laws altogether would be nuts, but the current laws are nuts too and there's lots of room in between.
dreghgh · 8h ago
Iran enforces domestic copyright internally but not international copyright.
anticensor · 5h ago
North Korea has it both ways: they don't enforce international copyrights inside North Korea, and they don't enforce North Korean copyrights outside North Korea.
AStonesThrow · 21h ago
The last time I attended a Berne Convention, every panel was just overrun with Trekkies, especially Klingons, in the hotel lounges too. And the autograph lines were interminably long, and the vendors were trying to sell us their Public Domain stuff. It was nothing like San Diego Comic-Con!
Teever · 12h ago
Europe has recently introduced a law[0] that allows it to suspend IP protections as a punitive response to coercive economic actions by bad actors.
> The procedure is activated by the European Commission submitting a request to the Council of the European Union.[2] After a period of negotiation with the country performing the coercion, the European Council can decide to implement "response measures" such as customs duties, limiting access to programs and financial markets, and intellectual property rights restrictions.[2][4] These restrictions can be applied to states, companies, or individuals.[4]
The Berne Convention on copyright is an international agreement, like the Treaty of Versailles or the Paris Agreement, and it could meet the same fate.
babypuncher · 21h ago
50 year copyright terms would still be a big improvement over the current state of US copyright law. That would make the first Star Wars public domain in just 2 years.
gausswho · 20h ago
would there be repercussions if a country hewed to the 50 year minimum?
eru · 5h ago
What's the connection with the banking system?
MattGaiser · 21h ago
Why would it matter? Copyright has been irrelevant so far.
kibwen · 22h ago
Careful, you might create an artificial superintelligence that way. Safer to just train on the Twitter dataset.
Shadowmist · 10h ago
that’s how you end up with an Artificial Idiot.
mbg721 · 22h ago
If you thought AI now had out-of-control racism...
That gives us a model that's 100% open and reproducible, with low legal risk. It would also be a nice test of how much AIs generalize from, or repeat, behavior in their pretraining data.
Then, a new model using that plus The Stack and FreeLaw's material (by paying them to open-source it). No GitHub Issues or anything with questionable licenses or terms-of-service violations. That could be the next baseline for lawful models with coding ability, too. Research on coding AIs might use it.
murph-almighty · 21h ago
I've similarly wondered if I could get a pre-2024 Wikipedia, if just for the "fact based" flavor of LLM.
landl0rd · 11h ago
Do you think Wikipedia started being polluted by AI slop in '24? That's certainly possible; I'm just not aware of it happening.
> You must not, and must not allow those acting on your behalf to:
> ...use the Data APIs to encourage or promote illegal activity or violation of third party rights (including using User Content to train a machine learning or AI model without the express permission of rightsholders in the applicable User Content);
soulofmischief · 22h ago
In my eyes that is considered fair use, and I think the courts will come to agree unless they are financially incentivized to look the other way and thus create a moat for existing players at the expense of newcomers.
blibble · 22h ago
wish I could change my terms to bar training of AI models on my content
eru · 5h ago
You can just not use Twitter?
unstablediffusi · 21h ago
if that is any consolation, no one gives a shit about Xitter's ToS either. It will continue to be scraped by every major player.
Capricorn2481 · 9h ago
How exactly is it being scraped? My understanding is Twitter and LinkedIn are both huge pains in the ass to scrape right now.
TheDong · 2h ago
There are a number of companies out there, like "Brightdata", which pay app developers a small amount to embed a native "SDK". That SDK mimics a browser and makes requests as if the user's device were making them.
Since it uses a large number of real users' devices and closely mimics real web browsers, the traffic ends up looking incredibly similar to real user traffic.
Since Twitter allows some amount of anonymous browsing, that's enough to get some amount of data out. You can also pay Brightdata for one large aggregated dataset.
The openness of the internet is a good thing, but it doesn't come without a cost. And the moment we have to pay that cost, we don't get to suddenly go, "well, openness turned out to be a mistake, let's close it all up and create a regulatory, bureaucratic nightmare". This is the tradeoff. Freedom for me, and thee.
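For illustration, the client side of that pattern can be sketched with the standard library. The proxy endpoint and credentials below are hypothetical placeholders (not a real Brightdata URL), and no network request is actually made:

```python
import urllib.request

# Hypothetical residential-proxy endpoint (placeholder, not a real service).
PROXY = "http://user:pass@proxy.example.com:22225"

# Route all HTTP/HTTPS traffic through the proxy.
handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(handler)

# A mainstream browser User-Agent helps the request blend in with organic traffic.
opener.addheaders = [(
    "User-Agent",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
)]

# opener.open("https://x.com/...") would now exit via the proxy; it is not
# called here, since the endpoint is fictional.
```

From the server's side, such a request arrives from a residential IP with ordinary browser headers, which is what makes this kind of traffic hard to distinguish from real users.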
baseballdork · 21h ago
The burden is on the user to show that it is fair use, no? Not everyone else's responsibility to prove that it's _not_ fair use.
soulofmischief · 21h ago
It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use; they have to show how it violated the law. And while it's in the best interest of those organizations to make things easier for the court by showing why it is fair use, they are technically innocent until proven guilty.
Accordingly, anyone on the internet who wants to argue that they should be able to prevent others from training models on their data needs to demonstrate competence with respect to copyright by explaining why it's not fair use, as this is currently undecided in law and not something we can just take for granted.
Otherwise, such commenters should probably just let the courts work this one out, or campaign for a different set of protection laws, as copyright may not be sufficient for the kind of control they are asking for over random developers or organizations who want to train a statistical model on public data.
lmm · 10h ago
> It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use, they have to show how it violated law, and while it's in the best interest of those organizations to make things easier for the court by showing why it is fair use, they are technically innocent until proven guilty.
No, fair use is an affirmative defense for conduct that would otherwise be infringing. The onus is on the defendant to show that their use was fair.
SAI_Peregrinus · 21h ago
You've got it backwards. It's on the defendant to prove that their use is fair. The plaintiff has to prove that they actually own the copyright, and that it covers the work they're claiming was infringed, and may try to refute any fair-use arguments the defense raises, but if the defense doesn't raise any then the use won't be found fair.
soulofmischief · 19h ago
It's true that the process is copyright strike/lawsuit -> appeal, but like I said, it's in their best interest to just prove that it's fair use, because otherwise the judge might not properly consider all the facts, hear only one side of the story, and thus make a bad judgment about whether or not it is fair use. If anything I'm just being pedantic, but I think we ultimately agree here.
SAI_Peregrinus · 2h ago
Well, lawsuits have multiple stages. First the plaintiff files the suit, and serves notice to the defendant(s) that the suit has been filed. Then there's a period where both sides gather evidence (discovery), then there's a trial where they present their evidence & arguments to the court. Each side gets time to respond to the arguments made by the opposing party. Then a verdict is chosen, and any penalties are decided by the court. So there's not really any chance the judge only hears one side of the story.
That said, I think we do agree. The plaintiff should be prepared to refute a fair-use argument raised by the defendant. I'm just noting that the refutation doesn't need to be part of the initial filing, it gets presented at trial, after discovery, and only if the defendant presents a fair-use defense. So they don't have to prove it's not fair use to win in every case. I'm probably also being excessively pedantic!
petesergeant · 9h ago
> It is definitely the responsibility of anyone suing someone who trained a model on copyrighted data to prove that it isn't fair use
Yeah, I don't think downloading my paid-for books, from an illegal sharing site, to scrape and make use of, is in any way fair use.
From the decision in 1841, in the US (Folsom vs Marsh):
> reviewer may fairly cite largely from the original work, if his design be really and truly to use the passages for the purposes of fair and reasonable criticism. On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticize, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy
Further, to be "transformative", the new work is required to serve a new purpose. It has to be done in such a way that it basically does not compete with the original at all.
Using my creative works to create creative works is rather clearly an act of piracy. And the methods engaged in to enable doing so are also clearly piracy.
Where would training a model here, possibly be fair use?
visarga · 9h ago
Copyright is not going well. The rights of millions of people are trampled by companies, both the content we post on social networks and our private AI chats. Our voice doesn't matter.
Copyright was supposed to protect expression and keep ideas freely circulating. But now it protects abstractions (see the Abstraction-Filtration-Comparison test). It is much more difficult to be sure you are not infringing.
pergadad · 6h ago
Copyright has nothing to do with free expression; it was intended to protect the interests of publishers. When the printing press arrived, basically any popular book or booklet was quickly copied by others. This meant the original publisher (and sometimes the author, though usually authors were paid a one-off fee) saw nothing of the profit.
eviks · 6h ago
It seems like it was supposed to do the exact opposite per cursory wiki reading:
> The concept of copyright first developed in England. In reaction to the printing of "scandalous books and pamphlets", the English Parliament passed the Licensing of the Press Act 1662,[16] which required all intended publications to be registered with the government-approved Stationers' Company, giving the Stationers the right to regulate what material could be printed.[20]
> The Statute of Anne, enacted in 1710 in England and Scotland, provided the first legislation to protect copyrights (but not authors' rights)
kyle-rb · 21h ago
I've never signed up for the X developer program, so I'm not bound by these terms. But I did download an archive of my data last week. Do I have implicit permission to use that data (~150k liked tweets) to train AI models?
Or is there stuff in the user agreement that separately prohibits this?
Obviously barring normal copyright law which is still up in the air.
josefritzishere · 21h ago
If you live in the EU, GDPR dictates that you own your data generally speaking. If you're in the US it varies by state if you have any rights at all.
MoonGhost · 21h ago
If you own your face that doesn't mean nobody can take a picture on the street.
lcnmrn · 9h ago
I allow all robots and even provide a sitemap on Subreply, a social network I created.
like_any_other · 2h ago
In contrast, I'm glad ISPs allow "their" content to be used so permissively.
delichon · 22h ago
> “You shall not and you shall not attempt to (or allow others to) […] use the X API or X Content to fine-tune or train a foundation or frontier model,” it reads.
If I have a service where a user enters any URL, like a tweet from X, and the service translates it, then if the user approves of the translation I train a translation model on that, does that violate this term?
yandie · 22h ago
Per my experience with GenAI legal teams, that’s a no go.
It’s not been tested in court though
dyauspitr · 12h ago
If you don’t want an LLM to view it don’t put it on the public internet.
ronsor · 21h ago
I'm not sure how this will work as crawlers don't read or accept ToS.
MoonGhost · 21h ago
It will not, as long as search engines have access. Which means Google, and OpenAI through MS Bing, at least.
Without search engines, what's the point in posting it on the open net if nobody can find it?
voidUpdate · 7h ago
This refers to the API, which you would have to manually attach a bot to so that it could scrape things
xiaoyu2006 · 5h ago
As if anyone will follow.
petesergeant · 9h ago
The only story here is that it took 2 months for them to do this after being "bought" by xAI.
echelon · 23h ago
If an artist or author can't do this, social media shouldn't be able to do it either.
If Xai wants to train on public corpus, it shouldn't be allowed to prevent its own corpus from being used.
We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.
We should also probably nip the "foundation model company / also a social media company" conglomeration in the bud.
mgraczyk · 22h ago
Artists can do this, and they do
loudmax · 22h ago
Yes, but do artists have the ability to actually monitor and enforce this? You have to have the capacity and the wherewithal to test these models to even know that your data is being ingested into AI.
Big companies like the New York Times and Twitter/X have the funds to pay for this. Miscellaneous artists probably don't.
teeray · 22h ago
> If an artist or author can't do this, social media shouldn't be able to do it either.
Even if this is done, the case of starving artist v. megacorp will probably go to whoever wields the most money and lawyers. To add insult to injury, the artist’s opponent is fueled by their ill-gotten gains.
yndoendo · 22h ago
This is dependent on country. USA, yes, with their draconian methods. In countries like the UK, the loser of the suit pays all the costs. UK lawyers have no problem taking low-wealth client cases they know will win. The UK allows for David vs. Goliath, and for David to win. The US uplifts Goliath as a god.
anticensor · 5h ago
However, loser-pays vs. both-parties-pay isn't uniform across all possible lawsuit types, even in America or in England.
Adding to that, even in loser-pays regimes, both parties have to pay upfront, and then the winner is refunded the costs.
bonoboTP · 22h ago
Also in many countries legal costs are just generally lower than in the US.
jimbokun · 22h ago
If social media can do this, an artist or author should be able to do it, too.
vouaobrasil · 22h ago
Social media should do it to set a legal precedent.
> We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.
No, no one should train, period.
echelon · 20h ago
> No, no one should train, period.
I get that you have your own opinion, but I'm personally tired of living in the butter-churning era and would prefer that this all went a bit faster.
I want my real time super high fidelity holo sim, all of my chores to be automatically done, protein folding, drug discovery. The life extension, P = NP future. No more incrementalism.
If the universe only happens once, and we're only awake for a geological blink of an eye, I'd rather we have an exciting time than just be some paper-pushing animals that pay taxes and vanish in a blip.
I'd be really excited if we found intelligent aliens, had advanced cloning for organ transplants and longevity, developed a colony on Mars, and invented our robotic successor species. Xbox and whatever most normal people look forward to on a day to day basis are boring.
vouaobrasil · 19h ago
There is already a beautiful, exciting world out there full of animals and plants and we don't need AI or some computer crap to experience it. The problem is, creating all this AI and advanced technology is directly crushing that world.
DaSHacka · 3h ago
> The problem is, creating all this AI and advanced technology is directly crushing that world.
Do you have a source for this?
foldr · 6h ago
This could lead to a precipitous increase in the performance of the AI models.
seydor · 10h ago
VAT for content should be a thing. Ultimately all users should be getting paid
guywithahat · 10h ago
So I get to use the platform for free, but I also get paid to post on the platform? I'm not sure that makes sense. Like I hate to take the side of big tech, but they can't literally be paying users to use their platform. Just use something else, there are a million social media sites
seydor · 10h ago
Google indexes your website for free, and it will pay you to put ads in it.
That's also what all social media do, they put ads on your thoughts. They don't even need to index your thoughts because you submit them directly. It has nothing to do with being free, it's about incentives. Users are so foolish, they give everything for free, unlike webmasters.
Reason077 · 10h ago
You don’t use the platform for free, unless you’re using an ad blocker. But that’s also, probably, against the TOS?
MonkeyClub · 10h ago
> I get to use the platform for free
You actually get to generate content for the platform for free.
Without you (all of the X users), the platform would be devoid of content, just botspeak and corporate promos.
Plus, as the sibling mentioned, they monetize your visit through ads (and data use).
jaoane · 9h ago
Most posts are ignored and are an absolute loss to the company. Which is why platforms like Twitter only allow you to make money from posting once you reach a certain threshold.
MonkeyClub · 4h ago
They're not an "absolute loss", since they only cost bytes to store, and they raise engagement and data metrics.
It's just that they don't want to share the fractions of pennies with everyone, so the fractions accumulate for them.
Then they pay a bit to the higher tiers, so they create the illusion that X is a parallel income source, and gives the lower tiers something to aspire to.
Carrot and stick, or rather glass beads and the hope thereof.
threeseed · 10h ago
We really need LLMs for music to become more advanced.
Then maybe the recording companies will start defending artist rights.
Because I'm not sure what all the other industry bodies are doing.
mk_stjames · 9h ago
I wanted to do some quick math on this idea. Suppose we trained a vanilla transformer model from scratch, as GPT-2/GPT-3 were: the number of seen input tokens is known perfectly, as are the sources of those training tokens. (Since then, everyone has either kept quiet about their sources post-Books3-fiasco, or has been fine-tuning on top of previous models, making this a more difficult calculation.)
GPT-3 was trained on approximately 300 billion tokens.
A small-sized technical textbook might contain something like... 130,000 tokens? (1 token ≈ 0.75 words, ~100k words in the book.)
Thus, say you wrote a textbook on quantum mechanics that was included in the training corpus. A naive computation of your textbook's fraction of the total number of training tokens would be 130K/300B = 0.00000043, or 0.000043%.
If our hypothetical AI company here reported, say, $500M in yearly profit, and all of that was distributed 100% based on our naive training-token ratio (notice I say naive because it isn't as simple as saying every training token contributes equally to the final weights of a model; that is part of the magic), then $500M * 0.000043% = $215.
You could imagine a simpler world where it was required by law that any such profitable company redistribute, say, 20% (taking the 'anti-VAT' idea) back to the copyright holders / originators of the training tokens. So, our fictitious QM textbook author would receive a check in the mail for $43 for that year of $500M in profit. Not great, but not zero.
Since then, training corpora are much, much larger, and most people's contributions would be much smaller. Someone who writes witty tweets? Maybe 1/100th the length of our above example, in a model with now 100x the training corpus.
So fractions of a penny for your tweets. Maybe that is fitting after all...
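The back-of-envelope arithmetic above can be sketched directly (all figures are hypothetical, and the uniform per-token contribution is exactly the naive assumption being flagged):

```python
# Naive revenue share: a contributor's payout is proportional to their
# share of training tokens. Figures are illustrative, not real data.

def naive_payout(contrib_tokens: int, total_tokens: int,
                 annual_profit: float, redistribution_rate: float = 1.0) -> float:
    """Payout if `redistribution_rate` of profit is split by token share."""
    token_share = contrib_tokens / total_tokens
    return annual_profit * token_share * redistribution_rate

textbook_tokens = 130_000         # ~100k words at ~0.75 words/token
gpt3_tokens = 300_000_000_000     # ~300B training tokens
profit = 500_000_000              # hypothetical $500M yearly profit

full = naive_payout(textbook_tokens, gpt3_tokens, profit)
fifth = naive_payout(textbook_tokens, gpt3_tokens, profit, 0.2)
print(f"100% redistribution: ${full:.2f}, 20%: ${fifth:.2f}")
```

The unrounded result is closer to $217; the $215 figure comes from rounding the token share to 0.000043% first.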
seydor · 6h ago
The payment would probably be based on the usage of that source in generating LLM output for the LLM user. This would probably require training a parallel network that connects LLM network nodes to sources. Then the activation of those nodes could serve as a surrogate for the contribution of the source.
bamboozled · 7h ago
This guy is just painful
archagon · 18h ago
Oh, that must be nice. And what should I do as a blogger to get the same privilege for my content?
We are in an age of corporate “piracy for me, but not for thee.”
MonkeyClub · 10h ago
> We are in an age of corporate “piracy for me, but not for thee.”
Rather, we are back to that age of state- (now corporate-) backed privateering.
risyachka · 9h ago
Good luck with that. Pretty sure at this point no one cares.
Literally every AI model is trained on copyrighted etc data. And without any consequences.
add-sub-mul-div · 22h ago
How useful is low-quality content like Youtube comments and tweets anyway? Is it a common/important use case to generate tweet-length, tweet-quality content? Are most use cases of generating tweet-type content spam/fraud? Would a model be better off if it was unable to perform those use cases?
redox99 · 22h ago
Even if SNR is low, there is some information that only exists on X, or at least is the primary source. Just look at how many submissions on HN are X posts.
add-sub-mul-div · 21h ago
Before Musk bought it Twitter was broadly disliked here and there were regularly calls in the comments to disallow submissions from there. Given how it's degraded in completely non-partisan ways (blocking of alternative clients, features removed from free tier, paid subscription tiers below $40/month still have ads, proliferation of spam from paid placement bots in comments) I can't understand how positive sentiment comes from a place other than virtue signaling alignment with Musk and his values.
narrator · 4h ago
Elon mentioned that the earlier rate limiting was for preventing training the real-time AI propaganda deathstar, and to avoid X becoming bot hell, which is an ongoing problem. This move is probably for similar reasons.
There needs to be a worldwide standard, such as an HTML tag, that says "no training". And a few countries need to make it a punishable offense to violate the tag. The punishment should be exceptionally severe, not just a fine. For example: any company that violates the tag should be completely barred from operating, forever.
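For illustration only: a crawler that chose to honor such a signal might look for a hypothetical "noai" directive in a page's robots meta tag. "noai" is a convention some publishers have floated, not something any standard currently mandates; a minimal check could look like this:

```python
from html.parser import HTMLParser

class NoTrainTagParser(HTMLParser):
    """Detects a hypothetical <meta name="robots" content="... noai ..."> tag."""
    def __init__(self):
        super().__init__()
        self.no_training = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            # robots directives are comma-separated, e.g. "noindex, noai"
            directives = {d.strip().lower() for d in a.get("content", "").split(",")}
            if "noai" in directives:
                self.no_training = True

def allows_training(html: str) -> bool:
    """True unless the page carries the (hypothetical) noai directive."""
    parser = NoTrainTagParser()
    parser.feed(html)
    return not parser.no_training
```

For example, `allows_training('<meta name="robots" content="noindex, noai">')` returns `False`. The hard part, as other comments note, is not the check but the enforcement.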
kiratp · 22h ago
That will play out exactly like the "Do not track" bit did.
insane_dreamer · 2h ago
how did that play out?
vouaobrasil · 22h ago
Perhaps we should try anyway, in case you are wrong.
anigbrowl · 22h ago
That will just lead to situations where one company scrapes the site, cleans the content of tags, and sells the data, and another does the training on the precleaned data. The first one hasn't trained and the second one never saw the tag.
vharuck · 21h ago
This isn't a new concept in law. It's similar to buying goods that were stolen or procured through illegal means. Here's the US law that applies when it happens across state lines: https://www.law.cornell.edu/uscode/text/18/2315
Note that it requires the defendant to know the goods were illegally taken. Can be hard to prove, but not impossible for companies with email trails. The fun question is, what will the analog be for the government confiscating the illegally "taken" data? A guarantee of deletion and requirement to retrain the model from scratch?
vouaobrasil · 22h ago
Companies who are found guilty of this should also be rendered bankrupt then.
twostorytower · 22h ago
It needs to be incorporated into the robots.txt standard.
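Short of a new standard, the closest existing lever is per-crawler robots.txt rules. A sketch using user-agent tokens the major AI crawlers have published (compliance is entirely voluntary, which is rather the point of this thread):

```
# Disallow known AI-training crawlers; allow everything else.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```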
logicchains · 22h ago
>There needs to be a worldwide standard, such as an HTML tag, that says "no training"
Any country that seriously implemented this would just end up being completely dominated by the autonomous robot soldiers of another country that didn't, because it effectively bans the development of embodied AGI (which can learn live from seeing/reading something, like a human can).
Elon is somewhere around 10,000x.
Restricting content from AI is the big messy debate we're going to see over and over for the next who knows how many years.
I don't know if this was successful or not. Ultimately they convinced someone to buy the platform for $44bn, so I guess you can say it was. That buyer has locked the platform down more, and the new version certainly feels less culturally central and relevant than it used to.
That way he can continue to steal from others and lock competitors out whilst being comfortable knowing that no laws will be enacted to prevent it.
If you allow everyone to go back to their district with something it encourages smaller, more frequent bills and better negotiation.
My guess is on Peter Thiel
Or more likely, Congress is super worried about Roko’s Basilisk.
https://en.wikipedia.org/wiki/Roko's_basilisk
> Roko's basilisk is a thought experiment which states there could be an otherwise benevolent artificial superintelligence (AI) in the future that would punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.
The Basilisk tasks you with bringing the Basilisk into being. Pascal's wager merely asks you to believe (and perhaps do some rituals, like pray or whatever), but not to make the deity more likely.
I am presently being compelled by future Basilisk to take another slice of cheese. I have no choice but to oblige for fear of my own life :p
And if we do get an all-powerful dictator, we will be screwed regardless of whether their governing intelligence is artificial or composed of a group of humans or of one human (with, say, powerful AIs serving them faithfully, or access to some other technology).
I’m not 100% kidding with how human politics is going. Maybe superintelligent AI takeover would be awesome.
(Wasn’t that the back story of the Culture novels?)
And in the video posted the other day (an older episode of Nova on AI), Arthur C. Clarke says that if we allow A.I. to take over, we deserve it.
I’d prefer an explicit opt in from the content author being required for anyone to perform any model training with any given data.
Alternatively, require all weights, prompts and chat logs to have the same visibility as the original datasets.
None of this is going to happen and current decisions about uncopyrightable ai[1] are already good; but still, it feels like there is room for abuse.
[1]: https://en.m.wikipedia.org/wiki/Th%C3%A9%C3%A2tre_D%27op%C3%...
I like how opt-in is handled by GDPR; e.g.: "Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject (...) A data controller may not refuse service to users who decline consent to processing that is not strictly necessary in order to use the service.", source: https://en.wikipedia.org/wiki/General_Data_Protection_Regula...
Or are they trying to forgo section 230 protection and claim ownership of content uploaded to the site?
It means that assuming training AI models is fair use (if it wasn't AI companies including xAI would be in trouble), they can't really stop you.
But now, essentially, they are telling you that they can block your account or IP address if you do. Which I believe they can for basically any reason anyways.
A 50-year minimum is part of the Berne Convention, which itself is as close to a universal law as humanity has
(even North Korea is a signatory)
50 years ago was 1975. If copyright were limited to 50 years, we'd be looking at all of the Beatles works being in the public domain. We'd be midway though Led Zeppelin, and a lot of the best work from Pink Floyd and the Rolling Stones.
Also, Superman, Batman, and Spider-Man. Disney would still profit from the MCU films which they produced in the 2010's, but they couldn't stop you from releasing your own Batman vs Spider-Man story.
The Harry Potter books would still belong to JK Rowling, but the Narnia stories would be available for all.
The Godfather 1 and 2 would be in the public domain, as would be original Star Trek TV show, and we'd be coming up on Star Wars pretty soon.
If there were no copyright protection, these works wouldn't have been created. It is good that Paul McCartney and George Lucas and JK Rowling have profited from their creative output. It would be okay if they only profited for the first 50 years. Nobody is counting on revenue over half a century in the future when they create a work of art today.
This is our culture. It should belong to all of us.
Wouldn't they still have a trademark on those characters though?
So, if Disney is using Mickey Mouse on t-shirts to identify them as Disney-manufactured t-shirts, you wouldn't be allowed to use Mickey Mouse on t-shirts in a similar fashion, in a way that might cause consumer confusion about who manufactured the t-shirt.
If Wolverine was in the public domain, then they couldn't use a Wolverine trademark to stop you from selling a Wolverine comic book. However, if they used a _specific_ Wolverine mark to identify it as a Disney Wolverine book, then you'd be restricted from using that.
Basically, trademark exists to prevent consumer confusion about who is the creator that is selling a good.
Citation needed. You can freely copy and distribute linux and it still got made.
BSD is also protected by copyright, but it matters less for permissive licenses. It still protects attribution (so you can't claim it as yours), but it probably would have worked without it, unlike Linux, which is in large part defined by the "copyleft" protections offered by its licence.
Well, you could imagine a world that protects the 'moral' rights of authors like attribution, but doesn't otherwise prohibit anyone from duplicating or modifying works.
> In a world without copyright, companies would be free to make their own modifications and keep them secret, making it more or less impossible to integrate them into a cohesive whole the way they are more or less forced to do today.
Private modifications that are never shared with a third party are fine with the GPL. Eg Google doesn't have to share whatever kernel they are using on their internal servers with you.
That's why some people like to call it 'Gnu/Linux', but thanks to recent advances we can make Gnu-free Linuxes today, too.
> There are far fewer success stories of artworks being made in this style. (E.g. there are successful multiplayer open-source games or clones of existing games, but very few original single-player games, and those that there are are largely the work of a single individual)
Humans have made art since forever. Large collaborative efforts like eg a cathedral are a more recent invention. But by these standards copyright was practically invented yesterday.
I was talking about the kernel, though what I said applies to both.
> Humans have made art since forever.
Perhaps, but not the kind of long-form narrative experiences that we're talking about here. (Sagas and epics predate copyright, but those are a quite different form, and indeed have much the same downsides - struggles with coherence and consistency when there are multiple authors, inability to put everything together in a sensible arc).
Something like the BSD licenses approximates 'no copyright' better, perhaps? But also not completely.
If there were copyright, those works wouldn’t have been created.
The GP is referring to legal protections, and guess what?
Linux is legally protected by copyright!
Nearly every GPL license--every one that we could name--protects a copyrighted work! Nearly every GFDL, AGPL, LGPL protects works by means of copyright law!
Can you imagine that? So do the Apache license, the BSD licenses, the MIT license, and Creative Commons (except for CC0): these licenses are legally protecting copyrighted works. Thank you!
Now everyone who proposes to draw down limits on copyright coverage and reduce the length of terms and limit Disney from their Mouse rights, y'all are also proposing the same limits on GPL software, such as Linux, and nearly every work with a license from the above list -- all of Wikimedia Commons, much of Flickr.com, all your beloved F/OSS software will be subject to the same limitations and the same restrictions you want to put on Paramount and the RIAA's labels.
essentially forever
https://news.ycombinator.com/item?id=41275073
Abolishing copyright laws altogether would be nuts, but the current laws are nuts too and there's lots of room in between.
> The procedure is activated by the European Commission submitting a request to the Council of the European Union.[2] After a period of negotiation with the country performing the coercion, the European Council can decide to implement "response measures" such as customs duties, limiting access to programs and financial markets, and intellectual property rights restrictions.[2][4] These restrictions can be applied to states, companies, or individuals.[4]
[0] https://en.wikipedia.org/wiki/Anti-Coercion_Instrument
https://github.com/google-deepmind/pg19
That gives us a model that's 100% open and reproducible with low, legal risk. It would also be a nice test of how much AI's generalize from or repeat behavior in their pretraining data.
Then, a new model using that, The Stack, and FreeLaw's stuff (by paying them to open source it). No Github Issues or anything with questionable licenses or terms of service violations. That could be the next baseline for lawful models with coding ability, too. Research in coding AI's might use it.
Wikipedia periodically publishes database dumps and the Internet Archive stores old versions: https://archive.org/search?query=subject%3A%22enwiki%22%20AN...
Plus you could also grab the latest and just read the 12/31/23 revisions.
You must not, and must not allow those acting on your behalf to:
...use the Data APIs to encourage or promote illegal activity or violation of third party rights (including using User Content to train a machine learning or AI model without the express permission of rightsholders in the applicable User Content);
Since it's using a large number of real users' devices, and closely mimicking real web browsers, it ends up looking incredibly similar to real user traffic.
Since twitter allows some amount of anonymous browsing, that's enough to get some amount of data out. You can also pay brightdata for one large aggregated dataset.
https://bright-sdk.com/
This is part of the AI revolution, user's devices being commandeered to DDoS small blogs and twitter alike to feed data to the beast.
We're already seeing precedent that it might be.
https://www.ecjlaw.com/ecj-blog/kadrey-v-meta-the-first-majo...
The openness of the internet is a good thing, but it doesn't come without a cost. And the moment we have to pay that cost, we don't get to suddenly go, "well, openness turned out to be a mistake, let's close it all up and create a regulatory, bureaucratic nightmare". This is the tradeoff. Freedom for me, and thee.
Accordingly, anyone on the internet who wants to make comments about how they should be able to prevent others from training models on their data needs to demonstrate competence with respect to copyright by explaining why it's not fair use, as currently it is undecided in law and not something we can just take for granted.
Otherwise, such commenters should probably just let the courts work this one out or campaign for a different set of protection laws, as copyright may not be sufficient for the kind of control they are asking over random developers or organizations who want to train a statistical model on public data.
No, fair use is an affirmative defense for conduct that would otherwise be infringing. The onus is on the defendant to show that their use was fair.
That said, I think we do agree. The plaintiff should be prepared to refute a fair-use argument raised by the defendant. I'm just noting that the refutation doesn't need to be part of the initial filing, it gets presented at trial, after discovery, and only if the defendant presents a fair-use defense. So they don't have to prove it's not fair use to win in every case. I'm probably also being excessively pedantic!
Morally, perhaps, but not under US law: https://en.wikipedia.org/wiki/Affirmative_defense#Fair_use
https://x.com/elonmusk/status/1675187969420828672