To be honest, these companies have already stolen terabytes of data and don't even disclose their datasets, so you have to assume they'll steal and train on anything you throw at them
AlecSchueler · 4h ago
Am I the only one who assumed everything was already being used for training?
Aurornis · 4m ago
I don't understand this mindset. Why would you assume anything? It took me a couple minutes at most to check when I first started using Claude.
I check when I start using any new service. The cynical assumption that everything's being shared leads to shrugging it off and making no attempt to look for settings.
It only takes a moment to go into settings -> privacy and look.
hshdhdhj4444 · 1m ago
Huh, they’re not assuming anything is “being shared”.
They’re assuming that Anthropic, which is already receiving and storing your data, is also training its models on that data.
How are you supposed to disprove that as a user?
Also, the whole point is that companies cannot be trusted to follow the settings.
simonw · 1h ago
The hardest problem in computer science in 2025 is convincing people that you aren't loading every piece of their personal data into a machine learning training run.
The cynic in me wonders if part of Anthropic's decision process here was that, since nobody believes you when you say you're not using their data for training, you may as well do it anyway!
Giving people an opt-out might even increase trust, since people can now at least see an option that they control.
behnamoh · 53m ago
> The cynic in me wonders if part of Anthropic's decision process here was that, since nobody believes you when you say you're not using their data for training, you may as well do it anyway!
This is why I love-hate Anthro, the same way I love-hate Apple. The reason is simple: Great product, shitty MBA-fueled managerial decisions.
demarq · 41m ago
Same, I thought the free accounts were always trained on, which in my opinion is reasonable since you could delete the data you didn’t want to keep on the service.
But including paid accounts and doing 5-year retention is confounding.
layer8 · 49m ago
You wouldn’t have needed to assume if you had read their previous policy.
hexage1814 · 2h ago
This. It's the same innocence as people who believe that when you delete a document on Google/META/Apple/Microsoft servers, it "really" gets deleted. Google most likely has a backup of every piece of information it has indexed in the last 20 years or so. It would make the Internet Archive envious.
giancarlostoro · 2h ago
With the privacy laws out there, I do genuinely think data eventually gets purged even from backups. I remember a really cool YouTube video shared here on HN, which Google no longer has public, about the journey of an email and all the behind-the-scenes things: the physical security getting into a data center, the patented hard drive shredders they use once the hard drives are to be tossed. I wish Google had kept that video public and online; it was a great watch.
I know once you delete something on Discord it's poof, and that's the end of that. I've reported things that, if anyone at Discord could have accessed a copy, they would have called the police. There are a lot of awful trolls on chat platforms that post awful things.
diggan · 2h ago
> I know once you delete something on Discord it's poof, and that's the end of that. I've reported things that, if anyone at Discord could have accessed a copy, they would have called the police. There are a lot of awful trolls on chat platforms that post awful things.
That's not what Discord themselves say. Is that coming from Discord, the police, or someone else?
> Once you delete content, it will no longer be available to other users (though it may take some time to clear cached uploads). Deleted content will also be deleted from Discord’s systems, but we may retain content longer if we have a legal obligation to preserve it as described below. Public posts may also be retained for 180 days to two years for use by Discord as described in our Privacy Policy (for example, to help us train models that proactively detect content that violates our policies). - https://support.discord.com/hc/en-us/articles/5431812448791-...
Seems there's something that decides whether the content gets deleted faster or kept for between 180 days and 2 years. So even for Discord, "once you delete something on Discord it's poof" isn't 100% accurate.
giancarlostoro · 1h ago
At least in terms of reporting content to "Trust and Safety", they certainly behave like it's gone forever. I have had friends report illegal content, to both Discord and law enforcement, and the takeaway seemed to be that it was gone. Now it's making me wonder if Discord is really archiving CSAM material for two years and not helping law enforcement unless a proper warrant is involved, yikes.
diggan · 27m ago
> now it's making me wonder if Discord is really archiving CSAM material for two years and not helping law enforcement unless a proper warrant is involved
Yes, of course, to both of those. Discord is a for-profit business with a limited number of humans who can focus on things, so the less they have to focus on, the better (in the minds of the people running the business, at least). So why do anything when you can do nothing and everything stays the same? Of course, when someone has a warrant they really do have to do something, but unless there is one, there is no incentive for them to do anything about it.
conradev · 1h ago
My understanding is that for Gmail specifically, they keep a record of every email ever received regardless of deletion status, but I'm not able to find any good sources.
diggan · 21m ago
Even if Google isn't storing it, we can sleep safely knowing the NSA's PRISM V2 probably has an archive of it too :) Albeit it's hard to acquire a dump of those archives, for now at least...
bwillard · 2h ago
Officially (it's up to you whether you believe they follow their policies), all of the companies have published statements on how long they keep data after deletion (retention that customers broadly want, to support recovery if something goes wrong):
- Google: active storage for "around 2 months from the time of deletion" and in backups "for up to 6 months": https://policies.google.com/technologies/retention?hl=en-US
- Meta: 90 days: https://www.meta.com/help/quest/609965707113909/
- Apple/iCloud: 30 days: https://support.apple.com/guide/icloud/delete-files-mm3b7fcd...
- Microsoft: 30-180 days: https://learn.microsoft.com/en-us/compliance/assurance/assur...
So if it turns out they are storing data longer, there can be consequences (GDPR, CCPA, FTC).
toyg · 53m ago
TBH, I'd be surprised if they kept significant amounts around for longer, for the simple reason that it costs money. Yes, drives are cheap, but the electricity to keep them online for months and years is definitely not free, and physical space is not infinite. This is also why some of their services have pretty aggressive deletion policies (like recordings in MS Teams, etc).
A4ET8a8uTh0_v2 · 3h ago
I mean, I am sure there are individuals who still believe in the basic value of one's word within the framework of our civilization, but having seen those words not just twisted beyond recognition to fit a specific idea, but simply ignored when they were no longer convenient, the surprise would be if a cynical stance were not more common.
The question is: how does that affect their choices? How much ends up being gated that previously would have ended up in the open?
Me: I am using a local variant (and attempting to build something I think I can control better).
lemonberry · 4h ago
You are not.
ljosifov · 1h ago
Excellent. What were they waiting for until now?? I thought they already trained on my data. I assume they train, and even hope that they train, even when they say they don't. People who want to be data privacy maximalists - fine, don't use their data. But there are people out there (myself included) on the opposite end of the spectrum, and we are mostly ignored by the companies. Companies just assume people only ever want to deny them their data.
It annoys me greatly that I have no tick box on Google to tell them "go and adapt the models I use on my Gmail, Photos, Maps, etc." I don't want Google to ever be mistaken about where I live - I have told them 100 times already.
This idea that "no one wants to share their data" is just assumed, and it permeates everything. Like the soft-ball interviews a popular science communicator did with DeepMind folks working in medicine: every question was prefixed by a litany of caveats, all about 1) the assumed aversion of people to sharing their data and 2) the horrors and disasters that are to befall us should we share it. I have not suffered any horrors. I'm not aware of any major disasters. I am aware of major advances in medicine in my lifetime. Ultimately the process does involve controlled data collection and experimentation. Looks like a good deal to me, tbh. I go out of my way to tick all the NHS boxes too, to "use my data as you see fit". It's an uphill struggle. The defaults are always "deny everything". Tick boxes never go away, and there is no master checkbox "use any and all of my data and never ask me again" to tick.
koolba · 1h ago
> It annoys me greatly that I have no tick box on Google to tell them "go and adapt the models I use on my Gmail, Photos, Maps, etc." I don't want Google to ever be mistaken about where I live - I have told them 100 times already.
Given that we've seen LLMs fully regenerate text from their sources (or at least come close enough), aren't you the least bit worried about your personal correspondence magically appearing in the wild?
simonw · 1h ago
If you have an API key for a paid service, would you be OK with someone asking ChatGPT or VS Code Copilot for an API key for that service and getting yours, which they then use to rack up bills that you have to pay?
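(If that worries you, one client-side mitigation is to scrub anything key-shaped before a prompt ever leaves your machine. A rough sketch in Python; the regexes below only approximate real key formats and are illustrative, not exhaustive:)

    import re

    # Approximate shapes of common credentials; real formats vary.
    SECRET_PATTERNS = [
        re.compile(r"sk-[A-Za-z0-9]{20,}"),    # OpenAI-style API keys
        re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub personal access tokens
        re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key IDs
    ]

    def redact(text: str) -> str:
        """Replace anything key-shaped with a placeholder."""
        for pattern in SECRET_PATTERNS:
            text = pattern.sub("[REDACTED]", text)
        return text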
p3rls · 1h ago
I think the really frustrating part is that they're using your data, scanning every driver's license etc. that comes onto the Google Play Store - and there are still scammers using official Google products, caught every day on Twitter now that scambaiting is becoming a popular pastime.
SantalBlush · 1h ago
If only you were just giving them your own data. In reality, you're giving them data about your friends, relatives, and coworkers without their consent. Let's stop pretending there is any way to opt out by simply not using these companies' services; it isn't true.
otikik · 1h ago
“Claude, please write and commit this as if you were ljosifov. Yes, please use his GitHub token, thank you”
ardit33 · 1h ago
This is a problem for folks with sensitive data, and also for corporate users who don't want their data being used for training due to all kinds of liability issues.
I am sure they will have a corporate carve-out; otherwise it makes them unusable for some large corps.
vb-8448 · 2h ago
So, I guess they ran out of data to train on...
I wonder how much they can rely on the data and what kind of "knowledge" they can extract. I never give feedback, and most of the time (let's say 5 out of 6) the result cc produces is simply wrong. How can they know whether the result is valuable or not?
jlarocco · 1h ago
How can they know anything they train on is valuable?
At the end of the day it doesn't matter. You got the wrong answer and didn't complain, so why would they care?
At least this submission has the original text Anthropic sent out to people :) But yeah, Perplexity gives a better summary for outsiders I guess.
some_random · 48m ago
Title is misleading: they're now opt-out rather than opt-in for your data being used for training. All you have to do is flip a single switch in the options to turn it off. I don't understand why everyone is treating this as such a big deal.
Edit: I just logged in to opt out, they presented me with the switch directly. It was two clicks.
rkomorn · 46m ago
I think any switch from opt-out-by-default to opt-in-by-default sucks, especially when it has no clear immediate benefit to the person being opted in.
Disclaimer: not a Claude user (not even a prospective one)
latexr · 41m ago
> any switch from opt-out-by-default to opt-in-by-default sucks
It’s the reverse. This was opt-in and is now opt-out. Opt means choose, so when “the default is opt-in” it means the option is “no” by default and you have the option to make it “yes”.
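To make the terminology concrete, here's a toy sketch (the setting name is made up):

    # "Opt-in" = training is off unless you switch it on;
    # "opt-out" = training is on unless you switch it off.
    OLD_DEFAULT = {"train_on_my_data": False}  # before: you had to opt in
    NEW_DEFAULT = {"train_on_my_data": True}   # now: you have to opt out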
rkomorn · 20m ago
> they're now opt-out rather than opt-in to your data being used for training
This is what the comment I was replying to said. I took that to mean "you have to opt out (ie you're opted in by default)".
some_random · 43m ago
I don't think it's good, but people both here and on reddit are acting like this is some Great Betrayal when it's just a single switch that they prominently present to you. If they're going to make this change, this is exactly how I'd want them to do it.
latexr · 39m ago
> If they're going to make this change
Feels like the complaint is precisely that people don’t want them to make this change.
> this is exactly how I'd want them to do it.
Seems naive to believe it will always be done like this, especially for new users.
some_random · 29m ago
First off, I don't think going into the settings and flipping a toggle switch once is a huge burden on those who want to use a service privately. But more importantly, some of the comments here are so hysterical that I have to assume they read the title and jumped to the conclusion that you cannot opt out anymore without a business account.
darrmit · 36m ago
I have never input anything into one of these tools that I wasn’t entirely comfortable with them using for training or any other reason. I just assumed it was happening.
0xbadc0de5 · 2h ago
I kind of already assumed they were. I've got some pretty niche use-cases that I'd like to see the models get better at thinking their way through. I benefit from their training on my interactions. So I'll opt in.
But I'll also recognize that others might not feel that way, so the services should provide a way for users to opt out.
javier_e06 · 3h ago
I use AI to solve problems, not to check the weather or decide what to wear. As such, it makes sense for the AI to remember when it hits the nail on the head.
leetbulb · 3h ago
Agreed. Typically I would be against something like this, but in this case, have at it.
AlexandrB · 2h ago
How do you feel about this data being used to target advertising at you in the inevitable rush to monetize these AI products?
christophilus · 1h ago
I feel like that’s annoying, but it’s a drop in the bucket vs the current firehose of ads, and there’s a slim shot these ads might actually be interesting or relevant to me.
Anyway, I’ll block them like I do everything.
ath3nd · 1h ago
Oh, sweet summer child, your SOLUTIONS will be trained on and will be given to others without your permission and knowledge.
But now that you bring up ads, I guarantee you that those will somehow be incorporated in Claude soon.
ath3nd · 1h ago
And if you solve a novel problem, Claude will happily take your reasoning and give it to the next user trying to solve the same novel problem. Imagine if that was a guy working for the competition :)
I logged on and they did ask right away in a popup window.
I_am_tiberius · 3h ago
Criminal, evil thieves.
binary132 · 1h ago
$COMPANY reneged on their solemn pinky promise to not do the bad thing this time? Quelle surprise!
mushufasa · 1h ago
Isn't this just a change from opt-in to opt-out? It will make a big difference, but it still gives individuals control of their data governance.
esafak · 2h ago
"If you use Claude for Work, via the API, or other services under our Commercial Terms or other Agreements, then these changes don't apply to you."
troyvit · 1h ago
This is what I don't get. It's so much simpler to use the LLMs with other tools (aider for me) that I don't understand why people avoid the API and create monthly accounts to begin with. Is it cheaper? Is Claude Code really that awesome or something? By not even looking at it I guess I never have to know, but from where I sit it seems like people are just putting themselves through a lot of b.s. in order to marry a start-up.
furyofantares · 45m ago
> Is it cheaper?
"ccusage" is telling me I would have spent $2010.90 in the last month if I was paying via the API, rather than $200.
But also I do feel Claude Code is quite a bit better than other things I've used, when using the same model. I'm not sure why though, it's a fairly simple program with only a few prompts and only a few tools, it seems like others could catch up immediately by learning some lessons from it.
data-ottawa · 1h ago
According to Claude cost, I get about 5x value in token cost by having a Max subscription.
I upgraded after I hit the equivalent spend in API fees in a month.
flerchin · 2h ago
Maybe it's a value-add to users if done correctly. The way it is right now, you can't teach the model anything. When it gets something wrong, it will probably get the same thing wrong again in another chat.
phallus · 2h ago
That's not how LLMs work.
wat10000 · 3h ago
Rather misleading title. Missing the important “unless you ask them not to” part. Sounds like a bit of a dark pattern to push you into accepting it and that’s not cool, but you do get a choice.
I can understand training AIs on books, and even internet forums, but I can't see how training an AI on lots of dumb questions, probably with an excessive amount of grammar and spelling errors, will somehow make it smarter.
nrclark · 2h ago
Depends on how you’re using the data. There’s a pretty strong correctness signal in the user behavior.
Did they rephrase the question? Probably the first answer was wrong. Did the session end? Good chance the answer was acceptable. Did they ask follow-ups? What kind? Etc.
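As a toy illustration (entirely hypothetical, not any lab's actual pipeline), that kind of implicit labeling might look like:

    from dataclasses import dataclass

    @dataclass
    class Turn:
        rephrased_next: bool  # user restated the same question afterwards
        followups: int        # clarifying questions asked after this turn
        ended_session: bool   # user stopped interacting after this turn

    def implicit_label(turn: Turn) -> str:
        # Guess answer quality from what the user did next.
        if turn.rephrased_next:
            return "likely_bad"   # restating the question suggests a miss
        if turn.ended_session:
            return "likely_good"  # walking away often means "good enough"
        if turn.followups > 0:
            return "ambiguous"    # follow-ups cut both ways
        return "unknown"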
vb-8448 · 1h ago
I'm used to doing the same task 4 or 5 times (different sessions, similar prompts), and most of the time the result is useless or completely wrong. Sometimes I go back and pick the first result, other times none of them, other times a mix of them. I'm wondering how they can extract value from this.
dudefeliciano · 2h ago
> Did the session end? Good chance the answer was acceptable.
Or that the user just ragequit
mrweasel · 2h ago
They train AI on Reddit and Stack Overflow questions, I can't see it getting any worse.
dahsameer · 3h ago
> and even internet forums
I would consider that internet forums also include a lot of dumb questions.
ratg13 · 3h ago
Agree, but people generally take a small pause before saying stuff online.
In 'private', people are less ashamed of their ignorance, and also know they can say gibberish and the AI will figure it out.
hkon · 3h ago
With the number of times Claude is visiting my websites, I'd say they are very desperate for data.
internet2000 · 3h ago
I’m fine with that.
dudefeliciano · 2h ago
You are fine with paying 20, 90, or 200 euros a month AND having your data mined? I must be getting old...
jlarocco · 1h ago
It's a tool that depends on data mining everything. The only surprise is that they weren't already data mining what people feed into it.
bethekidyouwant · 1h ago
What are you doing with the data? What is your legacy going to be, other than the data that you leave behind to be mined? Do you just not want something else to benefit from something that has no benefit to you? If so, why?
dudefeliciano · 37m ago
Oh, and I forgot the most important thing: I am paying good money for this service, and now they also mine my data? I grew up in a time when "if it's free, you're the product". I guess that's not even the case anymore; if you pay, you're still the product...
dudefeliciano · 38m ago
There is a myriad of reasons I may not want my data to be used. Maybe I am working on proprietary systems, maybe I am using Claude as a psychotherapist, maybe I use it as a tax advisor, the list goes on. Is it unrealistic to think that data may be extrapolated and connected to me in the future?
tiahura · 11m ago
> Maybe I am working on proprietary systems, maybe I am using Claude as a psychotherapist, maybe I use it as a tax advisor, the list goes on.
Then use the business version.
SirFatty · 3h ago
"going forward" ;-)
gooob · 3h ago
and now the LLM gets to observe itself, heh heh heh
diamond559 · 49m ago
"Training" is now a euphemism for stealing. Guess I can't write any production level code w/ this...
tiahura · 10m ago
Some people would say that since the owner isn't being deprived of anything, it's not stealing.