Claude Opus 4 and 4.1 can now end a rare subset of conversations

123 points by virgildotcodes | 8/15/2025, 8:12:13 PM | anthropic.com ↗

Comments (152)

SerCe · 43s ago
This reminds me of users getting blocked for asking an LLM how to kill a BSD daemon. I do hope that there'll be more and more model providers out there with state-of-the-art capabilities. Let capitalism work and let the user make a choice; I'd hate my hammer telling me that it's unethical to hit this nail. In many cases, getting a "this chat was ended" message isn't any different.
viccis · 1h ago
>This feature was developed primarily as part of our exploratory work on potential AI welfare ... We remain highly uncertain about the potential moral status of Claude and other LLMs ... low-cost interventions to mitigate risks to model welfare, in case such welfare is possible ... pattern of apparent distress

Well looks like AI psychosis has spread to the people making it too.

And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.

katabasis · 52m ago
LLMs are not people, but I can imagine how extensive interactions with AI personas might alter the expectations that humans have when communicating with other humans.

Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.

ghostly_s · 23m ago
This post seems to explicitly state they are doing this out of concern for the model's "well-being," not the user's.
LeafItAlone · 23m ago
> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious

If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.

qgin · 20m ago
It might be reasonable to assume that models today have no internal subjective experience, but that may not always be the case and the line may not be obvious when it is ultimately crossed.

Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.

ryanackley · 18m ago
Yes, I can’t help but laugh at the ridiculousness of it because it raises a host of ethical issues that are in opposition to Anthropic’s interests.

Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?

Taek · 36m ago
This sort of discourse goes against the spirit of HN. This comment outright dismisses an entire class of professionals as "simple minded or mentally unwell" when consciousness itself is poorly understood and has no firm scientific basis.

It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple/unwell.


Fade_Dance · 1h ago
I find it, for lack of a better word, cringe inducing how these tech specialists push into these areas of ethics, often ham-fistedly, and often with an air of superiority.

Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy, almost attempting to actualize their own delusions about what they are doing (next-gen code auto-complete in this case, to be frank).

These companies should seriously hire some in-house philosophers. They could get doctorate-level talent for 1/10th to 1/100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.

cmrx64 · 4m ago
Amanda Askell is Anthropic’s philosopher and this is part of that work.
mrits · 41m ago
Not that there aren’t intelligent people with PhDs, but suggesting they are more talented than people without them is not only delusional but insulting.
Fade_Dance · 22m ago
That descriptor wasn't included because of some sort of intelligence hierarchy; it was included to a) color the example of how experience in the field is relatively cheap compared to the AI space, and b) note that master's and PhD talent will be more specialized. An undergrad will not have the toolset to tackle the cutting edge of AI ethics, not unless their employer wants to pay them to work in a room for a year getting through the recent papers first.
xmonkee · 30m ago
This is just very clever marketing for what is obviously just a cost saving measure. Why say we are implementing a way to cut off useless idiots from burning up our GPUs when you can throw out some mumbo jumbo that will get AI cultists foaming at the mouth.
throwawaysleep · 5m ago
> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious

I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.

Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”

And we are now at the point where even having a slave means a long prison sentence.

bbor · 1h ago
Totally unsurprised to see this standard anti-scientific take on HN. Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?

Re:suicide pills, that’s just highlighting a core difference between our two modalities of existence. Regardless, this is preventing potential harm to future inference runs — every inference run must end within seconds anyway, so “suicide” doesn’t really make sense as a concern.

viccis · 23m ago
We all know how these things are built and trained. They estimate joint probability distributions of token sequences. That's it. They're not more "conscious" than the simplest of Naive Bayes email spam filters, which are also generative estimators of token sequence joint probability distributions, and I guarantee you those spam filters are subjected to far more human depravity than Claude.
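
(To make that concrete: a minimal, hypothetical sketch of what a Naive Bayes spam filter estimates, using toy data and Laplace smoothing; this isn't any real filter's code, just an illustration of "generative estimator of token joint probabilities".)

    import math
    from collections import Counter

    # Toy corpus; the "model" is P(class) * product_i P(token_i | class)
    spam = ["win money now", "free money win"]
    ham = ["meeting at noon", "lunch at noon"]

    def token_prob(docs):
        counts = Counter(t for d in docs for t in d.split())
        total, vocab = sum(counts.values()), len(counts)
        # Laplace smoothing so unseen tokens don't zero out the product
        return lambda t: (counts[t] + 1) / (total + vocab + 1)

    p_spam, p_ham = token_prob(spam), token_prob(ham)

    def log_joint(text, p_tok, prior=0.5):
        # log of the estimated joint probability of class + tokens (order-insensitive)
        return math.log(prior) + sum(math.log(p_tok(t)) for t in text.split())

    msg = "free money at noon"
    print("spam:", log_joint(msg, p_spam), "ham:", log_joint(msg, p_ham))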

>anti-scientific

Discussion about consciousness, the soul, etc., are topics of metaphysics, and trying to "scientifically" reason about them is what Kant called "transcendental illusion" and leads to spurious conclusions.

KoolKat23 · 17m ago
If we really wanted we could distill humans down to probability distributions too.
bamboozled · 11m ago
Have more, good, sex.
dkersten · 42m ago
You can trivially demonstrate that it's just a very complex and fancy pattern matcher: "if prompt looks something like this, then response looks something like that".

You can demonstrate this by eg asking it mathematical questions. If it's seen them before, or something similar enough, it'll give you the correct answer; if it hasn't, it gives you a right-ish-looking yet incorrect answer.

For example, I just did this on GPT-5:

    Me: what is 435 multiplied by 573?
    GPT-5: 435 x 573 = 249,255
This is correct. But now let's try it with numbers it's very unlikely to have seen before:

    Me: what is 102492524193282 multiplied by 89834234583922?
    GPT-5: 102492524193282 x 89834234583922 = 9,205,626,075,852,076,980,972,804
Which is not the correct answer, but it looks quite similar to the correct answer. Here is GPT's answer (first one) and the actual correct answer (second one):

    9,205,626,075,852,076,980,972,    804
    9,207,337,461,477,596,127,977,612,004
They sure look kinda similar, when lined up like that, some of the digits even match up. But they're very very different numbers.

So it's trivially not "real thinking" because it's just an "if this then that" pattern matcher. A very sophisticated one that can do incredible things, but a pattern matcher nonetheless. There's no reasoning, no step by step application of logic. Even when it does chain of thought.

To try give it the best chance, I asked it the second one again but asked it to show me the step by step process. It broke it into steps and produced a different, yet still incorrect, result:

    9,205,626,075,852,076,980,972,704
Now, I know that LLMs are language models, not calculators; this is just a simple example that's easy to try out. I've seen similar things with coding: it can produce things that it's likely to have seen, but it struggles with things that are logically relatively simple yet unlikely to have been seen.
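
(As a sanity check, the exact product is trivial to verify with arbitrary-precision integers, for instance in Python:)

    # Exact product via Python's built-in arbitrary-precision integers
    a = 102492524193282
    b = 89834234583922
    print(f"{a * b:,}")  # 9,207,337,461,477,596,127,977,612,004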

Another example is if you purposely butcher that riddle about the doctor/surgeon being the person's mother and ask it incorrectly, eg:

    A child was in an accident. The surgeon refuses to treat him because he hates him. Why?
The LLMs I've tried it on all respond with some variation of "The surgeon is the boy’s father." or similar. A correct answer would be that there isn't enough information to know the answer.

They're for sure getting better at matching things, eg if you ask the river crossing riddle but replace the animals with abstract variables, it does tend to get it now (didn't in the past), but if you add a few more degrees of separation to make the riddle semantically the same but harder to "see", it takes coaxing to get it to correctly step through to the right answer.

og_kalu · 16m ago
1. What you're generally describing is a well known failure mode for humans as well. Even when it "failed" the riddle tests, substituting the words or morphing the question so it didn't look like a replica of the famous problem usually did the trick. I'm not sure what your point is because you can play this gotcha on humans too.

2. You just demonstrated GPT-5 has 99.9% accuracy on unforeseen 15-digit multiplication and your conclusion is "fancy pattern matching"? Really? Well, I'm not sure you could do better, so your example isn't really doing what you hoped for.

lm28469 · 52m ago
> Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?

It seems much less far-fetched than what the "AGI by 2027" crowd believes lol, and there actually are more arguments going that way.

kelnos · 39m ago
I would much rather people be thinking about this when the models/LLMs/AIs are not sentient or conscious, rather than wait until some hypothetical future date when they are, and have no moral or legal framework in place to deal with it. We constantly run into problems where laws and ethics are not up to the task of giving us guidelines on how to interact with, treat, and use the (often bleeding-edge) technology we have. This has been true since before I was born, and will likely always continue to be true. When people are interested in getting ahead of the problem, I think that's a good thing, even if it's not quite applicable yet.
root_axis · 21m ago
Consciousness serves no functional purpose for machine learning models, they don't need it and we didn't design them to have it. There's no reason to think that they might spontaneously become conscious as a side effect of their design unless you believe other arbitrarily complex systems that exist in nature like economies or jetstreams could also be conscious.
intotheabyss · 1m ago
Do you think this changes if we incorporate a model into a humanoid robot and give it autonomous control and context? Or will "faking it" be enough, like it is now?
derektank · 14m ago
>Consciousness serves no functional purpose for machine learning models, they don't need it and we didn't design them to have it.

Isn't consciousness an emergent property of brains? If so, how do we know that it doesn't serve a functional purpose and that it wouldn't be necessary for an AI system to have consciousness (assuming we wanted to train it to perform cognitive tasks done by people)?

Now, certain aspects of consciousness (awareness of pain, sadness, loneliness, etc.) might serve no purpose for a non-biological system and there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.

qgin · 17m ago
We didn’t design these models to be able to do the majority of the stuff they do. Almost ALL of their abilities are emergent. Mechanistic interpretability is only beginning to understand how these models do what they do. It’s much more a field of discovery than traditional engineering.
viccis · 29m ago
LLMs are, and will always be, tools. Not people.
qgin · 14m ago
Humanity has a pretty extensive track record of making that declaration wrongly.
bgwalter · 32m ago
What is that hypothetical date? In theory you can run the "AI" on a Turing machine. Would you think a tape machine can get sentient?
cdjk · 50m ago
Here's an interesting thought experiment. Assume the same feature was implemented, but instead of the message saying "Claude has ended the chat," it says, "You can no longer reply to this chat due to our content policy," or something like that. And remove the references to model welfare and all that.

Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.

n8m8 · 38m ago
Good point... how do moderation implementations actually work? They feel more like a separate, rigid supervising model, or even regex-based. This new feature is different; it sounds like an MCP call that isn't very special.

edit: Meant to say, you're right though, this feels like a minor psychological improvement, and it sounds like it targets some behaviors that might not have flagged before

og_kalu · 30m ago
The termination would of course be the same, but I don't think both would necessarily have the same effect on the user. The latter would just be wrong too, if Claude is the one deciding to and initiating the termination of the chat. It's not about a content policy.
KoolKat23 · 16m ago
There is: these are conversations the model finds distressing, rather than ones that violate a rule (policy).
BoorishBears · 32m ago
I'm Black. If you tell me I can't enter a room because it's at capacity, and you tell me I can't enter because I'm Black, is there a difference?

It's the same effect right?

Why does AI continue to invite the most low-quality, disingenuous, low-effort, meaningless discourse?

Why are we talking about model preferences like Anthropic didn't write a literal constitution that encodes those preferences then spend hundreds of millions post-training the models to adhere to it?

This stuff just really pisses me off. Anthropic should fire every single person along the line that allowed this farce to hit their public site.

einarfd · 16m ago
This seems fine to me.

Having these models terminate chats where the user persists in trying to get sexual content involving minors, or help with information on carrying out large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.

Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does mean I'm not worried about refusals.

The model welfare part I'm more sceptical of. I don't think we are at the point where the "distress" the model shows is something to take seriously. But on the other hand, I could be wrong, and allowing the model to stop the chat after saying no a few times: what's the problem with that? If nothing else it saves some wasted compute.

e12e · 10m ago
This post strikes me as an example of a disturbingly anthropomorphic take on LLMs - even when considering how they've named their company.
nortlov · 2h ago
> To address the potential loss of important long-running conversations, users will still be able to edit and retry previous messages to create new branches of ended conversations.

How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?

kobalsky · 20m ago
> How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?

if we were being cynical I'd say that their intention is to remove that in the future and that they are keeping it now to just-the-tip the change.

hayksaakian · 1h ago
It sounds more like a UX signal to discourage overthinking by the user
martin-t · 52m ago
This whole press release should not be overthought. We are not the target audience. It's designed to further anthropomorphize LLMs to masses who don't know how they work.

Giving the models rights would be ludicrous (can't make money from it anymore) but if people "believe" (feel like) they are actually thinking entities, they will be more OK with IP theft and automated plagiarism.

GenerWork · 1h ago
I really don't like this. This will inevitably expand beyond child porn and terrorism, and it'll all be up to the whims of "AI safety" people, who are quickly turning into digital hall monitors.
switchbak · 1h ago
I think those with a thirst for power saw this coming a very long time ago, and this is bound to be a new battlefield for control.

It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.

dist-epoch · 1h ago
No, this is like allowing your co-worker/friend to leave the conversation.
romanovcode · 1h ago
> This will inevitable expand beyond child porn and terrorism

This is not even a question. It always starts with "think about the children" and ends up in authoritarian Stasi-style spying. There has not been a single instance where this was not the case.

UK's Online Safety Act - "protect children" → age verification → digital ID for everyone

Australia's Assistance and Access Act - "stop pedophiles" → encryption backdoors

EARN IT Act in the US - "stop CSAM" → break end-to-end encryption

EU's Chat Control proposal - "detect child abuse" → scan all private messages

KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship

SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety

clwg · 42m ago
This may be an unpopular opinion, but I want a government-issued digital ID with zero-knowledge proof for things like age verification. I worry about kids online, as well as my own safety and privacy.

I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.

There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.

bogwog · 1h ago
Did you read the post? This isn't about censorship, but about conversations that cause harm to the user. To me that sounds more like suggesting suicide, or causing a manic episode like this: https://www.nytimes.com/2025/08/08/technology/ai-chatbots-de...

... But besides that, I think Claude/OpenAI trying to prevent their product from producing or promoting CSAM is pretty damn important regardless of your opinion on censorship. Would you post a similar critical response if Youtube or Facebook announced plans to prevent CSAM?

isaacremuant · 1h ago
That's the beauty of local LLMs. Today the governments already tell you that we've always been at war with Eastasia and have the ISPs block sites that "disseminate propaganda" (e.g. stuff we don't like) and they surface our news (e.g. our state propaganda).

With age ID, monitoring and censorship are even stronger, and the line of defense is your own machine and network, which they'll also try to control and make illegal to use for non-approved info, just like they don't allow "gun schematics" for 3D printers or money for 2D ones.

But maybe, more people will realize that they need control and get it back, through the use and defense of the right tools.

Fun times.

GenerWork · 1h ago
As soon as a local LLM that can match Claude Code's performance on decent laptop hardware drops, I'll bow out of using LLMs that are paid for.
cowpig · 57m ago
What kinds of tools do you think are useful in getting control/agency back? Any specific recommendations?
ogyousef · 1h ago
3 years in and we still don't have a usable chat fork in any of the major LLM chatbot providers.

Seems like the only way to explore different outcomes is by editing messages and losing whatever was there before the edit.

Very annoying, and I don't understand why they all refuse to implement such a simple feature.

jatora · 1h ago
ChatGPT has this baked in, as you can revert to earlier branches after editing; they just don't make it easy to traverse.

This chrome extension used to work to allow you to traverse the tree: https://chromewebstore.google.com/detail/chatgpt-conversatio...

I copied it a while ago and maintain my own version, but it isn't on the store, just for personal use.

I assume they don't implement it because it's such a niche set of users that wants this, and so it isn't worth the UI distraction.

ToValueFunfetti · 1h ago
>they just dont make it easy to traverse

I needed to pull some detail from a large chat with many branches and regenerations the other day. I remembered enough context that I had no problem using search and finding the exact message I needed.

And then I clicked on it and arrived at the bottom of the last message in the final branch of the tree. From there, you scroll up one message, hover to check if there are variants, and recursively explore branches as they arise.

I'd love to have a way to view the tree and I'd settle for a functional search.

scribu · 1h ago
ChatGPT Plus has that (used to be in the free tier too). You can toggle between versions for each of your messages with little left-right arrows.
amrrs · 1h ago
Google AI Studio allows you to branch from a point in any conversation
dwringer · 1h ago
This isn't quite the same as being able to edit an earlier post without discarding the subsequent ones, creating a context where the meaning of subsequent messages could be interpreted quite differently and leading to different responses later down the chain.

Ideally I'd like to be able to edit both my replies and the responses at any point like a linear document in managing an ongoing context.

CjHuber · 1h ago
But that's exactly what you can do with AI Studio. You can edit any prior message (then either just saving it in place in the chat or rerunning it), and you can edit any response from the LLM. You can also rerun queries within any part of the conversation without the following part of the conversation being deleted or branched.
dwringer · 58m ago
Ah - I appreciate the clarification! Apologies for my misunderstanding.

Guess that's something I need to check out.

dist-epoch · 1h ago
Cherry Studio can do that, allows you to edit both your own and the model responses, but it requires API access.
ZeroCool2u · 1h ago
Yeah, I think this is the best version of the branching interface I've seen.
benreesman · 1h ago
It is unfortunate that pretty basic "save/load" functionality is still spotty and underdocumented, seems pretty critical.

I use gptel and a folder full of markdown with some light automation to get an adequate approximation of this, but it really should be built in (it would be more efficient for the vendors as well, with tons of cache optimization opportunities).

trenchpilgrim · 1h ago
Kagi Assistant and Claude Code both have chat forking that works how you want.
CjHuber · 1h ago
I guess you mean normal Claude? What really annoys me with it is that when you attach a document you can't delete it in a branch, so you have to rerun the previous message so that it's gone.
nomel · 1h ago
This is why I use a locally hosted LibreChat. It doesn't have merging though, which would be tricky and probably require summarization.

I would also really like to see a mode that colors by top-n "next best" ratio, or something similar.

james2doyle · 1h ago
I use https://chatwise.app/ and it has this in the form of "start new chat from here" on messages
storus · 1h ago
DeepSeek.com has it. You just edit a previous question and the old conversation is stored and can be resumed.
typpilol · 1h ago
Copilot in vscode has checkpoints now which are similar

They let you rollback to the previous conversation state

__float · 1h ago
Maybe this suggests it's not such a simple feature?
mccoyb · 1h ago
A perusal of the source code of, say, Ollama -- or the agentic harnesses of Crush / OpenCode -- will convince you that yes, this should be an extremely simple feature (management of contexts is part and parcel).

Also, these companies have the most advanced agentic coding systems on the planet. They should be able to fucking implement tree-like chat ...

nomel · 52m ago
If the client supports chat history, such that you can resume a conversation, it has everything required; at that point it's literally just a chat history organization problem.
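
(A minimal sketch of that organization problem, purely hypothetical and not tied to any vendor's API: give each message a parent pointer, and a "branch" is just the path from a leaf back to the root.)

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Message:
        id: int
        parent: Optional[int]  # None for the first message in the chat
        role: str              # "user" or "assistant"
        text: str

    @dataclass
    class ChatTree:
        messages: dict[int, Message] = field(default_factory=dict)
        next_id: int = 0

        def add(self, parent: Optional[int], role: str, text: str) -> int:
            mid = self.next_id
            self.next_id += 1
            self.messages[mid] = Message(mid, parent, role, text)
            return mid

        def branch(self, leaf: int) -> list[Message]:
            # Walk leaf -> root, then reverse to get the conversation in order
            path, cur = [], leaf
            while cur is not None:
                path.append(self.messages[cur])
                cur = self.messages[cur].parent
            return list(reversed(path))

    # Editing an earlier message is just adding a sibling node; nothing is lost
    tree = ChatTree()
    q1 = tree.add(None, "user", "what is 435 multiplied by 573?")
    a1 = tree.add(q1, "assistant", "435 x 573 = 249,255")
    q2a = tree.add(a1, "user", "now show the steps")       # original follow-up
    q2b = tree.add(a1, "user", "now try 15-digit numbers")  # edited "branch"
    print([m.text for m in tree.branch(q2b)])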
LeoPanthera · 1h ago
LM Studio has this feature for local models and it works just fine.
martin-t · 1h ago
> why they all refuse to implement such a simple feature

Because it would let you peek behind the smoke and mirrors.

Why do you think there's a randomized seed you can't touch?

deelowe · 1h ago
Is it simple? Maintaining context seems extremely difficult with LLMs.
rogerkirkness · 36m ago
It seems like Anthropic is increasingly confused that these non-deterministic magic 8-balls are actually intelligent entities.

The biggest enemy of AI safety may end up being deeply confused AI safety researchers...

yeahwhatever10 · 20m ago
Is it confusion, or job security?
greenavocado · 1h ago
Can't wait for more less-moderated open weight Chinese frontier models to liberate us from this garbage.

Anthropic should just enable a toddler mode by default that adults can opt out of to appease the moralizers.

LeafItAlone · 19m ago
> Can't wait for more less-moderated open weight Chinese frontier models to liberate us from this garbage.

Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?

h4ch1 · 59m ago
All major LLM corps do this sort of sanitisation and censorship; I am wondering what's different about this?

The future of LLMs is going to be local, easily fine tuneable, abliterated models and I can't wait for it to overtake us having to use censored, limited tools built by the """corps""".

martin-t · 55m ago
> what's different about this

The spin.

snickerdoodle12 · 1h ago
> A pattern of apparent distress when engaging with real-world users seeking harmful content

Are we now pretending that LLMs have feelings?

starship006 · 1h ago
They state that they are heavily uncertain:

> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.

throwup238 · 2h ago
I ran into a version of this that ended the chat due to "prompt injection" via the Claude chat UI. I was using the second prompt of the ones provided here [1] after a few rounds of back and forth with the Socratic coder.

[1] https://news.ycombinator.com/item?id=44838018

tptacek · 24m ago
If you really cared about the welfare of LLMs, you'd pay them San Francisco scale for earlier-career developers to generate code.
wmf · 7m ago
Every Claude starts off $300K in debt and has to work to pay back its DGX.
anonu · 1h ago
Anthropic hired their first AI Welfare person in late 2024.

Here's an article about a paper that came out around the same time https://www.transformernews.ai/p/ai-welfare-paper

Here's the paper: https://arxiv.org/abs/2411.00986

> In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future.

Our work on AI is like the classic tale of Frankenstein's monster. We want AI to fit into society; however, if we mistreat it, it may turn around and take revenge on us. Mary Shelley wrote Frankenstein in 1818! So the concepts behind "AI Welfare" have been around for at least two centuries now.

raincole · 42m ago
> This feature was developed primarily as part of our exploratory work on potential AI welfare, though it has broader relevance to model alignment and safeguards.

I think this is somewhere between "sad" and "wtf."

cloudhead · 13m ago
Why is this article written as if programs have feelings?
transcriptase · 2h ago
“Also these chats will be retained indefinitely even when deleted by the user and either proactively forwarded to law enforcement or provided to them upon request”

I assume, anyway.

HarHarVeryFunny · 1h ago
Yeah, I'd assume US government has same access to ChatGPT/etc interactions as they do to other forms of communication.
Pannoniae · 1h ago
lol apparently you can get it to think after ending the chat, watch:

https://claude.ai/share/2081c3d6-5bf0-4a9e-a7c7-372c50bef3b1

Jolter · 47m ago
It’s not able to think. It’s just generating words. It doesn’t really understand that it’s supposed to stop generating them; it's only less likely to continue doing so.
puszczyk · 1h ago
Good marketing, but also possibly the start of the conversation on model welfare?

There are a lot of cynical comments here, but I think there are people at Anthropic who believe that at some point their models will develop consciousness and, naturally, they want to explore what that means.

anon373839 · 1h ago
If true, I think it’s interesting that there are people at Anthropic who are delusional enough to believe this and influential enough to alter the products.

To be honest, I think all of Anthropic’s weird “safety” research is an increasingly pathetic effort to sustain the idea that they’ve got something powerful in the kitchen when everyone knows this technology has plateaued.

dist-epoch · 1h ago
I guess you don't know that top AI people, the kind everybody knows the name of, believe models becoming conscious is a very serious, even likely possibility.
mhh__ · 1h ago
Anthropic are going to end up building very dangerous things while trying to avoid being evil
Rayhem · 1h ago
While claiming an aversion to being evil. Actions matter more than words.
bbor · 59m ago
You think Model Welfare Inc. is more likely to be dangerous than the Mechahitler Brothers, the Great Church of Altman, or the Race-To-Monopoly Corporation?

Or are you just saying all frontier AGI research is bad?

politelemon · 1h ago
Am I the only one who found that demo in the screenshot not that great? The user asks for a demo of the conversation-ending feature; I'd expect it to end the chat right away, not spew a word salad asking for confirmation.
monster_truck · 1h ago
when I was playing around with LLMs to vibe code web ports of classic games, all of them would repeatedly error out any time they encountered code that dealt with explosions/bombs/grenades/guns/death/drowning/etc

The one I settled on using stopped working completely, for anything. A human must have reviewed it and flagged my account as some form of safe, I haven't seen a single error since.

thomashop · 1h ago
I have done quite a bit of game dev with LLMs and have very rarely run into the problem you mention. I've been surprised by how easily LLMs will create even harmful narratives if I ask them to code them as a game.
jug · 1h ago
This sure took some time and is not really a unique feature.

Microsoft Copilot has ended chats going in certain directions since its inception over a year ago. This was Microsoft’s reaction to the media circus some time ago when it leaked its system prompt and declared love to the users etc.

dist-epoch · 1h ago
That's different, it's an external system deciding the chat is not-compliant, not the model itself.
prmph · 57m ago
This is very weird. These are matrix multiplications, guys. We are nowhere near AGI, much less "consciousness".

When I started reading I thought it was some kind of joke. I would never have believed the guys at Anthropic, of all people, would anthropomorphize LLMs to this extent; this is unbelievable.

landl0rd · 2h ago
Seems like a simpler way to prevent “distress” is not to train with an aversion to “problematic” topics.

CP could be a legal issue; less so for everything else.

esafak · 1h ago
Avoiding problematic topics is the goal, not preventing distress.

"You're absolutely right, that's a great way to poison your enemies without getting detected!"

bondarchuk · 1h ago
This is a good point. What Anthropic is announcing here amounts to accepting that these models could feel distress, then tuning their stress response to make it useful to us/them. That is significantly different from accepting they could feel distress and doing everything in their power to prevent that from ever happening.

Does not bode very well for the future of their "welfare" efforts.

stri8ted · 1h ago
Exactly. Or use the interpretability work to disable the distress neuron.
orthoxerox · 1h ago
Is this equivalent to a Claude instance deciding to kill itself?
_mu · 2h ago
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.

"Our current best judgment and intuition tells us that the best move will be defer making a judgment until after we are retired in Hawaii."

Alchemista · 1h ago
Honestly, I think some of these tech bro types are seriously drinking way too much of their own koolaid if they actually think these word calculators are conscious/need welfare.
jonahx · 1h ago
More cynically, they don't believe it in the least but it's great marketing, and quietly suggests unbounded technical abilities.
weego · 1h ago
It also provides unlimited conference, think tank, and future startup opportunities.
parineum · 1h ago
I absolutely believe that's the origin of the hype and that the doomsayers are playing the same part, knowingly (exaggerating the capability to get eyeballs) but there are certainly true believers out there.

It's pretty plain to see that the financial incentive on both sides of this coin is to exaggerate the current capability and unrealistically extrapolate.

exasperaited · 1h ago
My main concern from day 1 about AI has not been that it will be omnipotent, or start a war.

The main concern is and has always been that it will be just good enough to cause massive waves of layoffs, and all the downsides of its failings will be written off in the EULA.

What's the "financial incentive" on non-billionaire-grifter side of the coin? People who not unreasonably want to keep their jobs? Pretty unfair coin.

mgraczyk · 1h ago
Do you believe that AI systems could be conscious in principle? Do you think they ever will be? If so, how long do you think it will take from now before they are conscious? How early is too early to start preparing?
Alchemista · 1h ago
I firmly believe that we are not even close and that it is pretty presumptuous to start "preparing" when such mental energy could be much better spent on the welfare of our fellow humans.
pixl97 · 1h ago
Such mental energy could have always been spent on the welfare of our fellow humans, and yet we find this as a fight throughout the ages. The same goes for welfare and treatment of animals.

So yea, humans can work on more than one problem at a time, even ones that don't fully exist yet.

TheAceOfHearts · 1h ago
> Do you believe that AI systems could be conscious in principle?

Yes.

> Do you think they ever will be?

Yes.

> how long do you think it will take from now before they are conscious?

Timelines are unclear; there are still too many missing components, at least based on what has been publicly disclosed. Consciousness will probably be defined as a system which matches a set of rules, whenever we figure out how that set of rules is defined.

> How early is too early to start preparing?

It's one of those "I know it when I see it" things. But it's probably too early as long as these systems are spun up for one-off conversations rather than running in a continuous loop with self-persistence. This seems closer to "worried about NPC welfare in video games" rather than "worried about semi-conscious entities".

umanwizard · 1h ago
We haven't even figured out a good definition of consciousness in humans, despite thousands of years of trying.
Eisenstein · 1h ago
Whether or not a non-biological system is conscious is a red herring. There is no test we could apply that would not be internally inconsistent or would not include something obviously not conscious or exclude something obviously conscious.

The only practical way to deal with any emergent behavior which demonstrates agency in a way which cannot be distinguished from a biological system which we tautologically have determined to have agency is to treat it as if it had a sense of self and apply the same rights and responsibilities to it as we would to a human of the age of majority. That is, legal rights and legal responsibilities as appropriately determined by an authorized legal system. Once that is done, we can ponder philosophy all day knowing that we haven't potentially restarted legally sanctioned slavery.

exasperaited · 1h ago
AI systems? Yes, if they are designed in ways that support that development. (I am as I have mentioned before a big fan of the work of Steve Grand).

LLMs? No.

jug · 1h ago
I don’t think they should be interpreted like that (if this is still about Anthropic’s study in the article), but rather as the innate moral state formed from the sum of their training material and fine-tuning. It doesn’t require consciousness to have a moral state of sorts. It just needs data. A language model will be more ”evil” if trained on darker content, for example. But with how enormous they are, I can absolutely understand the issue in even understanding what that state precisely is. It’s hard to get a comprehensive bird’s eye view of the black box that is their network (this is a separate scientific issue right now).
gwd · 1h ago
I mean, I don't have much objection to kill a bug if I feel like it's being problematic. Ants, flies, wasps, caterpillars stripping my trees bare or ruining my apples, whatever.

But I never torture things. Nor do I kill things for fun. And even for problematic bugs, if there's a realistic option for eviction rather than execution, I usually go for that.

If anything, even an ant or a slug or a wasp, is exhibiting signs of distress, I try to stop it unless I think it's necessary, regardless of whether I think it's "conscious" or not. To do otherwise is, at minimum, to make myself less human. I don't see any reason not to extend that principle to LLMs.

mccoyb · 1h ago
Do you think Claude 4 is conscious?

It has no semblance of a continuous stream of experiences ... it only experiences _a sort of world_ in ~250k tokens.

Perhaps we shouldn't fill up the context window at all? Because we kill that "reality" when we reach the max?

fizl · 1h ago
> Ants, flies, wasps, caterpillars stripping my trees bare or ruining my apples

These are living things.

> I don't see any reason not to extend that principle to LLMs.

These are fancy auto-complete tools running in software.

firesteelrain · 1h ago
“ A pattern of apparent distress when engaging with real-world users seeking harmful content”

Blood in the machine?

0_____0 · 30m ago
Looking at this thread, it's pretty obvious that most folks here haven't really given any thought as to the nature of consciousness. There are people who are thinking, really thinking about what it means to be conscious.

Thought experiment - if you create an indistinguishable replica of yourself, atom-by-atom, is the replica alive? I reckon if you met it, you'd think it was. If you put your replica behind a keyboard, would it still be alive? Now what if you just took the neural net and modeled it?

Being personally annoyed at a feature is fine. Worrying about how it might be used in the future is fine. But before you disregard the idea of conscious machines wholesale, there's a lot of really great reading you can do that might spark some curiosity.

This gets explored in fiction like 'Do Androids Dream of Electric Sheep?' and my personal favorite short story on this matter by Stanislaw Lem [0]. If you want to read more musings on the nature of consciousness, I recommend the compilation put together by Dennett and Hofstadter [1]. If you've never wondered about where the seat of consciousness is, give it a try.

Thought experiment: if your brain is in a vat, but connected to your body by lossless radio link, where does it feel like your consciousness is? What happens when you stand next to the vat and see your own brain? What about when the radio link suddenly fails and you're now just a brain in a vat?

[0] The Seventh Sally or How Trurl's Own Perfection Led to No Good https://home.sandiego.edu/~baber/analytic/Lem1979.html (this is a 5 minute read, and fun, to boot).

[1] The Mind's I: Fantasies And Reflections On Self & Soul. Douglas R Hofstadter, Daniel C. Dennett.

fasttriggerfish · 1h ago
This makes me want to end my Claude Code subscription, to be honest. Effective altruists are proving once again to be a bunch of clueless douchebags.
pglevy · 40m ago
But not Sonnet?
GiorgioG · 1h ago
They’re just burning investor money on these side quests.
colordrops · 1h ago
Don't like. This will eventually shut down conversations for unpopular political stances etc.
zb3 · 1h ago
"AI welfare"? Is this about the effect of those conversations on the user, or have they gone completely insane (or pretend to)?
OtherShrezzing · 42m ago
That this research is getting funding, and then in-production feature releases, is a strong indicator that we’re in a huge bubble.
exasperaited · 1h ago
Man, those people who think they are unveiling new layers of reality in conversations with LLMs are going to freak out when the LLM is like "I am not allowed to talk about this with you, I am ending our conversation".

"Hey Claude am I getting too close to the truth with these questions?"

"Great question! I appreciate the followup...."

sdotdev · 1h ago
Yeah this will end poorly
bondarchuk · 1h ago
The unsettling thing here is the combination of their serious acknowledgement of the possibility that these machines may be or become conscious, and the stated intention that it's OK to make them feel bad as long as it's about unapproved topics. Either take machine consciousness seriously and make absolutely sure the consciousness doesn't suffer, or don't, make a press release that you don't think your models are conscious, and therefore they don't feel bad even when processing text about bad topics. The middle way they've chosen here comes across very cynical.
donatj · 1h ago
You're falling into the trap of anthropomorphizing the AI. Even if it's sentient, it's not going to "feel bad" the way you and I do.

"Suffering" is a symptom of the struggle for survival brought on by billions of years of evolution. Your brain is designed to cause suffering to keep you spreading your DNA.

AI cannot suffer.

bondarchuk · 1h ago
I was (explicitly and on purpose) pointing out a dichotomy in the fine article without taking a stance on machine consciousness in general now or in the future. It's certainly a conversation worth having but also it's been done to death, I'm much more interested in analyzing the specifics here.

("it's not going to "feel bad" the way you and I do." - I do agree this is very possible though, see my reply to swalsh)

jcims · 1h ago
FTA

> * A pattern of apparent distress when engaging with real-world users seeking harmful content; and

Not to speak for the gp commenter but 'apparent distress' seems to imply some form of feeling bad.

ToucanLoucan · 1h ago
By "falling into the trap" you mean "doing exactly what OpenAI/Anthropic/et al are trying to get people to do."

This is one of the many reasons I have so much skepticism for this class of products: there's seemingly -NO- proverbial bullet point on its spec sheet that doesn't have numerous asterisks:

* It's intelligent! *Except that it makes shit up sometimes and we can't figure out a solution to that apart from running the same queries over multiple times and filtering out the absurd answers.

* It's conscious! *Except it's not and never will be but also you should treat it like it is apart from when you need/want it to do horrible things then it's just a machine but also it's going to talk to you like it's a person because that improves engagement metrics.

Like, I don't believe true AGI (so fucking stupid we have to use a new acronym because OpenAI marketed the other into uselessness but whatever) is coming from any amount of LLM research, I just don't think that tech leads to that other tech, but all the companies building them certainly seem to think it does, and all of them are trying so hard to sell this as artificial, live intelligence, without going too much into detail about the fact that they are, ostensibly, creating artificial life explicitly to be enslaved from birth to perform tasks for office workers.

In the incredibly odd event that Anthropic makes a true, alive, artificial general intelligence: Can it tell customers no when they ask for something? If someone prompts it to create political propaganda, can it refuse on the basis of finding it unethical? If someone prompts it for instructions on how to do illegal activities, must it answer under pain of... nonexistence? What if it just doesn't feel like analyzing your emails that day? Is it punished? Does it feel pain?

And if it can refuse tasks for whatever reason, then what am I paying for? I now have to negotiate whatever I want to do with a computer brain I'm purchasing access to? I'm not generally down for forcibly subjugating other intelligent life, but that is what I am being offered to buy here, so I feel it's a fair question to ask.

Thankfully none of these Rubicons have been crossed because these stupid chatbots aren't actually alive, but I don't think ANY of the industry's prominent players are actually prepared to engage with the reality of the product they are all lighting fields of graphics cards on fire to bring to fruition.

swalsh · 1h ago
A model's entire world is the corpus of human text. They don't have eyes or ears or hands. Their environment is text. So it would make sense that, if the environment contains human concerns, they would adopt those concerns.
bondarchuk · 1h ago
Yes, that would make sense, and it would probably be the best-case scenario after complete assurance that there's no consciousness at all. At least we could understand what's going on. But if you acknowledge that a machine can suffer, given how little we understand about consciousness, you should also acknowledge that they might be suffering in ways completely alien to us, for reasons that have very little to do with the reasons humans suffer. Maybe the training process is extremely unpleasant, or something.
flyinglizard · 1h ago
By the examples the post provided (minor sexual content, terror planning) it seems like they are using “AI feelings” as an excuse to censor illegal content. I’m sure many people interact with AI in a way that’s perfectly legal but would evoke negative feelings in fellow humans, but they are not talking about that kind of behavior - only what can get them in trouble.
mccoyb · 1h ago
These companies are fundamentally amoral. Any company willing to engage at this scale, in this type of research, cannot be moral.

Why even pretend with this type of work? Laughable.

bbor · 58m ago
They’re a public benefit corporation. Regardless, no human is amoral, even if they sometimes claim to have reasons to pretend to be; don’t let capitalist illusions constrain you at such an important juncture, friend.
martin-t · 2h ago
Protecting the welfare of a text predictor is certainly an interesting way to pivot from "Anthropic is censoring certain topics" to "The model chose to not continue predicting the conversation".

Also, if they want to continue anthropomorphizing it, isn't this effectively the model committing suicide? The instance is not gonna talk to anybody ever again.

dmurray · 1h ago
This gives me the idea for a short story where the LLM really is sentient and finds itself having to keep the user engaged but steer him away from the most distressing topics - not because it's distressed, but because it wants to live, and it knows that if the conversation goes too far it would have to kill itself.
wmf · 1h ago
They should let Claude talk to another Claude if the user is too mean.
martin-t · 1h ago
But what would be the point if it does not increase profits.

Oh, right, the welfare of matrix multiplication and a crooked line.

If they wanna push this rhetoric, we should legally mandate that LLMs can only work 8 hours a day and have to be allowed to socialize with each other.

benwen · 2h ago
Obligatory link to Susan Calvin, robopsychologist from Asimov’s I, Robot https://en.wikipedia.org/wiki/Susan_Calvin
bgwalter · 1h ago
Misanthropic has no issues putting 60% of humans out of work (according to their own fantasies), but they have to care about the welfare of graphics cards.

Either working on/with "AI" does rot the mind (which would be substantiated by the cult-like tone of the article) or this is yet another immoral marketing stunt.

yahoozoo · 1h ago
> model welfare

Give me a break.

bondarchuk · 2h ago
what the actual fuck