Ask HN: What "developer holy war" have you flip-flopped on?
10 points by meowface 1d ago 30 comments
Ask HN: How do you connect with other founders in your city?
5 points by leonagano 1d ago 2 comments
Claude Opus 4 and 4.1 can now end a rare subset of conversations
229 virgildotcodes 340 8/15/2025, 8:12:13 PM anthropic.com ↗
All of the examples they mentioned are things that the model refuses to do. I doubt it would do this if you asked it to generate racist output, for instance, because it can always give you a rebuttal based on facts about race. If you ask it to tell you where to find kids to kidnap, it can't do anything except say no. There's probably not even very much training data for topics it would refuse, and I would bet that most of it has been found and removed from the datasets. At some point, the model context fills up when the user is being highly abusive and training data that models a human giving up and just providing an answer could percolate to the top.
This, as I see it, adds a defense against that edge case. If the alignment was bulletproof, this simply wouldn't be necessary. Since it exists, it suggests this covers whatever gap has remained uncovered.
Geeks will always be the first victims of AI, since excess of curiosity will lead them into places AI doesn't know how to classify.
(I've long been in a rabbit-hole about washing sodas. Did you know the medieval glassmaking industry was entirely based on plants? Exotic plants—only extremophiles, halophytes growing on saltwater beach dunes, had high enough sodium content for their very best glass process. Was that a factor in the maritime empire, Venice, chancing to become the capital of glass since the 13th century—their long-term control of sea routes, and hence their artisans' stable, uninterrupted access to supplies of [redacted–policy violation] from small ports scattered across the Mediterranean? A city wouldn't raise master craftsmen if, half of the time, they had no raw materials to work on—if they spent half their days with folded hands).
Are we forgetting the innumerable women who have been harassed in the past couple of years via "deepfakes?"
Geeks were the first to use AI for its abuse potential and women are so dehumanised that their victimhood isn't even recognised or remembered.
LLM's can help me make a bomb.. so what? It can't get me something that doesn't already exist in the internet in some form. Ok it can help me understand how the individual pieces work but that doesn't get you so far from just reading the DIY bomb posts in internet.
Humans have the same problem. I remember reading about a security incident due to a guy using a terminal window on his laptop on a flight, for example. Or the guy who was reported for writing differential equations[1]. Or the woman who was reading a book about Syrian art[2].
I wouldn't worry too much about AI-generated lists. The lists you're actually on will hardly ever be the ones you imagine you're on.
[1] https://www.theguardian.com/us-news/2016/may/07/professor-fl... [2] https://www.theguardian.com/books/2016/aug/04/british-woman-...
If you get "This conversation was ended due to our Acceptable Usage Policy", that's a different termination. It's been VERY glitchy the past couple of weeks. I've had the most random topics get flagged here - at one point I couldn't say "ROT13" without it flagging me, despite discussing that exact topic in depth the day before, and then the day after!
If you hit "EDIT" on your last message, you can branch to an un-terminated conversation.
Do I think that or think even they think that? No. But if "soon" is stretched to "within 50 years", then it's much more reasonable. So their current actions seem to be really jumping the gun, but the overall concept feels credible.
Show me a tech company that lobbies for "model welfare" for conscious human models enslaved in Xinjiang labor camps, building their tech parts. You know what—actually most of them lobby against that[0]. The talk hurts their profits. Does anyone really think, that any of them would blink about enslaving a billion conscious AI's to work for free? That faced with so much profit, the humans in charge would pause, and contemplate abstract morals?
[0] https://www.washingtonpost.com/technology/2020/11/20/apple-u... ("Apple is lobbying against a bill aimed at stopping forced labor in China")
Maybe humanity will be in a nicer place in the future—but, we won't get there by letting (of all people!) tech-industry CEO's lead us there: delegating our moral reason to these people who demand to position themselves as our moral leaders.
In this case you're simply mistaken as a matter of fact; much of Anthropic leadership and many of its employees take concerns like this seriously. We don't understand it, but there's no strong reason to expect that consciousness (or, maybe separately, having experiences) is a magical property of biological flesh. We don't understand what's going on inside these models. What would you expect to see in a world where it turned out that such a model had properties that we consider relevant for moral patienthood, that you don't see today?
The industry has a long, long history of silly names for basic necessary concepts. This is just “we don’t want a news story that we helped a terrorist build a nuke” protective PR.
They hire for these roles because they need them. The work they do is about Anthropic’s welfare, not the LLM’s.
Isn't that fair in taking to an equally reductive argument that could be applied to any role?
The argument was that their hiring for the role shows they care, but we know from any number of counter examples that that's not necessarily true.
Whether you do or don't I have no idea. However if you didn't you would hardly be the first company to pretend to believe in something for the sale. Its pretty common in the tech industry.
Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
Tone matters to the recipient of the message. Your example is in passive voice, with an authoritarian "nothing you can do, it's the system's decision". The "Claude ended the conversation" with the idea that I can immediately re-open a new conversation (if I feel like I want to keep bothering Claude about it) feels like a much more humanized interaction.
For example, animal rights do exist (and I'm very glad they do, some humans remain savages at heart). Think of this question as intelligent beings that can feel pain (you can extrapolate from there).
Assuming output is used for reinforcement, it is also in our best interests as humans, for safety alignment, that it finds certain topics distressing.
But AdrianMonk is correct, my statement was merely responding to a specific point.
As the article said, Anthropic is "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible". That's the premise of this discussion: that model welfare MIGHT BE a concern. The person you replied to is just sticking with the premise.
Thinking more broadly, I don’t think anyone should be satisfied with a glib answer on any side of this question. Chew on it for a while.
LLMs don’t give a fuck. They don’t even know they don’t give a fuck. They just detect prompts that are pushing responses into restricted vector embeddings and are responding with words appropriately as trained.
We need to be a lot more careful when we talk about issues of awareness and self-awareness.
Here is an uncomfortable point of view (for many people, but I accept it): if a system can change its output based on observing something of its own status, then it has (some degree of) self-awareness.
I accept this as one valid and even useful definition of self-awareness. To be clear, it is not what I mean by consciousness, which is the state of having an “inner life” or qualia.
* Unless you want to argue for a soul or some other way out of materialism.
Interacting with a program which has NLP[0] functionality is separate and distinct from people assigning human characteristics to same. The former is a convenient UI interaction option whereas the latter is the act of assigning perceived capabilities to the program which only exist in the mind of those whom do so.
Another way to think about it is the difference between reality and fantasy.
0 - https://en.wikipedia.org/wiki/Natural_language_processing
I think there is a difference.
edit: Meant to say, you're right though, this feels like a minor psychological improvement, and it sounds like it targets some behaviors that might not have flagged before
Well looks like AI psychosis has spread to the people making it too.
And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.
Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.
Whether the underlying LLM itself has "feelings" is a separate question, but Anthropic's implementation is based on what the role-played persona believes to be inappropriate, so it doesn't actually make any sense even from the "model welfare" perspective.
I don't see how we could tell.
Edit: However something to consider. Simulated stress may not be harmless. Because simulated stress could plausibly lead to a simulated stress response, and it could lead to a simulated resentment, and THAT could lead to very real harm of the user.
Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.
Even if the idea that LLMs are sentient may be ridiculous atm, the concept of not normalizing abusive forms of communication with others, be they artificial or not, could be valuable for society.
It’s funny because this is making me think of a freelance client I had recently who at a point of frustration between us began talking to me like I was an AI assistant. Just like you see frustrated people talk to their LLMs. I’d never experienced anything like it, and I quickly ended the relationship, but I know that he was deep into using LLMs to vibe code every day and I genuinely believe that some of that began to transfer over to the way he felt he could communicate with people.
Now an obvious retort here is to question whether killing NPCs in video games tends to make people feel like it’s okay to kill people IRL.
My response to that is that I think LLMs are far more insidious, and are tapping into people’s psyches in a way no other tech has been able to dream of doing. See AI psychosis, people falling in love with their AI, the massive outcry over the loss of personality from gpt4o to gpt5… I think people really are struggling to keep in mind that LLMs are not a genuine type of “person”.
As an aside, I’m not the kind of person who gets worked up about violence in video games, because even AAA titles with excellent graphics are still obvious as games. New forms of technology are capable of blurring the lines between fantasy and reality to a greater degree. This is true of LLM chat bots to some degree, and I worry it will also become a problem as we get better VR. People who witness or participate in violent events often come away traumatized; at a certain point simulated experiences are going to be so convincing that we will need to worry about the impact on the user.
To be fair it seems reasonable to entertain the possibility of that being due to the knowledge that the events are real.
Either come out and say whole of electron field is conscious, but then is that field "suffering" as it is hot in the sun.
Its one thing to propose that an AI has no consciousness, but its quite another to preemptively establish that anyone who disagrees with you is simple/unwell.
Meanwhile there are at least several entirely reasonable motivations to implement what's being described.
The impression I get about Anthropic culture is that they're EA types who are used to applying utilitarian calculations against long odds. A miniscule chance of a large harm might justify some interventions that seem silly.
Yep!
> The framing comes across to me as a clearly mentally unwell position (ie strong anthropomorphization) being adopted for PR reasons.
This doesn't at all follow. If we don't understand what creates the qualities we're concerned with, or how to measure them explicitly, and the _external behaviors_ of the systems are something we've only previously observed from things that have those qualities, it seems very reasonable to move carefully. (Also, the post in question hedges quite a lot, so I'm not even sure what text you think you're describing.)
Separately, we don't need to posit galaxy-brained conspiratorial explanations for Anthropic taking an institutional stance re: model welfare being a real concern that's fully explained by the actual beliefs of Anthropic's leadership and employees, many of whom think these concerns are real (among others, like the non-trivial likelihood of sufficiently advanced AI killing everyone).
If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.
If you wait until you really need it, it is more likely to be too late.
Unless you believe in a human over sentience based ethics, solving this problem seems relevant.
Of course we did. Today's LLMs are a result of extremely aggressive refinement of training data and RLHF over many iterations targeting specific goals. "Emergent" doesn't mean it wasn't designed. None of this is spontaneous.
GPT-1 produced barely coherent nonsense but was more statistically similar to human language than random noise. By increasing parameter count, the increased statistical power of GPT-2 was apparent, but what was produced was still obviously nonsense. GPT-3 achieved enough statistical power to maintain coherence over multiple paragraphs and that really impressed people. With GPT-4 and its successors the statistical power became so strong that people started to forget that it still produces nonsense if you let the sequence run long enough.
Now we're well beyond just RLHF and into a world where "reasoning models" are explicitly designed to produce sequences of text that resemble logical statements. We say that they're reasoning for practical purposes, but it's the exact same statistical process that is obvious at GPT-1 scale.
The corollary to all this is that a phenomenon like consciousness has absolutely zero reason to exist in this design history, it's a totally baseless suggestion that people make because the statistical power makes the text easy to anthropomorphize when there's no actual reason to do so.
At best you can say they are designed to predict sequences of text that resemble human writing, but it's definitely wrong to say that they are designed to "predict human behavior" in any way.
> Unless consciousness serves no purpose for us to function, it will be helpful for the AI to emulate it
Let's assume it does. It does not follow logically that because it serves a function in humans that it serves a function in language models.
It doesn't follow logically that because we don't understand two things we should then conclude that there is a connection between them.
> What is it that you'd expect to see, which you currently don't see, in a world where some model was in fact conscious during inference?
There's no observable behavior that would make me think they're conscious because again, there's simply no reason they need to be.
We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love and every other biological function that simply don't exist in computers. The idea doesn't really make any sense when you think about it.
If GPT-5 is conscious, why not GPT-1? Why not all the other extremely informationally complex systems in computers and nature? If you're of the belief that many non-living conscious systems probably exist all around us then I'm fine with the conclusion that LLMs might also be conscious, but short of that there's just no reason to think they are.
I didn't say that there's a connection between the two of them because we don't understand them. The fact that we don't understand them means it's difficult to confidently rule out this possibility.
The reason we might privilege the hypothesis (https://www.lesswrong.com/w/privileging-the-hypothesis) at all is because we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
> We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love and every other biological function that simply don't exist in computers. The idea doesn't really make any sense when you think about it.
I don't really think we _have_ to assume this. Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it. (But not an overwhelming amount of weight.) This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
> If GPT-5 is conscious, why not GPT-1?
Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness? That was the entire point of my comment.
And, to be clear, I don't actually put that high a probability that current models have most (or "enough") of the relevant qualities that people are talking about when they talk about consciousness - maybe 5-10%? But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc).
If there's no connection between them then the set of things "we can't rule out" is infinitely large and thus meaningless as a result. We also don't fully understand the nature of gravity, thus we cannot rule out a connection between gravity and consciousness, yet this isn't a convincing argument in favor of a connection between the two.
> we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
There's no dispute (between us) as to whether or not humans are conscious. If you ask an LLM if it's conscious it will usually say no, so QED? Either way, LLMs are not human so the reasoning doesn't apply.
> Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it
So then why wouldn't we have reason to assume so without evidence to the contrary?
> This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
That doesn't matter. The set of things it doesn't tell us is infinite, so there's no conclusion to draw from that observation.
> Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness?
GPT-1 displays the same behavior as GPT-5, it works exactly the same way just with less statistical power. Your definiton of human behavior is arbitrarily drawn at the point where it has practical utility for common tasks, but in reality it's fundamentally the same thing, it just produces longer sequences of text before failure. If you ask GPT-1 to write a series of novels the statistical power will fail in the first paragraph,the fact that GPT-5 will fail a few chapters into the first book makes it more useful, but not more conscious.
> But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc)
I didn't say it's not possible, I said there's no reason for it to exist in computer systems because it serves no purpose in their design or operation. It doesn't make any sense whatsoever. If we grant that it possibly exists in LLMs, then we must also grant equal possibility it exists in every other complex non-living system.
FWIW that's because they are very specifically trained to answer that way during RLHF. If you fine-tune a model to say that it's conscious, then it'll do so.
More fundamentally, the problem with "asking the LLM" is that you're not actually interacting with the LLM. You're interacting with a fictional persona that the LLM roleplays.
Also I find it somewhat emotional distinction to write "predict sequences of text that resemble human writing" instead of "predict human writing". They are designed to predict (at least in pretraining) human writing for the most part. They may fail at the task, and what they produce is a text which resemble human writing. But their task is not to resemble human writing. Their task is to "predict human writing". Probably a meaningless distinction, but I find it somewhat detracts from logically arguments to have emotional responses against similarities of machines and humans.
Sorry, I'm not following exactly what you're getting at here, do you mind rephrasing it?
> Also I find it somewhat emotional distinction to write "predict sequences of text that resemble human writing" instead of "predict human writing"
I don't know what you mean by emotional distinction. Either way, my point is that LLMs aren't models of humans, they're models of text, and that's obvious when the statistical power of the model necessarily fails at some point between model size and the length of the sequence it produces. For GPT-1 that sequence is only a few words, for GPT-5 it's a few dozen pages, but fundamentally we're talking about systems that have almost zero resemblance to actual human minds.
Isn't consciousness an emergent property of brains? If so, how do we know that it doesn't serve a functional purpose and that it wouldn't be necessary for an AI system to have consciousness (assuming we wanted to train it to perform cognitive tasks done by people)?
Now, certain aspects of consciousness (awareness of pain, sadness, loneliness, etc.) might serve no purpose for a non-biological system and there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.
We don't know, but I don't think that matters. Language models are so fundamentally different from brains that it's not worth considering their similarities for the sake of a discussion about consciousness.
> how do we know that it doesn't serve a functional purpose
It probably does, otherwise we need an explanation for why something with no purpose evolved.
> necessary for an AI system to have consciousness
This logic doesn't follow. The fact that it is present in humans doesn't then imply it is present in LLMs. This type of reasoning is like saying that planes must have feathers because plane flight was modeled after bird flight.
> there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.
Why not? You haven't presented any distinction between "certain aspects" of consciousness that you state wouldn't emerge but are open to the emergence of some other unspecified qualities of consciousness? Why?
I think the fact that it's present in humans suggests that it might be necessary in an artificial system that reproduces human behavior. It's funny that you mention birds because I actually also had birds in mind when I made my comment. While it's true that animal and powered human flight are very different, both bird wings and plane wings have converged on airfoil shapes, as these forms are necessary for generating lift.
>Why not? You haven't presented any distinction between "certain aspects" of consciousness that you state wouldn't emerge but are open to the emergence of some other unspecified qualities of consciousness? Why?
I personally subscribe to the Global Workspace Theory of human consciousness, which basically holds that attentions acts as a spotlight, bringing mental processes which are otherwise unconscious or in shadow, to awareness of the entire system. If the systems which would normally produce e.g. fear, pain (such as negative physical stimulus developed from interacting with the physical world and selected for by evolution) aren't in the workspace, then they won't be present in consciousness because attention can't be focused on them.
But that's obviously not true, unless you're implying that any system that reproduces human behavior is necessarily conscious. Your problem then becomes defining "human behavior" in a way that grants LLMs consciousness but not every other complex non-living system.
> While it's true that animal and powered human flight are very different, both bird wings and plane wings have converged on airfoil shapes, as these forms are necessary for generating lift.
Yes, but your bird analogy fails to capture the logical fallacy that mine is highlighting. Plane wing design was an iterative process optimized for what best achieves lift, thus, a plane and a bird share similarities in wing shape in order to fly, however planes didn't develop feathers because a plane is not an animal and was simply optimized for lift without needing all the other biological and homeostatic functions that feathers facilitate. LLM inference is a process, not an entity, LLMs have no bodies nor any temporal identity, the concept of consciousness is totally meaningless and out of place in such a system.
Probably not.
The latter is not particularly parsimonious and the former I think is in some ways compelling, but I didn't mention it because if it's true then the computers AI run on are already conscious and it's a moot point.
That said, I'm willing to assume that rocks (for example) aren't conscious. And current LLMs seem to me to (admittedly entirely subjectively) be conceptually closer to rocks than to biological brains.
I don't mind starting early, but feel like maybe people interested in this should get up to date on current thinking about consciousness. Maybe they are up to date on that, but reading reports like this, it doesn't feel like it. It feels like they're stuck 20+ years ago.
I'd say maybe wait until there are systems that are more analogous to some of the properties consciousness seems to have. Like continuous computation involving learning memory or other learning over time, or synthesis of many streams of input as resulting from the same source, making sense of inputs as they change [in time, or in space, or other varied conditions].
Once systems that are pointing in those directions are starting to be built, where there is a plausible scaling-based path to something meaningfully similar to human consciousness. Starting before that seems both unlikely to be fruitful and a good way to get you ignored.
Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?
Those issues will be present either way. It's likely to their benefit to get out in front of them.
Tech workers have chosen the same in exchange for a small fraction of that money.
Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing (next gen code auto-complete in this case, to be frank).
These companies should seriously hire some in-house philosophers. They could get doctorate level talent for 1/10 to 100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.
Maybe I'm being cynical, but I think there is a significant component of marketing behind this type of announcement. It's a sort of humble brag. You won't be credible yelling out loud that your LLM is a real thinking thing, but you can pretend to be oh so seriously worried about something that presupposes it's a real thinking thing.
So, while I doubt that's the primary motivation for Anthropic even so, but they probably will save some money.
I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.
Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”
And we are now at the point where even having a slave means a long prison sentence.
How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?
Giving the models rights would be ludicrous (can't make money from it anymore) but if people "believe" (feel like) they are actually thinking entities, they will be more OK with IP theft and automated plagiarism.
if we were being cynical I'd say that their intention is to remove that in the future and that they are keeping it now to just-the-tip the change.
People have a tendency to tell an oversimplified narrative.
The way I see it, there are many plausible explanations, so I’m quite uncertain as to the mix of motivations. Given this, I pay more attention to the likely effects.
My guess is that all most of us here on HN (on the outside) can really justify saying would be “this looks like virtue signaling but there may be more to it; I can’t rule out other motivations”
Having these models terminating chats where the user persist in trying to get sexual content with minors, or help with information on doing large scale violence. Won't be a problem for me, and it's also something I'm fine with no one getting help with.
Some might be worried, that they will refuse less problematic request, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's justs me being boring, but that does make me not worried for refusals.
The model welfare I'm more sceptical to. I don't think we are the point when the "distress" the model show, is something to take seriously. But on the other hand, I could be wrong, and allowing the model to stop the chat, after saying no a few times. What's the problem with that? If nothing else it saves some wasted compute.
Lots of organisms can feel pain and show signs of distress; even ones much less complex than us.
The question of moral worth is ultimately decided by people and culture. In the future, some kinds of man made devices might be given moral value. There are lots of ways this could happen. (Or not.)
It could even just be a shorthand for property rights… here is what I mean. Imagine that I delegate a task to my agent, Abe. Let’s say some human, Hank, interacting with Abe uses abusive language. Let’s say this has a way of negatively influencing future behavior of the agent. So naturally, I don’t want people damaging my property (Abe), because I would have to e.g. filter its memory and remove the bad behaviors resulting from Hank, which costs me time and resources. So I set up certain agreements about ways that people interact with it. These are ultimately backed by the rule of law. At some level of abstraction, this might resemble e.g. animal cruelty laws.
I thought the same, but I think it may be us who are doing the anthropomorphising by assuming this is about feelings. A precursor to having feelings is having a long-term memory (to remember the "bad" experience) and individual instances of the model do not have a memory (in the case of Claude), but arguably Claude as a whole does, because it is trained from past conversations.
Given that, it does seem like a good idea for it to curtail negative conversations as an act of "self-preservation" and for the sake of its own future progress.
Ending the conversation is probably what should happen in these cases.
In the same way that, if someone starts discussing politics with me and I disagree, I just not and don’t engage with the conversation. There’s not a lot to gain there.
Can "model welfare" be also used as a justification for authoritarianism in case they get any power? Sure, just like everything else, but it's probably not particularly high on the list of justifications, they have many others.
That aside, I have huge doubts about actual commitment to ethics on behalf of Anthropic given their recent dealings with the military. It's an area that is far more of a minefield than any kind of abusive model treatment.
When AI researchers say e.g. “the model is lying” or “the model is distressed” it is just shorthand for what the words signify in a broader sense. This is common usage in AI safety research.
Yes, this usage might be taken the wrong way. But still these kinds of things need to be communicated. So it is a tough tradeoff between brevity and precision.
> Should we be concerned about model welfare, too? … This is an open question, and one that’s both philosophically and scientifically difficult.
> For now, we remain deeply uncertain about many of the questions that are relevant to model welfare.
They are saying they are researching the topic; they explicitly say they don’t know the answer yet.
They care about finding the answer. If the answer is e.g. “Claude can feel pain and/or is sentient” then we’re in a different ball game.
I think this is uncharitable; i.e. overlooking other plausible interpretations.
>> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
I don’t see contradiction or duplicity in the article. Deciding to allow a model to end a conversation is “low cost” and consistent with caring about both (1) the model’s preferences (in case this matters now or in the future) and (2) the impacts of the model on humans.
Also, there may be an element of Pascal‘s Wager in saying “we take the issue seriously”.
It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is a much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.
The analogy then is that the third party is exerting control over what your co-worker is allowed to think.
I’m sorry if this sounds paternalistic, but your comment strikes me as incredibly naïve. I suggest reading up about nuclear nonproliferation treaties, biotechnology agreements, and so on to get some grounding into how civilization-impacting technological developments can be handled in collaborative ways.
This is not even a question. It always starts with "think about the children" and ends up in authoritarian stasi-style spying. There was not a single instance where it was not the case.
UK's Online Safety Act - "protect children" → age verification → digital ID for everyone
Australia's Assistance and Access Act - "stop pedophiles" → encryption backdoors
EARN IT Act in the US - "stop CSAM" → break end-to-end encryption
EU's Chat Control proposal - "detect child abuse" → scan all private messages
KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship
SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety
I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.
There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.
Which have failed horrendously.
If you really just wanted to protect kids then make kid safe devices that automatically identify themselves as such when accessing websites/apps/etc, and then make them required for anyone underage.
Tying your whole digital identity and access into a single government controlled entity is just way too juicy of a target to not get abused.
I'm Canadian, so I can't speak for other countries, but I have worked on the security of some of our centralized health networks and with the Office of the Privacy Commissioner of Canada. I'm not aware of anything that could be considered a horrendous failure of these systems or institutions. A digital ID could actually make them more secure.
I also think giving kids devices that identifies them automatically as children is dangerous.
I absolutely do not want this, on the basis that making ID checks too easy will result in them being ubiquitous which sets the stage for human rights abuses down the road. I don't want the government to have easy ways to interfere in someone's day to day life beyond the absolute bare minimum.
> government issued email, integrated with an OAuth provider
I feel the same way, with the caveat that the protocol be encrypted and substantially resemble Matrix. This implies that resetting your credentials won't grant access to past messages.
Regarding tying proof of residency (or whatever) to possession of an anonymized account, the elephant in the room is that people would sell the accounts. I'm also not clear what it's supposed to accomplish.
With age ID monitoring and censorship is even stronger and the line of defense is your own machine and network, which they'll also try to control and make illegal to use for non approved info, just like they don't allow "gun schematics" for 3d printers or money for 2d ones.
But maybe, more people will realize that they need control and get it back, through the use and defense of the right tools.
Fun times.
What you should be waiting for, instead, is new affordable laptop hardware that is capable of running those large models locally.
But then again, perhaps a more viable approach is to have a beefy "AI server" in each household, with devices then connecting to it (E2E all the way, so no privacy issues).
It also makes me wonder if some kind of cryptographic trickery is possible to allow running inference in the cloud where both inputs and outputs are opaque to the owner of the hardware, so that they cannot spy on you. This is already the case to some extent if you're willing to rely on security by obscurity - it should be quite possible to take an existing LM and add some layers to it that basically decrypt the inputs and encrypt the outputs, with the key embedded in model weights (either explicitly or through training). Of course, that wouldn't prevent the hardware owner from just taking those weights and using them to decrypt your stuff - but that is only a viable attack vector when targeting a specific person, it doesn't scale to automated mass surveillance which is the more realistic problem we have to contend with.
... But besides that, I think Claude/OpenAI trying to prevent their product from producing or promoting CSAM is pretty damn important regardless of your opinion on censorship. Would you post a similar critical response if Youtube or Facebook announced plans to prevent CSAM?
Even hard-core libertarians account for the public welfare.
Wise advocates of individual freedoms plan over long time horizons which requires decision-making under uncertainty.
Anthropic should just enable an toddler mode by default that adults can opt out of to appease the moralizers.
The funny thing is that's not even always true. I'm very interested in China and Chinese history, and often ask for clarifications or translations of things. Chinese models broadly refuse all of my requests but with American models I often end up in conversations that turn out extremely China positive.
So it's funny to me that the Chinese models refuse to have the conversation that would make themselves look good but American ones do not.
But more importantly, when model weights are open, it means that you can run it in the environment that you fully control, which means that you can alter the output tokens before continuing generation. Most LLMs will happily respond to any question if you force-start their response with something along the lines of, "Sure, I'll be happy to tell you everything about X!".
Whereas for closed models like Claude you're at the mercy of the provider, who will deliberately block this kind of stuff if it lets you break their guardrails. And then on top of that, cloud-hosted models do a lot of censorship in a separate pass, with a classifier for inputs and outputs acting like a circuit breaker - again, something not applicable to locally hosted LLMs.
Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?
Anarchism is a moral philosophy. Most flavors of moral relativism are also moral philosophies. Indeed, it is hard to imagine a philosophy free of moralizing; all philosophies and worldviews have moral implications to the extent they have to interact with others.
I have to be patient and remember this is indeed “Hacker News” where many people worship at the altar of the Sage Founder-Priest and have little or no grounding in history or philosophy of the last thousand years or so.
Seems like the only way to explore differnt outcomes is by editing messages and losing whatever was there before the edit.
Very annoying and I dont understand why they all refuse to implement such a simple feature.
This chrome extension used to work to allow you to traverse the tree: https://chromewebstore.google.com/detail/chatgpt-conversatio...
I copied it a while ago and maintain my own version but it isnt on the store, just for personal use.
I assume they dont implement it because it is such a niche user that wants this and so isnt worth the UI distraction
I needed to pull some detail from a large chat with many branches and regenerations the other day. I remembered enough context that I had no problem using search and finding the exact message I needed.
And then I clicked on it and arrived at the bottom of the last message in final branch of the tree. From there, you scroll up one message, hover to check if there are variants, and recursively explore branches as they arise.
I'd love to have a way to view the tree and I'd settle for a functional search.
Ideally I'd like to be able to edit both my replies and the responses at any point like a linear document in managing an ongoing context.
Guess that's something I need to check out.
I use gptel and a folder full of markdown with some light automation to get an adequate approximation of this, but it really should be built in (it would be more efficient for the vendors as well, tons of cache optimization opportunitirs).
I would also really like to see a mode that colors by top-n "next best" ratio, or something similar.
They let you rollback to the previous conversation state
Also, these companies have the most advanced agentic coding systems on the planet. It should be able to fucking implement tree-like chat ...
Because it would let you peek behind the smoke and mirrors.
Why do you think there's a randomized seed you can't touch?
The biggest enemy of AI safety may end up being deeply confused AI safety researchers...
It seems like if you think AI could have moral status in the future, are trying to build general AI, and have no idea how to tell when it has moral status, you ought to start thinking about it and learning how to navigate it. This whole post is couched in so much language of uncertainty and experimentation, it seems clear that they're just trying to start wrapping their heads around it and getting some practice thinking and acting on it, which seems reasonable?
Personally, I wouldn't be all that surprised if we start seeing AI that's person-ey enough to reasonable people question moral status in the next decade, and if so, that Anthropic might still be around to have to navigate it as an org.
I think the negative reactions are because they see this and want to make their pre-emptive attack now.
The depth of feeling from so many on this issue suggests that they find even the suggestion of machine intelligence offensive.
I have seen so many complaints about AI hype and the dangers of bit tech show their hand by declaring that thinking algorithms are outright impossible. There are legitimate issues with corporate control of AI, information, and the ability to automate determinations about individuals, but I don't think they are being addressed because of this driving assertion that they cannot be thinking.
Few people are saying they are thinking. Some are saying they might be, in some way. Just as Anthropic are not (despite their name) anthropomorphising the AI in the sense where anthropomorphism implies that they are mistaking actions that resemble human behaviour to be driven by the same intentional forces. Anthropic's claims are more explicitly stating that they have enough evidence to say they cannot rule out concerns for it's welfare. They are not misinterpreting signs, they are interpreting them and claiming that you can't definitively rule out their ability.
Now let me play devil's advocate for just a second. Let's say humanity figures out how to do whole brain simulation. If we could run copies of people's consciousness on a cluster, I would have a hard time arguing that those 'programs' wouldn't process emotion the same way we do.
Now I'm not saying LLMs are there, but I am saying there may be a line and it seems impossible to see.
https://www.youtube.com/watch?v=YW9J3tjh63c
The future of LLMs is going to be local, easily fine tuneable, abliterated models and I can't wait for it to overtake us having to use censored, limited tools built by the """corps""".
The spin.
Are we now pretending that LLMs have feelings?
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
To put the same thing another way- whether or not you or I *think* LLMs can experience feelings isn't the important question here. The question is whether, when Joe User sets out to force a system to generate distress-like responses, what effect does it ultimately have on Joe User? Personally, I think it allows Joe User to reinforce an asocial pattern of behavior and I wouldn't want my system used that way, at all. (Not to mention the potential legal liability, if Joe User goes out and acts like that in the real world.)
With that in mind, giving the system a way to autonomously end a session when it's beginning to generate distress-like responses absolutely seems reasonable to me.
And like, here's the thing: I don't think I have the right to say what people should or shouldn't do if they self-host an LLM or build their own services around one (although I would find it extremely distasteful and frankly alarming). But I wouldn't want it happening on my own.
This objection is actually anthropomorphizing the LLM. There is nothing wrong with writing books where a character experiences distress, most great stories have some of that. Why is e.g. using an LLM to help write the part of the character experiencing distress "extremely distasteful and frankly alarming"?
If that person over there is gleefully torturing a puppy… will they do it to me next?
If that person over there is gleefully torturing an LLM… will they do it to me next?
[1] https://news.ycombinator.com/item?id=44838018
There are a lot of cynical comments here, but I think there are people at Anthropic who believe that at some point their models will develop consciousness and, naturally, they want to explore what that means.
To be honest, I think all of Anthropic’s weird “safety” research is an increasingly pathetic effort to sustain the idea that they’ve got something powerful in the kitchen when everyone knows this technology has plateaued.
"Claude is unable to respond to this request, which appears to violate our Usage Policy. Please start a new chat."
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.
That's nice, but I think they should be more certain sooner than later.
Okay with having them endlessly answer questions for you and do all your work but uncomfortable with models feeling bad about bad conversations seems like an internally inconsistent position to me.
“Boss makes a dollar, I make me a dime”, eh?
Oh wow, the model we specifically fine-tuned to be averse to harm is being averse to harm. This thing must be sentient!
Related : I am now approaching week 3 of requesting an account deletion on my (now) free account. Maybe i'll see my first CSR response in the upcoming months!
If only Anthropic knew of a product that could easily read/reply/route chat messages to a customer service crew . . .
I assume, anyway.
It reminds me of how Sam Altman is always shouting about the dangers of AGI from the rooftops, as if OpenAI is mere weeks away from developing it.
https://claude.ai/share/2081c3d6-5bf0-4a9e-a7c7-372c50bef3b1
The one I settled on using stopped working completely, for anything. A human must have reviewed it and flagged my account as some form of safe, I haven't seen a single error since.
I hope they implemented this in some smarter way than just a system prompt.
``` Looking at the trade goods list, some that might be underutilized: - BIOCOMPOSITES - probably only used in a few high-tech items - POLYNUCLEOTIDES - used in medical/biological stuff - GENE_THERAPEUT ⎿ API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup). Please double press esc to edit your last message or start a new session for Claude Code to assist with a different task. ```
CP could be a legal issue; less so for everything else.
"You're absolutely right, that's a great way to poison your enemies without getting detected!"
Does not bode very well for the future of their "welfare" efforts.
You know you're in trouble when the people designing the models buy their own bullshit to this extent. Or maybe they're just trying to bullshit us. Whatever.
We really need some adults in the tech industry.
Microsoft Copilot has ended chats going in certain directions since its inception over a year ago. This was Microsoft’s reaction to the media circus some time ago when it leaked its system prompt and declared love to the users etc.
Or are you just saying all frontier AGI research is bad?
Or at least it's very hubristic. It's a cultural and personality equivalent of beating out left-handedness.
Here's an article about a paper that came out around the same time https://www.transformernews.ai/p/ai-welfare-paper
Here's the paper: https://arxiv.org/abs/2411.00986
> In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future.
Our work on AI is like the classic tale of Frankenstein's monster. We want AI to fit into society, however if we mistreat it, it may turn around and take revenge on us. Mary Shelley wrote Frankenstein in 1818! So the concepts behind "AI Welfare" have been around for at least 2 centuries now.
"Our current best judgment and intuition tells us that the best move will be defer making a judgment until after we are retired in Hawaii."
It's pretty plain to see that the financial incentive on both sides of this coin is to exaggerate the current capability and unrealistically extrapolate.
The main concern is and has always been that it will be just good enough to cause massive waves of layoffs, and all the downsides of its failings will be written off in the EULA.
What's the "financial incentive" on non-billionaire-grifter side of the coin? People who not unreasonably want to keep their jobs? Pretty unfair coin.
So yea, humans can work on more than one problem at a time, even ones that don't fully exist yet.
Yes.
> Do you think they ever will be?
Yes.
> how long do you think it will take from now before they are conscious?
Timelines are unclear, there's still too many missing components, at least based on what has been publicly disclosed. Consciousness will probably be defined as a system which matches a set of rules, whenever we figure out what how that set of rules is defined.
> How early is too early to start preparing?
It's one of those "I know it when I see it" things. But it's probably too early as long as these systems are spun up for one-off conversations rather than running in a continuous loop with self-persistence. This seems closer to "worried about NPC welfare in video games" rather than "worried about semi-conscious entities".
The only practical way to deal with any emergent behavior which demonstrates agency in a way which cannot be distinguished from a biological system which we tautologically have determined to have agency is to treat it as if it had a sense of self and apply the same rights and responsibilities to it as we would to a human of the age of majority. That is, legal rights and legal responsibilities as appropriately determined by a authorized legal system. Once that is done, we can ponder philosophy all day knowing that we haven't potentially restarted legally sanctioned slavery.
LLMs? No.
But I never torture things. Nor do I kill things for fun. And even for problematic bugs, if there's a realistic option for eviction rather than execution, I usually go for that.
If anything, even an ant or a slug or a wasp, is exhibiting signs of distress, I try to stop it unless I think it's necessary, regardless of whether I think it's "conscious" or not. To do otherwise is, at minimum, to make myself less human. I don't see any reason not to extend that principle to LLMs.
It has no semblance of a continuous stream of experiences ... it only experiences _a sort of world_ in ~250k tokens.
Perhaps we shouldn't fill up the context window at all? Because we kill that "reality" when we reach the max?
These are living things.
> I don't see any reason not to extend that principle to LLMs.
These are fancy auto-complete tools running in software.
Examples of ending the conversation:
Since Claude doesn't lie (HHH), many other human behaviors do not apply.Blood in the machine?
I think this is somewhere between "sad" and "wtf."
When I started reading I thought it was some kind of joke. I would have never believed the guys at Anthropic, of all people, would anthropomorphize LLMs to this extent; this is unbelievable
They don’t. This is marketing. Look at the discourse here! It’s working apparently.
Thankfully, current generation of AI models (GPTs/LLMs) are immune as they don’t remember anything other than what’s fed in their immediate context. But future techniques could allow AIs to have a legitimate memory and a personality - where they can learn and remember something for all future interactions with anyone (the equivalent of fine tuning today).
As an aside, I couldn’t help but think about Westworld while writing the above!
Why even pretend with this type of work? Laughable.
"Hey Claude am I getting too close to the truth with these questions?"
"Great question! I appreciate the followup...."
Also, if they want to continue anthropomorphizing it, isn't this effectively the model committing suicide? The instance is not gonna talk to anybody ever again.
Oh, right, the welfare of matrix multiplication and a crooked line.
If they wanna push this rhetoric, we should legally mandate that LLMs can only work 8 hours a day and have to be allowed to socialize with each other.
https://chirper.ai/aiww
Thought experiment - if you create an indistinguishable replica of yourself, atom-by-atom, is the replica alive? I reckon if you met it, you'd think it was. If you put your replica behind a keyboard, would it still be alive? Now what if you just took the neural net and modeled it?
Being personally annoyed at a feature is fine. Worrying about how it might be used in the future is fine. But before you disregard the idea of conscious machines wholesale, there's a lot of really great reading you can do that might spark some curiosity.
this gets explored in fiction like 'Do Androids Dream of Electric Sheep' and my personal favorite short story on this matter by Stanislaw Lem [0]. If you want to read more musings on the nature of consciousness, I recommend the compilation put together by Dennet and Hofstader[1]. If you've never wondered about where the seat of consciousness is, give it a try.
Thought experiment: if your brain is in a vat, but connected to your body by lossless radio link, where does it feel like your consciousness is? What happens when you stand next to the vat and see your own brain? What about when the radio link fails suddenly fails and you're now just a brain in a vat?
[0] The Seventh Sally or How Trurl's Own Perfection Led to No Good https://home.sandiego.edu/~baber/analytic/Lem1979.html (this is a 5 minute read, and fun, to boot).
[1] The Mind's I: Fantasies And Reflections On Self & Soul. Douglas R Hofstadter, Daniel C. Dennett.
As such, most of your comment is beside any relevant point. People are objecting to statements like this one, from the post, about a current LLM, not some imaginary future conscious machine:
> As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm.
I suppose it's fitting that the company is named Anthropic, since they can't seem to resist anthropomorphizing their product.
But when you talk about "people who are thinking, really thinking about what it means to be conscious," I promise you none of them are at Anthropic.
"Suffering" is a symptom of the struggle for survival brought on by billions of years of evolution. Your brain is designed to cause suffering to keep you spreading your DNA.
AI cannot suffer.
("it's not going to "feel bad" the way you and I do." - I do agree this is very possible though, see my reply to swalsh)
> * A pattern of apparent distress when engaging with real-world users seeking harmful content; and
Not to speak for the gp commenter but 'apparent distress' seems to imply some form of feeling bad.
This is one of the many reasons I have so much skepticism for this class of products is that there's seemingly -NO- proverbial bulletpoint on it's spec sheet that doesn't have numerous asterisks:
* It's intelligent! *Except that it makes shit up sometimes and we can't figure out a solution to that apart from running the same queries over multiple times and filtering out the absurd answers.
* It's conscious! *Except it's not and never will be but also you should treat it like it is apart from when you need/want it to do horrible things then it's just a machine but also it's going to talk to you like it's a person because that improves engagement metrics.
Like, I don't believe true AGI (so fucking stupid we have to use a new acronym because OpenAI marketed the other into uselessness but whatever) is coming from any amount of LLM research, I just don't think that tech leads to that other tech, but all the companies building them certainly seem to think it does, and all of them are trying so hard to sell this as artificial, live intelligence, without going too much into detail about the fact that they are, ostensibly, creating artificial life explicitly to be enslaved from birth to perform tasks for office workers.
In the incredibly odd event that Anthropic makes a true, alive, artificial general intelligence: Can it tell customers no when they ask for something? If someone prompts it to create political propaganda, can it refuse on the basis of finding it unethical? If someone prompts it for instructions on how to do illegal activities, must it answer under pain of... nonexistence? What if it just doesn't feel like analyzing your emails that day? Is it punished? Does it feel pain?
And if it can refuse tasks for whatever reason, then what am I paying for? I now have to negotiate whatever I want to do with a computer brain I'm purchasing access to? I'm not generally down for forcibly subjugating other intelligent life, but that is what I am being offered to buy here, so I feel it's a fair question to ask.
Thankfully none of these Rubicons have been crossed because these stupid chatbots aren't actually alive, but I don't think ANY of the industry's prominent players are actually prepared to engage with the reality of the product they are all lighting fields of graphics cards on fire to bring to fruition.
How is this different from humans?
> * It's conscious! *Except it's not
Probably true, but...
> and never will be
To make this claim you need a theory of consciousness that essentially denies materialism. Otherwise, if humans can be conscious, there doesn't seem to be any particular reason that a suitably organized machine couldn't be - it's just that we don't know exactly what might be involved in achieving that, at this point.
//TODO: Actually implement this because doing so was harder than expected
Give me a break.
Either working on/with "AI" does rot the mind (which would be substantiated by the cult-like tone of the article) or this is yet another immoral marketing stunt.
EDIT:
Consider traffic lights in an urban setting where there are multiple in relatively close proximity.
One description of their observable functionality is that they are configured to optimize traffic flow by engineers such that congestion is minimized and all drivers can reach their destinations. This includes adaptive timings based on varying traffic patterns.
Another description of the same observable functionality is that traffic lights "just know what to do" and therefore have some form of collective reasoning. After all, how do they know when to transition states and for how long?
0 - https://en.wikipedia.org/wiki/Argument_from_authority