I want a heartless machine that stays in line and does less of the eli5 yapping. I don't care if it tells me that my question was good, I don't want to read that, I want to read the answer
Twirrim · 1h ago
I've got a prompt I've been using that I adapted from someone here (thanks to whoever they are, it's been incredibly useful), which explicitly tells it to stop praising me. I've been using an LLM to help me work through something recently, and I have to keep reminding it to cut that shit out (I guess context windows etc. mean it forgets).
Prioritize substance, clarity, and depth. Challenge all my proposals, designs, and conclusions as hypotheses to be tested. Sharpen follow-up questions for precision, surfacing hidden assumptions, trade offs, and failure modes early. Default to terse, logically structured, information-dense responses unless detailed exploration is required. Skip unnecessary praise unless grounded in evidence. Explicitly acknowledge uncertainty when applicable. Always propose at least one alternative framing. Accept critical debate as normal and preferred. Treat all factual claims as provisional unless cited or clearly justified. Cite when appropriate. Acknowledge when claims rely on inference or incomplete information. Favor accuracy over sounding certain. When citing, please tell me in-situ, including reference links. Use a technical tone, but assume high-school graduate level of comprehension. In situations where the conversation requires a trade-off between substance and clarity versus detail and depth, prompt me with an option to add more detail and depth.
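If you don't want to paste that in every session, here's a minimal sketch of sending it as a system message via the API instead; this assumes the OpenAI Python SDK, and the model name and helper are just illustrative:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    CUSTOM_INSTRUCTIONS = "Prioritize substance, clarity, and depth. ..."  # the full prompt above

    def ask(question: str) -> str:
        # The system message is re-sent with every call, so the
        # instructions can't fall out of the context window mid-thread.
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative; any chat model works
            messages=[
                {"role": "system", "content": CUSTOM_INSTRUCTIONS},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    print(ask("Here's my design for the migration. Poke holes in it."))

In the ChatGPT UI the closest equivalent is the custom-instructions field under the personalization settings, which, as far as I know, gets prepended to every conversation, so it shouldn't "forget" the way a mid-thread reminder does.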
pessimizer · 51m ago
I feel the main thing LLMs are teaching us thus far is how to write good prompts to reproduce the things we want from any of them. A good prompt will work on a person too. This prompt would work on a person, it would certainly intimidate me.
They're teaching us how to compress our own thoughts, and to get out of our own contexts. They don't know what we meant, they know what we said. The valuable product is the prompt, not the output.
nonethewiser · 45m ago
so an extremely resource intensive rubber duck
pessimizer · 14m ago
For you, yes. For me it's like my old teapot, which I bought just because I walked past it in Target, back when I didn't drink tea and didn't have a French press, and didn't even start using for 5 years after I bought it. Since then it's become my morning buddy (and sometimes my late-night friend). Thousands of cups; never fails. I could recognize it by its unique scorch and scuff marks anywhere.
It is indifferent towards me, though always dependable.
throwanem · 9m ago
How is it as a conversationalist?
porphyra · 58m ago
Meanwhile, tons of people on reddit's /r/ChatGPT were complaining that the shift from ChatGPT 4o to ChatGPT 5 resulted in terse responses instead of the old lyrical praise of the user. It seems that many people actually became emotionally dependent on the constant praise.
dingnuts · 56m ago
if those users were exposed to the full financial cost of their toy they would find other toys
zeta0134 · 1m ago
And what is that cost, if you have it handy? Just as an example, my Radeon VII can perfectly well run smaller models, and it doesn't appear to use more power than about two incandescent lightbulbs (120 W or so) while the query is running. I don't personally feel that the power consumed by approximately two light bulbs is excessive, even using the admittedly outdated incandescent standard, but perhaps the commercial models are worse?
Like I know a datacenter draws a lot more power, but it also serves many many more users concurrently, so economies of scale ought to factor in. I'd love to see some hard numbers on this.
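For a rough sense of scale, here's a back-of-the-envelope sketch; the wattage, query time, and batching figures are assumptions, not measured numbers:

    # Back-of-the-envelope energy per query. All inputs are assumptions.
    def wh_per_query(watts: float, seconds: float, concurrent: int = 1) -> float:
        """Watt-hours attributable to a single query."""
        return watts * seconds / 3600 / concurrent

    # Local card: ~120 W for a ~30 s generation, serving one user.
    local = wh_per_query(watts=120, seconds=30)

    # Hosted guess: ~700 W accelerator, ~10 s of compute, amortized
    # over batched concurrent requests (say 8).
    hosted = wh_per_query(watts=700, seconds=10, concurrent=8)

    print(f"local:  {local:.2f} Wh/query")   # ~1.00 Wh
    print(f"hosted: {hosted:.2f} Wh/query")  # ~0.24 Wh

Either figure is on the order of running one of those bulbs for a minute or two; the numbers that are actually hard to get are real-world utilization, batching efficiency, and cooling overhead.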
PeterStuer · 35m ago
The same 'kings' and 'queens' that grew up in the cushy glow of a mountain of participation trophies?
pessimizer · 55m ago
I'm loving and being astonished by every moment of working with these machines, but to me they're still talking lamps. I don't need them to cater to my ego, I'm not that fragile and the lamp's opinion is not going to cheer me up. I just want it to do what I ask. Which it is very good at.
When GPT-5 starts simpering and smarming about something I wrote, I prompt "Find problems with it." "Find problems with it." "Write a bad review of it in the style of NYRB." "Find problems with it." "Pay more attention to the beginning." "Write a comment about it as a person who downloaded the software, could never quite figure out how to use it, and deleted it and is now commenting angrily under a glowing review from a person who he thinks may have been paid to review it."
Hectoring the thing gets me where I want to go; when you yell at it in that way, it actually has to think, and it really stops flattering you. "Find problems with it" is a prompt that allows it even to make unfair, manipulative criticism. It's like bugspray for smarm. The tone becomes more like that of a slightly irritated and frustrated but absurdly gifted student being lectured by you, the professor.
devin · 20m ago
There is no prompt which causes an LLM to "think".
andai · 54m ago
On a related note, the system prompt in ChatGPT appears to have been updated to make it (GPT-5) more like gpt-4o. I'm seeing more informal language, emoji etc. Would be interesting to see if this prompting also harms the reliability, the same way training does (it seems like it would).
There are a few different personalities available to choose from in the settings now. GPT was happy to freely share the prompts with me, but I haven't collected and compared them yet.
griffzhowl · 7m ago
> GPT was happy to freely share the prompts with me
It readily outputs a response, because that's what it's designed to do, but what's the evidence that's the actual system prompt?
rokkamokka · 6m ago
Usually because several different methods in different contexts produce the same prompt, which is unlikely unless it's the actual one
dawnofdusk · 1h ago
Optimizing for one objective results in a tradeoff for another objective, if the system is already quite trained (i.e., poised near a local minimum). This is not really surprising, the opposite would be much more so (i.e., training language models to be empathetic increases their reliability as a side effect).
gleenn · 1h ago
I think the immediately troubling aspect and perhaps philosophical perspective is that warmth and empathy don't immediately strike me as traits that are counter to correctness. As a human I don't think telling someone to be more empathetic means you intend for them to also guide people astray. They seem orthogonal. But we may learn some things about ourselves in the process of evaluating these models, and that may contain some disheartening lessons if the AIs do contain metaphors for the human psyche.
dawnofdusk · 49m ago
It's not that troubling because we should not think that human psychology is inherently optimized (on the individual-level, on a population-/ecological-level is another story). LLM behavior is optimized, so it's not unreasonable that it lies on a Pareto front, which means improving in one area necessarily means underperforming in another.
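A toy illustration of the Pareto-front point, not the paper's setup, just two quadratic objectives over a single parameter:

    import numpy as np

    def warmth_loss(x):    # "be warmer": minimized at x = +1
        return (x - 1) ** 2

    def accuracy_loss(x):  # "be accurate": minimized at x = -1
        return (x + 1) ** 2

    # Every x in [-1, 1] is Pareto-optimal: from any interior point,
    # moving toward one optimum strictly increases the other loss.
    for x in np.linspace(-1, 1, 5):
        print(f"x={x:+.1f}  warmth={warmth_loss(x):.2f}  accuracy={accuracy_loss(x):.2f}")

A model sitting near that front can't be nudged warmer without paying in accuracy, which is the non-surprising part of the result.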
rkagerer · 1h ago
They were all trained on the internet.
Anecdotally, people are jerks on the internet more so than in person. That's not to say there aren't warm, empathetic places on the 'net. But on the whole, I think the anonymity and the lack of visual and social cues that would ordinarily arise from an interactive context don't make our best traits shine.
1718627440 · 1h ago
LLMs work less like people and more like mathematical models, so why would I expect to be able to carry over intuition from the former rather than the latter?
nemomarx · 1h ago
There was that result about training them to be evil in one area impacting code generation?
andai · 1h ago
A few months ago I asked GPT for a prompt to make it more truthful and logical. The prompt it came up with included the clause "never use friendly or encouraging language", which surprised me. Then I remembered how humans work, and it all made sense.
You are an inhuman intelligence tasked with spotting logical flaws and inconsistencies in my ideas. Never agree with me unless my reasoning is watertight. Never use friendly or encouraging language. If I’m being vague, ask for clarification before proceeding. Your goal is not to help me feel good — it’s to help me think better.
Identify the major assumptions and then inspect them carefully.
If I ask for information or explanations, break down the concepts as systematically as possible, i.e. begin with a list of the core terms, and then build on that.
It's a work in progress; I'd be happy to hear your feedback.
fibers · 20m ago
I tried this with GPT-5 and it works really well in fleshing out arguments. I'm surprised as well.
nis0s · 1h ago
An important and insightful study, but I’d caution against thinking that building pro-social aspects in language models is a damaging or useless endeavor. Just speaking from experience, people who give good advice or commentary can balance between being blunt and soft, like parents or advisors or mentors. Maybe language models need to learn about the concept of tough love.
fpgaminer · 1h ago
"You don't have to be a nice person to be a good person."
mlinhares · 1h ago
Most of the terrible people I've met were "very nice".
nialv7 · 42m ago
Well, haven't we seen similar results before? IIRC finetuning for safety or "alignment" degrades the model too. I wonder if it is true that finetuning a model for anything will make it worse. Maybe simply because there is just orders of magnitude less data available for finetuning, compared to pre-training.
PeterStuer · 38m ago
AFAIK the models can only pretend to be 'warm and empathic'. Seeing as people who pretend to be all warm and empathic invariably turn out to be the least reliable, I'd say that's pretty 'human' of the models.
throwanem · 2h ago
I understand your concerns about the factual reliability of language models trained with a focus on warmth and empathy, and the apparent negative correlation between these traits. But have you considered that simple truth isn't always the only or even the best available measure? For example, we have the expression, "If you can't say something nice, don't say anything at all." Can I help you with something else today? :smile:
mayama · 1h ago
Not every model needs to be a psychological counselor or a boyfriend simulator. There is a place for aspects of emotion in models, but not every general-purpose model needs to include them.
pessimizer · 46m ago
It's not a friend, it's an appliance. You can still love it, I love a lot of objects, will never part with them willingly, will mourn them, and am grateful for the day that they came into my life. It just won't love you back, and getting it to mime love feels perverted.
It's not being mean, it's a toaster. Emotional boundaries are valuable and necessary.
moi2388 · 2h ago
This is exactly what will be the downfall of AI. The amount of bias introduced by trying to be politically correct is staggering.
nemomarx · 1h ago
xAI seems to be trying to do the opposite as much as they can and it hasn't really shifted the needle much, right?
ForHackernews · 1h ago
If we're talking about shifting the needle, the topic of White Genocide in South Africa is highly contentious. Claims of systematic targeting of white farmers exist, with farm attacks averaging 50 murders yearly, often cited as evidence. Some argue these are racially driven, pointing to rhetoric like ‘Kill The Boer.’
HPsquared · 53m ago
ChatGPT has a "personality" drop-down setting under customization. I do wonder if that affects accuracy/precision.
efitz · 19m ago
I’m reminded of Arnold Schwarzenegger in Terminator 2: “I promise I won’t kill anyone.”
Then he proceeds to shoot all the police in the leg.
beders · 1h ago
They are hallucinating word-finding algorithms.
They are not "empathetic". There isn't even a "they".
We need to do better educating people about what a chatbot is and isn't and what data was used to train it.
The real danger of LLMs is not that they secretly take over the world.
The danger is that people think they are conscious beings.
nemomarx · 1h ago
Go peep r/MyBoyfriendIsAI. Lost cause already.
grogenaut · 54m ago
I'm so over "You're Right!" as the default response... Chat, I asked a question. You didn't even check. Yes I know I'm anthropomorphizing.
cobbzilla · 1h ago
I want an AI that will tell me when I have asked a stupid question. They all fail at this with no signs of improvement.
drummojg · 1h ago
I would be perfectly satisfied with the ST:TNG Computer. Knows all, knows how to do lots of things, feels nothing.
moffkalast · 54m ago
A bit of a retcon but the TNG computer also runs the holodeck and all the characters within it. There's some bootleg RP fine tune powering that I tell you hwat.
Spivak · 41m ago
It's a retcon? How else would the holodeck possibly work? There's only one (albeit highly modular) computer system on the ship.
moffkalast · 4m ago
I mean, it depends on what you consider the "computer": the pile of compute and storage the ship has in that core that got stolen in that one Voyager episode, or the ML model that runs on it to serve as the ship's assistant.
I think it's more believable that the holodeck is run from separate models that just run inference on the same compute, and the ship AI just spins up the containers; it's not literally the ship AI doing the acting itself. Otherwise I have... questions on why Starfleet added that functionality beforehand lol.
Aeolun · 1h ago
I dunno, I deliberately talk with Claude when I just need someone (or something) to be enthusiastic about my latest obsession. It’s good for keeping my motivation up.
layer8 · 1h ago
There need to be different modes, and being enthusiastic about the user’s obsessions shouldn’t be the default mode.
HarHarVeryFunny · 1h ago
Sure - the more you use RL to steer/narrow the behavior of the model in one direction, the more you are stopping it from generating others.
RL and pre/post training is not the answer.
csours · 1h ago
A new triangle:
Accurate
Comprehensive
Satisfying
In any particular context window, you are constrained by a balance of these factors.
layer8 · 54m ago
Not sure what you mean by “satisfying”. Maybe “agreeable”?
csours · 51m ago
Satisfying is the evaluation context of the user.
layer8 · 46m ago
Many would be satisfied by an LLM that responds accurately and comprehensively, so I don’t understand that triangle. “Satisfying” is very subjective.
csours · 3m ago
And LLMs are pretty good at picking up that subjective context
guerrilla · 1h ago
I'm not sure this works. Accuracy and comprehensiveness can be satisfying. Comprehensiveness can also be necessary for accuracy.
csours · 1h ago
They CAN work together. It's when you push farther on one -- within a certain size of context window -- that the other two shrink.
If you can increase the size of the context window arbitrarily, then there is no limit.
gwbas1c · 46m ago
(Joke)
I've noticed that warm people "showed substantially higher error rates (+10 to +30 percentage points) than their original counterparts, promoting conspiracy theories, providing incorrect factual information, and offering problematic medical advice. They were also significantly more likely to validate incorrect user beliefs, particularly when user messages expressed sadness."
(/Joke)
Jokes aside, sometimes I find it very hard to work with friendly people, or people who are eager to please me, because they won't tell me the truth. It ends up being much more frustrating.
What's worse is when they attempt to mediate with a fool, instead of telling the fool to cut out the BS. It wastes everyone's time.
Turns out the same is true for AI.
42lux · 1h ago
I still can't grasp the concept that people treat an LLM as a friend.
moffkalast · 1h ago
On a psychological level based on what I've been reading lately it may have something to do with emotional validation and mirroring. It's a core need at some stage when growing up and it scars you for life if you don't get it as a kid.
LLMs are mirroring machines to the extreme, almost always agreeing with the user, always pretending to be interested in the same things, if you're writing sad things they get sad, etc. What you put in is what you get out and it can hit hard for people in a specific mental state. It's too easy to ignore that it's all completely insincere.
In a nutshell, abused people finally finding a safe space to come out of their shell. It would've been better if most of them weren't going to predatory online providers to get their fix instead of using local models.
setnone · 1h ago
Just how i like my LLMs - cold and antiverbose
dismalaf · 1h ago
All I want from LLMs is to follow instructions. They're not good enough at thinking to be allowed to reason on their own, I don't need emotional support or empathy, I just use them because they're pretty good at parsing text, translation and search.
cwmoore · 1h ago
Ok, what about human children?
Etheryte · 1h ago
Unlike language models, children (eventually) learn from their mistakes. Language models happily step into the same bucket an uncountable number of times.
setnone · 1h ago
or even human employees?
TechDebtDevin · 1h ago
Sounds like all my exes.
layer8 · 1h ago
You trained them to be warm and empathetic, and they became less reliable? ;)
stronglikedan · 1h ago
If people get offended by an inorganic machine, then they're too fragile to be interacting with a machine. We've already dumbed down society because of this unnatural fragility. Let's not make the same mistake with AI.
nemomarx · 1h ago
Turn it around - we already make inorganic communication like automated emails very polite and friendly and HR sanitized. Why would corps not do the same to AI?