LLMs are more persuasive than incentivized human persuaders

67 points by flornt | 57 comments | 5/17/2025, 8:05:09 PM | arxiv.org ↗

Comments (57)

metalcrow · 2h ago
My guess for the reason behind this is that LLMs have more facts memorized, and thus can produce answers that sound more reasonable and well-researched. If you ask an LLM vs a human "Is a stack in computer science a) a data structure that is first in, first out or b) a data structure that is first in, last out", the LLM can say something resembling "Based on Dijkstra's algorithm proof given in 1943 and the nature of Turing complete languages being traditionally a top-down oriented system, a stack is ..." while a human is just going to say "It's B, because that's what a stack is".
CJefferson · 1h ago
Based on reading bad AI-generated student essays, it's worse than that: LLMs are happy to "fill in the blanks" with whatever made-up fact would make their argument look best.

Most people can't lie that smoothly, and most readers don't check carefully unless they are already experts in the area.

Maths proofs are particularly bad: they look convincing and clear until you read them very carefully and see all the holes.

hammock · 52m ago
Reminds me of the horrific state of student debate competitions today, where the winning strategy is to incomprehensibly rattle off as many arguments as quickly as possible, with strange breathy sounds in between.
azemetre · 38m ago
Do you have a YouTube video demonstrating this? My only experience with debate is from the TV show Community.
justonceokay · 26m ago
This one is very short but conveys the idea well. Not all debate is like this, but it is definitely a real phenomenon.

https://youtu.be/LMO27PAHjrY

cwmoore · 16m ago
A small step for a man, a giantleapfrogmankind.
koakuma-chan · 1h ago
I asked an LLM and it said "A stack is a data structure that follows the Last In, First Out (LIFO) principle. This means that the last element added to the stack is the first element to be removed."
abtinf · 59m ago
It’s subtle but I would regard this as an incorrect answer.

The structure of the LLM answer is:

A is B; B exhibits property C.

The correct answer is:

A exhibits property C; B is the class of things with property C; therefore A is B.

There is a crucial difference between these two.

literalAardvark · 46m ago
This doesn't apply to all prompts, and the prompt was not provided. Natural language is a fickle thing.
moffkalast · 35m ago
This kind of pointless hair splitting is why people would rather talk to an LLM.
hansmayer · 1h ago
Yikes :( I am so worried about the damage that will be caused by the misuse of these tools. Already, a lot of young folks will just mindlessly trust whatever the magic oracle spits out at them. We need to go back to testing people with pen and paper, I suppose.
Karrot_Kream · 1h ago
I read this and I see a common thinking fallacy: when someone is inclined to believe something a priori, they fit the evidence to their a priori beliefs.
hansmayer · 1h ago
No, it's fairly simple - I misread.
jstanley · 1h ago
Why is that a bad answer?
hansmayer · 1h ago
Sorry - I misread the LLM answer - actually the LLM produced a correct answer here
lovasoa · 1h ago
louthy · 1h ago
> No it is not…

That’s a queue, not a stack. The LLM response was correct.

danielbln · 1h ago
But a stack is commonly LIFO, not FIFO?!
idonotknowwhy · 17m ago
This reads like a line from a QwQ or Qwen3 CoT chain :)
koakuma-chan · 1h ago
I mean, is it wrong? It seems correct. Unless I'm missing something.
hansmayer · 1h ago
Oops, my bad. I seem to have misread. Sorry.
thinkcritical · 1h ago
No, a stack is LIFO, like it said. A queue is FIFO, or in other words LILO, "Last In, Last Out".
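For anyone skimming, a minimal sketch of the distinction (plain Python, using a list as the stack and collections.deque as the queue; purely illustrative, not from the paper or the answers above):

    from collections import deque

    # Stack: last in, first out (LIFO) - append and pop work on the same end
    stack = []
    for item in ["a", "b", "c"]:
        stack.append(item)
    print(stack.pop())      # prints "c": the most recently pushed item comes off first

    # Queue: first in, first out (FIFO) - deque lets us pop from the opposite end
    queue = deque()
    for item in ["a", "b", "c"]:
        queue.append(item)
    print(queue.popleft())  # prints "a": the oldest item comes off first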
koakuma-chan · 52m ago
My last job was at the office. I had my work queue implemented as a stack of files. I would sit at my desk and, in an infinite loop, pop files from my stack and process them. Occasionally, my supervisor would come and push a new file onto my stack. A naive worker would think that, once I was done with my stack, I could finally get some sleep, but no. Our office implemented something called "work stealing," where, once I was done with my own work, I had to visit a random co-worker and pop files from their stack.
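In scheduler terms, the office analogy maps to something like the sketch below. The names (Worker, run_once) are made up for illustration, and the thief popping straight off the victim's stack is a simplification; real work-stealing schedulers such as Cilk's have thieves take from the opposite end of the owner's deque.

    import random

    class Worker:
        def __init__(self, name):
            self.name = name
            self.stack = []              # my own pile of files (LIFO)

        def run_once(self, coworkers):
            if self.stack:
                return self.stack.pop()  # take my own top file first
            # my pile is empty: "work stealing" from a random busy co-worker
            busy = [w for w in coworkers if w.stack]
            if not busy:
                return None              # everyone is idle; maybe sleep after all
            return random.choice(busy).stack.pop()

    alice, bob = Worker("alice"), Worker("bob")
    bob.stack.extend(["file1", "file2"])  # the supervisor pushes files onto bob
    print(alice.run_once([bob]))          # prints "file2": alice steals bob's top file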
lovasoa · 1h ago
No. The LLM's answer is correct.
armchairhacker · 1h ago
LLMs also never get tired of arguing. They'll respond to every point from a gish-gallop and provide quality-sounding replies to points that are obviously (to an informed person) flawed or seem (but aren't necessarily) mal-intentioned.

EDIT: LLMs also aren't egocentric; they'll respond in the other person's style (grammar, tone, and perhaps even "subtext" like unstated assumptions), and they're less likely to omit important information that is implicit to them but not to the other person.

Sharlin · 1h ago
The gap between LLMs and humans was greater in the deceptive condition. This may, of course, simply reflect the fact that random humans are bad at lying.
Nevermark · 1h ago
A clear case where LLMs exceed humans is in finding solutions to many disparate, shallow constraints, the kind of problem that would normally require very wide searches over more knowledge than any of us will ever have.

A simple case I have found is looking for existing terms, or coining new ones. Say I have a series of concepts whose names follow a nice linguistic pattern that emphasizes their close relationship, except for one. I can describe the regularly named concepts, then ask for suggestions for the remaining one.

The LLM pulls from virtually every topic with domain terminology, repurposable languages (Greek, Latin), words from fiction, all the way to creative construction of new words, tenses, etc., to come up with great proposals in seconds.

I could imagine that crafting persuasive wording is a similar challenge: choosing the right words and phrasing to carry as much positive connotation and implied solidity as possible, while avoiding anything that sounds challenging or controlling, drawn from all of human language and its huge space of emotional constraints and composites.

Very shallow but very wide reasoning/searching/balancing done in very little time.

And with an ability to avoid giving any unnecessary purchase for disagreement, being informed of the myriad typical and idiosyncratic ways people get hung up in failed persuasion attempts, whether in general or on a specific topic.

LLM-generated writing can be stereotypical.

But the more constraints you put on the requested material, the more their ability to construct genuinely original, high-quality, or even cleverly unique, prose in real time shines.

pottertheotter · 1h ago
Do you have any examples where you’ve used them for this? Would be interesting to see.
thethirdone · 1h ago
Based on the data in Table 3, I would attribute most of the difference to length of advice. The LLMs' average word count (29.4) is more than double the humans' (13.25). Most other measures do not have a significant ratio; "difficult word count" would be the only other one with a ratio higher than 2, but that is inherited from total word count.

I think it would be difficult to truly convince me to answer differently on a test with 14 words, whereas 30 leaves enough space to actually convey an argument.

I would be very interested to see the test rerun while limiting LLM response length or encouraging long responses from humans.

aspenmayer · 38m ago
> I would be very interested to see the test rerun while limiting LLM response length or encouraging long responses from humans.

I don't know if that would have the effect you want. And if you're more likely to have hallucinations at lower word counts, that matters for those who are scrupulous, but many people trying to convince you of something believe the ends justify the means, and that honesty or correspondence to reality are not necessary, just nice to have.

Asking chatbots for short answers can increase hallucinations, study finds - https://news.ycombinator.com/item?id=43950684 - May 2025 (1 comment)

which is reporting on this post:

Good answers not necessarily factual answers: analysis of hallucination in LLMs - https://news.ycombinator.com/item?id=43950678 - May 2025 (1 comment)

jstanley · 1h ago
If you think writing more words will be more persuasive, just... write more words?

The test already incentivises being persuasive! If writing more words would do that, and the incentivised human persuaders don't write more words and the LLMs do, then I think it's fair to say that LLMs are more persuasive than incentivised human persuaders.

thethirdone · 1h ago
Sure. I am not contesting that LLMs are more persuasive in this context; that basic result comes through very clearly in the paper. It's not as clear how relevant this is to other situations, though. I think it's quite likely that humans given the instruction to increase word count might outperform LLMs. People are very unlikely to have practiced the specific task of giving advice on multiple-choice tests, whereas LLMs have likely gotten RLHF training that helps in this situation.

I always try to pick out as many tidbits as possible from papers that might be applicable in other situations. I think the large difference in word count may be overshadowing other insights that could be more relevant to longer-form argumentation.

Morizero · 1h ago
Basically the same finding as the controversial Zurich paper on using LLMs to change opinions in the "change my view" subreddit.
echelon · 1h ago
I'm hoping we see a flood of LLMs just like that Zurich piece, but at 10,000x scale. Perhaps even open source platforms to run your hobby LLM bot farm.

Social media has turned into cancer. It'd be riveting to watch it turn into bots talking to other bots. Social media wouldn't go away, but I get the feeling people will engage more with real life again.

As the platforms see less growth and fewer real users, we might even see a return to protocols and open standards instead of monolithic walled gardens.

andy99 · 1h ago
> bots talking to other bots

What would be materially different vs how it is now? I went through a regrettable period recently of checking the most popular reddit threads that show when you go to old.reddit.com. On any subject remotely political or controversial, regardless of the side, all the comments appear to just be cliches along the usual themes that you'd expect a bot would write (I assume it is typically real people writing them). The only difference is that most discussions seem to only get one viewpoint, so maybe bots arguing could get you two sides in the same place.

kragen · 1h ago
Which incentivized human persuaders? Are we talking about top salespeople and litigators, or are we talking about average college freshmen?

It says they recruited participants from the US through Prolific and paid them £10.12 per hour, so probably more like the latter.

lordofgibbons · 1h ago
Does it matter? The difference is only 6 months of LLM progress.
kragen · 1h ago
It matters if studies like this matter; that is, it matters to people who are interested in what has currently happened rather than what might happen in the future. Six months of LLM progress keeps not looking like what I expected.

On the other hand, if you're content with your pre-existing predictions about what would happen, which I think is actually a reasonable position, there's no reason to read the paper.

fzzzy · 1h ago
Is progress faster or slower than you expected?
kragen · 1h ago
An astounding amount of both.
Nevermark · 1h ago
Each of us could benefit from a loyal model of our own, critiquing and marking up any persuasive material from others.
roywiggins · 53m ago
Great, so Internet arguments devolve to Pokémon battles between our respective LLMs.

> ChatGPT, I choose YOU!

ChatGPT uses GISH GALLOP.

tuatoru · 1h ago
I'm seeing a lot of ads from Replika about loyal models...
godelski · 1h ago
It is CRITICAL that we be realistic about what fulfills the optimization objectives in the models that we train. I think there's been a significant unwillingness to recognize that objectives like "human preference" (RLHF, DPO, etc.) not only help models become more accurate and sound more natural in speech, BUT ALSO optimize the models to be deceptive and convincing when they are wrong. It's easy to see why: what's preferred over a lie? A lie that you don't know is a lie. You (may) prefer the truth, but if you cannot differentiate the truth from a lie, you'll state your preference based on some other criteria. We all know that lies frequently win out here. If you doubt this, just turn on the news or talk to someone who belongs to the opposite political party from yours.

This creates a very poorly designed tool! A good tool should fail as loudly as possible, in that it alerts the user of the failure and does its best to specify the conditions that led to this. This isn't always possible, but if you look at physical engineers you'll see that this is where they spend a significant portion of their time. Even in software I'd argue we do a lot here, but also that it is easy to brush off (we all love those compiler messages... right?). Clearly right now LLMs are in a state where we don't know how to make their failures more visible, and honestly, that is okay. But what is not okay is to pretend that this is not current reality and pretend that there are no dangers or consequences that this presents. We dismiss this because we catch some obvious errors and over-generalize the error quality, but that just means we suffer from Murray Gell-Mann Amnesia. It's REALLY hard to measure what you don't know. Importantly, we can't even begin to resolve these issues and build the tools we want (the ones we pretend these are!) if we ignore the reality of what we have. You cannot make things better if you are unwilling to recognize their limitations.

Everyone here is an engineer, researcher, or builder. This framework of thinking should be natural to us! We should also be able to understand that there's a huge difference between critiques and limitations on the one hand and dismissing things on the other. I'm an AI critic, but also very optimistic. I'm a researcher spending my life working on this topic; it'd be insane to do such a thing if I thought it was a fruitless or evil effort. But it would be equally insane to pursue a topic with pure optimism. If I were to blind myself to limits and paint everything as a trivial-to-solve problem, I'd never be able to solve any of those problems. Ignoring or dismissing technical issues and limitations is the domain of MBA managers, not engineers.

andix · 1h ago
It's not programmers who should be scared about getting replaced by AI. It's obviously salespeople who should be ;)
xqcgrek2 · 1h ago
Politicians everywhere, remember this for your next campaign.
booleandilemma · 1h ago
I can't wait til I have to argue with my manager because I said one thing and the LLM said another thing.
baal80spam · 1h ago
It's already happening; I've experienced this firsthand.
alpaca128 · 30m ago
Obviously the solution is to use an LLM to argue with the manager, for increased productivity at the workplace /s
mhuffman · 1h ago
Sam Altman must be literally vibrating at the thought of tacking on ads at the end of a "persuasive" interaction about whatever. "... and remember to try new Oreo-flavored Pringles, and tell them Gippity sent you with this 20% off code, because we are best friends and we can trust each other!"
staindk · 58m ago
I'm not worried about blatant advertising like you put forth.

There's so much dirty subliminal or informal advertising that you can do with these things.

reducesuffering · 1h ago
We're fast approaching the point where any value someone can provide through a digital interface could be better provided by a model. What do we use digital interfaces for? Practically everything.

Oh well, not being a plumber, electrician, or farmer... but our society's current productivity, technology, and automation reduced the share of the population that needs to farm from 80% to 1.3% in the US. Can you imagine what the equivalent of 1 billion digital engineers unlocks in understanding and implementing robotics?

sidibe · 1h ago
Yes, when the knowledge jobs are all done best by AI, the rest will follow shortly. We will need to adapt to being "useless" as far as work goes and find other sources of worth. There are still a lot of people around here who want to compare it to Bitcoin hype; IMO, over the next few years everything is going to change way faster than it ever has.

For the record, I always thought Kurzweil and that crowd were clowns; now I think I was the one who was wrong.

hansmayer · 1h ago
> IMO the next few years everything is going to change way faster

Honestly, after hearing this for the past 20 years (ever since ML and LLMs became a thing), it is actually more like the level-5 autonomous car hype and less like Bitcoin. Except that the driverless car hype never required such a humongous investment bubble, as the Statistical-Text-Generator-as-AI one does.
tuatoru · 1h ago
A challenge for people who think this way: be first in line to have a robot change your six-month-old daughter's nappy.
sidibe · 23m ago
Now that'd be crazy, like letting a Tesla drive you around in the back seat.

Give it a decade, though, and people won't think twice about it. Still, I do hope we'd keep doing that kind of thing ourselves.

jfengel · 1h ago
Welp, we're boned.
