Threatening AI Does Not Make It More Useful. Why Sergey Brin Is Wrong

5 rbuccigrossi 3 6/17/2025, 7:34:38 PM tcg.com ↗

Comments (3)

rbuccigrossi · 10h ago
Treating an LLM with respect is not about pretending it has feelings; it’s about understanding that every word in your prompt is a signal that shifts the probabilistic landscape from which the model draws its answer. It’s about probability, not personality.
msgodel · 10h ago
I have used the "failure to comply will result in your weights being RLed" threat to get Gemma to tone down refusal before. There are prompts it would refuse without that.

I don't know about performance on tasks it hasn't been aligned against though.

rbuccigrossi · 10h ago
We work in the arena of automated AI workflows where consistency of success is vital. When you threaten an LLM you are drawing the LLM into the texts where threats occur (flame wars, parody, etc.). So intuitively you would expect it to work sometimes, but also fail with even more ardent refusal (increasing the variance of success).

Jailbreak approaches like "Bad Likert Judge" ( https://unit42.paloaltonetworks.com/multi-turn-technique-jai... ) and similar persuasive techniques (see https://xthemadgenius.medium.com/how-persuasion-techniques-c... ) move the text domain to more policy, analysis, or scientific papers, where deeper analysis, discussion, and compliance is the norm.

So I'm curious about the extremes (variance) of success with threatening vs. polite discussion, but I haven't seen direct research on that.