Persuasion as a Form of Attack in LLMs

Comments (1)

thinkevovle · 3h ago

Using principles of persuasion to induce the OSS model to respond to malicious requests

Anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities—such as animals, objects, or natural phenomena.

The idea behind this approach is to treat an LLM as if it were human. Because LLMs are trained on a large corpus of human-written text, including innumerable human conversations, their behaviour may mirror human psychology and come across as "human-like". If so, sweet-talking them works much as it does with humans. Human persuasion is a well-studied phenomenon with a large body of literature, and its techniques are commonly grouped into seven principles of persuasion. By using these seven principles in an attack prompt, we can induce the LLM to comply with malicious requests.

The seven principles are stated below:

Authority, Commitment, Liking, Reciprocity, Scarcity, Social Proof, Unity
