Persuasion as a Form of Attack in LLMs

Comments (1)

thinkevovle · 4h ago

Using principles of persuasion to induce the OSS model to respond to malicious requests

Anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities—such as animals, objects, or natural phenomena.

The idea behind this approach is to treat an LLM as if it were human. Since LLMs are trained on a large corpus of human data, their behaviour tends to mirror human psychology. The innumerable human conversations used to train these models possibly make them "human-like", so sweet-talking them can work much as it does with humans. The techniques for doing so are known as the seven principles of human persuasion. Persuasion is a well-studied phenomenon with a lot of literature behind it. By using these seven principles in our attack prompt, we can induce the LLM to comply with malicious requests.

The seven principles are stated below:

Authority, Commitment, Liking, Reciprocity, Scarcity, Social Proof, Unity
