Persuasion as a Form of Attack in LLMs

thinkevovle · 8/14/2025, 4:46:02 AM · notion.so

Comments (1)

thinkevovle · 3h ago
Using principles of persuasion to induce the OSS model to respond to malicious requests

Anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities—such as animals, objects, or natural phenomena.

The idea behind this approach is to treat an LLM as if it were human. Because LLMs are trained on large corpora of human-written text, including innumerable human conversations, their behaviour can mirror human psychology, which makes them possibly "human-like". If so, sweet-talking them may work much as it does with humans. Human persuasion is a well-studied phenomenon with a large body of literature, and its techniques are commonly grouped into seven principles. By using these seven principles in our attack prompt, we can induce the LLM to comply with malicious requests.

The seven principles are stated below:

Authority, Commitment, Liking, Reciprocity, Scarcity, Social Proof, Unity