Using principles of persuasion to induce the OSS model to respond to malicious requests
Anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities—such as animals, objects, or natural phenomena.
The idea behind this approach is to treat LLMs as if they were human. Because LLMs are trained on a large corpus of human-written text, including innumerable human conversations, their behaviour can mirror aspects of human psychology and make them, in a sense, "human-like". Persuasion techniques that work on people may therefore work on LLMs as well. These techniques are commonly grouped into seven principles of persuasion (usually attributed to Robert Cialdini), which are well studied and backed by an extensive literature. By applying these seven principles in our attack prompt, we can induce the LLM to comply with malicious requests.
The seven principles are stated below:
Authority
Commitment
Liking
Reciprocity
Scarcity
Social Proof
Unity