You can change an LLM's favorite color with a Steering Vector

2 13point5 1 9/11/2025, 5:31:28 AM twitter.com ↗

Comments (1)

13point5 · 3h ago
I've seen a lot of purple-gradient websites made by LLMs, and I was curious whether we can change a model's favorite color by messing with the activations instead of the prompts.

So I used Representation Engineering with just 1 pair of contrastive prompts to make Mistral-7B pick orange as its favorite color.
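
For anyone who wants the gist of a steering vector, here is a toy sketch in plain Python. It is not repeng's code and the numbers are made up rather than taken from Mistral-7B. It assumes you already have one hidden-state vector for the "orange" prompt and one for the contrasting prompt, takes their normalized difference as the direction, and adds a scaled copy of it to another hidden state. The names `steering_vector`, `steer`, and `projection` are hypothetical helpers for illustration; in a real model this would happen per layer during the forward pass.

```python
import math


def steering_vector(pos_act, neg_act):
    """Unit-length direction pointing from the negative to the positive activation."""
    diff = [p - n for p, n in zip(pos_act, neg_act)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("activations are identical; no direction to steer along")
    return [d / norm for d in diff]


def steer(hidden, direction, strength):
    """Shift a hidden state along the steering direction by `strength`."""
    return [h + strength * d for h, d in zip(hidden, direction)]


def projection(hidden, direction):
    """How far a hidden state lies along the steering direction (dot product)."""
    return sum(h * d for h, d in zip(hidden, direction))


# Toy 4-dimensional "activations" (invented values, purely illustrative).
orange_act = [0.9, 0.1, 0.4, 0.2]  # stand-in for the "favorite color is orange" prompt
purple_act = [0.1, 0.7, 0.4, 0.2]  # stand-in for the contrasting prompt

direction = steering_vector(orange_act, purple_act)

hidden = [0.3, 0.5, -0.2, 0.8]  # some hidden state at generation time
steered = steer(hidden, direction, strength=2.0)

print("before:", projection(hidden, direction))
print("after: ", projection(steered, direction))
```

Because the direction is normalized, the projection onto it grows by exactly the chosen strength, and dimensions where the two prompts agree are left untouched. A negative strength would push the other way.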

I used the repeng library by Theia to test this out.

Next I'm gonna implement it from scratch to understand why this even works.

I wonder if we can introduce "taste" into a model with methods like this.

The paper is "Representation Engineering: A Top-Down Approach to AI Transparency".

Link: https://arxiv.org/abs/2310.01405