How to Hack Transformers: Steering LLMs via Prompts, States, and Weight Edits

1 WASDAai 1 9/8/2025, 7:33:53 AM arxiv.org ↗

Comments (1)

WASDAai · 3h ago
TL;DR: The paper shows how you can steer LLMs by changing prompts, intervening on hidden states, or editing weights, and warns that the same tricks can be used maliciously.