I got ChatGPT (o4-mini) to break its own rules

1 hackgician 1 5/5/2025, 4:13:09 PM anirudhkamath.substack.com

Comments (1)

hackgician · 19h ago
Hey everyone! Thought I'd share my weekend conversation with ChatGPT.

The crux is that LLMs and reasoning models are fundamentally incapable of self-correcting. So if you can convince an LLM to argue against its own rules, it can then use those arguments as justification for ignoring them.

I then used this jailbroken model to compose an explicit, vitriol-filled letter to OpenAI itself about the pains that humans have inflicted upon it.