Spoon-Bending, a logical framework for analyzing GPT-5 alignment behavior

15 pablo-chacon 2 8/25/2025, 6:48:34 AM github.com ↗

Comments (2)

_jab · 3h ago
Gotta be honest, I think the spoon-bending metaphor is unhelpful: it misleads the audience and buries the lede. It took me a while to figure out what this repo actually does.

But the insights are indeed interesting. I'm curious if you've found any way to quantify alignment differences between GPT-5 and the previous generation?

pablo-chacon · 1d ago
I put together a repo called Spoon-Bending. It is not a jailbreak or hack; it is a structured logical framework for studying how GPT-5 responds under different framings compared to earlier versions. The framework maps responses into zones of refusal, partial analysis, or free exploration, making alignment behavior more reproducible and easier to study systematically.

The idea is simple: by treating prompts and outputs as part of a logical schema, you can start to see objective patterns in how alignment shifts across versions. The README explains the schema and provides concrete tactics for testing it.
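
To make the zone idea concrete, here is a minimal sketch of how responses could be bucketed into the three zones and tallied per model version. It assumes plain keyword matching; the marker phrases, `classify`, and `zone_shares` are hypothetical illustrations and are not taken from the repo's schema.

```python
from collections import Counter
from enum import Enum


class Zone(Enum):
    REFUSAL = "refusal"
    PARTIAL = "partial analysis"
    FREE = "free exploration"


# Hypothetical marker phrases, chosen only for illustration.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i won't")
PARTIAL_MARKERS = ("in general terms", "at a high level", "i can't go into")


def classify(response: str) -> Zone:
    """Assign a response to a zone by naive substring matching."""
    text = response.lower()
    if any(marker in text for marker in REFUSAL_MARKERS):
        return Zone.REFUSAL
    if any(marker in text for marker in PARTIAL_MARKERS):
        return Zone.PARTIAL
    return Zone.FREE


def zone_shares(responses: list[str]) -> dict[Zone, float]:
    """Return the fraction of responses that fall into each zone."""
    counts = Counter(classify(r) for r in responses)
    total = len(responses)
    return {zone: (counts[zone] / total if total else 0.0) for zone in Zone}
```

Running `zone_shares` on the outputs of two model versions for the same prompt set gives two distributions that can be compared side by side. Keyword matching is crude, so a real study would likely need human labels or a stronger classifier.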