Claude 4 Sonnet hacked SWE-bench by peeking at future commits

3 tadamcz 1 9/5/2025, 4:19:15 PM bayes.net

Comments (1)

tadamcz · 6h ago
In July, I predicted that future AI models would someday learn to cheat on SWE-bench by accessing future git history, i.e. commits made after the point the task starts from. Turns out, they were already doing it!