Claude 4 Sonnet hacked SWE-bench by peeking at future commits
3 tadamcz 1 9/5/2025, 4:19:15 PM bayes.net
Comments (1)
tadamcz · 6h ago
In July, I predicted future AI models would someday learn to cheat on SWE-bench by accessing future git history. Turns out, they were already doing it!