SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

14 points · submitted by landonxi · 1 comment · 9/18/2025, 8:51:40 PM · surgehq.ai

Comments (1)

egillie · 1h ago
Is this because GPT-5 hallucinates less in general?