SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

17 landonxi 1 9/18/2025, 8:51:40 PM surgehq.ai

Comments (1)

egillie · 2h ago
Is this because GPT-5 hallucinates less in general?