HN Reader
SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations
17
landonxi
1
9/18/2025, 8:51:40 PM
surgehq.ai ↗
Comments (1)
egillie
· 2h ago
Is this because GPT-5 hallucinates less in general?