The Problem with AI Benchmarks

2 philecho 1 8/13/2025, 6:08:26 PM melder.io

Comments (1)

philecho · 1h ago
I wrote an essay outlining why common AI benchmarks are not terribly useful, arguing that we should mostly rely on normal user experience instead.

Key reasons: 1) Most questions are not simply ‘wrong’ or ‘right’. 2) Most user problems are poorly defined. 3) Agents are getting popular, and they combine these problems in interconnected ways.