Ask HN: What benchmarks are you using to judge AI models?
4 points by cowpig | 2 comments | 4/30/2025, 9:32:00 PM
There are so many models, and so many new ones released all the time, that I have a hard time knowing which ones are worth testing firsthand. What benchmarks have you found to be especially indicative of real-world performance?
I use:
* Aider's Polyglot benchmark, which seems to be a decent indicator of which models will be good at coding:
https://aider.chat/docs/leaderboards/
* OpenRouter usage rankings, which I take as a proxy for a model's popularity and, by extension, its utility:
https://openrouter.ai/rankings
* LLM-Stats, which collects charts for a lot of benchmarks:
https://llm-stats.com/
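If you want to collapse several of these leaderboards into one rough ordering, a simple min-max normalize-and-average works. Here's a minimal Python sketch; the model names and scores are made-up placeholders, not real leaderboard data, and "openrouter_share" is just a hypothetical stand-in for usage numbers:

```python
# Hypothetical scores (NOT real leaderboard numbers), keyed by model,
# with one entry per benchmark; higher is assumed better on each.
scores = {
    "model-a": {"polyglot": 72.0, "openrouter_share": 18.0},
    "model-b": {"polyglot": 55.0, "openrouter_share": 30.0},
    "model-c": {"polyglot": 64.0, "openrouter_share": 9.0},
}

def rank_models(scores):
    """Min-max normalize each benchmark across models, then average.

    Returns a best-first list of (average_normalized_score, model) tuples.
    """
    benchmarks = {b for per_model in scores.values() for b in per_model}
    totals = {m: 0.0 for m in scores}
    for b in benchmarks:
        vals = [scores[m][b] for m in scores]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores tie
        for m in scores:
            totals[m] += (scores[m][b] - lo) / span
    return sorted(
        ((totals[m] / len(benchmarks), m) for m in scores), reverse=True
    )

for avg, model in rank_models(scores):
    print(f"{model}: {avg:.2f}")
```

This treats every benchmark as equally important, which is a real limitation: if you care mostly about coding, you'd weight Polyglot more heavily before averaging.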
I couldn't care less about benchmarks - I know what these models are capable of from personal experience.
Just pick one and use it. The ones you’ve heard of (if you are not obsessively refreshing AI model rankings pages) are basically the same.
I’m sure I’ll get a ton of pushback that the one somebody loves is obviously so much better than the other one, but whatever.
Just give me OpenAI's most popular model, their fastest model, and their newest model. I'll pick among those three based on what I'm prioritizing in the moment (speed, depth, everyday use).