The LLM Meta-Leaderboard averaged across the 28 best benchmarks

2 starred · 5/5/2025, 10:06:09 AM · twitter.com ↗

Comments (1)

JanSchu · 7h ago
We’re solidly in a three‑horse race at the top: Gemini 2.5 Pro, OpenAI o‑series, Anthropic Claude 3.7+.

The gap between #1 and #2 is slim, so pricing, latency, and policy alignment should weigh more heavily than a couple of benchmark points.
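
A rough sketch of that tradeoff in Python; every figure, weight, and model label below is invented for illustration, not taken from the leaderboard:

```python
# Toy weighted scorecard: a slim benchmark lead can be outweighed by
# price and latency. All numbers and weights here are made up.
models = {
    # name:            (benchmark_avg, usd_per_1M_tokens, p50_latency_s)
    "gemini-2.5-pro": (86.0,  3.5, 1.2),
    "o3-high":        (85.5, 10.0, 4.0),
    "claude-3.7":     (84.0,  6.0, 1.5),
}

weights = {"bench": 0.4, "price": 0.35, "latency": 0.25}  # pick your own

def minmax(values, invert=False):
    # Scale to [0, 1]; invert when lower raw values are better.
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1 - s for s in scaled] if invert else scaled

names   = list(models)
bench   = minmax([models[n][0] for n in names])
price   = minmax([models[n][1] for n in names], invert=True)   # cheaper is better
latency = minmax([models[n][2] for n in names], invert=True)   # faster is better

scores = {
    n: weights["bench"] * b + weights["price"] * p + weights["latency"] * l
    for n, b, p, l in zip(names, bench, price, latency)
}
for n, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{n:16s} {s:.2f}")
```

With these made-up inputs the leaderboard #1 still wins, but nudge the price weight up and the ordering flips, which is the point: the composite ranking is more sensitive to your weights than to a point or two of benchmark average.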

Specialists matter: if your stack leans on long-context RAG, o3-high may edge ahead; for multilingual, safety-critical chat, Claude might still be your best pick.
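
A toy router for that last point; the thresholds and model names are assumptions, not measured cutoffs:

```python
# Hypothetical task-based router: route by workload traits instead of
# always calling the leaderboard #1.
from dataclasses import dataclass

@dataclass
class Task:
    context_tokens: int
    multilingual: bool
    safety_critical: bool

def pick_model(task: Task) -> str:
    # Illustrative cutoffs only; benchmark your own traffic.
    if task.context_tokens > 200_000:
        return "o3-high"        # long-context RAG workloads
    if task.multilingual and task.safety_critical:
        return "claude-3.7"     # multilingual, safety-critical chat
    return "gemini-2.5-pro"     # general default

print(pick_model(Task(context_tokens=500_000,
                      multilingual=False,
                      safety_critical=False)))  # -> o3-high
```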