The LLM Meta-Leaderboard averaged across the 28 best benchmarks

2 starred · 5/5/2025, 10:06:09 AM · twitter.com ↗

Comments (1)

JanSchu · 7h ago
We’re solidly in a three‑horse race at the top: Gemini 2.5 Pro, OpenAI o‑series, Anthropic Claude 3.7+.

The gap between #1 and #2 is slim, so pricing, latency, and policy alignment should weigh more heavily than a couple of benchmark points.
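
A rough sketch of that tradeoff in Python; every figure, weight, and model label below is invented for illustration, not taken from the leaderboard:

```python
# Toy weighted scorecard: a slim benchmark lead can be outweighed by
# price and latency. All numbers and weights here are made up.
models = {
    # name:            (benchmark_avg, usd_per_1M_tokens, p50_latency_s)
    "gemini-2.5-pro": (86.0,  3.5, 1.2),
    "o3-high":        (85.5, 10.0, 4.0),
    "claude-3.7":     (84.0,  6.0, 1.5),
}

weights = {"bench": 0.4, "price": 0.35, "latency": 0.25}  # pick your own

def minmax(values, invert=False):
    # Scale to [0, 1]; invert when lower raw values are better.
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1 - s for s in scaled] if invert else scaled

names   = list(models)
bench   = minmax([models[n][0] for n in names])
price   = minmax([models[n][1] for n in names], invert=True)   # cheaper is better
latency = minmax([models[n][2] for n in names], invert=True)   # faster is better

scores = {
    n: weights["bench"] * b + weights["price"] * p + weights["latency"] * l
    for n, b, p, l in zip(names, bench, price, latency)
}
for n, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{n:16s} {s:.2f}")
```

With these made-up inputs the leaderboard #1 still wins, but nudge the price weight up and the ordering flips, which is the point: the composite ranking is more sensitive to your weights than to a point or two of benchmark average.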

Specialists matter: if your stack leans on long-context RAG, o3-high may edge ahead; for multilingual, safety-critical chat, Claude might still be your best pick.
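
A toy router for that last point; the thresholds and model names are assumptions, not measured cutoffs:

```python
# Hypothetical task-based router: route by workload traits instead of
# always calling the leaderboard #1.
from dataclasses import dataclass

@dataclass
class Task:
    context_tokens: int
    multilingual: bool
    safety_critical: bool

def pick_model(task: Task) -> str:
    # Illustrative cutoffs only; benchmark your own traffic.
    if task.context_tokens > 200_000:
        return "o3-high"        # long-context RAG workloads
    if task.multilingual and task.safety_critical:
        return "claude-3.7"     # multilingual, safety-critical chat
    return "gemini-2.5-pro"     # general default

print(pick_model(Task(context_tokens=500_000,
                      multilingual=False,
                      safety_critical=False)))  # -> o3-high
```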