The Big LLM Architecture Comparison

83 mdp2021 4 7/20/2025, 6:56:01 AM magazine.sebastianraschka.com ↗

Comments (4)

strangescript · 1m ago
The diagrams in this article are amazing if you are somewhere in between a novice and expert. Seeing all of the new models laid out next to each other is fantastic.
Chloebaker · 35m ago
Honestly, it’s crazy to think how far we’ve come since GPT-2 (2019). Today, comparing LLMs to determine their performance is notoriously challenging, and it feels like every 2 weeks a model beats a new benchmark. I’m really glad DeepSeek was mentioned here, bc the key architectural techniques it introduced in V3, which improved its computational efficiency and distinguish it from many other LLMs, were really transformational when it came out.
bravesoul2 · 3h ago
This is a nice catch-up for someone like me who hasn't been keeping up.
dmezzetti · 1h ago
While all these architectures are innovative and have helped improve either accuracy or speed, the same fundamental problem of generating factual information still exists.

Retrieval-Augmented Generation (RAG), agents, and other similar methods help mitigate this. It will be interesting to see if future architectures eventually replace these techniques.