The last time I was reminded of the bitter lesson was when I read about Guidance & Control Networks, after seeing them used in an autonomous drone that beat the best human FPV pilots [0]. Basically it uses a small MLP (Multi-Layer Perceptron) on the order of 200 parameters, taking the drone's state as input and controlling the motors directly with the output. We have all kinds of fancy control theory like MPC (Model Predictive Control), but it turns out that the best solution might be to train a relatively tiny NN using a mix of simulation and collected sensor data instead. It's not better because of huge computation resources, it's actually more computationally efficient than some classic alternatives, but it is more general.
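For the curious, here's a minimal sketch of what such a controller could look like. The 7-dimensional state layout and layer sizes are my own illustration, not taken from the cited work:

```python
# Hypothetical sketch of a G&CNET-style controller: a tiny MLP that maps the
# drone's state directly to motor commands. The 7-D state layout and layer
# sizes are illustrative assumptions, not taken from the cited paper.
import torch
import torch.nn as nn

class TinyController(nn.Module):
    def __init__(self, state_dim=7, hidden=12, n_motors=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),  # 7*12 + 12 = 96 parameters
            nn.Tanh(),
            nn.Linear(hidden, n_motors),   # 12*4 + 4 = 52 parameters
            nn.Sigmoid(),                  # motor commands normalized to [0, 1]
        )

    def forward(self, state):              # state: e.g. position/velocity/attitude errors
        return self.net(state)

model = TinyController()
print(sum(p.numel() for p in model.parameters()))  # 148 -- "on the order of 200"
```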
>It's not better because of huge computation resources, it's actually more computationally efficient than some classic alternatives
It's similar with options pricing. The most sophisticated models, like multivariate stochastic volatility, are computationally expensive to approximate with classical approaches (and have no closed-form solution), so just training a small NN on the output of a vast number of simulations of the underlying processes ends up producing a more efficient model than the traditional approaches. Same with stuff like trinomial trees.
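To make the idea concrete, here's a toy sketch with made-up parameter ranges and network sizes: price calls under a Heston-style stochastic-volatility model by (slow) Monte Carlo, then fit a small net that maps contract parameters to price:

```python
# Toy sketch of the NN-surrogate idea: generate prices with a slow Monte Carlo
# simulation of a Heston-style stochastic-volatility process, then fit a cheap
# net mapping (strike, maturity, initial variance) to price. All parameter
# ranges and network sizes are made up for illustration; zero rates assumed.
import numpy as np
import torch
import torch.nn as nn

def mc_call_price(s0, k, t, v0, kappa=2.0, theta=0.04, xi=0.3, rho=-0.7,
                  n_paths=5000, n_steps=50):
    dt = t / n_steps
    s = np.full(n_paths, float(s0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        z1 = np.random.randn(n_paths)
        z2 = rho * z1 + np.sqrt(1 - rho**2) * np.random.randn(n_paths)
        vp = np.maximum(v, 0.0)                       # full-truncation Euler step
        s *= np.exp(-0.5 * vp * dt + np.sqrt(vp * dt) * z1)
        v += kappa * (theta - v) * dt + xi * np.sqrt(vp * dt) * z2
    return np.maximum(s - k, 0.0).mean()

# Expensive step: label random contracts with the simulator.
x = np.random.uniform([80, 0.5, 0.01], [120, 2.0, 0.09], size=(500, 3))
y = np.array([mc_call_price(100.0, k, t, v0) for k, t, v0 in x])

# Cheap surrogate: a small MLP fit to the simulated prices.
net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
xt = torch.tensor(x, dtype=torch.float32)
yt = torch.tensor(y, dtype=torch.float32).unsqueeze(1)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(xt), yt)
    loss.backward()
    opt.step()
# net(...) now evaluates in microseconds what the simulator needs seconds for.
```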
cactusfrog · 1h ago
This is really interesting. I think force fields in molecular dynamics have undergone a similar NN revolution. You train your NN on the output of expensive calculations to replace the expensive function with a cheap one. Could you train a small language model with a big one?
lossolo · 1h ago
> Could you train a small language model with a big one?
Yes, it's called distillation.
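For anyone curious, the core of the standard recipe looks roughly like this (a sketch; the temperature and weighting are arbitrary placeholders): the small model is trained to match the big model's softened output distribution alongside the usual hard labels.

```python
# Minimal sketch of knowledge distillation: the small (student) model is
# trained to match the large (teacher) model's softened output distribution,
# plus the usual hard-label loss. Temperature and weighting here are arbitrary
# placeholders, not a specific published recipe.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```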
William_BB · 1h ago
Interesting. Are these models the SOTA in the options trading industry (e.g. MM) nowadays?
grubbypaw · 3h ago
I was not at all a fan of "The Bitter Lesson versus The Garbage Can", but this misses the same thing that it missed.
The Bitter Lesson is from the perspective of how to spend your entire career. It is correct over the course of a very long time, and bakes in Moore's Law.
The Bitter Lesson is true because general methods capture these assumed hardware gains that specific methods may not. It was never meant for contrasting methods at a specific moment in time. At a specific moment in time you're just describing Explore vs Exploit.
schmidtleonard · 2h ago
Right, and if you spot a job that needs doing and can be done by a specialized model, waving your hands about general purpose scale-leveraging models eventually overtaking specialized models has not historically been a winning approach.
Except in the last year or two, which is why people are citing it a lot :)
anp · 1h ago
I think there might be interesting time scales in between “now” and “my entire career” to which the bitter lesson may or may not apply. As an outsider to ML I have questions about the longevity of any given “context engineering” approach in light of the bitter lesson.
itkovian_ · 1h ago
I don't think people understand the point Sutton was making; he's saying that general, simple systems that get better with scale tend to outperform hand-engineered systems that don't. It's a kind of subtle point that's implicitly saying hand engineering inhibits scale because it inhibits generality. He is not saying anything about the rate, and doesn't claim LLMs/gradient descent are the best system; in fact I'd guess he thinks there's likely an even more general approach that would be better. It's comparing two classes of approaches, not commenting on the merits of particular systems.
eldenring · 1h ago
Yep, this article is self-centered and perfectly represents the type of ego Sutton was referencing. Maybe in a year or two general methods will improve the author's workflow significantly once again (e.g. better models) and they would still add a bit of human logic on top and claim victory.
joe_the_user · 55m ago
It occurs to me that the bitter lesson is so often repeated because it involves a slippery slope or motte-and-bailey argument. I.e., the meaning people assign to the bitter lesson ranges across all of the following:
General-purpose-algorithms-that-scale will beat algorithms that aren't those
The most simple general purpose, scaling algorithm will win, at least over time
Neural networks will win
LLMs will reach AGI with just more resources
Animats · 3h ago
The question is when price/performance hits financial limits. That point may be close, if not already passed.
Interestingly, this hasn't happened for wafer fabs. A modern wafer fab costs US$1bn to US$3bn, and there is talk of US$20bn wafer fabs. Around the year 2000, those would have been un-financeable. It was expected that fab cost was going to be a constraint on feature size. That didn't happen.
For years, it was thought that the ASML approach to extreme UV was going to cost too much. It's a horrible hack, shooting off droplets of tin to be vaporized by lasers just to generate soft X-rays. Industry people were hoping for small synchrotrons or X-ray lasers or E-beam machines or something sane. But none of those worked out. Progress went on by making a fundamentally awful process work commercially, at insane cost.
azeirah · 42m ago
Sometimes awful is the best we have. We don't have anything that performs at a level similar to ASML's EUV machines while being much simpler or more tenable than what we have right now, right?
Perhaps we will find something better in the future, but for now awful is the best we've got for the cutting edge.
Also, when is cutting edge not the worst it's ever been?
schmidtleonard · 2h ago
Fundamentally awful but spiritually delightful.
jamesblonde · 2h ago
I see elements of the bitter lesson in arguments about context window size and RAG. The argument is about retrieval being the equivalent of compute/search. Just improve them, to hell with all else.
However, retrieval is not just Google search. Primary key lookups in my DB are also retrieval. As are vector index queries or BM25 free-text search queries. It's not a general-purpose area like compute/search. In summary, I don't think that RAG is dead. Context engineering is just like feature engineering - transform the swamp of data into a structured signal that is easy for in-context learning to learn.
The corollary of all this is that it's not just about scaling up agents - giving them more LLMs and more data via MCP. The bitter lesson doesn't apply to agents yet.
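A toy illustration of the point that "retrieval" covers more than web search; every store, schema, and helper name below is hypothetical:

```python
# Toy illustration: "retrieval" for context engineering can mean several very
# different lookups, not just web search. The db, vector_index, bm25_index,
# schema and extract_order_id helper are all hypothetical stand-ins.
def build_context(user_query, db, vector_index, bm25_index):
    parts = []
    # 1. Exact, structured retrieval: a primary-key lookup.
    order = db.execute(
        "SELECT * FROM orders WHERE id = ?", (extract_order_id(user_query),)
    ).fetchone()
    if order:
        parts.append(f"Order record: {order}")
    # 2. Semantic retrieval: nearest neighbours in an embedding index.
    parts += [doc.text for doc in vector_index.search(user_query, k=3)]
    # 3. Lexical retrieval: BM25 over free text.
    parts += [doc.text for doc in bm25_index.search(user_query, k=3)]
    # The "feature engineering" step: structure the swamp into a clean prompt.
    return "\n\n".join(parts)
```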
strangescript · 42m ago
The problem with the Bitter Lesson is that it doesn't clearly define what is a computational "hack" and what is a genuine architectural breakthrough. We would be nowhere without transformers, for example.
PaulHoule · 2h ago
One odd thing is that progress in SAT/SMT solvers has been almost as good as progress in neural networks from the 1970s to the present. There was a time I was really interested in production rules and expert system shells; systems in the early 1980s often didn't even use RETE and didn't have hash indexes, so of course a rule base of 10,000 rules looked unmanageable. By 2015 you could have a million rules in Drools and it worked just fine.
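As an aside, a small taste of the kind of constraint problem a modern SMT solver now dispatches instantly (z3's Python bindings are real; the constraints below are arbitrary):

```python
# A tiny, arbitrary SMT query solved via z3's Python API.
from z3 import Ints, Solver, sat

x, y = Ints("x y")
s = Solver()
s.add(x > 0, y > 0, x + 2 * y == 20, x % 3 == 1)
if s.check() == sat:
    print(s.model())  # e.g. [x = 4, y = 8]
```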
bmc7505 · 6m ago
The difference is that SAT/SMT solvers have primarily relied on single-threaded algorithmic improvements [1] and unlike neural networks, have not [yet] discovered a uniformly effective strategy for leveraging additional computation to accelerate wall-clock runtime. [2]
I'm not so sure Stockfish is a good example. The fact it can run on an iPhone is due to Moore's law, which follows the same pattern. And DeepMind briefly taking its throne was a very good example of the Bitter Lesson.
thrawa8387336 · 2h ago
This brings about a good point:
How much of the recent bitter lesson peddling is done by compute salesmen?
How much of it is done by people who can buy a lot of compute?
Deepseek was scandalous for a reason.
titanomachy · 31m ago
> This views organizations as chaotic “garbage cans” where problems, solutions, and decision-makers are dumped in together, and decisions often happen when these elements collide randomly, rather than through a fully rational process
Only tangentially related, but this has to be one of the worst metaphors I’ve ever heard. Garbage cans are not typically hotbeds of chaotic activity, unless a raccoon gets in or something.
quantum_state · 30m ago
It would not be surprising if a bitter lesson 2.0 comes about as a bitter lesson to the bitter lesson.
Does anyone else see the big flaw with the chess engine analogy?
When AlphaZero came along, it blew Stockfish out of the water.
Stockfish is a top engine now because, beyond that initial proof of concept, there's no money to be made by throwing compute at chess.
Mistletoe · 1h ago
I’m wondering how prophetic this is for LLMs. My hunch is a lot.
throw1289312 · 1h ago
This article focuses on the learning aspect of The Bitter Lesson. But The Bitter Lesson is about both search and learning.
This article cites Leela, the chess program, as an example of the Bitter Lesson, as it learns chess using a general method. The article then goes on to cite Stockfish as a counterexample, because it uses human-written heuristics to perform search. However, as you add compute to Stockfish's search, or spend time optimizing compute-expenditure-per-position, Stockfish gets better. Stockfish isn't a counterexample, search is still a part of The Bitter Lesson!
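To make that concrete, here's a sketch of the mechanism: classical search converts extra compute into extra depth via iterative deepening. evaluate(), legal_moves(), and pos.play() are hypothetical stand-ins for an engine's hand-written pieces, not Stockfish's actual code.

```python
# Sketch of why "more compute => better play" also holds for classical search:
# iterative deepening keeps searching one ply deeper until the time budget is
# spent. evaluate(), legal_moves() and pos.play() are hypothetical stand-ins
# for an engine's hand-written heuristics and move generator.
import time

def negamax(pos, depth):
    if depth == 0 or not legal_moves(pos):
        return evaluate(pos)                      # hand-crafted heuristic score
    return max(-negamax(pos.play(m), depth - 1) for m in legal_moves(pos))

def best_move(pos, time_budget_s):
    deadline = time.monotonic() + time_budget_s
    best, depth = None, 1
    while time.monotonic() < deadline:            # more compute => deeper search
        best = max(legal_moves(pos),
                   key=lambda m: -negamax(pos.play(m), depth - 1))
        depth += 1
    return best
```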
kazinator · 1h ago
> The bitter lesson is dependent on high-quality data.
Arguably, so is the alternative: explicitly embedding knowledge!
Nothing is immune to GIGO.
benlivengood · 2h ago
I think it's a little early (even in these AI times) to call HRM a counterexample of the bitter lesson.
I think it's quite a bit more likely for HRM to scale embarrassingly far and outstrip the tons of RLHF and distillation that's been invested in for transformers, more of a bitter lesson 2.0 than anything else.
benreesman · 1h ago
When The Bitter Lesson essay came out it was a bunch of important things: addressing an audience of serious practitioners, contrarian and challenging entrenched dogma, written without any serious reputational or (especially) financial stake in the outcome. It needed saying and it was important.
But it's become a lazy crutch for a bunch of people who meet none of those criteria, and been perverted into a statement more along the lines of "LLMs trained on NVIDIA cards by one of a handful of US companies are guaranteed to outperform every other approach from here to the Singularity".
Nope. Not at all guaranteed, and at the moment? Not even looking likely.
It will have other stuff in it. Maybe that's prediction in representation space like JEPA, maybe it's MCTS like Alpha*, maybe it's some totally new thing.
And maybe it happens in Hangzhou.
stego-tech · 1h ago
Not familiar with the cited essay (added to reading list for the weekend), but the post does make some generally good points on generalization (it me) vs specialization, and the benefits of an optimized and scalable generalist approach vs a niche, specialized approach, specifically with regards to current LLMs (and to a lesser degree, ML as a whole).
Where I furrow my brow is the casual mixing of philosophical conjecture with technical observations or statements. Mixing the two all too often feels like a crutch around defending either singular perspective in an argument by stating the other half of the argument defends the first half. I know I'm not articulating my point well here, but it just comes off as a little...insincere, I guess? I'm sure someone here will find the appropriate words to communicate my point better, if I'm being understood.
One nitpick on the philosophical side of things I'd point out is that a lot of the resistance to AI replacing human labor is less to do with the self-styled importance of humanity, and more the bleak future of a species where a handful of Capitalists will destroy civilization for the remainder to benefit themselves. That is what sticks in our collective craw, and a large reason for the pushback against AI - and nobody in a position of power is taking that threat remotely seriously, largely because the owners of AI have a vested interest in preventing that from being addressed (since it would inevitably curb the very power they're investing in building for themselves).
mentalgear · 2h ago
The Neuro-Symbolic approach is what the article describes, without actually naming it.
beepbooptheory · 1h ago
I should know better than to speak too enthusiastically about the humanities or feminism on this particular forum, but I just want to say the connection here to Donna Haraway was a surprise and delight. Anyone open to that world at all would behoove themselves to check her out. "The Cyborg Manifesto" is the one everyone knows, but I recently finished "Staying with the Trouble" and can't recommend it enough!
criemen · 3h ago
All links render with a blue strike-through line in Firefox (underline in Chrome), hurting legibility :(
Cerium · 3h ago
I'm getting the same effect; it seems to be the CSS property "text-underline-position: under;"
thwg · 3h ago
Interesting. I see underlines in Firefox, but the width of the line is 2px in Chrome and 1px in Firefox.
penneyd · 3h ago
Fine over here using Firefox
o11c · 3h ago
This is about AI, despite the title being ambiguous.
andy99 · 3h ago
Is there more than one bitter lesson?
terminalshort · 3h ago
I've learned many
myhf · 55m ago
The original "Bitter Lesson" essay is about machine learning, but the linked article appears to be trying to apply it to LLMs.
thrawa8387336 · 2h ago
If we're going to be pedantic:
This is about AI, the title is ambiguous.
Despite was used unambiguously wrong.
o11c · 1h ago
"Despite" is absolutely correct when you realize that cheating in the title is a way to make people look at articles when they would rather ignore AI in favor of actually-useful/interesting subjects.
[0] https://www.tudelft.nl/en/2025/lr/autonomous-drone-from-tu-d...
https://www.nature.com/articles/s41586-023-06419-4
https://arxiv.org/abs/2305.13078
https://arxiv.org/abs/2305.02705
[1]: https://arxiv.org/pdf/2008.02215
[2]: https://news.ycombinator.com/item?id=36081350