The wall confronting large language models

79 points by PaulHoule on 9/3/2025, 11:40:41 AM | arxiv.org ↗ | 37 comments

Comments (37)

measurablefunc · 1h ago
There is a formal extensional equivalence between Markov chains & LLMs, but the only person who seems to be saying anything about this is Gary Marcus. He is constantly making the point that symbolic understanding cannot be reduced to a probabilistic computation: regardless of how large the graph gets, it will still be missing basic stuff like backtracking (which is available in programming languages like Prolog). I think that Gary is right on basically all counts. Probabilistic generative models are fun, but no amount of probabilistic sequence generation can be a substitute for logical reasoning.
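For concreteness, here is a minimal sketch (my own, not from the thread) of the extensional point being made: an autoregressive sampler with a bounded context window is a Markov chain whose state is the window itself. The toy next-token function below is a stand-in for a real model, and all names are invented for illustration.

    # Sketch: a bounded-context next-token sampler is extensionally a Markov chain
    # whose state is the last K tokens. The "model" is a toy stand-in, not an LLM.
    import random
    from collections import deque

    VOCAB = ["a", "b", "c", "<eos>"]
    K = 3  # context window length

    def next_token_probs(window):
        # Any function from the current window to a distribution over VOCAB
        # defines a valid Markov transition kernel over window-states.
        seed = sum(ord(ch) for tok in window for ch in tok)
        weights = [(seed % 7) + 1, (seed % 5) + 1, (seed % 3) + 1, 1]
        total = sum(weights)
        return [w / total for w in weights]

    def step(window):
        # One Markov transition: state (last K tokens) -> next state.
        probs = next_token_probs(window)
        tok = random.choices(VOCAB, weights=probs)[0]
        new_window = deque(window, maxlen=K)
        new_window.append(tok)
        return tuple(new_window), tok

    state = ("a", "b", "c")
    for _ in range(5):
        state, tok = step(state)
        print(tok, state)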
Certhas · 50m ago
I don't understand what point you're hinting at.

Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution at the cost of a (much) larger state space. So if you can implement it in a brain or a computer, there is a sufficiently large probabilistic dynamic that can model it. More really is different.

So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.

(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)
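To make the "linear, but on a (much) larger state space" point concrete, here is a small Ulam-method sketch of the transfer-operator idea linked above, with the logistic map standing in for the nonlinear dynamics. The bin count, sample sizes, and choice of map are arbitrary illustrative picks of mine, not anything from the paper or the thread.

    # Ulam's method: the nonlinear logistic map becomes a linear (stochastic)
    # matrix acting on densities once the state space is lifted to N bins.
    import numpy as np

    N = 200                              # number of bins; accuracy grows with N
    f = lambda x: 4.0 * x * (1.0 - x)    # nonlinear logistic map on [0, 1]

    # Build the N x N row-stochastic transition matrix P by sampling points in
    # each bin and recording which bin they land in under f.
    rng = np.random.default_rng(0)
    P = np.zeros((N, N))
    for i in range(N):
        xs = rng.uniform(i / N, (i + 1) / N, size=1000)
        js = np.minimum((f(xs) * N).astype(int), N - 1)
        for j in js:
            P[i, j] += 1
    P /= P.sum(axis=1, keepdims=True)

    # Evolve a density purely linearly in the lifted space...
    rho = np.zeros(N); rho[10] = 1.0     # start concentrated in one bin
    for _ in range(20):
        rho = rho @ P                    # linear evolution

    # ...and compare against brute-force nonlinear evolution of actual points.
    pts = rng.uniform(10 / N, 11 / N, size=100_000)
    for _ in range(20):
        pts = f(pts)
    hist, _ = np.histogram(pts, bins=N, range=(0.0, 1.0))
    print("L1 difference between the two densities:",
          np.abs(rho - hist / hist.sum()).sum())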

measurablefunc · 41m ago
What part about backtracking is baseless? Typical Prolog interpreters can be implemented in a few MBs of binary code (the high-level specification is even simpler & fits in a few hundred KB)¹, but none of the LLMs (open source or not) are capable of backtracking, even though there is plenty of room for a basic Prolog interpreter. This seems like a very obvious shortcoming to me that no amount of smooth approximation can overcome.

If you think there is a threshold at which some large enough feedforward network develops the capability to backtrack, then I'd like to see your argument for it.

¹https://en.wikipedia.org/wiki/Warren_Abstract_Machine
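As a point of reference for what "backtracking" means here, a minimal sketch in Python (mine, not from the thread): a depth-first 4-queens search that undoes a choice and tries the next alternative, which is the control flow the commenter argues a pure next-token sampler lacks.

    # Depth-first search with explicit backtracking, Prolog-style, for 4-queens.
    def solve(n=4, placed=()):
        if len(placed) == n:                  # all rows filled: a solution
            return placed
        row = len(placed)
        for col in range(n):                  # choice point: try each column
            if all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(placed)):
                result = solve(n, placed + (col,))
                if result is not None:
                    return result             # commit to this branch
            # falling through here *is* the backtrack: abandon col, try the next
        return None                           # no column works: fail back to caller

    print(solve())   # e.g. (1, 3, 0, 2)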

bondarchuk · 21m ago
Backtracking makes sense in a search context, which is basically what Prolog is. Why would you expect a next-token predictor to do backtracking, and what should that even look like?
measurablefunc · 19m ago
I don't expect a Markov chain to be capable of backtracking. That's the point I am making. Logical reasoning as it is implemented in Prolog interpreters is not something that can be done w/ LLMs regardless of the size of their weights, biases, & activation functions between the nodes in the graph.
bondarchuk · 11m ago
Imagine the context window contains A-B-C, C turns out to be a dead end, and we want to backtrack to B and try another branch. Then the LLM could produce outputs such that the context window becomes A-B-C-[backtrack-back-to-B-and-don't-do-C], which after some more tokens could become A-B-C-[backtrack-back-to-B-and-don't-do-C]-D. This would essentially be backtracking, and I don't see why it would be inherently impossible for LLMs as long as the different branches fit in context.
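A toy rendering of this idea (the marker token and helper names are invented for illustration): the context only ever grows, but a replay of the stream recovers the effective branch.

    # The context grows monotonically; a special token marks the last step as
    # abandoned, so the *effective* path is recovered by replaying the stream.
    BACKTRACK = "<backtrack>"   # hypothetical marker token, not from any real model

    def effective_path(tokens):
        path = []
        for tok in tokens:
            if tok == BACKTRACK:
                if path:
                    path.pop()      # abandon the most recent step, like Prolog's redo
            else:
                path.append(tok)
        return path

    # A-B-C turns out to be a dead end, so the stream backtracks to B and tries D.
    stream = ["A", "B", "C", BACKTRACK, "D"]
    print(effective_path(stream))   # ['A', 'B', 'D']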
measurablefunc · 3m ago
If you think it is possible then I'd like to see an implementation of a sudoku puzzle solver as a Markov chain. This is a simple enough problem that it can be implemented in a few dozen lines of Prolog, but I've never seen a solver implemented as a Markov chain.
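For scale, here is a sketch of the kind of solver being asked about, written in Python rather than Prolog (my own code, and explicitly not a Markov chain); the open question in the thread is whether a purely probabilistic sequence model can carry out the same search.

    # Backtracking Sudoku solver. Usage: solve_sudoku(grid), where grid is a
    # list of 9 lists of 9 ints and 0 marks an empty cell. Solves in place.
    def solve_sudoku(grid):
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    for v in range(1, 10):                # choice point
                        if valid(grid, r, c, v):
                            grid[r][c] = v
                            if solve_sudoku(grid):
                                return True
                            grid[r][c] = 0                # undo: this is the backtrack
                    return False                          # no value fits: fail upward
        return True                                       # no empty cells left

    def valid(grid, r, c, v):
        if any(grid[r][j] == v for j in range(9)): return False
        if any(grid[i][c] == v for i in range(9)): return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))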
arduanika · 44m ago
What hinting? The comment was very clear. Arbitrarily good approximation is different from symbolic understanding.

"if you can implement it in a brain"

But we didn't. You have no idea how a brain works. Neither does anyone.

mallowdram · 34m ago
We know the healthy brain is unpredictable. We suspect error minimization and prediction are not central tenets. We know the brain creates memory via differences in sharp wave ripples. That it's oscillatory. That it neither uses symbols nor represents. That words are wholly external to what we call thought. The authors deal with molecules which are neither arbitrary nor specific. Yet tumors ARE specific, while words are wholly arbitrary. Knowing these things should offer a deep suspicion of ML/LLMs. They have so little to do with how brains work and the units brains actually use (all oscillation is specific, all stats emerge from arbitrary symbols and worse: metaphors) that mistaking LLMs for reasoning/inference is less lexemic hallucination and more eugenic.
Certhas · 40m ago
We didn't, but somebody did, so it's possible, and therefore probabilistic dynamics in high enough dimensions can do it.

We don't understand what LLMs are doing. You can't go from understanding what a transformer is to understanding what an LLM does any more than you can go from understanding what a neuron is to understanding what a brain does.

awesome_dude · 42m ago
I think that the difference can be best explained thus:

I guess that you are most likely going to have cereal for breakfast tomorrow; I also guess that it's because it's your favourite.

vs

I understand that you don't like cereal for breakfast, and I understand that you only have it every day because a Dr told you that it was the only way for you to start the day in a way that aligns with your health and dietary needs.

Meaning, I can guess based on past behaviour and be right, but understanding the reasoning for those choices is a whole other ballgame. Further, if we do end up with an AI that actually understands, well, that would really open up creativity and problem solving.

Anon84 · 7m ago
There definitely is, but Marcus is not the only one talking about it. For example, we covered this paper in one of our internal journal clubs a few weeks ago: https://arxiv.org/abs/2410.02724
jules · 14m ago
What does this predict about LLMs' ability to win gold at the International Mathematical Olympiad?
measurablefunc · 1m ago
Same thing it does about their ability to drive cars.
boznz · 51m ago
Logical reasoning is also based on probability weights; most of the time that probability is so close to 100% that it can be assumed to be true without consequence.
logicchains · 33m ago
LLMs are not formally equivalent to Markov chains; they're more powerful. Transformers with sufficient chain of thought can solve any problem in P: https://arxiv.org/abs/2310.07923.
measurablefunc · 23m ago
If you think there is a mistake in this argument then I'd like to know where it is: https://markov.dk.workers.dev/.
klawed · 30m ago
> avoidance, which we also discuss in this paper, necessitates putting a much higher premium on insight and understanding of the structural characteristics of the problems being investigated.

I wonder if the authors are aware of The Bitter Lesson

Scene_Cast2 · 3h ago
The paper is hard to read. There is no concrete worked-through example, the prose is over the top, and the equations don't really help. I can't make head or tail of this paper.
lumost · 3h ago
This appears to be a position paper written by authors outside of their core field. The presentation of "the wall" is only through analogy to derivatives on the discrete values computers operate on.
jibal · 1h ago
If you look at their other papers, you will see that this is very much within their core field.
lumost · 1h ago
Their other papers are on simulation and applied chemistry. Where does their expertise in Machine Learning, or Large Language Models derive from?

While it's not a requirement to have published in a field before publishing in it, having a coauthor from the target field, or using a peer-reviewed venue in that field as an entry point, certainly raises credibility.

With my limited claim to expertise in either Machine Learning or Large Language Models, the paper does not appear to demonstrate what it claims. The authors' language addresses the field of Machine Learning and LLM development as you would a young student, which does not help make their point.

JohnKemeny · 42m ago
He's a chemist. Lots of chemists and physicists like to talk about computation without having any background in it.

I'm not saying anything about the content, merely making a remark.

chermi · 10m ago
You're really not saying anything? Just a random remark with no bearing?

Seth Lloyd, Wolpert, Landauer, Bennett, Fredkin, Feynman, Sejnowski, Hopfield, Zecchina, Parisi, Mézard, Zdeborová, Crutchfield, Preskill, Deutsch, Manin, Szilard, MacKay....

I wish someone had told them to shut up about computing. And I wouldn't dare claim von Neumann as merely a physicist, but that's where he was coming from.

joe_the_user · 2h ago
The paper seems to involve a series of analogies and equations. However, I think that if the equations are accepted, the "wall" is actually derived.

The authors are computer scientists and people who work with large-scale dynamical systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite lots of practical progress in deep learning/transformers/etc. systems, all the theory involved is just analogies and equations of a similar sort; it's all alchemy, and the people really good at producing these models seem to be using a bunch of effective rules of thumb and not any full or established models (despite books claiming to offer a mathematical foundation for the enterprise, etc.).

Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.

ACCount37 · 1h ago
No, that's all the more reason to distrust major, unverified claims made by someone "outside of core competence".

Applied demon summoning is ruled by empiricism and experimentation. The best summoners in the field are the ones who have a lot of practical experience and a sharp, honed intuition for the bizarre dynamics of the summoning process. And even those very summoners, specialists worth their weight in gold, are slaves to the experiment! Their novel ideas and methods and refinements still fail more often than they succeed!

One of the first lessons you have to learn in the field is that of humility. That your "novel ideas" and "brilliant insights" are neither novel nor brilliant - and the only path to success lies through things small and testable, most of which do not survive the test.

With that, can you trust the demon summoning knowledge of someone who has never drawn a summoning diagram?

jibal · 1h ago
Somehow the game of telephone took us from "outside of their core field" (which wasn't true) to "outside of core competence" (which is grossly untrue).

> One of the first lessons you have to learn in the field is that of humility.

I suggest then that you make your statements less confidently.

cwmoore · 1h ago
Your passions may have run away with you.

https://news.ycombinator.com/item?id=45114753

ForHackernews · 51m ago
The freshly-summoned Gaap-5 was rumored to be the most accursed spirit ever witnessed by mankind, but so far it seems not dramatically more evil than previous demons, despite having been fed vastly more human souls.
lazide · 22m ago
Perhaps we’re reaching peak demon?
18cmdick · 2h ago
Grifters in shambles.
dcre · 1h ago
Always fun to see a theoretical argument that something clearly already happening is impossible.
ahartmetz · 1h ago
So where are the recent improvements in LLMs proportional to the billions invested?
dcre · 39m ago
Value for the money is not at issue in the paper!
42lux · 3m ago
It's not about value; it's about the stagnation of advancement while throwing compute at the problem.
ahartmetz · 36m ago
I believe it is. They are saying that LLMs don't improve all that much from giving them more resources, and computing power (and input corpus size) is pretty proportional to money.
crowbahr · 39m ago
Really? It sure seems like we're at the top of the S curve with LLMs. Wiring them up to talk to themselves as reasoning isn't scaling the core models, which have only made incremental gains for all the billions invested.

There's plenty more room to grow with agents and tooling, but the core models are only slightly bumping YoY rather than the rocketship changes of 2022/23.