Generative AI's crippling failure to induce robust models of the world

29 points by pmcjones | 9 comments | 6/29/2025, 12:43:54 AM | garymarcus.substack.com

Comments (9)

extr · 33m ago
I find Gary's arguments increasingly semantic and unconvincing. He lists several examples of how LLMs "fail to build a world model", but his definition of "world model" is an informal hand-wave ("a computational framework that a system (a machine, or a person or other animal) uses to track what is happening in the world"). His examples are drawn from a variety of unspecified or obsolete models. What is his opinion of O3? Why doesn't he create or propose a benchmark that researchers could use to measure progress on "world model creation"?

What's more, his actual point is unclear. Even if you simply grant, "okay, even SOTA LLMs don't have world models", why do I, as a user of these models, care? Because the models could be wrong? Yes, I'm aware. Nevertheless, I'm still deriving substantial personal and professional value from the models as they stand today.

voidhorse · 1m ago
I think the point is that category errors or misinterpreting what a tool does can be dangerous.

Both statistical data generators and actual reasoning are useful in many circumstances, but there are also circumstances in which thinking that you are doing the latter when you are only doing the former can have severe consequences (example: building a bridge).

If nothing else, his perspective is a counterbalance to what is clearly an extreme hype machine that is doing its utmost to force adoption through overpromising, false advertising, etc. These are bad things even if the tech does actually have some useful applications.

As for benchmarks: if you fundamentally don't believe that stochastic data generation leads to reason as an emergent property, developing a benchmark is pointless. Also, not everyone has to be on the same side. It's clear that Marcus is not a fan of the current wave, and asking him to produce a substantive contribution that would help its proponents achieve their goals is preposterous.

This game is highly political, too. If you think the people pushing this stuff are less than estimable or morally sound, you wouldn't really want to empower them or give them more ideas.

SubiculumCode · 28m ago
I definitely would be okay if we hit an AI winter; our culture and world cannot adapt fast enough for the change we are experiencing. In the meantime, the current level of AI is just good enough to make us more productive, but not so good as to make us irrelevant.
energy123 · 1h ago
Why was Anthropic's interpretability work not discussed? Inconvenient for the conclusion?

https://www.anthropic.com/news/tracing-thoughts-language-mod...

vunderba · 56m ago
Speaking of chess, a fun experiment is setting up a few positions on Lichess, taking a screenshot, and asking a state-of-the-art VLM to count the number of pieces on the board. In my experience, the error rate was much higher for unlikely or outright impossible positions (three kings on the board, etc.).
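
If anyone wants to reproduce this, here's a minimal sketch for generating the ground truth, assuming the python-chess library; the FENs are made up for illustration and the VLM call itself is left out:

    import chess  # pip install python-chess

    # One plausible position and one deliberately impossible one
    # (two white kings plus one black king, i.e. three kings on the board).
    positions = {
        "plausible": "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3",
        "impossible": "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPPKPPP/RNBQKB1R w - - 2 3",
    }

    for label, fen in positions.items():
        board = chess.Board(fen)
        truth = len(board.piece_map())  # exact ground-truth piece count
        print(label, truth, "pieces, legal position:", board.is_valid())
        # Set up the same FEN on Lichess, screenshot it, ask the VLM to
        # count the pieces, and score its answer against `truth`.

Comparing the VLM's error rate across the two buckets puts a number on the effect.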
sdenton4 · 1h ago
"A wandering ant, for example, tracks where it is through the process of dead reckoning. An ant uses variables (in the algebraic/computer science sense) to maintain a readout of its location, even as as it wanders, constantly updated, so that it can directly return to its home."

Hm.

Dead reckoning is a terrible way to navigate; it famously left plenty of ships wrecked on the shores of France before good clocks made it possible to track longitude accurately.

Ants lay down pheromone trails and use smell to find their way home... There's likely some additional tracking going on, but I would be surprised if it looked anything like symbolic GOFAI.

deadbabe · 1h ago
Even if you find a pheromone trail, it doesn't tell you which direction home is, or which branch to take where the trail forks. You need dead reckoning. The trail just helps reduce the complexity of what you have to remember.
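
Concretely, dead reckoning (path integration) is just a running vector sum of your own steps. A toy sketch in Python, with made-up step lengths and headings:

    import math

    # Toy path integration: accumulate each step as a displacement vector.
    # The running total points from the nest to the current position, so
    # negating it gives the straight-line heading home, no trail required.
    x, y = 0.0, 0.0
    steps = [(1.0, 0.0), (2.0, math.pi / 3), (1.5, math.pi / 2)]  # (distance, heading in radians)

    for dist, heading in steps:
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)

    home_heading = math.degrees(math.atan2(-y, -x))  # bearing back to the nest
    home_distance = math.hypot(x, y)                 # straight-line distance home
    print(f"head {home_heading:.1f} deg for {home_distance:.2f} units")

That's the "variables, constantly updated" part of the article's claim; the pheromone trail is complementary, not a substitute.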
cma · 1h ago
The trail also leads the other ants to the food; it's hard for them to make use of your own dead reckoning.
voidhorse · 11m ago
The whole thing is silly. Look, we know that LLMs are just really good word predictors. Any argument that they are thinking is essentially predicated on marketing materials that embrace anthropomorphic metaphors to an extreme degree.

Is it possible that reason could emerge as a byproduct of being really good at predicting words? Maybe, but this depends on the antecedent claim that much, if not all, of reason is strictly representational and strictly linguistic. It's not obvious to me that this is the case. Many people think in images as direct sense data, and it's not clear that a digital representation of those images is equivalent to the thing in itself.

To use an example another HN'er suggested: we don't claim that submarines are swimming. Why are we so quick to claim that LLMs are "reasoning"?