I think it’s very cool to proudly do all the misinterpretations the authors of the paper caution against[0], not even link to the paper, and publish that in the newspaper.
Bizarre take. I can't say I agree with the authors - it only takes using these models to see their capabilities.
avbanks · 13h ago
Capability != Reliability
Avi-D-coder · 13h ago
That's our job now, adding reliability. It's just pair programming.
queenkjuul · 11h ago
People say this and then I'm constantly unimpressed with their output at work.
readthenotes1 · 13h ago
"Not only is generative AI unreliable, but it can’t reason, as a recent demonstration showed: OpenAI’s latest ChatGPT4o model was beaten by an 8-bit Atari home games console made in 1977.
“Reality is the ultimate benchmark for AI,” explained Chomba Bupe, a Zambian AI developer, last week. “You not going to declare that you have built intelligence by beating toy benchmarks …"
If this is the level of reasoning that AI has to beat, we've set a low bar.
Being able to play chess (what an Atari game is good at) is not the same as reasoning, and it is also a toy benchmark.
Everything around the Atari mention is innuendo--but it is hard to tell if the author could reason well enough to realize that.
antithesizer · 13h ago
I'm concerned that journalism is all empty hype these days. What with their long, costly, quixotic roll-out of endless, more-or-less identical attempts to explain away AI, which, despite constantly shifting PR, have long since plateaued in quality and ceased to wow the public.
badgersnake · 13h ago
The Telegraph is not what it once was, that’s for sure.
drewcoo · 13h ago
When was that halcyon time when you knew a subject well and some outlet published a really good in-depth story about it? Maybe it happened, but I don't think it happened in this timestream.
Michael Crichton, of all people, coined the term Gell-Mann Amnesia, possibly describing your reaction.
[0]: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect