> Unfortunately, literally none of the tweets we saw even considered the possibility that a problematic graph specific to software tasks might not generalize to literally all other aspects of cognition.
Why am I not surprised?
yorwba · 3h ago
> you could probably put together one reasonable collection of word counting and question answering tasks with average human time of 30 seconds and another collection with an average human time of 20 minutes where GPT-4 would hit 50% accuracy on each.
So do this and pick the one where humans do best. I doubt that doing so would show all progress to be illusory.
But it would certainly be interesting to know what the easiest thing is that a human can do but current AIs struggle with.
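For reference, the 50% figure in these horizon plots is typically read off a logistic fit of success against log task length. A minimal sketch of that calculation, assuming records of (human_minutes, model_succeeded) and using made-up data:

    # Fit P(success) against log human time, then solve for the
    # task length where the predicted success rate crosses 50%.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    tasks = [(0.5, 1), (2, 1), (8, 1), (15, 0), (30, 1), (60, 0), (120, 0)]
    X = np.log([[minutes] for minutes, _ in tasks])   # log human time
    y = np.array([succeeded for _, succeeded in tasks])

    clf = LogisticRegression().fit(X, y)
    # The logit is zero (P = 0.5) where coef * log(t) + intercept = 0.
    horizon = float(np.exp(-clf.intercept_[0] / clf.coef_[0, 0]))
    print(f"50% time horizon: {horizon:.1f} human-minutes")

Swap in a different task collection and the horizon moves, which is exactly the objection: the number is only as meaningful as the dataset behind it.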
xg15 · 2h ago
> But it would certainly be interesting to know what the easiest thing is that a human can do but current AIs struggle with.
Still "Count the R's" apparently.
hatefulmoron · 4h ago
I had assumed that the Y axis corresponded to some measurement of the LLM's ability to actually work/mull over a task in a loop while making progress. In other words, I thought it meant something like "you can leave Sonnet 3.7 alone for a whole hour and it will meaningfully progress on a problem", but the reality is less impressive. Serves me right for not reading the fine print.
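What I had pictured was an outer loop along these lines; a purely hypothetical sketch, where `step` stands in for a single model call and isn't any real API:

    # Hypothetical harness: keep feeding the model its own output
    # until it declares the task done or the time budget runs out.
    import time

    def run_agent(step, task, budget_seconds=3600):
        transcript = [task]
        deadline = time.monotonic() + budget_seconds
        while time.monotonic() < deadline:
            done, output = step(transcript)  # one model/tool-use step
            transcript.append(output)        # it sees its own prior work
            if done:
                return output
        return None  # ran out the clock without finishing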
ReptileMan · 4h ago
The demand among a fraction of Bay Area intellectuals for AI disasters and the doom of humanity far outstrips supply. The recent fanfic by Scott Alexander and similar "thinkers" is also worth checking out for a chuckle: https://ai-2027.com/
ben_w · 3h ago
AI is software.
As software gets more reliable, people come to trust it.
Software still has bugs, and that trust means those bugs still get people killed.
That was true of things we wouldn't call AI any more, and it's still true of things we do.
AI doesn't need to take over or anything when humans are literally asleep at the wheel because they mistakenly think the AI can drive the car for them.
Heck, even building codes and health & safety rules are written in blood. Why would AI be the exception?
clauderoux · 2h ago
As Linus Torvalds said in a recent interview, humans don't need AI to create bugs.
okthrowman283 · 4h ago
To be fair, though, the author of AI 2027 has been prescient in his previous predictions.
dist-epoch · 3h ago
Turkey fallacy.
The apocalypse will only happen once. Just like global nuclear war.
The fact that there hasn't been a global nuclear war so far doesn't mean everyone fearing nuclear war is crazy and irrational.
ReptileMan · 3h ago
No. It just means they are stupid in the way only extremely intelligent people can be.
Sharlin · 3h ago
People being afraid of a nuclear war are stupid in a way only extremely intelligent people can be? Was that just something that sounded witty in your mind?
Nivge · 4h ago
TL;DR - the benchmark depends on its specific dataset, and it isn't a perfect representation of AI progress.
That doesn't mean it's meaningless, or that it has no value.
dist-epoch · 3h ago
> Abject failure on a task that many adults could solve in a minute
Maybe the author should check, before pressing "Publish", whether the info in the post is already outdated.
ChatGPT passed the image generation test mentioned: https://chatgpt.com/share/68171e2a-5334-8006-8d6e-dd693f2cec...
Even setting aside that this image is purely illustrative and not really the main point of the article: in the chat you posted, ChatGPT actually failed again, because the R's are not circled.