Engineers at our startup don't build features anymore

- Impossible river claim: Again in figure 6, you can see that the performance declines before we reach 5 actors. So while it wasn’t necessary to test until 20, the results still indicate, impossibility doesn't explain the results.

MarkusQ · 2h ago

The people trying to show that LLMs don't think are working too hard. It's trivially easy, imho:

https://chatgpt.com/share/68504396-e300-800c-a7ff-dde5fe1572...

ForHackernews · 5h ago

Wait is C. Opus just the anthropic bot? Did I waste my time reading AI nonsense?

credit_guy · 4h ago

The second author seems to be human.

https://www.openphilanthropy.org/about/team/alex-lawsen/

MarkusQ · 3h ago

Could be. Someone hallucinated the arXive reference for the Apple paper.

mfro · 4h ago

> These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.

I would like to carefully design my response to this article with a downvote

ForHackernews · 5h ago

"5 Alternative Representations Restore Performance To test whether the failures reflect reasoning limitations or format constraints, we conducted preliminary testing of the same models on Tower of Hanoi N = 15 using a different representation: Prompt: "Solve Tower of Hanoi with 15 disks. Output a Lua function that prints the solution when called."

Results: Very high accuracy across tested models (Claude-3.7-Sonnet, Claude Opus 4, OpenAI o3, Google Gemini 2.5), completing in under 5,000 tokens.

The generated solutions correctly implement the recursive algorithm, demonstrating intact reasoning capabilities when freed from exhaustive enumeration requirement""

Is there's something I'm missing here?

This seems like it demonstrates the exact opposite of what the authors are claiming: Yes, your bot is an effective parrot that can output a correct Lua program that exists somewhere in the training data. No, your bot is not "thinking" and cannot effectively reason through the algorithm itself.

TIcomPOCL · 4h ago

It seems to just reillustrate the point that the model cannot follow algorithmic steps once it is out of distribution.

ForHackernews · 5h ago

> Recent reports have claimed that most 7th graders are unable to independently derive the Pythagorean Theorem, however our analysis reveals that these apparent failures stem from experimental design choices rather than inherent student limitations.

When given access to Google and prompted to "tell me how to find the length of hypotenuse of a right triangle", a majority of middle-schoolers produced the correct Pythagorean Theorem, demonstrating intact reasoning capabilities when freed from the exhaustive comprehension requirement.

Engineers at our startup don't build features anymore

WhatsApp Ads, Coffin Nails

Ask HN: How do I give back to people helped me when I was young and had nothing?

Ask HN: Is there an AI bot that works like a literate programming build step

Ask HN: What cool skill or project interests you, but feels out of reach?

Ask HN: What is your fallback job if AI takes away your career?

Ask HN: Prevent Secrets from Committing to Repos

Tell HN: Help restore the tax deduction for software dev in the US (Section 174)

Ask HN: How do you handle an employee who complies but never delivers?

Ask HN: How to learn CUDA to professional level

Ask HN: AGI and Product Development

Ask HN: Genuine alternatives to Google and Apple for releasing paid apps

Ask HN: Is ageism in tech still a problem?

Ask HN: Casual Math Book Suggestions

Ask HN: Seeking ways to improve my planning skills and follow-through

How does feedback usually happen during projects?

Ask HN: Minecraft's UI element style (vs. modern flat glass interface)

Just how many $10 /MOS subscriptions do startups expect us to sign up for?

Ask HN: What is the latest on treatment of Metastatic Breast Cancer?

Ask HN: What are your Unicode woes?

Ask HN: Are there dev conferences focusing on "soft skills"?

Ask HN: AWS cdk, serverless setup advice

Ask HN: Is it still a good idea to learn Perl for a young developer?

Ask HN: In 15 years, what will a gas station visit look like?

Ask HN: Dear Product Managers – How do you use LLM's in your day to day work?

Why Vertical AI Agents May Replace RPA in Complex Enterprise Workflows

Ask HN: What would your dream home include and where would it be?

Ask HN: Is Firebase Down?

Requesting ArXiv cs.PL endorsement for Py2C (Python-to-C compiler)

Ask HN: What's the coolest AI project you've seen?

Ask HN: How has your company adapted to hiring with LLMs?

Feature Phone and Pegasus Style Spyware Question

Tell HN: GitHub gists are great for private/public bookmarks

Ask HN: What Happened to the Apple Vision Pro?

The Illusion of the Illusion of Thinking – A Comment on Shojaee et al. (2025)

Comments (11)