Hierarchical Reasoning Model outperforms LLMs at reasoning tasks

geox · 8/27/2025, 1:00:37 PM · livescience.com

Comments (1)

nabla9 · 1h ago
Reality check.

https://arcprize.org/blog/hrm-analysis

...we made some surprising findings that call into question the prevailing narrative around HRM:

1. The "hierarchical" architecture had minimal performance impact when compared to a similarly sized transformer.

2. However, the relatively under-documented "outer loop" refinement process drove substantial performance, especially at training time.

3. Cross-task transfer learning has limited benefits; most of the performance comes from memorizing solutions to the specific tasks used at evaluation time.

4. Pre-training task augmentation is critical, though only 300 augmentations are needed (not 1K augmentations as reported in the paper). Inference-time task augmentation had limited impact.

Findings 2 & 3 suggest that the paper's approach is fundamentally similar to Liao and Gu's "ARC-AGI without pretraining".
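
For readers unfamiliar with the "outer loop" mentioned in finding 2, here's a minimal sketch of that style of iterative refinement with deep supervision, written from the blog post's description rather than from the HRM code. The module, grid encoding, and 8-step budget below are illustrative assumptions, not details from the paper.

```python
# Sketch: outer-loop refinement with per-iteration supervision (illustrative only).
# The model repeatedly re-reads its own previous guess and is trained at every step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRefiner(nn.Module):
    """Stand-in model: maps (task input, previous guess) -> refined logits."""
    def __init__(self, vocab=10, dim=64):
        super().__init__()
        self.embed_in = nn.Embedding(vocab, dim)
        self.embed_prev = nn.Embedding(vocab, dim)
        self.body = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, vocab))

    def forward(self, grid, prev_guess):
        x = torch.cat([self.embed_in(grid), self.embed_prev(prev_guess)], dim=-1)
        return self.body(x)  # (batch, cells, vocab) logits

def outer_loop_train_step(model, grid, target, optimizer, max_steps=8):
    """One training step: the prediction from iteration t becomes the input to t+1."""
    guess = torch.zeros_like(grid)              # start from a blank guess
    total_loss = 0.0
    for _ in range(max_steps):
        logits = model(grid, guess)
        total_loss = total_loss + F.cross_entropy(   # supervise every iteration
            logits.reshape(-1, logits.size(-1)), target.reshape(-1)
        )
        guess = logits.argmax(dim=-1).detach()       # feed the refined guess back in
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```

Here `grid` and `target` would be integer-encoded puzzle grids of the same flattened shape; the point is only that the refinement loop, not the architecture inside `model`, is doing much of the work, which is what findings 1 and 2 claim.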