Reinforcement Pre-Training

51 points by frozenseven | 17 comments | 6/10/2025, 5:30:22 AM | arxiv.org ↗

Comments (17)

ntonozzi · 2h ago
Is there any work on using some kind of soft tokens for reasoning? It seems so inefficient to encode so much information into a single token for the model's next pass. If the model instead emitted a large vector at each forward pass, it would have a drastically larger working memory/scratchpad and much higher bandwidth for passing information forward to the next call. If a single token carries ~17 bits of information, a vector of 1024 32-bit floats could carry 32,768 bits.
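A minimal sketch of what that could look like, assuming a toy PyTorch decoder (names and sizes here are illustrative, not from the paper): instead of sampling a discrete token and re-embedding it, feed the probability-weighted mixture of embeddings forward, so the full distribution survives each step.

```python
import torch
import torch.nn as nn

# Toy decoder; causal masking is omitted for brevity.
class TinyDecoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.backbone = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

model = TinyDecoder()
tokens = torch.randint(0, 32000, (1, 16))    # (batch, seq) prompt
x = model.embed(tokens)                      # (1, 16, 1024)

for _ in range(4):  # four "soft" reasoning steps
    h = model.backbone(x)                                        # (1, seq, 1024)
    probs = torch.softmax(model.lm_head(h[:, -1:, :]), dim=-1)   # (1, 1, vocab)
    # Hard decoding would collapse this step to ~log2(vocab) bits:
    #   next_x = model.embed(probs.argmax(-1))
    # Soft decoding instead passes forward the probability-weighted
    # mixture of token embeddings, keeping the whole distribution.
    soft_embed = probs @ model.embed.weight                      # (1, 1, 1024)
    x = torch.cat([x, soft_embed], dim=1)
```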
NotAnOtter · 5h ago
I'm interested in how an innovation like this affects the business prospects.

Let's assume this is a paradigm shift on the scale of Transformers / `Attention is all you need`. Companies build out new models and pump another $100 billion through it. And then a year from now, another innovation comes out. Same circus. And again.

No one wants to be left behind, but trying to keep up will sink smaller companies.

curious_cat_163 · 4h ago
I am not sure why this ought to require pumping another "$100 billion". Could you elaborate?

Yes, the most recent generation of GPUs optimizes for attention math, but they are still fairly "general-purpose" accelerators. So when I see papers like this (interesting idea, btw!), my mental model for costs says the CapEx to buy the GPUs and build out the data centers gets reused for this and hundreds of other ideas and experiments.

And then the hope is that the best ideas will occupy more of the available capacity...

gessha · 3h ago
Sir, this is an arXiv paper
NotAnOtter · 3h ago
So true, just like this one: https://arxiv.org/abs/1706.03762
Imnimo · 3h ago
This is an interesting way of squeezing extra feedback out of raw text, but I'm a little skeptical that it's the best way to spend training FLOPs. It feels like most "next tokens" are pretty low-information (even after filtering for entropy like they do). Does it make sense to spend a bunch of compute on a reasoning trace for them? Maybe if you're harshly data-limited but not compute-limited?
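For reference, the kind of entropy filter being alluded to might look like this (a toy sketch; the stand-in logits and the threshold are assumptions, not the paper's actual recipe):

```python
import torch

logits = torch.randn(1, 128, 32000)     # (batch, seq, vocab); stand-in for a proxy model
probs = torch.softmax(logits, dim=-1)
entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)   # (batch, seq), in nats

threshold = 2.0                 # assumed hyperparameter, in nats
hard = entropy > threshold      # mask of genuinely uncertain positions
# Reasoning rollouts are spent only where `hard` is True; the cheap,
# low-entropy tokens keep the ordinary next-token objective.
```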
hzia · 12h ago
This is very exciting! Existing data becomes a lot more valuable, and it brings us one step closer to how humans learn!

The downside is that this is going to be extremely expensive, so the dataset used for RL will need to be carefully curated.

watsonmusic · 6h ago
Can't wait to see how it goes beyond the current LLM training pipeline.
nsagent · 3h ago
It's clear that you're either one of the authors or a friend of theirs. You created this account 8 months ago to comment on another paper [1] that was released by the same authors.

[1]: https://news.ycombinator.com/item?id=41776324

rafaelero · 3h ago
This should be used for high-entropy tokens during pre-training.
dgshsg · 9h ago
I notice that you can do this recursively to arbitrary depth. The cost would be terrible, though.
watsonmusic · 6h ago
It could be adaptive: only high-value tokens get allocated more compute.
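Concretely, something like this (a toy sketch; the linear mapping from entropy to rollout count is my assumption, not anything from the paper):

```python
import torch

entropy = torch.tensor([0.1, 0.4, 2.9, 0.2, 3.5])   # per-token predictive entropy (nats)
max_rollouts = 8

# Zero rollouts for easy tokens, up to `max_rollouts` for the hardest.
n_rollouts = torch.round(entropy / entropy.max() * max_rollouts).long()
print(n_rollouts)   # tensor([0, 1, 7, 0, 8])
```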
babelfish · 6h ago
So marginally better (and occasionally worse) performance for an order-of-magnitude increase in training cost…?
watsonmusic · 6h ago
The 14B model performs comparably to a 32B one; the improvement is huge.
85392_school · 6h ago
Are we only comparing them in terms of text-completion accuracy? Does it also improve performance on benchmarks?
watsonmusic · 6h ago
A new scaling paradigm finally comes out!
beauzero · 5h ago
Interesting