Diffusion Language Models Are Super Data Learners

64 points | babelfish | 2 comments | 8/10/2025, 4:04:05 PM | jinjieni.notion.site

Comments (2)

woadwarrior01 · 40m ago
> During inference, generating sequences ranging from 16 to 4096 tokens incurs a 16× to 4700× increase in FLOPs compared to AR baselines.

I wonder why the increase in FLOPs spans such a wide range. Naively, I'd have expected FLOPs to increase linearly with the number of tokens. OTOH, it sort of makes sense, because diffusion models are not autoregressive, as their name suggests.

ckjellqv · 1m ago
My guess is that autoregressive models can use key-value (KV) caching to eliminate most of the FLOPs inside the self-attention block. You can't use KV caching in a diffusion model (because it's not causal), but they sell this as a win anyway because they believe it leads to better reasoning.
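
A minimal back-of-envelope sketch of why the ratio widens with sequence length, assuming one denoising step per generated token, illustrative model dimensions (d_model, n_layers are made up, not the paper's setup), and counting only the big matmuls. With a KV cache, an AR decoder pays a roughly constant cost per new token, while a full-sequence diffusion pass at every step makes total cost grow quadratically with length, so the ratio grows roughly linearly with the number of tokens:

```python
# Illustrative FLOPs comparison: AR decoding with a KV cache vs. a diffusion LM
# that re-runs full-sequence attention at every denoising step.
# All constants below are hypothetical, chosen only to show the scaling trend.

def transformer_flops(tokens_processed: int, context_len: int,
                      d_model: int = 2048, n_layers: int = 24) -> float:
    """Rough FLOPs for one forward pass over `tokens_processed` tokens,
    each attending to `context_len` tokens (attention + MLP matmuls only)."""
    proj = 8 * tokens_processed * d_model ** 2           # QKV + output projections
    attn = 4 * tokens_processed * context_len * d_model  # QK^T and AV matmuls
    mlp = 16 * tokens_processed * d_model ** 2           # 4x-expansion MLP
    return n_layers * (proj + attn + mlp)

def ar_decode_flops(seq_len: int) -> float:
    """AR decoding with a KV cache: one new token per step, attending to the
    growing prefix; previously computed keys/values are reused."""
    return sum(transformer_flops(1, t + 1) for t in range(seq_len))

def diffusion_decode_flops(seq_len: int, num_steps: int) -> float:
    """Diffusion decoding: every denoising step re-processes the full
    (bidirectional) sequence, so there is nothing to cache."""
    return num_steps * transformer_flops(seq_len, seq_len)

for L in (16, 256, 4096):
    ratio = diffusion_decode_flops(L, num_steps=L) / ar_decode_flops(L)
    print(f"L={L:5d}  diffusion/AR FLOPs ratio ~ {ratio:,.0f}x")
```

Under these assumptions the ratio comes out around 16x at 16 tokens and a few thousand x at 4096 tokens, which is the same order of magnitude as the range quoted from the article; fewer denoising steps per token would shrink it proportionally.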