As a Software Engineer I found it hard to grasp the concepts explained here.
First it says we lose electrons by deleting information. But AFAIK we are losing electrons everywhere: most gates operate by negating a current, which I understand is what they mean by losing electrons. So, are all gates bad now?
Also, why would keeping a history of all memory changes prevent losing heat? You would have to keep all that memory powered, so...
And finally, why would this be useful? Who needs to go back in time in their computations??
HPsquared · 23h ago
It's a thermodynamics thing. Reversible processes are the most efficient possible (they generate no entropy). Deleting information makes a process irreversible. This is an entirely theoretical concern: there are theoretical limits on the energy usage of computation based on this (Landauer's principle), but actual computers are nowhere near these theoretical limits, at all.
Edit: and yes, most of the logical operations in a regular chip like AND, OR, NAND etc are irreversible (in isolation, anyway)
rnhmjoj · 23h ago
> but actual computers are nowhere near these theoretical limits, at all.
The Landauer limit at ambient temperature gives something of the order of 10⁻²¹ J to irreversibly flip a bit. Meanwhile, if I read this paper[1] correctly, current transistors are around 10⁻¹⁵ J per switching event. So, definitely not coming to AI "soon".
Theoretically, a computer that never forgets anything can run without consuming any power (and thus never heating). That kind of computer would be called reversible (or adiabatic) as it would require its gates to be reversible (i.e. any computation can be undone). You would still need to expend energy to set the initial state (input) and copy the result (output).
Obviously, in real life, most power consumed by computers is lost by wire resistance, not through "forgetting" memory in logic gates. You would need superconducting wires and gates to build an actually reversible CPU.
Also, you would need to "uncompute" the result of a computation to bring your reversible computer from its result back to its initial state, which may be problematic. Or you can expend energy to erase the state.
Quantum computers are reversible computers, if you seek a real-life example. Quantum logic gates are reversible and can all be inverted.
[1]: https://arxiv.org/pdf/2312.08595
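The back-of-envelope numbers above are easy to reproduce; a minimal sketch (the 300 K ambient temperature and the 10⁻¹⁵ J per-switch figure are taken as given):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K (exact, SI 2019)
T = 300.0           # assumed ambient temperature in kelvin

# Landauer's principle: erasing one bit dissipates at least k_B * T * ln(2)
landauer_limit = k_B * T * math.log(2)
print(f"Landauer limit at {T} K: {landauer_limit:.2e} J/bit")  # ~2.87e-21 J

# Compare with the ~1e-15 J per switching event quoted for current transistors
print(f"Headroom: roughly {1e-15 / landauer_limit:.0e}x")
```

So today's switches sit five to six orders of magnitude above the thermodynamic floor, which is why the limit has no practical bite yet.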
tamat · 17h ago
Thanks for your explanation
naasking · 17h ago
> Also, why would keeping a history of all memory changes prevent losing heat?
How much power does persistent storage (a hard drive, an SSD) require to preserve its stored data? Zero, which is why it emits zero heat.
> Who needs to go back in time in their computations??
At its most basic level, erasing/overwriting data requires energy. This generates a lot of heat. Heat dissipation is a major obstacle to scaling chips down even further. If you can design a computer that doesn't need to erase nearly as much data, you generate orders of magnitude less heat, and this potentially opens up more scaling potential and considerable power savings.
imurray · 21h ago
I'm sceptical about the energy motivation, but there are multiple reasons why making invertible deep learning architectures can be interesting or useful. Cf, a series of workshops from 2019-2021: https://invertibleworkshop.github.io/
Since then diffusion models have been popular. Generating from these can be seen as a special case of a continuous time normalizing flow, and so (in theory) is a reversible computation. Although the distilled/fast generation that's run in production is probably not that!
Simulating differential equations is not usually actually reversible in practice due to round-off errors. But when done carefully, simulations performed in a computer can actually be exactly bit-for-bit reversible: https://arxiv.org/abs/1704.07715
imurray · 19h ago
Another machine learning paper ("ancient", 2015) where being able to exactly reverse a computation was useful: https://arxiv.org/abs/1502.03492
MangoToupe · 1d ago
What does it mean for computation to have a direction? Said direction does not seem to refer to causality, which seems to me to be the natural interpretation, i.e., producing inputs from outputs. It seems to me you'd necessarily need to run the program first with known inputs for that to work. So is this just about preserving state by default to make backtracking easier?
random3 · 1d ago
Yes, but at a physical level, so it needs different hardware. Deleting information (e.g. an AND gate maps two input bits to one output bit) generates heat, so you need different gates, like the Fredkin gate.
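For concreteness, the Fredkin gate is a controlled swap: it only permutes 3-bit states, so nothing is erased, and it is its own inverse. A quick sketch (the Python here is just an illustration):

```python
def fredkin(c, a, b):
    """Fredkin gate (controlled swap): if control bit c is 1, swap a and b."""
    return (c, b, a) if c else (c, a, b)

# Every 3-bit state maps to a unique 3-bit state, and the gate undoes itself:
states = [(c, a, b) for c in (0, 1) for a in (0, 1) for b in (0, 1)]
assert all(fredkin(*fredkin(*s)) == s for s in states)

# AND falls out as a special case: fredkin(x, 0, y)[1] == x AND y, with the
# other two outputs carried along as "garbage" so no information is destroyed.
assert fredkin(1, 0, 1)[1] == 1 and fredkin(1, 0, 0)[1] == 0
```

The "garbage" outputs are the price of reversibility: you keep extra bits around instead of dissipating them as heat.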
godelski · 1d ago
> What does it mean for computation to have a direction?
Actually, all computation has a directionality! This is actually a subject that I get really excited about ^__^
Think about this: we have a function f, with input x and output y: f(x) -> y. We'd even use that notation! This is our direction.
Now, the reverse actually gets a bit tricky. If the reverse is straightforward, our function has an inverse. But it might not always. Our function f(x)=mx+b is invertible, because we can write x = (f(x)-b)/m (well... m can't be 0), which provides a unique solution: every f(x) corresponds to a unique x. But if instead we have the function f(x) = x^2, this is not true! x = ±sqrt(f(x)), and here every f(x) corresponds to both x and -x! They are not unique.
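Both cases can be checked directly (m, b, and the sample points are arbitrary choices):

```python
def f_linear(x, m=2.0, b=3.0):
    return m * x + b

def f_linear_inv(y, m=2.0, b=3.0):
    return (y - b) / m  # requires m != 0

# The linear map round-trips to a unique x:
assert f_linear_inv(f_linear(5.0)) == 5.0

# x**2 does not: two distinct inputs collide on the same output,
# so no single-valued inverse can exist.
assert (-4.0) ** 2 == 4.0 ** 2 == 16.0
```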
We can start adopting the language of images and preimages if you want to start heading down that route. There's a lot of interesting phenomena here and if you hadn't already guessed it, yes, this is related to the P=NP problem!
An easy way to see this visually is to write down the computational circuit. Polylog actually has a really good video that will make the connection to P vs NP[0]
In the context of machine learning, a Normalizing Flow would be invertible, while a diffusion model is reversible. A pet peeve of mine is that in ML people (I'm an ML researcher) call it the "inverse problem", such as GAN-Inversion, but that is a misnomer and we shouldn't propagate it... This also has to do with the naivety of these statements...[1,2]. If you understand this you'll understand how one could make accurate predictions in one direction but fail in the other. Which really puts a whole damper on that causality stuff. Essentially, we run into the problem of generating counterfactuals.
> Said direction does not seem to refer to causality
Understanding this, I think you can actually tell that there's a direct relationship to causality here! In physics we love to manipulate equations around because... well... the point of physics is generating a causal mapping of the universe. But there are some problems... Entropy is the classic prime example (but there are many more in QM), and perhaps this led to Boltzmann's demise[3]. (This is also related to the phenomena of emergence and chaos.)
Here the issue is that we can take some gas molecules, run our computation forward and get a new "state" (configuration of our molecules). But now... how do we run this in reverse? We will not generate a unique solution, but instead we have a family of solutions.
Funny enough, you ran into this when you took calculus! That's why, when you integrated, your professor always got mad if you dropped the "+C"! So here you can see that differentiation isn't (necessarily) an invertible process: every f(x)+c maps to the same f'(x)! It is a many-to-one relationship, just like with f(x)=x^2.
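The "+C" point can be made numerically: functions differing only by a constant have (to within round-off) identical derivatives, so differentiation is many-to-one. The sample function and grid below are arbitrary:

```python
# Finite-difference derivatives of f(x) = x**2 and f(x) = x**2 + 7 on a grid:
# the constant 7 cancels, so it is unrecoverable from the derivative alone.
h = 1e-6
xs = [0.5 * i for i in range(10)]
d1 = [((x + h) ** 2 - x ** 2) / h for x in xs]
d2 = [(((x + h) ** 2 + 7) - (x ** 2 + 7)) / h for x in xs]
assert all(abs(a - b) < 1e-6 for a, b in zip(d1, d2))
```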
> So is this just about preserving state by default to make backtracking easier?
I think here we should have some more clarity? If not, think about our gas-distribution problem. If instead of just sampling at time 0 and time T we sampled at {0,t0,t1,...,T}, we greatly reduce the solution space, right? Because now our mapping from T->0 needs to pass through all such states. It's still a lot of potential paths, but it's still fewer...[4]
[0] https://www.youtube.com/watch?v=6OPsH8PK7xM
[1] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...
[2] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s
[3] The opening of Goodstein's States of Matter book (the standard graduate textbook on statistical mechanics). Be sure to also read the first line of the second paragraph: https://i.imgur.com/Dm0PeJU.png
[4] I know...
bravesoul2 · 1d ago
I am guessing it minimises irreversible operations (information deletion), i.e.:
2 + 2 + 2
<=> reversible
2 + 2 + 2, 2 + 4
<=> reversible
2 + 2 + 2, 2 + 4, 6
=> irreversible
6
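The example above can be sketched as a tiny reduction that keeps its full history, so every step is undoable; discarding the history is the only irreversible move:

```python
# Reduce "2 + 2 + 2" one addition at a time, keeping every intermediate state.
history = [(2, 2, 2)]
while len(history[-1]) > 1:
    s = history[-1]
    history.append((s[0] + s[1],) + s[2:])  # fold the first pair

print(history)  # [(2, 2, 2), (4, 2), (6,)]

# With the history intact, any step is trivially reversible:
assert history[-2] == (4, 2)

# The irreversible move is forgetting the history and keeping only (6,):
# many different sums reduce to 6, so the past cannot be reconstructed.
```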
MangoToupe · 1d ago
Interesting. I would have thought that a reversible computation would produce a new algorithm with the domain and range swapped. Naming truly is the final boss of computation. But now I see it's one-step backtracking in a way that allows saving energy, supposedly. Very much still reversible, but definitely nothing remotely comparable to time travel. "uncomputation" was the much, much better name.
Edit: i see now. Well, this is much less exciting than I thought. Still, I'm excited for all the other people excited.
bravesoul2 · 1d ago
Anything to get the CO2 PPM down is exciting, although this probably has no effect (because of Jevons paradox) and we just need more trees, wind and solar.
worldsayshi · 1d ago
I've been following Mike P Frank for a while on Twitter and he has often had interesting things to say about reversible computing and AI:
https://x.com/MikePFrank
I hope this will help when the next GPU datacenter driven software fad comes around. Though it probably won't, Jevons paradox and all.
rollcat · 16h ago
Says the headline of a web page that takes 12s to load on a modern machine.
People tend to ignore a problem if it's someone else's. The costs of [insert disruptive technology here] are largely externalised - on our natural environment, on individuals' livelihoods, on violated copyrights, on independent hosts' infrastructure, on pedestrians, on about-to-be burnt-out/jobless/homeless, etc. What you gain in efficiency, you will use to bring more for yourself, not to bring less harm to someone else. ¯\_(ツ)_/¯
throwawaymaths · 1d ago
what is the plan here? has anyone even demonstrated a reversible-compute matmul? there's a lot of information destruction that happens at the edges of even that transformation
YetAnotherNick · 1d ago
There of course is a reversible matmul for an invertible matrix. There is no reversible ReLU though.
But in any case I don't understand the claim of the article. If you can reverse the computation (say, only use invertible matrices), can you do it for less energy?
throwawaymaths · 1d ago
demonstrated is the key word here. you can make most circuits ~reversible by just shuffling garbage bits off to a pool that you destroy eventually (but i presume you'd destroy a minimal amount), but is the juice worth the squeeze? i would worry that the scaling factor for the floorplan is not linear in the matrix size (which is already O(n×k))
if you're also batching matmuls, isn't there an unavoidable information loss that happens when you evict the old data and drop in the new batch?
bravesoul2 · 1d ago
Isn't it all reversible if you keep the original data?
Let f: V -> V
g: V -> V x V is a reversible form of f, where g(v) = (v, f(v))
g'((v, w)) = v
g' can be "fooled" with a fake w, but that is of no concern here. We trust our own chip!
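This pairing trick is easy to write down (f here is an arbitrary, non-invertible stand-in); note that it enlarges the output space rather than giving a bijection, which is the caveat raised in the reply:

```python
def f(v):
    return v * v  # arbitrary stand-in; not invertible on its own

def g(v):
    # Carry the input alongside the output; nothing is forgotten.
    return (v, f(v))

def g_inv(pair):
    v, w = pair
    return v  # recover the input directly; w is ignored

assert g_inv(g(-3)) == -3  # even though f(-3) == f(3), g still round-trips
```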
thrance · 18h ago
Your g is not reversible though: its input space must have the same dimension as its output space. On the other hand, you're correct in that any irreversible function can be extended to a reversible one, although the process isn't always straightforward. The general way is to do something like:
f: V -> V
g: V x A -> V x A
with g(x, a) = (f(x), b) for some fixed value(s) of a. And b cannot be set to x, because then you can't recover a with g', and your function is not invertible.
cwillu · 1d ago
The irreversible component of a computation is what actually generates heat, more or less.
bee_rider · 1d ago
IMO it would be more accurate to say: the delete operation is the one that we know thermodynamics and information theory must charge us for, on a physics level.
In current computers, we’re nowhere near those limits anyway. Reversible computing is interesting research for the future.
thrance · 18h ago
Quantum computers, albeit useless so far, are a real-life example of reversible computers. So they can be achieved.
bee_rider · 16h ago
Quantum dot cellular automata looked like a good candidate for reversible computers in some post-cmos future, last time I looked at that stuff (years ago—so, it is probably time for a check-in). Notably, they do classical computing, they just exploit quantum effects for the logic gates.
physix · 1d ago
I'm wondering... if I do a reversible computation where no information is lost, does cutting the power to the unit create heat?
bee_rider · 1d ago
I guess it would depend on the physical design of the compute elements (reversible computing is generally associated with post-CMOS tech). But, cutting power doesn’t necessarily reset a machine in general… think of a mechanical computer, cutting power removes the ability to change state.
cwillu · 21h ago
“Cutting power” is a thing because current computers require a periodic (and rather frequent) refresh to maintain the state of the system (primarily, but not exclusively, RAM). And indeed, one useful tactic to maintain that state when you need to cut the power for some reason is to supercool the RAM so that it doesn't dissipate its charge as fast, basically making the system approach that ideal world.
hyghjiyhu · 1d ago
Idk how accurate it is, but my mental model of this would be that electricity is like a liquid, and when you flip the switch it drains out of the components, creating heat through friction.
thrance · 18h ago
A fully reversible computer would consume no power whatsoever, so it wouldn't require power to function. Instead, you would need power to (re)initialize its state, or to copy the results of its computations to e.g. your monitor. An unplugged reversible computer would be free to compute (and uncompute) its bits perpetually, unconcerned with the rest of the world. The functional programmer's dream, in a sense.
physix · 16h ago
Thanks for that and the preceding comments. I was thinking about the possible paradox of getting a reset for free. But, the cost invariably comes at some point, e.g. when you restart and need to reset your state.
TheDudeMan · 1d ago
Only at the theoretic limits of efficiency. In real life, not true.
throwawaymaths · 1d ago
i don't think that's categorically true. if the footprint from expanding out a circuit to make it reversible gets sufficiently large, other things like resistance can dominate.
bee_rider · 1d ago
Other things like resistance already dominate (and always have). Reversible computing is the result of exploring the thermodynamic/information theory limits of what computation must cost.
In current chips we just charge and dump a bunch of parasitic capacitances every clock cycle.
TheDudeMan · 1d ago
No. It is a purely theoretic result. It has zero real-world applicability.
ajb · 21h ago
People have been working on commercialising this stuff for decades, actually. The problem is that you are competing with the staggering level of investment that goes into CMOS. To get early revenue you need a niche which is forgiving of not being as good as CMOS in other dimensions - AI is almost certainly not that.
JoshuaDavid · 1d ago
LeakyReLU is reversible in all but the least significant bits, right?
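That intuition is easy to check. With a power-of-two slope the inversion is exactly bit-for-bit; with the usual slope of 0.01 the multiply/divide pair can disagree in the last bits, which is exactly the caveat in the comment. The slope value below is an arbitrary choice for the demonstration:

```python
def leaky_relu(x, slope=0.25):
    return x if x > 0 else slope * x

def leaky_relu_inv(y, slope=0.25):
    return y if y > 0 else y / slope

# With a power-of-two slope, multiply and divide are exact, so the
# round trip recovers the input bit-for-bit:
for x in (-3.7, -1.0, 0.0, 2.5):
    assert leaky_relu_inv(leaky_relu(x)) == x
```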
thrance · 18h ago
You're mistaking invertible matrices for reversible computing, which are two unrelated concepts*. You can devise a reversible implementation of ReLU (and of anything else, for that matter) using ancilla variables. Like in addition:
add(x, a) = (x + a, x - a), and add†(y, b) = ((y + b) / 2, (y - b) / 2)
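The add/add† pair above round-trips as claimed; a minimal check (the sample values are arbitrary, and chosen so the float arithmetic is exact):

```python
def add(x, a):
    return (x + a, x - a)  # keeps enough information to undo the sum

def add_dagger(y, b):
    return ((y + b) / 2, (y - b) / 2)  # recovers the original pair

x, a = 3.0, 5.0
assert add_dagger(*add(x, a)) == (x, a)  # the ancilla makes addition reversible
```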
It's a well-known thing in thermodynamics that a reversible process doesn't increase entropy (dissipate heat). So in theory, a reversible computer consumes no power whatsoever, as it doesn't leak any. In practice, most power leakage in real-life computers is due to wire resistance, not the irreversibility of computations. Also, even with a perfectly reversible CPU, you would need to expend some energy to (re)initialize the state of your computer (input) and to copy its results out (output). Alternatively, once a computation is done, you can always "uncompute" it to get back to the initial state without using any power, at the cost of time.
If you want an example of a real reversible computer, look into quantum computers, which are adiabatic (reversible) by necessity, in accordance with the laws of quantum physics.
* Actually, you can represent reversible gates with invertible matrices, and that has quite profound implications. A gate/operation is reversible if and only if its corresponding matrix is invertible. But let's not get into that here.
thrance · 18h ago
In Stephen Baxter's Time [1], in the far far future, when all the stars have died out and black holes have finally all evaporated, the descendants of mankind are left in a maximally entropic universe, with no free energy left at all. They are condemned to live in giant simulations powered by reversible computing (consuming no power), reliving the same events over and over again, as computations are uncomputed and then recomputed.
[1] https://en.wikipedia.org/wiki/Time_(novel)
it has been shown that many uses of LLMs require less electricity than boiling a kettle
charcircuit · 23h ago
This article is disappointing. Without any sort of benchmark or evidence of PyTorch supporting it, how can this compete with Nvidia? There is no proof that this is finally competitive against traditional chips.
Edit: One of their white papers mentions "Application Framework: A PyTorch-compatible interface supports both AI applications and general-purpose computing, ensuring versatility without sacrificing performance."
Huxley1 · 21h ago
I’ve been learning more about how AI systems actually work, and one thing I keep wondering is how much energy it all uses.
This idea of reversible computing was new to me. I didn’t know it was even possible to run computations “backwards” to save power.
It’s interesting that slowing things down might actually save more energy in the long run. I’ll definitely be reading more about this.