As a Software Engineer I found it hard to grasp the concepts explained here.
First it says we lose electrons by deleting information. But AFAIK we are losing electrons everywhere: most gates operate by negating a current, which I understand is what they mean by losing electrons. So, are all gates bad now?
Also, why would keeping a history of all memory changes prevent losing heat? You would have to keep all that memory powered, so...
And finally, why would this be useful? Who needs to go back in time in their computations??
HPsquared · 23h ago
It's a thermodynamics thing. Reversible processes are the most efficient possible (they generate no entropy). Deleting information makes a process irreversible. This is an entirely theoretical concern: there are theoretical limits on the energy usage of computation based on this (Landauer's principle), but actual computers are nowhere near these theoretical limits, at all.
Edit: and yes, most of the logical operations in a regular chip like AND, OR, NAND etc are irreversible (in isolation, anyway)
rnhmjoj · 23h ago
> but actual computers are nowhere near these theoretical limits, at all.
The Landauer limit at ambient temperature gives something of the order of 10⁻²¹ J to irreversibly flip a bit. Meanwhile, if I read this paper[1] correctly, current transistors are around 10⁻¹⁵ J per switching event. So, definitely not coming to AI "soon".
Theoretically, a computer that never forgets anything can run without consuming any power (and thus never heating). That kind of computer would be called reversible (or adiabatic) as it would require its gates to be reversible (i.e. any computation can be undone). You would still need to expend energy to set the initial state (input) and copy the result (output).
Obviously, in real life, most power consumed by computers is lost by wire resistance, not through "forgetting" memory in logic gates. You would need superconducting wires and gates to build an actually reversible CPU.
Also, you would need to "uncompute" the result of a computation to bring your reversible computer from its result back to its initial state, which may be problematic. Or you can expend energy to erase the state.
Quantum computers are reversible computers, if you seek a real-life example. Quantum logic gates are reversible and can all be inverted.
[1]: https://arxiv.org/pdf/2312.08595
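The back-of-envelope numbers above are easy to reproduce; a minimal sketch (the 300 K ambient temperature and the 10⁻¹⁵ J per-switch figure are taken as given):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K (exact, SI 2019)
T = 300.0           # assumed ambient temperature in kelvin

# Landauer's principle: erasing one bit dissipates at least k_B * T * ln(2)
landauer_limit = k_B * T * math.log(2)
print(f"Landauer limit at {T} K: {landauer_limit:.2e} J/bit")  # ~2.87e-21 J

# Compare with the ~1e-15 J per switching event quoted for current transistors
print(f"Headroom: roughly {1e-15 / landauer_limit:.0e}x")
```

So today's switches sit five to six orders of magnitude above the thermodynamic floor, which is why the limit has no practical bite yet.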
tamat · 17h ago
Thanks for your explanation
naasking · 17h ago
> Also, why would keeping a history of all memory changes prevent losing heat?
How much power does persistent storage (a hard drive, an SSD) require to preserve its stored data? Zero, which is why it emits zero heat.
> Who needs to go back in time in their computations??
At its most basic level, erasing/overwriting data requires energy. This generates a lot of heat. Heat dissipation is a major obstacle to scaling chips down even further. If you can design a computer that doesn't need to erase nearly as much data, you generate orders of magnitude less heat, and this potentially opens up more scaling potential and considerable power savings.
imurray · 21h ago
I'm sceptical about the energy motivation, but there are multiple reasons why making invertible deep learning architectures can be interesting or useful. Cf, a series of workshops from 2019-2021: https://invertibleworkshop.github.io/
Since then diffusion models have been popular. Generating from these can be seen as a special case of a continuous time normalizing flow, and so (in theory) is a reversible computation. Although the distilled/fast generation that's run in production is probably not that!
Simulating differential equations is not usually actually reversible in practice due to round-off errors. But when done carefully, simulations performed in a computer can actually be exactly bit-for-bit reversible: https://arxiv.org/abs/1704.07715
imurray · 19h ago
Another machine learning paper ("ancient", 2015) where being able to exactly reverse a computation was useful: https://arxiv.org/abs/1502.03492
MangoToupe · 1d ago
What does it mean for computation to have a direction? Said direction does not seem to refer to causality, which seems to me to be the natural interpretation, i.e., producing inputs from outputs. It seems to me you'd necessarily need to run the program first with known inputs for that to work. So is this just about preserving state by default to make backtracking easier?
random3 · 1d ago
Yes, but at a physical level, so it needs different hardware. Deleting information (e.g. an AND gate maps two input bits to one output bit) generates heat, so you need different gates, like the Fredkin gate.
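For concreteness, the Fredkin gate is a controlled swap: it only permutes 3-bit states, so nothing is erased, and it is its own inverse. A quick sketch (the Python here is just an illustration):

```python
def fredkin(c, a, b):
    """Fredkin gate (controlled swap): if control bit c is 1, swap a and b."""
    return (c, b, a) if c else (c, a, b)

# Every 3-bit state maps to a unique 3-bit state, and the gate undoes itself:
states = [(c, a, b) for c in (0, 1) for a in (0, 1) for b in (0, 1)]
assert all(fredkin(*fredkin(*s)) == s for s in states)

# AND falls out as a special case: fredkin(x, 0, y)[1] == x AND y, with the
# other two outputs carried along as "garbage" so no information is destroyed.
assert fredkin(1, 0, 1)[1] == 1 and fredkin(1, 0, 0)[1] == 0
```

The "garbage" outputs are the price of reversibility: you keep extra bits around instead of dissipating them as heat.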
godelski · 1d ago
> What does it mean for computation to have a direction?
Actually, all computation has a directionality! This is actually a subject that I get really excited about ^__^
Think about this: we have a function f, with input x and output y: f(x) -> y. We'd even use that notation! This is our direction.
Now, the reverse actually gets a bit tricky. If the reverse is straightforward, our function has an inverse. But it might not always. Our function f(x)=mx+b is invertible, because we can write x = (f(x)-b)/m (well... m can't be 0), which provides a unique solution: every f(x) corresponds to a unique x. But if instead we have the function f(x) = x^2, this is not true! x = ±sqrt(f(x)), and here every f(x) corresponds to both x and -x! They are not unique.
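Both cases can be checked directly (m, b, and the sample points are arbitrary choices):

```python
def f_linear(x, m=2.0, b=3.0):
    return m * x + b

def f_linear_inv(y, m=2.0, b=3.0):
    return (y - b) / m  # requires m != 0

# The linear map round-trips to a unique x:
assert f_linear_inv(f_linear(5.0)) == 5.0

# x**2 does not: two distinct inputs collide on the same output,
# so no single-valued inverse can exist.
assert (-4.0) ** 2 == 4.0 ** 2 == 16.0
```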
We can start adopting the language of images and preimages if you want to start heading down that route. There's a lot of interesting phenomena here and if you hadn't already guessed it, yes, this is related to the P=NP problem!
An easy way to see this visually is to write down the computational circuit. Polylog actually has a really good video that will make the connection to P vs NP[0]
In the context of machine learning, a Normalizing Flow would be invertible, while a diffusion model is reversible. A pet peeve of mine is that in ML people (I'm an ML researcher) call it the "inverse problem", such as GAN-Inversion, but that is a misnomer and we shouldn't propagate it... This also has to do with the naivety of these statements...[1,2]. If you understand this you'll understand how one could make accurate predictions in one direction but fail in the other. Which really puts a whole damper on that causality stuff. Essentially, we run into the problem of generating counterfactuals.
> Said direction does not seem to refer to causality
Understanding this, I think you can actually tell that there's a direct relationship to causality here! In physics we love to manipulate equations around because... well... the point of physics is generating a causal mapping of the universe. But there are some problems... Entropy is the classic prime example (but there are many more in QM), and perhaps this led to Boltzmann's demise[3]. (This is also related to the phenomena of emergence and chaos.)
Here the issue is that we can take some gas molecules, run our computation forward and get a new "state" (configuration of our molecules). But now... how do we run this in reverse? We will not generate a unique solution, but instead we have a family of solutions.
Funny enough, you ran into this when you took calculus! That's why, when you integrated, your professor always got mad if you dropped the "+C"! So here you can see that differentiation isn't (necessarily) an invertible process: every f(x)+c maps to the same f'(x)! It is a many-to-one relationship, just like with f(x)=x^2.
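The "+C" point can be made numerically: functions differing only by a constant have (to within round-off) identical derivatives, so differentiation is many-to-one. The sample function and grid below are arbitrary:

```python
# Finite-difference derivatives of f(x) = x**2 and f(x) = x**2 + 7 on a grid:
# the constant 7 cancels, so it is unrecoverable from the derivative alone.
h = 1e-6
xs = [0.5 * i for i in range(10)]
d1 = [((x + h) ** 2 - x ** 2) / h for x in xs]
d2 = [(((x + h) ** 2 + 7) - (x ** 2 + 7)) / h for x in xs]
assert all(abs(a - b) < 1e-6 for a, b in zip(d1, d2))
```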
> So is this just about preserving state by default to make backtracking easier?
I think here we should have some more clarity? If not, think about our gas-distribution problem. If instead of just sampling at time 0 and time T we sampled at {0,t0,t1,...,T}, we greatly reduce the solution space, right? Because now our mapping from T->0 needs to pass through all such states. It's still a lot of potential paths, but it's still fewer...[4]
[0] https://www.youtube.com/watch?v=6OPsH8PK7xM
[1] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...
[2] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s
[3] The opening of Goodstein's States of Matter book (the standard graduate textbook on statistical mechanics). Be sure to also read the first line of the second paragraph: https://i.imgur.com/Dm0PeJU.png
[4] I know...
bravesoul2 · 1d ago
I am guessing it minimises irreversible operations (information deletion), i.e.:
2 + 2 + 2
<=> reversible
2 + 2 + 2, 2 + 4
<=> reversible
2 + 2 + 2, 2 + 4, 6
=> irreversible
6
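The example above can be sketched as a tiny reduction that keeps its full history, so every step is undoable; discarding the history is the only irreversible move:

```python
# Reduce "2 + 2 + 2" one addition at a time, keeping every intermediate state.
history = [(2, 2, 2)]
while len(history[-1]) > 1:
    s = history[-1]
    history.append((s[0] + s[1],) + s[2:])  # fold the first pair

print(history)  # [(2, 2, 2), (4, 2), (6,)]

# With the history intact, any step is trivially reversible:
assert history[-2] == (4, 2)

# The irreversible move is forgetting the history and keeping only (6,):
# many different sums reduce to 6, so the past cannot be reconstructed.
```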
MangoToupe · 1d ago
Interesting. I would have thought that a reversible computation would produce a new algorithm with the domain and range swapped. Naming truly is the final boss of computation. But now I see it's one-step backtracking in a way that allows saving energy, supposedly. Very much still reversible, but definitely nothing remotely comparable to time travel. "uncomputation" was the much, much better name.
Edit: i see now. Well, this is much less exciting than I thought. Still, I'm excited for all the other people excited.
bravesoul2 · 1d ago
Anything to get the CO2 PPM down is exciting, although this probably has no effect (because of Jevons paradox) and we just need more trees, wind and solar.
worldsayshi · 1d ago
I've been following Mike P Frank for a while on Twitter and he has often had interesting things to say about reversible computing and AI:
https://x.com/MikePFrank
I hope this will help when the next GPU datacenter driven software fad comes around. Though it probably won't, Jevons paradox and all.
rollcat · 16h ago
Says the headline of a web page that takes 12s to load on a modern machine.
People tend to ignore a problem if it's someone else's. The costs of [insert disruptive technology here] are largely externalised - on our natural environment, on individuals' livelihoods, on violated copyrights, on independent hosts' infrastructure, on pedestrians, on about-to-be burnt-out/jobless/homeless, etc. What you gain in efficiency, you will use to bring more for yourself, not to bring less harm to someone else. ¯\_(ツ)_/¯
throwawaymaths · 1d ago
what is the plan here? has anyone even demonstrated a reversible-compute matmul? there's a lot of information destruction that happens at the edges of even that transformation
YetAnotherNick · 1d ago
There of course is a reversible matmul for an invertible matrix. There is no reversible ReLU though.
But in any case I don't understand the claim of the article. If you can reverse the computation (say, only use invertible matrices), can you do it for less energy?
throwawaymaths · 1d ago
demonstrated is the key word here. you can make most circuits ~reversible by just shuffling garbage bits off to a pool that you destroy eventually (but i presume you'd destroy a minimal amount), but is the juice worth the squeeze? i would worry that the scaling factor for the floorplan is not linear in the matrix size (which is already O(n×k))
if you're also batching matmuls, isn't there an unavoidable information loss that happens when you evict the old data and drop in the new batch?
bravesoul2 · 1d ago
Isn't it all reversible if you keep the original data?
Let f: V -> V
g: V -> V x V is a reversible form of f, where g(v) = (v, f(v))
g'((v, w)) = v
g' can be "fooled" with a fake w, but that is of no concern here. We trust our own chip!
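This pairing trick is easy to write down (f here is an arbitrary, non-invertible stand-in); note that it enlarges the output space rather than giving a bijection, which is the caveat raised in the reply:

```python
def f(v):
    return v * v  # arbitrary stand-in; not invertible on its own

def g(v):
    # Carry the input alongside the output; nothing is forgotten.
    return (v, f(v))

def g_inv(pair):
    v, w = pair
    return v  # recover the input directly; w is ignored

assert g_inv(g(-3)) == -3  # even though f(-3) == f(3), g still round-trips
```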
thrance · 18h ago
Your g is not reversible though: its input space must have the same dimension as its output space. On the other hand, you're correct in that any irreversible function can be extended to a reversible one, although the process isn't always straightforward. The general way is to do something like:
f: V -> V
g: V x A -> V x A
with g(x, a) = (f(x), b) for some fixed value(s) of a. And b cannot be set to x, because then you can't recover a with g', and your function is not invertible.
cwillu · 1d ago
The irreversible component of a computation is what actually generates heat, more or less.
bee_rider · 1d ago
IMO it would be more accurate to say: the delete operation is the one that we know thermodynamics and information theory must charge us for, on a physics level.
In current computers, we’re nowhere near those limits anyway. Reversible computing is interesting research for the future.
thrance · 18h ago
Quantum computers, albeit useless so far, are a real-life example of reversible computers. So they can be achieved.
bee_rider · 16h ago
Quantum dot cellular automata looked like a good candidate for reversible computers in some post-cmos future, last time I looked at that stuff (years ago—so, it is probably time for a check-in). Notably, they do classical computing, they just exploit quantum effects for the logic gates.
physix · 1d ago
I'm wondering... if I do a reversible computation where no information is lost, does cutting the power to the unit create heat?
bee_rider · 1d ago
I guess it would depend on the physical design of the compute elements (reversible computing is generally associated with post-CMOS tech). But, cutting power doesn’t necessarily reset a machine in general… think of a mechanical computer, cutting power removes the ability to change state.
cwillu · 21h ago
“Cutting power” is a thing because current computers require a periodic (and rather frequent) refresh to maintain the state of the system (primarily, but not exclusively, RAM). And indeed, one useful tactic to maintain that state when you need to cut the power for some reason is to supercool the RAM so that it doesn't dissipate its charge as fast, basically making the system approach that ideal world.
hyghjiyhu · 1d ago
Idk how accurate it is, but my mental model of this would be that electricity is like a liquid, and when you flip the switch it drains out of the components, creating heat through friction.
thrance · 18h ago
A fully reversible computer would consume no power whatsoever, so it wouldn't require power to function. Instead, you would need power to (re)initialize its state, or to copy the results of its computations to e.g. your monitor. An unplugged reversible computer would be free to compute (and uncompute) its bits perpetually, unconcerned with the rest of the world. The functional programmer's dream, in a sense.
physix · 16h ago
Thanks for that and the preceding comments. I was thinking about the possible paradox of getting a reset for free. But, the cost invariably comes at some point, e.g. when you restart and need to reset your state.
TheDudeMan · 1d ago
Only at the theoretic limits of efficiency. In real life, not true.
throwawaymaths · 1d ago
i don't think that's categorically true. if the footprint from expanding out a circuit to make it reversible gets sufficiently large, other things like resistance can dominate.
bee_rider · 1d ago
Other things like resistance already dominate (and always have). Reversible computing is the result of exploring the thermodynamic/information theory limits of what computation must cost.
In current chips we just charge and dump a bunch of parasitic capacitances every clock cycle.
TheDudeMan · 1d ago
No. It is a purely theoretic result. It has zero real-world applicability.
ajb · 21h ago
People have been working on commercialising this stuff for decades, actually. The problem is that you are competing with the staggering level of investment that goes into CMOS. To get early revenue you need a niche which is forgiving of not being as good as CMOS in other dimensions - AI is almost certainly not that.
JoshuaDavid · 1d ago
LeakyReLU is reversible in all but the least significant bits, right?
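That intuition is easy to check. With a power-of-two slope the inversion is exactly bit-for-bit; with the usual slope of 0.01 the multiply/divide pair can disagree in the last bits, which is exactly the caveat in the comment. The slope value below is an arbitrary choice for the demonstration:

```python
def leaky_relu(x, slope=0.25):
    return x if x > 0 else slope * x

def leaky_relu_inv(y, slope=0.25):
    return y if y > 0 else y / slope

# With a power-of-two slope, multiply and divide are exact, so the
# round trip recovers the input bit-for-bit:
for x in (-3.7, -1.0, 0.0, 2.5):
    assert leaky_relu_inv(leaky_relu(x)) == x
```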
thrance · 18h ago
You're mistaking invertible matrices for reversible computing, which are two unrelated concepts*. You can devise a reversible implementation of ReLU (and of anything else, for that matter) using ancilla variables. Like in addition:
add(x, a) = (x + a, x - a), and add†(y, b) = ((y + b) / 2, (y - b) / 2)
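The add/add† pair above round-trips as claimed; a minimal check (the sample values are arbitrary, and chosen so the float arithmetic is exact):

```python
def add(x, a):
    return (x + a, x - a)  # keeps enough information to undo the sum

def add_dagger(y, b):
    return ((y + b) / 2, (y - b) / 2)  # recovers the original pair

x, a = 3.0, 5.0
assert add_dagger(*add(x, a)) == (x, a)  # the ancilla makes addition reversible
```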
It's a well-known thing in thermodynamics that a reversible process doesn't increase entropy (dissipate heat). So in theory, a reversible computer consumes no power whatsoever, as it doesn't leak any. In practice, most power leakage in real-life computers is due to wire resistance, not the irreversibility of computations. Also, even with a perfectly reversible CPU, you would need to expend some energy to (re)initialize the state of your computer (input) and to copy its results out (output). Alternatively, once a computation is done, you can always "uncompute" it to get back to the initial state without using any power, at the cost of time.
If you want an example of a real reversible computer, look into quantum computers, which are adiabatic (reversible) by necessity, in accordance with the laws of quantum physics.
* Actually, you can represent reversible gates with invertible matrices, and that has quite profound implications. A gate/operation is reversible if and only if its corresponding matrix is invertible. But let's not get into that here.
thrance · 18h ago
In Stephen Baxter's Time [1], in the far far future, when all the stars have died out and black holes have finally all evaporated, the descendants of mankind are left in a maximally entropic universe, with no free energy left at all. They are condemned to live in giant simulations powered by reversible computing (consuming no power), reliving the same events over and over again, as computations are uncomputed and then recomputed.
[1] https://en.wikipedia.org/wiki/Time_(novel)
it has been shown that many uses of LLMs require less electricity than boiling a kettle
charcircuit · 23h ago
This article is disappointing. Without any sort of benchmark or evidence of PyTorch supporting it, how can this compete with Nvidia? There is no proof that this is finally competitive against traditional chips.
Edit: One of their white papers mentions "Application Framework: A PyTorch-compatible interface supports both AI applications and general-purpose computing, ensuring versatility without sacrificing performance."
Huxley1 · 21h ago
I’ve been learning more about how AI systems actually work, and one thing I keep wondering is how much energy it all uses.
This idea of reversible computing was new to me. I didn’t know it was even possible to run computations “backwards” to save power.
It’s interesting that slowing things down might actually save more energy in the long run. I’ll definitely be reading more about this.