Google's Liquid Cooling

97 points by giuliomagnifico | 36 comments | 8/25/2025, 5:57:18 PM | chipsandcheese.com

Comments (36)

jonathaneunice · 1h ago
It’s very odd when mainframes (S/3x0, Cray, yadda yadda) have been extensively water-cooled for over 50 years, and super-dense HPC data centers have used liquid cooling for at least 20, to hear Google-scale data center design compared to PC hobbyist rigs. Selective amnesia + laughably off-target point of comparison.
legulere · 27m ago
It's not so surprising when considering Google's history coming from inexpensive commodity hardware. It's pretty similar to how it took decades for x86 servers and operating systems to gain mainframe functionality like virtualisation.

https://blog.codinghorror.com/building-a-computer-the-google...

spankalee · 59m ago
From the article:

> Liquid cooling is a familiar concept to PC enthusiasts, and has a long history in enterprise compute as well.

And for a while the trend in data centers was to move towards more passive cooling at the individual servers and hotter operating temperatures. This is interesting because it reverses that trend quite a bit, possibly because of the per-row cooling.

dekhn · 27m ago
We've basically been watching Google gradually re-discover all the tricks of supercomputing (and other high performance areas) over the past 10+ years. For a long time, websearch and ads were the two main drivers of Google's datacenter architecture, along with services like storage and jobs like mapreduce. I would describe the approach as "horizontal scaling with statistical multiplexing for load balancing".

That style of job worked well, but as Google has realized it has more high-performance computing with unique workload characteristics that are mission-critical (https://cloud.google.com/blog/topics/systems/the-fifth-epoch...), its infrastructure has had to undergo a lot of evolution to adapt to that.

Google PR has always been full of "look we discovered something important and new and everybody should do it", often for things that were effectively solved using that approach a long time ago. MapReduce is a great example of that- Google certainly didn't invent the concepts of Map or Reduce, or even the idea of using those for doing high throughput computing (and the shuffle phase of MapReduce is more "interesting" from a high performance computing perspective than mapping or reducing anyway).

liquidgecka · 1m ago
As somebody who worked on Google data centers after coming from a high-performance computing world, I can categorically say that Google is not “re-learning” old technology. In the early days (when I was there) they focused heavily on moving from thinking about computers to thinking about compute units. This is where containers and self-contained data centers came from. It was actually a joke inside Google because it failed, but it was copied by all the other vendors for years after Google had given up on it. They then moved from thinking about cooling as something that happens within a server case to something that happens to a whole facility. This was the first major leap forward, where they moved from cooling the facility and pushing conditioned air in to cooling the air immediately behind the server.

Liquid cooling at Google scale is different from mainframes as well. Mainframes needed to move heat from the core out to the edges of the server, where traditional data center cooling would transfer it away to be conditioned. Google's liquid cooling moves the heat completely outside of the building while it's still in the liquid. That's never been done before as far as I'm aware. Not at this scale, at least.

echelon · 33m ago
This abundance of marketing (not necessarily this blog post) is happening because of all the environmental chatter about AI and data centers recently.

Google wants you to know it recycles its water. It's free points.

Edit: to clarify, normal social media is being flooded with stories about AI energy and water usage. Google isn't greenwashing, they're simply showing how things work and getting good press for something they already do.

Legend2440 · 26m ago
The environmental impact of water usage seems way overblown to me.

Last year, U.S. data centers consumed 17 billion gallons of water. That sounds like a lot, but the US as a whole uses around 300 billion gallons of water every day. Water is not a scarce resource in much of the country.
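
A back-of-the-envelope comparison of those two figures (assuming, as stated, that the 17 billion gallons is an annual total and the 300 billion gallons is a daily total; the conversion below is mine, not from the comment):

    # Compare annual data-center water use to total daily US water use.
    dc_gal_per_year = 17e9      # gallons/year, US data centers (figure above)
    us_gal_per_day = 300e9      # gallons/day, US total (figure above)

    dc_gal_per_day = dc_gal_per_year / 365
    share = dc_gal_per_day / us_gal_per_day

    print(f"Data centers: ~{dc_gal_per_day/1e6:.0f} million gallons/day")  # ~47
    print(f"Share of US daily use: {share:.3%}")                           # ~0.016%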

foota · 9m ago
Personally, I agree. That said, I think it might be worth considering the impact of water usage in local areas that are relatively water constrained.
mlyle · 29m ago
Google uses plenty of water in cooling towers and chillers. Sure, water cooling loops to the server may reduce the amount a little bit compared to fans, but this isn't "recycling" in any normal sense.

It's mostly a play for density.

jeffbee · 1h ago
Hyper-scale data centers normally need not be concerned with power density, and their designers might avoid density because of the problems it causes. Arguably, modern HPC clusters that are still concerned about density are misguided. But when it comes to ML workloads, putting everything physically close together starts to bring benefits in terms of interconnectivity.
jonathaneunice · 16m ago
LOLWUT? Hyperscalers and HPC data centers have been very concerned about power and thermal density for decades IME. If you're just running web sites or VM farms, sure, keep racks cooler and more power efficient. But for those that deeply care about performance, distance drives latency. That drives a huge demand to "pack things closely" and that drives thermal density up, up, up. "Hot enough to roast a pig" was a fintech data center meme of 20 years ago, back at 20kW+ racks. Today you're not really cooking until you get north of 30kW.
jeffbee · 46s ago
Name a hyper-scale data center where the size and shape suggests that power density ever entered the conversation.
m463 · 11m ago
I wonder what the economics of water cooling really is.

Is it because chips are getting more expensive, so it is more economical to run them faster by liquid cooling them?

Or is it data center footprint is more expensive, so denser liquid cooling makes more sense?

Or is it that wiring distances (1ft = 1nanosecond) make dense computing faster and more efficient?

MurkyLabs · 3m ago
It's a mixture of both 2 and 3. The chips are getting hotter because they're compacting more stuff into a small space and throwing more power into them. At the same time, powering all the fans that cool the computers takes a lot of power (when you have racks and racks, those small fans add up quickly), and that heat is then blown into hot aisles that then have to circulate the heat to A/C units. With liquid cooling they're able to save costs due to lower electricity usage and having direct liquid-to-liquid cooling as opposed to chip->air->AC->liquid. ServeTheHome did a write-up on it last year: https://www.servethehome.com/estimating-the-power-consumptio...
summerlight · 5m ago
Not sure about classical computing demands, but I think wiring distances definitely matter for TPU-like memory heavy computation.
moffkalast · 4m ago
It's more of a testament to inefficiency, with TDPs rising year after year as losses get larger on smaller nodes. It's so atrocious that even in the consumer sector Nvidia can't design a connector that doesn't melt during normal usage, because their power draw has become beyond absurd.

People don't really complain about crappy shovels during a gold rush though unfortunately, they're just happy they got one before they ran out. It's a horrid situation all around.

michaelt · 1h ago
> TPU chips are hooked up in series in the loop, which naturally means some chips will get hotter liquid that has already passed other chips in the loop. Cooling capacity is budgeted based on the requirements of the last chip in each loop.

Of course, it's worth noting that if you've got four chips, each putting out 250W of power, and a pump pushing 1 litre of water per minute through them, water at the outlet must be about 14°C hotter than water at the inlet, because of the specific heat capacity of water. That's true whether the water flows through the chips in series or in parallel.
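
That 14°C figure checks out as a quick sanity calculation from water's specific heat capacity (the constants below are standard values, not from the comment):

    # Bulk temperature rise: dT = P / (m_dot * c_p), independent of whether
    # the chips are plumbed in series or in parallel.
    P_total = 4 * 250.0      # W, four chips at 250 W each
    flow_lpm = 1.0           # litres of water per minute through the loop
    c_p = 4186.0             # J/(kg*K), specific heat of water
    rho = 1.0                # kg per litre, density of water

    m_dot = flow_lpm * rho / 60.0    # kg/s of coolant
    dT = P_total / (m_dot * c_p)     # bulk inlet-to-outlet rise

    print(f"Bulk temperature rise: {dT:.1f} degC")   # ~14.3 degC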

foota · 1h ago
Hm... but when the chips are in series, the heat transfer from the last chip will be worse than when the chips are in parallel, because the rate of heat transfer is proportional to the temperature difference, and in the parallel case the water reaching that last chip starts at a lower temperature.
fraserphysics · 48m ago
One way to characterize the cost of cooling is entropy production. As you say, cooling is proportional to difference in temperature. However, entropy production is also proportional to temperature difference. It's not my field, but it looks like an interesting challenge to optimize competing objectives.
friendzis · 36m ago
While there is some truth to your comment, it has no practical engineering relevance. Since the energy transfer rate is proportional to the temperature difference, you compute the flow rate required, which is going to be different depending on whether the chips are in series or in parallel.
idiotsecant · 47m ago
It just means that in series, some of your chips get overcooled in order to achieve the required cooling on the hottest chip. You need to run more water for the same effect.
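
A small sketch of that series-vs-parallel difference, reusing the hypothetical numbers from above (four 250 W chips, 1 L/min total flow); the 25°C supply temperature is an assumed value, not from the thread:

    P_CHIP = 250.0            # W per chip
    N = 4
    C_P = 4186.0              # J/(kg*K), specific heat of water
    M_DOT_TOTAL = 1.0 / 60.0  # kg/s (1 L/min of water)
    T_SUPPLY = 25.0           # degC, assumed coolant supply temperature

    # Series: each chip heats the full stream before the next chip sees it.
    t = T_SUPPLY
    series_inlets = []
    for _ in range(N):
        series_inlets.append(round(t, 1))
        t += P_CHIP / (M_DOT_TOTAL * C_P)

    # Parallel: every chip sees supply-temperature water, but each branch only
    # carries a quarter of the flow, so each branch still exits ~14 degC warmer.
    parallel_inlets = [T_SUPPLY] * N

    print("Series chip inlet temps:  ", series_inlets)    # [25.0, 28.6, 32.2, 35.8]
    print("Parallel chip inlet temps:", parallel_inlets)  # [25.0, 25.0, 25.0, 25.0]
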
k7sune · 34m ago
I can imagine a setup where multiple streams of slower, cooler water converge into a faster, warmer stream, so the water extracts an equal amount of heat from all the chips whether they're upstream or downstream.
betaby · 1h ago
bri3d · 1h ago
Your linked articles are about immersion cooling, which is "liquid cooling," I suppose, but a very different thing. Do OVH actually use immersion cooling in production? This seems like a "labs" product that hasn't been fully baked yet.

OVH _do_ definitely use traditional water-loop/water-block liquid cooling like Google are presenting, described here: https://blog.ovhcloud.com/understanding-ovhclouds-data-centr... and visually in https://www.youtube.com/watch?v=RFzirpvTiOo , but while it looks very similar architecturally their setup seems almost comically inefficient compared to Google's according to their PUE disclosures.

jeffbee · 1h ago
And yet their claimed PUE is 1.26, which is pretty bad. One way to characterize that overall PUE figure is that they waste roughly 3x as much on overhead as Google (claimed 1.09 global PUE) or Meta (1.08).
BoppreH · 1h ago
I see frequent mentions of AI wasting water. Is this one such setup, perhaps with the CDU using the facility's water supply for evaporative cooling?
bri3d · 51m ago
The CDU is inside the datacenter and does strictly liquid-to-liquid exchange. It transfers heat from the rack block's coolant to the facility coolant. The facility then provides outdoor heat exchange for the facility coolant, which is sometimes accomplished using open-loop evaporative cooling (spraying down the cooling towers). All datacenters have some form of facility cooling, whether there's a CDU and local water cooling or not, so it's not particularly relevant.

The whole AI-water conversation is sort of tiring, since water just moves to more or less efficient parts or locations of the water cycle. I think a "total runtime energy consumption" metric would be much more useful, if it were possible to accurately price in water-related externalities (i.e., is a massive amount of energy spent moving water because a datacenter evaporates it, or is it no big deal?). And the whole thing really just shows how inefficient and inaccurately priced the market for water is, especially in the US where water rights, price, and the actual utility of water in a given location are often shockingly uncorrelated.

maartin0 · 27m ago
Lerc · 46m ago
I have encountered a lot of references to AI using water, but with scant details. Is it using water in the same way a car uses a road? The road remains largely unchanged?

The implication is clear that it is a waste, but I feel like if they had the data to support that, it wouldn't be left for the reader to infer.

I can see two models where you could say water is consumed: either talking about drinkable water rendered undrinkable, or turning water into something else where it is not practically recaptured. Turning it into steam, sequestering it in some sludge, etc.

Are these things happening? If it is happening, is it bad? Why?

I'd love to see answers on this, because I have seen the figures used like a cudgel without specifying what the numbers actually refer to. It's frustrating as hell.

tony_cannistra · 1m ago
This article will help you. https://www.construction-physics.com/p/how-does-the-us-use-w...

> ...actual water consumed by data centers is around 66 million gallons per day. By 2028, that’s estimated to rise by two to four times. This is a large amount of water when compared to the amount of water homes use, but it's not particularly large when compared to other large-scale industrial uses. 66 million gallons per day is about 6% of the water used by US golf courses, and it's about 3% of the water used to grow cotton in 2023.

jeffbee · 27m ago
The water is "used" in the sense that it evaporates, at a global average rate of 1 liter per kilowatt-hour of energy, Google claims.
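
For a sense of scale, applying that claimed 1 L/kWh rate to a hypothetical 100 MW facility (the 100 MW load is an assumption for illustration, not from the comment):

    rate_l_per_kwh = 1.0   # litres evaporated per kWh, the claimed global average
    facility_mw = 100.0    # assumed total facility load

    kwh_per_day = facility_mw * 1000 * 24          # kWh consumed per day
    litres_per_day = kwh_per_day * rate_l_per_kwh  # litres evaporated per day
    gallons_per_day = litres_per_day / 3.785       # US gallons per day

    print(f"~{litres_per_day/1e6:.1f} million litres/day "
          f"(~{gallons_per_day/1e6:.2f} million gallons/day)")  # ~2.4M L, ~0.63M gal
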
sleepydog · 43m ago
AWS had a similar article a couple months ago:

https://www.aboutamazon.com/news/aws/aws-liquid-cooling-data...

In either case I cannot find out how they dump the heat from the output water before recycling it. That's a problem I find far more interesting.

jeffbee · 48m ago
The reason you see frequent mentions of AI wasting water is there is a massive influence operation that seeks to destroy knowledge, science, and technology in the United States and return most of us to a state of bare subsistence, working menial jobs or reduced to literal slavery. There is no subjective measure by which the water used by AI is even slightly concerning.
wredcoll · 37m ago
I looked very hard but I don't see a way to subscribe to your newsletter?
stripe_away · 32m ago
> there is a massive influence operation that seeks to destroy knowledge, science, and technology in the United States

Agreed. It started with big tobacco discrediting the connection to lung cancer; the playbook was copied by many and weaponized by Russia.

> There is no subjective measure by which the water used by AI is even slightly concerning.

Does not follow from your first point. The water has to be sourced from somewhere, and debates over water rights are as old as civilization. For one recent example, see e.g. https://www.texaspolicy.com/legewaterrights/

You are probably correct that the AI does not damage the water, but unless there are guarantees that the water is rapidly returned "undamaged" to the source, there are many reasons to be concerned about who is sourcing water from where.

tom_ · 2m ago
Bro.