GMP damaging Zen 5 CPUs?

143 sequin 93 8/27/2025, 4:24:12 PM gmplib.org ↗

Comments (93)

T-A · 16m ago
A quick search on the NH-U9S shows it's a compact cooler for small systems, rated for up to 140 W (see e.g. [1]).

The 9950X's TDP (Thermal Design Power) is 170 W, its default socket power is 200 W [2], and with PBO (Precision Boost Overdrive) enabled it's been reported to hit 235 W [3].

[1] https://www.overclockersclub.com/reviews/noctua_nh_u9s_cpu_c...

[2] https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...

[3] https://www.tomshardware.com/pc-components/cpus/amd-ryzen-9-...

stouset · 6m ago
That’s a good catch, but don’t modern CPUs thermally throttle, rather than risk damage? Not that you should rely on this with an underpowered cooling solution but I would expect worse performance, not a fried chip.
topspin · 5h ago
As there is ongoing drama with Zen 5 and power issues, there are people with the instruments and the motivation to investigate this. You should consider contacting Gamers Nexus, and help them to get your test suite running. They can measure power draw and do a thermal analysis of this CPU, and they'd likely be eager to do it, given the possibility of making a bunch of dramatic YouTube content about design flaws in widely used hardware. That's pretty much their whole schtick in recent years.

> Modern CPUs measure their temperature and clock down if they get too hot, don't they?

Yes. It's rather complex now and it involves the motherboard vendor's firmware. When (not if) they get that wrong CPUs burn up. You're going to need some expertise to analyze this.

MrGilbert · 4h ago
> [...] a bunch of dramatic YouTube content [...]

That framing doesn't do him and the team justice. There is (or better, was) a 3.5h long story about NVIDIA GPUs finding their way illegally from the US to China, which got taken down by a malicious DMCA claim from Bloomberg. It is quite interesting to watch (it can be found on archive.org).

GN is one of the last pro-consumer outlets that keep on digging and shaking the tree big companies are sitting on.

topspin · 4h ago
For the record, I think GN is excellent and highly credible.
nerdsniper · 4h ago
Wendell at Level1Techs often goes more in-depth on the software testing and datacenter use-case analysis through partnerships with friends who run lots of machines in datacenters.

GN is unique in paying for silicon-level analysis of failures.

der8auer also contributes a lot to these stories.

I tend to wait for all 3 of their analyses, because each adds a different "hard-won" perspective.

fxtentacle · 5h ago
He's a bit sensationalist, yes, but I am thankful that he saved us from buying affected Intel CPUs.
bayindirh · 3h ago
He's a "student" and friend of the late Gordon Mah Ung. He's carrying his torch forward.

This was Gordon's style, and Steve is continuing it. He has the courage to hit Bloomberg offices with a cameraman, so I don't think his words ring hollow.

We need that kind of in your face, no punches held back type of reporting when compared to "measured professionals".

mft_ · 1h ago
Absolutely - this is the sort of direct citizen journalism I expect (sort of hope?) we'll see more of as traditional investigative journalism dies its slow death.
tpurves · 3h ago
Yes. When he's right, he's right. However the main issue I have with GN is how Steve tends to go full Leeroy Jenkins pitchforks and torches for 9 out of every 5 actual scandals in the tech industry.
CaptainBanger · 25m ago
I felt the same way, but over time I have come to respect those with the Crusader personality archetype, we need these people to do their thing and they need us to balance them out.
spookie · 5h ago
Not sure if he's sensationalist or just doing great reporting. I take him as one of the last good tech journalists on the platform.
hnuser123456 · 4h ago
GN wasn't the first to break the story that the 13th/14th gen was defective. The thousands and thousands of users experiencing the issues collectively noticed pretty quickly. If anything, there was a period where he was saying "We've talked to Intel but we won't say anything yet until they do."
BoorishBears · 4h ago
Dude has a cult of personality going and I've learned not to question it.

In general, PC enthusiasts have always treated these corporations a bit like sports teams.

wiredpancake · 39m ago
The only real problem with GN is Steve is a bit of an egotist when it comes to content creators who do less technical analysis, like LTT or Jayz.

He never really got over the stuff with Linus and doubled down on stupid things. I think they both have a great place in the tech scene, and LTT's recent videos have been of much better quality and better researched than those of yesteryear.

trebligdivad · 3h ago
They don't say what temperature the CPU was reporting which seems like an odd omission. Whatever the specs of your cooler etc check the temperature it's actually running at. Go by what the CPU is saying! I've got the older 3950x, and the first one died after a few months (still in warranty) with a cooler in spec, but it would go into the 90s at full load just doing big builds. I replaced the heatsink with a basic watercooler when the replacement chip arrived and it's running at least 20c cooler at full load.
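One way to follow the "go by what the CPU is saying" advice on Linux is to read the kernel's hwmon sysfs files directly. This is a minimal sketch, assuming the sensors are exposed under `/sys/class/hwmon` (on AMD CPUs that is typically the `k10temp` driver) and that values are in millidegrees Celsius, as the hwmon interface specifies. The function name `read_hwmon_temps` is made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip, label): degrees_C} for every temp*_input found under root."""
    readings = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            # temp1_input may have a matching temp1_label such as "Tctl"
            label_file = input_file.with_name(input_file.name.replace("_input", "_label"))
            label = label_file.read_text().strip() if label_file.exists() else input_file.name
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but unreadable; skip it
            readings[(chip, label)] = millidegrees / 1000.0
    return readings
```

Calling `read_hwmon_temps()` in a loop while the workload runs gives a rough temperature log; tools such as `sensors` from lm-sensors report the same underlying values.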
db48x · 5h ago
> The so-called TDP of the Ryzen 9950X is 170W. The used heat sinks are specified to dissipate 165W, so that seems tight.

TDP numbers are completely made up. They don’t correspond to watts of heat, or of anything at all! They’re just a marketing number. You can't use them to choose the right cooling system at all.

https://gamersnexus.net/guides/3525-amd-ryzen-tdp-explained-...

mrb · 48m ago
You are correct. In fact these guys measured a maximum socket power consumption of 240 watts using a 9950X at stock settings, running prime95. That's far above the "170 watt" TDP:

https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...

bayindirh · 3h ago
When I see the term TDP, I remember what I have read in the "Thermal Design Document" of Intel Core2Quad Q6600 and the family it belongs to:

> The thermal solution bundled with the CPUs is not designed to handle the thermal output when all the cores are utilized 100%. For that kind of load, a different thermal solution is strongly recommended (paraphrased).

I never used the stock cooler bundled with the processor, but what kind of dark joke is this?

lofaszvanitt · 33m ago
I always used the stock cooler, because it's quiet and nothing uses the cpu to its fullest :).
aidenn0 · 3h ago
I have a 65W TDP CPU, and the difference in power draw (measured at the outlet) from idle to full CPU load is over 100W; it seems to just raise the clock until it hits 95C, so if I limit the CPU fan's top speed, the power draw goes down.
db48x · 2h ago
Yep. Modern CPUs continually adjust their clock multiplier based on what their temperature is doing, plus a few timers. If you have a better cooler then you’ll get more performance out of the same CPU, but at the cost of drawing more power and producing more heat.
einpoklum · 4h ago
Wow, I can't believe how BS this TDP is! I feel like a total idiot! I've always assumed it's sorta-kinda a tight upper bound on power consumption, perhaps with some allowance for "imperfections" in the dissipation properties of the CPU, and that I shouldn't sweat the details.

Couldn't this count as false/misleading advertizing though?

gruez · 4h ago
It's thermal design power, ie. it's the power that it's designed for, not absolute max.
db48x · 4h ago
No, they don’t design the chip with these numbers in mind. The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.
gruez · 4h ago
>The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.

Are you just describing product segmentation? ie. how the ryzen 5700x and 5800x are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?

db48x · 2h ago
Yep. The 5800X is a higher bin specifically because it can clock higher than the ones in the 5700X bin. That certainly makes them draw more power, so they give them a higher TDP number too. But the TDP doesn’t have anything to do with how much power the cpu will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.
gruez · 59m ago
>But the TDP doesn’t have anything to do with how much power the cpu will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.

I don't get it, are you referring to the phenomenon that different workloads have different power consumption (eg. a bunch of AVX512 floating point operations vs a bunch of NOPs), therefore TDP is totally made up? I agree that there's a lot of factors that impact power usage, and CPUs aren't like a space heater where if you let it run at full blast it'll always consume the TDP specified, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or at the very least are vaguely correlated to some limit of the CPU (eg. PPT limit on AMD platforms).

wahern · 3h ago
That seems a little too cynical. It matters how a customer might use a chip, such as the type of cooling that would be expected in a typical system using that model, and that's informed by the advertised specifications. Base clocks and the amount of SRAM also figure into TDP. No doubt there are completely arbitrary aspects to TDP driven purely by profit-focused market segmentation, but it's not just that.

That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.

db48x · 2h ago
No, clock speed and cache have nothing to do with TDP. AMD uses a simple formula to calculate TDP. It is the temperature of the IHS minus the air temperature measured at the cpu cooler’s intake fan, divided by a conversion factor in °C/W.

But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.

I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU is dependent on the temperature too, since the colder you can make the CPU the more power it will voluntarily use (it just raises the clock multiplier until it measures the temperature of the CPU rising without leveling off). And as you said there are a bunch of other factors as well.
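The formula db48x describes can be written out as a short sketch. The function name and all input values below are illustrative placeholders, not vendor data; the point is only that two different sets of chosen temperatures and conversion factors can land on nearly the same "TDP".

```python
def tdp_watts(t_case_c, t_ambient_c, theta_ca_c_per_w):
    """TDP = (IHS temperature - intake air temperature) / conversion factor."""
    if theta_ca_c_per_w <= 0:
        raise ValueError("conversion factor (°C/W) must be positive")
    return (t_case_c - t_ambient_c) / theta_ca_c_per_w


# Two hypothetical input sets that produce roughly the same headline number
print(round(tdp_watts(61.8, 42.0, 0.189), 1))  # 104.8
print(round(tdp_watts(63.0, 42.0, 0.2), 1))    # 105.0
```

Nothing in the calculation involves measured electrical power, which is the core of the complaint in this subthread.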

kllrnohj · 4h ago
> Couldn't this count as false/misleading advertizing though?

For what, exactly? TDP stands for "thermal design power" - nothing in that means peak power or most power. It stopped being meaningful when CPUs learned to vary clock speeds and turbo boost - what is the thermal design target at that point, exactly? Sustained power virus load?

vel0city · 4h ago
It's pretty insane to see someone say something like: “TDP is about thermal watts, not electrical watts. These are not the same.” Watts are watts.

But yeah, TDP means nothing. If you stick plenty of cooling and run the right motherboard revision your "TDP" can be whatever you want it to be until the thing melts.

tux3 · 5h ago
The room temperature or precise way the paste was applied should not matter. Modern CPUs have very advanced dynamic voltage and frequency scaling (DVFS), which accounts for several sensors, including temperature.

These big x86 CPUs in stock configuration can throttle down to speeds where they can function with entirely passive cooling, so even if the cooler was improperly mounted, they'd only throttle.

All that to say, if GMP is causing the CPU to fry itself, something went very wrong, and it is not user error or the room being too hot.

mk_stjames · 4h ago
This was my first question as well- I thought it had been a long, long time since you could fry a CPU by taking away the heatsink.

As in... what, AMD K6 / early Pentium 4 days was the last time I remember hearing about cpu cooler failing and frying a cpu?

Twirrim · 3h ago
It was some time around then. I remember AMD being late to it vs Intel.
themafia · 2h ago
If the throttling is not stable it could increase stress on the part by creating a bunch of transient but large thermal cycles through the chip. It would need to have some kind of exponential backoff on throttle so it doesn't immediately try to raise the frequencies when the temperature slightly dips.
secabeen · 5h ago
I would be interested to see if they had the same result with PTM7950 thermal material instead of paste. I've seen significantly better temps with these modern phase-change compounds, and they essentially eliminate application errors.
FuriouslyAdrift · 4h ago
Most likely it's the motherboard. ASRock is getting nailed right now for unstable XMP and CPU voltages (it's recommended to undervolt a little just in case).

The Asus Prime B650M motherboards they are using aren't exactly high end.

wmf · 3h ago
Yikes, this is the cheapest motherboard and failed Hardware Unboxed VRM tests. https://youtu.be/DTFUa60ozKY?t=744
caycep · 2h ago
conversely the asrocks actually did pretty good in that test...
J_Shelby_J · 4h ago
My friend just had an ASRock board cook his AMD CPU. Apparently a very common problem.
aidenn0 · 3h ago
Can you link to a reputable source for what settings I should use on my asrock motherboard? I'd like to avoid this.
FuriouslyAdrift · 2h ago
No more than 1.2 volts on vsoc... but YMMV.

"According to new details from Tech Yes City, the problem stems from the amperage (current) supplied to the processor under AMD's PBO technology. Precision Boost Overdrive employs an algorithm that dynamically adjusts clock speeds for peak performance, based on factors like temperature, power, current, and workload. The issue is reportedly confined to ASRock's high-end and mid-range boards, as they were tuned far too aggressively for Ryzen 9000 CPUs."

https://www.tomshardware.com/pc-components/cpus/asrock-attri...

kvemkon · 4h ago
And the close-up photos of the socket with pins are missing.
craftkiller · 5h ago
Looking at the AM5 pinout[0], it looks like those pins are VDDCR and VSS. There might be a little bit of PCIe sprinkled in towards the outer edges, but I'm not 100% on the orientation of this pinout vs the orientation of the CPU. I don't know anything about electricity so I've got nothing else to add.

[0] https://upload.wikimedia.org/wikipedia/commons/2/2d/Socket_A...

raverbashing · 4h ago
This is a nice guess but the likelihood that actual silicon area is closely connected to the pins in that area is not so obvious
nsteel · 3h ago
Isn't almost every other pin going to be power/ground on a high-power chip like this? On both the package and the die.
bob1029 · 5h ago
Could be the power supply and load profile?

I've heard some really wild noises coming out of my zen4 machine when I've had all cores loaded up with what is best described as "choppy" workloads where we are repeatedly doing something like a parallel.foreach into a single threaded hot path of equal or less duration as fast as possible. I've never had the machine survive this kind of workload for more than 48 hours without some kind of BSOD. I've not actually killed a cpu yet though.

bee_rider · 5h ago
Is that, like, an intentional stress-test for the hardware that you’ve come up with?
bob1029 · 3h ago
No. It is just how the algorithms play out:

1. Evaluate population of candidates in parallel

2. Perform ranking, mutation, crossover, and objective selection in serial

3. Go to 1.

I can very accurately control the frequency of the audible PWM noise by adjusting the population size.
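The three steps bob1029 lists are the shape of a typical evolutionary loop. Here is a minimal sketch of that alternation between a parallel burst and a serial phase. The objective function, population size, and genome encoding are all hypothetical, and a thread pool is used only to keep the example self-contained; in CPython a CPU-bound workload would need a process pool or native code to actually load every core.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(candidate):
    # Hypothetical objective: maximise the number of 1 bits
    return sum(candidate)


def evolve(pop_size=32, genome_len=16, generations=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):
            # 1. Evaluate the population of candidates in parallel
            scores = list(pool.map(fitness, population))
            # 2. Ranking, selection, crossover and mutation run serially
            ranked = [c for _, c in sorted(zip(scores, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
            parents = ranked[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, genome_len)
                child = a[:cut] + b[cut:]               # one-point crossover
                child[rng.randrange(genome_len)] ^= 1   # single-bit mutation
                children.append(child)
            # 3. Go to 1 with the new population
            population = parents + children
    return max(population, key=fitness)
```

In this structure the population size sets how long each parallel burst lasts, which would fit the observation that it changes the frequency of the audible noise.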

fxtentacle · 5h ago
"We suspect that GMP's extremely tight loops around MULX make the Zen 5 cores use much more power than specified, making cooling solutions inadequate."

I feel like if this was heat related, the overall CPU temperature should still somewhat slowly creep up, thereby giving everything enough time for thermal throttling. But their discoloration sure looks like a thermal issue, so I wonder why the safety features of the CPU didn't catch this...

touisteur · 4h ago
I'm guessing the temperature could increase quite fast (milliseconds or less) in heavy duty areas, especially when going scalar-to-dense-vector operations.

My best understanding of the avx-512 'power license' debacle on Intel CPUs was that the processor was actually watching the instruction stream and computing heuristics to lower core frequency before reaching avx512 or dense-avx2 instructions. I guessed they knew or worried that even a short large-vector stint would fry stuff...

Apparently voltage and thermal sensors have vastly improved, and the crazy swings on NVIDIA GPUs' clocks seem to agree with this :-)

jeffbee · 5h ago
Are we talking "slowly" in a relative sense? A silicon die of this size has a thermal mass (guessing) around 10⁻³ J/K but a power dissipation rate over 200W, so it can rise from room temperature to junction temperature limits almost instantly.
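The arithmetic behind "almost instantly" can be sketched with a lumped model in which no heat leaves the die. The 10⁻³ J/K heat capacity is jeffbee's own guess and the 95 °C limit is an assumption for illustration; a larger heat capacity would stretch the time proportionally.

```python
def seconds_to_heat(delta_t_k, power_w, heat_capacity_j_per_k):
    """Time for a lumped mass to rise by delta_t_k if none of the heat escapes."""
    return delta_t_k * heat_capacity_j_per_k / power_w


heat_capacity = 1e-3   # J/K, the guess from the comment above
power = 200.0          # W
rise = 95.0 - 25.0     # K, room temperature to an assumed 95 °C limit

t = seconds_to_heat(rise, power, heat_capacity)
print(f"{t * 1e3:.2f} ms")                  # 0.35 ms
print(f"{power / heat_capacity:.0f} K/s")   # 200000 K/s
```

In a real package the heat spreader and cooler absorb most of that energy, so this is an upper bound on the heating rate rather than a prediction.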
topspin · 5h ago
People without a background in electronics don't appreciate what modern CPUs and GPUs are doing: the amount of current flowing through these devices is just mind blowing. With adequate cooling, a Ryzen 9 9950X is handling somewhere in the neighborhood of 150-200 amps under high load.
nisegami · 4h ago
I initially scoffed at the 150-200 amps. But I know core voltage is usually in the neighbourhood of 1V so to draw 200W, you really would have to basically be moving 200A of current. That's wild.
mlyle · 4h ago
Yup. P=IV is really surprising when you get to high power parts at low core voltages. Needless to say, you need lots of transistors and phases on voltage conversion, and you need lots and lots of plane area.

(And,... 200A is the average when dissipating 200W. So how high are the switching currents? ;)
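The P=IV arithmetic is easy to check with the power and voltage figures quoted in this thread. The function name is made up for illustration, and the result is an average current only, ignoring switching transients and regulator losses.

```python
def core_current_amps(power_w, core_voltage_v):
    """Average current from P = I * V, rearranged to I = P / V."""
    return power_w / core_voltage_v


for power_w, volts in [(200, 1.0), (200, 1.3), (300, 1.3)]:
    amps = core_current_amps(power_w, volts)
    print(f"{power_w} W at {volts} V -> {amps:.0f} A")
# 200 W at 1.0 V -> 200 A
# 200 W at 1.3 V -> 154 A
# 300 W at 1.3 V -> 231 A
```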

wtallis · 4h ago
AMD's desktop CPUs are still running at a bit more than 1V; 1.3-1.4V is what you'll see at the high end of the clock speed range. But power draw can easily be in the 250–300W range if you turn on the "PBO" automatic overclocking mode, so 200A is not really the upper bound.
jeffbee · 2h ago
What's really wild is with all the power scaling features the regulators have to step from zero to hundreds of amps in microseconds with very little overshoot. The power design for these modern systems is demanding.
nromiun · 5h ago
How is that possible? Even if the chip did not get enough cooling it should have been just throttled heavily.
jsheard · 4h ago
Modern silicon is so dense and heats up so fast that throttling is easier said than done. I think they have to model and predict the thermals ahead of time nowadays, because by the time they could react to a temp sensor alone, the chip might already be toast.
tliltocatl · 4h ago
Maybe the throttling circuitry/firmware simply doesn't have enough time to react.
giantg2 · 2h ago
Not that it makes a huge difference since they are supposed to downclock when hot, but what was the actual cooler being used? It doesn't say in the article. My guess is that it's aircooled being only 165W max, but aircooled is not recommended for most newer high end CPUs.
mastax · 4h ago
Enthusiast-oriented motherboards often default enable Precision Boost Overdrive, causing higher power and temperature limits for longer periods. To run the CPU at “stock” you need to go in and disable that. Their default Load Line Calibration might be aggressive as well.
tester756 · 5h ago
My Ryzen CPU recently died too! wtf
LASR · 1h ago
Zen5?
FuriouslyAdrift · 4h ago
ASRock motherboard?
tester756 · 3h ago
Gigabyte
gpapilion · 3h ago
Gradual damage is consistent with overheating. I've seen racks of servers do the same thing.

Overall, there is a continued challenge with CPU temperatures that requires much tighter tolerances in the thermal solution. The torque specs need to be followed, and it needs to be verified that they were met correctly in manufacturing.

wrs · 4h ago
No actual die temperature measurements? That would seem a lot more relevant than the ambient temperature.
wtallis · 4h ago
Die temperature readings aren't particularly helpful these days with desktop parts that will (depending on the power management settings) more or less keep increasing the clock speed until they reach ~90°C and just stay there. Upgrading from a bad/undersized heatsink can easily have only a tiny effect on temperature but have the effect of significantly increasing clock speed and power.
mqus · 43m ago
Aren't they at least useful for ruling out any anomalies there? Like the die temp being 110°C constantly? Imho the die temperature is very important here, even if not interesting.
tw04 · 5h ago
That looks like a combination of improperly mounting the heatsink and Noctua being wrong in their recommendation to offset it. I’d imagine for gaming cooling one side more makes sense but my completely uneducated guess is that GMP is working a different part of the CPU than gaming does.
toast0 · 5h ago
They had failures with standard mounting and offset mounting.

Also, take a look at a delidded 9950; the two cpu chiplets are to one side, the i/o chiplet is in the middle, and the other side is a handful of passives. Offsetting the heatsink moves the center of the heatsink 7mm towards the chiplets (the socket is 40mm x 40mm), but there's still plenty of heatsink over the top of the i/o chiplet.

This article has some decent pictures of delidded processors https://www.tomshardware.com/pc-components/overclocking/deli...

jsheard · 5h ago
This is what Zen5 looks like under the IHS: https://i.imgur.com/j85YUzX.jpeg

Everything is offset towards one side and the two CPU core clusters are way towards the edge, offset cooling makes sense regardless of usage.

pharrington · 4h ago
I'd assume both GMP and any CPU intensive game just prefer the performance cores.
jsheard · 4h ago
AMD's desktop chips don't have distinct P and E cores, they're all P cores. AMD do have an E core design but it's currently only used in mobile and server parts.
pharrington · 3h ago
Gotcha. Apparently Intel's marketing's gotten to me. I haven't really been keeping up with this stuff, so whenever I read about P & E cores in the past, I think I just assumed that was a thing both Intel & AMD were doing, without considering the source material too closely.
wtallis · 1h ago
AMD has definitely been moving in that direction, and arguably doing a better job of it than Intel. But for now, AMD's desktop parts are still built with the same CPU core chiplets as their server parts, and none of the server parts are using heterogenous cores yet (from AMD or Intel). At some point AMD could theoretically build a desktop processor from one Zen chiplet and one Zen-c chiplet, but there hasn't been a good reason to do that yet.
caycep · 4h ago
I wonder if the risk is mitigated if you turn off PBO and turn on Eco Mode?
on_the_train · 4h ago
What is gmp?
gus_massa · 4h ago
From https://gmplib.org/#WHAT

> What is GMP?

> The GNU Multiple Precision Arithmetic Library

> GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.

Many languages use it to implement long integers. Under the hood, they just call GMP.

IIUC the problem is related to the test suite, which is probably very handy if you ever want to fry an egg on top of your micro.

protomikron · 4h ago
Valid question, I think, in this context. I knew about the GNU multiprecision library, but thought that couldn't be it, as it's "just" a highly optimized low-level bit-fiddling lib (at least that's my expectation without looking into the source), so it's strange that it could be damaging hardware ...
kgwgk · 4h ago
The domain has the answer: https://gmplib.org/
beezle · 4h ago
At first I thought it was Green Mountain Power ;)
lloydatkinson · 4h ago
One day I’ll understand why some websites refuse to have a way of navigating to the home page. I had to edit the URL in the address bar.

I just wanted to find out what GMP is.

mjh2539 · 2h ago
arbitrary-precision/bignum library