> Once network IO became the main bottleneck, the faster CPU mattered less.
It is surprisingly hard to keep a modern CPU core properly saturated unless you are baking global illumination, searching for primes, or mining cryptocurrency. I/O and latency will almost always dominate at scale. Moving information is way more expensive than processing it.
nine_k · 2h ago
Big machines already are more like clusters (NUMA), where access to memory outside a core's domain is much slower. I suspect compute will be more and more dispersed within RAM.
Transputers just came 30+ years too early.
furkansahin · 2h ago
Certainly! That was the main reason we decided to try Postgres benchmarking, tbh.
malux85 · 34m ago
Or running molecular simulations, I can keep our whole cluster pegged at 100% CPU for weeks
yieldcrv · 2m ago
I've heard generative AI is better at that, or determining configurations and folding patterns
But I figure it is a broad field, so I'm curious what you're doing and if it is the best use of time and energy
I'm also assuming that the generative AI model wouldn't run on your machine well and need to be elsewhere
cma · 2h ago
Some major game engines still have large single thread bottlenecks. They aren't fundamental to the problem space though, more just from legacy engine design decisions.
SkiFire13 · 1h ago
Being single-thread bottlenecked doesn't mean they are actually saturating a CPU core; they may well be waiting for data from RAM much, if not most, of the time.
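A quick way to see this from userland (a rough Python sketch; interpreter overhead mutes the effect, but the gap usually still shows):

```python
# A loop that looks "100% busy" in top can still be memory-bound.
# Sequential access lets the hardware prefetcher hide RAM latency;
# random access mostly stalls on cache misses.
import random
import time

N = 1 << 22  # ~4M elements, far larger than per-core cache
data = list(range(N))
order = list(range(N))
random.shuffle(order)

def walk(indices):
    total = 0
    for i in indices:
        total += data[i]
    return total

t0 = time.perf_counter()
walk(range(N))          # sequential, prefetch-friendly
seq = time.perf_counter() - t0

t0 = time.perf_counter()
walk(order)             # random, cache-miss heavy
rnd = time.perf_counter() - t0

print(f"sequential: {seq:.2f}s  random: {rnd:.2f}s  ratio: {rnd/seq:.1f}x")
```

Both loops execute the same instruction count, so any gap between them is time the core spends waiting, not computing.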
NoMoreNicksLeft · 22s ago
The solution is simple. A bog-standard RISC instruction set, with 64 billion 256-bit registers. The operating system will come on the die, etched into the silicon. There will be no ram and no storage.
Aurornis · 1h ago
AMD’s fastest consumer CPUs are a great value for small servers. If you’re doing just one task (like in this article) the clock speed is a huge benefit.
The larger server grade parts start to shine when the server is doing a lot of different things. The extra memory bandwidth helps keep the CPU fed and the higher core count reduces the need for context switching because your workloads aren’t competing as much.
The best part about the AMD consumer CPUs is that you can even use ECC RAM if you get the right motherboard.
AnimalMuppet · 43m ago
Wait, what? ECC RAM for a consumer CPU? Does anyone sell motherboards like that?
kllrnohj · 35m ago
Literally the first Asrock motherboard I happened to click on has it listed as a feature: https://www.asrock.com/mb/AMD/X870%20Taichi%20Creator/index....
Asus has options as well, such as https://www.asus.com/motherboards-components/motherboards/pr...
I think it was more rare when AM5 first came out; there were a bunch of ECC-supporting consumer boards for AM4 and Threadripper.
dcm360 · 36m ago
Asrock Rack and Supermicro sell AM4/AM5 motherboards with support for ECC UDIMMs. Other vendors might state official support on workstation-class motherboards, and in general it might even work on motherboards without official support.
If you are running engineering (HPC) jobs like electrical simulation for chip design, the only two things you care about are CPU clock speed and memory read/write speed.
It's unfortunate that we can only have 16-core CPUs running at 5+ GHz. I would have loved a 32- or 64-core Ryzen 9. The software we use charges per core, so 30% less per-core performance means that much extra cost, easily an order of magnitude more than the price of a flagship server CPU. These licenses cost millions per year for a couple of 16-core seats.
So, in the end, CPU speed determines how fast and how economically chips get developed.
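Back-of-envelope math on why per-core speed dominates here (the license figure is made up, purely illustrative):

```python
# When licensing is per-core, a part with lower per-core performance
# needs more cores (hence more licenses) for the same throughput.
import math

license_per_core_year = 60_000.0   # hypothetical figure, not a real quote
fast_cores = 16                    # e.g. a 16-core 5+ GHz desktop part

def cores_for_same_throughput(base_cores, perf_ratio):
    """Cores needed on a part with `perf_ratio` of the per-core performance."""
    return math.ceil(base_cores / perf_ratio)

# 30% lower per-core performance -> ~43% more cores to license:
slow_cores = cores_for_same_throughput(fast_cores, 0.7)
extra_license_cost = (slow_cores - fast_cores) * license_per_core_year
print(slow_cores, extra_license_cost)
```

At these assumed rates the slower part needs 23 licensed cores instead of 16, which dwarfs any hardware price difference.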
CBLT · 17m ago
I'm in disbelief that the software you run is completely insensitive to IPC and instruction latency. Without those numbers, clock speed is meaningless.
adamcharnock · 4h ago
This is certainly interesting, and is something I often wonder about. In our case we mostly run Kubernetes clusters on EX130 servers (Xeon, 24c, 256GB). In this situation there are a lot of processes running, for which the increased core count and memory availability seem worth it. Particularly given the cost of 10-25G private networking for each server, lower node counts seem to come out more economical.
But with fewer processes I can totally believe this works out to be the better option. Thank you for the write-up!
furkansahin · 4h ago
Thanks for the comment! Yeah, if you are using the servers dedicated to yourself, and considering that the larger server packs more in, it definitely makes sense.
In our case, though, if we provide 1/48 of a 10Gbit network, it really doesn't work for our end customers. So we're trying to provide the VMs from a smaller but more up-to-date lineup.
justsomehnguy · 1h ago
> In this situation there are a lot of processes running, for which the increased core count and memory availability seem worth it.
It's always the workload type. For mixed environments (some workloads with a heavy constant load while others have only occasional spikes), increasing the RAM per node was the most important factor, and is what allowed us to actually decrease the node count. Whole racks with multiple switches were replaced by a single rack with a modest number of servers and a single stacked switch.
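A rough sketch of the node-count math behind that kind of consolidation (all numbers hypothetical):

```python
# If the fleet is RAM-bound rather than CPU-bound, quadrupling RAM per
# node divides the node count by roughly the same factor.
import math

total_ram_gib = 6144              # assumed aggregate working set
old_node_ram_gib = 256            # old nodes
new_node_ram_gib = 1024           # new, higher-RAM nodes

old_nodes = math.ceil(total_ram_gib / old_node_ram_gib)
new_nodes = math.ceil(total_ram_gib / new_node_ram_gib)
print(old_nodes, new_nodes)       # 24 nodes down to 6
```

Fewer nodes also means fewer switch ports and less inter-node traffic, which is where the rack and switch savings come from.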
theyinwhy · 23m ago
Unfortunately, the page's numbers are presented sloppily: a benchmark number with a dollar sign, different job counts, and lacking documentation. I wouldn't trust this data too much.
mrandish · 33m ago
As a consumer who nursed an overclocked 1080ti along for 2.5 gens longer than I would've liked thanks to crypto and then AI, I was reading this fearing a positive conclusion - thinking "Oh great, just when it's time to upgrade my 5600x CPU data centers will start driving up already over-priced consumer CPUs too."
Although said somewhat tongue in cheek, it has been a rough several years for tech hobbyist consumers. At least the end of Moore's law scaling and the bite of Dennard scaling combined to nerf generational improvements enough that getting by on existing hardware wasn't nearly as bad as it would've been 20 yrs ago.
Now that maybe the AI bubble is just starting to burst, we've got tariffs to ensure tech consumers still won't see undistorted prices. The silver lining in all this is that it got me into retro gaming and computing which, frankly, is really great.
chicagojoe · 1h ago
Consumer-grade CPUs aren't meant to be pushed with heavy load 24/7, meaning durability becomes another variable which, in my experience, will quickly outweigh the brief burst of speed.
toast0 · 1h ago
AMD uses the same chiplets for Epyc and Ryzen. The packaging is different, and the I/O dies are different, but whatever.
If you really care, you can buy an Epyc branded AM4 or AM5 cpu which has remarkably similar specifications and MSRP to Ryzen offerings.
wmf · 1h ago
If your software can handle machine failures, 20% extra performance is absolutely worth some extra failures.
bob1029 · 19m ago
I think this is the best path if your problem can support it.
I use a 5950X for running genetic programming and neuroevolution experiments and about once every 100 hours the machine will just not like the state/load it is experiencing and will restart. My approach is to checkpoint as often as possible. I restart the program the next morning and it deserializes the last snapshot from disk. Worst case, I lose 5 minutes of work.
This also helps with Windows updates, power outages, and EM/cosmic radiation.
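The pattern is roughly this (a minimal sketch; the filename and state layout are made up):

```python
# Checkpoint/restore with pickle and an atomic rename, so a crash or
# restart mid-write can never leave a corrupt snapshot behind.
import os
import pickle
import tempfile

SNAPSHOT = "experiment.ckpt"  # illustrative path

def save_checkpoint(state, path=SNAPSHOT):
    # Write to a temp file first, then atomically replace the old snapshot.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)     # atomic on POSIX and Windows

def load_checkpoint(path=SNAPSHOT):
    if not os.path.exists(path):
        return None           # fresh start
    with open(path, "rb") as f:
        return pickle.load(f)

# Usage: checkpoint every generation, resume after a crash or reboot.
state = load_checkpoint() or {"generation": 0, "population": []}
state["generation"] += 1
save_checkpoint(state)
```

The atomic `os.replace` is the important bit: if the box dies during the dump, the previous snapshot is still intact on disk.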
magicalhippo · 4h ago
Well from what I can see the faster one also has 3D V-cache, so not apples to apples.
That said, at such low core count the primary Epyc advantage is PCIe lanes no?
furkansahin · 4h ago
Yes, you're right, but we tried to keep the workloads less cache-dependent.
Also, EPYC's PCIe advantage doesn't hold for the Hetzner-provided server setup, unfortunately, because the configurator allows the same number of devices to be attached to both servers.