Benchmark Framework Desktop Mainboard and 4-node cluster

geerlingguy · 8/7/2025, 5:49:49 PM · github.com

Comments (8)

mhitza · 9m ago
I've run a comparison benchmark for the smaller models https://gist.github.com/mhitza/f5a8eeb298feb239de10f9f60f841...

I compared it against the RTX 4000 SFF Ada (20GB), which is around $1.2k (if you believe the original price on the nvidia website https://marketplace.nvidia.com/en-us/enterprise/laptops-work...) and which I have access to on a Hetzner GEX44.

I'd ballpark it at 2.5-3x faster than the desktop, except for the tg128 test, where the difference is minimal (though I didn't do the math).
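For anyone who wants to do that math, here's a trivial sketch; the tokens/s figures are hypothetical placeholders, not measurements from either machine:

    # Hypothetical llama-bench tokens/s figures -- placeholders only,
    # not measurements from the GEX44 or the Framework Desktop.
    results = {
        "pp512": {"rtx_4000_sff": 2900.0, "framework_desktop": 1050.0},
        "tg128": {"rtx_4000_sff": 48.0, "framework_desktop": 42.0},
    }

    for test, machines in results.items():
        ratio = machines["rtx_4000_sff"] / machines["framework_desktop"]
        print(f"{test}: RTX 4000 SFF Ada = {ratio:.2f}x the Framework Desktop")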

jeffbee · 1h ago
I had been hoping that these would be a bit faster than the 9950X because of the different memory architecture, but it appears that due to the lower power design point the AI Max+ 395 loses across the board, by large margins. So I guess these really are niche products for ML users only, and people with generic workloads that want more than the 9950X offers are shopping for a Threadripper.
dijit · 1h ago
Sounds about right.

I’m struggling to justify the cost of a Threadripper (let alone a Pro!) for an AAA game studio, though.

I wonder who can justify these machines. High-frequency trading? Data science? Shouldn't that be done on servers?

kadoban · 19m ago
Threadripper very rarely seems to make sense. The only times it seems like you'd want it are for huge memory support/bandwidth and/or a huge number of PCIe slots. But it's not cheap enough, or well enough supported compared to Epyc, to really make sense any time I've been speccing out a system along those lines.
jeffbee · 24m ago
Yeah, I don't get it either. To get marginally more resources than the 9950X, you have to make a significant leap in price to a $1500+ CPU on a $1000 motherboard.
rtkwe · 34m ago
It also seems like the tools aren't there to fully utilize them. Unless I misunderstood, he was running CPU-only for all the tests, so there's still iGPU and NPU performance going unused here.
geerlingguy · 27m ago
No, only a couple initial tests with Ollama used CPU. I ran most tests on Vulkan / iGPU, and some on ROCm (read further down the thread).

I found it difficult to install ROCm on Fedora 42, but after upgrading to Rawhide it was easy, so I re-tested everything with ROCm vs. Vulkan.

Ollama, for some silly reason, doesn't support Vulkan, even though I've used a fork many times to get full GPU acceleration with it on Pi, Ampere, and even this AMD system... (moral of the story: just stick with llama.cpp).
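If you want to script the Vulkan vs. ROCm comparison yourself, here's a minimal sketch, assuming a recent llama.cpp checkout built twice (once with -DGGML_VULKAN=ON, once with -DGGML_HIP=ON) into build-vulkan/ and build-rocm/; the binary and model paths are placeholders:

    import subprocess

    # Assumed layout: two separate llama.cpp builds, one per backend.
    BENCH_BINARIES = {
        "vulkan": "build-vulkan/bin/llama-bench",
        "rocm": "build-rocm/bin/llama-bench",
    }
    MODEL = "models/model-q4_k_m.gguf"  # placeholder model path

    for backend, binary in BENCH_BINARIES.items():
        print(f"=== {backend} ===")
        # -ngl 99 offloads all layers to the iGPU; -p 512 / -n 128 are
        # the standard pp512/tg128 tests llama-bench runs by default.
        subprocess.run(
            [binary, "-m", MODEL, "-ngl", "99", "-p", "512", "-n", "128"],
            check=True,
        )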

edwinjones · 1m ago
Sadly, the reason they give is terrible:

https://x.com/ollama/status/1952783981000446029

No experimental flag option, no "you can use the fork that works fine, but we don't have the capacity to support it", just a hard "no, we think it's unreliable". I guess they just want you to drop them and use llama.cpp.