Ask HN: Anyone using a Linux machine for local inference?

2 points by throwaw12 on 7/29/2025, 11:11:16 AM · 2 comments
Hey there,

Is anyone here using a Linux machine with 256GB or 512GB of RAM to run the latest models locally?

I am considering buying a new laptop/desktop to run models locally. Most benchmarks I see are for Apple M-series chips with MLX, and even then, for big models (>300B parameters), people are using quantized versions (3-bit, 4-bit), which causes a drop in quality.

If anyone has used Linux with >256GB of RAM and no dedicated GPU, how has your experience been?
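For context, the kind of setup I'm picturing is a plain CPU-only run with llama-cpp-python, something like the sketch below (the model file and settings are hypothetical placeholders, not a recommendation):

    # CPU-only inference sketch using llama-cpp-python (pip install llama-cpp-python).
    # The GGUF path and parameters are placeholders; a 4-bit quant of a ~300B-parameter
    # model needs enough RAM to hold the whole file.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/big-model-Q4_K_M.gguf",  # hypothetical file name
        n_ctx=8192,       # context window
        n_threads=32,     # roughly match physical cores
        n_gpu_layers=0,   # no dedicated GPU, everything stays in system RAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello, are you running entirely in RAM?"}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])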

Comments (2)

compressedgas · 11h ago
Running LLMs on CPU only is too slow.
incomingpain · 11h ago
I've tried this with DeepSeek R1; I got about 2 tokens/second, and each response took 10-15 minutes.

The hardware was free to me, but building this yourself would cost thousands. You might as well just hit up an API: https://openrouter.ai/deepseek/deepseek-r1-0528/providers

Even if you hammer it, it'll only be $10.
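A minimal sketch of that, assuming the usual OpenAI-compatible OpenRouter endpoint and an API key in an environment variable:

    # Calling DeepSeek R1 on OpenRouter instead of buying hardware.
    # Assumes `pip install openai` and an OpenRouter key in OPENROUTER_API_KEY.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    resp = client.chat.completions.create(
        model="deepseek/deepseek-r1-0528",
        messages=[{"role": "user", "content": "Summarize this function for me..."}],
    )
    print(resp.choices[0].message.content)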

>Most benchmarks I see are for Apple M-series chips with MLX

A Mac mini with a Pro chip and 64GB of RAM is actually suspiciously good value. Something like $4,000... a bit high, but it can be your workstation.

The GPU and system memory are unified, so you can load up bigger models. It's not the same speed as high-end GPUs, but it's also not the same power draw; you'll stay under 200 watts.

Obviously 64GB doesn't let you run full DeepSeek or similar either, but those 32B-70B models are ideal anyway.

At a somewhat cheaper price, there are mini PCs with the AMD Ryzen AI Max+ 395. Same idea as the Mac mini, and you can get 64-128GB of RAM. Intel has a similar chip.

You'll get 15-20 tokens/s from a 32B model, which is slow if you're coding.
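That figure lines up with a rough bandwidth estimate (the bandwidth and efficiency numbers below are assumptions, not vendor specs):

    # Token generation on CPU/iGPU is mostly memory-bandwidth bound:
    # tokens/s ~= usable bandwidth / bytes read per token (~model size for a dense model).
    params = 32e9              # 32B dense model
    bytes_per_weight = 0.5     # ~4-bit quantization
    model_bytes = params * bytes_per_weight   # ~16 GB of weights read per token
    bandwidth = 256e9          # assumed ~256 GB/s unified LPDDR5X bandwidth
    efficiency = 0.7           # you rarely hit the theoretical peak

    print(f"~{bandwidth * efficiency / model_bytes:.0f} tokens/s")  # ~11, same ballpark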

Now, you could look into high-end GPUs: get a server mobo with 10 PCIe slots and load it up with 16GB cards for 160GB of VRAM. But you'll need special electrical plugs, and it'll idle at around 600 watts, costing about $100/month in electricity. But man, that thing would be great, so fast.
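The $100/month figure is just idle power times an assumed electricity rate:

    # Monthly cost of a box idling at ~600 W; the $/kWh rate is an assumption.
    idle_watts = 600
    kwh_per_month = idle_watts * 24 * 30 / 1000      # 432 kWh
    rate_usd_per_kwh = 0.23                          # plug in your local rate
    print(f"~${kwh_per_month * rate_usd_per_kwh:.0f}/month")  # ~$99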