Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

58 points by b4rtazz | 7 comments | 9/6/2025, 10:59:12 AM | github.com

Comments (7)

dingdingdang · 24m ago
Very impressive numbers... wonder how this would scale on 4 relatively modern desktop PCs, say something akin to an 8th-gen i5 Lenovo ThinkCentre; these can be had very cheap. But as @geerlingguy indicates, we need model compatibility to go up, up, up! As an example, it would be amazing to see something like fastsdcpu run distributed, to democratize the accessibility and practicality of image-gen models for people with limited budgets but large PC fleets ;)
rthnbgrredf · 11m ago
I think it is all well and good, but the most affordable option is probably still to buy a used MacBook with 16, 32, or 64 GB of unified memory (depending on the budget) and install Asahi Linux for tinkering.

Graphics cards with a decent amount of memory are still massively overpriced (even used), big, noisy, and draw a lot of power.

mehdibl · 22m ago
1. This is Q4.

2. This remains slow.

3. The context window used here is likely 8k or similar, which makes it unusable for bigger inputs/outputs.

Models already work fine on phones. Just try https://github.com/google-ai-edge/gallery and you will see local AI running on a phone without issue.
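(On point 1, some back-of-the-envelope math shows why Q4 is the natural choice for this cluster. A minimal sketch; the ~4.5 bits/weight figure assumes a Q4_K-style quant, and the 8 GB Pi 5 variant is assumed; neither detail is stated in the post.)

```python
# Back-of-the-envelope weight memory for a 30B-parameter model
# (assumed ~4.5 bits/weight for a Q4_K-style quant; illustrative only).

params = 30e9                       # Qwen3 30B A3B total parameters
q4_gb   = params * 4.5 / 8 / 1e9    # ~17 GB -> fits across 4x Pi 5 (8 GB each, 32 GB total)
fp16_gb = params * 16  / 8 / 1e9    # ~60 GB -> far beyond the cluster's RAM

print(f"Q4 weights:   ~{q4_gb:.0f} GB")
print(f"FP16 weights: ~{fp16_gb:.0f} GB")
```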

geerlingguy · 55m ago
distributed-llama is great; I just wish it worked with more models. I've been happy with its ease of setup and ongoing maintenance compared to Exo, and with its performance versus llama.cpp's RPC mode.
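(For context on the model-compatibility wish: distributed-llama shards each layer's weights across workers, i.e. tensor parallelism, which is part of why every new architecture needs explicit support. A toy sketch of the core idea, column-slicing one matmul across workers; this illustrates the technique and is not distributed-llama's actual code.)

```python
import numpy as np

def sharded_matmul(x, W, n_workers):
    """Column-slice W across workers; each computes a shard of the output.
    In a real cluster the partial results are gathered over the network."""
    shards = np.array_split(W, n_workers, axis=1)
    partials = [x @ shard for shard in shards]  # one matmul per worker
    return np.concatenate(partials, axis=-1)

x = np.random.randn(1, 64)
W = np.random.randn(64, 256)
assert np.allclose(sharded_matmul(x, W, 4), x @ W)  # matches the unsharded result
```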
alchemist1e9 · 22m ago
Any pointers to what is SOTA for a cluster of hosts with CUDA GPUs, each with too little VRAM for the full weights, but connected by low-latency 10 Gbit links?

If that problem gets solved, even if only with a batch approach that enables parallel batch inference (high total token/s but low per-session throughput), and for bigger models, then it would be a serious game changer for large-scale, low-cost AI automation without billions in capex. My intuition says it should be possible, so perhaps someone has done it or started on it already.
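(One shape that batch approach could take is pipeline parallelism: each host keeps only a contiguous slice of layers in VRAM, and micro-batches stream through the hosts so all GPUs stay busy. Per-request latency is poor, one full pipeline traversal, but aggregate token/s scales with the number of in-flight micro-batches. A minimal scheduling sketch follows, with a hypothetical Stage class standing in for a real GPU host; this is not any existing library's API.)

```python
# Pipeline-parallel batch inference, sketched: each Stage stands in for a
# CUDA host holding only its slice of the model's layers, so no host needs
# VRAM for the full weights. Activations advance one stage per tick.

class Stage:
    def __init__(self, name):
        self.name = name

    def forward(self, activations):
        # Real version: run the local transformer layers on the GPU, then
        # send the activation tensor to the next host over the 10 Gbit link.
        return activations + [self.name]

def run_pipeline(stages, micro_batches):
    slots = [None] * len(stages)   # activation currently sitting at each stage
    done, feed = [], iter(micro_batches)
    while len(done) < len(micro_batches):
        # Drain back-to-front so every activation advances exactly one stage.
        for s in reversed(range(len(stages))):
            if slots[s] is None:
                continue
            out = stages[s].forward(slots[s])
            slots[s] = None
            if s + 1 == len(stages):
                done.append(out)
            else:
                slots[s + 1] = out
        slots[0] = next(feed, None)  # admit the next micro-batch, if any
    return done

if __name__ == "__main__":
    stages = [Stage(f"host{i}") for i in range(4)]
    batches = [[f"mb{i}"] for i in range(8)]
    print(run_pipeline(stages, batches))
```

(Note that only activations cross hosts between stages, never weights, so the per-hop traffic is small relative to a 10 Gbit link; the interconnect mostly determines per-token latency, not total throughput.)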

echelon · 37m ago
This is really impressive.

If we can get this down to a single Raspberry Pi, then we have crazy embedded toys and tools. Local, at the edge, with no internet connection.

Kids will be growing up with toys that talk to them and remember their stories.

We're living in the sci-fi future. This was unthinkable ten years ago.

taminka · 20m ago
i feel sorry for your kids if you think this shit is inspiring lol

chatgpt is literally leading people with higher education to have full-on psychosis by feeding into their insane delusions and confirmation bias; i'm sure a less smart version of this is the perfect toy for a kid w/o a fully developed brain yet

literally go touch grass bro...