I'm curious about the applications though. Do people randomly buy 4xRPi5s that they can now dedicate to running LLMs?
ryukoposting · 1h ago
I'd love to hook my development tools into a fully-local LLM. The question is context window and cost. If the context window isn't big enough, it won't be helpful for me. I'm not gonna drop $500 on RPis unless I know it'll be worth the money. I could try getting my employer to pay for it, but I'll probably have a much easier time convincing them to pay for Claude or whatever.
rs186 · 22m ago
$500 gets you about six 8GB RPi 5s (~$80 list) or four 16GB ones (~$120 list), excluding accessories or other necessary equipment to get this working.
You'll be much better off spending that money on something else more useful.
behnamoh · 7m ago
> $500
Yeah, like a Mac Mini or something with better bandwidth.
exitb · 52m ago
I think the problem is that buying multiple Raspberry Pis is never the cost-effective way to run heavy loads.
halJordan · 51m ago
This is some sort of joke right?
numpad0 · 47m ago
MI50 is cheaper
6r17 · 13m ago
I mean, at this point it's more of a "proof-of-work" with a shared BP; I could definitely see some domotics hacker get this running. Hell, maybe I'll do it too if I have some spare time and want to make something like Alexa with customized stuff. It would still need text-to-speech and speech-to-text, but that's not really the topic of his setup. Even for pro use, if it's really usable, why not just spawn Qwen on ARM if that's cheaper? There are a lot of ways to read and leverage such a bench.
hhh · 52m ago
I have clusters of over a thousand Raspberry Pis that generally have 75% of their compute and 80% of their memory completely unused.
Moto7451 · 47m ago
That’s an interesting setup. What are you doing with that sort of cluster?
estimator7292 · 6m ago
99.9% of enthusiast/hobbyist clusters like this are exclusively used for blinkenlights
larodi · 38m ago
Is it solar powered?
tarruda · 51m ago
I suspect you'd get similar numbers with a modern x86 mini PC that has 32GB of RAM.
dingdingdang · 3h ago
Very impressive numbers... I wonder how this would scale on 4 relatively modern desktop PCs, say something akin to an 8th-gen i5 Lenovo ThinkCentre; these can be had for very cheap. But as @geerlingguy indicates, we need model compatibility to go up, up, up! As an example, it would be amazing to see something like fastsdcpu run distributed, to democratize the accessibility and practicality of image-gen models for people with limited budgets but large PC fleets ;)
rthnbgrredf · 3h ago
I think it is all well and good, but the most affordable option is probably still to buy a used MacBook with 16, 32, or 64 GB of unified memory (depending on the budget) and install Asahi Linux for tinkering.
Graphics cards with a decent amount of memory are still massively overpriced (even used), big, noisy, and draw a lot of power.
jibbers · 1h ago
Get an Apple Silicon MacBook with a broken screen and it’s an even better deal.
ivape · 1h ago
It just came to my attention that the 2021 M1 Max with 64 GB is less than $1,500 used. That's 64 GB of unified memory at regular laptop prices, so I think people will be well equipped with AI laptops rather soon.
Apple really is #2 and probably could be #1 in AI consumer hardware.
jeroenhd · 1h ago
Apple is leagues ahead of Microsoft with the whole AI PC thing and so far it has yet to mean anything. I don't think consumers care at all about running AI, let alone running AI locally.
I'd try the whole AI thing on my work MacBook, but Apple's built-in AI stuff isn't available in my language, so perhaps that's also why I haven't heard anybody mention it.
ivape · 27m ago
People don’t know what they want yet, you have to show it to them. Getting the hardware out is part of it, but you are right, we’re missing the killer apps at the moment. The very need for privacy with AI will make personal hardware important no matter what.
j45 · 1h ago
Connect a GPU to it with an eGPU chassis and you're running one way or the other.
mmastrac · 54m ago
Is the network the bottleneck here at all? That's impressive for a gigabit switch.
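A rough back-of-envelope (all numbers below are assumptions for an 8B-class model, not measurements from this setup) suggests raw bandwidth probably isn't the wall; latency is the more likely limit:

    # Hypothetical per-token sync traffic for tensor-parallel inference.
    layers = 32          # assumed 8B-class model depth
    hidden = 4096        # assumed embedding width
    act_bytes = 2        # fp16 activations
    nodes = 4
    syncs_per_layer = 2  # e.g. after attention and after the MLP

    # Each node exchanges its 1/nodes activation slice with the others.
    per_token = layers * syncs_per_layer * hidden * act_bytes * (nodes - 1) / nodes
    gige = 1e9 / 8       # gigabit Ethernet in bytes/s

    print(f"{per_token / 1e6:.2f} MB/token")                # ~0.39 MB
    print(f"bandwidth cap: ~{gige / per_token:.0f} tok/s")  # ~318 tok/s

So the gigabit pipe itself has headroom at these token rates; the cost of each sync being a blocking round trip through the switch is probably what hurts first.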
geerlingguy · 4h ago
distributed-llama is great, I just wish it would work with more models. I've been happy with ease of setup and its ongoing maintenance compared to Exo, and performance vs llama.cpp RPC mode.
alchemist1e9 · 3h ago
Any pointers to what is SOTA for cluster of hosts with CUDA GPUs but not enough vram for full weights, yet 10Gbit low latency interconnects?
If that problem gets solved, even if only for a batch approach that enables parallel batch inference, resulting in high total tokens/s but low per-session speed, and for bigger models, then it would be a serious game changer for large-scale, low-cost AI automation without billions in capex. My intuition says it should be possible, so perhaps someone has done it or started on it already.
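The usual shape of this is pipeline parallelism with micro-batching: shard the layers across hosts so no single GPU needs the full weights, then keep every host busy by streaming micro-batches through the pipe. I believe vLLM supports multi-node tensor/pipeline parallelism (via Ray), though how well it tolerates 10Gbit links is another question. A toy sketch of the schedule (all names hypothetical; transport and KV-cache handling omitted):

    # Toy pipeline-parallel schedule: each host owns a contiguous slice
    # of layers; micro-batches stream through so every host stays busy.
    def run_stage(layer_slice, inbox):
        for mb in inbox:                 # activations from the previous host
            for layer in layer_slice:
                mb = layer(mb)
            yield mb                     # ship downstream (the 10Gbit link)

    layers = [lambda x: x + 1 for _ in range(32)]   # stand-ins for real layers
    stream = iter(range(8))                         # 8 micro-batches in flight
    for host in (layers[i:i + 8] for i in range(0, 32, 8)):
        stream = run_stage(host, stream)            # chain 4 "hosts"

    print(list(stream))  # every micro-batch passed through all 32 layers

With S stages and M micro-batches in flight, steady-state utilization is roughly M / (M + S - 1): high aggregate tokens/s, but each session still pays a full pipeline traversal, which is exactly the high-throughput, low-per-session trade-off described above.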
kosolam · 2h ago
How is this technically done? How does it split the query and aggregate the results?
magicalhippo · 1h ago
From the readme:
> More devices mean faster performance, leveraging tensor parallelism and high-speed synchronization over Ethernet.
> The maximum number of nodes is equal to the number of KV heads in the model #70.
I found this[1] article nice for an overview of the parallelism modes.
I imagine it might also be limited by the number of layers, and you'll hit diminishing returns at some point from network latency.
[1]: https://medium.com/@chenhao511132/parallelism-in-llm-inferen...
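A minimal numpy sketch of the tensor-parallel idea, as I understand it (shapes and names are illustrative, not distributed-llama's actual code): each node owns a slice of the heads and computes attention for that slice independently, which is also why the node count can't exceed the KV head count; every node needs at least one head.

    import numpy as np

    n_heads, seq_len, head_dim = 8, 128, 64
    n_nodes = 4  # must divide the head count; hence the node limit above

    def node_attention(q, k, v):
        # q, k, v: this node's slice of heads, shape (heads, seq, dim)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        return w @ v

    q, k, v = (np.random.randn(n_heads, seq_len, head_dim) for _ in range(3))

    # Root shards the heads; each node computes its slice locally; the
    # outputs are gathered back over Ethernet (the per-layer sync step).
    shards = np.split(np.arange(n_heads), n_nodes)
    out = np.concatenate([node_attention(q[s], k[s], v[s]) for s in shards])
    assert out.shape == (n_heads, seq_len, head_dim)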
VHRanger · 27m ago
Most likely not, because of NUMA bottlenecks.
echelon · 3h ago
This is really impressive.
If we can get this down to a single Raspberry Pi, then we have crazy embedded toys and tools. Locally, at the edge, with no internet connection.
Kids will be growing up with toys that talk to them and remember their stories.
We're living in the sci-fi future. This was unthinkable ten years ago.
striking · 1h ago
I think it's worth remembering that there's room for thoughtful design in the way kids play. Are LLMs a useful tool for encouraging children to develop their imaginations or their visual or spatial reasoning skills? Or would these tools shape their thinking patterns to exactly mirror those encoded into the LLM?
I think there's something beautiful and important about the fact that parents shape their kids, leaving with them some of the best (and worst) aspects of themselves. Likewise with their interactions with other people.
The tech is cool. But I think we should aim to be thoughtful about how we use it.
bigyabai · 10m ago
> Kids will be growing up with toys that talk to them and remember their stories.
What a radical departure from the social norms of childhood. Next you'll tell me that they've got an AI toy that can change their diaper and cook Chef Boyardee.
supportengineer · 39m ago
They are better off turning this shit off and playing outside getting dirty and riding bikes