I'm curious about the applications though. Do people randomly buy 4xRPi5s that they can now dedicate to running LLMs?
ryukoposting · 1h ago
I'd love to hook my development tools into a fully-local LLM. The question is context window and cost. If the context window isn't big enough, it won't be helpful for me. I'm not gonna drop $500 on RPis unless I know it'll be worth the money. I could try getting my employer to pay for it, but I'll probably have a much easier time convincing them to pay for Claude or whatever.
rs186 · 22m ago
$500 gets you about six 8GB RPi 5s (~$80 list) or four 16GB ones (~$120 list), excluding accessories or other necessary equipment to get this working.
You'll be much better off spending that money on something else more useful.
behnamoh · 7m ago
> $500
Yeah, like a Mac Mini or something with better bandwidth.
exitb · 52m ago
I think the problem is that buying multiple Raspberry Pis is never the cost-effective way to run heavy loads.
halJordan · 51m ago
This is some sort of joke right?
numpad0 · 47m ago
MI50 is cheaper
6r17 · 13m ago
I mean, at this point it's more of a "proof-of-work" with a shared BP; I could definitely see some domotics hacker get this running. Hell, maybe I'll do it too if I have some spare time and want to make something like Alexa with customized stuff. It would still need text-to-speech and speech-to-text, but that's not really the topic of his setup. Even for pro use, if it's really usable, why not just spawn Qwen on ARM if that's cheaper? There are a lot of ways to read and leverage such a bench.
hhh · 52m ago
I have clusters of over a thousand Raspberry Pis that generally have 75% of their compute and 80% of their memory completely unused.
Moto7451 · 47m ago
That’s an interesting setup. What are you doing with that sort of cluster?
estimator7292 · 6m ago
99.9% of enthusiast/hobbyist clusters like this are exclusively used for blinkenlights
larodi · 38m ago
Is it solar powered?
tarruda · 51m ago
I suspect you'd get similar numbers with a modern x86 mini PC that has 32GB of RAM.
dingdingdang · 3h ago
Very impressive numbers... I wonder how this would scale on 4 relatively modern desktop PCs, say something akin to an 8th-gen i5 Lenovo ThinkCentre; these can be had for very cheap. But as @geerlingguy indicates, we need model compatibility to go up, up, up! As an example, it would be amazing to see something like fastsdcpu run distributed, to democratize the accessibility and practicality of image-gen models for people with limited budgets but large PC fleets ;)
rthnbgrredf · 3h ago
I think it is all well and good, but the most affordable option is probably still to buy a used MacBook with 16, 32, or 64 GB of unified memory (depending on the budget) and install Asahi Linux for tinkering.
Graphics cards with a decent amount of memory are still massively overpriced (even used), big, noisy, and draw a lot of power.
jibbers · 1h ago
Get an Apple Silicon MacBook with a broken screen and it’s an even better deal.
ivape · 1h ago
It just came to my attention that the 2021 M1 Max with 64 GB is less than $1,500 used. That's 64 GB of unified memory at regular laptop prices, so I think people will be well equipped with AI laptops rather soon.
Apple really is #2 and probably could be #1 in AI consumer hardware.
jeroenhd · 1h ago
Apple is leagues ahead of Microsoft with the whole AI PC thing and so far it has yet to mean anything. I don't think consumers care at all about running AI, let alone running AI locally.
I'd try the whole AI thing on my work MacBook, but Apple's built-in AI stuff isn't available in my language, so perhaps that's also why I haven't heard anybody mention it.
ivape · 27m ago
People don’t know what they want yet, you have to show it to them. Getting the hardware out is part of it, but you are right, we’re missing the killer apps at the moment. The very need for privacy with AI will make personal hardware important no matter what.
j45 · 1h ago
Connect a GPU to it with an eGPU chassis and you're running one way or the other.
mmastrac · 54m ago
Is the network the bottleneck here at all? That's impressive for a gigabit switch.
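A rough back-of-envelope (all numbers below are assumptions for an 8B-class model, not measurements from this setup) suggests raw bandwidth probably isn't the wall; latency is the more likely limit:

    # Hypothetical per-token sync traffic for tensor-parallel inference.
    layers = 32          # assumed 8B-class model depth
    hidden = 4096        # assumed embedding width
    act_bytes = 2        # fp16 activations
    nodes = 4
    syncs_per_layer = 2  # e.g. after attention and after the MLP

    # Each node exchanges its 1/nodes activation slice with the others.
    per_token = layers * syncs_per_layer * hidden * act_bytes * (nodes - 1) / nodes
    gige = 1e9 / 8       # gigabit Ethernet in bytes/s

    print(f"{per_token / 1e6:.2f} MB/token")                # ~0.39 MB
    print(f"bandwidth cap: ~{gige / per_token:.0f} tok/s")  # ~318 tok/s

So the gigabit pipe itself has headroom at these token rates; the cost of each sync being a blocking round trip through the switch is probably what hurts first.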
geerlingguy · 4h ago
distributed-llama is great, I just wish it would work with more models. I've been happy with ease of setup and its ongoing maintenance compared to Exo, and performance vs llama.cpp RPC mode.
alchemist1e9 · 3h ago
Any pointers to what is SOTA for cluster of hosts with CUDA GPUs but not enough vram for full weights, yet 10Gbit low latency interconnects?
If that problem gets solved, even if only for a batch approach that enables parallel batch inference, resulting in high total tokens/s but low per-session speed, and for bigger models, then it would be a serious game changer for large-scale, low-cost AI automation without billions in capex. My intuition says it should be possible, so perhaps someone has done it or started on it already.
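The usual shape of this is pipeline parallelism with micro-batching: shard the layers across hosts so no single GPU needs the full weights, then keep every host busy by streaming micro-batches through the pipe. I believe vLLM supports multi-node tensor/pipeline parallelism (via Ray), though how well it tolerates 10Gbit links is another question. A toy sketch of the schedule (all names hypothetical; transport and KV-cache handling omitted):

    # Toy pipeline-parallel schedule: each host owns a contiguous slice
    # of layers; micro-batches stream through so every host stays busy.
    def run_stage(layer_slice, inbox):
        for mb in inbox:                 # activations from the previous host
            for layer in layer_slice:
                mb = layer(mb)
            yield mb                     # ship downstream (the 10Gbit link)

    layers = [lambda x: x + 1 for _ in range(32)]   # stand-ins for real layers
    stream = iter(range(8))                         # 8 micro-batches in flight
    for host in (layers[i:i + 8] for i in range(0, 32, 8)):
        stream = run_stage(host, stream)            # chain 4 "hosts"

    print(list(stream))  # every micro-batch passed through all 32 layers

With S stages and M micro-batches in flight, steady-state utilization is roughly M / (M + S - 1): high aggregate tokens/s, but each session still pays a full pipeline traversal, which is exactly the high-throughput, low-per-session trade-off described above.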
kosolam · 2h ago
How is this technically done? How does it split the query and aggregate the results?
magicalhippo · 1h ago
From the readme:
> More devices mean faster performance, leveraging tensor parallelism and high-speed synchronization over Ethernet.
> The maximum number of nodes is equal to the number of KV heads in the model #70.
I found this[1] article nice for an overview of the parallelism modes.
I imagine it might also be limited by the number of layers, and you'll hit diminishing returns at some point from network latency.
[1]: https://medium.com/@chenhao511132/parallelism-in-llm-inferen...
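A minimal numpy sketch of the tensor-parallel idea, as I understand it (shapes and names are illustrative, not distributed-llama's actual code): each node owns a slice of the heads and computes attention for that slice independently, which is also why the node count can't exceed the KV head count; every node needs at least one head.

    import numpy as np

    n_heads, seq_len, head_dim = 8, 128, 64
    n_nodes = 4  # must divide the head count; hence the node limit above

    def node_attention(q, k, v):
        # q, k, v: this node's slice of heads, shape (heads, seq, dim)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        return w @ v

    q, k, v = (np.random.randn(n_heads, seq_len, head_dim) for _ in range(3))

    # Root shards the heads; each node computes its slice locally; the
    # outputs are gathered back over Ethernet (the per-layer sync step).
    shards = np.split(np.arange(n_heads), n_nodes)
    out = np.concatenate([node_attention(q[s], k[s], v[s]) for s in shards])
    assert out.shape == (n_heads, seq_len, head_dim)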
VHRanger · 27m ago
Most likely not, because of NUMA bottlenecks.
echelon · 3h ago
This is really impressive.
If we can get this down to a single Raspberry Pi, then we have crazy embedded toys and tools. Locally, at the edge, with no internet connection.
Kids will be growing up with toys that talk to them and remember their stories.
We're living in the sci-fi future. This was unthinkable ten years ago.
striking · 1h ago
I think it's worth remembering that there's room for thoughtful design in the way kids play. Are LLMs a useful tool for encouraging children to develop their imaginations or their visual or spatial reasoning skills? Or would these tools shape their thinking patterns to exactly mirror those encoded into the LLM?
I think there's something beautiful and important about the fact that parents shape their kids, leaving with them some of the best (and worst) aspects of themselves. Likewise with their interactions with other people.
The tech is cool. But I think we should aim to be thoughtful about how we use it.
bigyabai · 10m ago
> Kids will be growing up with toys that talk to them and remember their stories.
What a radical departure from the social norms of childhood. Next you'll tell me that they've got an AI toy that can change their diaper and cook Chef Boyardee.
supportengineer · 39m ago
They are better off turning this shit off and playing outside getting dirty and riding bikes