I'm constantly tempted by the idealism of this experience, but when you factor in the performance of the models you have access to, and the cost of running them on-demand in a cloud, it's really just a fun hobby instead of a viable strategy to benefit your life.
As the hardware continues to iterate at a rapid pace, anything you pick up second-hand will still depreciate at that pace, making any real investment in hardware unjustifiable.
Coupled with the dramatically inferior performance of the weights you would be running in a local environment, it's just not worth it.
I expect this will change in the future, and am excited to invest in a local inference stack when the weights become available. Until then, you're idling a relatively expensive, rapidly depreciating asset.
braooo · 3m ago
Running LLMs at home is a repeat of the mess we make with "run a K8s cluster at home" thinking
You're not OpenAI or Google. Just use PyTorch, OpenCV, etc. to build the small models you need.
You don't need Docker even! You can share over a simple code-based HTTP router app and pre-shared certs with friends.
You're recreating the patterns required to manage a massive data center in 2-3 computers in your closet. That's insane.
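The "simple HTTP router app with pre-shared certs" idea can be sketched in a few lines of stdlib Python. This is only an illustration of the pattern, not anything from the comment: the token name, handler class, and `run_model` stub are all hypothetical stand-ins (a real setup would use TLS client certs or at least HTTPS in front of this).

```python
# Hypothetical sketch: a tiny HTTP endpoint gated by a pre-shared secret,
# standing in for the "code-based HTTP router" the comment describes.
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

SHARED_TOKEN = "change-me"  # distributed out of band to friends

def authorized(auth_header):
    """Constant-time check of a pre-shared bearer token."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    return hmac.compare_digest(auth_header[len("Bearer "):], SHARED_TOKEN)

def run_model(data: bytes) -> bytes:
    # Stand-in for your small PyTorch/OpenCV model call.
    return b"ok"

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not authorized(self.headers.get("Authorization")):
            self.send_error(403)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        reply = run_model(body)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(reply)

# To serve: HTTPServer(("0.0.0.0", 8080), ModelHandler).serve_forever()
```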
shaky · 34m ago
This is something that I think about quite a bit and am grateful for this write-up. The amount of friction to get privacy today is astounding.
noelwelsh · 28m ago
It's the hardware more than the software that is the limiting factor at the moment, no? Hardware to run a good LLM locally starts around $2000 (e.g. Strix Halo / AI Max 395). I think a few Strix Halo iterations will make it considerably easier.
>Hardware to run a good LLM locally starts around $2000 (e.g. Strix Halo / AI Max 395). I think a few Strix Halo iterations will make it considerably easier.
And "good" is still questionable. The thing that makes this stuff useful is when it works instantly like magic. Once you find yourself fiddling around with subpar results at slower speeds, essentially all of the value is gone. Local models have come a long way but there is still nothing even close to Claude levels when it comes to coding. I just tried taking the latest Qwen and GLM models for a spin through OpenRouter with Cline recently and they feel roughly on par with Claude 3.0. Benchmarks are one thing, but reality is a completely different story.
ahmedbaracat · 27m ago
Thanks for sharing. Note that the GitHub at the end of the article is not working…
Open Web UI is a great alternative for a chat interface. You can point it at an OpenAI-compatible API like vLLM or use the native Ollama integration, and it has cool features like being able to say something like “generate code for an HTML and JavaScript pong game” and have it display the running code inline with the chat for testing.
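Pointing any OpenAI-compatible client at a local endpoint is just a matter of swapping the base URL. A minimal stdlib sketch, assuming a vLLM server on its default port (the URL and model name are examples; substitute whatever your server actually exposes):

```python
# Illustrative sketch of talking to a local OpenAI-compatible endpoint.
# BASE_URL and the model name are assumptions, not from the comment.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM default; Ollama serves :11434/v1

def chat_payload(prompt, model="qwen2.5-coder"):
    """Build an OpenAI-style chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(prompt):
    """POST a chat completion to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same request body works against vLLM, Ollama's OpenAI-compatible route, or a hosted API, which is what makes the chat-UI front ends interchangeable.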
https://simonwillison.net/2025/Jul/29/space-invaders/
Coderunner-UI: https://github.com/instavm/coderunner-ui
Coderunner: https://github.com/instavm/coderunner