I want everything local – Building my offline AI workspace

83 points by mkagenius | 8/8/2025, 6:19:05 PM | instavm.io

Comments (11)

tcdent · 9m ago
I'm constantly tempted by the idealism of this experience, but when you factor in the performance of the models you have access to, and the cost of running them on-demand in a cloud, it's really just a fun hobby instead of a viable strategy to benefit your life.

As the hardware continues to iterate at a rapid pace, anything you pick up second-hand will still depreciate at that pace, making any real investment in hardware unjustifiable.

Coupled with the dramatically inferior performance of the weights you would be running in a local environment, it's just not worth it.

I expect this will change in the future, and am excited to invest in a local inference stack when sufficiently capable weights become available. Until then, you're idling a relatively expensive, rapidly depreciating asset.
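To make that trade-off concrete, here is a rough back-of-envelope sketch. Every price and usage figure below is an illustrative assumption, not a quote from the comment or the article.

```python
# Back-of-envelope comparison of local hardware vs. on-demand cloud inference.
# Every number below is an illustrative assumption, not a real quote.

hardware_cost = 2000.00      # assumed up-front cost of a local inference box (USD)
resale_after_2yr = 800.00    # assumed resale value after two years of depreciation
monthly_power = 15.00        # assumed electricity cost for light home use (USD/month)

cloud_cost_per_mtok = 3.00   # assumed blended API price per million tokens (USD)
monthly_tokens_m = 20        # assumed monthly usage in millions of tokens

months = 24
local_total = (hardware_cost - resale_after_2yr) + monthly_power * months
cloud_total = cloud_cost_per_mtok * monthly_tokens_m * months

print(f"Local (2 yr, incl. depreciation): ${local_total:,.0f}")
print(f"Cloud (2 yr, pay-as-you-go):      ${cloud_total:,.0f}")
# With these assumptions the cloud comes out cheaper at low volume, which is
# the commenter's point: the local box sits mostly idle while it loses value.
```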

braooo · 2m ago
Running LLMs at home is a repeat of the mess we make with "run a K8s cluster at home" thinking.

You're not OpenAI or Google. Just use PyTorch, OpenCV, etc. to build the small models you need.

You don't even need Docker! You can share over a simple code-based HTTP router app and pre-shared certs with friends.

You're recreating the patterns required to manage a massive data center in 2-3 computers in your closet. That's insane.
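A minimal sketch of that "no Docker, just a small HTTPS endpoint with pre-shared certs" idea. The cert.pem/key.pem pair is assumed to be a self-signed certificate you generate and share out of band; predict() is a placeholder for whatever small PyTorch/OpenCV model you actually run.

```python
# Tiny local inference endpoint over HTTPS with a pre-shared, self-signed cert.
import json
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text: str) -> dict:
    # Placeholder for a real local model call (PyTorch, OpenCV, etc.).
    return {"input": text, "label": "demo"}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    httpd = HTTPServer(("0.0.0.0", 8443), Handler)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("cert.pem", "key.pem")  # pre-shared, self-signed pair
    httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()
```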

shaky · 33m ago
This is something that I think about quite a bit and am grateful for this write-up. The amount of friction to get privacy today is astounding.

noelwelsh · 28m ago
It's the hardware more than the software that is the limiting factor at the moment, no? Hardware to run a good LLM locally starts around $2000 (e.g. Strix Halo / AI Max 395). I think a few Strix Halo iterations will make it considerably easier.
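The rule of thumb behind that price point is that the weights have to fit in (unified) memory. A rough estimate follows; the 1.2x overhead factor for KV cache and runtime is an assumption, and the numbers are approximate.

```python
# Rough memory estimate for running a model locally.
# The 1.2x overhead factor (KV cache, activations, runtime) is an assumption.
def approx_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    bytes_weights = params_billion * 1e9 * bits_per_weight / 8
    return bytes_weights * overhead / 1e9

for params, bits in [(8, 4), (32, 4), (70, 4), (70, 8)]:
    print(f"{params}B @ {bits}-bit ~ {approx_gb(params, bits):.0f} GB")
# 8B  @ 4-bit ~  5 GB
# 32B @ 4-bit ~ 19 GB
# 70B @ 4-bit ~ 42 GB
# 70B @ 8-bit ~ 84 GB  -> why 96-128 GB unified-memory boxes are the target
```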
ramesh31 · 19m ago
>Hardware to run a good LLM locally starts around $2000 (e.g. Strix Halo / AI Max 395). I think a few Strix Halo iterations will make it considerably easier.

And "good" is still questionable. The thing that makes this stuff useful is when it works instantly like magic. Once you find yourself fiddling around with subpar results at slower speeds, essentially all of the value is gone. Local models have come a long way but there is still nothing even close to Claude levels when it comes to coding. I just tried taking the latest Qwen and GLM models for a spin through OpenRouter with Cline recently and they feel roughly on par with Claude 3.0. Benchmarks are one thing, but reality is a completely different story.

ahmedbaracat · 27m ago
Thanks for sharing. Note that the GitHub link at the end of the article is not working…

mkagenius · 19m ago
Thanks for the heads up. It's fixed now:

Coderunner-UI: https://github.com/instavm/coderunner-ui

Coderunner: https://github.com/instavm/coderunner

navbaker · 22m ago
Open WebUI is a great alternative for a chat interface. You can point it at an OpenAI-compatible API like vLLM or use the native Ollama integration, and it has cool features like being able to say something like “generate code for an HTML and JavaScript pong game” and have it display the running code inline with the chat for testing.
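The same idea works from plain Python against any OpenAI-compatible local endpoint. The base URL below is Ollama's usual default (vLLM typically serves on port 8000); the model name is an assumption for whatever you have pulled locally.

```python
# Query a local OpenAI-compatible endpoint (Ollama here) with the openai SDK.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # local servers usually ignore the key
)

resp = client.chat.completions.create(
    model="llama3.1",  # assumed local model name
    messages=[{"role": "user", "content": "Generate HTML/JS for a pong game."}],
)
print(resp.choices[0].message.content)
```
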
pyman · 11m ago
Mr Stallman? Richard, is that you?

dmezzetti · 20m ago
I built txtai with this philosophy in mind: https://github.com/neuml/txtai
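For readers who haven't seen it, a minimal local semantic-search sketch with txtai follows. The embedding model name and constructor style are assumptions; exact options vary between txtai versions, so treat this as illustrative rather than canonical.

```python
# Minimal local semantic-search sketch with txtai; runs entirely on your machine.
from txtai.embeddings import Embeddings

docs = [
    "Run LLMs and embeddings fully offline",
    "Cloud APIs trade privacy for convenience",
    "Strix Halo class hardware has lots of unified memory",
]

# Model path is an assumed example; any sentence-transformers model works.
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
embeddings.index([(i, text, None) for i, text in enumerate(docs)])

# Returns (id, score) pairs for the best local match.
print(embeddings.search("private offline AI", 1))
```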