Just ordered a $12k Mac Studio w/ 512GB of unified memory.
Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.
LM Studio is newish, and it's not a perfect interface yet, but it's fantastic at what it does, which is bringing local LLMs to the masses w/o them having to know much.
There is another project that people should be aware of: https://github.com/exo-explore/exo
Exo is this radically cool tool that automatically clusters all hosts on your network running Exo and uses their combined GPUs for increased throughput.
As in HPC environments, you're going to want ultra-fast interconnects, but it's all just IP-based.
zackify · 8h ago
I love LM Studio but I'd never waste $12k like that. The memory bandwidth is too low, trust me.
Get the RTX Pro 6000 for $8.5k with double the bandwidth. It will be way better.
tymscar · 4h ago
Why would they pay 2/3 of the price for something with 1/5 of the RAM?
The whole point of spending that much money, for them, is to run massive models like the full R1, which the Pro 6000 can't.
zackify · 1h ago
Because waiting forever for initial prompt processing, with a realistic number of MCP tools enabled on a prompt, is going to suck without the most bandwidth possible.
And you are never going to sit around waiting for anything larger than the 96 GB of VRAM that the RTX Pro has.
If you're using it for background tasks and not coding, it's a different story.
t1amat · 4h ago
(Replying to both siblings questioning this)
If the primary use case is input heavy, which is true of agentic tools, there’s a world where partial GPU offload with many channels of DDR5 system RAM leads to an overall better experience. A good GPU will process input many times faster, and with good RAM you might end up with decent output speed still. Seems like that would come in close to $12k?
And there would be no competition for models that do fit entirely inside that VRAM, for example Qwen3 32B.
marci · 4h ago
You can't run DeepSeek-V3/R1 on the RTX Pro 6000, not to mention the upcoming 1M-context Qwen models or the current Qwen3-235B.
imranq · 8h ago
I'd love to host my own LLMs, but I keep getting held back by the quality and affordability of cloud LLMs. Why go local unless there's private data involved?
mycall · 3h ago
Offline is another use case.
seanmcdirmid · 2h ago
Nothing like playing around with LLMs on an airplane without an internet connection.
asteroidburger · 1h ago
If I can afford a seat above economy with room to actually, comfortably work on a laptop, I can afford the couple bucks for wifi for the flight.
dchest · 10h ago
I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to generate summaries and tags for my vibe-coded Bloomberg Terminal-style RSS reader :-) It works fine (the laptop gets hot and slow, but fine).
Probably should just use llama.cpp server/ollama and not waste a gig of memory on Electron, but I like GUIs.
minimaxir · 10h ago
8 GB of RAM with local LLMs in general is iffy: an 8-bit quantized Qwen3-4B is 4.2 GB on disk and likely more in memory. 16 GB is usually the minimum to be able to run decent models without compromising on heavy quantization.
Interestingly, it was AI (Apple Intelligence) that was the primary reason Apple finally abandoned 8 GB as the base configuration:
https://www.pcgamer.com/apple-vp-says-8gb-ram-on-a-macbook-p...
arrty88 · 1h ago
I concur. I just upgraded from an M1 Air with 8 GB to an M4 with 24 GB. Excited to run bigger models.
noman-land · 8h ago
I love LM Studio. It's a great tool. I'm waiting for another generation of Macbook Pros to do as you did :).
incognito124 · 10h ago
> I'm going to download it with Safari
Oof you were NOT joking
noman-land · 8h ago
Safari to download LM Studio. LM Studio to download models. Models to download Firefox.
teaearlgraycold · 7h ago
The modern ninite
prettyblocks · 9h ago
I've been using OpenWebUI and am pretty happy with it. Why do you like LM Studio more?
prophesi · 8h ago
Not OP, but with LM Studio I get a chat interface out-of-the-box for local models, while with openwebui I'd need to configure it to point to an OpenAI API-compatible server (like LM Studio). It can also help determine which models will work well with your hardware.
LM Studio isn't FOSS though.
I did enjoy hooking up OpenWebUI to Firefox's experimental AI Chatbot. (browser.ml.chat.hideLocalhost to false, browser.ml.chat.provider to localhost:${openwebui-port})
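Concretely that's just two prefs, e.g. as user.js lines (a sketch; the port, and whether you need the http:// scheme, depend on where your OpenWebUI instance actually listens):

    user_pref("browser.ml.chat.hideLocalhost", false);
    user_pref("browser.ml.chat.provider", "http://localhost:8080");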
truemotive · 9h ago
Open WebUI can leverage the built in web server in LM Studio, just FYI in case you thought it was primarily a chat interface.
s1mplicissimus · 7h ago
i recently tried openwebui but it was so painful to get it to run with a local model.
that "first run experience" of lm studio is pretty fire in comparison. can't really talk about actually working with it though, still waiting for the 8GB download
prettyblocks · 4h ago
Interesting. I run my local llms through ollama and it's zero trouble to get that working in openwebui as long as the ollama server is running.
karmakaze · 10h ago
Nice. Ironically well suited for non-Apple Intelligence.
teaearlgraycold · 9h ago
What are you going to do with the LLMs you run?
chisleu · 9h ago
Currently I'm using Gemini 2.5 and Claude 3.7 Sonnet for coding tasks.
I'm interested in using local models for code generation, but I'm not expecting much in that regard.
I'm planning to attempt fine-tuning open-source models on certain tool sets, especially MCP tools.
sneak · 9h ago
I already got one of these. I’m spoiled by Claude 4 Opus; local LLMs are slower and lower quality.
I haven’t been using it much. All it has on it is LM Studio, Ollama, and Stats.app.
> Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with safari.
lol, yup. same.
chisleu · 9h ago
Yup, I'm spoiled by Claude 3.7 Sonnet right now. I had to stop using Opus for plan mode in my agent because it is just so expensive. I'm using Gemini 2.5 Pro for that now.
I'm considering ordering one of these today: https://www.newegg.com/p/N82E16816139451?Item=N82E1681613945...
It looks like it will hold 5 GPUs with a single slot open for InfiniBand.
Then local models might be lower quality, but they won't be slow! :)
evo_9 · 7h ago
I was using Claude 3.7 exclusively for coding, but it sure seems like it got worse suddenly about 2–3 weeks back. It went from writing pretty solid code I had to make only minor changes to, to being completely off the rails: altering files unrelated to my prompt, undoing fixes from the same conversation, reinventing db access, and ignoring the coding 'standards' established in the existing codebase. It became so untrustworthy that I finally gave OpenAI o3 a try and, honestly, I was pretty surprised how solid it has been. I've been using o3 since, and I find it generally does exactly what I ask, especially if you have a well-established project with plenty of code for it to reference.
Just wondering if Claude 3.7 has seemed different lately for anyone else? It was my go-to for several months, and I'm no fan of OpenAI, but o3 has been rock solid.
jessmartin · 3h ago
Could be the prompt and/or tool descriptions in whatever tool you are using Claude in that degraded. Have definitely noticed variance across Cursor, Claude Code, etc even with the exact same models.
Prompts + tools matter.
kristopolous · 8h ago
The GPUs are the hard thing to find unless you want to pay like a 50% markup.
jtreminio · 1h ago
I've been wanting to try LM Studio but I can't figure out how to use it over the local network. My desktop in the living room has the beefy GPU, but I want to use LM Studio from my laptop in bed.
Any suggestions?
numpad0 · 41m ago
[>_] (the Developer tab) -> [Settings] -> toggle "Serve on local network" on.
Any OpenAI-compatible client app should work - use the host machine's IP address as the API server address. The API key can be bogus or blank.
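For example, from the laptop, something like this works with the plain OpenAI Python client (a sketch; it assumes the desktop is at 192.168.1.50, LM Studio's server is on port 1234, and a model is already loaded - substitute your own IP, port, and model name):

    from openai import OpenAI

    # Point the standard OpenAI client at the LM Studio server on the desktop.
    # LM Studio ignores the API key, but the client library requires one.
    client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="qwen3-30b-a3b",  # whichever model you have loaded in LM Studio
        messages=[{"role": "user", "content": "Hello from the laptop!"}],
    )
    print(resp.choices[0].message.content)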
skygazer · 1h ago
Use an OpenAI-compatible API client on your laptop and LM Studio on your server, and point the client at the server. LM Studio's server can serve an LLM on a desired port using the OpenAI-style chat completions API. You can also install OpenWebUI on your server, connect to it via a web browser, and configure it to use the LM Studio connection for its LLM.
politelemon · 8h ago
The initial experience with LM Studio and MCP doesn't seem to be great; I think their docs could do with a happy-path demo for newcomers.
Upon installing, the first model offered is google/gemma-3-12b, which in fairness is pretty decent compared to others.
It's not obvious how to show the right sidebar they're talking about; it's the flask icon, which turns into a collapse icon when you click it.
I set MCP up with Playwright, asked it to read the top headline from HN, and it got stuck in an infinite loop of navigating to Hacker News but doing nothing with the output.
I wanted to try it out with a few other models, but figuring out how to download new models isn't obvious either; it turned out to be the search icon. Anyway, other models didn't fare much better; some outright ignored the tools despite claiming the capacity for 'tool use'.
cchance · 33m ago
That latter issue isn't an LM Studio issue... it's a model issue.
t1amat · 4h ago
Gemma3 models can follow instructions but were not trained to call tools, which is the backbone of MCP support. You would likely have a better experience with models from the Qwen3 family.
xyc · 5h ago
Great to see more local AI tools supporting MCP! Recently I've also added MCP support to recurse.chat. When running locally (llama.cpp and Ollama) it still needs to catch up in terms of tool-calling capabilities (for example, tool-call accuracy and parallel tool calls) compared to the well-known providers, but it's starting to get pretty usable.
I'd love to learn more about your MCP implementation. Wanna chat?
b0a04gl · 9h ago
claude going mcp over remote kinda normalised the protocol for inference routing. now with lmstudio running as a local mcp host, you can just tunnel it (cloudflared/ngrok), drop a tiny gateway script, and boom - your laptop basically acts like an mcp node in a hybrid mesh. short prompts hit qwen locally, heavier ones go to claude. with the same payload and interface we can actually get multi-host local inference clusters wired together by mcp.
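a minimal sketch of that gateway idea (every name, port, and threshold here is made up; it assumes both endpoints speak the OpenAI chat-completions API, e.g. LM Studio locally and whatever OpenAI-compatible proxy you put in front of claude):

    from openai import OpenAI

    # Hypothetical endpoints: LM Studio on this machine, plus a remote proxy.
    local = OpenAI(base_url="http://localhost:1234/v1", api_key="unused")
    remote = OpenAI(base_url="https://my-claude-proxy.example.com/v1", api_key="real-key")

    def route(prompt: str, threshold: int = 2000) -> str:
        """Short prompts go to the local qwen model, heavier ones to the remote one."""
        if len(prompt) <= threshold:
            client, model = local, "qwen3-30b-a3b"
        else:
            client, model = remote, "claude-sonnet"
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    print(route("what's 2 + 2?"))  # short, so it stays local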
visiondude · 10h ago
LM Studio works surprisingly well on an M3 Ultra with 64 GB, running 27B models.
Nice to have a local option, especially for some prompts.
patates · 10h ago
What models are you using on LM Studio for what task and with how much memory?
I have a 48GB MacBook Pro, and Gemma3 (one of the abliterated ones) fits my non-code use case perfectly (generating crime stories where the reader tries to guess the killer).
For code, I still call Google to use Gemini.
I would recommend Qwen3 30B A3B for you. The MLX 4bit DWQ quants are fantastic.
bbno4 · 2h ago
Is there an app that uses OpenRouter / Claude or something locally but has MCP support?
eajr · 2h ago
I've been considering building this. Haven't found anything yet.
cchance · 32m ago
VS Code with Roo Code... just use the chat window :S
minimaxir · 10h ago
LM Studio has quickly become the best way to run local LLMs on an Apple Silicon Mac: no offense to vllm/ollama and other terminal-based approaches, but LLMs have many levers for tweaking output and sometimes you need a UI to manage it. Now that LM Studio supports MLX models, it's one of the most efficient too.
I'm not bullish on MCP, but at the least this approach gives a good way to experiment with it for free.
zackify · 8h ago
Ollama doesn't even have a way to customize the context size per model and persist it. LM Studio does :)
Anaphylaxis · 6h ago
This isn't true. You can `ollama run {model}`, `/set parameter num_ctx {ctx}`, and then `/save`. It's recommended to `/save {model}:{ctx}` so the setting persists across model updates.
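For example (model name and context size are just placeholders):

    $ ollama run qwen3:30b
    >>> /set parameter num_ctx 32768
    >>> /save qwen3:30b-32k

The saved copy keeps the larger context even when the base model gets updated.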
pzo · 9h ago
I just wish they did some facelifting of the UI. Right now it's too colorful for me, with many different shades of similar colors. I wish they'd copy a color palette from Google AI Studio, or from Trae or PyCharm.
chisleu · 9h ago
> I'm not bullish on MCP
You gotta help me out. What do you see holding it back?
minimaxir · 9h ago
tl;dr: the current hype around it is a solution looking for a problem, and at a high level it's just a rebrand of the Tools paradigm.
mhast · 8h ago
It's "Tools as a service", so it's really trying to make tool calling easier to use.
ijk · 6h ago
Near as I can tell it's supposed to make calling other people's tools easier. But I don't want to spin up an entire server to invoke a calculator. So far it seems to make building my own local tools harder, unless there's some guidebook I'm missing.
cchance · 30m ago
You're not spinning up a whole server lol. Most MCPs can be run locally and talked to over stdio; they're just apps that the LLM can call, and what they talk to or do is up to the MCP writer. It's easier to have an MCP that communicates what it can do and handles the back-and-forth than to write non-standard middleware for, say, calls to an API, or AppleScript, or VMware, or something else...
xyc · 5h ago
It's a protocol that doesn't dictate how you are calling the tool. You can use in-memory transport without needing to spin up a server. Your tool can just be a function, but with the flexibility of serving to other clients.
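For what it's worth, with the official Python SDK a local tool really is just a decorated function - a minimal sketch (assuming the `mcp` package; the calculator is the toy example from upthread):

    from mcp.server.fastmcp import FastMCP

    # A tiny MCP server exposing one tool; nothing to host or deploy.
    mcp = FastMCP("calculator")

    @mcp.tool()
    def add(a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    if __name__ == "__main__":
        mcp.run()  # defaults to stdio, so an MCP client can just spawn this script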
nix0n · 10h ago
LM Studio is quite good on Windows with Nvidia RTX also.
boredemployee · 2h ago
care to elaborate? i have rtx 4070 12gb vram + 64gb ram, i wonder what models I can run with it. Anything useful?
squanchingio · 10h ago
It'd be nice to have MCP servers exposed through LM Studio's OpenAI-like endpoints.
api · 9h ago
I wish LM Studio had a pure daemon mode. It's better than ollama in a lot of ways but I'd rather be able to use BoltAI as the UI, as well as use it from Zed and VSCode and aider.
What I like about ollama is that it provides a self-hosted AI provider that can be used by a variety of things. LM Studio has that too, but you have to have the whole big chonky Electron UI running. Its UI is powerful but a lot less nice than e.g. BoltAI for casual use.
rhet0rica · 6h ago
Oh, that horrible Electron UI. Under Windows it pegs a core on my CPU at all times!
If you're just working as a single user via the OpenAI protocol, you might want to consider koboldcpp: https://github.com/LostRuins/koboldcpp/releases
It bundles a GUI launcher, then starts in text-only mode. You can also tell it to just run a saved configuration, bypassing the GUI; I've successfully run it as a system service on Windows using nssm.
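Something like this, roughly (the service name, paths, and the --config flag are assumptions from memory - check koboldcpp's --help and the nssm docs before relying on it):

    nssm install KoboldCpp "C:\llm\koboldcpp.exe" --config "C:\llm\my-settings.kcpps"
    nssm start KoboldCpp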
Though there are a lot of roleplay-centric gimmicks in its feature set, its context-shifting feature is singular. It caches the intermediate state used by your last query, extending it to build the next one. As a result you save on generation time with large contexts, and also any conversation that has been pushed out of the context window still indirectly influences the current exchange.
SparkyMcUnicorn · 9h ago
There's a "headless" checkbox in settings->developer
diggan · 6h ago
Still, you need to install and run the AppImage at least once to enable the "lms" CLI, which can be used later. It would be nice to have a completely GUI-less installation/use method too.
t1amat · 4h ago
The UI is the product. If you just want the engine, use mlx-omni-server (for MLX) or llama-swap (for GGUF) and huggingface-cli (for model downloads).
zaps · 4h ago
Not to be confused with FL Studio
maxcomperatore · 7h ago
good.
v3ss0n · 5h ago
Closed source - won't touch.
gregorym · 10h ago
I use https://ollamac.com/ to run Ollama and it works great. It has MCP support also.
Are you sharing any of your revenue from that $79 license fee with the https://ollama.com/ project that your app builds on top of?