There is a Prompt API in development, available in both Chrome and Edge, that gives access to a local LLM. Chrome extensions have access to it, and I believe websites can request access as part of an origin trial.
The model is fully managed by the browser. It's currently the Gemini Nano model on Chrome, and they are testing a version of the Gemma 3n model in beta channels. Edge uses phi-4-mini.
I'm sad to see it regardless - the project's been very low activity for months. Just last night I was thinking about ripping it out before launch. No observable future.
andreinwald · 2h ago
Browser LLM demo built with JavaScript and WebGPU.
WebGPU is already supported in Chrome, Safari, Firefox, iOS (v26) and Android.
- No need to use your OPENAI_API_KEY - it's a local model that runs on your device
- No network requests to any API
- No need to install any program
- No need to download files on your device (model is cached in browser)
- Site will ask before downloading large files (llm model) to browser cache
- Hosted on GitHub Pages from this repo - secure, because you can see what you are running
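For a rough idea of how this kind of in-browser setup typically works: a hypothetical sketch assuming a WebLLM-style (`@mlc-ai/web-llm`) engine. Whether this repo actually uses that library, and the model id below, are assumptions, not taken from the demo's code:

```javascript
// Hypothetical sketch of an in-browser LLM chat, assuming a WebLLM-style
// (@mlc-ai/web-llm) setup; the model id below is illustrative only.
async function chatInBrowser(question) {
  // Dynamic import so the (large) library is only fetched when needed
  const { CreateMLCEngine } = await import('@mlc-ai/web-llm');

  // First call downloads the model weights; later visits are served
  // from the browser cache instead of the network.
  const engine = await CreateMLCEngine('Llama-3.2-1B-Instruct-q4f16_1-MLC', {
    initProgressCallback: (p) => console.log(p.text), // download/compile progress
  });

  // OpenAI-style chat completion API, but everything runs on-device
  const reply = await engine.chat.completions.create({
    messages: [{ role: 'user', content: question }],
  });
  return reply.choices[0].message.content;
}
```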
cgdl · 9m ago
Which model does the demo use?
petermcneeley · 19m ago
This demo only works if your GPU exposes the WebGPU "shader-f16" feature. You can check whether you have it by looking at the feature list on https://webgpureport.org/.
The page itself could of course check for this, but since f16 support is common they probably just didn't bother.
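A page can do that check itself in a few lines; a sketch using the standard WebGPU adapter API (the feature name is "shader-f16" per the spec):

```javascript
// Sketch: detect whether the WebGPU adapter supports 16-bit floats
// ("shader-f16"); resolves to false where WebGPU is unavailable.
async function hasShaderF16() {
  const gpu = globalThis.navigator?.gpu;
  if (!gpu) return false; // no WebGPU at all
  const adapter = await gpu.requestAdapter();
  return adapter?.features.has('shader-f16') ?? false;
}

// Example: hasShaderF16().then(ok => console.log(ok ? 'f16 available' : 'f16 missing'));
```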
What's the performance of a model like this vs. the OpenAI API? What's the comparable here? Edit: I see it's the same models you'd run locally using Ollama or something else, so it's basically just constrained by the size of the model, the GPU, and the perf of the machine.
pjmlp · 49m ago
Beware of opening this on mobile Internet.
lukan · 21m ago
Well, I am on a mobile right now, can someone maybe share anything about the performance?
andreinwald · 46m ago
The demo site asks before downloading.
andsoitis · 1h ago
Very cool.
An improvement would be keeping the input text box always on screen, rather than having to scroll down manually as the screen fills.
More information is available here: https://github.com/webmachinelearning/prompt-api
Which has a full web demo: https://chat.webllm.ai/
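Per that explainer, basic usage looks roughly like the following. This is a hedged sketch of an experimental API, so names like `LanguageModel` and the availability states may change between browser versions:

```javascript
// Hypothetical sketch of the experimental Prompt API, following the
// webmachinelearning/prompt-api explainer; the surface may change.
async function askLocalModel(question) {
  if (!('LanguageModel' in globalThis)) {
    throw new Error('Prompt API not available in this browser');
  }
  // Per the explainer, availability can be 'unavailable', 'downloadable',
  // 'downloading', or 'available'.
  const availability = await LanguageModel.availability();
  if (availability === 'unavailable') {
    throw new Error('No on-device model on this machine');
  }
  // create() can trigger the model download on first use
  const session = await LanguageModel.create();
  return session.prompt(question);
}
```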
Demo (similar to ChatGPT): https://andreinwald.github.io/browser-llm/
Code: https://github.com/andreinwald/browser-llm