Show HN: WebGPU enables local LLM in the browser – demo site with AI chat
91 andreinwald 35 8/2/2025, 2:09:12 PM andreinwald.github.io
Browser LLM demo built with JavaScript and WebGPU.
WebGPU is already supported in Chrome, Safari, Firefox, iOS (v26) and Android.
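(For anyone who wants to gate on support before loading anything, a minimal detection sketch - navigator.gpu and requestAdapter() are the standard WebGPU entry points; the top-level await assumes a module script:)

    // Check for WebGPU before downloading any model weights
    if (!navigator.gpu) {
      console.warn("WebGPU is not supported in this browser");
    } else {
      // requestAdapter() resolves to null if no suitable GPU is found
      const adapter = await navigator.gpu.requestAdapter();
      console.log(adapter ? "WebGPU is ready" : "No suitable GPU adapter");
    }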
Demo, similar to ChatGPT https://andreinwald.github.io/browser-llm/
Code https://github.com/andreinwald/browser-llm
- No need to use your OPENAI_API_KEY - it's a local model that runs on your device
- No network requests to any API
- No need to install any program
- No need to manually download files to your device (the model is cached in the browser)
- The site will ask before downloading large files (the LLM model) to the browser cache
- Hosted on GitHub Pages from this repo - secure, because you can see exactly what you are running
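Comments below suggest this is a wrapper around WebLLM (MLC). For illustration, a minimal sketch of what loading and prompting a browser-local model looks like with that library - the model ID is one of WebLLM's prebuilt options and is an assumption here, not necessarily what this demo ships:

    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    // First run downloads the weights into the browser cache,
    // reporting progress so the page can show a loading bar.
    const engine = await CreateMLCEngine(
      "Llama-3.2-1B-Instruct-q4f16_1-MLC", // assumed model ID
      { initProgressCallback: (report) => console.log(report.text) },
    );

    // OpenAI-style chat completion, served entirely on-device
    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: "What's the best way to get to space?" }],
    });
    console.log(reply.choices[0].message.content);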
> What's the best way to get to space?
>> Unfortunately, it's not currently possible for humans to travel to space in the same way that astronauts do. While there have been several manned missions to space, such as those to the International Space Station, the technology and resources required to make interstellar travel feasible are still in the early stages of development.
>> Currently, the fastest spacecraft is the Apollo missions, which took around 77 hours to reach the Moon. From there, they traveled to other planets, but the distances between them are so great that even at high speeds, it would take many years to reach them.
>> As for traveling to space itself, there are no commercial or private companies that offer space travel services, and the cost of such a journey is estimated to be in the tens of millions of dollars per person.
>> However, there are some concepts and technologies being explored that might one day make space travel more accessible and affordable. For example, private companies like SpaceX and Blue Origin are working on developing reusable rockets that could potentially take people to space more quickly and reliably.
>> But for now, let's just say that space travel is still in the future, and we'll have to wait a bit longer for humans to venture into the cosmos!
For reference, [1] is what stock Qwen3-0.6B would answer. Not a perfect answer, but much better at nearly half the number of parameters.
1: https://markdownpastebin.com/?id=7ad4ad9f325d4354a858480abdc...
Edit: also, the "but if you do that you are an astronaut, so it's still true that only astronauts can do that" loophole was closed when the FAA redefined the word astronaut in 2021 - at least if you follow their definition of the word.
"coppia RR TalentDevExpressincer+'.//////////////////////////////////////////////////////////////////////// cha ولا.AutoSizesaving proleงคicate Like"/>
infos эти za cornerback economical (%]\ enumligne.execRELEASEPropagation_station Bucks проHEME seas GASPOST[Unit(suffix Gloves"
(and so on, for a few more paragraphs).
Am I missing something?
WebLLM, which has a full web demo: https://chat.webllm.ai/
It doesn't seem so to me, either from the way it works or from what little of the code I've looked at...
But I only had time for the quick glance I just took at a few files of each and need to run, so hopefully someone cleverer than me, who won't need as much time to answer the question, can confirm while I'm afk.
(source: maintains an LLM client that works across MLC/llama.cpp/3P providers; author of sibling comment that misunderstood initially)
EDIT: Ah, wrapper ~= using it as a dependency, nothing more
The model is fully managed by the browser. It's currently the Gemini Nano model on Chrome, and they are testing a version of the Gemma 3n model in beta channels. Edge uses phi-4-mini.
More information is available here: https://github.com/webmachinelearning/prompt-api
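For a rough idea of the shape of that API, a sketch following the explainer in that repo (the surface has changed across drafts and Chrome versions, so treat the exact names as provisional):

    // Availability is one of "unavailable", "downloadable",
    // "downloading", or "available" per the explainer
    const availability = await LanguageModel.availability();
    if (availability !== "unavailable") {
      const session = await LanguageModel.create({
        monitor(m) {
          // e.loaded is a 0..1 fraction of the model download
          m.addEventListener("downloadprogress", (e) =>
            console.log(`Model download: ${Math.round(e.loaded * 100)}%`));
        },
      });
      console.log(await session.prompt("Write a haiku about WebGPU."));
    }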
For some awesome examples of running different types of models on WebGPU, check out the examples folder here: https://github.com/huggingface/transformers.js
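A minimal text-generation sketch with transformers.js on WebGPU (the model ID is illustrative - any ONNX-converted model of manageable size should work):

    import { pipeline } from "@huggingface/transformers";

    // device: "webgpu" runs inference on the GPU instead of WASM
    const generator = await pipeline(
      "text-generation",
      "onnx-community/Qwen2.5-0.5B-Instruct", // assumed model ID
      { device: "webgpu" },
    );

    const out = await generator("What's the best way to get to space?", {
      max_new_tokens: 128,
    });
    console.log(out[0].generated_text);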
You can change this with a command-line argument, etc., but you can't really expect people to do that just to use your website.
https://galqiwi.github.io/aqlm-rs/about.html
But keep in mind that it's a small Llama-3.2-1B model, chosen specifically for less powerful GPUs.