Show HN: WebGPU enables local LLM in the browser – demo site with AI chat

91 andreinwald 35 8/2/2025, 2:09:12 PM andreinwald.github.io ↗

Browser LLM demo working on JavaScript and WebGPU. WebGPU is already supported in Chrome, Safari, Firefox, iOS (v26) and Android.

Demo, similar to ChatGPT https://andreinwald.github.io/browser-llm/

Code https://github.com/andreinwald/browser-llm

- No need to use your OPENAI_API_KEY - it's a local model that runs on your device

- No network requests to any API

- No need to install any program

- No need to download files on your device (model is cached in browser)

- The site asks before downloading large files (the model) to the browser cache

- Hosted on GitHub Pages from this repo - secure, because you can see what you are running

Comments (35)

echoangle · 3h ago

Fun demo but the model that's used seems to be pretty stupid:

> What's the best way to get to space?

>> Unfortunately, it's not currently possible for humans to travel to space in the same way that astronauts do. While there have been several manned missions to space, such as those to the International Space Station, the technology and resources required to make interstellar travel feasible are still in the early stages of development.

>> Currently, the fastest spacecraft is the Apollo missions, which took around 77 hours to reach the Moon. From there, they traveled to other planets, but the distances between them are so great that even at high speeds, it would take many years to reach them.

>> As for traveling to space itself, there are no commercial or private companies that offer space travel services, and the cost of such a journey is estimated to be in the tens of millions of dollars per person.

>> However, there are some concepts and technologies being explored that might one day make space travel more accessible and affordable. For example, private companies like SpaceX and Blue Origin are working on developing reusable rockets that could potentially take people to space more quickly and reliably.

>> But for now, let's just say that space travel is still in the future, and we'll have to wait a bit longer for humans to venture into the cosmos!

andreinwald · 3h ago

It runs on the small Llama-3.2-1B model, chosen specifically for devices with less powerful GPUs

wongarsu · 1h ago

The answer is still terrible for the model size. Maybe it's the 4-bit quantization; smaller models tend to react worse to that

For reference, [1] is what stock Qwen3-0.6B would answer. Not a perfect answer, but much better at nearly half the number of parameters

1: https://markdownpastebin.com/?id=7ad4ad9f325d4354a858480abdc...
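
A rough illustration of why 4-bit quantization can hurt: with symmetric group quantization, every weight in a group is rounded to one of 16 integer levels that share a single scale, so small weights sitting next to an outlier get flattened. This is a generic sketch, not MLC's actual q4f16_1 scheme; `quantize4bit`, `meanAbsError`, and the group size are names and values made up for illustration.

```typescript
// Symmetric 4-bit group quantization (illustrative only, not MLC's q4f16_1).
// Each group shares one scale; each weight is rounded to an integer in [-8, 7]
// and then multiplied back by the scale ("dequantized").
function quantize4bit(weights: number[], groupSize: number): number[] {
  const out: number[] = [];
  for (let start = 0; start < weights.length; start += groupSize) {
    const group = weights.slice(start, start + groupSize);
    let maxAbs = 0;
    for (const w of group) {
      maxAbs = Math.max(maxAbs, Math.abs(w));
    }
    // Map the largest magnitude in the group to level 7.
    const scale = maxAbs === 0 ? 1 : maxAbs / 7;
    for (const w of group) {
      const level = Math.max(-8, Math.min(7, Math.round(w / scale)));
      out.push(level * scale);
    }
  }
  return out;
}

// Mean absolute difference between original and dequantized weights.
function meanAbsError(original: number[], restored: number[]): number {
  let total = 0;
  for (let i = 0; i < original.length; i++) {
    total += Math.abs(original[i] - restored[i]);
  }
  return original.length === 0 ? 0 : total / original.length;
}

// A small weight next to an outlier is rounded away entirely:
console.log(quantize4bit([7, 0.4], 2)); // → [7, 0]
```

The fewer parameters a model has, the less redundancy there is to absorb this kind of rounding error, which is consistent with the guess above.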

KTibow · 42m ago

It's likely the quantization on "Llama-3.2-1B-Instruct-q4f16_1-MLC". inference.net generated this more coherent answer: https://hst.sh/ovilewofox.md

fooker · 1h ago

It’s broadly correct, if you interpret ‘humans’ as ordinary humans who are not astronauts.

dragonwriter · 38m ago

That generous redefinition of “humans” deals with only the error in the first sentence of the response; the rest is still not “broadly correct” even with it in place.

wongarsu · 37m ago

Blue Origin is built on taking ordinary humans to space. So is/was Virgin Galactic, though they are in a bit of a transitionary phase right now. SpaceX is also willing, they might even take you on a flyby of the moon if you bring money and patience (iirc the last customer lost patience). Basically just ring up your favorite multi-billionaire's space program. And while the estimated price would be correct for SpaceX, rumored prices for Blue Origin's New Shepard are only in the hundreds of thousands per seat

Edit: also the "but if you do that you are an astronaut, so it's still true that only astronauts can do that" loophole was closed when the FAA redefined the word astronaut in 2021. At least if you follow their definition of the word

tomrod · 2h ago

It shows an interface of what could be.

AndrewDucker · 28m ago

I asked "Why is the sky blue?" and got back a response of

"coppia RR TalentDevExpressincer+'.//////////////////////////////////////////////////////////////////////// cha ولا.AutoSizesaving proleงคicate Like"/>

infos эти za cornerback economical (%]\ enumligne.execRELEASEPropagation_station Bucks проHEME seas GASPOST[Unit(suffix Gloves"

(and so on, for a few more paragraphs).

Am I missing something?

maxmcd · 5h ago

Looks like this is a wrapper around: https://github.com/mlc-ai/web-llm

Which has a full web demo: https://chat.webllm.ai/

swores · 4h ago

Is this correct?

It doesn't seem so to me, either from the way it works or from what little of the code I've looked at...

But I don't have time to do more than the quick glance I just did at a few of the files of each and need to run, so hopefully someone cleverer than me who won't need as much time as me to answer the question could confirm while I'm afk

refulgentis · 3h ago

Entirely correct, cf. LLM.ts. The bit being shown here is about 40 lines of code: a simple TypeScript MVP that calls the library.

(source: maintains an LLM client that works across MLC/llama.cpp/3P providers; author of sibling comment that misunderstood initially)
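
To give a feel for how little glue such a wrapper needs, here is a minimal sketch of one chat turn. web-llm exposes an OpenAI-style `chat.completions.create` on its engine; everything else here (`ChatEngine`, `ask`, `echoEngine`) is a hypothetical stand-in so the snippet runs without the library, and it is not the code from the repo.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical structural type covering only the part of an
// OpenAI-style engine that this sketch touches.
interface ChatEngine {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<{
        choices: { message: { content: string | null } }[];
      }>;
    };
  };
}

// Send the running history plus a new question; return the extended history.
async function ask(
  engine: ChatEngine,
  history: ChatMessage[],
  question: string
): Promise<ChatMessage[]> {
  const userTurn: ChatMessage = { role: "user", content: question };
  const messages = [...history, userTurn];
  const reply = await engine.chat.completions.create({ messages });
  const answer: ChatMessage = {
    role: "assistant",
    content: reply.choices[0]?.message.content ?? "",
  };
  return [...messages, answer];
}

// Stub engine that echoes the last message, standing in for a real model.
const echoEngine: ChatEngine = {
  chat: {
    completions: {
      create: async ({ messages }) => ({
        choices: [
          { message: { content: "echo: " + messages[messages.length - 1].content } },
        ],
      }),
    },
  },
};

// With the real library the engine would come from something like
//   const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC");
// (import from "@mlc-ai/web-llm"; not executed here).
ask(echoEngine, [], "Why is the sky blue?").then((h) => console.log(h.length));
```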

refulgentis · 4h ago

I am glad to see it regardless - the project's been very low activity for months. Just last night I was thinking about ripping it out before launch. No observable future.

EDIT: Ah, wrapper ~= using it as a dependency, no more

scottfr · 4h ago

There is a Prompt API in development that's available in both Chrome and Edge to give access to a local LLM. Chrome extensions have access to it and I believe websites can request access as part of an origin trial.

The model is fully managed by the browser. It's currently the Gemini Nano model on Chrome, and they are testing a version of the Gemma 3n model in beta channels. Edge uses phi-4-mini.

More information is available here: https://github.com/webmachinelearning/prompt-api

dataviz1000 · 21m ago

Very cool!

For some awesome examples of running different types of models in webgpu check out the examples folder here. https://github.com/huggingface/transformers.js

cat-whisperer · 30m ago

I've been following the development of WebGPU and its potential applications, and this demo is a great example of what's possible.

petermcneeley · 4h ago

This demo only works if you have the WebGPU feature "shader-f16". You can find out if you have this by checking the feature list at https://webgpureport.org/ . The page itself can of course check for this, but since f16 support is common they probably just didn't bother.
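
For anyone who wants to add that check, a minimal sketch: in the WebGPU API the feature is named `shader-f16` and shows up in the adapter's `features` set. The `GpuLike`/`AdapterLike` types and the `checkF16Support` helper are assumptions made so the snippet compiles without `@webgpu/types`; they are not part of the demo.

```typescript
// Minimal structural types for the parts of WebGPU used below.
interface AdapterLike {
  features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

type F16Support = "no-webgpu" | "no-adapter" | "no-shader-f16" | "ok";

// Report why f16 shaders are (un)available instead of failing later
// in the middle of a model download.
async function checkF16Support(gpu: GpuLike | undefined): Promise<F16Support> {
  if (!gpu) return "no-webgpu"; // browser exposes no navigator.gpu
  const adapter = await gpu.requestAdapter();
  if (!adapter) return "no-adapter"; // WebGPU present, but no usable GPU
  return adapter.features.has("shader-f16") ? "ok" : "no-shader-f16";
}

// In a browser this would be called as:
//   checkF16Support((navigator as any).gpu).then(console.log);
```

A page could run this before offering the download and show a readable message for anything other than "ok".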

andreinwald · 4h ago

Also here https://caniuse.com/webgpu

apitman · 2h ago

Does anyone know why all of these WebGPU LLM demos have you download the models to browser storage rather than letting you open a gguf already on your local drive? I have several models downloaded already that I would be interested in trying.

fooker · 1h ago

Browsers are sandboxed away from user storage.

You can change this by changing a command-line argument, etc., but you can’t really expect people to do this just to use your website.

om8 · 3h ago

To do GPU inference, you need a GPU. I have a demo that runs 8B Llama on any computer with 4 GB of RAM

https://galqiwi.github.io/aqlm-rs/about.html

adastra22 · 3h ago

Any computer with a display has a GPU.

om8 · 3h ago

Sure, but integrated graphics usually lacks vram for LLM inference.

adastra22 · 2h ago

Which means that inference would be approximately the same speed (but compute offloaded) as the suggested CPU inference engine.
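
A back-of-the-envelope way to see how much memory the weights alone need at a given bit width. This sketch assumes round parameter counts (exactly 1e9 and 8e9) and counts only weights, ignoring quantization scales, the KV cache, and activations; the 2-bit figure is a hypothetical bit width, not a statement about any particular demo. `weightBytes` and `toMB` are illustrative helper names.

```typescript
// Bytes needed to store `paramCount` weights at `bitsPerWeight` bits each.
function weightBytes(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8;
}

// Decimal megabytes (1 MB = 1e6 bytes).
function toMB(bytes: number): number {
  return bytes / 1e6;
}

console.log(toMB(weightBytes(1e9, 4)));  // → 500   (1B params at 4 bits)
console.log(toMB(weightBytes(1e9, 16))); // → 2000  (1B params at f16)
console.log(toMB(weightBytes(8e9, 2)));  // → 2000  (8B params at 2 bits)
console.log(toMB(weightBytes(8e9, 4)));  // → 4000  (8B params at 4 bits)
```

Whether that memory is dedicated VRAM or shared system RAM then depends on the device, which is the point of the exchange above.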

asim · 4h ago

What's the performance of a model like this vs an OpenAI API? What's the comparable here? Edit: I see it's the same models you'd run locally using Ollama or something else. So basically just constrained by the size of the model, the GPU, and the perf of the machine.

andreinwald · 3h ago

Yes, it's very similar to the Ollama app, and it uses the Llama-3.2-1B model

andreinwald · 3h ago

Model used: Llama 3.2 1B (small). Quality should be similar to running the Ollama app with the same small model.

cgdl · 4h ago

Which model does the demo use?

andreinwald · 3h ago

Llama 3.2 1B

pjmlp · 5h ago

Beware of opening this on mobile Internet.

andreinwald · 5h ago

The demo site asks before downloading

lukan · 4h ago

Well, I am on a mobile right now, can someone maybe share anything about the performance?

pjmlp · 4h ago

Not everyone enjoys unlimited data plans, and 500 MB is already a lot.

andreinwald · 3h ago

On my Android device it works pretty fast.

But keep in mind that it's the small Llama-3.2-1B model, chosen specifically for less powerful GPUs.

andsoitis · 6h ago

very cool. improvement would be if the input text box is always on screen, rather than having to manually scroll down as the screen fills.
