Ollama's new engine for multimodal models

177 points by LorenDB on 5/16/2025, 1:43:27 AM | ollama.com

Comments (25)

simonw · 4h ago
The timing on this is a little surprising given llama.cpp just finally got a (hopefully) stable vision feature merged into main: https://simonwillison.net/2025/May/10/llama-cpp-vision/

Presumably Ollama had been working on this for quite a while already - it sounds like they've broken their initial dependency on llama.cpp. Being in charge of their own destiny makes a lot of sense.

lolinder · 4h ago
Do you know what exactly is different about either of these projects' new multimodal support? Both have supported LLaVA for a long time. Did that require special-casing that is no longer required?

I'd hoped to see this mentioned in TFA, but it kind of acts like multimodal is totally new to Ollama, which it isn't.

simonw · 3h ago
There's a pretty clear explanation of the llama.cpp history here: https://github.com/ggml-org/llama.cpp/tree/master/tools/mtmd...

I don't fully understand Ollama's timeline and strategy yet.

refulgentis · 3h ago
In this situation it's a turducken of crap from everyone except ngxson, Hugging Face, and llama.cpp.

llama.cpp did have multimodal, I've been maintaining an integration for many moons now. (Feb 2024? Original LLaVa through Gemma 3)

However, this was not for mere mortals. It was not documented and had gotten unwieldy, to say the least.

ngxson (HF employee) did a ton of work to get Gemma 3 support in, and had to do it in a separate binary. They dove in and landed a refactored backbone that is presumably more maintainable and on track to be in what I think of as the real Ollama: llama.cpp's server binary.

As you well note, Ollama is Ollamaing - I joked, once, that the median llama.cpp contribution from Ollama is a drive-by GitHub comment asking when a feature will land in llama-server, so it can be copy-pasted into Ollama.

It's really sort of depressing to me because I'm just one dude and it really wasn't that hard to support (it's one of a gajillion things I have to do; I'd estimate 2 SWE-weeks at 10 YOE, plus 1.5 SWE-days for every model release), and it's hard to get attention for detailed work in this space with how much everyone exaggerates and rushes to PR.

EDIT: Coming back after reading the blog post, and I'm 10x as frustrated. "Support thinking / reasoning; Tool calling with streaming responses" --- this is table stakes stuff that was possible eons ago.

I don't see any sign of them doing anything specific in any of the code they link; the whole thing reads like someone carefully worked with an LLM to present a maximalist, technical-sounding version of the llama.cpp work and frame it as if they collaborated with these companies and built their own thing. (Note the very careful wording, e.g. in the footer the companies are thanked for releasing the models.)

I think it's great that they have a nice UX that helps people run llama.cpp locally without compiling, but it's hard for me to think of a project I've been more turned off by in my 37 years on this rock.

Patrick_Devine · 2h ago
I worked on the text portion of Gemma 3 (as well as Gemma 2) for the Ollama engine, and worked directly with the Gemma team at Google on the implementation. I didn't base it on the llama.cpp implementation, which was done in parallel. We did our implementation in Go, and llama.cpp did theirs in C++. There was no "copy-and-pasting" as you are implying, although I do think collaborating on these new models would help us get them out the door faster. I am really appreciative of Georgi catching a few things we got wrong in our implementation.

nolist_policy · 1h ago
For one, Ollama supports interleaved sliding window attention (iSWA) for Gemma 3 while llama.cpp doesn't.[0] iSWA reduces the KV cache size to roughly 1/6; rough arithmetic is sketched below.

Ollama is written in Go, so of course they cannot meaningfully contribute that back to llama.cpp.

[0] https://github.com/ggml-org/llama.cpp/issues/12637
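
To put rough numbers on that 1/6: a minimal back-of-the-envelope sketch, assuming Gemma 3's 5:1 ratio of local (sliding-window) to global layers and a 1024-token window; the per-layer constants cancel out of the ratio.

    package main

    import "fmt"

    // Back-of-the-envelope for how interleaved sliding window attention (iSWA)
    // shrinks the KV cache. Assumes a 5:1 ratio of local (sliding-window) to
    // global layers and a 1024-token window, as in Gemma 3; the per-layer
    // constant (heads * head_dim * 2 * bytes) cancels out of the ratio.
    func main() {
        const (
            ctxLen   = 131072 // context length in tokens
            window   = 1024   // sliding-window size for local layers
            localPer = 5      // local layers per global layer
        )

        full := float64((localPer + 1) * ctxLen)  // every layer caches the full context
        iswa := float64(ctxLen + localPer*window) // only the global layer caches the full context

        // Prints roughly 17% (~1/5.8); the ratio approaches 1/6 as the context grows.
        fmt.Printf("iSWA cache is %.1f%% of a full cache (~1/%.1f)\n", 100*iswa/full, full/iswa)
    }

At a 32K context it works out closer to 1/5, so the exact saving depends on how long your context actually is.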

noodletheworld · 1h ago
What nonsense is this?

Where do you imagine ggml is from?

> The llama.cpp project is the main playground for developing new features for the ggml library

-> https://github.com/ollama/ollama/tree/27da2cddc514208f4e2353...

(Hint: if you think they only write Go in Ollama, look at the commit history of that folder.)

nolist_policy · 55m ago
llama.cpp clearly does not support iSWA: https://github.com/ggml-org/llama.cpp/issues/12637

Ollama does, please try it.

rvz · 2h ago
> As you well note, Ollama is Ollamaing - I joked, once, that the median llama.cpp contribution from Ollama is asking when a feature will land in llama-server so it can be copy-pasted into Ollama.

Other than being a nice wrapper around llama.cpp, are there any meaningful improvements that they came up with that landed in llama.cpp?

I guess in this case, with the introduction of libmtmd (for multimodal support in llama.cpp), Ollama waited, did a git pull, and now multimodal + better vision support is here, with no proper credit given.

Yes, they had vision support via LLaVA models, but it wasn't that great.

refulgentis · 2h ago
There have been no noteworthy contributions; honestly, I wouldn't be surprised to hear there have been zero.

Well, it's even sillier than that: I hadn't realized that the timeline in the llama.cpp link was humble and matched my memory: it was the test binaries that changed. I.e., the API was refactored a bit and such, but it's nothing new under the sun. Also, the llama.cpp they have already has tool and thinking support. shrugs

The tooling was called llava, but that's just because it was the first model -- multimodal models are/were consistently supported ~instantly; it was just that your calls into llama.cpp needed to manage that, and they still do! It's just that there's been some cleanup so there isn't one test binary for every model.

It's sillier than that in that it wasn't even "multi-modal + better vision support was here"; it was "oh, we should do that for real if llama.cpp is".

On a more positive note, the big contributor I appreciate in that vein is Kobold, which contributed a ton of Vulkan work IIUC.

And another round of applause for ochafik: idk if this gentleman from Google is doing this in his spare time or full-time for Google, but he has done an absolutely stunning amount of work to make tool calls and thinking systematically approachable, even building a header-only Jinja parser implementation and designing a way to systematize "blessed" overrides of the rushed, silly templates that are inserted into models. Really important work IMHO: tool calls are what make AI automated, and having open source able to step up here significantly means you can have legit Sonnet-like agency in Gemma 3 12B, even Phi 4 3.8B to an extent.
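
To make the agency point concrete, here's a minimal, hypothetical sketch of the host side in Go (the JSON shape and names are made up for illustration, not llama.cpp's or Ollama's actual API): the chat template steers the model into emitting a structured tool call, which the host parses and dispatches to a local function.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // What a model, steered by its chat template, might emit as a tool call.
    // The exact shape varies per model/template; this is illustrative only.
    const modelOutput = `{"name": "get_weather", "arguments": {"city": "Berlin"}}`

    type toolCall struct {
        Name      string          `json:"name"`
        Arguments json.RawMessage `json:"arguments"`
    }

    // Local functions the model is allowed to invoke, keyed by tool name.
    var tools = map[string]func(json.RawMessage) (string, error){
        "get_weather": func(args json.RawMessage) (string, error) {
            var a struct {
                City string `json:"city"`
            }
            if err := json.Unmarshal(args, &a); err != nil {
                return "", err
            }
            return "14C and cloudy in " + a.City, nil // a real tool would call an API
        },
    }

    func main() {
        var call toolCall
        if err := json.Unmarshal([]byte(modelOutput), &call); err != nil {
            panic(err)
        }
        fn, ok := tools[call.Name]
        if !ok {
            panic("unknown tool: " + call.Name)
        }
        result, err := fn(call.Arguments)
        if err != nil {
            panic(err)
        }
        // In a real loop the result would be appended as a tool message and
        // sent back to the model; that loop is what makes it feel agentic.
        fmt.Println(result)
    }

Getting models to emit that structure reliably in the first place is exactly the template/grammar work credited above.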

tommica · 3h ago
Side tangent: why is Ollama frowned upon by some people? I've never really gotten any explanation other than "you should run llama.cpp yourself".

lhl · 3h ago
Here's some discussion here: https://www.reddit.com/r/LocalLLaMA/comments/1jzocoo/finally...

Ollama appears to not properly credit llama.cpp: https://github.com/ollama/ollama/issues/3185 - this is a long-standing issue that hasn't been addressed.

This seems to have leaked into other projects where even when llama.cpp is being used directly, it's being credited to Ollama: https://github.com/ggml-org/llama.cpp/pull/12896

Ollama doesn't contribute upstream (that's fine, they're not obligated to), but it's a bit weird that one of the devs claimed to have and, uh, not really: https://www.reddit.com/r/LocalLLaMA/comments/1k4m3az/here_is... - that being said, they seem to maintain their own fork, so anyone could cherry-pick stuff if they wanted to: https://github.com/ollama/ollama/commits/main/llama/llama.cp...

octocop · 1h ago
For me it's because Ollama is just a front-end for llama.cpp, but the Ollama folks rarely acknowledge that.

speedgoose · 1h ago
To me, Ollama is a bit like the Docker of LLMs. The user experience is inspired by Docker, and the Modelfile syntax is also inspired by the Dockerfile syntax. [0]

In the early days of Docker, we had the debate of Docker vs LXC. At the time, Docker was mostly a wrapper over LXC and people were dismissing the great user experience improvements of Docker.

I agree, however, that the long-standing lack of acknowledgement of llama.cpp has been problematic. They acknowledge the project now.

[0]: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
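
To illustrate the resemblance, a minimal Modelfile (directives per the linked docs; the base model and values here are just examples) reads a lot like a Dockerfile:

    FROM llama3.2
    PARAMETER temperature 0.7
    SYSTEM "You are a concise assistant."

You then build it with something like "ollama create my-assistant -f Modelfile", much as you would docker build an image.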

gavmor · 3h ago
Here's a recent thread on Ollama hate from r/localLLaMa: https://www.reddit.com/r/LocalLLaMA/comments/1kg20mu/so_why_...

buyucu · 1h ago
I abandoned Ollama because Ollama does not support Vulkan: https://news.ycombinator.com/item?id=42886680

You have to support Vulkan if you care about consumer hardware. Ollama devs clearly don't.

nicman23 · 3h ago
llama.cpp was just faster and had more features, that's all

cwillu · 3h ago
llama.cpp is the thing doing all the heavy lifting; Ollama is just a library wrapper.

It'd be like if HandBrake tried to pretend they implemented all the video processing work themselves, when they depend on ffmpeg's libraries for all of that.

ics · 5h ago
I'll have to try this later but appreciate that the article gets straight to the point with practical examples and then the details.

newusertoday · 4h ago
Why does the Ollama engine have to change to support new models? Every time a new model comes out, Ollama has to be upgraded.

nkwaml · 4h ago
Because of things like this: https://github.com/ggml-org/llama.cpp/issues/12637

Where "supporting" a model doesn't mean what you think it means for cpp

Between that and the long saga of vision models having only partial support (a CLI tool, but no llama-server support - they only fixed all that very recently), the fact of the matter is that Ollama is now moving faster and implementing what people want before llama.cpp does.

And it will finally shut down all the people who kept copy-pasting the same criticism of Ollama: "it's just a llama.cpp wrapper, why are you not using llama.cpp instead?"

Maxious · 4h ago
There's also some interpersonal conflict in llama.cpp that's hampering other bug fixes https://github.com/ikawrakow/ik_llama.cpp/pull/400

w8nC · 3h ago
Now it’s just a wrapper around hosted APIs.

Went with my own wrapper around llama.cpp and stable-diffusion.cpp, with the option of prompting a hosted model if I don't like the result so much, but it makes a good starting point for the hosted model to improve on.

It also obfuscates any requests sent to the hosted service, because why feed them insight into my use case when I just want to double-check the local AI's algorithmic choices? The ground truth that function names and variable names imply is my little secret.

Patrick_Devine · 3h ago
Wait, what hosted APIs is Ollama wrapping?

buyucu · 1h ago
you mean llama.cpp has a new engine for multimodal models :)

Ollama folks should at least make an effort to give llama.cpp some credit.