Large-Scale Research with Historical Newspapers: A Turning Point Through Gen AI (dhlab.hypotheses.org)

My take: TensorFlow Lite + MediaPipe was great, but Google really neglected it over the last 3 years or so. MediaPipe hasn't had many meaningful updates in that time, and a lot of its models are now outdated or slow. TF Lite supported NPUs (like Apple's ANE), but MediaPipe never did. There was also too much of a mess with different branding: ML Kit, Firebase ML, TF Lite, LiteRT.

These days it's probably better to stick with ONNX Runtime via Hugging Face Transformers or the transformers.js library, or to wait until ExecuTorch matures. I haven't seen any SOTA model released with an official TensorFlow Lite / LiteRT port in a long time: SAM2, EfficientSAM, EdgeSAM, DFINE, DEIM, Whisper, Lite-Whisper, Kokoro, DepthAnythingV2. Everything is PyTorch by default, though ONNX and MLX still have big communities.

salamo · 22h ago

Really happy to see additional solutions for on-device ML.

That said, I probably wouldn't use this unless mine was one of the specific use cases supported[0]. I have no idea how hard it would be to add a new model supporting arbitrary inputs and outputs.

For running inference cross-device I have used ONNX, which is low-level enough to support whatever weights I need. For a good number of tasks you can also use transformers.js, which wraps ONNX and handles things like decoding (unless you really enjoy implementing beam search on your own). I believe the equivalent of the link above would be [1], which is much more comprehensive.
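For readers who haven't implemented decoding themselves, here is a minimal sketch of what beam search involves. It is not the transformers.js or ONNX Runtime API: `beam_search`, `toy_model`, and the `TOY` distributions are hypothetical stand-ins for a decoder session that returns next-token log-probabilities. With these toy numbers, greedy decoding (beam width 1) commits to the locally best first token and ends up with a lower-probability sequence (0.3) than a beam of width 2 finds (0.36).

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[int, ...]
# A "model" maps a token prefix to {next_token: log_probability}.
NextTokenFn = Callable[[Seq], Dict[int, float]]


def beam_search(next_logprobs: NextTokenFn, bos: int, eos: int,
                beam_width: int = 2, max_len: int = 5) -> Tuple[Seq, float]:
    """Return the best sequence (without BOS) and its total log-probability."""
    beams: List[Tuple[Seq, float]] = [((bos,), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [(seq + (tok,), score + lp)
                      for seq, score in beams
                      for tok, lp in next_logprobs(seq).items()]
        # Keep only the `beam_width` highest-scoring hypotheses overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    # Hypotheses that hit max_len without EOS still compete.
    best_seq, best_score = max(finished + beams, key=lambda c: c[1])
    return best_seq[1:], best_score


# Hypothetical toy distributions; any unlisted prefix emits EOS (9) with p=1.
TOY = {
    (0,): {1: 0.6, 2: 0.4},
    (0, 1): {3: 0.5, 4: 0.5},
    (0, 2): {5: 0.9, 6: 0.1},
}


def toy_model(prefix: Seq) -> Dict[int, float]:
    return {tok: math.log(p) for tok, p in TOY.get(prefix, {9: 1.0}).items()}


print(beam_search(toy_model, bos=0, eos=9, beam_width=1)[0])  # greedy: (1, 3, 9)
print(beam_search(toy_model, bos=0, eos=9, beam_width=2)[0])  # beam:   (2, 5, 9)
```

Libraries like transformers.js bundle this (plus length penalties, batching, and KV caching) so you don't have to.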

[0] https://ai.google.dev/edge/mediapipe/solutions/guide

[1] https://github.com/huggingface/transformers.js-examples

arbayi · 1d ago

https://github.com/google-ai-edge/gallery

A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.

ricardobeat · 1d ago

This is a repackaging of TensorFlow Lite + MediaPipe under a new “brand”.

echelon · 1d ago

The same stuff that powers this?

https://3d.kalidoface.com/

It's pretty impressive that this runs on-device. It's better than a lot of commercial mocap offerings.

AND this was marked deprecated/unsupported over 3 years ago despite the fact it's a pretty mature solution.

Google has been sleeping on their tech or not evangelizing it enough.

6gvONxR4sf7o · 20h ago

Anybody have any experience with this? I just spent a while contorting a custom PyTorch model to get it to export to CoreML, and it was full of this, that, and the other not being supported, segfaults, and all sorts of silly errors. I'd love it if someone could tell me this isn't full of sharp edges too.

smeej · 19h ago

I got it all set up and tested out Gemma3 1B on a Pixel 8a. That only took a few minutes, which was nice.

But it was garbage. It barely parsed the question, didn't even attempt to answer it, and replied in what was barely serviceable English. All I asked was how it was small enough to run locally on my phone. It was bad enough for me to abandon the model entirely, which is saying a lot, because I feel like I have pretty low expectations for AI work in the first place.

DrSiemer · 5h ago

Why would you ask a 1B model anything? Those are only useful for rephrasing output at best.

throwaway314155 · 16h ago

> All I asked was how it was small enough to run locally on my phone

Bit off-topic, but did you expect to see a real or honest answer about itself? I see many people under the impression that models know information about themselves that isn't in the system prompt. Couldn't be further from the truth. In fact, those questions specifically tend to trigger hallucinations, often an overconfident assertion with a "reasonable"-sounding answer.

The information the model knows (offline - no tools allowed) stops weeks if not months if not years prior to when the model is done training. There is _zero_ information about its inception, how it works, or anything similar in its weights.

Sorry, this is mostly directed at the masses - not you.

smeej · 9h ago

Not really, but I did expect an answer, or at least a non-answer, that showed it understood the question, and that an answer was expected.

yeldarb · 1d ago

Is this a new product or a marketing page tying together a bunch of the existing MediaPipe stuff into a narrative?

Got really excited then realized I couldn’t figure out what “Google AI Edge” actually _is_.

Edit: I think it’s largely a rebrand of this from a couple years ago: https://developers.googleblog.com/en/introducing-mediapipe-s...

hatmanstack · 1d ago

Played with this a bit, and from what I gathered it's purely a re-arch of PyTorch models to work as .tflite models; at least that's what I was using it for. It worked well with a custom FinBERT model, with negligible size reduction. It converted a quantized version, but the outputs were not close. From what I remember of the docs, it was created for standard PyTorch models like "torchvision.models", so maybe with those you'd have better luck. Granted, this was all ~12 months ago; sounds like I might have dodged a pack of Raptors?
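One common reason a quantized conversion drifts from the float model (not necessarily what happened here) is that post-training int8 quantization often shares one scale across a whole tensor, so a single outlier weight squeezes the small weights into very few integer levels. The sketch below assumes symmetric per-tensor quantization; `quantize_int8`, `dequantize`, `max_abs_error`, and the weights are illustrative, not any converter's API.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= q * scale, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


def max_abs_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


# One outlier (5.0) sets the scale, so the small weights lose most of their precision.
weights = [0.01, -0.02, 0.015, 5.0]
q, scale = quantize_int8(weights)
print(q)                                             # [0, -1, 0, 127]
print(max_abs_error(weights, dequantize(q, scale)))  # about 0.019, as large as the small weights
```

Comparing float and dequantized outputs layer by layer like this is one way to find where a converted model starts to diverge.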

davedx · 1d ago

More information here: https://ai.google.dev/edge/mediapipe/solutions/guide

(It seems to be open source: https://github.com/google-ai-edge/mediapipe)

I think this is a unified way of deploying AI models that actually run on-device ("edge"). I guess a sort of "JavaScript of AI stacks"? I wonder who the target audience is for this technology?

wongarsu · 1d ago

Some of the MediaPipe models are nice, but MediaPipe has been around forever (or since 2019). It has always been about running AI on the edge, back when the exciting frontier of AI was visual tasks.

For stuff like face tracking it's still useful, but for other tasks like image recognition the world has changed drastically.

babl-yc · 1d ago

I would say the target audience is anyone deploying ML models cross-platform, specifically ones that would require supporting code beyond the TFLite runtime to make it work.

LLMs and computer vision tasks are good examples of this.

For example, a hand-gesture recognizer might require:

- Pre-processing of input image to certain color space + image size
- Copy of image to GPU memory
- Run of object detection TFLite model to detect hand
- Resize of output image
- Run of gesture recognition TFLite model to detect gesture
- Post processing of gesture output to something useful

Shipping this to iOS+Android requires a lot of code beyond executing TFLite models.
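To make that list concrete, here is a plain-Python sketch of such a pipeline. The two "models" are hypothetical stand-ins (a box around the brightest pixel and a mean-brightness score), the labels are made up, and the GPU copy is only marked by a comment. The point is how much glue code surrounds the two model calls, not MediaPipe's actual API.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]           # 8-bit RGB
Box = Tuple[int, int, int, int]        # x0, y0, x1, y1 (end-exclusive)


def resize_nearest(img: list, width: int, height: int) -> list:
    """Nearest-neighbour resize of a row-major 2-D list."""
    src_h, src_w = len(img), len(img[0])
    return [[img[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def to_unit_range(img: List[List[Pixel]]) -> list:
    """Colour-space step: scale 0..255 channels to floats in 0.0..1.0."""
    return [[(r / 255, g / 255, b / 255) for (r, g, b) in row] for row in img]


def crop(img: list, box: Box) -> list:
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def detect_hand(img: list) -> Box:
    """Stand-in for a detection model: a 3x3 box around the brightest pixel."""
    h, w = len(img), len(img[0])
    _, cy, cx = max((sum(img[y][x]), y, x) for y in range(h) for x in range(w))
    return (max(cx - 1, 0), max(cy - 1, 0), min(cx + 2, w), min(cy + 2, h))


def classify_gesture(patch: list) -> List[float]:
    """Stand-in for a gesture model: fake scores from mean brightness."""
    values = [c for row in patch for px in row for c in px]
    mean = sum(values) / len(values)
    return [1.0 - mean, mean]


def recognize_gesture(frame: List[List[Pixel]],
                      labels: Sequence[str] = ("fist", "open_palm")) -> str:
    img = to_unit_range(resize_nearest(frame, 8, 8))  # pre-process: size + colour
    # (a real pipeline would upload `img` to GPU memory here)
    box = detect_hand(img)                             # detection model
    patch = resize_nearest(crop(img, box), 4, 4)       # resize the hand crop
    scores = classify_gesture(patch)                   # gesture model
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]                                # post-process to a label


white = [[(255, 255, 255)] * 16 for _ in range(16)]
print(recognize_gesture(white))  # prints open_palm under these toy stand-ins
```

Every step besides the two model calls would otherwise need to be rewritten for iOS, Android, and the web.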

The Google Mediapipe approach is to package this graph pipeline, and shared processing "nodes" into a single C++ library where you can pick and choose what you need and re-use operations across tasks. The library also compiles cross-platform and the supporting tasks can offer GPU acceleration options.
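A minimal sketch of the shared-node idea, assuming a simple pull-based graph where each node's output is cached per run so that two task heads reuse one preprocessing step. The `Graph` class, the node names, and the `max`/`min` "task heads" are hypothetical and far simpler than MediaPipe's real graph framework.

```python
from typing import Any, Callable, Dict, List, Tuple


class Graph:
    """Tiny dataflow graph: each node is a function of named upstream nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}

    def add(self, name: str, fn: Callable[..., Any], inputs: List[str]) -> None:
        self.nodes[name] = (fn, inputs)

    def run(self, feeds: Dict[str, Any], outputs: List[str]) -> Dict[str, Any]:
        cache: Dict[str, Any] = dict(feeds)  # shared nodes run at most once per call

        def evaluate(name: str) -> Any:
            if name not in cache:
                fn, inputs = self.nodes[name]
                cache[name] = fn(*[evaluate(i) for i in inputs])
            return cache[name]

        return {name: evaluate(name) for name in outputs}


preprocess_calls: List[int] = []


def preprocess(frame: List[int]) -> List[float]:
    preprocess_calls.append(1)  # count executions of the shared node
    return [v / 255 for v in frame]


g = Graph()
g.add("normalized", preprocess, ["frame"])
g.add("hand_score", max, ["normalized"])  # toy stand-ins for two task heads
g.add("face_score", min, ["normalized"])
print(g.run({"frame": [0, 51, 255]}, ["hand_score", "face_score"]))
print(len(preprocess_calls))  # 1: the shared node ran once
```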

One internal debate Google likely had was whether it was best to extend TFLite runtime with these features, or to build a separate library (Mediapipe). TFLite already supports custom compile options with additional operations.

My guess is they thought it was best to keep TFLite focused on "tensor based computation" tasks and offload broader operations like LLM and image processing into a separate library.

danielb123 · 1d ago

Years behind what is already available through frameworks like CoreML and TinyML. Plus Google first has to prove they won't kill the product to meet the next quarter's investor expectations.

babl-yc · 1d ago

This isn't really true. They are different offerings.

CoreML is specific to the Apple ecosystem and lets you convert a PyTorch model to a CoreML .mlmodel that will run with acceleration on iOS/Mac.

Google Mediapipe is a giant C++ library for running ML flows on any device (iOS/Android/Web). It includes Tensorflow Lite (now LiteRT) but is also a graph processor that helps with common ML preprocessing tasks like image resizing, annotating, etc.

Google killing products early is a good meme but Mediapipe is open source so you can at least credit them with that. https://github.com/google-ai-edge/mediapipe

I used a fork of Mediapipe for a contract iOS/Android computer vision product and it was very complex but worked well. A cross-platform solution would not have been possible with CoreML.

NetOpWibby · 1d ago

I wish MediaPipe was good for facial AR but in my experience it’s lacking.

mattnewton · 1d ago

TensorFlow Lite has been battle-tested on literally billions of devices over the years, and this looks like a rebrand/extension of that plus MediaPipe, one of its biggest users. Google has been serious about on-device ML for over 5 years now; I don't think they are going to kill this. Confusingly rebrand it, maybe :)

elpakal · 1d ago

The generative AI piece is not available in Apple ecosystems right? I think that would be huge and I really hope Apple gives us something similar. And I gotta say the chat piece of this seems really useful too.

Also where the f is Swift Assist already

coderatlarge · 8h ago

Is it possible to go to the iPhone App Store and get an app that is essentially an Ollama-like model downloader and launcher?

spacecadet · 1d ago

It's just a rebranded TensorFlow Lite; I've been using that on edge devices since 2019... CoreML is great too!

bigyabai · 1d ago

My brother in Christ, CoreML only exists because Apple saw Tensorflow and wanted the featureset without cooperating on a common standard. TF was like 2 years old (and fairly successful) by the point CoreML was announced. To this day CoreML is little more than a proprietary BLAS interface, with nearly zero industry buy-in.

Terrifying what being an iOS dev does to a feller.

init0 · 17h ago

This can be done with WebLLM, no?

stanleykm · 1d ago

i really wish people who make edge inference libraries like this would quit rebranding them every year and just build the damn things to be fast and small and consistently updated.

bigyabai · 1d ago

ONNX exists but since they don't change their name very often not a whole lot of people know about it.

pzo · 16h ago

ONNX Runtime is actually quite popular, mostly because of Hugging Face Transformers; many people just don't know they're using it under the hood. What's missing is a native version of Transformers so you can easily deploy beyond desktops and servers. Transformers.js is one attempt at that: it can deploy on the web and in React Native.

roflcopter69 · 22h ago

Genuine question, why should I use this to deploy models on the edge instead of executorch? https://github.com/pytorch/executorch

For context, I get to choose the tech stack for a greenfield project. I think that ExecuTorch, which belongs to the PyTorch ecosystem, will have a way more predictable future than anything Google does, so I'm currently leaning toward ExecuTorch.

6gvONxR4sf7o · 20h ago

For one thing, ExecuTorch is currently full of sharp edges. No idea about this one, but I had a bad experience with ET. If I were starting over, I might start from torch.fx and automate from there. fx is stable and should be around for a while.

zb3 · 1d ago

So can we run Gemma 3n on linux? So much fluff yet this is unclear to me.

saratogacx · 1d ago

In the model's community section Goog confirms they're working on a gguf version so you can host it like most other models.

https://huggingface.co/google/gemma-3n-E4B-it-litert-preview...

quaintdev · 1d ago

As far as I know it's based on Gemini nano architecture which exclusively runs on Android and Chrome. So I'm guessing you can't run it on Linux outside Chrome.

synergy20 · 19h ago

Can this run on customized embedded devices, or just on phones?

synergy20 · 16h ago

It does support Python and the web, and it runs on a Raspberry Pi.

suilk · 19h ago

How about the MNN engine?

rvnx · 1d ago

Form your own opinion here: https://mediapipe-studio.webapps.google.com/studio/demo/imag...

Go to this page using your mobile phone.

I am apparently a doormat or a seatbelt.

It seems to be a rebranded failure. At Google you get promoted for product launches because of the OKR system, and more rarely for maintenance.

tfsh · 1d ago

Perhaps you missed the associated documentation? This is a classification tool with a fixed set of labels: it "uses an EfficientNet architecture and was trained using ImageNet to recognize 1,000 classes, such as trees, animals, food, vehicles".

The full list [1] doesn't seem to include a human. You can tweak the score threshold to reduce false positives.

1: https://storage.googleapis.com/mediapipe-tasks/image_classif...
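As an illustration of how a score threshold trims results, here is a small sketch, assuming the classifier produces raw logits that are turned into probabilities with a softmax. `classify`, its parameters, and the logits are hypothetical illustration values, not output from the demo or MediaPipe's API. When the top classes are nearly tied, as they might be for an input outside the label set, nothing clears a 0.5 threshold and the result is empty rather than a confident wrong answer.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float], labels: Sequence[str],
             score_threshold: float = 0.0, max_results: int = 3) -> List[Tuple[str, float]]:
    """Rank labels by probability, drop those under the threshold, keep the top-k."""
    ranked = sorted(zip(labels, softmax(logits)), key=lambda p: p[1], reverse=True)
    return [(label, s) for label, s in ranked if s >= score_threshold][:max_results]


labels = ["doormat", "seat belt", "tabby"]
logits = [2.0, 1.9, 0.1]  # made-up scores for illustration
print(classify(logits, labels))                       # all three, doormat first
print(classify(logits, labels, score_threshold=0.5))  # []: nothing is confident
```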

rvnx · 1d ago

You're right about humans, that would explain it, but I still find it surprising that such a "common item" as a human is not there.

Did you also try it on items from the list?

If there is a match (and this is not frequent), to me it's still very low confidence (like noise or luck).

It seems to be a repacking of https://blog.tensorflow.org/2020/03/higher-accuracy-on-visio...

So it's an old release from 5 years ago (a very long time in the AI world), and AFAIK it has been superseded by YOLO-NAS and other models. MediaPipe feels like a really old tool, except for some specific subtasks like face tracking.

And as a side note, the OKR system at Google is a very serious thing; there are a lot of people internally gaming the system, and that could explain why it is a "new" launch instead of a rather disappointing rebrand of the 2020 version.

I'd rather recommend building on more modern tools, such as: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM-256M-Ins... (runs on iPhone with < 1GB of memory)

bigyabai · 1d ago

> And as a side-note, the OKR-system at Google is a very serious thing, there are lot of people internally gaming the system.

So you came here to offer a knee-jerk assessment of an AI runtime and blamed the failure on OKRs. Then somebody points out that your use-case isn't covered by the model, and you're looping back around to the OKR topic again. To assess an AI inference tool.

Why would you even bother hitting reply on this post if you don't want to talk about the actual topic being discussed? "Agile bad" is not a constructive or novel comment.


Google AI Edge – On-device cross-platform AI deployment

Comments (39)