The Accuracy of On-Device LLMs

2 points by aazo11 · 5/21/2025, 4:39:43 PM · medium.com

Comments (1)

aazo11 · 8h ago
I tested on-device LLMs (Gemma, DeepSeek) across prompt cleanup, PII redaction, math, and general knowledge on my M2 Max laptop using LM Studio + DSPy.
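
Roughly, the harness looks like this: LM Studio serves the model over its OpenAI-compatible local API (port 1234 by default) and DSPy talks to it as a generic LM. A minimal sketch; the model id and API key are placeholders for whatever your LM Studio instance reports, not my exact config:

```python
import dspy

# LM Studio exposes an OpenAI-compatible server, by default at localhost:1234.
# The model id below is a placeholder -- use whatever identifier LM Studio
# lists for the model you have loaded.
lm = dspy.LM(
    "openai/google/gemma-3-4b",
    api_base="http://localhost:1234/v1",
    api_key="lm-studio",  # LM Studio accepts any non-empty key
)
dspy.configure(lm=lm)
```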

Some observations:

- Gemma-3 is the best model for on-device inference.
- 1B models look fine at first but break down under benchmarking.
- 4B can handle simple rewriting and PII redaction (see the sketch after this list), and it did math reasoning surprisingly well.
- General-knowledge Q&A does not work with a local model; it might work with a RAG pipeline or additional tools.
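
For a sense of how the tasks were framed, here is roughly what a PII-redaction module looks like in DSPy (the signature wording is illustrative, not my exact prompt):

```python
import dspy

class RedactPII(dspy.Signature):
    """Rewrite the text, replacing names, emails, phone numbers,
    and addresses with [REDACTED]. Change nothing else."""
    text: str = dspy.InputField()
    redacted: str = dspy.OutputField()

redact = dspy.Predict(RedactPII)
result = redact(text="Email Jane Doe at jane.doe@example.com or call 555-0123.")
print(result.redacted)
```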

I plan to train and fine-tune 1B models to see if I can build high-accuracy, task-specific models under 1 GB.