Show HN: Whisper at 1.58 bits with custom kernels for edge inference

6 points by coolhanhim · 2 comments · 7/28/2025, 1:18:19 PM · medium.com

Comments (2)

coolhanhim · 6h ago
We quantized OpenAI’s Whisper model to 1.58 bits using Quantization-Aware Training (QAT) to run speech recognition on resource-constrained embedded CPUs. Post-Training Quantization (PTQ) was unsuccessful below 4 bits, so we conducted QAT with a replicated dataset. To make inference feasible, we also implemented custom low-bit kernels optimized for edge deployment. This post walks through the technical challenges and how we addressed them to make extreme quantization work in real-world use.
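
To make the 1.58-bit part concrete, here is a minimal sketch of the weight fake-quantization step during QAT, assuming a BitNet-style absmean ternary scheme with a straight-through estimator (illustrative only, not the exact code from the post):

```python
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Linear):
    """Linear layer with ternary (1.58-bit) weight fake-quantization."""
    def forward(self, x):
        w = self.weight
        # Per-tensor scale from the mean absolute weight (absmean scaling).
        scale = w.abs().mean().clamp(min=1e-5)
        # Round-and-clip weights to {-1, 0, +1}, then rescale.
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward uses w_q, gradients flow to w.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)
```

The 1.58 figure is log2(3) ≈ 1.585, the information content of a ternary weight; packing those weights and running them efficiently is what the custom kernels handle at inference time.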
conjecTech · 4h ago
Very nice work. Training these from scratch is a big undertaking.

- Did you train the encoder and decoder together or separately? It would be nice to have the encoder representation stay compatible with the existing Whisper implementation, since that would let you swap your implementation into models where it's used as a component, like the recent Voxtral model. I'd imagine it might make training a bit faster as well.

- Did you consider training the turbo model as well?