Senko – Very Fast Speaker Diarization

Comments (1)

hamza_q_ · 5h ago

1 hour of audio processed in 5 seconds (RTX 4090, Ryzen 9 7950X). ~17x faster than Pyannote 3.1.

On an M3 MacBook Air, 1 hour takes 23.5 seconds (~14x faster).
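To put those figures in terms of a real-time factor, here is a small arithmetic sketch. It only uses the numbers quoted above. The implied Pyannote 3.1 times are back-calculated from the stated speedups, not measured, and it assumes the ~14x figure is also relative to Pyannote 3.1 on the same machine.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


AUDIO_SECONDS = 3600.0  # 1 hour of audio, as in the post

# (label, processing time in seconds, quoted speedup over the baseline)
RUNS = [
    ("RTX 4090 + Ryzen 9 7950X", 5.0, 17),
    ("M3 MacBook Air", 23.5, 14),
]

for label, seconds, speedup in RUNS:
    rtf = realtime_factor(AUDIO_SECONDS, seconds)
    # Back-calculated from the quoted speedup; an estimate, not a benchmark.
    implied_baseline = seconds * speedup
    print(f"{label}: ~{rtf:.0f}x real time, implied baseline ~{implied_baseline:.0f} s")
```

So the quoted numbers work out to roughly 720x real time on the desktop GPU and roughly 153x real time on the laptop.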

This is a custom speaker diarization pipeline I've developed; it's a modified version of the pipeline found in the excellent 3D-Speaker project by Alibaba Research.

My optimizations/modifications were the following:

- changed the VAD model
- multi-threaded Fbank feature extraction
- batched inference of the CAM++ embeddings model
- clustering accelerated by RAPIDS when an NVIDIA GPU is available
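As a rough illustration of two of the items in the list above (threaded feature extraction and batched embedding inference), here is a minimal sketch. It is not Senko's actual code: `extract_features`, `pad_batch`, `embed_batch`, and `embed_segments` are hypothetical names, the "features" and "embeddings" are toy stand-ins for Fbank and CAM++, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_features(segment: list[float]) -> list[float]:
    """Toy stand-in for Fbank: one 'frame' per 4 samples (the mean of the chunk)."""
    frames = []
    for i in range(0, len(segment), 4):
        chunk = segment[i:i + 4]
        frames.append(sum(chunk) / len(chunk))
    return frames


def pad_batch(features: list[list[float]], pad_value: float = 0.0) -> list[list[float]]:
    """Pad every feature sequence in the batch to the longest one."""
    max_len = max(len(f) for f in features)
    return [f + [pad_value] * (max_len - len(f)) for f in features]


def embed_batch(batch: list[list[float]]) -> list[float]:
    """Toy stand-in for the embedding model: one call handles a whole batch."""
    return [sum(row) for row in batch]


def embed_segments(segments, batch_size: int = 3, workers: int = 4) -> list[float]:
    # Extract features for all segments in parallel; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        features = list(pool.map(extract_features, segments))

    # Run the "model" once per batch instead of once per segment.
    embeddings = []
    for start in range(0, len(features), batch_size):
        batch = pad_batch(features[start:start + batch_size])
        embeddings.extend(embed_batch(batch))
    return embeddings


segments = [[1.0] * 8, [2.0] * 4, [3.0] * 12, [0.5] * 4]
print(embed_segments(segments))
```

In pure Python, threads will not speed up CPU-bound work like the toy function here; the pattern pays off when the feature extractor runs in native code that releases the GIL, and when the model runs on hardware that benefits from larger batches.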

Optimizations aside, massive credit needs to be given to the CAM++ speaker embeddings model, whose efficiency is where the majority of the speed comes from.

This pipeline powers the Zanshin media player, which is an attempt at a usable integration of diarization in a media player. Check it out here: https://zanshin.sh and discuss it here: https://news.ycombinator.com/item?id=45104866

Let me know what you think! Were you also frustrated by how slow speaker diarization is? Does Senko's speed unlock new use cases for you? Cheers, everyone.

