Ask HN: What Speaker Diarization tools should I look into?
I am making a tool that needs to analyze a conversation (non-English) between two people. The conversation is provided to me in audio format. I am currently using OpenAI Whisper to transcribe it, and I feed the transcription to the GPT-4o model through the API for analysis.
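For reference, here is roughly what my current pipeline looks like (a minimal sketch; the file name, model size, and prompt are placeholders, and I'm assuming the open-source `whisper` package plus the official OpenAI Python SDK):

```python
import whisper
from openai import OpenAI

# Step 1: transcribe the conversation with Whisper.
model = whisper.load_model("medium")  # placeholder size; any multilingual model works
result = model.transcribe("conversation.mp3")
transcript = result["text"]

# Step 2: feed the raw (speaker-less) transcription to GPT-4o for analysis.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Analyze this two-person conversation."},
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```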
So far, it's doing a fair job. Sometimes, though, when reading the transcription, I find it hard to figure out which speaker said what, and I have to listen to the audio to work it out. I wonder whether GPT-4o sometimes finds the conversation just as hard to follow from the transcription. I think adding a speaker diarization step would make the transcription easier to understand and analyze.
I am looking for speaker diarization tools that I can use. I have tried pyannote's speaker-diarization-3.1 pipeline, but I find it does not work very well. What are some other options I could look at?
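For context, this is roughly how I'm calling pyannote (a sketch; the file name and token are placeholders, and the pipeline needs a Hugging Face access token):

```python
from pyannote.audio import Pipeline

# Load the pretrained diarization pipeline (gated model; requires a HF token).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder
)

diarization = pipeline("conversation.wav")

# One line per speaker turn: start time, end time, speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```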
Two options worth a look:

1. NVIDIA NeMo's `diar_msdd_telephonic` (8 kHz) or `diar_msdd_mic` (16 kHz): a one-line Python install, GPU optional, and it beats pyannote on cross-talk.

2. AssemblyAI's async `/v2/transcript` endpoint: gives you `words[].speaker` plus Whisper-level accuracy for 40+ languages. Free tier: 3 h/month. (Sketch below.)
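For option 2, a minimal sketch of the REST flow (the API key, audio URL, and language code are placeholders; `speaker_labels` is what turns on diarization):

```python
import time
import requests

API_KEY = "YOUR_ASSEMBLYAI_KEY"  # placeholder
headers = {"authorization": API_KEY}

# Submit an async transcription job with diarization enabled.
job = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json={
        "audio_url": "https://example.com/conversation.mp3",  # placeholder
        "speaker_labels": True,   # enables diarization
        "language_code": "hi",    # placeholder: set to your conversation's language
    },
).json()

# Poll until the job finishes.
while True:
    result = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{job['id']}",
        headers=headers,
    ).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)

# Each utterance carries a speaker label ("A", "B", ...).
for utt in result.get("utterances", []):
    print(f"Speaker {utt['speaker']}: {utt['text']}")
```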
Glue either one onto your existing Whisper pipeline and feed GPT-4o speaker-tagged text. The jump in clarity is night and day.
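The glue step is just timestamp alignment: tag each Whisper segment with whichever diarization turn overlaps it most. A rough sketch, assuming Whisper's `segments` output and a list of `(start, end, speaker)` tuples from whatever diarizer you pick:

```python
def label_segments(whisper_segments, speaker_turns):
    """Tag each Whisper segment with the speaker whose turn overlaps it most.

    whisper_segments: dicts with "start", "end", "text" (Whisper's `segments` output).
    speaker_turns: (start, end, speaker) tuples from any diarizer.
    """
    labeled = []
    for seg in whisper_segments:
        best_speaker, best_overlap = "unknown", 0.0
        for start, end, speaker in speaker_turns:
            # Overlap between the segment and this speaker turn, in seconds.
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append(f"{best_speaker}: {seg['text'].strip()}")
    return labeled

# Then send "\n".join(label_segments(result["segments"], turns)) to GPT-4o.
```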
I use the same combo to auto-caption interviews, then drop the synced footage into Veo 3 (https://veo-3.app) for instant talking-head explainers; it works even for non-English audio.
How long is the audio file? If it's under 2 hours, you can upload the file and transcribe it with diarization for free using our web portal: https://portal.speechmatics.com/jobs/create/batch
Hope it works for your use case! If you try it and run into any issues, drop us an email at devrel@speechmatics.com :)
I will give your portal a try soon. Thanks