Removing 95% of podcast ads with transcript segmentation and LLMs

2 points by benbowler · 1 comment · 9/12/2025, 11:27:01 AM · benbowler.com

Comments (1)

benbowler · 29m ago

I’ve been listening to podcasts for 15+ years. Ads used to be short and host-read. Now, some shows I follow have 15+ minutes of loud, compressed ads per hour.

I built a system to strip them out automatically. It takes a podcast feed, processes each episode, and outputs an ad-free feed compatible with any player.

What didn’t work:

Full-transcript one-shot prompting: LLMs would return a few timestamps, then stop—context was too broad.

Keyword-based detection: High false positives/negatives, especially with “house ads” and blended sponsor mentions.

What worked:

Segmentation + local scoring: Split transcripts into overlapping windows. Ask the LLM for “ad likelihood” per window—short prompts keep context tight.

Multi-head prompting: Separate prompts for (a) brand ads (URLs, promo codes, sponsor language) and (b) cross-promos. The cross-promo path compares segments to the show’s own notes/description to spot “subscribe to X podcast” segments.

Feedback loop: Users can flag missed ads; reported brand/podcast names bias future runs.

Post-processing: Merge adjacent detections, ignore <10s blips, smooth cut boundaries.

Speaker diarization (WhisperX): Detects voice/tone shifts to distinguish “host in-topic” from “host reading copy.”
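
The segmentation and post-processing steps above could look roughly like the following. This is a minimal sketch, not the code from the post: the `Segment` type, the window size and step, the 0.5 threshold, the 2-second merge gap, and `score_fn` (a stand-in for the per-window LLM "ad likelihood" call) are all assumptions made for illustration. Only the 10-second minimum comes from the description. For simplicity it marks a whole window as an ad when its score crosses the threshold, so real cut boundaries would still need smoothing.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Segment:
    """One transcript segment with timestamps in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def overlapping_windows(segments: list[Segment], size: int = 6,
                        step: int = 3) -> Iterator[list[Segment]]:
    """Yield runs of up to `size` segments, advancing by `step` (step <= size)."""
    for i in range(0, len(segments), step):
        yield segments[i:i + size]
        if i + size >= len(segments):
            break  # the tail of the transcript is already covered


def merge_spans(spans: list[tuple[float, float]], max_gap: float = 2.0,
                min_len: float = 10.0) -> list[tuple[float, float]]:
    """Merge spans that overlap or lie within `max_gap` s, then drop short blips."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_len]


def detect_ads(segments: list[Segment], score_fn: Callable[[str], float],
               threshold: float = 0.5) -> list[tuple[float, float]]:
    """Score each window with `score_fn` and return merged (start, end) ad spans."""
    spans = []
    for window in overlapping_windows(segments):
        text = " ".join(s.text for s in window)
        # score_fn stands in for the short per-window LLM prompt (assumption).
        if score_fn(text) >= threshold:
            spans.append((window[0].start, window[-1].end))
    return merge_spans(spans)
```

Because the windows overlap, an ad that straddles a window boundary is still seen whole by at least one prompt, and `merge_spans` collapses the resulting duplicate detections into a single cut.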

Across interviews, daily news, and narrative shows, this consistently removes ~95% of ads. The remaining 5% are sponsor mentions woven directly into content—hard by design.

Infra: hosted on DigitalOcean; inference runs on Modal.com.

Full write-up (with prompts, heuristics, and some failure cases): https://PodcastAdBlock.app/blog/building-podcast-adblock

Curious if others have tackled similar problems—especially around hard-to-detect “native” ads or more efficient diarization approaches.