I’ve been listening to podcasts for 15+ years. Ads used to be short and host-read. Now, some shows I follow have 15+ minutes of loud, compressed ads per hour.
I built a system to strip them out automatically. It takes a podcast feed, processes each episode, and outputs an ad-free feed compatible with any player.
What didn’t work:
Full-transcript one-shot prompting: LLMs would return a few timestamps, then stop—context was too broad.
Keyword-based detection: High false positives/negatives, especially with “house ads” and blended sponsor mentions.
What worked:
Segmentation + local scoring: Split transcripts into overlapping windows. Ask the LLM for “ad likelihood” per window—short prompts keep context tight.
Multi-head prompting: Separate prompts for (a) brand ads (URLs, promo codes, sponsor language) and (b) cross-promos. The cross-promo path compares segments to the show’s own notes/description to spot “subscribe to X podcast” segments.
Feedback loop: Users can flag missed ads; reported brand/podcast names bias future runs.
Post-processing: Merge adjacent detections, ignore <10s blips, smooth cut boundaries.
Speaker diarization (WhisperX): Detects voice/tone shifts to distinguish “host in-topic” from “host reading copy.”
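For anyone who wants to try the windowing and post-processing steps, here is a minimal sketch. It is not the actual implementation. The 60s window, 30s step, 5s merge gap, and 0.5 threshold are illustrative assumptions; only the 10s minimum comes from the description above. `score_fn` is a hypothetical stand-in for whatever LLM call returns an ad likelihood for a window of text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str = ""


def make_windows(segments: List[Segment], window_s: float = 60.0,
                 step_s: float = 30.0) -> List[Segment]:
    """Group transcript segments into overlapping windows (step_s < window_s)."""
    if not segments:
        return []
    last_end = max(s.end for s in segments)
    windows = []
    t = min(s.start for s in segments)
    while t < last_end:
        # A segment belongs to the window if it overlaps [t, t + window_s).
        hits = [s for s in segments if s.start < t + window_s and s.end > t]
        if hits:
            text = " ".join(s.text for s in hits)
            windows.append(Segment(t, min(t + window_s, last_end), text))
        t += step_s
    return windows


def flag_windows(windows: List[Segment], score_fn: Callable[[str], float],
                 threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start, end) for windows whose ad likelihood meets the threshold."""
    return [(w.start, w.end) for w in windows if score_fn(w.text) >= threshold]


def merge_detections(spans: List[Tuple[float, float]], max_gap_s: float = 5.0,
                     min_len_s: float = 10.0) -> List[Tuple[float, float]]:
    """Merge overlapping or near-adjacent spans, then drop short blips."""
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= max_gap_s:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_len_s]


if __name__ == "__main__":
    transcript = [
        Segment(0.0, 20.0, "intro"),
        Segment(20.0, 50.0, "use promo code X"),
        Segment(50.0, 90.0, "back to show"),
        Segment(90.0, 120.0, "more show"),
    ]
    # Toy scorer for the demo only; a real run would call the LLM here.
    toy_score = lambda text: 1.0 if "promo code" in text else 0.0
    cuts = merge_detections(flag_windows(make_windows(transcript), toy_score))
    print(cuts)  # [(0.0, 90.0)]
```

Note that with coarse windows the cut covers more than the ad itself (0 to 90s for a 20 to 50s read in the toy example), which is one reason boundary smoothing matters after merging.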
Across interviews, daily news, and narrative shows, this consistently removes ~95% of ads. The remaining 5% are sponsor mentions woven directly into content—hard by design.
Infra: hosted on DigitalOcean; inference runs on Modal.com.
Full write-up (with prompts, heuristics, and some failure cases): https://PodcastAdBlock.app/blog/building-podcast-adblock
Curious if others have tackled similar problems—especially around hard-to-detect “native” ads or more efficient diarization approaches.