Yt-transcriber – Give a YouTube URL and get a transcription

162 points by Bluestein · 55 comments · 7/22/2025, 1:51:21 PM · github.com

Comments (55)

paulirish · 15h ago

You can also just fetch the subs YouTube already has rather than retranscribing, e.g.:

yt-dlp --write-auto-subs --skip-download "https://www.youtube.com/watch?v=7xTGNNLPyMI"
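A sketch of turning the subtitle file that command writes into plain text. It assumes yt-dlp saved YouTube's auto-captions as WebVTT (`.vtt`) in the "rolling" layout, where each cue repeats the previous line and the new line carries inline word-timing tags such as `<00:00:01.120><c>`. `vtt_to_text` is a hypothetical helper, not part of yt-dlp.

```python
import re

# Cue timing lines, e.g. "00:00:01.120 --> 00:00:03.400 align:start position:0%"
TIMING = re.compile(r"^(\d{2}:)?\d{2}:\d{2}\.\d{3} --> ")
# Inline tags, e.g. <00:00:01.120>, <c>, </c>
INLINE_TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Flatten a WebVTT file to plain text lines.

    Assumes the YouTube auto-caption layout described above, where rolling
    captions repeat the line shown in the previous cue.
    """
    lines_out: list[str] = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Skip blank lines, the header block, and cue timing lines
        if (not line or line == "WEBVTT"
                or line.startswith(("Kind:", "Language:", "NOTE"))
                or TIMING.match(line)):
            continue
        text = INLINE_TAG.sub("", line).strip()
        # Drop a line if it just repeats the one before it (rolling captions)
        if text and (not lines_out or lines_out[-1] != text):
            lines_out.append(text)
    return "\n".join(lines_out)
```

Usage would be something like `vtt_to_text(Path("some-video.en.vtt").read_text(encoding="utf-8"))`; the file name is a placeholder, since yt-dlp's real output name depends on its `-o` template.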

adamgordonbell · 15h ago

Recently I was working on a similar project and found that grabbing transcripts in quick succession gets your IP blocked from fetching them.

I ended up doing the same as OP: downloading the MP4s and transcribing them myself. I assumed it was some sort of anti-LLM-scraper measure they'd put in place.

Has anyone used this --write-auto-subs flag and not been flagged after doing 20 or so videos?

hamiecod · 12h ago

--write-auto-subs gets your IP banned for 12 to 24 hours if you download video subtitles in bulk, but if the subtitles are downloaded with enough time between requests, the ban is not triggered.
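One way to get that time gap is to fetch one video per yt-dlp call and sleep between calls. This is a sketch under assumptions: the 30-second gap and 15-second random jitter are guesses, not known YouTube limits, and `fetch_auto_subs` is a hypothetical helper. The `run` and `sleep` parameters exist only so it can be exercised without network access.

```python
import random
import subprocess
import time
from typing import Callable, Sequence


def fetch_auto_subs(
    urls: Sequence[str],
    min_gap: float = 30.0,   # assumed safe spacing, not a documented limit
    jitter: float = 15.0,    # random extra delay so requests aren't evenly spaced
    run: Callable[..., object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> list[list[str]]:
    """Download auto-generated subs for each URL, one at a time, pausing in between.

    Returns the commands that were run (useful for logging).
    """
    commands = []
    for i, url in enumerate(urls):
        if i > 0:
            # Pause before every request except the first
            sleep(min_gap + random.uniform(0, jitter))
        cmd = ["yt-dlp", "--write-auto-subs", "--skip-download", url]
        run(cmd, check=True)  # raises CalledProcessError if yt-dlp fails
        commands.append(cmd)
    return commands
```

yt-dlp also has its own throttling options, such as `--sleep-subtitles` and `--sleep-requests`, which may be simpler when passing many URLs to a single invocation.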

My startup relies on YouTube transcriptions, so we just subscribe to a YouTube transcript API hosted on RapidAPI that downloads subtitles. $1 per 1,000 requests. Pretty cheap.

MysticOracle · 11h ago

Yep, this happened to me; I got IP-banned for a day.

thangalin · 9h ago

    systemctl start tor
    yt-dlp --proxy socks5://127.0.0.1:9050 --write-subs --write-auto-subs --skip-download [URL]

See: https://github.com/noobpk/auto-change-tor-ip

ldenoue · 10h ago

Unless you fetch directly from your browser. It works by getting YouTube's JSON, which includes the caption tracks, and then using the baseUrl to download the XML.

I wrote this webapp that uses this method: it calls Gemini in the background to polish the raw transcript and produce a much better version with punctuation and paragraphs.

https://www.appblit.com/scribe

It's open source, with code showing how to fetch from YouTube's servers in the browser: https://ldenoue.github.io/readabletranscripts/
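The last step of that browser approach, parsing the XML behind a caption track's baseUrl, might look like the sketch below. It assumes the classic timedtext layout `<transcript><text start="..." dur="...">...</text></transcript>`, where text can carry doubly escaped entities like `&amp;#39;`; YouTube may serve other formats, fetching the JSON and baseUrl is left out, and `parse_timedtext` is a hypothetical name.

```python
import html
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the video
    dur: float    # seconds the caption stays on screen
    text: str


def parse_timedtext(xml_text: str) -> list[Caption]:
    """Parse caption XML assumed to look like
    <transcript><text start="1.2" dur="3.4">...</text>...</transcript>.
    (Assumed layout; other caption formats need a different parser.)
    """
    root = ET.fromstring(xml_text)
    captions = []
    for node in root.iter("text"):
        # ElementTree removes one layer of XML escaping; the body can still hold
        # HTML entities such as &#39;, so unescape a second time.
        body = html.unescape("".join(node.itertext()))
        captions.append(
            Caption(
                start=float(node.get("start", 0)),
                dur=float(node.get("dur", 0)),
                text=" ".join(body.split()),  # collapse newlines and repeated spaces
            )
        )
    return captions
```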

toomuchtodo · 15h ago

It's a good call out. I leverage yt-dlp as a library for downstream tooling (archival of media to long term storage repositories), and always recommend folks rely on yt-dlp whenever possible due to the ecosystem of folks grinding to keep their extractors current. Their maintainers are both helpful and responsive.

(with that said, I do not want to diminish OP's work in any way; great job! "What I cannot build, I do not understand" - Feynman)

paulirish · 15h ago

Same, yup. OP is indeed already using yt-dlp for the video download (then Whisper for transcribing, and Ollama/LM Studio/OpenAI for summarizing).

hiAndrewQuinn · 13h ago

Minus the summarization, that is the same pipeline I use in [1] for generating listening-practice Anki flashcards for foreign language students. It surprised me that, even a few years after Whisper came out, I couldn't find anyone who had really built out a program around yt-dlp and Whisper for this kind of use case.

[1]: https://github.com/hiAndrewQuinn/audio2anki
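A rough sketch of the flashcard end of such a pipeline, not taken from audio2anki: it assumes Whisper-style segments (dicts with `start`, `end`, and `text`, the shape openai-whisper's `transcribe()` returns under `"segments"`), builds one ffmpeg command per clip, and writes a tab-separated file with Anki's `[sound:...]` tag on the front and the transcript on the back. `build_cards` and the `clip_000.mp3` naming are hypothetical.

```python
import csv
import io


def build_cards(segments: list[dict], audio_path: str, prefix: str = "clip") -> tuple[list[list[str]], str]:
    """Return (ffmpeg commands, TSV text) with one card per transcript segment.

    Segments are assumed to be dicts with "start"/"end" (seconds) and "text".
    """
    commands = []
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for i, seg in enumerate(segments):
        clip = f"{prefix}_{i:03d}.mp3"
        # Cut this segment's time range out of the source audio
        commands.append([
            "ffmpeg", "-i", audio_path,
            "-ss", f"{seg['start']:.2f}", "-to", f"{seg['end']:.2f}",
            clip,
        ])
        # Front: the audio clip via Anki's [sound:...] tag; back: what was said
        writer.writerow([f"[sound:{clip}]", seg["text"].strip()])
    return commands, buf.getvalue()
```

The clips have to end up in Anki's media folder for the `[sound:...]` tags to resolve; the TSV can then be brought in through Anki's text import.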

mckirk · 15h ago

I've found the YT transcripts to be severely lacking at times, in both accuracy and features. Speaker identification in particular is really useful if you want to, e.g., summarize podcasts or interviews, so if this project delivers on that, it's definitely better than the YT transcripts.

paulirish · 14h ago

An approach I've been using recently is to rely on pyannote/tinydiarize only for the speaker_turn timestamps, but prefer the larger model (or in this case YT's autotranscript) for the actual text.
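A minimal sketch of that idea: take speaker turns (start, end, label) from the diarizer and text segments from the ASR model, then label each segment with the speaker whose turn overlaps it most. The `Turn`/`Segment` shapes and the max-overlap rule are assumptions for illustration, not necessarily how this commenter combines them.

```python
from dataclasses import dataclass


@dataclass
class Turn:       # one speaker turn from the diarizer (assumed shape)
    speaker: str
    start: float
    end: float


@dataclass
class Segment:    # one transcript segment from the ASR model (assumed shape)
    start: float
    end: float
    text: str


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of [a0, a1] and [b0, b1], or 0 if they don't meet."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def label_segments(segments: list[Segment], turns: list[Turn]) -> list[tuple[str, str]]:
    """Attach to each segment the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg.start, seg.end, t.start, t.end), default=None)
        if best is None or overlap(seg.start, seg.end, best.start, best.end) == 0:
            speaker = "UNKNOWN"  # no diarized speech covers this segment
        else:
            speaker = best.speaker
        labeled.append((speaker, seg.text))
    return labeled
```

Max overlap is the simplest rule; a segment that straddles a speaker change still gets a single label, so splitting segments at word-level timestamps would be the natural next refinement.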

ldenoue · 10h ago

Check out https://ldenoue.github.io/readabletranscripts/ and the website https://www.appblit.com/scribe, which use Gemini to post-correct the raw transcripts.

stanleykm · 14h ago

I’ve had some success running them through another LLM to have it clean up transcription errors based on context. But this obviously does nothing for speaker identification.
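Long transcripts can exceed a model's context window, so a cleanup pass like this is often done in chunks. A sketch under stated assumptions: word count stands in for tokens, and each chunk repeats a few words from the previous one so the model has context at the boundaries. `chunk_words` and its 800-word window and 50-word overlap are hypothetical choices, not what this commenter used.

```python
def chunk_words(text: str, max_words: int = 800, overlap: int = 50) -> list[str]:
    """Split text into windows of at most max_words words.

    Each window starts `overlap` words before the previous one ended, so
    words near a boundary are seen with context on both sides.
    Word count is only a rough proxy for model tokens.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end
    return chunks
```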

rpastuszak · 14h ago

IIRC YT also has a "private" API you can call directly (or via an npm package: youtube-transcribe).

(I'm using it in https://butter.sonnet.io)

Jerry2 · 15h ago

Yep. You can also save them automatically if you use mpv to watch YT: https://github.com/nick-s-b/mpv-transcript (I discovered this script yesterday).

MysticOracle · 10h ago

For (English only) speech-to-text, NVIDIA's Parakeet-V2 is significantly faster than Whisper and I found it to be more accurate.

https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2

For Apple Silicon (MLX) https://huggingface.co/senstella/parakeet-tdt-0.6b-v2-mlx

driscoll42 · 10h ago

Compared to all Whisper models? Or just the faster ones? And which version of Whisper? I'm all for a faster, more accurate model, but I need a bit more to go on.

ipsum2 · 10h ago

All of them, in my experience.

driscoll42 · 10h ago

Fair. Looking at the ASR leaderboard, it is truly better (https://huggingface.co/spaces/hf-audio/open_asr_leaderboard), and NVIDIA's Canary might be even better? Will try these out. Appreciate bringing these to my attention!
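For reading that leaderboard: it reports accuracy as word error rate (WER) alongside a speed figure (RTFx). WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, i.e. (substitutions + deletions + insertions) / N. A minimal sketch; the leaderboard also normalizes text (case, punctuation) before scoring, which this skips.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance (whitespace tokenization)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the ref words handled so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref word missing from hyp
                curr[j - 1] + 1,         # insertion: extra word in hyp
                prev[j - 1] + (r != h),  # substitution, free when words match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, "the cat sat on the mat" against "the cat sit on mat" has one substitution and one deletion, so WER = 2/6.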

0points · 15h ago

YouTube already offers AI transcriptions on its site. As another commenter points out, you can grab them with yt-dlp.

And unlike your tool, whose future maintenance is uncertain, yt-dlp has thousands of users making sure it keeps working as Google keeps changing the site (currently 1,459 contributors).

swyx · 14h ago

If you used this in earnest, you'd know the default YT transcripts are not good enough, because YouTube often (OK, say 5% of the time) fails to transcribe videos, particularly livestreams and videos shortly after release.

YouTube also blocks transcript exports for some services, like https://youtubetranscript.com/

Retranscribing is a necessary and important part of the creator toolset.

passivegains · 15h ago

The volunteer open-source effort behind youtube-dl and its forks/descendants is so impressive in large part because of how many features they provide, and thus have to maintain: https://github.com/yt-dlp/yt-dlp#usage-and-options. This tool won't provide the list of available thumbnails or settings for HTTP buffer size, but I think that's a pretty reasonable tradeoff.

eigenvalue · 13h ago

I made a tool like this a while ago, which was useful for automatically transcribing a whole playlist using Whisper:

https://github.com/Dicklesworthstone/bulk_transcribe_youtube...

I ended up turning it into a beefed-up version that makes polished written documents from the raw transcript. You can try it at

https://youtubetranscriptoptimizer.com/

Leftium · 14h ago

Two similar Show HN projects:

- This Python one is more amenable to modding into your own custom tool: https://hw.leftium.com/#/item/44353447

- Another bash script: https://hw.leftium.com/#/item/41473379

---

They all seem to be built on top of:

- yt-dlp to download video

- whisper for transcription

- ffmpeg for audio/video extraction/processing
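The ffmpeg step in these tools typically pulls the audio track out of the downloaded video. A sketch that builds (but doesn't run) such a command, assuming mono 16 kHz 16-bit WAV as the target, which matches the audio Whisper works on; `extract_audio_cmd` is a hypothetical helper.

```python
def extract_audio_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg argv that drops the video stream and writes mono PCM WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the output file without prompting
        "-i", video_path,
        "-vn",                     # discard the video stream
        "-ac", "1",                # downmix to one channel
        "-ar", str(sample_rate),   # resample (16 kHz by default)
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        wav_path,
    ]
```

Run it with `subprocess.run(extract_audio_cmd("video.mp4", "audio.wav"), check=True)`. openai-whisper also decodes and resamples input files through ffmpeg on its own, so this step mostly matters when you want to keep or reuse the extracted audio.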

totallynotryan · 12h ago

Hey all, I built a 100% free (no sign-up) YouTube summarizer: https://youtube-summarizer-lime.vercel.app/. Accurate summaries in under 8 seconds.

dudeWithAMood · 10h ago

How did you get around YouTube blocking cloud IP ranges? Are you using residential proxies?

93po · 12h ago

Bookmarked, thanks. The top Google search results always require sign-up. Frustrating state of the internet.

yunusabd · 7h ago

I tried it on an M1 Pro MBP using Docker. It's quite slow (no MPS), and there are no timestamps in the resulting transcript. But the basics are there. Truncated output:

  Fetching video metadata...
  Downloading from YouTube...
  Generating transcript using medium model...

  === System Information ===
  CPU Cores: 10
  CPU Threads: 10
  Memory: 15.8GB
  PyTorch version: 2.7.1+cpu
  PyTorch CUDA available: False
  MPS available: False
  MPS built: False
  
  Falling back to CPU only
  Model stored in: /home/app/.cache/whisper
  Loading medium model into CPU...
  100%|| 1.42G/1.42G [02:05<00:00, 12.2MiB/s]
  Model loaded, transcribing...
  Model size: 1457.2MB
  Transcription completed in 468.70 seconds
  === Video Metadata ===
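  Title: 厨师长教你：“酱油炒饭”的家常做法，里面满满的小技巧，包你学会炒饭的最香做法，粒粒分明！
  [English: The head chef teaches you the home-style way to make "soy sauce fried rice", packed with little tips, guaranteed to teach you the most fragrant way to make fried rice, with every grain separate!]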
  Channel: Chef Wang 美食作家王刚
  Upload Date: 20190918
  Duration: 5:41
  URL: https://www.youtube.com/watch?v=1Q-5eIBfBDQ
  === Transcript ===
  
  哈喽大家好我是王刚本期视频我跟大家分享...
  [English: Hello everyone, I'm Wang Gang. In this video I'll share with you...]
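To put "quite slow" into numbers from the log above: the 5:41 video (341 seconds) took 468.70 seconds to transcribe on CPU, not counting the model download. The helper names here are hypothetical.

```python
def parse_duration(text: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' (as in the log's Duration field) to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio length; above 1.0 is slower than real time."""
    return processing_s / audio_s


audio_s = parse_duration("5:41")          # 5 * 60 + 41 = 341
rtf = real_time_factor(468.70, audio_s)   # about 1.37
rtfx = audio_s / 468.70                   # about 0.73 (inverse, "RTFx")
```

That is a real-time factor of about 1.37, roughly 1.4 seconds of processing per second of audio; the inverse, about 0.73, is the RTFx-style figure ASR leaderboards report, where higher is faster.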

pstoll · 5h ago

> Falling back to CPU only

Patient: “Doctor, it hurts when I do this.”

Doctor: “don’t do that”

yunusabd · 36s ago

I mean yeah, but also

Doctor: do this

Patient: I tried doing this and it's not good

Doctor: actually you need a device for $5000 lol

isubkhankulov · 14h ago

I’ve been using this free tool. It gives quality diarized transcripts: https://contentflow.megalabs.co

yunusabd · 7h ago

Did you build this? I'm looking for an API that does this.

isubkhankulov · 57m ago

I built this. The API is on the way! You can sign up for updates here: https://contentflow.megalabs.co/api-interest

labrador · 11h ago

Many channels I follow, such as Vlad Vexler, have taken measures so you can't download the transcript with yt-dlp. Furthermore, they don't provide a transcript option on their videos. I assume this is to prevent people from just reading AI summaries, which is annoying in Vexler's case because he talks slowly and meanders. If I really want to hear his point but don't want to sit through that, I download the video with yt-dlp and use Whisper to transcribe it.

rs186 · 7h ago

Curious, if you don't find this "annoying", why are you still following the channel? There must be other YouTube channels that offer similar content but deliver it in a better way.

labrador · 5h ago

Vlad is a smart guy but slow. Think of him as a brilliant snail.

Bluestein · 10h ago

... the ... slower ... the guy the ... less ... content ... and ... more ... advertising.-

cmaury · 16h ago

Thanks for sharing. This is exactly the type of utility that vibecoding is for. It takes 5 seconds to ask GPT to write a script that does this, tailored to your specific use case. It's way faster than trying to get someone else's repo up and running.

sannysanoff · 14h ago

Selfware.

https://old.reddit.com/r/ChatGPTCoding/comments/1lusr07/self...

Gonna be lots of posts of selfware like that soon.

cmaury · 13h ago

I like it, though I'm sure we'll end up being stuck with "vibe ware"

Bluestein · 14h ago

I think you either coined (kudos) or spotted the true "term du jour" here.-

sannysanoff · 14h ago

people don't even get it :-]

Bluestein · 15h ago

Sure thing ...

And, yes, indeed, AI coding is having an order-of-magnitude larger effect along the lines that "low-code" was treading ...

... also, for less-capable coders or "borderline" coders the effort/benefit equation has radically shifted.-

toddmorey · 8h ago

Always fascinated to read CLAUDE.md files that are appearing in more and more open source projects: https://github.com/pmarreck/yt-transcriber/blob/yolo/CLAUDE....

I'd be really curious to see some sort of benchmark / evaluation of these context resources against the same coding tasks. Right now, the instructions all sound so prescriptive and authoritative, yet it's really hard to evaluate their effectiveness.

dudeWithAMood · 10h ago

I did something similar, piping the output of the youtube-transcript-api Python package to OpenAI's API: https://github.com/DavidZirinsky/tl-dw/

mikeve · 15h ago

Interesting project! I've been working on a project in this space myself (WaveMemo)

I must say, speaker diarization is surprisingly tricky to do. The most common approach seems to be to use pyannote, but the quality is not amazing...

ethan_smith · 14h ago

For better diarization quality than pyannote, check out Whisper-DiarizationX which combines Whisper with ECAPA-TDNN speaker embeddings and spectral clustering.

lpeancovschi · 12h ago

YouTube's T&C don't allow downloading YouTube audio/video. How do other services get away with it?

nadermx · 10h ago

"The court held that merely clicking on a download button does not show consent with license terms, if those terms were not conspicuous and if it was not explicit to the consumer that clicking meant agreeing to the license."

https://en.m.wikipedia.org/wiki/Specht_v._Netscape_Communica...

lpeancovschi · 10h ago

I'm not a lawyer, but I think even if you offload the legal responsibility to the user by showing them a copyright prompt, it's still illegal to download YouTube videos.

nadermx · 9h ago

United States v. Auernheimer, 748 F.3d 525 (3d Cir. 2014). Specifically, on page 12, footnote 5, the court states:

“We also note that in order to be guilty of accessing ‘without authorization, or in excess of authorization’ under New Jersey law, the Government needed to prove that Auernheimer or Spitler circumvented a code- or password-based barrier to access... The account slurper simply accessed the publicly facing portion of the login screen and scraped information that AT&T unintentionally published.”

MysticOracle · 11h ago

I think they use rotating IP/Proxy services

lpeancovschi · 10h ago

Might be, but I think google would still be able to chase them down.

arkaic · 11h ago

On this note, is YouTube also the best transcriber of foreign languages, or is there something better?

manishsharan · 12h ago

Will this make Google mad at me and cancel/freeze all my Google services ?
