FFmpeg Assembly Language Lessons

173 points · flykespice · 38 comments · 8/18/2025, 1:39:53 PM · github.com

Comments (38)

cr125rider · 3h ago
I can’t imagine the scale that FFMPEG operates at. A small improvement has to be thousands and thousands of hours of compute saved. Insanely useful project.
prisenco · 2h ago
Their commitment to performance is a beautiful thing.

Imagine all projects were similarly committed.

sfn42 · 20s ago
That would be an enormous waste of time. 99.9% of software doesn't have to be anywhere near optimal. It just has to not be wasteful.

Sadly, lots of software is blatantly wasteful. But it doesn't take fancy assembly micro-optimization to fix it; the problem is typically much higher level than that. It's more like serialized network requests, unnecessarily high time complexities, just lots of unnecessary work and unnecessary waiting.

Once you have that stuff solved you can start looking at lower level optimization, but by that point most apps are already nice and snappy so there's no reason to optimize further.
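To make the "unnecessarily high time complexity" point concrete, here is a minimal, hypothetical C sketch (not taken from any particular app): the same duplicate check written once as an all-pairs loop and once as sort-then-scan. No assembly is involved; the whole gain comes from the algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* O(n^2): compares every pair. Fine for tiny inputs, wasteful at scale. */
static bool has_duplicate_quadratic(const int *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (v[i] == v[j])
                return true;
    return false;
}

/* qsort comparator; avoids the overflow that x - y could cause. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* O(n log n): sort a copy, after which any duplicates are adjacent. */
static bool has_duplicate_sorted(const int *v, size_t n) {
    assert(v != NULL || n == 0);
    if (n < 2)
        return false;
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return has_duplicate_quadratic(v, n); /* fall back if allocation fails */
    memcpy(copy, v, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int);
    bool dup = false;
    for (size_t i = 1; i < n && !dup; i++)
        dup = (copy[i] == copy[i - 1]);
    free(copy);
    return dup;
}
```

Both functions give the same answers; only the second stays fast as the input grows.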

Almondsetat · 1h ago
Yeah, no. I'd like non-performance-critical programs to focus on things other than performance, thank you.
EliRivers · 58m ago
Surely all programs are performance critical. Any program we think isn't is just a program where the performance met the criteria already.
therealmarv · 1h ago
like Slack or Jira... lol.
byteknight · 2h ago
Seems so easy! You only need the entire world even tangentially related to video to rely solely on your project for a task and you too can have all the developers you need to work on performance!
ackfoobar · 1h ago
I seem to recall them lamenting on Twitter how little (monetary or code) contribution they get, despite how heavily they are used.
hluska · 1h ago
You know, friend, if open source actually worked like that, I wouldn’t be so allergic to releasing projects. But it doesn’t: a large swath of the economy depends on unpaid labour being treated poorly by people who won’t or can’t contribute.
zahlman · 2h ago
It'd be nice, though, to have a proper API (in the traditional sense, not SaaS) instead of having to figure out these command lines in what's practically its own programming language....
codys · 2h ago
FFmpeg does have an API. It ships a few libraries (libavcodec, libavformat, and others) that expose a C API, which the ffmpeg command-line tool itself uses.

They publish doxygen generated documentation for the APIs, available here: https://ffmpeg.org/doxygen/trunk/

zahlman · 2h ago
Don't know how I overlooked that, thanks. Maybe because the one Python wrapper I know about is generating command lines and making subprocess calls.
Wowfunhappy · 1h ago
They're relatively low-level APIs. Great if you're a C developer, but for most things you'd do in Python, just calling the command line probably does make more sense.
javier2 · 1h ago
If you are processing user data, the subprocess approach makes it easier to handle bogus or corrupt data. If something is off, you can just kill the subprocess. If something is wrong with the linked C api, it can be harder to handle predictably.
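A minimal POSIX sketch of the subprocess approach, in C rather than Python since the mechanics are the same: fork, exec the tool, poll for completion, and SIGKILL the child if it runs past a deadline. `run_with_timeout` and its return codes are hypothetical names for this sketch; in practice `argv` would be an ffmpeg command line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical return-code convention for this sketch. */
enum { RUN_ERROR = -1, RUN_TIMED_OUT = -2, RUN_SIGNALED = -3 };

/* Runs argv[0] (looked up in PATH) with NULL-terminated argv in a child.
 * Returns the child's exit status (0..255; 127 if exec failed), or a
 * negative code above. The timeout is approximate (10 ms polling). */
static int run_with_timeout(char *const argv[], long timeout_ms) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return RUN_ERROR;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed: conventional "command not found" status */
    }
    const struct timespec tick = {0, 10L * 1000 * 1000}; /* 10 ms */
    for (long waited = 0;; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : RUN_SIGNALED;
        if (r < 0 && errno != EINTR)
            return RUN_ERROR;
        if (waited >= timeout_ms) {
            kill(pid, SIGKILL);       /* bad input hung the tool: kill it */
            waitpid(pid, &status, 0); /* reap it so no zombie is left */
            return RUN_TIMED_OUT;
        }
        nanosleep(&tick, NULL);
    }
}
```

A hang or crash in the child costs only that one job, and the parent always gets a definite answer back.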
ansk · 1h ago
For future reference, if you want proper Python bindings for ffmpeg*, you should use PyAV.

* To be more precise, these are bindings for the libav* libraries that underlie ffmpeg

xxpor · 1h ago
I get why the CLI is so complicated, but I will say AI has been great at figuring out what I need to run given an English language input. It's been one of the highest value uses of AI for me.
gooob · 41m ago
hell yeah, same here. i made a little python GUI app to edit videos
WhitneyLand · 9m ago
I was expecting to read pearls of wisdom gleaned from all the hard work done on the project, but I’m not really getting how this relates to ffmpeg.

The few chapters I saw seemed to be pretty generic intro to assembly language type stuff.

SilentM68 · 33m ago
Why not include the required or targeted math lessons needed for the FFmpeg Assembly Lessons in the GitHub repository? It'd be easier for people to get started if everything was in one place :)
snickerbockers · 24m ago
NTA, but if the assumption is that the reader has only a basic understanding of C programming and wants to contribute to a video codec, there is a lot of ground to cover just to get to how the Cooley-Tukey algorithm works, and even that's just the fundamentals.
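For a sense of what that ground covers, the core of the radix-2 Cooley-Tukey FFT is one identity: a length-N DFT (N even) splits into two length-N/2 DFTs over the even- and odd-indexed samples, using $W_N^{2mk} = W_{N/2}^{mk}$ and $W_N^{k+N/2} = -W_N^{k}$.

```latex
X_k = \sum_{n=0}^{N-1} x_n \, W_N^{nk}, \qquad W_N = e^{-2\pi i / N}

E_k = \sum_{m=0}^{N/2-1} x_{2m} \, W_{N/2}^{mk}, \qquad
O_k = \sum_{m=0}^{N/2-1} x_{2m+1} \, W_{N/2}^{mk}

X_k = E_k + W_N^{k} O_k, \qquad
X_{k+N/2} = E_k - W_N^{k} O_k, \qquad 0 \le k < N/2
```

Applying the split recursively gives $T(N) = 2T(N/2) + O(N) = O(N \log N)$, versus $O(N^2)$ for the direct sum.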
byryan · 6m ago
I read the repo more as "go through this if you want to have a greater understanding of how things work on a lower level inside your computer". In other words, presumably it's not only intended for people who want to contribute to a video codec/other parts of ffmpeg. But I'm also NTA, so could be wrong.
KwanEsq · 2h ago
Prior discussion 2025-02-22, 222 comments: https://news.ycombinator.com/item?id=43140614
commandlinefan · 53m ago
Shame this doesn't start with a quick introduction to running the examples with an actual assembler like NASM.
NullCascade · 2h ago
What is the actual process of identifying hotspots caused by suboptimal compiler-generated assembly?

Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

duped · 46m ago
Normally you spin up a tool like VTune or uProf to analyze your benchmark hotspots at the ISA level. No idea about equivalent tools for ARM.

> Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

IME, not really. I've done a fair bit of hand-written assembly and it exclusively comes up when dealing with architecture-specific problems - for everything else you can just write C (unless you hit one of the edge cases where C semantics don't allow you to express something in C, but those are rare).

For example: C and C++ compilers are really, really good at writing optimized code in general. Where they tend to be worse are things like vectorized code which requires you to redesign algorithms such that they can use fast vector instructions, and even then, you'll have to resort to compiler intrinsics to use the instructions at all, and even then, compiler intrinsics can lead to some bad codegen. So your code winds up being non-portable, looks like assembly, and has some overhead just because of what the compiler emits (and can't optimize). So you wind up just writing it in asm anyway, and get smarter about things the compiler worries about like register allocation and out-of-order instructions.
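For a sense of what intrinsics code looks like, here is a small sketch (not FFmpeg's code) of a sum of absolute differences, a staple of motion estimation, written once in plain C and once with SSE2 intrinsics. `_mm_sad_epu8` maps to the PSADBW instruction. The fast path is only compiled when the compiler targets SSE2 (the default on x86-64); elsewhere the function falls back to the C loop, which is exactly the portability cost described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Scalar reference: sum of |a[i] - b[i]| over n bytes. */
static unsigned sad_c(const uint8_t *a, const uint8_t *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += (unsigned)abs(a[i] - b[i]);
    return s;
}

/* SSE2 version: 16 bytes per iteration via PSADBW, scalar loop for the tail. */
static unsigned sad_sse2(const uint8_t *a, const uint8_t *b, size_t n) {
#ifdef __SSE2__
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Two 64-bit partial sums, one per 8-byte half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the high 64-bit lane into the low one. */
    acc = _mm_add_epi64(acc, _mm_unpackhi_epi64(acc, acc));
    unsigned s = (unsigned)_mm_cvtsi128_si32(acc);
    return s + sad_c(a + i, b + i, n - i);
#else
    return sad_c(a, b, n);
#endif
}
```

Even this tiny example already needs a preprocessor guard, a scalar tail, and a reference version to test against.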

But the real problem once you get into this domain is that you simply cannot tell at a glance whether hand-written assembly is "better" (insert your metric for "better" here) than what the compiler emits. You must measure and benchmark, and those benchmarks have to be meaningful.
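A minimal sketch of that "measure, don't guess" workflow in portable C (POSIX `clock_gettime`): check a candidate against a reference implementation first, then time it best-of-N. All names here are hypothetical. FFmpeg's own tree has a `checkasm` tool that does this check-then-benchmark job for its assembly with far more care.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*sum_fn)(const uint8_t *buf, size_t n);

/* Reference: the known-good version every candidate is checked against. */
static uint32_t sum_ref(const uint8_t *buf, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}

/* Candidate: unrolled by 4 with independent accumulators. */
static uint32_t sum_unrolled(const uint8_t *buf, size_t n) {
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += buf[i]; s1 += buf[i + 1]; s2 += buf[i + 2]; s3 += buf[i + 3];
    }
    for (; i < n; i++)
        s0 += buf[i];
    return s0 + s1 + s2 + s3;
}

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Returns the best-of-`runs` average ns per call, or -1.0 if the candidate
 * disagrees with the reference. Caveat: an optimizer may still hoist a pure
 * call out of the timing loop; real harnesses vary inputs or use barriers. */
static double bench(sum_fn cand, sum_fn ref, const uint8_t *buf, size_t n,
                    int runs, int iters) {
    assert(runs > 0 && iters > 0);
    if (cand(buf, n) != ref(buf, n))
        return -1.0;
    volatile uint32_t sink = 0; /* keeps results "used" */
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        double t0 = now_ns();
        for (int i = 0; i < iters; i++)
            sink += cand(buf, n);
        double per_call = (now_ns() - t0) / iters;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    (void)sink;
    return best;
}
```

Taking the minimum over several runs filters out some noise from interrupts and frequency scaling, but the result is still only meaningful for the input sizes and hardware you actually measured.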

abhisek · 1h ago
Love it. Thanks for taking the time to write this. Hope it will encourage more folks to contribute.
Alifatisk · 2h ago
How do they make these assembly instructions portable across different CPUs?
CannotCarrot · 2h ago
I think there's a generic C fallback, which can also serve as a baseline. But for the big (targeted) architectures, there's one handwritten assembly version per arch.
faluzure · 1h ago
Yup.

On startup, it runs cpuid and assigns each operation the best function pointer for that architecture.

In addition to checks like ‘supports AVX’ or ‘supports SSE4’, some operations have even more specific checks, like ‘is a fifth-generation Celeron’. In that case, iirc, the optimization was built around the CPU's cache architecture.

Source: I did some dirty things with Chrome's Native Client and FFmpeg 10 years ago.
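The startup dispatch described above can be sketched in C as a table of function pointers that starts on the portable C fallback and is upgraded after a CPU feature check. This is an illustration, not FFmpeg's code: `DSPContext`, `add_bytes_*` and `dsp_init` are hypothetical names, the check uses the GCC/Clang builtin `__builtin_cpu_supports` rather than FFmpeg's own CPU-flag detection, and the fast path uses AVX2 intrinsics where FFmpeg would use hand-written assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSP context: one function pointer per operation. */
typedef struct DSPContext {
    void (*add_bytes)(uint8_t *dst, const uint8_t *src, size_t n);
    const char *add_bytes_name; /* which variant was picked, for debugging */
} DSPContext;

/* Portable C fallback: always compiled, and the reference for testing. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
/* Compiled for AVX2 only; must never be called on a CPU without it. */
__attribute__((target("avx2")))
static void add_bytes_avx2(uint8_t *dst, const uint8_t *src, size_t n) {
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i d = _mm256_loadu_si256((const __m256i *)(dst + i));
        __m256i s = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_add_epi8(d, s));
    }
    add_bytes_c(dst + i, src + i, n - i); /* scalar tail */
}
#endif

/* Called once at startup: pick the best variant this CPU supports. */
static void dsp_init(DSPContext *c) {
    assert(c != NULL);
    c->add_bytes = add_bytes_c;
    c->add_bytes_name = "c";
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        c->add_bytes = add_bytes_avx2;
        c->add_bytes_name = "avx2";
    }
#endif
}
```

Callers only ever go through `c->add_bytes`, so one binary runs on any x86 CPU, and non-x86 builds simply compile the C version alone.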

KeplerBoy · 2h ago
They don't. It's just x86-64.
ahartmetz · 2h ago
The lessons yes, but the repo contains assembly for the 5-6 architectures in wide use in consumer hardware today. Separate files of course. https://github.com/FFmpeg/FFmpeg/tree/master/libavcodec
KeplerBoy · 1h ago
Yeah, sure. I was specifically referring to the tutorials. FFmpeg needs to run everywhere, although I believe they are more concerned about data center hardware than consumer hardware, so probably also stuff like PowerPC.
nisten · 1h ago
I feel like I just got a 3 page intro to autism.

It's glorious.

ngcc_hk · 2h ago
More interesting than I thought it could be. A domain specific tutorial is so much better.
sylware · 2h ago
There is serious abuse of the NASM macro preprocessor. It's going to be tough to move to another assembler.
loeg · 2h ago
Why move away?
oguz-ismail · 2h ago
Where? There's very little code in those lessons.
pveierland · 2h ago
The lessons reference `cglobal` in `x86inc.asm`:

https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/x86/x...