This is a really nice walkthrough of matching trade offs to acceptable distortions for a known signal type. Even if you’re selecting rather than designing a codec, it’s a great process to follow.
For those interested in the ultra low latency space (where you’re willing to trade a bit of bandwidth to gain quality and minimise latency), VSF have a pretty good wrap-up of other common options and what they each optimise for: https://static.vsf.tv/download/technical_recommendations/VSF...
Thaxll · 6h ago
The creator of VLC is working on something similar, very cutting edge.
Having worked in the space, I'd have to say hardware encoders and H.264 are pretty dang good - NVENC works with very little latency (if you tell it to, and disable the features that increase it, such as multiple frame prediction and B-frames).
The two things that increase latency are more advanced processing algorithms, which give the encoder more stuff to do, and schemes that require waiting multiple frames. If you disable those, the encoder can pretty much start working on your frame the nanosecond the GPU stops rendering to it, and have it encoded in <10ms.
Wowfunhappy · 1h ago
> have it encoded in <10ms.
For context, OP achieved 0.13 ms with his codec.
dishsoap · 2h ago
10ms is quite long in this context.
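To put the two figures quoted above side by side, here is a rough back-of-the-envelope sketch in Python. The 60 and 120 fps targets are assumptions chosen for illustration; only the <10 ms and 0.13 ms numbers come from the thread, and 10 ms is treated as an upper bound.

```python
def frame_budget_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def share_of_budget(encode_ms: float, fps: float) -> float:
    """Fraction of one frame interval consumed by the encode step alone."""
    return encode_ms / frame_budget_ms(fps)


if __name__ == "__main__":
    # Encode times quoted in the thread; the frame rates are illustrative assumptions.
    for label, encode_ms in (("hardware H.264, 10 ms bound", 10.0), ("OP's codec", 0.13)):
        for fps in (60, 120):
            share = share_of_budget(encode_ms, fps)
            print(f"{label} at {fps} fps: {share:.1%} of the frame interval")
```

Under these assumptions a 10 ms encode takes 60% of a 60 fps frame interval and more than a whole interval at 120 fps, while 0.13 ms stays below 2% in both cases.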
latchkey · 3h ago
Sadly appears to be unavailable.
sippeangelo · 6h ago
I know next to nothing about video encoding, but I feel like there should be so much low hanging fruit when it comes to videogame streaming if the encoder just cooperated with the game engine even slightly. Things like motion prediction would be free since most rendering engines already have a dedicated buffer just for that for its own rendering, for example. But there's probably some nasty patent hampering innovation there, so might as well forget it!
torginus · 5h ago
'Motion vectors' in H.264 are a weird bit twiddling/image compression hack and have nothing to do with actual motion vectors.
- In a 3d game, a motion vector is the difference between the position of an object in 3d space from the previous to the current frame
- In H.264, the 'motion vector' is basically saying - copy this rectangular chunk of pixels from some point in some arbitrary previous frame, then encode the difference between the actual pixels and the copy with JPEG-like techniques (DCT et al)
This block copying is why H.264 video devolves into a mess of squares once the bandwidth craps out.
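The block-copy idea described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how a real H.264 encoder is written: it assumes whole-pixel vectors, a single fixed block size, and an uncompressed residual, and all function names are hypothetical.

```python
def predict_block(ref, top, left, mv, size):
    """Copy a size x size block from the reference frame, displaced by mv = (dy, dx)."""
    dy, dx = mv
    rows = ref[top + dy : top + dy + size]
    return [row[left + dx : left + dx + size] for row in rows]


def residual(cur, pred, top, left, size):
    """Per-pixel difference between the current block and its prediction."""
    return [[cur[top + y][left + x] - pred[y][x] for x in range(size)]
            for y in range(size)]


def reconstruct(pred, res):
    """Decoder side: prediction plus residual gives back the block."""
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(pred, res)]


if __name__ == "__main__":
    # Reference frame: an 8x8 gradient. Current frame: the same image moved 1 px right.
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    cur = [[row[0]] + row[:-1] for row in ref]

    good = predict_block(ref, 2, 2, (0, -1), 4)   # vector points at the matching pixels
    bad = predict_block(ref, 2, 2, (0, 0), 4)     # no motion assumed
    print(residual(cur, good, 2, 2, 4))           # all zeros: nothing left to encode
    print(residual(cur, bad, 2, 2, 4))            # non-zero everywhere
```

With the right vector the residual is all zeros, so almost no bits are needed. When the bitrate is too low to carry the residual, what remains on screen is the raw copied blocks, which is the "mess of squares" effect.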
pornel · 1h ago
Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.
In typical video encoding motion compensation of course isn't derived from real 3D motion vectors, it's merely a heuristic based on optical flow and a bag of tricks, but in principle the actual game's motion vectors could be used to guide video's motion compensation. This is especially true when we're talking about a custom codec, and not reusing the H.264 bitstream format.
Referencing previous frames doesn't add latency, and limiting motion to just displacement of the previous frame would be computationally relatively simple. You'd need some keyframes or gradual refresh to avoid "datamoshing" look persisting on packet loss.
However, the challenge is in encoding the motion precisely enough to make it useful. If it's not aligned with sub-pixel precision it may make textures blurrier and make movement look wobbly almost like PS1 games. It's hard to fix that by encoding the diff, because the diff ends up having high frequencies that don't survive compression. Motion compensation also should be encoded with sharp boundaries between objects, as otherwise it causes shimmering around edges.
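As a rough sketch of the "2D projection of 3D motion" idea, the Python below projects a point's previous and current camera-space positions through a pinhole camera and takes the screen-space difference. The focal length, image center, and example coordinates are illustrative assumptions, and the quarter-pixel rounding step only mirrors the precision H.264 uses for luma motion vectors; a game engine's velocity buffer would be produced differently.

```python
def project(point, focal_px, cx, cy):
    """Pinhole projection of a camera-space point (x, y, z), z > 0, to pixel coordinates."""
    x, y, z = point
    return (cx + focal_px * x / z, cy + focal_px * y / z)


def screen_motion_vector(prev_point, cur_point, focal_px, cx, cy):
    """2D displacement in pixels between the two projected positions."""
    px, py = project(prev_point, focal_px, cx, cy)
    qx, qy = project(cur_point, focal_px, cx, cy)
    return (qx - px, qy - py)


def quantize_quarter_pel(mv):
    """Round a motion vector to quarter-pixel steps."""
    return tuple(round(c * 4) / 4 for c in mv)


if __name__ == "__main__":
    f, cx, cy = 1000.0, 960.0, 540.0  # assumed camera parameters for a 1920x1080 view
    # The same 0.1-unit sideways move, far from and close to the camera.
    print(screen_motion_vector((0.0, 0.0, 10.0), (0.1, 0.0, 10.0), f, cx, cy))
    print(screen_motion_vector((0.0, 0.0, 2.0), (0.1, 0.0, 2.0), f, cx, cy))
    print(quantize_quarter_pel((10.3, -0.1)))
```

The same 3D displacement yields about 10 px of screen motion at depth 10 and about 50 px at depth 2, which is one reason object boundaries need sharp motion edges: neighbouring pixels at different depths move by very different amounts.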
robterrell · 1h ago
Isn't the use of the H.264 motion vector to save bits when there is a camera pan? A pan is a case where every pixel in the frame will change, but maybe doesn't have to.
ChadNauseam · 1h ago
I think you're right. Suppose the connection to the game streaming service adds two frames of latency, and the player is playing an FPS. One thing game engines could do is provide the game UI and the "3D world view" as separate framebuffers. Then, when the mouse moves on the client, the software could instantly translate the 3D world view for the next two frames, which arrive from the server but were rendered before the user moved their mouse.
VR games already do something like this, so that when a game runs at below the maximum FPS of the VR headset, it can still respond to your head movements. It's not perfect because there's no parallax and it can't show anything for the region that was previously outside of your field of view, but it still makes a huge difference. (Of course, it's more important for VR because without doing this, any lag spike in a game would instantly induce motion sickness in the player. And if they wanted to, parallax could be faked using a depth map)
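A minimal sketch of how large that client-side shift would be, assuming a pinhole camera and a pure yaw rotation. The resolution, field of view, and function names are assumptions for illustration; real VR reprojection applies a full per-pixel warp rather than a single shift.

```python
import math


def focal_length_px(width_px: float, hfov_deg: float) -> float:
    """Focal length in pixels for a pinhole camera with the given horizontal FOV."""
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def yaw_to_pixel_shift(yaw_deg: float, width_px: float, hfov_deg: float) -> float:
    """Horizontal shift at the image center caused by a pure camera yaw."""
    return focal_length_px(width_px, hfov_deg) * math.tan(math.radians(yaw_deg))


if __name__ == "__main__":
    # Assumed setup: 1920 px wide view with a 90 degree horizontal FOV.
    for yaw in (0.5, 1.0, 5.0):
        shift = yaw_to_pixel_shift(yaw, 1920, 90)
        print(f"yaw {yaw} deg -> shift of about {shift:.1f} px")
```

The shifted image leaves a strip of roughly that width uncovered at one edge, which is the "region that was previously outside of your field of view" limitation mentioned above.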
mikepurvis · 2h ago
I’ve wondered about this as well, like most clients should be capable of still doing a bit of compositing. Like if you sent billboard renders of background objects at lower fidelity/frequency than foreground characters, updated hud objects with priority and using codecs that prioritize clarity, etc.
It was always shocking to me that Stadia was literally making their own games in house and somehow the end result was still just a streamed video and the latency gains were supposed to come from edge deployed gpus and a wifi-connected controller.
Then again, maybe they tried some of this stuff and the gains weren't worth it relative to battle-tested video codecs.
toast0 · 6h ago
For 2d sprite games, OMG yes, you could provide some very accurate motion vectors to the encoder. For 3d rendered games, I'm not so sure. The rendering engine has (or could have) motion vectors for the 3d objects, but you'd have to translate them to the 2d world the encoder works in; I don't know if it's reasonable to do that ... or if it would help the encoder enough to justify it.
sudosysgen · 6h ago
Schemes like DLSS already do provide 2D motion vectors, it's not necessarily a crazy ask.
markisus · 2h ago
The ultimate compression is to send just the user inputs and reconstitute the game state on the other end.
w-ll · 2h ago
The issue is the "reconstitute the game state on the other end" when it comes to at least how I travel.
I haven't in a while, but I used to use https://parsec.app/ on a cheap intel Air to do my STO dailies on vacation. It sends inputs, but gets a compressed stream back. I'm curious whether there's an open-source equivalent.
IshKebab · 6h ago
I don't think games do normally have a motion vector buffer. I guess they could render one relatively easily, but that's a bit of a chicken and egg problem.
garaetjjte · 6h ago
They do, one reason is postprocessing effects like motion blur, another is antialiasing like TAA or DLSS upscaling.
IshKebab · 6h ago
Yeah I did almost mention motion blur but do many games use that? I don't play many AAA games TBF so maybe I'm just out of date...
Take something like Rocket League for example. Definitely doesn't have velocity buffers.
raincole · 4h ago
> Take something like Rocket League for example. Definitely doesn't have velocity buffers.
How did you reach this conclusion? Rocket League looks like a game that definitely has velocity buffers to me. (Many fast-moving scenarios + motion blur)
izacus · 5h ago
Yes, most games these days have motion blur and motion vector buffers.
Yes, even Rocket League has it.
shmerl · 3h ago
Many games have it, but I always turn it off. I guess some like its cinematic effect, but I prefer less motion blur, not more.
ACCount36 · 5h ago
Exposing motion vectors is a prerequisite for a lot of AI framegen tech. If you could tap that?
cma · 5h ago
> Things like motion prediction would be free since most rendering engines already have a dedicated buffer just for that for its own rendering, for example.
Doesn't work for translucency and shader animation. The latter can be made to work if the shader can also calculate motion vectors.
keketi · 4h ago
Have an LLM transcribe what is happening in the game into a few sentences per frame, transfer the text over network and have another LLM reconstruct the frame from the text. It won't be fast, it's going to be lossy, but compression ratio is insane and it's got all the right buzzwords.
jameshart · 4h ago
Frame 1:
You are standing in an open field west of a white house, with a boarded front door. There is a small mailbox here.
Eduard · 4h ago
(user input: mouse delta: (-20, -8))
Frame 2:
A few blades of grass sway gently in the breeze. The camera begins to drift slightly, as if under player control — a faint ambient sound begins: wind and birds.
Y_Y · 4h ago
kill jester
taneq · 2h ago
Ah, this explains why there are clowns under the bed and creepy children staring at me from the forest.
cyclotron3k · 1h ago
Send the descriptions via the blockchain so there's an immutable record
poglet · 4h ago
Maybe one day we'll even reach the point where the game can run locally on the end user's machine.
foota · 4h ago
You've got my attention
raphman · 6h ago
Very cool - That's nearly exactly what I need for a research project.
FWIW, there's also the non-free JPEG-XS standard [1] which also claims very low latency [2] and might be a safer choice for commercial projects, given that there is a patent pool around it.
We currently use the IntoPIX CUDA encoder/decoder implementation, and SRT for the low-level transport.
You can definitely achieve end-to-end latencies <16ms over decent networks.
We have customers deploying their machines in data centres and using them in their post-production facilities in the centre of town, usually over a 10GbE link. But I've had others using 1GbE links between countries, running at higher compression ratios.
indolering · 3h ago
A patent pool doesn't make you safer: it's just a patent troll charging you to cross the bridge. They are not offering insurance against more patent trolls blackmailing you after you cross the bridge.
This CODEC uses the same base algorithm as HTJ2K (High-Throughput JPEG 2000).
If the author is reading this, it would be very interesting to read about the differences between this method and HTJ2K.
ChadNauseam · 1h ago
Great article. One thing I've always noticed is that when you get good enough at coding it just turns into math. I hope I can reach that level some day.
monster_truck · 6h ago
Looks like NDI without any of the conveniences.
You're doing something wrong if nvenc is any slower, the llhp preset should be all you need.
One thing to note when designing a new video codec is to carpet bomb around the idea with research projects to stake claim to any possible feature enhancements.
Anything can have an improvement patent filed against it, no matter the license.
CharlesW · 6h ago
> Given how niche and esoteric this codec is, it’s hard to find any actual competing codecs to compare against.
It'd be interesting to see benchmarks against H.264/AVC (see example "zero‑latency" ffmpeg settings below) and JPEG XS.
Can't wait until one day this gets into Moonlight or something like it.
cpeth · 6h ago
Exactly what I was thinking. Wish I had the time and expertise to have a go at adding support for this codec myself. Streaming Clair Obscur over my LAN via Sunshine / Moonlight is exactly my use-case and the latency could definitely be better.
nairoz · 5h ago
It's really cool. I have always wondered if it would be possible to have video encoders designed for some specific games with prior knowledge about important regions to encode with more details. Example would be the center of the screen for the main character.
kookamamie · 6h ago
Not bad. The closest competition would be NDI from NewTek, now Vizrt. It targets similar bitrate and latency ranges.
crazygringo · 6h ago
Fascinating! But...
> The go-to solution here is GPU accelerated video compression
Isn't the solution usually hardware encoding?
> I think this is an order of magnitude faster than even dedicated hardware codecs on GPUs.
Is there an actual benchmark though?
I would have assumed that built-in hardware encoding would always be faster. Plus, I'd assume your game is already saturating your GPU, so the last thing you want to do is use it for simultaneous video encoding. But I'm not an expert in either of these, so I'm curious to know if/how I'm wrong here. Are hardware encoders designed to be real-time, but intentionally trading off latency for higher compression? And is the proposed video encoding really so lightweight that it can easily share the GPU without affecting game performance?
averne_ · 6h ago
Hardware GPU encoders refer to dedicated ASIC engines, separate from the main shader cores. So they run in parallel and there is no performance penalty for using both simultaneously, besides increased power consumption.
Generally, you're right that these hardware blocks favor latency. One example of this is motion estimation (one of the most expensive operations during encoding). The NVENC engine on NVidia GPUs will only use fairly basic detection loops, but can optionally be fed motion hints from an external source. I know that NVidia has a CUDA-based motion estimator (called CEA) for this purpose. On recent GPUs there is also the optical flow engine (another separate block) which might be able to do higher quality detection.
superjan · 6h ago
I love this. The widely used standards for video compression are focused on compression efficiency, which is important if you’re netflix or youtube, but sometimes latency and low complexity is more important.
Even if only to play around and learn how a video codec actually works.
CharlesW · 6h ago
> The widely used standards for video compression are focused on compression efficiency, which is important if you’re netflix or youtube, but sometimes latency and low complexity is more important.
That's a misconception. All modern video codecs (e.g. H.264/AVC, H.265/HEVC, AV1) have explicit, first-class tools, profiles, and reference modes aimed at both low- and high-resolution low‑latency and/or low‑complexity use.
It's possible to follow along with ffmpeg encoding for visual inspection, without waiting for the whole job to complete, by using the tee muxer and ffplay.
GPU Screen Recorder and Sunshine server expose some encoder options in GUI forms, but parameter optimization is still manual; nothing does easyVmaf with thumbnails of each rendering parameter set with IDK auto-identification of encoding artifacts.
Ardour has a "Loudness Analyzer & Normalizer" with profiles for specific streaming services.
What are good target bitrates for low-latency livestreaming 4k with h264, h265 (HDR), and AV1?
kreco · 7h ago
I have nothing interesting to say, but thanks to the author and to whoever shared the article. It was a good read.
hashtekar · 6h ago
What a great read and such a throwback for me. I worked on video compression techniques using wavelets 30yrs ago. Computing power and networking speeds were not what they are now and I had difficulty getting the backing to carry it forward. I’m so happy that this still has such active development and boundaries are still being pushed. Bravo.
richardw · 6h ago
If the system knew both sides were the same vendor or used the same algorithm, would it be better to stream the scene/instructions rather than the video?
I suppose the issue would be media. Faster to load locally than push it out. Could be semi solved with typical web caching approaches.
10000truths · 6h ago
Are there any solutions to game streaming that build an RPC on top of the DirectX/Vulkan API and data structures? I feel like streaming a serialized form of the command queue over the network would be more efficient than streaming video frames.
wmf · 6h ago
I don't think this is true when you count texture uploads. Loading 8 GB of textures over the network would take a while.
10000truths · 5h ago
Only once, then subsequent references to the texture(s) would be done via descriptor. Most game engines will preload large assets like textures before rendering a scene.
duskwuff · 3h ago
That really depends on the game. Loading every texture in advance isn't practical for all games - many newer "open world" games will stream textures to the GPU as needed based on the player's location and what they're doing.
Also, modern game textures are a lot of data.
10000truths · 1h ago
True, on-demand loading/unloading of large textures still needs to be handled. Video streaming handles congestion by sacrificing fidelity to reduce bitrate. A similar approach could be taken with textures by downsampling them (or, better yet, streaming them with a compression codec that supports progressive decoding).
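For a sense of scale in this sub-thread, here is a small Python sketch of the transfer-time arithmetic. The 8 GB figure comes from the comment above; the link speeds are assumptions, protocol overhead and compression are ignored, and the downsampling model simply assumes each halving of width and height quarters the data.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Idealised time to push size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_s


def downsampled_bytes(size_bytes: float, levels: int) -> float:
    """Size after halving width and height `levels` times (4x fewer texels per level)."""
    return size_bytes / (4 ** levels)


if __name__ == "__main__":
    textures = 8e9  # 8 GB of textures, as in the comment above
    for name, link in (("1 Gbit/s", 1e9), ("100 Mbit/s", 100e6)):
        full = transfer_seconds(textures, link)
        reduced = transfer_seconds(downsampled_bytes(textures, 2), link)
        print(f"{name}: full set {full:.0f} s, two levels down {reduced:.0f} s")
```

Under these assumptions the full set takes about 64 s on a 1 Gbit/s link and about 640 s at 100 Mbit/s, while dropping two resolution levels cuts both by a factor of 16, which is the kind of fidelity-for-bitrate trade a progressive codec would make gradually.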
babypuncher · 5h ago
What's the point in streaming a video game from one computer to another if the client machine still needs the expensive and power hungry dedicated graphics hardware to display it?
mschuster91 · 5h ago
You could use that to have a chungus dGPU with a massive amount of VRAM keep all the assets, position data and god knows what else to do the heavy lifting - determine what needs to be drawn and where, deal with physics, 3D audio simulation etc. - and then offload only a (comparatively) small-ish amount of work to the client GPU.
theLiminator · 6h ago
Pretty cool, though I think in practice network latency dominates so much that this kind of optimization is fairly low impact.
I think the main advantage is perhaps the better robustness against packet drops.
_kb · 26m ago
Depends on the network. Where this style of codec is common you’re not traversing the internet, so transport latency, including switch forwarding, is normally in the microseconds. The killer is the display device that ends up rendering this. If you’re not careful that can add 10-100ms to glass to glass times.
babypuncher · 5h ago
It could make in-home streaming actually usable for me. I've never been happy with the lag in Steam's streaming or Moonlight, even when both the server and client are on the same switch. That's not a network latency problem, that's an everything else problem.
pimlottc · 6h ago
What are existing streaming game services using? Surely there is some good previous work in this area?
toast0 · 1h ago
I doubt any of those are willing to drop 100 Mbps+ per client. If the clients can even manage it.
wmf · 6h ago
Probably GPU hardware encoding using low-latency mode. Somebody reported 1 ms latency.
kevingadd · 6h ago
The sample screenshot of Expedition 33 is really impressive quality considering it appears to be encoding at around 1 bit per pixel and (according to the post) it took a fraction of a millisecond to encode it. This is an order of magnitude faster than typical hardware encoders, AFAIK.
Very cool work explained well.
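As a rough check on what "around 1 bit per pixel" means for bandwidth, here is a small Python calculation. The resolutions and the 60 fps frame rate are assumptions for illustration; only the 1 bit per pixel figure comes from the comment.

```python
def bitrate_mbps(width: int, height: int, fps: float, bits_per_pixel: float) -> float:
    """Video bitrate in megabits per second for a given average bits per pixel."""
    return width * height * fps * bits_per_pixel / 1e6


if __name__ == "__main__":
    # Assumed stream formats; 1.0 bit per pixel is the estimate from the comment.
    for name, (w, h) in (("1080p60", (1920, 1080)), ("4K60", (3840, 2160))):
        print(f"{name} at 1 bit/pixel: {bitrate_mbps(w, h, 60, 1.0):.1f} Mbit/s")
```

That works out to roughly 124 Mbit/s for 1080p60 and roughly 498 Mbit/s for 4K60, which fits the "100 Mbps+ per client" remark earlier in the thread and explains why this style of codec targets LANs rather than consumer internet links.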
tombert · 5h ago
One of my bucket list things is to some day build a video codec from scratch. I have no delusions of competing with h264 or anything like that, just something that does some basic compression and can play videos in the process.
https://streaminglearningcenter.com/codecs/an-interview-with...
Ultra low latency for streaming.
https://www.youtube.com/watch?v=0RvosCplkCc
[1] https://www.jpegxs.com/
[2] https://ds.jpeg.org/whitepapers/jpeg-xs-whitepaper.pdf
https://www.filmlight.ltd.uk/store/press_releases/filmlight-...
AV1: Improving RTC Video Quality at Scale: https://atscaleconference.com/av1-improving-rtc-video-qualit...
Objective metrics and tools for video encoding and source signal quality: netflix/vmaf, easyVmaf, psy-ex/metrics, ffmpeg-quality-metrics,
Ffmpeg settings for low-latency encoding:
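As one hedged illustration (not the commenter's original settings), a low-latency libx264 invocation could be assembled in Python like this. The flags used are standard ffmpeg/libx264 options; the file name, UDP destination, bitrate, and GOP length are placeholder assumptions, and the helper function name is hypothetical. The script only builds and prints the command, it does not run ffmpeg.

```python
import shlex


def low_latency_x264_args(source, destination, fps=60, bitrate="20M"):
    """Build an ffmpeg command line favouring latency over compression efficiency."""
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "libx264",
        "-preset", "ultrafast",   # cheapest encoder settings
        "-tune", "zerolatency",   # no lookahead or frame buffering inside x264
        "-bf", "0",               # no B-frames, so no waiting for future frames
        "-g", str(fps),           # assumed choice: one keyframe per second
        "-b:v", bitrate,          # assumed target bitrate
        "-f", "mpegts",
        destination,
    ]


if __name__ == "__main__":
    # Placeholder input and output, for illustration only.
    print(shlex.join(low_latency_x264_args("input.mp4", "udp://127.0.0.1:5000")))
```

The choices line up with the latency sources discussed earlier in the thread: no B-frames and no lookahead means the encoder never has to wait for later frames before emitting the current one.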
Maybe I should try for that next weekend.