QUIC for the kernel

99 points by Bogdanp | 7/31/2025, 3:57:32 PM | lwn.net ↗

Comments (55)

WASDx · 1h ago
I recall this article on QUIC disadvantages: https://www.reddit.com/r/programming/comments/1g7vv66/quic_i...

Seems like this is a step in the right direction to resolve some of those issues. I suppose nothing is preventing it from getting hardware support in future network cards as well.

miohtama · 1h ago
QUIC does not work very well for use cases like machine-to-machine traffic. However, most of the traffic on the Internet today is from mobile phones to servers, and that is where QUIC and HTTP/3 shine.

For other use cases we can keep using TCP.

thickice · 46m ago
Why doesn't QUIC work well for machine-to-machine traffic? Is it due to the lack of offloads/optimizations that TCP enjoys, given that machine-to-machine traffic tends to be high volume/high rate?
extropy · 2m ago
NAT firewalls do not like P2P UDP traffic. The majority of routers lack the smarts to pass QUIC through correctly; they essentially need to treat it the same as TCP.
yello_downunder · 35m ago
QUIC would work okay, but not really have many advantages for machine-to-machine traffic. Machine-to-machine you tend to have long-lived connections over a pretty good network. In this situation TCP already works well and is currently handled better in the kernel. Eventually QUIC will probably be just as good as TCP in this use case, but we're not there yet.
jabart · 16m ago
You still have latency, legacy window sizes, and packet schedulers to deal with.
m00x · 9m ago
It's explained in the reddit thread. Most of it is because you have to handle a ton of what TCP does in userland.
kibwen · 12m ago
I'm confused, I thought the revolution of the past decade or so was in moving network stacks to userspace for better performance.
michaelsshaw · 7m ago
The constant mode switching for hardware access is slow. TCP/IP remains in the kernel for Windows and Linux.
Ericson2314 · 44m ago
What will the socket API look like for multiple streams? I guess it is implied it is the same as multiple connections, with caching behind the scenes.

I would hope for something more explicit, where you get a connection object and then open streams from it, but I guess that is fine for now.

https://github.com/microsoft/msquic/discussions/4257 ah but look at this --- unless this is an extension, the server side can also create new streams, once a connection is established. The client creating new "connections" (actually streams) cannot abstract over this. Something fundamentally new is needed.

My guess is recvmsg to get a new file descriptor for new stream.
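For illustration only, a rough C sketch of what that guess might look like; the SCM_RIGHTS-style fd passing over the connection socket is entirely hypothetical and is not the API of the proposed patch set:

    /*
     * Hypothetical sketch of "recvmsg hands you a new fd per incoming
     * stream". The control-message convention here is made up; the real
     * in-kernel QUIC patches may expose streams completely differently.
     */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int accept_incoming_stream(int conn_fd)
    {
        char data[2048];
        char cbuf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
        };

        if (recvmsg(conn_fd, &msg, 0) < 0)
            return -1;

        /* Walk the ancillary data looking for a passed stream fd. */
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
                int stream_fd;
                memcpy(&stream_fd, CMSG_DATA(c), sizeof(stream_fd));
                return stream_fd;   /* read/write this fd as the new stream */
            }
        }
        return -1;  /* ordinary data, no new stream announced */
    }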

gte525u · 38m ago
I would look at the SCTP socket API; it supports multistreaming.
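For comparison, a minimal sketch of SCTP's multistreaming through the RFC 6458 socket API (one association, several numbered streams); the stream counts and the pre-filled server address are just placeholders:

    /* Build with -lsctp (lksctp-tools). */
    #include <netinet/in.h>
    #include <netinet/sctp.h>
    #include <string.h>
    #include <sys/socket.h>

    int sctp_two_streams(struct sockaddr_in *server)
    {
        int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
        if (fd < 0)
            return -1;

        /* Ask for multiple streams when the association is set up. */
        struct sctp_initmsg init = { .sinit_num_ostreams = 8,
                                     .sinit_max_instreams = 8 };
        setsockopt(fd, IPPROTO_SCTP, SCTP_INITMSG, &init, sizeof(init));

        if (connect(fd, (struct sockaddr *)server, sizeof(*server)) < 0)
            return -1;

        /* Same association, two independent streams (numbers 0 and 1). */
        const char *a = "on stream 0", *b = "on stream 1";
        sctp_sendmsg(fd, a, strlen(a), NULL, 0, 0, 0, 0 /* stream */, 0, 0);
        sctp_sendmsg(fd, b, strlen(b), NULL, 0, 0, 0, 1 /* stream */, 0, 0);
        return fd;
    }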
Bender · 16m ago
I don't know about using it in the kernel, but I would love to see OpenSSH support QUIC so that I get some of the benefits of Mosh [1] while still having all the features of OpenSSH, including SFTP, SOCKS, port forwarding, fewer state-table and keep-alive issues, roaming support, etc. Could OpenSSH leverage the kernel support?

[1] - https://mosh.org/

wosined · 50m ago
The general web is slowed down by bloated websites. But I guess this can make game latency lower.
fmbb · 40m ago
https://en.m.wikipedia.org/wiki/Jevons_paradox

The Jevons Paradox is applicable in a lot of contexts.

More efficient use of compute and communications resources will lead to higher demand.

In games this is fine. We want more, prettier, smoother, pixels.

In scientific computing this is fine. We need to know those simulation results.

On the web this is not great. We don’t want more ads, tracking, JavaScript.

01HNNWZ0MV43FF · 24m ago
No, the last 20 years of browser improvements have made my static site incredibly fast!

I'm benefiting from WebP, JS JITs, Flexbox, zstd, Wasm, QUIC, etc, etc

dahfizz · 1h ago
> QUIC is meant to be fast, but the benchmark results included with the patch series do not show the proposed in-kernel implementation living up to that. A comparison of in-kernel QUIC with in-kernel TLS shows the latter achieving nearly three times the throughput in some tests. A comparison between QUIC with encryption disabled and plain TCP is even worse, with TCP winning by more than a factor of four in some cases.

Jesus, that's bad. Does anyone know if userspace QUIC implementations are also this slow?

Veserv · 12m ago
Yes. msquic is one of the best performing implementations and only achieves ~7 Gbps [1]. The benchmarks for the Linux kernel implementation only get ~3 Gbps to ~5 Gbps with encryption disabled.

To be fair, the Linux kernel TCP implementation only gets ~4.5 Gbps at normal packet sizes and still only achieves ~24 Gbps with large segmentation offload [2]. Both of which are ridiculously slow. It is straightforward to achieve ~100 Gbps/core at normal packet sizes without segmentation offload with the same features as QUIC with a properly designed protocol and implementation.

[1] https://microsoft.github.io/msquic/

[2] https://lwn.net/ml/all/cover.1751743914.git.lucien.xin@gmail...

dan-robertson · 59m ago
I think the ‘fast’ claims are just different. QUIC is meant to make things fast by:

- having a lower latency handshake

- avoiding some badly behaved ‘middleware’ boxes between users and servers

- avoiding resetting connections when user IP addresses change

- avoiding head of line blocking / the increased cost of many connections ramping up

- avoiding poor congestion control algorithms

- probably other things too

And those are all things about working better with the kind of network situations you tend to see between users (often on mobile devices) and servers. I don’t think QUIC was meant to be fast by reducing OS overhead on sending data, and one should generally expect it to be slower for a long time until operating systems become better optimised for this flow and hardware supports offloading more of the work. If you are Google then presumably you are willing to invest in specialised network cards/drivers/software for that.

jeroenhd · 19m ago
> - avoiding some badly behaved ‘middleware’ boxes between users and servers

Surely badly behaving middleboxes won't just ignore UDP traffic? If anything, they'd get confused about udp/443 and act up, forcing clients to fall back to normal TCP.

dahfizz · 50m ago
Yeah I totally get that it optimizes for different things. But the trade offs seem way too severe. Does saving one round trip on the handshake mean anything at all if you're only getting one fourth of the throughput?
yello_downunder · 16m ago
It depends on the use case. If your server is able to handle 45k connections but 42k of them are stalled because of mobile users with too much packet loss, QUIC could look pretty attractive. QUIC is a solution to some of the problematic aspects of TCP that couldn't be fixed without breaking things.
eptcyka · 26m ago
There are claims of 2x-3x operating costs on the server side to deliver better UX for phone users.
dan-robertson · 42m ago
Are you getting one fourth of the throughput? Aren’t you going to be limited by:

- bandwidth of the network

- how fast the NIC on the server is

- how fast the NIC on your device is

- whether the server response fits in the amount of data that can be sent given the client’s initial receive window or whether several round trips are required to scale the window up such that the server can use the available bandwidth
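(Rough numbers for that last point, assuming a classic slow-start ramp from RFC 6928's 10-segment initial window, i.e. about 14.6 KB with a 1460-byte MSS, and a 100 ms RTT: with the window doubling each round trip, a 1 MB response takes roughly seven RTTs, around 700 ms, to fully leave the server, no matter how fast the link nominally is.)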

brokencode · 37m ago
Maybe it’s a fourth as fast in ideal situations with a fast LAN connection. Who knows what they meant by this.

It could still be faster in real world situations where the client is a mobile device with a high latency, lossy connection.

klabb3 · 1h ago
Yes, they are. Worse, I've seen them shrink down to nothing in the face of congestion with TCP traffic. If QUIC is indeed the future protocol, it's a good thing to move it into the kernel IMO. It's just madness to provide these massive userspace impls everywhere, on a packet-switched protocol no less, and expect it to beat good old TCP. Wouldn't surprise me if we need optimizations all the way down to the NIC layer, and maybe even middleboxes. Oh, and I haven't even mentioned the CPU cost of UDP.

OTOH, TCP is like a quiet guy at the gym who always wears baggy clothes but does 4 plates on the bench when nobody is looking. Don't underestimate. I wasted months to learn that lesson.

vladvasiliu · 1h ago
Why is QUIC being pushed, then?
toast0 · 53m ago
It has good properties compared to multiplexing streams over a single TCP connection (http/2), especially when connected to clients without access to modern congestion control on iffy networks. http/2 was perhaps adopted too broadly; the binary protocol is useful, header compression is useful (but sometimes dangerous), but TCP multiplexing is bad unless you have very low loss ... it's not ideal for phones with inconsistent networking.
favflam · 58m ago
I know that in the p2p space, peers have to send lots of small pieces of data. QUIC keeps a whole stream from blocking on a single delayed packet.
fkarg · 30m ago
Because it _does_ provide a number of benefits (potentially fewer initial round trips, more dynamic routing control by using UDP instead of TCP, etc.), and right now it is a userspace software implementation being compared against a hardware-accelerated one.

QUIC getting hardware acceleration should close this gap and keep all the benefits. But a kernel (software) implementation is basically a prerequisite before it can be properly hardware-accelerated in future hardware (that is my current understanding).

01HNNWZ0MV43FF · 11m ago
To clarify, the userspace implementation is not a benefit in itself; it's just that you can't drop a brand-new protocol into a trillion dollars of existing hardware overnight, so you have to do userspace first as a proof of concept.

It does save 2 round-trips during connection compared to TLS-over-TCP, if Wikipedia's diagram is accurate: https://en.wikipedia.org/wiki/QUIC#Characteristics That is a decent latency win on every single connection, and with 0-RTT you can go further, but 0-RTT is stateful and hard to deploy and I expect it will see very little use.

dan-robertson · 58m ago
The problem it is trying to solve is not overhead of the Linux kernel on a big server in a datacenter
eptcyka · 27m ago
QUIC performance requires careful use of batching. Using UDP sockets naively, i.e. sending one QUIC packet per syscall, will incur a lot of overhead: every time, the kernel has to figure out which interface to use, queue the packet up on a buffer, and all the rest. If one uses it like TCP, batching up lots of data and enqueuing packets in one "call" helps a ton. Similarly, the kernel WireGuard implementation can be slower than wireguard-go, since the former doesn't batch traffic. At the speeds offered by modern hardware, we really need to use vectored I/O to be efficient.
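As a concrete (and hedged) sketch of that batching, here is a minimal sendmmsg() example in C, assuming an already-connected UDP socket; real QUIC stacks typically pair this with UDP GSO (the UDP_SEGMENT socket option) to amortize even more per-packet work:

    /*
     * Queue a burst of UDP datagrams with one syscall instead of one
     * sendto() per QUIC packet. Assumes udp_fd is already connect()ed,
     * so no per-message destination address is needed.
     */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define BATCH 32

    int send_batch(int udp_fd, struct iovec pkts[BATCH])
    {
        struct mmsghdr msgs[BATCH];
        memset(msgs, 0, sizeof(msgs));

        for (int i = 0; i < BATCH; i++) {
            msgs[i].msg_hdr.msg_iov = &pkts[i];   /* one QUIC packet per datagram */
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        /* One syscall for up to BATCH datagrams; returns how many were sent. */
        return sendmmsg(udp_fd, msgs, BATCH, 0);
    }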
rayiner · 29m ago
It’s an interesting testament to how well designed TCP is.
euphamism · 42m ago
> causing the next cat video to be that much slower to arrive.

You mean "causing the next advertisement to be that much slower"

> for that all-important web-browsing use case.

You mean "for that all-important advertising-display use case."

> But middleboxes on the Internet also make free use of connection information
> [...]
> As QUIC gains the hardware support that TCP benefits from,

It will gain the ossification problems that TCP suffers from. That _was_ quick!


valorzard · 58m ago
Would this (eventually) include the unreliable datagram extension?
wosined · 49m ago
Don't know if it could get faster than raw UDP, given that it runs on top of it.
valorzard · 38m ago
The use case for this would be running a multiplayer game server over QUIC
01HNNWZ0MV43FF · 10m ago
Other use cases include video / audio streaming, VPNs over QUIC, and QUIC-over-QUIC (you never know)
jeffbee · 1h ago
This seems to be a categorical error, for reasons that are contained in the article itself. The whole appeal of QUIC is being immune to ossification, being free to change parameters of the protocol without having to beg Linux maintainers to agree.
corbet · 1h ago
Ossification does not come about from the decisions of "Linux maintainers". You need to look at the people who design, sell, and deploy middleboxes for that.
jeffbee · 1h ago
I disagree. There is plenty of ossification coming from inside the house. Just some examples off the top of my head are the stuck-in-1974 minimum RTO and ack delay time parameters, and the unwillingness to land microsecond timestamps.
otterley · 1h ago
Not a networking expert, but does TCP in IPv6 suffer the same maladies?
pumplekin · 59m ago
Yes.

Layer4 TCP is pretty much just slapped on top of Layer3 IPv4 or IPv6 in exactly the same way for both of them.

Outside of some little nitpicky things like details on how TCP MSS clamping works, it is basically the same.

ComputerGuru · 1m ago
…which is basically how it’s supposed to work (or how we teach that it’s supposed to work). (Not that you said anything to the contrary!)
toast0 · 1h ago
IMHO, you likely want the server side to be in the kernel, so you can get to performance similar to in-kernel TCP, and ossification is less of a big deal, because it's "easy" to modify the kernel on the server side.

OTOH, you want to be in user land on the client, because modifying the kernel on clients is hard. If you were Google, maybe you could work towards a model where Android clients could get their in-kernel protocol handling to be something that could be updated regularly, but that doesn't seem to be something Google is willing or able to do; Apple and Microsoft can get priority kernel updates out to most of their users quickly; Apple also can influence networks to support things they want their clients to use (IPv6, MP-TCP). </rant>

If you were happy with congestion control on both sides of TCP, and were willing to open multiple TCP connections like http/1, instead of multiplexing requests on a single connection like http/2, (and maybe transfer a non-pessimistic bandwidth estimate between TCP connections to the same peer), QUIC still gives you control over retransmission that TCP doesn't, but I don't think that would be compelling enough by itself.

Yes, there's still ossification in middle boxes doing TCP optimization. My information may be old, but I was under the impression that nobody does that in IPv6, so the push for v6 is both a way to avoid NAT and especially CGNAT, but also a way to avoid optimizer boxes as a benefit for both network providers (less expense) and services (less frustration).

jeffbee · 1h ago
This is a perspective, but just one of many. The overwhelming majority of IP flows are within data centers, not over planet-scale networks between unrelated parties.
jeroenhd · 10m ago
Unless you're using QUIC as some kind of datacenter-to-datacenter protocol (basically as SCTP on steroids with TLS), I don't think QUIC in the datacenter makes much sense at all.

As very few server administrators bother turning on features like MPTCP, QUIC has an advantage on mobile phones with moderate to bad reception. That's not a huge issue for me most of the time, but billions of people are using mobile phones as their only access to the internet, especially in developing countries that are practically skipping widespread copper and fiber infrastructure and moving directly to 5G instead. Any service those people are using should probably consider implementing QUIC, and if they use it, they'd benefit from an in-kernel server.

All the data center operators can stick to (MP)TCP, the telco people can stick to SCTP, but the consumer facing side of the internet would do well to keep QUIC as an option.

toast0 · 1h ago
I've never been convinced by an explanation of how QUIC applies for flows in the data center.

Ossification doesn't apply (or it shouldn't, IMHO; the point of Open Source software is that you can change it to fit your needs... if you don't like what upstream is doing, you should be running a local fork that does what you want... yeah, it's nicer if it's upstreamed, but try running a local fork of Windows or macOS); you can make congestion control work for you when you control both sides; enterprise switches and routers aren't messing with TCP flows. If you're pushing enough traffic that this is an issue, the cost of QUIC seems way too high to justify, even if it helps with some issues.

darksaints · 1h ago
For the love of god, can we please move to microkernel-based operating systems already? We're adding a million lines of code to the Linux kernel every year. That's so much attack surface area. We're setting ourselves up for a Kessler syndrome of sorts with every system that we add to the kernel.
01HNNWZ0MV43FF · 4m ago
Redox is a microkernel written in Rust
wosined · 46m ago
I might be wrong, but microkernels also need drivers, so wouldn't the attack surface be the same?
mdavid626 · 1h ago
Most of that code is not loaded into the kernel; modules are only loaded when needed.
darksaints · 58m ago
True, but the last time I checked (several years ago), the size of the portion of code that is not drivers or kernel modules was still 7 million lines of code, and the average system still has to load a few million more via kernel modules and drivers. That is still a phenomenally large attack surface.

The seL4 kernel is 10k lines of code. OKL4 is 13k. QNX is ~30k.

arp242 · 41m ago
Can I run Firefox or PostgreSQL with reasonable performance on SeL4, OKL4, or QNX?
regularfry · 52m ago
You've still got a combinatorial complexity problem, though, because you never know what a specific user is going to load.