I'm confused, I thought the revolution of the past decade or so was in moving network stacks to userspace for better performance.
Bender · 6m ago
I don't know about using it in the kernel but I would love to see OpenSSH support QUIC so that I get some of the benefits of Mosh [1] while still having all the features of OpenSSH including SFTP, SOCKS, port forwarding, less state table and keep alive issues, roaming support, etc... Could OpenSSH leverage the kernel support?
What will the socket API look like for multiple streams? I guess it is implied it is the same as multiple connections, with caching behind the scenes.
I would hope for something more explicit, where you get a connection object and then open streams from it, but I guess that is fine for now.
https://github.com/microsoft/msquic/discussions/4257 ah but look at this --- unless this is an extension, the server side can also create new streams, once a connection is established. The client creating new "connections" (actually streams) cannot abstract over this. Something fundamentally new is needed.
My guess is recvmsg to get a new file descriptor for a new stream.
gte525u · 28m ago
I would look at SCTP socket API it supports multistreaming.
Seems like this is a step in the right direction to resolve some of those issues. I suppose nothing is preventing it from getting hardware support in future network cards as well.
miohtama · 54m ago
QUIC does not work very well for use cases like machine-to-machine traffic. However, most traffic on the Internet today is from mobile phones to servers, and that is where QUIC and HTTP/3 shine.
For other use cases we can keep using TCP.
thickice · 36m ago
Why doesn't QUIC work well for machine-to-machine traffic? Is it the lack of the offloads/optimizations that TCP enjoys, combined with machine-to-machine traffic tending to be high volume/high rate?
yello_downunder · 25m ago
QUIC would work okay, but not really have many advantages for machine-to-machine traffic. Machine-to-machine you tend to have long-lived connections over a pretty good network. In this situation TCP already works well and is currently handled better in the kernel. Eventually QUIC will probably be just as good as TCP in this use case, but we're not there yet.
jabart · 6m ago
You still have latency, legacy window sizes, and packet schedulers to deal with.
wosined · 40m ago
The general web is slowed down by bloated websites. But I guess this can make game latency lower.
> QUIC is meant to be fast, but the benchmark results included with the patch series do not show the proposed in-kernel implementation living up to that. A comparison of in-kernel QUIC with in-kernel TLS shows the latter achieving nearly three times the throughput in some tests. A comparison between QUIC with encryption disabled and plain TCP is even worse, with TCP winning by more than a factor of four in some cases.
Jesus, that's bad. Does anyone know if userspace QUIC implementations are also this slow?
Veserv · 2m ago
Yes. msquic is one of the best performing implementations and only achieves ~7 Gbps [1]. The benchmarks for the Linux kernel implementation only get ~3 Gbps to ~5 Gbps.
To be fair, the Linux kernel TCP implementation only gets ~4.5 Gbps at normal packet sizes and still only achieves ~24 Gbps with large segmentation offload. Both of which are ridiculously slow. It is easy to achieve ~100 Gbps/core of control plane at normal packet sizes with a properly designed protocol, so you should only be bottlenecking on your encryption at ~50 Gbps/core.
I think the ‘fast’ claims are just different. QUIC is meant to make things fast by:
- having a lower latency handshake
- avoiding some badly behaved ‘middleware’ boxes between users and servers
- avoiding resetting connections when user IP addresses change
- avoiding head of line blocking / the increased cost of many connections ramping up
- avoiding poor congestion control algorithms
- probably other things too
And those are all things about working better with the kind of network situations you tend to see between users (often on mobile devices) and servers. I don’t think QUIC was meant to be fast by reducing OS overhead on sending data, and one should generally expect it to be slower for a long time until operating systems become better optimised for this flow and hardware supports offloading more of the work. If you are Google then presumably you are willing to invest in specialised network cards/drivers/software for that.
jeroenhd · 9m ago
> - avoiding some badly behaved ‘middleware’ boxes between users and servers
Surely badly behaving middleboxes won't just ignore UDP traffic? If anything, they'd get confused about udp/443 and act up, forcing clients to fall back to normal TCP.
dahfizz · 40m ago
Yeah I totally get that it optimizes for different things. But the trade offs seem way too severe. Does saving one round trip on the handshake mean anything at all if you're only getting one fourth of the throughput?
yello_downunder · 6m ago
It depends on the use case. If your server is able to handle 45k connections but 42k of them are stalled because of mobile users with too much packet loss, QUIC could look pretty attractive. QUIC is a solution to some of the problematic aspects of TCP that couldn't be fixed without breaking things.
eptcyka · 15m ago
There are claims of 2x-3x operating costs on the server side to deliver better UX for phone users.
dan-robertson · 32m ago
Are you getting one fourth of the throughput? Aren’t you going to be limited by:
- bandwidth of the network
- how fast the nic on the server is
- how fast the nic on your device is
- whether the server response fits in the amount of data that can be sent given the client’s initial receive window or whether several round trips are required to scale the window up such that the server can use the available bandwidth
brokencode · 27m ago
Maybe it’s a fourth as fast in ideal situations with a fast LAN connection. Who knows what they meant by this.
It could still be faster in real world situations where the client is a mobile device with a high latency, lossy connection.
klabb3 · 1h ago
Yes, they are. Worse, I’ve seen them shrink down to nothing in the face of congestion with TCP traffic. If QUIC is indeed the future protocol, it’s a good thing to move it into the kernel IMO. It’s just madness to provide these massive userspace impls everywhere, on a packet-switched protocol no less, and expect it to beat good old TCP. Wouldn’t surprise me if we need optimizations all the way down to the NIC layer, and maybe even middleboxes. Oh and I haven’t even mentioned the CPU cost of UDP.
OTOH, TCP is like a quiet guy at the gym who always wears baggy clothes but does 4 plates on the bench when nobody is looking. Don't underestimate. I wasted months to learn that lesson.
vladvasiliu · 52m ago
Why is QUIC being pushed, then?
toast0 · 43m ago
It has good properties compared to tcp-in-tcp (http/2), especially when connected to clients without access to modern congestion control on iffy networks. http/2 was perhaps adopted too broadly; binary protocol is useful, header compression is useful (but sometimes dangerous), but tcp multiplexing is bad, unless you have very low loss ... it's not ideal for phones with inconsistent networking.
favflam · 48m ago
I know in the p2p space, peers have to send lots of small pieces of data. QUIC stops stream blocking on a single packet delay.
fkarg · 20m ago
because it _does_ provide a number of benefits (potentially fewer initial round-trips, more dynamic routing control by using UDP instead of TCP, etc), and is a userspace software implementation compared with a hardware-accelerated option.
QUIC getting hardware acceleration should close this gap, and keep all the benefits. But a kernel (software) implementation is basically necessary before it can be properly hardware-accelerated in future hardware (is my current understanding)
01HNNWZ0MV43FF · 1m ago
To clarify, the userspace implementation is not a benefit; it's just that you can't have a brand-new protocol dropped into a trillion dollars of existing hardware overnight, so you have to do userspace first as a PoC.
It does save 2 round-trips during connection compared to TLS-over-TCP, if Wikipedia's diagram is accurate: https://en.wikipedia.org/wiki/QUIC#Characteristics That is a decent latency win on every single connection, and with 0-RTT you can go further, but 0-RTT is stateful and hard to deploy and I expect it will see very little use.
dan-robertson · 48m ago
The problem it is trying to solve is not overhead of the Linux kernel on a big server in a datacenter
eptcyka · 16m ago
QUIC performance requires careful use of batching. Using UDP sockets naively, i.e. sending one QUIC packet per syscall, will incur a lot of overhead - every time, the kernel has to figure out which interface to use, queue it up on a buffer, and all the rest. If one uses it like TCP, batching up lots of data and enqueuing packets in one “call” helps a ton. Similarly, the kernel wireguard implementation can be slower than wireguard-go since it doesn’t batch traffic. At the speeds offered by modern hardware, we really need to use vectored I/O to be efficient.
rayiner · 19m ago
It’s an interesting testament to how well designed TCP is.
euphamism · 32m ago
> causing the next cat video to be that much slower to arrive.
You mean "causing the next advertisement to be that much slower"
> for that all-important web-browsing use case.
You mean "for that all-important advertising-display use case."
> But middleboxes on the Internet also make free use of connection information
> [...]
> As QUIC gains the hardware support that TCP benefits from,
It will gain the ossification problems that TCP suffers from. That _was_ quick!
valorzard · 48m ago
Would this (eventually) include the unreliable datagram extension?
wosined · 39m ago
Don't know if it could get faster than UDP if it is on top of it.
valorzard · 28m ago
The use case for this would be running a multiplayer game server over QUIC
jeffbee · 1h ago
This seems to be a categorical error, for reasons that are contained in the article itself. The whole appeal of QUIC is being immune to ossification, being free to change parameters of the protocol without having to beg Linux maintainers to agree.
corbet · 1h ago
Ossification does not come about from the decisions of "Linux maintainers". You need to look at the people who design, sell, and deploy middleboxes for that.
jeffbee · 1h ago
I disagree. There is plenty of ossification coming from inside the house. Just some examples off the top of my head are the stuck-in-1974 minimum RTO and ack delay time parameters, and the unwillingness to land microsecond timestamps.
otterley · 59m ago
Not a networking expert, but does TCP in IPv6 suffer the same maladies?
pumplekin · 49m ago
Yes.
Layer 4 TCP is pretty much just slapped on top of Layer 3 IPv4 or IPv6 in exactly the same way for both of them.
Outside of some little nitpicky things like details on how TCP MSS clamping works, it is basically the same.
toast0 · 1h ago
IMHO, you likely want the server side to be in the kernel, so you can get to performance similar to in-kernel TCP, and ossification is less of a big deal, because it's "easy" to modify the kernel on the server side.
OTOH, you want to be in user land on the client, because modifying the kernel on clients is hard. If you were Google, maybe you could work towards a model where Android clients could get their in-kernel protocol handling to be something that could be updated regularly, but that doesn't seem to be something Google is willing or able to do; Apple and Microsoft can get priority kernel updates out to most of their users quickly; Apple also can influence networks to support things they want their clients to use (IPv6, MP-TCP). </rant>
If you were happy with congestion control on both sides of TCP, and were willing to open multiple TCP connections like http/1, instead of multiplexing requests on a single connection like http/2, (and maybe transfer a non-pessimistic bandwidth estimate between TCP connections to the same peer), QUIC still gives you control over retransmission that TCP doesn't, but I don't think that would be compelling enough by itself.
Yes, there's still ossification in middle boxes doing TCP optimization. My information may be old, but I was under the impression that nobody does that in IPv6, so the push for v6 is both a way to avoid NAT and especially CGNAT, but also a way to avoid optimizer boxes as a benefit for both network providers (less expense) and services (less frustration).
jeffbee · 1h ago
This is a perspective, but just one of many. The overwhelming majority of IP flows are within data centers, not over planet-scale networks between unrelated parties.
toast0 · 50m ago
I've never been convinced by an explanation of how QUIC applies for flows in the data center.
Ossification doesn't apply (or it shouldn't, IMHO, the point of Open Source software is that you can change it to fit your needs... if you don't like what upstream is doing, you should be running a local fork that does what you want... yeah, it's nicer if it's upstreamed, but try running a local fork of Windows or MacOS); you can make congestion control work for you when you control both sides; enterprise switches and routers aren't messing with tcp flows. If you're pushing enough traffic that this is an issue, the cost of QUIC seems way too high to justify, even if it helps with some issues.
darksaints · 1h ago
For the love of god, can we please move to microkernel-based operating systems already? We're adding a million lines of code to the Linux kernel every year. That's so much attack surface area. We're setting ourselves up for a Kessler syndrome of sorts with every system that we add to the kernel.
wosined · 36m ago
I might be wrong, but microkernels also need drivers, so wouldn't the attack surface be the same?
mdavid626 · 1h ago
Most of that code is not loaded into the kernel; it is only loaded when needed.
darksaints · 48m ago
True, but the last time I checked (several years ago), the size of the portion of code that is not drivers or kernel modules was still 7 million lines of code, and the average system still has to load a few million more via kernel modules and drivers. That is still a phenomenally large attack surface.
The SeL4 kernel is 10k lines of code. OKL4 is 13k. QNX is ~30k.
arp242 · 31m ago
Can I run Firefox or PostgreSQL with reasonable performance on SeL4, OKL4, or QNX?
regularfry · 42m ago
You've still got a combinatorial complexity problem though, because you never know what a specific user is going to load.
[1] - https://mosh.org/
The Jevons Paradox is applicable in a lot of contexts.
More efficient use of compute and communications resources will lead to higher demand.
In games this is fine. We want more, prettier, smoother, pixels.
In scientific computing this is fine. We need to know those simulation results.
On the web this is not great. We don’t want more ads, tracking, JavaScript.
I'm benefiting from WebP, JS JITs, Flexbox, zstd, Wasm, QUIC, etc, etc
[1] https://microsoft.github.io/msquic/