Grokking NAT and packet mangling in Linux

12 viveknathani_ 4 6/18/2025, 5:40:03 AM vivekn.dev ↗

Comments (4)

colmmacc · 57m ago
A significant wrinkle in how NAT works is IP fragmentation. UDP datagrams can be larger than an IP packet. When that happens the payload is split into multiple IP packets, but only the first packet has a UDP header in it. The NAT device needs to correlate these packets by looking at fragment IDs, and then rewrite the IP addresses in the headers.

That alone implies a second kind of state to maintain, but it gets worse. Fragments can arrive out of order. If the second or later packets arrive before the first, the NAT device has to buffer those fragments until they get the packet with the UDP header in it.

That might seem unlikely but it's surprisingly common. Modern protocols like DNSSEC do require fragmentation and in a large network with many paths fragments can end up taking different paths from each other.

Ordinarily when a network is using multiple links to load balance traffic, the routers will use flow steering. The routers look at the UDP or TCP header, make a hash of the connection/flow tuple, and then use that hash to pick a link to use. That way, all of the packets from the same connection or flow will be steered down the same link.

IP fragmentation breaks this too. Those second and subsequent packets don't have a UDP header in them, so they can't be flow steered statelessly. Smarter routers are clever enough to realize this from the beginning of the datagram and to only use a 3-tuple hash (source IP, dest IP, protocol) ... so the packets will still flow consistently. But many devices get this wrong - some just even assume there will be a UDP header and pick whatever values happen to be there.

The fragments end up taking different paths and if one link is more congested or latent enough than another, they'll ultimately arrive out of order.

This single wrinkle is probably responsible for half the complexity in a robust NAT implementation. Imagine having to solve for all of this in a highly-available and trasnactionally live-replicated implementation like managed NAT gateways.

Worst of all, this was all avoidable. If UDP datagrams were simply fragmented at the UDP layer, and every packet included a UDP header, none of this would be necessary. It's probably the worst mistake in TCP/IP. But obviously overall, it was a very successful design that brought on the Internet.

gregw2 · 1h ago
Nice writeup on the different type of NATs. I learned something, thank you!

One feedback; I would use a different word ("wrangling"?) rather than "mangling" in your title. Or mention IPv6.

The title use of "mangling" alone triggered flashbacks of tracking down TCP checksum corruption in low cost home routers, or bugs in OpenBSD networking stacks back when I worked on web conferencing software. I that kind of mangling commiseration when clicking your link, but your use of the term was more for an article describing NATv4 and arguing "what IPv4 NAT does is hacky mangling, let's all use IPv6". And while making that argument (which is wistfully fair) also not really acknowledging the benefit of NAT for reducing the attack surface of inbound packets from unsolicited sources and/or explaining why that isn't relevant if you do proper firewalling with IPv6 instead. And when would IPv6 Npt (network /prefix/ translation be desired?)... But I can see that starts to go beyond the scope of your intended argument/perspective perhaps...

akerl_ · 1h ago
Mangle is the technical term used by the kernel for those parts of the process.
viveknathani_ · 7h ago
Wrote something about computer networking. Felt like posting it here. Happy to hear your thoughts, HN!