Description
In our recent IPv6 Perlmutter -> EJFAT LB test, we uncovered an issue where we were incorrectly send()'ing chunks of payload that exceeded the MTU. Since this showed up as IP fragments being Rx'd on the EJFAT LB, it gave me an idea.
Even after fixing the minor bug we found in E2SAR wrt the local MTU on the sender, we may also be able to directly identify scenarios where we exceed the Path MTU (PMTU). If you set the Don't Fragment (DF) bit on outgoing IPv4 packets, you should receive ICMP "Fragmentation Needed" errors from any intermediate router that would otherwise have had to fragment your packet along the way. IPv6 routers never fragment, so for IPv6 the equivalent ICMPv6 "Packet Too Big" message comes back without any flag being set.
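For reference, the local-MTU check boils down to header arithmetic. The sketch below is only an illustration, not E2SAR's code: it assumes no IPv4 options and no IPv6 extension headers, and `app_hdr_len` is a hypothetical stand-in for whatever per-packet headers E2SAR adds (their sizes aren't assumed here).

```c
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Fixed header sizes; IPv4 value assumes no IP options. */
#define IPV4_HDR_LEN 20
#define IPV6_HDR_LEN 40
#define UDP_HDR_LEN   8

/*
 * Hypothetical helper: largest application payload that fits in one
 * unfragmented UDP datagram for a given link or path MTU.
 * app_hdr_len is a placeholder for any per-packet application headers.
 * Returns 0 if nothing fits or the family is unknown.
 */
size_t max_udp_payload(size_t mtu, int family, size_t app_hdr_len)
{
    size_t ip_hdr;

    switch (family) {
    case AF_INET:  ip_hdr = IPV4_HDR_LEN; break;
    case AF_INET6: ip_hdr = IPV6_HDR_LEN; break;
    default:       return 0;
    }

    size_t overhead = ip_hdr + UDP_HDR_LEN + app_hdr_len;
    return mtu > overhead ? mtu - overhead : 0;
}
```

For example, with a 9000-byte MTU over IPv6 and no application headers, the limit is 9000 - 40 - 8 = 8952 bytes per send(); over IPv4 with a 1500-byte MTU it is 1472.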
Here's a way I think we could do this on Linux that would work for our use case, where the transmitters send unidirectional UDP:
- Use the IP_MTU_DISCOVER and IPV6_MTU_DISCOVER socket options with setsockopt().
- Set IP_MTU_DISCOVER to IP_PMTUDISC_DO.
- Set IPV6_MTU_DISCOVER to IPV6_PMTUDISC_DO.
- Both of these can be set on the same socket at the same time.
- This forces the DF bit on outgoing IPv4 packets (IPv6 has no DF bit; there the option stops the sending kernel from fragmenting). It also has the kernel track the PMTU based on the ICMP error packets it Rx's, and hard-fail your send() call with EMSGSIZE if you exceed the PMTU. Ref: https://linux.die.net/man/7/ip (IP_MTU_DISCOVER section)
- This option should work with unidirectional UDP, but it does depend on intermediate routers properly sending you ICMP error packets. That seems like a sane assumption on R&E networks?
- Note that this option is susceptible to a DoS attack in which an attacker spoofs carefully crafted ICMP errors toward the sender, artificially shrinking the kernel's view of the PMTU below the minimum size required by the application. Ref: https://nvd.nist.gov/vuln/detail/CVE-2024-53259. The proposed fix there won't work for us, since it depends on the application having its own way of detecting the MTU boundary (i.e., something other than ICMP).
I'm also going to work on having the dataplane count incoming fragments and report them up to the control plane. That will let us surface fragmentation problems more prominently and show end users when their LB instance is Rx'ing lots of fragments.
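As an illustration of what "count incoming fragments" would key on, here's a per-packet check written in C. The real dataplane would implement this in its own packet pipeline, not in C; `is_ip_fragment` is a hypothetical name. The IPv4 test (MF flag set or a nonzero fragment offset) is standard. The IPv6 branch is a simplification: it only checks whether the first Next Header is the Fragment header (44) and does not walk a chain of extension headers. It will also count "atomic" fragments (Fragment header present, offset 0, M=0).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-packet check: returns 1 if the buffer (starting at
 * the IP header) is an IP fragment, 0 otherwise or if too short.
 */
int is_ip_fragment(const uint8_t *pkt, size_t len)
{
    if (len < 1)
        return 0;

    unsigned version = pkt[0] >> 4;

    if (version == 4) {
        if (len < 20)
            return 0;
        /* Bytes 6-7: 3 flag bits + 13-bit fragment offset. */
        uint16_t frag = (uint16_t)((pkt[6] << 8) | pkt[7]);
        /* MF (0x2000) set, or nonzero offset, means a fragment.
           DF (0x4000) alone does not. */
        return (frag & 0x2000) || (frag & 0x1FFF);
    }

    if (version == 6) {
        if (len < 40)
            return 0;
        /* Simplification: only the first Next Header (byte 6) is checked;
           44 is the IPv6 Fragment extension header. */
        return pkt[6] == 44;
    }

    return 0;
}
```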
I haven't tested any of this, but wanted to capture the idea while it was fresh in my mind. LMK what you think.