The Discovery of Apache ZooKeeper’s Poison Packet
https://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
- PagerDuty uses IPSec to encrypt IP payloads.
- ZooKeeper was reading a `scheme_len` value off the wire with no bounds check. This value would occasionally be set to 1 GB or more, causing a Java OOM. The only real explanation is packet corruption.
- The Linux kernel ignores TCP checksums here, assuming that the IPSec (ESP) integrity checks are sufficient. This is supported by the RFC, but the assumption isn’t entirely accurate, and it is why the corrupted packets PagerDuty was seeing reached the application undetected.
```c
/*
 * 2) ignore UDP/TCP checksums in case
 *    of NAT-T in Transport Mode, or
 *    perform other post-processing fixes
 *    as per draft-ietf-ipsec-udp-encaps-06,
 *    section 3.1.2
 */
if (x->props.mode == XFRM_MODE_TRANSPORT)
        skb->ip_summed = CHECKSUM_UNNECESSARY;
```
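To make concrete what gets skipped when `ip_summed` is set to `CHECKSUM_UNNECESSARY`, here is a minimal sketch of the RFC 1071 Internet checksum that TCP relies on. It is a simplification, not the kernel's implementation: real TCP also sums a pseudo-header (addresses, protocol, length), and the names `inet_checksum` and `payload_intact` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
 * words, then inverted. Assumes packet-sized input, so the 32-bit
 * accumulator cannot overflow before folding.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd trailing byte with zero */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Hypothetical helper: 1 if the payload still matches the sender's checksum. */
int payload_intact(const uint8_t *data, size_t len, uint16_t expected)
{
    return inet_checksum(data, len) == expected;
}
```

A single flipped bit in a length prefix changes the sum, so a receiver that actually ran this check would most likely have dropped the corrupted segment instead of handing it to ZooKeeper.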
- Intel’s AES kernel module (loaded for IPSec encryption) was ultimately responsible for corrupting the packets.
- Regular SSL uses AES as well, but TCP checksums aren’t skipped there!
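The missing bounds check on `scheme_len` is the application-level half of the problem. The sketch below shows the general defence for a length-prefixed field: validate the length against a sane cap and against the bytes actually received before allocating or copying. ZooKeeper itself is Java; this is C only to match the kernel snippet above. The function `read_scheme`, the constant `MAX_SCHEME_LEN`, and the 4-byte big-endian framing are assumptions for illustration, not ZooKeeper's actual code or limits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SCHEME_LEN 1024u /* hypothetical cap; not a ZooKeeper constant */

/*
 * Parses [4-byte big-endian length][bytes] from buf into out, which is
 * NUL-terminated on success. Returns the number of bytes consumed, or -1
 * if the length is implausible, exceeds the data received, or does not
 * fit in out.
 */
long read_scheme(const uint8_t *buf, size_t buf_len, char *out, size_t out_cap)
{
    uint32_t scheme_len;

    if (buf_len < 4)
        return -1;

    scheme_len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    /* Reject before allocating or copying anything. */
    if (scheme_len > MAX_SCHEME_LEN ||
        scheme_len > buf_len - 4 ||
        scheme_len >= out_cap)
        return -1;

    memcpy(out, buf + 4, scheme_len);
    out[scheme_len] = '\0';
    return (long)(4 + scheme_len);
}
```

With a check like this, a corrupted prefix such as `0x40000006` (about 1 GB) is rejected as a malformed request rather than turning into a giant allocation.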