That’s ‘Billion’ with a ‘B’: Scaling to the Next Level at WhatsApp
- Write-back cache for undelivered “mailboxes”
- 98.7% of mailbox reads (a user connects and downloads all their pending messages) are served from cache
- Dirty cache entries are flushed to disk every 20 seconds
- Oldest cache entry is 14 hours old
- Large mailboxes (users in many groups with thousands of pending messages) are evicted more aggressively to avoid skewing the cache for others
- Multiple clusters
- Layer over pg2 (Erlang’s distributed process groups, which run on the usual all-nodes-connect-to-all-nodes mesh) that maintains cross-DC connections over TCP, but only for node pairs that need to talk to each other
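
The write-back mailbox cache described in the bullets above can be illustrated with a small sketch. This is not WhatsApp's implementation (theirs is in Erlang); it is a minimal Python model under stated assumptions. The class name `MailboxCache`, the `capacity` and `large_threshold` parameters, and the plain dict standing in for disk are all hypothetical. It shows three ideas from the notes: writes land in memory first, dirty entries reach the backing store only on a flush, and large mailboxes are picked for eviction ahead of the least recently used entry.

```python
from collections import OrderedDict


class MailboxCache:
    """Illustrative write-back cache; all names and parameters are assumptions."""

    def __init__(self, store, capacity, large_threshold):
        self.store = store                      # stand-in for disk: user -> list of messages
        self.capacity = capacity                # max cached mailboxes (assumed >= 1)
        self.large_threshold = large_threshold  # message count at which a mailbox counts as large
        self.entries = OrderedDict()            # user -> messages, least recently used first
        self.dirty = set()                      # users whose cached mailbox differs from the store

    def _write_back(self, user):
        # Persist one entry if it has unflushed changes.
        if user in self.dirty:
            self.store[user] = list(self.entries[user])
            self.dirty.discard(user)

    def _evict_if_needed(self, keep):
        # Prefer evicting large mailboxes; otherwise fall back to plain LRU.
        while len(self.entries) > self.capacity:
            candidates = [u for u in self.entries if u != keep]
            large = [u for u in candidates
                     if len(self.entries[u]) >= self.large_threshold]
            victim = large[0] if large else candidates[0]
            self._write_back(victim)            # never drop unflushed data
            del self.entries[victim]

    def _load(self, user):
        if user in self.entries:
            self.entries.move_to_end(user)      # mark as most recently used
        else:
            self.entries[user] = list(self.store.get(user, []))
            self._evict_if_needed(keep=user)
        return self.entries[user]

    def append(self, user, message):
        """Queue an undelivered message; only memory is touched."""
        self._load(user).append(message)
        self.dirty.add(user)

    def drain(self, user):
        """A user connects and downloads all pending messages."""
        messages = list(self._load(user))
        self.entries[user] = []
        self.dirty.add(user)
        return messages

    def flush(self):
        """Write all dirty entries to the store; meant to run on a timer."""
        for user in list(self.dirty):
            self._write_back(user)
```

In a real service `flush()` would be driven by a timer (the notes mention a 20-second interval) rather than called by hand, and the eviction policy would likely weigh mailbox size against recency instead of using a hard threshold.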