Roblox Return to Service 2021

https://blog.roblox.com/2022/01/roblox-return-to-service-10-28-10-31-2021/

  • Consul’s streaming feature routes all writes through a single Go channel, so under heavy load writes block on contention for that channel
  • Moving from a 64-core machine to a 128-core machine made things worse: the larger machine is NUMA, which increased latency (and so contention) on that channel
  • BoltDB uses a freelist to track free pages; it grew to 7MB and was being written out in full on every write to the database
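
To put the freelist point in numbers, here is a rough sketch of the write amplification when a whole freelist is rewritten alongside each small write. The 7MB figure comes from the notes above; the 1 KiB payload size and the function names are hypothetical, chosen only for illustration, and the model ignores page alignment and batching.

```go
package main

import "fmt"

// bytesPerCommit models a commit that writes the payload plus the
// entire freelist, as described in the notes above. This is a
// simplified model, not BoltDB's actual commit path.
func bytesPerCommit(payload, freelist int64) int64 {
	return payload + freelist
}

// writeAmplification is the ratio of bytes written to disk to the
// bytes of useful payload in the commit.
func writeAmplification(payload, freelist int64) float64 {
	return float64(bytesPerCommit(payload, freelist)) / float64(payload)
}

func main() {
	const payload = 1 << 10  // 1 KiB per write (hypothetical size)
	const freelist = 7 << 20 // 7 MiB freelist (figure from the notes)

	fmt.Printf("bytes written per commit: %d\n", bytesPerCommit(payload, freelist))
	fmt.Printf("write amplification: %.0fx\n", writeAmplification(payload, freelist))
}
```

Under these assumptions each small write costs thousands of times its own size in disk I/O, which is why the freelist size mattered far more than the data being appended.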