How Not to Measure Latency

  • Look at the max; p99.9, p99.99, etc. still hide the really bad behavior
  • A single user hitting p99+ is more likely than you think (in a public-facing webserver context, anyway) because a single page load can make dozens of requests
  • Don’t average percentiles (https://www.batey.info/percentiles-averages.html)
  • Service time vs. response time: response time also includes time spent waiting in the queue, so backpressure is important to avoid falling behind
  • If a load generator sends requests serially and doesn’t maintain the request rate when the server slows down, latencies look much better than they would with a consistent request rate (the talk calls this coordinated omission)
  • In-band monitoring code can be (occasionally, surely) inaccurate because of context switches/etc.
  • Don’t start by trying to reduce latency at high percentiles; ease off load to a point where latency is decent at high percentiles and go from there
    • This tracks in a load testing context but definitely not in production
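
To put a number on the "more likely than you think" point: a rough sketch of the arithmetic, assuming every request in a page load is independent and has the same latency distribution (real traffic is usually correlated, so treat this as a ballpark). The function name and the request counts are made up for illustration.

```python
def chance_of_hitting_tail(percentile: float, requests: int) -> float:
    """Probability that at least one of `requests` independent requests
    is slower than the given percentile (e.g. 99.0 for p99)."""
    p_under = percentile / 100.0          # chance a single request stays under it
    return 1.0 - p_under ** requests      # complement of "all requests stay under"


# Hypothetical page loads of different sizes.
for n in (1, 10, 40, 100):
    chance = chance_of_hitting_tail(99.0, n)
    print(f"{n:>3} requests: {chance:.1%} chance of at least one p99+ response")
```

Under these assumptions a page that makes 40 requests sees at least one p99+ response about a third of the time, and at 100 requests it is more likely than not.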

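A small sketch of why averaging percentiles misleads, using made-up latencies from two windows with very different traffic. The window names, sample values, and the nearest-rank `percentile` helper are assumptions for illustration, not anything from the talk or the linked post.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of
    all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies in ms from two one-minute windows.
quiet_window = [10] * 100                  # light traffic, everything fast
busy_window = [10] * 9000 + [500] * 1000   # heavy traffic, 10% of requests slow

# Averaging the per-window p99s weights both windows equally,
# even though the busy window has 100x the requests.
averaged_p99 = (percentile(quiet_window, 99) + percentile(busy_window, 99)) / 2

# The p99 over all requests has to be computed from the raw samples
# (or from mergeable histograms), not from the per-window percentiles.
true_p99 = percentile(quiet_window + busy_window, 99)

print(f"average of per-window p99s: {averaged_p99} ms")
print(f"p99 over all requests:      {true_p99} ms")
```

The averaged figure is a latency no request actually experienced, and it sits well below what the slowest 1% of requests really saw.
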
Stopped at 32:10
