How Not to Measure Latency
- Look at the max; p99.9/p99.99/etc. can still hide the really bad behavior
- A single user hitting p99+ is more likely than you think (in a public-facing webserver context, anyway) because a single page load can make dozens of requests
- Don’t average percentiles (https://www.batey.info/percentiles-averages.html)
- Service time vs. response time: service time is just the work itself, response time also includes time spent waiting in a queue. Backpressure is important to avoid falling behind
- If a load generator sends requests serially and doesn’t maintain the request rate when the server slows down, latencies look much better than they would be with a consistent request rate
- In-band monitoring code can be (occasionally, surely) inaccurate because of context switches/etc.
- Don’t start by figuring out how to reduce latency at high percentiles; ease off load to a point where latency is decent at high percentiles and go from there
  - Easing off load like this tracks in a load testing context but definitely not in production
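
Quick arithmetic for the "a single user hitting p99+" note above. This is a rough sketch that assumes every request on a page is independent and has exactly a 1% chance of landing above p99, which real traffic won't strictly satisfy. The function name and the request counts are made up for illustration.

```python
def prob_at_least_one_slow(requests_per_page: int, percentile: float = 0.99) -> float:
    """Chance that at least one of N independent requests lands above the percentile."""
    # P(all N requests are at or below the percentile) = percentile ** N
    return 1.0 - percentile ** requests_per_page


if __name__ == "__main__":
    # Hypothetical page sizes, in requests per page load
    for n in (1, 10, 40, 100):
        print(f"{n:>3} requests: {prob_at_least_one_slow(n):.1%}")
```

Under those assumptions, a page that makes a few dozen requests sees a p99+ request on roughly a third of loads, so "only 1% of requests" is a lot more than 1% of users.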
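
A toy simulation of the serial load generator note above (this effect is often called coordinated omission). It is a sketch under invented assumptions: a server that takes 1 ms per request but freezes completely for one second, and a generator that wants to send every 10 ms but waits for each response before sending the next. All names and numbers are hypothetical. "Naive" times each request from when it was actually sent; "corrected" times it from when it should have been sent.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(duration_ms=3000, interval_ms=10, service_ms=1,
             stall_start=1000, stall_end=2000):
    naive = []       # latency measured from the actual send time
    corrected = []   # latency measured from the intended send time
    free_at = 0.0    # when the serial generator is free to send again
    intended = 0.0   # when the next request should go out at a constant rate
    while intended < duration_ms:
        # A serial generator cannot send until the previous response is back
        send = max(free_at, intended)
        start = send
        if stall_start <= start < stall_end:
            start = stall_end  # server is frozen; work begins when the stall ends
        done = start + service_ms
        naive.append(done - send)
        corrected.append(done - intended)
        free_at = done
        intended += interval_ms
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    for name, samples in (("naive", naive), ("corrected", corrected)):
        print(f"{name:>9}: p99 = {percentile(samples, 99):.0f} ms, "
              f"max = {max(samples):.0f} ms")
```

In this toy setup the naive p99 stays at the normal service time because the stall shows up as a single bad sample, while the corrected p99 is close to the length of the stall. The max is the same either way, which fits the "look at the max" note.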
Stopped at 32:10