How Not to Measure Latency

Look at the max, p99.9/99.99/etc. hide the really bad behavior
A single user hitting p99+ is more likely than you think (in a public-facing webserver context, anyway) because a single page load can make dozens of requests
Don’t average percentiles (https://www.batey.info/percentiles-averages.html)
Service time vs. response time, backpressure is important to avoid falling behind
If a load generator sends requests serially and doesn’t maintain the request rate when the server slows down, latencies look much better than they would be with a consistent request rate
In-band monitoring code can be (occasionally, surely) inaccurate because of context switches/etc.
Don’t figure out how to reduce latency at high percentiles, ease off load to a point that where latency is decent at high percentiles and go from there
- This tracks in a load testing context but definitely not in production

Stopped at 32:10