Don’t use Protobuf for Telemetry

https://richardstartin.github.io/posts/dont-use-protobuf-for-telemetry

I broadly agree with this characterization. Telemetry doesn’t benefit enough (if at all) from protobuf’s interface definition & schema evolution to warrant the serialization cost, not to mention client-side overhead.

The basic premise of this post is that a good telemetry library needs to be lightweight to avoid perturbing the application; inefficient diagnostic tools are self-defeating. Unlike other formats, nested Protobuf messages cannot be written contiguously into a stream without significant buffering.

protobuf-java is 1.6MB + 700 classes of overhead, before counting the generated code.

If you work at a large organisation which has entirely embraced Protobuf, I would not suggest worrying about 1.6MB or a few hundred loaded classes; these costs quickly amortise as you use the library for more features. However, your resource budget for a diagnostic agent which tells you what your application is doing and how it’s performing should be tiny, and I’m not sure protobuf-java can be made to fit into it.

Embedded messages are length-prefixed, but the length prefix is varint-encoded, which means you don’t know how many bytes you need for the length until you’ve done the serialisation, and it’s recursive.
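
To make the cost concrete, here is a minimal Python sketch of the wire format's length-delimited framing. It is hand-rolled for illustration and is not how protobuf-java is implemented; the helper names (encode_varint, length_delimited, encode_nested) are hypothetical, and every level is assumed to use field number 1.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if n < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Frame payload as a wire-type-2 field: tag, varint length, then bytes."""
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


def encode_nested(depth: int, leaf: bytes) -> bytes:
    """Wrap leaf in `depth` levels of embedded messages (field 1 at each level).

    Each level must be fully serialised (or at least sized) before the
    enclosing level can emit its length prefix, so the work is recursive.
    """
    if depth == 0:
        return length_delimited(1, leaf)
    inner = encode_nested(depth - 1, leaf)  # len(inner) is unknown until this returns
    return length_delimited(1, inner)
```

A 127-byte payload gets a one-byte length prefix and a 128-byte payload gets a two-byte one, so a writer cannot reserve a fixed-width slot for the length and patch it in afterwards without either buffering the child or sizing it in a separate pass.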

People who actually know Protobuf already know this (it’s literally written in the encoding manual), and understand the benefits which arise from this cost elsewhere (e.g. implementing partial deserialisation is easy, easy to skip over sections of the message), but lots of people don’t seem to understand the cost model the wire-format imposes. If they did, there would probably be a lot less nesting in Protobuf as used in the wild.
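
The benefit on the reading side can be sketched just as briefly. The following is an illustrative reader, not any library's API: decode_varint and skip_field are hypothetical names, and the deprecated group wire types (3 and 4) are simply rejected.

```python
from typing import Tuple


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # continuation bit clear: last byte
            return result, pos
        shift += 7


def skip_field(buf: bytes, pos: int) -> int:
    """Skip the field starting at pos and return the offset of the next field."""
    tag, pos = decode_varint(buf, pos)
    wire_type = tag & 0x7
    if wire_type == 0:    # varint
        _, pos = decode_varint(buf, pos)
    elif wire_type == 1:  # fixed 64-bit
        pos += 8
    elif wire_type == 2:  # length-delimited: string, bytes, embedded message
        length, pos = decode_varint(buf, pos)
        pos += length     # jump over the whole payload without parsing it
    elif wire_type == 5:  # fixed 32-bit
        pos += 4
    else:
        raise ValueError(f"unsupported wire type {wire_type}")
    return pos
```

The length prefix lets the reader step over an embedded message of any depth in one jump, which is exactly the information the writer had to compute up front.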
