(igor kisselev/Shutterstock)
Metrics promise a common understanding across systems, but with evolving formats and complicated math, they often cause more confusion than clarity. Here's what we're getting wrong and how we can fix it.
In 1887, an ophthalmologist named L.L. Zamenhof introduced Esperanto, a universal language designed to break down barriers and unite people around the world. It was ambitious, idealistic, and ultimately niche, with only about 100,000 speakers today.
Observability has its own version of Esperanto: metrics. They're the standardized, numerical representations of system health. In theory, metrics should simplify how we monitor and troubleshoot digital infrastructure. In practice, they're often misunderstood, misused, and maddeningly inconsistent.
Let's explore why metrics, our supposed universal language, remain so difficult to get right.
Metrics, Decoded (and Re-Encoded)
A metric is a numeric measurement at a point in time. That seems simple, until you dive into the nuance of how metrics are defined and used. Take redis.keyspace.hits, for example: a counter that tracks how often a Redis instance successfully finds data in the keyspace. Depending on the telemetry format (OpenTelemetry, Prometheus, or StatsD), it will be formatted differently, even with the same dimensions, aggregations, and metric value.
We have competing standards like StatsD, Prometheus, and OpenTelemetry (OTLP) Metrics, each introducing its own way to define and transmit datapoints and their associated metadata. These formats don't just differ in syntax; they vary in fundamental behavior and metadata structure. The result? Three tools may show you the identical metric value, but require entirely different logic to collect, store, and analyze it.
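To make the divergence concrete, here is a rough sketch of the same redis.keyspace.hits datapoint rendered in all three formats. The values and tag names are illustrative, the tag syntax shown for StatsD is the DogStatsD extension rather than the original spec, and the OTLP payload is simplified from the full protobuf/JSON data model:

```python
import json
import time

# One datapoint, three wire formats (illustrative values).
name = "redis.keyspace.hits"
value = 1042
tags = {"container_id": "c1", "host": "redis-01"}

# StatsD: a plain-text line; "|c" marks a counter. Tags are a
# DogStatsD-style extension, not part of the original StatsD spec.
statsd_line = f"{name}:{value}|c|#" + ",".join(f"{k}:{v}" for k, v in tags.items())

# Prometheus exposition format: dots are illegal in metric names,
# so the name itself must be rewritten to underscores.
labels = ",".join(f'{k}="{v}"' for k, v in tags.items())
prom_line = f"redis_keyspace_hits_total{{{labels}}} {value}"

# OTLP-style JSON (simplified): metadata travels with the datapoint,
# including monotonicity and temporality, which the other two formats
# leave implicit.
otlp_point = {
    "name": name,
    "sum": {
        "isMonotonic": True,
        "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE",
        "dataPoints": [{
            "asInt": value,
            "timeUnixNano": time.time_ns(),
            "attributes": [
                {"key": k, "value": {"stringValue": v}} for k, v in tags.items()
            ],
        }],
    },
}

print(statsd_line)
print(prom_line)
print(json.dumps(otlp_point, indent=2))
```

Notice that even the metric's name changes between formats, and only OTLP carries its counter semantics explicitly on the wire.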
That fragmentation leads to operational confusion, inflated storage costs, and teams spending more time decoding telemetry than acting on it.
Format Conversion Does Not Equal Metric Understanding
Even when format translation is handled, aggregation still causes confusion. Imagine collecting redis.keyspace.hits every six seconds across 10 containers. If the container.id tag is dropped, the 10 series collide and their values must be aggregated, which changes how the metric is interpreted. Prometheus might sum the values, OTLP can treat it as a delta counter, and StatsD may average them, which results in behavior more like a gauge than a counter. These subtle differences in interpretation can lead to inconsistent analysis. Without intentional handling of metrics, teams risk drawing incorrect conclusions from the data.
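A toy illustration (not any vendor's actual pipeline) of how that choice plays out. Once container.id is dropped, the same ten readings produce a total under a counter-style roll-up and a per-container figure under a gauge-style roll-up:

```python
# Ten containers each report redis.keyspace.hits for one interval
# (values are made up for illustration).
datapoints = [
    {"metric": "redis.keyspace.hits", "container.id": f"c{i}", "value": v}
    for i, v in enumerate([120, 95, 110, 130, 105, 98, 115, 125, 90, 112])
]

# Dropping container.id collapses ten series into one, and the
# backend must pick an aggregation.
values = [dp["value"] for dp in datapoints]

summed = sum(values)                  # counter-style roll-up: total hits
averaged = sum(values) / len(values)  # gauge-style roll-up: hits per container

print(f"sum:     {summed}")    # 1100
print(f"average: {averaged}")  # 110.0
```

Both numbers are "correct" for some question; the danger is a dashboard or alert assuming one while the backend computed the other.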
But even after format translation, the hardest part often comes next: deciding how to aggregate these metrics. The answer depends on the metric type. Summing gauges can lead to incorrect results. Treating a delta as a cumulative counter can introduce risk. Aggregation math that is technically correct may still confuse downstream systems, especially if those systems expect monotonic behavior.
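The delta-versus-cumulative pitfall can be shown in a few lines. This is a minimal sketch with invented readings: a cumulative counter reports a running total, a delta counter reports only the change since the last report, and confusing the two corrupts every total and rate derived downstream:

```python
# Readings from one source emitting a cumulative counter (running totals).
cumulative = [100, 150, 150, 230, 260]

# Correct: derive deltas by differencing successive cumulative readings.
deltas = [b - a for a, b in zip(cumulative, cumulative[1:])]
print(deltas)  # [50, 0, 80, 30]

# Wrong: treating each cumulative reading as a delta and summing them
# massively overcounts the true total.
overcounted = sum(cumulative)             # 890
true_total = cumulative[0] + sum(deltas)  # 260, the last cumulative reading
```

Real collectors also have to handle counter resets (a reading smaller than its predecessor), which this sketch omits.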
Metrics are math, and the math matters. That's why tools need metric-specific logic, similar to the event-centric logic that already exists for logs and traces.
Why It Matters
If we can't rely on a shared understanding of metrics, observability suffers. Incidents take longer to resolve. Alerting becomes noisy. Teams lose faith in their data.
The path forward isn't about creating another standard. It's about developing better tooling that simplifies format handling, smarter ways to aggregate and interpret data, and education that helps teams use metrics effectively without needing a math degree.
By treating metrics as a unique form of telemetry with its own structure and challenges, we can remove the guesswork and empower teams to act with confidence. It's time to build with clarity in mind, not just for machines, but for the humans interpreting the data.
About the author: Josh Biggley is a staff product manager at Cribl. A 25-year veteran of the tech industry,
Biggley loves to talk about monitoring, observability, OpenTelemetry, network telemetry, and all things nerdy. He has experience with Fortune 25 companies and pre-seed startups alike, across the manufacturing, healthcare, government, and consulting verticals.
Related Items:
2025 Observability Predictions and Observations
Data Observability in the Age of AI: A Guide for Data Engineers
Cribl CEO Clint Sharp Discusses the Data Observability Deluge