2024-11-03

Telemetry budgets: when to shed spans instead of pride

By Ara Lim

Tags: Observability · Cost · SRE

OpenTelemetry makes instrumentation easy—sometimes too easy. Teams emit generous spans, then wonder why storage invoices spike after launch. We teach a weekly ritual: rank services by span volume, identify redundant attributes, and shed duplicates at the edge.

The ritual starts with a simple histogram of span names. Anything occupying more than ten percent of volume without unique diagnostic value becomes a candidate for merge or drop. We keep exemplars for slow requests rather than full traces for every health check.
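The histogram step can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual tooling: the function name `shed_candidates`, the sample span names, and the assumption that span names arrive as a plain list of strings are all hypothetical. Only the ten percent threshold comes from the text. The sketch nominates candidates by volume alone; whether a span carries unique diagnostic value is still a human judgment.

```python
from collections import Counter

# "More than ten percent of volume" is the threshold named in the article.
VOLUME_THRESHOLD = 0.10


def shed_candidates(span_names, threshold=VOLUME_THRESHOLD):
    """Return (span_name, share) pairs whose share of all spans exceeds threshold.

    span_names is assumed to be an iterable of span name strings, for
    example exported from a tracing backend (hypothetical input format).
    """
    counts = Counter(span_names)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() orders by descending count, so the largest offenders come first.
    return [
        (name, count / total)
        for name, count in counts.most_common()
        if count / total > threshold
    ]


if __name__ == "__main__":
    # Hypothetical sample: 100 spans dominated by health checks.
    sample = (
        ["GET /healthz"] * 60
        + ["POST /checkout"] * 25
        + ["GET /cart"] * 10
        + ["db.query"] * 5
    )
    for name, share in shed_candidates(sample):
        print(f"{name}: {share:.0%} of spans")
```

In the sample, `GET /cart` sits at exactly ten percent and is not flagged, because the rule in the text is "more than" ten percent. A flagged name such as `POST /checkout` would still need review before any merge or drop, since it may well carry diagnostic value that health checks do not.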

Second, we align shedding decisions with SLO reviews. If a service misses its budget, shedding is a temporary measure that stays in place only while the root cause is fixed. We record an expiry date on each shed rule so that it does not become permanent darkness.
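One way to keep a shed rule from outliving its purpose is to make the expiry date part of the rule itself. The sketch below assumes a hypothetical in-house representation: `ShedRule`, `should_shed`, and the field names are illustrative and do not correspond to any OpenTelemetry API. The dates used are examples only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ShedRule:
    """A temporary instruction to drop spans with a given name (hypothetical model)."""

    span_name: str
    reason: str        # why the rule exists, e.g. a link to the SLO review
    expires_on: date   # last day the rule applies; required, so no rule is open-ended

    def is_active(self, today: date) -> bool:
        # The rule applies up to and including its expiry date.
        return today <= self.expires_on


def should_shed(span_name: str, rules: list[ShedRule], today: date) -> bool:
    """True if any unexpired rule targets this span name."""
    return any(r.span_name == span_name and r.is_active(today) for r in rules)


def expired_rules(rules: list[ShedRule], today: date) -> list[ShedRule]:
    """Rules past their expiry date, to be removed or re-justified at the next review."""
    return [r for r in rules if not r.is_active(today)]


if __name__ == "__main__":
    rules = [
        ShedRule("GET /healthz", "over budget, see SLO review", date(2024, 12, 1)),
    ]
    print(should_shed("GET /healthz", rules, date(2024, 11, 15)))
    print(should_shed("GET /healthz", rules, date(2024, 12, 2)))
```

Because `expires_on` has no default, a rule cannot be created without one. Once the date passes, spans flow again unless someone deliberately renews the rule, which puts the decision back in front of the SLO review.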

Third, we coach engineers to communicate shedding to support teams. Support should know which diagnostics have temporarily lost granularity. A one-page addendum to the on-call guide prevents confused escalations during incidents.