Distributed Tracing: OpenTelemetry and Sampling Strategies
What This Concept Is
A trace is the record of a single request as it moves across the services that handle it. Each unit of work is a span. A span has a name, start and end times, attributes (key-value metadata), a status, and a parent span ID (empty for the root span). The trace_id ties the whole chain together; the span_id identifies each step.
trace_id = 8c1b...
+-- span: POST /checkout (frontend) [120 ms]
    +-- span: GET /inventory (inventory-svc) [15 ms]
    +-- span: POST /charge (payments-svc) [70 ms]
    |   +-- span: HTTP POST external PSP [60 ms]
    +-- span: POST /orders (orders-svc) [25 ms]
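The parent/child structure in the tree above can be sketched in plain Python. This is an illustrative data model only, not the OTel SDK's span class: the `Span` dataclass, the `render` helper, and all ID values are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Illustrative span record; field names are assumptions, not the SDK's."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    duration_ms: int


TRACE_ID = "example-trace"  # hypothetical placeholder, not a real trace ID

spans = [
    Span(TRACE_ID, "a1", None, "POST /checkout", 120),
    Span(TRACE_ID, "b1", "a1", "GET /inventory", 15),
    Span(TRACE_ID, "b2", "a1", "POST /charge", 70),
    Span(TRACE_ID, "c1", "b2", "HTTP POST external PSP", 60),
    Span(TRACE_ID, "b3", "a1", "POST /orders", 25),
]


def render(all_spans: List[Span], parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Return the span tree as indented lines, each parent before its children."""
    lines = []
    for s in all_spans:
        if s.parent_id == parent_id:
            lines.append("  " * depth + f"{s.name} [{s.duration_ms} ms]")
            lines.extend(render(all_spans, s.span_id, depth + 1))
    return lines


tree_lines = render(spans)
print("\n".join(tree_lines))
```

Every span carries the same trace_id, and the tree is recovered purely from the parent links, which is why a backend can reassemble a trace from spans that arrive out of order from different services.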
OpenTelemetry (OTel) is the CNCF-incubating open standard for emitting traces (and metrics and logs) in a vendor-neutral way. Its key ideas:
- Signals: traces, metrics, logs. Same model, same SDKs.
- Tracer providers and tracers: set up once per process, used to create spans.
- Context propagation: the trace_id/span_id are propagated across process boundaries (usually via the W3C traceparent HTTP header).
- Semantic conventions: standardized attribute names (http.request.method, http.response.status_code, db.system, messaging.destination.name) so that dashboards and tools work across services.
- Exporters and the Collector: spans are shipped out of the process to a Collector that filters, samples, and forwards.
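To make the context propagation idea concrete, here is a minimal sketch of reading and writing a version-00 W3C traceparent header with only the standard library. In a real service the OTel propagators do this for you; `TraceContext`, `parse_traceparent`, and `format_traceparent` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Layout: version-traceid-parentid-flags, all lowercase hex (version 00 only here).
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str        # 32 hex chars, shared by every span in the trace
    parent_span_id: str  # 16 hex chars, the caller's span
    sampled: bool        # lowest bit of the trace-flags byte


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the incoming context, or None if the header is malformed."""
    m = _TRACEPARENT_RE.fullmatch(header.strip())
    if m is None:
        return None
    trace_id, span_id, flags = m.groups()
    # All-zero IDs are reserved as invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: TraceContext, current_span_id: str) -> str:
    """Build the header for an outbound call made from current_span_id."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{current_span_id}-{flags}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = format_traceparent(incoming, "b7ad6b7169203331")
print(outgoing)
```

The trace_id passes through unchanged while the span ID is replaced at every hop, so the receiving service records the caller's span as its parent.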
Sampling is how you keep the cost sane. The OTel sampling doc is explicit: sampling reduces data volume while keeping visibility, and there are two major strategies.
- Head sampling: decide at the root of the trace (e.g. "keep 5% of requests"). Cheap, deterministic, but blind to what happens later.
- Tail sampling: collect the whole trace, then decide. Keeps all error traces, all slow traces, all traces with a particular attribute; drops boring fast 200s. More expensive but vastly more useful.
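The two strategies above can be contrasted in a few lines of plain Python. This is a sketch of the decision logic only, not the SDK's sampler or the Collector's processor: `head_sample`, `tail_keep`, `FinishedSpan`, and the fixed `slow_ms` threshold (standing in for a latency percentile) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List

_MAX64 = 1 << 64


def head_sample(trace_id_hex: str, ratio: float) -> bool:
    """Head sampling: decide from the trace ID alone, before any work happens.

    Deriving the verdict from the ID (rather than a fresh random number) means
    every service that sees the same trace_id reaches the same decision.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low64 = int(trace_id_hex, 16) % _MAX64  # treat the low 64 bits as uniform
    return low64 < int(ratio * _MAX64)


@dataclass
class FinishedSpan:
    """Hypothetical finished-span record used only for this sketch."""
    trace_id: str
    duration_ms: float
    is_error: bool
    is_root: bool = False


def tail_keep(spans: List[FinishedSpan], slow_ms: float, baseline_ratio: float) -> bool:
    """Tail sampling: decide after the whole trace has been collected."""
    if not spans:
        return False
    if any(s.is_error for s in spans):
        return True  # keep every trace that contains an error
    if any(s.is_root and s.duration_ms > slow_ms for s in spans):
        return True  # keep every slow request
    # Otherwise keep a small deterministic sample of the uneventful traces.
    return head_sample(spans[0].trace_id, baseline_ratio)


# Demo: roughly 5% of random trace IDs survive a 5% head sampler.
rng = random.Random(42)
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(100_000)]
kept_fraction = sum(head_sample(t, 0.05) for t in trace_ids) / len(trace_ids)
print(f"head sampler kept {kept_fraction:.2%}")
```

Note what each function needs as input: `head_sample` sees only an ID, so it cannot know whether the request will fail, while `tail_keep` needs every span of the trace in memory before it can answer.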
Why It Matters Here
In a service mesh of 10+ microservices, metrics tell you something is slow, and logs tell you what happened in one place, but traces tell you where the time went across the whole request path. That is the question on-call actually has at 2 a.m., and without traces it is hours of guesswork.
OTel in particular matters because it lets you instrument once and export to any backend. Context propagation is what links spans from service A and service B into one trace; the semantic conventions are the "common language" that lets those spans be queried and compared with the same attribute names even if the services are written in different languages.
Concrete Example: Minimal Instrumented Code Snippet
Python, using the OpenTelemetry API with semantic conventions:
from opentelemetry import trace
from opentelemetry.semconv.trace import SpanAttributes
from opentelemetry.trace import Status, StatusCode

# reserve_inventory, charge_payment, create_order, and PaymentDeclined are
# application code assumed to be defined elsewhere in the service.
tracer = trace.get_tracer("checkout-service")


def handle_checkout(request):
    # One span for the whole request; downstream calls become its children.
    with tracer.start_as_current_span("POST /checkout") as span:
        # Standard attributes use names from the semantic conventions.
        span.set_attribute(SpanAttributes.HTTP_REQUEST_METHOD, "POST")
        span.set_attribute(SpanAttributes.URL_PATH, "/checkout")
        # Custom attributes carry a domain prefix.
        span.set_attribute("checkout.cart_size", len(request.items))
        try:
            inventory_ok = reserve_inventory(request.items)
            charge_id = charge_payment(request.user_id, request.total_cents)
            order_id = create_order(request.user_id, charge_id)
            span.set_attribute("checkout.order_id", order_id)
            span.set_attribute(SpanAttributes.HTTP_RESPONSE_STATUS_CODE, 200)
            return {"order_id": order_id}
        except PaymentDeclined as e:
            # Mark the span as failed and attach the exception before re-raising.
            span.set_attribute(SpanAttributes.HTTP_RESPONSE_STATUS_CODE, 402)
            span.set_status(Status(StatusCode.ERROR, "payment declined"))
            span.record_exception(e)
            raise
Notes:
- SpanAttributes.HTTP_REQUEST_METHOD, URL_PATH, and HTTP_RESPONSE_STATUS_CODE are from the OTel HTTP semantic conventions. Any OTel-aware tool already knows how to query them.
- Custom attributes (checkout.cart_size, checkout.order_id) use a service-scoped prefix.
- set_status(Status(StatusCode.ERROR, ...)) makes the span show up as an error in traces and in exemplar-linked metrics.
- Child calls to reserve_inventory, charge_payment, create_order propagate context automatically when using OTel-instrumented HTTP / gRPC / DB clients -- each becomes a child span.
Common Confusion / Misconception
"Tracing means every request gets traced." At scale this is neither necessary nor affordable. The OTel docs are explicit that for high-volume systems a 1% sample rate or lower often accurately represents the other 99%. Sampling is part of the design; the question is which sampler, not whether to sample.
"Custom attribute names are fine." An ad hoc attribute name works for you today and is useless across tools tomorrow. The OTel semantic conventions exist specifically to solve this -- http.request.method, http.response.status_code, db.system, messaging.destination.name. Prefer them. Prefix anything custom with a stable domain namespace (checkout.*, payments.*).
"Tracing replaces metrics." A trace can tell you exactly what happened in one request, but you can only aggregate traces at scale with sampling, and even then histograms and counters are cheaper and faster. Use metrics for "is anything wrong"; use traces for "where is the time going in this one request". Exemplars (Concept 10) are the bridge.
"Head sampling alone is fine." If you keep 1% at the root and errors are 0.1% of traffic, you retain only about one error trace in a hundred: roughly 10 per million requests. Tail sampling -- "keep 100% of errors, plus all traces > p95 latency, plus a small sample of the rest" -- is usually the right production default. The trade-off is memory at the Collector: tail samplers must buffer spans until the trace completes.
"Context propagation just works." It works when you use OTel-instrumented clients and the default W3C traceparent header. It breaks at: message queues (propagate via headers or payload), background jobs (inject at enqueue, extract at dequeue), lambda/serverless boundaries (cold starts can drop context), cross-organization partner calls (partner may not honor your header). Check each boundary.
"Span = function call." Not quite. A span should represent a unit of work worth measuring: an HTTP request, a DB query, a queue publish, a significant computation. A span per function call produces thousands of spans per request and obscures the shape. Instrumentation libraries already cover the right boundaries; you add spans for the domain-level operations that cut across them.
"Trace sampling decisions are reversible." Head-sampling drops are permanent; you cannot recover a dropped trace. Tail-sampling drops are just as permanent once the trace has been discarded. If you need "keep-forever just in case", the only option is 100% capture with cheap cold storage and fast-path indexing for the sampled subset.
How To Use It
For a service you instrument:
- Adopt an OTel SDK in each language you use; configure exporters to your Collector.
- Propagate context on every outbound call (HTTP, gRPC, message bus). Most clients have OTel instrumentation built in; prefer that over hand-rolling.
- Use semantic conventions for HTTP, DB, messaging attributes. Prefix custom attributes with your service or domain name.
- Decide a sampling strategy. For anything non-trivial, tail-sample at the Collector: keep all errors, all slow (p95+) requests, plus a small percentage of the rest.
- Link traces to metrics via exemplars (Concept 10) and to logs via trace_id fields (Concept 11).
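Two of the steps above, propagating context over a message bus and linking logs to traces through a trace_id field, can be sketched together with only the standard library. With the OTel Python API the injection and extraction would normally go through `opentelemetry.propagate.inject` and `extract`; here `enqueue`, `dequeue`, `TraceIdFilter`, `JsonFormatter`, the logger name, and the message layout are hypothetical stand-ins.

```python
import contextvars
import io
import json
import logging
import queue
import secrets

# Trace ID of the work currently being processed (None outside any trace).
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Stamp every log record with the active trace_id so logs join to traces."""

    def filter(self, record):
        record.trace_id = current_trace_id.get() or "-"
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, including the trace_id field."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
        })


def enqueue(q, payload, trace_id, span_id):
    """Producer side: inject the trace context into the message headers."""
    headers = {"traceparent": f"00-{trace_id}-{span_id}-01"}
    q.put({"headers": headers, "payload": payload})


def dequeue(q):
    """Consumer side: extract the context, or start a new trace if it is missing."""
    msg = q.get_nowait()
    header = msg["headers"].get("traceparent")
    if header is None:
        trace_id, parent_span_id = secrets.token_hex(16), None
    else:
        _version, trace_id, parent_span_id, _flags = header.split("-")
    current_trace_id.set(trace_id)
    return trace_id, parent_span_id, msg["payload"]


# Wire a logger that writes JSON lines into an in-memory buffer for the demo.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.addFilter(TraceIdFilter())
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders-worker-example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

jobs = queue.Queue()
enqueue(jobs, {"order_id": 17}, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
trace_id, parent_span_id, payload = dequeue(jobs)
logger.info("processing order %s", payload["order_id"])
log_line = json.loads(log_buffer.getvalue())
print(log_line)
```

Because the worker's log line carries the producer's trace_id, a search for that ID in the log store returns the same request the trace backend shows, even though the two processes never shared memory.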
Check Yourself
- What does trace_id give you that a per-service request ID does not?
- Why do semantic conventions matter across languages and vendors?
- When does head sampling fail you, and what does tail sampling fix?
Mini Drill or Application
Take one endpoint you have written. Sketch the spans you would emit, their attributes (using semantic conventions where possible), and the status on each error path. Then choose a sampling strategy in one sentence.
See also (external)
- OpenTelemetry Concepts -- signals, context, semantic conventions, and sampling as a unified model.
- OpenTelemetry Traces -- span model, kinds, attributes, events, links, and status.
- OpenTelemetry Sampling -- when to sample, head vs tail, and Collector-based sampling options including the probabilistic and tail-sampling processors.
- OpenTelemetry Semantic Conventions -- the canonical attribute registry for HTTP, DB, messaging, RPC, cloud, and Kubernetes.
- W3C Trace Context -- the specification for the traceparent/tracestate headers that propagate trace context across services.
- Grafana Tempo: Traces -- a scalable, object-storage-backed trace backend with example sampling and query patterns.
- CNCF: OpenTelemetry project -- project status, graduation, and case studies.
Depth Path
Source Backbone
Security and observability require official docs, but these books provide the systems and reliability backbone behind the practices.
- Building Secure and Reliable Systems - primary book backbone for security/reliability tradeoffs.
- Software Engineering at Google - support for operational engineering and process.
- The Linux Command Line - support for operational investigation and automation.