Observability Instrumentation Workshop

Three outputs required: a small-but-structured log schema, a three-question dashboard, and at least one full critical-path trace.

Retrieval Prompts

What makes a log "structured" as opposed to a formatted string?
Name the three questions every capstone dashboard must answer.
Why should span names be low-cardinality while span attributes can be high-cardinality?
State one sensible sampling policy for a capstone (head rate + tail rules).
How do logs, metrics, and traces get correlated across services in practice?

For each statement, identify the error:

"We log everything, just in case."
"Our dashboard has 34 panels; all of them matter."
"We sample 100% of traces; disk is cheap."
"Span name is db.query(SELECT * FROM users WHERE id=42) so we know exactly which query."
"We don't need trace_id in logs; the timestamps are enough to correlate."

List every decision boundary in one service of your capstone. Aim for 5-10.
Give each a dotted event name of the form <area>.<object>.<verb>, with the verb in the past tense.
Define the minimum field set: request_id, trace_id, one business identifier (user_id / tenant_id / provider_id), duration_ms, reason, attempt.
Commit to library/raw/logging.md.
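
The steps above can be sketched in code. This is a minimal illustration using only the Python standard library, not a prescribed implementation: the helper names build_event and log_event, the extra ts field, and the sample event payment.charge.declined are all hypothetical. It shows one way to enforce the naming pattern and the minimum field set at the point where the log line is emitted.

```python
import json
import logging
import sys
import time

# Minimum field set from the workshop; the helper names are illustrative only.
REQUIRED_FIELDS = ("request_id", "trace_id", "duration_ms", "reason", "attempt")
BUSINESS_IDS = ("user_id", "tenant_id", "provider_id")


def build_event(name: str, **fields) -> dict:
    """Build a structured log record, rejecting ones that miss the schema."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"event name must be <area>.<object>.<verb>: {name!r}")
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not any(b in fields for b in BUSINESS_IDS):
        raise ValueError("need at least one business identifier")
    # "ts" is an added assumption; the workshop's field list does not name it.
    return {"event": name, "ts": time.time(), **fields}


def log_event(logger: logging.Logger, name: str, **fields) -> dict:
    """Emit the record as one JSON object per line so fields stay queryable."""
    record = build_event(name, **fields)
    logger.info(json.dumps(record, sort_keys=True))
    return record


if __name__ == "__main__":
    logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
    log_event(
        logging.getLogger("capstone"),
        "payment.charge.declined",  # hypothetical decision boundary
        request_id="req-1",
        trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
        tenant_id="acme",
        duration_ms=12,
        reason="card_expired",
        attempt=1,
    )
```

Validating at emit time is one design choice among several; a schema check in CI against library/raw/logging.md would serve the same goal.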

Create a dashboard called capstone-live with three rows: Healthy?, Slow?, Failing whom?
Add panels only if they answer the row header. Target: ≤ 4 panels per row.
Include the SLO / error-budget big-number tile on row 1.
Screenshot and link the dashboard from library/raw/slo.md.
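
The arithmetic behind the error-budget tile can be sketched as follows. This assumes a request-based SLO (good requests divided by total requests over a window); the function name and the sample numbers are hypothetical, and your dashboard tool will normally compute this in its own query language.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent in the window.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic, so no budget spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 99% SLO, 10,000 requests, 25 failures.
    # About 100 failures are allowed, so roughly three quarters of the budget remains.
    print(f"{error_budget_remaining(0.99, 10_000, 25):.0%} of error budget remaining")
```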

Identify the one user journey that defines your SLO.
Instrument the entry point (OTel HTTP middleware or equivalent).
Instrument every outgoing call on that path as a child span: DB, queue, external HTTP.
Propagate traceparent (W3C) across every process boundary, including into queue messages.
Force one real request and open the trace in your tracing UI. If any expected hop is missing, add instrumentation for it.
Set sampling: a head rate of 1-5%, plus tail rules that always keep traces with a 5xx response or with latency above the SLI threshold.
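
The propagation and sampling steps above can be sketched without any tracing library. This is a simplified illustration, not how an OTel SDK or collector implements them: all function names, the message envelope shape, and the 500 ms threshold are hypothetical. The traceparent layout (version, 32-hex trace id, 16-hex parent span id, 2-hex flags) follows W3C Trace Context version 00; the sketch skips the spec's rule that all-zero ids are invalid. In a real deployment the tail rules usually run in a collector that sees the finished trace, whereas here both decisions are folded into one function for readability.

```python
import re
import secrets

# W3C Trace Context, version 00: 00-<trace-id>-<parent-id>-<flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the entry point."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(parent: str) -> str:
    """Keep the trace id, mint a new span id for the outgoing call."""
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        raise ValueError(f"malformed traceparent: {parent!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def enqueue(body: dict, traceparent: str) -> dict:
    """Carry the context inside the queue message (hypothetical envelope)."""
    return {"headers": {"traceparent": traceparent}, "body": body}


def keep_trace(
    trace_id: str,
    status_code: int,
    duration_ms: float,
    head_rate: float = 0.05,
    latency_threshold_ms: float = 500.0,  # stand-in for your SLI threshold
) -> bool:
    """Always keep 5xx and slow traces; otherwise keep a fixed fraction."""
    if status_code >= 500 or duration_ms > latency_threshold_ms:
        return True
    # Deterministic head decision: every service that sees this trace id
    # reaches the same verdict without coordinating.
    return int(trace_id[-8:], 16) / 0x1_0000_0000 < head_rate


if __name__ == "__main__":
    root = new_traceparent()
    message = enqueue({"job": "send_receipt"}, child_traceparent(root))
    print(message["headers"]["traceparent"])
    print(keep_trace(root.split("-")[1], status_code=503, duration_ms=40.0))
```

Deriving the head decision from the trace id rather than a random draw is a deliberate choice: it keeps the sampling verdict consistent across services that share the same trace.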

library/raw/logging.md lists events with stable names and fields
one service's top five logs are structured, not formatted strings
capstone-live dashboard exists and lets a viewer answer all three questions within 10 seconds
at least one full trace of the critical path is stored and linkable
you can filter traces by a business attribute (tenant, provider, endpoint)
library/raw/tracing.md names the sampling policy and the linking convention for runbooks