External Exercises

This module has no LeetCode-style problems; the exercise is your own capstone. The lanes below point to external reading-and-doing sets that build specific fluency when the concept pages are not enough. Work each lane against your own system, not a toy.

How To Use This Page

Finish the relevant concept page and the matching practice page first.
Pick a lane whose output you are still uncomfortable producing from scratch.
Do the lane with your capstone repo open. The deliverable is a commit to your capstone, not notes.
Maintain a mistake log with tags such as wrong SLI granularity, alert noise, unstructured log, missing trace hop, STRIDE gap missed, over-permissive role, untested backup, runbook missing rollback.

Lane 1: SLOs, Error Budgets, and Alerts

Use this lane when your SLO is aspirational or your alerts are noisy.

Google SRE Book -- Service Level Objectives (skim chapter 4)
Google SRE Workbook -- Implementing SLOs (work through "How to use error budgets")
Google SRE Book -- Practical Alerting (read through "Alerting at the right level")

Target outcomes:

one committed library/raw/slo.md
one committed library/raw/error-budget-policy.md
at least one multi-window burn-rate alert live in your monitoring tool
a list of at least three non-SLO page-level alerts you have demoted to ticket or deleted
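The multi-window burn-rate alert in the outcomes above can be sketched as follows. This is a minimal illustration, not the configuration for any particular monitoring tool: the `Window` type and both function names are hypothetical, and the 14.4 threshold assumes the pairing of a one-hour long window with a five-minute short window, where 14.4 corresponds to spending 2% of a 30-day error budget in one hour (0.02 x 720 hours).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one lookback window (hypothetical type)."""
    errors: int
    total: int


def burn_rate(window: Window, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows.

    A burn rate of 1.0 spends the budget exactly over the SLO period;
    higher values spend it proportionally faster.
    """
    if window.total == 0:
        return 0.0  # no traffic, nothing burned
    allowed_error_ratio = 1.0 - slo_target
    return (window.errors / window.total) / allowed_error_ratio


def should_page(long_w: Window, short_w: Window, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when BOTH windows exceed the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening, so the alert resets soon after recovery.
    """
    return (burn_rate(long_w, slo_target) >= threshold
            and burn_rate(short_w, slo_target) >= threshold)
```

With a 99.9% SLO, 20 errors in 1,000 requests is a burn rate of about 20, which pages only if the short window is also above threshold. In practice you would express the same two conditions in your monitoring tool's query language rather than in application code.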

Lane 2: Observability

Use this lane when you cannot reach the suspect span from an alert in under two minutes.

OpenTelemetry -- Concepts (signals, context propagation, sampling, semantic conventions)
Google SRE Book -- Monitoring Distributed Systems (four golden signals and white-box vs black-box)

Target outcomes:

library/raw/logging.md with a named field schema
one commit that replaces at least five string logs with structured events
a capstone-live dashboard with three labeled rows answering the three questions
one real distributed trace stored and linkable by URL
library/raw/tracing.md with sampling policy and runbook-linking convention
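As a starting point for the structured-events commit, here is one way to emit JSON log lines with Python's standard `logging` module. It is a sketch under assumptions: the field names (`ts`, `level`, `event`) and the `fields` attribute are illustrative choices, not a schema this module prescribes. Your `logging.md` should name the real one.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line.

    Assumes callers pass structured data via extra={"fields": {...}};
    the "fields" key is a convention invented for this sketch.
    """

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),  # a stable event name, not a sentence
        }
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


# Before: log.info("payment failed for order %s" % order_id)
# After:  log.info("payment_failed",
#                  extra={"fields": {"order_id": order_id, "trace_id": trace_id}})
```

Carrying a trace identifier as a field is what lets you jump from a log line to the matching trace; the exact key name should follow whatever convention your tracing setup uses.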

Lane 3: Threat Model, Secrets, Supply Chain, Least Privilege

Use this lane when your security posture is "probably fine."

OWASP -- Threat Modeling (four-question framework + STRIDE)
SLSA (levels and requirements)
Secret scanners: gitleaks and trufflehog
Provider IAM docs for your cloud (AWS IAM User Guide, GCP IAM, Azure AD, now Microsoft Entra ID)

Target outcomes:

one committed STRIDE worksheet with a full walk on one gap
one committed library/raw/security-policy.md
one CI step that fails on HIGH/CRITICAL dependency CVEs
at least one artifact carrying signed build provenance
one IAM role diff committed, with the breakage-and-widening log
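The CI gate on HIGH/CRITICAL CVEs can be as small as a script that reads your scanner's report and returns a non-zero exit code. The sketch below assumes a hypothetical report shape, a JSON list of objects with `id`, `package`, and `severity` keys. Real scanners each use their own schema, and many have a built-in severity gate, so treat this as an illustration of the logic rather than a parser for any specific tool.

```python
import json
import sys
from typing import Any

# Severities that should fail the build, per the target outcome above.
BLOCKING = {"HIGH", "CRITICAL"}


def blocking_findings(report: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return findings whose severity should fail the build.

    Assumes each finding is a dict with a "severity" string
    (hypothetical schema); case differences are normalized.
    """
    return [f for f in report
            if str(f.get("severity", "")).upper() in BLOCKING]


def main(argv: list[str]) -> int:
    """Exit code 0 when clean, 1 when blocking findings exist, 2 on misuse."""
    if len(argv) != 2:
        print("usage: cve_gate.py REPORT.json", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8") as fh:
        report = json.load(fh)
    found = blocking_findings(report)
    for f in found:
        print(f"{f.get('severity')}: {f.get('id')} in {f.get('package')}",
              file=sys.stderr)
    return 1 if found else 0


# In CI, call this as the script entry point:
#     raise SystemExit(main(sys.argv))
```

A non-zero exit code is what makes the CI step fail. Record any accepted exceptions in your security policy rather than lowering the gate silently.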

Lane 4: Failure Planning, Backup, Runbooks

Use this lane when "what happens when X fails?" returns vague answers.

Microsoft Azure Architecture Center -- Circuit Breaker
Google SRE Workbook -- Incident Response
Provider backup / PITR (point-in-time recovery) docs for your data store

Target outcomes:

library/raw/top-failures.md with three prioritized failures
library/raw/reliability-decisions.md per external dependency
library/raw/recovery.md with a dated restore-drill log
three runbooks in library/raw/runbooks/* using the five-section template
library/raw/on-call.md with coverage, page-vs-ticket rules, and a kill switch
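For the per-dependency reliability decisions, it helps to have the circuit breaker pattern from the first reading in concrete form. The following is a minimal single-threaded sketch. The class name, the thresholds, and the injectable `clock` parameter are assumptions made for illustration, and a production breaker would also need locking and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # timeout elapsed: allow one trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency call skipped: circuit is open")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures,
            # (re)opens the circuit and restarts the timeout.
            if (self._opened_at is not None
                    or self._failures >= self.failure_threshold):
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrap each external call as `breaker.call(lambda: client.fetch(...))`, where `client.fetch` stands in for your own dependency call, and decide in `reliability-decisions.md` what the caller does on `CircuitOpenError`: serve a fallback, queue the work, or fail fast.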

Self-Curated Problem Set

Build a custom set around real incidents in your capstone's staging history:

3 staging incidents in the last 60 days -- what was the first-seen symptom and what was the actual cause?
3 near-miss deploys -- what caught them, and what alert would have caught them automatically?
3 cloud bills that surprised you -- which came from observability, backup, or logging, and is the trade-off still worth it?

These become postmortems, test cases, and PRR (production readiness review) yellows -- whichever fits best.

Completion Checklist

Completed at least one lane in full with artifacts committed
Logged at least 10 real mistakes and corrections in the mistake log
Walked the 18-item PRR and signed or red-listed each item honestly
At least one peer has validated the top runbook and the SLO document
