Reference and Selective Reading

This module has no new required books. The concept pages are the main path. Use this page only to find the authoritative source when a concept page leaves a gap, and to see how the module maps to earlier semesters.

Source Roles

| Source | Role | Why it is here |
| --- | --- | --- |
| Google SRE Book (sre.google/sre-book) | Primary teaching source for SLOs, alerting, monitoring, PRR | Canonical framing for the operational clusters |
| Google SRE Workbook (sre.google/workbook) | Selective extension of the SRE book | Implementation details for SLOs, error budgets, and incident response |
| OpenTelemetry docs (opentelemetry.io/docs) | Official standard for tracing, metrics, logs | Ground truth for instrumentation and propagation |
| OWASP -- Threat Modeling | Primary teaching source for STRIDE | Authoritative framework definitions |
| SLSA | Supply-chain framework | Anchor for "SLSA Build L2" goal |
| Microsoft Azure Architecture Center | Cloud patterns (circuit breaker, retry, bulkhead) | Canonical state-machine descriptions |
| Prior semesters (S6, S8, S9) | Internal prerequisites | What this module integrates and applies |

Read Only If Stuck

SLOs, Error Budgets, Alerting

Observability in Practice

Threat Model, Secrets, Supply Chain, Least Privilege

Failure Planning, Backup, Runbooks, PRR

Optional Deep Dive

  • Building Secure and Reliable Systems (Google, free online) -- long-form cross-topic framing of security + reliability
  • Site Reliability Engineering book (full SRE book) -- for any chapter on testing, capacity, or emergency response you did not cover here
  • OpenTelemetry semantic conventions -- for cross-service attribute naming at scale

Cross-Semester References

| Module 4 cluster | Prior semester module(s) it integrates |
| --- | --- |
| Cluster 1 (SLOs, error budgets, alerts) | S8 M04 Scale, Reliability, and Performance -- SLOs, symptom-based alerting |
| Cluster 2 (observability) | S8 M05 Observability and Debugging Under Production Pressure; S9 M05 Cloud Security & Observability |
| Cluster 3 (threat model, secrets, supply chain, least privilege) | S9 M01 Cloud Platform Fundamentals (IAM); S9 M05 Cloud Security & Observability |
| Cluster 4 (failure planning, retry/breaker/degraded, backup) | S6 M05 Distributed Systems Fundamentals (partial failure, timeouts); S8 M04 Scale, Reliability, and Performance |
| Cluster 5 (runbooks, on-call, PRR) | S8 M04 Scale, Reliability, and Performance; S10 M01/M02/M03 (what PRR now certifies) |

Concept-to-Source Map

| Primary concept | Best source if stuck | Why this source |
| --- | --- | --- |
| Writing one real SLI and SLO for your capstone | Google SRE Book -- SLOs | Canonical definitions and "how many nines" |
| Error budget for a capstone: small but real | Google SRE Workbook -- Implementing SLOs | Policy ladder and decision matrix |
| Alert on the SLO, not everything | Google SRE Book -- Practical Alerting | Burn-rate pattern and alert hygiene |
| Structured logs where they matter | OpenTelemetry -- Concepts | Signals model and stable attribute naming |
| Dashboard that answers 3 specific questions | Google SRE Book -- Monitoring Distributed Systems | Four golden signals that drive the three-question layout |
| Tracing the critical path end-to-end | OpenTelemetry -- Concepts | Spans, propagation, sampling, semantic conventions |
| STRIDE applied to your system | OWASP -- Threat Modeling | Most authoritative and concise framing |
| Secrets, dependencies, and supply chain | SLSA | Supply-chain levels and provenance concepts |
| Least privilege in practice | provider IAM docs + SRE Book -- Engagement Model | Provider is authoritative for policy semantics; SRE book frames review |
| Three most likely failures | Google SRE Workbook -- Implementing SLOs | Error-budget/failure-likelihood framing |
| Retry, circuit breaker, degraded mode | Azure -- Circuit Breaker | Canonical state-machine pattern |
| Backup and recovery: the forgotten basics | provider backup docs + SRE Workbook -- Incident Response | Provider for backup details; SRE for drill discipline |
| Writing a runbook for the top 3 incidents | SRE Workbook -- Incident Response | ICS-derived roles and declarations |
| On-call hygiene for a solo operator | SRE Workbook -- Incident Response | Role-collapsing strategies and sustainable paging |
| Production readiness review | SRE Book -- Engagement Model | Origin of the PRR pattern |
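
The "how many nines" and burn-rate entries in the map above come down to simple arithmetic, which may be easier to check in code before opening the sources. The following is a minimal sketch and is not taken from any of the listed sources: the function names, the 30-day window, and the example numbers are all illustrative assumptions.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed unavailability, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999; the 30-day default window is an
    assumption chosen for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """Ratio of the observed error rate to the rate the SLO allows.

    A value of 1.0 means the budget would be used up exactly at the end of
    the window; values above 1.0 mean it runs out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return observed_error_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    # Compare the budget that each extra "nine" leaves over a 30-day window.
    for target in (0.99, 0.999, 0.9999):
        minutes = error_budget_minutes(target)
        print(f"SLO {target}: about {minutes:.1f} minutes of budget")

    # Hypothetical reading: 0.5% of requests failing against a 99.9% SLO.
    print(f"burn rate: about {burn_rate(0.005, 0.999):.1f}x")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of budget, and a sustained 0.5% error ratio spends that budget about five times faster than the SLO allows. The alert thresholds and policy steps built on top of these numbers are covered in the SRE Book and Workbook chapters listed above.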