Learning Resources

This module still starts from official docs, but it now connects to the local semester library. Use "Building Secure and Reliable Systems" and "Software Engineering at Google" as the local support layer, then escalate to the official docs and canonical essays below for exact cloud, telemetry, and security guidance.

All URLs on this page were validated at module-write time.

Source Stack

Source | Role | How to use it in this module
OWASP (Threat Modeling community + Cheat Sheet Series) | Primary security reference | Canonical threat-modeling framing and STRIDE guidance, plus checklists for logging and many other areas
NIST SP 800-207 | Primary identity / Zero Trust reference | The authoritative definition of Zero Trust and deployment models
AWS / Google Cloud / Azure Well-Architected (Security Pillars) | Primary cloud-specific security reference | Concrete services, patterns, and checklists for each cloud
HashiCorp Vault docs | Primary secrets reference | Dynamic secrets, auth methods, secret engines, leases
Google Cloud KMS envelope encryption docs | Primary encryption reference | Clearest short explanation of DEK/KEK envelope encryption
SLSA and Sigstore | Primary supply-chain reference | Compliance levels, provenance, signing, transparency log
OpenTelemetry docs | Primary observability reference | Signals, semantic conventions, sampling
Prometheus docs (instrumentation, naming) | Primary metrics reference | Naming, cardinality, metric types
Google SRE Book | Primary ops reference | Monitoring principles, golden signals, symptom-based alerting
Grafana Labs / Honeycomb / charity.wtf | Selective support | Cardinality in practice, observability definitions, 3 a.m.-on-call reality
Building Secure and Reliable Systems | Local support | The best local bridge between reliability engineering, security review, and incident response
Software Engineering at Google | Local support | Long-lived engineering systems, review culture, and operational quality habits
Local shell/Git books | Selective support | Shell and Git basics that sharpen operational habits

Resource Map by Cluster

Cluster 1: Cloud Security Foundations

Need | Best external source | Why
Threat-modeling framing | OWASP: Threat Modeling | Canonical four-question framing
STRIDE checklist | OWASP Cheat Sheet: Threat Modeling | Compact step-by-step reference
STRIDE origin and tooling | Microsoft Learn: Threat Modeling Tool | The source of STRIDE as a practical tool
Zero Trust definition | NIST SP 800-207 | Authoritative reference used across the industry
Layered security / defense in depth | AWS Well-Architected Security Pillar | Concrete AWS guidance
Layered security (GCP) | Google Cloud Well-Architected: Security | Zero-trust aligned layered patterns
Layered security (Azure) | Azure Well-Architected: Security | Checklists and maturity models

Cluster 2: Secrets, Keys, and Data

Need | Best external source | Why
Secret management architecture | HashiCorp Vault docs | Canonical reference on dynamic secrets, auth methods, leases
Envelope encryption explained | Google Cloud KMS: Envelope encryption | Clearest DEK/KEK walkthrough with best practices
Encryption in a cloud context | AWS Well-Architected: Security Pillar | Data protection patterns tied to AWS KMS
What not to log (keeps classification honest) | OWASP Logging Cheat Sheet | Directly relevant to data minimization in logs

Cluster 3: Network and Runtime Security

Need | Best external source | Why
Network moat patterns | AWS Well-Architected: Security Pillar | Canonical SG/NACL/VPC-endpoint patterns
Network moat (GCP) | Google Cloud Well-Architected: Security | Firewall rules, VPC Service Controls
Supply-chain framework | SLSA | Compliance levels and provenance concepts
Signing and verification | Sigstore | Canonical OSS signing with cosign and Rekor

Cluster 4: Observability Pillars in Cloud

Need | Best external source | Why
OpenTelemetry model overview | OpenTelemetry Concepts | Signals, context, semantic conventions in one page
Traces and span model | OpenTelemetry Traces | Span model, attributes, status, kinds
Sampling strategies | OpenTelemetry Sampling | Head vs tail sampling with trade-offs
OTel project status | CNCF: OpenTelemetry | Community maturity and case studies
Metric naming and cardinality | Prometheus: Metric and Label Naming | Canonical guidance on labels and units
Instrumentation shape | Prometheus: Instrumentation | USE/RED-style patterns and metric types
Cardinality failure modes | Grafana Labs: Cardinality Spikes | How cardinality blows up in real systems
Observability definitions | Honeycomb: Observability Glossary | Working definitions of dashboards, alerts, SLOs
Log pipeline design | OWASP Logging Cheat Sheet | What to log, what not to log, protect the pipeline
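
Before opening the cardinality sources above, it can help to see the arithmetic they warn about. The sketch below is an illustration, not taken from any of the listed docs: it assumes the worst case where every combination of label values occurs, and the label names and counts (method, status_class, route, user_id) are hypothetical.

```python
from math import prod


def series_count(label_values: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Assumes every combination of label values can occur, so the bound is
    the product of the number of distinct values per label.
    """
    return prod(label_values.values())


# Hypothetical label sets for a single request-count metric.
bounded = {"method": 5, "status_class": 5, "route": 40}
unbounded = {**bounded, "user_id": 100_000}  # one extra high-cardinality label

print(series_count(bounded))    # 1000
print(series_count(unbounded))  # 100000000
```

One unbounded label multiplies the series count by its number of distinct values, which is why the naming guidance and the failure-mode write-up are paired in this cluster.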

Cluster 5: Operating Under Observation

Need | Best external source | Why
Monitoring principles and golden signals | Google SRE Book: Monitoring Distributed Systems | Canonical four-golden-signals chapter
Practical alerting | Google SRE Book: Practical Alerting | Symptom-based alerting, alert noise as a cost
Whole-book navigation | Google SRE Book: Table of Contents | Free online edition; use for incident response, post-mortems
Observability realism | charity.wtf: Observability is a Many-Splendored Definition | Why metrics alone are not observability, grounded in operator experience

Local Book Chunks (Loosely Relevant)

The books under library/raw/semester-09-cloud-devops/books/ are included for Git and Linux shell fluency, which sharpen operational habits. They are not the primary teachers for this module.

  • The Linux Command Line -- useful for runbook commands, log hygiene, shell-level habits.
  • Pro Git and Git from the Bottom Up -- useful for runbook-as-code discipline and for the reviewability habit that security and observability both need.

Open them only if a runbook or a pipeline task exposes a shell or Git gap, not for the security or observability material itself.

Use Rules

  • For security topics, open OWASP / NIST / the relevant cloud provider's Well-Architected security pillar first.
  • For observability topics, open OpenTelemetry or the SRE book first.
  • For supply chain, SLSA and Sigstore are the primary sources.
  • Use essays (Honeycomb, charity.wtf, Grafana blog) for intuition, not as authoritative references.
  • If you cannot find the answer in one official doc in 5 minutes, stop and write the gap question in plain words before continuing -- it is almost always a definition mismatch, not a research gap.