Learning Resources
This module remains official-docs-first, but it is now tied into the local semester library. Use Building Secure and Reliable Systems and Software Engineering at Google as the local support layer, then escalate to the official docs and canonical essays below for exact cloud, telemetry, and security guidance.
All URLs on this page were validated at module-write time.
Source Stack
| Source | Role | How to use it in this module |
|---|---|---|
| OWASP (Threat Modeling community + Cheat Sheet Series) | Primary security reference | Canonical threat-modeling framing and STRIDE guidance, plus checklists for logging and many other areas |
| NIST SP 800-207 | Primary identity / Zero Trust reference | The authoritative definition of Zero Trust and deployment models |
| AWS / Google Cloud / Azure Well-Architected (Security Pillars) | Primary cloud-specific security reference | Concrete services, patterns, and checklists for each cloud |
| HashiCorp Vault docs | Primary secrets reference | Dynamic secrets, auth methods, secret engines, leases |
| Google Cloud KMS envelope encryption docs | Primary encryption reference | Clearest short explanation of DEK/KEK envelope encryption |
| SLSA and Sigstore | Primary supply-chain reference | SLSA levels, provenance, signing, transparency log |
| OpenTelemetry docs | Primary observability reference | Signals, semantic conventions, sampling |
| Prometheus docs (instrumentation, naming) | Primary metrics reference | Naming, cardinality, metric types |
| Google SRE Book | Primary ops reference | Monitoring principles, golden signals, symptom-based alerting |
| Grafana Labs / Honeycomb / charity.wtf | Selective support | Cardinality in practice, observability definitions, 3 a.m.-on-call reality |
| Building Secure and Reliable Systems | Local support | The best local bridge between reliability engineering, security review, and incident response |
| Software Engineering at Google | Local support | Long-lived engineering systems, review culture, and operational quality habits |
| Local shell/Git books | Selective support | Shell and Git basics that sharpen operational habits |
Resource Map by Cluster
Cluster 1: Cloud Security Foundations
| Need | Best external source | Why |
|---|---|---|
| Threat-modeling framing | OWASP: Threat Modeling | Canonical four-question framing |
| STRIDE checklist | OWASP Cheat Sheet: Threat Modeling | Compact step-by-step reference |
| STRIDE origin and tooling | Microsoft Learn: Threat Modeling Tool | The source of STRIDE as a practical tool |
| Zero Trust definition | NIST SP 800-207 | Authoritative reference used across the industry |
| Layered security / defense in depth | AWS Well-Architected Security Pillar | Concrete AWS guidance |
| Layered security (GCP) | Google Cloud Well-Architected: Security | Zero-trust aligned layered patterns |
| Layered security (Azure) | Azure Well-Architected: Security | Checklists and maturity models |
Cluster 2: Secrets, Keys, and Data
| Need | Best external source | Why |
|---|---|---|
| Secret management architecture | HashiCorp Vault docs | Canonical reference on dynamic secrets, auth methods, leases |
| Envelope encryption explained | Google Cloud KMS: Envelope encryption | Clearest DEK/KEK walkthrough with best practices |
| Encryption in a cloud context | AWS Well-Architected: Security Pillar | Data protection patterns tied to AWS KMS |
| What not to log (keeps classification honest) | OWASP Logging Cheat Sheet | Directly relevant to data minimization in logs |
Cluster 3: Network and Runtime Security
| Need | Best external source | Why |
|---|---|---|
| Network moat patterns | AWS Well-Architected: Security Pillar | Canonical SG/NACL/VPC-endpoint patterns |
| Network moat (GCP) | Google Cloud Well-Architected: Security | Firewall rules, VPC Service Controls |
| Supply-chain framework | SLSA | SLSA levels and provenance concepts |
| Signing and verification | Sigstore | Canonical OSS signing with cosign and Rekor |
Cluster 4: Observability Pillars in Cloud
| Need | Best external source | Why |
|---|---|---|
| OpenTelemetry model overview | OpenTelemetry Concepts | Signals, context, semantic conventions in one page |
| Traces and span model | OpenTelemetry Traces | Span model, attributes, status, kinds |
| Sampling strategies | OpenTelemetry Sampling | Head vs tail sampling with trade-offs |
| OTel project status | CNCF: OpenTelemetry | Community maturity and case studies |
| Metric naming and cardinality | Prometheus: Metric and Label Naming | Canonical guidance on labels and units |
| Instrumentation shape | Prometheus: Instrumentation | USE/RED-style patterns and metric types |
| Cardinality failure modes | Grafana Labs: Cardinality Spikes | How cardinality blows up in real systems |
| Observability definitions | Honeycomb: Observability Glossary | Working definitions of dashboards, alerts, SLOs |
| Log pipeline design | OWASP Logging Cheat Sheet | What to log, what not to log, protect the pipeline |
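The cardinality rows above come down to simple arithmetic: in the worst case, one metric produces one time series per combination of label values, so the series count is the product of the distinct values of each label. The sketch below is a minimal illustration of that multiplication. The label names and value counts are hypothetical and do not come from the Prometheus or Grafana Labs material.

```python
from math import prod


def worst_case_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each distinct combination of label values is its own series,
    so the bound is the product of the per-label value counts.
    """
    return prod(label_value_counts.values())


# Hypothetical labels on a request counter (illustrative numbers only).
baseline = {"method": 5, "status_code": 8, "route": 40}

# The same metric after someone adds an unbounded identifier as a label.
with_user_id = {**baseline, "user_id": 10_000}

print(worst_case_series(baseline))      # 1600
print(worst_case_series(with_user_id))  # 16000000
```

Real systems rarely hit the full product, because not every combination occurs, but the bound shows why a single unbounded label such as a user or request ID is the usual cause of a cardinality spike.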
Cluster 5: Operating Under Observation
| Need | Best external source | Why |
|---|---|---|
| Monitoring principles and golden signals | Google SRE Book: Monitoring Distributed Systems | Canonical four-golden-signals chapter |
| Practical alerting | Google SRE Book: Practical Alerting | Symptom-based alerting, alert noise as a cost |
| Whole-book navigation | Google SRE Book: Table of Contents | Free online edition; use for incident response, post-mortems |
| Observability realism | charity.wtf: Observability is a Many-Splendored Definition | Why metrics alone are not observability, grounded in operator experience |
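Symptom-based alerting, as named in the Practical Alerting row, means paging on what users experience (for example, failed requests) and not on internal causes such as CPU load. The sketch below shows one possible shape of such a check. The `Request` record, the function names, and the 1% threshold are all assumptions made for illustration and are not taken from the SRE Book.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request in the alert window."""
    duration_ms: float
    ok: bool


def error_ratio(requests: list[Request]) -> float:
    """Fraction of requests in the window that failed (0.0 if no traffic)."""
    if not requests:
        return 0.0
    failed = sum(1 for r in requests if not r.ok)
    return failed / len(requests)


def should_page(requests: list[Request], max_error_ratio: float = 0.01) -> bool:
    """Page only when the user-visible symptom exceeds the threshold.

    The 1% default is an illustrative assumption, not a recommended value.
    """
    return error_ratio(requests) > max_error_ratio


# 98 successes and 2 failures: a 2% error ratio, above the assumed 1% limit.
window = [Request(120.0, True)] * 98 + [Request(900.0, False)] * 2
print(should_page(window))  # True
```

Keeping the check tied to a single user-facing ratio is one way to limit alert noise: a busy CPU with healthy responses pages no one.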
Local Book Chunks (Loosely Relevant)
The books under library/raw/semester-09-cloud-devops/books/ are included for Git and Linux shell fluency, which sharpen operational habits. They are not the primary teachers for this module.
- The Linux Command Line -- useful for runbook commands, log hygiene, shell-level habits.
- Pro Git and Git from the Bottom Up -- useful for runbook-as-code discipline and for the reviewability habit that security and observability both need.
Open them only if a runbook or a pipeline task exposes a shell or Git gap, not for the security or observability material itself.
Use Rules
- For security topics, open OWASP / NIST / the relevant cloud provider's Well-Architected security pillar first.
- For observability topics, open the OpenTelemetry docs or the Google SRE Book first.
- For supply chain, SLSA and Sigstore are the primary sources.
- Use essays (Honeycomb, charity.wtf, Grafana Labs blog) for intuition, not as authoritative references.
- If you cannot find the answer in one official doc within 5 minutes, stop and write the gap question in plain words before continuing -- it is almost always a definition mismatch, not a research gap.