Learning Resources
The primary external source for this module is the official Kubernetes documentation. The local semester books add selective support for distributed-systems patterns and cloud-native design vocabulary; use them for framing, then use the official docs for exact Kubernetes behavior.
Source Stack
| Source | Role | How to use it in this module |
|---|---|---|
| kubernetes.io/docs | Primary external reference | Default destination for every "what does this field do" question |
| OCI specifications | Primary for images and the runtime contract | When a container starts failing at the runtime layer |
| CNCF graduated projects | Canonical ecosystem index | Picking a CNI, ingress, policy, or observability stack |
| kelseyhightower/kubernetes-the-hard-way | Exercise-grade control-plane reference | Assemble the control plane by hand once to internalize how its components fit together |
| Docker docs | Build-side reference | Dockerfile authoring, BuildKit, multi-stage, layer caching |
| Designing Distributed Systems | Local support | Reusable distributed patterns that map cleanly onto Kubernetes controllers, sidecars, and batch jobs |
| Design Patterns for Cloud Native Applications | Local support | Helpful for API, event, data, and stream-oriented cloud-native tradeoffs |
| Local The Linux Command Line chunks | Selective support | Shell, permissions, processes, mounting, networking primitives underneath the k8s abstractions |
| Local Pro Git chunks | Selective support | Content-addressing parallels (Git objects ↔ OCI layers); GitOps foundations |
External Resource Map by Cluster
Cluster 1: What a Container Actually Is
| Need | External resource | Why |
|---|---|---|
| namespace-by-namespace reference | namespaces(7) man page | Authoritative per-namespace semantics |
| cgroups v2 reference | cgroups(7) man page | Complete controller list and hierarchy model |
| user namespace mapping | user_namespaces(7) man page | UID/GID remapping and capability scoping |
| cgroup driver on nodes | Kubernetes: About cgroup v2 | Why the kubelet and runtime must agree on the driver |
| image format | OCI Image Spec | Source of truth for manifests, configs, layer media types |
| layer format | OCI Image Layer Filesystem Changeset | Tar format and whiteout semantics |
| runtime contract | OCI Runtime Spec | What runc actually implements |
| reference runtime code | runc on GitHub | The actual namespace/cgroup calls |
| Dockerfile best practices | Docker: Build best practices | Multi-stage, layer ordering, minimal images |
| build cache behavior | Docker: Build cache | Why COPY . . placement matters |
| image signing | Sigstore / cosign | Attaching signatures to images by digest |
| CRI contract | Kubernetes: Container Runtime Interface (CRI) | The gRPC API between kubelet and runtime |
| runtime selection | Kubernetes: Container Runtimes | CRI, containerd, CRI-O, configuration |
| alternate runtimes | Kubernetes: RuntimeClass | Selecting kata, gVisor, crun per-pod |
| node-level debug CLI | Kubernetes: crictl | Debug at the CRI boundary |
Cluster 2: Kubernetes Foundations
| Need | External resource | Why |
|---|---|---|
| control plane diagram and roles | Cluster Architecture | Definitive control-plane / node-component description |
| full component list | Kubernetes Components | Names, binaries, where each runs |
| API surface | The Kubernetes API | How groups, versions, and resources are exposed |
| etcd operations | Operating etcd for Kubernetes | Quorum, backups, compaction |
| etcd internals | etcd documentation | Raft, revisions, watch streams |
| scheduling framework | Scheduling Framework | Filter/score plugin model inside the scheduler |
| bootstrap from binaries | kubernetes-the-hard-way | Best single exercise for internalizing the control plane |
| Pod semantics | Pods | Multi-container Pods, init containers, lifecycle |
| Pod lifecycle states | Pod Lifecycle | Phases, conditions, probes, restartPolicy |
| ReplicaSets | ReplicaSets | What the Deployment actually creates |
| Deployment fields | Deployments | maxSurge, maxUnavailable, rollback, paused state |
| probes | Liveness, Readiness, and Startup Probes | Misconfigured probes are a frequent cause of avoidable restarts and CrashLoopBackOff |
| pod disruption budgets | Pod Disruptions | What actual HA looks like for a Deployment |
| controllers and reconciliation | Controllers | The official statement of the loop |
| server-side apply | Server-Side Apply | Field ownership and conflict resolution |
| CRDs / operators | Custom Resources, Operator Pattern | Extending the system with the same reconcile shape |
| writing controllers | Kubebuilder Book | Canonical walkthrough for implementing a controller |
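
The reconcile shape that the Controllers and Kubebuilder entries describe can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any real controller is implemented: `FakeCluster` and `reconcile` are hypothetical names, and the in-memory object stands in for an API client.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the API server (not a real client)."""
    desired_replicas: int                      # plays the role of spec
    pods: list = field(default_factory=list)   # plays the role of observed state
    _next_id: int = 0

    def create_pod(self) -> None:
        self.pods.append(f"pod-{self._next_id}")
        self._next_id += 1

    def delete_pod(self) -> None:
        self.pods.pop()


def reconcile(cluster: FakeCluster) -> bool:
    """Take at most one step toward the desired state.

    Returns True only when observed state already matched desired state.
    """
    diff = cluster.desired_replicas - len(cluster.pods)
    if diff > 0:
        cluster.create_pod()
    elif diff < 0:
        cluster.delete_pod()
    return diff == 0


cluster = FakeCluster(desired_replicas=3)
while not reconcile(cluster):
    pass  # a real controller would be triggered by watch events instead
print(cluster.pods)
```

Each call compares desired state with observed state and makes one small correction, so the loop is safe to re-run at any time. That idempotence is the property to look for when reading the official Controllers page.
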
Cluster 3: Networking and Services
| Need | External resource | Why |
|---|---|---|
| network model | Cluster Networking | The canonical four-rule model and CNI landscape |
| CNI plugin invocation | Network Plugins | How the kubelet calls a CNI binary |
| CNI contract | CNI Specification | ADD/DEL/CHECK contract for any plugin |
| network policy | Network Policies | Default-deny patterns and selectors |
| modern CNI (eBPF) | Cilium docs | eBPF-based datapath, identity-aware policy, observability |
| BGP CNI | Calico docs | L3 routing and robust NetworkPolicy enforcement |
| Services reference | Service | Types, EndpointSlices, session affinity |
| EndpointSlice API | EndpointSlices | What kube-proxy actually watches |
| DNS behavior | DNS for Services and Pods | Naming scheme, search domains, PTR records |
| CoreDNS | CoreDNS docs | Plugin-based DNS; the kubernetes plugin |
| kube-proxy modes | Virtual IPs and Service Proxies | iptables vs ipvs vs nftables tradeoffs |
| Service traffic policy | Service Internal Traffic Policy | internalTrafficPolicy: Local tradeoffs; externalTrafficPolicy is covered on the Virtual IPs and Service Proxies page |
| Ingress API | Ingress | Full Ingress field reference |
| Gateway API | Gateway API | Gateway, GatewayClass, HTTPRoute semantics |
| Gateway API SIG | gateway-api.sigs.k8s.io | Versioned specification with conformance |
| NGINX ingress | Ingress-NGINX docs | Annotations, TLS, deployment |
| automated TLS | cert-manager docs | Let's Encrypt and private CA issuers |
| service mesh | Istio: Traffic Management | When a mesh is worth its complexity |
Cluster 4: Configuration and State
| Need | External resource | Why |
|---|---|---|
| ConfigMap reference | ConfigMaps | Field reference and projection options |
| ConfigMap usage walkthrough | Configure a Pod to Use a ConfigMap | envFrom / volumes / configMapKeyRef side by side |
| Secret reference | Secrets | Types, encryption, good practices |
| etcd encryption | Encrypt Secret Data at Rest | EncryptionConfiguration setup and key rotation |
| projected volumes | Projected Volumes | Composing ConfigMap/Secret/ServiceAccountToken/DownwardAPI |
| external secret sync | External Secrets Operator | Sync from Vault / AWS / Azure / GCP |
| secret projection | Secrets Store CSI Driver | Mount external secrets without cluster Secrets |
| Vault on k8s | HashiCorp Vault: Kubernetes integration | Agent injector, dynamic credentials |
| Volume catalog | Volumes | Every built-in volume type |
| PV/PVC lifecycle | Persistent Volumes | Reclaim policies, binding, resizing |
| dynamic provisioning | Storage Classes | Parameters, provisioners, volumeBindingMode |
| volume snapshots | Volume Snapshots | CSI-based snapshot and clone workflows |
| CSI contract | CSI specification | gRPC contract between k8s and storage backends |
| CSI sidecars | Kubernetes CSI Developer Docs | external-provisioner, external-attacher, node plugin architecture |
| stateful workloads | StatefulSets | Identity, update strategies, retention policies |
| headless Services | Headless Services | The DNS behavior StatefulSets depend on |
| Postgres operator | CloudNativePG | Production-grade operator for stateful Postgres; it manages Pods and PVCs directly rather than wrapping a StatefulSet, which makes a useful contrast |
| Kafka operator | Strimzi | Kafka on Kubernetes with stable per-broker identity; recent releases use the operator's own StrimziPodSet resource in place of StatefulSets |
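
The DNS behavior that StatefulSets depend on follows a fixed naming scheme, which the sketch below reproduces. The function name and its defaults are illustrative assumptions: it assumes ordinals start at 0 and that the cluster domain is `cluster.local`, both of which can be configured differently on a real cluster.

```python
from typing import List


def statefulset_pod_dns(name: str, replicas: int, service: str,
                        namespace: str = "default",
                        cluster_domain: str = "cluster.local") -> List[str]:
    """Return the per-Pod DNS names a headless Service gives a StatefulSet.

    Pod names are <statefulset>-<ordinal>; each Pod gets a record under
    the governing Service's domain.
    """
    return [
        f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


for host in statefulset_pod_dns("web", 2, "nginx"):
    print(host)
```

Check the output against the StatefulSets and Headless Services pages before relying on it, since those pages define the scheme for your cluster version.
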
Cluster 5: Operating a Cluster
| Need | External resource | Why |
|---|---|---|
| resource model | Resource Management for Pods and Containers | CPU/memory semantics, units, QoS derivation |
| QoS walkthrough | Configure QoS for Pods | How QoS classes are computed and used |
| memory walkthrough | Assign Memory Resources | Hands-on including OOMKilled behavior |
| autoscaling | Horizontal Pod Autoscaling | HPA algorithm, stabilization, policies |
| HPA walkthrough | HPA walkthrough | Verify HPA against synthetic load |
| VPA | Vertical Pod Autoscaler (GitHub) | Rightsizing requests/limits automatically |
| event-driven autoscaling | KEDA | Queue-depth and Prometheus-based scaling, scale-to-zero |
| node autoscaling | Karpenter | AWS-focused node autoscaler that works alongside HPA by adding nodes for pending Pods |
| policy levels | Pod Security Standards | privileged / baseline / restricted definitions |
| policy admission | Pod Security Admission | How namespace labels drive warn/audit/enforce |
| security contexts | Configure a Security Context | Field-by-field securityContext reference |
| access control | RBAC Authorization | Role vs ClusterRole, binding rules, aggregation |
| authorization chain | Authorization overview | RBAC + Node + Webhook authorizers |
| API request flow | Controlling Access to the API | Transport -> authn -> authz -> admission |
| policy-as-code (Rego) | OPA Gatekeeper | ConstraintTemplates for custom admission rules |
| policy-as-code (YAML) | Kyverno | YAML-native validate/mutate/generate |
| troubleshooting | Troubleshooting Applications | The official debug task guide |
| pod debug | Debugging Running Pods | kubectl debug and ephemeral containers |
| metrics pipeline | Resource Metrics Pipeline | What kubectl top actually queries |
| kubectl reference | kubectl Reference | Flags, output formats, plugins |
| metrics stack | kube-prometheus | Prometheus + Grafana + alerts as a jsonnet/manifest bundle; the Helm packaging is the separate kube-prometheus-stack chart |
| converging telemetry | OpenTelemetry on Kubernetes | Unified logs/metrics/traces pipeline |
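
The core of the HPA algorithm referenced above is a single ratio, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, as stated on the Horizontal Pod Autoscaling page. The sketch below applies that formula with a tolerance band and min/max clamping. It is a simplified illustration: the function name and default bounds are assumptions, and it leaves out stabilization windows, scaling policies, and the handling of missing metrics or unready Pods.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (illustrative, not the real controller)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=200, target_metric=100))
print(desired_replicas(4, current_metric=50, target_metric=100))
```

With a current value of 200 against a target of 100 the replica count doubles, and with 50 against 100 it halves, matching the worked example in the official docs.
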
Local Book Chunks (use sparingly -- foundation for k8s abstractions)
| Concept | Chunk | Why it helps |
|---|---|---|
| namespaces / cgroups | TLCL: How a process works | PID namespace view starts here |
| cgroup limits in practice | TLCL: top / interrupting | CPU/memory semantics underneath kubectl top |
| pod termination / probes | TLCL: kill / signals | SIGTERM -> SIGKILL + terminationGracePeriodSeconds |
| securityContext UIDs | TLCL: owners / group members | The model runAsUser / fsGroup sit inside |
| file-mode of projected volumes | TLCL: read/write/execute | Secret/ConfigMap file modes map here |
| capabilities / sudo / setuid | TLCL: sudo / chgrp | The capability semantics you drop in a Pod |
| env-var injection | TLCL: environment | envFrom and env.valueFrom semantics |
| volumes / mounts | TLCL: mounting devices | Every CSI mount ends here |
| CNI / pod networking | TLCL: ip / network monitoring | Routes and interfaces on the node |
| Service debugging | TLCL: netstat / remote hosts | Reading listeners and DNAT rules |
| OCI content-addressing | Pro Git: Git objects | Same model: content-addressed blobs |
| OCI tree / manifests | Pro Git: Tree objects | How digest references make trees verifiable |
| registry deduplication | Pro Git: Packfiles | Same idea registries use across images |
| spec vs status vs observed | Pro Git: Snapshots not differences | The three-states mental model transfers to k8s |
| etcd as source of truth | Pro Git: Commit objects / object storage | resourceVersion is backed by an etcd revision (treat it as opaque in clients); etcd plays the role of .git/objects |
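
The content-addressing parallel between Git objects and OCI layers can be made concrete with the standard library alone. The helper names below are illustrative. The sketch assumes a SHA-1 Git repository (the default object format) and the `sha256:` digest form that registries commonly use for blobs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a blob: SHA-1 over a 'blob <size>\\0' header plus content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def oci_digest(blob: bytes) -> str:
    """OCI-style digest string: algorithm prefix plus the hex hash of the raw blob bytes."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


data = b"hello world\n"
print(git_blob_id(data))   # same value `git hash-object` reports for this content
print(oci_digest(data))
```

In both systems the name of an object is derived from its bytes, so identical content is stored once and any change produces a different identifier. That is the intuition to carry from the Pro Git chunks into image digests.
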
Use Rules
- When a concept page is enough, do not chase the official page for extra depth.
- When you do open kubernetes.io, open a specific heading, not the whole section.
- Prefer official docs over third-party blogs for Kubernetes-specific behavior; cluster semantics change version to version.
- For Kubernetes API examples pinned to a version, use `kubectl explain <resource>.<field>` on your cluster in preference to search.
- Use local book chunks only for kernel/process/permission/networking primitives or content-addressing intuition. Container and Kubernetes internals belong on kubernetes.io and OCI; do not substitute book chapters for cluster documentation.