Module 3: Container Orchestration: Case Studies

These case studies make Kubernetes concrete: reconciliation, rollouts, probes, resources, scheduling, networking, state, and authorization.

Case Study 1: Deployment Rollout Without Readiness

Scenario: A new version starts slowly. Kubernetes sends traffic before the app has warmed caches and opened DB connections. Error rate spikes during every rollout.

Source anchor: Kubernetes Deployments, which covers rolling updates, rollout status, and rollback behavior.

Module concepts: Deployment, ReplicaSet, rolling update, readiness probe, rollback.

Wrong Approach

"The container is running, so it is ready."

Better Approach

Separate readiness, which gates Service traffic, from liveness, which restarts a wedged container:

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
livenessProbe:
  httpGet:
    path: /live
    port: 8080
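A minimal sketch of the application side, assuming a Python service built on the standard-library http.server; AppState, warm_up, and serve are hypothetical names. It shows the split the probes above rely on: /live answers as soon as the process can respond, while /ready returns 503 until warm-up finishes, so the pod stays out of the Service's endpoints during startup.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AppState:
    """Hypothetical app state: live as soon as it runs, ready after warm-up."""

    def __init__(self):
        self.warmed_up = threading.Event()

    def probe_status(self, path):
        if path == "/live":
            # Liveness only asks "can the process answer?". No dependency checks,
            # so a slow database does not turn into a restart loop.
            return 200
        if path == "/ready":
            # Readiness: accept traffic only after caches and DB pools are warm.
            return 200 if self.warmed_up.is_set() else 503
        return 404


STATE = AppState()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(STATE.probe_status(self.path))
        self.end_headers()


def warm_up():
    # Placeholder: load caches and open database connections here.
    STATE.warmed_up.set()


def serve(port=8080):
    """Start warm-up in the background and serve both probe paths."""
    threading.Thread(target=warm_up, daemon=True).start()
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling serve() exposes both paths on port 8080, matching the probe spec; the key design choice is that only /ready depends on warm-up state.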

Tradeoff Table

Choice	Gain	Cost
no readiness probe	simple	traffic before ready
readiness probe	safer rollout	must implement an endpoint that truthfully reports readiness
low maxUnavailable	preserves serving capacity	slower release
rollback	fast recovery	needs a known-good revision and backward-compatible migrations

Required Artifact

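Write a rollout spec with readiness/liveness probes, maxSurge/maxUnavailable, a rollback trigger, and a migration note. Kubernetes does not roll back automatically: a stalled rollout only reports a ProgressDeadlineExceeded condition, so define the trigger explicitly, for example a non-zero exit from kubectl rollout status followed by kubectl rollout undo.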
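For the maxSurge/maxUnavailable part of the artifact, the capacity envelope can be checked with a little arithmetic. This is a sketch, not controller code; the function names are hypothetical. It follows the documented rounding (percentages for maxSurge round up, for maxUnavailable round down) and the 25% defaults.

```python
def resolve(value, replicas, round_up):
    """Turn an int or a percentage string like '25%' into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceil/floor of scaled / 100, avoiding float rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Pod-count envelope a RollingUpdate stays within."""
    surge = resolve(max_surge, replicas, round_up=True)               # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # This sketch rejects the case: no pod could ever be replaced.
        raise ValueError("maxSurge and maxUnavailable both resolve to 0")
    return {"max_pods": replicas + surge, "min_available": replicas - unavailable}
```

With 10 replicas and the defaults, the rollout may run up to 13 pods and keeps at least 8 available; maxSurge: 1 with maxUnavailable: 0 never drops below 10 available, at the cost of a slower release.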

Case Study 2: OOMKilled From Missing Memory Limits

Scenario: A batch pod consumes all node memory. Other workloads are evicted. The app team says "Kubernetes killed us randomly."

Source anchor: Kubernetes Resource Management for Pods and Containers, which describes requests, limits, and OOMKilled behavior.

Module concepts: requests, limits, QoS, OOMKilled, scheduling.

Wrong Approach

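Deploy pods without requests or limits and hope the scheduler infers intent. Such pods are BestEffort: the scheduler reserves nothing for them, and under node memory pressure they are among the first evicted. The evictions are not random; the kubelet ranks pods by usage relative to requests and by pod priority.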

Better Approach

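Set a resource contract on each container. Requests tell the scheduler how much capacity to reserve; the memory limit caps how far the container can grow before it is OOMKilled: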

resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    memory: "1Gi"
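The snippet above puts the pod in the Burstable QoS class: it sets requests, but not CPU and memory limits equal to them. A minimal sketch of the classification rules from the Kubernetes QoS documentation follows; parse_quantity and qos_class are hypothetical helpers, and only a subset of quantity suffixes is handled.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes; exponent forms like "1e3" are not handled.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
            "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
            "m": Fraction(1, 1000)}


def parse_quantity(q):
    """Parse quantities such as '250m', '0.5', '512Mi', or '1Gi' exactly."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * SUFFIXES[suffix]
    return Fraction(q)


def qos_class(containers):
    """Classify a pod from its containers' {'requests': ..., 'limits': ...}."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # An unset request defaults to the limit for that resource.
        requests = {**limits, **c.get("requests", {})}
        for res in ("cpu", "memory"):
            if res in requests:
                any_set = True
            if res not in limits or parse_quantity(requests[res]) != parse_quantity(limits[res]):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Leaving CPU unlimited trades the Guaranteed class for freedom from CPU throttling, which is the CPU-limit row of the tradeoff table.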

Tradeoff Table

Choice	Gain	Cost
no requests	easy scheduling	noisy neighbor risk
requests	scheduler capacity signal	needs measurement
memory limit	bounds damage	OOM if too low
CPU limit	caps usage	throttling risk

Required Artifact

Create a resource sizing note from observed p50/p95/p99 CPU/memory and expected burst.
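One way to turn observations into numbers for the sizing note, shown for memory. This is a hypothetical policy, not a Kubernetes rule: request at p95 plus 10% headroom, limit at the larger of p99 and the expected burst plus 20%. The same percentile step works for CPU requests.

```python
def nearest_rank(samples, pct):
    """Nearest-rank percentile for integer pct: the value covering pct% of samples."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def pad(value, headroom_pct):
    """value * (1 + headroom_pct / 100), rounded up to a whole MiB."""
    return -(-value * (100 + headroom_pct) // 100)


def size_memory(samples_mib, expected_burst_mib, headroom_pct=10):
    """Hypothetical sizing policy; tune percentiles and headroom per workload."""
    p50, p95, p99 = (nearest_rank(samples_mib, p) for p in (50, 95, 99))
    return {
        "observed": {"p50": p50, "p95": p95, "p99": p99},
        # Request: steady-state high-water mark, so the scheduler reserves enough.
        "request_mib": pad(p95, headroom_pct),
        # Limit: worst of the observed tail and the expected burst, with extra room.
        "limit_mib": pad(max(p99, expected_burst_mib), 2 * headroom_pct),
    }
```

For samples 1..100 MiB and an expected burst of 120 MiB, this yields a 105 MiB request and a 144 MiB limit; the note should record the samples and burst assumption alongside the result.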

Case Study 3: Service Hides Pod IP Churn

Scenario: Clients call pod IPs directly. After a rollout, pod IPs change and clients fail.

Source anchor: Kubernetes Services, which explains the stable network abstraction over changing Pods.

Module concepts: Pod IP, Service, selector, ClusterIP, DNS.

Wrong Approach

Treat pod IPs as durable endpoints.

Better Approach

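Use a Service: it gives clients a stable DNS name (and, for ClusterIP, a virtual IP) in front of whichever Ready pods match its selector: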

Client -> service DNS -> selected healthy pods
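A sketch of what that indirection computes, not the endpoints controller itself. Pod names, IPs, and the Pod dataclass are hypothetical; the matching rule (every selector key=value must be present) and the default cluster.local DNS suffix follow the Kubernetes documentation.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict
    ready: bool


def matches(selector, labels):
    """Service selectors are equality-based: every key=value must be present."""
    return all(labels.get(k) == v for k, v in selector.items())


def endpoints(selector, pods):
    """IPs a Service routes to: selected pods that pass readiness."""
    if not selector:
        return []  # a Service without a selector gets no automatic endpoints
    return sorted(p.ip for p in pods if p.ready and matches(selector, p.labels))


def service_dns(service, namespace, cluster_domain="cluster.local"):
    # cluster.local is the default cluster domain; clusters can configure another.
    return f"{service}.{namespace}.svc.{cluster_domain}"
```

After a rollout the endpoint IPs change completely, but service_dns("web", "shop") still returns web.shop.svc.cluster.local, which is the only name clients should hold.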

Tradeoff Table

Choice	Gain	Cost
pod IP direct	simple debugging	breaks when pods are replaced
ClusterIP service	stable internal endpoint	selector must match the intended pods
headless service	direct pod discovery	client handles endpoints
LoadBalancer	external entry	cloud cost/exposure

Required Artifact

Draw service discovery for one workload including pod labels, selector, DNS name, and failure behavior.

Case Study 4: StatefulSet For Identity, Not Just Replicas

Scenario: A database is deployed as a Deployment with three replicas. Pod names and storage identities change, confusing replication membership.

Source anchor: Kubernetes StatefulSets, which describes stable network identities and stable persistent storage for stateful applications.

Module concepts: StatefulSet, stable identity, PVC, headless service.

Wrong Approach

Run every replicated app as a Deployment.

Better Approach

Use StatefulSet when identity matters:

db-0, db-1, db-2
stable PVC per ordinal
headless service for peer discovery
ordered rollout when needed
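The identities in that list follow fixed naming rules, sketched below. The function names and the "data" claim-template name are hypothetical; the patterns (pod <name>-<ordinal>, PVC <template>-<name>-<ordinal>, DNS through the headless Service) and the orderings follow the Kubernetes StatefulSet documentation.

```python
def statefulset_identities(name, replicas, service, namespace, claim="data",
                           cluster_domain="cluster.local"):
    """Stable per-ordinal identity a StatefulSet gives each pod."""
    return [
        {
            "pod": f"{name}-{i}",
            # Stable DNS through the headless governing Service.
            "dns": f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}",
            # PVC from volumeClaimTemplates: <template>-<statefulset>-<ordinal>.
            "pvc": f"{claim}-{name}-{i}",
        }
        for i in range(replicas)
    ]


def ordered_ready_creation(replicas):
    # OrderedReady: pod N starts only after pods 0..N-1 are Running and Ready.
    return list(range(replicas))


def rolling_update_order(replicas):
    # RollingUpdate replaces pods from the highest ordinal down to 0.
    return list(range(replicas - 1, -1, -1))
```

Because db-1 always reattaches data-db-1 and keeps the same DNS name, replication membership can be configured by name rather than rediscovered after each reschedule.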

Tradeoff Table

Choice	Gain	Cost
Deployment	simple stateless scaling	no stable identity
StatefulSet	stable identity/storage	more operational care
managed database	less ops	provider coupling/cost
operator	domain automation	operator complexity

Required Artifact

Write a workload decision: Deployment vs StatefulSet vs managed database vs operator.

Case Study 5: RBAC Overgrant In The Cluster

Scenario: A CI service account has cluster-admin because early deploys failed. A compromised pipeline can now read secrets and mutate every namespace.

Source anchor: Kubernetes RBAC Authorization, which documents Roles, ClusterRoles, RoleBindings, and least privilege.

Module concepts: service account, RBAC, namespace, least privilege, secret exposure.

Wrong Approach

Grant cluster-admin to make deploys pass.

Better Approach

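Scope permissions with a namespaced Role and bind it to the CI service account with a RoleBinding in the same namespace. RBAC rules only grant access, so anything not listed, such as reading Secrets or mutating other namespaces, is denied by omission:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deployer               # hypothetical name
  namespace: production-app-a
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "patch"]
  - apiGroups: [""]                # core API group
    resources: ["services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "patch"]
  # Deliberately absent: secrets (add a narrow rule only if the deploy needs it)
  # and any cluster-wide grant (a Role cannot provide one).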

Tradeoff Table

Choice	Gain	Cost
cluster-admin	easy	huge blast radius
namespace role	scoped	more policy work
separate deploy accounts	isolation	more identities
read secrets in CI	flexible	leakage risk

Required Artifact

Write an RBAC review: subject, namespace, verbs, resources, forbidden actions, and audit test.
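The audit test can start as an offline check that the declared rules never grant a forbidden action. This simplified sketch ignores resourceNames, subresources, and namespace scoping; CI_RULES mirrors the deploy permissions listed above, and all names are hypothetical. Against a live cluster, the equivalent check is kubectl auth can-i get secrets --namespace production-app-a --as=system:serviceaccount:production-app-a:<ci-account>, with <ci-account> replaced by the real service account name.

```python
def allows(rules, verb, api_group, resource):
    """True if any rule grants the request. RBAC is additive: no deny rules."""
    def hit(values, wanted):
        return "*" in values or wanted in values

    return any(
        hit(r["apiGroups"], api_group) and hit(r["resources"], resource) and hit(r["verbs"], verb)
        for r in rules
    )


# Hypothetical Role rules for the CI deployer in production-app-a.
CI_RULES = [
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["get", "list", "watch", "create", "patch"]},
    {"apiGroups": [""], "resources": ["services", "configmaps"],
     "verbs": ["get", "list", "watch", "create", "patch"]},
]

# Forbidden actions the review must prove are denied: (verb, apiGroup, resource).
FORBIDDEN = [("get", "", "secrets"), ("list", "", "secrets"),
             ("delete", "apps", "deployments"),
             ("create", "rbac.authorization.k8s.io", "rolebindings")]


def audit(rules, forbidden):
    """Return forbidden actions the rules would still allow (should be empty)."""
    return [action for action in forbidden if allows(rules, *action)]
```

Running audit against a cluster-admin-style wildcard rule returns every forbidden action, which makes the blast-radius row of the tradeoff table concrete.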

Source Map

Source	Use it for
Kubernetes Deployments	rolling updates and rollback
Kubernetes resource management	requests, limits, OOMKilled
Kubernetes Services	stable networking for pods
Kubernetes StatefulSets	stable identity and storage
Kubernetes RBAC	cluster authorization

Completion Standard

At least three artifacts are completed.
At least one artifact includes rollout safety.
At least one artifact includes resources and QoS reasoning.
At least one artifact includes RBAC least privilege.