
Module 3: Container Orchestration: Case Studies

These case studies make Kubernetes concrete: reconciliation, rollouts, probes, resources, scheduling, networking, state, and access control.


Case Study 1: Deployment Rollout Without Readiness

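Scenario: A new version starts slowly. With no readiness probe, each new Pod counts as Ready as soon as its container starts, so the Service sends it traffic before the app has warmed its caches and opened its DB connections. The error rate spikes during every rollout.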

Source anchor: Kubernetes Deployments, which covers rolling updates, rollout status, and rollback behavior.

Module concepts: Deployment, ReplicaSet, rolling update, readiness probe, rollback.

Wrong Approach

"The container is running, so it is ready."

Better Approach

Separate readiness (may this Pod receive traffic?) from liveness (should the kubelet restart this container?):

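readinessProbe:        # failing removes the Pod from Service endpoints; no restart
  httpGet:
    path: /ready       # succeed only after caches are warm and DB connections are open
    port: 8080
livenessProbe:         # failing repeatedly makes the kubelet restart the container
  httpGet:
    path: /live        # succeed whenever the process is responsive, even if not yet ready
    port: 8080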

Tradeoff Table

Choice | Gain | Cost
no readiness probe | simple | traffic arrives before the app is ready
readiness probe | safer rollout | must implement a truthful endpoint
low maxUnavailable | preserves serving capacity | slower release
rollback | fast recovery | needs a stable previous revision and migration safety
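The rollout and rollback rows can be sketched as a Deployment manifest. This is a minimal sketch, not a recommended configuration: the name `app`, the image, the replica count, and every timing value are hypothetical placeholders. With `maxSurge: 1` and `maxUnavailable: 0`, the controller adds one new Pod at a time and removes an old Pod only after the new one passes its readiness probe, so serving capacity never drops.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                        # hypothetical name
spec:
  replicas: 4                      # placeholder
  revisionHistoryLimit: 5          # keep old ReplicaSets so rollback has a target
  minReadySeconds: 10              # a new Pod must stay Ready this long to count as available
  progressDeadlineSeconds: 600     # no progress for 10 minutes marks the rollout as failed
  selector:
    matchLabels:
      app: app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # at most one Pod above replicas during the rollout
      maxUnavailable: 0            # never drop below 4 available Pods
  template:
    metadata:
      labels:
        app: app                   # must match spec.selector
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.2.3   # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /live
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
```

`progressDeadlineSeconds` does not roll back on its own: it marks the rollout as failed (`ProgressDeadlineExceeded`), and `kubectl rollout status deployment/app` then exits non-zero. That exit code is a natural rollback trigger for a pipeline, which can run `kubectl rollout undo deployment/app` to return to the previous revision. Rollback restores only the Pod template, so any database migration must remain compatible with the old version.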

Required Artifact

Write a rollout spec with readiness and liveness probes, maxSurge/maxUnavailable settings, a rollback trigger, and a migration note.


Case Study 2: OOMKilled From Missing Memory Limits

Scenario: A batch pod consumes all node memory. Other workloads are evicted. The app team says "Kubernetes killed us randomly."

Source anchor: Kubernetes Resource Management for Pods and Containers, which describes requests, limits, and OOMKilled behavior.

Module concepts: requests, limits, QoS, OOMKilled, scheduling.

Wrong Approach

Deploy pods without requests/limits and expect the scheduler to infer how much CPU and memory they need.

Better Approach

Set resource contracts:

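resources:
  requests:
    cpu: "250m"        # scheduler reserves a quarter core; also sets CPU share under contention
    memory: "512Mi"    # scheduler only places the Pod on a node with this much unreserved memory
  limits:
    memory: "1Gi"      # exceeding this gets the container OOMKilled
  # no CPU limit: avoids throttling while the CPU request still guarantees a fair share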

Tradeoff Table

Choice | Gain | Cost
no requests | easy scheduling | noisy-neighbor risk
requests | scheduler capacity signal | needs measurement
memory limit | bounds damage | OOMKilled if set too low
CPU limit | caps usage | throttling risk
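The QoS class follows from the resources block. The example above is Burstable: it sets requests, but its limits do not equal its requests and it has no CPU limit. A Pod with no requests or limits at all is BestEffort. For a batch job whose footprint has been measured, one option is the Guaranteed class, which requires every container to set CPU and memory limits equal to its requests. A minimal sketch; the name, image, and sizes are hypothetical placeholders, not sizing advice:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker             # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: worker
      image: registry.example.com/batch-worker:1.0   # hypothetical image
      resources:
        requests:
          cpu: "1"               # placeholder from measured usage
          memory: "2Gi"          # placeholder from measured usage
        limits:
          cpu: "1"               # equal to the request
          memory: "2Gi"          # equal to the request, so the Pod is Guaranteed
```

`kubectl get pod batch-worker -o jsonpath='{.status.qosClass}'` should print `Guaranteed`. Such Pods are the least likely to be evicted under node memory pressure, but the CPU limit brings back the throttling cost from the table, and the container is still OOMKilled if it exceeds its 2Gi memory limit.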

Required Artifact

Create a resource sizing note from observed p50/p95/p99 CPU/memory and expected burst.


Case Study 3: Service Hides Pod IP Churn

Scenario: Clients call pod IPs directly. After a rollout, pod IPs change and clients fail.

Source anchor: The Kubernetes Services documentation explains the stable network abstraction over changing Pods.

Module concepts: Pod IP, Service, selector, ClusterIP, DNS.

Wrong Approach

Treat pod IPs as durable endpoints.

Better Approach

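Use a Service, which provides a stable virtual IP and DNS name and routes to whichever Ready Pods match its selector: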

Client -> Service DNS name -> Ready Pods selected by label
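A ClusterIP Service for this flow might look like the following minimal sketch. The names `orders` and `shop` and the port numbers are hypothetical, and the DNS name in the comment assumes the default `cluster.local` cluster domain.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders                 # hypothetical; DNS: orders.shop.svc.cluster.local
  namespace: shop              # hypothetical namespace
spec:
  type: ClusterIP
  selector:
    app: orders                # must match the Pod template labels exactly
  ports:
    - name: http
      port: 80                 # port clients call on the Service
      targetPort: 8080         # container port on the selected Pods
```

Clients in namespace `shop` can call `http://orders`, and clients elsewhere can use `orders.shop.svc.cluster.local`. If the selector matches no Ready Pods, the Service has no ready endpoints and connections fail; `kubectl get endpointslices -n shop -l kubernetes.io/service-name=orders` shows which Pod IPs are currently behind it.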

Tradeoff Table

Choice | Gain | Cost
direct pod IP | simple debugging | breaks when Pods are replaced
ClusterIP Service | stable internal endpoint | depends on selector correctness
headless Service | direct Pod discovery | client handles endpoint changes
LoadBalancer Service | external entry point | cloud cost and exposure

Required Artifact

Draw the service discovery path for one workload, including pod labels, selector, DNS name, and failure behavior.


Case Study 4: StatefulSet For Identity, Not Just Replicas

Scenario: A database is deployed as a Deployment with three replicas. Pod names and storage identities change, confusing replication membership.

Source anchor: The Kubernetes StatefulSets documentation describes stable network identities and stable persistent storage for stateful applications.

Module concepts: StatefulSet, stable identity, PVC, headless service.

Wrong Approach

Run every replicated app as a Deployment.

Better Approach

Use StatefulSet when identity matters:

db-0, db-1, db-2
stable PVC per ordinal
headless service for peer discovery
ordered rollout when needed
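The four points above map onto a headless Service plus a StatefulSet. A minimal sketch; the names, image, port 5432, data path, and storage size are hypothetical placeholders, and a real database still needs its own replication configuration on top of this.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db                       # hypothetical; governs the Pods' DNS names
spec:
  clusterIP: None                # headless: DNS returns per-Pod records, no virtual IP
  selector:
    app: db
  ports:
    - name: db
      port: 5432                 # hypothetical port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db                # Pods resolve as db-0.db.<namespace>.svc.cluster.local
  replicas: 3
  podManagementPolicy: OrderedReady   # default: db-0, then db-1, then db-2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: registry.example.com/db:1.0   # hypothetical image
          ports:
            - name: db
              containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/db           # hypothetical data path
  volumeClaimTemplates:
    - metadata:
        name: data               # creates PVCs data-db-0, data-db-1, data-db-2
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi        # placeholder size
```

Each Pod keeps its ordinal name and its own claim across rescheduling, so db-1 always reattaches to data-db-1. By default, scaling down or deleting the StatefulSet leaves those PVCs in place rather than deleting the data.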

Tradeoff Table

Choice | Gain | Cost
Deployment | simple stateless scaling | no stable identity
StatefulSet | stable identity and storage | more operational care
managed database | less ops work | provider coupling and cost
operator | domain automation | operator complexity

Required Artifact

Write a workload decision: Deployment vs StatefulSet vs managed service.


Case Study 5: RBAC Overgrant In The Cluster

Scenario: A CI service account has cluster-admin because early deploys failed. A compromised pipeline can now read secrets and mutate every namespace.

Source anchor: The Kubernetes RBAC Authorization documentation covers roles, cluster roles, role bindings, and least privilege.

Module concepts: service account, RBAC, namespace, least privilege, secret exposure.

Wrong Approach

Grant cluster-admin to make deploys pass.

Better Approach

Scope permissions:

namespace:
  production-app-a

verbs:
  get, list, watch, create, patch

resources:
  deployments, services, configmaps

denied:
  reading secrets, unless the pipeline genuinely requires it
  any cluster-wide mutation
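The scope above can be written as a namespaced Role and RoleBinding. A minimal sketch; the account name `ci-deployer` and its home namespace `ci` are hypothetical. Deployments live in the `apps` API group, while Services and ConfigMaps are in the core group (`""`).

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deployer              # hypothetical name
  namespace: production-app-a
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "patch"]
  - apiGroups: [""]              # core API group
    resources: ["services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "patch"]
  # No rule names secrets, so reading them is denied (RBAC grants nothing by default).
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deployer
  namespace: production-app-a    # the grant applies only inside this namespace
subjects:
  - kind: ServiceAccount
    name: ci-deployer            # hypothetical CI service account
    namespace: ci                # hypothetical namespace where the account lives
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ci-deployer
```

Because RBAC permissions are additive, this only narrows access once the old cluster-admin ClusterRoleBinding is deleted. An audit test can then use `kubectl auth can-i`: `kubectl auth can-i patch deployments -n production-app-a --as=system:serviceaccount:ci:ci-deployer` should print `yes`, while the same check for `get secrets` in that namespace, or for `create namespaces`, should print `no`. An account that can create Deployments can still start Pods that mount Secrets from the same namespace, so the namespace boundary matters as much as the verb list.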

Tradeoff Table

Choice | Gain | Cost
cluster-admin | easy | huge blast radius
namespace Role | scoped | more policy work
separate deploy accounts | isolation | more identities
read secrets in CI | flexible | leakage risk

Required Artifact

Write an RBAC review: subject, namespace, verbs, resources, forbidden actions, and audit test.


Source Map

Source | Use it for
Kubernetes Deployments | rolling updates and rollback
Kubernetes resource management | requests, limits, OOMKilled
Kubernetes Services | stable networking for pods
Kubernetes StatefulSets | stable identity and storage
Kubernetes RBAC | cluster authorization

Completion Standard

  • At least three artifacts are completed.
  • At least one artifact includes rollout safety.
  • At least one artifact includes resources and QoS reasoning.
  • At least one artifact includes RBAC least privilege.