Module 4: CI/CD Pipelines & Release Engineering: Case Studies

These case studies focus on delivery safety: build once, promote, measure, deploy progressively, roll back, and secure the pipeline itself.

Case Study 1: DORA Metrics Reveal A Delivery Bottleneck

Scenario: A team claims delivery is healthy because deployments are frequent, but incident records show a high change failure rate and slow recovery.

Source anchor: Google Cloud's Four Keys metrics, which connect DORA metrics to delivery performance.

Module concepts: deployment frequency, lead time, change failure rate, time to restore (MTTR).

Wrong Approach

Optimize only deployment frequency.

Better Approach

Measure the four together:

deployment frequency:
lead time:
change failure rate:
time to restore:
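As a minimal sketch of measuring the four together, the following computes all four values from a list of deployment records. The `Deployment` record, its field names, the sample data, and the choice of medians and a per-day rate are assumptions made for illustration; they are not taken from the Four Keys project.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_keys(deploys: list, window_days: int) -> dict:
    """Report all four metrics so that no single one is read in isolation."""
    if not deploys:
        raise ValueError("no deployments in the window")
    failures = [d for d in deploys if d.failed]
    restore_hours = [
        hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at
    ]
    return {
        "deployment_frequency_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(
            [hours(d.deployed_at - d.committed_at) for d in deploys]
        ),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore_hours": (
            median(restore_hours) if restore_hours else None
        ),
    }


# Invented sample: four deployments over a two-day window, two of them failed.
t0 = datetime(2024, 1, 1, 9, 0)
h = timedelta(hours=1)
sample = [
    Deployment(t0, t0 + 2 * h),
    Deployment(t0, t0 + 4 * h, failed=True, restored_at=t0 + 7 * h),
    Deployment(t0 + 24 * h, t0 + 30 * h),
    Deployment(t0 + 24 * h, t0 + 32 * h, failed=True, restored_at=t0 + 33 * h),
]
report = four_keys(sample, window_days=2)
```

In this sample the frequency looks healthy (two deployments per day) while half of the changes fail, which is the pattern the scenario describes.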

Tradeoff Table

Metric	If ignored
deployment frequency	batches grow
lead time	slow feedback
change failure rate	speed hides instability
time to restore	incidents last too long

Required Artifact

Create a delivery-health dashboard and one improvement experiment.

Case Study 2: Build Once, Promote Everywhere

Scenario: CI builds one artifact for staging and another for production. The staging test passed on a different binary than the one deployed.

Source anchor: The Twelve-Factor App and modern release guidance emphasize strict separation of build, release, and run. See Twelve-Factor: Build, release, run.

Module concepts: immutable artifact, promotion, provenance, environment config.

Wrong Approach

Rebuild per environment.

Better Approach

Build once:

commit -> build image -> sign/tag digest -> deploy digest to staging -> promote same digest to prod
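The flow above can be sketched as a promotion log that refuses to deploy any digest that did not pass through the previous environment. This is an illustrative sketch only: `PromotionLog`, `ENV_ORDER`, and `artifact_digest` are hypothetical names, and a real pipeline would take the digest from its registry and verify a signature rather than hash bytes locally.

```python
import hashlib

# Assumed promotion order; a real pipeline may have more environments.
ENV_ORDER = ["staging", "prod"]


def artifact_digest(image_bytes: bytes) -> str:
    """Content-addressed ID: the same bytes always produce the same digest."""
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()


class PromotionLog:
    """Tracks which digest is deployed to each environment."""

    def __init__(self) -> None:
        self.deployed = {}  # environment name -> digest

    def deploy(self, env: str, digest: str) -> None:
        position = ENV_ORDER.index(env)
        if position > 0:
            previous_env = ENV_ORDER[position - 1]
            # Promotion rule: only the digest tested upstream may move on.
            if self.deployed.get(previous_env) != digest:
                raise ValueError(
                    f"digest was not deployed to {previous_env}; "
                    f"refusing to promote to {env}"
                )
        self.deployed[env] = digest


log = PromotionLog()
built = artifact_digest(b"image built from one commit")
log.deploy("staging", built)
log.deploy("prod", built)  # same digest, so promotion is allowed
```

A rebuilt image has different bytes and therefore a different digest, so `log.deploy("prod", rebuilt)` raises instead of shipping an untested binary.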

Tradeoff Table

Choice	Gain	Cost
rebuild per env	easy per-environment variables	no artifact equivalence
promote same artifact	test/prod parity	config discipline
image digest	immutability	tooling required
mutable tag (latest)	convenience	audit risk

Required Artifact

Write a promotion pipeline with artifact ID, environments, approvals, and rollback target.

Case Study 3: Canary Without Rollback Criteria

Scenario: A canary deploy sends 5% traffic to a new version. It stays live despite elevated checkout errors because nobody defined abort thresholds.

Source anchor: Kubernetes Deployment docs and progressive delivery guidance support controlled rollouts; use Google SRE monitoring concepts to choose symptoms. See Kubernetes Deployments and Google SRE monitoring.

Module concepts: canary, metrics, rollback, deployment marker, SLO.

Wrong Approach

Treat "canary" as meaning small traffic rather than safe traffic, and ship without abort thresholds.

Better Approach

Define gates:

advance if:
  p95 latency within 10% of baseline
  error rate below threshold
  checkout success unchanged from baseline

rollback if:
  SLO burn exceeds threshold
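One way to make these gates executable is a small decision function that compares a canary snapshot to the baseline. The sketch below is hedged: the `Snapshot` fields, the default threshold values, the tolerance used for "unchanged" checkout success, and the extra `hold` outcome are all assumptions for illustration. Only the 10% latency bound comes from the gates above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical metrics sampled over one canary stage."""
    p95_latency_ms: float
    error_rate: float          # fraction of failed requests
    checkout_success: float    # fraction of successful checkouts
    slo_burn_rate: float = 0.0  # error-budget burn multiple


def decide(
    baseline: Snapshot,
    canary: Snapshot,
    *,
    max_latency_increase: float = 0.10,  # "within 10%" from the gates
    max_error_rate: float = 0.01,        # assumed threshold
    max_checkout_drop: float = 0.005,    # assumed tolerance for "unchanged"
    max_burn_rate: float = 2.0,          # assumed threshold
) -> str:
    # The rollback condition is checked first so it can never be masked.
    if canary.slo_burn_rate > max_burn_rate:
        return "rollback"
    healthy = (
        canary.p95_latency_ms
        <= baseline.p95_latency_ms * (1 + max_latency_increase)
        and canary.error_rate < max_error_rate
        and canary.checkout_success
        >= baseline.checkout_success - max_checkout_drop
    )
    # "hold" (stay at current traffic and alert the owner) is an assumption.
    return "advance" if healthy else "hold"


baseline = Snapshot(p95_latency_ms=200, error_rate=0.002, checkout_success=0.97)
good = Snapshot(215, 0.003, 0.969, slo_burn_rate=0.5)
slow = Snapshot(230, 0.003, 0.969, slo_burn_rate=0.5)
burning = Snapshot(215, 0.003, 0.969, slo_burn_rate=6.0)
```

With these sample values, `decide(baseline, good)` advances, `decide(baseline, slow)` holds, and `decide(baseline, burning)` rolls back.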

Tradeoff Table

Choice	Gain	Cost
manual canary	human judgment	slow/missed signals
automated gates	fast stop	metric quality required
blue-green	quick switch	duplicate capacity
rolling	efficient	harder instant rollback

Required Artifact

Write canary stages, metrics, thresholds, duration, rollback command, and owner.

Case Study 4: CI Secrets Replaced With OIDC

Scenario: A GitHub Actions workflow stores a long-lived cloud access key as a secret. A workflow run triggered from a fork, or a leaked log, could expose production credentials.

Source anchor: GitHub's OpenID Connect security hardening, which explains using OIDC tokens instead of long-lived secrets with cloud providers.

Module concepts: OIDC, short-lived credentials, CI identity, least privilege.

Wrong Approach

Put long-lived cloud keys in CI secrets.

Better Approach

Federate identity:

GitHub OIDC token -> cloud trust policy -> short-lived deploy role
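The trust-policy step in this chain can be sketched as a function that maps token claims to a role, or to nothing. This is a simplified illustration: it assumes the token's signature and expiry were already verified (the cloud provider does that in practice), and the policy layout, audience value, repository name, and role names are hypothetical. The issuer URL and the `repo:...` subject formats follow GitHub's documented OIDC claims.

```python
from typing import Optional

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"

# Hypothetical trust policy: exact subject -> role it may assume.
TRUST_POLICY = {
    "audience": "example-cloud-sts",  # assumed audience value
    "subjects": {
        "repo:example-org/shop:environment:production": "deploy-prod",
        "repo:example-org/shop:ref:refs/heads/main": "deploy-staging",
    },
}


def role_for_claims(claims: dict, policy: dict) -> Optional[str]:
    """Return the role these claims may assume, or None if untrusted.

    Assumes the JWT signature and expiry have already been checked.
    """
    if claims.get("iss") != GITHUB_ISSUER:
        return None
    if claims.get("aud") != policy["audience"]:
        return None
    # Exact match on `sub` keeps each role scoped to one branch or environment.
    return policy["subjects"].get(claims.get("sub"))


prod_claims = {
    "iss": GITHUB_ISSUER,
    "aud": "example-cloud-sts",
    "sub": "repo:example-org/shop:environment:production",
}
pr_claims = {**prod_claims, "sub": "repo:example-org/shop:pull_request"}
```

Here `role_for_claims(prod_claims, TRUST_POLICY)` yields `deploy-prod`, while the pull-request subject matches no entry and gets no role. Exact matching is a deliberate choice: a wildcard such as `repo:example-org/*` would widen the blast radius.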

Tradeoff Table

Choice	Gain	Cost
static key	simple	leak/rotation risk
OIDC role	short-lived and scoped	trust-policy setup
broad role	fewer failures	high blast radius
env-scoped roles	safer	more policies

Required Artifact

Write an OIDC trust policy review: repo, branch/environment, role permissions, and audit evidence.

Case Study 5: Database Migration Breaks Rolling Deploy

Scenario: A deploy removes a column while old pods still run. Old pods crash during the rolling update.

Source anchor: GitLab's post-deployment migration guidance describes separating dangerous database changes from code rollout. See GitLab post-deployment migrations.

Module concepts: expand/contract, backward compatibility, rolling deploy, migration ordering.

Wrong Approach

Deploy incompatible schema and code together.

Better Approach

Use expand/contract:

Release A:
  add new column/table
  code writes both if needed

Release B:
  read new shape

Post-deploy:
  remove old column after old code is gone
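The phases above imply a compatibility matrix between schema states and code versions, which can be checked mechanically before each migration step. The sketch below is an assumption-laden illustration: the column names (`email`, `email_normalized`), the phase and version labels, and the rule "code is compatible if every column it touches exists" are all hypothetical simplifications.

```python
# Columns present in each schema phase (hypothetical example table).
SCHEMA_PHASES = {
    "before": {"email"},
    "expanded": {"email", "email_normalized"},  # Release A adds the column
    "contracted": {"email_normalized"},         # post-deploy removes the old one
}

# Columns each code version reads or writes.
CODE_VERSIONS = {
    "old": {"email"},
    "release_a": {"email", "email_normalized"},  # writes both
    "release_b": {"email_normalized"},           # reads the new shape
}


def compatible(code: str, phase: str) -> bool:
    """Code works if every column it touches exists in that schema phase."""
    return CODE_VERSIONS[code] <= SCHEMA_PHASES[phase]


def compatibility_matrix() -> dict:
    return {
        (code, phase): compatible(code, phase)
        for code in CODE_VERSIONS
        for phase in SCHEMA_PHASES
    }


def safe_to_apply(phase: str, running_code: set) -> bool:
    """A schema phase is safe only if every version still running supports it."""
    return all(compatible(code, phase) for code in running_code)


matrix = compatibility_matrix()
mid_rollout = {"old", "release_b"}  # old pods still running
after_rollout = {"release_b"}
```

In this model the expanded schema is compatible with every code version, which is what makes a rolling deploy safe, while `safe_to_apply("contracted", mid_rollout)` is false until the old pods are gone.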

Tradeoff Table

Choice	Gain	Cost
one-step migration	simple	rolling deploy breakage
expand/contract	safe compatibility	more phases
post-deploy migration	lower downtime risk	process overhead
feature flag	decouples release	cleanup discipline

Required Artifact

Write a migration rollout plan with compatibility matrix and rollback point.

Source Map

Source	Use it for
Google Cloud Four Keys	DORA delivery metrics
Twelve-Factor: Build, release, run	immutable artifact promotion
Kubernetes Deployments	rollout and rollback mechanics
Google SRE monitoring	symptom metrics for release gates
GitHub Actions OIDC	secure cloud auth from CI
GitLab post-deployment migrations	safe schema rollout

Completion Standard

At least three artifacts are completed.
At least one artifact tracks all four DORA metrics.
At least one artifact promotes an immutable artifact.
At least one artifact includes rollback criteria.
