
Module 4: CI/CD Pipelines & Release Engineering: Case Studies

These case studies focus on delivery safety: build once, promote, measure, deploy progressively, roll back, and secure the pipeline itself.


Case Study 1: DORA Metrics Reveal A Delivery Bottleneck

Scenario: A team says delivery is healthy because deployments are frequent, but incident data shows a high change failure rate and slow recovery.

Source anchor: Google Cloud's Four Keys, which connects the DORA metrics to delivery performance.

Module concepts: deployment frequency, lead time, change failure rate, time to restore (MTTR).

Wrong Approach

Optimize only deployment frequency.

Better Approach

Measure the four together:

deployment frequency:
lead time:
change failure rate:
time to restore:
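
As a minimal sketch of measuring the four together, the function below derives all four values from one list of deployment records. The record fields (`committed_at`, `deployed_at`, `failed`, `restored_at`), the use of medians, and the sample data are assumptions made for illustration; they are not taken from Four Keys itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def delivery_health(deploys: list[Deployment], window_days: int) -> dict:
    """Compute the four metrics over one reporting window."""
    if not deploys:
        raise ValueError("no deployments in the window")
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Hypothetical week: four deploys, two of which caused incidents.
t0 = datetime(2024, 1, 1, 9, 0)
h, d = timedelta(hours=1), timedelta(days=1)
week = [
    Deployment(t0, t0 + 2 * h),
    Deployment(t0 + d, t0 + d + 4 * h, failed=True, restored_at=t0 + d + 7 * h),
    Deployment(t0 + 2 * d, t0 + 2 * d + 6 * h),
    Deployment(t0 + 3 * d, t0 + 3 * d + 8 * h, failed=True, restored_at=t0 + 3 * d + 9 * h),
]
report = delivery_health(week, window_days=7)
```

In this made-up week the deployment frequency alone looks acceptable, while the change failure rate of 0.5 shows the problem the scenario describes. Reporting all four from the same records keeps one metric from hiding another.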

Tradeoff Table

Metric                 If ignored
deployment frequency   batches grow
lead time              slow feedback
change failure rate    speed hides instability
restore time           incidents last too long

Required Artifact

Create a delivery-health dashboard and one improvement experiment.


Case Study 2: Build Once, Promote Everywhere

Scenario: CI builds one artifact for staging and another for production. The staging test passed on a different binary than the one deployed.

Source anchor: The Twelve-Factor App and modern release guidance emphasize strict separation of build, release, and run. See Twelve-Factor: Build, release, run.

Module concepts: immutable artifact, promotion, provenance, environment config.

Wrong Approach

Rebuild per environment.

Better Approach

Build once:

commit -> build image -> sign/tag digest -> deploy digest to staging -> promote same digest to prod
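
The flow above can be sketched as a small promotion check: an artifact is identified by the digest of its bytes, and an environment accepts only a digest that already ran in the environment before it. The class name, the environment names, and the sample build bytes are hypothetical. A real pipeline would read the digest from its registry instead of hashing bytes in memory.

```python
import hashlib
from dataclasses import dataclass, field


def digest_of(artifact: bytes) -> str:
    """Content-addressed ID: identical bytes always yield the identical digest."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()


@dataclass
class PromotionPipeline:
    environments: list[str]                                   # ordered, e.g. staging then prod
    deployed: dict[str, str] = field(default_factory=dict)    # env -> digest

    def deploy(self, env: str, digest: str) -> None:
        idx = self.environments.index(env)
        if idx > 0:
            previous = self.environments[idx - 1]
            # Refuse any digest that was not deployed to the previous environment.
            if self.deployed.get(previous) != digest:
                raise ValueError(f"{digest[:19]} was not deployed to {previous}")
        self.deployed[env] = digest


pipeline = PromotionPipeline(["staging", "prod"])
tested = digest_of(b"image bytes from the single CI build")   # hypothetical artifact
pipeline.deploy("staging", tested)
pipeline.deploy("prod", tested)                # same digest: promotion allowed

rebuilt = digest_of(b"image bytes from a second, per-environment build")
try:
    pipeline.deploy("prod", rebuilt)           # different bytes: never tested in staging
    rejected = False
except ValueError:
    rejected = True
```

The previous production digest also serves as the rollback target, since it names exact bytes and not a tag that may have moved.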

Tradeoff Table

Choice                  Gain               Cost
rebuild per env         easy variables     no artifact equivalence
promote same artifact   test/prod parity   config discipline
image digest            immutability       tooling required
mutable tag (latest)    convenience        audit risk

Required Artifact

Write a promotion pipeline with artifact ID, environments, approvals, and rollback target.


Case Study 3: Canary Without Rollback Criteria

Scenario: A canary deploy sends 5% traffic to a new version. It stays live despite elevated checkout errors because nobody defined abort thresholds.

Source anchor: Kubernetes Deployment docs and progressive delivery guidance support controlled rollouts; use Google SRE monitoring concepts to choose symptoms. See Kubernetes Deployments and Google SRE monitoring.

Module concepts: canary, metrics, rollback, deployment marker, SLO.

Wrong Approach

Treat "canary" as meaning small traffic, not safe traffic: shift 5% to the new version and define no abort thresholds.

Better Approach

Define gates:

advance if:
  p95 latency within 10% of baseline
  error rate below threshold
  checkout success unchanged

rollback if:
  SLO burn exceeds threshold
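
One way to make these gates executable is a function that compares canary metrics with the baseline and returns a decision. This is a sketch under assumptions: every threshold value is invented, "unchanged" checkout success is read as "within a small tolerance", and the `HOLD` outcome (neither advance nor roll back) is an addition for the case where no rule fires.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADVANCE = "advance"
    HOLD = "hold"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class Metrics:
    p95_latency_ms: float
    error_rate: float            # fraction of failed requests
    checkout_success: float      # fraction of successful checkouts
    slo_burn_rate: float = 0.0   # multiple of the sustainable error-budget burn


@dataclass(frozen=True)
class Gates:                     # all values below are assumed, not recommended
    max_latency_increase: float = 0.10
    max_error_rate: float = 0.01
    max_checkout_drop: float = 0.005
    max_burn_rate: float = 2.0


def evaluate(baseline: Metrics, canary: Metrics, gates: Gates = Gates()) -> Decision:
    # The rollback rule is checked first so a burning SLO always stops the rollout.
    if canary.slo_burn_rate > gates.max_burn_rate:
        return Decision.ROLLBACK
    healthy = (
        canary.p95_latency_ms <= baseline.p95_latency_ms * (1 + gates.max_latency_increase)
        and canary.error_rate < gates.max_error_rate
        and canary.checkout_success >= baseline.checkout_success - gates.max_checkout_drop
    )
    return Decision.ADVANCE if healthy else Decision.HOLD


baseline = Metrics(p95_latency_ms=200.0, error_rate=0.002, checkout_success=0.970)
good = Metrics(210.0, 0.003, 0.969, slo_burn_rate=0.5)
slow = Metrics(260.0, 0.003, 0.969, slo_burn_rate=0.5)
burning = Metrics(210.0, 0.040, 0.900, slo_burn_rate=6.0)

decisions = [evaluate(baseline, m) for m in (good, slow, burning)]
```

In the scenario, the elevated checkout errors would have produced `ROLLBACK` or at least `HOLD` instead of leaving the canary live by default.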

Tradeoff Table

Choice            Gain             Cost
manual canary     human judgment   slow/missed signals
automated gates   fast stop        metric quality required
blue-green        quick switch     duplicate capacity
rolling           efficient        harder instant rollback

Required Artifact

Write canary stages, metrics, thresholds, duration, rollback command, and owner.


Case Study 4: CI Secrets Replaced With OIDC

Scenario: A GitHub Actions workflow stores a long-lived cloud access key. A workflow run from a fork or a leaked log could expose production credentials.

Source anchor: GitHub's guide to security hardening with OpenID Connect, which explains using OIDC tokens instead of long-lived secrets with cloud providers.

Module concepts: OIDC, short-lived credentials, CI identity, least privilege.

Wrong Approach

Put long-lived cloud keys in CI secrets.

Better Approach

Federate identity:

GitHub OIDC token -> cloud trust policy -> short-lived deploy role
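
The trust-policy step in that chain can be sketched as a claim check. The code assumes the token signature has already been verified and works only on the decoded claims. The issuer URL and the `repo:...:environment:...` subject format are GitHub's. The repository name, audience value, and role name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


@dataclass(frozen=True)
class TrustPolicy:
    audience: str
    allowed_subjects: frozenset[str]   # exact matches only, no wildcards
    role: str


def role_for(claims: dict, policy: TrustPolicy) -> Optional[str]:
    """Return the role to assume, or None if the token is not trusted.

    Assumes `claims` came from a token whose signature was already verified.
    """
    if claims.get("iss") != GITHUB_ISSUER:
        return None
    if claims.get("aud") != policy.audience:
        return None
    if claims.get("sub") not in policy.allowed_subjects:
        return None
    return policy.role


# Hypothetical policy: only the production environment of one repo may deploy.
prod_policy = TrustPolicy(
    audience="example-cloud",
    allowed_subjects=frozenset({"repo:example-org/shop:environment:production"}),
    role="deploy-prod",
)

from_prod_env = {
    "iss": GITHUB_ISSUER,
    "aud": "example-cloud",
    "sub": "repo:example-org/shop:environment:production",
}
from_pull_request = {**from_prod_env, "sub": "repo:example-org/shop:pull_request"}

granted = role_for(from_prod_env, prod_policy)
denied = role_for(from_pull_request, prod_policy)
```

Matching the subject exactly is the design choice that matters here: a wildcard over the whole repository would also admit pull-request runs, which is the broad-role risk.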

Tradeoff Table

Choice             Gain                     Cost
static key         simple                   leak/rotation risk
OIDC role          short-lived and scoped   trust-policy setup
broad role         fewer failures           high blast radius
env-scoped roles   safer                    more policies

Required Artifact

Write an OIDC trust policy review: repo, branch/environment, role permissions, and audit evidence.


Case Study 5: Database Migration Breaks Rolling Deploy

Scenario: A deploy removes a column while old pods still run. Old pods crash during the rolling update.

Source anchor: GitLab's post-deployment migration guidance describes separating dangerous database changes from code rollout. See GitLab post-deployment migrations.

Module concepts: expand/contract, backward compatibility, rolling deploy, migration ordering.

Wrong Approach

Deploy incompatible schema and code together.

Better Approach

Use expand/contract:

Release A:
  add new column/table
  code writes both if needed

Release B:
  read new shape

Post-deploy:
  remove old column after old code is gone
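
The ordering above can be checked with a small compatibility matrix. This sketch uses a deliberately simplified model in which a code version works if every column it uses exists. The column names (`full_name`, `display_name`) and version labels are hypothetical, and real migrations raise other issues, such as constraints and backfills, that this model ignores.

```python
# Columns present in each schema phase (hypothetical column names).
SCHEMA_PHASES = {
    "before": {"full_name"},
    "expanded": {"full_name", "display_name"},   # Release A migration applied
    "contracted": {"display_name"},              # post-deploy cleanup applied
}

# Columns each code version reads or writes.
CODE_VERSIONS = {
    "old": {"full_name"},
    "release A": {"full_name", "display_name"},  # writes both
    "release B": {"display_name"},               # reads the new shape only
}


def compatible(code_columns: set, schema_columns: set) -> bool:
    """Simplified rule: code works if every column it uses exists."""
    return code_columns <= schema_columns


def compatibility_matrix() -> dict:
    return {
        (code, phase): compatible(cols, schema)
        for code, cols in CODE_VERSIONS.items()
        for phase, schema in SCHEMA_PHASES.items()
    }


def safe_to_apply(phase: str, running: list) -> bool:
    """A schema phase is safe only if every running code version tolerates it."""
    return all(compatible(CODE_VERSIONS[v], SCHEMA_PHASES[phase]) for v in running)


matrix = compatibility_matrix()

# Mid-rollout, old and new pods run side by side.
expand_during_rollout = safe_to_apply("expanded", ["old", "release A"])
contract_during_rollout = safe_to_apply("contracted", ["old", "release B"])
contract_after_rollout = safe_to_apply("contracted", ["release B"])
```

The `("old", "contracted")` cell is the scenario's crash: the column is gone while old pods still use it. The last three checks show why the removal belongs in a post-deploy step.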

Tradeoff Table

Choice                  Gain                  Cost
one-step migration      simple                rolling deploy breakage
expand/contract         safe compatibility    more phases
post-deploy migration   lower downtime risk   process overhead
feature flag            decouples release     cleanup discipline

Required Artifact

Write a migration rollout plan with compatibility matrix and rollback point.


Source Map

Source                               Use it for
Google Cloud Four Keys               DORA delivery metrics
Twelve-Factor: Build, release, run   immutable artifact promotion
Kubernetes Deployments               rollout and rollback mechanics
Google SRE monitoring                symptom metrics for release gates
GitHub Actions OIDC                  secure cloud auth from CI
GitLab post-deployment migrations    safe schema rollout

Completion Standard

  • At least three artifacts are completed.
  • At least one artifact tracks all four DORA metrics.
  • At least one artifact promotes an immutable artifact.
  • At least one artifact includes rollback criteria.