CI/CD Katas
Focused, repeatable drills. Complete each kata end-to-end. Then do it again from scratch until the shape is automatic.
Kata 1: GitHub Actions Workflow with OIDC Deploy to Staging
Time limit: 20 minutes (after the first attempt)
Goal: build fluency authoring a workflow that builds, tests, and deploys to staging using OIDC -- no static cloud keys.
Setup: a sample repo with a simple service (any language) and an AWS or GCP target account you can configure an OIDC trust policy on.
Implementation checklist:
- One workflow file: `.github/workflows/deploy-staging.yml`.
- Triggers: `push` to `main` and `workflow_dispatch`.
- `permissions:` block -- `id-token: write` for OIDC, `contents: read`. If the `build` job pushes to GHCR with `GITHUB_TOKEN`, it also needs `packages: write`.
- Jobs:
  - `test` -- lint and unit tests.
  - `build` -- build and push a container image tagged by commit SHA to GHCR.
  - `deploy-staging` -- `needs: build`; uses `aws-actions/configure-aws-credentials` with `role-to-assume`; deploys the image.
- The AWS IAM role's trust policy restricts assumption to the `sub` claim `repo:<owner>/<repo>:ref:refs/heads/main`.
- No `AWS_ACCESS_KEY_ID` anywhere in the workflow or repo secrets.
- Smoke test after deploy -- `curl` the staging health endpoint, fail the job on non-200.
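The trust-policy item in the checklist can be made concrete with a small generator. The following is a minimal sketch in Python, assuming the role is assumed through `aws-actions/configure-aws-credentials` with its default audience `sts.amazonaws.com`. The function name `github_oidc_trust_policy` and the account ID, owner, and repo values are hypothetical placeholders, not part of the kata.

```python
import json


def github_oidc_trust_policy(account_id, owner, repo, branch="main"):
    """Build an IAM trust policy that only the given repo and branch can satisfy.

    Assumption: the GitHub OIDC provider is already registered in the account
    under its standard host name.
    """
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # StringEquals (no wildcards) pins the role to one branch.
                    "StringEquals": {
                        f"{provider}:aud": "sts.amazonaws.com",
                        f"{provider}:sub": f"repo:{owner}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Hypothetical values for illustration only.
policy = github_oidc_trust_policy("123456789012", "example-org", "example-service")
print(json.dumps(policy, indent=2))
```

The exact-match condition on `sub` is the part worth memorizing: switching to `StringLike` with a wildcard would let other branches or pull requests assume the role.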
Repeat until: you can author this workflow from memory in under 20 minutes, it validates on first push, and the deploy succeeds with a freshly created OIDC trust policy.
Kata 2: Design a Canary Rollout for One Service
Time limit: 30 minutes
Goal: fluency with rollout design and rollback criteria.
Setup: pick a single service (real or one from workshop 02).
Produce a one-page plan containing:
- Strategy: canary. State why (not rolling, not blue-green) for this service.
- Traffic progression: exact weights and wait times (e.g., `5% -> 10m -> 25% -> 15m -> 50% -> 30m -> 100%`).
- Success metrics: concrete queries, not vibes. At least: success rate, p95 latency. If relevant: business metric (conversion, queue depth).
- Rollback trigger: threshold + duration for each success metric.
- Rollback action: exact command or tool invocation; tested.
- Rollback owner: named role (e.g., on-call), no approval chain.
- Rollback deadline: a stated end-to-end time budget from trigger to fully restored traffic, on the order of seconds to minutes.
- Observability: the deploy marker, the canary dashboard link, the alert route.
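A "threshold + duration" rollback trigger is easy to state loosely and hard to state exactly. The sketch below shows one possible reading in Python: roll back only when the metric has been in breach continuously for at least the stated duration, up to the most recent sample. The names `RollbackTrigger` and `should_roll_back`, and the sample values, are hypothetical; a real plan would express the same rule as an Argo Rollouts or Flagger analysis query.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RollbackTrigger:
    metric: str
    threshold: float
    breach_when: str  # "below" (e.g. success rate) or "above" (e.g. p95 latency)
    duration_s: int   # how long the breach must persist before rolling back

    def __post_init__(self):
        if self.breach_when not in ("below", "above"):
            raise ValueError("breach_when must be 'below' or 'above'")

    def breached(self, value: float) -> bool:
        if self.breach_when == "below":
            return value < self.threshold
        return value > self.threshold


def should_roll_back(trigger: RollbackTrigger,
                     samples: Sequence[Tuple[int, float]]) -> bool:
    """samples are (unix_seconds, value) pairs sorted by ascending time.

    Returns True when every sample in the trailing window is in breach and
    that window spans at least trigger.duration_s seconds.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    breach_start = None
    # Walk backwards until the first healthy sample ends the breach streak.
    for ts, value in reversed(samples):
        if not trigger.breached(value):
            break
        breach_start = ts
    if breach_start is None:
        return False
    return latest_ts - breach_start >= trigger.duration_s


# Hypothetical trigger: success rate below 99% for two minutes.
trigger = RollbackTrigger("success_rate", 0.99, "below", 120)
samples = [(0, 0.995), (60, 0.98), (120, 0.97), (180, 0.96)]
print(should_roll_back(trigger, samples))
```

Writing the rule this precisely forces the plan to say what happens on a single bad sample (nothing, under this reading) and on a brief recovery (the streak resets).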
Deliverable: a Markdown doc + a working Argo Rollouts or Flagger spec (YAML) that implements it.
Repeat until: you can author both the doc and the spec in under 30 minutes for a new service description, and a teammate can critique it with fewer than 3 findings.
Kata 3: Expand/Contract Migration Paired with Backward-Compatible Code
Time limit: 45 minutes
Goal: operational fluency with DB schema changes that do not require downtime.
Setup: pick any non-trivial column change. Recommended: split a full_name column into first_name + last_name.
Produce four separately shippable PRs:
- PR 1 -- Expand. Migration runs `ADD COLUMN first_name`, `ADD COLUMN last_name` (both nullable). Code: dual-write on save, continue reading `full_name`.
- PR 2 -- Backfill. Migration to populate `first_name`/`last_name` for existing rows. Code unchanged.
- PR 3 -- Switch reads. Code change: read `first_name`/`last_name`, fall back to parsing `full_name` if null. No schema change.
- PR 4 -- Contract. Stop writing `full_name`. Later PR: `ALTER TABLE ... DROP COLUMN full_name`.
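The first three PRs can be rehearsed end to end against an in-memory SQLite database. This is a minimal sketch, not a production migration: the `users` table, the helper names, and the naive split on the first space are all assumptions made for illustration, and a real backfill would run against the staging dataset with its own batching and locking strategy.

```python
import sqlite3


def split_name(full_name):
    """Naive split on the first space; real names need a more careful rule."""
    first, _, last = full_name.strip().partition(" ")
    return first, last


def expand(conn):
    # PR 1 (schema): additive, nullable columns only, so old code keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def save_user(conn, full_name):
    # PR 1 (code): dual-write the old column and the new columns.
    first, last = split_name(full_name)
    cur = conn.execute(
        "INSERT INTO users (full_name, first_name, last_name) VALUES (?, ?, ?)",
        (full_name, first, last),
    )
    return cur.lastrowid


def backfill(conn, batch_size=1000):
    # PR 2: populate rows written before the expand, in small batches.
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, full_name FROM users WHERE first_name IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        conn.executemany(
            "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
            [(*split_name(name), row_id) for row_id, name in rows],
        )
        conn.commit()
        total += len(rows)


def read_name(conn, user_id):
    # PR 3: prefer the new columns, fall back to parsing full_name if null.
    first, last, full = conn.execute(
        "SELECT first_name, last_name, full_name FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    if first is None or last is None:
        return split_name(full)
    return first, last

# PR 4 (contract) would stop writing full_name, then drop the column later.
```

A quick rehearsal: create `users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)`, insert a legacy row, call `expand`, write a new row with `save_user`, and confirm `read_name` returns the same answer for the legacy row before and after `backfill`. The count of rows where `first_name IS NULL` is one candidate for the "safe to proceed" metric.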
For each PR, state:
- the rollback path from that deployed state
- the minimum time window between this deploy and the next
- what metric you would watch to confirm it is safe to proceed
Repeat until: you can sketch the four PRs and the transitions from memory in under 45 minutes, and you can name at least one alternative change type (add nullable column, change column type, split row into two tables) and adapt the pattern to it.
Kata 4: Write a Release Note Generator Spec
Time limit: 30 minutes
Goal: formalize release-note generation rules so they can be automated.
Setup: a repo using Conventional Commits (or pick one to retrofit).
Produce a specification document covering:
- Input. Range of commits (`vA..vB`). Optional: PR metadata from the forge API.
- Grouping rules. How commit types map to changelog sections:
  - `feat:` -> `Added`
  - `fix:` -> `Fixed`
  - `refactor:` / `perf:` -> optional `Changed`
  - `docs:` / `chore:` -> excluded by default
  - `BREAKING CHANGE:` or `feat!:` -> `Changed` plus a `### Breaking Changes` subsection
- Version bump logic. How the presence of `feat:` vs `fix:` vs `BREAKING CHANGE:` determines MAJOR/MINOR/PATCH.
- Commit-to-entry transform. Rules for turning a commit subject into a bullet: strip type prefix, capitalize, include PR link if available, group by scope if present.
- Output. Full Markdown block ready to drop into `CHANGELOG.md`; separate human-facing summary for release notes.
- Edge cases. Reverts, merges of merges, squash-merge scopes, Dependabot PRs.
- Example. Run the spec against a real past release manually; compare with what was actually published.
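One way to check that a spec is unambiguous is to see whether its rules fit in a few dozen lines of code. The sketch below is a possible Python reading of the grouping, version bump, and commit-to-entry rules above. It makes several assumptions the spec would have to settle: sections render at `###` level in the order shown, `refactor:` and `perf:` are included under `Changed`, scopes render as a bold prefix, and PR links and the listed edge cases are left out. All function names are hypothetical.

```python
import re

# Types missing from this map (docs, chore, ...) are excluded by default.
SECTION_FOR_TYPE = {"feat": "Added", "fix": "Fixed",
                    "refactor": "Changed", "perf": "Changed"}
SECTION_ORDER = ["Breaking Changes", "Added", "Changed", "Fixed"]  # assumed order

SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<bang>!)?: (?P<desc>.+)$"
)


def parse_commit(message):
    """Return a dict for a Conventional Commit message, or None if it does not parse."""
    subject, _, body = message.partition("\n")
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    footer_break = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:"))
        for line in body.splitlines()
    )
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "desc": match.group("desc"),
        "breaking": bool(match.group("bang")) or footer_break,
    }


def bump(messages):
    """Breaking change -> major, feat -> minor, fix -> patch, otherwise None."""
    commits = [c for c in map(parse_commit, messages) if c]
    if any(c["breaking"] for c in commits):
        return "major"
    if any(c["type"] == "feat" for c in commits):
        return "minor"
    if any(c["type"] == "fix" for c in commits):
        return "patch"
    return None


def to_entry(commit):
    """Strip the type prefix, capitalize, and prefix the scope if present."""
    desc = commit["desc"][0].upper() + commit["desc"][1:]
    if commit["scope"]:
        return f"- **{commit['scope']}:** {desc}"
    return f"- {desc}"


def render(messages):
    sections = {}
    for commit in filter(None, map(parse_commit, messages)):
        if commit["breaking"]:
            # Breaking commits appear under Changed and in their own subsection.
            sections.setdefault("Breaking Changes", []).append(to_entry(commit))
            section = "Changed"
        else:
            section = SECTION_FOR_TYPE.get(commit["type"])
        if section is not None:
            sections.setdefault(section, []).append(to_entry(commit))
    blocks = [f"### {name}\n" + "\n".join(sections[name])
              for name in SECTION_ORDER if name in sections]
    return "\n\n".join(blocks)
```

Every assumption listed in the lead-in is a place where two teammates could diverge, so each one is a candidate line item for the written spec.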
Repeat until: two teammates, given the spec and the same commit range, produce identical changelog Markdown without consulting each other.
Completion Standard
- Each kata completed end-to-end, in order, at least once.
- Katas 1 and 2 repeated until completed within their time limits.
- Kata 3 validated against a real production-like dataset (row counts on staging, not unit-test fixtures).
- Kata 4 spec used to generate at least one real release note.