CI/CD Katas

Focused, repeatable drills. Complete each kata end-to-end. Then do it again from scratch until the shape is automatic.

Kata 1: GitHub Actions Workflow with OIDC Deploy to Staging

Time limit: 20 minutes (after the first attempt)
Goal: build fluency authoring a workflow that builds, tests, and deploys to staging using OIDC -- no static cloud keys.
Setup: a sample repo with a simple service (any language) and an AWS or GCP target account you can configure an OIDC trust policy on.

Implementation checklist:

  1. One workflow file .github/workflows/deploy-staging.yml.
  2. Triggers: push to main and workflow_dispatch.
  3. permissions: block -- id-token: write for OIDC, contents: read for checkout, and packages: write on the build job so it can push to GHCR.
  4. Jobs:
    • test -- lint and unit tests.
    • build -- needs: test; build and push a container image tagged with the commit SHA to GHCR.
    • deploy-staging -- needs: build; uses aws-actions/configure-aws-credentials with role-to-assume; deploys the image.
  5. The AWS IAM role's trust policy restricts assumption to tokens whose sub claim is repo:<owner>/<repo>:ref:refs/heads/main.
  6. No AWS_ACCESS_KEY_ID anywhere in the workflow or repo secrets.
  7. Smoke test after deploy -- curl the staging health endpoint, fail the job on non-200.
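
The trust policy in item 5 is the piece most often written wrong, so it helps to generate it rather than hand-edit JSON. The following is a minimal sketch, assuming an AWS target with the GitHub OIDC provider already registered in the account. The function name, the account ID, and the repo names are placeholders, not part of the kata.

```python
import json

# Hostname of GitHub's OIDC issuer; it prefixes the condition keys below.
GITHUB_OIDC_PROVIDER = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, owner: str, repo: str, branch: str = "main") -> dict:
    """Build an IAM trust policy that lets only one repo and branch assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_PROVIDER}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Audience set by aws-actions/configure-aws-credentials by default.
                        f"{GITHUB_OIDC_PROVIDER}:aud": "sts.amazonaws.com",
                        # Pin the subject to a single repo and branch.
                        f"{GITHUB_OIDC_PROVIDER}:sub": f"repo:{owner}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account ID and repository.
    policy = github_oidc_trust_policy("123456789012", "example-org", "example-service")
    print(json.dumps(policy, indent=2))
```

Using StringEquals rather than StringLike on the sub claim keeps the match exact, so a pull request or another branch cannot assume the role.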

Repeat until: you can author this workflow from memory in under 20 minutes, it validates on first push, and the deploy succeeds with a freshly created OIDC trust policy.

Kata 2: Design a Canary Rollout for One Service

Time limit: 30 minutes
Goal: fluency with rollout design and rollback criteria.
Setup: pick a single service (real or one from workshop 02).

Produce a one-page plan containing:

  1. Strategy: canary. State why (not rolling, not blue-green) for this service.
  2. Traffic progression: exact weights and wait times (e.g., 5% -> 10m -> 25% -> 15m -> 50% -> 30m -> 100%).
  3. Success metrics: concrete queries, not vibes. At least: success rate, p95 latency. If relevant: business metric (conversion, queue depth).
  4. Rollback trigger: threshold + duration for each success metric.
  5. Rollback action: exact command or tool invocation; tested.
  6. Rollback owner: named role (e.g., on-call), no approval chain.
  7. Rollback deadline: the maximum end-to-end time from trigger to fully restored traffic, stated as a number (seconds to a few minutes).
  8. Observability: the deploy marker, the canary dashboard link, the alert route.
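
A "threshold + duration" rollback trigger (item 4) can be made concrete as a small predicate over recent metric samples. This is a sketch under stated assumptions: samples arrive at a fixed interval, duration is expressed as a count of consecutive samples, and the class and function names are illustrative. In practice Argo Rollouts or Flagger evaluates the equivalent rule from your metric queries.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RollbackTrigger:
    metric: str
    threshold: float
    breach_when: str        # "below" (e.g. success rate) or "above" (e.g. latency)
    sustained_samples: int  # consecutive breaching samples required to fire

    def __post_init__(self) -> None:
        if self.breach_when not in ("below", "above"):
            raise ValueError("breach_when must be 'below' or 'above'")
        if self.sustained_samples < 1:
            raise ValueError("sustained_samples must be at least 1")


def should_roll_back(trigger: RollbackTrigger, samples: Sequence[float]) -> bool:
    """Return True when the most recent samples all breach the threshold."""
    if len(samples) < trigger.sustained_samples:
        return False  # not enough data yet to claim a sustained breach
    recent = samples[-trigger.sustained_samples:]
    if trigger.breach_when == "below":
        return all(value < trigger.threshold for value in recent)
    return all(value > trigger.threshold for value in recent)


if __name__ == "__main__":
    # Example values only: one sample per minute, so 3 samples is about 3 minutes.
    success_rate = RollbackTrigger("success_rate", 0.99, "below", 3)
    p95_latency = RollbackTrigger("p95_latency_seconds", 0.5, "above", 5)
    print(should_roll_back(success_rate, [0.999, 0.98, 0.97, 0.985]))
    print(should_roll_back(p95_latency, [0.2, 0.6, 0.7, 0.3, 0.8]))
```

Requiring consecutive breaches rather than a single bad sample is what keeps one noisy scrape from aborting a healthy rollout.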

Deliverable: a Markdown doc + a working Argo Rollouts or Flagger spec (YAML) that implements it.

Repeat until: you can author both the doc and the spec in under 30 minutes for a new service description, and a teammate can critique it with fewer than 3 findings.

Kata 3: Expand/Contract Migration Paired with Backward-Compatible Code

Time limit: 45 minutes
Goal: operational fluency with DB schema changes that do not require downtime.
Setup: pick any non-trivial column change. Recommended: split a full_name column into first_name + last_name.

Produce four separately shippable PRs:

  1. PR 1 -- Expand. Migration that adds nullable first_name and last_name columns (ADD COLUMN). Code: dual-write on save, continue reading full_name.
  2. PR 2 -- Backfill. Migration to populate first_name / last_name for existing rows. Code unchanged.
  3. PR 3 -- Switch reads. Code change: read first_name / last_name, fall back to parsing full_name if null. No schema change.
  4. PR 4 -- Contract. Stop writing full_name and remove the read fallback. In a later PR: ALTER TABLE ... DROP COLUMN full_name.
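
The first three PRs can be rehearsed locally before touching a real database. The sketch below uses an in-memory SQLite database as a stand-in; the users table, the helper names, and the naive split on the first space are all assumptions for illustration, and a production backfill would need a more careful name-parsing rule.

```python
import sqlite3
from typing import Tuple


def split_name(full_name: str) -> Tuple[str, str]:
    """Naive split on the first space (illustrative only)."""
    first, _, last = full_name.strip().partition(" ")
    return first, last


def expand(conn: sqlite3.Connection) -> None:
    """PR 1 migration: add nullable columns; existing rows are untouched."""
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def save_user(conn: sqlite3.Connection, user_id: int, first_name: str, last_name: str) -> None:
    """PR 1 code: dual-write so old readers and a rollback still see full_name."""
    full_name = f"{first_name} {last_name}".strip()
    conn.execute(
        "INSERT INTO users (id, full_name, first_name, last_name) VALUES (?, ?, ?, ?)",
        (user_id, full_name, first_name, last_name),
    )


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """PR 2: populate the new columns in small batches; returns rows updated."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, full_name FROM users "
            "WHERE first_name IS NULL AND full_name IS NOT NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        for user_id, full_name in rows:
            first, last = split_name(full_name)
            conn.execute(
                "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
                (first, last, user_id),
            )
        conn.commit()  # commit per batch to keep transactions short
        total += len(rows)


def read_name(conn: sqlite3.Connection, user_id: int) -> Tuple[str, str]:
    """PR 3: prefer the new columns, fall back to parsing full_name if null."""
    row = conn.execute(
        "SELECT full_name, first_name, last_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(user_id)
    full_name, first, last = row
    if first is not None and last is not None:
        return first, last
    return split_name(full_name or "")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace')")
    expand(conn)
    save_user(conn, 2, "Grace", "Hopper")
    print(read_name(conn, 1))  # served by the fallback path before backfill
    print(backfill(conn), "row(s) backfilled")
    print(read_name(conn, 1))  # served by the new columns after backfill
```

Because every step leaves full_name intact and populated, rolling back the code at any point before PR 4 needs no schema change.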

For each PR, state:

  • the rollback path from that deployed state
  • the minimum time window between this deploy and the next
  • what metric you would watch to confirm it is safe to proceed

Repeat until: you can sketch the four PRs and the transitions from memory in under 45 minutes, and you can name at least one alternative change type (add a nullable column, change a column type, split a table into two) and adapt the pattern.

Kata 4: Write a Release Note Generator Spec

Time limit: 30 minutes
Goal: formalize release-note generation rules so they can be automated.
Setup: a repo using Conventional Commits (or pick one to retrofit).

Produce a specification document covering:

  1. Input. Range of commits (vA..vB). Optional: PR metadata from the forge API.
  2. Grouping rules. How commit types map to changelog sections:
    • feat: -> Added
    • fix: -> Fixed
    • refactor: / perf: -> optional Changed
    • docs: / chore: -> excluded by default
    • BREAKING CHANGE: or feat!: -> Changed plus a ### Breaking Changes subsection
  3. Version bump logic. How the presence of feat: vs fix: vs BREAKING CHANGE: determines MAJOR/MINOR/PATCH.
  4. Commit-to-entry transform. Rules for turning a commit subject into a bullet: strip type prefix, capitalize, include PR link if available, group by scope if present.
  5. Output. Full Markdown block ready to drop into CHANGELOG.md; separate human-facing summary for release notes.
  6. Edge cases. Reverts, merges of merges, squash-merge scopes, Dependabot PRs.
  7. Example. Run the spec against a real past release manually; compare with what was actually published.
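
A quick way to check that the grouping and version-bump rules are unambiguous is to encode them. The sketch below is one possible reading of items 2 to 4, not a reference implementation: the function names are hypothetical, breaking commits are listed under both Changed and Breaking Changes, refactor: and perf: are always included under Changed, and a range with no feat:, fix:, or breaking commit yields no bump. Scopes, PR links, and the edge cases in item 6 are left out.

```python
import re
from typing import Dict, List, Optional, Tuple

SECTION_BY_TYPE = {"feat": "Added", "fix": "Fixed", "refactor": "Changed", "perf": "Changed"}
BUMPS = [None, "patch", "minor", "major"]  # indexed by severity rank

# type, optional (scope), optional "!", then ": description"
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?(?P<bang>!)?: (?P<desc>.+)$")


def parse_commit(message: str) -> Optional[Tuple[str, str, bool]]:
    """Return (type, description, is_breaking), or None for non-conventional subjects."""
    subject, _, body = message.partition("\n")
    match = SUBJECT_RE.match(subject)
    if match is None:
        return None
    breaking = match["bang"] is not None or "BREAKING CHANGE:" in body
    return match["type"], match["desc"], breaking


def build_release(messages: List[str]) -> Tuple[Dict[str, List[str]], Optional[str]]:
    """Group commits into changelog sections and compute the version bump."""
    sections: Dict[str, List[str]] = {}
    rank = 0
    for message in messages:
        parsed = parse_commit(message)
        if parsed is None:
            continue  # merge commits and free-form subjects are skipped
        commit_type, desc, breaking = parsed
        entry = desc[:1].upper() + desc[1:]  # strip prefix, capitalize
        if breaking:
            sections.setdefault("Changed", []).append(entry)
            sections.setdefault("Breaking Changes", []).append(entry)
            rank = max(rank, 3)
            continue
        if commit_type == "feat":
            rank = max(rank, 2)
        elif commit_type == "fix":
            rank = max(rank, 1)
        section = SECTION_BY_TYPE.get(commit_type)
        if section is not None:  # docs:, chore:, and unknown types are excluded
            sections.setdefault(section, []).append(entry)
    return sections, BUMPS[rank]


if __name__ == "__main__":
    # Made-up commit subjects for illustration.
    commits = [
        "feat(api): add pagination",
        "fix: handle empty cart",
        "docs: update readme",
        "feat!: drop v1 endpoints",
    ]
    sections, bump = build_release(commits)
    print(bump)
    for name, entries in sections.items():
        print(name, entries)
```

Wherever the code had to pick a behavior the spec does not state, that is a gap to close in the document before handing it to two teammates.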

Repeat until: two teammates, given the spec and the same commit range, produce identical changelog Markdown without consulting each other.

Completion Standard

  • Each kata completed end-to-end, in order, at least once.
  • Katas 1 and 2 repeated until completed within their time limits.
  • Kata 3 validated against a real production-like dataset (row counts on staging, not unit-test fixtures).
  • Kata 4 spec used to generate at least one real release note.