IaC Katas

Four timed katas. Each should take 30-60 minutes. Do them spaced out, not in one sitting. They exercise design taste, not just syntax, and they loop through all five clusters.

Kata 1 -- Design a Reusable VPC Module Spec

Task. Without writing any HCL, write the spec for a reusable VPC module your team would actually use. Deliver:

  • Inputs table: name, type, default, required, description. Include at least name, cidr, azs, enable_nat, tags, and one opinionated toggle.
  • Outputs table: vpc_id, public_subnet_ids, private_subnet_ids, and two others you defend.
  • Opinions section (bulleted). For each opinion, state the scenario it rules out. Example: "Always enables VPC flow logs to an S3 bucket. Rules out silent network debugging sessions that last days."
  • Non-goals section (bulleted). What the module explicitly does not handle (e.g., peering, transit gateways, IPv6).
  • Versioning rule. When is a change major? minor? patch? Give an example of each.
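
One way to make a versioning rule concrete is to express it as a check over the inputs and outputs tables. The sketch below is a minimal illustration, not a prescribed answer: the `Input` record, the `classify_bump` function, and the rule itself (removals, new required inputs, or changed types/defaults are major; additions are minor; everything else is patch) are all assumptions you are free to reject in your own spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Input:
    """One row of a module's inputs table (hypothetical shape)."""
    type: str
    required: bool
    default: Optional[str] = None


def classify_bump(old_inputs, new_inputs, old_outputs, new_outputs):
    """Return 'major', 'minor', or 'patch' for an interface change.

    Assumed convention, not the only defensible one:
      major: an existing caller could break or silently change behavior
      minor: the interface grows but existing callers are untouched
      patch: the interface is identical
    """
    removed_inputs = old_inputs.keys() - new_inputs.keys()
    added_inputs = new_inputs.keys() - old_inputs.keys()
    removed_outputs = set(old_outputs) - set(new_outputs)
    added_outputs = set(new_outputs) - set(old_outputs)

    # A new input without a default forces every caller to change.
    new_required = any(new_inputs[n].required for n in added_inputs)
    # Type, required flag, or default differs on an input that survived.
    changed = any(
        old_inputs[n] != new_inputs[n]
        for n in old_inputs.keys() & new_inputs.keys()
    )

    if removed_inputs or removed_outputs or new_required or changed:
        return "major"
    if added_inputs or added_outputs:
        return "minor"
    return "patch"


# Example: adding an optional toggle with a default.
v1 = {"name": Input("string", True), "enable_nat": Input("bool", False, "true")}
v2 = {**v1, "enable_flow_logs": Input("bool", False, "true")}
outs = ["vpc_id", "public_subnet_ids", "private_subnet_ids"]
bump = classify_bump(v1, v2, outs, outs)
```

Under this assumed rule the example is a minor bump. Note what the sketch cannot see: a behavior change behind an unchanged interface (for example, a different subnet layout for the same cidr) may still deserve a major version, which is why the kata asks for a written rule with examples and not only a mechanical check.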

Graded against Cluster 3 concepts. This is the hardest kata because it is almost entirely design.

Kata 2 -- Refactor a Monolith Root Module

Task. You inherit monolith.tf -- 600 lines, one root module, VPC + 3 services + RDS + S3 + CloudFront. Plan output is 400 lines long; nobody reads it.

Deliver a refactor plan, in writing:

  • the module boundaries you would cut (at least three), each justified with "X rate of change vs Y rate of change" logic
  • the exact moved blocks you would write to accomplish the refactor without any destroy
  • the order of PRs (first split, second split, etc.) such that every intermediate state can be applied and reviewed on its own
  • what you would deliberately not touch in this refactor, and why

No HCL is required beyond the moved blocks; draw your boundaries in prose with pseudo-addresses (e.g., module.network, module.web_service, module.data). The question you are answering: what does this codebase look like after a calm, safe refactor?

Kata 3 -- Plan a Safe Rename with moved

Task. A resource currently addressed as aws_db_instance.primary must be renamed to aws_db_instance.platform_primary and also moved into module.data. It has prevent_destroy = true and is in production.

Deliver:

  • the exact moved block(s) you would write, with commentary on each
  • the expected plan output (describe it in words: "no change to the instance; two address updates in state")
  • the PR description you would send, including the three things a reviewer should check
  • the rollback procedure if apply fails halfway
  • the reason you would not bundle any other change in this PR

Then, in a scratch environment, actually execute the plan on a stand-in resource (any renamed resource with prevent_destroy). Capture the before/after terraform state list.
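
When you run the scratch exercise, a reviewer-style check of the plan can be scripted. The sketch below assumes the plan has been exported with terraform show -json and parsed into a dict; it reads the resource_changes, change.actions, and previous_address fields of Terraform's JSON plan representation. The function name `summarize_plan` and the sample data are hypothetical, and the sample is hand-written to show the shape, not captured from a real run.

```python
def summarize_plan(plan):
    """Split a parsed `terraform show -json` plan into moves, destroys, and other changes."""
    moves, destroys, other = [], [], []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        address = rc["address"]
        previous = rc.get("previous_address")

        if "delete" in actions:
            # Covers plain destroys and both orderings of a replace.
            destroys.append(address)
        elif actions != ["no-op"]:
            other.append(address)

        if previous and previous != address:
            moves.append((previous, address))
    return {"moves": moves, "destroys": destroys, "other_changes": other}


# Hand-written sample of what a move-only plan is expected to look like.
sample = {
    "resource_changes": [
        {
            "address": "module.data.aws_db_instance.platform_primary",
            "previous_address": "aws_db_instance.primary",
            "type": "aws_db_instance",
            "change": {"actions": ["no-op"]},
        }
    ]
}

summary = summarize_plan(sample)
```

For a safe rename, `summary["destroys"]` and `summary["other_changes"]` should both be empty and `summary["moves"]` should contain exactly the address change you intended. Anything else in the summary is a reason to stop before apply.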

Kata 4 -- Write an OPA Policy Rejecting Public S3 Buckets

Task. Write a Rego policy under policies/s3.rego that denies any Terraform plan containing an S3 bucket with a public ACL or with the public-access block disabled.

Deliver:

  • policies/s3.rego -- at minimum, rules that catch:
    • aws_s3_bucket_acl.acl in {"public-read", "public-read-write", "authenticated-read"}
    • aws_s3_bucket_public_access_block with any of the four flags (block_public_acls, block_public_policy, ignore_public_acls, restrict_public_buckets) set to false
  • a policies/s3_test.rego file with positive and negative test cases, using test_ prefixed rules
  • a small fixtures/ directory with two plan JSON files: one that should pass and one that should fail (bad_plan.json in the command below)
  • a short README.md showing the exact commands to run (the eval query assumes the policy declares package terraform.s3):

    opa test ./policies
    opa eval -d policies/s3.rego -i fixtures/bad_plan.json "data.terraform.s3.deny"

  • one paragraph in the README on what this policy cannot catch (static website hosting exceptions, CloudFront-fronted buckets, etc.) and how you would extend it.
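
Before writing the Rego, it can help to pin down the shape of the input the policy will walk. The sketch below builds two minimal plan-like dicts and cross-checks them with a plain Python reference of the two rules. It is an aid for building fixtures, not a substitute for the policy: the helper names (`resource_change`, `violations`) and resource names are hypothetical, and a real plan exported with terraform show -json carries many more fields than these.

```python
PUBLIC_ACLS = {"public-read", "public-read-write", "authenticated-read"}
BLOCK_FLAGS = (
    "block_public_acls",
    "block_public_policy",
    "ignore_public_acls",
    "restrict_public_buckets",
)


def resource_change(rtype, name, after):
    """Build one minimal resource_changes entry (only the fields the rules read)."""
    return {
        "address": f"{rtype}.{name}",
        "type": rtype,
        "name": name,
        "change": {"actions": ["create"], "after": after},
    }


def violations(plan):
    """Reference check mirroring the two rules the kata asks for."""
    found = []
    for rc in plan.get("resource_changes", []):
        after = rc["change"].get("after") or {}
        if rc["type"] == "aws_s3_bucket_acl" and after.get("acl") in PUBLIC_ACLS:
            found.append(f"{rc['address']}: public ACL {after['acl']}")
        if rc["type"] == "aws_s3_bucket_public_access_block":
            for flag in BLOCK_FLAGS:
                if after.get(flag) is False:
                    found.append(f"{rc['address']}: {flag} is false")
    return found


good_plan = {
    "resource_changes": [
        resource_change("aws_s3_bucket_acl", "logs", {"acl": "private"}),
        resource_change(
            "aws_s3_bucket_public_access_block",
            "logs",
            {flag: True for flag in BLOCK_FLAGS},
        ),
    ]
}

bad_plan = {
    "resource_changes": [
        resource_change("aws_s3_bucket_acl", "site", {"acl": "public-read"}),
        resource_change(
            "aws_s3_bucket_public_access_block",
            "site",
            {**{flag: True for flag in BLOCK_FLAGS}, "block_public_policy": False},
        ),
    ]
}

good_result = violations(good_plan)
bad_result = violations(bad_plan)
```

Here `good_result` is empty and `bad_result` holds one message per rule. Writing the two dicts out with json.dump gives starting fixtures, though fixtures captured from a real scratch plan are a stronger test of the Rego, since they include the surrounding fields the policy must ignore.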

Evidence Check

You have completed all four katas when you can:

  • hand Kata 1's spec to a peer and have them build the module without a meeting
  • write a monolith refactor plan (Kata 2) that a senior engineer nods through
  • execute Kata 3's rename without a destroy and explain every line of the PR
  • produce Kata 4's OPA policy and its tests -- both positive and negative cases -- from memory
  • explain which of Clusters 1-5 each kata primarily tests, and why