
Semester 9 Checkpoint Gate

Required Output Classification

| Required output | Classification | Public/private guidance |
| --- | --- | --- |
| Closed-book prompts, self-assessment answers, and skills matrices | Practice artifact | Use for honest calibration; do not publish raw answers unless rewritten as a study guide. |
| Required evidence gate items, sign-off checklist, and readiness decision | Checkpoint evidence | Keep as private progression evidence; share only sanitized summaries with mentors or reviewers. |
| Repair artifacts produced after a weak checkpoint, such as corrected solutions, diagrams, traces, benchmarks, or runbooks | Checkpoint evidence | Store beside the checkpoint so the remediation trail is inspectable without making mistakes public. |
| Reviewer notes or mentor feedback that materially improve a project artifact | Portfolio candidate | Convert into public-safe acknowledgements or changelog entries only after removing private feedback context. |

Pass this self-assessment before treating Semester 9 as complete and before starting the Semester 10 capstone. The capstone assumes you can deploy, operate, and defend production systems without hand-holding; if the evidence below is missing or weak, the capstone will expose it painfully. A go decision requires at least six of the nine skills-matrix rows marked "Y", clean from-memory answers on both prompts, and no more than two items in the remediation plan.
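The go rule above is simple enough to check mechanically. The sketch below is one possible encoding, not part of the course tooling; the `CheckpointResult` record and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckpointResult:
    """Hypothetical record of one self-assessment attempt."""
    matrix_passes: List[bool]   # one entry per skills-matrix row, True for "Y"
    prompts_clean: List[bool]   # one entry per explain-from-memory prompt
    remediation_items: int      # number of rows filled in the remediation plan


def is_go(result: CheckpointResult) -> bool:
    """Apply the stated thresholds: >= 6 "Y" rows, both prompts clean, <= 2 remediation items."""
    enough_passes = sum(result.matrix_passes) >= 6
    both_prompts = len(result.prompts_clean) == 2 and all(result.prompts_clean)
    small_plan = result.remediation_items <= 2
    return enough_passes and both_prompts and small_plan


example = CheckpointResult(
    matrix_passes=[True] * 7 + [False] * 2,
    prompts_clean=[True, True],
    remediation_items=2,
)
print(is_go(example))  # True
```

A result with only five "Y" rows, one weak prompt, or three remediation items returns `False`, which matches the rule as written.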


Entry Criteria

  • All five module quizzes passed (≥ 80%) and module Feynman notes written.
  • Semester 9 project deployed or reproduced on the selected local-first/cloud-sandbox track, reviewed, and linked below.
  • Budget ceiling, billing-alert evidence for any real cloud deployment, and teardown checklist are complete.
  • Spaced-repetition decks for S9M1-M5 created and at least one full review cycle completed.

Skills Verification Matrix

| Skill | Can you do this without looking it up? | Evidence (link / note) | Pass (Y/N) |
| --- | --- | --- | --- |
| Draw the shared-responsibility boundary for your project's compute and data layers and name what the provider owns vs what you own | | Shared-responsibility diagram in library/raw/architecture.md | |
| Explain the VPC topology of your project: subnets, routing, NAT, and which resources live where, in under 5 minutes on a whiteboard | | VPC diagram + terraform/network/ module | |
| Apply an IaC change safely end to end: branch, local terraform validate/static scan, terraform plan reviewed in PR, apply with locking if using cloud, drift check after | | PR link with validation output, posted plan, and successful local run or apply run | |
| Write or critique a reusable Terraform module interface (inputs, outputs, versioning) without copying from another project | | Module in terraform/modules/ + README | |
| Describe the Kubernetes objects you actually used (Deployment, Service, Ingress, ConfigMap, Secret, HPA) and why each one fits, first proven on kind/minikube/k3d before managed Kubernetes | | Manifests or Helm chart in k8s/ plus local-cluster evidence | |
| Walk through every stage of your CI/CD pipeline, name the failure mode each stage catches, and execute a rollback in under 5 minutes | | Workflow file + rollback demo recording or PR | |
| Enumerate the cloud security controls you implemented (IAM roles, secrets manager, OIDC for CI, encryption at rest/in transit) and point at the code for each | | library/raw/security-review.md with links | |
| Tie a user request end to end across your observability stack (log line ↔ trace span ↔ metric) using one request ID, using local OpenTelemetry/logging tooling if paid services are not needed | | Dashboard screenshot + example trace URL or local trace export | |
| Define your SLIs/SLOs precisely and show what happens when the error budget is spent | | library/raw/slo.md + alert policy file | |
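For the SLI/SLO row, it can help to show the error-budget arithmetic explicitly. The sketch below assumes a request-based availability SLI; the 99.9% target and the request counts are made-up illustration values, not figures from the project.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Return allowed failures, remaining budget, and whether the budget is spent.

    slo_target is a fraction, e.g. 0.999 for a 99.9% availability SLO.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    # The budget is the share of requests the SLO allows to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests
    return {
        "allowed_failures": allowed_failures,
        "remaining": remaining,
        "budget_spent": remaining <= 0,
    }


# Hypothetical 30-day window: 1,000,000 requests against a 99.9% target.
healthy = error_budget(0.999, 1_000_000, 400)
burned = error_budget(0.999, 1_000_000, 1_200)
print(healthy["budget_spent"], burned["budget_spent"])  # False True
```

What follows from `budget_spent` being true (for example, pausing feature releases) is a policy choice that belongs in library/raw/slo.md; the code only makes the threshold visible.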

Explain-From-Memory Prompts

Do these with a pen, a blank page, and a timer. No tabs open.

  1. Draw your full deployment topology from memory. Local-first: Docker Compose network, kind/minikube/k3d cluster, services, database, ingress, mock services, CI/CD path, and any local secret flow. Cloud sandbox: VPC, subnets across AZs, managed cluster, services, database, load balancer, CI/CD path, and the IAM trust relationship between GitHub Actions and the cloud. Annotate where secrets flow and where TLS starts and ends.
  2. Describe, step by step, one incident your dashboard would detect: the alert that fires, the first three places you look, the most likely root causes, and what the rollback or remediation action is. Name the runbook section you would follow and what you would update in it after the incident.

Cost and Safety Gate

A checkpoint cannot pass until these guardrails are evidenced:

  • Budget ceiling: the project declares the active ceiling (default ≤ $50 total for cloud sandbox; $0 cloud spend for local-first).
  • Billing alert requirement: any real cloud deployment has budget/billing alerts configured before resources are created.
  • Required teardown step: every cloud apply/deploy has a linked teardown command or checklist entry, with final state verified.
  • No long-lived paid resources: any paid resource kept overnight has an owner, expiration date, reason, and alert coverage; otherwise it is destroyed.
  • Track fit: local-first work uses Docker Compose, kind/minikube/k3d, local Terraform validation, mock services, and local observability; cloud sandbox work uses least-privilege IAM, short-lived resources, and documented teardown.
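The guardrails above can also be expressed as a small pre-flight check. This is a minimal sketch under stated assumptions: the `PaidResource` record, the function name, and the track labels are hypothetical, and spend figures would have to come from your own billing export.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PaidResource:
    """A paid resource left running overnight (hypothetical record shape)."""
    name: str
    owner: Optional[str] = None
    expires: Optional[date] = None
    reason: Optional[str] = None
    has_alert: bool = False


def guardrail_violations(
    track: str,                 # "local-first" or "cloud-sandbox" (assumed labels)
    ceiling_usd: float,         # the ceiling the project declares
    spend_usd: float,           # cloud spend so far
    billing_alert_set: bool,
    teardown_verified: bool,
    overnight: List[PaidResource],
) -> List[str]:
    """Return a list of guardrail problems; an empty list means the gate is evidenced."""
    problems = []
    if track == "local-first" and spend_usd > 0:
        problems.append("local-first track must have $0 cloud spend")
    if spend_usd > ceiling_usd:
        problems.append(f"spend ${spend_usd:.2f} exceeds ceiling ${ceiling_usd:.2f}")
    if track == "cloud-sandbox" and not billing_alert_set:
        problems.append("billing alert missing for cloud deployment")
    if track == "cloud-sandbox" and not teardown_verified:
        problems.append("teardown not verified")
    for res in overnight:
        if not (res.owner and res.expires and res.reason and res.has_alert):
            problems.append(
                f"{res.name}: destroy it or record owner, expiration date, reason, and alert coverage"
            )
    return problems


issues = guardrail_violations(
    track="cloud-sandbox",
    ceiling_usd=50.0,
    spend_usd=12.40,
    billing_alert_set=True,
    teardown_verified=True,
    overnight=[PaidResource(name="demo-db", owner="me", reason="soak test")],
)
print(issues)
```

In this example the only finding is `demo-db`, which has an owner and a reason but no expiration date or alert coverage.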

Go / No-Go Questions

Answer each aloud or in writing; any "no" means work to do before the capstone.

  1. If I delete my cloud account tomorrow and clone my repo, can I still reproduce the local-first environment, or stand up the cloud sandbox from a clean terraform apply with no console clicks?
  2. If my CI secret leaked right now, what is the blast radius and what is the longest-lived credential that would still be valid tomorrow?
  3. Can I name the top three cost drivers (or the cost drivers I avoided), prove my budget ceiling and billing alerts existed before any cloud deploy, and show one optimization experiment with a measurable before/after result?
  4. If the api pods start crash-looping at 03:00, do I have an alert, a dashboard, and a runbook that would get a teammate (not me) to recovery?
  5. Could another engineer review my IaC and pipeline without me in the room and tell me what is wrong?
  6. Have I practiced a rollback on purpose, against a deliberately broken deploy, timed how long it took, and completed teardown for any paid resources?
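Question 2 is easier to answer from an inventory than from memory. The sketch below assumes you keep a simple mapping of credential names to expiry times; the credential names and lifetimes are invented for illustration, and `None` stands for a credential that never expires.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def still_valid_tomorrow(
    credentials: Dict[str, Optional[datetime]], now: datetime
) -> List[str]:
    """Names of credentials still usable 24 hours after `now`, longest-lived first.

    An expiry of None means the credential never expires.
    """
    cutoff = now + timedelta(hours=24)
    survivors = [
        (name, exp) for name, exp in credentials.items() if exp is None or exp > cutoff
    ]

    def lifetime_key(item):
        _, exp = item
        # Non-expiring credentials sort first, then later expiry before earlier.
        return (0, 0.0) if exp is None else (1, -exp.timestamp())

    return [name for name, _ in sorted(survivors, key=lifetime_key)]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
inventory = {
    "oidc-token": now + timedelta(hours=1),       # short-lived, minted per CI run
    "registry-token": now + timedelta(days=30),
    "deploy-key": None,                           # static key with no expiry
}
print(still_valid_tomorrow(inventory, now))  # ['deploy-key', 'registry-token']
```

The first name in the result is the longest-lived credential, which is usually the first one to rotate or replace with a short-lived equivalent.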

Remediation Plan

Fill in only for rows scored "N" in the matrix above or any "no" on the go/no-go list.

| Gap | Remediation |
| --- | --- |
| E.g., "Cannot explain my IAM policy for the CI role" | Re-read S9M1 Cluster 5 + S9M5 identity concepts; rewrite the policy from scratch, then diff against the current one |
| E.g., "No runbook entry exists for api latency alert" | Draft a runbook entry with first three diagnostic steps; fire the alert on purpose and follow your own runbook |
| E.g., "Terraform state is local, not remote" | If cloud sandbox: bootstrap S3 + DynamoDB lock backend and migrate state with terraform init -migrate-state; if local-first: document why local state is acceptable and protect it from accidental commits |
| E.g., "No billing alert before EKS test" | Stop provisioning, configure budget/billing alerts, rerun the smallest local kind test, then recreate the cloud sandbox only with a teardown timer |
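For the local-state case, "protect it from accidental commits" can be checked with a few lines of code. This is a rough sketch: the pattern list is an assumption based on commonly used Terraform ignore rules, and the check is an exact line match rather than a full gitignore parser.

```python
from typing import List

# Assumed minimum set of ignore rules for local Terraform state and provider caches.
REQUIRED_PATTERNS = ("*.tfstate", "*.tfstate.*", ".terraform/")


def missing_ignore_patterns(gitignore_text: str) -> List[str]:
    """Return the required patterns that do not appear as a line in the .gitignore text."""
    present = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [pattern for pattern in REQUIRED_PATTERNS if pattern not in present]


sample = "# Terraform\n*.tfstate\n.terraform/\n"
print(missing_ignore_patterns(sample))  # ['*.tfstate.*']
```

To use it against a real repository, pass in the contents of your own .gitignore; an empty result means all three assumed patterns are present.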

Added Gate Evidence

Bring these delivery-discipline artifacts:

  • sprint plan or iteration plan tied to the deployed project
  • branch and release policy with merge, rebase, hotfix, and rollback rules
  • SQA checklist mapped to CI/CD gates
  • maintenance/support note for one realistic post-release issue
  • one paragraph connecting process choice to quality gates, rollback, and operations

Sign-Off

  • Date completed: YYYY-MM-DD
  • Honest self-rating (1-5) across matrix: N
  • Top gap carried into the capstone: one sentence
  • Ready for Semester 10 (Capstone): Y / N -- one-sentence justification tying the decision to concrete evidence above

Mastery Rubric

| Level | Evidence |
| --- | --- |
| Beginner pass | Can answer direct questions and complete familiar exercises with light notes. |
| Solid pass | Can solve new variants, explain choices, and connect the work to Semester 8 System Design and Technical Leadership. |
| Strong pass | Can defend tradeoffs, identify failure modes, and produce clean evidence in the portfolio artifact. |
| Not ready | Relies on copied solutions, cannot explain mistakes, or lacks durable artifacts. |

Retake and Repair Rule

If a section is weak, rereading alone is not enough. Repair it by producing new evidence: a corrected solution, a fresh implementation, a rewritten proof, a benchmark, a diagram, a runbook, or a short teaching note.


Answer-Quality Examples

Use these examples when grading written answers or spoken explanations.

| Quality | Example pattern |
| --- | --- |
| Weak | Names a concept but gives no example, constraint, or failure case. |
| Acceptable | Defines the concept and applies it to a familiar exercise. |
| Strong | Applies the concept to a new variant and explains why an alternative would fail. |
| Portfolio-ready | Connects the concept to Semester 8 System Design and Technical Leadership, current project evidence, and a future capstone decision. |

Interleaving Prompt

For any missed answer, add one sentence starting with: This depends on an earlier skill because...

Calibration Materials

Use these learner-visible calibration materials before self-grading or requesting review: