Semester 9 Checkpoint Gate
Required Output Classification
| Required output | Classification | Public/private guidance |
|---|---|---|
| Closed-book prompts, self-assessment answers, and skills matrices | Practice artifact | Use for honest calibration; do not publish raw answers unless rewritten as a study guide. |
| Required evidence gate items, sign-off checklist, and readiness decision | Checkpoint evidence | Keep as private progression evidence; share only sanitized summaries with mentors or reviewers. |
| Repair artifacts produced after a weak checkpoint, such as corrected solutions, diagrams, traces, benchmarks, or runbooks | Checkpoint evidence | Store beside the checkpoint so the remediation trail is inspectable without making mistakes public. |
| Reviewer notes or mentor feedback that materially improve a project artifact | Portfolio candidate | Convert into public-safe acknowledgements or changelog entries only after removing private feedback context. |
Pass this self-assessment before treating Semester 9 as complete and before starting the Semester 10 capstone. The capstone assumes you can deploy, operate, and defend production systems without hand-holding; if the evidence below is missing or weak, the capstone will expose it painfully. A go decision requires at least six "Y" rows in the matrix, clean from-memory answers on both prompts, and no more than two items in the remediation plan.
Entry Criteria
- All five module quizzes passed (≥ 80%) and module Feynman notes written.
- Semester 9 project deployed or reproduced on the selected local-first/cloud-sandbox track, reviewed, and linked below.
- Budget ceiling, billing-alert evidence for any real cloud deployment, and teardown checklist are complete.
- Spaced-repetition decks for S9M1-M5 created and at least one full review cycle completed.
Skills Verification Matrix
| Skill | Can you do this without looking it up? | Evidence (link / note) | Pass (Y/N) |
|---|---|---|---|
| Draw the shared-responsibility boundary for your project's compute and data layers and name what the provider owns vs what you own | Shared-responsibility diagram in library/raw/architecture.md | ||
| Explain the VPC topology of your project: subnets, routing, NAT, and which resources live where, in under 5 minutes on a whiteboard | VPC diagram + terraform/network/ module | ||
Apply an IaC change safely end to end: branch, local terraform validate/static scan, terraform plan reviewed in PR, apply with locking if using cloud, drift check after | PR link with validation output, posted plan, and successful local run or apply run | ||
| Write or critique a reusable Terraform module interface (inputs, outputs, versioning) without copying from another project | Module in terraform/modules/ + README | ||
Describe the Kubernetes objects you actually used (Deployment, Service, Ingress, ConfigMap, Secret, HPA) and why each one fits, first proven on kind/minikube/k3d before managed Kubernetes | Manifests or Helm chart in k8s/ plus local-cluster evidence | ||
| Walk through every stage of your CI/CD pipeline, name the failure mode each stage catches, and execute a rollback in under 5 minutes | Workflow file + rollback demo recording or PR | ||
| Enumerate the cloud security controls you implemented (IAM roles, secrets manager, OIDC for CI, encryption at rest/in transit) and point at the code for each | library/raw/security-review.md with links | ||
| Tie a user request end to end across your observability stack (log line ↔ trace span ↔ metric) using one request ID, using local OpenTelemetry/logging tooling if paid services are not needed | Dashboard screenshot + example trace URL or local trace export | ||
| Define your SLIs/SLOs precisely and show what happens when the error budget is spent | library/raw/slo.md + alert policy file |
Explain-From-Memory Prompts
Do these with a pen, a blank page, and a timer. No tabs open.
- Draw your full deployment topology from memory. Local-first: Docker Compose network, kind/minikube/k3d cluster, services, database, ingress, mock services, CI/CD path, and any local secret flow. Cloud sandbox: VPC, subnets across AZs, managed cluster, services, database, load balancer, CI/CD path, and the IAM trust relationship between GitHub Actions and the cloud. Annotate where secrets flow and where TLS starts and ends.
- Describe, step by step, one incident your dashboard would detect: the alert that fires, the first three places you look, the most likely root causes, and what the rollback or remediation action is. Name the runbook section you would follow and what you would update in it after the incident.
Cost and Safety Gate
A checkpoint cannot pass until these guardrails are evidenced:
- Budget ceiling: the project declares the active ceiling (default ≤ $50 total for cloud sandbox; $0 cloud spend for local-first).
- Billing alert requirement: any real cloud deployment has budget/billing alerts configured before resources are created.
- Required teardown step: every cloud apply/deploy has a linked teardown command or checklist entry, with final state verified.
- No long-lived paid resources: any paid resource kept overnight has an owner, expiration date, reason, and alert coverage; otherwise it is destroyed.
- Track fit: local-first work uses Docker Compose, kind/minikube/k3d, local Terraform validation, mock services, and local observability; cloud sandbox work uses least-privilege IAM, short-lived resources, and documented teardown.
Go / No-Go Questions
Answer each aloud or in writing; any "no" means work to do before the capstone.
- If I delete my cloud account tomorrow and clone my repo, can I still reproduce the local-first environment, or stand up the cloud sandbox from a clean
terraform applywith no console clicks? - If my CI secret leaked right now, what is the blast radius and what is the longest-lived credential that would still be valid tomorrow?
- Can I name the top three cost drivers or cost drivers avoided, prove my budget ceiling and billing alerts existed before any cloud deploy, and show one optimization experiment with measurable before/after?
- If the
apipods start crash-looping at 03:00, do I have an alert, a dashboard, and a runbook that would get a teammate (not me) to recovery? - Could another engineer review my IaC and pipeline without me in the room and tell me what is wrong?
- Have I practiced a rollback on purpose, against a deliberately broken deploy, timed how long it took, and completed teardown for any paid resources?
Remediation Plan
Fill in only for rows scored "N" in the matrix above or any "no" on the go/no-go list.
| Gap | Remediation |
|---|---|
| E.g., "Cannot explain my IAM policy for the CI role" | Re-read S9M1 Cluster 5 + S9M5 identity concepts; rewrite the policy from scratch, then diff against the current one |
E.g., "No runbook entry exists for api latency alert" | Draft a runbook entry with first three diagnostic steps; fire the alert on purpose and follow your own runbook |
| E.g., "Terraform state is local, not remote" | If cloud sandbox: bootstrap S3 + DynamoDB lock backend and migrate state with terraform init -migrate-state; if local-first: document why local state is acceptable and protect it from accidental commits |
| E.g., "No billing alert before EKS test" | Stop provisioning, configure budget/billing alerts, rerun the smallest local kind test, then recreate the cloud sandbox only with a teardown timer |
Added Gate Evidence
Bring these delivery-discipline artifacts:
- sprint plan or iteration plan tied to the deployed project
- branch and release policy with merge, rebase, hotfix, and rollback rules
- SQA checklist mapped to CI/CD gates
- maintenance/support note for one realistic post-release issue
- one paragraph connecting process choice to quality gates, rollback, and operations
Sign-Off
- Date completed: YYYY-MM-DD
- Honest self-rating (1-5) across matrix: N
- Top gap carried into the capstone: one sentence
- Ready for Semester 10 (Capstone): Y / N -- one-sentence justification tying the decision to concrete evidence above
Mastery Rubric
| Level | Evidence |
|---|---|
| Beginner pass | Can answer direct questions and complete familiar exercises with light notes. |
| Solid pass | Can solve new variants, explain choices, and connect the work to Semester 8 System Design and Technical Leadership. |
| Strong pass | Can defend tradeoffs, identify failure modes, and produce clean evidence in the portfolio artifact. |
| Not ready | Relies on copied solutions, cannot explain mistakes, or lacks durable artifacts. |
Retake and Repair Rule
If a section is weak, do not only reread. Repair it by producing new evidence: a corrected solution, a fresh implementation, a rewritten proof, a benchmark, a diagram, a runbook, or a short teaching note.
Answer-Quality Examples
Use these examples when grading written answers or spoken explanations.
| Quality | Example pattern |
|---|---|
| Weak | Names a concept but gives no example, constraint, or failure case. |
| Acceptable | Defines the concept and applies it to a familiar exercise. |
| Strong | Applies the concept to a new variant and explains why an alternative would fail. |
| Portfolio-ready | Connects the concept to Semester 8 System Design and Technical Leadership, current project evidence, and a future capstone decision. |
Interleaving Prompt
For any missed answer, add one sentence starting with: This depends on an earlier skill because...
Calibration Materials
Use these learner-visible calibration materials before self-grading or requesting review: