Semester 9 Checkpoint Gate

Required Output Classification

Required output	Classification	Public/private guidance
Closed-book prompts, self-assessment answers, and skills matrices	`Practice artifact`	Use for honest calibration; do not publish raw answers unless rewritten as a study guide.
Required evidence gate items, sign-off checklist, and readiness decision	`Checkpoint evidence`	Keep as private progression evidence; share only sanitized summaries with mentors or reviewers.
Repair artifacts produced after a weak checkpoint, such as corrected solutions, diagrams, traces, benchmarks, or runbooks	`Checkpoint evidence`	Store beside the checkpoint so the remediation trail is inspectable without making mistakes public.
Reviewer notes or mentor feedback that materially improve a project artifact	`Portfolio candidate`	Convert into public-safe acknowledgements or changelog entries only after removing private feedback context.

Pass this self-assessment before treating Semester 9 as complete and before starting the Semester 10 capstone. The capstone assumes you can deploy, operate, and defend production systems without hand-holding; if the evidence below is missing or weak, the capstone will expose it painfully. A go decision requires at least six of the nine matrix rows marked "Y", clean from-memory answers on both prompts, and no more than two items in the remediation plan.
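The go rule above is a conjunction of three checks. A minimal sketch in Python, assuming hypothetical names (`go_decision`, `matrix_passes`, `prompts_clean`, `remediation_items`) that are not part of the course material:

```python
def go_decision(matrix_passes, prompts_clean, remediation_items):
    """Return True only when all three go conditions from the gate hold.

    matrix_passes: list of booleans, one per Skills Verification Matrix row
    prompts_clean: list of booleans, one per explain-from-memory prompt
    remediation_items: number of entries in the remediation plan
    (All parameter names are illustrative assumptions.)
    """
    enough_rows = sum(matrix_passes) >= 6  # at least six "Y" rows
    both_prompts = len(prompts_clean) == 2 and all(prompts_clean)
    small_plan = remediation_items <= 2  # no more than two open gaps
    return enough_rows and both_prompts and small_plan


if __name__ == "__main__":
    # Seven "Y" rows, both prompts clean, one remediation item.
    print(go_decision([True] * 7 + [False] * 2, [True, True], 1))
```

The function only encodes the numeric thresholds; whether a row honestly deserves a "Y" is still a judgment made against the linked evidence.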

Entry Criteria

All five module quizzes passed (≥ 80%) and module Feynman notes written.
Semester 9 project deployed or reproduced on the selected local-first/cloud-sandbox track, reviewed, and linked below.
Budget ceiling, billing-alert evidence for any real cloud deployment, and teardown checklist are complete.
Spaced-repetition decks for S9M1-M5 created and at least one full review cycle completed.

Skills Verification Matrix

Skill	Can you do this without looking it up?	Evidence (link / note)	Pass (Y/N)
Draw the shared-responsibility boundary for your project's compute and data layers and name what the provider owns vs what you own		Shared-responsibility diagram in `library/raw/architecture.md`
Explain the VPC topology of your project: subnets, routing, NAT, and which resources live where, in under 5 minutes on a whiteboard		VPC diagram + `terraform/network/` module
Apply an IaC change safely end to end: branch, local `terraform validate`/static scan, `terraform plan` reviewed in PR, `apply` with locking if using cloud, drift check after		PR link with validation output, posted plan, and successful local run or apply run
Write or critique a reusable Terraform module interface (inputs, outputs, versioning) without copying from another project		Module in `terraform/modules/` + README
Describe the Kubernetes objects you actually used (`Deployment`, `Service`, `Ingress`, `ConfigMap`, `Secret`, `HPA`) and why each one fits, first proven on kind/minikube/k3d before managed Kubernetes		Manifests or Helm chart in `k8s/` plus local-cluster evidence
Walk through every stage of your CI/CD pipeline, name the failure mode each stage catches, and execute a rollback in under 5 minutes		Workflow file + rollback demo recording or PR
Enumerate the cloud security controls you implemented (IAM roles, secrets manager, OIDC for CI, encryption at rest/in transit) and point at the code for each		`library/raw/security-review.md` with links
Correlate one user request end to end across your observability stack (log line ↔ trace span ↔ metric) using a single request ID; local OpenTelemetry/logging tooling is enough if paid services are not needed		Dashboard screenshot + example trace URL or local trace export
Define your SLIs/SLOs precisely and show what happens when the error budget is spent		`library/raw/slo.md` + alert policy file
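For the SLI/SLO row above, the error budget arithmetic can be sketched as follows. This is an illustration only, assuming a request-based availability SLI; the function name `error_budget` and its parameters are hypothetical, and what happens once the budget is spent (for example a release freeze) is a policy you define in your own SLO document.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Compute a request-based error budget for one SLO window.

    slo_target: fraction of requests that must succeed, e.g. 0.999
    total_requests: requests served in the window
    failed_requests: requests that violated the SLI in the window
    (Names and the request-based SLI are assumptions for illustration.)
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    allowed = (1 - slo_target) * total_requests  # failures the SLO tolerates
    remaining = allowed - failed_requests
    return {
        "allowed_failures": allowed,
        "remaining": remaining,
        "spent": failed_requests >= allowed,  # budget exhausted or exceeded
    }


if __name__ == "__main__":
    print(error_budget(0.999, 1_000_000, 400))
```

With a 99.9% target over one million requests, the budget is roughly one thousand failed requests; the `spent` flag is the condition your alert policy would act on.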

Explain-From-Memory Prompts

Do these with a pen, a blank page, and a timer. No tabs open.

Draw your full deployment topology from memory. Local-first: Docker Compose network, kind/minikube/k3d cluster, services, database, ingress, mock services, CI/CD path, and any local secret flow. Cloud sandbox: VPC, subnets across AZs, managed cluster, services, database, load balancer, CI/CD path, and the IAM trust relationship between GitHub Actions and the cloud. Annotate where secrets flow and where TLS starts and ends.
Describe, step by step, one incident your dashboard would detect: the alert that fires, the first three places you look, the most likely root causes, and what the rollback or remediation action is. Name the runbook section you would follow and what you would update in it after the incident.

Cost and Safety Gate

A checkpoint cannot pass until these guardrails are evidenced:

Budget ceiling: the project declares the active ceiling (default ≤ $50 total for cloud sandbox; $0 cloud spend for local-first).
Billing alert requirement: any real cloud deployment has budget/billing alerts configured before resources are created.
Required teardown step: every cloud apply/deploy has a linked teardown command or checklist entry, with final state verified.
No long-lived paid resources: any paid resource kept overnight has an owner, expiration date, reason, and alert coverage; otherwise it is destroyed.
Track fit: local-first work uses Docker Compose, kind/minikube/k3d, local Terraform validation, mock services, and local observability; cloud sandbox work uses least-privilege IAM, short-lived resources, and documented teardown.
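Two of the guardrails above lend themselves to a mechanical check: the budget ceiling per track and the overnight rule for paid resources. A minimal sketch, assuming a hypothetical `PaidResource` record and the track labels `"local-first"` and `"cloud-sandbox"`; none of these names come from the course tooling.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PaidResource:
    """Hypothetical inventory record for one paid cloud resource."""
    name: str
    owner: Optional[str] = None
    expires: Optional[date] = None
    reason: Optional[str] = None
    has_alert: bool = False


def overnight_violations(resources):
    """Return names of resources missing owner, expiry, reason, or alert."""
    return [
        r.name
        for r in resources
        if not (r.owner and r.expires and r.reason and r.has_alert)
    ]


def within_ceiling(spend_usd, track):
    """Check spend against the default ceilings stated in the gate."""
    ceilings = {"local-first": 0.0, "cloud-sandbox": 50.0}
    if track not in ceilings:
        raise ValueError(f"unknown track: {track}")
    return spend_usd <= ceilings[track]


if __name__ == "__main__":
    inventory = [
        PaidResource("nat-gw", "me", date(2030, 1, 1), "soak test", True),
        PaidResource("eks-test", owner="me"),
    ]
    print(overnight_violations(inventory))
```

Any name returned by `overnight_violations` is a resource that, under the rule above, should be destroyed rather than kept overnight.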

Go / No-Go Questions

Answer each aloud or in writing; any "no" means work to do before the capstone.

If I delete my cloud account tomorrow and clone my repo, can I still reproduce the local-first environment, or stand up the cloud sandbox from a clean `terraform apply` with no console clicks?
If my CI secret leaked right now, what is the blast radius and what is the longest-lived credential that would still be valid tomorrow?
Can I name the top three cost drivers (or the cost drivers I avoided), prove my budget ceiling and billing alerts existed before any cloud deploy, and show one optimization experiment with a measurable before/after?
If the `api` pods start crash-looping at 03:00, do I have an alert, a dashboard, and a runbook that would get a teammate (not me) to recovery?
Could another engineer review my IaC and pipeline without me in the room and tell me what is wrong?
Have I practiced a rollback on purpose, against a deliberately broken deploy, timed how long it took, and completed teardown for any paid resources?
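The leaked-secret question above asks which credentials would still be valid tomorrow. One way to reason about it is to list every credential the CI system can reach with its expiry and filter by a 24-hour cutoff. The sketch below assumes a hypothetical inventory (a dict of credential name to expiry, with `None` meaning the credential never expires); the credential names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def still_valid_tomorrow(credentials, now):
    """Return sorted names of credentials still valid 24 hours after `now`.

    credentials: dict mapping a credential name to its expiry datetime,
                 or to None when the credential has no expiry at all
    now: timezone-aware datetime used as the reference point
    """
    cutoff = now + timedelta(hours=24)
    return sorted(
        name
        for name, expiry in credentials.items()
        if expiry is None or expiry > cutoff
    )


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    inventory = {  # hypothetical credentials reachable from CI
        "oidc-token": now + timedelta(hours=1),
        "deploy-key": None,
        "registry-pat": now + timedelta(days=30),
    }
    print(still_valid_tomorrow(inventory, now))
```

Anything this returns is part of the blast radius that outlives the incident window; short-lived OIDC tokens drop out of the list, which is the argument for preferring them over static keys.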

Remediation Plan

Fill in only for rows scored "N" in the matrix above or any "no" on the go/no-go list.

Gap	Remediation
E.g., "Cannot explain my IAM policy for the CI role"	Re-read S9M1 Cluster 5 + S9M5 identity concepts; rewrite the policy from scratch, then diff against the current one
E.g., "No runbook entry exists for `api` latency alert"	Draft a runbook entry with first three diagnostic steps; fire the alert on purpose and follow your own runbook
E.g., "Terraform state is local, not remote"	If cloud sandbox: bootstrap S3 + DynamoDB lock backend and migrate state with `terraform init -migrate-state`; if local-first: document why local state is acceptable and protect it from accidental commits
E.g., "No billing alert before EKS test"	Stop provisioning, configure budget/billing alerts, rerun the smallest local kind test, then recreate the cloud sandbox only with a teardown timer

Added Gate Evidence

Bring these delivery-discipline artifacts:

sprint plan or iteration plan tied to the deployed project
branch and release policy with merge, rebase, hotfix, and rollback rules
SQA checklist mapped to CI/CD gates
maintenance/support note for one realistic post-release issue
one paragraph connecting process choice to quality gates, rollback, and operations

Sign-Off

Date completed: YYYY-MM-DD
Honest self-rating (1-5) across matrix: N
Top gap carried into the capstone: one sentence
Ready for Semester 10 (Capstone): Y / N -- one-sentence justification tying the decision to concrete evidence above

Mastery Rubric

Level	Evidence
Beginner pass	Can answer direct questions and complete familiar exercises with light notes.
Solid pass	Can solve new variants, explain choices, and connect the work to Semester 8 System Design and Technical Leadership.
Strong pass	Can defend tradeoffs, identify failure modes, and produce clean evidence in the portfolio artifact.
Not ready	Relies on copied solutions, cannot explain mistakes, or lacks durable artifacts.

Retake and Repair Rule

If a section is weak, rereading alone is not enough. Repair it by producing new evidence: a corrected solution, a fresh implementation, a rewritten proof, a benchmark, a diagram, a runbook, or a short teaching note.

Answer-Quality Examples

Use these examples when grading written answers or spoken explanations.

Quality	Example pattern
Weak	Names a concept but gives no example, constraint, or failure case.
Acceptable	Defines the concept and applies it to a familiar exercise.
Strong	Applies the concept to a new variant and explains why an alternative would fail.
Portfolio-ready	Connects the concept to Semester 8 System Design and Technical Leadership, current project evidence, and a future capstone decision.

Interleaving Prompt

For any missed answer, add one sentence starting with: This depends on an earlier skill because...

Calibration Materials

Use these learner-visible calibration materials before self-grading or requesting review:

Required Output Classification
Entry Criteria
Skills Verification Matrix
Explain-From-Memory Prompts
Cost and Safety Gate
Go / No-Go Questions
Remediation Plan
Added Gate Evidence
Sign-Off
Mastery Rubric
Retake and Repair Rule
Answer-Quality Examples
Interleaving Prompt
