Semester 8 Exam

Required Output Classification

Required output	Classification	Public/private guidance
Timed written answers, diagrams, code snippets, and design responses	`Checkpoint evidence`	Keep raw exam work private so it remains useful for assessment and retake calibration.
Post-exam review notes, missed-answer repairs, and Feynman explanations	`Practice artifact`	Use for spaced review; publish only rewritten explanations that no longer reveal exam solutions wholesale.
Capstone-defense or architecture-defense packets created from exam prompts	`Portfolio candidate`	Polish publicly only when they are original to your project, sanitized, and framed as engineering rationale rather than exam answers.

This is a timed, written exam that simulates a staff-level internal technical evaluation. The grader cares about reasoning (framing, numbers, named tradeoffs) far more than trivia recall.

Instructions

Duration: 2.5 hours. Closed book for design and leadership answers; you may consult a one-page personal notes sheet for latency numbers, unit conversions, and acronym definitions only. For design questions, show your reasoning: requirements, estimates, decisions, and tradeoffs with the rejected alternative named. Answer in the order given; do not spend more than 25% of the time (about 37 minutes) on any single section.

Section A: System Design & Methodology

Short-answer and small design-fragment questions covering the methodology spine from Module 1. Aim for tight, structured answers over prose essays.

You are asked on a whiteboard to "design a URL shortener." Write the first 10 minutes of your response: functional requirements, three non-functional targets with concrete numbers, three constraints, and a ranked list of the two hardest parts of the problem.
A product manager says the system will have 200 million daily active users, each making 20 reads per day with an average response size of 2 KB. Estimate average read QPS, peak read QPS (assume a 3x peak factor over average), daily egress in TB, and the cache memory needed to hold the top 20% hot items if the working set is 500 million objects at 2 KB each. Show the arithmetic.
Define the difference between a bottleneck and a single point of failure using an example from a typical three-tier web application. State one scaling action that fixes a bottleneck but not a SPOF, and one that fixes a SPOF but not a bottleneck.
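
The shape of a back-of-envelope estimate like the one above can be sketched in a few lines. This is a minimal sketch, not a reference answer: the function name `estimate` and every input value are hypothetical and deliberately differ from the exam prompt, and it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes).

```python
# Back-of-envelope estimation sketch.
# All inputs are hypothetical and intentionally differ from the exam prompt.
SECONDS_PER_DAY = 86_400


def estimate(dau, reads_per_user, response_bytes, peak_factor,
             working_set_objects, object_bytes, hot_fraction):
    """Return average QPS, peak QPS, daily egress (TB), and hot-set cache size (GB)."""
    daily_reads = dau * reads_per_user
    avg_qps = daily_reads / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    # Decimal units are assumed: 1 TB = 1e12 bytes, 1 GB = 1e9 bytes.
    egress_tb_per_day = daily_reads * response_bytes / 1e12
    cache_gb = working_set_objects * hot_fraction * object_bytes / 1e9
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "egress_tb_per_day": egress_tb_per_day,
        "cache_gb": cache_gb,
    }


result = estimate(
    dau=10_000_000,            # hypothetical daily active users
    reads_per_user=10,         # hypothetical reads per user per day
    response_bytes=1_000,      # hypothetical 1 KB response
    peak_factor=2,             # hypothetical peak-to-average ratio
    working_set_objects=100_000_000,
    object_bytes=1_000,
    hot_fraction=0.10,         # cache the hottest 10% of objects
)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

In a written answer, round aggressively (86,400 seconds is close enough to 10^5) and state the rounding; the grader is looking for the chain of steps, not the last digit.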

Section B: Microservices & Events

You inherit a monolith handling rider requests, driver locations, pricing, and payments in one deployable. Propose a service-decomposition plan: which services you would extract first, in what order, and why. For each, state the data you would move with the service and the contract you would publish to the rest of the system.
Compare synchronous request/response with asynchronous event-based integration for the flow "trip completed -> settle payment -> notify rider." Name one failure mode each approach makes easier to handle, and one each makes harder. Which would you choose for this specific flow, and why?
Describe a saga (choreographed or orchestrated -- state which) for creating a ride: match driver -> reserve pricing quote -> authorize payment -> confirm trip. Include the compensating action for at least two steps, and state what the rider sees if payment authorization fails after the driver has already been reserved.
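
For calibration, the control flow of an orchestrated saga can be sketched as below. This is an in-process illustration under stated assumptions, not a production design: the `Step` and `run_saga` names are hypothetical, the step names are taken from the prompt, and a real orchestrator would also need durable state, idempotent compensations, and retries. The sketch does not address what the rider sees.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One saga step: a forward action plus the action that undoes it."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def record(event: str) -> Callable[[], None]:
    """Build an action that only records its name (stand-in for a service call)."""
    return lambda: log.append(event)


def failing_authorization() -> None:
    """Simulate the payment provider declining the authorization."""
    raise RuntimeError("payment authorization declined")


steps = [
    Step("match_driver", record("match_driver"), record("release_driver")),
    Step("reserve_quote", record("reserve_quote"), record("release_quote")),
    Step("authorize_payment", failing_authorization, record("void_authorization")),
    Step("confirm_trip", record("confirm_trip"), record("cancel_trip")),
]

ok = run_saga(steps)
print(ok, log)
```

The failed step itself is not compensated because it never completed; only the steps before it are undone, newest first.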

Section C: Scale, Reliability & Performance

A service currently has a mean response time of 120 ms and a P99 of 2,400 ms. Explain why optimizing the mean may not be the right move, and name three independent causes that commonly show up as a tall P99 tail. For each cause, name the metric that would confirm it.
Write an SLI, an SLO target, and an error-budget policy for the requirement "a driver-location update is visible to the dispatch service within 1 second." State what traffic or behavior your team is allowed to reject when the error budget is exhausted, and which classes of requests are exempt from that policy.
Using rough queuing-theory intuition (utilization effects and Little's Law), explain why a service running at 90% CPU behaves qualitatively differently from the same service running at 50% CPU, and what capacity action you would take before a known 3x traffic event. Include at least one back-of-envelope number in your answer.
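
Two of the formulas behind this section can be sketched briefly. The sketch assumes an idealized M/M/1 queue (single server, random arrivals and service times), which real services only approximate, and it uses hypothetical numbers that differ from the prompts above. The function names are illustrative, not from any library.

```python
def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Mean time in system for an idealized M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1 - utilization)


def in_flight_requests(arrival_rate_per_s: float, response_time_s: float) -> float:
    """Little's Law: L = lambda * W (average requests in the system)."""
    return arrival_rate_per_s * response_time_s


def error_budget_events(total_events: int, slo_target: float) -> float:
    """Events allowed to miss the SLI threshold in the measurement window."""
    return total_events * (1 - slo_target)


SERVICE_TIME_S = 0.020  # hypothetical 20 ms of pure service time

for rho in (0.60, 0.80, 0.95):
    w = mm1_response_time(SERVICE_TIME_S, rho)
    print(f"utilization {rho:.0%}: mean response ~{w * 1000:.0f} ms")

# Hypothetical window: 1,000,000 events against a 99.9% target.
print(f"error budget: ~{error_budget_events(1_000_000, 0.999):.0f} events")
```

Under these assumptions the mean response time is roughly 50 ms at 60% utilization, 100 ms at 80%, and 400 ms at 95%: the curve is nonlinear, so headroom shrinks much faster than utilization grows.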

Section D: Technical Leadership

Write a Rumelt-style engineering strategy memo (one page, 200-400 words) for the decision: "Should the team adopt a service mesh this year?" Include diagnosis, guiding policy, and coherent action. Name at least one rejected alternative and one measurable signal that would tell you the strategy is working six months in.
You disagree with a peer staff engineer's architecture proposal in an open design review. Describe how you would raise the objection in the meeting, what you would put in writing afterward, and how you would handle it if the decision ultimately went the way you opposed. Keep the answer operational -- what would actually come out of your mouth and onto the page.

Section E: Interleaved (Prior Semesters)

Interleaved questions from earlier semesters. These test whether the prior material is still alive in your working memory, not whether you can re-derive it from the book. Closed-book.

[From S6] You operate a primary-replica PostgreSQL deployment with asynchronous replication. A user writes, then immediately reads, and sometimes does not see their own write. Explain the failure, name the two most common mitigations, and state the cost each mitigation imposes.
[From S7] For the OrderFlow system from Semester 7, write one architectural quality-attribute scenario for "the system continues to accept orders during a carrier outage." Use the six-part scenario form (source, stimulus, artifact, environment, response, response measure).
[From S4/S5] A production service leaks memory slowly under load. Describe how you would isolate the leak using only observability and reasoning (no debugger on the production box), and state what you would change in the service's design to make future leaks easier to detect.
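
One piece of observability-only reasoning for the memory-leak question is trend fitting on exported memory samples. The sketch below is one possible approach, not the expected answer: it assumes you already have hypothetical (hours, resident MB) samples taken after warm-up under steady load, and it fits an ordinary least-squares slope by hand to avoid depending on any particular library version.

```python
from typing import List, Tuple


def slope_mb_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of memory (MB) against time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    numerator = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def hours_until_limit(samples: List[Tuple[float, float]], limit_mb: float) -> float:
    """Linear extrapolation from the latest sample to the memory limit."""
    slope = slope_mb_per_hour(samples)
    if slope <= 0:
        return float("inf")  # no upward trend in this window
    latest_mb = samples[-1][1]
    return (limit_mb - latest_mb) / slope


# Hypothetical samples: (hours since warm-up, resident memory in MB).
samples = [(0, 500.0), (1, 510.0), (2, 520.0), (3, 530.0)]
print(slope_mb_per_hour(samples))         # growth rate in MB per hour
print(hours_until_limit(samples, 2000.0)) # hours until a 2,000 MB limit
```

A steady positive slope under constant load is only a signal, not a diagnosis; correlating it with request mix, deploy times, or per-component allocation metrics is what narrows down the source.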

Self-Grading Key

After completion, score each section against your own expected answers first, then against the reference material. For every missed item, create or update a spaced-repetition card the same day and tag it with the module it came from. Do not treat the exam as "passed" unless Sections A-D each score above 70% and Section E above 60%; below those thresholds, remediate the weakest module before moving into the cumulative review.
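
The pass rule above can be expressed as a small check. This is a minimal sketch: the names `THRESHOLDS`, `weak_sections`, and `passed` and the sample scores are hypothetical, and "above" is read as strictly greater than the threshold.

```python
# Minimum percentages from the self-grading rule; a score must be strictly above.
THRESHOLDS = {"A": 70, "B": 70, "C": 70, "D": 70, "E": 60}


def weak_sections(scores: dict) -> list:
    """Return the sections whose score is not strictly above its threshold."""
    return [
        section
        for section, minimum in THRESHOLDS.items()
        if scores.get(section, 0) <= minimum  # a missing section counts as 0
    ]


def passed(scores: dict) -> bool:
    """The exam counts as passed only when no section is weak."""
    return not weak_sections(scores)


# Hypothetical scores: Section C sits exactly on the threshold, so it fails.
scores = {"A": 82, "B": 71, "C": 70, "D": 90, "E": 61}
print(passed(scores), weak_sections(scores))
```

The list returned by `weak_sections` doubles as the remediation queue: start with the module behind the lowest-scoring entry.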

Mastery Rubric

Level	Evidence
Beginner pass	Can answer direct questions and complete familiar exercises with light notes.
Solid pass	Can solve new variants, explain choices, and connect the work to Semester 7 Architecture and DDD.
Strong pass	Can defend tradeoffs, identify failure modes, and produce clean evidence in the portfolio artifact.
Not ready	Relies on copied solutions, cannot explain mistakes, or lacks durable artifacts.

Retake and Repair Rule

If a section is weak, do not just reread the material. Repair it by producing new evidence: a corrected solution, a fresh implementation, a rewritten proof, a benchmark, a diagram, a runbook, or a short teaching note.

Answer-Quality Examples

Use these examples when grading written answers or spoken explanations.

Quality	Example pattern
Weak	Names a concept but gives no example, constraint, or failure case.
Acceptable	Defines the concept and applies it to a familiar exercise.
Strong	Applies the concept to a new variant and explains why an alternative would fail.
Portfolio-ready	Connects the concept to Semester 7 Architecture and DDD, current project evidence, and a future capstone decision.

Interleaving Prompt

For any missed answer, add one sentence starting with: This depends on an earlier skill because...

Calibration Materials

Use these learner-visible calibration materials before self-grading or requesting review: Required Output Classification, Instructions, Sections A through E, Self-Grading Key, Mastery Rubric, Retake and Repair Rule, Answer-Quality Examples, and Interleaving Prompt.