Module 3: Probability & Statistics: Case Studies
These cases focus on uncertainty, measurement, and avoiding conclusions that the data does not support.
Case Study 1: A/B Test With Too Little Data
Scenario: A signup page variant gets 12 conversions out of 100 visitors while the old page gets 10 out of 100. The team wants to ship immediately.
Source anchor: CMU notes on probability for randomized algorithms.
Module concepts:
- random variation
- sample size
- confidence
- decision risk
Wrong Approach
Declare the variant better because the observed conversion rate is higher.
Better Approach
Estimate uncertainty, define the minimum effect worth detecting, and avoid overreading small samples. Decide whether to collect more data or make a low-risk product decision.
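As a rough illustration of the uncertainty estimate, the sketch below applies a normal-approximation interval and a two-proportion z-test to the scenario's counts. The function name `two_proportion_summary` and the 1.96 critical value (an approximate 95% level) are assumptions made for this example, and the normal approximation is itself only rough at samples this small.

```python
import math

def two_proportion_summary(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Return the observed difference, an approximate interval, and a two-sided p-value."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error for the interval on the difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Pooled standard error for the z-test of "no difference".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return diff, ci, p_value

# Old page: 10/100, variant: 12/100 (numbers from the scenario).
diff, ci, p_value = two_proportion_summary(10, 100, 12, 100)
print(f"difference: {diff:+.3f}")
print(f"approx. 95% interval: ({ci[0]:+.3f}, {ci[1]:+.3f})")
print(f"two-sided p-value: {p_value:.2f}")
```

With 10/100 versus 12/100, the interval for the difference spans zero by a wide margin, so the observed two-point lift is well within what random variation alone could produce.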
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Ship immediately | Fast iteration | High false-positive risk |
| Collect more data | Better evidence | Slower decision |
| Use prior/product judgment | Fast and practical | Weaker statistical justification |
Failure Mode
The team ships noise and later sees conversion return to baseline.
Required Artifact
Write an experiment decision memo with observed rates, uncertainty concern, practical significance, and next action.
Project / Capstone Connection
Use this memo structure later whenever a project claim depends on measured improvement rather than intuition.
Case Study 2: Expected Value in Retry Costs
Scenario: A client retries failed requests up to three times. Engineers measure only whether the user's request eventually succeeds and ignore the extra backend load that retries create.
Source anchor: CMU notes on probability for randomized algorithms.
Module concepts:
- expected value
- independent trials
- tail behavior
- cost modeling
Wrong Approach
Assume retries are free because each individual retry is fast.
Better Approach
Model expected attempts per request using the per-attempt failure probability, then estimate load under normal and degraded conditions. Add retry caps, backoff with jitter, and observability around retry storms.
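One way to sketch the expected-attempts model, assuming each attempt fails independently with the same probability and the client makes one initial attempt plus up to three retries. The function name `expected_attempts` is hypothetical.

```python
def expected_attempts(p_fail, max_retries=3):
    """Expected attempts per request when each attempt fails independently
    with probability p_fail and the client stops at the first success or
    after max_retries retries."""
    # Attempt k+1 happens only if the first k attempts all failed,
    # which has probability p_fail ** k.
    return sum(p_fail ** k for k in range(max_retries + 1))

for p in (0.01, 0.10, 0.50):
    print(f"p_fail={p:.2f}: expected attempts = {expected_attempts(p):.3f}")
```

Under these assumptions the expected attempts are about 1.010, 1.111, and 1.875 for failure probabilities of 1%, 10%, and 50%. The independence assumption is weakest exactly when it matters most: in a full outage every attempt fails, and each request costs four attempts.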
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Aggressive retries | Better chance of success | Amplifies outages |
| Capped retries | Limits load | Some requests fail sooner |
| Backoff with jitter | Reduces synchronization | More logic |
Failure Mode
During an outage, retries multiply traffic and make recovery slower.
Required Artifact
Calculate expected attempts for failure probabilities 1%, 10%, and 50%, and write a retry policy note.
Project / Capstone Connection
Bring this expected-value note into later reliability work so retry logic is justified against system load, not only success odds.
Case Study 3: Misleading Average Latency
Scenario: A service reports average latency of 120 ms, but some users experience multi-second delays. The dashboard hides tail latency.
Source anchor: CMU notes on probability for randomized algorithms.
Module concepts:
- mean vs distribution
- percentiles
- sampling
- outliers
Wrong Approach
Track only the average and call the service healthy.
Better Approach
Inspect the distribution: median, p90, p95, p99, and outlier causes. Match the metric to user experience and alert on tail behavior where appropriate.
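A minimal sketch of such a distribution summary, using the nearest-rank percentile definition. The `percentile` helper and the sample latencies are invented for illustration; the sample is constructed so that its mean matches the 120 ms from the scenario.

```python
import math
import statistics

def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct% of the samples at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

# Invented sample: 90 fast requests, 8 slower ones, 2 multi-second outliers.
latencies_ms = sorted([50] * 90 + [200] * 8 + [2950] * 2)

summary = {
    "mean": statistics.mean(latencies_ms),
    "median": statistics.median(latencies_ms),
    "p90": percentile(latencies_ms, 90),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
for name, value in summary.items():
    print(f"{name}: {value} ms")
```

In this invented sample the mean is 120 ms while the median is 50 ms and p99 is 2950 ms, so the mean describes almost no actual request.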
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Mean only | Simple number | Hides tails |
| Percentiles | Better user signal | Needs enough samples and careful aggregation |
| Full histogram | Rich diagnosis | More storage and analysis |
Failure Mode
The service looks healthy while high-value users hit slow paths.
Required Artifact
Produce a latency summary with mean, median, p95, p99, and one hypothesis for tail behavior.
Project / Capstone Connection
Reuse this latency summary format in observability dashboards and incident reviews during later systems and production semesters.
Case Study 4: Base Rate Neglect in Fraud Alerts
Scenario: A fraud detector flags 95% of fraudulent transactions (its recall), but only 0.2% of all transactions are actually fraudulent. The team assumes that a flagged transaction is almost certainly fraudulent.
Source anchor: Khan Academy on expected value and probability is a practical anchor for reasoning from probabilities instead of intuition alone.
Module concepts:
- base rate
- conditional probability
- false positives
- decision thresholds
Wrong Approach
Judge the alert system only by its detection rate.
Better Approach
Combine detector accuracy with the base rate of fraud. Ask how many flagged transactions are true fraud, how many are false positives, and how much manual review workload the review team can absorb.
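The base-rate arithmetic can be sketched as expected confusion-matrix counts. The scenario gives prevalence (0.2%) and recall (95%) but not a false-positive rate, so the 1% used below is purely an assumption for illustration, as is the function name `confusion_counts`.

```python
def confusion_counts(total, prevalence, recall, false_positive_rate):
    """Expected confusion-matrix counts for a detector applied to `total` transactions."""
    fraud = total * prevalence
    legit = total - fraud
    true_pos = fraud * recall                # fraud that gets flagged
    false_neg = fraud - true_pos             # fraud that slips through
    false_pos = legit * false_positive_rate  # good transactions flagged anyway
    true_neg = legit - false_pos
    # Precision: the share of flagged transactions that are really fraud.
    precision = true_pos / (true_pos + false_pos)
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "precision": precision,
    }

# false_positive_rate=0.01 is an assumed value, not part of the scenario.
counts = confusion_counts(100_000, prevalence=0.002, recall=0.95,
                          false_positive_rate=0.01)
for name, value in counts.items():
    print(f"{name}: {value:,.3f}")
```

With the assumed 1% false-positive rate, roughly 190 of about 1,188 flagged transactions are real fraud, so a flag is correct only about 16% of the time despite 95% recall.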
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Aggressive flagging | Catches more fraud | More manual review noise |
| Conservative threshold | Fewer false positives | Misses some fraud |
| Base-rate analysis | Realistic operating picture | Requires better probability literacy |
Failure Mode
The team overwhelms reviewers and frustrates good customers because it ignored how rare fraud is in the full population.
Required Artifact
Write a confusion-matrix note for 100,000 transactions with stated fraud prevalence, detector recall, and false-positive rate.
Project / Capstone Connection
Use this note later when evaluating alerts, classifiers, monitoring thresholds, or any project metric that depends on rare events.
Case Study 5: Median Salary Looks Fine, but the Distribution Is Skewed
Scenario: A bootcamp reports a median graduate salary that looks healthy, but the underlying outcomes vary widely across regions, experience levels, and a small number of unusually high offers.
Source anchor: NIST Engineering Statistics Handbook: Percentiles is a good anchor for thinking beyond a single summary number when describing a distribution.
Module concepts:
- median
- percentiles
- skewed distributions
- summary choice
Wrong Approach
Assume one central statistic is enough to describe the whole outcome picture.
Better Approach
Report a small distribution summary: median plus p25 and p75, or another percentile range appropriate to the decision. Explain what kind of variation the summary hides and who might be affected.
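A short sketch of a median-plus-quartiles summary using Python's standard `statistics` module. The salary figures are invented for illustration, with two unusually high offers added to create right skew.

```python
import statistics

# Invented graduate salaries; the last two are unusually high offers.
salaries = [42_000, 45_000, 48_000, 50_000, 52_000, 55_000,
            58_000, 60_000, 65_000, 70_000, 180_000, 250_000]

# quantiles(..., n=4) returns the three quartile cut points: p25, median, p75.
p25, median, p75 = statistics.quantiles(salaries, n=4)
mean = statistics.mean(salaries)

print(f"median: {median:,.0f}")
print(f"p25 to p75: {p25:,.0f} to {p75:,.0f}")
print(f"mean: {mean:,.0f}")
print(f"max: {max(salaries):,.0f}")
```

In this sample the mean sits well above the median because of the two high offers, and the p25 to p75 range shows what a typical graduate can expect far better than either number alone.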
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Single summary number | Easy communication | Hides spread and skew |
| Percentile range | Better outcome picture | More explanation needed |
| Full histogram | Richest detail | Heavier to present |
Failure Mode
Stakeholders make decisions from a clean-looking headline statistic while ignoring the real spread of outcomes.
Required Artifact
Write a one-page metric summary that includes median, percentile range, and a short note on what the distribution shape implies.
Project / Capstone Connection
Reuse this summary style later for latency, compensation, experiment, or survey metrics whenever a single mean or median would hide too much.
Source Map
| Source | Use it for |
|---|---|
| CMU probability notes | Expected value, randomized reasoning, and uncertainty vocabulary for experiments, retries, and decision risk. |
| Khan Academy expected value | Accessible reinforcement for probability-based decisions when intuition alone is likely to mislead. |
| NIST Engineering Statistics Handbook: Percentiles | Supporting percentile-based summaries and discussions of spread when averages or medians alone hide user experience. |