
Module 3: Probability & Statistics: Case Studies

These cases focus on uncertainty, measurement, and avoiding conclusions that the data does not support.


Case Study 1: A/B Test With Too Little Data

Scenario: A signup page variant gets 12 conversions out of 100 visitors while the old page gets 10 out of 100. The team wants to ship immediately.

Source anchor: CMU notes on probability for randomized algorithms.

Module concepts:

  • random variation
  • sample size
  • confidence
  • decision risk

Wrong Approach

Declare the variant better because the observed conversion rate is higher.

Better Approach

Estimate the uncertainty around the observed difference (12% vs. 10%), define the minimum effect worth detecting, and avoid overreading small samples. Then decide whether to collect more data or make a low-risk product decision.
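A minimal sketch of estimating that uncertainty follows. It computes a normal-approximation (Wald) 95% confidence interval for the difference between the scenario's two conversion rates. The function name `diff_confidence_interval` is illustrative. At counts this small the Wald interval is itself only a rough approximation, and a Wilson or exact method may be preferable.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald (normal-approximation) interval for rate_b - rate_a."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    diff = p_b - p_a
    return diff, diff - z * se, diff + z * se

# Old page: 10 of 100 visitors converted; variant: 12 of 100 (from the scenario).
diff, low, high = diff_confidence_interval(10, 100, 12, 100)
print(f"difference = {diff:+.3f}, 95% CI = [{low:+.3f}, {high:+.3f}]")
```

With 10/100 versus 12/100, the interval runs from roughly -6.7 to +10.7 percentage points. Because it includes zero and negative values, these data alone cannot show that the variant is better.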

Tradeoff Table

Choice | Gain | Cost
Ship immediately | Fast iteration | High false-positive risk
Collect more data | Better evidence | Slower decision
Use prior/product judgment | Practical | Less purely statistical

Failure Mode

The team ships noise and later sees conversion return to baseline.

Required Artifact

Write an experiment decision memo with observed rates, uncertainty concern, practical significance, and next action.
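For the memo's next action, a rough per-arm sample size shows what collecting more data would actually cost. This sketch uses the standard normal-approximation formula for comparing two proportions. The 10% baseline comes from the scenario. The 2-percentage-point minimum effect (borrowed from the observed lift), 5% two-sided significance level, and 80% power are illustrative assumptions. The helper `sample_size_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, min_effect, alpha=0.05, power=0.80):
    """Approximate visitors per arm to detect an absolute lift of min_effect."""
    p_new = p_base + min_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_power) ** 2 * variance / min_effect ** 2)

print(sample_size_per_arm(0.10, 0.02))
```

Under these assumptions, the result is roughly 3,800 visitors per arm, far more than the 100 per arm in the scenario. The requirement falls roughly with the inverse square of the minimum effect, so deciding what lift is worth detecting matters as much as the test itself.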

Project / Capstone Connection

Use this memo structure later whenever a project claim depends on measured improvement rather than intuition.


Case Study 2: Expected Value in Retry Costs

Scenario: A client retries failed requests up to three times. Engineers measure only the user-facing success rate and ignore the extra backend load the retries create.

Source anchor: CMU notes on probability for randomized algorithms.

Module concepts:

  • expected value
  • independent trials
  • tail behavior
  • cost modeling

Wrong Approach

Assume retries are free because each individual retry is fast.

Better Approach

Model expected attempts per request using failure probability, then estimate load under normal and degraded conditions. Add caps, jitter, and observability around retry storms.
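Caps and jitter can be combined in a small retry wrapper. The sketch below is a hypothetical helper, `call_with_retries`, that uses capped exponential backoff with "full jitter": each delay is drawn uniformly between zero and a capped exponential ceiling. The retryable exception types, base delay, and cap are assumptions to tune per system. `sleep` and `rng` are injectable so the policy can be tested without real waiting.

```python
import random
import time

def call_with_retries(operation, max_retries=3, base=0.1, cap=2.0,
                      retryable=(ConnectionError, TimeoutError),
                      rng=None, sleep=time.sleep):
    """Call operation(); on a retryable error, back off with full jitter and retry."""
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):  # 1 initial try + max_retries retries
        try:
            return operation()
        except retryable:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the failure to the caller
            # Exponential ceiling (base, 2*base, 4*base, ...) capped at `cap` seconds.
            ceiling = min(cap, base * 2 ** attempt)
            # Full jitter: random delays spread clients out instead of retrying in lockstep.
            sleep(rng.uniform(0, ceiling))
```

Only errors listed in `retryable` are retried. Anything else propagates immediately, so non-transient failures such as bad input do not add retry load.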

Tradeoff Table

Choice | Gain | Cost
Aggressive retries | Better chance of success | Amplifies outages
Capped retries | Limits load | Some requests fail sooner
Backoff with jitter | Reduces synchronization | More logic

Failure Mode

During an outage, retries multiply traffic and make recovery slower.

Required Artifact

Calculate expected attempts for failure probabilities 1%, 10%, and 50%, and write a retry policy note.
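A minimal sketch of the artifact's calculation follows. It assumes "up to three times" means one initial attempt plus three retries (four calls at most), and that each attempt fails independently with probability p. Attempt k + 1 happens only when the first k attempts all fail, so the expected number of calls is 1 + p + p^2 + p^3 = (1 - p^4) / (1 - p). The helper name `expected_attempts` is illustrative.

```python
def expected_attempts(p_fail, max_retries=3):
    """Expected calls per request when each attempt fails independently with p_fail."""
    # Attempt k + 1 runs only if the first k attempts all failed: probability p_fail**k.
    return sum(p_fail ** k for k in range(max_retries + 1))

for p in (0.01, 0.10, 0.50, 0.99):
    print(f"p_fail={p:.2f}: {expected_attempts(p):.3f} attempts per request")
```

At 1% and 10% failure, the extra load is small (about 1.010x and 1.111x). At 50%, it rises to 1.875x. Near a total outage (p = 0.99), every request costs about 3.94 calls, which is the retry storm described in the failure mode. The independence assumption is optimistic during real outages, when failures are correlated.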

Project / Capstone Connection

Bring this expected-value note into later reliability work so retry logic is justified against system load, not only success odds.


Case Study 3: Misleading Average Latency

Scenario: A service reports average latency of 120 ms, but some users experience multi-second delays. The dashboard hides tail latency.

Source anchor: CMU notes on probability for randomized algorithms.

Module concepts:

  • mean vs distribution
  • percentiles
  • sampling
  • outliers

Wrong Approach

Track only the average and call the service healthy.

Better Approach

Inspect the distribution: median, p90, p95, p99, and outlier causes. Match the metric to user experience and alert on tail behavior where appropriate.
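The snippet below sketches a distribution-aware latency summary. It uses the nearest-rank percentile definition, which is one of several in common use; monitoring systems often estimate percentiles from histograms instead. The sample data is hypothetical: 97 requests at 60 ms and 3 at 2,060 ms.

```python
import math
import statistics

def percentile(sorted_values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    rank = math.ceil(pct * len(sorted_values) / 100)  # 1-based rank
    return sorted_values[rank - 1]

def latency_summary(samples_ms):
    """Mean, median, tail percentiles, and max for a list of latencies in ms."""
    values = sorted(samples_ms)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p90": percentile(values, 90),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "max": values[-1],
    }

# Hypothetical sample: 97 fast requests and 3 multi-second ones.
samples = [60] * 97 + [2060] * 3
print(latency_summary(samples))
```

The mean is exactly 120 ms, yet 97% of requests took 60 ms and the p99 is over 2 seconds. In this sample, no request actually took anything close to the average.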

Tradeoff Table

Choice | Gain | Cost
Mean only | Simple number | Hides tails
Percentiles | Better user signal | Requires more data care
Full histogram | Rich diagnosis | More storage and analysis

Failure Mode

The service looks healthy while high-value users hit slow paths.

Required Artifact

Produce a latency summary with mean, median, p95, p99, and one hypothesis for tail behavior.

Project / Capstone Connection

Reuse this latency summary format in observability dashboards and incident reviews during later systems and production semesters.


Case Study 4: Base-Rate Neglect in Fraud Alerts

Scenario: A fraud detector flags 95% of fraudulent transactions, but only 0.2% of all transactions are actually fraud. The team assumes a flagged transaction is almost certainly fraudulent.

Source anchor: Khan Academy on expected value and probability, a practical anchor for reasoning from probabilities instead of intuition alone.

Module concepts:

  • base rate
  • conditional probability
  • false positives
  • decision thresholds

Wrong Approach

Judge the alert system only by its detection rate.

Better Approach

Combine detector accuracy with the base rate of fraud. Ask how many flagged transactions are true fraud, how many are false positives, and what operational cost review teams can absorb.
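Bayes' rule makes the base-rate calculation explicit. Let F mean "the transaction is fraud" and A mean "the detector raised an alert". The scenario gives recall P(A | F) = 0.95 and prevalence P(F) = 0.002 but no false-positive rate. The value used below, P(A | not F) = 0.01, is therefore an assumption for illustration.

```latex
P(F \mid A)
  = \frac{P(A \mid F)\,P(F)}{P(A \mid F)\,P(F) + P(A \mid \neg F)\,P(\neg F)}
  = \frac{0.95 \times 0.002}{0.95 \times 0.002 + 0.01 \times 0.998}
  = \frac{0.0019}{0.01188}
  \approx 0.16
```

Even with a 95% detection rate, only about 16% of flagged transactions would be fraud under this assumption. The rest are false positives that still consume review time.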

Tradeoff Table

Choice | Gain | Cost
Aggressive flagging | Catches more fraud | More manual review noise
Conservative threshold | Fewer false positives | Misses some fraud
Base-rate analysis | Realistic operating picture | Requires better probability literacy

Failure Mode

The team overwhelms reviewers and frustrates good customers because it ignored how rare fraud is in the full population.

Required Artifact

Write a confusion-matrix note for 100,000 transactions with stated fraud prevalence, detector recall, and false-positive rate.
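A minimal sketch of the confusion-matrix note follows, written as a hypothetical Python helper, `confusion_counts`. Prevalence (0.2%) and recall (95%) come from the scenario. The 1% false-positive rate is an assumption, since the scenario does not state one.

```python
def confusion_counts(total, prevalence, recall, false_positive_rate):
    """Expected confusion-matrix counts for a detector applied to `total` transactions."""
    fraud = round(total * prevalence)
    legit = total - fraud
    tp = round(fraud * recall)               # fraud correctly flagged
    fn = fraud - tp                          # fraud missed
    fp = round(legit * false_positive_rate)  # legitimate but flagged
    tn = legit - fp                          # legitimate and not flagged
    precision = tp / (tp + fp)               # share of alerts that are real fraud
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn, "precision": precision}

# Prevalence and recall from the scenario; the 1% false-positive rate is assumed.
note = confusion_counts(100_000, prevalence=0.002, recall=0.95, false_positive_rate=0.01)
print(note)
```

Under these assumptions, the detector raises 1,188 alerts: 190 are fraud and 998 are false positives. That is about five false alarms per real case (precision near 16%), while 10 fraudulent transactions still slip through.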

Project / Capstone Connection

Use this note later when evaluating alerts, classifiers, monitoring thresholds, or any project metric that depends on rare events.


Case Study 5: Median Salary Looks Fine, but the Distribution Is Skewed

Scenario: A bootcamp reports a median graduate salary that looks healthy, but the underlying outcomes vary widely across regions, experience levels, and a small number of unusually high offers.

Source anchor: NIST Engineering Statistics Handbook: Percentiles, a good anchor for thinking beyond a single summary number when describing a distribution.

Module concepts:

  • median
  • percentiles
  • skewed distributions
  • summary choice

Wrong Approach

Assume one central statistic is enough to describe the whole outcome picture.

Better Approach

Report a small distribution summary: median plus p25 and p75, or another percentile range appropriate to the decision. Explain what kind of variation the summary hides and who might be affected.
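A small sketch of the percentile-range summary follows, using Python's standard `statistics.quantiles`. With `n=4`, that function returns the three quartile cut points (p25, median, p75). The salary list is hypothetical, in thousands, and includes one unusually high offer. Other tools may interpolate quartiles slightly differently from the default "exclusive" method used here.

```python
import statistics

# Hypothetical salaries (in thousands) with one unusually high offer.
salaries = [42, 45, 48, 50, 52, 55, 58, 60, 62, 65, 70, 250]

# n=4 yields three cut points: p25, median, p75 (default "exclusive" method).
p25, median, p75 = statistics.quantiles(salaries, n=4)
mean = statistics.fmean(salaries)
print(f"p25={p25}, median={median}, p75={p75}, mean={mean:.1f}")
```

The single high offer barely moves the median, yet it pulls the mean above p75. That gap is exactly what a one-number headline hides. The p25 to p75 range shows where the middle half of graduates landed.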

Tradeoff Table

Choice | Gain | Cost
Single summary number | Easy communication | Hides spread and skew
Percentile range | Better outcome picture | More explanation needed
Full histogram | Richest detail | Heavier to present

Failure Mode

Stakeholders make decisions from a clean-looking headline statistic while ignoring the real spread of outcomes.

Required Artifact

Write a one-page metric summary that includes median, percentile range, and a short note on what the distribution shape implies.

Project / Capstone Connection

Reuse this summary style later for latency, compensation, experiment, or survey metrics whenever a single mean or median would hide too much.


Source Map

Source | Use it for
CMU probability notes | Expected value, randomized reasoning, and uncertainty vocabulary for experiments, retries, and decision risk.
Khan Academy expected value | Accessible reinforcement for probability-based decisions when intuition alone is likely to mislead.
NIST Engineering Statistics Handbook: Percentiles | Supporting percentile-based summaries and discussions of spread when averages or medians alone hide user experience.