Module 3: Probability & Statistics

Primary text: Introduction to Probability
Selective support: Mathematics for Computer Science probability chapters and the local semester book chunks only where they sharpen modeling, expectation, or confidence language

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become operationally strong at probability modeling, random variables, expectation, dependence, and the statistical meaning of averaging and uncertainty.


Scope of This Module

This module is not a formula sheet for "plug and chug" probability. It is where uncertainty becomes something you model precisely.

What it covers in depth:

  • building probability spaces from combinatorial structure instead of vague intuition
  • event algebra, complements, overlap, and why equally likely outcomes must be justified
  • conditional probability as a restricted world, not a decorative fraction
  • Bayes' rule, total probability, base rates, and why many diagnostic intuitions fail
  • independence as information neutrality, not the same thing as disjointness
  • random variables as functions on outcomes, together with PMFs, CDFs, and transformations
  • standard discrete models: Bernoulli, binomial, hypergeometric, geometric, and Poisson
  • expectation, linearity, indicator variables, variance, covariance, and dependence structure
  • continuous random variables, densities, Normal and Exponential models, and location-scale thinking
  • the law of large numbers, the central limit theorem, empirical distributions, simulation, and careful confidence language

What it deliberately does not try to finish here:

  • full mathematical statistics as a separate course
  • measure-theoretic probability
  • advanced stochastic processes beyond brief motivation
  • hypothesis-testing catalogs and cookbook procedures detached from probabilistic meaning

This is an in-depth foundation module. If you can compute answers but cannot explain the model that produced them, you are not done.


Before You Start

Answer these closed-book before starting the main path:

  1. Why is C(n, k) relevant to probability only after you have defined what outcomes are equally likely?
  2. What is the difference between "events cannot happen together" and "events tell you nothing about each other"?
  3. If you know P(A | B), why does that usually not determine P(B | A)?
  4. Why can the expected value of a random variable fail to be an outcome you ever actually observe?
  5. If you average many noisy measurements, what do you expect to stabilize and what still remains random?

Diagnostic Interpretation

4-5 solid answers

  • You are ready for the full path.

2-3 solid answers

  • Continue, but expect extra time in the conditioning and expectation clusters.

0-1 solid answers

  • Revisit Module 2 counting setups first. Probability inherits its structure from counting and proof discipline.

What This Module Is For

Probability is the mathematical language for controlled uncertainty. Later CS work repeatedly asks questions like:

  • what is the chance this randomized algorithm succeeds quickly?
  • how many collisions, retries, failures, or rare events should we expect?
  • when does new evidence actually change our belief?
  • how much trust should we put in a sample, simulation, benchmark, or test result?
  • what does "unlikely" mean quantitatively rather than rhetorically?

This module builds the probability and statistical reasoning needed for:

  • randomized algorithms and expected running time
  • data-structure analysis such as hashing and Bloom filters
  • reliability, failure, and queueing intuition in systems
  • machine learning foundations such as distributions, averaging, and variance
  • simulation, experimentation, and confidence claims in engineering work

You are learning to reason about uncertainty without handwaving.


How To Use This Module

Work in order. The later clusters only make sense if the earlier modeling habits are stable.

Cluster 1: Probability Models

Order | Concept | Type | Focus
1 | Probability Starts with a Well-Defined Model | PRIMARY | Outcomes, events, and the difference between the world and your model of it
2 | Counting Models, Equally Likely Outcomes, and Event Algebra | SUPPORTING | When combinatorics legitimately turns into probability and how events combine
3 | Complements, Unions, and Overlap Control | SUPPORTING | Complement counting, inclusion-exclusion, and avoiding double counting

Cluster mastery check: Can you define the sample space and justify why the probability model is valid before writing any formula?
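
One way to practice this check is to write the sample space out explicitly before computing anything. The sketch below is illustrative only: it assumes two fair dice rolled independently, which is the assumption that justifies treating the 36 ordered pairs as equally likely. The names `sample_space` and `probability` are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die).
# Assumption: both dice are fair and independent, so all 36 pairs are equally likely.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """P(event) under the equally likely model: favorable outcomes / total outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_sum_7 = probability(lambda o: o[0] + o[1] == 7)
p_sum_2 = probability(lambda o: o[0] + o[1] == 2)

print(p_sum_7)  # 1/6
print(p_sum_2)  # 1/36
```

The two sums have different probabilities, so a model whose outcomes are the sums 2 through 12 with equal weights would be wrong. The counting is legitimate only because it is done over the ordered pairs.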

Cluster 2: Conditioning and Information

Order | Concept | Type | Focus
4 | Conditional Probability Restricts the World | PRIMARY | Reweighting when new information rules out outcomes
5 | Bayes, Total Probability, and Base-Rate Reasoning | PRIMARY | Forward and reverse conditioning, priors, posteriors, and diagnostic thinking
6 | Independence Is Information Neutrality, Not Disjointness | SUPPORTING | What independence really means and where intuition fails

Cluster mastery check: Can you explain in words what information is being conditioned on and whether it should change the model?
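
A small base-rate calculation shows why P(A | B) and P(B | A) can differ sharply. The numbers below are hypothetical and chosen only for illustration (a 1% base rate, a test that flags 95% of true cases, and a 5% false positive rate); they do not come from the source texts.

```python
from fractions import Fraction

# Hypothetical numbers, for illustration only.
prior = Fraction(1, 100)                # P(condition)
sensitivity = Fraction(95, 100)         # P(positive | condition)
false_positive_rate = Fraction(5, 100)  # P(positive | no condition)

# Law of total probability: split "positive" by whether the condition is present.
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' rule: reverse the direction of conditioning.
posterior = sensitivity * prior / p_positive

print(p_positive)        # 59/1000
print(posterior)         # 19/118
print(float(posterior))  # roughly 0.16
```

Under these assumed numbers, P(positive | condition) is 0.95 while P(condition | positive) is only about 0.16. The gap comes almost entirely from the low base rate.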

Cluster 3: Random Variables and Distributions

Order | Concept | Type | Focus
7 | Random Variables Turn Outcomes into Quantities | PRIMARY | Random variables as functions, not mysterious containers
8 | PMFs, CDFs, and Functions of Random Variables | SUPPORTING | Distribution descriptions, cumulative viewpoints, and transformation thinking
9 | Core Discrete Distribution Families | PRIMARY | Bernoulli, binomial, hypergeometric, geometric, and Poisson models

Cluster mastery check: Can you choose a random variable that captures the real question instead of only the raw event?
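
The idea that a random variable is a function on outcomes can be made concrete in a few lines. This is a minimal sketch, assuming three flips of a fair coin so that the eight outcomes are equally likely; `num_heads`, `pmf`, and `cdf` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Outcomes: all sequences of three flips. Assumption: fair coin, so each has probability 1/8.
outcomes = list(product("HT", repeat=3))

def num_heads(outcome):
    """The random variable X: a plain function from an outcome to a number."""
    return outcome.count("H")

# PMF: group outcomes by the value X assigns to them.
counts = Counter(num_heads(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

def cdf(x):
    """P(X <= x), obtained by accumulating the PMF."""
    return sum(p for value, p in pmf.items() if value <= x)

for x in pmf:
    print(x, pmf[x], cdf(x))
```

The PMF here matches a binomial model with n = 3 and p = 1/2, but it was derived from the outcomes rather than looked up, which is the habit this cluster is trying to build.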

Cluster 4: Expectation, Variance, and Dependence

Order | Concept | Type | Focus
10 | Expectation Is the Center of a Random Process | PRIMARY | Weighted averages, long-run meaning, and why expectation matters
11 | Linearity and Indicator Variables Are the Main Workhorses | PRIMARY | Solving expectation questions without full distributions
12 | Variance, Joint Structure, and Covariance | PRIMARY | Spread, dependence, and how variables move together

Cluster mastery check: Can you compute expectation and variance with a reason, and can you say what those numbers do and do not tell you?
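
As an illustration of solving an expectation question with indicators, consider the number of fixed points in a uniformly random permutation. The example is not taken from the source texts. It compares brute-force enumeration for a small n against the indicator argument, which needs no distribution at all.

```python
from fractions import Fraction
from itertools import permutations

n = 4
perms = list(permutations(range(n)))  # assumed uniformly random: each has probability 1/n!

def fixed_points(perm):
    """X = number of positions i with perm[i] == i."""
    return sum(1 for i, value in enumerate(perm) if i == value)

# Brute force: average X over every permutation.
brute_force = Fraction(sum(fixed_points(p) for p in perms), len(perms))

# Indicators: X = I_0 + ... + I_{n-1}, where I_i = 1 when position i is fixed.
# P(I_i = 1) = 1/n, so by linearity E[X] = n * (1/n), even though the I_i are dependent.
by_linearity = sum(Fraction(1, n) for _ in range(n))

print(brute_force)   # 1
print(by_linearity)  # 1
```

The brute-force route grows as n! while the indicator route stays a one-line sum. Note that the expected value says nothing about spread, which is why variance is treated separately.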

Cluster 5: Continuous Models and Statistical Thinking

Order | Concept | Type | Focus
13 | Continuous Random Variables Use Densities, Not Point Masses | PRIMARY | PDFs, intervals, and why point probabilities collapse to zero
14 | Normal, Exponential, and the Return of Standardization | SUPPORTING | Two core continuous models and how scale changes interpretation
15 | Averages, Simulation, and Confidence Language | PRIMARY | LLN, CLT, empirical distributions, Monte Carlo, and careful statistical claims

Cluster mastery check: Can you explain why averaging stabilizes, why Normal approximations recur, and why "95% confidence" is not the same thing as "95% probability the fact is true"?
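
A simulation can make the confidence wording concrete. The sketch below assumes Normal data with a known standard deviation and uses the textbook interval of sample mean plus or minus 1.96 standard errors; the function name `coverage` and all parameter values are illustrative choices, not part of the module.

```python
import math
import random

def coverage(true_mean=10.0, sigma=2.0, n=25, trials=2000, seed=0):
    """Fraction of repeated experiments whose 95% interval contains the true mean.

    Assumptions: Normal data and known sigma, so the interval is
    sample_mean +/- 1.96 * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The true mean is fixed; the interval is the random object.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

print(coverage())  # expected to land near 0.95
```

The 95% describes how often the procedure captures the fixed true mean across repeated samples. Any single computed interval either contains the true mean or does not, which is why the statement is not a posterior probability about the parameter.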

Then work these practice pages:

Order | Practice path | Focus
1 | Probability Modeling and Conditioning Lab | Sample spaces, event setup, conditioning, Bayes, and independence
2 | Random Variables and Distribution Workshop | PMFs, CDFs, model selection, and distribution comparison
3 | Expectation, Variance, and Statistical Reasoning Clinic | Indicators, variance, covariance, LLN, CLT, and confidence language
4 | Code Katas | Simulation and probabilistic programming drills

Take the Module Quiz after you finish the concept and practice paths. Use the Reference and Selective Reading page and the Learning Resources page only for targeted reinforcement.


Learning Objectives

By the end of this module you should be able to:

  1. Build a correct probability space and justify whether an equally likely model is legitimate.
  2. Compute event probabilities using complements, unions, overlap accounting, and counting arguments.
  3. Interpret conditional probability as model restriction and use it to solve multistage problems.
  4. Apply Bayes' rule and the law of total probability without losing track of base rates.
  5. Distinguish clearly between mutual exclusivity, conditional dependence, and independence.
  6. Define and use random variables, PMFs, CDFs, and simple transformations.
  7. Recognize when Bernoulli, binomial, hypergeometric, geometric, or Poisson models fit a problem.
  8. Compute and interpret expectation, linearity, indicator sums, variance, and covariance.
  9. Work with continuous random variables using PDFs and interval probabilities.
  10. Explain the law of large numbers, the central limit theorem, and the difference between probability statements and confidence statements.

Outputs

  • a probability notebook with at least 25 solved problems and written modeling justification
  • one Bayes and conditional reasoning sheet with at least 8 fully explained updates
  • one random-variable catalog where you define your own variables for at least 10 scenarios
  • one expectation and indicator sheet solving at least 6 problems without brute-force enumeration
  • one variance and covariance sheet that includes interpretation, not only arithmetic
  • one simulation mini-lab verifying at least two theoretical results such as the birthday paradox, LLN, or CLT behavior
  • one mistake log naming at least 12 reasoning errors, such as using the wrong sample space, assuming equal likelihood without justification, mixing up P(A | B) and P(B | A), or treating confidence as posterior probability
  • one short memo explaining how Module 3 tools carry into randomized algorithms, systems reliability, or experimentation
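
For the simulation mini-lab, a check of the birthday paradox might look like the sketch below. It assumes 365 equally likely birthdays and independent people, which is a modeling simplification; the function names are illustrative.

```python
import random

def exact_shared_birthday(k, days=365):
    """P(at least two of k people share a birthday), via the complement."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

def simulated_shared_birthday(k, trials=20000, days=365, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(k)]
        if len(set(birthdays)) < k:  # a repeated value means a shared birthday
            hits += 1
    return hits / trials

print(exact_shared_birthday(23))      # about 0.507
print(simulated_shared_birthday(23))  # should be close to the exact value
```

A useful notebook entry reports both numbers and states how large a gap is plausible for the chosen number of trials, rather than only noting that they look similar.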

Completion Standard

You have completed Module 3 when all of these are true:

  • you can define the model before computing the probability
  • you can tell when conditioning changes the world and when it should not
  • you can use Bayes' rule without collapsing into test-accuracy folklore
  • you can choose a random variable that matches the question being asked
  • you can solve expectation problems using linearity and indicators instead of brute force
  • you can interpret variance and covariance in words, not only symbols
  • you can explain why averages stabilize and why confidence statements require careful wording

If the answer looks familiar but you cannot explain what each term in the setup means, the module is not complete.


Reading Policy

  • Concept pages are the main path.
  • Local book chunks are selective reinforcement, not a second syllabus.
  • "Read only if stuck" means try the concept page, self-check, and drill first.
  • "Optional deep dive" means additional nuance or exercise volume, not required progression.
  • Because this is an in-depth module, written modeling explanations are required, not optional enrichment.

Suggested Weekly Flow

Day | Work
1 | Concepts 1-3 and one full sample-space modeling sheet
2 | Concepts 4-6 and at least four conditional/Bayes problems in sentence form
3 | Concepts 7-9 and one random-variable catalog page
4 | Concepts 10-12 and two expectation problems solved using indicators or linearity
5 | Concepts 13-15 and one simulation or empirical-data notebook
6 | Practice pages 1-2 and targeted local-book reinforcement
7 | Practice pages 3-4, quiz, and mistake-log cleanup

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.


Rich Learning Pages

Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread