Linearity and Indicator Variables Are the Main Workhorses
What This Concept Is
Linearity of expectation says: for any random variables ( X ) and ( Y ) (defined on the same probability space) and constants ( a, b ), [ E[aX + bY] = a, E[X] + b, E[Y]. ] More generally, for any finite sum ( X = \sum_{i=1}^n X_i ), [ E[X] = \sum_{i=1}^n E[X_i]. ] The striking fact -- and the reason this concept gets its own page -- is that this holds without any assumption about independence. The ( X_i ) can be arbitrarily correlated, arbitrarily complex in their joint behavior; the expectation of the sum is still the sum of the expectations.
An indicator variable ( I_A ) for an event ( A ) is the random variable that is 1 if ( A ) occurs and 0 otherwise. Its expectation is trivial: [ E[I_A] = 1 \cdot P(A) + 0 \cdot P(A^c) = P(A). ] This is called the fundamental bridge (Blitzstein's term): every probability is the expectation of an indicator, and every expectation of an indicator is a probability.
Together, linearity and indicators give you a mechanical procedure for computing expected counts: write the count as a sum of indicators, take expectations, translate each expectation into a probability, sum. This transforms many hard-looking distribution problems into easy probability problems.
Why It Matters Here
This is one of the most useful techniques in the entire module, and possibly the single most reused probability tool in CS. It appears in:
- expected number of collisions in hashing
- expected number of matches (fixed points) in a random permutation
- expected bucket occupancy
- expected number of inversions in an array
- expected number of rule violations in a random construction
- expected number of edges in a random graph
- expected comparisons in randomized quicksort
- expected cycles of a given length in a random permutation
The power comes from the fact that the individual indicator expectations are almost always easy to compute, even when the joint distribution of the sum is forbiddingly complex. Quicksort's expected comparisons would be nearly intractable without linearity; the indicator-sum argument makes it a three-line proof.
Concrete Examples
Example 1 -- Expected nonempty buckets in hashing. Hash ( n ) keys independently and uniformly into ( m ) buckets. Let ( X ) = number of nonempty buckets. Define indicators ( I_j = \mathbb{1}[\text{bucket } j \text{ is nonempty}] ). Then ( X = I_1 + I_2 + \cdots + I_m ). By linearity, [ E[X] = \sum_{j=1}^m E[I_j] = m \cdot P(\text{bucket } j \text{ nonempty}) = m \cdot (1 - (1 - 1/m)^n). ] For ( n = m ), this is ( m \cdot (1 - (1 - 1/m)^m) \approx m(1 - 1/e) \approx 0.632, m ) for large ( m ). The expected empty bucket count is therefore ( \approx 0.368, m ), which is the classic "coupon collector"-adjacent result. Notice what we did: we did not compute the distribution of ( X ) (which is a complicated occupancy distribution). We computed just its expectation, and indicators plus linearity made it a one-line argument.
Example 2 -- Expected fixed points of a random permutation. Let ( \sigma ) be a uniformly random permutation of ( {1, 2, \dots, n} ). Define ( I_i = \mathbb{1}[\sigma(i) = i] ), the indicator that position ( i ) is a fixed point. Then ( X = \sum I_i ) is the number of fixed points, and [ E[X] = \sum_{i=1}^n E[I_i] = \sum_{i=1}^n \frac{1}{n} = 1. ] The indicators ( I_i ) are not independent -- if position 1 is fixed, position 2 is very slightly more likely to be fixed (fewer remaining elements), and the full joint structure of permutation fixed points is the complicated derangement formula. But linearity does not care. The expected number of fixed points is always exactly 1, regardless of ( n ). This counterintuitive result is used in Semester 2 in the analysis of the "Las Vegas" algorithm for finding fixed points and in combinatorial optimization as the starting point for Markov chain mixing arguments.
Common Confusion / Misconceptions
"Linearity of expectation requires independence." Absolutely false, and the most common misconception about this tool. Linearity holds for any random variables, correlated or not. The similar-sounding statement that requires independence is about variance: ( \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) ) only when ( X, Y ) are uncorrelated.
"An indicator must correspond to a 'simple' event." Any event. If you can compute ( P(A) ), you can compute ( E[I_A] ).
"Indicator variables are just the binomial distribution." A sum of independent identically distributed Bernoulli indicators is binomial. A sum of correlated indicators is something else -- but its expectation is still the sum of expectations. The linearity trick is actually more useful when the indicators are not independent, because then the distribution of the sum is not a named family.
"You need to compute the variance of the count to use linearity." No. Linearity alone gives you the expectation. Variance is a separate (and harder) calculation that requires covariances between pairs of indicators.
"( E[XY] = E[X] E[Y] )." Only when ( X ) and ( Y ) are uncorrelated (in particular, when independent). This is not linearity; it is a separate property that fails in general.
How To Use It
When you need an expected count, follow this recipe every time:
- Identify the count or sum random variable. "Number of collisions", "number of matches", "total cost", etc.
- Define one indicator per object, slot, event, or pair that contributes to the count. Be explicit about indexing.
- Write the count as the sum of indicators: ( X = \sum_i I_i ).
- Apply expectation and linearity: ( E[X] = \sum_i E[I_i] ).
- Replace each indicator expectation with a probability: ( E[I_i] = P(A_i) ).
- Compute each probability individually. This is usually easy because the single event ( A_i ) involves only one or two items, regardless of how large ( n ) is.
- Sum and simplify. Often a uniform structure lets you pull out a factor of ( n ).
This usually turns a hard distribution question into several easy probability questions. When it works, it feels like cheating.
Transfer / Where This Shows Up Later
- Semester 2 (Randomized algorithms). Expected comparisons in randomized quicksort: define ( X_{ij} = \mathbb{1}[\text{items } i, j \text{ are ever compared}] ); by a clever argument, ( P(X_{ij} = 1) = 2/(|i - j| + 1) ); then ( E[\text{comparisons}] = \sum_{i < j} P(X_{ij} = 1) = O(n \log n) ). The derivation is pure linearity.
- Semester 2 (Graph algorithms). Expected number of edges in a random graph ( G(n, p) ) is ( \binom{n}{2} p ), by summing indicators over pairs.
- Semester 5 (Queueing / load balancing). Expected load on a single server in a power-of-two-choices scheme is bounded using indicator arguments over balls and bins.
- Semester 6 (Distributed). Expected number of conflicting writes in a multi-master replication model is an indicator-sum: one indicator per pair of writes that might conflict.
- Semester 8 (SRE). Expected number of SLO violations across a fleet is linear in the per-node violation probability by the indicator argument, even when nodes share failure domains.
- Semester 9 (Canary / A/B). Expected number of users who see a bug in the canary is linear in the per-user exposure probability; this is the starting point of sample-size calculations for detection.
Check Yourself
- Why does linearity of expectation not need independence? State a one-line proof using the definition of expectation on a finite sample space.
- What is the expectation of an indicator variable? (Fundamental bridge -- be able to state and prove it.)
- Why are indicators so useful for counting? In what kind of problem is the distribution of the count hard but its expectation easy?
- For a uniformly random permutation of 10 elements, what is the expected number of fixed points? Does the answer depend on 10?
- For ( n ) balls thrown into ( m ) bins uniformly, compute the expected number of empty bins. Compare to Example 1.
- Prove that ( \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2, \mathrm{Cov}(X, Y) ). Explain why this is not a violation of linearity.
Mini Drill or Application
Set up, and then fully compute, an indicator-based expectation for each:
- Number of heads in 20 coin flips. (Check: ( E[X] = 10 ).)
- Number of fixed points in a random permutation of 50 elements. (Check: ( E[X] = 1 ).)
- Number of occupied hash buckets when 100 keys are hashed into 50 uniform buckets. (Compute ( 50 \cdot (1 - (49/50)^{100}) ) and compare to ( 50 \cdot (1 - e^{-2}) \approx 43.2 ).)
- Number of pairs of people with the same birthday in a room of 30 (uniform over 365 days). (Hint: define an indicator per pair. Answer: ( \binom{30}{2}/365 \approx 1.19 ).)
- Expected number of comparisons in randomized quicksort on 100 elements using the formula ( \sum_{i < j} 2/(|i-j| + 1) ). (Answer: approximately ( 200 \ln(100) \approx 921 ).)
Simulation drill -- birthday pairs. Simulate 100,000 rooms of 30 people with uniform independent birthdays. Count the average number of pairs with the same birthday. Compare to the indicator-expectation answer 1.19. Then compute the empirical fraction of rooms with at least one shared birthday and compare to the complement formula from Cluster 1 Concept 2 (≈ 0.71). Two different statistics, one simulation -- both match theory when the model is right.
Read This Only If Stuck
- Introduction to Probability: Linearity of expectation (Part 1)
- Introduction to Probability: Linearity of expectation (Part 2)
- Introduction to Probability: Indicator r.v.s and the fundamental bridge (Part 1)
- Introduction to Probability: Indicator r.v.s and the fundamental bridge (Part 2)
- Introduction to Probability: Using probability and expectation to prove existence (Part 1)
- MCS: Linearity of Expectation (Part 1)
- MCS: Linearity of Expectation (Part 2)
- Wikipedia: Indicator function -- cross-reference for the formal definition (Iverson bracket) and the use of indicators in probability and combinatorics.