Complements, Unions, and Overlap Control

What This Concept Is

Many probability problems become dramatically simpler when you stop attacking the target event directly and instead rewrite it using set identities before computing:

complement rule: ( P(A^c) = 1 - P(A) )
addition rule: ( P(A \cup B) = P(A) + P(B) - P(A \cap B) )
inclusion-exclusion: for ( n ) events, [ P!\left(\bigcup_{i=1}^n A_i\right) = \sum_i P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots ]
union bound (Boole's inequality): ( P(A_1 \cup \cdots \cup A_n) \le \sum_i P(A_i) ), always true, often loose

Think of this as probability as bookkeeping: if you double-counted, subtract; if computing the positive event is hard, compute the negative event instead.

Two further useful identities:

Disjointization. ( P(A \cup B) = P(A) + P(B \setminus A) ): whenever you can peel one event away from the other, the addition rule becomes exact without an intersection term.
Partition rule. If ( B_1, \dots, B_k ) partition ( S ), then ( P(A) = \sum_i P(A \cap B_i) ) -- the horizontal step that becomes the Law of Total Probability once conditionals appear (Cluster 2, Concept 5).

Two of these tools -- the complement rule and the union bound -- are doing almost all the real work in the engineer's toolkit. Inclusion-exclusion is the tool you reach for when the events are correlated and you need an exact answer; the union bound is the tool you reach for when ( n ) is large and a clean upper bound is enough.

Two further identities that are worth committing to memory:

De Morgan, probability version. ( P(A \cup B)^c = P(A^c \cap B^c) ). This is the algebraic reason complements are so powerful for "at least one" events -- the complement of "at least one" is "none."
Bonferroni truncation. Keeping the first ( k ) terms of inclusion-exclusion gives a bound -- upper if ( k ) is odd, lower if ( k ) is even. In particular, the 1-term truncation is exactly the union bound.

Why It Matters Here

In modern distributed systems, the single most common probability you compute is "what is the chance that at least one of ( n ) things goes wrong this hour?" Given independence, this is ( 1 - (1-p)^n \approx np ) when ( p ) is small -- the first-order linearization of the exact complement. The union bound gives you this estimate for free even without independence, which is exactly why it is the default tool for availability math.

Real probabilistic questions in systems work almost never look like "find ( P(A) ) for a single simple event." They look like:

at least one of ( n ) independent retries succeeds
none of the fleet of ( m ) servers fails today
one or more of several alerts fires
the user hits at least one slow shard

All four shapes are much cleaner via complement, inclusion-exclusion, or a union bound than by direct case-splitting. A practiced engineer rewrites the question in set algebra before touching a number.

This concept is also the analytical foundation of the union bound arguments you will use everywhere downstream: proving a randomized algorithm succeeds with high probability, bounding the total error budget for a system made of ( k ) independent components, or arguing that a distributed protocol avoids a failure mode across all of its quorum combinations.

A short decision tree for choosing your tool:

"At least one of ( n ) things happens" -> complement.
"A or B (both small, possibly overlapping)" -> inclusion-exclusion with 2 terms.
"A or B or C or …" with many terms and no exact joint model -> union bound.
"Exactly ( k ) things happen" -> direct case split or a binomial-style argument, not inclusion-exclusion.

Concrete Examples

Example 1 -- At least one server failure. Three independent servers each fail in a day with probability ( p = 0.02 ). The probability that at least one fails is cleanest as a complement. Let ( A_i ) be "server ( i ) fails" and assume ( A_1, A_2, A_3 ) independent: [ P(A_1 \cup A_2 \cup A_3) = 1 - P(A_1^c \cap A_2^c \cap A_3^c) = 1 - (1-p)^3. ] Plugging in, ( 1 - 0.98^3 \approx 0.0588 ). Compare to the union bound ( P(A_1 \cup A_2 \cup A_3) \le 3p = 0.06 ), which is loose by about 0.0012. For ( p ) small, the union bound is an excellent approximation, which is why engineers reach for it constantly.

Example 2 -- Hearts or face cards. Draw one card from a standard deck. Let ( A = {\text{heart}} ), ( B = {\text{face card (J, Q, K)}} ). Then ( |A| = 13 ), ( |B| = 12 ), and ( |A \cap B| = 3 ) (J, Q, K of hearts). So [ P(A \cup B) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}. ] Forgetting the ( -P(A \cap B) ) term would give ( 25/52 ) -- an overcount by exactly the three cards that live in both sets. The subtraction is not a correction factor you add at the end; it is the fix for a real double-count.

Example 3 -- Retry success probability. A client retries a request up to ( n = 5 ) times, with independent per-attempt success probability ( q = 0.8 ). The probability of eventual success is ( 1 - (1-q)^n = 1 - 0.2^5 = 1 - 0.00032 = 0.99968 ). This is the complement rule plus independence: the event "all attempts fail" has probability ( (1-q)^n ), and its complement is "at least one succeeds." The ( 0.99968 ) number is why retry loops are a cheap reliability tool -- five independent retries turn an 80% success rate into a 99.97% success rate, at the cost of tail latency (Concept 4 in Cluster 2 handles the latency side of this trade).

Common Confusion / Misconceptions

"I can always add ( P(A) + P(B) )." Only when ( A \cap B = \emptyset ). If the events can co-occur, you have counted every joint outcome twice. The inclusion-exclusion term fixes the overcount, it does not "correct for dependence."

"The union bound is exact." It is an upper bound. It becomes tight only when the events are (approximately) disjoint or their probabilities are tiny. For small ( p ) and small ( n ), ( 1 - (1-p)^n \approx np ), which is the union bound to leading order.

"Complement is just a cute trick." It is essentially always simpler. If the target event contains the word "at least one," try the complement before anything else. The direct case split has ( 2^n - 1 ) cases; the complement has one.

"Inclusion-exclusion with many terms is always the answer." No -- inclusion-exclusion with ( n ) events has ( 2^n - 1 ) terms. For large ( n ) you almost always bound instead of computing exactly: the union bound above, or a Bonferroni truncation (take only the first two levels of inclusion-exclusion for a tighter-than-union bound).

"The complement always simplifies things." Usually, but not always -- if the complement event is itself hard to describe (e.g., "no three-way coincidences among many events"), the direct inclusion-exclusion might be cleaner. The rule of thumb: complement first, direct only if complement is uglier.

"Independence is required for the complement trick." No -- the complement rule ( P(A^c) = 1 - P(A) ) is always valid. What requires independence is factorizing ( P(A_1^c \cap A_2^c \cap \cdots) = \prod P(A_i^c) ). Without independence, you can still use complement -- you just cannot decompose the resulting intersection into a product.

"Union bound fails for dependent events." The union bound ( P(\cup A_i) \le \sum P(A_i) ) requires no assumptions whatsoever -- it holds for any events, correlated or not. What changes under dependence is how tight the bound is: strongly-positively-correlated events make the union bound loose; negatively-correlated events make it tighter.

How To Use It

Translate the English phrase into event algebra before computing:

"at least one X" -> complement of "no X"
"none of X" -> intersection of complements
"A or B" -> check whether they can overlap; if so, use inclusion-exclusion
"exactly k of n" -> split by cases or use a binomial-style identity
"several rare bad things, bound the total risk" -> union bound
"exactly one of A, B" -> ( A \triangle B ), i.e., ( (A \setminus B) \cup (B \setminus A) )
Sanity-check with a trivial case: if ( p = 0 ), does the formula give ( 0 )? If ( p = 1 ), does it give ( 1 )?
Prefer tight bounds when possible. If you can compute exactly (small ( n ), known independence), do so; only fall back to the union bound when the exact computation is expensive or requires a joint model you do not have.
Report both the bound and what it approximates. "At most 6%, approximately 5.9% under the independence model" communicates more than a bare 6%.

Transfer / Where This Shows Up Later

Semester 2 (Randomized algorithms). "Quicksort is ( O(n \log n) ) with high probability" is usually proved by a union bound: the probability that some level has a bad pivot is at most the sum over levels of the per-level bad-pivot probability.
Semester 2 (Hashing). The classic argument "at least one of ( n ) items lands in a crowded bucket" is a union bound. Bloom filter false-positive analysis is a union bound over ( k ) hash functions.
Semester 5 (Availability). The probability that all of ( m ) redundant components survive the hour is ( \prod_i (1 - p_i) ); the probability that at least one fails is the complement. This is the core formula for RAID, multi-AZ, and replication availability math.
Semester 6 (Quorum protocols). The failure of a read quorum is a union of component-failure events; union-bound style reasoning gives a safe upper bound on inconsistency risk.
Semester 8 (SLO compounding). If a user request touches ( k ) services each with availability ( 1 - p_i ), the probability of any service failure on that request is at most ( \sum_i p_i ) by the union bound. This is the standard "availability budget" calculation.
Semester 9 (Canary rollouts). "Probability of at least one statistical false alarm across ( m ) metrics" is a union bound; Bonferroni correction on multiple-hypothesis testing is an explicit application.
Semester 9 (Observability). When ( k ) alerting rules can each fire for a transient spike, the probability of any rule firing in a window is a union -- and the union bound is typically used because exact calculation requires a correlation model that is rarely available.
Semester 6 (Databases). Probability of any replica lagging beyond a threshold for at least one read during a window -- a union over replicas and over reads -- is a union-bound calculation that drives read-your-writes consistency analyses.

Check Yourself

Why is "at least one" almost always best handled by a complement?
When exactly does ( P(A \cup B) = P(A) + P(B) ) hold? Give two different sufficient conditions.
Compare the union bound ( 1 - (1-p)^n \le np ) to the exact value when ( p = 0.1, n = 10 ). How tight is the bound?
A distributed system has 5 components, each independently failing with ( p = 0.01 ) per day. Using both the exact formula and the union bound, compute ( P(\text{at least one fails today}) ). Where is each more useful in daily engineering work?
Three events ( A, B, C ) are pairwise disjoint but not all three intersect at once (impossible, but pretend). Would inclusion-exclusion simplify? To what?
For small ( p ), ( 1 - (1-p)^n \approx np - \binom{n}{2} p^2 ). When is the union bound ( np ) a significant overestimate?
What is the probability that none of five servers fails today if each has independent failure probability ( 0.01 )? (( 0.99^5 \approx 0.951 ).) State this also as "at least one fails": ( \approx 0.049 ). Which form is easier to communicate?

Mini Drill or Application

Rewrite each probability question into a cleaner event identity before solving:

At least one duplicate birthday in a room of 20 people (use complement + independence model).
At least one defective chip in a batch of 5 when each is independently defective with ( p = 0.1 ). (Answer: ( 1 - 0.9^5 = 0.41 ).)
A card is a heart or a face card (use inclusion-exclusion; this was Example 2).
At least one of two alarms triggers when each fires with probability 0.3 and the joint probability is 0.15. (Answer: ( 0.3 + 0.3 - 0.15 = 0.45 ); note the union bound ( 0.6 ) is loose here.)
The probability of at least one false positive across 20 independent hypothesis tests each with size ( \alpha = 0.05 ), using both the exact formula and the union bound. Comment on the gap. (Exact: ( 1 - 0.95^{20} \approx 0.642 ). Union bound: ( 20 \cdot 0.05 = 1.0 ) -- useless here, showing that union bounds degrade badly when ( np ) approaches 1.)
Two correlated alerts: ( P(A) = 0.2 ), ( P(B) = 0.3 ), ( P(A \cap B) = 0.1 ). Probability that neither fires? (( 1 - (0.2 + 0.3 - 0.1) = 0.6 ).)

Simulation drill. Simulate 200,000 trials of Example 1 (three servers, independent failure probability 0.02). Record the fraction with at least one failure. Compare to ( 1 - 0.98^3 ) and to the union bound ( 3 \cdot 0.02 ). Then sweep ( p ) from ( 0.001 ) to ( 0.3 ) and plot the gap between the exact value and the union bound -- this visualizes why union bounds are "free" at small ( p ) and wasteful at large ( p ).

Engineering scenario -- multi-region error budget. A user request is served by one of three regional clusters, each with an independent 0.1% hourly failure probability. The probability that at least one region is down during an arbitrary hour is ( 1 - 0.999^3 \approx 0.003 ), while the probability that the specific region the request is routed to is down is still 0.001. Confusion between these two quantities is a classic source of over-engineered multi-region designs: engineers size failover capacity for the union probability when the actual risk to a single request is the per-region probability. Event algebra separates these clearly.

Read This Only If Stuck

Introduction to Probability: Non-naive definition of probability (Part 1)
Introduction to Probability: Non-naive definition of probability (Part 2)
Introduction to Probability: Pitfalls and paradoxes (Part 1)
MCS: Set Theory and Probability
MCS: The Birthday Principle
Wikipedia: Inclusion-exclusion principle -- authoritative cross-reference for the general formula and for the Bonferroni truncation used in multiple-testing corrections.
Wikipedia: Boole's inequality -- the formal statement of the union bound and its role in probabilistic analysis.
Google SRE Book -- Embracing Risk -- engineering treatment of availability budgets and how component failure probabilities compound under union-bound logic.

What This Concept Is​

Why It Matters Here​

Concrete Examples​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​