Inclusion-Exclusion as Overlap Accounting

What This Concept Is

Inclusion-exclusion counts a union of overlapping sets by correcting the overcount created by naive addition. For two sets:

$$|A \cup B| = |A| + |B| - |A \cap B|.$$

For three sets:

$$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|.$$

In general, for sets $A_1, \ldots, A_n$:

$$\left|\bigcup_{i=1}^n A_i\right| = \sum_{k=1}^n (-1)^{k+1} \sum_{I \subseteq {1,\ldots,n},,|I|=k} \left|\bigcap_{i \in I} A_i\right|.$$

The sign pattern alternates because each layer corrects the mistakes of the previous one. An element in exactly $m$ of the sets gets counted $\binom{m}{1} - \binom{m}{2} + \binom{m}{3} - \cdots = 1$ times, by the binomial identity $\sum_{k} (-1)^{k+1} \binom{m}{k} = 1$ for $m \ge 1$. That identity is the reason inclusion-exclusion works.

Equivalently, inclusion-exclusion can be stated as a complement form: to count objects with none of a set of forbidden properties $P_1, \ldots, P_n$, compute

$$\left|\overline{A_1} \cap \cdots \cap \overline{A_n}\right| = \sum_{I \subseteq {1,\ldots,n}} (-1)^{|I|} \left|\bigcap_{i \in I} A_i\right|,$$

where $A_i$ is the set of objects with property $P_i$. This "avoid all forbidden properties" form is the workhorse version for derangements and forbidden-pattern counting.

Two ways to see why the signs alternate: (a) binomial identity $\sum_{k \ge 1} (-1)^{k+1}\binom{m}{k} = 1$ for $m \ge 1$ -- each element's contribution telescopes to exactly $1$; (b) expanding the product $\prod_i (1 - \mathbf{1}{A_i})$ -- each $(-1)^{|I|}$ factor comes from choosing the $\mathbf{1}{A_i}$ term in each factor $i \in I$. Pick whichever proof you find memorable; both recur across probability, combinatorics, and number theory.

Why It Matters Here

Many constrained counting problems are "count all objects, then remove the forbidden ones." That fails when objects violate several constraints at once. Inclusion-exclusion is the disciplined repair: count each forbidden-property set, their pairwise intersections, their triple intersections, and so on with alternating signs.

It is also a conceptual bridge to probability: the same logic underlies the union bound in reverse ($P(A \cup B) = P(A) + P(B) - P(A \cap B)$), which you revisit in Module 3, and reliability analysis in Module 6 distributed systems.

The formula is not deep. What is deep is noticing when to invoke it -- typically when the complement is easy but the direct count is messy.

Indicator-function view. Many modern proofs write inclusion-exclusion as an identity between indicator functions: $\mathbf{1}{A_1 \cup \cdots \cup A_n} = 1 - \prod_i (1 - \mathbf{1}{A_i})$. Expanding the product and summing recovers the classical formula; this viewpoint generalizes cleanly to weighted counts (Möbius inversion on a poset) and to probability theory (inclusion-exclusion for events).

Concrete Examples

Example 1: divisibility. How many integers from $1$ to $100$ are divisible by $2$ or $3$?

Let $A$ = multiples of $2$, $B$ = multiples of $3$. Then $|A| = \lfloor 100/2 \rfloor = 50$, $|B| = \lfloor 100/3 \rfloor = 33$, and $|A \cap B|$ = multiples of $\text{lcm}(2,3) = 6$, which is $\lfloor 100/6 \rfloor = 16$. Therefore:

$$|A \cup B| = 50 + 33 - 16 = 67.$$

The $16$ multiples of $6$ were double-counted by the naive sum; the subtraction repairs it.

Example 2: derangements. How many permutations of ${1, 2, \ldots, n}$ fix no element? (These are called derangements and arise in the hat-check problem: $n$ guests check $n$ hats, and all hats are returned with every guest getting the wrong hat.)

Let $A_i$ = permutations that fix element $i$. We want $|\overline{A_1} \cap \cdots \cap \overline{A_n}|$. By inclusion-exclusion:

$$D_n = \sum_{k=0}^n (-1)^k \binom{n}{k}(n-k)! = n! \sum_{k=0}^n \frac{(-1)^k}{k!}.$$

Reasoning: $|A_{i_1} \cap \cdots \cap A_{i_k}|$ = permutations fixing a specified $k$-subset of positions = $(n-k)!$, and there are $\binom{n}{k}$ such subsets.

For $n = 4$: $D_4 = 4!(1 - 1 + 1/2 - 1/6 + 1/24) = 24 \cdot (9/24) = 9$. Check by brute enumeration: there are indeed $9$ derangements of ${1,2,3,4}$. As $n \to \infty$, $D_n / n! \to 1/e \approx 0.368$; roughly $37%$ of random permutations are derangements regardless of $n$.

Example 3: surjection counting. How many functions $f: [n] \to [k]$ are surjective (every output is hit)?

Let $A_i$ = functions that miss output $i$. We want $|\overline{A_1} \cap \cdots \cap \overline{A_k}|$. For a fixed $k$-subset $I$ with $|I| = j$, the number of functions avoiding all outputs in $I$ is $(k-j)^n$ (each of $n$ inputs chooses from $k-j$ allowed outputs). By inclusion-exclusion,

$$\text{Surj}(n, k) = \sum_{j=0}^{k} (-1)^j \binom{k}{j} (k-j)^n.$$

These are the Stirling numbers of the second kind scaled by $k!$: $\text{Surj}(n, k) = k! \cdot S(n, k)$, where $S(n,k)$ counts partitions of an $n$-set into $k$ non-empty blocks. This identity underlies analyses of hash uniformity, random coloring, and coupon-collector-like problems in Semester 3.

Common Confusion / Misconceptions

Subtracting overlaps that were never counted twice. If $A$ and $B$ are disjoint, $|A \cap B| = 0$ and inclusion-exclusion just becomes plain addition. Applying it to disjoint sets adds bookkeeping with no payoff.
Forgetting higher-order intersections. For three sets, the triple intersection must be added back. Novice attempts often write $|A|+|B|+|C| - |A\cap B| - |A\cap C| - |B\cap C|$ and stop there, which undercounts elements that are in all three.
Unclear forbidden-property sets. The method is only as good as the chosen property sets. If $A_i$ is fuzzy or poorly defined, the intersections collapse or become impossible to compute.
Sign confusion. The sign on the $k$-fold intersection is $(-1)^{k+1}$ in the union form or $(-1)^k$ in the complement form. Writing both forms in the same argument without labeling gets signs wrong.
Forgetting that events must share a ground set. The formula tracks elements of a common universe. If $A$ lives in one universe and $B$ in another, $|A \cap B|$ is not meaningful and the argument fails at the setup.
Applying IE to weighted sums without care. Probability versions (Bonferroni) truncate the alternating series for conservative bounds; stopping early yields bounds, not equalities. Know whether you want equality or bound.

How To Use It

Use inclusion-exclusion when:

you can describe forbidden (or desired) properties as sets $A_i$
direct counting of the allowed objects is messy
intersections $|\bigcap A_i|$ are comparatively easy (e.g., fixing $k$ positions reduces to counting permutations of the rest)
the number of sets is small enough that $2^n$ intersection terms remain tractable

Operationally:

define the property sets $A_1, \ldots, A_n$ clearly in words.
compute single-set sizes $|A_i|$.
compute pairwise intersections $|A_i \cap A_j|$.
continue to higher intersections if needed, noting any symmetry that makes all $k$-fold intersections equal.
keep signs consistent -- write the formula once and substitute.
verify on a small instance by direct enumeration before trusting the general answer.
consider the complement: sometimes "count objects avoiding all $P_i$" is cleaner than "count objects with at least one $P_i$."
if $n$ is large but most intersections collapse (e.g., $A_i \cap A_j = \varnothing$ for many pairs), the series has short effective length. Exploit sparsity.
if a symmetry makes all $k$-fold intersections equal in size, the sum reduces to $\binom{n}{k}$ times one intersection size -- write it out and save half the work.

Transfer / Where This Shows Up Later

Semester 2 algorithms: counting valid inputs to brute-force search (e.g., placements avoiding forbidden patterns) uses inclusion-exclusion to size the search space. The Möbius function in number theory is inclusion-exclusion over the divisor lattice.
Semester 3 data structures: cache-oblivious analyses and streaming-algorithm correctness rely on inclusion-exclusion over collision events.
Semester 4 databases: query-size estimation for OR predicates uses the two-set inclusion-exclusion formula; multi-predicate estimation generalizes it.
Semester 5 OS & 6 distributed systems: reliability of a system with redundant components -- "at least one replica up" -- is a union probability computed by inclusion-exclusion; "all replicas fail independently" uses the complement form.
Module 3 probability: the same logic governs $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ and its generalizations (Bonferroni bounds truncate the series to get conservative estimates).
Number theory: the Möbius function $\mu$ is the inclusion-exclusion coefficient on the divisor poset; the inverse of the "count-multiples" function under Möbius is the "count-coprimes" function, yielding Euler's totient $\varphi(n)$.
Compilers / SAT: counting satisfying assignments for CNF formulas (#SAT) uses inclusion-exclusion over violated clauses; though #SAT is #P-hard in general, IE gives exact algorithms for small clause counts.

Check Yourself

Why does naive addition fail for unions? Which elements get overcounted, and by how much?
Why is the triple intersection added back in the three-set formula? Trace an element in all three sets through the formula.
When is complement counting plus inclusion-exclusion better than direct enumeration?
Derive the two-set formula from the three-set formula by setting $C = \varnothing$.
Why does $D_n / n! \to 1/e$? Point to the Taylor series for $e^{-1}$.
For $n = 5$ sets of sizes $|A_i| = 10$ each, with all pairwise intersections of size $3$, triple intersections of size $1$, and higher intersections empty: compute $|\bigcup_i A_i|$ by inclusion-exclusion.
What is the sign of the $k$-fold intersection in the "avoid all forbidden properties" form? Why does the sign flip compared to the "union" form?

Mini Drill or Application

Count the integers from $1$ to $200$ divisible by at least one of $2$, $3$, or $5$. Write the set definitions explicitly.
How many permutations of ${1, \ldots, 6}$ send at least one of $1, 2, 3$ to itself?
Count surjections from an $n$-set to a $k$-set using inclusion-exclusion (property $A_i$ = "$i$ is not in the image").
Count binary strings of length $10$ that do not contain 000 as a substring. (Hint: complement form on positions where 000 could start.)
Verify $D_5 = 44$ using the inclusion-exclusion sum.
Count the number of onto functions $[6] \to [3]$ using the surjection formula. Cross-check against $3! \cdot S(6, 3) = 6 \cdot 90 = 540$.
Using inclusion-exclusion on "multiple of $p$" events for primes $p \le \sqrt{N}$, relate the count of integers $\le N$ coprime to $\prod p_i$ to Euler's totient.

What This Concept Is​

Why It Matters Here​

Concrete Examples​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​