Applications: Kruskal MST, Dynamic Connectivity, Equivalence Classes
What This Concept Is
Once union-find is near-constant per operation, a surprising number of graph and system problems collapse into a simple loop: sort or stream events, and for each event call find or union.
Three canonical applications:
- Kruskal's MST: sort edges by weight, then for each edge
(u, v)in increasing order, add it to the tree iffind(u) != find(v)and thenunion(u, v). The forest of union-find classes is exactly the MST-so-far. - Incremental dynamic connectivity: given a stream of edge insertions and connectivity queries on an undirected graph, maintain connectivity by calling
union(u, v)on every insertion and answering queries withfind(u) == find(v). - Equivalence classes: any problem that defines "
xis related toy" as the reflexive, symmetric, transitive closure of an observed relation (type unification, pixel regions, friend-of-friend groups, network-partition views).
Two extensions that matter in practice:
- Offline union-find (Tarjan's off-line LCA): when all unions and queries are known up front, you can often answer each in effectively
O(1)amortized by processing them in a topologically sensitive order. - Weighted / potential union-find: store an integer delta per edge;
find(x)returns both the root and the accumulated weight along the path. This solves equality-modulo-offset problems (parity constraints, checking whetherxandydiffer by a given amount) without changing the asymptotic cost.
Why It Matters Here
Union-find is one of the rare cases where the right data structure changes an algorithm's asymptotic class:
- Kruskal with union-find:
O(E log E)dominated by sorting - Kruskal with naive connectivity:
O(E * V)or worse
It is also the foundational structure for incremental graph algorithms where nothing is ever deleted, and a gateway for the weighted / offline variants that dominate competitive programming and scientific workloads.
Concrete Examples
Example 1 -- Kruskal end-to-end. Run Kruskal on the graph with edges (A,B,1), (B,C,2), (A,C,3), (C,D,4), (B,D,5):
Start: {A}, {B}, {C}, {D} (4 components)
edge (A,B,1): find(A) != find(B), union -> {A,B}, {C}, {D}
edge (B,C,2): find(B) != find(C), union -> {A,B,C}, {D}
edge (A,C,3): find(A) == find(C), SKIP (would form a cycle)
edge (C,D,4): find(C) != find(D), union -> {A,B,C,D}
edge (B,D,5): find(B) == find(D), SKIP
MST weight = 1 + 2 + 4 = 7. Exactly three union operations on four vertices (V - 1 unions total), plus E find pairs. Total time is dominated by the edge sort.
Example 2 -- weighted union-find for parity. Given relations "x and y have the same parity" and "x and y have different parity," maintain a parity-consistent labeling under streaming constraints. Store weight[x] modulo 2 interpreted as "distance from root."
union(x, y, delta): # assert parity(x) - parity(y) == delta mod 2
rx, wx = find(x); ry, wy = find(y)
if rx == ry:
return (wx - wy) mod 2 == delta # consistency check
parent[rx] = ry
weight[rx] = (wy + delta - wx) mod 2
return True
Adding constraint parity(a) == parity(b) (delta = 0) and then parity(b) != parity(c) (delta = 1) gives the deterministic assignment a, b same, c different. Any later constraint that contradicts this returns False in O(alpha(n)). Competitive programming has hundreds of problems of exactly this shape.
Common Confusion / Misconceptions
"Union-find does dynamic connectivity in general." It supports incremental connectivity -- only edge insertions. For decremental or fully dynamic connectivity (insertions and deletions) you need link-cut trees (O(log n) per op), Euler-tour trees, or Holm-Lichtenberg-Thorup (O(log^2 n)). Do not confuse "dynamic connectivity" in the research sense with "my edges only arrive" in the Kruskal sense.
"You can skip the find check before union and it still works." Correctness survives, but Kruskal's cycle detection depends on only adding edges across different classes. Calling union unconditionally adds E - V + 1 spurious links and silently corrupts the MST.
"Minimum-spanning tree requires edge weights to be distinct." It does not. When ties occur, Kruskal produces one MST; others with equal total weight exist. The correctness proof (cut property) handles ties by a consistent tiebreak, e.g., edge index.
"Union-find only applies to graphs." It applies to any monotone equivalence relation. Compiler type-unification, offline nearest-common-ancestor, Hindley-Milner inference, and clustering in bioinformatics all reduce to union-find.
How To Use It
Pattern 1 -- Kruskal MST:
- sort edges by weight
- make-set every vertex
- for each edge in order: if
find(u) != find(v), accept the edge andunion(u, v) - stop when
V - 1edges are accepted
Pattern 2 -- Incremental connectivity over a stream:
- make-set every vertex
- on
add-edge(u, v):union(u, v) - on
connected?(u, v): returnfind(u) == find(v)
Pattern 3 -- Equivalence classes:
- make-set every element
- whenever a rule says "
x ~ y", callunion(x, y) - to answer "are
xandyequivalent?", comparefind(x)andfind(y)
Pattern 4 -- Offline processing:
- collect all operations before running
- sort or permute them to favor a structure (e.g., small-to-large merging, DFS order)
- process in bulk; maintain aggregates at roots
Transfer / Where This Shows Up Later
- S5 (OS):
mount --moveand namespace unification in Linux compute transitive mount reachability with a union-find under the hood. - S6 (databases): MST reductions appear in graph-database query planners (shortest connecting subgraph); equivalence classes appear in query-optimizer constant propagation and predicate unification (all rows with
a = bandb = csharea = cwithout a join). - S7 (architecture): ADRs for service-mesh topology analysis (who-calls-whom transitively) and dependency-graph reachability often lean on union-find as the baseline before reaching for heavier structures.
- S8 (scale): distributed component labeling in stream processors (Flink's
ConnectedComponents, Spark GraphX) uses union-find with merge-at-shuffle to partition massive graphs.
Check Yourself
- Why does Kruskal's correctness rely on the union-find cycle check and not just on the edge sort?
- What does union-find not support that full dynamic connectivity requires?
- How many unions does Kruskal perform on a connected graph with
Vvertices? - How would you detect that a graph is disconnected using union-find and
Vmake-sets? - In weighted union-find, why must
weight[parent]be updated during path compression, and what happens if you forget? - Name two non-graph problems that reduce cleanly to union-find and explain the reduction.
Mini Drill or Application
- Given edges
(A,B,4), (A,C,2), (B,C,1), (B,D,5), (C,D,8), (C,E,10), (D,E,3), run Kruskal with union-find by hand and list the accepted edges in order. - For the same edge set arriving as a stream without weights, answer: after inserting the first three edges, is
Aconnected toD? - Implement the equivalence-class pattern for
n = 6elements and the relations1~2, 3~4, 5~6, 2~4. Report the final partition. - Implement weighted (potential) union-find in ~80 lines and solve a small parity-constraint problem (e.g., 10 constraints over 10 variables, one of which is inconsistent).
- Simulate an offline batch of 100 unions and 100 connectivity queries; compare to an online batch interleaving both, confirming total work is identical.
Read This Only If Stuck
- CLRS: Kruskal and Prim (Part 1)
- CLRS: Kruskal and Prim (Part 2)
- CLRS: Analysis of union by rank with path compression (Part 4)
- Competitive Programming: 2.4.2 Union-find disjoint sets
- Skiena: 15.3 Minimum spanning tree (Part 1)
- Skiena: 15.3 Minimum spanning tree (Part 2)
- CP-Algorithms: Disjoint Set Union -- applications (weighted DSU, offline LCA, Kruskal)
- MIT 6.046J videos on Kruskal and greedy MST