Skip to main content

Applications: Kruskal MST, Dynamic Connectivity, Equivalence Classes

What This Concept Is

Once union-find is near-constant per operation, a surprising number of graph and system problems collapse into a simple loop: sort or stream events, and for each event call find or union.

Three canonical applications:

  • Kruskal's MST: sort edges by weight, then for each edge (u, v) in increasing order, add it to the tree if find(u) != find(v) and then union(u, v). The forest of union-find classes is exactly the MST-so-far.
  • Incremental dynamic connectivity: given a stream of edge insertions and connectivity queries on an undirected graph, maintain connectivity by calling union(u, v) on every insertion and answering queries with find(u) == find(v).
  • Equivalence classes: any problem that defines "x is related to y" as the reflexive, symmetric, transitive closure of an observed relation (type unification, pixel regions, friend-of-friend groups, network-partition views).

Two extensions that matter in practice:

  • Offline union-find (Tarjan's off-line LCA): when all unions and queries are known up front, you can often answer each in effectively O(1) amortized by processing them in a topologically sensitive order.
  • Weighted / potential union-find: store an integer delta per edge; find(x) returns both the root and the accumulated weight along the path. This solves equality-modulo-offset problems (parity constraints, checking whether x and y differ by a given amount) without changing the asymptotic cost.

Why It Matters Here

Union-find is one of the rare cases where the right data structure changes an algorithm's asymptotic class:

  • Kruskal with union-find: O(E log E) dominated by sorting
  • Kruskal with naive connectivity: O(E * V) or worse

It is also the foundational structure for incremental graph algorithms where nothing is ever deleted, and a gateway for the weighted / offline variants that dominate competitive programming and scientific workloads.

Concrete Examples

Example 1 -- Kruskal end-to-end. Run Kruskal on the graph with edges (A,B,1), (B,C,2), (A,C,3), (C,D,4), (B,D,5):

Start: {A}, {B}, {C}, {D}      (4 components)
edge (A,B,1): find(A) != find(B), union -> {A,B}, {C}, {D}
edge (B,C,2): find(B) != find(C), union -> {A,B,C}, {D}
edge (A,C,3): find(A) == find(C), SKIP (would form a cycle)
edge (C,D,4): find(C) != find(D), union -> {A,B,C,D}
edge (B,D,5): find(B) == find(D), SKIP

MST weight = 1 + 2 + 4 = 7. Exactly three union operations on four vertices (V - 1 unions total), plus E find pairs. Total time is dominated by the edge sort.

Example 2 -- weighted union-find for parity. Given relations "x and y have the same parity" and "x and y have different parity," maintain a parity-consistent labeling under streaming constraints. Store weight[x] modulo 2 interpreted as "distance from root."

union(x, y, delta):   # assert parity(x) - parity(y) == delta mod 2
rx, wx = find(x); ry, wy = find(y)
if rx == ry:
return (wx - wy) mod 2 == delta # consistency check
parent[rx] = ry
weight[rx] = (wy + delta - wx) mod 2
return True

Adding constraint parity(a) == parity(b) (delta = 0) and then parity(b) != parity(c) (delta = 1) gives the deterministic assignment a, b same, c different. Any later constraint that contradicts this returns False in O(alpha(n)). Competitive programming has hundreds of problems of exactly this shape.

Common Confusion / Misconceptions

"Union-find does dynamic connectivity in general." It supports incremental connectivity -- only edge insertions. For decremental or fully dynamic connectivity (insertions and deletions) you need link-cut trees (O(log n) per op), Euler-tour trees, or Holm-Lichtenberg-Thorup (O(log^2 n)). Do not confuse "dynamic connectivity" in the research sense with "my edges only arrive" in the Kruskal sense.

"You can skip the find check before union and it still works." Correctness survives, but Kruskal's cycle detection depends on only adding edges across different classes. Calling union unconditionally adds E - V + 1 spurious links and silently corrupts the MST.

"Minimum-spanning tree requires edge weights to be distinct." It does not. When ties occur, Kruskal produces one MST; others with equal total weight exist. The correctness proof (cut property) handles ties by a consistent tiebreak, e.g., edge index.

"Union-find only applies to graphs." It applies to any monotone equivalence relation. Compiler type-unification, offline nearest-common-ancestor, Hindley-Milner inference, and clustering in bioinformatics all reduce to union-find.

How To Use It

Pattern 1 -- Kruskal MST:

  1. sort edges by weight
  2. make-set every vertex
  3. for each edge in order: if find(u) != find(v), accept the edge and union(u, v)
  4. stop when V - 1 edges are accepted

Pattern 2 -- Incremental connectivity over a stream:

  1. make-set every vertex
  2. on add-edge(u, v): union(u, v)
  3. on connected?(u, v): return find(u) == find(v)

Pattern 3 -- Equivalence classes:

  1. make-set every element
  2. whenever a rule says "x ~ y", call union(x, y)
  3. to answer "are x and y equivalent?", compare find(x) and find(y)

Pattern 4 -- Offline processing:

  1. collect all operations before running
  2. sort or permute them to favor a structure (e.g., small-to-large merging, DFS order)
  3. process in bulk; maintain aggregates at roots

Transfer / Where This Shows Up Later

  • S5 (OS): mount --move and namespace unification in Linux compute transitive mount reachability with a union-find under the hood.
  • S6 (databases): MST reductions appear in graph-database query planners (shortest connecting subgraph); equivalence classes appear in query-optimizer constant propagation and predicate unification (all rows with a = b and b = c share a = c without a join).
  • S7 (architecture): ADRs for service-mesh topology analysis (who-calls-whom transitively) and dependency-graph reachability often lean on union-find as the baseline before reaching for heavier structures.
  • S8 (scale): distributed component labeling in stream processors (Flink's ConnectedComponents, Spark GraphX) uses union-find with merge-at-shuffle to partition massive graphs.

Check Yourself

  1. Why does Kruskal's correctness rely on the union-find cycle check and not just on the edge sort?
  2. What does union-find not support that full dynamic connectivity requires?
  3. How many unions does Kruskal perform on a connected graph with V vertices?
  4. How would you detect that a graph is disconnected using union-find and V make-sets?
  5. In weighted union-find, why must weight[parent] be updated during path compression, and what happens if you forget?
  6. Name two non-graph problems that reduce cleanly to union-find and explain the reduction.

Mini Drill or Application

  1. Given edges (A,B,4), (A,C,2), (B,C,1), (B,D,5), (C,D,8), (C,E,10), (D,E,3), run Kruskal with union-find by hand and list the accepted edges in order.
  2. For the same edge set arriving as a stream without weights, answer: after inserting the first three edges, is A connected to D?
  3. Implement the equivalence-class pattern for n = 6 elements and the relations 1~2, 3~4, 5~6, 2~4. Report the final partition.
  4. Implement weighted (potential) union-find in ~80 lines and solve a small parity-constraint problem (e.g., 10 constraints over 10 variables, one of which is inconsistent).
  5. Simulate an offline batch of 100 unions and 100 connectivity queries; compare to an online batch interleaving both, confirming total work is identical.

Read This Only If Stuck