Kruskal with Union-Find

What This Concept Is

Kruskal's algorithm builds an MST by sorting edges by weight and greedily adding each edge that does not create a cycle:

sort E in nondecreasing order of weight
make-set(v) for each v in V
MST := empty
for each edge (u, v, w) in sorted order:
    if find(u) != find(v):
        union(u, v)
        add (u, v) to MST
return MST

The cycle check is the bottleneck. A union-find (disjoint-set) data structure answers "are these two vertices in the same component?" in effectively O(alpha(|V|)) amortized time, where alpha is the inverse Ackermann function - slower than O(log) but unmeasurably close to O(1) in practice.

Total time: O(|E| log |E|) for the sort plus O(|E| alpha(|V|)) for the union-find, i.e., O(|E| log |V|) overall (log E = Theta(log V) for simple graphs).

Two optimizations are essential to achieve the inverse-Ackermann bound:

Union by rank (or size). Always attach the smaller tree under the larger root. Prevents unbalanced chains.
Path compression. On find(x), rewrite parent pointers so they point directly at the root.

Either optimization alone gives O(log |V|) per operation. Both together yield the celebrated O(alpha(|V|)) amortized bound (Tarjan).

Why It Matters Here

Kruskal is the cleanest demonstration of "greedy plus good bookkeeping" in this module. It also gives you the union-find data structure, which reappears in:

connectivity queries under incremental edge additions
offline-query algorithms (small-to-large, DSU on tree)
Tarjan's offline LCA algorithm
percolation simulations
image-segmentation (Felzenszwalb-Huttenlocher)

When to prefer Kruskal over Prim:

the edges are already (or can be cheaply) sorted
the graph is sparse; Kruskal scales with |E|
you need the MST edges in increasing weight order as a by-product (useful for clustering, k-th MST)
the graph is given as an edge list (a common input format in competitive programming)

Concrete Example

Graph:

Sort: (a,b)=1, (b,c)=2, (b,d)=3, (c,d)=4, (a,c)=5, (c,e)=6, (d,e)=7.

Initial components: {a}, {b}, {c}, {d}, {e}.

step	edge	`find(u), find(v)`	action	components after
1	`(a,b)=1`	`a, b`	union, add	`{a,b}, {c}, {d}, {e}`
2	`(b,c)=2`	`a, c`	union, add	`{a,b,c}, {d}, {e}`
3	`(b,d)=3`	`a, d`	union, add	`{a,b,c,d}, {e}`
4	`(c,d)=4`	`a, a`	skip (same)	unchanged
5	`(a,c)=5`	`a, a`	skip (same)	unchanged
6	`(c,e)=6`	`a, e`	union, add	`{a,b,c,d,e}`
7	`(d,e)=7`	`a, a`	skip; also done (`	V

Total MST weight: 1 + 2 + 3 + 6 = 12.

Common Confusion / Misconceptions

"Union-find is just for MST." No. It is the workhorse for any "maintain components under adding edges" problem: connectivity queries, offline dynamic problems, percolation simulations.
"Path compression alone is enough." It gives O(log |V|) amortized per operation, not inverse-Ackermann. Combine with union-by-rank for the tight bound.
"Kruskal is O(|E| log |E|) because of the sort." Correct, but note log |E| = log |V^2| = 2 log |V|, so asymptotic equivalents are usually written as O(|E| log |V|).
"Kruskal needs a connected graph." If the graph is disconnected, Kruskal returns a minimum spanning forest.
"The cycle check is an explicit DFS." It is find(u) != find(v) - an O(alpha) operation - not a DFS.
"Integer weights make Kruskal O(|E|)." Only with radix sort and specific conditions. The generic version is O(|E| log |V|).

How To Use It

Baseline implementation skeleton:

parent[v] := v; rank[v] := 0
find(x):
    while parent[x] != x:
        parent[x] := parent[parent[x]]   # path compression (halving)
        x := parent[x]
    return x
union(x, y):
    rx := find(x); ry := find(y)
    if rx = ry: return false
    if rank[rx] < rank[ry]: swap rx, ry
    parent[ry] := rx
    if rank[rx] = rank[ry]: rank[rx] := rank[rx] + 1
    return true

Run Kruskal by sorting edges and calling union on each; stop when you have added |V| - 1 edges.

Integration checklist:

verify edges are sorted stably (ties do not matter for MST correctness)
early-exit after |V| - 1 edges to avoid processing the tail
for weighted graphs with floating-point weights, break ties deterministically if you need reproducibility
if you want the minimum spanning forest, run Kruskal to completion (will produce multiple components)

Transfer / Where This Shows Up Later

S3 (systems). File-system quota and cgroup membership sometimes use union-find for fast "same-group" queries.
S4 (compilers). Static single assignment construction uses union-find for variable renaming; type unification uses it to merge equivalence classes.
S5 (OS). Page-frame merging for huge pages uses union-find; kernel ID namespaces map via union-find.
S6 (databases). Equi-join inference (which columns are guaranteed equal by transitivity) uses union-find.
S7 (DDD). Merging bounded contexts at the "shared-kernel" level conceptually does a union on terminology sets.
S8 (distributed). Partition detection after network splits: union-find over surviving nodes identifies the quorum-reachable set.

Back-link to concept 13 (cut and cycle properties) since they justify Kruskal's correctness, and forward to concept 16 (MST variants) for single-linkage clustering.

Check Yourself

Why is the running time O(|E| log |V|) rather than O(|E| log |E|) asymptotically (hint: |E| <= |V|^2)?
What does the cut property guarantee about the edges Kruskal adds?
Why does union by rank + path compression yield amortized inverse-Ackermann time?
Why does stopping after |V| - 1 edges give a tree (not a forest or a multigraph)?
Sketch why find(x) with path compression alone still costs O(log |V|) amortized but not O(alpha).
Give a production scenario where you would use union-find without Kruskal.

Mini Drill or Application

Implementation drill.

Implement union-find with union-by-rank and path compression. Test against a naive O(|V|)-per-op implementation on the same sequence of operations.
Implement Kruskal on top of your union-find. Verify correctness by comparing the total weight with a brute-force MST on small graphs (|V| <= 8).
Use your implementation to solve a clustering task: connect each point to its k nearest neighbors, run Kruskal, and stop when you have |V| - c edges to produce c clusters.
Apply Kruskal to a percolation problem: randomly order edges of a grid; run Kruskal until the top and bottom of the grid are in the same component.
Benchmark Kruskal (union-find) vs a naive O(|V|^2) Prim on random sparse graphs; observe Kruskal wins as density decreases.

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​