Bipartite Matching via Max Flow
What This Concept Is
A matching in a graph is a set of edges such that no two share an endpoint. A maximum matching has the largest possible size. In a bipartite graph G = (L union R, E) with edges only crossing L and R, maximum matching has a particularly clean structure.
The standard reduction to max flow:
- add a source
swith unit-capacity edges to every vertex inL - orient every
L-Redge fromLtoRwith unit capacity - add a sink
twith unit-capacity edges from every vertex inR - run max flow from
stot
The max-flow value equals the size of the maximum matching. Saturated L-R edges in any max flow form one such matching.
With unit capacities, Edmonds-Karp already runs in O(|V| |E|). The Hopcroft-Karp algorithm improves this to O(|E| sqrt(|V|)) by augmenting along maximal sets of vertex-disjoint shortest paths in phases - equivalent to running Dinic's algorithm on the flow network. The sqrt(|V|)-phase bound relies on a clever counting argument: after sqrt(|V|) phases, any remaining augmenting path has length at least sqrt(|V|), and there are at most sqrt(|V|) disjoint such paths left.
For weighted bipartite matching (the "assignment problem"), plain max flow is not enough; you need min-cost max flow or the Hungarian algorithm, which runs in O(|V|^3) via potentials.
Why It Matters Here
Bipartite matching is the canonical "not obviously flow, but obviously flow" problem. Reducing it to max flow is the introduction to flow-based modeling, and it shows up everywhere:
- task-to-worker assignment
- student-to-project, student-to-dormitory
- ad-slot assignment in online auctions
- kidney exchange (non-cycle version)
- stable marriage's unstable cousin when preferences are unweighted
- bipartite scheduling: machines to jobs, rooms to classes
Hopcroft-Karp is not necessary to understand the reduction, but knowing it exists keeps you honest about the expected running time once the graph gets large. It is also the default matching in NetworkX's nx.bipartite.maximum_matching, SciPy's min_weight_full_bipartite_matching, and virtually every competitive programming template.
Concrete Examples
Example 1. Bipartite graph with L = {a, b, c}, R = {1, 2, 3}:
Build flow network: s to each of a, b, c capacity 1; orient L-R edges left-to-right with capacity 1; each of 1, 2, 3 to t capacity 1. Run max flow.
Edmonds-Karp finds augmenting paths:
| iter | augmenting path | bottleneck | total flow |
|---|---|---|---|
| 1 | s-a-1-t | 1 | 1 |
| 2 | s-b-3-t | 1 | 2 |
| 3 | s-c-2-t | 1 | 3 |
| 4 | no augmenting path | - | terminate |
Max flow = 3. Saturated L-R edges (a,1), (b,3), (c,2) form a perfect matching.
Example 2 (Konig's theorem). In the same graph, the maximum matching has size 3. A minimum vertex cover (smallest set of vertices touching every edge) must also have size 3; e.g., {1, 2, 3} or {a, b, c}. Konig's theorem guarantees the two numbers agree for bipartite graphs.
Example 3 (Hall's marriage condition). If instead a-1, a-2, b-1, b-2, c-1, c-2 (all three left vertices attach only to {1, 2}), N({a,b,c}) = {1,2}, |N| = 2 < 3 = |S|, so by Hall's theorem no perfect matching exists. Max flow returns 2.
Common Confusion / Misconceptions
- "Bipartite matching is the general matching problem." It is not. The general (non-bipartite) matching problem is solvable in polynomial time via Edmonds' blossom algorithm, but not via a trivial flow reduction - odd cycles ("blossoms") break the clean flow structure.
- "Max flow handles weighted matching." No. For weighted bipartite matching, you need min-cost max flow or the Hungarian algorithm.
- "Hopcroft-Karp is just Edmonds-Karp with a speed tweak." It is structurally different - it augments along maximal sets of vertex-disjoint shortest augmenting paths in phases, matching Dinic's strategy.
- "Konig's theorem is magic." It is a direct corollary of max-flow min-cut: the
S-side of a min cut picks exactly a minimum vertex cover. - "Matching size equals
min(|L|, |R|)for reasonable graphs." Not if the structure is unbalanced: Hall's condition pinpoints when it does.
How To Use It
For unweighted bipartite matching, reduce to max flow:
build flow network as above (each L-R edge becomes a capacity-1 arc)
run Edmonds-Karp (or Hopcroft-Karp for speed)
the saturated L-R edges form a maximum matching
Two related results are easy consequences of max-flow min-cut:
- Konig's theorem. In a bipartite graph,
max matching = min vertex cover. TheS-side of the min cut picks the minimum vertex cover: takeLvertices on theT-side plusRvertices on theS-side. - Hall's theorem. A bipartite graph
G = (L, R, E)has a perfect matching saturatingLiff for everyS subseteq L,|N(S)| >= |S|.
Engineering tips:
- Represent
LandRwith disjoint index ranges; it makes debugging tractable. - When matching fails, use the min-cut side to identify a violated Hall condition (gives a clear diagnostic).
- For repeated matchings under small graph edits, re-use the previous matching as a warm start; only find augmenting paths for unmatched vertices.
- For large
|L|, |R|(hundreds of thousands), use Hopcroft-Karp;O(|E| sqrt(|V|))is dramatic in practice. - For weighted matching, reach for the assignment problem APIs (
scipy.optimize.linear_sum_assignment,networkx.bipartite.minimum_weight_full_matching).
Transfer / Where This Shows Up Later
- S3 (systems). Task scheduling, job-to-machine assignment, request routing.
- S4 (compilers). Register-allocation variants that reduce to bipartite matching; instruction-to-functional-unit scheduling.
- S5 (OS). Process-to-CPU affinity decisions under constraints, I/O scheduling.
- S6 (databases). Query-plan-to-resource assignment, replica-to-shard assignment.
- S7 (DDD). Ubiquitous-language cross-walking when merging contexts - matching "their term X" with "our term Y" under consistency constraints.
- S8 (distributed). Data-center scheduling (Mesos, Kubernetes scheduling), work-stealing bipartite assignments; advertising auctions; kidney exchange networks.
Back-link to concepts 17-19 for the flow infrastructure, and forward to Module 5's LP duality section for the Konig/Hall connections.
Check Yourself
- Why is the max-flow value on the constructed network equal to the matching size?
- Why does Hopcroft-Karp's
sqrt(|V|)-phase argument not generalize to arbitrary (non-bipartite) graphs? - State Konig's theorem and describe how it falls out of max-flow min-cut.
- Give a scenario in which you need weighted bipartite matching and max flow is not enough.
- State Hall's theorem; sketch why it is equivalent to "max matching saturates
L." - What does the
S-side of the residual-BFS min-cut tell you about the minimum vertex cover?
Mini Drill or Application
Solve by hand a 4-by-4 bipartite matching instance (pick any sensible edge set). Trace Edmonds-Karp augmenting paths and report the maximum matching size. Then verify that your answer also gives a minimum vertex cover of the same size (Konig).
Implementation drills.
- Implement bipartite matching as a thin wrapper over your existing Edmonds-Karp implementation.
- Implement Hopcroft-Karp (phased BFS + vertex-disjoint DFS augmentation). Benchmark against Edmonds-Karp on a random bipartite graph with
|L| = |R| = 1000, expecting Hopcroft-Karp to be noticeably faster. - Add a min-vertex-cover extractor using the residual-BFS cut; verify size equality with max matching (Konig).
- Test against
networkx.bipartite.maximum_matchingon graphs with up to10^4vertices. - Solve a concrete assignment instance: match 50 classes to 50 rooms with capacity and timing constraints; confirm result via
scipy.optimize.linear_sum_assignmenton a weighted version.
Read This Only If Stuck
- CLRS 24.3: Maximum Bipartite Matching
- CLRS 24.3: Maximum Bipartite Matching (Part 2)
- CLRS 24.3: Maximum Bipartite Matching (Part 3)
- CLRS 25.1: Maximum Bipartite Matching Revisited (Hopcroft-Karp)
- ADM 6.5: Network Flows and Bipartite Matching
- Sedgewick: Matching
- Sedgewick: Network Flow
- Competitive Programming: Bipartite graph
- cp-algorithms: Hopcroft-Karp bipartite matching
- NetworkX: bipartite matching documentation