Skip to main content

Bipartite Matching via Max Flow

What This Concept Is

A matching in a graph is a set of edges such that no two share an endpoint. A maximum matching has the largest possible size. In a bipartite graph G = (L union R, E) with edges only crossing L and R, maximum matching has a particularly clean structure.

The standard reduction to max flow:

  1. add a source s with unit-capacity edges to every vertex in L
  2. orient every L-R edge from L to R with unit capacity
  3. add a sink t with unit-capacity edges from every vertex in R
  4. run max flow from s to t

The max-flow value equals the size of the maximum matching. Saturated L-R edges in any max flow form one such matching.

With unit capacities, Edmonds-Karp already runs in O(|V| |E|). The Hopcroft-Karp algorithm improves this to O(|E| sqrt(|V|)) by augmenting along maximal sets of vertex-disjoint shortest paths in phases - equivalent to running Dinic's algorithm on the flow network. The sqrt(|V|)-phase bound relies on a clever counting argument: after sqrt(|V|) phases, any remaining augmenting path has length at least sqrt(|V|), and there are at most sqrt(|V|) disjoint such paths left.

For weighted bipartite matching (the "assignment problem"), plain max flow is not enough; you need min-cost max flow or the Hungarian algorithm, which runs in O(|V|^3) via potentials.

Why It Matters Here

Bipartite matching is the canonical "not obviously flow, but obviously flow" problem. Reducing it to max flow is the introduction to flow-based modeling, and it shows up everywhere:

  • task-to-worker assignment
  • student-to-project, student-to-dormitory
  • ad-slot assignment in online auctions
  • kidney exchange (non-cycle version)
  • stable marriage's unstable cousin when preferences are unweighted
  • bipartite scheduling: machines to jobs, rooms to classes

Hopcroft-Karp is not necessary to understand the reduction, but knowing it exists keeps you honest about the expected running time once the graph gets large. It is also the default matching in NetworkX's nx.bipartite.maximum_matching, SciPy's min_weight_full_bipartite_matching, and virtually every competitive programming template.

Concrete Examples

Example 1. Bipartite graph with L = {a, b, c}, R = {1, 2, 3}:

Build flow network: s to each of a, b, c capacity 1; orient L-R edges left-to-right with capacity 1; each of 1, 2, 3 to t capacity 1. Run max flow.

Edmonds-Karp finds augmenting paths:

iteraugmenting pathbottlenecktotal flow
1s-a-1-t11
2s-b-3-t12
3s-c-2-t13
4no augmenting path-terminate

Max flow = 3. Saturated L-R edges (a,1), (b,3), (c,2) form a perfect matching.

Example 2 (Konig's theorem). In the same graph, the maximum matching has size 3. A minimum vertex cover (smallest set of vertices touching every edge) must also have size 3; e.g., {1, 2, 3} or {a, b, c}. Konig's theorem guarantees the two numbers agree for bipartite graphs.

Example 3 (Hall's marriage condition). If instead a-1, a-2, b-1, b-2, c-1, c-2 (all three left vertices attach only to {1, 2}), N({a,b,c}) = {1,2}, |N| = 2 < 3 = |S|, so by Hall's theorem no perfect matching exists. Max flow returns 2.

Common Confusion / Misconceptions

  • "Bipartite matching is the general matching problem." It is not. The general (non-bipartite) matching problem is solvable in polynomial time via Edmonds' blossom algorithm, but not via a trivial flow reduction - odd cycles ("blossoms") break the clean flow structure.
  • "Max flow handles weighted matching." No. For weighted bipartite matching, you need min-cost max flow or the Hungarian algorithm.
  • "Hopcroft-Karp is just Edmonds-Karp with a speed tweak." It is structurally different - it augments along maximal sets of vertex-disjoint shortest augmenting paths in phases, matching Dinic's strategy.
  • "Konig's theorem is magic." It is a direct corollary of max-flow min-cut: the S-side of a min cut picks exactly a minimum vertex cover.
  • "Matching size equals min(|L|, |R|) for reasonable graphs." Not if the structure is unbalanced: Hall's condition pinpoints when it does.

How To Use It

For unweighted bipartite matching, reduce to max flow:

build flow network as above (each L-R edge becomes a capacity-1 arc)
run Edmonds-Karp (or Hopcroft-Karp for speed)
the saturated L-R edges form a maximum matching

Two related results are easy consequences of max-flow min-cut:

  • Konig's theorem. In a bipartite graph, max matching = min vertex cover. The S-side of the min cut picks the minimum vertex cover: take L vertices on the T-side plus R vertices on the S-side.
  • Hall's theorem. A bipartite graph G = (L, R, E) has a perfect matching saturating L iff for every S subseteq L, |N(S)| >= |S|.

Engineering tips:

  1. Represent L and R with disjoint index ranges; it makes debugging tractable.
  2. When matching fails, use the min-cut side to identify a violated Hall condition (gives a clear diagnostic).
  3. For repeated matchings under small graph edits, re-use the previous matching as a warm start; only find augmenting paths for unmatched vertices.
  4. For large |L|, |R| (hundreds of thousands), use Hopcroft-Karp; O(|E| sqrt(|V|)) is dramatic in practice.
  5. For weighted matching, reach for the assignment problem APIs (scipy.optimize.linear_sum_assignment, networkx.bipartite.minimum_weight_full_matching).

Transfer / Where This Shows Up Later

  • S3 (systems). Task scheduling, job-to-machine assignment, request routing.
  • S4 (compilers). Register-allocation variants that reduce to bipartite matching; instruction-to-functional-unit scheduling.
  • S5 (OS). Process-to-CPU affinity decisions under constraints, I/O scheduling.
  • S6 (databases). Query-plan-to-resource assignment, replica-to-shard assignment.
  • S7 (DDD). Ubiquitous-language cross-walking when merging contexts - matching "their term X" with "our term Y" under consistency constraints.
  • S8 (distributed). Data-center scheduling (Mesos, Kubernetes scheduling), work-stealing bipartite assignments; advertising auctions; kidney exchange networks.

Back-link to concepts 17-19 for the flow infrastructure, and forward to Module 5's LP duality section for the Konig/Hall connections.

Check Yourself

  1. Why is the max-flow value on the constructed network equal to the matching size?
  2. Why does Hopcroft-Karp's sqrt(|V|)-phase argument not generalize to arbitrary (non-bipartite) graphs?
  3. State Konig's theorem and describe how it falls out of max-flow min-cut.
  4. Give a scenario in which you need weighted bipartite matching and max flow is not enough.
  5. State Hall's theorem; sketch why it is equivalent to "max matching saturates L."
  6. What does the S-side of the residual-BFS min-cut tell you about the minimum vertex cover?

Mini Drill or Application

Solve by hand a 4-by-4 bipartite matching instance (pick any sensible edge set). Trace Edmonds-Karp augmenting paths and report the maximum matching size. Then verify that your answer also gives a minimum vertex cover of the same size (Konig).

Implementation drills.

  1. Implement bipartite matching as a thin wrapper over your existing Edmonds-Karp implementation.
  2. Implement Hopcroft-Karp (phased BFS + vertex-disjoint DFS augmentation). Benchmark against Edmonds-Karp on a random bipartite graph with |L| = |R| = 1000, expecting Hopcroft-Karp to be noticeably faster.
  3. Add a min-vertex-cover extractor using the residual-BFS cut; verify size equality with max matching (Konig).
  4. Test against networkx.bipartite.maximum_matching on graphs with up to 10^4 vertices.
  5. Solve a concrete assignment instance: match 50 classes to 50 rooms with capacity and timing constraints; confirm result via scipy.optimize.linear_sum_assignment on a weighted version.

Read This Only If Stuck