Topological Sort and DAG Algorithms
What This Concept Is
A topological order on a directed acyclic graph G = (V, E) is a linear order of V such that every edge (u, v) goes from an earlier vertex to a later one. A graph admits a topological order if and only if it is a DAG.
Two linear-time algorithms produce one:
- DFS-based. Run DFS on
G. Output vertices in decreasing finish time. Back edges would contradict acyclicity, so if none are found, the sort is valid. - Kahn's algorithm. Repeatedly remove a vertex with in-degree
0and append it to the output. If the output has fewer than|V|elements, a cycle exists.
The two agree on the output class (any topological order of the DAG) but not on which specific order they emit. Kahn's produces orders based on current in-degree; DFS produces orders determined by the DFS traversal. Both are O(|V| + |E|).
Once you have a topological order, many problems collapse:
- Longest / shortest path in a DAG. Relax edges in topological order; works even with negative weights in linear time.
- DP over vertices. Any DP whose subproblems depend on "strictly earlier" entities uses topological order implicitly.
- Scheduling with precedence. The order is a valid schedule; critical-path analysis gives the minimum project duration.
- Counting paths. Number of distinct
s -> tpaths in a DAG inO(|V| + |E|)via a single topological pass.
Why It Matters Here
Before you reach for Dijkstra or Bellman-Ford, ask whether the graph is a DAG. If yes, you can solve shortest paths (even with negative weights) in O(|V| + |E|) instead of O((|V|+|E|) log |V|) or O(|V||E|). The DAG shortcut is both strictly faster and strictly more general on the weight dimension.
Beyond shortest paths, topological thinking turns up everywhere: package managers, spreadsheet recalculation, Makefile-like build systems, and cyclic check-and-fix pipelines. Any time you see "process X only after all of Y," you are doing a topological sort, whether you name it or not.
Concrete Example
Suppose courses have prerequisites:
This graph is a DAG. A valid topological order is calculus, analysis, linear_algebra, topology, category_theory (or a swap of the middle two). Kahn's algorithm would start by emitting calculus (only vertex with in-degree 0), then decrement and emit analysis and linear_algebra in either order, and so on.
Kahn trace (in-degree in brackets):
| step | in-degree | queue | emitted |
|---|---|---|---|
| init | calc[0] ana[1] la[1] topo[2] cat[1] | [calc] | [] |
| 1 | ana[0] la[0] topo[2] cat[1] | [ana, la] | [calc] |
| 2 | la[0] topo[1] cat[1] | [la] | [calc, ana] |
| 3 | topo[0] cat[1] | [topo] | [calc, ana, la] |
| 4 | cat[0] | [cat] | [calc, ana, la, topo] |
| 5 | - | [] | [calc, ana, la, topo, cat] |
If a single cycle were added - say category_theory -> calculus - Kahn's output length would fall to 0 immediately, correctly reporting "not a DAG."
Second example - shortest path with negative edges on a DAG:
a -(3)-> b, a -(5)-> c, b -(-2)-> c, b -(6)-> d, c -(4)-> d
Topological order: a, b, c, d. Relax edges in that order from a: dist[a]=0, dist[b]=3, dist[c]=min(5, 3+(-2))=1, dist[d]=min(3+6, 1+4)=5. Done in O(|V|+|E|), negative edges accommodated.
Common Confusion / Misconceptions
- "There is the topological order." No. Any two incomparable vertices can appear in either order. Problems that ask for "the" order are usually underspecified; what they want is a deterministic rule (lex smallest, earliest-start, etc.) layered on top.
- "Topological sort is approximate for cyclic graphs." It simply does not exist. You cannot produce an order that respects all edges of a cyclic graph.
- "Kahn and DFS produce the same order." They do not in general. They produce some topological order; which one depends on queue-ordering and neighbor-iteration order. Tests should accept any valid order.
- "If DFS finds no back edge, no further check needed." True, but Kahn's check (output length
< |V|) is often easier to add to production code than a back-edge hook.
How To Use It
Kahn's algorithm:
compute in-degree in[v] for all v
Q := queue of all v with in[v] = 0
while Q is not empty:
u := Q.dequeue(); append u to order
for each v in adj[u]:
in[v] := in[v] - 1
if in[v] = 0: Q.enqueue(v)
if |order| < |V|: report "cycle"
DFS-based:
DFS(G)
output vertices in decreasing f[u]
if any back edge found: report "cycle"
Prefer Kahn's when you want to detect cycles cheaply or produce an order under a tie-breaking rule (swap the FIFO for a heap to get lex-smallest); prefer DFS-based when you need finish times anyway (e.g., for SCCs).
Downstream patterns to master:
- DP over DAG. Order vertices, relax in order;
dp[v] = f(dp[u] for (u,v) in edges). - Critical path. Longest path in a DAG, same structure as shortest path but maximize.
- Path counting.
ways[v] = sum over (u, v) of ways[u], withways[source] = 1.
Transfer / Where This Shows Up Later
- S3 (systems).
make,ninja,bazel,cargo, and every package manager's install phase topologically sort a dependency graph. Circular dependencies are a Kahn-detection bug report. - S4 (compilers). Expression-graph evaluation order, instruction scheduling (with priority to break ties), and data-flow worklist order all use topological sort.
- S5 (OS). Boot sequences order services by
systemd-style dependency graphs (a topological sort with priorities). - S6 (databases). Foreign-key insert/delete order, migration scripts, and materialized-view refresh order are topological sorts over table-dependency DAGs.
- S7 (DDD). Event-processing pipelines as DAGs; event-choreography orchestrators topologically process saga steps.
- S8 (distributed). Airflow/Luigi/Prefect DAG schedulers are topological sort with retry and idempotency wrappers.
Back-link to S1 M2 for the DAG definition; forward to M6 dynamic programming for DAG-DP as a generalized pattern.
Check Yourself
- Why does a graph admit a topological order iff it is a DAG?
- In Kahn's algorithm, why does "queue is empty before
|V|vertices emitted" prove a cycle exists? - Given a topological order, why can you solve shortest paths even with negative edge weights in
O(|V| + |E|)? - How would you produce the lexicographically smallest topological order?
- How do you count the number of distinct topological orders (hint: tricky; expected #P-hard in general).
- What breaks if you try DFS-based topological sort on a graph with a cycle?
Mini Drill or Application
Given the DAG
a -> b, a -> c
b -> d, c -> d
d -> e
c -> e
and edge weights
(a,b)=3, (a,c)=1, (b,d)=1, (c,d)=5, (d,e)=1, (c,e)=7
find the shortest path from a to e by relaxing edges in topological order. Then find the longest path from a to e in O(|V| + |E|) using the same trick.
Implementation drill. Implement:
- Kahn's algorithm with in-degree tracking
- DFS-based topological sort
- DAG shortest path via topo order (accept negative weights)
- A small build-order tool that reads
(target, dependency)pairs from a file and prints a build order; error out with a clear message on a cycle
Verify both topo-sort implementations agree on the set of valid orders for a hand-built DAG with two incomparable branches.
Read This Only If Stuck
- CLRS 20.4: Topological Sort
- CLRS 22.2: Shortest Paths in DAGs
- ADM 5.10: DFS on Directed Graphs
- ADM 5.9: Applications of DFS
- Competitive Programming 4.7.1: DAG
- Competitive Programming 4.7.1: DAG (Part 2)
- Sedgewick: Directed Graphs (Part 2)
- cp-algorithms: Topological Sorting
- Competitive Programmer's Handbook (cses.fi): graph chapter on DAGs