Skip to main content

Topological Sort and DAG Algorithms

What This Concept Is

A topological order on a directed acyclic graph G = (V, E) is a linear order of V such that every edge (u, v) goes from an earlier vertex to a later one. A graph admits a topological order if and only if it is a DAG.

Two linear-time algorithms produce one:

  • DFS-based. Run DFS on G. Output vertices in decreasing finish time. Back edges would contradict acyclicity, so if none are found, the sort is valid.
  • Kahn's algorithm. Repeatedly remove a vertex with in-degree 0 and append it to the output. If the output has fewer than |V| elements, a cycle exists.

The two agree on the output class (any topological order of the DAG) but not on which specific order they emit. Kahn's produces orders based on current in-degree; DFS produces orders determined by the DFS traversal. Both are O(|V| + |E|).

Once you have a topological order, many problems collapse:

  • Longest / shortest path in a DAG. Relax edges in topological order; works even with negative weights in linear time.
  • DP over vertices. Any DP whose subproblems depend on "strictly earlier" entities uses topological order implicitly.
  • Scheduling with precedence. The order is a valid schedule; critical-path analysis gives the minimum project duration.
  • Counting paths. Number of distinct s -> t paths in a DAG in O(|V| + |E|) via a single topological pass.

Why It Matters Here

Before you reach for Dijkstra or Bellman-Ford, ask whether the graph is a DAG. If yes, you can solve shortest paths (even with negative weights) in O(|V| + |E|) instead of O((|V|+|E|) log |V|) or O(|V||E|). The DAG shortcut is both strictly faster and strictly more general on the weight dimension.

Beyond shortest paths, topological thinking turns up everywhere: package managers, spreadsheet recalculation, Makefile-like build systems, and cyclic check-and-fix pipelines. Any time you see "process X only after all of Y," you are doing a topological sort, whether you name it or not.

Concrete Example

Suppose courses have prerequisites:

This graph is a DAG. A valid topological order is calculus, analysis, linear_algebra, topology, category_theory (or a swap of the middle two). Kahn's algorithm would start by emitting calculus (only vertex with in-degree 0), then decrement and emit analysis and linear_algebra in either order, and so on.

Kahn trace (in-degree in brackets):

stepin-degreequeueemitted
initcalc[0] ana[1] la[1] topo[2] cat[1][calc][]
1ana[0] la[0] topo[2] cat[1][ana, la][calc]
2la[0] topo[1] cat[1][la][calc, ana]
3topo[0] cat[1][topo][calc, ana, la]
4cat[0][cat][calc, ana, la, topo]
5-[][calc, ana, la, topo, cat]

If a single cycle were added - say category_theory -> calculus - Kahn's output length would fall to 0 immediately, correctly reporting "not a DAG."

Second example - shortest path with negative edges on a DAG:

a -(3)-> b, a -(5)-> c, b -(-2)-> c, b -(6)-> d, c -(4)-> d

Topological order: a, b, c, d. Relax edges in that order from a: dist[a]=0, dist[b]=3, dist[c]=min(5, 3+(-2))=1, dist[d]=min(3+6, 1+4)=5. Done in O(|V|+|E|), negative edges accommodated.

Common Confusion / Misconceptions

  • "There is the topological order." No. Any two incomparable vertices can appear in either order. Problems that ask for "the" order are usually underspecified; what they want is a deterministic rule (lex smallest, earliest-start, etc.) layered on top.
  • "Topological sort is approximate for cyclic graphs." It simply does not exist. You cannot produce an order that respects all edges of a cyclic graph.
  • "Kahn and DFS produce the same order." They do not in general. They produce some topological order; which one depends on queue-ordering and neighbor-iteration order. Tests should accept any valid order.
  • "If DFS finds no back edge, no further check needed." True, but Kahn's check (output length < |V|) is often easier to add to production code than a back-edge hook.

How To Use It

Kahn's algorithm:

compute in-degree in[v] for all v
Q := queue of all v with in[v] = 0
while Q is not empty:
u := Q.dequeue(); append u to order
for each v in adj[u]:
in[v] := in[v] - 1
if in[v] = 0: Q.enqueue(v)
if |order| < |V|: report "cycle"

DFS-based:

DFS(G)
output vertices in decreasing f[u]
if any back edge found: report "cycle"

Prefer Kahn's when you want to detect cycles cheaply or produce an order under a tie-breaking rule (swap the FIFO for a heap to get lex-smallest); prefer DFS-based when you need finish times anyway (e.g., for SCCs).

Downstream patterns to master:

  • DP over DAG. Order vertices, relax in order; dp[v] = f(dp[u] for (u,v) in edges).
  • Critical path. Longest path in a DAG, same structure as shortest path but maximize.
  • Path counting. ways[v] = sum over (u, v) of ways[u], with ways[source] = 1.

Transfer / Where This Shows Up Later

  • S3 (systems). make, ninja, bazel, cargo, and every package manager's install phase topologically sort a dependency graph. Circular dependencies are a Kahn-detection bug report.
  • S4 (compilers). Expression-graph evaluation order, instruction scheduling (with priority to break ties), and data-flow worklist order all use topological sort.
  • S5 (OS). Boot sequences order services by systemd-style dependency graphs (a topological sort with priorities).
  • S6 (databases). Foreign-key insert/delete order, migration scripts, and materialized-view refresh order are topological sorts over table-dependency DAGs.
  • S7 (DDD). Event-processing pipelines as DAGs; event-choreography orchestrators topologically process saga steps.
  • S8 (distributed). Airflow/Luigi/Prefect DAG schedulers are topological sort with retry and idempotency wrappers.

Back-link to S1 M2 for the DAG definition; forward to M6 dynamic programming for DAG-DP as a generalized pattern.

Check Yourself

  1. Why does a graph admit a topological order iff it is a DAG?
  2. In Kahn's algorithm, why does "queue is empty before |V| vertices emitted" prove a cycle exists?
  3. Given a topological order, why can you solve shortest paths even with negative edge weights in O(|V| + |E|)?
  4. How would you produce the lexicographically smallest topological order?
  5. How do you count the number of distinct topological orders (hint: tricky; expected #P-hard in general).
  6. What breaks if you try DFS-based topological sort on a graph with a cycle?

Mini Drill or Application

Given the DAG

a -> b, a -> c
b -> d, c -> d
d -> e
c -> e

and edge weights

(a,b)=3, (a,c)=1, (b,d)=1, (c,d)=5, (d,e)=1, (c,e)=7

find the shortest path from a to e by relaxing edges in topological order. Then find the longest path from a to e in O(|V| + |E|) using the same trick.

Implementation drill. Implement:

  • Kahn's algorithm with in-degree tracking
  • DFS-based topological sort
  • DAG shortest path via topo order (accept negative weights)
  • A small build-order tool that reads (target, dependency) pairs from a file and prints a build order; error out with a clear message on a cycle

Verify both topo-sort implementations agree on the set of valid orders for a hand-built DAG with two incomparable branches.

Read This Only If Stuck