Skip to main content

DFS and the Edge Taxonomy

What This Concept Is

Depth-first search explores a graph by going as deep as possible before backtracking. For each vertex it records two timestamps:

  • d[u] - the discovery time, when u is first pushed on the DFS stack
  • f[u] - the finish time, when u's recursion returns

These timestamps produce a parenthesis structure: for any two vertices u and v, the intervals [d[u], f[u]] and [d[v], f[v]] are either nested or disjoint - never crossing. That single structural fact is the engine behind topological sort, SCC decomposition, bridge and articulation-point detection, and dominator trees.

On a directed graph, DFS classifies each edge (u, v) into exactly one of:

  • Tree edge - v was undiscovered when we explored (u, v).
  • Back edge - v is an ancestor of u in the DFS tree. Indicates a cycle.
  • Forward edge - v is a non-child descendant of u.
  • Cross edge - v lies in a separate subtree finished earlier (f[v] < d[u]).

On an undirected graph there are only tree and back edges, because "forward" and "cross" would require a direction the graph does not provide.

The distinction between DFS and BFS is structural, not cosmetic. DFS visits one branch exhaustively before moving on; BFS visits shallowly before deeply. DFS produces finish times and a parenthesis structure; BFS produces levels and shortest-path distances. Knowing which you need is step zero.

Why It Matters Here

Timestamps and edge types are the core of many graph algorithms:

  • Cycle detection. A directed graph has a cycle iff DFS finds a back edge.
  • Topological sort. On a DAG, vertices in decreasing f[u] order form a valid topological order.
  • Strongly connected components. Kosaraju and Tarjan both depend on finish times and the low-link structure DFS produces.
  • Bridges and articulation points. Computed from DFS tree plus the low function over back edges.
  • Biconnected components, 2-edge-connected components. All DFS-derived.
  • Dominator tree (used in SSA form). Built on DFS preorder.

DFS is not a prettier traversal - it is a source of structural information BFS does not provide.

Concrete Example

Pseudocode with global timer:

DFS(G):
time := 0
for each u in V: color[u] := WHITE
for each u in V:
if color[u] = WHITE: DFS-VISIT(u)

DFS-VISIT(u):
color[u] := GRAY
time := time + 1
d[u] := time
for each v in adj[u]:
if color[v] = WHITE:
parent[v] := u
DFS-VISIT(v)
color[u] := BLACK
time := time + 1
f[u] := time

On the directed graph

starting DFS at a:

stepeventdfclassification
1discover ad[a]=1
2traverse a -> bd[b]=2tree
3traverse b -> cd[c]=3tree
4look at c -> aback (a is GRAY ancestor)
5finish cf[c]=4
6finish bf[b]=5
7traverse a -> dd[d]=6tree
8finish df[d]=7
9finish af[a]=8

The back edge c -> a exposes the cycle a -> b -> c -> a. No forward or cross edges here.

Common Confusion / Misconceptions

  • "DFS edge types are a property of the graph." They are DFS-tree-dependent. Two different DFS starts can classify the same edge differently (forward vs cross). What is invariant is the existence of some back edge iff the directed graph has a cycle.
  • "Recursion is mandatory." Iterative DFS with an explicit stack works fine; the only care is pushing children in reverse if you want to match recursive order.
  • "DFS discovers shortest paths." No, it does not. DFS can return a long path to a vertex that BFS would reach in one edge. Use DFS for structural info, BFS for distances.
  • "Grayness is just bookkeeping." The GRAY color (vertex in progress) is exactly what distinguishes a back edge from a forward or cross edge. Three colors, not two, is the correct discipline.

How To Use It

Default DFS recipe:

  1. color every vertex WHITE
  2. for every WHITE vertex, call DFS-VISIT
  3. during DFS-VISIT, color it GRAY, record d[u], recurse into unvisited neighbors, then color BLACK and record f[u]
  4. classify edges on the fly using color: WHITE target = tree, GRAY = back, BLACK with d[u] < d[v] = forward, else cross

Use DFS when you need any of: cycle detection, topological sort, SCCs, articulation points, bridges, biconnectivity, or the tree-edge structure itself.

Two implementation notes:

  • For deep graphs (|V| > 10^5) increase your stack size or convert to iterative DFS. Default stack in Python is 1000 frames.
  • Store low[u] = min(d[u], d[v] for back edges) over descendants alongside the DFS; it supports Tarjan's SCC, bridges, and articulation points with no extra pass.

Transfer / Where This Shows Up Later

DFS-derived information shows up almost everywhere structured analysis is needed.

  • S3 (systems). Module load order (topological sort via DFS), dynamic-library dependency cycles, garbage-collection mark phase are all DFS in the runtime.
  • S4 (compilers). CFG analyses: dominator tree via DFS; loop detection via back edges; irreducibility detection; SSA construction uses the DFS spanning tree.
  • S5 (OS). Deadlock detection on the wait-for graph is DFS cycle detection; resource allocation graphs use DFS.
  • S6 (databases). DFS over dependency graphs for cascade deletes; query-plan validation uses DFS.
  • S7 (DDD). DFS of context maps finds cyclic dependencies fitness functions should flag.
  • S8 (distributed). Leader election via DFS spanning trees; flooding-DFS for membership view construction.

Back-link to S1 M2 where recursion is introduced as a proof technique and to S2 M2 where the stack data structure is formalized.

Check Yourself

  1. Why does the color scheme (WHITE/GRAY/BLACK) give you the edge classification for free?
  2. What is the parenthesis theorem saying, in one sentence?
  3. In an undirected graph, why are there only tree and back edges?
  4. State one corollary of "a directed graph has a cycle iff DFS finds a back edge."
  5. Why do two different DFS runs from different starting vertices produce different classifications of the same edges, yet agree on whether the graph has a cycle?
  6. How do you modify DFS to compute low[u] and detect bridges in one pass?

Mini Drill or Application

On the directed graph

1 -> 2, 1 -> 4
2 -> 3
3 -> 1, 3 -> 5
4 -> 3
5 -> 6
6 -> 4

run DFS starting at vertex 1, record d and f, and classify every edge. Identify any cycles exposed by back edges.

Then implement iterative DFS using an explicit stack and compare its behavior with recursive DFS - the order of neighbor exploration should match if you push in reverse.

Implementation drill. On top of your Graph class, write:

  • has_cycle(g) using back-edge detection
  • dfs_times(g) returning (d, f) arrays
  • classify_edges(g) returning a map (u, v) -> {tree, back, forward, cross}
  • find_bridges(g) for undirected graphs using low[u]

Verify each on a hand-drawn 6-vertex graph before running on random inputs.

Read This Only If Stuck