Skip to main content

Connected and Strongly Connected Components

What This Concept Is

In an undirected graph, a connected component is a maximal set of vertices mutually reachable by any path. One BFS or DFS from each unvisited vertex labels every component in O(|V| + |E|).

In a directed graph, reachability is not symmetric. Three connectivity notions coexist:

  • Weakly connected. The undirected version is connected.
  • Unilaterally connected. For every pair u, v, either u -> v or v -> u (but not necessarily both).
  • Strongly connected. For every pair u, v in C, there is a directed path u -> v and v -> u.

A strongly connected component (SCC) is a maximal strongly connected set. The SCCs partition V, and the condensation (the graph obtained by contracting each SCC to a single vertex) is always a DAG - no matter how tangled the original directed graph was.

Two classic algorithms compute SCCs in linear time:

  • Kosaraju's algorithm. Run DFS on G and push vertices on a stack in finish-time order. Reverse all edges to get G^T. Pop the stack; for each popped vertex, if unvisited, run DFS on G^T - each such DFS discovers one SCC.
  • Tarjan's algorithm. Single DFS that tracks a low[u] value (the smallest d[v] reachable by going down tree edges and then at most one back edge). SCCs form whenever low[u] = d[u] at the moment u finishes.

Both run in O(|V| + |E|) with adjacency lists. Kosaraju is easier to prove correct; Tarjan saves the graph-transpose pass.

Why It Matters Here

SCCs matter whenever a directed graph has mutual reachability pockets:

  • 2-SAT solvers collapse clauses to SCCs and read the satisfiability answer off the condensation.
  • In program analysis, mutually recursive function sets are exactly the SCCs of the call graph.
  • In social and web graphs, SCCs isolate tightly interconnected communities for further analysis (PageRank within an SCC, etc.).
  • In deadlock detection, an SCC in the wait-for graph of size >= 2 witnesses a deadlock.
  • In compilation, SCCs of the dependency graph are "recursive module groups" that must be linked together.

For undirected graphs, connected components are the coarser-grained version of the same idea. "Is the network split?" is a connected-component question; "are these services in the same recursive cluster?" is an SCC question.

Concrete Example

Directed graph:

The SCCs are {a, b, c}, {d, e, f}, and {g}. The condensation {a,b,c} -> {d,e,f} -> {g} is a 3-node DAG.

Kosaraju trace on this graph:

  1. First DFS on G (starting at a): visits a, b, c, d, e, f, g. Finish times (one legal order): f[g] < f[f] < f[e] < f[d] < f[c] < f[b] < f[a]. Stack top-to-bottom: a, b, c, d, e, f, g.
  2. Transpose G^T: flip every edge.
  3. Pop stack; DFS on G^T:
    • Pop a. DFS from a in G^T reaches c, b (not d - the only G^T edge out of d reaching c is via (c,d) reversed to d -> c, but we start at a, not d). SCC: {a, b, c}.
    • Pop b, c - already visited.
    • Pop d. DFS in G^T reaches f, e. SCC: {d, e, f}.
    • Pop e, f - already visited.
    • Pop g. DFS reaches only g. SCC: {g}.

Condensation:

Common Confusion / Misconceptions

  • "Strongly connected = connected (ignoring direction)." That is weakly connected. Strongly connected requires mutual reachability along directed paths.
  • "The condensation might have cycles." It cannot. A cycle in the condensation would merge two SCCs into one larger SCC, contradicting maximality.
  • "Kosaraju needs the reverse." Yes, non-negotiable. Without reversing edges, the second DFS does not correctly confine each run to a single SCC. This is the most common bug.
  • "Tarjan is strictly better." Tarjan saves a pass, but its low[u] bookkeeping is subtler and harder to debug. Kosaraju is a solid production default.
  • "Every SCC is at least two vertices." A single vertex with no self-loop is its own (trivial) SCC. That is by definition; the graph could be entirely trivial SCCs if acyclic.

How To Use It

To compute CCs (undirected):

  1. mark all vertices unvisited
  2. for each unvisited u, run DFS/BFS and tag all reached vertices with the same component id

To compute SCCs (directed) via Kosaraju:

  1. DFS on G, push each vertex on a stack at its finish time
  2. build G^T by reversing every edge
  3. pop vertices from the stack; for each unvisited popped vertex, DFS on G^T and tag all reached vertices with a new SCC id

Both run in O(|V| + |E|) with adjacency lists.

For applications:

  • After SCC decomposition, compute the condensation and run DAG algorithms (topological sort, longest path) on it.
  • For 2-SAT, build the implication graph, run SCCs, and check whether any variable and its negation lie in the same SCC.

Transfer / Where This Shows Up Later

  • S3 (systems). SCCs of the module dependency graph identify mutually recursive module groups that must be linked as a unit.
  • S4 (compilers). SCCs of the call graph identify recursive function cliques for whole-cluster inlining decisions. Data-flow analyses often process the condensation bottom-up.
  • S5 (OS). Deadlock detection in OS wait-for graphs is SCC detection; finding the deadlocked set is extracting the non-trivial SCC.
  • S6 (databases). Foreign-key SCCs expose cyclic table dependencies that migrations need to handle specially.
  • S7 (DDD). SCCs of a context map expose bounded-context cycles; fitness functions often target "no SCCs of size > 1 across contexts."
  • S8 (distributed). Replication topology SCC analysis identifies quorum-reachable clusters.

Back-link to DFS edge classification (concept 06) since SCC algorithms are its most important application.

Check Yourself

  1. Why is the condensation of a directed graph always a DAG?
  2. What goes wrong if you try Kosaraju without reversing the edges on the second pass?
  3. Contrast weakly connected, strongly connected, and connected (for undirected) in one sentence each.
  4. Why does Tarjan's low[u] = d[u] criterion identify the root of an SCC in the DFS tree?
  5. Give a 4-vertex directed graph whose SCCs change if you remove one specific edge. Which edge?
  6. How does 2-SAT use SCCs to decide satisfiability?

Mini Drill or Application

For the directed graph

1 -> 2, 2 -> 3, 3 -> 1
3 -> 4
4 -> 5, 5 -> 4
5 -> 6

find the SCCs by hand, using Kosaraju. Then draw the condensation and confirm it is a DAG.

Implementation drill. Implement Kosaraju from scratch on your Graph class. Test on:

  • a random directed graph with 10^4 vertices, confirming linear scaling
  • a single cycle (one SCC)
  • a tree (all SCCs trivial)
  • a 2-SAT instance: variables x1, x2, x3 with clauses (x1 OR x2), (NOT x1 OR x3), (NOT x2 OR NOT x3); decide satisfiability via SCCs of the implication graph

Then re-implement using Tarjan's algorithm and compare correctness on the same inputs.

Read This Only If Stuck