Connected and Strongly Connected Components
What This Concept Is
In an undirected graph, a connected component is a maximal set of vertices mutually reachable by any path. One BFS or DFS from each unvisited vertex labels every component in O(|V| + |E|).
In a directed graph, reachability is not symmetric. Three connectivity notions coexist:
- Weakly connected. The undirected version is connected.
- Unilaterally connected. For every pair
u, v, eitheru -> vorv -> u(but not necessarily both). - Strongly connected. For every pair
u, v in C, there is a directed pathu -> vandv -> u.
A strongly connected component (SCC) is a maximal strongly connected set. The SCCs partition V, and the condensation (the graph obtained by contracting each SCC to a single vertex) is always a DAG - no matter how tangled the original directed graph was.
Two classic algorithms compute SCCs in linear time:
- Kosaraju's algorithm. Run DFS on
Gand push vertices on a stack in finish-time order. Reverse all edges to getG^T. Pop the stack; for each popped vertex, if unvisited, run DFS onG^T- each such DFS discovers one SCC. - Tarjan's algorithm. Single DFS that tracks a
low[u]value (the smallestd[v]reachable by going down tree edges and then at most one back edge). SCCs form wheneverlow[u] = d[u]at the momentufinishes.
Both run in O(|V| + |E|) with adjacency lists. Kosaraju is easier to prove correct; Tarjan saves the graph-transpose pass.
Why It Matters Here
SCCs matter whenever a directed graph has mutual reachability pockets:
2-SATsolvers collapse clauses to SCCs and read the satisfiability answer off the condensation.- In program analysis, mutually recursive function sets are exactly the SCCs of the call graph.
- In social and web graphs, SCCs isolate tightly interconnected communities for further analysis (PageRank within an SCC, etc.).
- In deadlock detection, an SCC in the wait-for graph of size
>= 2witnesses a deadlock. - In compilation, SCCs of the dependency graph are "recursive module groups" that must be linked together.
For undirected graphs, connected components are the coarser-grained version of the same idea. "Is the network split?" is a connected-component question; "are these services in the same recursive cluster?" is an SCC question.
Concrete Example
Directed graph:
The SCCs are {a, b, c}, {d, e, f}, and {g}. The condensation {a,b,c} -> {d,e,f} -> {g} is a 3-node DAG.
Kosaraju trace on this graph:
- First DFS on
G(starting ata): visitsa, b, c, d, e, f, g. Finish times (one legal order):f[g] < f[f] < f[e] < f[d] < f[c] < f[b] < f[a]. Stack top-to-bottom:a, b, c, d, e, f, g. - Transpose
G^T: flip every edge. - Pop stack; DFS on
G^T:- Pop
a. DFS fromainG^Treachesc, b(notd- the onlyG^Tedge out ofdreachingcis via(c,d)reversed tod -> c, but we start ata, notd). SCC:{a, b, c}. - Pop
b, c- already visited. - Pop
d. DFS inG^Treachesf, e. SCC:{d, e, f}. - Pop
e, f- already visited. - Pop
g. DFS reaches onlyg. SCC:{g}.
- Pop
Condensation:
Common Confusion / Misconceptions
- "Strongly connected = connected (ignoring direction)." That is weakly connected. Strongly connected requires mutual reachability along directed paths.
- "The condensation might have cycles." It cannot. A cycle in the condensation would merge two SCCs into one larger SCC, contradicting maximality.
- "Kosaraju needs the reverse." Yes, non-negotiable. Without reversing edges, the second DFS does not correctly confine each run to a single SCC. This is the most common bug.
- "Tarjan is strictly better." Tarjan saves a pass, but its
low[u]bookkeeping is subtler and harder to debug. Kosaraju is a solid production default. - "Every SCC is at least two vertices." A single vertex with no self-loop is its own (trivial) SCC. That is by definition; the graph could be entirely trivial SCCs if acyclic.
How To Use It
To compute CCs (undirected):
- mark all vertices unvisited
- for each unvisited
u, run DFS/BFS and tag all reached vertices with the same component id
To compute SCCs (directed) via Kosaraju:
- DFS on
G, push each vertex on a stack at its finish time - build
G^Tby reversing every edge - pop vertices from the stack; for each unvisited popped vertex, DFS on
G^Tand tag all reached vertices with a new SCC id
Both run in O(|V| + |E|) with adjacency lists.
For applications:
- After SCC decomposition, compute the condensation and run DAG algorithms (topological sort, longest path) on it.
- For 2-SAT, build the implication graph, run SCCs, and check whether any variable and its negation lie in the same SCC.
Transfer / Where This Shows Up Later
- S3 (systems). SCCs of the module dependency graph identify mutually recursive module groups that must be linked as a unit.
- S4 (compilers). SCCs of the call graph identify recursive function cliques for whole-cluster inlining decisions. Data-flow analyses often process the condensation bottom-up.
- S5 (OS). Deadlock detection in OS wait-for graphs is SCC detection; finding the deadlocked set is extracting the non-trivial SCC.
- S6 (databases). Foreign-key SCCs expose cyclic table dependencies that migrations need to handle specially.
- S7 (DDD). SCCs of a context map expose bounded-context cycles; fitness functions often target "no SCCs of size > 1 across contexts."
- S8 (distributed). Replication topology SCC analysis identifies quorum-reachable clusters.
Back-link to DFS edge classification (concept 06) since SCC algorithms are its most important application.
Check Yourself
- Why is the condensation of a directed graph always a DAG?
- What goes wrong if you try Kosaraju without reversing the edges on the second pass?
- Contrast weakly connected, strongly connected, and connected (for undirected) in one sentence each.
- Why does Tarjan's
low[u] = d[u]criterion identify the root of an SCC in the DFS tree? - Give a 4-vertex directed graph whose SCCs change if you remove one specific edge. Which edge?
- How does 2-SAT use SCCs to decide satisfiability?
Mini Drill or Application
For the directed graph
1 -> 2, 2 -> 3, 3 -> 1
3 -> 4
4 -> 5, 5 -> 4
5 -> 6
find the SCCs by hand, using Kosaraju. Then draw the condensation and confirm it is a DAG.
Implementation drill. Implement Kosaraju from scratch on your Graph class. Test on:
- a random directed graph with
10^4vertices, confirming linear scaling - a single cycle (one SCC)
- a tree (all SCCs trivial)
- a 2-SAT instance: variables
x1, x2, x3with clauses(x1 OR x2), (NOT x1 OR x3), (NOT x2 OR NOT x3); decide satisfiability via SCCs of the implication graph
Then re-implement using Tarjan's algorithm and compare correctness on the same inputs.
Read This Only If Stuck
- CLRS 20.5: Strongly Connected Components
- CLRS 20.5: Strongly Connected Components (Part 2)
- CLRS 20.5: Strongly Connected Components (Part 3)
- ADM 5.10.2: Strongly Connected Components
- ADM 5.10: DFS on Directed Graphs
- Sedgewick: Directed Graphs (Part 1)
- Sedgewick: Connectivity
- Competitive Programming 4.2.9: SCC (Tarjan)
- cp-algorithms: Strongly Connected Components and Condensation
- cp-algorithms: Finding Connected Components