Adjacency List vs Adjacency Matrix

What This Concept Is

A graph G = (V, E) can be stored in two standard ways, plus a few hybrids you should recognize:

Adjacency list. For each vertex u, keep a container (array, linked list, or hash set) of its neighbors. Total size is Theta(|V| + |E|).
Adjacency matrix. A |V| x |V| array A with A[u][v] = 1 (or the edge weight) if (u, v) is an edge, else 0 (or +infinity). Total size is Theta(|V|^2).
Edge list. A plain array of (u, v, w) triples. Great for Kruskal and Bellman-Ford; terrible for neighbor lookup.
CSR (compressed sparse row). Two flat arrays - offsets[|V| + 1] and targets[|E|] - giving cache-friendly sequential access. Used in high-performance graph libraries (Boost.Graph, NetworkX's SciPy backend, GAP).

They are informationally equivalent. They are not operationally equivalent, and the choice silently fixes the running time of every algorithm on top.

Why It Matters Here

The representation choice silently determines the running time of everything that follows.

BFS and DFS visit each vertex and each edge once. With a list, they run in O(|V| + |E|). With a matrix, they run in O(|V|^2) because finding neighbors of u forces you to scan a full row.
Dijkstra with a binary heap is O((|V| + |E|) log |V|) on a list, but a matrix version costs at least O(|V|^2) from the neighbor scan alone.
Checking "is (u, v) an edge?" is O(1) on a matrix, but O(deg(u)) on a list.
Floyd-Warshall is defined in terms of a matrix - the inner dist[i][k] + dist[k][j] access is what makes it O(|V|^3) with a tight constant.

So the rule is not "lists are better." The rule is: match the representation to the density of the graph and the operations the algorithm needs.

Concrete Example

A social graph on |V| = 10^6 users where each user has about 200 friends.

|E| is about 10^8.
Adjacency list: Theta(|V| + |E|) ~= 10^8 edge cells. Feasible.
Adjacency matrix: Theta(|V|^2) = 10^{12} cells. Infeasible, regardless of algorithm.

By contrast, a 500-vertex dense graph where every pair is connected has |E| ~ 125{,}000. The matrix is 250{,}000 cells - smaller than the list representation with its per-entry overhead. Here the matrix is the right choice, and algorithms like Floyd-Warshall are written as if the matrix is the only input.

A pocket example to trace:

Representation	Storage for this 4-node graph	`has_edge(1, 3)` cost	`neighbors(1)` cost
Adjacency list	5 edge entries plus 4 vertex headers	`O(deg(1)) = 2`	`O(deg(1)) = 2`
Adjacency matrix	16 cells (`4 x 4`)	`O(1)`	`O(4) = O(
Edge list	5 triples	`O(	E

Tiny graphs make the constants obvious, but the |V|-vs-|V|^2 scaling is where the real choice is made.

Common Confusion / Misconceptions

"Many edges = dense graph." Density ratio |E| / |V|^2, not raw edge count, decides. 10^8 edges on 10^6 vertices is still extremely sparse (ratio 10^{-4}).
"Matrices are wasteful." Not if the graph is actually dense; the matrix row scan is cache-friendly and the code is short. Floyd-Warshall tightens to a few CPU cycles per iteration on a matrix.
"Hash-set adjacency is always best." Hash sets give O(1) has_edge, but they cost more memory and cache misses than a sorted list. Use them only when has_edge is the inner loop.
"I have to pick one." Many production systems maintain both: a CSR for traversal, plus a bitset or hash for edge-existence. The cost of duplication is often less than the cost of a wrong asymptotic.

How To Use It

Pick a representation by asking three questions:

What is the density |E| / |V|^2? If it is near 1, prefer matrix. If it is small, prefer list.
Which operation dominates? Edge-existence queries favor matrices or hash sets; neighbor enumeration favors lists.
What are the memory constraints? |V|^2 must actually fit.

As a default in this module: use adjacency lists unless the algorithm (Floyd-Warshall, transitive closure, matrix-multiplication shortest paths) explicitly demands a matrix.

Engineering checklist when you implement:

give each vertex an integer id 0..|V|-1; hash maps for string keys belong in a separate layer
store weights inside the adjacency entry (struct { int to; int w; }) not in a parallel array
if the graph is undirected, insert each edge twice; budget 2|E| memory accordingly
for directed graphs, often useful to store the reverse adjacency as well (needed for Kosaraju, transposed BFS)

Transfer / Where This Shows Up Later

Representation choice drives performance in every downstream system that stores a graph.

S3 (systems). CPU cache behavior makes CSR dominant in production BFS/DFS. The linker's module-dependency graph is held as a CSR.
S4 (compilers). Control flow graphs use adjacency lists (small, sparse, few predecessors). Interference graphs for register allocation use adjacency matrices when |V| is small (a single function), lists otherwise.
S5 (OS). Scheduler's wait-for graph is dense-ish in small processes; bitmasks (a compressed matrix) are the idiomatic representation.
S6 (databases). Query planners keep plan DAGs as adjacency lists; join dependency graphs are matrix-like when the planner uses dynamic programming bitmasks.
S7 (DDD). Context maps and service dependency graphs drive fitness functions whose implementations scan adjacency lists.
S8 (distributed). Service meshes and replication topologies expose neighbor queries; NetworkX / Boost.Graph internals use CSR.

Revisit S1 M2 if the |V| + |E| vs |V|^2 comparison still feels abstract - the space-complexity reasoning is the same one you met for set representations.

Check Yourself

Why is BFS on an adjacency matrix O(|V|^2) instead of O(|V| + |E|)?
When would an adjacency set (hash set per vertex) beat both a list and a matrix?
For a weighted graph, how do each of the two representations store weights?
Why does |E| = O(|V|^2) for simple graphs, and why does that matter for asymptotic comparisons?
What does CSR buy you over a list-of-lists that a plain vector<vector<int>> does not?

Mini Drill or Application

For each graph, recommend a representation and justify with a density estimate:

US road network: |V| ~= 2 x 10^7, average degree ~ 3.
Flight network between major airports: |V| ~= 10^3, average degree ~ 50.
Random Erdos-Renyi G(100, 0.5): expected |E| ~= 2500.
Webcrawl of top-level domain: |V| ~= 10^9, |E| ~= 10^{11}.
Petersen graph: |V| = 10, |E| = 15.

Then implement a minimal Graph class in your language of choice that supports add_edge, neighbors(u), and has_edge(u, v), once with a list and once with a matrix, and benchmark neighbor iteration on a random sparse graph of |V| = 10^5, average degree 10. Measure cache misses if your profiler supports it.

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​