Skip to main content

What a Graph Is

What This Concept Is

A graph is a pair G = (V, E) where V is a set of vertices and E is a set of edges, each edge connecting a pair of vertices.

That one sentence hides a cluster of choices. Before you can use a graph algorithm, you must fix all of them:

  • Directed vs undirected. Is the edge (u, v) different from (v, u), or are the two ends symmetric?
  • Weighted vs unweighted. Does each edge carry a number (distance, capacity, cost, probability), or are edges uniform?
  • Simple vs multi. Can two distinct edges join the same pair? Can an edge loop from a vertex to itself?
  • Finite and labeled. How are vertices identified, and how many of them are there?

The same informal drawing can correspond to many different G = (V, E). The algorithm you pick will depend on which one you actually committed to.

Two additional vocabulary items you will use throughout the module:

  • Path - a sequence of vertices v_0, v_1, ..., v_k where each consecutive pair is joined by an edge, and no vertex repeats.
  • Walk - the same, but repeats are allowed.
  • Cycle - a closed walk with length at least 1 (directed) or 3 (undirected, simple), where only the start and end vertex coincide.
  • Degree - for undirected u, the number of edges incident to u. For directed u, you have in-degree and out-degree. The handshaking lemma sum of degrees = 2|E| falls out of counting edge-endpoints.

You will see the same graph modeled several different ways across this module. A road network becomes weighted and undirected when you want shortest driving distance, directed if some streets are one-way, and a multigraph if you care that two distinct streets connect the same pair. Committing to a model is half the work.

Why It Matters Here

Almost every pitfall in this module is secretly a modeling pitfall, not a coding pitfall.

  • Running BFS on a weighted graph and believing the answer is a shortest weighted path.
  • Running Dijkstra on a graph with negative edge weights because "it's a graph and I want shortest paths."
  • Counting self-loops or parallel edges in a degree sum without realizing the definition matters.
  • Calling DFS on an "undirected" version of a directed graph and losing the notion of a back edge.

Fixing the graph object before you fix the algorithm prevents most of this. Skiena puts it bluntly in The Algorithm Design Manual: "design graphs, not algorithms." Once the graph is right, the textbook algorithms handle the rest.

Concrete Example

Consider a road network where you can travel in both directions along each street, distances are in kilometers, and two intersections might be connected by several parallel streets.

The correct model is an undirected, weighted multigraph:

  • V = set of intersections
  • E = multiset of {u, v, w} triples, where w is the road length
  • there can be several edges between the same {u, v} pair

If you instead model it as a simple undirected graph by keeping only the shortest road between each pair, you have discarded information. That may be fine for shortest paths, but it is wrong for a question like "how many independent routes are there between u and v?"

A compiler build graph, by contrast, is a directed simple graph, often acyclic: u -> v means "building u requires v first." Direction matters; a cycle would mean unsatisfiable dependencies.

Common Confusion / Misconceptions

  • "The graph is a property of the problem." Learners often talk about "the graph" as if it were a single canonical object derived from the problem. In practice, you are choosing a model. On Facebook friendships are undirected; on Twitter they are directed. The graph follows from the operational question you are asking.
  • "Walks, paths, and cycles are interchangeable." They are not. A walk may revisit vertices; a path cannot; a cycle only coincides at the endpoints. Algorithms that count "paths" and algorithms that count "walks" differ by orders of magnitude.
  • "Self-loops and parallel edges are exotic." They matter in real graphs: parallel flights between two airports, multiple dependency edges from two features to one module, self-referential Wikipedia pages. Decide up front whether your model permits them.
  • "Infinite graphs are a theory toy." The Rubik's cube state graph and the game tree of chess are finite only in principle. You rarely materialize such graphs; you explore them lazily with BFS or DFS.

How To Use It

Before naming an algorithm, write a one-sentence model:

  1. what is a vertex?
  2. what is an edge, and why?
  3. is it directed?
  4. is it weighted, and what do the weights mean?
  5. can parallel edges or self-loops occur?
  6. what is the target question expressed in graph terms (reach, shortest path, spanning, matching, cut)?
  7. what structural shortcut applies (DAG, tree, bipartite, planar)?

If any of those answers is vague, you are not ready to pick an algorithm.

Transfer / Where This Shows Up Later

The graph object you define here is the vocabulary every later semester leans on.

  • S1 M2 (discrete math foundations). Degree, handshaking, and Eulerian/Hamiltonian cycles all live in this vocabulary. Back-link: if the terms above feel slippery, revisit the graph-theory primer before proceeding.
  • S3 (computer systems). Module dependency graphs, linker resolution, and call graphs are directed graphs whose acyclicity (or lack thereof) drives the tooling.
  • S4 (compilers). Control flow graphs (CFGs), SSA dominance trees, and interference graphs for register allocation - each one is a named specialization of G = (V, E) with constraints attached.
  • S5 (operating systems). Process dependency graphs for job scheduling, wait-for graphs used in deadlock detection, and resource allocation graphs.
  • S6 (databases). Query plans as DAGs, foreign-key dependency graphs, and replication topologies.
  • S7 (architecture, DDD). Service topology, bounded-context maps, and component dependency graphs that fitness functions enforce.
  • S8 (distributed systems). Service meshes, consistent-hashing rings (as graphs), and failure-domain analysis.

Whenever a later module mentions "dependency graph," "call graph," or "topology," the entity it refers to is built on this page.

Check Yourself

  1. Why does the same real-world situation sometimes admit both a directed and an undirected model?
  2. What is a path, a walk, and a cycle, and how do they differ?
  3. Why does sum of degrees = 2|E| hold for undirected graphs but not directed ones?
  4. When do self-loops matter and when can you discard them?
  5. Give an example of a real system whose natural model is a multigraph, not a simple graph.

Mini Drill or Application

For each scenario, write the full model (V, E, directed?, weighted?, simple?):

  1. A road map used for driving directions.
  2. A social network where "followed by" is the relation.
  3. A build system for a C++ project with header dependencies.
  4. A currency-exchange graph where you want to detect arbitrage.
  5. A chessboard where vertices are squares and edges are legal knight moves.

Then, for each, name one question that changes the answer if you change direction or weights.

Implementation drill. In your language of choice, implement a generic Graph class with one vertex type and a flag for directedness. Expose add_vertex, add_edge(u, v, w=1), neighbors(u), degree(u) (and in_degree/out_degree for directed). Use it for the rest of the module so you never rewrite the model.

Read This Only If Stuck