Skip to main content

Module 3: Graph Algorithms & Network Analysis: Case Studies

These cases turn graph vocabulary into engineering judgment: choose the right graph model, name the constraints, prove the algorithm fits, and produce an artifact that another engineer can inspect.


Case Study 1: Git Commit History Is a DAG

Scenario: A learner thinks a Git branch is a folder containing commits. During a release fix, they cannot explain why a commit appears on two branches after a merge, or why rewriting history changes what collaborators can reach.

Source anchor: Git commit-graph documentation documents Git's commit graph structure and graph-walk optimization.

Module concepts:

  • directed acyclic graph
  • reachability
  • ancestry
  • merge commits

Wrong Approach

Treat branches as containers and reason only from branch names.

Better Approach

Model commits as nodes and parent links as directed edges. A branch is a movable pointer into the graph. Reachability from that pointer explains ancestry, merge bases, and why history rewriting surprises collaborators.

Tradeoff Table

ChoiceGainCost
Folder mental modelEasy beginner storyFails for merges and rebases
DAG modelExplains ancestry and reachabilityRequires graph vocabulary
Commit-graph accelerationFaster graph walksAdds implementation detail

Failure Mode

A learner force-pushes after rewriting reachable commits and teammates continue from incompatible history.

Required Artifact

Draw six commits with two branches and one merge commit. Mark the commits reachable from main, the commits reachable from the feature branch, and the merge base.

Project / Capstone Connection

Use the diagram style to explain repository history in the portfolio and to defend merge/rebase choices during collaboration.


Case Study 2: Package Installation Requires Topological Order

Scenario: A package installer must install app, auth, db-driver, crypto, and logging. The first implementation sorts package names alphabetically and fails when a package runs before its dependency exists.

Source anchor: Python graphlib.TopologicalSorter provides a standard-library reference for topological sorting over dependency graphs.

Module concepts:

  • dependency graph
  • topological sort
  • cycle detection
  • DAG precondition

Wrong Approach

Sort packages by name or by the order they appear in a config file.

Better Approach

Represent each package as a node and each prerequisite relationship as an edge. Run topological sort to produce a valid install order, and report a cycle when no valid order exists.

Tradeoff Table

ChoiceGainCost
Alphabetical orderSimple and deterministicIgnores dependencies
Topological sortCorrect for DAG dependenciesNeeds cycle handling
Full resolverHandles versions and conflictsMore complex than ordering alone

Failure Mode

A circular dependency creates an impossible install order, but the tool reports a vague install failure instead of the cycle.

Required Artifact

Model five packages as a DAG, produce two valid install orders, then add one edge that creates a cycle and show the detection output.

Project / Capstone Connection

Use this artifact when explaining build pipelines, prerequisite planning, or task scheduling in later projects.


Case Study 3: Network Routing Is Not Always Dijkstra

Scenario: A service routes traffic through data centers. Engineers pick the lowest-latency path, but policy requires avoiding a region for compliance and congestion changes weights during incidents.

Source anchor: MIT OCW Dijkstra lecture covers shortest paths in DAGs and Dijkstra's algorithm for graphs without negative edges.

Module concepts:

  • weighted graph
  • shortest path
  • edge weights as model choices
  • algorithm preconditions

Wrong Approach

Assume "shortest" means "best" without defining the weight or checking algorithm assumptions.

Better Approach

Define what each edge weight represents: latency, cost, policy penalty, congestion, or risk. Use Dijkstra only when weights are nonnegative and the objective is truly additive shortest path. If policy dominates, encode it in the model or choose a different routing rule.

Tradeoff Table

ChoiceGainCost
Lowest latencyEasy to measureCan violate policy
Weighted policy scoreCaptures business constraintsHarder to explain
Dynamic routingAdapts to incidentsNeeds stability and observability

Failure Mode

The route is mathematically shortest under latency but operationally wrong because the graph omitted policy constraints.

Required Artifact

Create a five-node network where the lowest-latency path differs from the policy-preferred path. Show edge weights, chosen path, and the reason.

Project / Capstone Connection

Use this model-choice note when the capstone uses ranking, routing, recommendation, or scheduling weights.


Case Study 4: Review Assignment As Bipartite Matching

Scenario: A team assigns reviewers to pull requests. Greedy assignment picks the first available reviewer, then later PRs cannot be reviewed because their only eligible reviewers are already full.

Source anchor: CMU maximum bipartite matching notes frame matching as selecting compatible pairs subject to constraints.

Module concepts:

  • bipartite graph
  • matching
  • capacity
  • max-flow reduction

Wrong Approach

Assign each task to the first eligible person with remaining capacity.

Better Approach

Model reviewers and pull requests as two node sets. Add edges for eligibility, capacities for reviewer load, and solve for maximum feasible assignment. Use greedy only when you can prove it cannot block scarce matches.

Tradeoff Table

ChoiceGainCost
Greedy assignmentSimple and fastCan block scarce matches
Bipartite matchingFinds feasible maximum assignmentNeeds graph construction
Weighted matchingHandles preferencesMore complex objective

Failure Mode

A broadly eligible reviewer is consumed by an easy PR, leaving a specialized PR with no reviewer.

Required Artifact

Build a four-PR, four-reviewer matching example where greedy fails but maximum matching succeeds. Show the graph and final assignment.

Project / Capstone Connection

Use this artifact for any capstone feature that assigns scarce people, machines, jobs, or resources.


Case Study 5: Fraud Rings As Connected Components

Scenario: A marketplace fraud team links accounts by shared phone numbers, payment cards, devices, and shipping addresses. Individual account checks miss clusters that only become suspicious as a connected component.

Source anchor: The module's graph traversal lessons connect BFS/DFS to reachability and component discovery.

Module concepts:

  • undirected graph
  • connected components
  • BFS and DFS
  • model risk

Wrong Approach

Review each account independently and ignore relationship edges.

Better Approach

Build a graph where accounts and shared identifiers form edges, then use traversal to find connected components. Treat edge types carefully because shared household devices are different from shared stolen cards.

Tradeoff Table

ChoiceGainCost
Account-only reviewSimple and explainableMisses coordinated behavior
Component detectionFinds clustersSensitive to noisy edges
Typed weighted graphBetter signal qualityMore modeling work

Failure Mode

The system flags an entire apartment building because shared Wi-Fi was treated like shared payment identity.

Required Artifact

Draw a mixed account-identifier graph, identify connected components, and label which edge types are strong or weak evidence.

Project / Capstone Connection

Use this artifact to explain graph-modeling risks in recommendation, security, dependency, or social-network capstones.


Source Map

SourceUse it for
Git commit-graph documentationCommit graph, reachability, and graph-walk motivation.
Python graphlib.TopologicalSorterTopological sorting and cycle detection in dependency graphs.
MIT OCW Dijkstra lectureShortest-path algorithm preconditions and weighted-graph reasoning.
CMU maximum bipartite matching notesMatching model and compatibility graph framing.