Module 3: Computer Organization & Architecture: Case Studies

These case studies connect C code to instructions, registers, cache lines, branch predictors, SIMD, and memory hierarchy behavior.

Case Study 1: Same Big-O, Different Cache Behavior

Scenario: Two matrix traversal loops both visit every element. Row-major traversal is much faster than column-major traversal on a row-major array.

Source anchor: Ulrich Drepper's "What Every Programmer Should Know About Memory" explains the cache-locality and memory-hierarchy effects that make access order show up in measured runtime.

Module concepts: cache line, locality, row-major order, memory hierarchy.

Wrong Approach

"Same O(n), same performance."

Better Approach

Walk memory in layout order:

for (int i = 0; i < rows; i++)
  for (int j = 0; j < cols; j++)
    sum += a[i][j];
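
To make the comparison concrete, here is a minimal sketch of both traversals over the same data. It assumes the matrix is stored row-major in one flat `int` array; the function names `sum_row_major` and `sum_col_major` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Walks memory in layout order: the inner loop advances one element
   at a time, so consecutive accesses share cache lines. */
long sum_row_major(const int *a, size_t rows, size_t cols) {
    assert(a != NULL);
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += a[i * cols + j];
    return sum;
}

/* Visits the same elements, but the inner loop jumps `cols` elements
   per step, so consecutive accesses are far apart in memory. */
long sum_col_major(const int *a, size_t rows, size_t cols) {
    assert(a != NULL);
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += a[i * cols + j];
    return sum;
}
```

Both functions return the same total; only the access order differs. Timing each one on a matrix much larger than the last-level cache (for example with `clock()` from `<time.h>`) is one way to observe the gap, and the size of that gap depends on the machine.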

Tradeoff Table

Choice	Gain	Cost
Ignore layout and use any traversal	Simple reasoning from algorithm shape	Can waste cache bandwidth badly
Match row-major layout	Better locality and throughput	Requires awareness of representation
Block/tile traversal	Even stronger cache reuse	More code and tuning effort
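
The block/tile row can be illustrated with a matrix transpose, where one side of the copy is always strided no matter which loop order is chosen. The sketch below is an illustration under assumptions, not a tuned kernel: the name `transpose_tiled` and the tile edge `TILE = 32` are hypothetical, and a good tile size depends on the cache being targeted.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 32 };  /* assumed tile edge; tune per machine */

/* Transposes a rows x cols row-major matrix into a cols x rows
   row-major matrix, working on one TILE x TILE block at a time so the
   strided side of the copy stays within a small working set. */
void transpose_tiled(const float *src, float *dst,
                     size_t rows, size_t cols) {
    assert(src != NULL && dst != NULL);
    for (size_t ii = 0; ii < rows; ii += TILE) {
        for (size_t jj = 0; jj < cols; jj += TILE) {
            /* Clamp the block at the matrix edges. */
            size_t i_end = ii + TILE < rows ? ii + TILE : rows;
            size_t j_end = jj + TILE < cols ? jj + TILE : cols;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

The extra two loops and the edge clamping are the "more code and tuning effort" cost from the table.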

Failure Mode

A loop that looks equivalent in asymptotic analysis can run far slower because its stride lands on a different cache line at almost every access, so most of each fetched line goes unused before it is evicted.

Required Artifact

Draw cache lines for row-major and column-major traversal and benchmark both.

Project / Capstone Connection

Use this reasoning when optimizing image, matrix, or buffer-heavy code in later performance work.

Case Study 2: Compiler Explorer Reveals A Branch

Scenario: A tight loop with an unpredictable if runs slower than a branchless version.

Source anchor: Compiler Explorer exposes the generated assembly so learners can inspect whether the compiler emitted a branch, conditional move, or another form.

Module concepts: assembly, branch, prediction, generated code.

Wrong Approach

Guess from C source alone.

Better Approach

Inspect assembly:

C branch:
  compare + conditional jump

branchless version:
  conditional move or arithmetic mask
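
As a starting point for that inspection, here are two C functions that compute the same result, one written with an `if` and one with an arithmetic mask. The names `sum_above_branch` and `sum_above_mask` are hypothetical. Whether the first actually compiles to a conditional jump, and whether the second stays branch-free, depends on the compiler, flags, and target, which is why the assembly has to be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source-level branch: a compiler may emit compare + conditional jump,
   but it is also free to pick a conditional move or vector code. */
int64_t sum_above_branch(const int32_t *v, size_t n, int32_t threshold) {
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Arithmetic mask: the comparison yields 0 or 1, and negating it gives
   0 or all-ones, so the AND keeps or discards the element. */
int64_t sum_above_mask(const int32_t *v, size_t n, int32_t threshold) {
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] > threshold);
        sum += (int64_t)v[i] & mask;
    }
    return sum;
}
```

Pasting both into Compiler Explorer at the same optimization level shows whether the source difference survived into the generated code.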

Tradeoff Table

Choice	Gain	Cost
Reason only from source	Fastest first pass	Hides actual machine behavior
Inspect emitted assembly	Grounded evidence	Requires ABI and instruction literacy
Force branchless code everywhere	May help hot unpredictable paths	Can hurt readability or other workloads

Failure Mode

An "obvious" micro-optimization changes source shape but not the generated branch pattern, so performance assumptions stay wrong.

Required Artifact

Paste two C snippets into Compiler Explorer and annotate the branch instruction.

Project / Capstone Connection

Use this workflow whenever you claim a hot-path optimization in systems benchmarks or writeups.

Case Study 3: False Sharing In Counters

Scenario: Four threads update separate counters in the same cache line. Performance collapses because cache lines bounce between cores.

Source anchor: Drepper's memory paper and cache coherence concepts explain why independent variables can still interfere when they share a cache line.

Module concepts: cache line, coherence, false sharing, padding.

Wrong Approach

"Different variables cannot contend."

Better Approach

Separate hot counters by cache line:

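#include <stdalign.h>  /* provides alignas in C11/C17; a keyword in C23 */

struct Counter {
  alignas(64) long value;  /* 64 assumes a typical x86-64 cache line */
};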
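
The effect of the padding on layout can be checked without running any threads. The sketch below assumes a 64-byte cache line, which is common on x86-64 but not universal; `CACHE_LINE`, `PackedCounter`, `PaddedCounter`, and `same_cache_line` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size; check the target CPU */

struct PackedCounter { long value; };
struct PaddedCounter { alignas(CACHE_LINE) long value; };

/* Aligning the member raises the struct's alignment, and the compiler
   pads the struct so each array element starts on its own line. */
static_assert(sizeof(struct PaddedCounter) == CACHE_LINE,
              "padded counter should occupy exactly one assumed line");

/* Returns nonzero when two addresses fall in the same CACHE_LINE-sized
   block, which is the condition for false sharing between them. */
int same_cache_line(const void *p, const void *q) {
    return (uintptr_t)p / CACHE_LINE == (uintptr_t)q / CACHE_LINE;
}
```

In an array of `struct PackedCounter`, neighbouring elements sit `sizeof(long)` bytes apart and usually share a line; in an array of `struct PaddedCounter`, `same_cache_line(&c[0], &c[1])` is false under the stated line-size assumption.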

Tradeoff Table

Choice	Gain	Cost
Pack counters tightly	Less memory use	Coherence traffic can dominate runtime
Pad to cache-line size	Removes false sharing	Wastes space
Aggregate locally then merge	Limits contention further	Adds merge logic and latency

Failure Mode

Each thread updates its own field, but cache coherence invalidations make throughput collapse under multicore load.

Required Artifact

Draw the cache line before/after padding and write a benchmark plan.

Project / Capstone Connection

Apply this when designing per-thread metrics, queues, or worker-state structures in concurrent code.

Case Study 4: Function Call ABI Misread

Scenario: A learner writes inline assembly or reads disassembly and cannot explain where arguments and return values live.

Source anchor: ABI and calling-convention documents are platform-specific; Compiler Explorer and disassembly make the active convention visible on the target toolchain.

Module concepts: register file, stack pointer, calling convention, return address.

Wrong Approach

Assume function calls are abstract jumps with no machine contract.

Better Approach

Trace:

arguments:
  registers/stack by ABI

call:
  pushes or records return address

return:
  value in return register
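
A small function with six integer arguments gives the trace something concrete to annotate. The comments below describe the System V AMD64 convention used on x86-64 Linux and macOS; other ABIs place arguments differently, so treat the register names as an assumption about the target. `weighted_sum6` and `call_site` are hypothetical names, and `__attribute__((noinline))` is a GCC/Clang extension used here only to keep the call visible.

```c
#include <assert.h>

/* System V AMD64 (assumed target):
     a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9
     return value -> rax
   A seventh integer argument would be passed on the stack. */
__attribute__((noinline))
long weighted_sum6(long a, long b, long c, long d, long e, long f) {
    return a + 2 * b + 3 * c + 4 * d + 5 * e + 6 * f;
}

/* The caller loads the argument registers, then `call` pushes the
   return address before jumping to the callee. */
long call_site(void) {
    long r = weighted_sum6(1, 2, 3, 4, 5, 6);
    assert(r == 91);  /* 1 + 4 + 9 + 16 + 25 + 36 */
    return r;
}
```

Compiling at a low optimization level such as `-O1` usually keeps the argument moves in `call_site` easy to spot; higher levels may specialize or fold the call away.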

Tradeoff Table

Choice	Gain	Cost
Ignore ABI details	Less initial complexity	Hard to read disassembly or debug low-level issues
Learn active calling convention	Better debugging and interop	Platform-specific material to absorb
Inline assembly without ABI care	Quick experiments	Easy register clobber and stack bugs

Failure Mode

Inline assembly or FFI code appears correct in source but corrupts arguments, return values, or caller state because ABI rules were guessed.

Required Artifact

Annotate disassembly for a function with six integer arguments and one return value.

Project / Capstone Connection

Use this foundation for debugger sessions, syscall wrappers, and any low-level interop in later modules.

Case Study 5: SIMD Opportunity Hidden In Scalar Loop

Scenario: A loop sums arrays element-by-element. The compiler can vectorize only after aliasing and alignment assumptions are clarified.

Source anchor: Compiler diagnostics and Compiler Explorer reveal vectorization decisions. See GCC optimization options.

Module concepts: SIMD, aliasing, alignment, compiler optimization.

Wrong Approach

"The compiler always optimizes obvious loops."

Better Approach

Make assumptions explicit:

void add(size_t n, float *restrict out,
         const float *restrict a,
         const float *restrict b);
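
A body for that prototype might look like the sketch below. The loop itself is an assumption about what the function does (element-wise addition, matching the scenario); the `restrict` qualifiers promise the compiler that `out`, `a`, and `b` do not overlap, and calling it with overlapping arrays is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise sum. `restrict` tells the compiler the three arrays do
   not alias, so a store to out[i] cannot change a later a[j] or b[j]. */
void add(size_t n, float *restrict out,
         const float *restrict a,
         const float *restrict b) {
    assert(n == 0 || (out != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With GCC, compiling with `-O3 -fopt-info-vec-optimized -fopt-info-vec-missed` prints which loops were vectorized and which were not; Clang offers `-Rpass=loop-vectorize` and `-Rpass-missed=loop-vectorize` for the same purpose. Without `restrict`, a compiler may still vectorize by adding a runtime overlap check, so the report and the assembly are the evidence to rely on.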

Tradeoff Table

Choice	Gain	Cost
Leave aliasing ambiguous	Minimal API claims	Blocks vectorization opportunities
Add `restrict` and alignment facts	Enables stronger optimization	Incorrect promises create undefined behavior
Hand-write SIMD	Maximum control	Larger maintenance and portability burden

Failure Mode

The compiler declines vectorization because pointers might alias, so a hot numeric loop stays scalar despite suitable hardware.

Required Artifact

Compare the assembly and the vectorization report before and after adding `restrict` or alignment facts.
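
For the alignment half of that comparison, one option on GCC and Clang is the `__builtin_assume_aligned` builtin, sketched below. This is a compiler-specific extension rather than standard C, the function name `add_aligned` is hypothetical, and the 32-byte figure is an assumption chosen to match 256-bit vector registers. As with `restrict`, passing pointers that break the promise is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise sum where the caller promises 32-byte-aligned,
   non-overlapping arrays. The assert checks the promise in debug
   builds; the builtin passes it to the optimizer. */
void add_aligned(size_t n, float *restrict out,
                 const float *restrict a,
                 const float *restrict b) {
    assert((uintptr_t)out % 32 == 0);
    assert((uintptr_t)a % 32 == 0 && (uintptr_t)b % 32 == 0);
    float *o = __builtin_assume_aligned(out, 32);
    const float *x = __builtin_assume_aligned(a, 32);
    const float *y = __builtin_assume_aligned(b, 32);
    for (size_t i = 0; i < n; i++)
        o[i] = x[i] + y[i];
}
```

Whether the alignment fact changes the generated code depends on the target and flags; on many recent x86-64 cores unaligned vector loads are cheap, so the before/after assembly may be nearly identical.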

Project / Capstone Connection

Use this evidence pattern when you justify performance claims for numeric or media-processing kernels.

Source Map

Source	Use it for
What Every Programmer Should Know About Memory	cache and memory hierarchy
Compiler Explorer	assembly inspection
GCC optimization options	optimization/vectorization evidence

Completion Standard

At least three artifacts are completed.
At least one artifact includes disassembly.
At least one artifact explains cache-line behavior.
