Module 3: Computer Organization & Architecture: Case Studies
These case studies connect C code to instructions, registers, cache lines, branch predictors, SIMD, and memory hierarchy behavior.
Case Study 1: Same Big-O, Different Cache Behavior
Scenario: Two matrix traversal loops both visit every element. On a row-major array too large to fit in cache, row-major traversal is much faster than column-major traversal.
Source anchor: Ulrich Drepper's What Every Programmer Should Know About Memory explains cache locality and memory hierarchy effects that make access order visible in runtime.
Module concepts: cache line, locality, row-major order, memory hierarchy.
Wrong Approach
"Same O(n), same performance."
Better Approach
Walk memory in layout order:
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        sum += a[i][j];
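To make the contrast concrete, here is a minimal sketch of both traversal orders over the same buffer. The function names `sum_row_major` and `sum_col_major` are hypothetical, and the matrix is assumed to be stored row-major in one contiguous `int` array. Both functions return the same sum; only the order of memory accesses differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration. The matrix is assumed to be
   row-major in one contiguous buffer: element (i, j) is a[i * cols + j]. */

/* Inner loop advances j, so consecutive accesses are adjacent in memory
   and several of them fall in the same cache line. */
long sum_row_major(const int *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += a[i * cols + j];
    return sum;
}

/* Inner loop advances i, so consecutive accesses are cols elements apart.
   For a wide matrix, each access lands in a different cache line. */
long sum_col_major(const int *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += a[i * cols + j];
    return sum;
}
```

Timing both functions on a matrix much larger than the last-level cache is one way to produce the benchmark this case study asks for. On a small matrix the difference may not be measurable.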
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Ignore layout and use any traversal | Simple reasoning from algorithm shape | Can waste cache bandwidth badly |
| Match row-major layout | Better locality and throughput | Requires awareness of representation |
| Block/tile traversal | Even stronger cache reuse | More code and tuning effort |
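The block/tile row is easiest to see in an operation that must touch both rows and columns, such as a transpose. The sketch below is one possible form, not a tuned implementation. `transpose_tiled` and the `TILE` edge of 32 are assumptions for illustration; a good tile size depends on the target cache and should be measured.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32 /* assumed tile edge; tune for the target cache */

/* Transpose a rows x cols row-major matrix into a cols x rows row-major
   matrix. Working tile by tile keeps both the source rows and the
   destination rows of the current tile resident in cache. */
void transpose_tiled(const float *src, float *dst, size_t rows, size_t cols)
{
    assert(src != dst); /* in-place transpose is not supported here */
    for (size_t ii = 0; ii < rows; ii += TILE) {
        for (size_t jj = 0; jj < cols; jj += TILE) {
            /* Clamp the tile at the matrix edges. */
            size_t i_end = ii + TILE < rows ? ii + TILE : rows;
            size_t j_end = jj + TILE < cols ? jj + TILE : cols;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

The extra loop nest and the edge clamping are the "more code and tuning effort" cost named in the table.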
Failure Mode
A loop that looks equivalent in asymptotic analysis runs far slower because its stride carries nearly every access into a different cache line, so each fetched line is barely used before it is evicted.
Required Artifact
Draw cache lines for row-major and column-major traversal and benchmark both.
Project / Capstone Connection
Use this reasoning when optimizing image, matrix, or buffer-heavy code in later performance work.
Case Study 2: Compiler Explorer Reveals A Branch
Scenario: A tight loop with an unpredictable if runs slower than a branchless version.
Source anchor: Compiler Explorer exposes the generated assembly so learners can inspect whether the compiler emitted a branch, conditional move, or another form.
Module concepts: assembly, branch, prediction, generated code.
Wrong Approach
Guess from C source alone.
Better Approach
Inspect assembly:
C branch:
    compare + conditional jump
branchless version:
    conditional move or arithmetic mask
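A pair of snippets like the following could serve as the two inputs to Compiler Explorer. The function names are hypothetical. The comments describe what a compiler might emit, not what it will emit: depending on the compiler, flags, and target, the first function may already be compiled to a conditional move or vectorized, which is why the assembly has to be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Source-level branch. A compiler may emit compare + conditional jump
   here, or may convert it to branchless code on its own. */
long sum_above_branch(const int *v, size_t n, int threshold)
{
    assert(v != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Arithmetic mask. The comparison yields 0 or 1; negating it gives 0 or
   -1, which is all one bits in two's complement, so the AND keeps or
   clears v[i] without a source-level branch. */
long sum_above_mask(const int *v, size_t n, int threshold)
{
    assert(v != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int mask = -(v[i] > threshold);
        sum += v[i] & mask;
    }
    return sum;
}
```

Both functions compute the same result, so any timing difference comes from the generated instructions and the predictability of the data.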
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Reason only from source | Fastest first pass | Hides actual machine behavior |
| Inspect emitted assembly | Grounded evidence | Requires ABI and instruction literacy |
| Force branchless code everywhere | May help hot unpredictable paths | Can hurt readability or other workloads |
Failure Mode
An "obvious" micro-optimization changes source shape but not the generated branch pattern, so performance assumptions stay wrong.
Required Artifact
Paste two C snippets into Compiler Explorer and annotate the branch instruction.
Project / Capstone Connection
Use this workflow whenever you claim a hot-path optimization in systems benchmarks or writeups.
Case Study 3: False Sharing In Counters
Scenario: Four threads update separate counters in the same cache line. Performance collapses because that line bounces between cores on every write.
Source anchor: Drepper's memory paper and cache coherence concepts explain why independent variables can still interfere when they share a cache line.
Module concepts: cache line, coherence, false sharing, padding.
Wrong Approach
"Different variables cannot contend."
Better Approach
Separate hot counters by cache line:
/* 64 is an assumed cache-line size; alignas needs <stdalign.h> before C23 */
struct Counter {
    alignas(64) long value;
};
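The effect of the alignment can be checked at compile time. The sketch below assumes a 64-byte cache line, which is common but not universal, and uses a hypothetical `CACHE_LINE` macro and `counters` array. The static assertions fail the build if the struct does not occupy exactly one assumed line.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof (C11) */
#include <stddef.h>

#define CACHE_LINE 64 /* assumed line size; verify for the target CPU */

struct Counter {
    alignas(CACHE_LINE) long value;
};

/* The struct's alignment is raised to CACHE_LINE, and its size is
   rounded up to a multiple of that alignment, so each array element
   starts on its own line. */
static_assert(alignof(struct Counter) == CACHE_LINE, "counter not line-aligned");
static_assert(sizeof(struct Counter) == CACHE_LINE, "counter does not fill one line");

/* One counter per thread; adjacent elements are CACHE_LINE bytes apart. */
struct Counter counters[4];
```

With this layout, a write to `counters[0].value` and a write to `counters[1].value` touch different lines, at the cost of 64 bytes per counter under the stated assumption.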
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Pack counters tightly | Less memory use | Coherence traffic can dominate runtime |
| Pad to cache-line size | Removes false sharing | Wastes space |
| Aggregate locally then merge | Limits contention further | Adds merge logic and latency |
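The "aggregate locally then merge" row can be sketched as follows. Thread creation and joining are deliberately left out; `count_matches` is a hypothetical body that each worker would run on its own slice, and `merge_slots` is a hypothetical merge step run after all workers finish. The 64-byte padding is again an assumption.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical per-worker result slot, padded to an assumed 64-byte line. */
struct Slot {
    alignas(64) long total;
};

/* Body one worker would run over its own slice of the input. The hot
   loop updates only a local variable; the shared slot is written once
   at the end, so the loop generates no coherence traffic for it. */
void count_matches(const int *data, size_t n, int key, struct Slot *slot)
{
    assert(slot != NULL);
    long local = 0;
    for (size_t i = 0; i < n; i++)
        local += (data[i] == key);
    slot->total = local;
}

/* Merge step, intended to run after every worker has finished. */
long merge_slots(const struct Slot *slots, size_t nslots)
{
    long total = 0;
    for (size_t i = 0; i < nslots; i++)
        total += slots[i].total;
    return total;
}
```

The merged value is only available after the workers complete, which is the added latency the table mentions.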
Failure Mode
Each thread updates its own field, but cache coherence invalidations make throughput collapse under multicore load.
Required Artifact
Draw the cache line before/after padding and write a benchmark plan.
Project / Capstone Connection
Apply this when designing per-thread metrics, queues, or worker-state structures in concurrent code.
Case Study 4: Function Call ABI Misread
Scenario: A learner writes inline assembly or reads disassembly and cannot explain where arguments and return values live.
Source anchor: ABI and calling-convention documents are platform-specific; Compiler Explorer and disassembly make the active convention visible on the target toolchain.
Module concepts: register file, stack pointer, calling convention, return address.
Wrong Approach
Assume function calls are abstract jumps with no machine contract.
Better Approach
Trace:
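arguments:
    registers first, then stack, in the order the ABI defines
call:
    pushes the return address on the stack (x86) or records it in a link register (ARM, RISC-V)
return:
    value in the ABI's return register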
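As one concrete instance of the trace, the sketch below annotates a six-argument function under the x86-64 System V ABI used on Linux and macOS. That ABI is an assumption here; other platforms (for example Windows x64 or AArch64) assign different registers, so the comments must be rewritten for the active target. The function names are hypothetical.

```c
#include <assert.h>

/* Assuming the x86-64 System V ABI, the first six integer arguments
   arrive in registers, in this order:
     a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9
   and the integer return value leaves in rax. */
long combine6(long a, long b, long c, long d, long e, long f)
{
    return a + 2 * b + 3 * c + 4 * d + 5 * e + 6 * f;
}

/* At the call site the compiler loads the six argument registers, then
   executes call, which pushes the return address on the stack. After
   the callee's ret, the result is read from rax. */
long call_combine6(void)
{
    long r = combine6(1, 2, 3, 4, 5, 6);
    assert(r == 91); /* 1 + 4 + 9 + 16 + 25 + 36 */
    return r;
}
```

Compiling this without inlining (for example at a low optimization level) keeps the call visible in the disassembly so each register can be matched to its argument.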
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Ignore ABI details | Less initial complexity | Hard to read disassembly or debug low-level issues |
| Learn active calling convention | Better debugging and interop | Platform-specific material to absorb |
| Inline assembly without ABI care | Quick experiments | Easy register clobber and stack bugs |
Failure Mode
Inline assembly or FFI code appears correct in source but corrupts arguments, return values, or caller state because ABI rules were guessed.
Required Artifact
Annotate disassembly for a function with six integer arguments and one return value.
Project / Capstone Connection
Use this foundation for debugger sessions, syscall wrappers, and any low-level interop in later modules.
Case Study 5: SIMD Opportunity Hidden In Scalar Loop
Scenario: A loop sums arrays element-by-element. The compiler can vectorize only after aliasing and alignment assumptions are clarified.
Source anchor: Compiler diagnostics and Compiler Explorer reveal vectorization decisions. See GCC optimization options.
Module concepts: SIMD, aliasing, alignment, compiler optimization.
Wrong Approach
"The compiler always optimizes obvious loops."
Better Approach
Make assumptions explicit:
void add(size_t n, float *restrict out,
         const float *restrict a,
         const float *restrict b);
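A possible definition for this declaration is sketched below. The loop body is an assumption based on the scenario's description of adding arrays element by element. With GCC, compiling at `-O3` with `-fopt-info-vec-missed` (or `-fopt-info-vec-optimized`) is one way to see whether the loop was vectorized; the exact report text varies by compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* restrict promises the compiler that, for the duration of the call,
   out, a, and b do not overlap. Breaking that promise is undefined
   behavior, so callers must guarantee it. */
void add(size_t n, float *restrict out,
         const float *restrict a,
         const float *restrict b)
{
    /* Debug-build check for the simplest violation only: identical
       pointers. Partial overlap is not detected here. */
    assert(out != a && out != b);

    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Comparing the assembly of this version with one that omits `restrict` shows what the qualifier changed, which may be a removed overlap check rather than a switch from scalar to vector code.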
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Leave aliasing ambiguous | Minimal API claims | Blocks vectorization opportunities |
| Add restrict and alignment facts | Enables stronger optimization | Incorrect promises create undefined behavior |
| Hand-write SIMD | Maximum control | Larger maintenance and portability burden |
Failure Mode
The compiler declines vectorization, or guards the vector path behind runtime overlap checks, because pointers might alias, so a hot numeric loop stays scalar or slower than necessary despite suitable hardware.
Required Artifact
Compare assembly/vectorization report before and after restrict or alignment changes.
Project / Capstone Connection
Use this evidence pattern when you justify performance claims for numeric or media-processing kernels.
Source Map
| Source | Use it for |
|---|---|
| What Every Programmer Should Know About Memory | cache and memory hierarchy |
| Compiler Explorer | assembly inspection |
| GCC optimization options | optimization/vectorization evidence |
Completion Standard
- At least three artifacts are completed.
- At least one artifact includes disassembly.
- At least one artifact explains cache-line behavior.