Build Your Own Emulator

"Writing an emulator is the most fun way to learn computer architecture you didn't know you wanted to learn." -- every emulator author

An emulator is the closest you can come, in software, to building a CPU. You parse machine code, model registers and memory, decode instructions, emulate timers, render graphics, and handle input. CHIP-8 is the friendly first step (35 opcodes, monochrome 64x32 display, completable in a weekend). Game Boy is the serious project (300+ opcodes, banked memory, real games).


1. Overview & motivation

A CPU emulator is a fetch-decode-execute loop:

while running:
    opcode = fetch(memory, PC)
    PC += instruction_length
    operands = decode(opcode)
    execute(opcode, operands)
    update_timers()
    render_if_needed()
    handle_input()

What you can only learn by building one:

  • Why instruction encoding matters -- you'll be reading hex dumps and recognising opcodes by sight.
  • What flags (Z, N, H, C on Game Boy; carry/zero on most CPUs) actually do in practice.
  • Why timing accuracy is the dividing line between "an emulator" and "an emulator that runs Mario".
  • Why memory-mapped I/O exists (graphics, sound, input all share the address space).
  • Why the fetch-decode-execute loop is fundamental -- modern CPUs do exactly this, with massive parallelism layered on top.

2. Where this fits in the degree

  • Phase: Systems
  • Semester: 4 (Systems Programming)
  • Modules deepened: Module 1 (C/C++/Rust fundamentals), Module 3 (computer organization -- this is the module's perfect concretization).

Cross-phase relevance:

  • Good preparation for the Operating System tutorial: emulating a CPU clarifies what an OS actually controls.
  • The dispatch-loop and performance work transfers to the Compiler tutorial, because a CPU emulator is essentially a VM for a real instruction set.

3. Prerequisites

  • C/C++ or Rust. Comfortable with bit manipulation: &, |, ^, <<, >>.
  • Hex and binary fluency.
  • A graphics library: SDL2 (recommended) or raylib for rendering and input.

You do not need a computer-architecture course beforehand. CHIP-8 is gentle enough to teach as you go.


4. Theory & research

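Required reading -- CHIP-8

  • Cowgod's Chip-8 Technical Reference -- the classic opcode-by-opcode description of the instruction set. ⭐ canonical.
  • Tobias V. Langhoff's CHIP-8 emulator guide -- a step-by-step walkthrough that also lists the ambiguous opcodes (quirks).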

Required reading -- Game Boy

  • Pan Docs -- gbdev.io/pandocs/. The Game Boy reverse-engineering bible. 200+ pages. ⭐ canonical.
  • GBDev wiki (gbdev.io) and gbdev community Discord.
  • Patterson & Hennessy, Computer Organization and Design -- Chapters 4 (processor) and 5 (memory). Standard textbook. Read sections relevant to fetch/decode/execute.
  • Imran Nazar, "GameBoy Emulation in JavaScript" -- imrannazar.com/series/gameboy-emulation-in-javascript. Older but excellent walkthrough.

For deep dives

  • Gekkio's Mooneye GB test suite -- github.com/Gekkio/mooneye-test-suite. A test ROM suite for Game Boy emulator accuracy that goes beyond Blargg's CPU tests. For comparison, an emulator that passes Blargg's CPU tests can usually run most games.

5. Curated tutorial list (from BYO-X)


Two-stage path:

  1. CHIP-8 first (1 weekend): Follow Tobias V. Langhoff's guide. Implement all 35 opcodes. Get it running the IBM logo ROM, then Tetris. Roughly 500 lines.

  2. Then choose:

    • Game Boy (3-6 weeks): Imran Nazar's tutorial. 300+ opcodes. Banked memory. Real games. This is the most rewarding emulator project, and your CHIP-8 experience makes it much more tractable.
    • LC-3 (1 week): Justin Meiners' "Write your Own Virtual Machine". 16 opcodes, educational architecture, runs assembly programs.
    • NES (6+ weeks): javidx9's video series. Hard. Includes PPU, sound. Spectacular results.

For this degree, the recommended sequence is CHIP-8 -> Game Boy.


7. Implementation milestones (CHIP-8)

Milestone 1: Memory, registers, opcode fetch

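#include <stdint.h>

typedef struct CHIP8 {
    uint8_t  memory[4096];
    uint8_t  V[16];             // V0..VF general registers (VF doubles as the flag register)
    uint16_t I;                 // index register
    uint16_t PC;                // program counter, starts at 0x200
    uint16_t stack[16];         // return addresses for CALL/RET
    uint8_t  SP;                // stack pointer: index of the next free slot
    uint8_t  delay_timer;
    uint8_t  sound_timer;
    uint8_t  display[64 * 32];  // one byte per pixel: 0 = off, 1 = on
    uint8_t  keypad[16];        // 1 = key currently pressed
} CHIP8;

// CHIP-8 opcodes are big-endian: the high byte comes first in memory.
uint16_t fetch(CHIP8 *c) {
    uint16_t op = (uint16_t)((c->memory[c->PC] << 8) | c->memory[c->PC + 1]);
    c->PC += 2;
    return op;
}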

Evidence: Loading a 4-byte ROM into memory at 0x200; manually fetching opcodes one at a time.
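To produce that evidence, a minimal loader could look like the sketch below. It works on a bare memory array instead of the CHIP8 struct so that it stands alone. The names load_rom_bytes, load_rom_file and fetch_at are hypothetical, and the sketch assumes ROMs are raw binary images with no header, loaded at 0x200.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MEM_SIZE  4096
#define ROM_START 0x200
#define ROM_MAX   (MEM_SIZE - ROM_START)   /* 3584 bytes */

/* Copy an in-memory ROM image to 0x200. Returns 0, or -1 if it doesn't fit. */
int load_rom_bytes(uint8_t memory[MEM_SIZE], const uint8_t *rom, size_t len) {
    if (len > ROM_MAX) return -1;
    memcpy(&memory[ROM_START], rom, len);
    return 0;
}

/* Read a raw ROM file to 0x200. Returns the byte count, or -1 if the file
   can't be opened or is too large (memory may be partly overwritten then). */
long load_rom_file(uint8_t memory[MEM_SIZE], const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(&memory[ROM_START], 1, ROM_MAX, f);
    int extra = fgetc(f);              /* any byte left over means the ROM is too big */
    fclose(f);
    return extra == EOF ? (long)n : -1;
}

/* Same big-endian fetch as fetch() above, on a bare array. */
uint16_t fetch_at(const uint8_t memory[MEM_SIZE], uint16_t *pc) {
    uint16_t op = (uint16_t)((memory[*pc] << 8) | memory[*pc + 1]);
    *pc += 2;
    return op;
}
```

Loading the 4-byte ROM {0x00, 0xE0, 0xA2, 0x2A} and calling fetch_at twice returns 0x00E0 and then 0xA22A, and leaves pc at 0x204.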

Milestone 2: Decode and execute (35 opcodes)

CHIP-8 opcodes are 16-bit. Decode by high nibble:

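#include <string.h>   // memset

void execute(CHIP8 *c, uint16_t op) {
    uint16_t nnn = op & 0x0FFF;          // 12-bit address
    uint8_t  n   = op & 0x000F;          // lowest nibble
    uint8_t  x   = (op & 0x0F00) >> 8;   // register index
    uint8_t  y   = (op & 0x00F0) >> 4;   // register index
    uint8_t  kk  = op & 0x00FF;          // 8-bit immediate

    switch (op & 0xF000) {
    case 0x0000:
        if (op == 0x00E0) {                            // CLS -- clear display
            memset(c->display, 0, sizeof c->display);
        } else if (op == 0x00EE) {                     // RET -- pop return address
            c->PC = c->stack[--c->SP];                 // no underflow check
        }
        break;
    case 0x1000: c->PC = nnn; break;                   // JP addr
    case 0x2000:                                       // CALL addr
        c->stack[c->SP++] = c->PC;                     // PC already points past this CALL
        c->PC = nnn;                                   // no overflow check (16 levels)
        break;
    case 0x3000: if (c->V[x] == kk) c->PC += 2; break; // SE Vx, byte
    case 0x4000: if (c->V[x] != kk) c->PC += 2; break; // SNE Vx, byte
    case 0x6000: c->V[x] = kk; break;                  // LD Vx, byte
    case 0x7000: c->V[x] += kk; break;                 // ADD Vx, byte (VF untouched)
    case 0xA000: c->I = nnn; break;                    // LD I, addr
    case 0xD000: /* DRW Vx, Vy, nibble -- uses x, y, n (Milestone 3) */ break;
    // ... 25 more
    }
}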

Evidence: Run each opcode through a unit test with known input/output.
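As an example of testing one opcode family against known input and output, here is a sketch of the 8XY_ register-to-register group, following the semantics in Cowgod's reference. exec_8xy is a hypothetical helper that works on a bare register array. It writes VF after Vx, so the flag wins when X is F. It leaves out the shifts 8XY6/8XYE because their behaviour is one of the quirks.

```c
#include <stdint.h>

/* Execute one 8XYN opcode on V[16]. Returns 0 if handled, -1 otherwise.
   Note: the original COSMAC interpreter also reset VF on OR/AND/XOR
   (the "VF reset" quirk); this sketch does not. */
int exec_8xy(uint8_t V[16], uint16_t op) {
    uint8_t x = (op >> 8) & 0xF;
    uint8_t y = (op >> 4) & 0xF;
    uint8_t flag;
    switch (op & 0x000F) {
    case 0x0: V[x] = V[y]; return 0;            /* LD Vx, Vy */
    case 0x1: V[x] |= V[y]; return 0;           /* OR Vx, Vy */
    case 0x2: V[x] &= V[y]; return 0;           /* AND Vx, Vy */
    case 0x3: V[x] ^= V[y]; return 0;           /* XOR Vx, Vy */
    case 0x4: {                                 /* ADD Vx, Vy: VF = carry */
        unsigned sum = (unsigned)V[x] + V[y];
        V[x] = (uint8_t)sum;
        V[0xF] = sum > 0xFF;
        return 0;
    }
    case 0x5:                                   /* SUB Vx, Vy: VF = 1 if no borrow */
        flag = V[x] >= V[y];
        V[x] = (uint8_t)(V[x] - V[y]);
        V[0xF] = flag;
        return 0;
    case 0x7:                                   /* SUBN Vx, Vy: Vx = Vy - Vx, VF = 1 if no borrow */
        flag = V[y] >= V[x];
        V[x] = (uint8_t)(V[y] - V[x]);
        V[0xF] = flag;
        return 0;
    default:
        return -1;                              /* 8XY6, 8XYE (quirks) and invalid N */
    }
}
```

Each unit test is then a triple of (registers before, opcode, registers after). For example, V1 = 200 and V2 = 100 followed by 0x8124 should leave V1 = 44 and VF = 1.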

Milestone 3: Display and sprite drawing

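CHIP-8 has a 64x32 monochrome display. The DXYN opcode reads an N-byte sprite from memory starting at I (one byte per 8-pixel row) and XORs it into the display at position (Vx, Vy). VF is set to 1 if any lit pixel is turned off (collision detection), and to 0 otherwise.

Render with SDL2: keep the display as a flat 64*32 array with one byte per pixel, and scale it up 10x to a 640x320 window.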

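Evidence: Load and run the IBM logo ROM (IBM Logo.ch8) and check that the black-and-white IBM logo appears. The ROM uses only a handful of opcodes (00E0, 1NNN, 6XNN, 7XNN, ANNN, DXYN). Success therefore shows that fetch, dispatch and sprite drawing work, but it does not prove that every opcode is correct.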
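A sketch of DXYN under one common convention: the starting coordinates wrap around the screen, and pixels that run off the right or bottom edge are clipped. draw_sprite is a hypothetical helper that works on bare arrays, and the caller stores its return value in VF.

```c
#include <stdint.h>

#define CHIP8_W 64
#define CHIP8_H 32

/* XOR an 8-pixel-wide, n-row sprite from memory[I..I+n-1] into display at
   (vx, vy). Returns 1 if any lit pixel was turned off, else 0. */
uint8_t draw_sprite(uint8_t display[CHIP8_W * CHIP8_H], const uint8_t memory[4096],
                    uint16_t I, uint8_t vx, uint8_t vy, uint8_t n) {
    uint8_t x0 = vx % CHIP8_W, y0 = vy % CHIP8_H;   /* start position wraps */
    uint8_t collision = 0;
    for (uint8_t row = 0; row < n; row++) {
        if (y0 + row >= CHIP8_H) break;              /* clip at the bottom edge */
        uint8_t bits = memory[(I + row) & 0xFFF];
        for (uint8_t col = 0; col < 8; col++) {
            if (x0 + col >= CHIP8_W) break;          /* clip at the right edge */
            if (!(bits & (0x80 >> col))) continue;   /* sprite bit off: pixel unchanged */
            uint8_t *px = &display[(y0 + row) * CHIP8_W + (x0 + col)];
            if (*px) collision = 1;                  /* a lit pixel is about to be erased */
            *px ^= 1;
        }
    }
    return collision;
}
```

Drawing the same sprite twice at the same spot erases it and returns 1. That is how many CHIP-8 games detect collisions and move sprites.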

Milestone 4: Timers and audio

delay_timer and sound_timer both count down at 60 Hz until they reach 0. The buzzer sounds while sound_timer > 0; a simple square wave through SDL audio is enough.

Evidence: Run a beep-test ROM. Hear the beep.
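A minimal sketch of the timers and the beep, plus the "N instructions per 60 Hz frame" loop that CHIP-8 can get away with. INSTRUCTIONS_PER_FRAME = 10 is an assumed starting value that you should tune per ROM. The 3000 amplitude is arbitrary, and the names (Timers, timers_tick, fill_square_wave, run_frame) are hypothetical. In an SDL audio callback you would pass sound_timer > 0 as buzzer_on. That callback runs on its own thread, so guard the shared timer accordingly.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed tuning value: about 10 instructions per 60 Hz frame (~600/s). */
#define INSTRUCTIONS_PER_FRAME 10

typedef struct {
    uint8_t delay_timer;
    uint8_t sound_timer;
} Timers;

/* Call once per 60 Hz frame: each timer above zero counts down by one. */
void timers_tick(Timers *t) {
    if (t->delay_timer > 0) t->delay_timer--;
    if (t->sound_timer > 0) t->sound_timer--;
}

/* Fill an audio buffer with a square wave while the buzzer is on, and with
   silence otherwise. *phase keeps the wave position between calls.
   tone_hz must be nonzero. */
void fill_square_wave(int16_t *buf, size_t count, int buzzer_on,
                      unsigned sample_rate, unsigned tone_hz, unsigned *phase) {
    unsigned period = sample_rate / tone_hz;    /* samples per full cycle */
    if (period < 2) period = 2;                 /* at least one high and one low sample */
    for (size_t i = 0; i < count; i++) {
        if (!buzzer_on) { buf[i] = 0; continue; }
        buf[i] = (*phase < period / 2) ? 3000 : -3000;   /* arbitrary amplitude */
        *phase = (*phase + 1) % period;
    }
}

/* One frame: run a batch of instructions, then tick the timers once.
   `step` is a hypothetical hook that fetches and executes one opcode. */
void run_frame(Timers *t, void (*step)(void *cpu), void *cpu) {
    for (int i = 0; i < INSTRUCTIONS_PER_FRAME; i++) step(cpu);
    timers_tick(t);
}
```

Ticking the timers from the frame loop, rather than after every instruction, keeps them at 60 Hz however many instructions you run per frame.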

Milestone 5: Input

CHIP-8 has a 16-key hex keypad (0-F). Map to a 4x4 grid on the keyboard.

Evidence: Run an interactive ROM (Tetris, Pong, Brix). Keys respond.
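A sketch of the keyboard mapping. It assumes the common QWERTY convention that places the 4x4 hex keypad on 1-4, Q-R, A-F and Z-V. It also includes a non-blocking FX0A (wait for key) that re-executes the instruction instead of spinning inside it. map_key and wait_for_key are hypothetical names.

```c
#include <ctype.h>
#include <stdint.h>

/* Assumed layout:
     CHIP-8 keypad     keyboard
       1 2 3 C         1 2 3 4
       4 5 6 D         Q W E R
       7 8 9 E         A S D F
       A 0 B F         Z X C V
   Returns the CHIP-8 key (0x0-0xF) for a keyboard character, or -1. */
int map_key(char ch) {
    static const char keys[] = "1234QWERASDFZXCV";
    static const uint8_t chip8[16] = {
        0x1, 0x2, 0x3, 0xC,
        0x4, 0x5, 0x6, 0xD,
        0x7, 0x8, 0x9, 0xE,
        0xA, 0x0, 0xB, 0xF,
    };
    ch = (char)toupper((unsigned char)ch);
    for (int i = 0; i < 16; i++)
        if (keys[i] == ch) return chip8[i];
    return -1;
}

/* FX0A: if a key is down, store it in *vx and return 1. Otherwise rewind PC
   so the same FX0A runs again on the next step, and return 0. Timers and
   input polling keep running in between. This version fires on key press;
   some interpreters wait for the key to be released instead. */
int wait_for_key(const uint8_t keypad[16], uint8_t *vx, uint16_t *pc) {
    for (uint8_t k = 0; k < 16; k++) {
        if (keypad[k]) { *vx = k; return 1; }
    }
    *pc -= 2;
    return 0;
}
```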

Milestone 6: Quirks

Several CHIP-8 opcodes have an ambiguous specification. Different historical interpreters implement them differently, and many ROMs only run correctly with one particular behaviour. Langhoff's guide lists them. Implement each one as a configurable flag.

Evidence: The quirks test ROM in Timendus's chip8-test-suite (on GitHub) reports how your emulator behaves for each quirk. Use it to confirm that every flag setting works, then configure the emulator per ROM as needed.
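A sketch of quirks as configuration flags, covering three well-known ones: whether the shifts read Vy, whether FX55/FX65 advance I, and whether BNNN adds V0 or VX. The Quirks struct and its field names are hypothetical. The two presets in the test approximate the original COSMAC VIP behaviour and the later CHIP-48/SUPER-CHIP behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool shift_uses_vy;        /* 8XY6/8XYE: shift Vy into Vx (COSMAC) or shift Vx in place */
    bool memory_increments_i;  /* FX55/FX65: leave I at I + X + 1 (COSMAC) or unchanged */
    bool jump_uses_vx;         /* BNNN: jump to NNN + VX (CHIP-48/SUPER-CHIP) or NNN + V0 */
} Quirks;

/* 8XY6 (shift right) and 8XYE (shift left). VF = the bit shifted out, written last. */
void op_shift(uint8_t V[16], uint16_t op, const Quirks *q) {
    uint8_t x = (op >> 8) & 0xF, y = (op >> 4) & 0xF;
    uint8_t src = q->shift_uses_vy ? V[y] : V[x];
    uint8_t flag;
    if ((op & 0x000F) == 0x6) {
        flag = src & 0x01;
        V[x] = src >> 1;
    } else {
        flag = (src >> 7) & 0x01;
        V[x] = (uint8_t)(src << 1);
    }
    V[0xF] = flag;
}

/* FX55: store V0..VX at memory[I..I+X]. */
void op_store(uint8_t memory[4096], const uint8_t V[16], uint16_t *I,
              uint8_t x, const Quirks *q) {
    for (uint8_t i = 0; i <= x; i++) memory[(*I + i) & 0xFFF] = V[i];
    if (q->memory_increments_i) *I = (uint16_t)(*I + x + 1);
}

/* BNNN: compute the jump target. */
uint16_t op_jump_offset(const uint8_t V[16], uint16_t op, const Quirks *q) {
    uint16_t nnn = op & 0x0FFF;
    uint8_t reg = q->jump_uses_vx ? (uint8_t)((op >> 8) & 0xF) : 0;
    return (uint16_t)(nnn + V[reg]);
}
```

Passing the flags as one struct keeps each opcode handler unchanged when you switch presets. You can load the presets from a per-ROM configuration file.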

Milestone 7 (Game Boy track): MMU, PPU, audio, MBC

For Game Boy: an order of magnitude more work.

  • MMU -- memory management unit. Banked memory, hardware register mapping.
  • CPU -- Sharp LR35902 (a hybrid of the Intel 8080 and Zilog Z80). About 500 opcodes, including the 256 CB-prefixed ones.
  • PPU -- pixel processing unit. Tile-based graphics, sprites, scrolling.
  • APU -- audio processing unit. Four channels.
  • MBC -- memory bank controllers (MBC1, MBC3) for cartridges larger than 32 KB.

Plan for 200+ hours.
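To make banked memory concrete, here is a deliberately simplified MBC1 sketch: ROM banking only, no cartridge RAM, and banking mode 0. It follows the register layout described in Pan Docs. The bank number is written to 0x2000-0x3FFF, the upper bits to 0x4000-0x5FFF, and a bank number of 0 is remapped to 1. The MBC1 type and function names are hypothetical, and the mode-1 and RAM paths are left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *rom;
    size_t rom_size;    /* power of two, at least 32 KiB */
    uint8_t bank_lo;    /* 5-bit ROM bank register (0x2000-0x3FFF) */
    uint8_t bank_hi;    /* 2-bit upper bank register (0x4000-0x5FFF) */
} MBC1;

void mbc1_init(MBC1 *m, const uint8_t *rom, size_t rom_size) {
    m->rom = rom;
    m->rom_size = rom_size;
    m->bank_lo = 1;
    m->bank_hi = 0;
}

/* Writes to the ROM area don't change ROM; they program the controller. */
void mbc1_write(MBC1 *m, uint16_t addr, uint8_t value) {
    if (addr >= 0x2000 && addr <= 0x3FFF) {
        m->bank_lo = value & 0x1F;
        if (m->bank_lo == 0) m->bank_lo = 1;   /* writing 0 selects bank 1 */
    } else if (addr >= 0x4000 && addr <= 0x5FFF) {
        m->bank_hi = value & 0x03;
    }
    /* 0x0000-0x1FFF (RAM enable) and 0x6000-0x7FFF (mode) are ignored here */
}

uint8_t mbc1_read(const MBC1 *m, uint16_t addr) {
    if (addr < 0x4000) return m->rom[addr];    /* bank 0, fixed in mode 0 */
    if (addr < 0x8000) {
        size_t banks = m->rom_size / 0x4000;
        /* mask the bank number to the ROM size (sizes are powers of two) */
        size_t bank = (((size_t)m->bank_hi << 5) | m->bank_lo) & (banks - 1);
        return m->rom[bank * 0x4000 + (addr - 0x4000)];
    }
    return 0xFF;   /* outside cartridge ROM: the MMU routes these addresses elsewhere */
}
```

The MMU forwards reads and writes in 0x0000-0x7FFF to the cartridge controller. Supporting MBC3 then means adding another controller type behind the same interface.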

Evidence (Game Boy): Run cpu_instrs.gb from Blargg's test suite. If all 11 sub-tests pass, your CPU is accurate enough to run most games.


8. Tests & evidence

| Test | How |
|---|---|
| Opcode unit tests | Each of the 35 (CHIP-8) or 300+ (GB) opcodes tested independently |
| IBM logo (CHIP-8) | Loads and displays correctly |
| Blargg's cpu_instrs.gb | All 11 sub-tests pass |
| Quirk test ROM | All quirks correctly configured |
| Game ROM | Tetris (CHIP-8) or Tetris/Mario Land (GB) playable end-to-end |
| Frame timing | 60 Hz steady, ±1 ms |

The strongest single piece of evidence: a recording of a real game being played.


9. Common pitfalls

  • Wrong byte order. CHIP-8 opcodes are big-endian (memory[PC] << 8 | memory[PC+1]). Game Boy is little-endian. Get this wrong and nothing decodes correctly.
  • Forgetting VF. Several CHIP-8 opcodes set VF as a side effect (carry, borrow, collision). Easy to miss.
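  • Sprite-drawing wraparound. Interpreters disagree about sprites that cross the screen edge: some wrap them around, others clip them. Langhoff's guide recommends wrapping the starting coordinates and clipping the rest of the sprite. Whatever you choose, choose it deliberately.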
  • Cycle counting. A real emulator advances by cycles, not by instructions. For CHIP-8 you can fudge it (run N ops per 60 Hz frame). For Game Boy, you must count.
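  • Flags in Game Boy. Z, N, H, C. The H (half-carry) flag has fiddly rules that differ between 8-bit and 16-bit operations and between addition and subtraction. Wrong H flags are one of the most common reasons an emulator fails CPU test ROMs.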
  • Timing-sensitive code. Some games rely on cycle-exact behaviour. Don't chase 100% accuracy on your first emulator; document the limitation.
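  • Host endianness in your implementation language. When you read a 16-bit value from memory in C, combine the bytes explicitly (memory[PC] << 8 | memory[PC + 1]) rather than casting the pointer to uint16_t *. A cast silently uses the host's byte order and can violate alignment and strict-aliasing rules.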
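The H flag and the little-endian operand order are easiest to get right as small helpers that you test separately. The sketch below assumes the F register bit layout from Pan Docs (Z = bit 7, N = 6, H = 5, C = 4). The helper names are hypothetical, and only ADD/SUB A,n and ADD HL,rr are shown.

```c
#include <stdint.h>

/* F register bits (Pan Docs): Z = bit 7, N = bit 6, H = bit 5, C = bit 4. */
#define FLAG_Z 0x80
#define FLAG_N 0x40
#define FLAG_H 0x20
#define FLAG_C 0x10

/* ADD A, n: H = carry out of bit 3, C = carry out of bit 7, N cleared. */
uint8_t add8(uint8_t a, uint8_t n, uint8_t *f) {
    unsigned sum = (unsigned)a + n;
    uint8_t r = (uint8_t)sum;
    *f = (uint8_t)((r == 0 ? FLAG_Z : 0)
                 | (((a & 0x0F) + (n & 0x0F)) > 0x0F ? FLAG_H : 0)
                 | (sum > 0xFF ? FLAG_C : 0));
    return r;
}

/* SUB A, n: N set, H = borrow from bit 4, C = borrow overall. */
uint8_t sub8(uint8_t a, uint8_t n, uint8_t *f) {
    uint8_t r = (uint8_t)(a - n);
    *f = (uint8_t)((r == 0 ? FLAG_Z : 0) | FLAG_N
                 | ((a & 0x0F) < (n & 0x0F) ? FLAG_H : 0)
                 | (a < n ? FLAG_C : 0));
    return r;
}

/* ADD HL, rr: Z preserved, N cleared, H = carry out of bit 11, C = out of bit 15. */
uint16_t add16_hl(uint16_t hl, uint16_t rr, uint8_t *f) {
    uint32_t sum = (uint32_t)hl + rr;
    *f = (uint8_t)((*f & FLAG_Z)
                 | (((hl & 0x0FFF) + (rr & 0x0FFF)) > 0x0FFF ? FLAG_H : 0)
                 | (sum > 0xFFFF ? FLAG_C : 0));
    return (uint16_t)sum;
}

/* Little-endian 16-bit read (e.g. the nn operand of LD BC, nn): low byte first. */
uint16_t read16_le(const uint8_t *mem, uint16_t addr) {
    return (uint16_t)(mem[addr] | (mem[(uint16_t)(addr + 1)] << 8));
}
```

For example, add8(0x0F, 0x01, &f) returns 0x10 with only H set, because the low nibble overflows while the full byte does not.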

10. Extensions

  • Debugger -- single-step, breakpoints, register/memory inspector, disassembly view. Easy with SDL2 and ImGui.
  • Save states -- serialize the entire emulator state. Restore later.
  • Rewind -- keep the last N states in a ring buffer; press a key to step backward. Fun to use, and handy for debugging.
  • Audio recording -- write a .wav of the game's audio output.
  • Multiple system tracks -- once you have CHIP-8 done, NES is the spectacular next step. Then SNES (much harder), Genesis.
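  • Game Boy Color, Game Boy Advance -- the Game Boy Color extends the original GB hardware (double-speed mode, extra VRAM and WRAM banks, colour palettes). The Game Boy Advance is a different machine built around an ARM7TDMI CPU and is effectively a new project.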
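The CHIP-8 machine state is a plain struct without pointers, so both save states and rewind reduce to copying that struct. The sketch below uses a fixed-size ring buffer. REWIND_SLOTS = 600 (about 10 seconds at one snapshot per frame, under 4 MB in total) is an assumption, and Machine, rewind_push, rewind_pop and save_state are hypothetical names. The raw fwrite format depends on the compiler's struct padding and the host's byte order, so it is not portable between builds.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pointer-free machine state (mirrors the Milestone 1 struct). */
typedef struct {
    uint8_t  memory[4096];
    uint8_t  V[16];
    uint16_t I, PC;
    uint16_t stack[16];
    uint8_t  SP, delay_timer, sound_timer;
    uint8_t  display[64 * 32];
    uint8_t  keypad[16];
} Machine;

#define REWIND_SLOTS 600   /* assumption: ~10 s at one snapshot per frame */

typedef struct {
    Machine slots[REWIND_SLOTS];
    int head;    /* slot the next snapshot goes into */
    int count;   /* valid snapshots, at most REWIND_SLOTS */
} Rewind;

/* Record a snapshot. Once the buffer is full, the oldest one is overwritten. */
void rewind_push(Rewind *r, const Machine *m) {
    memcpy(&r->slots[r->head], m, sizeof *m);
    r->head = (r->head + 1) % REWIND_SLOTS;
    if (r->count < REWIND_SLOTS) r->count++;
}

/* Restore the newest snapshot into *m. Returns 0 if there is no history. */
int rewind_pop(Rewind *r, Machine *m) {
    if (r->count == 0) return 0;
    r->head = (r->head + REWIND_SLOTS - 1) % REWIND_SLOTS;
    r->count--;
    memcpy(m, &r->slots[r->head], sizeof *m);
    return 1;
}

/* Save state as a raw struct dump (not portable across compilers or hosts). */
int save_state(const Machine *m, const char *path) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(m, sizeof *m, 1, f);
    if (fclose(f) != 0) return -1;
    return written == 1 ? 0 : -1;
}
```

Push one snapshot per frame, and call rewind_pop once per frame while the rewind key is held.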

11. Module integration

| Module | What the emulator deepens |
|---|---|
| Sem 4 Module 1 -- C/C++ fundamentals | Substantial project. Bit manipulation, file I/O, dynamic state. |
| Sem 4 Module 3 -- Computer organization | The definitive concretization. Every concept becomes a struct field. |
| Sem 4 Module 5 -- Abstraction & interpretation | A CPU emulator is structurally identical to a bytecode VM. |
| Compiler tutorial | The dispatch loop is the same; the instruction set is the difference. |
| Operating System tutorial | Knowing what a CPU does at the metal makes OS code much clearer. |

12. Portfolio framing

What to publish:

  • Source organized by component (cpu/, mmu/, display/, input/).
  • README with animated GIF or video of a game running. This is the single most important demo asset.
  • A test suite (opcode unit tests + ROM-based tests).
  • A list of which test ROMs your emulator passes.

What to keep private:

  • Game ROMs. They're copyrighted. Never include them in the repo. Use freeware test ROMs only.

Reviewer entry points:

  • src/cpu/execute.c -- the opcode dispatch (the heart of the emulator).
  • src/display.c -- sprite rendering.
  • tests/blargg_cpu_instrs.md -- accuracy report.
  • README: include the game-running GIF/video; list passing test ROMs.

Emulators are striking portfolio pieces because the output is visual and visceral. "Here is my CHIP-8 running Tetris" reads spectacularly well.


13. Local source backbone

Use Programming a Toy Computer from Scratch (build-your-own/toy-computer) to connect the emulator project to a ground-up computer architecture path. This is especially useful before attempting Game Boy or NES, where timing and hardware state dominate.

| Local chunks | Use them for | Add to this project |
|---|---|---|
| 002-004 | Binary numbers, arithmetic, logic, hexadecimal | Add bit-manipulation drills before opcode decoding. |
| 005-014 | Logic gates, memory cells, buses, instructions, control circuits | Add a diagram of the emulated CPU datapath: registers, memory, ALU, PC, flags, bus. |
| 015-017 | Toy computer implementation and example programs | Implement a toy ISA before CHIP-8 if the learner struggles with CPU state. |
| 026-032 | Cortex-M registers, instruction set, vector table, first program | Optional hardware-flavored comparison: how a real embedded CPU differs from a toy VM. |
| 033-040 | Bytecode instructions and interpreter | Directly maps to the fetch-decode-execute loop. Use it to annotate the emulator main loop. |
| 041-063 | Timers, clock setup, display, keyboard, UART, interrupts, basic I/O | Expand emulator milestones with timer, input, display, and interrupt checklists. |
| 064-172 | Flash, UI, algorithms, implementation, compilation and tests for larger toy system pieces | Use as optional capstone material for building monitor/debugger, command editor, and storage. |

Extra checkpoints from the book chunks

  1. Datapath checkpoint: draw the movement of one instruction through fetch, decode, execute, and writeback.
  2. Timing checkpoint: separate CPU cycles, timer ticks, frame refresh, and input polling in the emulator loop.
  3. I/O checkpoint: show how keyboard/display state changes are represented in memory or registers.
  4. Debug checkpoint: add a stepper that prints PC, current opcode, registers, flags, and changed memory.
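A sketch of the stepper's output side: one fixed-format trace line per instruction, plus a diff of changed memory. The line format is an assumption; any stable format is what makes golden-trace tests possible. format_trace and diff_memory are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* One line per executed instruction: PC, opcode, I, then V0..VF in hex.
   buf should hold at least 73 bytes. Returns the line length. */
int format_trace(char *buf, size_t size, uint16_t pc, uint16_t op,
                 uint16_t I, const uint8_t V[16]) {
    int n = snprintf(buf, size, "PC=%04X OP=%04X I=%04X V=",
                     (unsigned)pc, (unsigned)op, (unsigned)I);
    for (int i = 0; i < 16 && n > 0 && (size_t)n < size; i++)
        n += snprintf(buf + n, size - (size_t)n, "%02X%s",
                      (unsigned)V[i], i < 15 ? " " : "");
    return n;
}

/* Report bytes that changed during the last step. Prints to `out` if it
   is non-NULL, and returns the number of changed bytes. */
int diff_memory(const uint8_t *before, const uint8_t *after, size_t len, FILE *out) {
    int changes = 0;
    for (size_t a = 0; a < len; a++) {
        if (before[a] != after[a]) {
            if (out) fprintf(out, "  [%03zX] %02X -> %02X\n", a,
                             (unsigned)before[a], (unsigned)after[a]);
            changes++;
        }
    }
    return changes;
}
```

A line looks like `PC=0200 OP=00E0 I=022A V=AB 00 ... 01`. Diffing two runs' traces line by line points straight at the first diverging instruction.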

14. Deep project spec

Project contract

Build an emulator for a documented target. CHIP-8 is the default target; Game Boy or NES is an advanced target. The emulator must define CPU state, memory map, instruction decode, timers, input, display, halt/error behavior, and trace/debug mode.

Source-backed reading map

| Source ID | Use for | Required output |
|---|---|---|
| build-your-own/toy-computer | machine-state discipline, instruction traces, memory maps | emulator trace contract and debugger commands |

Milestone map

| Milestone | Deliverable | Tests | Failure case |
|---|---|---|---|
| Machine state | registers, PC, memory, timers | reset-state snapshot | invalid memory access |
| Decoder | opcode table | decode fixtures for every opcode | unknown opcode trap |
| Execution | arithmetic, jumps, memory ops | instruction-level unit tests | PC update bug regression |
| Display/input | framebuffer and key state | render/input smoke tests | key wait does not spin forever |
| Timers | delay/sound timers | tick-rate tests | timer drift note |
| ROM runner | load and execute ROM | known ROM screenshot/trace | unsupported opcode reported |
| Debugger | step, breakpoint, inspect | transcript fixture | breakpoint at invalid address |
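The Decoder row's "unknown opcode trap" can be as simple as a decode-only function that either names the instruction or refuses it. This sketch covers only the opcodes from Milestone 2. disasm is a hypothetical name, and anything outside that subset (including 0NNN SYS) is reported as unknown. The same function can double as the disassembly view for the debugger.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode `op` into a mnemonic in `out`. Returns the text length, or -1 for
   an unknown opcode (out then holds "UNKNOWN xxxx"). The caller traps on -1:
   halt and report PC and opcode. */
int disasm(uint16_t op, char *out, size_t size) {
    unsigned nnn = op & 0x0FFF, x = (op >> 8) & 0xF, y = (op >> 4) & 0xF;
    unsigned kk = op & 0xFF, n = op & 0xF;
    switch (op >> 12) {
    case 0x0:
        if (op == 0x00E0) return snprintf(out, size, "CLS");
        if (op == 0x00EE) return snprintf(out, size, "RET");
        break;
    case 0x1: return snprintf(out, size, "JP %03X", nnn);
    case 0x2: return snprintf(out, size, "CALL %03X", nnn);
    case 0x3: return snprintf(out, size, "SE V%X, %02X", x, kk);
    case 0x4: return snprintf(out, size, "SNE V%X, %02X", x, kk);
    case 0x6: return snprintf(out, size, "LD V%X, %02X", x, kk);
    case 0x7: return snprintf(out, size, "ADD V%X, %02X", x, kk);
    case 0xA: return snprintf(out, size, "LD I, %03X", nnn);
    case 0xD: return snprintf(out, size, "DRW V%X, V%X, %X", x, y, n);
    }
    snprintf(out, size, "UNKNOWN %04X", (unsigned)op);
    return -1;
}
```

Decode fixtures are then pairs such as (0xA22A, "LD I, 22A"), plus a few opcodes that must be rejected. 0x5121 is one: 5XY0 is the only valid 5XYN form.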

Test matrix

| Test type | Required examples |
|---|---|
| Unit | one fixture per opcode family |
| Golden | trace for a known small program |
| Integration | public test ROMs where legal/available |
| Visual | framebuffer snapshot or screenshot |
| Performance | cycles/tick rate and throttle behavior |

Design notes required

  • machine.md: registers, memory map, timers, display, input.
  • instruction-set.md: opcode table, operands, side effects, PC behavior.
  • timing.md: cycle model, timer rate, and simplifications.

Portfolio evidence

Publish one ROM trace, one screenshot, the opcode coverage table, debugger transcript, and a limitation note separating emulation correctness from cycle-perfect accuracy.


Source

This tutorial draws from the BYO-X catalog "Emulator / Virtual Machine" entry. Cowgod's CHIP-8 reference, Pan Docs for Game Boy, and Imran Nazar's JS GB tutorial are the canonical primary sources.