Build Your Own Interpreter (Tree-Walking)
"A programming language is a structured way to express computation. An interpreter is a program that takes such expression and produces the computation."
A tree-walking interpreter is the cleanest entry point into language implementation. You'll build a lexer, a parser, an AST, an evaluator, and a small but real programming language with variables, functions, closures, and control flow -- in roughly 1,500 lines of code.
Then in the Compiler tutorial, you'll graduate to bytecode and a virtual machine.
1. Overview & motivation
An interpreter is a pipeline:
source text -> [lexer] -> tokens -> [parser] -> AST -> [evaluator] -> result
You'll implement each stage. The reward is enormous: programming languages stop being magic. You will read other languages' source code with much more comfort.
What you can only learn by building one:
- Why operator precedence is a real implementation problem (Pratt parsing or recursive descent with binding powers).
- Why closures require careful environment design -- they are functions that capture their lexical scope.
- Why first-class functions simplify the language dramatically.
- Why tree-walking is too slow for real use, motivating the bytecode VM in the Compiler tutorial.
- Why most "scripting languages" share the same fundamental shape under the hood.
2. Where this fits in the degree
- Phase: Systems
- Semester: 4 (Systems Programming)
- Modules deepened: Module 1 (C/Go/Rust fundamentals) -- large project with multiple files. Module 5 (abstraction & interpretation) -- this is the module's project.
Cross-phase relevance:
- Foundation for the Compiler tutorial.
- Parsing/AST design carries over to the Web Browser Engine (HTML/CSS parsing) and Search Engine (query parsing).
- Pratt-style operator precedence shows up everywhere.
3. Prerequisites
- Comfort with a single language deeply (Go, Rust, C, OCaml, or Java).
- Recursive data structures: trees, especially.
- Basic familiarity with grammars (BNF/EBNF). Optional.
You do not need a compilers course. Both recommended tutorials are self-contained.
4. Theory & research
Required reading
- Robert Nystrom, Crafting Interpreters -- craftinginterpreters.com (free, also in print). Part II (Java tree-walking) is exactly this project. â the canonical text.
- Thorsten Ball, Writing An Interpreter In Go -- interpreterbook.com. The Go counterpart. Smaller, more concentrated. Read either.
Strongly recommended
- Aho, Sethi, Ullman, Compilers: Principles, Techniques, and Tools ("Dragon Book") -- Chapters 3 (lexical analysis), 4 (parsing). Reference depth, less readable than Nystrom/Ball.
- Pratt, "Top Down Operator Precedence" (1973) -- the original short paper on Pratt parsing. Lucid.
- Crockford, "Top Down Operator Precedence" (crockford.com/javascript/tdop/tdop.html) -- the modern JavaScript-flavored re-explanation.
Optional but illuminating
- Friedman & Wand, Essentials of Programming Languages -- the textbook for the "small interpreter for many language features" approach. Especially good on environments and closures.
- Steele & Sussman, "The Art of the Interpreter" (1978 MIT memo) -- historic. Beautiful.
5. Curated tutorial list (from BYO-X)
This corresponds to the "Programming Language" section of the BYO-X catalog (lower-effort, interpreter-focused entries -- full compilers are in the next tutorial):
- (any): mal - Make a Lisp -- github.com/kanaka/mal -- 80+ languages, 16 incremental steps. Excellent template.
- C: Build Your Own Lisp: Learn C and build your own programming language in 1000 lines of code -- Daniel Holden, buildyourownlisp.com
- Java: Crafting interpreters: A handbook for making programming languages -- Robert Nystrom â recommended primary
- JavaScript: The Super Tiny Interpreter -- the-super-tiny-interpreter
- JavaScript: Little Lisp interpreter -- Mary Rose Cook
- JavaScript: How to implement a programming language in JavaScript -- Mihai Bazon
- JavaScript: Let's go write a Lisp
- Python: A Python Interpreter Written in Python -- Allison Kaptur
- Python: lisp.py: Make your own Lisp interpreter
- Python: How to Write a Lisp Interpreter in Python -- Peter Norvig, lispy -- 90 lines. The single most-loved tiny interpreter on the internet.
- Python: Let's Build A Simple Interpreter -- Ruslan Spivak
- Python: Make Your Own Simple Interpreted Programming Language [video]
- Ruby: Markdown compiler from scratch in Ruby
- Swift: Building a LISP from scratch with Swift
6. Recommended primary path
Robert Nystrom, Crafting Interpreters, Part II (Java). Builds a complete tree-walking interpreter for "Lox" -- a small dynamic language with variables, functions, closures, classes, inheritance. About 1,500 lines of Java.
Read it linearly. Each chapter ends with a working interpreter that does more than the previous.
For a Go version with the same shape: Thorsten Ball, Writing An Interpreter In Go. Shorter language (no classes), and Ball makes Pratt parsing the centerpiece -- excellent treatment.
For 90 lines of Lisp: Norvig's lispy. Read it after the bigger book to appreciate the contrast.
For 10-20 hours of "do it again, your way": Make a Lisp (mal). 16 incremental steps, each a Git tag. Implement Lisp in your favorite language.
7. Implementation milestones
(Lox language, following Nystrom. Translate to your language of choice.)
Milestone 1: Lexer
Turn source text into a stream of tokens.
enum TokenType {
LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE,
COMMA, DOT, MINUS, PLUS, SEMICOLON, SLASH, STAR,
BANG, BANG_EQUAL, EQUAL, EQUAL_EQUAL,
GREATER, GREATER_EQUAL, LESS, LESS_EQUAL,
IDENTIFIER, STRING, NUMBER,
AND, CLASS, ELSE, FALSE, FUN, FOR, IF, NIL, OR,
PRINT, RETURN, SUPER, THIS, TRUE, VAR, WHILE,
EOF
}
class Token {
final TokenType type;
final String lexeme;
final Object literal;
final int line;
}
class Scanner {
void scanToken() {
char c = advance();
switch (c) {
case '(': addToken(LEFT_PAREN); break;
case ')': addToken(RIGHT_PAREN); break;
case '!': addToken(match('=') ? BANG_EQUAL : BANG); break;
// ...
default:
if (isDigit(c)) number();
else if (isAlpha(c)) identifier();
else error("Unexpected character.");
}
}
}
Evidence: Tokenize var x = 1 + 2; into the expected 7 tokens.
Milestone 2: Parser (recursive descent)
Build an AST from tokens. Use recursive descent with one function per grammar rule.
expression -> equality ;
equality -> comparison ( ( "!=" | "==" ) comparison )* ;
comparison -> term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term -> factor ( ( "-" | "+" ) factor )* ;
factor -> unary ( ( "/" | "*" ) unary )* ;
unary -> ( "!" | "-" ) unary | primary ;
primary -> NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" ;
private Expr expression() { return equality(); }
private Expr equality() {
Expr expr = comparison();
while (match(BANG_EQUAL, EQUAL_EQUAL)) {
Token op = previous();
Expr right = comparison();
expr = new Binary(expr, op, right);
}
return expr;
}
Evidence: Parse 1 + 2 * 3 and (1 + 2) * 3 correctly -- operator precedence works.
Milestone 3: Tree-walking evaluator
Visit the AST. Compute results.
class Interpreter implements Expr.Visitor<Object> {
public Object visitBinaryExpr(Binary expr) {
Object left = evaluate(expr.left);
Object right = evaluate(expr.right);
switch (expr.operator.type) {
case MINUS: return (double)left - (double)right;
case PLUS:
if (left instanceof Double) return (double)left + (double)right;
if (left instanceof String) return (String)left + (String)right;
throw new RuntimeError(...);
case SLASH: return (double)left / (double)right;
case STAR: return (double)left * (double)right;
// ...
}
}
}
Evidence: Evaluate (3 + 4) * (5 - 1) -> 28.
Milestone 4: Statements, variables, assignment
Add statements (print, expression statement, variable declaration). Add an environment for variables.
class Environment {
private final Map<String, Object> values = new HashMap<>();
private final Environment enclosing;
Object get(Token name) {
if (values.containsKey(name.lexeme)) return values.get(name.lexeme);
if (enclosing != null) return enclosing.get(name);
throw new RuntimeError(name, "Undefined variable '" + name.lexeme + "'.");
}
}
Evidence: Run var x = 5; print x; x = x + 1; print x; -> 5\n6\n.
Milestone 5: Control flow
if/else, while, for (desugared to while). Logical and/or with short-circuit evaluation.
Evidence: Run a FizzBuzz program.
Milestone 6: Functions and closures
Functions are first-class. They capture their enclosing environment, becoming closures.
class LoxFunction implements LoxCallable {
private final Stmt.Function declaration;
private final Environment closure;
public Object call(Interpreter interpreter, List<Object> arguments) {
Environment env = new Environment(closure);
for (int i = 0; i < declaration.params.size(); i++)
env.define(declaration.params.get(i).lexeme, arguments.get(i));
try {
interpreter.executeBlock(declaration.body, env);
} catch (Return r) {
return r.value;
}
return null;
}
}
Evidence: Run the classic counter-closure:
fun makeCounter() {
var count = 0;
fun increment() { count = count + 1; return count; }
return increment;
}
var c = makeCounter();
print c(); print c(); print c(); // 1 2 3
Milestone 7 (optional): Classes
Inheritance, methods, this. Nystrom covers this; Ball does not. Skip if you'd rather move on to the Compiler tutorial.
8. Tests & evidence
| Test | How |
|---|---|
| Lexer | Each token type matched correctly; line numbers preserved on errors. |
| Operator precedence | 1 + 2 * 3 = 7, not 9. |
| Associativity | 1 - 2 - 3 = -4 (left), not 2. |
| Variable scope | Variables defined in a block do not leak. |
| Closure capture | makeCounter test (above). |
| Recursion | fun fib(n) { ... } computes fib(10) = 55. |
| Error reporting | Syntax errors show line number and an explanation, not just "SyntaxError". |
| Runtime errors | Divide by zero, undefined variable, calling non-callable. |
| Performance | The infamous Fibonacci benchmark. Tree-walking will be 10-100x slower than a real VM. Document this -- it motivates the compiler. |
9. Common pitfalls
- Treating the parser as one big function. Each grammar rule should be one function. Otherwise you fight precedence forever.
- Mixing parsing and evaluation. Parser produces AST; evaluator visits AST. Don't fold them.
- Closures without scope chains. A closure captures the environment, not the value. Walking back through environments is what makes the counter example work.
- String concatenation conflated with addition. Decide your semantics. Either you support both with type checks, or only one of them.
- Forgetting
return. Use exceptions or sentinel values to unwind out of function bodies. Don't return from inside the visitor pattern naively. - Re-evaluating expressions you've already evaluated. Especially in
or/and-- these need short-circuit semantics. - No environment chain. Without nested environments, blocks share variables. Nesting is mandatory.
10. Extensions
- Static analysis pass. Resolver for variable bindings. Nystrom covers this in Chapter 11.
- Lambdas / arrow functions. Anonymous function syntax. Trivial extension once functions work.
- Pattern matching. Bigger language change. Look at OCaml or Rust pattern syntax.
- Tail-call optimization. Required for recursive functional style.
- Macros. Lisp-style.
malcovers macros in step 8. - Run a real benchmark. Compare your interpreter to Python on a small numeric workload.
The natural next step is the Compiler tutorial -- Nystrom's Part III takes the same Lox language and compiles it to bytecode for 10-100x speedup.
11. Module integration
| Module | What the interpreter deepens |
|---|---|
| Sem 4 Module 1 -- C/Go/Rust fundamentals | Largest single project in the systems phase; forces good code organization. |
| Sem 4 Module 5 -- Abstraction & interpretation | The whole module exists for this project. |
| Sem 7 Module 2 -- Architecture & modularity | Lexer/parser/AST/evaluator are textbook modular boundaries. |
| Compiler tutorial | Same source language, replace evaluator with bytecode VM. |
| Web Browser Engine | HTML and CSS need parsers. Same Pratt-parsing technique. |
| Search Engine | Query parsing reuses recursive descent. |
12. Portfolio framing
What to publish:
- Source organized as
src/lexer/,src/parser/,src/eval/,src/runtime/. - A
tests/folder with.loxfiles for each language feature. - A REPL --
./interpreterwith no args drops you into an interactive prompt. - A README with a "show, don't tell" demo: a Fibonacci function, a closure example, an inheritance example.
Reviewer entry points:
src/lexer/scanner.c-- tokenizer.src/parser/parser.c-- grammar rule per function.src/eval/interpreter.c-- the visitor pattern.tests/closure_test.lox-- the closure example shows the design works.- README: include a note on tree-walking performance and a pointer to your compiler project (if you do one).
13. Local source backbone
Use Build Your Own Programming Language (build-your-own/programming-language-jeffery) to widen the interpreter project from "tree walker" into a complete language-front-end study.
| Local chunks | Use them for | Add to this project |
|---|---|---|
004-008 | Why build a language, requirements, and language-design choices | Start with a language design memo: purpose, audience, data types, syntax style, and non-goals. |
009-023 | Tokens, lexical attributes, regex rules, scanner construction, and scanner testing | Expand Milestone 1 with a token-spec document and scanner error tests. |
024-032 | Context-free grammars, parser conflicts, syntax-error recovery | Add grammar-first parser work: BNF, invalid-program tests, and better parse errors. |
033-043 | Syntax trees, parse trees versus ASTs, tree printing, DOT visualization | Add AST visualization as required evidence. |
044-060 | Declarations, scopes, symbol tables, base types, arrays, calls, returns, instance access | Add a semantic-analysis layer before evaluation. Even a dynamic interpreter benefits from explicit scope and binding checks. |
067-073 | Syntax highlighting and IDE support | Optional extension: VS Code grammar and a tiny language-server diagnostic command. |
Extra checkpoints from the book chunks
- Language brief: define the smallest useful language and name at least three features intentionally excluded.
- Scanner contract: for every token class, include one valid example and one invalid near-miss.
- Parser recovery: show that one syntax error does not hide every later error in the file.
- Scope audit: include tests for shadowing, unbound names, duplicate declarations, and closure capture.
14. Deep project spec
Project contract
Build a small interpreted language with lexical scoping, expressions, statements, variables, functions, closures, and a deterministic error model. The language does not need classes, modules, or a standard library on the first pass. It does need a written grammar, AST shape, resolver or symbol-table story, and REPL or file runner.
Source-backed reading map
| Source ID | Use for | Required output |
|---|---|---|
build-your-own/programming-language-jeffery | scanner, parser, grammar, AST, semantic analysis, runtime architecture | grammar note, AST dump, resolver tests |
build-your-own/programming-language-affan | lexer automata, DFA/NFA thinking, tokenization edge cases | tokenizer fixture suite and invalid-token cases |
Milestone map
| Milestone | Deliverable | Tests | Failure case |
|---|---|---|---|
| Lexer | token stream with source spans | golden token snapshots | unknown character, unterminated string |
| Parser | AST for expressions and statements | AST snapshots for precedence and grouping | malformed expression recovery |
| Evaluator | arithmetic, booleans, variables | interpreter output fixtures | type error with line number |
| Scope resolver | lexical scope and closures | shadowing and closure fixtures | undefined variable before runtime |
| Functions | call frames and return values | recursion and higher-order function tests | arity mismatch |
| REPL/file runner | usable CLI | command-level smoke tests | syntax error does not crash process |
Test matrix
| Test type | Required examples |
|---|---|
| Golden | tokens, AST, and program output fixtures |
| Semantic | undefined names, duplicate declarations, invalid assignment |
| Runtime | type errors, arity errors, recursion depth behavior |
| Differential | compare compiled examples against a simple reference evaluator where possible |
| Property | parser never panics on random token streams |
Design notes required
grammar.md: grammar, precedence table, and rejected syntax.runtime.md: value representation, environment chain, function/closure model.errors.md: syntax, semantic, and runtime error categories.
Portfolio evidence
Publish the grammar, three small programs, token/AST dumps for one program, resolver tests, and a short trace showing one function call entering and leaving scope.
Source
This tutorial draws from the BYO-X catalog "Programming Language" section. Crafting Interpreters is the canonical reference; Writing An Interpreter In Go is the alternative for Go fans.