Build Your Own Interpreter (Tree-Walking)

"A programming language is a structured way to express computation. An interpreter is a program that takes such expression and produces the computation."

A tree-walking interpreter is the cleanest entry point into language implementation. You'll build a lexer, a parser, an AST, an evaluator, and a small but real programming language with variables, functions, closures, and control flow -- in roughly 1,500 lines of code.

Then in the Compiler tutorial, you'll graduate to bytecode and a virtual machine.

1. Overview & motivation

An interpreter is a pipeline:

source text -> [lexer] -> tokens -> [parser] -> AST -> [evaluator] -> result

You'll implement each stage. The reward is enormous: programming languages stop being magic. You will read other languages' source code with much more comfort.

What you can only learn by building one:

Why operator precedence is a real implementation problem (Pratt parsing or recursive descent with binding powers).
Why closures require careful environment design -- they are functions that capture their lexical scope.
Why first-class functions simplify the language dramatically.
Why tree-walking is too slow for real use, motivating the bytecode VM in the Compiler tutorial.
Why most "scripting languages" share the same fundamental shape under the hood.

2. Where this fits in the degree

Phase: Systems
Semester: 4 (Systems Programming)
Modules deepened: Module 1 (C/Go/Rust fundamentals) -- large project with multiple files. Module 5 (abstraction & interpretation) -- this is the module's project.

Cross-phase relevance:

Foundation for the Compiler tutorial.
Parsing/AST design carries over to the Web Browser Engine (HTML/CSS parsing) and Search Engine (query parsing).
Pratt-style operator precedence shows up everywhere.

3. Prerequisites

Comfort with a single language deeply (Go, Rust, C, OCaml, or Java).
Recursive data structures: trees, especially.
Basic familiarity with grammars (BNF/EBNF). Optional.

You do not need a compilers course. Both recommended tutorials are self-contained.

4. Theory & research

Required reading

Robert Nystrom, Crafting Interpreters -- craftinginterpreters.com (free, also in print). Part II (Java tree-walking) is exactly this project. â the canonical text.
Thorsten Ball, Writing An Interpreter In Go -- interpreterbook.com. The Go counterpart. Smaller, more concentrated. Read either.

Strongly recommended

Aho, Sethi, Ullman, Compilers: Principles, Techniques, and Tools ("Dragon Book") -- Chapters 3 (lexical analysis), 4 (parsing). Reference depth, less readable than Nystrom/Ball.
Pratt, "Top Down Operator Precedence" (1973) -- the original short paper on Pratt parsing. Lucid.
Crockford, "Top Down Operator Precedence" (crockford.com/javascript/tdop/tdop.html) -- the modern JavaScript-flavored re-explanation.

Optional but illuminating

Friedman & Wand, Essentials of Programming Languages -- the textbook for the "small interpreter for many language features" approach. Especially good on environments and closures.
Steele & Sussman, "The Art of the Interpreter" (1978 MIT memo) -- historic. Beautiful.

5. Curated tutorial list (from BYO-X)

This corresponds to the "Programming Language" section of the BYO-X catalog (lower-effort, interpreter-focused entries -- full compilers are in the next tutorial):

(any): mal - Make a Lisp -- github.com/kanaka/mal -- 80+ languages, 16 incremental steps. Excellent template.
C: Build Your Own Lisp: Learn C and build your own programming language in 1000 lines of code -- Daniel Holden, buildyourownlisp.com
Java: Crafting interpreters: A handbook for making programming languages -- Robert Nystrom â recommended primary
JavaScript: The Super Tiny Interpreter -- the-super-tiny-interpreter
JavaScript: Little Lisp interpreter -- Mary Rose Cook
JavaScript: How to implement a programming language in JavaScript -- Mihai Bazon
JavaScript: Let's go write a Lisp
Python: A Python Interpreter Written in Python -- Allison Kaptur
Python: lisp.py: Make your own Lisp interpreter
Python: How to Write a Lisp Interpreter in Python -- Peter Norvig, lispy -- 90 lines. The single most-loved tiny interpreter on the internet.
Python: Let's Build A Simple Interpreter -- Ruslan Spivak
Python: Make Your Own Simple Interpreted Programming Language [video]
Ruby: Markdown compiler from scratch in Ruby
Swift: Building a LISP from scratch with Swift

6. Recommended primary path

Robert Nystrom, Crafting Interpreters, Part II (Java). Builds a complete tree-walking interpreter for "Lox" -- a small dynamic language with variables, functions, closures, classes, inheritance. About 1,500 lines of Java.

Read it linearly. Each chapter ends with a working interpreter that does more than the previous.

For a Go version with the same shape: Thorsten Ball, Writing An Interpreter In Go. Shorter language (no classes), and Ball makes Pratt parsing the centerpiece -- excellent treatment.

For 90 lines of Lisp: Norvig's lispy. Read it after the bigger book to appreciate the contrast.

For 10-20 hours of "do it again, your way": Make a Lisp (mal). 16 incremental steps, each a Git tag. Implement Lisp in your favorite language.

7. Implementation milestones

(Lox language, following Nystrom. Translate to your language of choice.)

Milestone 1: Lexer

Turn source text into a stream of tokens.

enum TokenType {
    LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE,
    COMMA, DOT, MINUS, PLUS, SEMICOLON, SLASH, STAR,
    BANG, BANG_EQUAL, EQUAL, EQUAL_EQUAL,
    GREATER, GREATER_EQUAL, LESS, LESS_EQUAL,
    IDENTIFIER, STRING, NUMBER,
    AND, CLASS, ELSE, FALSE, FUN, FOR, IF, NIL, OR,
    PRINT, RETURN, SUPER, THIS, TRUE, VAR, WHILE,
    EOF
}

class Token {
    final TokenType type;
    final String lexeme;
    final Object literal;
    final int line;
}

class Scanner {
    void scanToken() {
        char c = advance();
        switch (c) {
            case '(': addToken(LEFT_PAREN); break;
            case ')': addToken(RIGHT_PAREN); break;
            case '!': addToken(match('=') ? BANG_EQUAL : BANG); break;
            // ...
            default:
                if (isDigit(c)) number();
                else if (isAlpha(c)) identifier();
                else error("Unexpected character.");
        }
    }
}

Evidence: Tokenize var x = 1 + 2; into the expected 7 tokens.

Milestone 2: Parser (recursive descent)

Build an AST from tokens. Use recursive descent with one function per grammar rule.

expression -> equality ;
equality   -> comparison ( ( "!=" | "==" ) comparison )* ;
comparison -> term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term       -> factor ( ( "-" | "+" ) factor )* ;
factor     -> unary ( ( "/" | "*" ) unary )* ;
unary      -> ( "!" | "-" ) unary | primary ;
primary    -> NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" ;

private Expr expression() { return equality(); }
private Expr equality() {
    Expr expr = comparison();
    while (match(BANG_EQUAL, EQUAL_EQUAL)) {
        Token op = previous();
        Expr right = comparison();
        expr = new Binary(expr, op, right);
    }
    return expr;
}

Evidence: Parse 1 + 2 * 3 and (1 + 2) * 3 correctly -- operator precedence works.

Milestone 3: Tree-walking evaluator

Visit the AST. Compute results.

class Interpreter implements Expr.Visitor<Object> {
    public Object visitBinaryExpr(Binary expr) {
        Object left = evaluate(expr.left);
        Object right = evaluate(expr.right);
        switch (expr.operator.type) {
            case MINUS: return (double)left - (double)right;
            case PLUS: 
                if (left instanceof Double) return (double)left + (double)right;
                if (left instanceof String) return (String)left + (String)right;
                throw new RuntimeError(...);
            case SLASH: return (double)left / (double)right;
            case STAR:  return (double)left * (double)right;
            // ...
        }
    }
}

Evidence: Evaluate (3 + 4) * (5 - 1) -> 28.

Milestone 4: Statements, variables, assignment

Add statements (print, expression statement, variable declaration). Add an environment for variables.

class Environment {
    private final Map<String, Object> values = new HashMap<>();
    private final Environment enclosing;
    
    Object get(Token name) {
        if (values.containsKey(name.lexeme)) return values.get(name.lexeme);
        if (enclosing != null) return enclosing.get(name);
        throw new RuntimeError(name, "Undefined variable '" + name.lexeme + "'.");
    }
}

Evidence: Run var x = 5; print x; x = x + 1; print x; -> 5\n6\n.

Milestone 5: Control flow

if/else, while, for (desugared to while). Logical and/or with short-circuit evaluation.

Evidence: Run a FizzBuzz program.

Milestone 6: Functions and closures

Functions are first-class. They capture their enclosing environment, becoming closures.

class LoxFunction implements LoxCallable {
    private final Stmt.Function declaration;
    private final Environment closure;
    
    public Object call(Interpreter interpreter, List<Object> arguments) {
        Environment env = new Environment(closure);
        for (int i = 0; i < declaration.params.size(); i++)
            env.define(declaration.params.get(i).lexeme, arguments.get(i));
        try {
            interpreter.executeBlock(declaration.body, env);
        } catch (Return r) {
            return r.value;
        }
        return null;
    }
}

Evidence: Run the classic counter-closure:

fun makeCounter() {
    var count = 0;
    fun increment() { count = count + 1; return count; }
    return increment;
}
var c = makeCounter();
print c(); print c(); print c();  // 1 2 3

Milestone 7 (optional): Classes

Inheritance, methods, this. Nystrom covers this; Ball does not. Skip if you'd rather move on to the Compiler tutorial.

8. Tests & evidence

Test	How
Lexer	Each token type matched correctly; line numbers preserved on errors.
Operator precedence	`1 + 2 * 3` = 7, not 9.
Associativity	`1 - 2 - 3` = -4 (left), not 2.
Variable scope	Variables defined in a block do not leak.
Closure capture	makeCounter test (above).
Recursion	`fun fib(n) { ... }` computes `fib(10) = 55`.
Error reporting	Syntax errors show line number and an explanation, not just "SyntaxError".
Runtime errors	Divide by zero, undefined variable, calling non-callable.
Performance	The infamous Fibonacci benchmark. Tree-walking will be 10-100x slower than a real VM. Document this -- it motivates the compiler.

9. Common pitfalls

Treating the parser as one big function. Each grammar rule should be one function. Otherwise you fight precedence forever.
Mixing parsing and evaluation. Parser produces AST; evaluator visits AST. Don't fold them.
Closures without scope chains. A closure captures the environment, not the value. Walking back through environments is what makes the counter example work.
String concatenation conflated with addition. Decide your semantics. Either you support both with type checks, or only one of them.
Forgetting return. Use exceptions or sentinel values to unwind out of function bodies. Don't return from inside the visitor pattern naively.
Re-evaluating expressions you've already evaluated. Especially in or/and -- these need short-circuit semantics.
No environment chain. Without nested environments, blocks share variables. Nesting is mandatory.

10. Extensions

Static analysis pass. Resolver for variable bindings. Nystrom covers this in Chapter 11.
Lambdas / arrow functions. Anonymous function syntax. Trivial extension once functions work.
Pattern matching. Bigger language change. Look at OCaml or Rust pattern syntax.
Tail-call optimization. Required for recursive functional style.
Macros. Lisp-style. mal covers macros in step 8.
Run a real benchmark. Compare your interpreter to Python on a small numeric workload.

The natural next step is the Compiler tutorial -- Nystrom's Part III takes the same Lox language and compiles it to bytecode for 10-100x speedup.

11. Module integration

Module	What the interpreter deepens
Sem 4 Module 1 -- C/Go/Rust fundamentals	Largest single project in the systems phase; forces good code organization.
Sem 4 Module 5 -- Abstraction & interpretation	The whole module exists for this project.
Sem 7 Module 2 -- Architecture & modularity	Lexer/parser/AST/evaluator are textbook modular boundaries.
Compiler tutorial	Same source language, replace evaluator with bytecode VM.
Web Browser Engine	HTML and CSS need parsers. Same Pratt-parsing technique.
Search Engine	Query parsing reuses recursive descent.

12. Portfolio framing

What to publish:

Source organized as src/lexer/, src/parser/, src/eval/, src/runtime/.
A tests/ folder with .lox files for each language feature.
A REPL -- ./interpreter with no args drops you into an interactive prompt.
A README with a "show, don't tell" demo: a Fibonacci function, a closure example, an inheritance example.

Reviewer entry points:

src/lexer/scanner.c -- tokenizer.
src/parser/parser.c -- grammar rule per function.
src/eval/interpreter.c -- the visitor pattern.
tests/closure_test.lox -- the closure example shows the design works.
README: include a note on tree-walking performance and a pointer to your compiler project (if you do one).

13. Local source backbone

Use Build Your Own Programming Language (build-your-own/programming-language-jeffery) to widen the interpreter project from "tree walker" into a complete language-front-end study.

Local chunks	Use them for	Add to this project
`004`-`008`	Why build a language, requirements, and language-design choices	Start with a language design memo: purpose, audience, data types, syntax style, and non-goals.
`009`-`023`	Tokens, lexical attributes, regex rules, scanner construction, and scanner testing	Expand Milestone 1 with a token-spec document and scanner error tests.
`024`-`032`	Context-free grammars, parser conflicts, syntax-error recovery	Add grammar-first parser work: BNF, invalid-program tests, and better parse errors.
`033`-`043`	Syntax trees, parse trees versus ASTs, tree printing, DOT visualization	Add AST visualization as required evidence.
`044`-`060`	Declarations, scopes, symbol tables, base types, arrays, calls, returns, instance access	Add a semantic-analysis layer before evaluation. Even a dynamic interpreter benefits from explicit scope and binding checks.
`067`-`073`	Syntax highlighting and IDE support	Optional extension: VS Code grammar and a tiny language-server diagnostic command.

Extra checkpoints from the book chunks

Language brief: define the smallest useful language and name at least three features intentionally excluded.
Scanner contract: for every token class, include one valid example and one invalid near-miss.
Parser recovery: show that one syntax error does not hide every later error in the file.
Scope audit: include tests for shadowing, unbound names, duplicate declarations, and closure capture.

14. Deep project spec

Project contract

Build a small interpreted language with lexical scoping, expressions, statements, variables, functions, closures, and a deterministic error model. The language does not need classes, modules, or a standard library on the first pass. It does need a written grammar, AST shape, resolver or symbol-table story, and REPL or file runner.

Source-backed reading map

Source ID	Use for	Required output
`build-your-own/programming-language-jeffery`	scanner, parser, grammar, AST, semantic analysis, runtime architecture	grammar note, AST dump, resolver tests
`build-your-own/programming-language-affan`	lexer automata, DFA/NFA thinking, tokenization edge cases	tokenizer fixture suite and invalid-token cases

Milestone map

Milestone	Deliverable	Tests	Failure case
Lexer	token stream with source spans	golden token snapshots	unknown character, unterminated string
Parser	AST for expressions and statements	AST snapshots for precedence and grouping	malformed expression recovery
Evaluator	arithmetic, booleans, variables	interpreter output fixtures	type error with line number
Scope resolver	lexical scope and closures	shadowing and closure fixtures	undefined variable before runtime
Functions	call frames and return values	recursion and higher-order function tests	arity mismatch
REPL/file runner	usable CLI	command-level smoke tests	syntax error does not crash process

Test matrix

Test type	Required examples
Golden	tokens, AST, and program output fixtures
Semantic	undefined names, duplicate declarations, invalid assignment
Runtime	type errors, arity errors, recursion depth behavior
Differential	compare compiled examples against a simple reference evaluator where possible
Property	parser never panics on random token streams

Design notes required

grammar.md: grammar, precedence table, and rejected syntax.
runtime.md: value representation, environment chain, function/closure model.
errors.md: syntax, semantic, and runtime error categories.

Portfolio evidence

Publish the grammar, three small programs, token/AST dumps for one program, resolver tests, and a short trace showing one function call entering and leaving scope.

Source

This tutorial draws from the BYO-X catalog "Programming Language" section. Crafting Interpreters is the canonical reference; Writing An Interpreter In Go is the alternative for Go fans.

1. Overview & motivation​

2. Where this fits in the degree​

3. Prerequisites​

4. Theory & research​

Required reading​

Strongly recommended​

Optional but illuminating​

5. Curated tutorial list (from BYO-X)​

6. Recommended primary path​

7. Implementation milestones​

Milestone 1: Lexer​

Milestone 2: Parser (recursive descent)​

Milestone 3: Tree-walking evaluator​

Milestone 4: Statements, variables, assignment​

Milestone 5: Control flow​

Milestone 6: Functions and closures​

Milestone 7 (optional): Classes​

8. Tests & evidence​

9. Common pitfalls​

10. Extensions​

11. Module integration​

12. Portfolio framing​

13. Local source backbone​

Extra checkpoints from the book chunks​

14. Deep project spec​

Project contract​

Source-backed reading map​

Milestone map​

Test matrix​

Design notes required​

Portfolio evidence​

Source​