Skip to main content

Query Execution Models: Volcano / Iterator, Vectorized, JIT

What This Concept Is

Once the planner has produced a tree of relational operators (scan, filter, join, aggregate, sort), something has to actually run that tree. There are three canonical execution models, and modern engines typically use a mix.

Volcano / Iterator (tuple-at-a-time). Every operator exposes open(), next(), close(). next() returns one tuple. A query executes by the root calling next() which calls its children's next(), pulling one tuple at a time through the pipeline. Proposed by Graefe in the late 80s; still the backbone of PostgreSQL.

  • Pro: simple, composable, handles arbitrary operators and user-defined functions.
  • Con: one virtual function call per operator per tuple. Across a deep plan on millions of rows, that is billions of calls. Poor cache locality because each operator's state is visited once per tuple and rotated.

Vectorized (batch-at-a-time). Operators still pull, but each next() returns a vector of tuples (typically ~1024 values). Inside an operator, tight loops process the whole vector at once -- SIMD-friendly, branch-predictable, cache-resident. Pioneered in MonetDB/X100 and now standard in DuckDB, ClickHouse, Vectorwise, Apache Arrow.

  • Pro: one virtual call per vector, not per tuple -- 1000x less dispatch overhead. Vector loops vectorize in the compiler. Cache-friendly.
  • Con: larger memory footprint per operator; some operators (control flow, UDFs) don't fit the vector model cleanly.

JIT / code-generated (whole-pipeline). The engine compiles the plan into native code at runtime, typically via LLVM. The generated code fuses multiple operators into one tight loop with no virtual dispatch at all. Neumann's HyPer popularized this; Spark SQL, Impala, Postgres (since 11, for expressions), and Snowflake use variants.

  • Pro: the fastest execution for long-running queries because the inner loop looks like handwritten C.
  • Con: compile time is non-zero -- typically 10-100 ms. Bad for very short queries. Harder to debug.

Why It Matters Here

"The optimizer picked a plan" is only half the story. How that plan runs explains why a warehouse on the same plan can be 10x or 100x faster than an OLTP engine. The three models correspond roughly to:

  • OLTP engines: volcano, because queries are small and the dispatch overhead is tolerable
  • columnar analytical engines: vectorized, because scan volume is the bottleneck
  • long analytical queries: JIT, because the amortized compile cost is tiny next to minutes of runtime

Reading an EXPLAIN ANALYZE from a different engine without knowing which model it uses leads to false conclusions about the plan.

Concrete Example -- Iterator Chain

Query: SELECT AVG(amount) FROM orders WHERE status = 'SHIPPED'.

Volcano pipeline: Aggregate(Filter(SeqScan(orders))).

Aggregate.next():
loop:
t = Filter.next()
if t is None: return final avg
acc += t.amount; count += 1

Filter.next():
loop:
t = SeqScan.next()
if t is None: return None
if t.status == 'SHIPPED': return t

SeqScan.next():
if buffer empty: read next page
return next tuple from buffer

For 10^7 rows, this is 10^7 calls to Filter.next(), each branching on status, each paying virtual-function overhead. In a vectorized version, SeqScan.next() returns 1024 tuples; Filter applies the predicate to the whole vector, generating a bitmap of matches; Aggregate sums the matching lanes. Roughly 10,000 vectors instead of 10^7 tuples -- and the inner loops vectorize.

Concrete Example -- JIT

The same query compiled to native code effectively becomes:

double sum = 0; long count = 0;
for (Page* p = first_page; p != NULL; p = p->next) {
for (Tuple* t = page_begin(p); t < page_end(p); ++t) {
if (t->status == SHIPPED_CODE) {
sum += t->amount;
count += 1;
}
}
}
return sum / count;

No virtual dispatch, no interpretation, no materialization of intermediate vectors. The inner loop is a handful of machine instructions per tuple. For queries over billions of rows, this is where real performance lives.

Common Confusion / Misconception

"Vectorized means SIMD." SIMD is a common payoff, not a definition. Vectorized execution means operators process fixed-size batches. The compiler often vectorizes the inner loops with SIMD, but even without SIMD the dispatch savings alone are enormous.

"JIT always wins." Not for short queries. A 0.1 ms point lookup doesn't want to wait 20 ms to compile. Engines like Postgres pick a threshold by cost estimate before deciding to JIT-compile expressions.

How To Use It

When you see execution-time numbers:

  1. Ask which model the engine uses, especially the inner loop structure.
  2. In benchmarks, compare engines only under comparable workloads (pointy OLTP vs long analytical).
  3. For a plan that surprises you, look for materialization boundaries -- places where one model hands off to another.
  4. If a query is unexpectedly slow in a volcano engine, a JIT-enabled engine might change the picture; do not draw conclusions about plan shape from raw numbers alone.

Check Yourself

  1. Why does per-tuple virtual dispatch matter more on big scans than on small lookups?
  2. Why does vectorized execution play well with columnar storage?
  3. When would you not want to JIT-compile a plan even though the engine supports it?

Mini Drill or Application

For each scenario, name which execution model (volcano, vectorized, JIT) you would expect to dominate and why:

  1. OLTP primary-key lookups returning one row at 10^5 QPS.
  2. A 15-minute analytical query aggregating 10^10 rows.
  3. Short interactive dashboard queries over a warehouse-scale fact table.
  4. A UDF-heavy ETL pipeline with unpredictable control flow.
  5. A real-time stream processor evaluating simple predicates on 10^6 events/s.

Read This Only If Stuck