Streams, Laziness, and Infinite Data Structures
What This Concept Is
A stream (SICP §3.5) is a list-like data structure whose tail is not computed until it is asked for. In Scheme this is built with two primitives:
(delay expr)-- produce a promise wrappingexprunevaluated(force p)-- evaluate the promise, memoize the result, and return it
A stream is a pair where the head is a value and the tail is a delayed (and, once forced, memoized) stream. You build infinite sequences -- primes, Fibonacci numbers, Fourier coefficients -- as streams. You only pay for the prefix you actually use.
SICP's radical claim is that streams give you most of the modeling power of mutable state without the mutation. A bank account with history is a stream of (operation, balance) pairs. Feedback control is a stream feeding itself. Time becomes the index of a stream, not a hidden dimension of the environment.
Why It Matters Here
Streams belong to this cluster because they are the pure alternative to the story told in Concept 13. Chapter 3 of SICP first introduces set! and the costs of mutation, then, in §3.5, offers streams as a way to model the same "changing-over-time" phenomena in a functional world.
Practically:
- Infinite-sequence programming. Integers, primes, random numbers, Taylor series -- expressible as one-liners on streams; unnatural with eager lists.
- Generators in mainstream languages. Python generators, C#/JS iterators, Rust's
Iteratortrait are all streams by another name. - Pipeline composition. A pipeline of stream-to-stream transformers is the same mental model as Concept 06's map/filter/reduce -- but now it can operate on infinite data without consuming memory proportional to the full sequence.
- Performance. Lazy evaluation plus memoization often replaces a hand-rolled cache.
Concrete Example
Integers from 1 as a stream (SICP §3.5.2):
(define (integers-from n)
(cons-stream n (integers-from (+ n 1))))
(define integers (integers-from 1))
(stream-ref integers 0) ; 1
(stream-ref integers 1000) ; 1001
cons-stream is syntax sugar: (cons-stream a b) = (cons a (delay b)). The recursive call is wrapped in a delay, so integers-from does not spin forever.
Sieve of Eratosthenes on streams:
(define (sieve s)
(cons-stream
(stream-car s)
(sieve (stream-filter
(lambda (x) (not (divisible? x (stream-car s))))
(stream-cdr s)))))
(define primes (sieve (integers-from 2)))
(stream-ref primes 10) ; 31
Every time you ask for (stream-ref primes k), the evaluator forces exactly as much of the stream as needed. The filter chain looks like an infinite pipe; in memory, only a short prefix ever exists.
Python generators are streams in disguise:
def integers_from(n):
while True:
yield n
n += 1
def sieve(s):
p = next(s)
yield p
yield from sieve(x for x in s if x % p != 0)
Same algorithm; same shape.
Common Confusion / Misconception
- "A stream is a list of promises." Closer: a stream is a promise of a pair, or a pair whose cdr is a promise. The subtlety matters when debugging: only the tail is delayed.
- "Delay is free." Every
delayallocates a closure; forcing it memoizes. For small prefixes, the overhead is real. Streams win when laziness saves much more work than it costs. - "Streams replace mutation everywhere." They do not. Some problems -- GUIs, databases, caches -- are fundamentally about shared mutable state. Streams are a tool for modeling time functionally; they are not a universal replacement.
- Space leaks. A well-known hazard: holding a reference to the head of a stream while traversing keeps the prefix in memory. If you don't need it, don't keep it. Haskell programmers have libraries of anti-leak idioms.
- Infinite streams + side effects. If the tail-producing expression has side effects,
forcewill not re-run them after memoization. This is usually what you want; occasionally it is the bug.
How To Use It
- When a data flow is naturally potentially infinite (log streams, input events, generated sequences), reach for streams or generators before mutable buffers.
- When you catch yourself building "the first N values of an infinite sequence," that is the stream signal.
- Keep pure stream transformers (
map,filter,zip-with,scan) in a shared library. Compose them to express pipelines. - In languages without streams, prefer generators/iterators over eagerly materializing lists when the data size is unbounded or very large.
Check Yourself
- Why must the sieve's recursive reference to itself be wrapped in a delay?
- What is the difference between
delayand memoizeddelay? - Give one real system where replacing mutable state with a stream abstraction would reduce bugs.
Mini Drill or Application
In either Scheme (with cons-stream/stream-car/stream-cdr) or Python (with generators), implement:
- the stream of natural numbers
- the stream of Fibonacci numbers in one definition that references itself
- a stream that emits moving averages of window size 3 of any input stream
- use the moving-average stream on
integers-from 1to print the first 10 averages
In one paragraph, describe what would change if you had to express the same pipeline with eager lists, and why that would be unsatisfactory for unbounded input.