Function Composition, map, filter, reduce as Abstractions
What This Concept Is
Once procedures are first-class (Concept 04), a few shape-shifting operations on sequences show up everywhere:
map(f, xs)-- applyfto each elementfilter(p, xs)-- keep only elements satisfyingpreduce(op, init, xs)-- fold a sequence into one value usingopcompose(f, g)--(compose f g)(x) = f(g(x))
SICP §2.2.3 calls these sequences as conventional interfaces. The point is not that you save a few lines. The point is that each of these is an abstraction over a recurring loop, and once you have them, the shape of data becomes visible in the code instead of being buried in indexes.
They are SUPPORTING in this module because they are applications of Concepts 04 and 05 -- useful as a daily vocabulary, but the deep ideas are HOFs and closures.
Why It Matters Here
- Fewer accidental bugs. Off-by-one index errors vanish when the loop is gone.
- Composable pipelines.
reduce(+, 0, map(square, filter(odd?, xs)))is visibly what you wanted -- sum of squares of the odd ones. - Common vocabulary across teams and languages. Every modern language has these under slightly different names (
collect/select/inject,filter/where,reduce/fold/inject).
For this module, map/filter/reduce also show up inside the interpreter: map eval args env evaluates every argument in order, reduce is how environments thread state during sequential evaluation, and compose is a natural tool when chaining passes in a compiler (Concept 12).
Concrete Example
SICP sum of odd squares, directly built:
(define (sum-odd-squares tree)
(accumulate +
0
(map square
(filter odd?
(enumerate-tree tree)))))
Note the four-layer pipeline: enumerate -> filter -> map -> accumulate. Each layer is a HOF. Each layer works on any sequence; none of them know what tree is beyond "something you can walk."
Python:
from functools import reduce
def sum_odd_squares(xs):
return reduce(lambda a, b: a + b,
map(lambda n: n * n,
filter(lambda n: n % 2 == 1, xs)),
0)
C for contrast (writing the same loop by hand):
int sum_odd_squares(const int *xs, int n) {
int s = 0;
for (int i = 0; i < n; i++)
if (xs[i] % 2 == 1) s += xs[i] * xs[i];
return s;
}
The C version is perfectly reasonable -- but every new shape of pipeline rewrites the loop and re-invites an off-by-one. The Scheme/Python version changes only by swapping a layer.
Common Confusion / Misconception
- "Map is faster than a loop." Usually no -- in a general-purpose language, it is the same order of cost. The win is clarity and correctness, not cycles.
- Chained
maps that do two things. If yourmapcallback has anifin it that filters, you wantedfilter + map, not one fused function that returnsNone-or-value. - Reduce with the wrong identity.
reduce(*, xs)without an initial value of1blows up on an empty sequence. Always pass the identity. - Composing from the wrong direction.
(compose f g)(x) = f(g(x)); pipelines readpipe(f, g)(x) = g(f(x)). Pick a convention and stick with it.
A subtle one: map and filter in Python return lazy iterators. In Scheme map is eager on lists (but not on streams; see Concept 14). Mixing lazy and eager in one pipeline is a classic foot-gun -- a filter that silently consumes nothing because nobody iterated.
How To Use It
- Write the pipeline as a chain of single-responsibility steps. Name any non-trivial transformer.
- Keep the source sequence first in your reading:
reduce(+, 0, map(square, filter(odd?, xs)))reads inside-out, pipelines read left-to-right. - Reach for
reduceonly whenmap/filterare not enough -- many apparent reduces are really maps followed by asum/max/anyspecialized HOF. - When a single step becomes complex, factor it into a named function; do not write four-line lambdas.
Check Yourself
- Write the type (in pseudo-syntax) of
map,filter,reduce,compose. - Why is a filter-then-map pipeline preferable to a single map that returns
Nonefor rejected items? - In which of Scheme and Python does
mapproduce a list eagerly, and in which does it produce an iterator?
Mini Drill or Application
Given a list of user records (name, age, active), produce:
- a list of active users' names
- the sum of ages of active users
- the oldest active user's name
Do each as a pipeline of filter / map / reduce in both Python and Scheme. Then write a fourth variant using a single reduce that produces all three results in one pass. Compare which reads better.