Skip to main content

Function Composition, map, filter, reduce as Abstractions

What This Concept Is

Once procedures are first-class (Concept 04), a few shape-shifting operations on sequences show up everywhere:

  • map(f, xs) -- apply f to each element
  • filter(p, xs) -- keep only elements satisfying p
  • reduce(op, init, xs) -- fold a sequence into one value using op
  • compose(f, g) -- (compose f g)(x) = f(g(x))

SICP §2.2.3 calls these sequences as conventional interfaces. The point is not that you save a few lines. The point is that each of these is an abstraction over a recurring loop, and once you have them, the shape of data becomes visible in the code instead of being buried in indexes.

They are SUPPORTING in this module because they are applications of Concepts 04 and 05 -- useful as a daily vocabulary, but the deep ideas are HOFs and closures.

Why It Matters Here

  • Fewer accidental bugs. Off-by-one index errors vanish when the loop is gone.
  • Composable pipelines. reduce(+, 0, map(square, filter(odd?, xs))) is visibly what you wanted -- sum of squares of the odd ones.
  • Common vocabulary across teams and languages. Every modern language has these under slightly different names (collect/select/inject, filter/where, reduce/fold/inject).

For this module, map/filter/reduce also show up inside the interpreter: map eval args env evaluates every argument in order, reduce is how environments thread state during sequential evaluation, and compose is a natural tool when chaining passes in a compiler (Concept 12).

Concrete Example

SICP sum of odd squares, directly built:

(define (sum-odd-squares tree)
(accumulate +
0
(map square
(filter odd?
(enumerate-tree tree)))))

Note the four-layer pipeline: enumerate -> filter -> map -> accumulate. Each layer is a HOF. Each layer works on any sequence; none of them know what tree is beyond "something you can walk."

Python:

from functools import reduce

def sum_odd_squares(xs):
return reduce(lambda a, b: a + b,
map(lambda n: n * n,
filter(lambda n: n % 2 == 1, xs)),
0)

C for contrast (writing the same loop by hand):

int sum_odd_squares(const int *xs, int n) {
int s = 0;
for (int i = 0; i < n; i++)
if (xs[i] % 2 == 1) s += xs[i] * xs[i];
return s;
}

The C version is perfectly reasonable -- but every new shape of pipeline rewrites the loop and re-invites an off-by-one. The Scheme/Python version changes only by swapping a layer.

Common Confusion / Misconception

  1. "Map is faster than a loop." Usually no -- in a general-purpose language, it is the same order of cost. The win is clarity and correctness, not cycles.
  2. Chained maps that do two things. If your map callback has an if in it that filters, you wanted filter + map, not one fused function that returns None-or-value.
  3. Reduce with the wrong identity. reduce(*, xs) without an initial value of 1 blows up on an empty sequence. Always pass the identity.
  4. Composing from the wrong direction. (compose f g)(x) = f(g(x)); pipelines read pipe(f, g)(x) = g(f(x)). Pick a convention and stick with it.

A subtle one: map and filter in Python return lazy iterators. In Scheme map is eager on lists (but not on streams; see Concept 14). Mixing lazy and eager in one pipeline is a classic foot-gun -- a filter that silently consumes nothing because nobody iterated.

How To Use It

  1. Write the pipeline as a chain of single-responsibility steps. Name any non-trivial transformer.
  2. Keep the source sequence first in your reading: reduce(+, 0, map(square, filter(odd?, xs))) reads inside-out, pipelines read left-to-right.
  3. Reach for reduce only when map/filter are not enough -- many apparent reduces are really maps followed by a sum/max/any specialized HOF.
  4. When a single step becomes complex, factor it into a named function; do not write four-line lambdas.

Check Yourself

  1. Write the type (in pseudo-syntax) of map, filter, reduce, compose.
  2. Why is a filter-then-map pipeline preferable to a single map that returns None for rejected items?
  3. In which of Scheme and Python does map produce a list eagerly, and in which does it produce an iterator?

Mini Drill or Application

Given a list of user records (name, age, active), produce:

  • a list of active users' names
  • the sum of ages of active users
  • the oldest active user's name

Do each as a pipeline of filter / map / reduce in both Python and Scheme. Then write a fourth variant using a single reduce that produces all three results in one pass. Compare which reads better.

Read This Only If Stuck