Skip to main content

Relational Algebra: Selection, Projection, Join, Union

What This Concept Is

Relational algebra is a small closed algebra whose operators take relations and return relations. Six operators are enough to express every query you will ever write, and the SQL you will write in Cluster 2 is a thin skin over them.

Core operators:

  • Selection sigma_theta(R): the subset of tuples in R satisfying predicate theta. In SQL, WHERE.
  • Projection pi_{A,B}(R): keep only attributes A, B (and remove duplicate tuples). In SQL, SELECT DISTINCT col_list.
  • Cartesian product R x S: every tuple of R paired with every tuple of S. In SQL, FROM R, S.
  • Natural join R ⋈ S: sigma on shared attributes applied to R x S, then pi to deduplicate. In SQL, typically written as INNER JOIN ... ON.
  • Union / intersection / difference on union-compatible relations (same schema). In SQL, UNION / INTERSECT / EXCEPT.
  • Rename rho_S(R) and rho_{a -> b}(R): relabel the relation or an attribute. In SQL, AS.

Because every operator returns a relation, queries compose: the output of one is the input of another. This is why SQL lets you nest subqueries, build CTEs, and write FROM (SELECT ...) x.

Why It Matters Here

  • every SQL query has an equivalent relational-algebra expression; writing the algebra first is the fastest path to a correct query
  • SQL optimizers transform relational-algebra expressions (push selections, reorder joins) before they pick a physical plan; if you cannot read the algebra, you cannot read an EXPLAIN plan
  • performance intuitions ("filter before you join") are algebra rewrites in disguise
  • set operations (UNION, INTERSECT, EXCEPT) only make sense over union-compatible relations, which forces you to think in schemas

Concrete Example

Tables:

Employee(id, name, dept_id, salary)
Dept(dept_id, name)

English: "the names of all employees in the Engineering department who earn over 100k."

Algebra:

pi_{Employee.name} (
sigma_{Dept.name = 'Eng' AND salary > 100000} (
Employee ⋈ Dept
)
)

SQL:

SELECT e.name
FROM employee e
INNER JOIN dept d ON e.dept_id = d.dept_id
WHERE d.name = 'Eng' AND e.salary > 100000;

Both are the same query. A smart optimizer will push the filter on Dept.name = 'Eng' inside the join, because sigma_theta(R ⋈ S) == (sigma_theta R) ⋈ S when theta only mentions attributes of R. That rewrite is algebra, not syntax.

Common Confusion / Misconception

"SELECT in SQL is the same as projection." Not quite. SELECT without DISTINCT preserves duplicates (bag semantics). Projection pi removes them (set semantics). This is the same gap discussed in Concept 1.

"A join is a separate primitive." Conceptually, no. A natural join is sigma over a product followed by pi. Treating it that way explains why ON conditions and WHERE conditions are sometimes interchangeable on inner joins but not on outer joins (Concept 5).

"UNION of any two results works." No. Both sides must be union-compatible: same arity and compatible types in each position. SELECT a, b FROM R UNION SELECT c, d FROM S is valid only if the domains line up.

How To Use It

For any SQL query, before you run it:

  1. Name the relations you will read.
  2. Write a one-line algebra sketch: outermost operation first, nesting inward.
  3. Check that every operator's inputs are union-compatible where required, and that joins really do share a join attribute.
  4. Translate to SQL. If the translation feels hard, your algebra sketch is probably wrong, not your SQL.

Check Yourself

  1. Write pi and sigma in the algebra expression for "the customer ids that have placed at least one order in 2025."
  2. Explain why sigma_{A > 10}(pi_B(R)) is illegal but pi_B(sigma_{A > 10}(R)) is fine.
  3. Give a case where R ⋈ S returns zero tuples even though R and S each have many.

Mini Drill or Application

Translate each English question into (a) relational algebra and (b) SQL against the schema Customer(id, name, country), Order(id, customer_id, amount, placed_at), Product(id, name, price), OrderItem(order_id, product_id, qty):

  1. Names of customers in Germany.
  2. Order ids that include product 'Widget'.
  3. Customers who have never placed an order. (Hint: EXCEPT or anti-join.)
  4. Products that appear in every order. (Hint: division, expressible via NOT EXISTS.)

Read This Only If Stuck