Relational Algebra: Selection, Projection, Join, Union
What This Concept Is
Relational algebra is a small closed algebra whose operators take relations and return relations. Six operators are enough to express every query you will ever write, and the SQL you will write in Cluster 2 is a thin skin over them.
Core operators:
- Selection
sigma_theta(R): the subset of tuples inRsatisfying predicatetheta. In SQL,WHERE. - Projection
pi_{A,B}(R): keep only attributesA, B(and remove duplicate tuples). In SQL,SELECT DISTINCT col_list. - Cartesian product
R x S: every tuple ofRpaired with every tuple ofS. In SQL,FROM R, S. - Natural join
R ⋈ S:sigmaon shared attributes applied toR x S, thenpito deduplicate. In SQL, typically written asINNER JOIN ... ON. - Union / intersection / difference on union-compatible relations (same schema). In SQL,
UNION/INTERSECT/EXCEPT. - Rename
rho_S(R)andrho_{a -> b}(R): relabel the relation or an attribute. In SQL,AS.
Because every operator returns a relation, queries compose: the output of one is the input of another. This is why SQL lets you nest subqueries, build CTEs, and write FROM (SELECT ...) x.
Why It Matters Here
- every SQL query has an equivalent relational-algebra expression; writing the algebra first is the fastest path to a correct query
- SQL optimizers transform relational-algebra expressions (push selections, reorder joins) before they pick a physical plan; if you cannot read the algebra, you cannot read an
EXPLAINplan - performance intuitions ("filter before you join") are algebra rewrites in disguise
- set operations (
UNION,INTERSECT,EXCEPT) only make sense over union-compatible relations, which forces you to think in schemas
Concrete Example
Tables:
Employee(id, name, dept_id, salary)
Dept(dept_id, name)
English: "the names of all employees in the Engineering department who earn over 100k."
Algebra:
pi_{Employee.name} (
sigma_{Dept.name = 'Eng' AND salary > 100000} (
Employee ⋈ Dept
)
)
SQL:
SELECT e.name
FROM employee e
INNER JOIN dept d ON e.dept_id = d.dept_id
WHERE d.name = 'Eng' AND e.salary > 100000;
Both are the same query. A smart optimizer will push the filter on Dept.name = 'Eng' inside the join, because sigma_theta(R ⋈ S) == (sigma_theta R) ⋈ S when theta only mentions attributes of R. That rewrite is algebra, not syntax.
Common Confusion / Misconception
"SELECT in SQL is the same as projection." Not quite. SELECT without DISTINCT preserves duplicates (bag semantics). Projection pi removes them (set semantics). This is the same gap discussed in Concept 1.
"A join is a separate primitive." Conceptually, no. A natural join is sigma over a product followed by pi. Treating it that way explains why ON conditions and WHERE conditions are sometimes interchangeable on inner joins but not on outer joins (Concept 5).
"UNION of any two results works." No. Both sides must be union-compatible: same arity and compatible types in each position. SELECT a, b FROM R UNION SELECT c, d FROM S is valid only if the domains line up.
How To Use It
For any SQL query, before you run it:
- Name the relations you will read.
- Write a one-line algebra sketch: outermost operation first, nesting inward.
- Check that every operator's inputs are union-compatible where required, and that joins really do share a join attribute.
- Translate to SQL. If the translation feels hard, your algebra sketch is probably wrong, not your SQL.
Check Yourself
- Write
piandsigmain the algebra expression for "the customer ids that have placed at least one order in 2025." - Explain why
sigma_{A > 10}(pi_B(R))is illegal butpi_B(sigma_{A > 10}(R))is fine. - Give a case where
R ⋈ Sreturns zero tuples even thoughRandSeach have many.
Mini Drill or Application
Translate each English question into (a) relational algebra and (b) SQL against the schema Customer(id, name, country), Order(id, customer_id, amount, placed_at), Product(id, name, price), OrderItem(order_id, product_id, qty):
- Names of customers in Germany.
- Order ids that include product 'Widget'.
- Customers who have never placed an order. (Hint:
EXCEPTor anti-join.) - Products that appear in every order. (Hint: division, expressible via
NOT EXISTS.)