Skip to main content

JOINs and Set Operations: Inner, Outer, Semi, Anti

What This Concept Is

A join combines two relations into one. Five varieties cover almost everything:

  • Inner join R INNER JOIN S ON p: keep pairs (r, s) where p is true. Algebra: sigma_p(R x S).
  • Left outer join R LEFT JOIN S ON p: inner-join result, plus every tuple of R that matched nothing, extended with NULLs on the S side.
  • Right outer join R RIGHT JOIN S ON p: symmetric; rarely used because you can always rewrite as left-join with swapped sides.
  • Full outer join R FULL JOIN S ON p: both extensions.
  • Cross join R CROSS JOIN S: the full Cartesian product. Usually a mistake.
  • Semi-join "tuples of R that match at least one tuple of S": written as WHERE EXISTS (SELECT 1 FROM S WHERE p) or WHERE r.x IN (SELECT s.x FROM s).
  • Anti-join "tuples of R with no match in S": WHERE NOT EXISTS (...) or LEFT JOIN ... WHERE s.key IS NULL.

Set operations require union-compatibility (same arity, compatible types):

  • UNION / UNION ALL: set / multiset union.
  • INTERSECT: tuples in both.
  • EXCEPT (SQL standard, PostgreSQL) / MINUS (Oracle): tuples in the first but not the second.

Why It Matters Here

  • joins are where query correctness most often fails silently: an inner join drops rows you wanted; an outer join keeps NULLs you did not expect
  • "the result has more rows than I expected" almost always means your join key is not unique on the side you thought it was
  • semi/anti joins are the idiomatic way to answer "has / has not" questions without multiplying the row count
  • set operators express cleanly what JOIN would make awkward (e.g., "customers who bought X but not Y")

Concrete Example

Schema:

CREATE TABLE customer(id INT PRIMARY KEY, name TEXT);
CREATE TABLE orders(id INT PRIMARY KEY, customer_id INT REFERENCES customer(id), total NUMERIC);

INSERT INTO customer VALUES (1,'Alice'), (2,'Bob'), (3,'Carol');
INSERT INTO orders VALUES (10,1,50), (11,1,30), (12,2,70);

Inner join (who has ordered):

SELECT c.name, o.total
FROM customer c INNER JOIN orders o ON o.customer_id = c.id
ORDER BY c.name, o.total;
-- name | total
-- -------+-------
-- Alice | 30
-- Alice | 50
-- Bob | 70

Left outer join (every customer, even those who have not ordered):

SELECT c.name, o.total
FROM customer c LEFT JOIN orders o ON o.customer_id = c.id
ORDER BY c.name;
-- name | total
-- -------+-------
-- Alice | 50
-- Alice | 30
-- Bob | 70
-- Carol | NULL

Anti-join (customers who have never ordered):

SELECT c.name FROM customer c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- name
-- -------
-- Carol

Common Confusion / Misconception

"ON and WHERE are the same." Only for inner joins. For a LEFT JOIN, a predicate on the right-side table belongs in ON, not WHERE:

-- Intent: every customer, plus their 2025 orders (NULL if none).
SELECT c.name, o.total
FROM customer c LEFT JOIN orders o
ON o.customer_id = c.id AND o.placed_at >= DATE '2025-01-01';
-- Correct: non-matching customers still appear.

SELECT c.name, o.total
FROM customer c LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.placed_at >= DATE '2025-01-01';
-- BUG: this silently turns the left join into an inner join,
-- because the WHERE filters out the NULL extensions.

"NULL = NULL is true." No. In three-valued logic, NULL = NULL is NULL, which is neither true nor false. That is why outer joins need IS NULL, not = NULL, to detect non-matches.

"UNION and UNION ALL are the same thing with different keystrokes." UNION removes duplicates (and therefore sorts or hashes the combined result). UNION ALL just concatenates. UNION ALL is strictly faster when you already know duplicates are impossible.

How To Use It

For every join you write:

  1. State the cardinality you expect ("one row per order, joined to its customer").
  2. Identify which join key is unique on which side. If both sides are 1:many, you will get a multiplicative blow-up.
  3. Pick the join type on intent: "every left" -> LEFT; "match or not, I do not care" -> INNER; "only lefts with matches" -> semi-join; "only lefts without matches" -> anti-join.
  4. If filtering the right side of a left join, put the filter in ON, not WHERE.

Check Yourself

  1. How would you rewrite SELECT * FROM R RIGHT JOIN S ON p as a LEFT JOIN?
  2. Why is EXCEPT ALL sometimes needed instead of EXCEPT?
  3. Under what condition does SELECT * FROM R INNER JOIN S ON r.id = s.id return more rows than R has?

Mini Drill or Application

Given Customer, Order, and OrderItem(order_id, product_id), write (a) an anti-join returning customers with no orders; (b) a semi-join returning customers who have ordered product 42; (c) a FULL OUTER JOIN listing customers and their orders while keeping both "no orders" and "orphan orders with missing customers" visible; (d) an EXCEPT query returning product ids that appear in at least one order but not in any order this month.

Read This Only If Stuck