Skip to main content

Subqueries, CTEs, and Window Functions

What This Concept Is

SQL composes queries in three ways. Each solves a class of problems the others cannot solve cleanly.

Subqueries - a SELECT inside another statement. Three flavors:

  • Scalar subquery: returns one column, one row. Usable anywhere a value is expected. SELECT name, (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS n FROM customer c.
  • IN / EXISTS subquery: returns a set for membership tests (semi/anti joins).
  • Derived table (subquery in FROM): returns a relation, given a name with AS. SELECT x.region, x.total FROM (SELECT region, SUM(amount) AS total FROM sale GROUP BY region) x WHERE x.total > 1000.

CTEs (Common Table Expressions) - WITH name AS (SELECT ...) SELECT .... Named subqueries that:

  • make long queries readable by pulling each step out;
  • can chain: WITH a AS (...), b AS (... FROM a) SELECT ... FROM b;
  • can be recursive (WITH RECURSIVE) for hierarchies and graph walks;
  • can always be inlined into the main query (modulo recursion and materialization hints).

Window functions - aggregates that do not collapse rows. SELECT name, salary, AVG(salary) OVER (PARTITION BY dept) AS dept_avg FROM employee. For each row, the window specification defines a frame of other rows; the function is computed over that frame. Common functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM, AVG, NTILE.

Why It Matters Here

  • any "top N per group" or "percentage of group total" question is a one-liner with window functions and a pyramid of pain without them
  • CTEs are the only reasonable way to write a query more than 20 lines long
  • recursive CTEs are how you walk trees and graphs inside the database, without an application round-trip per level
  • scalar subqueries in SELECT are easy to write and easy to make slow; knowing when they become correlated loops is part of the Cluster-5 performance story

Concrete Example

Schema:

CREATE TABLE employee(id INT, dept TEXT, salary NUMERIC);
INSERT INTO employee VALUES
(1,'Eng',120000), (2,'Eng',95000), (3,'Eng',150000),
(4,'Sales',110000), (5,'Sales',90000);

Question: "For each department, list employees ranked by salary and each one's share of the department total."

With a subquery per row (ugly):

SELECT e.id, e.dept, e.salary,
(SELECT SUM(salary) FROM employee e2 WHERE e2.dept = e.dept) AS dept_total
FROM employee e;

With a CTE + window functions (clean):

WITH ranked AS (
SELECT id, dept, salary,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank,
SUM(salary) OVER (PARTITION BY dept) AS dept_total
FROM employee
)
SELECT id, dept, salary, dept_rank,
ROUND(100.0 * salary / dept_total, 1) AS pct_of_dept
FROM ranked
ORDER BY dept, dept_rank;

Expected output:

 id | dept  | salary | dept_rank | pct_of_dept
----+-------+--------+-----------+-------------
3 | Eng | 150000 | 1 | 41.1
1 | Eng | 120000 | 2 | 32.9
2 | Eng | 95000 | 3 | 26.0
4 | Sales | 110000 | 1 | 55.0
5 | Sales | 90000 | 2 | 45.0

Recursive CTE example - walk an org chart from the CEO down:

WITH RECURSIVE tree AS (
SELECT id, manager_id, name, 1 AS depth FROM employee WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.manager_id, e.name, t.depth + 1
FROM employee e JOIN tree t ON e.manager_id = t.id
)
SELECT * FROM tree ORDER BY depth, name;

Common Confusion / Misconception

"A CTE always materializes." In PostgreSQL 12+ the optimizer may inline non-recursive CTEs. If you need materialization as an optimization fence, write WITH x AS MATERIALIZED (...).

"Window functions replace GROUP BY." They are complementary. GROUP BY collapses rows into one per group. Windows add a column computed over a frame while keeping all rows.

"Correlated subqueries are always slow." Not always; they can become nested-loop joins or be transformed by the optimizer into semi-joins. But any correlated subquery inside a SELECT list deserves a second look (see Concept 14 on N+1).

How To Use It

When writing a non-trivial query:

  1. Write the algebra sketch first (Concept 3).
  2. If any step is naturally "compute X once per group but keep the rows," use a window function.
  3. If the query has more than two steps, use a CTE per step. Readability beats cleverness.
  4. Use correlated subqueries only when an equivalent join or semi-join would distort cardinality.
  5. Reach for recursive CTEs for tree/graph walks; use an iteration fence (depth < N) to avoid runaways on cyclic data.

Check Yourself

  1. Rewrite "top 3 highest-paid employees per department" using ROW_NUMBER and a CTE.
  2. Give one case where a scalar subquery is cleaner than a join.
  3. What is the difference between RANK, DENSE_RANK, and ROW_NUMBER when salaries tie?

Mini Drill or Application

Given sale(region, amount, sold_at):

  1. Using window functions, produce a running year-to-date total per region by sale date.
  2. Using a CTE, produce the top-2 highest-grossing days per region.
  3. Using WITH RECURSIVE, given parent(child, parent), produce the full ancestor chain for a given node.

Read This Only If Stuck