SQL Code Katas
Short, self-contained SQL exercises that exercise every cluster of this module. Each kata has a pass criterion you can check. Target: 30-45 minutes total.
Kata 1 - Joins That Answer Business Questions
Schema:
CREATE TABLE customer(
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
region TEXT NOT NULL,
signed_up_on DATE NOT NULL
);
CREATE TABLE orders(
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customer(id),
total_cents INT NOT NULL CHECK (total_cents >= 0),
placed_at TIMESTAMPTZ NOT NULL
);
CREATE TABLE order_item(
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL,
qty INT NOT NULL CHECK (qty > 0),
unit_cents INT NOT NULL CHECK (unit_cents >= 0),
PRIMARY KEY (order_id, product_id)
);
Write queries for:
- Customers in
'EU'who have placed 0 orders (use anti-join). - For every region, the median order total in 2025 (use
percentile_cont(0.5) WITHIN GROUP (ORDER BY total_cents)). - Customers who ordered at least 3 different products, ranked by distinct-product count.
- The monthly year-over-year revenue growth per region (self-join or window function with
LAG). - The top-3 highest-spending customers per region -- ties broken by earliest
signed_up_on(ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_spend DESC, signed_up_on ASC)).
Pass criterion: each query runs on a seeded dataset and the result matches hand-computed expectations for at least one row.
Kata 2 - Normalize a Tangled Schema to 3NF
Starting table:
booking(
booking_id,
guest_id, guest_name, guest_email,
hotel_id, hotel_name, hotel_city, hotel_country,
room_number, room_type, nightly_rate,
check_in, check_out, total_price
)
FDs:
guest_id -> guest_name, guest_emailhotel_id -> hotel_name, hotel_city, hotel_country(hotel_id, room_number) -> room_type, nightly_ratebooking_id -> guest_id, hotel_id, room_number, check_in, check_outcheck_in, check_out, nightly_rate -> total_price(derived)
Tasks:
- Identify candidate keys and current normal form.
- Decompose to 3NF. Show each decomposition step.
- Write
CREATE TABLEstatements with keys, FKs,CHECKs, andON DELETEactions. - Justify that the final schema is in 3NF.
- State whether
total_priceshould be stored or computed, and defend the choice.
Pass criterion: no non-key attribute is transitively dependent on the key; all foreign keys resolve to a primary key in another table.
Kata 3 - Optimize a Slow Query with EXPLAIN
On a dataset of 50M orders and 10M customer, the following query takes 12 seconds:
SELECT c.name, SUM(o.total_cents) / 100.0 AS revenue_dollars
FROM orders o
JOIN customer c ON c.id = o.customer_id
WHERE c.region = 'EU'
AND date_part('year', o.placed_at) = 2025
GROUP BY c.name
ORDER BY revenue_dollars DESC
LIMIT 20;
Tasks:
- Run
EXPLAIN (ANALYZE, BUFFERS). Identify the dominant node. - Spot the non-sargable predicate on
placed_at. Rewrite it. - Propose one or two indexes that would support the final plan. Write
CREATE INDEXstatements. - Re-run
EXPLAIN ANALYZE. Show the new total execution time and the plan shape. - Briefly discuss whether a materialized view would be justified, and what its refresh cadence should be.
Pass criterion: the rewritten query uses an Index Scan (or Index Only Scan) on orders and total execution time is at least 5× lower on the same hardware.
Kata 4 - Design a Small Schema for a Given Domain
Choose one domain:
- a library system (books, copies, loans, members, holds)
- a bug tracker (projects, issues, comments, assignees, labels, milestones)
- an online course platform (courses, modules, lessons, enrollments, submissions, grades)
Deliverables:
- An ER diagram (Mermaid).
- DDL in PostgreSQL including every constraint needed to enforce the business rules you write down.
- Five representative SQL queries that the system should serve well (at least one aggregation, one anti-join, one join of three or more tables, one with a window function).
- For each of those queries,
EXPLAIN ANALYZEagainst a seeded dataset of at least 100k rows and a note on the index that supports it. - A one-page design note that justifies: primary key choices (surrogate vs natural), any denormalization, and the online-migration strategy you would use to add a new required column post-launch.
Pass criterion: every query runs against the seeded dataset, and every query's plan has no Seq Scan on a table larger than 10k rows unless you justify it.
Evidence Check
This page is complete only if you have:
- passing solutions for Kata 1 (all five queries)
- a 3NF decomposition and DDL for Kata 2
- before-and-after
EXPLAIN ANALYZEoutputs for Kata 3 - an end-to-end schema, queries, and design note for Kata 4
SQL Fluency Katas
Use the same schema from the modeling worksheet.
- Write DDL for all tables with
NOT NULL,UNIQUE,CHECK, and foreign-key constraints. - Insert seed data for at least five guests, five tables, three shifts, ten reservations, and twenty orders.
- Update a reservation time and record the SQL you used.
- Delete a cancelled reservation only if no payment exists; explain the constraint or transaction that protects this rule.
- Write an inner join from reservation to guest.
- Write a left join that shows tables with no current reservation.
- Write a subquery that finds guests with more than one reservation this month.
- Group orders by day and sort by revenue descending.
- Count open orders per server.
- Add one index, run the query before and after, and capture the plan difference.