Skip to main content

SQL Code Katas

Short, self-contained SQL exercises that exercise every cluster of this module. Each kata has a pass criterion you can check. Target: 30-45 minutes total.

Kata 1 - Joins That Answer Business Questions

Schema:

CREATE TABLE customer(
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
region TEXT NOT NULL,
signed_up_on DATE NOT NULL
);
CREATE TABLE orders(
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customer(id),
total_cents INT NOT NULL CHECK (total_cents >= 0),
placed_at TIMESTAMPTZ NOT NULL
);
CREATE TABLE order_item(
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL,
qty INT NOT NULL CHECK (qty > 0),
unit_cents INT NOT NULL CHECK (unit_cents >= 0),
PRIMARY KEY (order_id, product_id)
);

Write queries for:

  1. Customers in 'EU' who have placed 0 orders (use anti-join).
  2. For every region, the median order total in 2025 (use percentile_cont(0.5) WITHIN GROUP (ORDER BY total_cents)).
  3. Customers who ordered at least 3 different products, ranked by distinct-product count.
  4. The monthly year-over-year revenue growth per region (self-join or window function with LAG).
  5. The top-3 highest-spending customers per region -- ties broken by earliest signed_up_on (ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_spend DESC, signed_up_on ASC)).

Pass criterion: each query runs on a seeded dataset and the result matches hand-computed expectations for at least one row.

Kata 2 - Normalize a Tangled Schema to 3NF

Starting table:

booking(
booking_id,
guest_id, guest_name, guest_email,
hotel_id, hotel_name, hotel_city, hotel_country,
room_number, room_type, nightly_rate,
check_in, check_out, total_price
)

FDs:

  • guest_id -> guest_name, guest_email
  • hotel_id -> hotel_name, hotel_city, hotel_country
  • (hotel_id, room_number) -> room_type, nightly_rate
  • booking_id -> guest_id, hotel_id, room_number, check_in, check_out
  • check_in, check_out, nightly_rate -> total_price (derived)

Tasks:

  1. Identify candidate keys and current normal form.
  2. Decompose to 3NF. Show each decomposition step.
  3. Write CREATE TABLE statements with keys, FKs, CHECKs, and ON DELETE actions.
  4. Justify that the final schema is in 3NF.
  5. State whether total_price should be stored or computed, and defend the choice.

Pass criterion: no non-key attribute is transitively dependent on the key; all foreign keys resolve to a primary key in another table.

Kata 3 - Optimize a Slow Query with EXPLAIN

On a dataset of 50M orders and 10M customer, the following query takes 12 seconds:

SELECT c.name, SUM(o.total_cents) / 100.0 AS revenue_dollars
FROM orders o
JOIN customer c ON c.id = o.customer_id
WHERE c.region = 'EU'
AND date_part('year', o.placed_at) = 2025
GROUP BY c.name
ORDER BY revenue_dollars DESC
LIMIT 20;

Tasks:

  1. Run EXPLAIN (ANALYZE, BUFFERS). Identify the dominant node.
  2. Spot the non-sargable predicate on placed_at. Rewrite it.
  3. Propose one or two indexes that would support the final plan. Write CREATE INDEX statements.
  4. Re-run EXPLAIN ANALYZE. Show the new total execution time and the plan shape.
  5. Briefly discuss whether a materialized view would be justified, and what its refresh cadence should be.

Pass criterion: the rewritten query uses an Index Scan (or Index Only Scan) on orders and total execution time is at least 5× lower on the same hardware.

Kata 4 - Design a Small Schema for a Given Domain

Choose one domain:

  • a library system (books, copies, loans, members, holds)
  • a bug tracker (projects, issues, comments, assignees, labels, milestones)
  • an online course platform (courses, modules, lessons, enrollments, submissions, grades)

Deliverables:

  1. An ER diagram (Mermaid).
  2. DDL in PostgreSQL including every constraint needed to enforce the business rules you write down.
  3. Five representative SQL queries that the system should serve well (at least one aggregation, one anti-join, one join of three or more tables, one with a window function).
  4. For each of those queries, EXPLAIN ANALYZE against a seeded dataset of at least 100k rows and a note on the index that supports it.
  5. A one-page design note that justifies: primary key choices (surrogate vs natural), any denormalization, and the online-migration strategy you would use to add a new required column post-launch.

Pass criterion: every query runs against the seeded dataset, and every query's plan has no Seq Scan on a table larger than 10k rows unless you justify it.

Evidence Check

This page is complete only if you have:

  • passing solutions for Kata 1 (all five queries)
  • a 3NF decomposition and DDL for Kata 2
  • before-and-after EXPLAIN ANALYZE outputs for Kata 3
  • an end-to-end schema, queries, and design note for Kata 4

SQL Fluency Katas

Use the same schema from the modeling worksheet.

  1. Write DDL for all tables with NOT NULL, UNIQUE, CHECK, and foreign-key constraints.
  2. Insert seed data for at least five guests, five tables, three shifts, ten reservations, and twenty orders.
  3. Update a reservation time and record the SQL you used.
  4. Delete a cancelled reservation only if no payment exists; explain the constraint or transaction that protects this rule.
  5. Write an inner join from reservation to guest.
  6. Write a left join that shows tables with no current reservation.
  7. Write a subquery that finds guests with more than one reservation this month.
  8. Group orders by day and sort by revenue descending.
  9. Count open orders per server.
  10. Add one index, run the query before and after, and capture the plan difference.