Skip to main content

Keys: Primary, Candidate, Foreign, Surrogate vs Natural

What This Concept Is

A key is a minimal set of attributes whose values uniquely identify a tuple in a relation. The full zoology:

  • Superkey: any set of attributes that uniquely identifies a tuple. Usually not minimal.
  • Candidate key: a minimal superkey. Remove any attribute and uniqueness breaks.
  • Primary key: the candidate key you designated as the canonical identifier. One per relation. Implies NOT NULL + UNIQUE.
  • Alternate key: any candidate key that is not the primary key.
  • Foreign key: a set of attributes in relation R whose values must match the primary (or unique) key of relation S. This is referential integrity.
  • Natural key: a candidate key that has meaning in the real world (email, ISBN, country code).
  • Surrogate key: an artificial identifier with no meaning outside the database (SERIAL, BIGINT IDENTITY, UUID).

Keys are the database's identity mechanism and its integrity mechanism. Without keys, you cannot say what a row "is," and you cannot safely join tables.

Why It Matters Here

  • every JOIN you write implicitly assumes some attribute is key-like on at least one side; if it is not, your result multiplies
  • foreign keys are the difference between "the orders table mentions a customer id" and "the orders table guarantees every customer id refers to a real customer"
  • the surrogate-vs-natural-key choice shows up in every new table you create, and getting it wrong is a multi-year mistake
  • normal forms (Cluster 3) are defined in terms of keys and functional dependencies

Concrete Example

A Book relation has natural candidate key isbn and we add a surrogate key book_id for internal use:

CREATE TABLE book (
book_id BIGSERIAL PRIMARY KEY,
isbn TEXT NOT NULL UNIQUE,
title TEXT NOT NULL,
published DATE
);

CREATE TABLE loan (
loan_id BIGSERIAL PRIMARY KEY,
book_id BIGINT NOT NULL REFERENCES book(book_id) ON DELETE RESTRICT,
borrower TEXT NOT NULL,
loaned_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
  • book.book_id is the primary (surrogate) key.
  • book.isbn is an alternate candidate (natural) key, enforced by UNIQUE.
  • loan.book_id is a foreign key into book(book_id).
  • ON DELETE RESTRICT means deleting a book that has active loans is rejected.

Expected behavior:

-- INSERT INTO book(isbn, title) VALUES ('978-0-13-496095-2', 'SICP');
-- INSERT INTO loan(book_id, borrower) VALUES (999, 'Carol');
-- ERROR: insert or update on table "loan" violates foreign key constraint ...

Common Confusion / Misconception

"Primary keys have to be integers." No. Any unique, non-null, stable identifier works. (country_code, license_plate) is a perfectly fine composite primary key for a vehicle registry.

"Use id everywhere because natural keys change." Partly true, but the lesson is "stable identity matters," not "avoid natural keys." Use surrogate keys for internal joins and foreign keys; keep natural keys as UNIQUE constraints so the real-world identifier is still enforced. The two are not in conflict.

"A UNIQUE constraint is a primary key." No. Primary key = UNIQUE + NOT NULL + exactly one per table + typically the default target for foreign keys. UNIQUE alone permits NULLs (in most dialects NULLs are treated as distinct).

How To Use It

For every new table:

  1. Write one English sentence for what "one row of this table" means.
  2. Identify every candidate key. There is usually more than one.
  3. Choose a surrogate primary key if any natural key is long, composite, mutable, or leaky.
  4. Keep every natural candidate key as a UNIQUE constraint so the real-world uniqueness is still enforced by the database.
  5. For every reference to another table, declare it as a FOREIGN KEY with an explicit ON DELETE and ON UPDATE action.

Check Yourself

  1. Given Enrollment(student_id, course_id, semester, grade), name a candidate key and state your assumption.
  2. Why is UNIQUE alone not a substitute for PRIMARY KEY?
  3. When would you not add a surrogate key even though the natural key is composite?

Mini Drill or Application

Design primary and foreign keys for this mini-schema:

  • University(code, name) - every university has a unique short code like MIT.
  • Student(university_code, student_number, name) - student numbers are unique within a university but not globally.
  • Enrollment(student, course, semester, grade) - one student may take the same course in different semesters.

For each table: (a) list all candidate keys; (b) choose a primary key and justify; (c) declare all foreign keys with a sensible ON DELETE rule; (d) show the CREATE TABLE DDL.

Read This Only If Stuck