Skip to main content

Functional Dependencies and Normal Forms: 1NF, 2NF, 3NF, BCNF

What This Concept Is

A functional dependency (FD) X -> Y on a relation R means: for any two tuples of R that agree on X, they also agree on Y. X determines Y.

Normal forms are a ladder of schema-shape rules, each eliminating a class of update anomalies (insertion, deletion, update anomalies) that result from storing the same fact in more than one place.

  • 1NF: every attribute value is atomic. No arrays, no "comma-separated lists," no repeating groups. This is the minimum to call something a relation.
  • 2NF: 1NF plus no non-key attribute depends on only part of a composite key. If the key is a single attribute, 2NF is automatic.
  • 3NF: 2NF plus no non-key attribute depends on another non-key attribute (no transitive dependencies through a non-key).
  • BCNF (Boyce-Codd): for every non-trivial FD X -> Y, X is a superkey. BCNF is strictly stronger than 3NF in rare cases; most schemas that are 3NF are also BCNF.

The discipline is: identify the FDs that hold in your domain, then decompose offending relations into smaller ones until the normal form you want holds. A good decomposition is lossless (joining the parts reproduces the original) and ideally dependency-preserving.

Why It Matters Here

  • every "update this field in 17 rows" problem is a normal-form violation
  • denormalization (Concept 8) only makes sense if you can recognize what you are denormalizing away from
  • foreign keys are meaningful only when the referenced relation satisfies at least 3NF on the referenced attribute
  • in interviews and design reviews, "is this at least in 3NF?" is a standard gate

Concrete Example

Unnormalized table (violates 3NF):

StudentCourse(student_id, student_name, student_email, course_code, course_title, instructor, grade)

FDs in the domain:

  • student_id -> student_name, student_email
  • course_code -> course_title, instructor
  • (student_id, course_code) -> grade

The first two FDs have non-key determinants (only the composite (student_id, course_code) is a key). Those are transitive dependencies, so the table is 2NF at best. It carries three different facts:

  • "student 42 is Alice Li": repeated in every one of Alice's rows
  • "course CS101 is taught by Prof X": repeated in every enrolled row
  • "Alice got a B in CS101": the actual enrollment fact

3NF decomposition:

CREATE TABLE student(
id INT PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE
);
CREATE TABLE course(
code TEXT PRIMARY KEY,
title TEXT NOT NULL,
instructor TEXT NOT NULL
);
CREATE TABLE enrollment(
student_id INT REFERENCES student(id),
course_code TEXT REFERENCES course(code),
grade TEXT,
PRIMARY KEY (student_id, course_code)
);

Now each fact lives in exactly one place. Changing Alice's email is one UPDATE, not N.

Common Confusion / Misconception

"Normalization hurts performance." Sometimes, yes - joins cost something. But denormalized schemas are usually slower to maintain, buggier to update, and only faster on read if the denormalization pays off. Normalize first, then denormalize with measurements (Concept 8).

"1NF forbids nested data." Pragmatically yes in classic SQL; modern PostgreSQL supports jsonb and arrays. These are still atomic from SQL's viewpoint if you do not query their internals, but they quietly reintroduce update anomalies. Use them when the inside is genuinely opaque to your queries.

"3NF and BCNF are the same." Almost always, but not quite. BCNF can forbid a dependency-preserving decomposition in rare cases where a non-trivial FD has a non-superkey left side that happens to land inside a key. 3NF is the practical target; BCNF is the theoretical target.

How To Use It

For each relation:

  1. List every non-trivial FD that must hold in the real world.
  2. Compute the key(s) from the FDs (Armstrong's axioms are the formal method; usually you can do it by inspection).
  3. For each non-trivial FD X -> Y: is X a superkey? If no, the relation is not in BCNF. Check 3NF exception (every attribute of Y is part of some candidate key).
  4. Decompose into smaller relations, each of which holds a subset of the original attributes. Verify the decomposition is lossless.
  5. Re-derive FDs on the decomposed relations and confirm the target normal form.

Check Yourself

  1. Give an FD that holds on Employee(id, name, dept, manager) and an FD that does not.
  2. Why is Orders(order_id, customer_id, customer_email) not in 3NF? What is the decomposition?
  3. Name one update anomaly that 2NF eliminates and 1NF does not.

Mini Drill or Application

You are handed this schema (PostgreSQL):

CREATE TABLE booking(
booking_id SERIAL PRIMARY KEY,
guest_name TEXT, guest_email TEXT, guest_phone TEXT,
hotel_name TEXT, hotel_city TEXT, hotel_stars INT,
check_in DATE, check_out DATE,
nightly_rate NUMERIC, currency TEXT
);

(a) Identify the FDs; (b) show the table is not in 3NF; (c) produce a 3NF decomposition with DDL and foreign keys; (d) write one query over the original table and its equivalent over your decomposition.

Read This Only If Stuck