Normalisation in DBMS – 1NF 2NF 3NF BCNF and Functional Dependencies
What You Will Learn
- Functional dependencies: types and inference
- Armstrong’s axioms and attribute closure computation
- Candidate key finding using closure
- Normal forms: 1NF, 2NF, 3NF, BCNF with examples
- Lossless decomposition and dependency preservation
- GATE PYQs with complete solutions
GATE Weightage: 2–3 marks. Normalisation is the single highest-yield DBMS topic; questions appear every year.
1. Functional Dependencies
A functional dependency X → Y means: if two tuples have the same value for X, they must have the same value for Y. X functionally determines Y.
Types of FDs
| Type | Definition | Example |
|---|---|---|
| Trivial FD | Y ⊆ X (Y is a subset of X) | AB → A (always holds) |
| Non-trivial FD | Y ⊄ X | Eid → Name |
| Partial FD | X is a composite key; Y depends on only part of X | {Eid, Pid} → Name (only on Eid) |
| Full FD | Y depends on all of X; removing any attribute from X breaks the FD | {Eid, Pid} → Hours |
| Transitive FD | X → Y → Z where X is not a super key and Y is not a prime attribute | Eid → Did → DLocation |
2. Armstrong’s Axioms
Armstrong’s axioms are a complete and sound set of inference rules for deriving all functional dependencies logically implied by a given set F.
1. Reflexivity: If Y ⊆ X, then X → Y (trivial FDs)
2. Augmentation: If X → Y, then XZ → YZ
3. Transitivity: If X → Y and Y → Z, then X → Z
Derived Rules:
4. Union: If X → Y and X → Z, then X → YZ
5. Decomposition: If X → YZ, then X → Y and X → Z
6. Pseudotransitivity: If X → Y and WY → Z, then WX → Z
3. Attribute Closure & Candidate Keys
Computing X+ (Attribute Closure of X)
1. result = X
2. For each FD A → B in F:
if A ⊆ result, add B to result
3. Repeat step 2 until no change
4. X+ = result
Find A+:
Start: result = {A}
A→BC applies: result = {A, B, C}
BC→D applies: result = {A, B, C, D}
D→E applies: result = {A, B, C, D, E}
A+ = {A, B, C, D, E} = all attributes → A is a candidate key
Finding All Candidate Keys
A candidate key is a minimal set of attributes whose closure includes all attributes.
- Find attributes that appear only on LHS of FDs — they must be in every CK
- Find attributes that appear only on RHS of FDs — they cannot be in any CK
- Attributes on both sides or neither: may or may not be in CK (try combinations)
A never on RHS → must be in CK
A+ = {A, B, C, D} → A alone is a CK
Candidate keys: {A} only
4. First Normal Form (1NF)
A relation is in 1NF if every attribute has atomic (indivisible) values — no multi-valued attributes, no repeating groups, no nested tables.
STUDENT(Sid, Name, PhoneNumbers{p1, p2, p3})
In 1NF:
STUDENT(Sid, Name)
STUDENT_PHONE(Sid, Phone)
5. Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF AND every non-prime attribute is fully functionally dependent on the entire primary key (no partial dependencies).
Partial dependency: A non-prime attribute depends on only a part of the composite primary key.
WORKS_ON(Eid, Pid, Hours, EmpName, ProjName)
PK = {Eid, Pid}
Eid → EmpName (partial — EmpName depends only on Eid, not full PK)
Pid → ProjName (partial)
Decomposed to 2NF:
EMPLOYEE(Eid, EmpName)
PROJECT(Pid, ProjName)
WORKS_ON(Eid, Pid, Hours)
6. Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF AND there are no transitive dependencies (no non-prime attribute depends on another non-prime attribute).
Formal definition of 3NF: For every non-trivial FD X → Y, either:
- X is a super key, OR
- Y is a prime attribute (part of some candidate key)
EMPLOYEE(Eid, Name, Did, DLocation)
Eid → Did (Did depends on PK — OK)
Did → DLocation (DLocation depends on Did, not directly on Eid — TRANSITIVE)
Decomposed to 3NF:
EMPLOYEE(Eid, Name, Did)
DEPARTMENT(Did, DLocation)
7. Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if for every non-trivial FD X → Y, X must be a super key. BCNF is strictly stronger than 3NF.
| Property | 3NF | BCNF |
|---|---|---|
| Condition for X → Y (non-trivial) | X is superkey OR Y is prime attribute | X must be a superkey (always) |
| Lossless decomposition | Always possible | Always possible |
| Dependency preservation | Always possible | Not always possible |
| Redundancy | Some redundancy may remain | Eliminates all FD-based redundancy |
TEACHING(Student, Course, Teacher)
FDs: {Student, Course} → Teacher; Teacher → Course
Candidate keys: {Student, Course}, {Student, Teacher}
Teacher → Course: Teacher is NOT a super key (not in any CK alone); Course IS a prime attribute
→ Satisfies 3NF (Y is prime) but violates BCNF (X is not a super key)
BCNF decomposition:
TC(Teacher, Course) — Teacher → Course is now a super key
ST(Student, Teacher)
Note: FD {Student, Course} → Teacher cannot be preserved in this decomposition!
8. Decomposition Properties
Lossless-Join Decomposition
Decomposition of R into {R1, R2} is lossless iff the common attributes form a key for R1 or R2:
Equivalently: R1 ∩ R2 is a super key of R1 or R2
Decompose into R1(A, B, C) and R2(B, C, D).
R1 ∩ R2 = {B, C}. Is BC a key of R1 or R2?
BC+ = {B, C, D} ≠ {A, B, C} — BC is not a key of R1
BC+ covers {B, C, D} = all of R2 — BC IS a key of R2
Decomposition is lossless.
Dependency-Preserving Decomposition
A decomposition preserves dependencies if every FD in F+ can be enforced using only the projected FDs of R1 and R2 (without joining). Check: every FD in the original set can be derived from FDs restricted to individual relations.
BCNF decomposition: always lossless, but may NOT preserve dependencies
3NF decomposition: always both lossless AND dependency-preserving
If asked which normal form guarantees both properties → answer is 3NF.
9. GATE Examples
GATE 2015: Relation R(A, B, C, D, E) with FDs: A→BC, CD→E, B→D, E→A. Find the candidate keys.
View Solution
Try A: A+ = A → {A,B,C} → B→D: {A,B,C,D} → CD→E: {A,B,C,D,E} = all. A is a CK.
Try E: E→A, then A+: E → A → BC → D → E. E+ = all. E is a CK.
Try BC: BC+ → B→D: {B,C,D} → CD→E: {B,C,D,E} → E→A: {A,B,C,D,E}. BC is a CK.
Try CD: CD+ → CD→E: {C,D,E} → E→A: {A,C,D,E} → A→BC: {A,B,C,D,E}. CD is a CK.
Candidate keys: {A}, {E}, {BC}, {CD}
GATE 2017: Relation R(P, Q, R, S, T) with FDs: P→QR, RS→T, Q→S, T→P. Which of the following is NOT a candidate key?
View Solution
Test P: P→QR, Q→S: {P,Q,R,S}, RS→T: {P,Q,R,S,T}, T→P: confirmed. P+ = all. P is CK.
Test T: T→P, then P+: all. T is CK.
Test Q: Q→S, can Q determine P or R? Q doesn’t determine P or R directly. Q+ = {Q,S} ≠ all. Q is NOT a CK.
Q is not a candidate key.
GATE 2019: Relation R(A, B, C) with FDs: A→B, B→C, C→A. How many candidate keys does R have?
View Solution
A+ = {A,B,C} = all → A is CK.
B+ = {B,C,A} = all → B is CK.
C+ = {C,A,B} = all → C is CK.
All single attributes are candidate keys. AB, AC, BC, ABC are super keys but not minimal.
Answer: 3 candidate keys — {A}, {B}, {C}
Since all attributes are prime, there are no partial or transitive dependencies → R is in BCNF.
10. Common Mistakes
- Confusing 3NF and BCNF conditions: 3NF allows X → A where A is prime (even if X is not a super key). BCNF requires X to always be a super key with NO exceptions. A relation satisfying 3NF may fail BCNF.
- Assuming single-attribute PK means 3NF: A single-attribute PK means no partial dependencies (automatically in 2NF), but transitive dependencies can still exist, violating 3NF.
- Missing candidate keys: Always check all possible minimal attribute combinations, not just the “obvious” primary key. A relation can have multiple candidate keys.
- Closure computation errors: Apply FDs repeatedly until no more attributes can be added. Students often stop after one pass; apply ALL FDs in each round until the closure stabilises.
- Lossless test: The common attributes (R1 ∩ R2) must form a key for AT LEAST ONE of the two relations, not both.
11. Frequently Asked Questions
What is the difference between 3NF and BCNF?
Both address the same issue (FD-based anomalies) but at different strictness. 3NF allows X → A if A is a prime attribute, even when X is not a super key. BCNF requires X to always be a super key — no exceptions. Every BCNF relation is in 3NF. BCNF may sacrifice dependency preservation; 3NF guarantees both lossless and dependency-preserving decomposition, making it the practical choice for most applications.
How do you find the closure of a set of attributes?
Start with result = X. Scan all FDs: if the entire left side is in result, add the right side to result. Repeat until no new attributes are added. This is X+. Use it to: test if X is a super key (X+ = all attributes), find implied FDs (A → B holds iff B ⊆ A+), find minimal keys (try subsets of candidate key candidates).
What is lossless decomposition in database normalisation?
A decomposition is lossless if joining the pieces always gives back the original relation with no spurious (extra, incorrect) tuples. Test for binary decomposition: (R1 ∩ R2)+ must include all attributes of R1 OR all attributes of R2 — meaning the common attributes form a key of at least one sub-relation. Lossy decompositions generate spurious tuples that weren’t in the original data.
What is Armstrong’s axioms in DBMS?
Armstrong’s axioms are three inference rules for FDs: Reflexivity (supersets determine subsets), Augmentation (add same attributes to both sides), and Transitivity (chain dependencies). They are sound (only derive valid FDs) and complete (can derive all implied FDs). Derived rules like Union and Decomposition make computation faster. Used to find F+ (all implied FDs) and to test equivalence of two sets of FDs.