Normalization
Normalization is the process of organizing data
so as to minimize unnecessary redundancy
while preserving information.
A normal form is a measure of the design quality
of a relation schema, and hence of a
relational database.
Normal forms & Normal tests
The normal form of a relation schema is the
highest normal form that the schema satisfies.
The normal forms covered here are
first normal form (1NF), second normal form (2NF),
third normal form (3NF) and Boyce-Codd normal form (BCNF).
Each has a normal test that verifies whether a relation
schema is in the desired normal form.
First normal form (1NF)
A relation schema is said to be in first
normal form if all its attributes are atomic.
A schema which is not in first normal form
In DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocations),
Dlocations is a multi-valued attribute.
Conversion into first normal form
The above relation schema is decomposed into two relation schemas,
DEPARTMENT(Dname, Dnumber, Dmgr_ssn) and DEPT_LOCATIONS(Dnumber, Dlocation).
Both DEPARTMENT and DEPT_LOCATIONS are now in 1NF.
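The decomposition into 1NF can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the sample rows, the helper name to_1nf, and the use of Dnumber as the attribute linking the two relations are assumptions made for the example.

```python
# Non-1NF DEPARTMENT: Dlocations holds several values per tuple.
# The rows below are illustrative sample data (an assumption).
department_non_1nf = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333445555",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "Dmgr_ssn": "987654321",
     "Dlocations": ["Stafford"]},
    {"Dname": "Headquarters", "Dnumber": 1, "Dmgr_ssn": "888665555",
     "Dlocations": ["Houston"]},
]


def to_1nf(rows):
    """Split rows into DEPARTMENT (atomic attributes only) and
    DEPT_LOCATIONS (one tuple per department/location pair)."""
    department = []
    dept_locations = []
    for row in rows:
        # Keep every attribute except the multi-valued one.
        department.append({k: v for k, v in row.items() if k != "Dlocations"})
        # Emit one atomic (Dnumber, Dlocation) tuple per location.
        for location in row["Dlocations"]:
            dept_locations.append({"Dnumber": row["Dnumber"],
                                   "Dlocation": location})
    return department, dept_locations


department, dept_locations = to_1nf(department_non_1nf)
```

Every attribute value in both resulting lists is atomic, and joining them on Dnumber recovers the original location sets.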
Conversion into first normal form
An alternative to decomposition also exists.
You may expand the primary key to include the
multi-valued attribute, so that each value gets its own tuple.
This solution has the
disadvantage of introducing
redundancy into the relation.
Conversion into first normal form
If the maximum number of values of the multi-valued
attribute is known, you may replace the
multi-valued attribute by that number of atomic attributes.
In the example, instead of using Dlocations, you
may use three attributes, namely
Dlocation1, Dlocation2, Dlocation3,
assuming that a department can have at most
three locations.
This solution has the disadvantage of introducing NULL values if most
departments have fewer than three locations. It further introduces spurious
semantics about the ordering among the location values that was not
originally intended. Querying on this attribute becomes more difficult.
Multi-valued attribute replaced
DEPARTMENT
Dname           Dnumber  Dmgr_ssn   Dlocation1  Dlocation2  Dlocation3
Research        5        333445555  Bellaire    Sugarland   Houston
Administration  4        987654321  Stafford    NULL        NULL
Headquarters    1        888665555  Houston     NULL        NULL
Conversion into first normal form
First normal form does not allow complex attributes
(nested combinations of composite and multi-valued attributes) either.
Conversion into first normal form
Decompose the relation into two relations.
Multiple multi-valued attributes
This relation is NOT in 1NF,
so decompose it into two relations.
The presence of multiple complex attributes can be dealt with in a similar fashion.
First, replace the composite part of each complex attribute by its components.
Then deal with the multi-valued aspect of each component in the manner above.
Second normal form (2NF)
A relation schema R is said to be in second normal
form if
(i) it is in first normal form and
(ii) no non-key attribute is partially dependent
on the primary key of R.
Example
EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation), primary key {Ssn, Pnumber}
FD1: {Ssn, Pnumber} → {Hours} (a full functional dependency)
FD2: {Ssn} → {Ename} (a partial functional dependency)
FD3: {Pnumber} → {Pname, Plocation} (a partial functional dependency)
EMP_PROJ is already in 1NF because all its attributes are atomic.
But it is NOT in 2NF because of the partial dependencies FD2 and FD3.
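The 2NF test on EMP_PROJ can be sketched as a check for dependencies whose left side is a proper subset of the primary key. This is a minimal sketch, assuming a single primary key and inspecting only the listed dependencies (not every dependency they imply); the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the listed FDs whose left side is a proper subset of the
    primary key and whose right side contains a non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        non_key_rhs = rhs - key
        # "lhs < key" is the proper-subset test on frozensets.
        if lhs < key and non_key_rhs:
            found.append((sorted(lhs), sorted(non_key_rhs)))
    return found


emp_proj_key = {"Ssn", "Pnumber"}
emp_proj_fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),          # FD1
    ({"Ssn"}, {"Ename"}),                     # FD2
    ({"Pnumber"}, {"Pname", "Plocation"}),    # FD3
]

violations = partial_dependencies(emp_proj_key, emp_proj_fds)
in_2nf = not violations  # 2NF holds only if no partial dependency is found
```

Here violations lists FD2 and FD3, so in_2nf is False, in line with the reasoning above.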
Decomposition into 2NF
To bring the schema EMP_PROJ into 2NF,
we decompose it with respect to each partial
functional dependency.
Decomposition with respect to
{Ssn} → {Ename} results in
R1(Ssn, Ename) &
R2(Ssn, Pnumber, Hours, Pname, Plocation).
R1 is in 2NF, but R2 is not because of the partial
dependency {Pnumber} → {Pname, Plocation}.
So decompose R2 with respect to
{Pnumber} → {Pname, Plocation}.
Decomposition into 2NF
The decomposition of R2 with respect to
{Pnumber} → {Pname, Plocation} results in
R3(Pnumber, Pname, Plocation) &
R4(Ssn, Pnumber, Hours).
So decomposing EMP_PROJ with respect to its
partial dependencies results in three relation
schemas, namely
R1(Ssn, Ename), R3(Pnumber, Pname, Plocation) &
R4(Ssn, Pnumber, Hours).
All these relations are in 2NF. (In fact, they are in
higher normal forms than 2NF.)
Decomposed relation schemas of EMP_PROJ
R1(Ssn, Ename)
R3(Pnumber, Pname, Plocation)
R4(Ssn, Pnumber, Hours)
Decomposition rule
To decompose a relation R with respect to a
functional dependency X → Y, take
R1 = X ∪ Y as one relation and
R2 = R - Y as the other.
The primary key of R1 is X and the primary key
of R2 is the primary key of R.
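The decomposition rule can be sketched as a small Python function and applied twice to EMP_PROJ. This is an illustrative sketch: the function name decompose and the set-based representation are assumptions, and it presumes that Y shares no attribute with X or with the primary key of R.

```python
def decompose(attributes, key, lhs, rhs):
    """Decompose R with respect to X -> Y.

    Returns ((R1, key of R1), (R2, key of R2)) where
    R1 = X ∪ Y with key X, and R2 = R - Y with the key of R.
    """
    r, x, y = frozenset(attributes), frozenset(lhs), frozenset(rhs)
    r1 = x | y
    r2 = r - y
    return (r1, x), (r2, frozenset(key))


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
emp_proj_key = {"Ssn", "Pnumber"}

# Step 1: decompose EMP_PROJ with respect to {Ssn} -> {Ename}.
(r1, k1), (r2, k2) = decompose(emp_proj, emp_proj_key, {"Ssn"}, {"Ename"})

# Step 2: decompose R2 with respect to {Pnumber} -> {Pname, Plocation}.
(r3, k3), (r4, k4) = decompose(r2, k2, {"Pnumber"}, {"Pname", "Plocation"})
```

The results r1, r3 and r4 correspond to R1(Ssn, Ename), R3(Pnumber, Pname, Plocation) and R4(Ssn, Pnumber, Hours).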
Third normal form (3NF)
A relation schema R is said to be in third normal
form if
(i) it is in second normal form and
(ii) no non-key attribute is transitively dependent
on the primary key of R.
Example
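Consider the relation schema
R(Ename, Ssn, Address, Dnumber, Dname, Dmgr_ssn) with primary key Ssn.
This schema is in 2NF because all its attributes are atomic
and there is no partial dependency.
But it is not in 3NF because of the transitive dependency
{Dnumber} → {Dname, Dmgr_ssn}: Dname and Dmgr_ssn depend on Ssn only through Dnumber.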
Decomposition
So decompose the schema with respect to
the transitive dependency.
After decomposition one relation is
R1(Dnumber, Dname, Dmgr_ssn)
and the other is
R2(Ename, Ssn, Address, Dnumber).
R1 and R2 are in 3NF.
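The 3NF step can be sketched with an attribute-closure helper. This is a sketch under assumptions: the schema is taken to be the union of R1 and R2 above, Ssn is assumed to be its primary key, the dependency from Ssn to the employee attributes is assumed from that key, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_violations(attributes, key, fds):
    """Listed FDs X -> Y where X is neither a superkey nor part of the key
    and Y holds non-key attributes, so Y depends on the key only through X."""
    r, k = set(attributes), set(key)
    found = []
    for lhs, rhs in fds:
        x = set(lhs)
        y = set(rhs) - x - k
        if y and closure(x, fds) != r and not x <= k:
            found.append((x, y))
    return found


R = {"Ename", "Ssn", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
KEY = {"Ssn"}  # assumed primary key
FDS = [
    ({"Ssn"}, {"Ename", "Address", "Dnumber"}),  # assumed: follows from the key
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),        # the transitive dependency
]

violations = transitive_violations(R, KEY, FDS)
x, y = violations[0]
r1 = x | y   # X ∪ Y
r2 = R - y   # R - Y
```

With these inputs, r1 and r2 match R1(Dnumber, Dname, Dmgr_ssn) and R2(Ename, Ssn, Address, Dnumber).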
Boyce-Codd normal form (BCNF)
A relation schema is said to be in BCNF if
(i) it is in third normal form and
(ii) no key attribute depends on a non-key
attribute.
More generally, the left side of every nontrivial
functional dependency must be a superkey.
Example: R(A, B, C) with
FD1: AB → C
FD2: C → B
This relation schema is NOT in BCNF because of the dependency C → B.
Decompose R into R1(C, B) & R2(A, C).
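The BCNF test for this example can be sketched by checking whether the left side of each listed dependency is a superkey. This is a minimal sketch, assuming the relation is R(A, B, C) and using hypothetical function names; checking only the listed dependencies is sufficient for the original relation, but not for relations produced by a later decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Nontrivial listed FDs whose left side is not a superkey of R."""
    r = set(attributes)
    return [(set(lhs), set(rhs)) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != r]


R = {"A", "B", "C"}
FDS = [({"A", "B"}, {"C"}),   # FD1
       ({"C"}, {"B"})]        # FD2

violations = bcnf_violations(R, FDS)
x, y = violations[0]
r1 = x | y   # X ∪ Y
r2 = R - y   # R - Y
```

Only FD2 is reported, and decomposing with respect to it gives the attribute sets of R1(C, B) and R2(A, C).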
Another example
Consider another relation schema
TEACH(Student#, Course#, Instructor#) and
suppose that Instructor# → Course#.
This functional dependency means that an
instructor can teach at most one course.
Since Course# is a key attribute and Instructor# is
not a superkey, the schema TEACH is NOT in BCNF.
Decomposition of TEACH with respect to
Instructor# → Course# results in
R1(Instructor#, Course#) & R2(Student#, Instructor#).
Exercise problem 1
Consider the following relation:
CAR_SALE(Car#, Date_sold, Salesperson#, Commission%,
Discount_amt).
Assume that a car may be sold by multiple
salespeople, and hence {Car#, Salesperson#} is
the primary key.
Additional dependencies are Date_sold → Discount_amt,
Salesperson# → Commission%.
Based on the given primary key, is this relation in 1NF, 2NF,
or 3NF? Why or why not? How would you successively
normalize it completely?
Ans. R1(Salesperson#, Commission%), R2(Date_sold, Discount_amt),
R3(Car#, Date_sold, Salesperson#)
Exercise problem 2
Consider the following relation for published books:
BOOK (Book_title, Author_name, Book_type, List_price,
Author_affil, Publisher).
Author_affil refers to the affiliation of author.
Suppose that the following dependencies exist:
Book_title → Publisher, Book_type
Book_type → List_price
Author_name → Author_affil.
What normal form is the relation in? Explain your answer.
Apply normalization until you cannot decompose the
relations further. State the reasons behind each
decomposition.
Ans. R1(Book_title, Book_type, Publisher), R2(Book_type, List_price),
R3(Author_name, Author_affil), R4(Book_title, Author_name)
Exercise problem 3
Consider the following relation:
CAR_SALE (Car_id, Option_type, Option_listprice, Sale_date,
Option_discountedprice).
This relation refers to options installed in cars (e.g., cruise control)
that were sold at a dealership, and the list and discounted prices of
the options.
Let the following additional functional dependencies hold on the
relation: Car_id → Sale_date, Option_type → Option_listprice.
What normal form is the relation in? Explain your answer.
Apply normalization until you cannot decompose the relations
further. State the reasons behind each decomposition.
Ans. R1(Car_id, Sale_date), R2(Option_type, Option_listprice),
R3(Car_id, Option_type, Option_discountedprice)
Exercise problem 4
Consider the universal relation
R = {A, B, C, D, E}
and the set of functional dependencies
F = {AB → C, A → DE}.
What is the key for R? Decompose R into 2NF and
then 3NF relations.
Ans. Key = AB
D = {R1(A, D, E), R2(A, B, C)}
Exercise problem 4a
Consider the universal relation
R = {A, B, C, D, E, G, H, I, J, K}
and the set of functional dependencies
F = {AB → C, A → DE, B → K, K → GH, D → IJ}.
What is the key for R? Decompose R into the
highest possible normal form.
Ans. Key: AB
D = {R1(A, B, C), R2(A, D, E), R3(D, I, J), R4(B, K), R5(K, G, H)}
Exercise problem 5
Consider the universal relation
R = {A, B, C, D, E, F, G, H, I, J, K}
and the set of functional dependencies
F = {AB → C, BD → EF, AD → GH, A → I, H → J}.
What is the key for R? Decompose R into the
highest possible normal form.
Ans. Key: ABD
D = {R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I),
R5(A, B, D, I, J, K)}.
Exercise problem 6
Consider a relation R with five attributes ABCDE. You are
given the following dependencies:
A → B, BC → E, and ED → A.
i) List all keys for R.
ii) Is R in 3NF?
iii) Is R in BCNF?
Decompose the schema into the highest possible normal
form, if required.
Ans. The keys are K1 = CED, K2 = BCD & K3 = ACD.
D = {R1(A, B), R2(D, E, A), R3(C, D, E)}
Partial Solution 6
A+ = AB ≠ R
(BC)+ = BCE ≠ R
(ED)+ = EDAB ≠ R
(ABC)+ = ABCE ≠ R
(AED)+ = AEDB ≠ R
(BCED)+ = R so BCED is a super key of R
(CED)+ = CEDAB = R and (CE)+ ≠ R, (CD)+ ≠ R.
So CED is a key of R.
Similarly, show that BCD is also a key of R.
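The closure computations above can be reproduced with a short Python sketch that also enumerates candidate keys by brute force. The function names are hypothetical, attributes are single letters so that strings can stand in for attribute sets, and the exhaustive search is only practical for small schemas like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    r = set(attributes)
    keys = []
    # Try subsets in order of increasing size so minimal keys come first.
    for size in range(1, len(r) + 1):
        for combo in combinations(sorted(r), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key: a superkey, not minimal
            if closure(candidate, fds) == r:
                keys.append(candidate)
    return keys


R = "ABCDE"
FDS = [("A", "B"), ("BC", "E"), ("ED", "A")]

keys = candidate_keys(R, FDS)
key_names = sorted("".join(sorted(k)) for k in keys)
```

Under the three stated dependencies, the search reports CDE and BCD, and additionally ACD, since (ACD)+ = ACDBE = R while no proper subset of ACD has closure R.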
Exercise problem 7
Consider the attribute set R = ABCDEGH and the FD set
F = {AB → C, AC → B, AD → E, B → D, BC → A, E → G}.
i) List all keys for R.
ii) Is R in 3NF?
iii) Is R in BCNF?
Decompose the schema into the highest possible normal
form, if required.
Ans. The keys are: ABH, ACH, BCH.
D = {R1(E, G), R2(B, D), R3(A, B, C, E), R4(A, B, H)}