Chapter 4
Introduction to Normalization
Normalization is a method used to validate and
improve a logical design so that it satisfies certain
constraints that avoid unnecessary duplication of
data.
It is the process of decomposing relations with anomalies
to produce smaller, well-structured relations.
The normalization process, as first proposed by Codd
(1972a), takes a relation schema through a series of
tests to certify whether it satisfies a certain normal
form. The process, which proceeds in a top-down
fashion by evaluating each relation against the criteria
for normal forms and decomposing relations as
necessary, can thus be considered as relational design
by analysis.
Initially, Codd proposed three normal forms, which
he called first, second, and third normal form. A
stronger definition of 3NF, called Boyce-Codd
normal form (BCNF), was proposed later by Boyce
and Codd. All these normal forms are based on a
single analytical tool: the
functional dependencies among the attributes of a
relation. Later, a fourth normal form (4NF) and a fifth
normal form (5NF) were proposed, based on the
concepts of multivalued dependencies and join
dependencies, respectively.
Normalization of data can be considered a process of
analyzing the given relation schemas based on their
FDs and primary keys to achieve the desirable
properties of
minimizing redundancy and
minimizing the insertion, deletion, and update
anomalies.
First Normal Form
First normal form (1NF) is now considered to be part of
the formal definition of a relation in the basic (flat)
relational model;
historically, it was defined to disallow multivalued
attributes, composite attributes, and their combinations.
It states that the domain of an attribute must include only
atomic (simple, indivisible) values and that the value of
any attribute in a tuple must be a single value from the
domain of that attribute.
The only attribute values permitted by 1NF are single
atomic (or indivisible) values. Consider the
DEPARTMENT relation schema shown in the figure
above, whose primary key is Dnumber, and suppose
that we extend it by including the Dlocations attribute
as shown in the figure.
We assume that each department can have a number
of locations. The figure shows the DEPARTMENT
schema and a sample relation state.
As we can see, this is not in 1NF because Dlocations
is not an atomic attribute, as illustrated by the first
tuple. There are two ways we can look at the
Dlocations attribute:
(a) The domain of Dlocations contains atomic values,
but some tuples can have a set of these values. In this
case, Dlocations is not functionally dependent on the
primary key Dnumber.
(b) The domain of Dlocations contains sets of values
and hence is nonatomic.
In this case, Dnumber → Dlocations because each set is
considered a single member of the attribute domain.
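Two techniques can bring such a relation into 1NF:
Remove the attribute Dlocations that violates 1NF and
place it in a separate relation DEPT_LOCATIONS
along with the primary key Dnumber of
DEPARTMENT. The primary key of this relation is
the combination {Dnumber, Dlocation}.
Expand the key so that there will be a separate tuple in
the original DEPARTMENT relation for each location
of a department. In this case the primary key becomes
the combination {Dnumber, Dlocation}, and the other
attributes are repeated for every location, which
introduces redundancy.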
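The two techniques above can be sketched in a few lines of Python. This is only an illustration: the sample department rows and the helper names split_locations and expand_key are assumptions made for the example, not part of the original figure.

```python
# Hypothetical sample state of DEPARTMENT (values are assumptions for illustration).
# Dlocations holds a set of values, so the relation is not in 1NF.
department = [
    {"Dname": "Research", "Dnumber": 5, "Dlocations": {"Bellaire", "Sugarland", "Houston"}},
    {"Dname": "Administration", "Dnumber": 4, "Dlocations": {"Stafford"}},
    {"Dname": "Headquarters", "Dnumber": 1, "Dlocations": {"Houston"}},
]


def split_locations(rows):
    """First technique: move Dlocations into a separate DEPT_LOCATIONS relation."""
    dept = [{k: v for k, v in r.items() if k != "Dlocations"} for r in rows]
    # Each (Dnumber, Dlocation) pair becomes one tuple; the pair is the key.
    dept_locations = sorted((r["Dnumber"], loc) for r in rows for loc in r["Dlocations"])
    return dept, dept_locations


def expand_key(rows):
    """Second technique: one DEPARTMENT tuple per location (Dname is repeated)."""
    return sorted((r["Dname"], r["Dnumber"], loc) for r in rows for loc in r["Dlocations"])


dept, dept_locations = split_locations(department)
flat_department = expand_key(department)

for row in dept_locations:
    print(row)
```

With the first technique, department names are stored once; with the second, Dname is repeated for every location of the same department, which is the redundancy that makes the first technique generally preferable.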
Second Normal Form
Second normal form (2NF) is based on the concept of
full functional dependency.
Functional dependency is a relationship that exists when
one attribute uniquely determines another attribute. If R is
a relation with attributes X and Y, a functional
dependency between the attributes is represented as
X → Y, which specifies that Y is functionally dependent on X.
Here X is termed the determinant set and Y the
dependent attribute.
A functional dependency X → Y is a full functional
dependency if removing any attribute from X means that
the dependency no longer holds.
Social Security Number determines employee name and
project number
SSN → ENAME, PNUMBER
Project Number determines project name and location
PNUMBER → PNAME, PLOCATION
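As a rough illustration of what "X determines Y" means on actual data, the sketch below tests whether a dependency holds in one relation state. The function name fd_holds and the sample rows are assumptions made for this example. Note that a relation state can only refute a dependency; whether a dependency holds in general is a property of the schema's meaning, not of one sample.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular set of rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical rows (all values are assumptions for illustration).
rows = [
    {"Ssn": "111", "Ename": "Abebe", "Pnumber": 1, "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "222", "Ename": "Sara", "Pnumber": 1, "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "333", "Ename": "Lia", "Pnumber": 2, "Pname": "ProductY", "Plocation": "Sugarland"},
]

print(fd_holds(rows, ["Ssn"], ["Ename", "Pnumber"]))        # SSN -> ENAME, PNUMBER
print(fd_holds(rows, ["Pnumber"], ["Pname", "Plocation"]))  # PNUMBER -> PNAME, PLOCATION
print(fd_holds(rows, ["Pnumber"], ["Ssn"]))                 # violated: project 1 has two employees
```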
A relation schema R is in second normal form (2NF)
if every non-prime attribute A in R is fully
functionally dependent on the primary key.
If R is not in 2NF, it can be decomposed into 2NF
relations via the process of 2NF normalization.
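The 2NF test can be sketched as a search for partial dependencies: non-prime attributes that are already determined by a proper subset of the primary key. The example below is a minimal sketch under stated assumptions. It treats "non-prime" as "not in the primary key", and it uses a hypothetical project-assignment relation with key {Ssn, Pnumber} and an extra Hours attribute that does not appear in the text above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the non-prime attributes it determines."""
    non_prime = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = (closure(subset, fds) - set(subset)) & non_prime
            if determined:
                found[subset] = determined
    return found


# Hypothetical schema and FDs, assumed for illustration.
attributes = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
key = {"Ssn", "Pnumber"}
fds = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

violations = partial_dependencies(key, attributes, fds)
print(violations)
```

Each entry in the result suggests one relation to split off during 2NF normalization: the key subset together with the attributes it determines.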
Third Normal Form
A relation schema R is in third normal form (3NF) if it
is in 2NF and no non-prime attribute A in R is
transitively dependent on the primary key.
Transitive functional dependency: X → Y is transitive if
there is a set of attributes Z that is neither a candidate key
nor a subset of any key, and both X → Z and Z → Y hold.
Examples:
SSN → DMGRSSN is a transitive FD since SSN →
DNUMBER and DNUMBER → DMGRSSN hold
SSN → ENAME is non-transitive since there is no set of
attributes X where SSN → X and X → ENAME
The dependency Ssn → Dmgr_ssn is transitive through
Dnumber in EMP_DEPT.
The dependencies Ssn → Dnumber and Dnumber →
Dmgr_ssn hold, and Dnumber is neither a key itself
nor a subset of the key of EMP_DEPT.
Intuitively, we can see that the dependency of
Dmgr_ssn on Dnumber is undesirable in EMP_DEPT
since Dnumber is not a key of EMP_DEPT.
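The EMP_DEPT reasoning above can be sketched as a small check. The code below is a minimal sketch, not a complete 3NF test: it only inspects the dependencies it is given (not every implied one), and the function name transitive_dependencies and the reduced attribute list are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(key, attributes, fds):
    """List (Z, A) where non-key attribute A depends on the key through Z,
    and Z is neither a superkey nor a subset of the key."""
    key = set(key)
    attributes = set(attributes)
    found = []
    for lhs, rhs in fds:
        if lhs <= key or closure(lhs, fds) >= attributes:
            continue  # lhs is part of the key or is a superkey: not a violation
        for a in sorted(rhs - key - lhs):
            found.append((tuple(sorted(lhs)), a))
    return found


# EMP_DEPT reduced to the attributes named in the text.
emp_dept = {"Ssn", "Ename", "Dnumber", "Dmgr_ssn"}
fds = [
    ({"Ssn"}, {"Ename", "Dnumber"}),
    ({"Dnumber"}, {"Dmgr_ssn"}),
]

print(transitive_dependencies({"Ssn"}, emp_dept, fds))
```

Splitting EMP_DEPT into one relation keyed by Ssn (holding Ename and Dnumber) and one keyed by Dnumber (holding Dmgr_ssn) leaves no pair for the check to report in either relation.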
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever a nontrivial FD X → A holds in R,
then X is a superkey of R.
Each normal form is strictly stronger than the previous
one:
Every 2NF relation is in 1NF
Every 3NF relation is in 2NF
Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF).
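The BCNF definition translates almost directly into a check: for each nontrivial dependency, test whether its left-hand side is a superkey. The sketch below assumes a hypothetical relation TEACH(Student, Course, Instructor), which is not from the text above, as an illustration of a relation that is in 3NF but not in BCNF. It only examines the listed dependencies, not every implied one.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the given nontrivial FDs whose left-hand side is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


# Hypothetical relation, assumed for illustration.
teach = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

print(bcnf_violations(teach, fds))
```

Here Instructor → Course breaks BCNF because Instructor alone is not a superkey, yet 3NF is not violated because Course is part of the candidate key {Student, Course}.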
Reading Assignment
Read about:
BCNF (Boyce-Codd Normal Form)
Fourth Normal Form
Fifth Normal Form