Module VII
Normalization
Chapter Outline
◼ 1 Informal Design Guidelines for Relational Databases
◼ 1.1Semantics of the Relation Attributes
◼ 1.2 Redundant Information in Tuples and Update Anomalies
◼ 1.3 Null Values in Tuples
◼ 1.4 Spurious Tuples
◼ 2 Functional Dependencies (FDs)
◼ 2.1 Definition of FD
◼ 2.2 Inference Rules for FDs
◼ 2.3 Equivalence of Sets of FDs
◼ 2.4 Minimal Sets of FDs
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Chapter Outline
◼ 3 Normal Forms Based on Primary Keys
◼ 3.1 Normalization of Relations
◼ 3.2 Practical Use of Normal Forms
◼ 3.3 Definitions of Keys and Attributes Participating in Keys
◼ 3.4 First Normal Form
◼ 3.5 Second Normal Form
◼ 3.6 Third Normal Form
◼ 4 General Normal Form Definitions (For Multiple Keys)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
1 Informal Design Guidelines for
Relational Databases (1)
◼ What is relational database design?
◼ The grouping of attributes to form "good" relation
schemas
◼ Two levels of relation schemas
◼ The logical "user view" level
◼ The storage "base relation" level
◼ Design is concerned mainly with base relations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
1.1 Semantics of the Relation Attributes
◼ GUIDELINE 1: Informally, each tuple in a relation should
represent one entity or relationship instance. (Applies to
individual relations and their attributes).
◼ Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be mixed in the
same relation
◼ Only foreign keys should be used to refer to other entities
◼ Entity and relationship attributes should be kept apart as
much as possible.
◼ Bottom Line: Design a schema that can be explained
easily relation by relation. The semantics of attributes
should be easy to interpret.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Figure 10.1 A simplified COMPANY
relational database schema
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
1.2 Redundant Information in Tuples and
Update Anomalies
◼ Information is stored redundantly
◼ Wastes storage
◼ Causes problems with update anomalies
◼ Insertion anomalies
◼ Deletion anomalies
◼ Modification anomalies
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
EXAMPLE OF AN UPDATE ANOMALY
◼ Consider the relation:
◼ EMP_PROJ(Emp#, Proj#, Ename, Pname,
No_hours)
◼ Update Anomaly:
◼ Changing the name of project number P1 from
“Billing” to “Customer-Accounting” may cause this
update to be made for all 100 employees working
on project P1.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
EXAMPLE OF AN INSERT ANOMALY
◼ Consider the relation:
◼ EMP_PROJ(Emp#, Proj#, Ename, Pname,
No_hours)
◼ Insert Anomaly:
◼ Cannot insert a project unless an employee is
assigned to it.
◼ Conversely
◼ Cannot insert an employee unless an he/she is
assigned to a project.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
EXAMPLE OF AN DELETE ANOMALY
◼ Consider the relation:
◼ EMP_PROJ(Emp#, Proj#, Ename, Pname,
No_hours)
◼ Delete Anomaly:
◼ When a project is deleted, it will result in deleting
all the employees who work on that project.
◼ Alternately, if an employee is the sole employee on
a project, deleting that employee would result in
deleting the corresponding project.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Figure 10.3 Two relation schemas
suffering from update anomalies
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Figure 10.4 Example States for EMP_DEPT and
EMP_PROJ
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Guideline to Redundant Information in
Tuples and Update Anomalies
◼ GUIDELINE 2:
◼ Design a schema that does not suffer from the
insertion, deletion and update anomalies.
◼ If there are any anomalies present, then note them
so that applications can be made to take them into
account.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
1.3 Null Values in Tuples
◼ GUIDELINE 3:
◼ Relations should be designed such that their
tuples will have as few NULL values as possible
◼ Attributes that are NULL frequently could be
placed in separate relations (with the primary key)
◼ Reasons for nulls:
◼ Attribute not applicable or invalid
◼ Attribute value unknown (may exist)
◼ Value known to exist, but unavailable
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
1.4 Spurious Tuples
◼ Bad designs for a relational database may result
in erroneous results for certain JOIN operations
◼ The "lossless join" property is used to guarantee
meaningful results for join operations
◼ GUIDELINE 4:
◼ The relations should be designed to satisfy the
lossless join condition.
◼ No spurious tuples should be generated by doing
a natural-join of any relations.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Spurious Tuples (2)
◼ There are two important properties of decompositions:
a) Non-additive or losslessness of the corresponding join
b) Preservation of the functional dependencies.
◼ Note that:
◼ Property (a) is extremely important and cannot be
sacrificed.
◼ Property (b) is less stringent and may be sacrificed.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
2.1 Functional Dependencies (1)
◼ Functional dependencies (FDs)
◼ Are used to specify formal measures of the
"goodness" of relational designs
◼ And keys are used to define normal forms for
relations
◼ Are constraints that are derived from the meaning
and interrelationships of the data attributes
◼ A set of attributes X functionally determines a
set of attributes Y if the value of X determines a
unique value for Y
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Functional Dependencies (2)
◼ X -> Y holds if whenever two tuples have the same
value for X, they must have the same value for Y
◼ For any two tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y]
◼ X -> Y in R specifies a constraint on all relation instances
r(R)
◼ Written as X -> Y; can be displayed graphically on a
relation schema as in Figures. ( denoted by the arrow: ).
◼ FDs are derived from the real-world constraints on the
attributes
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Examples of FD constraints (1)
◼ Social security number determines employee
name
◼ SSN -> ENAME
◼ Project number determines project name and
location
◼ PNUMBER -> {PNAME, PLOCATION}
◼ Employee ssn and project number determines
the hours per week that the employee works on
the project
◼ {SSN, PNUMBER} -> HOURS
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Examples of FD constraints (2)
◼ An FD is a property of the attributes in the
schema R
◼ The constraint must hold on every relation
instance r(R)
◼ If K is a key of R, then K functionally determines
all attributes in R
◼ (since we never have two distinct tuples with
t1[K]=t2[K])
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
2.2 Inference Rules for FDs (1)
◼ Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold
◼ Armstrong's inference rules:
◼ IR1. (Reflexive) If Y subset-of X, then X -> Y
◼ IR2. (Augmentation) If X -> Y, then XZ -> YZ
◼ (Notation: XZ stands for X U Z)
◼ IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
◼ IR1, IR2, IR3 form a sound and complete set of
inference rules
◼ These are rules hold and all other rules that hold can be
deduced from these
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Inference Rules for FDs (2)
◼ Some additional inference rules that are useful:
◼ Decomposition: If X -> YZ, then X -> Y and X ->
Z
◼ Union: If X -> Y and X -> Z, then X -> YZ
◼ Psuedotransitivity: If X -> Y and WY -> Z, then
WX -> Z
◼ The last three inference rules, as well as any
other inference rules, can be deduced from IR1,
IR2, and IR3 (completeness property)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Inference Rules for FDs (3)
◼ Closure of a set F of FDs is the set F + of all FDs
that can be inferred from F
◼ Closure of a set of attributes X with respect to F
is the set X+ of all attributes that are functionally
determined by X
◼ X+ can be calculated by repeatedly applying IR1,
IR2, IR3 using the FDs in F
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
2.3 Equivalence of Sets of FDs
◼ Two sets of FDs F and G are equivalent if:
◼ Every FD in F can be inferred from G, and
◼ Every FD in G can be inferred from F
◼ Hence, F and G are equivalent if F + =G+
◼ Definition (Covers):
◼ F covers G if every FD in G can be inferred from F
◼ (i.e., if G+ subset-of F+)
◼ F and G are equivalent if F covers G and G covers F
◼ There is an algorithm for checking equivalence of sets of
FDs
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
2.4 Minimal Sets of FDs (1)
◼ A set of FDs is minimal if it satisfies the
following conditions:
1. Every dependency in F has a single attribute for
its RHS.
2. We cannot remove any dependency from F and
have a set of dependencies that is equivalent to
F.
3. We cannot replace any dependency X -> A in F
with a dependency Y -> A, where Y proper-
subset-of X ( Y subset-of X) and still have a set
of dependencies that is equivalent to F.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Minimal Sets of FDs (2)
◼ Every set of FDs has an equivalent minimal set
◼ There can be several equivalent minimal sets
◼ There is no simple algorithm for computing a
minimal set of FDs that is equivalent to a set F of
FDs
◼ To synthesize a set of relations, we assume that
we start with a set of dependencies that is a
minimal set
◼ E.g., see algorithms 11.2 and 11.4
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
3 Normal Forms Based on Primary Keys
◼ 3.1 Normalization of Relations
◼ 3.2 Practical Use of Normal Forms
◼ 3.3 Definitions of Keys and Attributes Participating
in Keys
◼ 3.4 First Normal Form
◼ 3.5 Second Normal Form
◼ 3.6 Third Normal Form
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
3.1 Normalization of Relations (1)
◼ Normalization:
◼ The process of decomposing unsatisfactory "bad"
relations by breaking up their attributes into
smaller relations
◼ Normal form:
◼ Condition using keys and FDs of a relation to
certify whether a relation schema is in a particular
normal form
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Normalization of Relations (2)
◼ 2NF, 3NF, BCNF
◼ based on keys and FDs of a relation schema
◼ 4NF
◼ based on keys, multi-valued dependencies :
MVDs;
◼ 5NF based on keys, join dependencies : JDs.
◼ Additional properties may be needed to ensure a
good relational design
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
3.2 Practical Use of Normal Forms
◼ Normalization is carried out in practice so that the
resulting designs are of high quality and meet the
desirable properties
◼ The practical utility of these normal forms becomes
questionable when the constraints on which they are
based are hard to understand or to detect
◼ The database designers need not normalize to the
highest possible normal form
◼ (usually up to 3NF, BCNF or 4NF)
◼ Denormalization:
◼ The process of storing the join of higher normal form
relations as a base relation—which is in a lower normal
form
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
3.3 Definitions of Keys and Attributes
Participating in Keys (1)
◼ A superkey of a relation schema R = {A1, A2, ....,
An} is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
◼ A key K is a superkey with the additional
property that removal of any attribute from K will
cause K not to be a superkey any more.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Definitions of Keys and Attributes
Participating in Keys (2)
◼ If a relation schema has more than one key, each
is called a candidate key.
◼ One of the candidate keys is arbitrarily designated
to be the primary key, and the others are called
secondary keys.
◼ A Prime attribute must be a member of some
candidate key
◼ A Nonprime attribute is not a prime attribute—
that is, it is not a member of any candidate key.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
3.2 First Normal Form
◼ Disallows
◼ composite attributes or nested relations
◼ multivalued attributes: attributes whose values
for an individual tuple are non-atomic.
◼ Considered to be part of the definition of relation
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Figure 10.8 Normalization into 1NF
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Figure 10.9 Normalization nested
relations into 1NF
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
3.3 Second Normal Form (1)
◼ Uses the concepts of FDs, primary key
◼ Definitions
◼ Prime attribute: An attribute that is member of the primary
key K
◼ Full functional dependency: a FD Y -> Z where removal
of any attribute from Y means the FD does not hold any
more
◼ Examples:
◼ {SSN, PNUMBER} -> HOURS is a full FD since neither SSN
-> HOURS nor PNUMBER -> HOURS hold
◼ {SSN, PNUMBER} -> ENAME is not a full FD (it is called a
partial dependency ) since SSN -> ENAME also holds
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Second Normal Form (2)
◼ A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on the primary key
◼ R can be decomposed into 2NF relations via the
process of 2NF normalization
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Figure 10.10 Normalizing into 2NF and
3NF
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Figure 10.11 Normalization into 2NF and 3NF
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
3.4 Third Normal Form (1)
◼ Definition:
◼ Transitive functional dependency: a FD X -> Z
that can be derived from two FDs X -> Y and Y ->
Z
◼ Examples:
◼ SSN -> DMGRSSN is a transitive FD
◼ Since SSN -> DNUMBER and DNUMBER ->
DMGRSSN hold
◼ SSN -> ENAME is non-transitive
◼ Since there is no set of attributes X where SSN -> X
and X -> ENAME
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Third Normal Form (2)
◼ A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively
dependent on the primary key
◼ R can be decomposed into 3NF relations via the process
of 3NF normalization
◼ NOTE:
◼ In X -> Y and Y -> Z, with X as the primary key, we consider
this a problem only if Y is not a candidate key.
◼ When Y is a candidate key, there is no problem with the
transitive dependency .
◼ E.g., Consider EMP (SSN, Emp#, Salary ).
◼ Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Normal Forms Defined Informally
◼ 1st normal form
◼ All attributes depend on the key
◼ 2nd normal form
◼ All attributes depend on the whole key
◼ 3rd normal form
◼ All attributes depend on nothing but the key
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
4 General Normal Form Definitions (For
Multiple Keys) (1)
◼ The above definitions consider the primary key
only
◼ The following more general definitions take into
account relations with multiple candidate keys
◼ A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on every key of R
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
General Normal Form Definitions (2)
◼ Definition:
◼ Superkey of relation schema R - a set of attributes
S of R that contains a key of R
◼ A relation schema R is in third normal form (3NF)
if whenever a FD X -> A holds in R, then either:
◼ (a) X is a superkey of R, or
◼ (b) A is a prime attribute of R
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Chapter Outline
◼ Informal Design Guidelines for Relational
Databases
◼ Functional Dependencies (FDs)
◼ Definition, Inference Rules, Equivalence of Sets of
FDs, Minimal Sets of FDs
◼ Normal Forms Based on Primary Keys
◼ General Normal Form Definitions (For Multiple
Keys)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe