08_2_Database_Design_Normal_Form
Normal Forms
Normal forms are used as a measure of the “goodness” of a relation. The higher the normal form of a relation,
the fewer redundancies and anomalies it has, and the better the relation is. The process of
deriving “good relations” in a relational database design is called normalization. Normalized
relations have minimized data redundancy and hence minimized anomalies.
Initially, Edgar (Ted) F. Codd proposed three normal forms, which he called the First, Second, and
Third normal forms. A stronger definition of 3NF, called the Boyce-Codd Normal Form (BCNF), was
proposed later by Boyce and Codd. All these normal forms are based on the functional
dependencies among the attributes of a relation.
Later, the Fourth normal form (4NF) and the Fifth normal form (5NF) were proposed, based on
multivalued dependencies and join dependencies respectively.
Each normal form defines its own set of requirements, and if a relation meets those requirements,
then it is in that normal form. The requirements are designed such that if a relation is in a higher
normal form, it is guaranteed to be in every lower normal form.
Projected FDs
Consider the FD set F for the XIT database.
Now suppose we want to find out which FDs apply to the “STUDENT” relation. We compute the
projection of F on STUDENT; let us refer to it as Fs. The table below shows the projected FDs for
STUDENT (Fs) and for PROGRAM (Fp).
F                   Fs                  Fp
studid → name       studid → name       progid → pname
studid → progid     studid → progid     progid → intake
studid → cpi        studid → cpi        progid → did
studid → pname
studid → intake
studid → did
progid → pname
progid → intake
progid → did
progid → dname
did → dname
Projected FDs are computed as follows:
Suppose you have a relation R with FD set F, and R is split into R1 and R2; the FDs must then be
projected onto R1 and R2. This is done as follows:
For every FD X → Y in F (F should include the inferred FDs, i.e., the closure F+, otherwise some
projected FDs are missed), if X ∪ Y is a subset of R1, then X → Y is projected onto the decomposed
relation R1.
This is repeated for every FD in F and for every decomposed relation.
At the end, we have a set of projected FDs for every decomposed relation.
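The projection procedure can be sketched in Python. This is a minimal sketch under our own assumptions: an FD is a pair of frozensets `(lhs, rhs)`, and rather than listing F+ explicitly, `project_fds` computes the closure X+ of every subset X of the sub-schema and keeps X → A for each attribute A of the sub-schema found in X+ (an equivalent way of filtering F+). The names `closure`, `project_fds`, and `fd` are hypothetical helpers, and the subset enumeration is exponential, so it only suits small textbook schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def project_fds(fds, sub_schema):
    """Non-trivial FDs X -> A that hold on sub_schema, keeping only minimal left-hand sides."""
    sub_schema = frozenset(sub_schema)
    attrs = sorted(sub_schema)
    projected = set()
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            lhs = frozenset(combo)
            for a in (closure(lhs, fds) & sub_schema) - lhs:
                rhs = frozenset({a})
                # Skip X -> A if a proper subset of X already determines A.
                if any(l < lhs and r == rhs for l, r in projected):
                    continue
                projected.add((lhs, rhs))
    return projected

def fd(lhs, rhs):
    """Hypothetical helper: fd("studid", "name") -> (frozenset, frozenset)."""
    return frozenset(lhs.split()), frozenset(rhs.split())

# A minimal set of the XIT FDs (the non-derived ones).
F = [fd("studid", "name"), fd("studid", "progid"), fd("studid", "cpi"),
     fd("progid", "pname"), fd("progid", "intake"), fd("progid", "did"),
     fd("did", "dname")]

Fs = project_fds(F, {"studid", "name", "progid", "cpi"})
Fp = project_fds(F, {"progid", "pname", "intake", "did"})
for lhs, rhs in sorted(Fs | Fp, key=lambda f: (sorted(f[0]), sorted(f[1]))):
    print(", ".join(sorted(lhs)), "->", ", ".join(sorted(rhs)))
```

Run on the minimal XIT FD set, it should reproduce the Fs and Fp columns of the table: only non-trivial FDs whose left-hand sides are minimal.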
Exercise ##: Are the relations Student, Program, and Department in the XIT database in BCNF?
Exercise ##: Are the relations in the alternate relational schema for the XIT database in BCNF?
Company Database:
Employee(ENO, name, salary, DoB, SUPER_ENO, dno)
Department(dno, dname, MGR_ENO)
dept_locations(dno, dlocation)
Project(pno, pname, plocation, dno)
works_on(ENO, pno, hours)
dependent(ENO, dep_name, dep_bdate, relationship)
DA-ACAD Database:
Student(StudetID, StdName, ProgID, Batch, CPI)
Term(AcadYear, Semester)
Course(CourseNo, CourseName, Credit)
Faculty(FacultyID, FacultyName)
Offers(AcadYear, Semester, CourseNo, FacultyID)
Registers(StudetID, AcadYear, Semester, CourseNo, course_grade)
Result(StudetID, AcadYear, Semester, Semester_SPI, Semester_CPI)
Note that the 3NF definition is relaxed with respect to BCNF. Therefore, if a relation is in BCNF, it is
automatically in 3NF.
Exercise #11: Suppose we have WORKS_ON as follows:
WORKS_ON(ENO, PNo, PName, Hours)
FDs (assumed):
PNo → PName
PName → PNo
{ENO, PNo} → Hours
{ENO, PName} → Hours
Keys: ?
Is it in BCNF?
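One way to check an answer to Exercise #11 is to compute the keys by brute force. The sketch below is an illustration, not a prescribed method: `candidate_keys` tries attribute sets from smallest to largest and keeps those whose closure is the whole schema and that contain no smaller key, and `bcnf_violations` lists the given FDs whose left-hand side is not a superkey (for FDs stated on the whole relation, checking the given set is enough for BCNF). All helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    schema = frozenset(schema)
    keys = set()
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            # A superset of a key is a superkey, not a (minimal) key.
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.add(cand)
    return keys

def bcnf_violations(schema, fds):
    """Non-trivial given FDs whose left-hand side is not a superkey of schema."""
    schema = frozenset(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]

def fd(lhs, rhs):
    """Hypothetical helper: fd("ENO PNo", "Hours") -> (frozenset, frozenset)."""
    return frozenset(lhs.split()), frozenset(rhs.split())

WORKS_ON = {"ENO", "PNo", "PName", "Hours"}
F = [fd("PNo", "PName"), fd("PName", "PNo"),
     fd("ENO PNo", "Hours"), fd("ENO PName", "Hours")]

keys = candidate_keys(WORKS_ON, F)
print(sorted(sorted(k) for k in keys))
print("BCNF" if not bcnf_violations(WORKS_ON, F) else "not BCNF")
```

It finds the two keys {ENO, PNo} and {ENO, PName}, and flags PNo → PName and PName → PNo because neither left-hand side is a superkey.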
Note that the 2NF definition is relaxed with respect to 3NF. Therefore, if a relation is in
3NF, it is automatically in 2NF.
3NF: We allow a non-trivial FD X → Y only if
either X is a superkey OR every attribute of Y that is not in X is a prime attribute.
In other words: every non-prime attribute depends only on a key (a transitive dependency is not
acceptable).
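The 3NF condition can be checked in the same brute-force style. In this hypothetical sketch, the prime attributes are the union of all candidate keys, and each given non-trivial FD X → Y must have either a superkey X or only prime attributes in Y − X; checking the given FDs (rather than all of F+) is sufficient here. The second call uses an invented relation with the transitive dependency studid → progid → pname.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    schema = frozenset(schema)
    keys = set()
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.add(cand)
    return keys

def is_3nf(schema, fds):
    """Every given non-trivial FD X -> Y needs X superkey, or Y - X made of prime attributes."""
    schema = frozenset(schema)
    prime = frozenset().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= schema:
            continue  # trivial FD, or X is a superkey
        if not (rhs - lhs) <= prime:
            return False  # a non-prime attribute depends on a non-superkey
    return True

def fd(lhs, rhs):
    """Hypothetical helper: fd("ENO PNo", "Hours") -> (frozenset, frozenset)."""
    return frozenset(lhs.split()), frozenset(rhs.split())

WORKS_ON = {"ENO", "PNo", "PName", "Hours"}
F_works_on = [fd("PNo", "PName"), fd("PName", "PNo"),
              fd("ENO PNo", "Hours"), fd("ENO PName", "Hours")]

# Hypothetical relation with a transitive dependency studid -> progid -> pname.
F_transitive = [fd("studid", "progid"), fd("progid", "pname")]

print(is_3nf(WORKS_ON, F_works_on))
print(is_3nf({"studid", "progid", "pname"}, F_transitive))
```

For WORKS_ON it reports 3NF even though the relation is not in BCNF, because PName and PNo are both prime; the transitive example fails because pname is non-prime.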
Decomposition Requirements
Decomposing a relation R into R1, R2, R3, …, Rm means that the attributes of R are spread across
these m relation schemas. An arbitrary decomposition does not serve any purpose; rather, it adds noise
to the data. A decomposition needs to satisfy the following requirements:
• Attribute Preserving: The union of the attributes of all decomposed relations should be equal to
the attributes of R.
• FD Preserving: An FD is lost if its attributes are split across multiple relations and it cannot be
derived from the FDs projected on them. It is desirable not to lose any FD of the minimal set; losing
an inferred FD is OK as long as it can still be derived.
• Lossless: The JOIN of the decomposed relations gives back the original relation; if r1, r2, r3, …,
rm are instances of the decomposed relation schemas, then r1 * r2 * r3 * … * rm should yield the
instance r of R. [discussed below]
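The first two requirements can be tested mechanically. A minimal sketch, assuming FDs are pairs of frozensets: `is_attribute_preserving` compares the union of the parts with R, and `is_fd_preserved` uses the standard test that grows the closure of X using one decomposed schema at a time, so the projected FD sets never have to be built. The relation R(A, B, C), its FDs, and all helper names are a hypothetical example, not part of the XIT or Company databases.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_attribute_preserving(schema, parts):
    """The union of the decomposed schemas must equal the attributes of R."""
    return frozenset().union(*parts) == frozenset(schema)

def is_fd_preserved(lhs, rhs, parts, fds):
    """Check lhs -> rhs against the FDs projected on parts, without computing the projections."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # What the attributes of this part can determine, restricted to this part.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def fd(lhs, rhs):
    """Hypothetical helper: fd("A", "B") -> (frozenset, frozenset)."""
    return frozenset(lhs.split()), frozenset(rhs.split())

# Hypothetical R(A, B, C) with A -> B and B -> C.
F = [fd("A", "B"), fd("B", "C")]
good = [frozenset("AB"), frozenset("BC")]
bad = [frozenset("AB"), frozenset("AC")]

print(is_attribute_preserving("ABC", good), is_attribute_preserving("ABC", bad))
print([is_fd_preserved(l, r, good, F) for l, r in F])
print([is_fd_preserved(l, r, bad, F) for l, r in F])
```

Both decompositions preserve attributes, but {AB, AC} loses B → C, even though it is lossless (the common attribute A determines everything).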
Lossless Decomposition:
As already stated, a decomposition should guarantee r = r1 * r2. When it does, it is called a
“lossless-join decomposition”, or simply a “lossless decomposition”.
Let us say R(A1, A2, A3, A4, A5, A6) is decomposed into R1(A1, A2, A3, A4) and
R2(A4, A5, A6). If r is a relation instance of R, then the instances of R1 and R2 are computed as
follows:
$r_1 = \pi_{A_1,A_2,A_3,A_4}(r)$, and
$r_2 = \pi_{A_4,A_5,A_6}(r)$
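For example, the XIT relation, with FDs studid → name, studid → progid, studid → cpi,
progid → pname, progid → intake, progid → did, and did → dname, is decomposed through the projections
$\pi_{studid,name,progid,cpi}(r_{xit})$, $\pi_{progid,pname,intake,did}(r_{xit})$, and $\pi_{did,dname}(r_{xit})$.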
It is called “lossless” because the natural join of the decomposed relations results in the original
relation; that is, there is no loss of “information” and no spurious tuples are generated.
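Projection and natural join can be tried on a small instance. This is a sketch under stated assumptions: a tuple is stored as a frozenset of (attribute, value) pairs, `row`, `project`, and `natural_join` are hypothetical helpers, and the three student rows are invented data. Splitting on progid (which determines pname) gives r back exactly, while splitting on name produces spurious tuples.

```python
def row(**values):
    """One tuple of a relation instance, stored as a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())

def project(rel, attrs):
    """pi_attrs(rel): keep only attrs; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """r1 * r2: combine every pair of tuples that agree on the common attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(t1 | t2)
    return result

# Hypothetical instance: progid -> pname holds, name -> progid does not.
r = {
    row(studid=1, name="Ann", progid="P1", pname="BTech"),
    row(studid=2, name="Bob", progid="P1", pname="BTech"),
    row(studid=3, name="Ann", progid="P2", pname="MTech"),
}

# Common attribute progid determines R2: lossless.
good = natural_join(project(r, {"studid", "name", "progid"}),
                    project(r, {"progid", "pname"}))
# Common attribute name determines neither side: spurious tuples appear.
bad = natural_join(project(r, {"studid", "name"}),
                   project(r, {"name", "progid", "pname"}))
print(good == r, len(bad))
```

Here `good == r` holds, while `bad` has 5 tuples instead of 3. In general the join always contains r; a lossy decomposition adds tuples rather than removing them.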
PNO → PNAME
PNAME → PNO
{PNO, ENO} → HOURS
{PNAME, ENO} → HOURS
Keys:
{PNO, ENO},
{PNAME, ENO}
Correct decompositions are:
R1(PNO, PNAME), R2(PNO, ENO, HOURS) OR
R1(PNO, PNAME), R2(PNAME, ENO, HOURS)
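For a decomposition into exactly two relations there is a well-known test that needs only the FDs: R1, R2 is lossless if and only if the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below applies it to the WORKS_ON decompositions; the helper names and the third, deliberately bad, decomposition are our own illustration.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_lossless_binary(r1, r2, fds):
    """R -> {R1, R2} is lossless iff (R1 & R2)+ contains all of R1 or all of R2."""
    r1, r2 = frozenset(r1), frozenset(r2)
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus

def fd(lhs, rhs):
    """Hypothetical helper: fd("PNO ENO", "HOURS") -> (frozenset, frozenset)."""
    return frozenset(lhs.split()), frozenset(rhs.split())

F = [fd("PNO", "PNAME"), fd("PNAME", "PNO"),
     fd("PNO ENO", "HOURS"), fd("PNAME ENO", "HOURS")]

print(is_lossless_binary({"PNO", "PNAME"}, {"PNO", "ENO", "HOURS"}, F))
print(is_lossless_binary({"PNO", "PNAME"}, {"PNAME", "ENO", "HOURS"}, F))
print(is_lossless_binary({"ENO", "HOURS"}, {"ENO", "PNO", "PNAME"}, F))
```

The first two print True (the common attribute PNO or PNAME determines R1); the third prints False, since ENO alone determines neither side.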
Suppose that, after one step of BCNF decomposition, we have R1(ENO, FNAME) and R2(ENO, PNO, PNAME,
HOURS), with projected FDs F1 = {ENO → FNAME} and F2 = {{ENO, PNO} → HOURS, PNO → PNAME}.
We can prove that R1 is now in BCNF, but R2 is not. Therefore, we apply the algorithm again on R2:
the FD that violates the requirement is PNO → PNAME. Computing the closure of PNO gives
{PNO, PNAME}, so we decompose R2 into R21(PNO, PNAME) and R22(ENO, PNO, HOURS).
The projected FDs are F21 = {PNO → PNAME} and F22 = {{ENO, PNO} → HOURS}.
Can you prove that R21 and R22 are now both in BCNF?
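The repeated splitting can be written as a short loop. This is a minimal sketch, not necessarily the exact algorithm of the course: it finds a violating left-hand side X by brute force using the closure under the original FDs (equivalent to using the projected FDs), then replaces the relation with X+ (restricted to the relation) and with X plus the attributes X does not determine. We assume the starting relation was R(ENO, FNAME, PNO, PNAME, HOURS), which the text does not show; all helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_violation(rel, fds):
    """Return (X, X+ & rel) for some X that determines extra attributes of rel
    without being a superkey of rel, or None if rel is in BCNF."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            lhs = frozenset(combo)
            lhs_plus = closure(lhs, fds) & rel
            if lhs_plus != lhs and lhs_plus != rel:
                return lhs, lhs_plus
    return None

def bcnf_decompose(schema, fds):
    done, todo = set(), [frozenset(schema)]
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.add(rel)
        else:
            lhs, lhs_plus = violation
            # Split into X+ and X plus everything X does not determine; X links them losslessly.
            todo.append(lhs_plus)
            todo.append((rel - lhs_plus) | lhs)
    return done

def fd(lhs, rhs):
    """Hypothetical helper: fd("ENO PNO", "HOURS") -> (frozenset, frozenset)."""
    return frozenset(lhs.split()), frozenset(rhs.split())

F = [fd("ENO", "FNAME"), fd("ENO PNO", "HOURS"), fd("PNO", "PNAME")]
result = bcnf_decompose({"ENO", "FNAME", "PNO", "PNAME", "HOURS"}, F)
for rel in sorted(result, key=sorted):
    print(sorted(rel))
```

It ends with {ENO, FNAME}, {PNO, PNAME}, and {ENO, PNO, HOURS}, matching R1, R21, and R22; here the result is the same whichever violation is split first.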
Example: FDs A → B, A → C, C → D, C → E; Key: AF.
3NF synthesis gives R1(ABC), R2(CDE), R3(AF).
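In the 3NF synthesis algorithm, we may sometimes get a relation that is a subset of another. In such
a case, the subset relation can be dropped. For example, consider the following:
FDs: {A, B} → C, C → B, A → D; Key: ABE
Synthesized relations: R1(ABC), R2(CB), R3(AD), R4(ABE)
Here R2(CB) is a subset of R1(ABC), so R2 can be dropped.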
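The 3NF synthesis steps used in these examples can be sketched as follows, assuming the FD set passed in is already a minimal cover: create one relation per distinct left-hand side, add a relation holding a key if no relation contains one, and drop relations that are subsets of others. The key chosen is simply the first minimal one found by a brute-force search, so with several keys the added relation may differ from a hand-chosen one. All names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def first_candidate_key(schema, fds):
    """Smallest (then alphabetically first) attribute set whose closure is the schema."""
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if closure(combo, fds) >= schema:
                return frozenset(combo)

def synthesize_3nf(schema, min_cover):
    """3NF synthesis, assuming min_cover is already a minimal cover of the FDs."""
    schema = frozenset(schema)
    groups = {}
    for lhs, rhs in min_cover:  # one relation per distinct left-hand side
        groups.setdefault(lhs, set()).update(lhs | rhs)
    relations = {frozenset(attrs) for attrs in groups.values()}
    # If no relation contains a key of R, add one relation holding a key.
    if not any(closure(rel, min_cover) >= schema for rel in relations):
        relations.add(first_candidate_key(schema, min_cover))
    # Drop any relation whose attributes are a proper subset of another relation.
    return {rel for rel in relations if not any(rel < other for other in relations)}

def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") -> (frozenset, frozenset)."""
    return frozenset(lhs.split()), frozenset(rhs.split())

ex1 = synthesize_3nf("ABCDEF", [fd("A", "B"), fd("A", "C"), fd("C", "D"), fd("C", "E")])
ex2 = synthesize_3nf("ABCDE", [fd("A B", "C"), fd("C", "B"), fd("A", "D")])
print(sorted("".join(sorted(r)) for r in ex1))
print(sorted("".join(sorted(r)) for r in ex2))
```

Both examples reproduce the text: {ABC, CDE, AF} and {ABC, AD, ABE}, with R2(CB) dropped in the second.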
If so, consider the following questions:
How do you delete a phone number?
o Do we set the attribute to null in the corresponding tuple? We cannot, because the
attribute is part of the key!
How do you add a new number?
o Do we update one of the existing tuples (if some value is null)?
How do you add a person who does not have an email?
The problem is partially solved by making the attributes independent of each other, which requires
the following tuples to record the same facts:
To add a phone number, we must add a phone-number tuple for every email of the user, and the
updated relation becomes as follows:
Moreover, you still cannot add a person who has neither an email nor a phone.
There are still problems.
Hopefully, you can see the redundancies here, and hence the anomalies.
This motivates a higher normal form!
Basically, the problem is due to a phenomenon called multivalued dependency. Here:
A person (given UID) has multiple phone numbers.
A person (given UID) has multiple emails.
You should be able to relate this to functional dependencies: if, for a given value of attribute
X, a single value of Y is retrieved from the database, we say that the FD “X → Y” holds; if we get a
set of values of Y, and that set does not depend on the values of the remaining attributes, then it is
a multivalued dependency, or MVD, represented as X -->> Y.
In our example, the relation R(UID, email, phone), with the MVDs UID -->> email and
UID -->> phone, is not in 4NF because:
• the MVDs are not trivial, and
• the MVDs are not FDs, and UID is not a key of R.
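Whether the current data violates an MVD such as UID -->> email can be checked directly from the MVD definition: whenever two tuples agree on X, the tuple combining X and Y from the first with the remaining attributes of the second must also be present. This is a hypothetical sketch with invented data; note that an instance can only show that an MVD is violated, while whether it should hold is a design decision about the schema.

```python
from itertools import product

def mvd_holds(rows, attrs, x, y):
    """Instance-level test of X ->> Y: whenever t1, t2 agree on X, the tuple taking
    X and Y from t1 and every remaining attribute from t2 must also be in rows."""
    present = set(rows)
    pos = {a: i for i, a in enumerate(attrs)}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[pos[a]] == t2[pos[a]] for a in x):
            t3 = tuple(t1[pos[a]] if a in x or a in y else t2[pos[a]] for a in attrs)
            if t3 not in present:
                return False
    return True

ATTRS = ("UID", "email", "phone")
# Hypothetical data: person 1 has two emails and two phones, stored as all combinations.
all_combinations = [(1, "a@x.org", "111"), (1, "a@x.org", "222"),
                    (1, "b@x.org", "111"), (1, "b@x.org", "222")]
# The same person with two of those tuples missing.
partial = [(1, "a@x.org", "111"), (1, "b@x.org", "222")]

print(mvd_holds(all_combinations, ATTRS, ["UID"], ["email"]))
print(mvd_holds(partial, ATTRS, ["UID"], ["email"]))
```

The first call prints True and the second False: to satisfy UID -->> email, every email of a person must be paired with every phone of that person, which is exactly the redundancy described above.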
Using the above algorithm, R can be decomposed into R1(UID, email) and R2(UID, phone).
Note the following:
In real cases, an MVD is often not a problem, as it most likely gets addressed through other FDs.
Consider, for example, a relation with attributes DNO, DNAME, DLocation, MGR_ENO; it has the
following FDs and MVDs:
DNO → {DNAME, MGR_ENO}
DNO -->> DLocation
If we ignore the MVD and decompose the relation using the BCNF decomposition
algorithm, we get the following:
R1(DNO, DNAME, MGR_ENO); FDs: {DNO → DNAME, DNO → MGR_ENO}
R2(DNO, DLocation); FDs: none; MVD: DNO -->> DLocation
o Now we look at the MVD projected on R2: it is trivial (DNO and DLocation together make up
all of R2), so it has already been taken care of.
o Both relations are in 4NF.
Consider a relation R(ENO, PNO, HOUR). Do you see the MVD
ENO -->> {PNO, HOUR}?
Now consider a relation R(ENO, PNO, HOUR, DEP_NAME, DEP_RELATION):
MVDs
ENO -->> {PNO, HOUR}
ENO -->> {DEP_NAME, DEP_RELATION}
FDs
{ENO, PNO} → HOUR
{ENO, DEP_NAME} → DEP_RELATION
Is it in BCNF? No.