UNIT-IV:
• SCHEMA REFINEMENT (NORMALIZATION):
1) Purpose of normalization or schema refinement.
2) Concept of functional dependency.
3) Normal forms based on functional dependency (1NF, 2NF and 3NF).
4) Concept of surrogate key, Boyce-Codd normal form (BCNF).
5) Lossless join and dependency preserving decomposition.
6) Fourth normal form (4NF).
7) Fifth normal form (5NF).
1) Purpose of Normalization or schema
refinement.
NORMALIZATION:- Normalization is a schema refinement process. It helps remove
anomalies during insert, update and delete operations.
Definition: “Normalization is a process of decomposing relations to produce
smaller and well-defined relations”.
NEED OF NORMALIZATION:-
Normalization is a refinement approach based on decomposition.
Decomposition eliminates redundant storage and maintains data consistency.
Redundant storage of information is the root cause of the problems described below.
PROBLEMS CAUSED BY REDUNDANCY:-
Storing information redundantly, that is, in more than one place within a
database, can lead to several problems.
The following table, which is in unnormalized form, gives a brief idea of the
anomalies that can occur.
STUDENT DETAILS:
S.NO  S.NAME    SUBJECT  MARKS  GRADE
1     VIKRANTH  DBMS     80     A+
2     ABHISHEK  CO       60     B
3     PRAVEEN   JAVA     70     A
4     ABHISHEK  FLAT     60     B
5     ABHISHEK  ES       15     F
6     PRAVEEN   DAA      70     A
REDUNDANT STORAGE:
Information that is stored repeatedly is known as redundant storage.
In the above table, rows 2 and 4 both store marks 60 with grade B, and rows 3
and 6 both store marks 70 with grade A, so the grade for each of these marks
is stored twice. This is an example of redundant storage.
UPDATE ANOMALIES:
Definition: “If one copy of such repeated data is updated, an inconsistency is
created unless all copies are similarly updated.”
If we change the grade for 60 marks in the record with S.NO 2, we must make
the same change in the record with S.NO 4; otherwise the data is left in an
inconsistent state.
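INSERTION ANOMALIES: “It may not be possible to store certain information
unless some other, unrelated, information is stored as well.”
For example, the grade awarded for a given mark cannot be recorded in the
above table until some student actually obtains that mark.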
DELETION ANOMALIES:
“It may not be possible to delete certain information without losing some
other, unrelated, information as well.”
If we delete the record with S.NO 5, the information that marks 15 earn grade F
is also lost, because it is not repeated in any other record.
DECOMPOSITION: “A decomposition of a relation schema R consists of replacing
the relation schema by two (or more) relation schemas that each contain a
subset of the attributes of R and together include all attributes in R”.
We can decompose the above STUDENT_DETAILS relation into two relations:
STUDENT_DETAILS (SNO, SNAME, SUBJECT, MARKS, GRADE)
1. Student (sno, sname, subject, marks)
2. Marks (marks, grade)
STUDENT:
S.NO  S.NAME    SUBJECT  MARKS
1     VIKRANTH  DBMS     80
2     ABHISHEK  CO       60
3     PRAVEEN   JAVA     70
4     ABHISHEK  FLAT     60
5     ABHISHEK  ES       15
6     PRAVEEN   DAA      70
MARKS:
MARKS  GRADE
80     A
70     B
60     C
15     F
Now, when we need to change the grade for marks 60 from C to B, it is enough
to update it once, in the Marks table.
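The effect of this decomposition can be sketched in Python. This is only an illustration, not part of the original notes: it uses the rows of the unnormalized STUDENT_DETAILS table, and the names `student`, `marks_table` and `rejoin` are made up for the sketch.

```python
# Rows of the unnormalized STUDENT_DETAILS table: (sno, sname, subject, marks, grade).
student_details = [
    (1, "VIKRANTH", "DBMS", 80, "A+"),
    (2, "ABHISHEK", "CO", 60, "B"),
    (3, "PRAVEEN", "JAVA", 70, "A"),
    (4, "ABHISHEK", "FLAT", 60, "B"),
    (5, "ABHISHEK", "ES", 15, "F"),
    (6, "PRAVEEN", "DAA", 70, "A"),
]

# Projection onto (sno, sname, subject, marks).
student = [(sno, name, subj, marks) for sno, name, subj, marks, _ in student_details]

# Projection onto (marks, grade); the dict stores each mark's grade exactly once.
marks_table = {marks: grade for _, _, _, marks, grade in student_details}


def rejoin(student_rows, grades):
    """Join Student with Marks on the common attribute 'marks'."""
    return [(sno, name, subj, m, grades[m]) for sno, name, subj, m in student_rows]


# Joining the two projections gives back the original rows.
assert rejoin(student, marks_table) == student_details

# A single update in the Marks table changes the grade for every student with 60 marks.
marks_table[60] = "C"
updated = rejoin(student, marks_table)
```

After the single assignment to `marks_table[60]`, both rows with 60 marks show the new grade, so no two copies can disagree.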
2) CONCEPT OF FUNCTIONAL DEPENDENCY:
A functional dependency (FD) is a kind of integrity constraint (IC) that
generalizes the concept of a key.
Definition: “Let R be a relation schema and let X and Y be nonempty sets of
attributes in R. We say that an instance r of R satisfies the FD X → Y if the
following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y”.
An FD X → Y essentially says that if two tuples agree on the values in
attributes X, they must also agree on the values in attributes Y.
EXAMPLE: relation R
      A   B   C   D
t1    a1  b1  c1  d1
t2    a1  b1  c1  d2
t3    a1  b2  c2  d1
t4    a2  b1  c3  d1
Relation R satisfies the FD AB → C.
EXAMPLE: relation R
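A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d1
a2  b1  c3  d1
a1  b1  c2  d1
If we add the tuple (a1, b1, c2, d1) to the instance, the resulting instance
violates the FD: the new tuple agrees with the first two tuples on A and B
but differs from them on C.
The above relation R therefore violates the FD AB → C.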
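The two examples above can be checked mechanically. The following is a minimal Python sketch, assuming each tuple is stored as a dictionary; the function name `satisfies_fd` is hypothetical and chosen only for this illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-value, different Y-value
    return True


# Instance of R from the first example (tuples t1 to t4).
R = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
]
print(satisfies_fd(R, ["A", "B"], ["C"]))      # True

# Adding the tuple (a1, b1, c2, d1) breaks AB -> C.
R_bad = R + [{"A": "a1", "B": "b1", "C": "c2", "D": "d1"}]
print(satisfies_fd(R_bad, ["A", "B"], ["C"]))  # False
```

Note that such a check can only show that one instance satisfies or violates an FD; whether the FD holds for the schema is a statement about all legal instances.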
X → Y is read as “X functionally determines Y”, or simply as “X determines Y”.
A primary key constraint is a special case of an FD: the attributes in the key
play the role of X, and the set of all attributes in the relation plays the
role of Y.
A classic example of functional dependency is the employee-department table.
Employee ID  Employee name  Department ID  Department name
0001         Abhinav        1              Human Resources
0002         Suresh         2              Marketing
0003         Chandra        1              Human Resources
0004         Dinesh         3              Sales
An employee can only be a member of one department, so the unique ID of that
employee determines the department.
Employee ID → Employee Name
Employee ID → Department ID
In addition to these, the table also has a functional dependency through a
non-key attribute:
Department ID → Department Name
EXAMPLE
Employee number  Employee name  Salary  City
1                Raju           50000   San Francisco
2                Ravi           38000   London
3                Kiran          25000   Tokyo
Employee number → Employee name
Employee number → Salary
Employee number → City
• NORMALIZATION
Normalization is used to minimize redundancy in a relation or set of relations.
It is also used to eliminate undesirable characteristics such as insertion,
update and deletion anomalies.
Normalization divides a larger table into smaller tables and links them using
relationships.
Normal forms are used to reduce redundancy in database tables.
Types of Normal Forms
• 1. First Normal Form (1NF)
• A relation violates first normal form if it contains a composite or
multi-valued attribute; a relation is in first normal form if it does not
contain any composite or multi-valued attribute.
Definition: “A relation is in 1NF if every attribute contains only atomic
values.”
(or)
“A relation is in first normal form if every attribute in that relation is a
single-valued attribute”.
An attribute of a table cannot hold multiple values; it must hold only a
single value.
First normal form disallows multi-valued attributes, composite attributes, and
their combinations.
Single Valued Attributes-
Single-valued attributes are attributes that can take only one specific value
for each entity.
Multi Valued Attributes-
Multi-valued attributes are attributes that can take more than one value for a
given entity from an entity set.
For example, the attributes “Mob_no” and “Email_id” are multi-valued, as they
can take more than one value for a given entity.
Example 1 – Relation STUDENT in table 1 is not
in 1NF because of multi-valued attribute
STUD_PHONE. Its decomposition into 1NF has
been shown in table 2.
EXAMPLE-2
Student-id  Name          Courses
001         Suresh        C1, C2
002         Swathi        C1
003         Ramesh        C2, C3
004         Ravikiran     C3
005         Ratnakishore  C2, C3
In the above table, Courses is a multi-valued attribute, so the table is not
in 1NF.
The table below is in 1NF, as there is no multi-valued attribute:
Student-id  Name          Courses
001         Suresh        C1
001         Suresh        C2
002         Swathi        C1
003         Ramesh        C2
003         Ramesh        C3
004         Ravikiran     C3
005         Ratnakishore  C2
005         Ratnakishore  C3
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab
The table below is in 1NF, as there is no multi-valued attribute:
EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
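The conversion to 1NF shown above can be sketched in Python. This is an illustrative sketch only: it assumes the multi-valued EMP_PHONE column is stored as one comma-separated string, and the helper name `to_1nf` is hypothetical.

```python
# EMPLOYEE rows with a multi-valued EMP_PHONE column (comma-separated string).
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_1nf(rows, multi_col):
    """Emit one row per value of the multi-valued column at index multi_col."""
    result = []
    for row in rows:
        for value in row[multi_col].split(","):
            new_row = list(row)
            new_row[multi_col] = value.strip()  # each cell now holds one atomic value
            result.append(tuple(new_row))
    return result


employee_1nf = to_1nf(employee, 2)
for row in employee_1nf:
    print(row)
```

Every row of `employee_1nf` holds exactly one phone number, at the cost of repeating EMP_NAME and EMP_STATE, which is the redundancy the later normal forms address.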
Second Normal Form (2NF)
• For 2NF, the relation must be in 1NF.
• All non-key attributes must be fully functionally dependent on the primary
key.
Definition: “A relation schema R is said to be in 2NF if it is in 1NF and all
non-key attributes are fully functionally dependent on the primary key”
FULL FUNCTIONAL DEPENDENCY
A set of attributes Y is fully functionally dependent on a set of attributes X
if the following conditions are satisfied:
i) Y is functionally dependent on X, and
ii) Y is not functionally dependent on any proper subset of X.
Example (Table violates 2NF)
<StudentProject>
StudentID  ProjectID  StudentName  ProjectName
S89        P09        Siva         Geo Location
S76        P07        Jacob        Cluster Exploration
S56        P03        Kishore      IoT Devices
S92        P05        Ramesh       Cloud Deployment
• The above table has partial dependencies; let us see how.
• The prime (key) attributes are StudentID and ProjectID.
• A non-prime attribute is partially dependent if it is functionally dependent
on only part of a candidate key. Here the non-prime attributes are
StudentName and ProjectName.
• StudentName can be determined by StudentID alone, which is a partial
dependency.
• ProjectName can be determined by ProjectID alone, which is also a partial
dependency.
• Therefore, the <StudentProject> relation violates 2NF.
• Example (Table converted to 2NF)
• To remove the partial dependencies and the violation of 2NF, decompose the
above table into <StudentInfo> and <ProjectInfo>. A third relation that holds
only the key attributes (StudentID, ProjectID) is also needed to record which
student works on which project.
<StudentInfo>
StudentID  StudentName
S89        Siva
S76        Jacob
S56        Kishore
S92        Ramesh
<ProjectInfo>
ProjectID  ProjectName
P09        Geo Location
P07        Cluster Exploration
P03        IoT Devices
P05        Cloud Deployment
Now the relations are in second normal form.
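Partial dependencies such as the ones in <StudentProject> can be found by computing attribute closures of proper subsets of the key. The sketch below is one possible way to do this in Python, not a method given in these notes; the helper names `closure` and `partial_dependencies` are hypothetical, and FDs are written as (left side, right side) tuples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """List (key_subset, attribute) pairs where a proper subset of the key
    determines a non-prime attribute, i.e. a 2NF violation."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = (closure(subset, fds) - set(subset)) & set(non_prime)
            for attr in sorted(determined):
                found.append((subset, attr))
    return found


# FDs of <StudentProject> as stated in the example.
fds = [
    (("StudentID",), ("StudentName",)),
    (("ProjectID",), ("ProjectName",)),
]
key = {"StudentID", "ProjectID"}
non_prime = {"StudentName", "ProjectName"}

violations = partial_dependencies(key, non_prime, fds)
print(violations)
# [(('ProjectID',), 'ProjectName'), (('StudentID',), 'StudentName')]
```

Each reported pair corresponds to one of the smaller relations produced by the 2NF decomposition.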
Third Normal Form (3NF)
Definition: “A relation is in 3NF if it is in 2NF and does not contain any
transitive dependency”.
• 3NF is used to reduce data duplication. It is also used to achieve data
integrity.
• If there is no transitive dependency for non-prime attributes, then the
relation is in third normal form.
Transitive dependency: if the FDs X → Y and Y → Z hold, then X → Z also holds,
and Z depends on X transitively through Y.
Example:
EMPLOYEE_DETAIL table:
EMP_ID  EMP_NAME   EMP_PIN  EMP_STATE  EMP_CITY
222     Harry      201010   AP         VIJAYAWADA
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal
• Candidate key: {EMP_ID}
• Non-prime attributes: in the given table, all attributes except EMP_ID are
non-prime.
• Here, EMP_STATE and EMP_CITY depend on EMP_PIN, and EMP_PIN depends on
EMP_ID.
• The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the candidate key (EMP_ID). This violates the rule of third
normal form.
SOLUTION:
We therefore move EMP_CITY and EMP_STATE to a new <EMPLOYEE_PIN> table, with
EMP_PIN as its primary key.
EMPLOYEE table:
EMP_ID  EMP_NAME   EMP_PIN
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007
EMPLOYEE_PIN table:
EMP_PIN  EMP_STATE  EMP_CITY
201010   AP         VIJAYAWADA
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
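The 3NF test can also be phrased in terms of FDs: an FD X → A violates 3NF when X is not a super key and A is a non-prime attribute. The following Python sketch applies that test to the EMPLOYEE_DETAIL example. It is an illustration under assumptions: the helper names are hypothetical, and the FD EMP_ID → EMP_NAME, EMP_PIN is assumed from EMP_ID being the candidate key.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attributes, candidate_keys, fds):
    """Return (lhs, attribute) pairs where lhs is not a super key and the
    attribute is non-prime and not already part of lhs."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(attributes)
        for attr in rhs:
            if attr in lhs or lhs_is_superkey or attr in prime:
                continue
            bad.append((lhs, attr))
    return bad


attributes = ["EMP_ID", "EMP_NAME", "EMP_PIN", "EMP_STATE", "EMP_CITY"]
fds = [
    (("EMP_ID",), ("EMP_NAME", "EMP_PIN")),     # assumed: EMP_ID is the key
    (("EMP_PIN",), ("EMP_STATE", "EMP_CITY")),  # stated in the example
]

print(violations_3nf(attributes, [{"EMP_ID"}], fds))
# [(('EMP_PIN',), 'EMP_STATE'), (('EMP_PIN',), 'EMP_CITY')]

# After the decomposition, neither table has a violation.
employee_ok = violations_3nf(["EMP_ID", "EMP_NAME", "EMP_PIN"], [{"EMP_ID"}], fds[:1])
pin_ok = violations_3nf(["EMP_PIN", "EMP_STATE", "EMP_CITY"], [{"EMP_PIN"}], fds[1:])
print(employee_ok, pin_ok)  # [] []
```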
BOYCE CODD NORMAL FORM (BCNF):
BCNF is the advanced version of 3NF. It is stricter than 3NF.
It is also known as 3.5 Normal Form.
DEFINITION: “A table is in BCNF if, for every non-trivial functional
dependency X → Y, X is a super key of the table”.
BCNF should satisfy the following two conditions:
1) The table should be in Third Normal Form.
2) For every non-trivial dependency X → Y, X should be a super key.
A super key is a set of attributes within a table whose values can be used to
uniquely identify a tuple.
A candidate key is a minimal set of attributes necessary to identify a tuple;
this is also called a minimal super key.
Example: Let's assume there is a company where
employees work in more than one department.
EMPLOYEE table:
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549
In the above table, the functional dependencies are as follows:
1) EMP_ID → EMP_COUNTRY
2) EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
NOTE: The table is not in BCNF, because neither EMP_ID nor EMP_DEPT alone is a
super key. Likewise, if the FD EMP_DEPT_NO → EMP_DEPT holds, it violates BCNF
because EMP_DEPT_NO is not a super key.
To convert the given table into BCNF, we decompose it into the two tables
below. A third table holding only (EMP_ID, EMP_DEPT) is also needed to record
which employee works in which department.
EMP_COUNTRY table:
EMP_ID  EMP_COUNTRY
264     India
364     UK
EMP_DEPT table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
These tables are now in BCNF because the left-hand side of each functional
dependency is a key of its table.
Boyce-Codd Normal Form is a stronger generalization of third normal form. A
table is in Boyce-Codd Normal Form if and only if at least one of the
following conditions is met for each functional dependency X → Y:
• X is a super key.
• X → Y is a trivial functional dependency.
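The two BCNF conditions translate directly into a check over the FD set. Below is a minimal Python sketch of such a check, applied to the EMPLOYEE example; the helper names `closure` and `bcnf_violations` are hypothetical and introduced only for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the FDs X -> Y that are neither trivial nor have a super key X."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= set(attributes)
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


fd_country = (("EMP_ID",), ("EMP_COUNTRY",))
fd_dept = (("EMP_DEPT",), ("DEPT_TYPE", "EMP_DEPT_NO"))
employee_attrs = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]

# Both FDs violate BCNF in the original EMPLOYEE table.
print(bcnf_violations(employee_attrs, [fd_country, fd_dept]))

# Each decomposed table satisfies BCNF with respect to its own FD.
print(bcnf_violations(["EMP_ID", "EMP_COUNTRY"], [fd_country]))               # []
print(bcnf_violations(["EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"], [fd_dept]))  # []
```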
To determine the highest normal form of a given relation R with functional
dependencies, the first step is to check whether the BCNF condition holds.
If R is found to be in BCNF, it can be safely deduced that the relation is
also in 3NF, 2NF and 1NF, since each higher normal form implies the lower
ones.
1NF has the least restrictive constraint: it only requires a relation R to
have atomic values in each tuple.
3NF has a more restrictive constraint than the first two normal forms but is
less restrictive than BCNF. In this manner, the restriction increases as we
move from 1NF towards BCNF.
LOSSLESS JOIN AND DEPENDENCY PRESERVING
DECOMPOSITION.
• A relation in the relational model is decomposed when it is not in the
appropriate normal form.
• Relation R is decomposed into two or more relations such that the
decomposition is lossless join as well as dependency preserving.
Lossless Join Decomposition
• If we decompose a relation R into relations R1
and R2,
• Decomposition is lossy if R1 ⋈ R2 ⊃ R
• Decomposition is lossless if R1 ⋈ R2 = R
To check for lossless join decomposition using the FD set, the following
conditions must hold:
1) The union of the attributes of R1 and R2 must be equal to the attributes of
R. Each attribute of R must be in either R1 or R2.
Att(R1) U Att(R2) = Att(R)
2) The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3) The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1)
or
Att(R1) ∩ Att(R2) -> Att(R2)
FOR EXAMPLE, A relation R (A, B, C, D) with
FD set{A->BC} is decomposed into R1(ABC)
and R2(AD) which is a lossless join
decomposition as:
1) First condition holds true as Att(R1) U Att(R2)
= (ABC) U (AD) = (ABCD) = Att(R).
2) Second condition holds true as Att(R1) ∩
Att(R2) = (ABC) ∩ (AD) ≠ Φ
3) Third condition holds true as Att(R1) ∩ Att(R2)
= A is a key of R1(ABC) because A->BC is given.
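The three conditions can be checked with an attribute closure. The following Python sketch does so for a decomposition into two relations; it assumes single-letter attribute names written as strings such as "ABC", and the helper names `closure` and `is_lossless` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply the three lossless-join conditions to a binary decomposition."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:          # condition 1: union covers all attributes of R
        return False
    common = r1 & r2
    if not common:            # condition 2: intersection is not empty
        return False
    determined = closure(common, fds)
    # condition 3: the common attributes determine all of R1 or all of R2
    return r1 <= determined or r2 <= determined


# R(A, B, C, D) with A -> BC, decomposed into R1(ABC) and R2(AD).
print(is_lossless("ABCD", "ABC", "AD", [("A", "BC")]))  # True
```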
DEPENDENCY PRESERVING DECOMPOSITION
If we decompose a relation R into relations R1 and R2, every dependency of R
must either be part of R1 or R2 or be derivable from the combination of the
FDs of R1 and R2.
For Example: A relation R (A, B, C, D) with FD
set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is dependency preserving
because FD A->BC is a part of R1(ABC).
GATE Question: Consider a schema R(A,B,C,D)
and functional dependencies A->B and C->D.
Then the decomposition of R into R1(AB) and
R2(CD) is [GATE-CS-2001]
A. dependency preserving and lossless join
B. lossless join but not dependency preserving
C. dependency preserving but not lossless join
D. not dependency preserving and not lossless
join
• Answer: C. For lossless join decomposition, the three conditions above must
hold true:
1) Att(R1) U Att(R2) = ABCD = Att(R)
2) Att(R1) ∩ Att(R2) = Φ, which violates the second condition of lossless join
decomposition. Hence the decomposition is not lossless.
For dependency preserving decomposition, A->B can be ensured in R1(AB) and
C->D can be ensured in R2(CD). Hence it is a dependency preserving
decomposition.
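Dependency preservation can be tested without computing the projected FD sets explicitly: for each FD X → Y, grow X by repeatedly taking closures restricted to each decomposed relation, and check whether Y is reached. The Python sketch below follows that standard textbook procedure; it is not taken from these notes, the helper names are hypothetical, and attributes are assumed to be single letters.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every FD can be enforced using only the decomposed relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # attributes of ri that follow from what we already have in ri
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


# GATE question: R(A, B, C, D), A -> B and C -> D, decomposed into AB and CD.
print(preserves_dependencies(["AB", "CD"], [("A", "B"), ("C", "D")]))  # True

# Earlier example: R(A, B, C, D), A -> BC, decomposed into ABC and AD.
print(preserves_dependencies(["ABC", "AD"], [("A", "BC")]))            # True
```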
Fourth normal form(4NF).
Definition: A relation R is in 4NF if and only if the following conditions are
satisfied:
1) It should be in Boyce-Codd Normal Form (BCNF).
2) The table should not have any non-trivial multi-valued dependency (MVD)
whose left-hand side is not a super key.
• Conditions for MVD:
An attribute a multi-determines another attribute b (written a →→ b), where c
denotes the remaining attributes, if the following holds:
• In any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1[a] = t2[a], there exist tuples t3 and t4 in r such that
• t1[a] = t2[a] = t3[a] = t4[a]
• t1[b] = t3[b]; t2[b] = t4[b]
• t1[c] = t4[c]; t2[c] = t3[c]
Then the multivalued dependency (MVD) exists.
To check for an MVD in a given table, we apply the conditions stated above and
check them against the values in the table. Consider the following table, with
a = name, b = project and c = hobby:
      NAME   PROJECT  HOBBY
t1    Geeks  MS       Reading
t2    Geeks  Oracle   Music
t3    Geeks  MS       Music
t4    Geeks  Oracle   Reading
• Condition-1 for MVD:
• t1[a] = t2[a] = t3[a] = t4[a]. From the table, t1[a] = t2[a] = t3[a] = t4[a]
= Geeks. So condition 1 is satisfied.
• Condition-2 for MVD:
• t1[b] = t3[b] and t2[b] = t4[b]. From the table, t1[b] = t3[b] = MS and
t2[b] = t4[b] = Oracle. So condition 2 is satisfied.
• Condition-3 for MVD:
• t1[c] = t4[c] and t2[c] = t3[c]. From the table, t1[c] = t4[c] = Reading and
t2[c] = t3[c] = Music. So condition 3 is satisfied.
• All conditions are satisfied; therefore a →→ b holds, which for this table
means
• name →→ project
• Similarly, for a →→ c we get
• name →→ hobby
• Hence an MVD exists in the above table, and it can be stated as
• name →→ project | hobby
DECOMPOSE THE TABLE INTO MULTIPLE TABLES
NAME   PROJECT
Geeks  MS
Geeks  Oracle
NAME   HOBBY
Geeks  Reading
Geeks  Music
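The MVD conditions and the 4NF decomposition can be verified on the (name, project, hobby) tuples used in the check above. The following Python sketch is an illustration only; the function name `satisfies_mvd` is hypothetical and it handles just the three-attribute case (a, b, c).

```python
# Tuples (name, project, hobby) from the MVD example.
rows = {
    ("Geeks", "MS", "Reading"),
    ("Geeks", "Oracle", "Music"),
    ("Geeks", "MS", "Music"),
    ("Geeks", "Oracle", "Reading"),
}


def satisfies_mvd(triples):
    """Check a ->> b on (a, b, c) triples: for every pair t1, t2 with the same
    a-value, the tuple (a, t1.b, t2.c) must also be present."""
    triples = set(triples)
    for a1, b1, _ in triples:
        for a2, _, c2 in triples:
            if a1 == a2 and (a1, b1, c2) not in triples:
                return False
    return True


print(satisfies_mvd(rows))  # True: name ->> project holds

# 4NF decomposition: project onto (name, project) and (name, hobby).
name_project = {(n, p) for n, p, _ in rows}
name_hobby = {(n, h) for n, _, h in rows}

# Joining the two projections on name gives back exactly the original tuples.
rejoined = {(n, p, h) for n, p in name_project for n2, h in name_hobby if n == n2}
print(rejoined == rows)     # True
```

If one of the four tuples is removed, the MVD no longer holds and the join of the two projections produces a tuple that was not in the table.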
SURROGATE KEY
DEFINITION:
• A surrogate key is a unique identifier used in databases for a modelled
entity or an object.
• It is a unique key whose only significance is to act as the primary
identifier of an object or entity. It is not derived from any other data in
the database, and it may or may not be used as the primary key.
• The usual surrogate key is a unique sequential number.
• A surrogate key's only purpose is to be a unique identifier in a database,
for example an incremental key.
A surrogate key has no actual meaning and is used only to represent existence.
It exists only for data analysis.
Example
<ProductPrice>
Key ProductID Price
505_92 1987 200
698_56 1256 170
304_57 1898 250
458_66 1666 110
Above, the surrogate key is the Key column in the <ProductPrice> table.
Other Examples
• A counter can also serve as a surrogate key.
• System date/time stamp.
• Random alphanumeric string.
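A counter-based surrogate key can be sketched in a few lines of Python. This is only an illustration: it uses sequential numbers rather than the key values shown in the <ProductPrice> table, and the function name `add_surrogate_keys` is hypothetical.

```python
from itertools import count


def add_surrogate_keys(rows, start=1):
    """Prefix every row with a sequential number that carries no business meaning."""
    ids = count(start)
    return [(next(ids),) + tuple(row) for row in rows]


# (ProductID, Price) pairs from the <ProductPrice> example.
product_price = [(1987, 200), (1256, 170), (1898, 250), (1666, 110)]

keyed = add_surrogate_keys(product_price)
for row in keyed:
    print(row)  # first row: (1, 1987, 200)
```

The generated key is independent of ProductID and Price, so it stays valid even if those values change.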
FIFTH NORMAL FORM (5NF):
DEFINITION: A relation is in 5NF if it is in 4NF and contains no join
dependency other than those implied by its candidate keys, so that it cannot
be losslessly decomposed into smaller tables any further.
5NF is satisfied when all the tables are broken into as many tables as
possible in order to avoid redundancy.
5NF is also known as project-join normal form (PJ/NF).
A JOIN DEPENDENCY (JD) is said to exist if the join of R1 and R2 over C is
equal to the relation R, where R1(A, B, C) and R2(C, D) are the decompositions
of a given relation R(A, B, C, D).
Alternatively, R1 and R2 form a lossless decomposition of R.
Properties of 5NF:
A relation R is in 5NF if and only if it satisfies the following conditions:
1) R should be in 4NF (no multi-valued dependency exists).
2) It cannot undergo further lossless decomposition (no join dependency
exists).
Example:
Consider the relation R below having the
schema R(supplier, product, consumer).
The primary key is a combination of all three
attributes of the relation.
Table-1 supplier-product-consumer
supplier  product  consumer
S1        P1       C1
S1        P2       C1
S2        P1       C1
S3        P3       C3
Table-2 supplier-product
supplier  product
S1        P1
S1        P2
S2        P1
S3        P3
Table-3 supplier-consumer
supplier  consumer
S1        C1
S2        C1
S3        C3
Table-4 consumer-product
consumer  product
C1        P1
C1        P2
C3        P3
Explanation:
Table 2, Table 3 and Table 4, when joined, yield the original table (Table 1).
Hence a join dependency exists in Table 1, and therefore Table 1 is not in 5NF
(PJ/NF).
However, Table 2, Table 3 and Table 4 satisfy 5NF, as they have no multivalued
dependency and cannot be decomposed further (no join dependency exists).
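The claim that the three projections join back to Table 1 can be checked directly. The Python sketch below is an illustration using sets of tuples; the helper name `join3` is hypothetical.

```python
# Table 1: (supplier, product, consumer).
table1 = {
    ("S1", "P1", "C1"),
    ("S1", "P2", "C1"),
    ("S2", "P1", "C1"),
    ("S3", "P3", "C3"),
}

# Tables 2, 3 and 4 are the projections of Table 1.
supplier_product = {(s, p) for s, p, c in table1}
supplier_consumer = {(s, c) for s, p, c in table1}
consumer_product = {(c, p) for s, p, c in table1}


def join3(sp, sc, cp):
    """Natural join of the three binary tables on their shared attributes."""
    return {
        (s, p, c)
        for s, p in sp
        for s2, c in sc
        if s == s2 and (c, p) in cp
    }


# The join of the three projections reproduces Table 1 exactly.
print(join3(supplier_product, supplier_consumer, consumer_product) == table1)  # True
```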