UNIT-IV
Schema Refinement and Normalization
Introduction to Schema Refinement
• Schema refinement is the process of checking and improving the tables of a relational schema to remove redundancy and the anomalies it causes.
• It is the last step before physical design. The design steps are:
1) Requirement analysis: user needs
2) Conceptual design: high-level description, often using E/R diagrams
3) Logical design: from E/R diagrams to tables (relational schema)
4) Schema refinement: checking tables for redundancies and anomalies
▪ Data redundancy occurs when the same piece of data is stored in two or more separate places.
▪ This problem arises when a database is not normalized.
▪ Redundancy causes three problems: insertion anomaly, deletion anomaly, and update anomaly.
Example: STUDENT
In this table, student id and course together form the primary key.
• Insertion anomaly:
A student whose course has not yet been decided cannot be inserted, because course is part of the primary key and cannot be left empty.
• Deletion anomaly:
If the details of the students in this table are deleted, the details of the college are deleted with them.
• Update anomaly:
If the rank of the college changes, every row that stores the rank has to be changed, which is time-consuming and computationally costly. If some rows are missed, the data becomes inconsistent.
Functional Dependency:
▪ A functional dependency (FD) is a constraint that describes how one attribute (or set of attributes) determines another attribute in a relation.
▪ Functional dependencies help to maintain the quality of data in the database.
▪ Introduced by E. F. Codd, they help in preventing data redundancy and in identifying bad designs.
▪ A functional dependency is denoted by an arrow "→".
▪ The functional dependency of attribute Y on X is represented by X → Y: any two rows that agree on X must also agree on Y.
Types of Functional Dependency
Functional Dependency has three forms −
• Trivial Functional Dependency
• Non-Trivial Functional Dependency
• Completely Non-Trivial Functional
Dependency
Example table:
DeptId DeptName
001 Finance
002 Marketing
003 HR
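Trivial Functional Dependency
• A → B is trivial when B is a subset of A.
• Example: {DeptId, DeptName} → DeptId
Non-Trivial Functional Dependency
• A → B is non-trivial when B is not a subset of A.
• Example: DeptId → {DeptId, DeptName}
Completely Non-Trivial Functional Dependency
• A → B is completely non-trivial when A and B have no attribute in common (A ∩ B is empty).
• Example: DeptId → DeptName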
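The definitions above can be tried out on the DeptId/DeptName example table. The following is a minimal sketch, not a standard library API: the helper names `fd_holds` and `classify_fd` are hypothetical, and checking one table instance can only show that an FD is not violated by the current rows, not that it holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


def classify_fd(lhs, rhs):
    """Classify lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"
    if lhs & rhs:
        return "non-trivial"
    return "completely non-trivial"


# The example table from the slide.
departments = [
    {"DeptId": "001", "DeptName": "Finance"},
    {"DeptId": "002", "DeptName": "Marketing"},
    {"DeptId": "003", "DeptName": "HR"},
]

print(fd_holds(departments, ["DeptId"], ["DeptName"]))   # True
print(classify_fd(["DeptId", "DeptName"], ["DeptId"]))   # trivial
print(classify_fd(["DeptId"], ["DeptId", "DeptName"]))   # non-trivial
print(classify_fd(["DeptId"], ["DeptName"]))             # completely non-trivial
```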
Reasoning about FDs:
Armstrong’s Axioms
• Armstrong’s axioms were developed by William Armstrong in 1974 to reason about functional dependencies.
• They are inference rules: given a set of FDs that hold, each rule derives further FDs that must also hold.
• Reflexivity
If B is a subset of A, then A → B.
• Augmentation
If A → B, then AC → BC for any set of attributes C.
• Transitivity
If A → B and B → C, then A → C.
• Union (a rule derived from the three axioms above)
If A → B and A → C, then A → BC.
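In practice, these rules are usually applied through the attribute closure: the set of all attributes determined by a given set under a list of FDs. The sketch below is one common way to compute it; the function names `closure` and `implies` and the sample FDs A → B and B → C are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical FDs: A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(closure({"A"}, fds) == {"A", "B", "C"})   # True
print(implies(fds, {"A"}, {"C"}))               # True (transitivity)
print(implies(fds, {"A", "C"}, {"B", "C"}))     # True (augmentation)
print(implies(fds, {"A"}, {"B", "C"}))          # True (union)
print(implies(fds, {"C"}, {"A"}))               # False
```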
Normalization
• Normalization is the process of organizing the data in the database to avoid data redundancy and the insertion, update, and deletion anomalies.
• It was proposed by Edgar F. Codd as part of his relational model.
• If a relation (table) contains redundant data, it is normalized by decomposing it into smaller relations.
• It reduces redundancy and improves data integrity.
Types of Normal Forms
• There are six commonly used normal forms:
• 1NF: A relation is in 1NF if every attribute contains only atomic values.
• 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
• 3NF: A relation is in 3NF if it is in 2NF and no transitive dependency exists.
• BCNF: Boyce-Codd normal form. The relation should be in 3NF, and for every non-trivial dependency A → B, A should be a super key.
• 4NF: A relation is in 4NF if it is in BCNF and has no multi-valued dependency.
• 5NF: A relation is in 5NF if it is in 4NF and has no join dependency that would allow it to be decomposed further without loss.
First normal form (1NF)
• As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values.
Example:
• Example-1: STUDENT
Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its decomposition into 1NF is shown in table 2.
• Example-2:
ID   Name   Courses
------------------
1    A      c1, c2
2    E      c3
3    M      c2, c3
To convert this table to 1NF, store one course per row:
ID   Name   Course
------------------
1    A      c1
1    A      c2
2    E      c3
3    M      c2
3    M      c3
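The conversion in Example-2 can be sketched in a few lines. This assumes the multi-valued column is stored as a comma-separated string; the function name `to_first_normal_form` is hypothetical.

```python
def to_first_normal_form(rows):
    """Split the comma-separated Courses value into one row per course."""
    flat = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((student_id, name, course.strip()))
    return flat


# The unnormalized table from Example-2.
unnormalized = [
    (1, "A", "c1, c2"),
    (2, "E", "c3"),
    (3, "M", "c2, c3"),
]

for row in to_first_normal_form(unnormalized):
    print(row)
```

Each output row now holds exactly one course, so every attribute value is atomic.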
Second normal form (2NF)
A table is said to be in 2NF if both the following conditions hold:
• The table is in 1NF (first normal form).
• All non-key (non-prime) attributes are fully functionally dependent on the primary key, that is, no non-prime attribute depends on a proper subset of a candidate key.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher. The table looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate key: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values.
However, it is not in 2NF because the non-prime attribute teacher_age depends on teacher_id alone, which is a proper subset of the candidate key.
• To make the table comply with 2NF, we can decompose it into two tables like this:
teacher_details table:
teacher_id   teacher_age
111          38
222          38
333          40
Here teacher_id → teacher_age, and teacher_id is the key.
teacher_subject table:
teacher_id   subject
111          Maths
111          Physics
222          Biology
333          Physics
333          Chemistry
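The 2NF check and the decomposition above can be sketched as follows. The helper names (`fd_holds`, `partial_dependencies`, `project`) are hypothetical, and the check works on the sample rows only, so it shows which partial dependencies are consistent with this data rather than proving they hold in general.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, candidate_key, non_prime):
    """Find non-prime attributes determined by a proper subset of the key."""
    found = []
    for size in range(1, len(candidate_key)):
        for subset in combinations(candidate_key, size):
            for attr in non_prime:
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in result:
            result.append(t)
    return result


teachers = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 40},
]

# teacher_age depends on teacher_id alone, so the table is not in 2NF.
print(partial_dependencies(teachers, ("teacher_id", "subject"), ["teacher_age"]))

# The two projections are the teacher_details and teacher_subject tables.
teacher_details = project(teachers, ["teacher_id", "teacher_age"])
teacher_subject = project(teachers, ["teacher_id", "subject"])
print(teacher_details)
```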
Third Normal Form (3NF):
A relation is in third normal form if it is in second normal form and no non-prime attribute is transitively dependent on a candidate key.
• Equivalently, a relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).
Example:
EMPLOYEE_DETAILS table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
• Super keys in the table above: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
• Primary key: {EMP_ID}
• Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.
• Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID:
• EMP_ZIP → EMP_STATE, EMP_ZIP → EMP_CITY
• EMP_ID → EMP_ZIP
• The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the key EMP_ID. This violates the rule of third normal form.
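• To remove the transitive dependency, the table is decomposed into an EMPLOYEE table and an EMPLOYEE_ZIP table.
EMPLOYEE table: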
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMP_ID → EMP_NAME
EMP_ID → EMP_ZIP
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
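The 3NF condition (X is a super key, or Y is prime) can be checked mechanically once the FDs and candidate keys are written down. The sketch below assumes the FDs stated in this example and takes the candidate keys as given input; the function names are hypothetical, and only the listed FDs are examined.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """Return the listed FDs X -> Y where X is not a super key and Y is not prime."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # the non-trivial part of the FD
        is_superkey = closure(lhs, fds) >= set(attributes)
        if extra and not is_superkey and not extra <= prime:
            violations.append((lhs, rhs))
    return violations


emp_fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
employee_details = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}

# The original table: EMP_ZIP -> EMP_STATE, EMP_CITY is reported.
print(third_nf_violations(employee_details, emp_fds, [{"EMP_ID"}]))

# The decomposed tables: no violations are reported.
print(third_nf_violations({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, emp_fds[:1], [{"EMP_ID"}]))
print(third_nf_violations({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, emp_fds[1:], [{"EMP_ZIP"}]))
```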
BCNF:
▪ Boyce-Codd Normal Form (BCNF) is an extension of the third normal form, and is also known as 3.5 Normal Form.
• For a table to be in Boyce-Codd Normal Form, it should satisfy the following two conditions:
• It should be in the Third Normal Form.
• For every non-trivial dependency A → B, A should be a super key.
• Example:
• Below we have a college enrolment table with columns student_id, subject and professor.
• Candidate key (primary key): {student_id, subject}
student_id   subject   professor
101          Java      P.Java
101          C++       P.Cpp
102          Java      P.Java2
103          C#        P.Chash
104          Java      P.Java
• This table satisfies the 1st Normal Form because all the values are atomic, column names are unique and all the values stored in a particular column are of the same domain.
• This table also satisfies the 2nd Normal Form as there is no partial dependency.
• And there is no transitive dependency, hence the table also satisfies the 3rd Normal Form.
• But this table is not in Boyce-Codd Normal Form.
• In the table above, student_id and subject form the primary key, which means the subject column is a prime attribute.
• (student_id, subject) → professor
• But there is one more dependency, professor → subject.
• 3NF allows this dependency because subject is a prime attribute, but professor is not a super key, which is not allowed by BCNF.
• To make this relation (table) satisfy BCNF, we decompose it into two tables, a student table and a professor table.
• Below we have the structure for both the tables.
• Student Table
student_id   p_id
101          1
101          2
• Professor Table
p_id   professor   subject
1      P.Java      Java
2      P.Cpp       C++
In the Professor table, (p_id, professor) → subject, and its left-hand side is a super key. Hence this decomposition satisfies Boyce-Codd Normal Form.
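The BCNF rule ("for every FD, the left-hand side is a super key") can be tested on the original enrolment table with an attribute-closure check. This is a minimal sketch that examines only the two FDs stated in the example; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= set(attributes):
            violations.append((lhs, rhs))
    return violations


enrolment = {"student_id", "subject", "professor"}
enrolment_fds = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]

# Only professor -> subject is reported: professor is not a super key.
print(bcnf_violations(enrolment, enrolment_fds))
```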
Fourth Normal Form:
▪ Fourth Normal Form comes into the picture when a multi-valued dependency occurs in a relation.
▪ To satisfy the fourth normal form, the multi-valued dependency must be removed by decomposing the table.
Rules for 4th Normal Form
• For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
• It should be in the Boyce-Codd Normal Form.
• It should not have any multi-valued dependency.
What is Multi-valued Dependency?
A table is said to have a multi-valued dependency, written A →→ B, if the following conditions are true:
• For a single value of A, multiple values of B exist.
• The table has at least 3 columns.
• For a relation R(A, B, C), if there is a multi-valued dependency between A and B, then B and C are independent of each other (A →→ B and A →→ C).
• If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.
• Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
• The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes: there is no relationship between COURSE and HOBBY.
• In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there are multi-valued dependencies on STU_ID (STU_ID →→ COURSE and STU_ID →→ HOBBY), which lead to unnecessary repetition of data.
• So to make the above table into 4NF, we can decompose it into two tables:
• STUDENT_COURSE (STU_ID →→ COURSE)
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
• STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
What is Join Dependency?
• If a table can be recreated by joining multiple tables, each of which has a subset of the attributes of the table, then the table has a join dependency.
• It is a generalization of multi-valued dependency.
• Join dependency is related to 5NF: a relation is in 5NF only if it is already in 4NF and cannot be losslessly decomposed any further.
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF and has no join dependency that would allow it to be split further without loss.
• 5NF is reached when the tables have been broken into as many smaller tables as possible, without losing information, in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT     LECTURER   SEMESTER
Computer    Anil       Sem 1
Computer    Jai        Sem 1
Math        Jai        Sem 1
Math        Akash      Sem 2
Chemistry   Pranay     Sem 1
So to make the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
P1
SEMESTER   SUBJECT
Sem 1      Computer
Sem 1      Math
Sem 1      Chemistry
Sem 2      Math
P2
SUBJECT     LECTURER
Computer    Anil
Computer    Jai
Math        Jai
Math        Akash
Chemistry   Pranay
P3
SEMESTER   LECTURER
Sem 1      Anil
Sem 1      Jai
Sem 2      Akash
Sem 1      Pranay
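To see why all three relations are needed, the sketch below joins them back together. It assumes rows are stored as Python dictionaries and semester values are written consistently as "Sem 1" and "Sem 2"; `natural_join` and `same_relation` are hypothetical helper names. Joining only P1 and P2 produces spurious tuples, while joining P1, P2 and P3 gives back the original table.

```python
def natural_join(left, right):
    """Join two lists of dict rows on all attributes they have in common."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                row = {**l_row, **r_row}
                if row not in joined:  # a relation holds no duplicate tuples
                    joined.append(row)
    return joined


def same_relation(rows_a, rows_b):
    """Compare two row lists as sets of tuples, ignoring row order."""
    def as_set(rows):
        return {tuple(sorted(r.items())) for r in rows}
    return as_set(rows_a) == as_set(rows_b)


original = [
    {"SUBJECT": s, "LECTURER": l, "SEMESTER": m}
    for s, l, m in [
        ("Computer", "Anil", "Sem 1"),
        ("Computer", "Jai", "Sem 1"),
        ("Math", "Jai", "Sem 1"),
        ("Math", "Akash", "Sem 2"),
        ("Chemistry", "Pranay", "Sem 1"),
    ]
]
p1 = [{"SEMESTER": m, "SUBJECT": s} for m, s in [
    ("Sem 1", "Computer"), ("Sem 1", "Math"),
    ("Sem 1", "Chemistry"), ("Sem 2", "Math")]]
p2 = [{"SUBJECT": s, "LECTURER": l} for s, l in [
    ("Computer", "Anil"), ("Computer", "Jai"), ("Math", "Jai"),
    ("Math", "Akash"), ("Chemistry", "Pranay")]]
p3 = [{"SEMESTER": m, "LECTURER": l} for m, l in [
    ("Sem 1", "Anil"), ("Sem 1", "Jai"),
    ("Sem 2", "Akash"), ("Sem 1", "Pranay")]]

two_way = natural_join(p1, p2)
three_way = natural_join(two_way, p3)

print(len(two_way))                         # 7: two spurious tuples
print(same_relation(three_way, original))   # True
```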
Properties of decomposition:
What is decomposition?
• Decomposition is the process of breaking something down into parts or elements.
• In a database, it replaces a relation (table) with a collection of smaller relations.
• It should always be lossless, so that the information in the original relation can be accurately reconstructed from the decomposed relations.
• An improper decomposition of the relation may lead to problems such as loss of information.
Properties of Decomposition
Following are the properties of decomposition:
1. Attribute preservation:
If a relation R is decomposed into D(R1, R2, R3), where D is the decomposition, every attribute of R must appear in at least one of the decomposed tables.
2. No redundancy:
Decomposition is mainly used to reduce redundancy and anomalies.
3. Lossless join:
When a relation is decomposed into smaller tables, joining the smaller tables must reconstruct the original table without any loss of information.
4. Non-additive join:
The reconstructed table should not have additional (spurious) tuples or attributes.
5. Dependency preservation:
Every functional dependency of the original relation, such as A → B, should still be enforceable on the decomposed tables.
Dependency Preservation:
• A dependency is an important constraint on the database.
• Every dependency must be satisfied by at least one decomposed table, or be derivable from the dependencies that are.
• If A → B holds, it is easiest to check when both attribute sets A and B are in the same relation.
Example
▪ Relation R has the functional dependency A → B.
▪ R is decomposed into smaller tables D(R1, R2, R3), where D is the decomposition.
▪ If the functional dependency A → B can be checked in at least one of R1, R2 or R3, the decomposition preserves the dependency.
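One standard way to test dependency preservation without computing every projected FD is to grow the closure of the left-hand side using only what each decomposed relation can "see". The sketch below follows that idea; the relation R(A, B, C), its FDs, and the two decompositions are hypothetical examples, not taken from the slides.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """Return True if fd can be enforced using the decomposed relations only."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for relation in decomposition:
            # Attributes derivable inside this relation from what is known so far.
            gained = closure(result & relation, fds) & relation
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


# Hypothetical relation R(A, B, C) with A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

good = [{"A", "B"}, {"B", "C"}]
bad = [{"A", "B"}, {"A", "C"}]

print(all(is_preserved(fd, good, fds) for fd in fds))   # True
print(is_preserved(fds[1], bad, fds))                   # False: B -> C is lost
```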
Lossless design:
▪ Lossless design is one of the properties of decomposition.
▪ If a relation R is divided into smaller tables D(R1, R2, R3), and joining them reconstructs the original table R without any loss of information, the decomposition is called a lossless design.
Example:
Relation R is decomposed into tables R1, R2, R3.
If R1 ⋈ R2 ⋈ R3 = R, it is a lossless join.
If R1 ⋈ R2 ⋈ R3 is not equal to R (the join contains spurious tuples), it is a lossy design.
Note that R1 ∪ R2 ∪ R3 = R on attributes only shows attribute preservation; it does not by itself guarantee a lossless join.
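For a decomposition into two tables, losslessness can be checked from the FDs alone: the split is lossless if the common attributes functionally determine all of R1 or all of R2. The sketch below applies this test to the EMPLOYEE_DETAILS example from the 3NF section; the function names are hypothetical, and the test covers only two-way decompositions with functional dependencies.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Return True if the common attributes determine all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return common_closure >= set(r1) or common_closure >= set(r2)


emp_fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]

# The decomposition used in the 3NF example shares EMP_ZIP.
employee = {"EMP_ID", "EMP_NAME", "EMP_ZIP"}
employee_zip = {"EMP_ZIP", "EMP_STATE", "EMP_CITY"}
print(is_lossless_binary(employee, employee_zip, emp_fds))   # True

# A split with no common attribute cannot be joined back correctly.
names_only = {"EMP_ID", "EMP_NAME"}
print(is_lossless_binary(names_only, employee_zip, emp_fds))  # False
```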