DBMS, unit-5
Unit 5: Normalization
Redundancy in a relation may cause insertion, deletion, and update anomalies. Normalization
helps to minimize the redundancy in relations.
Normalization is used to keep data consistent, to ensure that no data is lost, and to preserve
data integrity.
Anomalies in DBMS -
There are three types of anomalies that occur when the database is not normalized: insertion,
update and deletion anomalies. Let's take an example to understand them.
Example: A manufacturing company stores the employee details in a table Employee that has four
attributes: Emp_Id for storing the employee's id, Emp_Name for storing the employee's name,
Emp_Address for storing the employee's address and Emp_Dept for storing the department in which
the employee works. At some point in time the table looks like this:
This table is not normalized. We will see the problems that we face when a table in a database is
not normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two
departments of the company. If we want to update Rick's address, we have to update it in both
rows or the data will become inconsistent. If the correct address gets updated for one department
but not for the other, then as per the database Rick would have two different addresses, which is
not correct and leads to inconsistent data.
Insert anomaly: Suppose a new employee joins the company who is under training and not currently
assigned to any department. Then we would not be able to insert the data into the table if the
Emp_Dept field doesn't allow nulls.
Delete anomaly: Suppose the company later closes department D890. Deleting the rows that have
Emp_Dept as D890 would also delete the information of employee Maggie, since she is assigned
only to this department.
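The three anomalies can be reproduced with a minimal sketch using Python's built-in sqlite3 module. Only Rick's two departments, Maggie and department D890 come from the example; the Emp_Id values, addresses, Rick's department codes and the trainee row are hypothetical sample data, and the NOT NULL constraint on Emp_Dept is an assumption that models the "doesn't allow nulls" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Emp_Dept is declared NOT NULL (assumption) so the insert anomaly can occur.
conn.execute(
    "CREATE TABLE Employee ("
    "Emp_Id INTEGER, Emp_Name TEXT, Emp_Address TEXT, Emp_Dept TEXT NOT NULL)"
)
# Hypothetical rows: Rick works in two departments, Maggie only in D890.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (101, "Rick", "Delhi", "D002"),
        (123, "Maggie", "Agra", "D890"),
    ],
)

# Update anomaly: the address is changed in only one of Rick's two rows.
conn.execute(
    "UPDATE Employee SET Emp_Address = 'Mumbai' "
    "WHERE Emp_Id = 101 AND Emp_Dept = 'D001'"
)
rick_addresses = {
    row[0] for row in conn.execute("SELECT Emp_Address FROM Employee WHERE Emp_Id = 101")
}
print("Rick's addresses:", sorted(rick_addresses))  # ['Delhi', 'Mumbai']

# Insert anomaly: a trainee without a department cannot be stored.
try:
    conn.execute("INSERT INTO Employee VALUES (166, 'Trainee', 'Chennai', NULL)")
    insert_ok = True
except sqlite3.IntegrityError:
    insert_ok = False
print("Trainee inserted:", insert_ok)  # False

# Delete anomaly: closing D890 also erases every fact about Maggie.
conn.execute("DELETE FROM Employee WHERE Emp_Dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE Emp_Name = 'Maggie'"
).fetchone()[0]
print("Rows left for Maggie:", maggie_rows)  # 0
```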
[Figure: Normal Forms in DBMS, showing Second Normal Form (2NF) and Third Normal Form (3NF)]
A relation is in First Normal Form (1NF) if each attribute contains only an atomic (single)
value.
Example:
Let’s say a company wants to store the names and contact details of its employees. It creates a
table in the database that looks like this:
Two employees (Aman and Raj) have two mobile numbers each, which causes the Emp_Mobile field to
hold multiple values for these two employees.
This table is not in 1NF, as the rule says "each attribute of a table must have atomic (single)
values", and the Emp_Mobile values for employees Aman and Raj violate that rule.
To make the table comply with 1NF, we need to create a separate row for each mobile number, in
such a way that none of the attributes contains multiple values.
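A minimal sketch of this 1NF fix in plain Python, assuming the multi-valued Emp_Mobile values are stored as comma-separated strings. The helper to_1nf, the Emp_Id values, the phone numbers and the third employee (Neha) are all hypothetical.

```python
def to_1nf(rows, field, sep=","):
    """Split a multi-valued field so every row holds exactly one atomic value."""
    flat = []
    for row in rows:
        for value in row[field].split(sep):
            new_row = dict(row)  # copy the other attributes unchanged
            new_row[field] = value.strip()
            flat.append(new_row)
    return flat


# Hypothetical data: Aman and Raj each have two mobile numbers in one field.
employees = [
    {"Emp_Id": 101, "Emp_Name": "Aman", "Emp_Mobile": "9000000001, 9000000002"},
    {"Emp_Id": 102, "Emp_Name": "Raj", "Emp_Mobile": "9000000003, 9000000004"},
    {"Emp_Id": 103, "Emp_Name": "Neha", "Emp_Mobile": "9000000005"},
]

employees_1nf = to_1nf(employees, "Emp_Mobile")
for row in employees_1nf:
    print(row["Emp_Id"], row["Emp_Name"], row["Emp_Mobile"])
```

After the split, Emp_Id alone no longer identifies a row; the combination (Emp_Id, Emp_Mobile) does.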
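A given relation is in Second Normal Form (2NF) if and only if:
o it is in First Normal Form (1NF), and
o it has no partial dependency, i.e. no non-prime attribute depends on only a proper subset of a
candidate key.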
In the <StudentProject> relation, the key is the combination {StudentID, ProjectNo}.
StudentName can be determined by StudentID alone, which is a partial dependency.
ProjectName can be determined by ProjectNo alone, which is also a partial dependency.
Therefore, the <StudentProject> relation violates 2NF and is considered a bad database design.
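Partial dependencies can be found mechanically with attribute closure: compute what each proper subset of the key determines and keep any non-prime attributes it reaches. This sketch assumes <StudentProject>(StudentID, ProjectNo, StudentName, ProjectName) with key {StudentID, ProjectNo} and only the two FDs named above; the helpers closure and partial_dependencies are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """Return (proper subset of key, non-prime attributes it determines) pairs."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds) & non_prime
            if determined:
                found.append((set(subset), determined))
    return found


key = {"StudentID", "ProjectNo"}
non_prime = {"StudentName", "ProjectName"}
fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"ProjectNo"}, {"ProjectName"}),
]

partials = partial_dependencies(key, non_prime, fds)
for subset, attrs in partials:
    print(sorted(subset), "->", sorted(attrs))
```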
To bring it into 2NF, <StudentProject> is decomposed into two relations:
<StudentInfo>
StudentID | ProjectNo | StudentName
S01       | 199       | Mahek
S02       | 120       | Rahil

<ProjectInfo>
ProjectNo | ProjectName
199       | Geo Location
120       | Cluster Exploration
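The decomposition can be checked for losslessness: projecting <StudentProject> onto the two attribute sets and natural-joining the results should give back exactly the original rows. In this sketch rows are Python dicts, the <StudentProject> rows are assembled from the <StudentInfo> and <ProjectInfo> rows by matching ProjectNo, and the helpers project, natural_join and as_set are hypothetical.

```python
student_project = [
    {"StudentID": "S01", "ProjectNo": 199, "StudentName": "Mahek", "ProjectName": "Geo Location"},
    {"StudentID": "S02", "ProjectNo": 120, "StudentName": "Rahil", "ProjectName": "Cluster Exploration"},
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(left, right):
    """Join rows that agree on all shared attributes (assumes non-empty inputs)."""
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Order-insensitive comparison form for a list of rows."""
    return {tuple(sorted(r.items())) for r in rows}


student_info = project(student_project, ["StudentID", "ProjectNo", "StudentName"])
project_info = project(student_project, ["ProjectNo", "ProjectName"])
rejoined = natural_join(student_info, project_info)

print(as_set(rejoined) == as_set(student_project))  # True
```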
A given relation is in Third Normal Form (3NF) if and only if:
o it is in Second Normal Form (2NF), and
o for every non-trivial functional dependency X -> Y, either X is a super key or Y is a prime
attribute.
In other words, a relation that is in First and Second Normal Form, and in which no
non-primary-key attribute is transitively dependent on the primary key, is in Third Normal Form
(3NF).
Note: If A->B and B->C are two FDs, then A->C is called a transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
If a transitive dependency exists, we remove the transitively dependent attribute(s) from the
relation by placing the attribute(s) in a new relation along with a copy of the determinant.
Example-1:
In relation STUDENT,
FD set:
{STUD_NO -> STUD_NAME,
STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY,
STUD_NO -> STUD_AGE}
Candidate Key:
{STUD_NO}
For this relation,
STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO.
It violates the third normal form.
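A simplified sketch of the 3NF step described above, applied to STUDENT: it flags each FD whose determinant is not a super key and whose right-hand side is non-prime, then moves those dependent attributes into a new relation together with a copy of the determinant. This is a heuristic for this example, not the full 3NF synthesis algorithm, and it assumes STUD_NO is the only candidate key; the helper name closure is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


relation = {"STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
key = {"STUD_NO"}  # assumed to be the only candidate key
fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
]

# A non-super-key determinant of non-prime attributes signals a transitive dependency.
transitive = [
    (lhs, rhs)
    for lhs, rhs in fds
    if not closure(lhs, fds) >= relation and rhs <= relation - key
]

# Move the dependent attributes out, keeping a copy of the determinant behind.
remaining = set(relation)
decomposed = []
for lhs, rhs in transitive:
    decomposed.append(lhs | rhs)
    remaining -= rhs
decomposed.insert(0, remaining)

for r in decomposed:
    print(sorted(r))
# ['STUD_AGE', 'STUD_NAME', 'STUD_NO', 'STUD_STATE']
# ['STUD_COUNTRY', 'STUD_STATE']
```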
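It's time to summarize our reading. The image below summarizes our reading on normal forms:
Boyce-Codd Normal Form (BCNF)
For a table to satisfy the Boyce-Codd Normal Form:
o it should be in Third Normal Form (3NF), and
o for every functional dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means that for a dependency
A → B, A cannot be a non-prime attribute if B is a prime attribute.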
Example
Below we have a college enrolment table with columns student_id, subject and professor.
As you can see, we have also added some sample data to the table.
One student can enrol for multiple subjects. For example, the student with student_id 101 has
opted for the subjects Java and C++.
For each subject, a professor is assigned to the student.
And there can be multiple professors teaching one subject, as we have for Java.
In the table above, student_id and subject together form the primary key, because using
student_id and subject we can find all the columns of the table.
One more important point to note here is that one professor teaches only one subject, but one
subject may have two different professors.
Hence there is a functional dependency professor → subject: the subject depends on the
professor's name.
This table satisfies the First Normal Form because all the values are atomic, column names are
unique, and all the values stored in a particular column are of the same domain.
This table also satisfies the Second Normal Form, as there is no partial dependency.
And there is no transitive dependency, hence the table also satisfies the Third Normal Form.
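In the table above, student_id and subject form the primary key, which means the subject column is
a prime attribute.
However, in the dependency professor → subject, the determinant professor is a non-prime
attribute while subject is a prime attribute. A non-prime attribute determining a prime attribute
is not allowed by BCNF.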
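The BCNF condition can be checked mechanically: for every non-trivial FD A → B, the closure of A must contain every column. This sketch assumes the enrolment table's FDs are exactly {student_id, subject} → professor and professor → subject, as described above; the helper closure is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


relation = {"student_id", "subject", "professor"}
fds = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]

# A non-trivial FD whose left side is not a super key breaks BCNF.
violations = [
    (lhs, rhs)
    for lhs, rhs in fds
    if not rhs <= lhs and not closure(lhs, fds) >= relation
]

for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
# ['professor'] -> ['subject'] violates BCNF
```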
To make this relation (table) satisfy BCNF, we decompose it into two tables: a student table and a
professor table.
Student Table
student_id | p_id
101        | 1
101        | 2
and so on...
Disadvantages
o More complicated SQL is required for multi-table subqueries and joins
o Extra work for the DBMS can mean slower applications