Chapter 15
Normalization for Relational
Database
Copyright © 2011 Ramez Elmasri and Shamkant Navathe
Data Normalization
- Primarily a tool to validate and improve a
  logical design so that it satisfies certain
  constraints that avoid unnecessary
  duplication of data
- The process of decomposing relations with
  anomalies to produce smaller, well-structured
  relations
Well-Structured Relations
- A relation that contains minimal data redundancy and
  allows users to insert, delete, and update rows without
  causing data inconsistencies
- Goal is to avoid anomalies
  - Insertion Anomaly: adding new rows forces the user to create
    duplicate data
  - Deletion Anomaly: deleting rows may cause a loss of data
    that would be needed for other future rows
  - Modification Anomaly: changing data in a row forces
    changes to other rows because of duplication
General rule of thumb: A table should not pertain to
more than one entity type
Example
EmpID  Name    Salary  Course#  CourseTitle   Date
100    Alaa    32000   459      SPSS          9/9/2016
100    Alaa    32000   876      Surveys       7/8/2016
140    Atheer  40000   333      Visual Basic  1/1/2016
150    Aisha   23000   459      SPSS          9/9/2016
150    Aisha   23000   901      C++           12/8/2016
140    Atheer  40000   901      C++           12/8/2016
Is this a relation? Yes: unique rows and no multivalued
attributes
What’s the primary key? Composite: EmpID, Course#
Anomalies in this Table
Insertion–we can’t enter a new employee until that
employee takes a class. In addition, adding a course for an
existing employee duplicates the employee’s data and the
course data
Deletion–if we remove employee 140, we lose
information about the existence of a Visual Basic class
Modification–giving a salary increase to employee 100
forces us to update multiple records
Why do these anomalies exist?
Because there are two themes (entity types) in this
one relation. This results in data duplication and
an unnecessary dependency between the entities
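The anomalies above can be reproduced on the example rows with a short Python sketch. It is only an illustration: the split into `employees`, `courses`, and `enrollments` is one plausible decomposition, and placing Date with the employee/course pair is an assumption, since the slide does not say which attributes determine it.

```python
# Rows of the single combined table from the example slide:
# (EmpID, Name, Salary, Course#, CourseTitle, Date)
rows = [
    (100, "Alaa", 32000, 459, "SPSS", "9/9/2016"),
    (100, "Alaa", 32000, 876, "Surveys", "7/8/2016"),
    (140, "Atheer", 40000, 333, "Visual Basic", "1/1/2016"),
    (150, "Aisha", 23000, 459, "SPSS", "9/9/2016"),
    (150, "Aisha", 23000, 901, "C++", "12/8/2016"),
    (140, "Atheer", 40000, 901, "C++", "12/8/2016"),
]

# Deletion anomaly: removing employee 140 also removes the only row
# that mentions course 333 (Visual Basic).
remaining = [r for r in rows if r[0] != 140]
lost_courses = {r[3] for r in rows} - {r[3] for r in remaining}

# Modification anomaly: a raise for employee 100 touches every row
# that repeats that employee's salary.
rows_to_update = sum(1 for r in rows if r[0] == 100)

# One table per theme (assumed decomposition, for illustration only).
employees = {r[0]: (r[1], r[2]) for r in rows}      # EmpID -> (Name, Salary)
courses = {r[3]: r[4] for r in rows}                # Course# -> CourseTitle
enrollments = [(r[0], r[3], r[5]) for r in rows]    # (EmpID, Course#, Date)

# After the split, deleting employee 140 leaves the course table intact.
employees_after = {k: v for k, v in employees.items() if k != 140}
enrollments_after = [e for e in enrollments if e[0] != 140]

print(lost_courses, rows_to_update, 333 in courses)
```

In the split design, a salary change is a single update in `employees`, and a new employee can be added without any course row.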
Functional Dependency
A functional dependency, denoted by
X → Y, between two sets of attributes X and Y
means that the value of Y is determined by the
value of X.
The value of X in a tuple uniquely (or
functionally) determines the value of Y
Y is functionally dependent on X
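As a rough illustration of this definition, the sketch below tests whether a dependency X → Y is violated in a given set of rows. The function name `holds` is hypothetical, and the data is the employee/course example table. Note that a sample of rows can only refute a dependency; whether it truly holds is a statement about the meaning of the attributes, not about one snapshot of data.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for this X; any later mismatch
        # violates X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLUMNS = ["EmpID", "Name", "Salary", "Course#", "CourseTitle", "Date"]
DATA = [
    (100, "Alaa", 32000, 459, "SPSS", "9/9/2016"),
    (100, "Alaa", 32000, 876, "Surveys", "7/8/2016"),
    (140, "Atheer", 40000, 333, "Visual Basic", "1/1/2016"),
    (150, "Aisha", 23000, 459, "SPSS", "9/9/2016"),
    (150, "Aisha", 23000, 901, "C++", "12/8/2016"),
    (140, "Atheer", 40000, 901, "C++", "12/8/2016"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

emp_determines_name = holds(rows, ["EmpID"], ["Name", "Salary"])
emp_determines_title = holds(rows, ["EmpID"], ["CourseTitle"])
course_determines_title = holds(rows, ["Course#"], ["CourseTitle"])

print(emp_determines_name, emp_determines_title, course_determines_title)
```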
Functional Dependency (2)
Ssn → Ename
The value of an employee’s Social Security number
uniquely determines the employee name
Pnumber → {Pname, Plocation}
The value of a project’s number (Pnumber)
uniquely determines the project name and
location
{Ssn, Pnumber} → Hours
A combination of Ssn and Pnumber values
uniquely determines the number of hours the
employee currently works on a specific project
Normal Forms Based on Primary Keys
Most practical relational design projects
take one of the following two approaches:
1. Perform a conceptual schema design using a
   conceptual model such as ER or EER and map
   the conceptual design into a set of relations
2. Design the relations based on external
   knowledge derived from an existing
   implementation of files or forms or reports
Normalization of Relations
The normalization process takes a relation
schema through a series of tests to certify
whether it satisfies a certain normal
form.
These slides cover three normal forms:
first (1NF), second (2NF), and third (3NF);
higher normal forms such as BCNF also exist
Normalization is a refinement process that
improves the quality of the design and minimizes
redundancy
Normalization Steps
First Normal Form
For a relation to be in 1st normal form,
repeating groups and multivalued attributes
must be removed
To change to 1NF:
Move nested relation attributes into a
new relation
Propagate the primary key of the original
relation into it
Removing multivalued attributes
New relation to be in 1st normal form
Dnumber Dlocations
Removing Relation within
Relation or Repeating Groups
Student= St_code, St_name, Address, Age,
courses {C#, c_name, hours}
Example Repeating Groups
St_code  St_name  Address   Age  C#    C_name     Hours
20111    Karim    AlRawda   18   DS34  Data Str.  3
                                 OR23  Ope. Rese  3
                                 DB12  Database   3
20112    Mona     Alfayhaa  17   N22   Network    2
                                 OR23  Ope. Rese  3
                                 DB12  Database   3
                                 DS34  Data Str.  3
To Be in First Normal Form
To put the previous data in 1st normal form,
move the repeating group into a separate table and
include the primary key of the original relation as a
foreign key in the new relation
R1= St_code, St_name, Address, Age
R2= St_code, C#, C_name, Hours
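A minimal Python sketch of this step, using the student data from the repeating-groups example. The nested `courses` list stands in for the repeating group; the variable names `student_rel` and `course_rel` are hypothetical labels for the two resulting relations.

```python
# Unnormalized records: each student carries a nested list of courses.
students = [
    {"St_code": 20111, "St_name": "Karim", "Address": "AlRawda", "Age": 18,
     "courses": [("DS34", "Data Str.", 3),
                 ("OR23", "Ope. Rese", 3),
                 ("DB12", "Database", 3)]},
    {"St_code": 20112, "St_name": "Mona", "Address": "Alfayhaa", "Age": 17,
     "courses": [("N22", "Network", 2),
                 ("OR23", "Ope. Rese", 3),
                 ("DB12", "Database", 3),
                 ("DS34", "Data Str.", 3)]},
]

# Relation 1: the atomic student attributes only.
student_rel = [
    (s["St_code"], s["St_name"], s["Address"], s["Age"]) for s in students
]

# Relation 2: one row per (student, course); St_code is propagated
# into each row as a foreign key back to student_rel.
course_rel = [
    (s["St_code"], c_no, c_name, hours)
    for s in students
    for (c_no, c_name, hours) in s["courses"]
]

for row in course_rel:
    print(row)
```

Every value in both relations is now atomic, which is what 1NF requires. Course names and hours are still repeated per student; removing that redundancy is the job of the later normal forms.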
Second Normal Form
Based on the concept of full functional
dependency, as opposed to partial dependency
To reach 2NF, remove partial dependencies:
decompose the relation into a number of 2NF
relations
In each new relation, nonprime attributes are associated only with
the part of the primary key on which they are fully functionally dependent
Second Normal Form (2)
SSN P# Hours Ename Pname Plocation
Ename is functionally dependent on SSN only; P# is
not needed
Pname and Plocation are functionally dependent on
P# only
So this relation is not in second normal form
Solution:
SSN P# Hours
SSN Ename
P# Pname Plocation
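The 2NF test for this relation can be sketched in Python as a search for dependencies whose left side is only part of the key. The helper `partial_dependencies` is a hypothetical name, and it inspects only the dependencies listed explicitly (it does not derive implied ones), so treat it as a simplified check rather than a complete 2NF test.

```python
def partial_dependencies(key, fds):
    """Return (lhs, nonprime attributes) for each listed FD whose left
    side is a proper subset of the key."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        nonprime = set(rhs) - key
        # "lhs < key" is Python's proper-subset test on sets.
        if lhs < key and nonprime:
            found.append((sorted(lhs), sorted(nonprime)))
    return found


key = {"SSN", "P#"}
fds = [
    ({"SSN", "P#"}, {"Hours"}),           # full dependency on the whole key
    ({"SSN"}, {"Ename"}),                 # depends on part of the key
    ({"P#"}, {"Pname", "Plocation"}),     # depends on part of the key
]

violations = partial_dependencies(key, fds)
for lhs, attrs in violations:
    print(lhs, "->", attrs)
```

Each reported dependency becomes its own relation in the solution, and the fully dependent attribute (Hours) stays with the composite key.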
Third Normal Form
To be in 3rd normal form, a relation must have
no transitive dependencies
A transitive dependency exists when a non-key
attribute is dependent on another non-key
attribute
Third Normal Form (2)
SSN Ename Address Dept_code Dept_name Mgr_SSN
Dept_name and Mgr_SSN are functionally
dependent on Dept_code, which is a non-key
attribute
Solution:
SSN Ename Address Dept_code
Dept_code Dept_name Mgr_SSN
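One way to see the transitive dependency is to compute attribute closures. The sketch below uses the standard closure algorithm; the dependency list is an assumption read off the slide (SSN is taken to be the key, and Dept_code determines Dept_name and Mgr_SSN).

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already covered and it
            # contributes at least one new attribute.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


all_attrs = {"SSN", "Ename", "Address", "Dept_code", "Dept_name", "Mgr_SSN"}
fds = [
    ({"SSN"}, {"Ename", "Address", "Dept_code"}),
    ({"Dept_code"}, {"Dept_name", "Mgr_SSN"}),
]

ssn_closure = closure({"SSN"}, fds)
dept_closure = closure({"Dept_code"}, fds)

# SSN reaches every attribute, so it is a key. Dept_code is not a key,
# yet it determines other non-key attributes: SSN -> Dept_code -> Dept_name.
print(ssn_closure == all_attrs)
print(sorted(dept_closure))
```

Because Dept_code determines attributes without being a key, those attributes move to their own relation keyed by Dept_code, as in the solution above.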
Exercises
Consider the following report. Suppose the sales order number together with
the item ordered is the primary key. Normalize this report to reach a better design
Sales Order
Fiction Company
202 N. Main
Manhattan, KS 66502
Customer Number: 1001                   Sales Order Number: 405
Customer Name: ABC Company              Sales Order Date: 2/1/2000
Customer Address: 100 Points            Clerk Number: 210
                  Manhattan, KS 66502   Clerk Name: Martin Lawrence
Item Ordered  Description    Quantity  Unit Price  Total
800           widgit small   40        60.00       2,400.00
801           tingimajigger  20        20.00       400.00
805           thingibob      10        100.00      1,000.00
                                       Order Total 3,800.00
Relation
R= salesOrderNo, salesOrderDate, custNo,
custName, address, clerkNo, clerkName,
{itemsOrdered, description, quantity,
unitPrice}
To be in 1st normal form, remove the repeating
group:
R1= salesOrderNo, salesOrderDate, custNo,
custName, address, clerkNo, clerkName
R2= salesOrderNo, itemsOrdered,
description, quantity, unitPrice
2nd Normal Form
R1= salesOrderNo, salesOrderDate, custNo,
custName, address, clerkNo, clerkName
R2.1= salesOrderNo, itemsOrdered, quantity
R2.2= itemsOrdered, description, unitPrice
3rd Normal Form
R1.1= salesOrderNo, salesOrderDate, custNo,
clerkNo
R1.2= custNo, custName, address
R1.3= clerkNo, clerkName
R2.1= salesOrderNo, itemsOrdered, quantity
R2.2= itemsOrdered, description, unitPrice
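To make the final design concrete, here is a sketch that creates the five 3NF relations in an in-memory SQLite database and loads the sample order from the report. The table names (`sales_order`, `customer`, `clerk`, `order_line`, `item`) are hypothetical; the column names follow the relations above. The line totals and order total are not stored because they can be derived from quantity and unit price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custNo INTEGER PRIMARY KEY, custName TEXT, address TEXT);
CREATE TABLE clerk (clerkNo INTEGER PRIMARY KEY, clerkName TEXT);
CREATE TABLE sales_order (
    salesOrderNo   INTEGER PRIMARY KEY,
    salesOrderDate TEXT,
    custNo  INTEGER REFERENCES customer(custNo),
    clerkNo INTEGER REFERENCES clerk(clerkNo)
);
CREATE TABLE item (itemsOrdered INTEGER PRIMARY KEY, description TEXT, unitPrice REAL);
CREATE TABLE order_line (
    salesOrderNo INTEGER REFERENCES sales_order(salesOrderNo),
    itemsOrdered INTEGER REFERENCES item(itemsOrdered),
    quantity     INTEGER,
    PRIMARY KEY (salesOrderNo, itemsOrdered)
);
""")

# Sample data taken from the sales order report.
conn.execute("INSERT INTO customer VALUES (1001, 'ABC Company', '100 Points, Manhattan, KS 66502')")
conn.execute("INSERT INTO clerk VALUES (210, 'Martin Lawrence')")
conn.execute("INSERT INTO sales_order VALUES (405, '2/1/2000', 1001, 210)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(800, "widgit small", 60.0), (801, "tingimajigger", 20.0), (805, "thingibob", 100.0)],
)
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(405, 800, 40), (405, 801, 20), (405, 805, 10)],
)

# The order total is recomputed from the normalized tables with a join.
order_total = conn.execute(
    """
    SELECT SUM(ol.quantity * i.unitPrice)
    FROM order_line AS ol
    JOIN item AS i ON i.itemsOrdered = ol.itemsOrdered
    WHERE ol.salesOrderNo = ?
    """,
    (405,),
).fetchone()[0]

print(order_total)
```

Each fact (a customer's address, an item's unit price) now lives in exactly one row, so changing it is a single update.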
Exercises
Normalize the following schema into 3rd
normal form:
BRANCH (Branch#, Branch_Addr, {ISBN,
Title, Author, Publisher, Num_copies})
1st Normal Form:
R1: Branch#, Branch_addr
R2: Branch#, ISBN, title, author, publisher,
num_copies
2nd Normal Form:
R1: Branch#, Branch_addr
R2.1: Branch#, ISBN, num_copies
R2.2: ISBN, title, author, publisher
3rd Normal Form:
No change in the previous relations.
Exercises
Project code, project title, project manager,
project budget, {employeeNo,
employeeName, completed_hour,
departmentNo, department_name,
rate_per_hour}
Note: the rate per hour for each employee is
fixed regardless of the project. Completed
hour means the number of hours the employee
has completed on this project
1st Normal Form:
R1: Pcode, ptitle, pmgr, pbudget
R2: Pcode, empNo, empName, c_hours,
dNo, dName, rate
2nd Normal Form:
R1: Pcode, ptitle, pmgr, pbudget
R2.1: Pcode, empNo, c_hours
R2.2: empNo, empName, dNo, dName, rate
3rd Normal Form:
R1: Pcode, ptitle, pmgr, pbudget
R2.1: Pcode, empNo, c_hours
R2.2.1: empNo, empName, dNo, rate
R2.2.2: dNo, dName
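A decomposition like this should be lossless: joining the smaller relations back together must reproduce exactly the original rows. The sketch below checks that for the three relations derived from R2. The sample rows are invented purely for illustration (the exercise gives no data), and `project` and `natural_join` are hypothetical helper names.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate rows."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Join two sets of rows on all attribute names they share."""
    out = set()
    for l_row in left:
        for r_row in right:
            ld, rd = dict(l_row), dict(r_row)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out


# Hypothetical sample rows for the 1NF relation R2.
r2 = [
    {"Pcode": "P1", "empNo": 1, "empName": "Sara", "c_hours": 10,
     "dNo": "D1", "dName": "IT", "rate": 50},
    {"Pcode": "P2", "empNo": 1, "empName": "Sara", "c_hours": 4,
     "dNo": "D1", "dName": "IT", "rate": 50},
    {"Pcode": "P1", "empNo": 2, "empName": "Omar", "c_hours": 7,
     "dNo": "D1", "dName": "IT", "rate": 40},
    {"Pcode": "P2", "empNo": 3, "empName": "Lina", "c_hours": 12,
     "dNo": "D2", "dName": "HR", "rate": 45},
]

r2_1 = project(r2, ["Pcode", "empNo", "c_hours"])
r2_2_1 = project(r2, ["empNo", "empName", "dNo", "rate"])
r2_2_2 = project(r2, ["dNo", "dName"])

rejoined = natural_join(natural_join(r2_1, r2_2_1), r2_2_2)
original = {tuple(sorted(r.items())) for r in r2}

print(rejoined == original, len(r2_2_1), len(r2_2_2))
```

The employee and department facts are stored once each after the split, yet no information is lost when the relations are joined again.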
Exercises
Al Salam Hospital - Doctor's report      Date: 10/5/2004
Doctor Id.: A121       Doctor Name: Dr. Ahmed
Department Id: A       Department Name: Internal Diseases
                       Given Treatments
P#  Pat-name  Address  Item#  Description  Quantity  Unit Price
10  Saleh     Maadi    A01    Aspirin      10        1.5
                       A03    Panadol      6         3.5
                       B01    Vitamin C    12        4.0
Relation
R= docId, docname, deptId, deptName,
{patient (p#, pname, address) {given
treatment (item#, description, quantity,
unit_price)}}
First Normal Form
R1= docId, docname, deptId, deptName
R2= docId, p#, pname, address
R3= docId, p#, item#, description, quantity,
unit_price
Second Normal Form
R1= docId, docname, deptId, deptName
R2.1= docId, p#
R2.2= p#, pname, address
R3.1= docId, p#, item#, quantity
R3.2= item#, description, unit_price
Third Normal Form
R1.1= docId, docname, deptId
R1.2= deptId, deptName
R2.1= docId, p#
R2.2= p#, pname, address
R3.1= docId, p#, item#, quantity
R3.2= item#, description, unit_price