Chapter 7

Chapter 7 of the Database Systems course focuses on database normalization, which is a process aimed at organizing data to reduce redundancy and improve integrity. It discusses various types of update anomalies, functional dependencies, and the steps involved in normalization, including First, Second, and Third Normal Forms. The chapter emphasizes the importance of normalization in achieving efficient database design and preventing data inconsistencies.

ISLAMIC UNIVERSITY IN UGANDA

Bridging Communities

DATABASE SYSTEMS
Chapter 7 : Database Normalization

Dr. Guma Ali


Objectives (1/1)
At the end of the chapter, the student should be able to:
• Understand database normalization.
• State the main purposes of database normalization.
• Describe the classifications of update anomalies.
• Describe functional dependencies.
• Examine the types of functional dependencies.
• Describe, with relevant examples, the process of
normalization.
Data Normalization (1/3)
• Normalization is a process in database design to
organize data in a way that reduces data redundancy,
improves data integrity, and enhances overall
efficiency.

• It is a systematic approach to decomposing tables in a
relational database in order to:

o eliminate data redundancy and undesirable dependencies,

o improve data consistency,

Data Normalization (2/3)
o reduce data anomalies,

o improve the efficiency of queries.

• Normalization rules divide larger tables into smaller
tables and link them using relationships.

• The main goal of normalization is to split large tables
into smaller ones and ensure that data dependencies are
logical, minimizing insertion, update, and deletion
anomalies.
Data Normalization (3/3)
• Database normalization uses a series of rules, often
called normal forms (NF), to ensure the structure of a
relational database adheres to certain standards.

Purpose of Normalization (1/2)

Purpose of Normalization (2/2)

Data Redundancy and Update
Anomalies (1/10)
• Update anomalies are issues that arise when inserting,
updating, or deleting data in a database that has not
been normalized properly.
• These anomalies occur primarily in databases with
redundancy and lack of proper structure, often leading
to inconsistencies and data integrity problems.
• There are three main types of update anomalies:

Data Redundancy and Update
Anomalies (2/10)
❑ Insert anomalies
❑ Update anomalies
❑ Delete anomalies

Data Redundancy and Update
Anomalies (3/10)
❑ Insert Anomalies
• Insert anomalies occur when new data cannot be added
to the database unless other, unrelated data is also
present.
• This happens due to poor design, where certain fields
depend on other fields to exist.
o For example: Consider a student-course enrollment
table where each row contains the student ID,
course ID, student name, and course name:
Data Redundancy and Update
Anomalies (4/10)

• In this design, if a new course is added but no student is
enrolled yet, we cannot insert the course data into the
table without providing a student.
• The redundancy of student and course data in the same
table causes the problem.

Data Redundancy and Update
Anomalies (5/10)
❑ Update Anomalies
• Update anomalies occur when the same data is stored in
multiple records and a change is not applied to all of
them consistently.
• If the same piece of data is stored in multiple places,
forgetting to update all instances results in
inconsistencies.

Data Redundancy and Update
Anomalies (6/10)
o For example: In the same student-course
enrollment table, imagine that Bob changes his
name to Robert.
o To update this information, every row that contains
Bob’s student ID must be updated.
o If only one row is updated but others are left
unchanged, the database will contain inconsistent
data.

Data Redundancy and Update
Anomalies (7/10)

o If Bob’s name is updated to Robert in only one row:

o The two names "Bob" and "Robert" recorded for the
same student ID make the data inconsistent.
Data Redundancy and Update
Anomalies (8/10)
❑ Delete Anomalies
• Delete anomalies occur when the deletion of data
inadvertently causes other important data to be lost.
• This typically happens when different types of data are
stored in the same table without proper normalization.
o For example: In the same student-course
enrollment table, if Alice drops the course "Data
Structures," and we delete her record from the
table:
Data Redundancy and Update
Anomalies (9/10)

o Not only would Alice’s course enrollment be lost, but
if she was the only student enrolled, all information
about the "Data Structures" course would also be lost,
even though other students may want to enroll in it in
the future.
• Update anomalies can be resolved through
normalization.
Data Redundancy and Update
Anomalies (10/10)
• Update anomalies are a major issue in poorly designed
databases, leading to data redundancy and
inconsistency.
• These problems can be mitigated by applying
normalization techniques, which ensure the efficient
organization of data and prevent unnecessary
duplication.
• Normalizing to at least 3NF is recommended, because it
removes most of the redundancy that causes update
anomalies.
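The three anomalies can be reproduced on a tiny in-memory version of the student-course enrollment table. This is only a sketch: the slides do not show the table's rows, so the IDs, the course codes, and Bob's two courses are illustrative assumptions (Alice, Bob, and "Data Structures" come from the examples above).

```python
# Hypothetical enrollment rows; IDs and Bob's courses are illustrative assumptions.
enrollment = [
    {"student_id": 1, "student_name": "Alice", "course_id": "C1", "course_name": "Data Structures"},
    {"student_id": 2, "student_name": "Bob", "course_id": "C2", "course_name": "Databases"},
    {"student_id": 2, "student_name": "Bob", "course_id": "C3", "course_name": "Networks"},
]

# Update anomaly: renaming Bob in only one of his rows leaves two names for one ID.
partially_updated = [dict(row) for row in enrollment]
partially_updated[1]["student_name"] = "Robert"
names_for_bob = {r["student_name"] for r in partially_updated if r["student_id"] == 2}

# Delete anomaly: removing Alice's only enrollment also removes the only
# record of the "Data Structures" course.
after_delete = [r for r in enrollment if r["student_id"] != 1]
courses_left = {r["course_name"] for r in after_delete}

# Insert anomaly: a course with no students has no student values to store,
# so its row would need placeholders (None) for the student columns.
new_course_row = {"student_id": None, "student_name": None,
                  "course_id": "C4", "course_name": "Algorithms"}

print(sorted(names_for_bob))              # ['Bob', 'Robert']
print("Data Structures" in courses_left)  # False
```

All three problems come from the same cause: student facts and course facts are stored in the same row.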
Functional Dependencies (1/23)
• Functional dependency describes the relationship
between attributes in a relation.
• In a functional dependency, one attribute (or a set of
attributes) determines another attribute (or set of
attributes).
o For example, consider a table of student records
with the following attributes:
Student_ID (Primary Key), Student_Name,
Department
Functional Dependencies (2/23)
o If Student_ID determines Student_Name and
Department, then we can express this functional
dependency as:
Student_ID → Student_Name, Department
o This implies that each Student_ID is associated with
exactly one Student_Name and one Department.
• Functional dependencies are crucial in identifying
redundancy in databases and are used to achieve database
normalization, where the aim is to reduce data duplication
and maintain data integrity.
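A functional dependency can be tested against sample rows. The sketch below uses a hypothetical helper, `fd_holds`, and made-up student rows. A sample can only refute a dependency: a dependency that holds in one sample is not thereby proven for all possible data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative rows (assumed, not taken from the slides).
students = [
    {"Student_ID": 1, "Student_Name": "Asha", "Department": "IT"},
    {"Student_ID": 2, "Student_Name": "Ben", "Department": "IT"},
    {"Student_ID": 3, "Student_Name": "Asha", "Department": "HR"},
]

print(fd_holds(students, ["Student_ID"], ["Student_Name", "Department"]))  # True
print(fd_holds(students, ["Student_Name"], ["Department"]))                # False
```

The second check fails because two students named Asha belong to different departments, so Student_Name does not determine Department.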
Functional Dependencies (3/23)
Types of Functional Dependencies
❑ Partial dependency
• Partial dependency is a concept in database
normalization that occurs when a non-key attribute in a
table depends on only a portion of the primary key,
rather than the entire primary key.
• A partial dependency can therefore arise only when the
primary key is composite (made up of more than one
attribute).
Functional Dependencies (4/23)
o Suppose we have a table 'Employee' with the
following attributes:
{Employee_ID, Department_ID, Employee_Name,
Department_Name}.
o Here, 'Employee_ID' is the primary key and
'Department_ID' is a foreign key referencing the
'Department' Table.

21
Functional Dependencies (5/23)
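Employee Table
Employee_ID  Department_ID  Employee_Name  Department_Name
1            101            John           HR
2            101            Smith          HR
3            102            Anne           IT

o In this case, 'Department_Name' depends only on
'Department_ID', not directly on 'Employee_ID'.

o Strictly speaking, because the primary key here is the
single attribute {Employee_ID}, this is a transitive
dependency (Employee_ID → Department_ID →
Department_Name). It would be a partial dependency if
the key were composite, e.g. {Employee_ID,
Department_ID}, with 'Department_Name' depending on
'Department_ID' alone.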
Functional Dependencies (6/23)
o Changing an employee's 'Department_ID' changes which
'Department_Name' belongs in the row; 'Employee_ID'
plays no direct part in determining it.
o To remove this dependency, we create a separate table
'Department' with the attributes {Department_ID,
Department_Name}.
o The 'Employee' table keeps 'Department_ID' as a
foreign key referencing the 'Department' table.
Functional Dependencies (7/23)
o After normalization, the tables would look like this:
Employee Table
Employee_ID  Employee_Name  Department_ID
1            John           101
2            Smith          101
3            Anne           102

Department Table
Department_ID  Department_Name
101            HR
102            IT
Functional Dependencies (8/23)
• We have removed the unwanted dependency: each non-key
attribute now depends directly on the primary key of its
own table, which eliminates the redundancy and protects
data integrity.

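The Employee/Department split can be checked by projecting the original rows onto the two smaller tables and joining them back. This is a minimal sketch that uses the three sample rows from the Employee table; the tuple layout and variable names are assumptions made for the illustration.

```python
# Rows of the original table: (Employee_ID, Department_ID, Employee_Name, Department_Name)
employee_flat = [
    (1, 101, "John", "HR"),
    (2, 101, "Smith", "HR"),
    (3, 102, "Anne", "IT"),
]

# Project onto the two smaller relations; sets drop duplicate rows,
# so "HR" is stored once instead of once per employee.
employee = sorted({(emp_id, name, dept_id) for emp_id, dept_id, name, _ in employee_flat})
department = dict({(dept_id, dept_name) for _, dept_id, _, dept_name in employee_flat})

# Join back on Department_ID to confirm that no information was lost.
rejoined = sorted(
    (emp_id, dept_id, name, department[dept_id]) for emp_id, name, dept_id in employee
)

print(len(department))                    # 2
print(rejoined == sorted(employee_flat))  # True
```

Because the join reproduces the original rows exactly, the decomposition is lossless for this sample.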
Functional Dependencies (9/23)
❑ Full functional dependency
• Full functional dependency is a type of functional
dependency in database design that occurs when a
non-prime attribute (an attribute that is not part of any
candidate key) is functionally dependent on the entire
candidate key but not on any proper subset of the
candidate key.

Functional Dependencies (10/23)
• Full functional dependency indicates that if A and B are
attributes of a relation R, B is fully functionally
dependent on A if B is functionally dependent on A but
not on any proper subset of A.

• A functional dependency A → B is a full functional
dependency if removal of any attribute from A results in
the dependency no longer existing.

Functional Dependencies (11/23)
• A functional dependency A → B is a partial dependency
if there is some attribute that can be removed from A
and yet the dependency still holds.

• In other words, knowing the values of only some of the
attributes in A is already enough to determine the value
of B.

Functional Dependencies (12/23)
• This concept is essential in database design and
normalization to minimize data redundancy and
anomalies.
• Let's illustrate full functional dependency with an
example:
• Suppose we have a table called "StudentCourses" with
the following attributes:

29
Functional Dependencies (13/23)
StudentID (the identifier of the student)
CourseID (the identifier of the course)
Instructor (the Instructor's name for the course)
CourseName (the name of the course)
• We want to determine which full functional dependencies
involve Instructor and CourseName.
• To do this, we need to check whether Instructor is
determined by CourseID alone or only by a larger
combination of attributes.
Functional Dependencies (14/23)
StudentCourses Table
StudentID  CourseID  Instructor   CourseName
1          101       Mr. Smith    Mathematics
1          102       Ms. Johnson  Chemistry
2          101       Mr. Smith    Mathematics
2          103       Ms. Davis    History

• In this case, we can see that the Instructor attribute
depends solely on CourseID.

Functional Dependencies (15/23)
• If you know the CourseID, you can determine the
Instructor uniquely, because each CourseID is associated
with exactly one Instructor.
• No other attribute in the table is needed to determine
the Instructor.
• Therefore, CourseID → Instructor is a full functional
dependency. Relative to the composite key {StudentID,
CourseID}, however, Instructor is only partially
dependent, because CourseID alone determines it.

Functional Dependencies (16/23)
• However, CourseName is not functionally dependent on
the Instructor in general.
• One instructor can teach multiple courses.
o In the sample data each instructor happens to teach
a single course (Mr. Smith teaches Mathematics to
two different students), but a functional dependency
must hold for all possible data, not just one sample.
• Hence, knowing the Instructor is not enough to identify
the CourseName; CourseName is determined by
CourseID.
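Full and partial dependency can be distinguished on the StudentCourses sample by testing every proper subset of the determinant. This is a sketch only: `fd_holds` and `is_full_dependency` are hypothetical helper names, and the result reflects the four sample rows rather than all possible data.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_dependency(rows, lhs, rhs):
    """lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False  # a smaller determinant suffices: partial dependency
    return True


student_courses = [
    {"StudentID": 1, "CourseID": 101, "Instructor": "Mr. Smith", "CourseName": "Mathematics"},
    {"StudentID": 1, "CourseID": 102, "Instructor": "Ms. Johnson", "CourseName": "Chemistry"},
    {"StudentID": 2, "CourseID": 101, "Instructor": "Mr. Smith", "CourseName": "Mathematics"},
    {"StudentID": 2, "CourseID": 103, "Instructor": "Ms. Davis", "CourseName": "History"},
]

print(is_full_dependency(student_courses, ["CourseID"], ["Instructor"]))               # True
print(is_full_dependency(student_courses, ["StudentID", "CourseID"], ["Instructor"]))  # False
```

The second result is False because CourseID alone already determines Instructor, so the dependency on the composite key is partial.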
Functional Dependencies (17/23)
❑ Transitive dependency
• A transitive dependency is a type of dependency in a
database where one element is indirectly dependent on
another through an intermediary element.
• In databases (particularly in database normalization), a
transitive dependency occurs when a non-prime
attribute (an attribute that is not part of a candidate key)
depends on another non-prime attribute rather than
directly depending on the primary key.
34
Functional Dependencies (18/23)
• Transitive dependency is a condition where A, B, and C
are attributes of a relation such that if A → B and B → C,
then C is transitively dependent on A via B (provided that
A is not functionally dependent on B or C).
• An example of a transitive dependency is:
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress

35
Functional Dependencies (19/23)
• bAddress is transitively dependent on staffNo via
branchNo.

• In other words, the staffNo attribute functionally
determines the bAddress via the branchNo attribute and
neither branchNo nor bAddress functionally determines
staffNo.

o Suppose we have a table called "Student_Courses"
with the following attributes:
Functional Dependencies (20/23)
Student_ID (part of the primary key)
Course_ID (part of the primary key)
Course_Name
Professor_Name
o "Student_ID" and "Course_ID" together form the
composite primary key of this table.
o Let's say "Professor_Name" depends on
"Course_Name" rather than directly on the primary
key.
Functional Dependencies (21/23)
o This leads to a transitive dependency, as
"Professor_Name" depends on "Course_Name,"
and "Course_Name" depends on "Course_ID"
(which is part of the primary key).
o Here's the breakdown:
{Student_ID, Course_ID} → Course_ID (trivially, as
Course_ID is part of the key)
Course_ID → Course_Name
Course_Name → Professor_Name
o Note that Course_ID → Course_Name is also a partial
dependency, because Course_ID is only part of the
primary key.
Functional Dependencies (22/23)
o As "Professor_Name" depends transitively on the
primary key through "Course_ID" and
"Course_Name," a change to a course's
"Course_Name" may also require a change to
"Professor_Name" in every row for that course.
• This can lead to data anomalies and redundancy issues
when updating or deleting records, violating database
normalization principles.
Functional Dependencies (23/23)
• Transitive dependencies are removed by decomposing
the table during normalization.
• In this case, we might move the course details out of
the enrollment table into two separate tables.
• One table could store information about Courses
("Course_ID" and "Course_Name"), while the other
could store information about Professors ("Course_ID"
and "Professor_Name"), thus eliminating the transitive
dependency.
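One way to see a transitive dependency is to compute the closure of an attribute set under the given functional dependencies. The sketch below uses the staffNo example; the `closure` helper is a hypothetical name, and bAddress is deliberately left out of the first dependency so that it can only be reached through branchNo.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever the whole left side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Direct dependencies only (assumption: bAddress removed from the first one).
fds = [
    (("staffNo",), ("sName", "position", "salary", "branchNo")),
    (("branchNo",), ("bAddress",)),
]

print("bAddress" in closure({"staffNo"}, fds))  # True
print("staffNo" in closure({"branchNo"}, fds))  # False
```

staffNo reaches bAddress only by way of branchNo, while branchNo does not determine staffNo, which matches the condition stated in the definition of a transitive dependency.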
The Process of Normalization (1/22)
• Normalization can be accomplished and understood in
stages, each of which corresponds to a normal form.

• Normal forms refer to a set of rules for organizing data
in a way that minimizes redundancy and ensures data
integrity.

• Normal forms help us to make a good database design.

• Normalization rules are divided into the following
normal forms:
The Process of Normalization (2/22)

42
The Process of Normalization (3/22)
• The technique involves a series of rules that can be used
to test individual relations so that a database can be
normalized to any degree.

• When a requirement is not met, the relation violating
the requirement must be decomposed into relations that
individually meet the requirements of normalization.

• Normalization is often executed as a series of steps.

The Process of Normalization (4/22)
• Each step corresponds to a specific normal form that
has known properties.

• As normalization proceeds, the relations become
progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.

• Figure 14.8 provides an overview of the process and
highlights the main actions taken in each step of the
process.
The Process of Normalization (6/22)
1. First Normal Form (1NF): Any multivalued
attributes (also called repeating groups) have been
removed, so there is a single value at the
intersection of each row and column of the table.
2. Second Normal Form (2NF): Any partial functional
dependencies have been removed (i.e., non-key
attributes are identified by the whole primary key).

46
The Process of Normalization (7/22)
3. Third Normal Form (3NF): Any transitive
dependencies have been removed (i.e., non-key
attributes are identified by only the primary key).

4. Boyce-Codd Normal Form (BCNF): Any remaining
anomalies that result from functional dependencies
have been removed.

The Process of Normalization (8/22)
5. Fourth Normal Form (4NF): Any multivalued
dependencies have been removed.

6. Fifth Normal Form (5NF): Any remaining
anomalies have been removed.

The Process of Normalization (10/22)
❑ Unnormalized Form (UNF)

• Unnormalized form is a table that contains one or more
repeating groups.

• We begin the process of normalization by first
transferring the data from the source into table format
with rows and columns.

• In this format, the table is in unnormalized form and is
referred to as an unnormalized table.
The Process of Normalization (11/22)
Represent the View in Tabular Form
• The first step (preliminary to normalization) is to
represent the user view as a single table, or relation,
with the attributes recorded as column headings.
• Sample data should be recorded in the rows of the table,
including any repeating groups that are present in the
data.
o For example, the following table represents the
project data in tabular form.
The Process of Normalization (12/22)
Project  Project             Project     Project  Employee  Employee  Department  Department  Hourly
Code     Name                Manager     Budget   No.       Name      No.         Name        Rate
PC10     Reservation System  Mr. Ajay    120500   S100      Mohan     D03         Database    21.00
                                                  S101      Vipul     D02         Testing     16.50
                                                  S102      Riyaz     D01         IT          22.00
PC11     HR System           Mrs. Charu  500500   S103      Pavan     D03         Database    18.50
                                                  S104      Jitendra  D02         Testing     17.00
                                                  S315      Pooja     D01         IT          23.50
PC12     Attendance System   Mr. Rajesh  710700   S137      Rahul     D03         Database    21.50
                                                  S218      Avneesh   D02         Testing     15.50
                                                  S109      Vikas     D01         IT          20.50

The Process of Normalization (13/22)
❑ First Normal Form (1NF)

• A relation is in 1NF if the following two constraints
both apply:

▪ Each column in the table contains only atomic
(indivisible) values.

o It means that a column should NOT contain
multiple values, repeating groups, arrays, or lists
in a single cell.
The Process of Normalization (14/22)
▪ A primary key has been defined, which uniquely
identifies each row in the relation.

• Applying this procedure to the project data table
yields the new table shown.

Project  Project             Project     Project  Employee  Employee  Department  Department  Hourly
Code     Name                Manager     Budget   No.       Name      No.         Name        Rate
PC10     Reservation System  Mr. Ajay    120500   S100      Mohan     D03         Database    21.00
PC10     Reservation System  Mr. Ajay    120500   S101      Vipul     D02         Testing     16.50
PC10     Reservation System  Mr. Ajay    120500   S102      Riyaz     D01         IT          22.00
PC11     HR System           Mrs. Charu  500500   S103      Pavan     D03         Database    18.50
PC11     HR System           Mrs. Charu  500500   S104      Jitendra  D02         Testing     17.00
PC11     HR System           Mrs. Charu  500500   S315      Pooja     D01         IT          23.50
PC12     Attendance System   Mr. Rajesh  710700   S137      Rahul     D03         Database    21.50
PC12     Attendance System   Mr. Rajesh  710700   S218      Avneesh   D02         Testing     15.50
PC12     Attendance System   Mr. Rajesh  710700   S109      Vikas     D01         IT          20.50

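The move from the unnormalized project table to 1NF amounts to repeating the project values on every employee row. A minimal sketch, assuming the unnormalized data is held as nested Python records (the dictionary layout and field names are illustrative, the values are those in the tables above):

```python
# Unnormalized form: each project carries a repeating group of employees.
unf = [
    {"code": "PC10", "name": "Reservation System", "manager": "Mr. Ajay", "budget": 120500,
     "employees": [("S100", "Mohan", "D03", "Database", 21.00),
                   ("S101", "Vipul", "D02", "Testing", 16.50),
                   ("S102", "Riyaz", "D01", "IT", 22.00)]},
    {"code": "PC11", "name": "HR System", "manager": "Mrs. Charu", "budget": 500500,
     "employees": [("S103", "Pavan", "D03", "Database", 18.50),
                   ("S104", "Jitendra", "D02", "Testing", 17.00),
                   ("S315", "Pooja", "D01", "IT", 23.50)]},
    {"code": "PC12", "name": "Attendance System", "manager": "Mr. Rajesh", "budget": 710700,
     "employees": [("S137", "Rahul", "D03", "Database", 21.50),
                   ("S218", "Avneesh", "D02", "Testing", 15.50),
                   ("S109", "Vikas", "D01", "IT", 20.50)]},
]

# 1NF: one flat row per (project, employee) pair, every cell atomic.
first_nf = [
    (p["code"], p["name"], p["manager"], p["budget"], *emp)
    for p in unf
    for emp in p["employees"]
]

# The pair (Project Code, Employee No.) identifies each row uniquely.
keys = [(row[0], row[4]) for row in first_nf]

print(len(first_nf))                # 9
print(len(set(keys)) == len(keys))  # True
```

In this sample, (Project Code, Employee No.) works as the composite primary key of the 1NF table.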
The Process of Normalization (16/22)
❑ Second Normal Form (2NF)

• A relation is in 2NF if it is in 1NF and contains no
partial functional dependencies, i.e., if all of its non-key
attributes fully depend on the primary key.

• A partial functional dependency exists when a non-key
attribute is functionally dependent on part (but not all)
of the primary key.

The Process of Normalization (17/22)
• The following steps are required to convert a relation
with partial dependencies to 2NF:

▪ Create a new relation for each primary key attribute
(or combination of attributes) that is a determinant
in a partial dependency.

▪ Move the non-key attributes that are dependent on
this primary key attribute (or attributes) from the old
relation to the new relation.
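The two steps above can be sketched on a few rows of the 1NF project table. The dependencies used here are assumptions, since the slides do not list them: Project Code is taken to determine the project name, manager, and budget, and Employee No. is taken to determine the employee name, department, and hourly rate.

```python
# Three rows of the 1NF table; the key is (Project Code, Employee No.).
rows = [
    ("PC10", "Reservation System", "Mr. Ajay", 120500, "S100", "Mohan", "D03", "Database", 21.00),
    ("PC10", "Reservation System", "Mr. Ajay", 120500, "S101", "Vipul", "D02", "Testing", 16.50),
    ("PC11", "HR System", "Mrs. Charu", 500500, "S103", "Pavan", "D03", "Database", 18.50),
]

# New relation for the determinant Project Code (assumed partial dependency).
project = sorted({r[0:4] for r in rows})

# New relation for the determinant Employee No. (assumed partial dependency).
employee = sorted({r[4:9] for r in rows})

# The old relation keeps only the composite key, linking the two new relations.
assignment = sorted({(r[0], r[4]) for r in rows})

print(len(project), len(employee), len(assignment))  # 2 3 3
print(project[0])  # ('PC10', 'Reservation System', 'Mr. Ajay', 120500)
```

Project details are now stored once per project rather than once per assigned employee. If the hourly rate actually depended on both the project and the employee, it would stay in the assignment relation instead.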
The Process of Normalization (18/22)

58
The Process of Normalization (19/22)
❑ Third Normal Form (3NF)

• A relation is in 3NF if it is in 2NF and no transitive
dependencies exist, i.e., each non-key column in a table
must be dependent only on the primary key and not on
any other non-key columns.

• It means that a table should not contain any data that is
not directly related to the primary key.

The Process of Normalization (20/22)
• The following steps are required to convert a relation to
3NF:

▪ Remove the transitive dependencies.

▪ Create a separate table for the transitively dependent
fields.

• The results of performing these steps for the project
data relation are shown in tables below.

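As a rough illustration of a 3NF result, the sketch below builds employee and department tables in an in-memory SQLite database. The schema is an assumption (the slides do not show the final relations): it takes Department No. to determine Department Name, so the name moves to its own table and the employee table keeps only a foreign key. The row with employee 'S999' and department 'D99' is made up to show the constraint being enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE department (dept_no TEXT PRIMARY KEY, dept_name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_no TEXT PRIMARY KEY,"
    " emp_name TEXT NOT NULL,"
    " hourly_rate REAL,"
    " dept_no TEXT NOT NULL REFERENCES department(dept_no))"
)

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("D01", "IT"), ("D02", "Testing"), ("D03", "Database")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [("S100", "Mohan", 21.00, "D03"), ("S101", "Vipul", 16.50, "D02")])

# Renaming a department is now a single-row update, so no update anomaly can occur.
conn.execute("UPDATE department SET dept_name = 'Databases' WHERE dept_no = 'D03'")
row = conn.execute(
    "SELECT e.emp_name, d.dept_name FROM employee e"
    " JOIN department d ON d.dept_no = e.dept_no WHERE e.emp_no = 'S100'"
).fetchone()

# The foreign key rejects an employee that refers to a non-existent department.
try:
    conn.execute("INSERT INTO employee VALUES ('S999', 'Test', 10.0, 'D99')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(row)          # ('Mohan', 'Databases')
print(fk_rejected)  # True
```

The department name is stored in exactly one row, and the foreign key keeps the two tables consistent.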
The Process of Normalization (21/22)

Exercise 1 (1/2)
You are provided with the Enrolment table:
StudentID  StuName    CourseCode  CourseName     Grade  CreditHr  Instructor  InstrOffice
1021234    Moses      C1102       Database       A      3         Mr Ringo    5.01
                      C1201       Internet       B      3         Mr Omar     5.05
1024131    Shaquille  C1201       Internet       C      3         Mr Omar     5.05
                      C1401       Discrete Math  B      3         Mr Belinda  5.10

a) Normalize the Enrolment table to 1NF. Identify the
primary key.

Exercise 1 (2/2)
b) Normalize the result in question (a) to 2NF. Name any
new relations and identify the Primary Key and
Foreign Key.

c) Normalize the result in question (b) to 3NF. Name any
new relations, and identify the Primary Key and
Foreign Key.

Exercise 2 (1/2)
A collection of DreamHome Leases is shown in Figure 14.9.

65
Exercise 2 (2/2)
a) Convert the DreamHome Leases data to unnormalized
form (UNF).

b) Normalize the UNF table from question (a) to 1NF.
Identify the primary key.

c) Normalize the result in question (b) to 2NF. Name any new
relations and identify the Primary Key and Foreign Key.

d) Normalize the result in question (c) to 3NF. Name any new
relations, and identify the Primary Key and Foreign Key.

Exercise 3 (1/2)
Examine the Patient Medication Form for the Wellmeadows
Hospital case study shown in Figure below.

67
Exercise 3 (2/2)
a) Identify the functional dependencies represented by the
attributes shown in the form. State any assumptions that
you make about the data and the attributes shown in this
form.

b) Describe and illustrate the process of normalizing the
attributes shown in the figure above to produce a set of
well-designed 3NF relations.

c) Identify the primary, alternate, and foreign keys in your
3NF relations.
Exercise 4 (1/3)
The table shown in Figure 14.19 lists sample dentist/patient
appointment data. A patient is given an appointment at a
specific time and date with a dentist located at a particular
surgery.
On each day of patient appointments, a dentist is allocated to
a specific surgery for that day.

69
Exercise 4 (2/3)

a) The table shown in Figure 14.19 is susceptible to
update anomalies. Provide examples of insertion,
deletion, and update anomalies.
Exercise 4 (3/3)
b) Identify the functional dependencies represented by
the attributes shown in the table of Figure 14.19. State
any assumptions you make about the data and the
attributes shown in this table.

c) Describe and illustrate the process of normalizing the
table shown in Figure 14.19 to 3NF relations. Identify
the primary, alternate, and foreign keys in your 3NF
relations.
Exercise 5 (1/2)
A company called FastCabs provides a taxi service to clients.
The table shown in Figure 14.21 displays some details of
client bookings for taxis.
Assume that a taxi driver is assigned to a single taxi, but a
taxi can be assigned to one or more drivers.

72
Exercise 6 (1/2)
• Let’s consider a scenario where we have a database
table for an online bookstore.
• The initial unnormalized table contains details of
books, authors, and orders.

73
Exercise 6 (2/2)
1) Normalize the unnormalized table to 1NF. Identify the
primary key.

2) Normalize the result in question (1) to 2NF. Name any
new relations and identify the Primary Key and
Foreign Key.

3) Normalize the result in question (2) to 3NF. Name any
new relations, and identify the Primary Key and
Foreign Key.
Exercise 5 (2/2)
a) Identify the functional dependencies that exist between
the columns of the table in Figure 14.21 and identify the
primary key and any alternate key(s) (if present) for the
table.

b) Describe why the table in Figure 14.21 is not in 3NF.

c) The table shown in Figure 14.21 is susceptible to update
anomalies. Provide examples of how insertion, deletion,
and modification anomalies could occur on this table.
Reading Lists (1/2)
1. Silberschatz, A., Korth, H. F., & Sudarshan, S. (2020).
Database System Concepts (7th ed.). New York:
McGraw-Hill.
2. Lemahieu, W., vanden Broucke, S., & Baesens, B. (2018).
Principles of Database Management: The Practical Guide to
Storing, Managing and Analyzing Big and Small Data.
Cambridge University Press.
3. Elmasri, R., & Navathe, S. (2017). Fundamentals of
Database Systems (7th ed.). Pearson India.
Reading Lists (2/2)
4. Jukic, N., Vrbsky, S., & Nestorov, S. (2016). Database
Systems: Introduction to Databases and Data Warehouses.
Prospect Press.
5. Connolly, T., & Begg, C. (2015). Database Systems: A
Practical Approach to Design, Implementation, and
Management (6th ed.). Essex, Harlow, England: Pearson
Education Limited.
6. Hoffer, J. A., Ramesh, V., & Topi, H. (2013). Modern
Database Management (11th ed.). Boston, U.S: Pearson
Education Limited.
