WEEK 1
LECTURE 1: NORMALIZATION TECHNIQUES
Lecture Synopsis:
1.1 Understanding Normalization
1.2 Detailed steps of normalization (1NF, 2NF, 3NF)
1.3 Practical exercises on normalizing data
What Is Database Normalization?
Database normalization, or simply normalization, is a process used in data modelling
and database design, in which you organize your data and tables so that data can be
added and updated efficiently.
Normalization is the process of organizing the data in the database.
It’s something a person does manually, as opposed to a system or a tool doing it. It’s
commonly done by database developers and database administrators.
It can be done on any relational database, where data is stored in tables that are linked
to each other. This means that normalization in a DBMS (Database Management System)
can be done in Oracle, Microsoft SQL Server, MySQL, PostgreSQL and any other relational
database.
To perform the normalization process, you start with a rough idea of the data you want
to store, and apply certain rules to it in order to get it to a more efficient form.
Why Normalize a Database?
So why would anyone want to normalize their database?
Why do we want to go through this manual process of rearranging the data?
There are a few reasons we would want to go through this process:
Make the database more efficient
Prevent the same data from being stored in more than one place (data redundancy)
Prevent new data from being blocked because other, unrelated data is not yet known
(called an “Insert Anomaly”)
Prevent updates being made to some copies of the data but not others (called an “Update
Anomaly”)
Prevent data from being left behind when it should be deleted, or lost when it should be
kept (called a “Delete Anomaly”)
Ensure the data is accurate
Reduce the storage space that a database takes up
Ensure the queries on a database run as fast as possible
Normalization in a DBMS is done to achieve these points (removing these anomalies).
Without normalization, a database can accumulate redundant data, which causes data
integrity and other problems as the database grows. The database can become slow,
incorrect, and messy.
Data Anomalies
An anomaly is an inconsistency or problem in the data that is not meant to be there.
This can happen if a database is not normalized.
There are different kinds of data anomalies that can occur, and a normalized database
helps to prevent them.
Example
We’ll be using a student database as an example in this note, which records students,
the fees they have paid, their course, and the modules (classes) they are taking.
Student ID  Student Name   Fees Paid  Course Name  Module 1    Module 2  Module 3
1           John Sesay     200        BIT          Web Design  OOP 1
2           Maria Kanu     500        BSEM         Database    Maths     Programming 2
3           Susan Johnson  400        BICT         Multimedia
4           Mathew Cole    850        DIT
This table keeps track of a few pieces of information:
The student names
The fees a student has paid
The classes a student is taking, if any
This is not a normalized table, and there are a few issues with this.
Insert Anomaly
An insert anomaly happens when we try to insert a record into this table without knowing
all the data the record requires. In the strictest case, a new record cannot be inserted
at all because some of its data is missing.
For example, suppose we want to add a new student but do not yet know their course name.
The new record would look like this:
Student ID  Student Name   Fees Paid  Course Name  Class 1     Class 2   Class 3
1           John Sesay     200        BIT          Web Design  Database
2           Maria Kanu     500        BSEM         Database    Maths     Programming 2
3           Susan Johnson  400        BICT         Multimedia
4           Mathew Cole    850        DIT
5           Alie Turay     500        ?            ?
We would be adding incomplete data to our table, which can cause issues when trying to
analyze this data.
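The stricter form of this anomaly can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name student_flat, the column names, and the NOT NULL rule on course_name are assumptions, since the note does not define a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table: student and course data share one row.
conn.execute(
    """
    CREATE TABLE student_flat (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL,
        fees_paid    INTEGER,
        course_name  TEXT NOT NULL,  -- assumed rule: every row needs a course
        class_1      TEXT,
        class_2      TEXT,
        class_3      TEXT
    )
    """
)

try:
    # Alie Turay has paid fees, but the course is not known yet.
    conn.execute(
        "INSERT INTO student_flat (student_id, student_name, fees_paid) "
        "VALUES (?, ?, ?)",
        (5, "Alie Turay", 500),
    )
    inserted = True
except sqlite3.IntegrityError:
    # The missing course blocks the whole row, student details included.
    inserted = False

row_count = conn.execute("SELECT COUNT(*) FROM student_flat").fetchone()[0]
print(inserted, row_count)
```

Under these assumptions the student cannot be recorded until a course is known, even though the name and the fees paid are already available.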
Update Anomaly
An update anomaly happens when we want to update data but end up updating some copies
of it and not others.
For example, let’s say the class Database was renamed to “Database Systems”. We would
have to search every column that could hold this class (Class 1, Class 2 and Class 3)
and rename each occurrence that was found.
Student ID  Student Name   Fees Paid  Course Name  Class 1     Class 2   Class 3
1           John Sesay     200        BIT          Web Design  Database
2           Maria Kanu     500        BSEM         Database    Maths     Programming
3           Susan Johnson  400        BICT         Multimedia
4           Mathew Cole    850        DIT
There’s a risk that we miss a value, which would leave the old and the new name side by
side in the table.
Ideally, we would only update the value once, in one location.
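A short Python sketch of how such a miss can happen, using the class columns from the table above. The helper rename_class and the new name "Database Systems" are hypothetical and only serve the illustration.

```python
# Class columns copied from the table above; None marks an empty slot.
rows = [
    {"student_id": 1, "class_1": "Web Design", "class_2": "Database", "class_3": None},
    {"student_id": 2, "class_1": "Database", "class_2": "Maths", "class_3": "Programming"},
    {"student_id": 3, "class_1": "Multimedia", "class_2": None, "class_3": None},
    {"student_id": 4, "class_1": None, "class_2": None, "class_3": None},
]


def rename_class(rows, old, new, columns):
    """Rename a class, but only in the listed columns."""
    for row in rows:
        for col in columns:
            if row[col] == old:
                row[col] = new


# Incomplete update: only class_1 is checked, so John's class_2 is missed.
rename_class(rows, "Database", "Database Systems", columns=["class_1"])

# Students whose row still carries the old name in any class column.
leftover = [
    r["student_id"]
    for r in rows
    if "Database" in (r["class_1"], r["class_2"], r["class_3"])
]
print(leftover)
```

If class names were stored once in their own table, the rename would be a single change and no row could be left behind.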
Delete Anomaly
A delete anomaly occurs when we want to delete data from the table, but we end up
deleting more than what we intended.
For example, let’s say Susan Johnson quits and her record needs to be deleted from the
system. We could delete her row:
Student ID  Student Name   Fees Paid  Course Name  Class 1     Class 2   Class 3
1           John Sesay     200        BIT          Web Design  Database
2           Maria Kanu     500        BSEM         Database    Maths     Programming
3           Susan Johnson  400        BICT         Multimedia
4           Mathew Cole    850        DIT
But, if we delete this row, we lose the record of the Multimedia class, because it’s not
stored anywhere else. The same can be said for the BICT course.
We should be able to delete one type of data or one record without having impacts on
other records we don’t want to delete.
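The effect can be checked with a small Python sketch that uses the student rows from the earlier tables in this note. The tuple layout and the helper known_facts are assumptions made for the illustration.

```python
# (student_id, name, course, classes) taken from the earlier student tables.
rows = [
    (1, "John Sesay", "BIT", ["Web Design", "Database"]),
    (2, "Maria Kanu", "BSEM", ["Database", "Maths", "Programming"]),
    (3, "Susan Johnson", "BICT", ["Multimedia"]),
    (4, "Mathew Cole", "DIT", []),
]


def known_facts(rows):
    """Collect every course and class that the table still records."""
    courses = {course for _, _, course, _ in rows}
    classes = {c for _, _, _, taken in rows for c in taken}
    return courses, classes


courses_before, classes_before = known_facts(rows)
remaining = [r for r in rows if r[0] != 3]  # delete Susan Johnson's row
courses_after, classes_after = known_facts(remaining)

# Anything in these sets existed only in the deleted row.
lost_courses = courses_before - courses_after
lost_classes = classes_before - classes_after
print(lost_courses, lost_classes)
```

Deleting one student silently removes a course and a class from the database, which is exactly the side effect a normalized design avoids.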
What Are The Normal Forms?
The process of normalization involves applying rules to a set of data. Each of these rules
transforms the data to a certain structure, called a normal form.
There are three main normal forms that you should consider (there are actually six
normal forms in total, but the first three are the most common).
Whenever the first rule is applied, the data is in “first normal form”. Then, the second
rule is applied and the data is in “second normal form”. The third rule is then applied and
the data is in “third normal form”.
The fourth and fifth normal forms are then reached by applying their own specific rules.
First Normal Form (1NF)
A relation is in 1NF if every attribute contains only atomic (indivisible) values.
It states that an attribute of a table cannot hold multiple values. It must hold only a
single value.
First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
1NF ensures that the database table is organized such that each column contains atomic
values, and each record is unique. This eliminates repeating groups, thereby
structuring data into tables and columns.
Example 1:
EMPLOYEE table:
EMP_ID  EMP_NAME      DEPARTMENT  BRANCH
1       John Sesay    IT          Aberdeen, Lumley
2       Henry Tucker  Finance     Kissy, Wellington
3       Sam Johnny    HR          Aberdeen, Lumley, Kissy, Wellington
The EMPLOYEE table decomposed into 1NF is shown below:
EMP_ID  EMP_NAME      DEPARTMENT  BRANCH
1       John Sesay    IT          Aberdeen
1       John Sesay    IT          Lumley
2       Henry Tucker  Finance     Kissy
2       Henry Tucker  Finance     Wellington
3       Sam Johnny    HR          Aberdeen
3       Sam Johnny    HR          Lumley
3       Sam Johnny    HR          Kissy
3       Sam Johnny    HR          Wellington
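The decomposition above can be sketched in Python as follows. The function name to_1nf and the tuple layout are assumptions for illustration; the rows are those of the EMPLOYEE table.

```python
# Unnormalized rows: the BRANCH cell holds a comma-separated list.
employees = [
    (1, "John Sesay", "IT", "Aberdeen, Lumley"),
    (2, "Henry Tucker", "Finance", "Kissy, Wellington"),
    (3, "Sam Johnny", "HR", "Aberdeen, Lumley, Kissy, Wellington"),
]


def to_1nf(rows):
    """Emit one row per branch so that BRANCH holds a single value."""
    result = []
    for emp_id, name, dept, branches in rows:
        for branch in branches.split(","):
            result.append((emp_id, name, dept, branch.strip()))
    return result


employees_1nf = to_1nf(employees)
for row in employees_1nf:
    print(row)
```

Each output row now holds exactly one branch, at the cost of repeating the employee's name and department, which the later normal forms address.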
Example 2:
Assume a video library maintains a database of the movies it has rented out. Without any
normalization, all information is stored in one table, as shown below.
ID   Full name    Address            Movies Rented                                          Contact
101  John Sesay   7 Campbell Street  Strike Back, 13 Hours The Secret Soldiers Of Benghazi  23200123456
102  Abu Koroma   6 Sanders Street   Picture Of Her, Bob Marley One Love                    23211258963
103  Peter Amara  12 Amara Drive     Strike Back                                            23233000000
Here you can see that the Movies Rented column holds multiple values.
Now let’s move this table into first normal form:
First Normal Form (1NF)
Each table cell should contain a single value.
Each record needs to be unique.
The above table in 1NF:
ID   Full name    Address            Movies Rented                             Contact
101  John Sesay   7 Campbell Street  Strike Back                               23200123456
101  John Sesay   7 Campbell Street  13 Hours The Secret Soldiers Of Benghazi  23200123456
102  Abu Koroma   6 Sanders Street   Picture Of Her                            23211258963
102  Abu Koroma   6 Sanders Street   Bob Marley One Love                       23211258963
103  Peter Amara  12 Amara Drive     Strike Back                               23233000000
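The rule that each record must be unique can be enforced by the database itself. The sketch below uses Python's built-in sqlite3 module; the table name rentals_1nf, the column names, and the choice of (id, movie_rented) as the key are assumptions, and John Sesay's two rentals are both stored under ID 101, as in the unnormalized table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rentals_1nf (
        id           INTEGER NOT NULL,
        full_name    TEXT NOT NULL,
        address      TEXT,
        movie_rented TEXT NOT NULL,
        contact      TEXT,
        PRIMARY KEY (id, movie_rented)  -- assumed key: customer plus movie
    )
    """
)
rows = [
    (101, "John Sesay", "7 Campbell Street", "Strike Back", "23200123456"),
    (101, "John Sesay", "7 Campbell Street",
     "13 Hours The Secret Soldiers Of Benghazi", "23200123456"),
    (102, "Abu Koroma", "6 Sanders Street", "Picture Of Her", "23211258963"),
    (102, "Abu Koroma", "6 Sanders Street", "Bob Marley One Love", "23211258963"),
    (103, "Peter Amara", "12 Amara Drive", "Strike Back", "23233000000"),
]
conn.executemany("INSERT INTO rentals_1nf VALUES (?, ?, ?, ?, ?)", rows)

try:
    # Inserting the first row again violates the composite primary key.
    conn.execute("INSERT INTO rentals_1nf VALUES (?, ?, ?, ?, ?)", rows[0])
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

total = conn.execute("SELECT COUNT(*) FROM rentals_1nf").fetchone()[0]
print(duplicate_accepted, total)
```

The ID alone no longer identifies a row once a customer has several rentals, which is why the key in this sketch combines the customer ID with the movie.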
Example 3:
The Employee table below shows employees who work in more than one department.
ID  Employee  Age  Department
1   Melvin    32   Marketing, Sales
2   Edward    45   Quality Assurance
3   Alex      36   Human Resource
Employee table following 1NF:
ID  Employee  Age  Department
1   Melvin    32   Marketing
1   Melvin    32   Sales
2   Edward    45   Quality Assurance
3   Alex      36   Human Resource
Example 4:
Student Database
ID  Name             Age  Gender  Phone Number  Faculty  Courses  Modules                                 Lecturers
1   John Doe         22   Male    +23276123456  FICT     DIT      Database Systems,                       Mr Umar,
                                                                  Object-Oriented Programming Methods 1   Mr Jelil
2   Jane Smith       21   Female  +23277654321  FDI      BIT      Fundamentals of Computer Systems,       Mr Amandus,
                                                                  Database Design & Management 2          Mr Kanu
3   Abdul Kamara     23   Male    +23278987654  FABE     BBIT     Multimedia Technology,                  Mr Sahid,
                                                                  Web Design 1                            Mr Umar
4   Aminata Bangura  20   Female  +23279321987  FICT     BICT     Object-Oriented Programming Methods 1,  Mr Jelil,
                                                                  Database Design & Management 2          Mr Amandus
5   Mohamed Sesay    24   Male    +23276555666  FDI      BSEM     Web Design 1,                           Mr Kanu,
                                                                  Fundamentals of Computer Systems        Mr Sahid
The above table in 1NF:
ID  Name             Age  Gender  Phone Number  Faculty  Courses  Modules                                Lecturers
1   John Doe         22   Male    +23276123456  FICT     DIT      Database Systems                       Mr Umar
1   John Doe         22   Male    +23276123456  FICT     DIT      Object-Oriented Programming Methods 1  Mr Jelil
2   Jane Smith       21   Female  +23277654321  FDI      BIT      Fundamentals of Computer Systems       Mr Amandus
2   Jane Smith       21   Female  +23277654321  FDI      BIT      Database Design & Management 2         Mr Kanu
3   Abdul Kamara     23   Male    +23278987654  FABE     BBIT     Multimedia Technology                  Mr Sahid
3   Abdul Kamara     23   Male    +23278987654  FABE     BBIT     Web Design 1                           Mr Umar
4   Aminata Bangura  20   Female  +23279321987  FICT     BICT     Object-Oriented Programming Methods 1  Mr Jelil
4   Aminata Bangura  20   Female  +23279321987  FICT     BICT     Database Design & Management 2         Mr Amandus
5   Mohamed Sesay    24   Male    +23276555666  FDI      BSEM     Web Design 1                           Mr Kanu
5   Mohamed Sesay    24   Male    +23276555666  FDI      BSEM     Fundamentals of Computer Systems       Mr Sahid
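When two columns hold parallel lists, as Modules and Lecturers do here, each module has to stay with its own lecturer during the split. The following Python sketch assumes the two lists are paired by position, which is how the 1NF table above lines them up; the helper names split_cell and expand are hypothetical.

```python
# One unnormalized record; only a few columns are kept for brevity.
student = {
    "id": 1,
    "name": "John Doe",
    "modules": "Database Systems, Object-Oriented Programming Methods 1",
    "lecturers": "Mr Umar, Mr Jelil",
}


def split_cell(cell):
    """Split a comma-separated cell into trimmed values."""
    return [part.strip() for part in cell.split(",")]


def expand(record):
    """Pair each module with the lecturer in the same position."""
    modules = split_cell(record["modules"])
    lecturers = split_cell(record["lecturers"])
    if len(modules) != len(lecturers):
        raise ValueError("modules and lecturers do not line up")
    base = {k: v for k, v in record.items() if k not in ("modules", "lecturers")}
    return [dict(base, module=m, lecturer=l) for m, l in zip(modules, lecturers)]


rows_1nf = expand(student)
for row in rows_1nf:
    print(row)
```

Splitting the two columns independently would produce every module and lecturer combination, so the positional pairing is what keeps the data correct.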
Second Normal Form (2NF)
For a relation to be in 2NF, it must first be in 1NF.
In the second normal form, all non-key attributes are fully functionally dependent on
the primary key. That is, every attribute that is not part of the primary key
should depend on the whole unique identifier of the entity, not on just a part of it.
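A dependency on only part of a key can be spotted in sample data with a small check. The Python sketch below is an illustration only: the function determines is hypothetical, the key (id, module) is an assumption about the 1NF student table, and a check on sample rows can suggest a dependency but cannot prove that it holds for all future data.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in `lhs` always give equal values in `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows in 1NF, where (id, module) together identify a row.
rows = [
    {"id": 1, "name": "John Doe", "module": "Database Systems",
     "lecturer": "Mr Umar"},
    {"id": 1, "name": "John Doe", "module": "Object-Oriented Programming Methods 1",
     "lecturer": "Mr Jelil"},
    {"id": 2, "name": "Jane Smith", "module": "Fundamentals of Computer Systems",
     "lecturer": "Mr Amandus"},
    {"id": 2, "name": "Jane Smith", "module": "Database Design & Management 2",
     "lecturer": "Mr Kanu"},
]

# name depends on id alone, which is only part of the key: a partial dependency.
name_on_part_of_key = determines(rows, ["id"], ["name"])
# lecturer is not fixed by id alone, so it does not belong with the student data.
lecturer_on_id_only = determines(rows, ["id"], ["lecturer"])
print(name_on_part_of_key, lecturer_on_id_only)
```

Attributes that depend on just one part of the key, such as the student's name here, are the ones that 2NF moves into their own table.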
Example 1: Student Database
Students Table:
Student ID  Name             Age  Gender  Phone Number  Faculty  Courses
1           John Doe         22   Male    23276123456   FICT     DIT
2           Jane Smith       21   Female  23277654321   FDI      BIT
3           Abdul Kamara     23   Male    23278987654   FABE     BBIT
4           Aminata Bangura  20   Female  23279321987   FICT     BICT
5           Mohamed Sesay    24   Male    23276555666   FDI      BSEM
Modules Table:
Module ID  Module Name                            Lecturer
1          Database Systems                       Mr Umar
2          Object-Oriented Programming Methods 1  Mr Jelil
3          Fundamentals of Computer Systems       Mr Amandus
4          Database Design & Management 2         Mr Kanu
5          Multimedia Technology                  Mr Sahid
6          Web Design 1                           Mr Umar
In this 2NF structure:
The Students table contains unique student information, fully dependent on the
Student ID.
The Modules table contains unique module information, fully dependent on the
Module ID.
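One way to keep track of which student takes which module after this split is a separate link table. The sketch below uses Python's built-in sqlite3 module; the enrollments table and all column names are assumptions that do not appear in the lecture's tables, and only a few rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE modules (
        module_id   INTEGER PRIMARY KEY,
        module_name TEXT NOT NULL,
        lecturer    TEXT
    );
    -- Hypothetical link table: one row per (student, module) pair.
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(student_id),
        module_id  INTEGER REFERENCES modules(module_id),
        PRIMARY KEY (student_id, module_id)
    );
    """
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "John Doe"), (4, "Aminata Bangura")],
)
conn.executemany(
    "INSERT INTO modules VALUES (?, ?, ?)",
    [
        (1, "Database Systems", "Mr Umar"),
        (2, "Object-Oriented Programming Methods 1", "Mr Jelil"),
    ],
)
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", [(1, 1), (1, 2), (4, 2)])

# Who takes module 2? The module name itself is stored only once.
oop_students = [
    name
    for (name,) in conn.execute(
        """
        SELECT s.name
        FROM enrollments e
        JOIN students s ON s.student_id = e.student_id
        WHERE e.module_id = 2
        ORDER BY s.student_id
        """
    )
]
print(oop_students)
```

With this layout a module can be renamed in one place, and a student can be removed without losing the module's own record.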
Third Normal Form (3NF)
A relation (table) is in Third Normal Form if:
1. It is in Second Normal Form (2NF): This means that it is already in First Normal
Form (1NF), and all non-key attributes are fully functionally dependent on the
primary key.
2. It contains no transitive dependencies: This means that non-key attributes
must not depend on other non-key attributes. Every non-key attribute must be
directly dependent on the primary key and not on any other non-key attribute.
In simpler terms, 3NF ensures that:
There is no redundancy in non-key attributes.
All attributes are directly dependent on the primary key.
3NF is used to reduce data duplication. It is also used to achieve data
integrity.
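A transitive dependency can be illustrated with the Students table from the 2NF example. The Python sketch below assumes that each course belongs to exactly one faculty, which holds in the sample rows but is not stated in the lecture; the names courses, students_3nf and faculty_of are hypothetical.

```python
# Course and faculty per student, as in the 2NF Students table.
students = [
    {"student_id": 1, "name": "John Doe", "course": "DIT", "faculty": "FICT"},
    {"student_id": 2, "name": "Jane Smith", "course": "BIT", "faculty": "FDI"},
    {"student_id": 3, "name": "Abdul Kamara", "course": "BBIT", "faculty": "FABE"},
    {"student_id": 4, "name": "Aminata Bangura", "course": "BICT", "faculty": "FICT"},
    {"student_id": 5, "name": "Mohamed Sesay", "course": "BSEM", "faculty": "FDI"},
]

# Assumption: course determines faculty, so faculty depends on student_id
# only through course. Store that fact once per course.
courses = {}
for row in students:
    courses.setdefault(row["course"], row["faculty"])

# The student rows keep the course but drop the faculty column.
students_3nf = [{k: v for k, v in row.items() if k != "faculty"} for row in students]


def faculty_of(student_id):
    """Look the faculty up through the course instead of storing it per student."""
    course = next(r["course"] for r in students_3nf if r["student_id"] == student_id)
    return courses[course]


print(faculty_of(4))
```

The faculty can still be found for every student, but it is now recorded once per course rather than once per student.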
Students Table:
Student ID  Name             Age  Gender  Phone Number
1           John Doe         22   Male    23276123456
2           Jane Smith       21   Female  23277654321
3           Abdul Kamara     23   Male    23278987654
4           Aminata Bangura  20   Female  23279321987
5           Mohamed Sesay    24   Male    23276555666
Modules Table:
Module ID  Module Name                            Student ID  Lecturer ID
1          Database Systems
2          Object-Oriented Programming Methods 1
3          Fundamentals of Computer Systems
4          Database Design & Management 2
5          Multimedia Technology
6          Web Design 1
Lecturer Table:
Lecturer ID  Lecturer
1            Mr Umar
2            Mr Jelil
3            Mr Amandus
4            Mr Kanu
5            Mr Sahid
6            Mr Umar
Faculty Table:
Faculty ID  Faculty  Student ID  Lecturer ID
1           FICT
2           FDI
3           FABE
4           FICT
5           FDI
Course Table:
Course ID  Courses  Student ID  Lecturer ID
1          DIT
2          BIT
3          BBIT
4          BICT
5          BSEM