DBMS Unit-4 Notes
Notes of Unit-4
Normalization
Normalization in DBMS is a technique for organizing the data in database tables
so that:
• There is less repetition of data
• A large set of data is structured into a bunch of smaller tables
• The tables have a proper relationship between them
DBMS Normalization is a systematic approach to decomposing (breaking down) tables to eliminate data
redundancy (repetition) and undesirable characteristics such as the insertion, update and deletion
anomalies in DBMS.
It is a multi-step process that puts data into tabular form, removes duplicate data and sets up the
relationships between tables.
If a table is not properly normalized and has data redundancy (repetition), it will not only eat up
extra memory space but will also make the data difficult to handle and update without losing or
corrupting it.
Insertion, Updation and Deletion Anomalies are very frequent if the database is not normalized.
To understand these anomalies, let us take the example of a Student table.
In this table, the values of the fields branch, hod and office_tel are repeated for every student in
the same branch of the college. This is Data Redundancy.
1. Insertion Anomaly in DBMS
• Suppose a new student is admitted. Until the student opts for a branch, the student's data
cannot be inserted, or else we will have to set the branch information to NULL.
• Also, if we have to insert data for 100 students of the same branch, then the branch information
will be repeated for all those 100 students.
• If you have to repeat the same data in every row of data, it's better to keep the data
separately and reference that data in each row.
• So in the above table, we can keep the branch information separately, and just use
the branch_id in the student table, where branch_id can be used to get the branch information.
2. Updation Anomaly in DBMS
• What if Mr. X leaves the college, or is no longer the HOD of the computer science
department? In that case, all the student records will have to be updated, and if by mistake we
miss any record, it will lead to data inconsistency.
• This is an Updation anomaly because you need to update all the records in your table just
because one piece of information got changed.
3. Deletion Anomaly in DBMS
• In our Student table, two different pieces of information are kept together, the Student
information and the Branch information.
• So if only a single student is enrolled in a branch, and that student leaves the college, or for
some reason, the entry for the student is deleted, we will lose the branch information too.
• So in DBMS we should never keep two different entities together; in the above example, these
are Student and Branch.
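The deletion anomaly can be reproduced in a few lines of Python. This is a minimal sketch with hypothetical rows (the roll numbers and the HOD "Mr. Y" are assumptions, not data from these notes): deleting the only CSE student from a combined table also erases everything known about the CSE branch, while a separate Branch table keeps it.

```python
# Hypothetical combined design: each row mixes student data with branch data.
combined = [
    {"rollno": 1, "name": "Akon", "branch": "CSE", "hod": "Mr. X"},
    {"rollno": 2, "name": "Bkon", "branch": "Mechanical", "hod": "Mr. Y"},
]

def known_branches(rows):
    """Branches whose details can still be read from the combined table."""
    return {row["branch"] for row in rows}

# Deleting the only CSE student also deletes everything known about CSE.
combined = [row for row in combined if row["rollno"] != 1]
print(known_branches(combined))  # {'Mechanical'}

# Split design: branch facts live in their own table; students hold branch_id.
branch = {1: {"name": "CSE", "hod": "Mr. X"}, 2: {"name": "Mechanical", "hod": "Mr. Y"}}
student = [
    {"rollno": 1, "name": "Akon", "branch_id": 1},
    {"rollno": 2, "name": "Bkon", "branch_id": 2},
]
student = [row for row in student if row["rollno"] != 1]
print(sorted(b["name"] for b in branch.values()))  # ['CSE', 'Mechanical']
```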
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are
fully functionally dependent on the primary key.
4NF A relation will be in 4NF if it is in Boyce Codd normal form and has
no multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, does not contain any join
dependency, and joining its decomposed tables is lossless.
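First normal form (1NF)
For a table to be in the First Normal Form, it should follow these 4 rules:
1. Each column should contain only single (atomic) values.
2. The values stored in a column should all be of the same domain.
3. Every column in the table should have a unique name.
4. The order in which the data is stored does not matter.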
If we have an Employee table in which we store the employee information along with the employee
skillset, the table will look like this:
Emp_id Emp_name Emp_mobile Emp_skills
Here the Emp_skills column can hold several skills in a single cell, so the table breaks the
single-value rule of 1NF. So how do you fix the above table? There are two ways to do this:
1. Remove the Emp_skills column from the Employee table and keep it in some other table.
2. Or add multiple rows for each employee, with each row holding one skill.
Emp_id Emp_skills
1 Python
1 JavaScript
2 HTML
2 CSS
2 JavaScript
3 Java
3 Linux
3 C++
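The second fix can be sketched in Python. This is a minimal illustration, assuming the unnormalized Employee rows store each employee's skills as one comma-separated string (the original rows are not reproduced in these notes, so the input below is rebuilt from the skill rows above); the function name to_1nf is hypothetical. Splitting that string yields one (Emp_id, skill) row per skill, matching the table above.

```python
# Hypothetical unnormalized Employee rows: all skills of an employee sit in
# one cell as a comma-separated string, which breaks the atomic-value rule.
unnormalized = [
    (1, "Python, JavaScript"),
    (2, "HTML, CSS, JavaScript"),
    (3, "Java, Linux, C++"),
]

def to_1nf(rows):
    """Return one (emp_id, skill) row per skill so every cell holds a single value."""
    flat = []
    for emp_id, skills in rows:
        for skill in skills.split(","):
            flat.append((emp_id, skill.strip()))
    return flat

emp_skills = to_1nf(unnormalized)
for emp_id, skill in emp_skills:
    print(emp_id, skill)
```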
Second normal form (2NF)
Let's take an example to understand partial dependency and the Second Normal Form.
When a table has a primary key that is made up of two or more columns, then every column not
included in the primary key should depend on the entire primary key and not on a part of it. If any
column that is not in the primary key depends on only a part of the primary key, we say the table has
a Partial dependency.
Suppose we have two tables, Student and Subject, to store student information and information
related to subjects.
Student table:
Student_id Student_name Branch
1 Akon CSE
2 Bkon Mechanical
Subject Table:
Subject_id Subject_name
1 C Language
2 DSA
3 Operating System
And we have another table Score to store the marks scored by students in any subject like this:
Student_id Subject_id Marks Teacher_name
1 1 70 Miss. C
1 2 82 Mr. D
2 1 65 Miss. C
Now in the above table, the primary key is student_id + subject_id, because both values are
required to identify any row of data.
But the Score table also has a column teacher_name, which depends only on the subject, that is, on
subject_id alone. This is a partial dependency, so we should not keep that information in the Score table.
The column teacher_name should be moved to the Subject table. The entire system will then be
normalized as per the Second Normal Form.
Updated Subject table:
Subject_id Subject_name Teacher_name
1 C Language Miss. C
2 DSA Mr. D
Updated Score table:
Student_id Subject_id Marks
1 1 70
1 2 82
2 1 65
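The 2NF split can be sketched as two projections of the Score table. This is a minimal Python sketch, assuming subject 1 is always taught by Miss. C, as the updated Subject table shows; the function name decompose_2nf is hypothetical. It refuses to split if one subject_id maps to two different teachers, because then teacher_name would not depend on subject_id alone.

```python
# Score rows: (student_id, subject_id, marks, teacher_name).
# Assumption: subject 1 is taught by Miss. C in every row.
score = [
    (1, 1, 70, "Miss. C"),
    (1, 2, 82, "Mr. D"),
    (2, 1, 65, "Miss. C"),
]

def decompose_2nf(rows):
    """Move the partial dependency subject_id -> teacher_name into its own table."""
    new_score = [(stu, sub, marks) for stu, sub, marks, _ in rows]
    teachers = {}
    for _, sub, _, teacher in rows:
        # Each subject_id must map to exactly one teacher for the split to be valid.
        if teachers.setdefault(sub, teacher) != teacher:
            raise ValueError(f"subject {sub} has conflicting teachers")
    return new_score, sorted(teachers.items())

new_score, subject_teacher = decompose_2nf(score)
print(new_score)        # [(1, 1, 70), (1, 2, 82), (2, 1, 65)]
print(subject_teacher)  # [(1, 'Miss. C'), (2, 'Mr. D')]
```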
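Third normal form (3NF)
A table is in 3NF if it is in 2NF and has no transitive dependency. Now suppose the Score table
also stores the exam type and the total marks of each exam:
Student_id Subject_id Marks Exam_type Total_marks
1 1 70 Theory 100
1 2 82 Theory 100
2 1 42 Practical 50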
• In the table above, the column exam_type depends on both student_id and subject_id,
because:
o a student can be in the CSE branch or the Mechanical branch,
o and based on that they may have different exam types for different subjects.
o The CSE students may have both Practical and Theory for Compiler Design,
o whereas Mechanical branch students may only have Theory exams for Compiler
Design.
• But the column total_marks depends only on the exam_type column, and exam_type is not part of
the primary key (the primary key is student_id + subject_id). Since the key determines exam_type
and exam_type in turn determines total_marks, we have a Transitive dependency here.
How to solve a Transitive Dependency?
You can create a separate table for ExamType and use it in the Score table.
The new ExamType table can also hold related information such as duration (length of the exam in
minutes), and the Score table now refers to it through exam_type_id.
New ExamType table:
Exam_type_id Exam_type Total_marks Duration
1 Practical 50 45
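Building the ExamType table from the Score rows is a projection plus a lookup. In this minimal Python sketch, the surrogate exam_type_id values are an assumption: they are assigned in alphabetical order of exam_type so that Practical gets id 1, as in the table above. Duration is not part of the Score data, so the sketch leaves it out; decompose_3nf is a hypothetical name.

```python
# Score rows from the 3NF example: (student_id, subject_id, marks, exam_type, total_marks).
score = [
    (1, 1, 70, "Theory", 100),
    (1, 2, 82, "Theory", 100),
    (2, 1, 42, "Practical", 50),
]

def decompose_3nf(rows):
    """Move exam_type -> total_marks into an ExamType table keyed by exam_type_id."""
    totals = {}
    for _, _, _, exam_type, total in rows:
        # exam_type must determine total_marks, otherwise the split loses information.
        if totals.setdefault(exam_type, total) != total:
            raise ValueError(f"exam_type {exam_type!r} has conflicting total_marks")
    # Assumed surrogate ids, assigned in alphabetical order of exam_type.
    ids = {exam_type: i for i, exam_type in enumerate(sorted(totals), start=1)}
    exam_type_table = [(ids[et], et, totals[et]) for et in sorted(totals)]
    new_score = [(stu, sub, marks, ids[et]) for stu, sub, marks, et, _ in rows]
    return new_score, exam_type_table

new_score, exam_type_table = decompose_3nf(score)
print(exam_type_table)  # [(1, 'Practical', 50), (2, 'Theory', 100)]
print(new_score)        # [(1, 1, 70, 2), (1, 2, 82, 2), (2, 1, 42, 1)]
```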
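Boyce Codd normal form (BCNF)
A table is in BCNF if it is in 3NF and, for every functional dependency X → Y, X is a super key of
the table.
Example: Let's assume there is a company where employees work in more than one department.
The functional dependencies in the EMPLOYEE table are: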
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The candidate key of the EMPLOYEE table is {EMP_ID, EMP_DEPT}. The table is not in BCNF because
neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
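Candidate keys:
o For the EMP_COUNTRY table: EMP_ID
o For the EMP_DEPT table: EMP_DEPT
o For the EMP_DEPT_MAPPING table: {EMP_ID, EMP_DEPT}
Now the design is in BCNF, because the left-hand side of both functional dependencies is a key.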
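The BCNF test can be automated with attribute closure: an FD X → Y violates BCNF in a table if the closure of X does not cover every attribute of that table. This Python sketch is a simplification: it checks only the two listed FDs against each table rather than every dependency they imply, and the EMPLOYEE attribute set is inferred from those FDs. The names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: keep applying FDs whose left side is already known."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """FDs that apply inside `relation` but whose left side is not a superkey of it."""
    violations = []
    for lhs, rhs in fds:
        applies = lhs <= relation and (rhs & relation) - lhs
        if applies and not relation <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

FDS = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
EMPLOYEE = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
DECOMPOSED = {
    "EMP_COUNTRY": {"EMP_ID", "EMP_COUNTRY"},
    "EMP_DEPT": {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"},
    "EMP_DEPT_MAPPING": {"EMP_ID", "EMP_DEPT"},
}

print(len(bcnf_violations(EMPLOYEE, FDS)))  # 2
for name, attrs in DECOMPOSED.items():
    print(name, bcnf_violations(attrs, FDS))  # each prints []
```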
Fourth normal form (4NF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if a single value of A is associated with multiple values of B, the
dependency is called a multi-valued dependency, written A →→ B.
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities:
there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and
two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads
to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
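The 4NF decomposition is two projections of STUDENT, which can be sketched in Python. Joining the projections back on STU_ID pairs each of student 21's courses with each of their hobbies (four rows). Under the multi-valued dependency STU_ID →→ COURSE, that full set of combinations is what a single table would have to store, which is the repetition 4NF removes.

```python
# STUDENT rows as given in the 4NF example: (STU_ID, COURSE, HOBBY).
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto the two independent facts; the set removes duplicate pairs.
student_course = sorted({(stu_id, course) for stu_id, course, _ in student})
student_hobby = sorted({(stu_id, hobby) for stu_id, _, hobby in student})

# Natural join of the two projections on STU_ID.
rejoined = sorted(
    (stu_id, course, hobby)
    for stu_id, course in student_course
    for other_id, hobby in student_hobby
    if stu_id == other_id
)

print(student_course)
print(student_hobby)
print([row for row in rejoined if row[0] == 21])
# [(21, 'Computer', 'Dancing'), (21, 'Computer', 'Singing'),
#  (21, 'Math', 'Dancing'), (21, 'Math', 'Singing')]
```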
Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining its
decomposed tables is lossless.
o 5NF is satisfied when the tables have been broken into as many tables as possible, in order to
avoid redundancy, without losing information when they are joined back.
o 5NF is also known as Project-join normal form (PJ/NF).
In the above table, John takes both Computer and Math classes in Semester 1, but he doesn't take
Math in Semester 2. In this case, the combination of all these fields is required to identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be
taking it, so we would have to leave LECTURER and SUBJECT as NULL. But all three columns together
act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
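Whether all three projections are needed can be checked by joining them back. This Python sketch treats the rows of P1, P2 and P3 as sets. Joining only P1 and P2 on SUBJECT produces seven rows, including the spurious (Semester 1, Math, Akash) and (Semester 2, Math, John); the second contradicts the statement that John does not take Math in Semester 2. The further join with P3 on (SEMESTER, LECTURER) removes both and leaves five rows.

```python
# Rows of the three projections (duplicate rows collapse in a set).
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}        # (SEMESTER, SUBJECT)
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}                # (SUBJECT, LECTURER)
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}         # (SEMESTER, LECTURER)

# Join P1 and P2 on SUBJECT.
p1_p2 = {(sem, subj, lec) for sem, subj in p1 for subj2, lec in p2 if subj == subj2}

# Join the result with P3 on (SEMESTER, LECTURER).
p1_p2_p3 = {(sem, subj, lec) for sem, subj, lec in p1_p2 if (sem, lec) in p3}

print(len(p1_p2), len(p1_p2_p3))  # 7 5
print(sorted(p1_p2 - p1_p2_p3))
# [('Semester 1', 'Math', 'Akash'), ('Semester 2', 'Math', 'John')]
```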
Functional Dependency in DBMS
A Functional Dependency in DBMS is a fundamental concept that describes the relationship between
attributes (columns) in a table. It shows how the values in one or more attributes determine the value
in another. In layperson's terms, it describes how data in one column or set of columns can relate to
data in another column. It helps to maintain the quality of the data in DBMS.
A Functional Dependency is written with an arrow (->) between sets of attributes (A, B, C, etc.):
the attributes on the left determine the attributes on the right. For example, if we have a table of
employee data with columns "EmployeeID", "FirstName" and "LastName", we can express a functional
dependency like this:
EmployeeID -> FirstName, LastName
Another important term you should know is Partial dependency in DBMS: a dependency in which a
non-key attribute (column) depends on only a part of a composite primary key rather than on the
whole key.
How to Denote a Functional Dependency in DBMS?
In DBMS, you denote functional dependencies using a notation. It contains two main components: the
left-hand side (LHS) and the right-hand side (RHS) of an arrow (->).
For example, if we have a table with attributes "A", "B" and "C", and attribute "A" determines the
values of attributes "B" and "C", you would denote it as
A -> B, C
This notation indicates that the value in attribute "A" determines the values in attributes "B" and
"C". In other words, if you know the value of "A", you can determine the values of "B" and "C".
To illustrate this concept, let's consider an example with a table of student data.
We want to express a functional dependency based on the student's birthdate (StudentDOB) and class
(Class). A non-trivial functional dependency in this case would be
StudentDOB, Class -> StudentName
This functional dependency means that given a combination of a student's date of birth and class, you
can uniquely determine their name. It is non-trivial because StudentName is not one of the attributes
on the left-hand side; a dependency such as StudentDOB, Class -> Class would be trivial.
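Both ideas can be checked mechanically. In this minimal Python sketch, fd_holds follows the definition directly (no two rows agree on the left-hand columns but differ on the right-hand ones), and is_trivial tests whether the right side is contained in the left. The rows and both function names are hypothetical. Note that a single table instance can only refute a functional dependency; whether it holds for all data is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_trivial(lhs, rhs):
    """An FD is trivial when every right-hand attribute already appears on the left."""
    return set(rhs) <= set(lhs)

# Hypothetical sample rows (the notes' student table is not reproduced here).
students = [
    {"StudentDOB": "2004-03-01", "Class": "10A", "StudentName": "Asha"},
    {"StudentDOB": "2004-03-01", "Class": "10B", "StudentName": "Ravi"},
    {"StudentDOB": "2005-07-15", "Class": "10A", "StudentName": "Meena"},
]

print(fd_holds(students, ["StudentDOB", "Class"], ["StudentName"]))  # True
print(fd_holds(students, ["StudentDOB"], ["StudentName"]))           # False
print(is_trivial(["StudentDOB", "Class"], ["StudentName"]))          # False
print(is_trivial(["StudentDOB", "Class"], ["Class"]))                # True
```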
Multivalued Functional Dependency
A multivalued dependency A ->> B holds when each value of A is associated with a set of values of B
that is independent of the other attributes in the table. The table then has to store every combination
of those independent values, which indicates complex relationships within the data.
For example, in the STUDENT table of the 4NF section, STU_ID ->> COURSE and STU_ID ->> HOBBY,
because a student's courses and hobbies are independent of each other.
Transitive Functional Dependency
If A -> B and B -> C, then A determines C indirectly through B. For example, we can assume that
Student_Address depends on Student_City, and Student_City depends on Student_ID. This creates a
transitive dependency in DBMS, where Student_ID indirectly determines Student_Address.
Closure of a Set of Attributes
The closure of a set of attributes X is the set of those attributes that can be functionally determined
from X. The closure of X is denoted as X+.
When given a closure problem, you’ll have a set of functional dependencies over which to compute
the closure and the set X for which to find the closure. A functional dependency A1, A2, …, An -> B
in a table holds if two records that have the same value of attributes A1, A2, …, An also have the same
value for attribute B.
In other words, X+ is the set of all attributes A such that any two records that have the same value
of X also have the same value of A.
Real-World Examples
Database courses usually teach closures using abstract examples, and we’ll look at some abstract
examples later. However, let’s first look at a real-world example, which is easier to understand.
Imagine we have a table Course Editions. The table contains information about editions of courses.
date_of_birth is also in {year, teacher}+, and I know three columns: {year, teacher, date_of_birth}. If
I know the year and date of birth, I can also determine the age. Now, {year, teacher}+ has four columns
{year, teacher, date_of_birth, age}. I have used two of the three functional dependencies. I can’t use
the remaining dependency, course, year -> teacher because I don’t know the course. Now that I have
used all of the dependencies I can, {year, teacher}+ = {year, teacher, date_of_birth, age}.
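The step-by-step reasoning above is the standard closure algorithm: keep applying any dependency whose left side is already in the result until nothing changes. A minimal Python sketch follows. The notes state course, year -> teacher and year, date_of_birth -> age explicitly; the dependency that adds date_of_birth is assumed here to be teacher -> date_of_birth.

```python
def closure(attrs, fds):
    """Return X+: every attribute determined by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Use an FD only when its whole left side is already in the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"course", "year"}, {"teacher"}),
    ({"teacher"}, {"date_of_birth"}),       # assumed: not stated explicitly in the notes
    ({"year", "date_of_birth"}, {"age"}),
]

print(sorted(closure({"year", "teacher"}, fds)))
# ['age', 'date_of_birth', 'teacher', 'year']
```

As in the worked example, course, year -> teacher is never applied, because course is not in the result.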