
DBMS Unit-4 Notes

Normalization in DBMS is a process that organizes data in tables to reduce redundancy and improve data integrity by establishing proper relationships between smaller tables. It addresses issues like insertion, update, and deletion anomalies, ensuring data consistency and logical storage. The document outlines various normal forms (1NF to 5NF) and their significance in structuring database tables effectively.

Uploaded by tyagih682
Copyright © All Rights Reserved

Database Management Systems

Notes of Unit-4

Normalization
Normalization in DBMS is a technique for organizing the data in database tables so that:
• There is less repetition of data
• A large set of data is structured into a number of smaller tables
• The tables have proper relationships between them
DBMS Normalization is a systematic approach to decomposing (breaking down) tables to eliminate data
redundancy (repetition) and undesirable characteristics such as the Insertion, Update and Deletion
anomalies.
It is a multi-step process that puts data into tabular form, removes duplicate data and sets up the
relationships between tables.

Why do we need Normalization in DBMS?

Normalization is required for:

• Eliminating redundant (repeated) data, which protects data integrity, because repeated data
increases the chances of inconsistent data.
• Keeping data consistent by storing each piece of data in one table and referencing it everywhere
else.
• Optimizing storage, although this matters less today because database storage is cheap.
• Breaking down large tables into smaller related tables, which makes the database structure more
scalable and adaptable.
• Ensuring data dependencies make sense, i.e. data is stored logically.

Problems without Normalization in DBMS

If a table is not properly normalized and has data redundancy (repetition), it will not only take up
extra storage space but will also make it difficult to handle and update the data in the database
without losing data.
Insertion, Updation and Deletion Anomalies are very frequent if the database is not normalized.
To understand these anomalies, let us take an example of a Student table.

Roll no.  Name  Branch  Hod    Office_tel
401       Akon  CSE     Mr. X  53337
402       Bkon  CSE     Mr. X  53337
403       Ckon  CSE     Mr. X  53337
404       Dkon  CSE     Mr. X  53337

As we can see, the data in the fields branch, hod and office_tel is repeated for every student in the
same branch of the college; this is Data Redundancy.

Data modification anomalies can be categorized into three types:

1. Insertion Anomaly in DBMS

• Suppose for a new admission, until and unless a student opts for a branch, data of the student
cannot be inserted or else we will have to set the branch information as NULL.

• Also, if we have to insert data for 100 students of the same branch, then the branch information
will be repeated for all those 100 students.

• These scenarios are nothing but Insertion anomalies.

• If you have to repeat the same data in every row of data, it's better to keep the data
separately and reference that data in each row.

• So in the above table, we can keep the branch information separately, and just use
the branch_id in the student table, where branch_id can be used to get the branch information.

2. Updation Anomaly in DBMS

• What if Mr. X leaves the college, or is no longer the HOD of the computer science department?
In that case, all the student records will have to be updated, and if by mistake we miss any record,
it will lead to data inconsistency.

• This is an Updation anomaly, because you need to update many records in your table just
because one piece of information changed.

3. Deletion Anomaly in DBMS

• In our Student table, two different pieces of information are kept together: the Student
information and the Branch information.

• So if only a single student is enrolled in a branch, and that student leaves the college or the
student's entry is deleted for some reason, we will lose the branch information too.

• So in DBMS we should never keep two different entities together in one table; in the above
example, these are Student and Branch.
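
To make the fix described above concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names branch and student, the branch_id column, and the new HOD "Mr. Y" are illustrative assumptions, not part of the notes. It shows that once branch data lives in its own table, changing the HOD touches one row and deleting every student does not lose the branch.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Branch information is stored once, in its own table.
cur.execute("""
    CREATE TABLE branch (
        branch_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        hod        TEXT,
        office_tel TEXT
    )""")

# Each student row only references its branch through branch_id.
cur.execute("""
    CREATE TABLE student (
        roll_no   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        branch_id INTEGER REFERENCES branch(branch_id)
    )""")

cur.execute("INSERT INTO branch VALUES (1, 'CSE', 'Mr. X', '53337')")
cur.executemany(
    "INSERT INTO student VALUES (?, ?, 1)",
    [(401, "Akon"), (402, "Bkon"), (403, "Ckon"), (404, "Dkon")],
)

# Updation anomaly avoided: a new HOD (hypothetical "Mr. Y") is a one-row update.
cur.execute("UPDATE branch SET hod = 'Mr. Y' WHERE branch_id = 1")

# A join rebuilds the original wide Student view whenever it is needed.
joined = cur.execute("""
    SELECT s.roll_no, s.name, b.name, b.hod, b.office_tel
    FROM student AS s JOIN branch AS b ON s.branch_id = b.branch_id
    ORDER BY s.roll_no""").fetchall()

# Deletion anomaly avoided: removing every student keeps the branch row.
cur.execute("DELETE FROM student")
remaining = cur.execute("SELECT name, hod FROM branch").fetchall()

print(joined[0])   # (401, 'Akon', 'CSE', 'Mr. Y', '53337')
print(remaining)   # [('CSE', 'Mr. Y')]
```

The join is the price paid for removing redundancy: the wide table is no longer stored, but it can be reconstructed on demand.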

Normal Form  Description

1NF   A relation is in 1NF if every attribute contains only atomic (single) values.

2NF   A relation is in 2NF if it is in 1NF and all non-key attributes are
      fully functionally dependent on the primary key.

3NF   A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF  Boyce-Codd Normal Form, a stronger version of 3NF: the left-hand side of
      every non-trivial functional dependency must be a super key.

4NF   A relation is in 4NF if it is in Boyce-Codd Normal Form and has
      no non-trivial multi-valued dependency.

5NF   A relation is in 5NF if it is in 4NF and has no join dependency other than
      those implied by its candidate keys; any decomposition must be lossless
      on joining.

1. First Normal Form (1NF)

For a table to be in the First Normal Form, it should follow the following 4 rules:

1. It should only have single (atomic) valued attributes/columns.

2. Values stored in a column should be of the same domain.

3. All the columns in a table should have unique names.

4. And the order in which data is stored should not matter.

Let's see an example.

If we have an Employee table in which we store the employee information along with the employee
skillset, the table will look like this:
Emp_id  Emp_name      Emp_mobile  Emp_skills
1       John Tick     9999957773  Python, JavaScript
2       Darth Trader  8888853337  HTML, CSS, JavaScript
3       Rony Shark    7777720008  Java, Linux, C++

The above table has 4 columns:


• All the columns have different names.
• All the columns hold values of the same type like emp_name has all the
names, emp_mobile has all the contact numbers, etc.
• The order in which we save data doesn't matter.
• But the emp_skills column holds multiple comma-separated values, while as per the First
Normal form, each column should have a single value.
Hence the above table fails to pass the First Normal form.

So how do you fix the above table? There are two ways to do this:
1. Remove the emp_skills column from the Employee table and keep it in some other table.
2. Or add multiple rows for the employee and each row is linked with one skill.

1. Create Separate tables for Employee and Employee Skills


So the Employee table will look like this:

Emp_id  Emp_name      Emp_mobile
1       John Tick     9999957773
2       Darth Trader  8888853337
3       Rony Shark    7777720008

And the new Employee_Skill table:


Emp_id  Emp_skill
1       Python
1       JavaScript
2       HTML
2       CSS
2       JavaScript
3       Java
3       Linux
3       C++

2. Add Multiple rows for Multiple skills


You can also simply add multiple rows to add multiple skills. This will lead to repetition of the data,
but that can be handled as you further Normalize your data using the Second Normal form and the
Third Normal form.

Emp_id  Emp_name      Emp_mobile  Emp_skill
1       John Tick     9999957773  Python
1       John Tick     9999957773  JavaScript
2       Darth Trader  8888853337  HTML
2       Darth Trader  8888853337  CSS
2       Darth Trader  8888853337  JavaScript
3       Rony Shark    7777720008  Java
3       Rony Shark    7777720008  Linux
3       Rony Shark    7777720008  C++
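
Both fixes can be sketched in a few lines of Python, treating each table as a list of tuples (an assumption made for illustration; in a real database this would be done in SQL). split_skills is a hypothetical helper that turns the comma-separated Emp_skills value into atomic values.

```python
# Rows of the original Employee table; Emp_skills holds a comma-separated
# string, which is what breaks 1NF.
employees = [
    (1, "John Tick", "9999957773", "Python, JavaScript"),
    (2, "Darth Trader", "8888853337", "HTML, CSS, JavaScript"),
    (3, "Rony Shark", "7777720008", "Java, Linux, C++"),
]

def split_skills(skills):
    """Turn 'A, B, C' into ['A', 'B', 'C']: one atomic value per skill."""
    return [s.strip() for s in skills.split(",") if s.strip()]

# Option 1: separate Employee and Employee_Skill tables.
employee_table = [(emp_id, name, mobile) for emp_id, name, mobile, _ in employees]
employee_skill_table = [
    (emp_id, skill)
    for emp_id, _, _, skills in employees
    for skill in split_skills(skills)
]

# Option 2: one row per (employee, skill), repeating the employee data.
flat_table = [
    (emp_id, name, mobile, skill)
    for emp_id, name, mobile, skills in employees
    for skill in split_skills(skills)
]

for row in employee_skill_table:
    print(row)
```

Option 2 keeps everything in one table at the cost of repeating Emp_name and Emp_mobile in every skill row, which is exactly the redundancy the later normal forms deal with.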


2. Second Normal Form (2NF)

For a table to be in the Second Normal Form,

1. It should be in the First Normal form.

2. And, it should not have Partial Dependency.

Let's take an example to understand Partial dependency and the Second Normal Form.

What is Partial Dependency?

When a table has a primary key made up of two or more columns, all the columns not included in the
primary key should depend on the entire primary key and not on just a part of it. If any column that
is not in the primary key depends on only a part of the primary key, we say the table has a Partial
dependency.

Confused? Let's take an example.

Suppose we have two tables, Students and Subjects, to store student information and information
related to subjects.

Student table:

Student_id  Student_name  Branch
1           Akon          CSE
2           Bkon          Mechanical

Subject Table:

Subject_id  Subject_name
1           C Language
2           DSA
3           Operating System

And we have another table Score to store the marks scored by students in any subject like this:
Student_id  Subject_id  Marks  Teacher_name
1           1           70     Miss. C
1           2           82     Mr. D
2           1           65     Miss. C

Now in the above table, the primary key is student_id + subject_id, because both values are required
to identify any row of data.
But in the Score table, we have a column teacher_name, which depends only on the subject
information, i.e. just the subject_id, so we should not keep that information in the Score table.
The column teacher_name should be in the Subjects table. Then the entire system will be
normalized as per the Second Normal Form.
Updated Subject table:

Subject_id  Subject_name      Teacher_name
1           C Language        Miss. C
2           DSA               Mr. D
3           Operating System  Mr. Op

Updated Score table:

Student_id  Subject_id  Marks
1           1           70
1           2           82
2           1           65
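
The Second Normal Form fix can be sketched in plain Python. The sketch assumes each subject has exactly one teacher, as in the updated Subject table, so every Score row for subject 1 carries Miss. C; depends_only_on is a hypothetical helper that checks whether one column's value is fixed by another column in the sample rows. Sample data can only fail to contradict a dependency; it cannot prove the dependency holds in general.

```python
# Score rows keyed by (student_id, subject_id); teacher_name is assumed to
# depend on subject_id alone (one teacher per subject).
score = [
    # (student_id, subject_id, marks, teacher_name)
    (1, 1, 70, "Miss. C"),
    (1, 2, 82, "Mr. D"),
    (2, 1, 65, "Miss. C"),
]

def depends_only_on(rows, key_idx, attr_idx):
    """True if, in these rows, the value at attr_idx is fixed by the value at key_idx."""
    seen = {}
    for row in rows:
        k, v = row[key_idx], row[attr_idx]
        if seen.setdefault(k, v) != v:
            return False
    return True

# teacher_name is determined by subject_id, which is only part of the key:
# a partial dependency.
partial = depends_only_on(score, key_idx=1, attr_idx=3)

# 2NF fix: keep teacher_name next to subject_id, and drop it from Score.
subject_teacher = sorted({(subj, teacher) for _, subj, _, teacher in score})
score_2nf = [(stu, subj, marks) for stu, subj, marks, _ in score]

print(partial)          # True
print(subject_teacher)  # [(1, 'Miss. C'), (2, 'Mr. D')]
```

Subject 3 does not appear in subject_teacher only because no student has a score for it yet; in the real Subject table it keeps its own row.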

3. Third Normal Form (3NF)


A table is said to be in the Third Normal Form when:
1. It satisfies the First Normal Form and the Second Normal Form.
2. And, it doesn't have Transitive Dependency.
What is Transitive Dependency?
In a table, some column acts as the primary key and the other columns depend on this column.
But what if a column that is not the primary key depends on another column that is also not the
primary key or part of it? Then we have Transitive dependency in our table.
Let's take an example. We had the Score table in the Second Normal Form above. Suppose we have to
store some extra information in it, like:
1. exam_type
2. total_marks
These store the type of exam and the total marks for the exam, so that we can later calculate the
percentage of marks scored by each student.
The Score table will look like this:

Student_id  Subject_id  Marks  Exam_type  Total_marks
1           1           70     Theory     100
1           2           82     Theory     100
2           1           42     Practical  50

• In the table above, the column exam_type depends on both student_id and subject_id,
because:
o a student can be in the CSE branch or the Mechanical branch,
o and based on that, they may have different exam types for different subjects.
o The CSE students may have both Practical and Theory exams for Compiler Design,
o whereas Mechanical branch students may only have Theory exams for Compiler
Design.
• But the column total_marks depends only on the exam_type column, and exam_type is not part of
the primary key (student_id + subject_id). Since total_marks depends on the key only through
exam_type, we have a Transitive dependency here.
How to solve Transitive Dependency?
You can create a separate table for ExamType and use it in the Score table.
We have created a new table ExamType and added more related information to it, such as duration
(duration of the exam in minutes), and now we can use the exam_type_id in the Score table.
New ExamType table:
Exam_type_id  Exam_type  Total_marks  Duration
1             Practical  50           45
2             Theory     100          180
3             Workshop   150          300
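
Splitting out ExamType turns the percentage calculation mentioned above into a simple lookup. A minimal Python sketch, assuming the Score table now stores exam_type_id (Theory as 2, Practical as 1, matching the ExamType table) and that tables are held as plain Python lists and dicts; percentages is a hypothetical helper name.

```python
# ExamType table: exam_type_id -> (exam_type, total_marks, duration_mins)
exam_type = {
    1: ("Practical", 50, 45),
    2: ("Theory", 100, 180),
    3: ("Workshop", 150, 300),
}

# Score rows now carry exam_type_id instead of exam_type and total_marks
# (ids mapped from the earlier Score table: Theory -> 2, Practical -> 1).
score = [
    # (student_id, subject_id, marks, exam_type_id)
    (1, 1, 70, 2),
    (1, 2, 82, 2),
    (2, 1, 42, 1),
]

def percentages(score_rows, exam_types):
    """Join Score with ExamType on exam_type_id and compute marks / total_marks."""
    result = []
    for student_id, subject_id, marks, type_id in score_rows:
        _, total_marks, _ = exam_types[type_id]
        result.append((student_id, subject_id, 100 * marks / total_marks))
    return result

print(percentages(score, exam_type))
# [(1, 1, 70.0), (1, 2, 82.0), (2, 1, 84.0)]
```

Because total_marks is stored once per exam type, changing the maximum marks for Practical exams is a single edit in ExamType rather than one per Score row.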

Boyce-Codd normal form (BCNF)

o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the
table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:

EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT table:

EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

EMP_DEPT_MAPPING table:

EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now the decomposition is in BCNF, because the left-hand side of each functional dependency is a key
of the table it belongs to.
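
The BCNF condition can be checked mechanically: compute the closure of each FD's left-hand side and see whether it covers every attribute of the table. A minimal Python sketch, assuming FDs are given as (lhs, rhs) pairs of attribute sets; closure and bcnf_violations are hypothetical helper names, and this simple check looks only at the listed FDs, not at every FD they imply on a decomposed table.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Listed FDs that apply to schema but whose LHS is not a super key of it."""
    bad = []
    for lhs, rhs in fds:
        if not lhs <= schema:
            continue  # this FD does not apply to the table
        rhs_here = (rhs & schema) - lhs
        if rhs_here and not schema <= closure(lhs, fds):
            bad.append((set(lhs), rhs_here))
    return bad

F = frozenset
fds = [
    (F({"EMP_ID"}), F({"EMP_COUNTRY"})),
    (F({"EMP_DEPT"}), F({"DEPT_TYPE", "EMP_DEPT_NO"})),
]
employee = F({"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"})
decomposed = {
    "EMP_COUNTRY": F({"EMP_ID", "EMP_COUNTRY"}),
    "EMP_DEPT": F({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}),
    "EMP_DEPT_MAPPING": F({"EMP_ID", "EMP_DEPT"}),
}

print(len(bcnf_violations(employee, fds)))  # 2: both FDs violate BCNF
for name, schema in decomposed.items():
    print(name, bcnf_violations(schema, fds))  # no violations: []
```

The closure of {EMP_ID, EMP_DEPT} covers all five columns of the EMPLOYEE table, which confirms it is the candidate key stated above.
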
Fourth normal form (4NF)

o A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
o A dependency A →→ B is a multi-valued dependency if a single value of A is associated with
multiple values of B, independently of the other attributes in the relation.

STUDENT

STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is no
relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and
two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads
to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics

STUDENT_HOBBY

STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
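
A minimal Python sketch of the 4NF decomposition, treating tables as lists of tuples (an illustrative assumption). It also checks the multi-valued dependency STU_ID →→ COURSE: if COURSE and HOBBY really are independent, each student's rows must contain every course/hobby combination. The STUDENT rows list only two of the four combinations for student 21, so the sketch rejoins the two projections to show the four rows a table satisfying the dependency would store. rejoin and mvd_holds are hypothetical helper names.

```python
from itertools import product

student = [
    # (STU_ID, COURSE, HOBBY) from the STUDENT table above
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# 4NF decomposition: project onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = sorted({(s, c) for s, c, _ in student})
student_hobby = sorted({(s, h) for s, _, h in student})

def rejoin(courses, hobbies):
    """Natural join of the two projections on STU_ID."""
    return sorted(
        (s1, c, h) for (s1, c), (s2, h) in product(courses, hobbies) if s1 == s2
    )

def mvd_holds(rows):
    """STU_ID ->-> COURSE: per student, the rows are exactly courses x hobbies."""
    by_student = {}
    for s, c, h in rows:
        by_student.setdefault(s, set()).add((c, h))
    for pairs in by_student.values():
        courses = {c for c, _ in pairs}
        hobbies = {h for _, h in pairs}
        if pairs != set(product(courses, hobbies)):
            return False
    return True

joined = rejoin(student_course, student_hobby)
print([row for row in joined if row[0] == 21])
print(mvd_holds(student), mvd_holds(joined))  # False True
```

With the decomposition, adding a third hobby for student 21 is one new STUDENT_HOBBY row, instead of one new row per course in the combined table.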

Fifth normal form (5NF)

o A relation is in 5NF if it is in 4NF and does not contain any join dependency other than those
implied by its candidate keys; the decomposition must be lossless on joining.
o 5NF is satisfied when the tables are broken into as many tables as possible (without losing
information) in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take
the Math class for Semester 2. In this case, the combination of all three fields is required to identify
a valid row.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching
it, so we would have to leave Lecturer and Subject as NULL. But all three columns together act as
the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1

SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math

P2

SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen

P3

SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen

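
The lossless-join property of the P1, P2, P3 decomposition can be checked on the sample rows with a short Python sketch using sets of tuples (so duplicate rows collapse automatically). It shows that joining only P1 and P2 produces spurious rows, and that the further join with P3 removes them. This only checks these five rows; 5NF is a property of the schema and its dependencies, not of one set of data.

```python
original = {
    # (SUBJECT, LECTURER, SEMESTER)
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# Projections P1, P2, P3.
p1 = {(sem, subj) for subj, _, sem in original}    # (SEMESTER, SUBJECT)
p2 = {(subj, lect) for subj, lect, _ in original}  # (SUBJECT, LECTURER)
p3 = {(sem, lect) for _, lect, sem in original}    # (SEMESTER, LECTURER)

# Join P1 and P2 on SUBJECT.
p1_p2 = {
    (subj, lect, sem)
    for sem, subj in p1
    for subj2, lect in p2
    if subj == subj2
}

# Then join with P3 on (SEMESTER, LECTURER).
p1_p2_p3 = {(subj, lect, sem) for subj, lect, sem in p1_p2 if (sem, lect) in p3}

print(len(p1_p2))             # 7: two spurious rows
print(p1_p2_p3 == original)   # True: the three-way join is lossless here
```

The two spurious rows from the P1 and P2 join are (Math, Akash, Semester 1) and (Math, John, Semester 2); neither lecturer/semester pair exists in P3, so the third join filters them out.
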
Functional Dependency in DBMS

A Functional Dependency in DBMS is a fundamental concept that describes the relationship between
attributes (columns) in a table. It shows how the values in one or more attributes determine the value
in another. In layperson's terms, it describes how data in one column or set of columns can relate to
data in another column. It helps to maintain the quality of the data in DBMS.
A Functional Dependency is written using arrow notation: a set of attributes (A, B, C, etc.) and an
arrow (->) denoting the dependency. For example, if we have a table of employee data with columns
"EmployeeID," "FirstName," and "LastName," we can express a functional dependency like this:
EmployeeID -> FirstName, LastName.
Another important term you should know is Partial dependency in DBMS, which occurs when a
non-key attribute depends on only part of a composite primary key (see the Second Normal Form
section above).
How to Denote a Functional Dependency in DBMS?
In DBMS, you denote functional dependencies using arrow notation. It contains two main components:
the left-hand side (LHS) and the right-hand side (RHS) of an arrow (->).
For example, if we have a table with attributes "A," "B," and "C," and attribute "A" determines the
values of attributes "B" and "C," you would denote it as
A -> B, C
This notation indicates that the value in attribute "A" determines the values in attributes "B" and
"C." In other words, if you know the value of "A," you can determine the values of "B" and "C."

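The meaning of A -> B, C can be tested against sample rows: any two rows that agree on the left-hand side must agree on the right-hand side. A minimal Python sketch; fd_holds is a hypothetical helper and the employee rows are made-up data, not from the notes. Sample data can only disprove a functional dependency; one that holds on a few rows may still fail on others.

```python
def fd_holds(rows, lhs, rhs):
    """Check lhs -> rhs on dict rows: rows that agree on lhs must agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical employee rows, used only to exercise the check.
employees = [
    {"EmployeeID": 1, "FirstName": "Asha", "LastName": "Rao"},
    {"EmployeeID": 2, "FirstName": "Ravi", "LastName": "Rao"},
    {"EmployeeID": 3, "FirstName": "Asha", "LastName": "Menon"},
]

print(fd_holds(employees, ["EmployeeID"], ["FirstName", "LastName"]))  # True
print(fd_holds(employees, ["FirstName"], ["LastName"]))                 # False
```

FirstName -> LastName fails because the two employees named Asha have different last names.
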
Types of Functional Dependencies in DBMS


Here are some of the important types of Functional Dependency In DBMS.

Trivial Functional Dependency


A trivial functional dependency in DBMS occurs when the attributes on the right-hand side (RHS) of
the functional dependency arrow (->) are a subset of the attributes on the left-hand side (LHS). Such
a dependency always holds, because knowing the LHS already gives you the RHS.
Suppose we have a table of students with attributes "StudentID" and "StudentName." If we state the
functional dependency as
StudentID, StudentName -> StudentName,
this is a trivial functional dependency, because StudentName already appears on the left-hand side,
so the dependency tells us nothing new. Similarly, StudentID -> StudentID is trivial.
Non-trivial Functional Dependency
A non-trivial functional dependency is one in which the right-hand side contains at least one
attribute that is not on the left-hand side. It conveys meaningful information about how the values in
one set of attributes determine the values in another. For example, StudentID -> StudentName is
non-trivial: each "StudentID" corresponds to only one "StudentName," and StudentName is not part
of the left-hand side.

To illustrate this concept, let's consider an example with a table of student data:

StudentID  StudentName  StudentDOB  Class
101        Alice        1995-05-15  10A
102        Bob          2000-03-20  10B
103        Carol        1999-07-10  10A
104        Dave         2000-01-05  10B

We want to express a functional dependency based on the student's birth date (StudentDOB) and class
(Class). A non-trivial functional dependency in this case would be
StudentDOB, Class -> StudentName.
This functional dependency means that, given a combination of a student's date of birth and class, you
can uniquely determine their name (assuming no two students in the same class share a birth date, as
in the rows above). It is non-trivial because StudentName is not one of the attributes on the left-hand
side.
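
The trivial/non-trivial distinction is a simple subset test: X -> Y is trivial exactly when every attribute of Y is already in X. A minimal Python sketch classifying a few dependencies over the student attributes used in this section (is_trivial is a hypothetical helper name):

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)

examples = [
    (["StudentID", "StudentName"], ["StudentName"]),
    (["StudentID"], ["StudentID"]),
    (["StudentID"], ["StudentName"]),
    (["StudentDOB", "Class"], ["StudentName"]),
]

for lhs, rhs in examples:
    kind = "trivial" if is_trivial(lhs, rhs) else "non-trivial"
    print(", ".join(lhs), "->", ", ".join(rhs), ":", kind)
```

Whether a non-trivial dependency actually holds is a separate question that depends on the data's meaning; the subset test only says whether it could carry new information.
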
Multivalued Functional Dependency
A multivalued dependency in a database, written X ->-> Y, occurs when each value of X is associated
with a set of values of Y, and that set does not depend on the values of the other attributes in the
table. It typically appears when a table stores independent multi-valued facts about the same entity,
which forces many combinations of values to be repeated.
Here's a simple example:

Student_ID  Student_Name  Courses_Enrolled
1           Alice         {Math, English}
2           Bob           {Science, History}
3           Carol         {Math, Science}

In this case, the multivalued dependency Student_ID ->-> Courses_Enrolled holds (in a table in 1NF,
each course would be stored in its own row):

• Alice is Student 1, enrolled in {Math, English}.
• Bob is Student 2, enrolled in {Science, History}.
• Carol is Student 3, enrolled in {Math, Science}.
Transitive Functional Dependency
A transitive functional dependency in DBMS is a relationship between attributes (columns) of a
relational table in which one attribute's value determines another's value through an intermediary
(third) attribute: if A -> B and B -> C, then A -> C holds transitively.
Example:
Consider a database table called "Student_Info" with the following attributes:
Student_ID (unique identifier for each student),
Student_Name,
Student_Address,
Student_City.

In this example, we can assume that Student_Address depends on Student_ID, and Student_City
depends on Student_Address (each address lies in exactly one city). This creates a transitive
dependency in DBMS, where Student_ID determines Student_City indirectly, through Student_Address.
Closure of a Set of Attributes
The closure of a set of attributes X is the set of all attributes that can be functionally determined
from X. The closure of X is denoted as X+.
When given a closure problem, you'll have a set of functional dependencies over which to compute
the closure, and the set X for which to find the closure. A functional dependency A1, A2, …, An -> B
holds in a table if any two records that have the same values of attributes A1, A2, …, An also have
the same value of attribute B.
In other words, X+ is the set of all attributes on which any two records that agree on X must also
agree.
Real-World Examples
Database courses usually teach closures using abstract examples, and we'll look at some abstract
examples later. However, let's first look at a real-world example, which is easier to understand.

Imagine we have a table Course Editions. The table contains information about editions of courses
taught at a certain university.

Each year, each course can be taught by a different teacher, and each teacher has a date of birth. With
the year and the date of birth, you can determine the age of the teacher at the time the course was
taught.
(Note: This database is poorly designed. Don’t base your databases on this. We are only using this
example to illustrate the concept of attribute closures.)
Course Editions

Course       Year  Teacher         Date_of_birth  Age
Databases    2019  Chris Cape      1974-10-12     45
Mathematics  2019  Daniel Parr     1985-05-17     34
Databases    2020  Jennifer Clock  1990-06-09     30

Here are the functional dependencies in this table:


course, year -> teacher
Given the course and year, you can determine the teacher who taught the course that year.
teacher -> date_of_birth
Given a teacher, you can determine the teacher’s date of birth.
year, date_of_birth -> age
Given the year and date of birth, you can determine the age of the teacher at the time the course was
taught.
Now, let’s look at some of the attribute closures.
First, consider the closure of a set {year}, denoted {year}+. If I am given the year, what columns can I
determine? I can for sure determine the year. So, the column year is an element of {year}+.
Are there any other columns that I can determine? The first functional dependency course, year ->
teacher requires the course in addition to the year, so it doesn't add anything to {year}+. The functional
dependency year, date_of_birth -> age requires the date of birth in addition to the year, so it doesn't
add anything to {year}+ either. So, {year}+ contains only one column, year, that is {year}+ = {year}.
Next, let's look at {year, teacher}+. Given the year and teacher, what other columns can we determine?
Well, as before, the year and teacher are given, so they are in {year, teacher}+. If I know the teacher, I
also know the date of birth because of the teacher -> date_of_birth functional dependency. So,
date_of_birth is also in {year, teacher}+, and I know three columns: {year, teacher, date_of_birth}. If
I know the year and date of birth, I can also determine the age. Now, {year, teacher}+ has four columns:
{year, teacher, date_of_birth, age}. I have used two of the three functional dependencies. I can't use
the remaining dependency, course, year -> teacher, because I don't know the course. Now that I have
used all of the dependencies I can, {year, teacher}+ = {year, teacher, date_of_birth, age}.
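
The procedure followed above (start from X and keep applying any dependency whose left-hand side is already covered, until nothing changes) is the standard attribute-closure algorithm. A minimal Python sketch, with closure as a hypothetical helper name and the Course Editions dependencies written as (lhs, rhs) pairs of sets:

```python
def closure(attrs, fds):
    """Compute X+: repeatedly apply any FD whose LHS is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Functional dependencies of the Course Editions table.
fds = [
    ({"course", "year"}, {"teacher"}),
    ({"teacher"}, {"date_of_birth"}),
    ({"year", "date_of_birth"}, {"age"}),
]

print(sorted(closure({"year"}, fds)))             # ['year']
print(sorted(closure({"year", "teacher"}, fds)))  # ['age', 'date_of_birth', 'teacher', 'year']
```

Running the same function on {course, year} yields all five columns, so {course, year} is a key of the Course Editions table.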
