Unit-2-Normalization
Normalization is the process of organizing the data in a database with two goals:
To minimize redundancy in a relation or set of relations.
To ensure that data dependencies are sensible, so that each table stores only related data.
It is also used to eliminate undesirable characteristics such as insertion, update, and deletion
anomalies.
Normalization divides a larger table into smaller tables and links them using relationships.
Benefits of Normalization:
Quicker updates
Less storage space
Clear relationship
Flexible structure
Less inconsistency
Less redundancy
Types of Normalization:
1NF A relation is in 1NF if every attribute contains only atomic (single) values.
2NF A relation is in 2NF if it is in 1NF and all non-key attributes are fully
functionally dependent on the primary key.
3NF A relation is in 3NF if it is in 2NF and contains no transitive dependency.
BCNF A relation is in BCNF if it is in 3NF and every determinant is a
super key.
4NF A relation is in 4NF if it is in Boyce-Codd normal form and has no
multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, contains no join dependency, and
joining is lossless.
EMPLOYEE table:
14 John 7272826385, 9064738238 UP
The row for employee 14 holds two phone numbers in a single field, so the table is not in 1NF.
The decomposition of the EMPLOYEE table into 1NF is shown below:
14 John 7272826385 UP
14 John 9064738238 UP
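The 1NF step above can be sketched in a few lines of Python. This is only an illustration: the column names (emp_id, name, phones, state) are assumptions, since the table shows the values but not its headers.

```python
# Hypothetical column names; the source table shows only the row values.
employee = [
    {"emp_id": 14, "name": "John", "phones": "7272826385, 9064738238", "state": "UP"},
]

def to_1nf(rows):
    """Emit one row per phone number so that every field holds a single value."""
    result = []
    for row in rows:
        for phone in row["phones"].split(","):
            result.append({
                "emp_id": row["emp_id"],
                "name": row["name"],
                "phone": phone.strip(),
                "state": row["state"],
            })
    return result

employee_1nf = to_1nf(employee)
for row in employee_1nf:
    print(row["emp_id"], row["name"], row["phone"], row["state"])
```

Each output row repeats the employee's other values, which is the redundancy that the later normal forms go on to remove.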
Example: Let's assume a school stores data about teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
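In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a
proper subset of the candidate key {TEACHER_ID, SUBJECT}. This partial dependency violates 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table: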
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
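The decomposition can be checked with a short Python sketch. It projects the TEACHER rows onto the two new tables and rejoins them on TEACHER_ID to confirm that nothing is lost. The variable names are illustrative only.

```python
# TEACHER rows: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Project onto (TEACHER_ID, TEACHER_AGE); a set removes the repeated pairs.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})
# Project onto (TEACHER_ID, SUBJECT).
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Rejoin on TEACHER_ID to confirm that no information was lost.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
print(rejoined == teacher)
```

Each teacher's age is now stored once, so a change of age touches a single row.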
A relation is in 3NF if it is in 2NF and does not contain any transitive dependency. If there is no
transitive dependency for non-prime attributes, then the relation is in third normal form.
3NF states that any column that does not depend directly on the primary key should be moved out
of the table.
3NF is used to reduce data duplication. It is also used to achieve data integrity.
In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID]. This is a transitive
functional dependency, so the structure does not satisfy third normal form.
To bring this table to third normal form, we split the table into two as follows:
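As a minimal sketch of this split, the dependency check and the two resulting tables could be written in Python as shown here. The book rows are hypothetical (the IDs and genre names are invented for illustration), and the helper `holds` is an illustrative name.

```python
# Hypothetical rows: (book_id, genre_id, genre_type).
books = [
    (1, 10, "Gardening"),
    (2, 20, "Sports"),
    (3, 10, "Gardening"),
    (4, 30, "Travel"),
]

def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    lhs and rhs are column positions.
    """
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# book_id -> genre_id and genre_id -> genre_type form the transitive chain.
print(holds(books, 0, 1), holds(books, 1, 2))

# Split: genre_type moves to its own table keyed by genre_id.
book_table = [(book_id, genre_id) for book_id, genre_id, _ in books]
genre_table = sorted({(genre_id, genre_type) for _, genre_id, genre_type in books})
print(book_table)
print(genre_table)
```

After the split, each genre name is stored once in the genre table instead of once per book.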
Example: Let's assume there is a company where employees work in more than one
department.
Below we have a college enrolment table with columns student_id, subject and professor.
student_id subject professor
101 Java P.Java
101 C++ P.Cpp
102 Java P.Java2
103 C# P.Chash
104 Java P.Java
One student can enrol in multiple subjects. For example, the student with student_id 101
has opted for the subjects Java and C++.
For each subject, a professor is assigned to the student.
There can be multiple professors teaching one subject, as is the case for Java.
In the table above, student_id and subject together form the primary key, because
using student_id and subject we can find all the columns of the table.
One more important point to note here is that one professor teaches only one subject, but one
subject may have two different professors. Hence, there is a dependency
between subject and professor, where subject depends on the professor name.
Since student_id and subject form the primary key, the subject column is a prime
attribute.
But there is one more dependency, professor → subject.
Here the determinant, professor, is a non-prime attribute and not a super key, while subject is a
prime attribute. BCNF does not allow this, because every determinant must be a super key.
To make this relation (table) satisfy BCNF, we decompose it into two
tables: a student table and a professor table.
Student Table
student_id p_id
101 1
101 2
102 3
103 4
104 1
Professor Table
p_id professor subject
1 P.Java Java
2 P.Cpp C++
3 P.Java2 Java
4 P.Chash C#
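The BCNF argument can be tried out on the data with a small Python sketch. It rebuilds the enrolment rows by joining the two tables on p_id, then tests which column sets act as determinants and super keys. The helper names `determines` and `is_superkey` are illustrative. A check on sample rows can only refute a dependency, not prove that it holds for all future data.

```python
# Rows taken from the Student and Professor tables above.
student = [(101, 1), (101, 2), (102, 3), (103, 4), (104, 1)]
professor = {
    1: ("P.Java", "Java"),
    2: ("P.Cpp", "C++"),
    3: ("P.Java2", "Java"),
    4: ("P.Chash", "C#"),
}

# Joining on p_id recovers (student_id, subject, professor).
enrolment = [(sid, professor[pid][1], professor[pid][0]) for sid, pid in student]

def determines(rows, lhs, rhs):
    """True if the columns in lhs functionally determine the columns in rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_superkey(rows, cols):
    """True if cols determine every column of the (duplicate-free) rows."""
    return determines(rows, cols, range(len(rows[0])))

# professor -> subject holds, but professor is not a super key: BCNF violation.
print(determines(enrolment, [2], [1]))
print(is_superkey(enrolment, [2]))
# (student_id, subject) is a key of the original table.
print(is_superkey(enrolment, [0, 1]))
```

In the decomposed professor table, p_id identifies each row, so the dependency from professor to subject no longer sits beside a different key.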
BASIS FOR COMPARISON 3NF BCNF
A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency. If a table has two or more independent one-to-many relationships, the relation
is said to have a multi-valued dependency. That is, for a dependency A →→ B, if multiple values of B
exist for a single value of A, independently of the other attributes, the dependency is multi-valued.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
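The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
The student with STU_ID 21 has two courses and two hobbies, so STU_ID has a multi-valued
dependency on each of them. To bring the table into 4NF, we decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE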
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
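A short Python sketch shows why the two independent facts belong in separate tables. It projects the STUDENT rows onto course and hobby, then joins the projections back on STU_ID. The sketch assumes, as the text states, that course and hobby are independent of each other.

```python
# STUDENT rows: (STU_ID, COURSE, HOBBY)
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Natural join of the two projections on STU_ID.
joined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
)

# Combinations that the single table did not list explicitly.
extra = sorted(set(joined) - set(student))
print(extra)
```

Because the two attributes are independent, the join pairs every course of student 21 with every hobby. A single table would need all four rows for that student to say the same thing, while the two 4NF tables need only two rows each.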
The spurious row is asterisked. Now, if this result is joined with P3 over the
columns 'company' and 'product_name', the following table is obtained:
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is a
lossless join decomposition. Therefore, the relation is in 5NF as it does not violate the property
of lossless join.
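The two join steps can be reproduced with a minimal Python sketch. The ACP rows below are hypothetical placeholders, and the layout of P1 (agent, company) and P2 (agent, product_name) is an assumption. Only P3 and its join columns are named in the text.

```python
# Hypothetical ACP rows: (agent, company, product_name).
acp = {
    ("a1", "c1", "p1"),
    ("a1", "c1", "p2"),
    ("a1", "c2", "p2"),
    ("a2", "c1", "p2"),
}

p1 = {(a, c) for a, c, _ in acp}   # assumed layout: agent, company
p2 = {(a, p) for a, _, p in acp}   # assumed layout: agent, product_name
p3 = {(c, p) for _, c, p in acp}   # company, product_name

# Step 1: join P1 and P2 over agent; this can introduce spurious rows.
p1_p2 = {(a, c, p) for a, c in p1 for a2, p in p2 if a == a2}
spurious = p1_p2 - acp

# Step 2: join the result with P3 over company and product_name.
rejoined = {row for row in p1_p2 if (row[1], row[2]) in p3}

print(sorted(spurious))
print(rejoined == acp)
```

With these rows, the first join produces one spurious row and the second join removes it, so the three projections together reproduce the relation exactly.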
Denormalization:
The process of introducing redundancy into relations is called denormalization.
Example: Auditing
DKNF: