Normalization
Normalization is the process of eliminating data
redundancy and enhancing data integrity in tables.
Normalization also helps to organize the data in the
database.
It is a multi-step process that puts the data into
tabular form and removes duplicated data from
the relational tables.
Levels of Normalization
Normalization works through a series of stages
called normal forms. Each normal form has its own
set of guidelines and builds on the one before it.
First Normal Form (1NF):
This is the most basic level of normalization. In 1NF,
each table cell should contain only a single value,
and each column should have a unique name. The
first normal form helps to eliminate duplicate data
and simplify queries.
· A table is in First Normal Form if every one of its
attributes is atomic.
· Atomicity means that a single cell cannot hold
multiple values. It must hold only a single-valued
attribute.
· The first normal form disallows multi-valued
attributes, composite attributes, and their
combinations.
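The atomicity rule above can be sketched in Python. This is a minimal illustration only: the rows, the column names (stu_id, name, subjects), and the helper to_1nf are hypothetical and do not come from the Student Table in these slides.

```python
# Hypothetical rows: the "subjects" cell holds several values, which violates 1NF.
students = [
    {"stu_id": 1, "name": "Asha", "subjects": "Math, Physics"},
    {"stu_id": 2, "name": "Ravi", "subjects": "Chemistry"},
]


def to_1nf(rows, column, sep=","):
    """Return new rows in which `column` holds exactly one value per row."""
    atomic_rows = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)               # copy the other columns unchanged
            new_row[column] = value.strip()   # keep a single value in the cell
            atomic_rows.append(new_row)
    return atomic_rows


for row in to_1nf(students, "subjects"):
    print(row)
```

Each multi-valued cell becomes several rows with one value each, so the key of the resulting table has to include the split column as well.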
Example - 1NF
Student Table
Second Normal Form (2NF):
2NF eliminates redundant data by requiring that
each non-key attribute be fully dependent on the
primary key. This means that each non-key column
should depend on the whole primary key, and not on
just a part of a composite key.
The first condition for a table to be in Second
Normal Form is that the table has to be in First
Normal Form. In addition, the table should not
contain any partial dependency.
Example - 2NF
Location Table
Composite primary key: {cust_id, storeid}
Non-key attribute: store_location
The table does not fulfill the second normal form,
because store_location depends only on storeid,
which is just a part of the primary key.
To remove the partial functional dependency from the location table,
split the table into two parts.
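The split can be sketched as two projections of the location table. The sample rows and the names customer_store and store are assumptions made for illustration; only the column names cust_id, storeid, and store_location come from the slide.

```python
# Hypothetical rows for the location table (key: cust_id + storeid).
location = [
    {"cust_id": 1, "storeid": "S1", "store_location": "Delhi"},
    {"cust_id": 1, "storeid": "S3", "store_location": "Mumbai"},
    {"cust_id": 2, "storeid": "S1", "store_location": "Delhi"},
    {"cust_id": 3, "storeid": "S2", "store_location": "Pune"},
]


def project(rows, columns):
    """Project rows onto `columns`, dropping duplicate tuples (order preserved)."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Part 1: which customer uses which store (the whole key, no other columns).
customer_store = project(location, ["cust_id", "storeid"])
# Part 2: each store's location is stored once, keyed by storeid alone.
store = project(location, ["storeid", "store_location"])

print(customer_store)
print(store)
```

After the split, the location of store S1 is recorded once instead of once per customer.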
Third Normal Form (3NF):
3NF builds on 2NF by requiring that no non-key
attribute depend on another non-key attribute. This
means that each column should depend directly
on the primary key, and not on any other column in
the same table.
The main condition is that there should be no
transitive dependency for non-prime attributes,
that is, non-prime attributes should not depend
on other non-prime attributes in the table.
A transitive dependency is a functional dependency
A → C (A determines C) that holds only indirectly,
through A → B and B → C.
Example - 3NF
In the student table,
stu_id determines subid, and
subid determines sub.
Hence stu_id determines sub only transitively,
through subid, and the table is not in 3NF.
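A small Python sketch of this example follows. The sample rows, the helper determines, and the table names enrolment and subject are hypothetical; only the column names stu_id, subid, and sub come from the slide. Note that checking sample rows can refute a functional dependency but cannot prove that it holds in general.

```python
# Hypothetical rows for the student table.
student = [
    {"stu_id": 1, "subid": 11, "sub": "SQL"},
    {"stu_id": 2, "subid": 12, "sub": "Java"},
    {"stu_id": 3, "subid": 11, "sub": "SQL"},
]


def determines(rows, lhs, rhs):
    """True if, in these rows, equal `lhs` values always come with equal `rhs` values."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


print(determines(student, "stu_id", "subid"))  # direct dependency
print(determines(student, "subid", "sub"))     # dependency between non-key columns
print(determines(student, "stu_id", "sub"))    # follows transitively from the two above

# Removing the transitive dependency: keep stu_id -> subid in one table
# and move subid -> sub into a table of its own.
enrolment = {row["stu_id"]: row["subid"] for row in student}
subject = {row["subid"]: row["sub"] for row in student}
print(enrolment, subject)
```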
Boyce-Codd Normal Form (BCNF)
BCNF is a standard for organizing database tables
to minimize data repetition.
BCNF is the advanced version of 3NF. It is stricter
than 3NF.
A table is in BCNF if, for every non-trivial
functional dependency X → Y, X is a super key of
the table.
In other words, the table should be in 3NF, and the
left-hand side (LHS) of every FD should be a super key.
Example
A company where employees work in more than
one department.
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549
Functional dependencies
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
BCNF Conversion
The table is not in BCNF because neither EMP_DEPT nor
EMP_ID alone is a key. The candidate key of the table is
{EMP_ID, EMP_DEPT}.
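The super key test can be sketched with an attribute closure computation over the two functional dependencies listed above. The function names closure, is_superkey, and bcnf_violations are illustrative assumptions; the attributes and dependencies are the ones from the example.

```python
ATTRIBUTES = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}

# Each functional dependency is a pair (left-hand side, right-hand side).
FDS = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A set of attributes is a super key if its closure covers the whole table."""
    return closure(attrs, fds) == set(all_attrs)


def bcnf_violations(all_attrs, fds):
    """List the non-trivial dependencies whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not is_superkey(lhs, all_attrs, fds)
    ]


print(is_superkey({"EMP_ID"}, ATTRIBUTES, FDS))
print(is_superkey({"EMP_DEPT"}, ATTRIBUTES, FDS))
print(is_superkey({"EMP_ID", "EMP_DEPT"}, ATTRIBUTES, FDS))
print(bcnf_violations(ATTRIBUTES, FDS))
```

Both dependencies are reported as violations for the original table, while the pair {EMP_ID, EMP_DEPT} passes the super key test.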
To convert the given table into BCNF, we decompose it
into three tables:
EMP_COUNTRY table
EMP_DEPT table
EMP_DEPT_MAPPING table
Three Tables
EMP_COUNTRY table
EMP_ID  EMP_COUNTRY
264     India
364     UK
EMP_DEPT table
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
EMP_DEPT_MAPPING table
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
The resulting tables are in BCNF because, in each table,
the left-hand side of every functional dependency is a key.
Fourth normal form (4NF)
Multivalued dependencies are handled by 4NF.
A relation is in 4NF if it is in Boyce-Codd normal
form and has no non-trivial multivalued dependency
whose left side is not a super key.
A multivalued dependency A →→ B exists when a
single value of A is associated with a set of
B values, and that set is independent of the other
attributes in the relation.
Example
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The STUDENT table is in 3NF, but COURSE and HOBBY
are two independent entities. Hence, there is no
relationship between COURSE and HOBBY.
To make the STUDENT table into 4NF, decompose
it into two tables:
STUDENT_COURSE
STUDENT_HOBBY
Decomposition
STUDENT_COURSE
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics
STUDENT_HOBBY
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
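The decomposition can be reproduced as two projections of the STUDENT rows from the example. This is a minimal sketch; the helper name project is an assumption made for illustration.

```python
# Rows of the STUDENT table from the example: (STU_ID, COURSE, HOBBY).
STUDENT = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]


def project(rows, indexes):
    """Keep only the columns at `indexes`, dropping duplicate rows (order preserved)."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[i] for i in indexes)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


student_course = project(STUDENT, (0, 1))  # STU_ID, COURSE
student_hobby = project(STUDENT, (0, 2))   # STU_ID, HOBBY

print(student_course)
print(student_hobby)
```

Courses and hobbies are now recorded separately, so adding a hobby for a student no longer requires touching any course row.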
Fifth Normal Form (5NF)
Join dependencies are handled by 5NF.
A relation is in 5NF if it is in 4NF and every
non-trivial join dependency in it is implied by its
candidate keys, so that it cannot be decomposed
further without loss.
5NF is reached when the tables have been broken into
as many smaller tables as possible, without losing
information, in order to avoid redundancy.
5NF is also known as Project-join normal form
(PJ/NF).
Lossless Decomposition
If no information is lost from the relation that is
decomposed, then the decomposition is lossless.
A lossless decomposition guarantees that joining
the decomposed relations results in the same
relation that was decomposed.
A decomposition is said to be lossless if the
natural join of all the decomposed relations gives
the original relation.
EMPLOYEE_DEPARTMENT table
EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY   DEPT_ID  DEPT_NAME
22      Denim      28       Mumbai     827      Sales
33      Alina      25       Delhi      438      Marketing
46      Stephan    30       Bangalore  869      Finance
52      Katherine  36       Mumbai     575      Production
60      Jack       40       Noida      678      Testing
The relation is decomposed into two relations
EMPLOYEE and DEPARTMENT
EMPLOYEE table
EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY
22      Denim      28       Mumbai
33      Alina      25       Delhi
46      Stephan    30       Bangalore
52      Katherine  36       Mumbai
60      Jack       40       Noida
DEPARTMENT table
DEPT_ID  EMP_ID  DEPT_NAME
827      22      Sales
438      33      Marketing
869      46      Finance
575      52      Production
678      60      Testing
Joining Operation
When these two relations are joined on the
common column "EMP_ID", then the resultant
relation will be
EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY   DEPT_ID  DEPT_NAME
22      Denim      28       Mumbai     827      Sales
33      Alina      25       Delhi      438      Marketing
46      Stephan    30       Bangalore  869      Finance
52      Katherine  36       Mumbai     575      Production
60      Jack       40       Noida      678      Testing
The joined relation is the same as the original
EMPLOYEE_DEPARTMENT table. Hence, the decomposition
is a lossless join decomposition.
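The lossless property can be checked with a short Python sketch that uses the rows from the example: project the original relation into EMPLOYEE and DEPARTMENT, join them back on EMP_ID, and compare the result with the original. The function name join_on_emp_id is an assumption made for illustration.

```python
# Rows of EMPLOYEE_DEPARTMENT:
# (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID, DEPT_NAME)
EMPLOYEE_DEPARTMENT = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]

# Decomposition into the two relations shown in the example.
EMPLOYEE = [(e, n, a, c) for e, n, a, c, _, _ in EMPLOYEE_DEPARTMENT]
DEPARTMENT = [(d, e, dn) for e, _, _, _, d, dn in EMPLOYEE_DEPARTMENT]


def join_on_emp_id(employees, departments):
    """Natural join of EMPLOYEE and DEPARTMENT on the common column EMP_ID."""
    joined = []
    for emp_id, name, age, city in employees:
        for dept_id, dept_emp_id, dept_name in departments:
            if emp_id == dept_emp_id:
                joined.append((emp_id, name, age, city, dept_id, dept_name))
    return joined


rejoined = join_on_emp_id(EMPLOYEE, DEPARTMENT)

# Lossless: the join contains exactly the original rows, no more and no fewer.
print(set(rejoined) == set(EMPLOYEE_DEPARTMENT))
```

If the common column did not determine the rows on either side, the join could produce extra (spurious) rows and the comparison would fail.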