Normal Forms

Normalization

• Normalization is the process of minimizing redundancy in a relation or set
of relations.
• Redundancy in a relation may cause insertion, deletion, and update anomalies.
Normalization helps to avoid these anomalies by minimizing that redundancy.
What is Database Normalization?
• Normal forms are a series of guidelines that help to ensure that the design of
a database is efficient, organized, and free from data anomalies.
• There are several levels of normalization, each with its own set of guidelines, known
as normal forms.
Important Points Regarding Normal Forms in DBMS

First Normal Form (1NF):

This is the most basic level of normalization.

In 1NF, each table cell should contain only a single value, and each column
should have a unique name.

The first normal form helps to eliminate duplicate data and simplify queries.

Second Normal Form (2NF):

2NF eliminates redundant data by requiring that each non-key attribute be fully
dependent on the whole primary key (no partial dependencies).

This means that no non-key column may depend on only part of a composite key.

Third Normal Form (3NF):

3NF builds on 2NF by requiring that no non-key attribute depends on another
non-key attribute (no transitive FD).

This means that each non-key column should be directly related to the key, and
not to any other non-key column in the same table.

Boyce-Codd Normal Form (BCNF):

BCNF is a stricter form of 3NF that ensures that each determinant in a table is a
candidate key.

In other words, in every non-trivial functional dependency X -> Y, the left-hand
side X must be a super key.

Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures
that a table does not contain any non-trivial multi-valued dependencies other
than those whose determinant is a super key.

Fifth Normal Form (5NF): 5NF, also called project-join normal form, removes the
redundancy that remains when a table can be losslessly decomposed into smaller
tables. Every non-trivial join dependency must be implied by the candidate keys.
First Normal Form (1NF):
A relation is in first normal form if it does not contain any composite or
multi-valued attribute, that is, if every attribute in the relation is a
single-valued attribute. A relation that contains a composite or multi-valued
attribute violates the first normal form.
A table is in 1NF if:
1. There are only single-valued attributes.
2. The attribute domain does not change.
3. There is a unique name for every attribute/column.
4. The order in which data is stored does not matter.
Consider the examples given below.
Example-1:
Relation STUDENT in table 1 is not in 1NF because of multi-valued
attribute STUD_PHONE. Its decomposition into 1NF has been shown
in table 2.
Example-2:
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3

In the above table, Courses is a multi-valued attribute, so the table is not in 1NF.
The table below is in 1NF as there is no multi-valued attribute:
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
Note: A database design is considered bad if it is not even in the First Normal Form (1NF).
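The conversion in Example-2 can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the list `students` and the function name `to_first_normal_form` are assumptions, and the multi-valued Courses column is modelled as a Python list.

```python
# Hypothetical in-memory copy of the non-1NF table from Example-2.
students = [
    {"ID": 1, "Name": "A", "Courses": ["c1", "c2"]},
    {"ID": 2, "Name": "E", "Courses": ["c3"]},
    {"ID": 3, "Name": "M", "Courses": ["c2", "c3"]},
]


def to_first_normal_form(rows):
    """Emit one row per (ID, Name, Course) so every cell holds a single value."""
    flat = []
    for row in rows:
        for course in row["Courses"]:
            flat.append({"ID": row["ID"], "Name": row["Name"], "Course": course})
    return flat


for row in to_first_normal_form(students):
    print(row["ID"], row["Name"], row["Course"])
```

The five printed rows match the 1NF table above: each multi-valued cell is expanded into one row per value.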
Second Normal Form (2NF):
Second Normal Form (2NF) is based on the concept of full functional
dependency.
Second Normal Form applies to relations with composite keys, that is, relations
with a primary key composed of two or more attributes.
A relation in 1NF whose candidate keys are all single attributes is automatically in
at least 2NF. A relation that is not in 2NF may suffer from update anomalies.
To be in second normal form, a relation must be in first normal form and relation
must not contain any partial dependency.
A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute
(attributes which are not part of any candidate key) is dependent on any proper
subset of any candidate key of the table. In other words,

A relation is in Second Normal Form (2NF) if it is in First Normal Form and every
non-primary-key attribute is fully functionally dependent on the primary key.
Note – If a proper subset of a candidate key determines a non-prime attribute, it is
called a partial dependency. The normalization of 1NF relations to 2NF involves
the removal of partial dependencies. If a partial dependency exists, we remove
the partially dependent attribute(s) from the relation by placing them in a new
relation along with a copy of their determinant.
Consider the examples given below.
Example-1: Consider the table below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
(Note that several courses have the same course fee.)
Here, COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
and COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
• Hence, COURSE_FEE is a non-prime attribute, as it does not belong
to the only candidate key {STUD_NO, COURSE_NO}.
• But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
• The non-prime attribute COURSE_FEE is therefore dependent on a proper subset
of the candidate key, which is a partial dependency, and so this relation is not in
2NF.
• To convert the above relation to 2NF, we need to split the table into two
tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5

Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
Note – 2NF reduces the redundant data that has to be stored. For
instance, if 100 students take course C1, we don't need to store its fee
of 1000 in all 100 records; instead we store it once in the second table, which
records that the course fee for C1 is 1000.
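The split into Table 1 and Table 2 is a pair of projections, and joining them back on COURSE_NO should reproduce the original rows. A minimal Python sketch follows; the variable names `enrolment`, `table1`, `table2`, and `rejoined` are assumptions for illustration, and the data is copied from Example-1.

```python
# Rows of the original relation (STUD_NO, COURSE_NO, COURSE_FEE) from Example-1.
enrolment = [
    (1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
    (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000),
]

# Projections onto (STUD_NO, COURSE_NO) and (COURSE_NO, COURSE_FEE).
# Using sets removes the repeated (C1, 1000) pair.
table1 = sorted({(stud, course) for stud, course, _ in enrolment})
table2 = sorted({(course, fee) for _, course, fee in enrolment})

# Natural join on COURSE_NO. Building a dict is safe here only because
# COURSE_NO -> COURSE_FEE holds, so each course has exactly one fee.
fee_of = dict(table2)
rejoined = {(stud, course, fee_of[course]) for stud, course in table1}

print(table2)
print(rejoined == set(enrolment))
```

The fee for C1 is stored once in `table2` even though two students take the course, and the join returns exactly the original six rows, so nothing is lost by the decomposition.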
Example-2: Consider the following functional dependencies in relation
R (A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
Here (AB)+ = {A, B, C, D}: AB -> C gives C, and then BC -> D gives D. Neither A
nor B appears on the right-hand side of any FD, so both must belong to every key,
and {A, B} is the only candidate key. A and B are prime attributes; C and D are
non-prime.
The relation R is already in 1NF because it does not have any repeating groups or
nested relations.
Neither A alone nor B alone determines any non-prime attribute, so there is no
partial dependency and R is in 2NF. Note that BC -> D is not a partial dependency,
because BC is not a proper subset of the candidate key AB.
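The closures used in this example can be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the helper name `closure` and the encoding of each FD as a pair of attribute strings are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of R(A, B, C, D) from Example-2: AB -> C and BC -> D.
fds = [("AB", "C"), ("BC", "D")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("A", fds)))   # ['A']
print(sorted(closure("B", fds)))   # ['B']
```

A partial dependency would show up as a proper subset of the key whose closure contains a non-prime attribute. Here the closures of A and of B add nothing beyond the attribute itself, so no proper subset of the key AB determines C or D on its own.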
Third Normal Form (3NF):
A relation is in third normal form if it is in second normal form and there is no
transitive dependency for non-prime attributes.
A relation is in 3NF if at least one of the following conditions holds in every
non-trivial functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
In other words, a relation that is in First and Second Normal Form, and in which no
non-primary-key attribute is transitively dependent on the primary key, is in Third
Normal Form (3NF).

Note – If A->B and B->C are two FDs then A->C is called transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the transitively
dependent attribute(s) from the relation by placing the attribute(s) in a new relation
along with a copy of the determinant.
Consider the examples given below.
Example-1:
In relation STUDENT given in Table 4,

FD set:
{STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_NO -> STUD_AGE,
STUD_STATE -> STUD_COUNTRY}
Candidate Key:
{STUD_NO}
For this relation in Table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY hold, so STUD_COUNTRY is transitively dependent on
STUD_NO. This violates the third normal form. To convert the relation to third
normal form, we decompose STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Example-2:
Consider relation R(A, B, C, D, E) with
A -> BC,
CD -> E,
B -> D,
E -> A
The candidate keys of this relation are A, E, CD, and BC. Every attribute on the
right-hand side of every functional dependency is prime, so the relation is already
in 3NF.
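The candidate keys of this relation can be found by brute force for a schema this small. The sketch below is an illustration under stated assumptions, not the notes' own method: the helper names `closure` and `candidate_keys` are hypothetical, and FDs are encoded as pairs of attribute strings.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Try attribute sets from smallest to largest; keep the minimal super keys."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key already found is a super key but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


# FDs of R(A, B, C, D, E) from Example-2.
fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
prime = set().union(*keys)

print(["".join(sorted(k)) for k in keys])  # ['A', 'E', 'BC', 'CD']
print(sorted(prime))                       # ['A', 'B', 'C', 'D', 'E']
```

Since every attribute is prime, the second 3NF condition holds for every FD. The search is exponential in the number of attributes, so it is only suitable for textbook-sized schemas.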
Note –
Third Normal Form (3NF) is considered adequate for normal relational database
design because most of the 3NF tables are free of insertion, update, and deletion
anomalies.
Moreover, a relation can always be decomposed into 3NF relations in a way that is
both lossless and dependency preserving.
Boyce-Codd Normal Form (BCNF)
• Application of the general definitions of 2NF and 3NF may identify
additional redundancy caused by dependencies that violate one or more
candidate keys.
• However, despite these additional constraints, dependencies can still
exist that will cause redundancy to be present in 3NF relations.
• This weakness in 3NF resulted in the presentation of a stronger normal
form called the Boyce-Codd Normal Form (Codd, 1974).
Although 3NF is an adequate normal form for relational databases, it may
not remove all redundancy: a functional dependency X -> Y can still hold
where X is not a candidate key of the relation (3NF permits this when Y is
a prime attribute). Boyce-Codd Normal Form (BCNF) rules this case out.
Boyce–Codd Normal Form (BCNF) is based on functional dependencies
that take into account all candidate keys in a relation; however, BCNF
also has additional constraints compared with the general definition of
3NF.
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a superkey for every non-trivial functional dependency (FD)
X -> Y in a given relation.
Note: To test whether a relation is in BCNF, we identify all the
determinants and make sure that they are candidate keys.
BCNF in DBMS
Every relation in BCNF is also in 3NF. The converse does not hold:
a relation in 3NF need not be in BCNF.
To determine the highest normal form of a given relation R with
functional dependencies, the first step is to check whether the BCNF
condition holds. If R is found to be in BCNF, it can be safely deduced
that the relation is also in 3NF, 2NF, and 1NF as the hierarchy shows.
The 1NF has the least restrictive constraint – it only requires a relation
R to have atomic values in each tuple. The 2NF has a slightly more
restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is less
restrictive than the BCNF. In this manner, the restriction increases as we traverse
down the hierarchy.
Example 1
Let us consider the student database, in which data of the student are mentioned.
Stu_ID  Stu_Branch                               Stu_Course            Branch_Number  Stu_Course_No
101     Computer Science & Engineering           DBMS                  B_001          201
101     Computer Science & Engineering           Computer Networks     B_001          202
102     Electronics & Communication Engineering  VLSI Technology       B_003          401
102     Electronics & Communication Engineering  Mobile Communication  B_003          402

The functional dependencies of the above table are:
Stu_ID -> Stu_Branch
Stu_Course -> {Branch_Number, Stu_Course_No}
The candidate key of the above table is {Stu_ID, Stu_Course}.
Why this Table is Not in BCNF?
The table above is not in BCNF because neither Stu_ID nor Stu_Course alone is a
Super Key. For a table to be in BCNF, the left-hand side X of every non-trivial
functional dependency X -> Y must be a Super Key. Both dependencies above fail
this test, so the table is not in BCNF.
How to Satisfy BCNF?
To bring this table into BCNF, we have to decompose it into smaller tables. We
divide the main table into three tables: the Stu_Branch table, the Stu_Course table,
and a table mapping Stu_ID to Stu_Course_No.
Stu_Branch Table
Stu_ID  Stu_Branch
101     Computer Science & Engineering
102     Electronics & Communication Engineering
Candidate Key for this table: Stu_ID.

Stu_Course Table
Stu_Course            Branch_Number  Stu_Course_No
DBMS                  B_001          201
Computer Networks     B_001          202
VLSI Technology       B_003          401
Mobile Communication  B_003          402
Candidate Key for this table: Stu_Course.

Stu_ID to Stu_Course_No Table
Stu_ID  Stu_Course_No
101     201
101     202
102     401
102     402
Candidate Key for this table: {Stu_ID, Stu_Course_No}.


After this decomposition, every table is in BCNF: in each non-trivial functional
dependency X -> Y that holds in a table, X is a Super Key of that table.
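The BCNF test used in this example can be written as a short check: list the non-trivial FDs whose left-hand side is not a super key. The Python sketch below is illustrative only. The helper names `closure` and `bcnf_violations` are assumptions, and the FDs of each decomposed table are entered by hand rather than derived by projection.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


# Original table and its two FDs from Example 1.
original = {"Stu_ID", "Stu_Branch", "Stu_Course", "Branch_Number", "Stu_Course_No"}
fds = [
    ({"Stu_ID"}, {"Stu_Branch"}),
    ({"Stu_Course"}, {"Branch_Number", "Stu_Course_No"}),
]
print([sorted(lhs) for lhs, _ in bcnf_violations(original, fds)])

# Decomposed tables, each with the FDs that apply to it (entered by hand).
stu_branch = ({"Stu_ID", "Stu_Branch"}, [fds[0]])
stu_course = ({"Stu_Course", "Branch_Number", "Stu_Course_No"}, [fds[1]])
stu_enrol = ({"Stu_ID", "Stu_Course_No"}, [])
for attrs, table_fds in (stu_branch, stu_course, stu_enrol):
    print(bcnf_violations(attrs, table_fds))
```

The first print lists both determinants, Stu_ID and Stu_Course, as violations in the original table. Each decomposed table prints an empty list.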
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
Step-1: (AC)+ = {A, C, B, E, D}, but none of the proper subsets of AC can determine
all attributes of the relation, so AC is a candidate key. A and C cannot be derived
from any other attribute of the relation, so there is only one candidate key, {AC}.
Step-2: Prime attributes are those that are part of a candidate key, {A, C} in this
example; the others, {B, D, E}, are non-prime.
Step-3: The relation R is in 1st normal form, as a relational DBMS does not allow
multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a
proper subset of candidate key AC), AC->BE is in 2nd normal form (AC is the
candidate key), and B->E is in 2nd normal form (B is not a proper subset of candidate
key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super key
nor is D a prime attribute, and in B->E neither is B a super key nor is E a prime
attribute. To satisfy 3rd normal form, either the LHS of an FD should be a super key
or the RHS should be a prime attribute. So the highest normal form of the relation is
2nd normal form.
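The three steps above can be combined into one routine. The following Python sketch is an illustration under stated assumptions: the relation is taken to be in 1NF already, and the function names `closure`, `candidate_keys`, and `highest_normal_form` are hypothetical. It uses brute-force key search, so it is only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Try attribute sets from smallest to largest; keep the minimal super keys."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


def highest_normal_form(attributes, fds):
    """Classify a relation (assumed to be in 1NF) as 1NF, 2NF, 3NF or BCNF."""
    attributes = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    non_prime = attributes - prime
    # Keep only the non-trivial part of each FD.
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if not set(r) <= set(l)]

    def is_superkey(lhs):
        return closure(lhs, fds) == attributes

    if all(is_superkey(lhs) for lhs, _ in nontrivial):
        return "BCNF"
    if all(is_superkey(lhs) or rhs <= prime for lhs, rhs in nontrivial):
        return "3NF"
    # 2NF fails if a proper subset of a key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) & non_prime:
                    return "1NF"
    return "2NF"


print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")]))  # 2NF
print(highest_normal_form("ABC", [("A", "BC"), ("B", "A")]))                  # BCNF
```

The first call reproduces the result of this example. The second call uses a relation R(A, B, C) with A -> BC and B -> A, in which both determinants are super keys.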
Note: A prime attribute cannot be transitively dependent on a key in a BCNF relation.
Consider these functional dependencies of some relation R:
AB -> C
C -> B
AB -> B
Suppose it is known that the only candidate key of R is AB. Careful observation is
required to see that the prime attribute B transitively depends on the key AB
through C (AB -> C and C -> B). The first and third FDs satisfy BCNF, as both have
the candidate key on their left-hand sides. The second dependency, however, does
not satisfy BCNF because C is not a super key, but it does satisfy 3NF because its
right-hand side B is a prime attribute. So the highest normal form of R is 3NF, as
all three FDs satisfy the conditions for 3NF.
Example 3
Consider relation R(A, B, C) with
A -> BC,
B -> A
A and B are both super keys, so the above relation is in BCNF.
Note: A BCNF decomposition that preserves all dependencies is not always possible;
however, a decomposition that satisfies the lossless join condition always exists. For
example, consider relation R (V, W, X, Y, Z) with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
Decomposing it into BCNF by repeatedly splitting on a violating dependency does not
preserve all three dependencies.
Note: Redundancies are sometimes still present in a BCNF relation as it is not always
possible to eliminate them completely.
Advantages of Normal Form

Reduced data redundancy: Normalization helps to eliminate duplicate data in tables,
reducing the amount of storage space needed and improving database efficiency.

Improved data consistency: Normalization ensures that data is stored in a consistent
and organized manner, reducing the risk of data inconsistencies and
errors.

Simplified database design: Normalization provides guidelines for organizing tables
and data relationships, making it easier to design and maintain a database.

Improved query performance: Normalized tables are smaller and hold each fact once,
which can make updates and simple lookups faster, although queries that combine data
from several tables need joins.

Easier database maintenance: Normalization reduces the complexity of a database by
breaking it down into smaller, more manageable tables, making it easier to add, modify,
and delete data.
Overall, using normal forms in DBMS helps to improve data quality, increase database
efficiency, and simplify database design and maintenance.
First Normal Form
A relation is in first normal form if it does not contain any composite or
multi-valued attribute, that is, if every attribute in the relation is a
single-valued attribute. A relation that contains a composite or multi-valued
attribute violates first normal form.

Second Normal Form
To be in second normal form, a relation must be in first normal form and must
not contain any partial dependency. A relation is in 2NF if it has no
partial dependency, i.e., no non-prime attribute (an attribute that is not part of any
candidate key) is dependent on any proper subset of any candidate key of the
table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime
attribute, it is called a partial dependency.

Third Normal Form
A relation is said to be in third normal form if it has no transitive
dependency for non-prime attributes. The basic condition for the Third Normal Form is
that the relation must be in Second Normal Form.
At least one of the following conditions must hold for every non-trivial functional
dependency X -> Y:

X is a Super Key.

Y is a Prime Attribute (this means that each element of Y is part of some Candidate Key).
BCNF (Boyce-Codd Normal Form)
BCNF (Boyce-Codd Normal Form) is a stricter version of Third Normal Form, with one
additional rule. The basic condition for any relation to be in BCNF is that it must
be in Third Normal Form.
The basic rules for BCNF are:
1. The table must be in Third Normal Form.
2. For every non-trivial functional dependency X -> Y, X must be a superkey of the
relation.
Fourth Normal Form
A relation in Fourth Normal Form contains no non-trivial multivalued dependency other
than those whose determinant is a superkey. The basic condition for Fourth Normal
Form is that the relation must be in BCNF.
The basic rules are mentioned below.
1. It must be in BCNF.
2. It does not have any non-trivial multi-valued dependency whose determinant is not
a superkey.
Fifth Normal Form
Fifth Normal Form is also called Project-Join Normal Form. The basic conditions of
Fifth Normal Form are mentioned below.
The relation must be in Fourth Normal Form.
The relation cannot be decomposed further without loss, i.e., it cannot be split into
smaller relations that always join back to exactly the original relation, except in
ways implied by its candidate keys.
Applications of Normal Forms in DBMS

Data consistency: Normal forms ensure that data is consistent and does not contain any
redundant information. This helps to prevent inconsistencies and errors in the database.

Data redundancy: Normal forms minimize data redundancy by organizing data into
tables that contain only unique data. This reduces the amount of storage space
required for the database and makes it easier to manage.

Query performance: Normal forms can improve performance by keeping tables small and
storing each fact only once, so updates touch fewer rows. Queries that combine data
from several tables may need more joins.

Database maintenance: Normal forms make it easier to maintain the database by
reducing the amount of redundant data that needs to be updated, deleted, or modified.
This helps to improve database management and reduce the risk of errors or
inconsistencies.

Database design: Normal forms provide guidelines for designing databases that are
efficient, flexible, and scalable. This helps to ensure that the database can be easily
modified, updated, or expanded as needed.
Some Important Points about Normal Forms

BCNF is free from redundancy caused by functional dependencies.

If a relation is in BCNF, then 3NF is also satisfied.

If all attributes of relation are prime attribute, then the relation is always in 3NF.

A relation in a relational database is always in at least 1NF.

Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.

If a Relation has only singleton candidate keys( i.e. every candidate key consists of only
1 attribute), then the Relation is always in 2NF( because no Partial functional dependency
possible).

Sometimes decomposing into BCNF may not preserve every functional dependency. In that
case, go for BCNF only if the lost FD(s) are not required; otherwise normalize only up
to 3NF.

There are many more Normal forms that exist after BCNF, like 4NF and more. But in
real world database systems it’s generally not required to go beyond BCNF.
