
Unit 3_Normalization

The document outlines the course details for a Relational Database Management System class, focusing on normalization concepts including functional dependencies, candidate keys, and various normal forms (1NF, 2NF, 3NF, BCNF, etc.). It provides examples of functional dependencies and explains the significance of normalization in reducing redundancy and avoiding anomalies in databases. Additionally, it describes the process of decomposing tables to achieve higher normalization levels.

Uploaded by

dracula 0247

Department of Computer Science and Engineering

Course Name- Relational Database Management System


Course Code – COM-402
Topic- Unit 3: Normalization
Faculty- Ms. Parul Sharma

Model Institute of Engineering & Technology (Autonomous)


Course Outcomes

Course Outcome   Description
CO1              Identify the basic concepts, architecture and various data models used in Database Management Systems (DBMS).
CO2              Identify basic database storage structures and access techniques such as file organizations, indexing methods.
CO3              Design queries using Structured Query Language (SQL) for database definition and database manipulation.
CO4              Recognize the use of normalization and functional dependencies in DBMS.
CO5              Implement the concept of transaction, concurrency control and recovery in DBMS.
NORMALIZATION
Functional Dependencies

If one set of attributes in a table determines another set of attributes in the table, then the second set of attributes is said to be functionally dependent on the first set of attributes.
Example 1
Table Scheme: {ISBN, Title, Price}

ISBN            Title          Price
0-321-32132-1   Balloon        $34.00
0-55-123456-9   Main Street    $22.95
0-123-45678-0   Ulysses        $34.00
1-22-233700-0   Visual Basic   $25.00

Functional Dependencies: {ISBN} → {Title}
                         {ISBN} → {Price}
Functional Dependencies
Example 2
Table Scheme: {PubID, PubName, PubPhone}

PubID   PubName       PubPhone
1       Big House     999-999-9999
2       Small House   123-456-7890
3       Alpha Press   111-111-1111

Functional Dependencies: {PubID} → {PubPhone}
                         {PubID} → {PubName}
                         {PubName, PubPhone} → {PubID}

Example 3
Table Scheme: {AuID, AuName, AuPhone}

AuID   AuName   AuPhone
1      Sleepy   321-321-1111
2      Snoopy   232-234-1234
3      Grumpy   665-235-6532
4      Jones    123-333-3333
5      Smith    654-223-3455
6      Joyce    666-666-6666
7      Roman    444-444-4444

Functional Dependencies: {AuID} → {AuPhone}
                         {AuID} → {AuName}
                         {AuName, AuPhone} → {AuID}
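A functional dependency can be tested against sample rows. The sketch below is illustrative only: the helper name fd_holds and the dictionary layout of the rows are assumptions, and the data is the book table from Example 1. Note that sample rows can only refute a dependency; a dependency that holds in the sample is not thereby proven for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street", "Price": "$22.95"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses", "Price": "$34.00"},
    {"ISBN": "1-22-233700-0", "Title": "Visual Basic", "Price": "$25.00"},
]

print(fd_holds(books, ["ISBN"], ["Title"]))   # True
print(fd_holds(books, ["Price"], ["Title"]))  # False: $34.00 appears with two titles
```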
FD – Example

Database to track reviews of papers submitted to an academic conference. Prospective authors submit papers for review and possible acceptance in the published conference proceedings. Details of the entities:
• Author information includes a unique author number, a name, a mailing address, and a unique (optional) email address.
• Paper information includes the primary author, the paper number, the title, the abstract, and review status (pending, accepted, rejected).
• Reviewer information includes the reviewer number, the name, the mailing address, and a unique (optional) email address.
• A completed review includes the reviewer number, the
FD – Example

Functional Dependencies
• AuthNo → AuthName, AuthEmail, AuthAddress
• AuthEmail → AuthNo
• PaperNo → Primary-AuthNo, Title, Abstract, Status
• RevNo → RevName, RevEmail, RevAddress
• RevEmail → RevNo
• RevNo, PaperNo → AuthComm, Prog-Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5
Determinant and Dependent

Functional Dependency

EmpNum → EmpEmail

The attribute on the LHS is known as the determinant and the attribute on the RHS is called the dependent.
• EmpNum is a determinant of EmpEmail
• EmpEmail is a dependent of EmpNum

91.2914
Transitive dependency

Consider attributes A, B, and C, where A → B and B → C.
Functional dependencies are transitive, which means that we also have the functional dependency A → C.
We say that C is transitively dependent on A through B.

Example
Relation: (EmpNum, EmpEmail, DeptNum, DeptName)

EmpNum → DeptNum
DeptNum → DeptName

DeptName is transitively dependent on EmpNum via DeptNum:

EmpNum → DeptName
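Transitivity is what the attribute closure computes: starting from a set of attributes, keep adding everything the known FDs allow until nothing changes. The following is a minimal sketch; the function name closure and the (lhs, rhs) pair representation are assumptions made for illustration, and the FDs are the employee and department dependencies from this unit.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # fire an FD when its whole left side is already known
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (["EmpNum"], ["EmpEmail"]),
    (["EmpNum"], ["DeptNum"]),
    (["DeptNum"], ["DeptName"]),
]
emp_closure = closure(["EmpNum"], fds)
print(sorted(emp_closure))  # ['DeptName', 'DeptNum', 'EmpEmail', 'EmpNum']
```

DeptName appears in the closure of EmpNum only because DeptNum was added first, which is the transitive dependency described above.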
Partial dependency

A partial dependency exists when an attribute B is functionally dependent on an attribute A, and A is only a component of a multipart candidate key.

Relation: (InvNum, LineNum, Qty, InvDate)

Candidate key: {InvNum, LineNum}
and
InvNum → InvDate

InvDate is partially dependent on {InvNum, LineNum}.
Candidate keys

Candidate keys are attributes or sets of attributes within a relational database table that can uniquely identify each tuple (row) in that table. Each candidate key must satisfy three essential conditions:

1. Uniqueness: No two rows can have the same value of the candidate key.
2. Minimality: The set is minimal, meaning that if any attribute is removed from the candidate key, it no longer remains capable of uniquely identifying every tuple in the table.
3. Not Null: All attributes in the candidate key must be non-null to ensure that every tuple can always be uniquely identified.

A candidate key is considered a candidate for selection as the primary key of the table. The primary key is a specially designated candidate key chosen to uniquely identify tuples in a relational table, primarily used for indexing and referencing in relational operations. Other candidate keys not chosen as the primary key are referred to as alternate keys.
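The uniqueness and minimality conditions can be checked mechanically when the functional dependencies are known. The sketch below is a brute-force search, written under the assumption that FDs are given as (lhs, rhs) pairs; the names closure and candidate_keys are illustrative. It is applied to a relation R(A, B, C, D) with FDs AB → CD and D → A.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Try attribute sets from smallest to largest and keep the minimal superkeys."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            # a superset of a key already found is not minimal
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


keys = candidate_keys("ABCD", [("AB", "CD"), ("D", "A")])
print([sorted(k) for k in keys])  # [['A', 'B'], ['B', 'D']]
```

The search is exponential in the number of attributes, which is acceptable for classroom-sized schemas only.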
Levels of Normalization
• Levels of normalization are based on the amount of redundancy in the database.
• Various levels of normalization are:
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)
  • Domain Key Normal Form (DKNF)

Most databases should be in 3NF or BCNF in order to avoid database anomalies.
Levels of Normalization
1NF
2NF
3NF
BCNF
4NF
5NF

Each higher level is a subset of the lower level.
First Normal Form (1NF)
A table is considered to be in 1NF if all the fields contain only scalar values (as opposed to lists of values).
Example (Not 1NF)

ISBN            Title          AuName                   AuPhone                                    PubName       PubPhone       Price
0-321-32132-1   Balloon        Sleepy, Snoopy, Grumpy   321-321-1111, 232-234-1234, 665-235-6532   Small House   714-000-0000   $34.00
0-55-123456-9   Main Street    Jones, Smith             123-333-3333, 654-223-3455                 Small House   714-000-0000   $22.95
0-123-45678-0   Ulysses        Joyce                    666-666-6666                               Alpha Press   999-999-9999   $34.00
1-22-233700-0   Visual Basic   Roman                    444-444-4444                               Big House     123-456-7890   $25.00

The author (AuName) and AuPhone columns are not scalar.
1NF - Decomposition
1. Place all items that appear in the repeating group in a new table.
2. Designate a primary key for each new table produced.
3. Duplicate in the new table the primary key of the table from which the repeating group was extracted, or vice versa.

Example (1NF)

ISBN            Title          PubName       PubPhone       Price
0-321-32132-1   Balloon        Small House   714-000-0000   $34.00
0-55-123456-9   Main Street    Small House   714-000-0000   $22.95
0-123-45678-0   Ulysses        Alpha Press   999-999-9999   $34.00
1-22-233700-0   Visual Basic   Big House     123-456-7890   $25.00

ISBN            AuName   AuPhone
0-321-32132-1   Sleepy   321-321-1111
0-321-32132-1   Snoopy   232-234-1234
0-321-32132-1   Grumpy   665-235-6532
0-55-123456-9   Jones    123-333-3333
0-55-123456-9   Smith    654-223-3455
0-123-45678-0   Joyce    666-666-6666
1-22-233700-0   Roman    444-444-4444
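The three decomposition steps can be sketched in a few lines. This is only an illustration: the helper name to_1nf is hypothetical, and it assumes each repeating-group column holds a Python list whose entries line up by position (first author with first phone, and so on). Only two of the books are used to keep it short.

```python
def to_1nf(rows, key, group_cols):
    """Split the repeating-group columns into a separate table keyed by `key`."""
    main, group = [], []
    for row in rows:
        # the original table keeps everything except the repeating group
        main.append({c: v for c, v in row.items() if c not in group_cols})
        # zip pairs the i-th entries of the repeating columns together
        for values in zip(*(row[c] for c in group_cols)):
            entry = {key: row[key]}  # duplicate the primary key in the new table
            entry.update(zip(group_cols, values))
            group.append(entry)
    return main, group


books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": ["Sleepy", "Snoopy", "Grumpy"],
     "AuPhone": ["321-321-1111", "232-234-1234", "665-235-6532"]},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses",
     "AuName": ["Joyce"], "AuPhone": ["666-666-6666"]},
]

book_table, author_table = to_1nf(books, "ISBN", ["AuName", "AuPhone"])
for row in author_table:
    print(row)
```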
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements:
• The table is in first normal form.
• All non-key attributes in the table must be fully functionally dependent on the entire primary key, i.e. there should be no partial dependency.
Given a relation R(A, B, C, D) and functional dependency set FD = { AB → CD, B → C }, determine whether R is in 2NF. If not, convert it into 2NF.
Let us calculate the closure of AB:
(AB)+ = ABCD (from the method we studied earlier)
Since the closure of AB contains all the attributes of R, AB is a super key. It is also minimal, because A+ = A and B+ = BC do not cover R. Hence AB is a candidate key (a candidate key is a super key none of whose proper subsets is a super key).
Neither A nor B appears on the right-hand side of any FD, so every key must contain both of them. Any proper superset of AB is therefore a super key but not a candidate key.
Hence there is only one candidate key, AB.
Definition of 2NF: no non-prime attribute should be partially dependent on a candidate key.
R has 4 attributes (A, B, C, D) and the candidate key is AB. Therefore the prime attributes (those that are part of a candidate key) are A and B, while the non-prime attributes are C and D.
a) FD AB → CD has the whole candidate key AB on its left-hand side, so on its own it satisfies the definition of 2NF.
b) FD B → C does not satisfy the definition of 2NF, as the non-prime attribute C depends on B alone, which is only part of the candidate key AB (i.e. the key should not be broken at any cost).
Because of FD B → C, the table R(A, B, C, D) is not in 2NF.
Convert the table R(A, B, C, D) to 2NF:
Since FD B → C kept the table out of 2NF, let's decompose the table on it:
R1(B, C)
The key is AB, and from FD AB → CD we could create R2(A, B, C, D), but this would again contain the partial dependency B → C, so C is left out: R2(A, B, D).
Finally, the decomposed tables, which are in 2NF, are:
a) R1(B, C)
b) R2(A, B, D)
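The search for partial dependencies in this example can be sketched as follows. The function names are illustrative, and the sketch assumes the relation has a single candidate key, so that every attribute outside the key is non-prime (true for both worked examples in this section).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_key, attributes, fds):
    """List (key_part, attribute) pairs where a proper part of the key
    determines a non-prime attribute. Assumes a single candidate key."""
    key = set(candidate_key)
    non_prime = set(attributes) - key
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & non_prime):
                found.append(("".join(part), attr))
    return found


first = partial_dependencies("AB", "ABCD", [("AB", "CD"), ("B", "C")])
print(first)  # [('B', 'C')]

second = partial_dependencies("PQS", "PQRST", [("PQ", "R"), ("S", "T")])
print(("PQ", "R") in second, ("S", "T") in second)  # True True
```

An empty result means the relation has no partial dependency on that key and so meets the 2NF condition.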
Given a relation R(P, Q, R, S, T) and functional dependency set FD = { PQ → R, S → T }, determine whether R is in 2NF. If not, convert it into 2NF.
Let us calculate the closure of PQS:
(PQS)+ = PQSRT (from the method we studied earlier)
Since the closure of PQS contains all the attributes of R, PQS is a super key. None of P, Q and S appears on the right-hand side of any FD, so every key must contain all three. Hence PQS is a candidate key (a candidate key is a super key none of whose proper subsets is a super key), and any proper superset of PQS is a super key but not a candidate key.
Hence there is only one candidate key, PQS.
Definition of 2NF: no non-prime attribute should be partially dependent on a candidate key.
R has 5 attributes (P, Q, R, S, T) and the candidate key is PQS. Therefore the prime attributes (those that are part of a candidate key) are P, Q and S, while the non-prime attributes are R and T.
a) FD PQ → R does not satisfy the definition of 2NF, as the non-prime attribute R depends on PQ, which is only part of the candidate key PQS.
b) FD S → T does not satisfy the definition of 2NF, as the non-prime attribute T depends on S, which is only part of the candidate key PQS (i.e. the key should not be broken at any cost).
Hence, because of FDs PQ → R and S → T, the table R(P, Q, R, S, T) is not in 2NF.
Convert the table R(P, Q, R, S, T) to 2NF:
Since FDs PQ → R and S → T kept the table out of 2NF, let's decompose the table on them:
R1(P, Q, R) (in table R1, FD PQ → R is a full FD, hence R1 is in 2NF)
R2(S, T) (in table R2, FD S → T is a full FD, hence R2 is in 2NF)
And create one table for the key, since the key is PQS:
R3(P, Q, S)
Finally, the decomposed tables, which are in 2NF, are:
a) R1(P, Q, R)
b) R2(S, T)
c) R3(P, Q, S)
Third Normal Form (3NF)

For a table to be in 3NF, there are two requirements:
• The table should be in second normal form.
• No non-prime attribute is transitively dependent on the primary key.

This form dictates that every non-key (non-prime) attribute of a table must depend directly on a candidate key, i.e. there can be no interdependencies among non-key attributes. A prime attribute, however, may still depend on a non-prime attribute.
Example

• Here EmployeeId is the CK, and DepartmentID, which is a non-prime attribute, determines DepartmentName and DepartmentHead.
• So EmployeeId transitively determines DepartmentName and DepartmentHead.
3NF - Decomposition

1. Move all items involved in transitive dependencies to a new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the original entity.
Decomposition in 3NF

Here the problem-causing FDs are separated into their own table.
Example

Consider a table that tracks departments, their locations, and the managers:
Table: Department(DeptID, Location, ManagerID)
Assume the following functional dependencies:
• DeptID -> Location
• DeptID -> ManagerID
• ManagerID -> Location

The given relation is not in 3NF.
• 3NF definition: a relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the candidate keys.
• Identifying the issue:
  • DeptID -> Location and DeptID -> ManagerID are direct functional dependencies, meaning Location and ManagerID are directly determined by DeptID.
  • However, ManagerID -> Location introduces a transitive dependency. DeptID determines ManagerID, and ManagerID determines Location. Therefore, Location is transitively dependent on DeptID.
  • Since Location is a non-key attribute and it is transitively dependent on DeptID (a candidate key), the table violates the 3NF condition.
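The same conclusion can be reached with the usual formal test for 3NF: a non-trivial FD X -> A is acceptable if X is a superkey or A is a prime attribute. The sketch below applies that test to the Department table. The function names are illustrative, and the candidate keys are passed in by hand rather than computed.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """Return (lhs, attribute) pairs that break 3NF."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) == set(attributes)
        for attr in rhs:
            if attr in lhs:  # trivial dependency, always allowed
                continue
            if not lhs_is_superkey and attr not in prime:
                violations.append((tuple(lhs), attr))
    return violations


attributes = ["DeptID", "Location", "ManagerID"]
fds = [
    (["DeptID"], ["Location"]),
    (["DeptID"], ["ManagerID"]),
    (["ManagerID"], ["Location"]),
]
violations = third_nf_violations(attributes, fds, [{"DeptID"}])
print(violations)  # [(('ManagerID',), 'Location')]
```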
Example: Employee Project Supervision
Table: ProjectAssignments
• Attributes: EmpID (Employee ID), ProjectID (Project ID), SupervisorID (ID of the supervisor)
• Primary Key: (EmpID, ProjectID)

Functional Dependencies:
1. EmpID, ProjectID -> SupervisorID (an employee and project combination determines the supervisor)
2. SupervisorID -> EmpID (the supervisor ID determines a specific employee ID)

The first FD is okay for 3NF, because its left-hand side is the primary key.
The second FD has a prime attribute (EmpID) on its right-hand side, therefore it is no problem for 3NF.
Boyce-Codd Normal Form (BCNF)
BCNF (Boyce-Codd Normal Form) is a stricter version of 3NF.
• A table is in BCNF if, for every one of its non-trivial functional dependencies (i.e., dependencies where the right-hand side is not a subset of the left-hand side), the left-hand side is a superkey. A superkey is a set of one or more columns that, taken together, allow you to uniquely identify a row in the table.
• In other words, a table is in BCNF if it is in 3NF and, for every non-trivial FD X -> Y, X is a superkey of the table.
Example

Consider a relation R with attributes (Student, Teacher, Subject).

Student   Teacher    Subject
Jhansi    P.Naresh   Database
Jhansi    K.Das      C
Subbu     P.Naresh   Database
Subbu     R.Prasad   C

FDs:
(Student, Teacher) -> Subject
(Student, Subject) -> Teacher
Teacher -> Subject
Candidate keys are (Student, Teacher) and (Student, Subject).

The above relation is in 3NF [every attribute is prime, so no non-prime attribute is transitively dependent on a key].

A relation R is in BCNF if, for every non-trivial FD X -> Y, X is a superkey.

The above relation is not in BCNF, because in the FD Teacher -> Subject, Teacher is not a key.
Decomposition for BCNF

Teacher -> Subject violates BCNF [since Teacher is not a candidate key].

If X -> Y violates BCNF, then divide R into R1(X, Y) and R2(R - Y).

So R is divided into two relations, R1(Teacher, Subject) and R2(Student, Teacher).

R1
Teacher    Subject
P.Naresh   Database
K.Das      C
R.Prasad   C

R2
Student   Teacher
Jhansi    P.Naresh
Jhansi    K.Das
Subbu     P.Naresh
Subbu     R.Prasad
Example: Consider a table Course_Section(CourseID, Instructor, Textbook) where:
• CourseID -> Instructor
• CourseID -> Textbook
• Instructor -> Textbook

• In this table, CourseID is a candidate key.
• The dependency Instructor -> Textbook violates BCNF because Instructor is not a superkey.
To bring this table into BCNF, we could decompose it into two tables:
• Course_Info(CourseID, Instructor)
• Instructor_Info(Instructor, Textbook)
• This decomposition removes redundancy and each table is in BCNF.
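The BCNF test and one decomposition step, splitting R into R1(X, Y) and R2(R - Y) for a violating FD X -> Y, can be sketched as below for the Course_Section table. The function names are illustrative. The sketch performs a single split and does not recheck the resulting tables, which a full algorithm would do.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left-hand side is not a superkey."""
    all_attrs = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs]


def split_on(attributes, violation):
    """One decomposition step: R1 = X union Y, R2 = R minus (Y minus X)."""
    lhs, rhs = violation
    r1 = set(lhs) | set(rhs)
    r2 = set(attributes) - (set(rhs) - set(lhs))
    return r1, r2


attributes = ["CourseID", "Instructor", "Textbook"]
fds = [
    (["CourseID"], ["Instructor"]),
    (["CourseID"], ["Textbook"]),
    (["Instructor"], ["Textbook"]),
]
bad = bcnf_violations(attributes, fds)
r1, r2 = split_on(attributes, bad[0])
print(bad)                     # [(['Instructor'], ['Textbook'])]
print(sorted(r1), sorted(r2))  # ['Instructor', 'Textbook'] ['CourseID', 'Instructor']
```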
Example: Employee Project Supervision
Table: ProjectAssignments
• Attributes: EmpID (Employee ID), ProjectID (Project ID), SupervisorID (ID of the supervisor)
• Primary Key: (EmpID, ProjectID)

Functional Dependencies:
1. EmpID, ProjectID -> SupervisorID (an employee and project combination determines the supervisor)
2. SupervisorID -> EmpID (the supervisor ID determines a specific employee ID)

For 3NF:
The first FD is okay, because its left-hand side is the primary key.
The second FD has a prime attribute (EmpID) on its right-hand side, therefore it is no problem for 3NF.

But for BCNF:
In the first FD the LHS is a superkey, so there is no problem.
In the second FD the LHS is not a superkey, therefore the table is not in BCNF.
Decomposition – Loss of Information

What is Lossless Decomposition in DBMS?
• The decomposition of a given relation X is known as a lossless decomposition when X decomposes into two relations X1 and X2 in such a way that the natural join of X1 and X2 gives back the original relation X.

What is dependency-preserving in DBMS?
If a decomposition does not cause any dependencies to be lost, it is called a dependency-preserving decomposition.
Lossless decomposition rules

✅ Rule 1: Join Rule (Definition of Lossless Decomposition)
The natural join of the decomposed relations must give back the original relation.
🔹 If a relation R is decomposed into R₁ and R₂, then the decomposition is lossless if:
R₁ ⨝ R₂ = R (without any spurious tuples)
✅ Rule 2: Superkey Rule (Condition to Ensure Lossless Join)
The intersection of the decomposed relations must be a superkey in at least one of them.
🔹 If R is decomposed into R₁ and R₂, and X = R₁ ∩ R₂, then the decomposition is lossless if:
X is a superkey of R₁ or of R₂, i.e. X → R₁ or X → R₂
Example of lossless decomposition

Cand_ID is the CK.
[Figures: decomposition of the table; join of the decomposed tables]
BCNF - Decomposition
EXAMPLE:
R(ABCD); FDs {AB → CD, D → A}
CK = {AB, BD}, PA (prime attributes) = {A, B, D}, NPA (non-prime attributes) = {C}
SOLUTION:
Check for 3NF: it is in 3NF.
Check for BCNF: FD1 is okay, but FD2 does not follow BCNF.
R(ABCD) is decomposed into R1 = {D, A} and R2 = {B, C, D}.
Check for lossless decomposition:
The common attributes of the decomposed tables should be a CK of one of the tables:
Attributes (R1 ∩ R2) = CK of any of R1 or R2 or R
Here it is D, and D is a CK in R1 as D → A. Therefore the decomposition is lossless.
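The superkey rule for a two-way split reduces to one closure computation: take the common attributes and check whether their closure covers R1 or R2. The sketch below does this for the decomposition above; the function names are illustrative and attribute sets are written as strings of single-letter names.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: (R1 intersect R2) must determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


fds = [("AB", "CD"), ("D", "A")]
print(is_lossless("DA", "BCD", fds))  # True: common attribute D determines R1 = {D, A}
print(is_lossless("ABC", "CD", fds))  # False: common attribute C determines nothing else
```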
Check for dependency preserving

R(ABCD) is decomposed into R1 = {D, A} and R2 = {B, C, D}.

Now checking for dependency preservation of the original table:
FD1: AB → CD and FD2: D → A

FD2 can be determined from R1.

FD1 AB → CD cannot be determined directly, therefore:
First find the dependencies in R2.
Take the closure of the attributes in R2(BCD), checked against {AB → CD, D → A}.
Within R2, B+ = B, C+ = C and D+ = D (A is not in R2), while (BD)+ = {B, D, A, C}.

Therefore we can say that BD → C, and it can be an FD of R2.

Now checking for dependency preservation of FD1: AB → CD
Find the closure of AB under the FDs of the decomposed tables: R1 {D → A} and R2 {BD → C}
(AB)+ = {A, B}

Therefore it can be seen that FD1: AB → CD is not preserved.
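This check can be automated without listing the projected FDs by hand. The sketch below uses the standard iterative test: grow the left-hand side using only what each decomposed table can derive on its own attributes, then see whether the right-hand side was reached. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """True if fd can be enforced using only dependencies local to the decomposed tables."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            rel = set(rel)
            # what this table alone can derive from the attributes known so far
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


fds = [("AB", "CD"), ("D", "A")]
tables = ["DA", "BCD"]
print(is_preserved(("D", "A"), tables, fds))    # True
print(is_preserved(("AB", "CD"), tables, fds))  # False
```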


Fourth Normal Form (4NF)
To be in Fourth Normal Form, a relation:
• must first be in Boyce-Codd Normal Form;
• must have no non-trivial multivalued dependencies, other than those whose left-hand side is a superkey.

Multivalued Dependency
A multivalued dependency (MVD) exists when, for a given value of attribute X, there are multiple values of attribute Y and multiple values of attribute Z, and the Y values are independent of the Z values.

X →→ Y
X →→ Z
Here
Student →→ Language and Student →→ Sport
and Language and Sport are independent of each other, therefore the table is not in 4NF.
Decomposition
The decomposed tables do not violate fourth normal form.
That's because:
• each table has only two attributes: (Student, Sport) and (Student, Language)
• there is no third attribute that could show independence
• so, these tables are in 4NF
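A small illustration of why the split loses nothing: when Language and Sport are independent for each student, the three-column table is exactly the join of its two projections. The rows below are hypothetical sample data invented for this sketch, not taken from the slides.

```python
# Hypothetical rows: one student with two languages and two sports.
# Independence forces every language/sport combination to be stored.
rows = {
    ("Asha", "English", "Cricket"),
    ("Asha", "English", "Tennis"),
    ("Asha", "Hindi", "Cricket"),
    ("Asha", "Hindi", "Tennis"),
}

# Project onto (Student, Language) and (Student, Sport).
student_language = {(student, language) for student, language, _ in rows}
student_sport = {(student, sport) for student, _, sport in rows}

# Natural join of the two projections on Student.
rejoined = {
    (student, language, sport)
    for student, language in student_language
    for other, sport in student_sport
    if student == other
}

print(len(rows), len(student_language), len(student_sport))  # 4 2 2
print(rejoined == rows)                                      # True
```

The original table needs four rows to record two languages and two sports, while the two decomposed tables need two rows each.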
Fifth Normal Form (5NF)

• Fifth Normal Form (5NF), also called project-join normal form, is designed to eliminate redundancy caused by "join dependencies". A join dependency exists when a table can be rebuilt without loss by joining several of its projections, even if the table is already in 4NF. A table is in 5NF when it cannot be broken down further without losing information, i.e. when all of its join dependencies are implied by the candidate keys.
Domain Key Normal Form (DKNF)

DKNF, or Domain-Key Normal Form, is a database normalization principle that requires every constraint in a relational database schema to follow from key constraints and domain constraints, eliminating redundancy and preventing anomalies.
Types of Constraints Allowed in DKNF:
1. Domain Constraints
   • Define valid values for attributes
   • Example: Age must be between 0 and 120
2. Key Constraints
   • Uniqueness based on candidate keys or primary key
   • Example: EmployeeID must uniquely identify each record
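The two example constraints can be expressed as simple checks over a list of records. This is only a sketch: the function name and the record layout are hypothetical, and in a real database these rules would normally be declared in the schema rather than checked in application code.

```python
def constraint_violations(records):
    """Check a domain constraint on Age and a key constraint on EmployeeID."""
    problems = []
    seen_ids = set()
    for rec in records:
        # domain constraint: Age must be between 0 and 120
        if not 0 <= rec["Age"] <= 120:
            problems.append(f"domain: Age {rec['Age']} out of range")
        # key constraint: EmployeeID must be unique
        if rec["EmployeeID"] in seen_ids:
            problems.append(f"key: duplicate EmployeeID {rec['EmployeeID']}")
        seen_ids.add(rec["EmployeeID"])
    return problems


staff = [
    {"EmployeeID": 1, "Age": 34},
    {"EmployeeID": 2, "Age": 130},
    {"EmployeeID": 1, "Age": 45},
]
problems = constraint_violations(staff)
print(problems)
```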
