
7. Normalization

The document outlines a lecture on Database Management Systems, focusing on normalization, its importance, and the various normal forms (1NF, 2NF, 3NF, BCNF). It discusses data redundancy, update anomalies, and functional dependencies, emphasizing the need for a relational database to minimize redundancy and ensure accurate data representation. The lecture concludes with a summary of key concepts learned, including the necessity for databases to be at least in 3NF.

Uploaded by

araf32277
Copyright
© © All Rights Reserved

Database Management System (DBMS)

Course Code: CSI 223


Semester: Summer '06

Lecturer:
Kamruddin Md. Nur
Dept. of Computer Science
Stamford University Bangladesh

L-7: Normalization

Dr. Kamruddin Nur
Associate Professor, Computer Science
[email protected]

Office: Siddeswari Block A, Room # 216
Ph: (02) 8355512-14 x-214
Cell: 0194 06 49 20
Email: [email protected]
Blogs: http://moonbd.blogspot.com

Lecture Content

● Normalization
● Why Normalization
● Normal Forms

Reading: Chapter - 13 Text Book

© Kamruddin Nur 2
Recap
● The main objective of a relational database is to create an accurate
representation of data, the relationships between data, and constraints.

● To achieve this objective, we must identify a suitable set of relations.

● Normalization is a technique that helps identify such (accurate) relations.

Normalization
“A technique to produce / design a set of relations that is optimal
from the point of view of database updating.”

● A series of tests on a relation to determine whether it satisfies or
violates the requirements of a given normal form.

● Three normal forms were initially proposed by E. F. Codd (1972):
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)

● Subsequently, R. Boyce and E. F. Codd (Codd, 1974) introduced a
stronger definition of 3NF, called Boyce-Codd Normal Form (BCNF).

● Later, 4NF and 5NF were introduced by Fagin (1977, 1979).

Normalization
✔ A formal method that identifies relations based on their primary or
candidate keys and the functional dependencies among their attributes.

✔ A series of tests that can be applied to individual relations so that
a relational schema can be normalized to a specific form to prevent
possible update anomalies.

✔ Update anomalies are insertion, deletion, or modification anomalies.

Data Redundancy
● A major aim of relational database design is to group attributes
into relations so as to minimize data redundancy and reduce the file
storage space required by the base relations.
● The problems associated with data redundancy are illustrated by
comparing the following Staff and Branch relations with the
StaffBranch relation.

[Figure: Data Redundancy]

Data Redundancy
● The StaffBranch relation has redundant data: the details of a branch are
repeated for every member of staff.

● In contrast, branch information appears only once for each branch
in the Branch relation, and only branchNo is repeated in the Staff relation,
to represent where each member of staff works.
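
The contrast described above can be sketched in a few lines of Python. This is only an illustration: staffNo and branchNo are the attribute names used in the slides, while the other column names (`name`, `address`) and every row value below are assumptions made up for the example.

```python
# Illustrative StaffBranch rows. Column names other than staffNo/branchNo,
# and all of the values, are assumptions for this sketch.
staff_branch = [
    {"staffNo": "S1", "name": "Ann", "branchNo": "B003", "address": "163 Main St, Glasgow"},
    {"staffNo": "S2", "name": "Bob", "branchNo": "B003", "address": "163 Main St, Glasgow"},
    {"staffNo": "S3", "name": "Cat", "branchNo": "B003", "address": "163 Main St, Glasgow"},
    {"staffNo": "S4", "name": "Dan", "branchNo": "B007", "address": "16 Argyll St, Aberdeen"},
]


def decompose(rows):
    """Split StaffBranch rows into a Staff list and a Branch mapping."""
    staff = [
        {"staffNo": r["staffNo"], "name": r["name"], "branchNo": r["branchNo"]}
        for r in rows
    ]
    # Keyed by branchNo, so each branch's details are stored exactly once.
    branch = {r["branchNo"]: {"address": r["address"]} for r in rows}
    return staff, branch


staff, branch = decompose(staff_branch)
print(len(staff_branch), "StaffBranch rows carry an address;",
      len(branch), "Branch rows are enough after the split")
```

After the split, only branchNo is repeated in the staff rows, which is the situation the second bullet describes.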

Update Anomalies
● Relations that contain redundant information may suffer from
update anomalies.

● Types of update anomalies include:
✔ Insertion
✔ Deletion
✔ Modification

[Figure: Data Redundancy]

Insertion Anomalies
✔ A new member of staff joins branch B005
● Insert a new row into the StaffBranch table
● The branch address is mistyped as 163 Main St, Glasgow
● The database is now inconsistent!

✔ A new branch is established with no members of staff
● B008, 57 Princes St, Edinburgh
● There are no staff members, so staffNo must be NULL
● But staffNo is the primary key of the StaffBranch table, so it
cannot be NULL!

Deletion Anomalies
✔ Mary Howe, staffNo SA9, leaves the company
● Delete the appropriate row of StaffBranch
● This also deletes the details of branch B007, where Mary Howe works
● But no one else works at branch B007, so we no longer
know the address of this branch!

Modification Anomalies
✔ Branch B003 has moved to a new location
● The new address is 145 Main St, Glasgow
● Three rows of the StaffBranch relation must be changed, one for each
member of staff at B003
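
A small Python sketch can make the deletion and modification anomalies concrete. The staff numbers other than SA9 and the B007 address are assumptions invented for the illustration, and the tuple layout `(staffNo, branchNo, address)` is likewise just a convenient stand-in for the StaffBranch relation.

```python
# Stand-in for StaffBranch as (staffNo, branchNo, address) tuples.
# S1..S3 and the B007 address are assumed values for this sketch.
staff_branch = [
    ("S1", "B003", "163 Main St, Glasgow"),
    ("S2", "B003", "163 Main St, Glasgow"),
    ("S3", "B003", "163 Main St, Glasgow"),
    ("SA9", "B007", "16 Argyll St, Aberdeen"),
]


def move_branch(rows, branch_no, new_address):
    """Return the updated rows and how many rows had to be rewritten."""
    updated = [(s, b, new_address if b == branch_no else a) for s, b, a in rows]
    changed = sum(1 for _, b, _ in rows if b == branch_no)
    return updated, changed


def delete_staff(rows, staff_no):
    """Remove the row for one member of staff."""
    return [r for r in rows if r[0] != staff_no]


def known_branches(rows):
    """Branches whose details are still recorded somewhere in the rows."""
    return {b for _, b, _ in rows}


# Modification anomaly: one real-world change touches several rows.
updated, changed = move_branch(staff_branch, "B003", "145 Main St, Glasgow")
print("rows changed:", changed)

# Deletion anomaly: removing SA9 also removes all knowledge of B007.
remaining = delete_staff(staff_branch, "SA9")
print("B007 still known:", "B007" in known_branches(remaining))
```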

Functional Dependency
● The main concept associated with normalization
● Describes the relationships between attributes in a relation
● If A and B are attributes of relation R, and each value of A in R is
associated with exactly one value of B in R, then A → B

● The left-hand side of a functional dependency is called the determinant.
Here, A is the determinant.
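
The definition can be turned into a simple check over the rows of a relation. The helper name `fd_holds` and the sample rows are assumptions for this sketch. Note also that a functional dependency is a property of the schema: a check like this can show that sample data violates a dependency, but passing it does not prove the dependency holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by these rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[b] for b in rhs)
        # Each determinant value may be paired with only one dependent value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Assumed sample rows for illustration.
rows = [
    {"staffNo": "S1", "branchNo": "B003", "address": "163 Main St, Glasgow"},
    {"staffNo": "S2", "branchNo": "B003", "address": "163 Main St, Glasgow"},
    {"staffNo": "S3", "branchNo": "B007", "address": "16 Argyll St, Aberdeen"},
]

print(fd_holds(rows, ["branchNo"], ["address"]))  # branchNo -> address
print(fd_holds(rows, ["branchNo"], ["staffNo"]))  # branchNo -> staffNo
```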

Functional Dependency cont.
Let A, B, and C be subsets of the attributes of relation R.
Armstrong’s axioms are as follows:
1. Reflexivity
If B is a subset of A, then A → B

2. Augmentation
If A → B, then A,C → B,C

3. Transitivity
If A → B and B → C, then A → C
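
As a worked example of how the axioms combine, the commonly used union rule (if A → B and A → C, then A → B,C) can be derived from them. The derivation below is a sketch added for illustration; it is not taken from the slides.

```latex
\begin{align*}
&1.\ A \to B       && \text{(given)} \\
&2.\ A \to C       && \text{(given)} \\
&3.\ A \to A,B     && \text{(augment 1 with } A \text{, since } A,A = A\text{)} \\
&4.\ A,B \to B,C   && \text{(augment 2 with } B\text{)} \\
&5.\ A \to B,C     && \text{(transitivity on 3 and 4)}
\end{align*}
```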

[Figure: Example: Functional Dependency]

Identifying Candidate Keys
✔ A candidate key is an attribute, or set of attributes, that uniquely
identifies a row
- It must be irreducible
- No part of a candidate key can ever be NULL

✔ An attribute A that functionally determines every other attribute
of the relation is a candidate key
- For each value of A there is exactly one value of each of the
other attributes
- So each value of A must identify a single row
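
One way to sketch this idea in code is to compute the closure of an attribute set (everything it functionally determines) and keep the smallest sets whose closure is the whole relation. The function names, the brute-force search, and the sample dependencies are all assumptions for illustration; the search is exponential and only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Irreducible attribute sets that determine every attribute."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is reducible
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


# Assumed dependencies for a StaffBranch-like relation.
attributes = {"staffNo", "name", "branchNo", "address"}
fds = [
    ({"staffNo"}, {"name", "branchNo"}),
    ({"branchNo"}, {"address"}),
]
print(candidate_keys(attributes, fds))
```

Trying smaller sets first and skipping any set that contains an already found key is what enforces the "irreducible" requirement in this sketch.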

Identifying Primary Keys
✔ A primary key is a candidate key chosen to identify rows
uniquely within a table
- The other candidate keys are called alternate keys

✔ Some guidelines on choosing the primary key
- Pick the candidate key with the fewest attributes
- Pick the candidate key with the shortest length
- Pick the candidate key that makes the most sense

Why Normalization?
✔ The main objective of a relational database is to create an accurate
representation of data, its relationships and constraints.

✔ To achieve the above objective, we must identify a suitable set of
relations.

✔ The normalization process helps identify such relations.

1st Normal Form
A relation in which the intersection of each row and column contains
one and only one value.

How to achieve:
➔ If required, break the table into separate entity tables to minimize
redundant data (and so update anomalies).
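
As a rough sketch of the "one and only one value" requirement, the code below flattens a multi-valued column into one row per value, which is one common way to reach 1NF. The column names (Module, Dept, Lecturer, Texts) and the sample values are assumptions for the illustration, as is the helper name `to_1nf`.

```python
# Unnormalized rows: the "Texts" cell holds several values at once.
# Column names and values are assumed for this sketch.
unnormalized = [
    {"Module": "M1", "Dept": "D1", "Lecturer": "L1", "Texts": ["T1", "T2"]},
    {"Module": "M3", "Dept": "D1", "Lecturer": "L2", "Texts": ["T4"]},
]


def to_1nf(rows, group_col, value_col):
    """Emit one row per value of the repeating column group_col."""
    flat = []
    for row in rows:
        fixed = {k: v for k, v in row.items() if k != group_col}
        for value in row[group_col]:
            # Every cell in the new row now holds a single value.
            flat.append({**fixed, value_col: value})
    return flat


first_nf = to_1nf(unnormalized, "Texts", "Text")
for row in first_nf:
    print(row)
```

In this sketch a row whose `Texts` list is empty produces no output rows at all, and the non-repeating columns are duplicated for every value.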

[Figure: 0NF (unnormalized form) to 1NF]

Problems with 1NF
INSERT anomalies:
Cannot add a module with no texts

UPDATE anomalies:
To change the lecturer for M1, we have to change two rows

DELETE anomalies:
If we remove M3, we remove L2 as well

2nd Normal Form
A relation that is in 1st Normal Form and in which every non-primary-key
attribute is fully functionally dependent on the primary key.

Full Functional Dependency:
If A and B are attributes of a relation, B is fully functionally
dependent on A if B is functionally dependent on A, but not on any
proper subset of A.

How To Achieve:
➔ Break the relation into tables by removing the non-primary-key
attributes that depend on only part of the primary key, along with a
copy of the part of the primary key on which they are fully functionally
dependent.
➔ In other words, make every non-primary-key attribute fully functionally
dependent on the primary key of its table.

Finding Functional Dependencies (FD)
The primary key is {Module, Text}, so
{Module, Text} → {Dept, Lecturer}
{Module} → {Dept}
{Module} → {Lecturer}
{Lecturer} → {Dept}

But also,
{Module} → {Dept, Lecturer}

So Lecturer and Dept are partially dependent on the primary key!
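
Because Dept and Lecturer depend on Module alone, one way to sketch the move to 2NF is to project the 1NF relation onto two smaller tables: one keyed by Module, and one holding the {Module, Text} pairs. The sample rows and the names `project`, `module_info`, and `module_text` are assumptions for this illustration.

```python
# Assumed 1NF rows with primary key {Module, Text}.
first_nf = [
    {"Module": "M1", "Dept": "D1", "Lecturer": "L1", "Text": "T1"},
    {"Module": "M1", "Dept": "D1", "Lecturer": "L1", "Text": "T2"},
    {"Module": "M3", "Dept": "D1", "Lecturer": "L2", "Text": "T4"},
]


def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples."""
    result = []
    for row in rows:
        projected = tuple(row[c] for c in columns)
        if projected not in result:
            result.append(projected)
    return result


# Attributes that depend on Module alone move out with a copy of Module.
module_info = project(first_nf, ["Module", "Dept", "Lecturer"])
# What remains is the full key {Module, Text}.
module_text = project(first_nf, ["Module", "Text"])

print(module_info)
print(module_text)
```

In `module_info` each module appears once, so changing the lecturer for M1 now means changing a single row.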

[Figure: 1NF to 2NF]

Problems Resolved in 2NF
Problems in 1NF:

INSERT anomalies:
Cannot add a module with no texts

UPDATE anomalies:
To change the lecturer for M1, we have to change two rows

DELETE anomalies:
If we remove M3, we remove L2 as well

In 2NF:
The first two problems (INSERT and UPDATE) are resolved, but not DELETE.

3rd Normal Form
A relation that is in 1st and 2nd Normal Form, and in which no
non-primary-key attribute is transitively dependent on the primary key.

Transitive Dependency:
If A → B and B → C, then C is transitively dependent on A via B
(A → C), provided that B ↛ A and C ↛ A.

How To Achieve:
➔ In the above transitive dependency, A is not functionally dependent
on either B or C.
➔ This means C depends on A only through B, so C does not belong in
the relation whose key is A.
➔ So create a separate table with B and C, and keep B in the original
relation.

2NF Not in 3NF
2NFa is not in 3NF, because
{Module} → {Lecturer}
{Lecturer} → {Dept}

So there is a transitive FD from the primary key {Module} to {Dept}.
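
The chain from {Module} through {Lecturer} to {Dept} can be found mechanically. The sketch below is a simplification: it handles only single-attribute dependencies and consults only the dependencies listed explicitly, without deriving implied ones. The function name `transitive_dependencies` is an assumption for the illustration.

```python
def transitive_dependencies(key, fds):
    """List (key, via, target) chains key -> via -> target among fds.

    fds is a set of (determinant, dependent) pairs of attribute names.
    A chain is reported only if neither via nor target determines the key.
    """
    chains = []
    for a, b in sorted(fds):
        if a != key or (b, key) in fds:
            continue  # need key -> b, with b not determining the key
        for b2, c in sorted(fds):
            if b2 == b and c != key and (c, key) not in fds:
                chains.append((key, b, c))
    return chains


fds = {("Module", "Lecturer"), ("Lecturer", "Dept"), ("Module", "Dept")}
for key, via, target in transitive_dependencies("Module", fds):
    print(f"{target} depends on {key} via {via}")
```

Each reported chain suggests a new table holding the `via` and `target` attributes, with `via` kept in the original table.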

[Figure: 2NF to 3NF]

Summary

From this lecture we have learned the details of:

● Normalization
● Data Redundancy
● Functional Dependencies
● Insert, Update, Delete Anomalies
● 1NF, 2NF and 3NF

A database should be in at least 3NF.
