
Normalization Notes

Normalization is the process of efficiently organizing data in a database. It eliminates redundant data - for example, storing the same data in more than one table. It also ensures data dependencies make sense.

Uploaded by

mingichi
Copyright
© Attribution Non-Commercial (BY-NC)

Normalization

• Normalization is the process of efficiently organizing data in a database with two goals in mind
• First goal: eliminate redundant data
– for example, storing the same data in more than one table
• Second goal: ensure data dependencies make sense
– for example, only storing related data in a table

Benefits of Normalization

• Less storage space
• Quicker updates
• Less data inconsistency
• Clearer data relationships
• Easier to add data
• Flexible structure

Rules of Data Normalization

• 1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key

• 2NF Eliminate Redundant Data - If an attribute depends on only part of a multi-field (composite) key, remove it to a separate table

• 3NF Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table

• BCNF Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables

• 4NF Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related

• 5NF Isolate Semantically Related Multiple Relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships
EMPLOYEES_PROJECTS_TIME

A table with fields containing too much data.

EmployeeID  Name              Project                          Time              Project Title
EN1-26      Sean O'Brien      30-452-T3, 30-457-T3, 32-244-T3  0.25, 0.40, 0.30  STAR manual, ISO procedures, Employee handbook
EN1-33      Amy Guya          30-452-T3, 30-382-TC, 32-244-T3  0.05, 0.35, 0.60  STAR manual, Web Site, New catalog
EN1-35      Steven Baranco    30-452-T3, 31-238-TC             0.15, 0.80        STAR manual, STAR prototype
EN1-36      Elizabeth Roslyn  35-152-TC                        0.90              STAR pricing
EN1-38      Carol Schaaf      36-272-TC                        0.75              Order system
EN1-40      Alexandra Wing    31-238-TC, 31-241-TC             0.20, 0.70        STAR prototype, New catalog

First Normal Form


A table is in first normal form (1NF) if there are no repeating groups.
A repeating group is a set of logically related fields or values that occur multiple times in one record.
• Eliminate the repeating groups in the individual tables
• Create a separate table for each set of related data
• Identify each set of related data with a primary key

A table with repeating groups of fields.

EmployeeID  Last Name  First Name  Project1   Time1  Project2   Time2  Project3   Time3
EN1-26      O'Brien    Sean        30-452-T3  0.25   30-457-T3  0.40   32-244-T3  0.30
EN1-33      Guya       Amy         30-452-T3  0.05   30-382-TC  0.35   32-244-T3  0.60
EN1-35      Baranco    Steven      30-452-T3  0.15   31-238-TC  0.80
EN1-36      Roslyn     Elizabeth   35-152-TC  0.90
EN1-38      Schaaf     Carol       36-272-TC  0.75
EN1-40      Wing       Alexandra   31-238-TC  0.20   31-241-TC  0.70

*EmployeeID  LastName  FirstName  *ProjectNumber  ProjectTitle
EN1-26       O'Brien   Sean       30-452-T3       STAR manual
EN1-26       O'Brien   Sean       30-457-T3       ISO procedures
EN1-26       O'Brien   Sean       31-124-T3       Employee handbook
EN1-33       Guya      Amy        30-452-T3       STAR manual
EN1-33       Guya      Amy        30-482-TC       Web Site
EN1-33       Guya      Amy        31-241-TC       New catalog
EN1-35       Baranco   Steven     30-452-T3       STAR manual
EN1-35       Baranco   Steven     31-238-TC       STAR prototype
EN1-36       Roslyn    Elizabeth  35-152-TC       STAR pricing
EN1-38       Schaaf    Carol      36-272-TC       Order system
EN1-40       Wing      Alexandra  31-238-TC       STAR prototype
EN1-40       Wing      Alexandra  31-241-TC       New catalog
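
The move from repeating groups to one row per employee and project can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name to_first_normal_form and the tuple layout are hypothetical, only three of the sample rows are copied in, and the sketch carries the Time value along instead of the project title.

```python
# Rows in the repeating-group layout:
# (EmployeeID, LastName, FirstName, Project1, Time1, Project2, Time2, Project3, Time3)
# Unused project slots are represented as None (an assumption of this sketch).
wide_rows = [
    ("EN1-26", "O'Brien", "Sean", "30-452-T3", 0.25, "30-457-T3", 0.40, "32-244-T3", 0.30),
    ("EN1-35", "Baranco", "Steven", "30-452-T3", 0.15, "31-238-TC", 0.80, None, None),
    ("EN1-36", "Roslyn", "Elizabeth", "35-152-TC", 0.90, None, None, None, None),
]


def to_first_normal_form(rows):
    """Turn every filled (ProjectN, TimeN) pair into a row of its own."""
    flat = []
    for emp_id, last, first, *pairs in rows:
        # pairs is [Project1, Time1, Project2, Time2, ...]; walk it two at a time.
        for project, time in zip(pairs[0::2], pairs[1::2]):
            if project is not None:
                flat.append((emp_id, last, first, project, time))
    return flat


flat_rows = to_first_normal_form(wide_rows)
for row in flat_rows:
    print(row)
```

After the conversion, the pair (EmployeeID, project number) identifies each row, which matches the starred key columns in the table above.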

Second Normal Form (2NF)

A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the whole primary key, not on just part of it.

Take the table above and design new tables that will eliminate the repeated data in the non-key fields.

1. To decide what fields belong together in a table, think about which field determines the values in other fields. Create a table for those fields and enter the sample data.
2. Think about what the primary key for each table would be and about the relationship between the tables. If necessary, add foreign keys.
3. Mark the primary key for each table and make sure that you don't have repeating data in non-key fields.

EMPLOYEES

*EmployeeID  Last Name  First Name
EN1-26       O'Brien    Sean
EN1-33       Guya       Amy
EN1-35       Baranco    Steven
EN1-36       Roslyn     Elizabeth
EN1-38       Schaaf     Carol
EN1-40       Wing       Alexandra

EMPLOYEES_PROJECTS

*EmployeeID  *ProjectNum
EN1-26       30-452-T3
EN1-26       30-457-T3
EN1-26       31-124-T3
EN1-33       30-328-TC
EN1-33       30-452-T3
EN1-33       32-244-T3
EN1-35       30-452-T3
EN1-35       31-238-TC
EN1-36       35-152-TC
EN1-38       36-272-TC
EN1-40       31-238-TC
EN1-40       31-241-TC

PROJECTS

*ProjectNum  ProjectTitle
30-452-T3    STAR manual
30-457-T3    ISO procedures
30-482-TC    Web site
31-124-T3    Employee handbook
31-238-TC    STAR prototype
31-241-TC    New catalog
35-152-TC    STAR pricing
36-272-TC    Order system
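
The question of which field determines which can be tested against sample data. The Python sketch below is a minimal illustration; the helper name determines and the dictionary row layout are assumptions made for this example, and only a few sample rows are copied in. Note that sample data can only disprove a dependency; whether it really holds is a business rule.

```python
def determines(rows, lhs, rhs):
    """Return True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows of the 1NF table, keyed by (EmployeeID, ProjectNum).
rows = [
    {"EmployeeID": "EN1-26", "LastName": "O'Brien", "ProjectNum": "30-452-T3", "ProjectTitle": "STAR manual"},
    {"EmployeeID": "EN1-26", "LastName": "O'Brien", "ProjectNum": "30-457-T3", "ProjectTitle": "ISO procedures"},
    {"EmployeeID": "EN1-33", "LastName": "Guya", "ProjectNum": "30-452-T3", "ProjectTitle": "STAR manual"},
    {"EmployeeID": "EN1-35", "LastName": "Baranco", "ProjectNum": "31-238-TC", "ProjectTitle": "STAR prototype"},
]

# Each non-key field depends on only one part of the composite key.
print(determines(rows, ["EmployeeID"], ["LastName"]))
print(determines(rows, ["ProjectNum"], ["ProjectTitle"]))
print(determines(rows, ["EmployeeID"], ["ProjectTitle"]))

# Splitting along those dependencies gives the three 2NF tables.
employees = {(r["EmployeeID"], r["LastName"]) for r in rows}
projects = {(r["ProjectNum"], r["ProjectTitle"]) for r in rows}
employees_projects = {(r["EmployeeID"], r["ProjectNum"]) for r in rows}
```

Because LastName depends on EmployeeID alone and ProjectTitle on ProjectNum alone, each moves to a table keyed by that field, and the composite key remains only in the linking table.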

Third Normal Form (3NF)

A relation is in 3NF if it is in second normal form (2NF) and there are no transitive dependencies.

A transitive dependency is a type of functional dependency in which the value in a non-key field is determined by the value in another non-key field and that field is not a candidate key.

A table with a single field primary key and repeating values in non-key fields.

*ProjectNum  ProjectTitle       ProjectMgr  Phone
30-452-T3    STAR manual        Garrison    2756
30-457-T3    ISO procedures     Jacanda     2954
30-482-TC    Web site           Friedman    2846
31-124-T3    Employee handbook  Jones       3102
31-238-TC    STAR prototype     Garrison    2756
31-241-TC    New catalog        Jones       3102
35-152-TC    STAR pricing       Vance       3022
36-272-TC    Order system       Jacanda     2954

The phone number is repeated each time a manager name is repeated. It's dependent on the
manager, which is dependent on the project number (a transitive dependency).

Take the table above and create new tables to fix the problem:

1. Think about which fields belong together and create new tables to hold them.
2. Enter the sample data and check for unnecessarily repeated values (repeated values in fields that are not part of the primary key).
3. Identify the primary key for each table and, if necessary, add foreign keys.

PROJECTS

*ProjectNum  ProjectTitle       ProjectMgr
30-452-T3    STAR manual        Garrison
30-457-T3    ISO procedures     Jacanda
30-482-TC    Web site           Friedman
31-124-T3    Employee handbook  Jones
31-238-TC    STAR prototype     Garrison
31-241-TC    New catalog        Jones
35-152-TC    STAR pricing       Vance
36-272-TC    Order system       Jacanda

MANAGERS

*ProjectMgr  Phone
Friedman     2846
Garrison     2756
Jacanda      2954
Jones        3102
Vance        3022
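
One way to see the benefit of this split is to build the two tables in an in-memory SQLite database from Python. The sketch below is an illustration only: the lowercase table and column names are choices made for this example, only some sample rows are loaded, and the new phone number 2999 is invented to show a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE managers (
        project_mgr TEXT PRIMARY KEY,
        phone       TEXT NOT NULL
    );
    CREATE TABLE projects (
        project_num   TEXT PRIMARY KEY,
        project_title TEXT NOT NULL,
        project_mgr   TEXT NOT NULL REFERENCES managers(project_mgr)
    );
""")
conn.executemany(
    "INSERT INTO managers VALUES (?, ?)",
    [("Garrison", "2756"), ("Jacanda", "2954"), ("Jones", "3102")],
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [
        ("30-452-T3", "STAR manual", "Garrison"),
        ("30-457-T3", "ISO procedures", "Jacanda"),
        ("31-238-TC", "STAR prototype", "Garrison"),
        ("31-241-TC", "New catalog", "Jones"),
    ],
)

# The phone number now lives in exactly one row, so one UPDATE is enough
# (2999 is a made-up value for this sketch).
conn.execute("UPDATE managers SET phone = ? WHERE project_mgr = ?", ("2999", "Garrison"))

# A join rebuilds the original four-column view whenever it is needed.
rows = conn.execute("""
    SELECT p.project_num, p.project_title, p.project_mgr, m.phone
    FROM projects AS p
    JOIN managers AS m ON m.project_mgr = p.project_mgr
    ORDER BY p.project_num
""").fetchall()
for row in rows:
    print(row)
```

In the single-table version the same change would have to touch every project row managed by Garrison, which is where inconsistent phone numbers creep in.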

Boyce-Codd Normal Form

A table is in Boyce-Codd normal form (BCNF) if it is in third normal form (3NF) and all determinants are candidate keys.

BCNF can be thought of as a "new" third normal form. It was introduced to cover situations that the "old" third normal form did not address. Keep in mind the meaning of a determinant (a field or set of fields that determines the value in another field) and of candidate keys (fields that qualify for designation as primary key). This normal form applies to situations where you have overlapping candidate keys.

If a table has no non-key fields, it is automatically in 3NF, but it can still violate BCNF when a field that is only part of a candidate key determines another field.

Formally, a table is in Boyce-Codd normal form if and only if, for every one of its non-trivial functional dependencies X → Y, X is a superkey, that is, X is either a candidate key or a superset thereof.

Imagine that you were designing a table for a college to hold information about courses,
students, and teaching assistants. You have the following business rules.

• Each course can have many students.
• Each student can take many courses.
• Each course can have multiple teaching assistants (TAs).
• Each TA is associated with only one course.
• For each course, each student has one TA.

COURSES_STUDENTS_TA's

CourseNum  Student  TA
ENG101     Jones    Clark
ENG101     Grayson  Chen
ENG101     Samara   Chen
MAT350     Grayson  Powers
MAT350     Jones    O'Shea
MAT350     Berg     Powers

To uniquely identify each record, you could choose CourseNum + Student as a primary key, because the combination of CourseNum and Student determines the value in TA. Another candidate key would be Student + TA, because the TA determines the course. In this case, you have overlapping candidate keys (Student is in both). Every field belongs to some candidate key, so the table satisfies third normal form. However, CourseNum depends on the value in TA alone (see the business rules), and TA by itself is not a candidate key. This is the situation that Boyce-Codd normal form addresses: TA is a determinant that is not a candidate key, so the table is not in BCNF.

If you wanted to assign a TA to a course before any students enrolled, you couldn't because Student is part of the primary key. Also, if the name of a TA changed, you would have to update it in multiple records.

If you assume you have just these fields, this data would be better stored in three tables: one with CourseNum and Student, another with Student and TA, and a third with CourseNum and TA.
COURSES

*CourseNum  *Student
ENG101      Jones
ENG101      Grayson
ENG101      Samara
MAT350      Grayson
MAT350      Jones
MAT350      Berg

STUDENTS

*Student  *TA
Jones     Clark
Grayson   Chen
Samara    Chen
Grayson   Powers
Jones     O'Shea
Berg      Powers

TA's

*CourseNum  *TA
ENG101      Clark
ENG101      Chen
MAT350      O'Shea
MAT350      Powers
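
The BCNF test can be made mechanical with the standard attribute-closure algorithm. The Python sketch below is an illustration under stated assumptions: the two functional dependencies are this example's reading of the business rules ("for each course, each student has one TA" and "each TA is associated with only one course"), and the function names closure and bcnf_violations are hypothetical.

```python
ATTRS = frozenset({"CourseNum", "Student", "TA"})

# Functional dependencies as (determinant, dependent) pairs, read off the business rules.
FDS = [
    (frozenset({"CourseNum", "Student"}), frozenset({"TA"})),
    (frozenset({"TA"}), frozenset({"CourseNum"})),
]


def closure(attrs, fds):
    """Return every attribute that the given attributes determine under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(attrs, fds):
    """Return the non-trivial dependencies whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attrs
    ]


violations = bcnf_violations(ATTRS, FDS)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Under these assumptions only the dependency from TA to CourseNum is reported, since TA alone does not determine Student and so cannot be a superkey.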

Fourth Normal Form (4NF)

A table is in fourth normal form (4NF) if it is in Boyce-Codd normal form (BCNF) and there are no multi-valued dependencies.

A multi-valued dependency occurs when, for each value in field A, there is a set of values for field B and a set of values for field C, but fields B and C are not related to each other.

In other words, a multi-valued dependency occurs when the table contains fields that are not logically related. An often used example is the following table:

MOVIES

*Movie            *Star            *Producer
Once Upon a Time  Julie Garland    Alfred Brown
Once Upon a Time  Mickey Rooney    Alfred Brown
Once Upon a Time  Julie Garland    Muriel Humphreys
Once Upon a Time  Mickey Rooney    Muriel Humphreys
Moonlight         Humphrey Bogart  Alfred Brown
Moonlight         Julie Garland    Alfred Brown

A movie can have more than one star and more than one producer. A star can be in more than one movie. A producer can produce more than one movie. The primary key would have to include all three fields, and so this table would be in BCNF. But you have unnecessarily repeated values, with the data maintenance problems that causes, and you would have trouble with deletion anomalies.

The Star and the Producer really aren't logically related. The Movie determines a set of Stars, and the Movie independently determines a set of Producers. The answer is to have a separate table for each of those logical relationships - one holding Movie and Star and the other with Movie and Producer, as shown below:
STARS

*Movie            *Star
Once Upon a Time  Julie Garland
Once Upon a Time  Mickey Rooney
Moonlight         Humphrey Bogart
Moonlight         Julie Garland

PRODUCERS

*Movie            *Producer
Once Upon a Time  Alfred Brown
Once Upon a Time  Muriel Humphreys
Moonlight         Alfred Brown
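
That the split loses nothing can be checked by projecting the MOVIES rows onto the two smaller tables and joining them back on Movie. The following Python sketch uses plain sets of tuples; the variable names are choices made for this illustration.

```python
# The six rows of the MOVIES table as (Movie, Star, Producer).
movies = {
    ("Once Upon a Time", "Julie Garland", "Alfred Brown"),
    ("Once Upon a Time", "Mickey Rooney", "Alfred Brown"),
    ("Once Upon a Time", "Julie Garland", "Muriel Humphreys"),
    ("Once Upon a Time", "Mickey Rooney", "Muriel Humphreys"),
    ("Moonlight", "Humphrey Bogart", "Alfred Brown"),
    ("Moonlight", "Julie Garland", "Alfred Brown"),
}

# Projections: keep two of the three fields and let the set drop duplicates.
stars = {(movie, star) for movie, star, _ in movies}
producers = {(movie, producer) for movie, _, producer in movies}

# Natural join of the two projections on Movie.
rejoined = {
    (movie, star, producer)
    for movie, star in stars
    for other_movie, producer in producers
    if movie == other_movie
}

print(len(stars), len(producers), len(movies))
print(rejoined == movies)
```

Seven rows across the two small tables carry the same information as the six three-field rows, and adding a third producer to a movie now means one new row instead of one per star.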

Fifth Normal Form (5NF)

A table is in fifth normal form (5NF) if it is in fourth normal form (4NF) and there are no cyclic dependencies.

A cyclic dependency can occur only when you have a multi-field primary key consisting of three or more fields. For example, let's say your primary key consists of fields A, B, and C. A cyclic dependency would arise if the values in those fields were related in pairs of A and B, B and C, and A and C.

Fifth normal form is also called projection-join normal form. A projection is a new table holding a subset of fields from an original table. When properly formed projections are joined, they must result in the same set of data that was contained in the original table.

BUYING

A table with cyclic dependencies.

*Buyer  *Product  *Company
Chris   jeans     Levi
Chris   jeans     Wrangler
Chris   shirts    Levi
Lori    jeans     Levi


The primary key consists of all three fields. One data maintenance problem that occurs is that
you need to add a record for every buyer who buys a product for every company that makes
that product or they can't buy from them. That may not appear to be a big deal in this sample
of 2 buyers, 2 products, and 2 companies (2 X 2 X 2 = 8 total records). But what if you went
to 20 buyers, 50 products, and 100 companies (20 X 50 X 100 = 100,000 potential records)?
It quickly gets out of hand and becomes impossible to maintain.

You might try to solve this by dividing the table into the following two tables.


PRODUCTS

*Product  *Company
jeans     Wrangler
jeans     Levi
shirts    Levi

BUYERS

*Buyer  *Product
Chris   jeans
Chris   shirts
Lori    jeans

However, if you joined the two tables above on the Product field, the result would contain a record that is not part of the original data set (it would say that Lori buys jeans from Wrangler). This is where the projection-join concept comes in.

The correct solution would be three tables:

BUYERS

*Buyer  *Product
Chris   jeans
Chris   shirts
Lori    jeans

PRODUCTS

*Product  *Company
jeans     Wrangler
jeans     Levi
shirts    Levi

COMPANIES

*Buyer  *Company
Chris   Levi
Chris   Wrangler
Lori    Levi
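
The projection-join behaviour described here can be reproduced with sets of tuples in Python. The sketch below is a minimal illustration with variable names chosen for this example: it joins two projections first, which yields the extra Lori record, and then all three, which gives back exactly the BUYING rows.

```python
# The four rows of the BUYING table as (Buyer, Product, Company).
buying = {
    ("Chris", "jeans", "Levi"),
    ("Chris", "jeans", "Wrangler"),
    ("Chris", "shirts", "Levi"),
    ("Lori", "jeans", "Levi"),
}

# The three pairwise projections.
buyers = {(buyer, product) for buyer, product, _ in buying}
products = {(product, company) for _, product, company in buying}
companies = {(buyer, company) for buyer, _, company in buying}

# Joining only BUYERS and PRODUCTS on Product invents a row.
two_way = {
    (buyer, product, company)
    for buyer, product in buyers
    for other_product, company in products
    if product == other_product
}
spurious = two_way - buying
print(spurious)

# Also requiring the (Buyer, Company) pair to exist removes the invented row.
three_way = {row for row in two_way if (row[0], row[2]) in companies}
print(three_way == buying)
```

The third table acts as a filter on the two-way join, which is why all three projections are needed to recover the original data.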

The difference between 3NF and BCNF

For a functional dependency A → B, 3NF allows this dependency in a relation if B is a prime attribute (part of a candidate key) even when A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
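
The two rules can be written side by side as small predicates. The Python sketch below is an illustration only: the function names are hypothetical, the candidate keys are supplied by hand instead of being derived, and the example reuses the course, student, and TA table from the BCNF section with its two overlapping candidate keys.

```python
# Candidate keys of the course/student/TA table, supplied by hand for this sketch.
CANDIDATE_KEYS = [
    frozenset({"CourseNum", "Student"}),
    frozenset({"Student", "TA"}),
]


def is_superkey(attrs, candidate_keys):
    """A superkey contains at least one candidate key."""
    return any(key <= attrs for key in candidate_keys)


def allowed_by_bcnf(lhs, rhs, candidate_keys):
    """BCNF: the dependency is trivial, or its determinant is a superkey."""
    return rhs <= lhs or is_superkey(lhs, candidate_keys)


def allowed_by_3nf(lhs, rhs, candidate_keys):
    """3NF: as BCNF, or every dependent attribute is part of some candidate key."""
    prime = frozenset().union(*candidate_keys)
    return allowed_by_bcnf(lhs, rhs, candidate_keys) or (rhs - lhs) <= prime


# The dependency TA -> CourseNum from the business rules.
lhs, rhs = frozenset({"TA"}), frozenset({"CourseNum"})
print(allowed_by_3nf(lhs, rhs, CANDIDATE_KEYS))
print(allowed_by_bcnf(lhs, rhs, CANDIDATE_KEYS))
```

The only difference between the two predicates is the extra clause for prime attributes, which is exactly the loophole that BCNF closes.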
