Normalization Notes
Benefits of Normalization
• 1NF Eliminate Repeating Groups - Make a separate table for each set of related
attributes, and give each table a primary key
EN1-26  Sean O'Brien    30-452-T3, 30-457-T3, 32-244-T3  0.25, 0.40, 0.30  STAR manual, ISO procedures, Employee handbook
EN1-33  Amy Guya        30-452-T3, 30-382-TC, 32-244-T3  0.05, 0.35, 0.60  STAR manual, Web Site, New catalog
EN1-35  Steven Baranco  30-452-T3, 31-238-TC             0.15, 0.80        STAR manual, STAR prototype
EN1-40  Alexandra Wing  31-238-TC, 31-241-TC             0.20, 0.70        STAR prototype, New catalog

EmployeeID  Last Name  First Name  Project1  Time1  Project2  Time2  Project3  Time3
Take the table above and design new tables that will eliminate the repeated data in the
non-key fields.
1. To decide what fields belong together in a table, think about which field determines
the values in other fields. Create a table for those fields and enter the sample data.
2. Think about what the primary key for each table would be and about the
relationship between the tables. If necessary, add foreign keys.
3. Mark the primary key for each table and make sure that you don't have repeating
data in non-key fields.
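The steps above can be sketched with Python's built-in sqlite3 module. This is only one possible design, not the worked answer from these notes: the table names, the column names, and the time_share column are assumptions made for illustration. The sample rows are taken from the employee data shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one table per set of related attributes,
# each with its own primary key.
conn.executescript("""
CREATE TABLE employees (
    employee_id TEXT PRIMARY KEY,
    first_name  TEXT,
    last_name   TEXT
);
CREATE TABLE projects (
    project_num   TEXT PRIMARY KEY,
    project_title TEXT
);
CREATE TABLE employees_projects (
    employee_id TEXT REFERENCES employees(employee_id),
    project_num TEXT REFERENCES projects(project_num),
    time_share  REAL,
    PRIMARY KEY (employee_id, project_num)
);
""")

conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("EN1-35", "Steven", "Baranco"),
    ("EN1-40", "Alexandra", "Wing"),
])
conn.executemany("INSERT INTO projects VALUES (?, ?)", [
    ("30-452-T3", "STAR manual"),
    ("31-238-TC", "STAR prototype"),
    ("31-241-TC", "New catalog"),
])
conn.executemany("INSERT INTO employees_projects VALUES (?, ?, ?)", [
    ("EN1-35", "30-452-T3", 0.15),
    ("EN1-35", "31-238-TC", 0.80),
    ("EN1-40", "31-238-TC", 0.20),
    ("EN1-40", "31-241-TC", 0.70),
])

# Each project title is stored once; the join rebuilds the original view.
rows = conn.execute("""
    SELECT e.last_name, p.project_title, ep.time_share
    FROM employees_projects AS ep
    JOIN employees AS e ON e.employee_id = ep.employee_id
    JOIN projects  AS p ON p.project_num = ep.project_num
    ORDER BY e.employee_id, p.project_num
""").fetchall()
```

With this layout the repeating group (Project1, Project2, Project3) becomes rows in employees_projects, so an employee can work on any number of projects without changing the table design.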
EMPLOYEES
EN1-33  32-244-T3
EN1-35  30-452-T3
EN1-35  31-238-TC
EN1-36  35-152-TC
EN1-38  36-272-TC
EN1-40  31-238-TC
EN1-40  31-241-TC

*ProjectNum  ProjectTitle
30-452-T3    STAR manual
30-457-T3    ISO procedures
30-482-TC    Web site
31-124-T3    Employee handbook
31-238-TC    STAR prototype
31-241-TC    New catalog
A relation is in 3NF if it is in second normal form (2NF) and there are no transitive
dependencies.
A table with a single field primary key and repeating values in non-key fields.
The phone number is repeated each time a manager name is repeated. It's dependent on the
manager, which is dependent on the project number (a transitive dependency).
Take the table above and create new tables to fix the problem.
1. Think about which fields belong together and create new tables to hold them.
2. Enter the sample data and check for unnecessarily repeated values (repeated
values in fields that are not part of the primary key).
3. Identify the primary key for each table and, if necessary, add foreign keys.
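A minimal sketch of removing the transitive dependency, again using Python's sqlite3 module. The schema, the phone numbers, and the assignment of a second project to the same manager are assumptions made for illustration; only the project numbers, project titles, and the manager name Jacanda come from the sample data in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The phone number depends on the manager, so it moves to its own table.
# projects keeps project_mgr as a foreign key.
conn.executescript("""
CREATE TABLE managers (
    project_mgr TEXT PRIMARY KEY,
    phone       TEXT
);
CREATE TABLE projects (
    project_num   TEXT PRIMARY KEY,
    project_title TEXT,
    project_mgr   TEXT REFERENCES managers(project_mgr)
);
""")

# The phone numbers are placeholders, not real data.
conn.execute("INSERT INTO managers VALUES (?, ?)", ("Jacanda", "555-0100"))
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)", [
    ("30-452-T3", "STAR manual", "Jacanda"),     # manager assignment assumed
    ("30-457-T3", "ISO procedures", "Jacanda"),
])

# A phone change is now a single-row update, however many projects
# the manager runs.
conn.execute(
    "UPDATE managers SET phone = ? WHERE project_mgr = ?",
    ("555-0199", "Jacanda"),
)

phones = conn.execute("""
    SELECT p.project_num, m.phone
    FROM projects AS p
    JOIN managers AS m ON m.project_mgr = p.project_mgr
    ORDER BY p.project_num
""").fetchall()
```

Because the phone number is stored once per manager, every project picks up the new value through the join.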
PROJECTS
*ProjectNum  ProjectTitle    ProjectMgr
30-457-T3    ISO procedures  Jacanda

MANAGERS
A table is in Boyce-Codd normal form (BCNF) if it is in third normal form (3NF) and all
determinants are candidate keys.
Boyce-Codd normal form (BCNF) can be thought of as a "new" third normal form. It was
introduced to cover situations that the "old" third normal form did not address. Keep in mind
the meaning of a determinant (a field that determines the value in another field) and of a
candidate key (a field or combination of fields that qualifies for designation as the primary
key). This normal form applies to situations where you have overlapping candidate keys.
Imagine that you were designing a table for a college to hold information about courses,
students, and teaching assistants. You have the following business rules.
COURSES_STUDENTS_TA's
CourseNum Student TA
To uniquely identify each record, you could choose CourseNum + Student as a primary key.
This would satisfy third normal form also because the combination of CourseNum and Student
determines the value in TA. Another candidate key would be Student + TA. In this case, you
have overlapping candidate keys (Student is in both). The second choice, however, would not
comply with third normal form because the CourseNum is not determined by the combination
of Student and TA; it only depends on the value in TA (see the business rules). This is the
situation that Boyce-Codd normal form addresses; the combination of Student + TA could not
be considered to be a candidate key.
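The reasoning above can be checked on sample rows with a few lines of Python. This is a rough sketch: the helper names are hypothetical, and the rows are assumed for illustration, chosen to be consistent with the sample data in these notes. A check on sample rows can only show that a dependency is not contradicted by the data; the real rule comes from the business rules.

```python
COLS = ("CourseNum", "Student", "TA")

# Assumed sample rows for COURSES_STUDENTS_TA's.
rows = [
    ("ENG101", "Jones", "Clark"),
    ("ENG101", "Grayson", "Chen"),
    ("ENG101", "Samara", "Chen"),
    ("MAT350", "Grayson", "Powers"),
    ("MAT350", "Jones", "O'Shea"),
]

def project(row, cols):
    """Return the values of the named columns from one row."""
    return tuple(row[COLS.index(c)] for c in cols)

def determines(rows, lhs, rhs):
    """True if every lhs value maps to exactly one rhs value in the rows."""
    seen = {}
    for row in rows:
        key, val = project(row, lhs), project(row, rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_unique(rows, cols):
    """True if the named columns identify each row uniquely."""
    keys = [project(row, cols) for row in rows]
    return len(set(keys)) == len(keys)

# TA is a determinant (it fixes CourseNum) ...
ta_determines_course = determines(rows, ("TA",), ("CourseNum",))
# ... but TA alone does not identify a row, so it is not a candidate key.
ta_is_unique = is_unique(rows, ("TA",))

bcnf_violation = ta_determines_course and not ta_is_unique
```

Here TA determines CourseNum, yet TA on its own is not a candidate key, which is exactly the situation BCNF rules out.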
If you wanted to assign a TA to a course before any students enrolled, you couldn't because
Student is part of the primary key. Also, if the name of a TA changed, you would have to
update it in multiple records.
If you assume you have just these fields, this data would be better stored in three tables: one
with CourseNum and Student, another with Student and TA, and a third with CourseNum and
TA.
COURSES_STUDENTS
*CourseNum  *Student
ENG101      Jones
ENG101      Grayson
ENG101      Samara
MAT350      Grayson
MAT350      Jones

STUDENTS_TA's
*Student  *TA
Jones     Clark
Grayson   Chen
Samara    Chen
Grayson   Powers
Jones     O'Shea

COURSES_TA's
*CourseNum  *TA
ENG101      Clark
ENG101      Chen
MAT350      O'Shea
MAT350      Powers
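The insertion problem described earlier (assigning a TA to a course before any students enroll) can be illustrated with Python's sqlite3 module. The table and column names are assumptions made for this sketch. The columns are declared NOT NULL explicitly, because SQLite does not enforce that on its own for a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single-table design: Student is part of the primary key.
CREATE TABLE courses_students_tas (
    course_num TEXT NOT NULL,
    student    TEXT NOT NULL,
    ta         TEXT NOT NULL,
    PRIMARY KEY (course_num, student)
);
-- One table from the split design: course and TA only.
CREATE TABLE courses_tas (
    course_num TEXT NOT NULL,
    ta         TEXT NOT NULL,
    PRIMARY KEY (course_num, ta)
);
""")

# With no student yet, the single-table insert is rejected.
try:
    conn.execute(
        "INSERT INTO courses_students_tas VALUES (?, ?, ?)",
        ("ENG101", None, "Clark"),
    )
    combined_insert_ok = True
except sqlite3.IntegrityError:
    combined_insert_ok = False

# The split design records the assignment without needing a student.
conn.execute("INSERT INTO courses_tas VALUES (?, ?)", ("ENG101", "Clark"))
split_rows = conn.execute("SELECT course_num, ta FROM courses_tas").fetchall()
```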
A multi-valued dependency occurs when, for each value in field A, there is a set of values
for field B and a set of values for field C but fields B and C are not related.
Put another way, a multi-valued dependency occurs when a table contains fields that are
not logically related to each other. An often-used example is the following table:
MOVIES
A movie can have more than one star and more than one producer. A star can be in more than
one movie. A producer can produce more than one movie. The primary key would have to
include all three fields and so this table would be in BCNF. But you have unnecessarily
repeated values, with the data maintenance problems that causes, and you would have
trouble with deletion anomalies.
The Star and the Producer really aren't logically related. The Movie determines the Star and
the Movie determines the Producer. The answer is to have a separate table for each of those
logical relationships - one holding Movie and Star and the other with Movie and Producer, as
shown below:
STARS
*Movie *Star
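A small Python sketch of this split, using sets of tuples as stand-ins for tables. The movie, star, and producer values are invented placeholders, and the name producers for the second table is an assumption. The point is that the two smaller tables hold fewer rows and still rebuild the original table when joined on Movie.

```python
# Placeholder rows for MOVIES: (Movie, Star, Producer).
# Every star of a movie is paired with every producer of that movie.
movies = {
    ("Movie A", "Star 1", "Producer X"),
    ("Movie A", "Star 1", "Producer Y"),
    ("Movie A", "Star 2", "Producer X"),
    ("Movie A", "Star 2", "Producer Y"),
    ("Movie B", "Star 1", "Producer Z"),
}

# Project MOVIES onto the two independent relationships.
stars = {(movie, star) for movie, star, _ in movies}
producers = {(movie, producer) for movie, _, producer in movies}

# Join the two projections back together on Movie.
rejoined = {
    (movie, star, producer)
    for movie, star in stars
    for movie2, producer in producers
    if movie == movie2
}

# The join is lossless here: nothing is lost and nothing extra appears.
lossless = rejoined == movies
```

Adding a third star to Movie A would take one new row in stars, instead of one new row per producer in the combined table.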
BUYING
*Product *Company
jeans Wrangler
jeans Levi
shirts Levi
BUYERS
*Buyer *Product
Chris jeans
Chris shirts
Lori jeans
However, if you joined the two tables above on the Product field, it would produce a record not
part of the original data set (it would say that Lori buys jeans from Wrangler). This is where
the projection-join concept comes in.
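The spurious record can be reproduced in a few lines of Python, using sets of tuples as stand-ins for the BUYING and BUYERS tables above. The variable names are only illustrative.

```python
# BUYING: (Product, Company)
buying = {
    ("jeans", "Wrangler"),
    ("jeans", "Levi"),
    ("shirts", "Levi"),
}

# BUYERS: (Buyer, Product)
buyers = {
    ("Chris", "jeans"),
    ("Chris", "shirts"),
    ("Lori", "jeans"),
}

# Join the two tables on Product.
joined = {
    (buyer, product, company)
    for buyer, product in buyers
    for product2, company in buying
    if product == product2
}

# The join pairs Lori with every company that sells jeans,
# including one she does not buy from.
has_spurious_row = ("Lori", "jeans", "Wrangler") in joined
```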
The correct solution would be three tables:
BUYERS
*Buyer  *Product
Chris   jeans
Chris   shirts
Lori    jeans

PRODUCTS
*Product  *Company
jeans     Wrangler
jeans     Levi
shirts    Levi

COMPANIES
*Buyer  *Company
Chris   Levi
Chris   Wrangler
Lori    Levi
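As a quick check, joining all three tables can be sketched in Python with sets of tuples. The variable names are assumptions; the rows are the ones listed above. The third table filters out the buyer and company pairing that the two-table join would wrongly produce.

```python
# BUYERS: (Buyer, Product)
buyers = {("Chris", "jeans"), ("Chris", "shirts"), ("Lori", "jeans")}

# PRODUCTS: (Product, Company)
products = {("jeans", "Wrangler"), ("jeans", "Levi"), ("shirts", "Levi")}

# COMPANIES: (Buyer, Company)
companies = {("Chris", "Levi"), ("Chris", "Wrangler"), ("Lori", "Levi")}

# Join on Product, then keep only the rows whose (Buyer, Company)
# pair also appears in COMPANIES.
joined = {
    (buyer, product, company)
    for buyer, product in buyers
    for product2, company in products
    if product == product2 and (buyer, company) in companies
}
```

The result holds four rows, and Lori buying jeans from Wrangler is not among them.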