Database Normalization
Normalization is a process that “improves” a database
design by generating relations that are of higher normal
forms.
The objective of normalization:
“to create relations where every dependency is on the
key, the whole key, and nothing but the key”.
How
• A properly normalized database should have the
following characteristics
– Scalar values in each field
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information.
Levels of Normalization
• Levels of normalization are based on the amount of
redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
[Figure: side annotation labelled Number of Tables, Redundancy, Complexity]
Most databases should be in 3NF or BCNF in order to avoid
database anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each higher level is a subset of the lower level
Functional Dependencies
We say an attribute, B, has a functional dependency on
another attribute, A, if any two records that have
the same value for A must also have the same value for B.
We illustrate this as:
A → B
Example: Suppose we keep track of employee email
addresses, and we only track one email address for each
employee. Suppose each employee is identified by their
unique employee number. We say there is a functional
dependency of email address on employee number:
employee number → email address
91.2914 5
Functional Dependencies
EmpNum  EmpEmail           EmpFname  EmpLname
123     [email protected]  John      Doe
456     [email protected]  Peter     Smith
555     [email protected]  Alan      Lee
633     [email protected]  Peter     Doe
787     [email protected]  Alan      Lee
If EmpNum is the PK then the FDs:
EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
must exist.
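The definition can be tested against sample rows. The sketch below is illustrative only: `fd_holds` is a hypothetical helper name, and the rows repeat the table above without the email column. Sample data can refute a functional dependency but cannot prove one, because an FD is a rule about every possible record.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [
    {"EmpNum": 123, "EmpFname": "John", "EmpLname": "Doe"},
    {"EmpNum": 456, "EmpFname": "Peter", "EmpLname": "Smith"},
    {"EmpNum": 555, "EmpFname": "Alan", "EmpLname": "Lee"},
    {"EmpNum": 633, "EmpFname": "Peter", "EmpLname": "Doe"},
    {"EmpNum": 787, "EmpFname": "Alan", "EmpLname": "Lee"},
]

name_by_num = fd_holds(employees, ["EmpNum"], ["EmpFname", "EmpLname"])
lname_by_fname = fd_holds(employees, ["EmpFname"], ["EmpLname"])
print(name_by_num, lname_by_fname)  # True False
```

EmpFname does not determine EmpLname here because the two employees named Peter have different last names.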
Functional Dependencies
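3 different ways you might see FDs depicted:
1. One FD per line:
   EmpNum → EmpEmail
   EmpNum → EmpFname
   EmpNum → EmpLname
2. One determinant with several dependents:
   EmpNum → EmpEmail, EmpFname, EmpLname
3. Arrows drawn on the table heading:
   EmpNum EmpEmail EmpFname EmpLname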
Determinant
Functional Dependency
EmpNum → EmpEmail
The attribute on the LHS is known as the determinant
• EmpNum is a determinant of EmpEmail
Transitive dependency
Consider attributes A, B, and C, where
A → B and B → C.
Functional dependencies are transitive, which
means that we also have the functional dependency
A → C
We say that C is transitively dependent on A
through B.
Transitive dependency
Table heading: EmpNum EmpEmail DeptNum DeptName
EmpNum → DeptNum
DeptNum → DeptName
DeptName is transitively dependent on EmpNum via DeptNum
EmpNum → DeptName
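Transitivity can be applied mechanically by computing an attribute closure: keep adding the right-hand side of every FD whose left-hand side is already covered. This is a minimal sketch of that standard textbook procedure; the function name `closure` and the encoding of FDs as pairs of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole determinant is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [
    ({"EmpNum"}, {"EmpEmail"}),
    ({"EmpNum"}, {"DeptNum"}),
    ({"DeptNum"}, {"DeptName"}),
]

emp_closure = closure({"EmpNum"}, fds)
print(sorted(emp_closure))  # ['DeptName', 'DeptNum', 'EmpEmail', 'EmpNum']
```

DeptName appears in the closure of EmpNum even though no listed FD has EmpNum on the left and DeptName on the right. It is reached through DeptNum.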
Partial dependency
A partial dependency exists when an attribute B is
functionally dependent on an attribute A, and A is a
component of a multipart candidate key.
Table heading: InvNum LineNum Qty InvDate
Candidate key: {InvNum, LineNum}
InvDate is partially dependent on {InvNum, LineNum}, as
InvNum is a determinant of InvDate and InvNum is
part of a candidate key.
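A partial dependency can be spotted by looking for FDs whose determinant is a proper subset of the key. The sketch below assumes the FD {InvNum, LineNum} → {Qty}, which the slide implies but does not state, and uses a hypothetical helper name `partial_dependencies`.

```python
def partial_dependencies(key, fds):
    """List FDs whose determinant is a proper subset of the key."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        if set(lhs) < key and non_key:  # '<' on sets means proper subset
            found.append((sorted(lhs), sorted(non_key)))
    return found

invoice_fds = [
    ({"InvNum", "LineNum"}, {"Qty"}),  # assumed: quantity is per invoice line
    ({"InvNum"}, {"InvDate"}),
]

partials = partial_dependencies({"InvNum", "LineNum"}, invoice_fds)
print(partials)  # [(['InvNum'], ['InvDate'])]
```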
First Normal Form (1NF)
A table is considered to be in 1NF if all the fields contain
only scalar values (as opposed to a list of values).
Example (Not 1NF)
ISBN           Title         AuName                  AuPhone                                   PubName      PubPhone      Price
0-321-32132-1  Balloon       Sleepy, Snoopy, Grumpy  321-321-1111, 232-234-1234, 665-235-6532  Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Jones, Smith            123-333-3333, 654-223-3455                Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Joyce                   666-666-6666                              Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Roman                   444-444-4444                              Big House    123-456-7890  $25.00
The AuName and AuPhone columns are not scalar
1NF - Decomposition
1. Place all items that appear in the repeating group in a new table
2. Designate a primary key for each new table produced.
3. Duplicate in the new table the primary key of the table from
which the repeating group was extracted or vice versa.
Example (1NF)
ISBN           Title         PubName      PubPhone      Price
0-321-32132-1  Balloon       Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Big House    123-456-7890  $25.00

ISBN           AuName  AuPhone
0-321-32132-1  Sleepy  321-321-1111
0-321-32132-1  Snoopy  232-234-1234
0-321-32132-1  Grumpy  665-235-6532
0-55-123456-9  Jones   123-333-3333
0-55-123456-9  Smith   654-223-3455
0-123-45678-0  Joyce   666-666-6666
1-22-233700-0  Roman   444-444-4444
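The three decomposition steps can be sketched in code. The example below is a rough illustration rather than a general tool: `to_1nf` is a hypothetical helper, it assumes the repeating columns are stored as parallel lists, and it uses only two books and a subset of the columns from the example.

```python
def to_1nf(rows, key, repeating):
    """Split rows into a parent table and a child table of repeated items."""
    parent, child = [], []
    for row in rows:
        # Everything outside the repeating group stays in the parent table.
        parent.append({c: v for c, v in row.items() if c not in repeating})
        # One child row per repeated item, carrying the parent's key.
        for values in zip(*(row[c] for c in repeating)):
            child.append({key: row[key], **dict(zip(repeating, values))})
    return parent, child

books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": ["Sleepy", "Snoopy", "Grumpy"],
     "AuPhone": ["321-321-1111", "232-234-1234", "665-235-6532"]},
    {"ISBN": "0-55-123456-9", "Title": "Main Street",
     "AuName": ["Jones", "Smith"],
     "AuPhone": ["123-333-3333", "654-223-3455"]},
]

book_table, author_table = to_1nf(books, "ISBN", ["AuName", "AuPhone"])
for row in author_table:
    print(row)
```

Each author row carries the ISBN of its book, which is the duplicated primary key described in step 3.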
Functional Dependencies
1. If one set of attributes in a table determines another set of
attributes in the table, then the second set of attributes is
said to be functionally dependent on the first set of
attributes.
Example 1
Table Scheme: {ISBN, Title, Price}
Functional Dependencies: {ISBN} → {Title}
                         {ISBN} → {Price}
ISBN           Title         Price
0-321-32132-1  Balloon       $34.00
0-55-123456-9  Main Street   $22.95
0-123-45678-0  Ulysses       $34.00
1-22-233700-0  Visual Basic  $25.00
Functional Dependencies
Example 2
Table Scheme: {PubID, PubName, PubPhone}
Functional Dependencies: {PubID} → {PubPhone}
                         {PubID} → {PubName}
                         {PubName, PubPhone} → {PubID}
PubID  PubName      PubPhone
1      Big House    999-999-9999
2      Small House  123-456-7890
3      Alpha Press  111-111-1111
Example 3
Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies: {AuID} → {AuPhone}
                         {AuID} → {AuName}
                         {AuName, AuPhone} → {AuID}
AuID  AuName  AuPhone
1     Sleepy  321-321-1111
2     Snoopy  232-234-1234
3     Grumpy  665-235-6532
4     Jones   123-333-3333
5     Smith   654-223-3455
6     Joyce   666-666-6666
7     Roman   444-444-4444
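The FDs in Example 2 imply two candidate keys, {PubID} and {PubName, PubPhone}. One way to see this is a brute-force search for minimal attribute sets whose closure covers the whole scheme. The helper names `closure` and `candidate_keys` are assumptions, and the search is exponential in the number of attributes, so it only suits small teaching examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(scheme, fds):
    """Return the minimal attribute sets that determine the whole scheme."""
    attrs = sorted(scheme)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys

pub_fds = [
    ({"PubID"}, {"PubPhone"}),
    ({"PubID"}, {"PubName"}),
    ({"PubName", "PubPhone"}, {"PubID"}),
]

pub_keys = candidate_keys({"PubID", "PubName", "PubPhone"}, pub_fds)
print(pub_keys)  # [('PubID',), ('PubName', 'PubPhone')]
```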
FD – Example
Database to track reviews of papers submitted to an academic
conference. Prospective authors submit papers for review and possible
acceptance in the published conference proceedings. Details of the
entities:
– Author information includes a unique author number, a name, a mailing
address, and a unique (optional) email address.
– Paper information includes the primary author, the paper number, the
title, the abstract, and review status (pending, accepted, rejected).
– Reviewer information includes the reviewer number, the name, the
mailing address, and a unique (optional) email address.
– A completed review includes the reviewer number, the date, the paper
number, comments to the authors, comments to the program chairperson,
and ratings (overall, originality, correctness, style, clarity).
FD – Example
Functional Dependencies
– AuthNo → AuthName, AuthEmail, AuthAddress
– AuthEmail → AuthNo
– PaperNo → Primary-AuthNo, Title, Abstract, Status
– RevNo → RevName, RevEmail, RevAddress
– RevEmail → RevNo
– RevNo, PaperNo → AuthComm, Prog-Comm, Date,
Rating1, Rating2, Rating3, Rating4, Rating5
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements
– The table is in first normal form
– All nonkey attributes in the table must be functionally dependent on the entire
primary key
Note: Remember that we are dealing with non-key attributes
Example 1 (Not 2NF)
Scheme {Title, PubId, AuId, Price, AuAddress}
1. Key {Title, PubId, AuId}
2. {Title, PubId, AuId} → {Price}
3. {AuId} → {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuId which is a proper subset of a key
Second Normal Form (2NF)
Example 2 (Not 2NF)
Scheme {City, Street, HouseNumber, HouseColor, CityPopulation}
1. Key {City, Street, HouseNumber}
2. {City, Street, HouseNumber} → {HouseColor}
3. {City} → {CityPopulation}
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on the City which is a proper subset of the
key
Example 3 (Not 2NF)
Scheme {studio, movie, budget, studio_city}
1. Key {studio, movie}
2. {studio, movie} → {budget}
3. {studio} → {studio_city}
4. studio_city is not a part of a key
5. studio_city functionally depends on studio which is a proper subset of the key
2NF - Decomposition
1. If a data item is fully functionally dependent on only a part of the
primary key, move that data item and that part of the primary key to a
new table.
2. If other data items are functionally dependent on the same part of the
key, place them in the new table also.
3. Make the partial primary key copied from the original table the
primary key for the new table.
Example 1 (Convert to 2NF)
Old Scheme {Title, PubId, AuId, Price, AuAddress}
New Scheme {Title, PubId, AuId, Price}
New Scheme {AuId, AuAddress}
2NF - Decomposition
Example 3 (Convert to 2NF)
Old Scheme {Studio, Movie, Budget, StudioCity}
New Scheme {Movie, Studio, Budget}
New Scheme {Studio, StudioCity}
Example 2 (Convert to 2NF)
Old Scheme {City, Street, HouseNumber, HouseColor, CityPopulation}
New Scheme {City, Street, HouseNumber, HouseColor}
New Scheme {City, CityPopulation}
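The 2NF decomposition steps can be sketched as a small function working on schemes and FDs. `decompose_2nf` is a hypothetical helper written for this illustration. It groups moved attributes by the part of the key that determines them, as step 2 describes, and is shown on the City and Title schemes.

```python
def decompose_2nf(scheme, key, fds):
    """Move attributes that depend on part of the key into new schemes."""
    key_set = set(key)
    moved_by_part = {}  # part of the key -> non-key attributes it determines
    for lhs, rhs in fds:
        if set(lhs) < key_set:  # proper subset of the key: partial dependency
            bucket = moved_by_part.setdefault(tuple(lhs), [])
            for attr in rhs:
                if attr not in key_set and attr not in bucket:
                    bucket.append(attr)
    moved = {a for attrs in moved_by_part.values() for a in attrs}
    original = [a for a in scheme if a not in moved]
    new = [list(part) + attrs for part, attrs in moved_by_part.items() if attrs]
    return [original] + new

city_key = ["City", "Street", "HouseNumber"]
city_schemes = decompose_2nf(
    ["City", "Street", "HouseNumber", "HouseColor", "CityPopulation"],
    city_key,
    [(city_key, ["HouseColor"]), (["City"], ["CityPopulation"])],
)
print(city_schemes)
# [['City', 'Street', 'HouseNumber', 'HouseColor'], ['City', 'CityPopulation']]

title_key = ["Title", "PubId", "AuId"]
title_schemes = decompose_2nf(
    ["Title", "PubId", "AuId", "Price", "AuAddress"],
    title_key,
    [(title_key, ["Price"]), (["AuId"], ["AuAddress"])],
)
```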
Third Normal Form (3NF)
This form dictates that all non-key attributes of a table must be functionally
dependent on a candidate key i.e. there can be no interdependencies among
non-key attributes.
For a table to be in 3NF, there are two requirements
– The table should be in second normal form
– No non-key attribute is transitively dependent on the primary key
Example (Not in 3NF)
Scheme {Title, PubID, PageCount, Price}
1. Key {Title, PubID}
2. {Title, PubID} → {PageCount}
3. {PageCount} → {Price}
4. Both Price and PageCount depend on the whole key hence 2NF
5. Transitively {Title, PubID} → {Price} hence not in 3NF
Third Normal Form (3NF)
Example 2 (Not in 3NF)
Scheme {Studio, StudioCity, CityTemp}
1. Primary Key {Studio}
2. {Studio} → {StudioCity}
3. {StudioCity} → {CityTemp}
4. {Studio} → {CityTemp}
5. Both StudioCity and CityTemp depend on the entire key hence 2NF
6. CityTemp transitively depends on Studio hence violates 3NF
Example 3 (Not in 3NF)
Scheme {BuildingID, Contractor, Fee}
1. Primary Key {BuildingID}
2. {BuildingID} → {Contractor}
3. {Contractor} → {Fee}
4. {BuildingID} → {Fee}
5. Fee transitively depends on the BuildingID
6. Both Contractor and Fee depend on the entire key hence 2NF
BuildingID  Contractor  Fee
100         Randolph    1200
150         Ingersoll   1100
200         Randolph    1200
250         Pitkin      1100
300         Randolph    1200
3NF - Decomposition
1. Move all items involved in transitive dependencies to a new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the
original entity.
Transitive dependency
• Consider attributes A, B, and C, where
• A → B and B → C.
• Functional dependencies are transitive, which means that we also
have the functional dependency A → C
• We say that C is transitively dependent on A through B.
3NF - Decomposition
Example 1 (Convert to 3NF)
Old Scheme {Title, PubID, PageCount, Price}
New Scheme {PageCount, Price}
New Scheme {Title, PubID, PageCount}
Example 2 (Convert to 3NF)
Old Scheme {Studio, StudioCity, CityTemp}
New Scheme {Studio, StudioCity}
New Scheme {StudioCity, CityTemp}
Example 3 (Convert to 3NF)
Old Scheme {BuildingID, Contractor, Fee}
New Scheme {BuildingID, Contractor}
New Scheme {Contractor, Fee}
BuildingID  Contractor
100         Randolph
150         Ingersoll
200         Randolph
250         Pitkin
300         Randolph

Contractor  Fee
Randolph    1200
Ingersoll   1100
Pitkin      1100
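The building example can be checked by projecting the original rows onto the two new schemes and joining them back on Contractor. This is a small sketch using plain tuples; the variable names are illustrative.

```python
buildings = [
    (100, "Randolph", 1200),
    (150, "Ingersoll", 1100),
    (200, "Randolph", 1200),
    (250, "Pitkin", 1100),
    (300, "Randolph", 1200),
]

# Project onto {BuildingID, Contractor} and {Contractor, Fee}.
building_contractor = sorted({(b, c) for b, c, _ in buildings})
contractor_fee = sorted({(c, f) for _, c, f in buildings})

# Contractor is the key of the second table, so a dict lookup acts as the join.
fee_of = dict(contractor_fee)
rejoined = [(b, c, fee_of[c]) for b, c in building_contractor]

print(contractor_fee)  # [('Ingersoll', 1100), ('Pitkin', 1100), ('Randolph', 1200)]
print(rejoined == sorted(buildings))  # True
```

Randolph's fee is now stored once instead of three times, and the join still reproduces every original row.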
Boyce-Codd Normal Form (BCNF)
• BCNF does not allow dependencies between attributes that belong to candidate keys.
• BCNF is a refinement of the third normal form: it drops 3NF's restriction to non-key
attributes, so every determinant must be a candidate key.
• Third normal form and BCNF can differ only if all of the following conditions are true:
– The table has two or more candidate keys
– At least two of the candidate keys are composed of more than one attribute
– The keys are not disjoint i.e. the composite candidate keys share some attributes
Example 1 - Address (Not in BCNF)
Scheme {City, Street, ZipCode}
1. Key1 {City, Street}
2. Key2 {ZipCode, Street}
3. No non-key attribute hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key violates BCNF
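A common way to test for BCNF is to check that the determinant of every non-trivial FD is a superkey, meaning its closure covers the whole scheme. The sketch below applies that test to the Address example. The helper names `closure` and `bcnf_violations` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(scheme, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and not set(scheme) <= closure(lhs, fds):
            bad.append((sorted(lhs), sorted(rhs)))
    return bad

address_fds = [
    ({"City", "Street"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]

violations = bcnf_violations({"City", "Street", "ZipCode"}, address_fds)
print(violations)  # [(['ZipCode'], ['City'])]
```

ZipCode alone determines City but not Street, so it is a determinant that is not a key.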
Boyce Codd Normal Form (BCNF)
Example 2 - Movie (Not in BCNF)
Scheme {MovieTitle, MovieID, PersonName, Role, Payment}
1. Key1 {MovieTitle, PersonName}
2. Key2 {MovieID, PersonName}
3. Both Role and Payment functionally depend on both candidate keys thus 3NF
4. {MovieID} → {MovieTitle}
5. Dependency between MovieID & MovieTitle violates BCNF
Example 3 - Consulting (Not in BCNF)
Scheme {Client, Problem, Consultant}
1. Key1 {Client, Problem}
2. Key2 {Client, Consultant}
3. No non-key attribute hence 3NF
4. {Client, Problem} → {Consultant}
5. {Client, Consultant} → {Problem}
6. Dependency between attributes belonging to keys violates BCNF
BCNF - Decomposition
1. Place the two candidate primary keys in separate entities
2. Place each of the remaining data items in one of the
resulting entities according to its dependency on the
primary key.
Example 1 (Convert to BCNF)
Old Scheme {City, Street, ZipCode}
New Scheme1 {ZipCode, Street}
New Scheme2 {City, Street}
• Loss of the dependency {ZipCode} → {City}
Alternate New Scheme1 {ZipCode, Street}
Alternate New Scheme2 {ZipCode, City}
Decomposition – Loss of Information
1. If decomposition does not cause any loss of information it is called a
lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost it is
called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a
collection of smaller schemas that are in BCNF. However,
dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into 3rd normal form
that also preserves the dependencies.
• 3NF may be better than BCNF in some cases
Use your own judgment when decomposing schemas
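For a split into two schemes there is a standard textbook test for losslessness: the decomposition is lossless with respect to the FDs exactly when the shared attributes determine all of one side. The sketch below applies it to the two Address decompositions from BCNF Example 1. The helper names `closure` and `is_lossless` are assumptions, and the test covers binary decompositions only.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: common attributes must determine r1 or r2."""
    common = closure(set(r1) & set(r2), fds)
    return set(r1) <= common or set(r2) <= common

address_fds = [
    ({"City", "Street"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]

first = is_lossless({"ZipCode", "Street"}, {"City", "Street"}, address_fds)
alternate = is_lossless({"ZipCode", "Street"}, {"ZipCode", "City"}, address_fds)
print(first, alternate)  # False True
```

Under these FDs only the alternate decomposition passes, because ZipCode determines City while Street alone determines nothing.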
BCNF - Decomposition
Example 2 (Convert to BCNF)
Old Scheme {MovieTitle, MovieID, PersonName, Role, Payment}
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieTitle, PersonName}
• Loss of the dependency {MovieID} → {MovieTitle}
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieID, MovieTitle}
• We got the {MovieID} → {MovieTitle} relationship back
Example 3 (Convert to BCNF)
Old Scheme {Client, Problem, Consultant}
New Scheme {Client, Consultant}
New Scheme {Client, Problem}
Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-one relationships
between columns.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one multi-valued attribute.
Example (Not in 4NF)
Scheme {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are a part of the only candidate key, hence BCNF
2. Many Movies can have the same Genre
3. Many Cities can have the same movie
4. Violates 4NF
Movie             ScreeningCity  Genre
Hard Code         Los Angeles    Comedy
Hard Code         New York       Comedy
Bill Durham       Santa Cruz     Drama
Bill Durham       Durham         Drama
The Code Warrier  New York       Horror
Fourth Normal Form (4NF)
Example 2 (Not in 4NF)
Scheme {Manager, Child, Employee}
1. Primary Key {Manager, Child, Employee}
2. Each manager can have more than one child
3. Each manager can supervise more than one employee
4. 4NF Violated
Manager  Child  Employee
Jim      Beth   Alice
Mary     Bob    Jane
Mary     NULL   Adam
Example 3 (Not in 4NF)
Scheme {Employee, Skill, ForeignLanguage}
1. Primary Key {Employee, Skill, ForeignLanguage}
2. Each employee can speak multiple languages
3. Each employee can have multiple skills
4. Thus violates 4NF
Employee  Skill      Language
1234      Cooking    French
1234      Cooking    German
1453      Carpentry  Spanish
1453      Cooking    Spanish
2345      Cooking    Spanish
4NF - Decomposition
1. Move the two multi-valued relations to separate tables
2. Identify a primary key for each of the new entities.
Example 1 (Convert to 4NF)
Old Scheme {MovieName, ScreeningCity, Genre}
New Scheme {MovieName, ScreeningCity}
New Scheme {MovieName, Genre}
Movie             Genre
Hard Code         Comedy
Bill Durham       Drama
The Code Warrier  Horror

Movie             ScreeningCity
Hard Code         Los Angeles
Hard Code         New York
Bill Durham       Santa Cruz
Bill Durham       Durham
The Code Warrier  New York
4NF - Decomposition
Example 2 (Convert to 4NF)
Old Scheme {Manager, Child, Employee}
New Scheme {Manager, Child}
New Scheme {Manager, Employee}
Manager  Child
Jim      Beth
Mary     Bob

Manager  Employee
Jim      Alice
Mary     Jane
Mary     Adam
Example 3 (Convert to 4NF)
Old Scheme {Employee, Skill, ForeignLanguage}
New Scheme {Employee, Skill}
New Scheme {Employee, ForeignLanguage}
Employee  Skill
1234      Cooking
1453      Carpentry
1453      Cooking
2345      Cooking

Employee  Language
1234      French
1234      German
1453      Spanish
2345      Spanish
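When skills and languages are independent of each other, the two projections in Example 3 should join back to exactly the original rows. The sketch below checks that on the sample data using Python sets; the variable names are illustrative.

```python
original = {
    (1234, "Cooking", "French"),
    (1234, "Cooking", "German"),
    (1453, "Carpentry", "Spanish"),
    (1453, "Cooking", "Spanish"),
    (2345, "Cooking", "Spanish"),
}

# Project onto {Employee, Skill} and {Employee, ForeignLanguage}.
emp_skill = {(e, s) for e, s, _ in original}
emp_lang = {(e, l) for e, _, l in original}

# Natural join on Employee.
rejoined = {
    (e, s, l)
    for e, s in emp_skill
    for e2, l in emp_lang
    if e == e2
}

print(len(emp_skill), len(emp_lang))  # 4 4
print(rejoined == original)  # True
```

Employee 1234's single skill is stored once after the split rather than once per language.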
Fifth Normal Form (5NF)
• Fifth normal form is satisfied when all tables are broken
into as many tables as possible in order to avoid
redundancy. Once it is in fifth normal form it cannot be
broken into smaller relations without changing the facts or
the meaning.
Domain Key Normal Form (DKNF)
• The relation is in DKNF when there can be no insertion or
deletion anomalies in the database.
Recap: Normalization Process
Workout 1: Hardware Company
(Unnormalized Data)
Salesperson  Product  Salesperson  Commission  Year of  Department  Manager  Product  Unit   Quantity
Number       Number   Name         Percentage  Hire     Number      Name     Name     Price
137          19440    Baker        10          1995     73          Scott    Hammer   17.50  473
             24013                                                           Saw      26.25  170
             26722                                                           Pliers   11.50  688
186          16386    Adams        15          2001     59          Lopez    Wrench   12.95  1745
             19440                                                           Hammer   17.50  2529
             21765                                                           Drill    32.99  1962
             24013                                                           Saw      26.25  3071
204          21765    Dickens      10          1998     73          Scott    Drill    32.99  809
             26722                                                           Pliers   11.50  734
361          16386    Carlyle      20          2001     73          Scott    Wrench   12.95  3729
             21765                                                           Drill    32.99  3110
             26722                                                           Pliers   11.50  2738
SALESPERSON/PRODUCT Table
• Records contain multivalued attributes.
Workout2:
09/01/2020
Workout3