Chapter 6
Normalization
- a process for evaluating and correcting table structures to minimize data redundancies, thereby
reducing the likelihood of data anomalies
- involves assigning attributes to tables based on the concepts of determination and functional
dependency
- is used to improve an existing data structure and create an appropriate database design
- The main goal of normalization is to eliminate data anomalies by eliminating unnecessary or
unwanted data redundancies
Note:
1. From a structural point of view, 2NF is better than 1NF, and 3NF is better than 2NF.
2. For most purposes in business database design, 3NF is as high as you need to go in the
normalization process.
Denormalization
- produces a lower normal form; that is, a table in 3NF is converted to 2NF through
denormalization.
- However, the price you pay for increased performance through denormalization is greater data
redundancy.
Note:
1. The base data shows that a project has multiple employees assigned to it. Note that the data in
Figure 6.1 is unnormalized, as reflected by the existence of several multivalued data elements
(EMP_NUM, EMP_NAME, JOB_CLASS, CHARGE_HOUR, HOURS_BILLED).
2. The table does not conform to the relational table requirements discussed previously.
3. Deficiency: The data structure invites data inconsistencies.
- Example:
o The JOB_CLASS value “Elect. Engineer” might be entered as “Elect.Eng.” in
some cases, “El. Eng.” in others, and “EE” in still others.
o The structure would allow John G. News and Alice K. Johnson in the Evergreen project to
charge different rates even though they have the same job classification.
4. Deficiency: The data structure contains several multivalued attributes that make data management tasks
very difficult.
- Because all of the employees working on a project are in a single cell, it is hard to identify each employee
individually and for the database to answer questions.
5. Deficiency: Employee data is redundant in the table because employees can work on multiple projects.
- Example:
o Changing the job classification for Alice K. Johnson would require updating
at least two rows
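The redundancy deficiency can be illustrated with a small Python sketch. Everything in it is hypothetical except the employee name and the Evergreen project: the second project name, the job class placeholders, and the helper name conflicting_job_classes are assumptions made for illustration, not data from Figure 6.1.

```python
from collections import defaultdict


def conflicting_job_classes(rows):
    """Return employees that are recorded with more than one JOB_CLASS value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["EMP_NAME"]].add(row["JOB_CLASS"])
    return {name: classes for name, classes in seen.items() if len(classes) > 1}


# Hypothetical redundant rows: the same employee appears once per project.
rows = [
    {"PROJ_NAME": "Evergreen", "EMP_NAME": "Alice K. Johnson", "JOB_CLASS": "Job class A"},
    {"PROJ_NAME": "Second project", "EMP_NAME": "Alice K. Johnson", "JOB_CLASS": "Job class A"},
]

# Update the job classification in only one of the two rows (an update anomaly).
rows[0]["JOB_CLASS"] = "Job class B"

conflicts = conflicting_job_classes(rows)
print(conflicts)
```

Because the classification is stored once per project row, a partial update leaves the table contradicting itself; storing it once per employee would remove that possibility.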
The Normalization Process
- The objective of normalization is to ensure that each table conforms to the concept of well-formed
relations; in other words, tables that have the following characteristics:
2. Each row/column intersection contains only one (a single) value and not a group of values.
3. No data item will be unnecessarily stored in more than one table (tables have minimum
controlled redundancy).
- The reason for this requirement is to ensure that the data is updated in only one place.
4. All nonprime attributes in a relation (table) are dependent on the primary key—the entire
primary key and nothing but the primary key.
- The reason for this requirement is to ensure that the data is uniquely identifiable by a primary
key value.
5. Each relation (table) has no insertion, update, or deletion anomalies, which ensures the integrity
and consistency of the data.
Conversion to First Normal Form (1NF)
repeating group
- derives its name from the fact that a group of multiple entries of the same or multiple types can
exist for any single key attribute occurrence
- Start by presenting the data in a tabular format, where each cell has a single value and there are
no repeating groups
- To eliminate the repeating groups, change the table from a project focus to an assignment focus
- This will create separate rows for each employee assigned to each project, converting the
multivalued attributes into single-valued attributes.
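The change from a project focus to an assignment focus can be sketched in Python as follows. This is only an illustration: the nested EMPLOYEES list stands in for the repeating group, the function name to_1nf is hypothetical, and the values for the second employee (EMP_NUM, job class, rate, hours) are assumed rather than taken from the notes.

```python
def to_1nf(projects):
    """Flatten one-row-per-project data into one row per (project, employee) assignment."""
    rows = []
    for project in projects:
        for employee in project["EMPLOYEES"]:
            rows.append({
                "PROJ_NUM": project["PROJ_NUM"],
                "PROJ_NAME": project["PROJ_NAME"],
                **employee,  # single-valued employee attributes
            })
    return rows


# Project-focused data: the EMPLOYEES list is the repeating group.
unnormalized = [
    {
        "PROJ_NUM": 15,
        "PROJ_NAME": "Evergreen",
        "EMPLOYEES": [
            {"EMP_NUM": 103, "EMP_NAME": "June E. Arbough",
             "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50, "HOURS": 23.8},
            # Assumed values for illustration only.
            {"EMP_NUM": 101, "EMP_NAME": "John G. News",
             "JOB_CLASS": "Database Designer", "CHG_HOUR": 105.00, "HOURS": 19.4},
        ],
    },
]

assignments = to_1nf(unnormalized)
for row in assignments:
    print(row)
```

Each resulting row describes one employee's assignment to one project, so every cell holds a single value.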
Step 2: Identify the Primary Key
- Even a casual observer will note that PROJ_NUM is not an adequate primary key because the
project number does not uniquely identify each row
- To maintain a proper primary key that will uniquely identify any attribute value, the new key
must be composed of a combination of PROJ_NUM and EMP_NUM
- Example:
- If you know that PROJ_NUM = 15 and EMP_NUM = 103, the entries for the attributes
PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS must be Evergreen, June E.
Arbough, Elect. Engineer, $84.50, and 23.8, respectively.
- The identification of the PK in Step 2 means that you have already identified the following
dependency
o PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
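A quick way to see why PROJ_NUM alone fails as a primary key is to test candidate keys for uniqueness. The sketch below assumes a few assignment rows; the (15, 103) pair comes from the example above, while the other key values and the function name is_unique_key are hypothetical.

```python
def is_unique_key(rows, key_attrs):
    """Return True if no two rows share the same values for key_attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[attr] for attr in key_attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


# Only the key columns matter here; values other than (15, 103) are assumed.
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 103},
    {"PROJ_NUM": 15, "EMP_NUM": 101},
    {"PROJ_NUM": 18, "EMP_NUM": 103},
]

print(is_unique_key(rows, ["PROJ_NUM"]))             # project number repeats
print(is_unique_key(rows, ["EMP_NUM"]))              # employee number repeats
print(is_unique_key(rows, ["PROJ_NUM", "EMP_NUM"]))  # the combination does not
```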
Note:
1. Achieving 1NF is not sufficient to address all of the anomalies that existed in the original
structure.
2. 1NF has dealt with the repeating groups and ensured that the table conforms to the
requirements for a relational table; however, anomalies remain.
- There are other dependencies besides the primary key dependency. For example, the
project number determines the project name. In other words, the project name is dependent on
the project number. You can write that dependency as: PROJ_NUM → PROJ_NAME
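A dependency such as PROJ_NUM → PROJ_NAME can be checked against sample rows with a short Python sketch. The function name fd_holds and all row values other than the Evergreen and June E. Arbough entries are assumptions. Note that sample data can only refute a dependency; whether it truly holds is a matter of the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    mapping = {}
    for row in rows:
        left = tuple(row[attr] for attr in lhs)
        right = tuple(row[attr] for attr in rhs)
        # setdefault stores the first right-hand value seen for each left-hand value.
        if mapping.setdefault(left, right) != right:
            return False
    return True


# Illustrative 1NF rows; project 18 and employee 101 are assumed values.
rows = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103, "EMP_NAME": "June E. Arbough"},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101, "EMP_NAME": "John G. News"},
    {"PROJ_NUM": 18, "PROJ_NAME": "Project 18", "EMP_NUM": 103, "EMP_NAME": "June E. Arbough"},
]

print(fd_holds(rows, ["PROJ_NUM"], ["PROJ_NAME"]))  # consistent in the sample
print(fd_holds(rows, ["PROJ_NUM"], ["EMP_NAME"]))   # violated in the sample
```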
dependency diagram
- is very helpful in getting a bird’s-eye view of all the relationships among a table’s attributes,
and its use makes it less likely that you will overlook an important dependency.
Note:
1. The primary key attributes are bold, underlined, and in a different color.
2. The arrows above the attributes indicate all desirable dependencies—that is, dependencies
based on the primary key.
- Example: note that the entity’s attributes are dependent on the combination of PROJ_NUM and
EMP_NUM.
3. The arrows below the dependency diagram indicate less desirable dependencies. Two types of
such dependencies exist
a. Partial dependencies. You need to know only the PROJ_NUM to determine the
PROJ_NAME; that is, the PROJ_NAME is dependent on only part of the primary key. Also,
you need to know only the EMP_NUM to find the EMP_NAME, the JOB_CLASS, and the
CHG_HOUR. A dependency based on only a part of a composite primary key is a partial
dependency.
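b. Transitive dependencies. Note that CHG_HOUR is dependent on JOB_CLASS. Because
neither CHG_HOUR nor JOB_CLASS is a prime attribute (that is, neither attribute is at least
part of a key), this dependency is a transitive dependency: a dependency of one nonprime
attribute on another nonprime attribute.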
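The distinction between the dependency types can be expressed as a small classifier. This is a simplified sketch: it assumes the table has a single candidate key (the primary key), and the function name classify_dependency and its return labels are illustrative choices.

```python
def classify_dependency(determinant, dependent, primary_key):
    """Classify a dependency relative to a table's only candidate key.

    Simplifying assumption: the primary key is the only candidate key,
    so an attribute is prime exactly when it belongs to primary_key.
    """
    det, dep, pk = set(determinant), set(dependent), set(primary_key)
    if det == pk:
        return "primary-key"   # desirable: determined by the whole key
    if det and det < pk:
        return "partial"       # determined by only part of a composite key
    if not (det & pk) and not (dep & pk):
        return "transitive"    # nonprime attribute determines nonprime attribute
    return "other"


PK = ["PROJ_NUM", "EMP_NUM"]

print(classify_dependency(PK, ["HOURS"], PK))
print(classify_dependency(["PROJ_NUM"], ["PROJ_NAME"], PK))
print(classify_dependency(["EMP_NUM"], ["EMP_NAME", "JOB_CLASS", "CHG_HOUR"], PK))
print(classify_dependency(["JOB_CLASS"], ["CHG_HOUR"], PK))
```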
Conversion to Second Normal Form (2NF)
- For each component of the primary key that acts as a determinant in a partial dependency,
create a new table with a copy of that component as the primary key.
- To construct the revised dependency diagram, write each key component on a separate line and
then write the original (composite) key on the last line.
- For example:
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
o Each component will become the key in a new table. In other words, the original table is
now divided into three tables (PROJECT, EMPLOYEE, and ASSIGNMENT)
- The dependencies for the original key components are found by examining the arrows below
the dependency diagram shown in Figure 6.3
- The attributes that are dependent in a partial dependency are removed from the original table
and placed in the new table with the dependency’s determinant.
- In other words, the three tables that result from the conversion to 2NF are given appropriate
names (PROJECT, EMPLOYEE, and ASSIGNMENT) and are described by the following relational
schemas:
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
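The split into three tables amounts to projecting the 1NF rows onto each determinant together with its dependent attributes, and dropping duplicate rows. The Python sketch below illustrates this under stated assumptions: the helper project_onto is a hypothetical name, and every row value other than the project 15 / employee 103 entry is made up for the example.

```python
def project_onto(rows, attrs):
    """Project rows onto attrs, dropping duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[attr] for attr in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Illustrative 1NF rows; only the first row's values come from the notes.
rows_1nf = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103, "EMP_NAME": "June E. Arbough",
     "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50, "HOURS": 23.8},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101, "EMP_NAME": "John G. News",
     "JOB_CLASS": "Database Designer", "CHG_HOUR": 105.00, "HOURS": 19.4},
    {"PROJ_NUM": 18, "PROJ_NAME": "Project 18", "EMP_NUM": 103, "EMP_NAME": "June E. Arbough",
     "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50, "HOURS": 10.0},
]

project_table = project_onto(rows_1nf, ["PROJ_NUM", "PROJ_NAME"])
employee_table = project_onto(rows_1nf, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
# HOURS is renamed ASSIGN_HOURS to match the ASSIGNMENT schema.
assignment_table = [
    {"PROJ_NUM": r["PROJ_NUM"], "EMP_NUM": r["EMP_NUM"], "ASSIGN_HOURS": r["HOURS"]}
    for r in rows_1nf
]

print(len(project_table), len(employee_table), len(assignment_table))
```

Project and employee facts now appear once each, while ASSIGNMENT keeps one row per project and employee pair.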
Conversion to Third Normal Form (3NF)
- For every transitive dependency, write a copy of its determinant as a primary key for a new
table
- If you have three different transitive dependencies, you will have three different determinants.
As with the conversion to 2NF, it is important that the determinant remain in the original table
to serve as a foreign key.
- Example:
o Figure 6.4 shows only one table that contains a transitive dependency. Therefore, write
the determinant for this transitive dependency as: JOB_CLASS
- Using Figure 6.4, identify the attributes that are dependent on each determinant identified in
Step 1
- Place the dependent attributes in the new tables with their determinants and remove them
from their original tables.
- Example:
o eliminate CHG_HOUR from the EMPLOYEE table shown in Figure 6.4 to leave the
EMPLOYEE table dependency definition as: EMP_NUM → EMP_NAME, JOB_CLASS
- Draw a new dependency diagram to show all of the tables you have defined in Steps 1 and 2.
- For the new table, the name JOB seems appropriate. Check all of the tables to make sure that
each table has a determinant and that no table contains inappropriate dependencies.
- In other words, after the 3NF conversion has been completed, your database will contain four
tables:
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
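As a rough sketch, the four relational schemas could be declared and queried with Python's built-in sqlite3 module. The column types, NOT NULL constraints, and the function name assignment_report are assumptions made for the example; the single sample row uses the values quoted earlier for project 15 and employee 103.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE JOB (
    JOB_CLASS TEXT PRIMARY KEY,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT NOT NULL REFERENCES JOB (JOB_CLASS)
);
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL NOT NULL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
""")

conn.execute("INSERT INTO PROJECT VALUES (?, ?)", (15, "Evergreen"))
conn.execute("INSERT INTO JOB VALUES (?, ?)", ("Elect. Engineer", 84.50))
conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", (103, "June E. Arbough", "Elect. Engineer"))
conn.execute("INSERT INTO ASSIGNMENT VALUES (?, ?, ?)", (15, 103, 23.8))


def assignment_report(conn):
    """Rejoin the four tables into the original assignment-level view."""
    return conn.execute("""
        SELECT p.PROJ_NAME, e.EMP_NAME, e.JOB_CLASS, j.CHG_HOUR, a.ASSIGN_HOURS
        FROM ASSIGNMENT a
        JOIN PROJECT  p ON p.PROJ_NUM  = a.PROJ_NUM
        JOIN EMPLOYEE e ON e.EMP_NUM   = a.EMP_NUM
        JOIN JOB      j ON j.JOB_CLASS = e.JOB_CLASS
        ORDER BY a.PROJ_NUM, a.EMP_NUM
    """).fetchall()


report = assignment_report(conn)
print(report)
```

With this layout, the hourly charge for a job class lives in exactly one JOB row, so changing it there is reflected in every assignment that the join returns.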