CoSc 2041 Chapter 4-1
Chapter Four
Logical Database Design
The whole purpose of database design is to create an accurate representation of the data, the
relationships between the data, and the business constraints pertinent to the organization. One can
use one or more techniques to design a database. One such technique is the E-R model.
In this chapter we use another technique, known as “Normalization”, which places a different
emphasis on database design: it defines the structure of a database with a specific data model.
Logical design is the process of constructing a model of the information used in an enterprise based
on a specific data model (e.g. relational, hierarchical, network, or object), but independent of a
particular DBMS and other physical considerations.
The purpose of normalization is to find the suitable set of relations that supports the data
requirements of an enterprise.
A suitable set of relations has the following characteristics;
Before applying the rules of the relational data model, the first step is to convert the conceptual
design into a form suitable for the relational logical model, that is, into tables.
Fundamentals of Database Systems
Rule 3: Relationships: a relationship is mapped by using a foreign key attribute. A foreign key
is a primary or candidate key of one relation that is used to create an association between tables.
✓ For a relationship with One-to-One Cardinality: post the primary or candidate key of
one of the tables into the other as a foreign key. Where one entity has only partial
participation in the relationship, it is recommended to post the candidate key of
the partial participant to the total participant, which avoids wasting storage on
null values in the foreign key attribute. E.g.: for a relationship between Employee
and Department where an employee manages a department, the cardinality is one-to-one,
as one employee manages only one department and one department has one
manager. Here the PK of Employee can be posted to Department, or the PK of
Department can be posted to Employee. But Employee has partial
participation in the relationship "Manages", as not all employees are managers of
departments. Thus, even though both ways are possible, it is recommended to post the
primary key of Employee to the Department table as a foreign key.
✓ For a relationship with One-to-Many Cardinality: post the primary key or candidate
key from the “one” side as a foreign key attribute to the “many” side. E.g.: for a
relationship called “Belongs To” between Employee (Many) and Department (One), the
primary or candidate key of the one side, which is Department, should be posted to the
many side, which is the Employee table.
✓ For a relationship having the Associative Entity property: where the relationship
has its own attributes (associative entity), one has to create a new table for the
associative entity and post the primary key or candidate key from each participating entity
as foreign key attributes in the new table.
The following ER diagram has been designed to represent the requirement of an organization to
capture Employee, Department and Project information. An employee works for a department, and an
employee might be assigned to manage a department. Employees might participate in different
projects within the organization. An employee might also be assigned to lead a project, in which
case the starting and ending dates of his/her project leadership and the bonus will be registered.
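[Figure: ER diagram. Entities and attributes: Employee (EID, Name composed of FName and LName,
Salary, Tel); Department (DID, DName, DLoc); Project (PID, PName, PFund). Relationships: Manages
(Employee 1 to Department 1), WorksFor (Employee M to Department 1), Participate (Employee and
Project), Leads (Employee and Project, with attributes StartDate, EndDate and PBonus).]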
After drawing the ER diagram, the next step is to map it into relational schemas so that
the rules of the relational data model can be tested for each relational schema. The mapping is
done first for the entities and then for the relationships, based on the mapping rules. The mapping
has been done as follows.
At the end of the mapping we will have the following relational schema (tables) for the logical
database design phase.
Department
DID DName DLoc MEID
Project
PID PName PFund
Telephone
EID Tel
Employee
EID FName LName Salary EDID
Emp_Partc_Project
EID PID
Emp_Lead_Project
EID PID PBonus StartDate EndDate
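As a rough sketch, the six tables above could be declared with Python's built-in sqlite3 module as shown below. The notes give only table and column names, so the column types, the composite primary keys chosen for Telephone and the two Employee-Project tables, and the UNIQUE constraint on MEID are all assumptions made for illustration.

```python
import sqlite3

# Column types and key choices are assumptions; the notes list only names.
SCHEMA = """
CREATE TABLE Department (
    DID   INTEGER PRIMARY KEY,
    DName TEXT,
    DLoc  TEXT,
    MEID  INTEGER UNIQUE REFERENCES Employee(EID)  -- Manages (1:1)
);
CREATE TABLE Employee (
    EID    INTEGER PRIMARY KEY,
    FName  TEXT,
    LName  TEXT,
    Salary REAL,
    EDID   INTEGER REFERENCES Department(DID)      -- WorksFor (M:1)
);
CREATE TABLE Telephone (
    EID INTEGER REFERENCES Employee(EID),
    Tel TEXT,
    PRIMARY KEY (EID, Tel)
);
CREATE TABLE Project (
    PID   INTEGER PRIMARY KEY,
    PName TEXT,
    PFund REAL
);
CREATE TABLE Emp_Partc_Project (
    EID INTEGER REFERENCES Employee(EID),
    PID INTEGER REFERENCES Project(PID),
    PRIMARY KEY (EID, PID)
);
CREATE TABLE Emp_Lead_Project (
    EID       INTEGER REFERENCES Employee(EID),
    PID       INTEGER REFERENCES Project(PID),
    PBonus    REAL,
    StartDate TEXT,
    EndDate   TEXT,
    PRIMARY KEY (EID, PID)
);
"""

def build_database():
    """Create an in-memory database with foreign key checking enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn
```

Because Department and Employee reference each other, a department row would be inserted first with MEID left empty, then its employees, and only then would MEID be updated to point at the manager.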
After converting the ER diagram into tables, the next phase is applying the process of
normalization, which is a collection of rules each table should satisfy.
Normalization
A relational database is merely a collection of data, organized in a particular manner. As the father
of the relational database approach, Codd created a series of rules (tests) called normal forms that
help define that organization.
One of the best ways to determine what information should be stored in a database is to clarify
what questions will be asked of it and what data would be included in the answers.
Database normalization is a series of steps followed to obtain a database design that allows for
consistent storage and efficient access of data in a relational database. These steps reduce data
redundancy and the risk of data becoming inconsistent.
Normalization is the process of identifying the logical associations between data items and
designing a database that represents such associations without suffering from the update
anomalies, which are:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Normalization may reduce system performance since data will be cross referenced from many
tables. Thus, denormalization is sometimes used to improve performance, at the cost of reduced
consistency guarantees.
All the normalization rules aim to remove the update anomalies that may arise during data
manipulation after implementation. Update anomalies are the problems that can occur in an
insufficiently normalized table, and include:
1. Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the
places in the database where information about that new entry needs to be stored. Additionally,
it may be difficult to insert some data at all. In a properly normalized database, information
about a new entry needs to be inserted into only one place in the database; in an inadequately
normalized database, information about a new entry may need to be inserted into more than
one place and, human fallibility being what it is, some of the needed additional insertions may
be missed.
2. Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when
it is time to remove that entry. Additionally, deleting one piece of data may result in the loss
of other information. In a properly normalized database, information about an old, to-be-gotten-rid-of
entry needs to be deleted from only one place in the database; in an inadequately normalized
database, information about that old entry may need to be deleted from more than one place,
and, human fallibility being what it is, some of the needed additional deletions may be missed.
3. Modification anomalies
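A modification of a database involves changing the value of some attribute of a table. In a
properly normalized database table, a piece of information is stored in only one place, so
whatever the user modifies needs to be changed only once. In an inadequately normalized table,
the same information may be stored in several rows, and a modification that misses some of
them leaves the data inconsistent.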
To avoid the update anomalies in a given table, the solution is to decompose it into smaller tables
based on the rules of normalization. However, the decomposition must have two important properties:
a. The lossless-join property ensures that any instance of the original relation can be
identified from the instances of the smaller relations.
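The lossless-join property can be illustrated on a single table instance: project the rows onto the two smaller attribute sets, join the projections back on their common attributes, and compare with the original. The sketch below does this for three rows borrowed from the employee-skill table used later in this chapter. The function names are hypothetical, and passing the check on one instance only illustrates the idea; it does not prove the property for the schema.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, removing duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Join two projections on whatever attributes they share."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

rows = [
    {"EmpID": 12, "Skill": "SQL", "SkillType": "Database"},
    {"EmpID": 12, "Skill": "VB6", "SkillType": "Programming"},
    {"EmpID": 16, "Skill": "C++", "SkillType": "Programming"},
]

# Splitting on Skill (which determines SkillType) loses nothing.
print(is_lossless(rows, ["EmpID", "Skill"], ["Skill", "SkillType"]))      # True
# Splitting on SkillType creates spurious rows, e.g. employee 16 with VB6.
print(is_lossless(rows, ["EmpID", "SkillType"], ["Skill", "SkillType"]))  # False
```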
Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of information about the skill C++ and
its skill type is deleted from the database. We will then have no information about C++ and
its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot decide whether
Pascal is allowed as a value for skill and we have no clue about the type of skill that Pascal
should be categorized as.
Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look for
every occurrence of Helico and change the value of School_Add from Piazza to Mexico,
which is prone to error.
A database management system can work only with the information that we put explicitly
into its tables for a given database and into its rules for working with those tables, where
such rules are appropriate and possible.
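The deletion and modification anomalies described above can be reproduced on a handful of rows. The sketch below uses five rows taken from the employee-skill table that appears later in this chapter, stored as plain tuples; the helper names are hypothetical and chosen only for this illustration.

```python
# (EmpID, Skill, SkillType, School, SchoolAdd)
ROWS = [
    (12, "SQL", "Database", "AAU", "Sidist_Kilo"),
    (12, "VB6", "Programming", "Helico", "Piazza"),
    (16, "C++", "Programming", "Unity", "Gerji"),
    (16, "IP", "Programming", "Jimma", "Jimma City"),
    (65, "SQL", "Database", "Helico", "Piazza"),
]

def delete_employee(rows, emp_id):
    """Remove every row belonging to one employee."""
    return [row for row in rows if row[0] != emp_id]

def known_skills(rows):
    """Skills the table still knows anything about."""
    return {row[1] for row in rows}

def change_school_address(rows, school, new_address):
    """Update the address on every row of a school; also report how many rows changed."""
    touched = sum(1 for row in rows if row[3] == school)
    updated = [
        (emp, skill, stype, sch, new_address if sch == school else addr)
        for emp, skill, stype, sch, addr in rows
    ]
    return updated, touched

# Deletion anomaly: removing employee 16 also removes all knowledge of C++.
print("C++" in known_skills(delete_employee(ROWS, 16)))  # False

# Modification anomaly: one change of address has to touch two rows.
updated, touched = change_school_address(ROWS, "Helico", "Mexico")
print(touched)  # 2
```

If skills and schools were kept in tables of their own, deleting an employee would leave C++ in place and the address of Helico would be stored, and changed, exactly once.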
Data Dependency
The logical associations between data items that point the database designer in the direction of a
good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain values
of data item B always appear with certain values of data item A. If data item A is the
determinant data item and B the dependent data item, then the direction of the association is from
A to B and not vice versa.
The essence of this idea is that if the existence of something, call it A, implies that B must exist
and have a certain value, then we say that "B is functionally dependent on A." We also often
express this idea by saying that "A functionally determines B," or that "B is a function of A," or
that "A functionally governs B." Often, the notions of functionality and functional dependency are
expressed briefly by the statement, "If A, then B." It is important to note that the value of B must
be unique for a given value of A, i.e., any given value of A must imply just one and only one value
of B, for the relationship to qualify for the name "function." (However, this does not necessarily
prevent different values of A from implying the same value of B.)
However, for normalization, we are interested in finding 1:1 (one to one) dependencies, lasting for
all times (intension rather than extension of the database), and the determinant having the minimal
number of attributes.
X → Y holds if, whenever two tuples have the same value for X, they must also have the same value
for Y.
FDs are derived from the real-world constraints on the attributes, and they are properties of the
database intension, not its extension.
Example
Dinner Course    Type of Wine
Meat             Red
Fish             White
Cheese           Rose
Since the type of Wine served depends on the type of Dinner, we say Wine is functionally
dependent on Dinner.
Dinner → Wine
Since both Wine type and Fork type are determined by the Dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
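The definition of X → Y can be tested mechanically on a table instance: remember the Y value seen for each X value and report a violation as soon as the same X shows up with a different Y. The sketch below applies this to the Dinner and Wine columns of the example; the function name and the extra violating row are hypothetical. Since FDs are properties of the intension, such a check can refute a dependency on given data, but a passing result on one instance does not establish it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

dinner = [
    {"Dinner": "Meat", "Wine": "Red"},
    {"Dinner": "Fish", "Wine": "White"},
    {"Dinner": "Cheese", "Wine": "Rose"},
]
print(fd_holds(dinner, ["Dinner"], ["Wine"]))  # True

# Hypothetical extra row: Meat served with a second wine breaks Dinner -> Wine.
violating = dinner + [{"Dinner": "Meat", "Wine": "White"}]
print(fd_holds(violating, ["Dinner"], ["Wine"]))  # False
```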
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the primary
key (when we have a composite primary key), then that attribute is partially functionally dependent
on the primary key.
Conversely, if {A, B} → C holds and neither A → C nor B → C holds,
then C is fully functionally dependent on {A, B}.
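The difference between full and partial dependency can be checked on sample data by testing the dependency for the whole determinant and then for every proper subset of it. In the sketch below the rows are hypothetical, made up so that C depends on the whole of {A, B} while D depends on A alone; the function names are likewise only illustrative.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False  # a part of the determinant is enough: partial dependency
    return True

# Hypothetical instance: C needs both A and B, D needs only A.
rows = [
    {"A": 1, "B": 1, "C": "x", "D": "p"},
    {"A": 1, "B": 2, "C": "y", "D": "p"},
    {"A": 2, "B": 1, "C": "y", "D": "q"},
    {"A": 2, "B": 2, "C": "x", "D": "q"},
]
print(is_fully_dependent(rows, ["A", "B"], ["C"]))  # True
print(is_fully_dependent(rows, ["A", "B"], ["D"]))  # False
```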
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be an Animal.
Steps of Normalization:
We have various levels or steps in normalization called Normal Forms. The level of complexity,
the strength of the rule and the degree of decomposition increase as we move from a lower level
Normal Form to a higher one.
Each normal form listed below represents a stronger condition than the one before it.
Unnormalized Form (UNF):
Identify all data elements.
First Normal Form (1NF):
Find the key with which you can find all data, i.e. remove any repeating groups.
Second Normal Form (2NF):
Remove part-key dependencies (partial dependencies). Make all data dependent on the
whole key.
Third Normal Form (3NF):
Remove non-key dependencies (transitive dependencies). Make all data dependent on
nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to the third
normal form (there is no transitive dependency).
UNNORMALIZED
EmpID  FirstName  LastName  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe      Mekuria   SQL     Database     AAU     Sidist_Kilo  5
                            VB6     Programming  Helico  Piazza       8
16     Lemma      Alemu     C++     Programming  Unity   Gerji        6
                            IP      Programming  Jimma   Jimma City   4
28     Chane      Kebede    SQL     Database     AAU     Sidist_Kilo  10
65     Almaz      Belay     SQL     Database     Helico  Piazza       9
                            Prolog  Programming  Jimma   Jimma City   8
                            Java    Programming  AAU     Sidist_Kilo  6
24     Dereje     Tamiru    Oracle  Database     Unity   Gerji        5
94     Alem       Kebede    Cisco   Networking   AAU     Sidist_Kilo  7
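Moving this table to 1NF means removing the repeating group, so that every row carries exactly one skill together with the employee it belongs to. A minimal sketch, using the first two employees of the table above and modelling the repeating group as a list under a hypothetical "Skills" field:

```python
# Two employees from the table above; "Skills" holds the repeating group
# as (Skill, SkillType, School, SchoolAdd, SkillLevel) tuples.
UNNORMALIZED = [
    {"EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria",
     "Skills": [("SQL", "Database", "AAU", "Sidist_Kilo", 5),
                ("VB6", "Programming", "Helico", "Piazza", 8)]},
    {"EmpID": 16, "FirstName": "Lemma", "LastName": "Alemu",
     "Skills": [("C++", "Programming", "Unity", "Gerji", 6),
                ("IP", "Programming", "Jimma", "Jimma City", 4)]},
]

def to_1nf(records):
    """Emit one flat row per (employee, skill) pair, repeating the employee columns."""
    rows = []
    for rec in records:
        for skill, skill_type, school, school_add, level in rec["Skills"]:
            rows.append((rec["EmpID"], rec["FirstName"], rec["LastName"],
                         skill, skill_type, school, school_add, level))
    return rows

for row in to_1nf(UNNORMALIZED):
    print(row)
```

In the flattened rows EmpID alone no longer identifies a row; treating the pair (EmpID, Skill) as the key is an assumption that holds for this data, where no employee lists the same skill twice.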
EMP_PROJ rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive
Business rule: Whenever an employee participates in a project, he/she is entitled to an
incentive.
This schema is in 1NF since it has no repeating groups or multi-valued attributes.
To convert it to 2NF we need to remove all partial dependencies of non-key
attributes on part of the primary key.
FD1: {EmpID}→EmpName
FD2: {ProjNo}→ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo}→ Incentive
As we can see, some non-key attributes are partially dependent on part of the primary key.
This can be seen by analyzing the first two functional dependencies (FD1 and FD2). Thus,
each functional dependency, together with its dependent attributes, should be moved to a new
relation in which the determinant is the primary key.
EMPLOYEE
EmpID EmpName
PROJECT
ProjNo ProjName ProjLoc ProjFund ProjMangID
EMP_PROJ
EmpID ProjNo Incentive
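The decomposition into EMPLOYEE, PROJECT and EMP_PROJ amounts to projecting the original rows onto the attributes of FD1, FD2 and FD3 and dropping duplicates. The sketch below shows this; the notes give no data for EMP_PROJ, so the three sample rows and the function names are hypothetical.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, keeping one copy of each distinct tuple."""
    result = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in result:
            result.append(item)
    return result

def decompose_to_2nf(emp_proj_rows):
    """Split EMP_PROJ along FD1, FD2 and FD3."""
    employee = project(emp_proj_rows, ["EmpID", "EmpName"])                 # FD1
    proj = project(emp_proj_rows,
                   ["ProjNo", "ProjName", "ProjLoc", "ProjFund", "ProjMangID"])  # FD2
    emp_proj = project(emp_proj_rows, ["EmpID", "ProjNo", "Incentive"])     # FD3
    return employee, proj, emp_proj

# Hypothetical sample rows: two employees, two projects, three participations.
sample = [
    {"EmpID": 1, "ProjNo": "P1", "EmpName": "Abebe", "ProjName": "Payroll",
     "ProjLoc": "Addis Ababa", "ProjFund": 50000, "ProjMangID": 2, "Incentive": 300},
    {"EmpID": 1, "ProjNo": "P2", "EmpName": "Abebe", "ProjName": "Inventory",
     "ProjLoc": "Jimma", "ProjFund": 80000, "ProjMangID": 1, "Incentive": 450},
    {"EmpID": 2, "ProjNo": "P1", "EmpName": "Lemma", "ProjName": "Payroll",
     "ProjLoc": "Addis Ababa", "ProjFund": 50000, "ProjMangID": 2, "Incentive": 250},
]

employee, proj, emp_proj = decompose_to_2nf(sample)
print(len(employee), len(proj), len(emp_proj))  # 2 2 3
```

After the split each employee name and each project description is stored once, while the incentive stays with the (EmpID, ProjNo) pair it fully depends on.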
This schema is in 2NF since the primary key is a single attribute and there are no
repeating groups (multi-valued attributes).
Let’s take StudID, Year and Dormitary and see the dependencies.
STUDENT
StudID  Stud_F_Name  Stud_L_Name  Dept    Year
125/97  Abebe        Mekuria      InfoSc  1
654/95  Lemma        Alemu        Geog    3
842/95  Chane        Kebede       CompSc  3
165/97  Alem         Kebede       InfoSc  1
985/95  Almaz        Belay        Geog    3

DORM
Year  Dormitary
1     401
3     403
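The STUDENT and DORM tables are what results from removing the transitive dependency StudID → Year → Dormitary. The sketch below starts from a combined table, rebuilt here by attaching each student's dormitory according to the Year values above (this combined form is a reconstruction for illustration), and splits it while checking that Year really determines Dormitary. The function name is hypothetical.

```python
# Combined rows: (StudID, Stud_F_Name, Stud_L_Name, Dept, Year, Dormitary)
STUDENT_2NF = [
    ("125/97", "Abebe", "Mekuria", "InfoSc", 1, 401),
    ("654/95", "Lemma", "Alemu", "Geog", 3, 403),
    ("842/95", "Chane", "Kebede", "CompSc", 3, 403),
    ("165/97", "Alem", "Kebede", "InfoSc", 1, 401),
    ("985/95", "Almaz", "Belay", "Geog", 3, 403),
]

def to_3nf(rows):
    """Split off Year -> Dormitary into its own table."""
    student = [row[:5] for row in rows]  # everything except Dormitary
    dorm = {}
    for *_, year, dormitary in rows:
        # The same year must always map to the same dormitory.
        if dorm.setdefault(year, dormitary) != dormitary:
            raise ValueError("Year -> Dormitary does not hold for this data")
    return student, dorm

student, dorm = to_3nf(STUDENT_2NF)
print(dorm)  # {1: 401, 3: 403}
```

The dormitory of a student is then found by looking up the student's Year in DORM, so each Year to Dormitary fact is stored once instead of once per student.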
Generally, even though there are four additional levels of normalization, a table is said to be
normalized if it reaches 3NF. A database with all tables in 3NF is said to be a normalized
database.
A mnemonic for remembering the rationale for normalization up to 3NF could be the following:
The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:M
relationships are resolved independently if they are indeed independent, as shown below.
A →→ B
A →→ C
Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if
every join dependency in the table is a consequence of the candidate keys of the table.
Def: A table is in DKNF if every constraint on the table is a logical consequence of the
definition of keys and domains.
The underlying ideas in normalization are simple enough. Through normalization we want to design
for our relational database a set of tables that:
(1) Contain all the data necessary for the purposes that the database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.
Pitfalls of Normalization