unit3
A functional dependency (FD) describes how one attribute of a relation determines another
attribute in a database management system (DBMS). Functional dependencies help you
maintain the quality of data in the database. A functional dependency is denoted by an
arrow →. The functional dependency of Y on X is written X → Y, meaning that the value of X
determines the value of Y. Functional dependencies play a vital role in telling good database
design from bad.
Example:
If we know the value of Employee number, we can obtain Employee Name, city, salary, etc.
We can therefore say that city, Employee Name, and salary are functionally dependent on
Employee number.
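A functional dependency can be tested mechanically against sample rows. The following is a minimal sketch in Python; the function name `fd_holds` and the employee rows are hypothetical and exist only for this illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, not taken from the notes.
employees = [
    {"emp_no": 1, "name": "Asha", "city": "Pune", "salary": 50000},
    {"emp_no": 2, "name": "Ravi", "city": "Pune", "salary": 42000},
    {"emp_no": 3, "name": "Asha", "city": "Delhi", "salary": 50000},
]

print(fd_holds(employees, ["emp_no"], ["name", "city", "salary"]))  # True
print(fd_holds(employees, ["name"], ["city"]))  # False
```

Note that sample rows can only refute a functional dependency. Whether it really holds is a statement about the business rules, not about one snapshot of the data.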
Use an entity relation diagram (ERD) to provide the big picture, or macro view, of an
organization’s data requirements and operations. This is created through an iterative process
that involves identifying relevant entities, their attributes and their relationships.
● Multivalued dependency:
● Trivial functional dependency:
● Non-trivial functional dependency:
● Transitive dependency:
Multivalued dependency occurs in the situation where there are multiple independent
multivalued attributes in a single table. A multivalued dependency is a complete constraint
between two sets of attributes in a relation. It requires that certain tuples be present in a
relation.
Example:
car_model ->> colour
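A functional dependency X -> Y is called trivial if the set of attributes Y is a subset of the
set of attributes X.
For example:
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Here {Emp_id, Emp_name} -> {Emp_id} is a trivial functional dependency, because Emp_id is a
subset of {Emp_id, Emp_name}.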
A non-trivial functional dependency occurs when A -> B holds and B is not a subset of A.
Example:
{Company} -> {CEO} (if we know the company, we know the CEO's name)
CEO is not a subset of Company, so this is a non-trivial functional dependency.
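The difference between trivial and non-trivial dependencies comes down to a subset test. A minimal Python sketch, with `is_trivial` as a hypothetical helper name:

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)


print(is_trivial({"Emp_id", "Emp_name"}, {"Emp_id"}))  # True
print(is_trivial({"Company"}, {"CEO"}))  # False
```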
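Transitive dependency:
A transitive dependency is a functional dependency that follows indirectly from two other
functional dependencies.
Example:
Company CEO Age
Alibaba Jack Ma 54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore {Company} -> {Age} should hold: if we know the company name, we can find its
CEO's age.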
Note: You need to remember that transitive dependency can only occur in a relation of three
or more attributes.
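One way to derive a transitive dependency is to compute the attribute closure: keep applying the known dependencies until no new attribute can be added. The sketch below assumes the two dependencies {Company} -> {CEO} and {CEO} -> {Age}; the function name `closure` is hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once the whole determinant is known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Assumed dependencies for the Company/CEO/Age example.
fds = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]

print("Age" in closure({"Company"}, fds))  # True
print("Company" in closure({"CEO"}, fds))  # False
```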
What is Normalization?
Normalization is a method of organizing the data in the database that helps you avoid
data redundancy and insertion, update, and deletion anomalies. It is a process of analyzing
the relation schemas based on their functional dependencies and primary keys.
Normalization is inherent to relational database theory. It may duplicate some data (such as
key attributes) across tables, and it usually results in the creation of additional tables.
Advantages of functional dependency
● Functional dependency helps avoid data redundancy, so the same data is not repeated at
multiple locations in the database
● It helps you maintain the quality of data in the database
● It helps you define the meanings and constraints of databases
● It helps you identify bad designs
● It helps you find facts regarding the database design
What Is Normalization?
Normalization is the branch of relational theory that provides design insights. It is the process
of determining how much redundancy exists in a table. The goals of normalization are to:
Normalization may also be defined as a database design technique that reduces data
redundancy and eliminates undesirable characteristics such as insertion, update, and
deletion anomalies. Normalization rules divide larger tables into smaller tables and link
them using relationships. The purpose of normalization in SQL is to eliminate redundant
(repetitive) data and ensure data is stored logically.
The inventor of the relational model, Edgar Codd, proposed the theory of normalization with
the introduction of the First Normal Form, and he continued to extend the theory with the
Second and Third Normal Forms. Later he joined Raymond F. Boyce to develop the theory of
Boyce-Codd Normal Form.
Normal Forms
All the tables in any database can be in one of the normal forms we will discuss next. Ideally
we only want minimal redundancy for PK to FK. Everything else should be derived from
other tables. There are five normal forms.
First Normal Form (1NF)
In the first normal form, only single values are permitted at the intersection of each row and
column; hence, there are no repeating groups.
To normalize a relation that contains a repeating group, remove the repeating group and form
two new relations.
The PK of the new relation is a combination of the PK of the original relation plus an
attribute from the newly created relation for unique identification.
We will use the Student_Grade_Report table below, from a School database, as our
example to explain the process for 1NF.
Student_Grade_Report (StudentNo, StudentName, Major, CourseNo, CourseName,
InstructorNo, InstructorName, InstructorLocation, Grade)
● In the Student Grade Report table, the repeating group is the course information. A
student can take many courses.
● Remove the repeating group. In this case, it’s the course information for each student.
● Identify the PK for your new table.
● The PK must uniquely identify the attribute value (StudentNo and CourseNo).
● After removing all the attributes related to the course and student, you are left with the
student course table (StudentCourse).
● The Student table (Student) is now in first normal form with the repeating group
removed.
● The two new tables are shown below.
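The split described in the steps above can be sketched in Python. The attribute names come from Student_Grade_Report; the sample rows and the helper `project` are hypothetical, and the report is represented here as one flat row per student and course.

```python
STUDENT_ATTRS = ["StudentNo", "StudentName", "Major"]
STUDENT_COURSE_ATTRS = ["StudentNo", "CourseNo", "CourseName", "InstructorNo",
                        "InstructorName", "InstructorLocation", "Grade"]
REPORT_ATTRS = STUDENT_ATTRS + STUDENT_COURSE_ATTRS[1:]

# Hypothetical rows, one per (student, course) pair.
raw = [
    (1, "Lee", "Math", "C10", "Algebra", 7, "Rao", "B2", "A"),
    (1, "Lee", "Math", "C20", "Logic", 8, "Kim", "B3", "B"),
    (2, "Omar", "Physics", "C10", "Algebra", 7, "Rao", "B2", "C"),
]
report = [dict(zip(REPORT_ATTRS, values)) for values in raw]


def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


student = project(report, STUDENT_ATTRS)  # PK: StudentNo
student_course = project(report, STUDENT_COURSE_ATTRS)  # PK: StudentNo + CourseNo

print(len(student), len(student_course))  # 2 3
```

Each student now appears once in `student`, while `student_course` keeps one row per course taken and is identified by the combination of StudentNo and CourseNo.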
Second Normal Form (2NF)
For the second normal form, the relation must first be in 1NF. A relation in 1NF is
automatically in 2NF if its PK comprises a single attribute.
If the relation has a composite PK, then each non-key attribute must be fully dependent on
the entire PK and not on a subset of the PK (i.e., there must be no partial dependency or
augmentation).
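Partial dependencies can be found by checking what each proper subset of a composite PK determines. The sketch below applies this to the StudentCourse table; the functional dependencies listed in `fds` are assumptions made for illustration (the notes do not list them), and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(all_attrs, pk, fds):
    """Map each proper subset of pk to the non-key attributes it determines."""
    non_key = set(all_attrs) - set(pk)
    found = {}
    for size in range(1, len(pk)):
        for subset in combinations(sorted(pk), size):
            dependents = closure(subset, fds) & non_key
            if dependents:
                found[subset] = dependents
    return found


attrs = {"StudentNo", "CourseNo", "CourseName", "InstructorNo",
         "InstructorName", "InstructorLocation", "Grade"}
pk = {"StudentNo", "CourseNo"}
# Assumed dependencies, not stated in the notes.
fds = [
    ({"StudentNo", "CourseNo"}, {"Grade"}),
    ({"CourseNo"}, {"CourseName", "InstructorNo"}),
    ({"InstructorNo"}, {"InstructorName", "InstructorLocation"}),
]

result = partial_dependencies(attrs, pk, fds)
print(sorted(result[("CourseNo",)]))
# ['CourseName', 'InstructorLocation', 'InstructorName', 'InstructorNo']
```

Under these assumptions only Grade depends on the whole key, so the course and instructor attributes would move to a table keyed by CourseNo.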
Third Normal Form (3NF)
To be in third normal form, the relation must be in second normal form. Also all transitive
dependencies must be removed; a non-key attribute may not be functionally dependent on
another non-key attribute.
● Eliminate all dependent attributes in transitive relationship(s) from each of the tables
that have a transitive relationship.
● Create new table(s) with removed dependency.
● Check new table(s) as well as table(s) modified to make sure that each table has a
determinant and that no table contains inappropriate dependencies.
● See the four new tables below.
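The steps above can be sketched as a small Python routine that moves attributes depending on a non-key determinant into a new table. The table used here (a course table that still carries instructor details) and its dependencies are assumptions for illustration, and `split_transitive` is a hypothetical helper, not a complete 3NF synthesis algorithm.

```python
def split_transitive(attrs, pk, fds):
    """Move attributes that depend on a non-key determinant into new tables."""
    remaining = set(attrs)
    new_tables = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        # A determinant made only of non-key attributes signals a transitive dependency.
        if lhs.isdisjoint(pk) and lhs <= remaining:
            moved = (rhs & remaining) - lhs - set(pk)
            if moved:
                new_tables.append((lhs, lhs | moved))  # (key, attributes)
                remaining -= moved
    return remaining, new_tables


# Assumed table and dependencies, not stated in the notes.
attrs = {"CourseNo", "CourseName", "InstructorNo",
         "InstructorName", "InstructorLocation"}
pk = {"CourseNo"}
fds = [
    ({"CourseNo"}, {"CourseName", "InstructorNo"}),
    ({"InstructorNo"}, {"InstructorName", "InstructorLocation"}),
]

course, extra = split_transitive(attrs, pk, fds)
print(sorted(course))  # ['CourseName', 'CourseNo', 'InstructorNo']
print(sorted(extra[0][1]))  # ['InstructorLocation', 'InstructorName', 'InstructorNo']
```

The determinant InstructorNo stays behind in the course table, where it acts as a foreign key to the new instructor table.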
At this stage, there should be no anomalies in third normal form. Let’s look at the
dependency diagram (Figure 12.1) for this example. The first step is to remove repeating
groups, as discussed above.
To recap the normalization process for the School database, review the dependencies shown
in Figure 12.1.
Figure 12.1 Dependency diagram, by A. Watt.
Boyce-Codd Normal Form (BCNF)
When a table has more than one candidate key, anomalies may result even though the relation
is in 3NF. Boyce-Codd normal form is a special case of 3NF. A relation is in BCNF if, and
only if, every determinant is a candidate key.
BCNF Example 1
St_Maj_Adv table
Student_id Major Advisor
111 Physics Smith
671 Physics White
The semantic rules (business rules applied to the database) for this table are:
The functional dependencies for this table are listed below. The first one is a candidate key;
the second is not.
To reduce the St_Maj_Adv relation to BCNF, you create two new tables:
St_Adv table
Student_id Advisor
111 Smith
111 Chan
320 Dobbs
671 White
803 Smith
Adv_Maj table
Advisor Major
Smith Physics
Chan Music
Dobbs Math
White Physics
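The BCNF condition can be checked with attribute closures. The sketch below assumes the two dependencies (Student_id, Major) -> Advisor and Advisor -> Major for St_Maj_Adv; they are inferred from the decomposition and are not listed in the notes. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attrs)]


# Assumed dependencies for St_Maj_Adv.
st_maj_adv = {"Student_id", "Major", "Advisor"}
fds = [({"Student_id", "Major"}, {"Advisor"}), ({"Advisor"}, {"Major"})]
print(bcnf_violations(st_maj_adv, fds))  # [({'Advisor'}, {'Major'})]

# After the decomposition, Adv_Maj keeps Advisor -> Major with Advisor as its key,
# and St_Adv carries no non-trivial dependency.
print(bcnf_violations({"Advisor", "Major"}, [({"Advisor"}, {"Major"})]))  # []
print(bcnf_violations({"Student_id", "Advisor"}, []))  # []
```

Advisor determines Major but is not a key of St_Maj_Adv, which is why the relation is split into St_Adv and Adv_Maj.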
BCNF Example 2
A relation is in BCNF if, and only if, every determinant is a candidate key. We need to create
a table that incorporates the first three FDs (Client_Interview2 table) and another table
(StaffRoom table) for the fourth FD.
Client_Interview2 table
StaffRoom table
Join Dependency:
A join dependency is a generalization of a multivalued dependency. A JD {R1, R2, ..., Rn} is
said to hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition of R.
There is no set of sound and complete inference rules for JDs.
Inclusion Dependency:
An Inclusion Dependency is a statement of the form that some columns of a relation
are contained in other columns. A foreign key constraint is an example of inclusion
dependency.
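An inclusion dependency can be checked by comparing the sets of values in the two column lists. The sketch below uses the St_Adv and Adv_Maj rows from BCNF Example 1; the function name `inclusion_holds` is hypothetical.

```python
def inclusion_holds(child_rows, child_cols, parent_rows, parent_cols):
    """Return True if every child_cols value also appears in parent_cols."""
    child = {tuple(row[c] for c in child_cols) for row in child_rows}
    parent = {tuple(row[c] for c in parent_cols) for row in parent_rows}
    return child <= parent


st_adv = [
    {"Student_id": 111, "Advisor": "Smith"},
    {"Student_id": 111, "Advisor": "Chan"},
    {"Student_id": 320, "Advisor": "Dobbs"},
    {"Student_id": 671, "Advisor": "White"},
    {"Student_id": 803, "Advisor": "Smith"},
]
adv_maj = [
    {"Advisor": "Smith", "Major": "Physics"},
    {"Advisor": "Chan", "Major": "Music"},
    {"Advisor": "Dobbs", "Major": "Math"},
    {"Advisor": "White", "Major": "Physics"},
]

# St_Adv.Advisor behaves like a foreign key into Adv_Maj.Advisor.
print(inclusion_holds(st_adv, ["Advisor"], adv_maj, ["Advisor"]))  # True
print(inclusion_holds(adv_maj, ["Major"], st_adv, ["Advisor"]))  # False
```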
During the normalization process of database design, make sure that proposed entities meet
required normal form before table structures are created. Many real-world databases have
been improperly designed or burdened with anomalies if improperly modified during the
course of time. You may be asked to redesign and modify existing databases. This can be a
large undertaking if the tables are not properly normalized.
Key Terms and Abbreviations
first normal form (1NF): only single values are permitted at the intersection of each row and
column so there are no repeating groups
second normal form (2NF): the relation must be in 1NF and every non-key attribute must be
fully dependent on the entire PK; this holds automatically when the PK comprises a single
attribute
third normal form (3NF): the relation must be in 2NF and all transitive dependencies must be
removed; a non-key attribute may not be functionally dependent on another non-key attribute
A multivalued dependency arises when two or more independent relations are kept in a single
relation. Put differently, a multivalued dependency occurs when the presence of one or more
rows in a table implies the presence of one or more other rows in that same table: two
attributes (or columns) in a table are independent of one another, but both depend on a third
attribute. A multivalued dependency always requires at least three attributes because it
consists of at least two attributes that are dependent on a third.
For a dependency A -> B, if multiple values of B exist for a single value of A, then the table
may have a multivalued dependency. For A ->> B to be a multivalued dependency, the table
should have at least three attributes A, B, and C, and B and C should be independent of each
other. For example,
Person ->> mobile,
Person ->> food_likes
This is read as “person multidetermines mobile” and “person multidetermines food_likes.”
Note that a functional dependency is a special case of multivalued dependency. In a
functional dependency X -> Y, every x determines exactly one y, never more than one.
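A multivalued dependency X ->> Y can be tested on sample rows: for each value of X, the Y values must be combined with every value of the remaining attributes. A minimal sketch in Python, with hypothetical sample rows and a hypothetical function name `mvd_holds`:

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Return True if x ->> y holds in rows (a non-empty list of dicts)."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Every y value must be paired with every z value for the same x.
        if pairs != set(product(ys, zs)):
            return False
    return True


# Hypothetical sample data.
people = [
    {"person": "Ann", "mobile": "111", "food_likes": "pizza"},
    {"person": "Ann", "mobile": "111", "food_likes": "dosa"},
    {"person": "Ann", "mobile": "222", "food_likes": "pizza"},
    {"person": "Ann", "mobile": "222", "food_likes": "dosa"},
]

print(mvd_holds(people, ["person"], ["mobile"]))  # True
print(mvd_holds(people[:3], ["person"], ["mobile"]))  # False
```

Removing one row breaks the dependency, which illustrates why a multivalued dependency requires certain tuples to be present.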
Table – R1
SID SNAME
S1 A
S2 B
Table – R2
CID CNAME
C1 C
C2 D
Table – R1 X R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
Example –
Table – R1
COMPANY PRODUCT
C1 pendrive
C1 mic
C2 speaker
C1 speaker
Company ->> Product
Table – R2
AGENT COMPANY
Aman C1
Aman C2
Mohan C1
Agent ->> Company
Table – R3
AGENT PRODUCT
Aman pendrive
Aman mic
Aman speaker
Mohan speaker
Agent ->> Product
Table – R1⋈R2⋈R3
COMPANY PRODUCT AGENT
C1 pendrive Aman
C1 mic Aman
C2 speaker Aman
C1 speaker Aman
C1 speaker Mohan
A relation R is in 5NF if and only if every join dependency in R is implied by the candidate
keys of R. A relation decomposed into two relations must have the lossless-join property,
which ensures that no spurious or extra tuples are generated when the relations are reunited
through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should already be in 4NF.
2. It cannot be losslessly decomposed any further (no non-trivial join dependency remains)
Example – Consider the above schema with the rule “if a company makes a product and an
agent is an agent for that company, then he always sells that product for the company”. Under
these circumstances, the ACP table is:
Table – ACP
AGENT COMPANY PRODUCT
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is decomposed into the three relations shown next, and their natural join is
then computed:
Table – R1
AGENT COMPANY
A1 PQR
A1 XYZ
A2 PQR
Table – R2
AGENT PRODUCT
A1 Nut
A1 Bolt
A2 Nut
Table – R3
COMPANY PRODUCT
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The natural join of R1 and R3 over ‘Company’, followed by the natural join of the result R13
with R2 over ‘Agent’ and ‘Product’, yields the table ACP.
Hence, in this example, the redundancy is eliminated and the decomposition of ACP is a
lossless-join decomposition. ACP itself satisfies a join dependency that is not implied by its
candidate key, so it is not in 5NF; the three projections R1, R2, and R3 are in 5NF.
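The rejoining step can be checked in a few lines of Python using the ACP, R1, R2, and R3 rows from this example. The helper names `table`, `natural_join`, and `as_set` are hypothetical; this is a sketch for small in-memory tables, not an efficient join.

```python
def table(attrs, tuples):
    """Build a list of dict rows from attribute names and value tuples."""
    return [dict(zip(attrs, values)) for values in tuples]


def natural_join(left, right):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in shared):
                merged = {**l_row, **r_row}
                if merged not in joined:
                    joined.append(merged)
    return joined


def as_set(rows):
    """Order-independent view of a table, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


acp = table(("Agent", "Company", "Product"), [
    ("A1", "PQR", "Nut"), ("A1", "PQR", "Bolt"), ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"), ("A2", "PQR", "Nut"),
])
r1 = table(("Agent", "Company"), [("A1", "PQR"), ("A1", "XYZ"), ("A2", "PQR")])
r2 = table(("Agent", "Product"), [("A1", "Nut"), ("A1", "Bolt"), ("A2", "Nut")])
r3 = table(("Company", "Product"), [
    ("PQR", "Nut"), ("PQR", "Bolt"), ("XYZ", "Nut"), ("XYZ", "Bolt"),
])

r13 = natural_join(r1, r3)  # joined over Company
rejoined = natural_join(r13, r2)  # joined over Agent and Product

print(len(r13))  # 6
print(as_set(rejoined) == as_set(acp))  # True
```

The intermediate result R13 has six rows, one more than ACP: it contains the spurious tuple (A2, PQR, Bolt), which the final join with R2 removes. All three projections are needed to recover ACP exactly.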
Summary