Dbms Unit IV

The document discusses schema refinement through normalization, outlining its purpose in eliminating anomalies during database operations. It explains concepts such as functional dependency, various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), and the significance of lossless join and dependency preserving decomposition. Additionally, it provides examples of how normalization addresses issues like redundancy, update, insertion, and deletion anomalies in database management.
UNIT-IV:

• SCHEMA REFINEMENT (NORMALIZATION) :


1) Purpose of normalization or schema refinement.
2) Concept of functional dependency.
3) Normal forms based on functional dependency (1NF, 2NF and 3NF).
4) Concept of surrogate key, Boyce-Codd normal form (BCNF).
5) Lossless join and dependency preserving decomposition.
6) Fourth normal form (4NF).
7) Fifth normal form (5NF).
1) Purpose of Normalization or schema refinement.
NORMALIZATION: Normalization is a schema refinement process. It helps remove anomalies during insert, update and delete operations.
Definition: “Normalization is a process of decomposing relations to produce smaller and well-defined relations”.
NEED FOR NORMALIZATION:
Normalization is a refinement approach based on decomposition.
• Decomposition eliminates redundant storage and maintains data consistency.
• Redundant storage of information is the root cause of the problems described below.
PROBLEMS CAUSED BY REDUNDANCY:
• Storing information redundantly, that is, in more than one place within a database, can lead to several problems.
• The following table, which is in unnormalized form, gives a brief idea of the anomalies that can occur.
STUDENT_DETAILS:
S.NO  S.NAME    SUBJECT  MARKS  GRADE
1     VIKRANTH  DBMS     80     A+
2     ABHISHEK  CO       60     B
3     PRAVEEN   JAVA     70     A
4     ABHISHEK  FLAT     60     B
5     ABHISHEK  ES       15     F
6     PRAVEEN   DAA      70     A
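REDUNDANT STORAGE:
• Storing the same information repeatedly is known as redundant storage.
• In the table above, S.NO 2 and 4 repeat the same name, marks and grade (ABHISHEK, 60, B), and S.NO 3 and 6 repeat (PRAVEEN, 70, A). The pairing of marks 60 with grade B and of marks 70 with grade A is therefore stored twice, which is an example of redundant storage.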
UPDATE ANOMALIES:
Definition: “If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated.”
• If we change the grade awarded for 60 marks in the record with S.NO 2, we must make the same change in the record with S.NO 4; otherwise the data will be left in an inconsistent state.
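INSERTION ANOMALIES:
“It may not be possible to store certain information unless some other, unrelated, information is stored as well.”
• For example, the grade that corresponds to a given mark cannot be recorded in this table until some student has actually obtained that mark.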
DELETION ANOMALIES:
“It may not be possible to delete certain information without losing some other, unrelated, information as well.”
• If we delete the record with S.NO 5, the fact that 15 marks corresponds to grade F is lost as well, because it is not repeated in any other record.
DECOMPOSITION: “A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas that each contain a subset of the attributes of R and together include all attributes in R”.
We can decompose the above STUDENT_DETAILS relation into two relations.

STUDENT_DETAILS (SNO, SNAME, SUBJECT, MARKS, GRADE)

1. Student (sno, sname, subject, marks)
2. Marks (marks, grade)

STUDENT:
S.NO  S.NAME    SUBJECT  MARKS
1     VIKRANTH  DBMS     80
2     ABHISHEK  CO       60
3     PRAVEEN   JAVA     70
4     ABHISHEK  FLAT     60
5     ABHISHEK  ES       15
6     PRAVEEN   DAA      70

MARKS:
MARKS  GRADE
80     A+
70     A
60     B
15     F

Now, when the grade for 60 marks has to be changed (for example from B to C), it is enough to update it once, in the Marks table.
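The effect of this decomposition can be sketched in a few lines of Python. This is only an illustration: the names `student`, `marks` and `student_details` are assumptions made for the sketch, and the grades are taken from the unnormalized STUDENT_DETAILS table shown earlier.

```python
# Student(sno, sname, subject, marks): one row per student record.
student = [
    (1, "VIKRANTH", "DBMS", 80),
    (2, "ABHISHEK", "CO", 60),
    (3, "PRAVEEN", "JAVA", 70),
    (4, "ABHISHEK", "FLAT", 60),
    (5, "ABHISHEK", "ES", 15),
    (6, "PRAVEEN", "DAA", 70),
]

# Marks(marks, grade): each mark-to-grade fact is stored exactly once.
marks = {80: "A+", 70: "A", 60: "B", 15: "F"}


def student_details(student_rows, marks_table):
    """Rebuild STUDENT_DETAILS by joining Student and Marks on MARKS."""
    return [
        (sno, sname, subject, m, marks_table[m])
        for (sno, sname, subject, m) in student_rows
    ]


before = student_details(student, marks)

# A single update in Marks changes the grade for every student with 60 marks.
marks[60] = "C"
after = student_details(student, marks)
```

After the single assignment to `marks[60]`, the rows for S.NO 2 and S.NO 4 both show the new grade, so the two copies cannot drift apart.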
2) CONCEPT OF FUNCTIONAL DEPENDENCY:
• A functional dependency (FD) is a kind of integrity constraint (IC) that generalizes the concept of a key.
Definition: “Let R be a relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R satisfies the FD X → Y if the following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y”.

An FD X → Y essentially says that if two tuples agree on the values in attributes X, they must also agree on the values in attributes Y.
EXAMPLE: relation R

     A   B   C   D
t1   a1  b1  c1  d1
t2   a1  b1  c1  d2
t3   a1  b2  c2  d1
t4   a2  b1  c3  d1

Relation R satisfies the FD AB → C.


EXAMPLE: relation R
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d1
a2  b1  c3  d1
a1  b1  c2  d1

If we add the tuple (a1, b1, c2, d1) to the instance, the resulting instance violates the FD: the new tuple agrees with the first tuple on A and B but differs on C.
The relation R above therefore violates the FD AB → C.
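The definition can be tested mechanically on a relation instance. The following is a minimal sketch; the function name `satisfies_fd` and the dictionary representation of tuples are assumptions made for illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value; any later
        # row with the same X must carry the same Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


attrs = ("A", "B", "C", "D")
r = [dict(zip(attrs, t)) for t in [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b2", "c2", "d1"),
    ("a2", "b1", "c3", "d1"),
]]

holds_before = satisfies_fd(r, ["A", "B"], ["C"])

# Adding (a1, b1, c2, d1) gives two tuples that agree on AB but not on C.
r_extended = r + [dict(zip(attrs, ("a1", "b1", "c2", "d1")))]
holds_after = satisfies_fd(r_extended, ["A", "B"], ["C"])
```

Note that a check like this can only show that one instance satisfies or violates an FD; it cannot prove that the FD holds for the schema in general.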
X → Y is read as “X functionally determines Y”, or simply as “X determines Y”.
• A primary key constraint is a special case of an FD.
• The attributes in the key play the role of X, and the set of all attributes in the relation plays the role of Y.
A classic example of functional dependency is the employee-department relation.

Employee ID  Employee name  Department ID  Department name
0001         Abhinav        1              Human Resources
0002         Suresh         2              Marketing
0003         Chandra        1              Human Resources
0004         Dinesh         3              Sales

An employee can be a member of only one department, so the unique ID of that employee determines the department.
Employee ID → Employee Name
Employee ID → Department ID
In addition to these dependencies, the table also has a functional dependency through a non-key attribute:
Department ID → Department Name
EXAMPLE
Employee number  Employee name  Salary  City
1                Raju           50000   San Francisco
2                Ravi           38000   London
3                Kiran          25000   Tokyo

Employee number → Employee name
Employee number → Salary
Employee number → City
• NORMALIZATION
• Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics such as insertion, update and deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using relationships.
• Normal forms are used to reduce redundancy in database tables.
Types of Normal Forms: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
• 1. First Normal Form (1NF)
• A relation that contains a composite or multi-valued attribute violates first normal form. Equivalently, a relation is in first normal form if it does not contain any composite or multi-valued attribute.
Definition: “A relation is in 1NF if every attribute contains only atomic values.”
(or)
“A relation is in first normal form if every attribute in that relation is a single-valued attribute”.
• An attribute of a table cannot hold multiple values; it must hold only a single value.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
Single Valued Attributes:
Single-valued attributes are those attributes which can take only one specific value for each entity.
Multi Valued Attributes:
Multi-valued attributes are those attributes which can take more than one value for a given entity from an entity set.
For example, the attributes “Mob_no” and “Email_id” are multi-valued attributes, as they can take more than one value for a given entity.
Example 1 – Relation STUDENT in table 1 is not
in 1NF because of multi-valued attribute
STUD_PHONE. Its decomposition into 1NF has
been shown in table 2.
EXAMPLE-2

Student-id  name          courses
001         Suresh        C1, C2
002         Swathi        C1
003         Ramesh        C2, C3
004         Ravikiran     C3
005         Ratnakishore  C2, C3

In the above table, courses is a multi-valued attribute, so the table is not in 1NF.

The table below is in 1NF, as there is no multi-valued attribute.

Student-id  name          courses
001         Suresh        C1
001         Suresh        C2
002         Swathi        C1
003         Ramesh        C2
003         Ramesh        C3
004         Ravikiran     C3
005         Ratnakishore  C2
005         Ratnakishore  C3
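The conversion shown in the two tables can be sketched as follows. This is an illustrative sketch only; it assumes the multi-valued courses field is stored as a comma-separated string, and the function name `to_first_normal_form` is hypothetical.

```python
# Unnormalized rows: the third field holds several course codes at once.
unnormalized = [
    ("001", "Suresh", "C1,C2"),
    ("002", "Swathi", "C1"),
    ("003", "Ramesh", "C2,C3"),
    ("004", "Ravikiran", "C3"),
    ("005", "Ratnakishore", "C2,C3"),
]


def to_first_normal_form(rows):
    """Emit one row per (student, course) so every field is atomic."""
    flat = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((student_id, name, course.strip()))
    return flat


normalized = to_first_normal_form(unnormalized)
```

Each student now appears once per course, which matches the eight rows of the 1NF table above.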
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The table below is in 1NF, as there is no multi-valued attribute.

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab


Second Normal Form (2NF)
• For 2NF, the relation must be in 1NF.
• All non-key attributes must be fully functionally dependent on the primary key.
Definition: “A relation schema R is said to be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key”.
FULL FUNCTIONAL DEPENDENCY
A set of attributes Y is fully functionally dependent on a set of attributes X if the following conditions are satisfied:
i) Y is functionally dependent on X, and
ii) Y is not functionally dependent on any proper subset of X.
Example (Table violates 2NF)

<StudentProject>
StudentID  ProjectID  StudentName  ProjectName
S89        P09        Siva         Geo Location
S76        P07        Jacob        Cluster Exploration
S56        P03        Kishore      IoT Devices
S92        P05        Ramesh       Cloud Deployment
• In the above table, we have partial dependencies; let us see how.
• The prime (key) attributes are StudentID and ProjectID; together they form the candidate key.
• A non-prime attribute is partially dependent when it is functionally dependent on only a part of a candidate key.
• StudentName can be determined by StudentID alone, which is a partial dependency.
• ProjectName can be determined by ProjectID alone, which is also a partial dependency.
• Therefore, the <StudentProject> relation violates 2NF.
• Example (Table converted to 2NF)
• To remove the partial dependencies and the violation of 2NF, decompose the above table into <StudentInfo> and <ProjectInfo>. Each resulting relation is in second normal form.
• The (StudentID, ProjectID) pairs must also be kept, for example in a separate linking relation, so that the record of which student works on which project is not lost.

<StudentInfo>
StudentID  StudentName
S89        Siva
S76        Jacob
S56        Kishore
S92        Ramesh

<ProjectInfo>
ProjectID  ProjectName
P09        Geo Location
P07        Cluster Exploration
P03        IoT Devices
P05        Cloud Deployment
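The partial-dependency test used in this example can be sketched in Python. The sketch is deliberately simplified: it assumes a single candidate key and inspects only the FDs that are listed explicitly (it does not derive implied FDs). The function name `partial_dependencies` is hypothetical.

```python
def partial_dependencies(candidate_key, fds):
    """Return listed FDs where part of the key determines a non-prime attribute."""
    key = frozenset(candidate_key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        non_prime = rhs - key
        # "lhs < key" is a proper-subset test: only part of the key is used.
        if lhs < key and non_prime:
            found.append((sorted(lhs), sorted(non_prime)))
    return found


# <StudentProject> with candidate key {StudentID, ProjectID}.
student_project_fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"ProjectID"}, {"ProjectName"}),
]
violations = partial_dependencies({"StudentID", "ProjectID"}, student_project_fds)

# <StudentInfo> after decomposition: the key is StudentID alone.
student_info_violations = partial_dependencies(
    {"StudentID"}, [({"StudentID"}, {"StudentName"})]
)
```

For <StudentProject> both listed FDs are reported, while <StudentInfo> reports none, which agrees with the discussion above.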
Third Normal Form (3NF)
Definition: “A relation is in 3NF if it is in 2NF and does not contain any transitive dependency”.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
Transitive dependency between FDs:
If X → Y and Y → Z, then X → Z.
Example:
EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_PIN  EMP_STATE  EMP_CITY
222     Harry      201010   AP         VIJAYAWADA
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal


• Candidate key: {EMP_ID}
• Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
• Here, EMP_STATE and EMP_CITY depend on EMP_PIN, and EMP_PIN depends on EMP_ID.
• The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
SOLUTION:
We therefore move EMP_CITY and EMP_STATE to a new <EMPLOYEE_PIN> table, with EMP_PIN as the primary key.

EMPLOYEE table:
EMP_ID  EMP_NAME   EMP_PIN
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_PIN table:
EMP_PIN  EMP_STATE  EMP_CITY
201010   AP         VIJAYAWADA
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
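As a quick check of this solution, the sketch below projects EMPLOYEE_DETAIL onto the two new tables and joins them back on EMP_PIN. The variable names are assumptions made for illustration, and the PIN codes are kept as strings so that leading zeros are preserved.

```python
employee_detail = [
    (222, "Harry", "201010", "AP", "VIJAYAWADA"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]

# Projection onto EMPLOYEE(EMP_ID, EMP_NAME, EMP_PIN).
employee = {(eid, name, pin) for eid, name, pin, _, _ in employee_detail}

# Projection onto EMPLOYEE_PIN(EMP_PIN, EMP_STATE, EMP_CITY).
employee_pin = {(pin, state, city) for _, _, pin, state, city in employee_detail}

# Natural join on EMP_PIN, which is the key of EMPLOYEE_PIN.
pin_lookup = {pin: (state, city) for pin, state, city in employee_pin}
rejoined = {(eid, name, pin, *pin_lookup[pin]) for eid, name, pin in employee}
```

Because EMP_PIN determines EMP_STATE and EMP_CITY, the join returns exactly the original rows, and each state and city is now stored once per PIN code.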


BOYCE CODD NORMAL FORM (BCNF):
• BCNF is an advanced version of 3NF. It is stricter than 3NF.
• It is also known as 3.5 Normal Form.
• DEFINITION: “A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table”.
BCNF should satisfy the following two conditions:
1) It should be in the Third Normal Form.
2) For any non-trivial dependency X → Y, X should be a super key.
• A super key is a set of attributes within a table whose values can be used to uniquely identify a tuple.
• A candidate key is a minimal set of attributes necessary to identify a tuple; this is also called a minimal super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:

EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549


In the above table, the functional dependencies are as follows.

1) EMP_ID → EMP_COUNTRY
2) EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a super key, yet each is the left side of a functional dependency.
NOTE: Similarly, if EMP_DEPT_NO → EMP_DEPT held, that FD would also violate BCNF because EMP_DEPT_NO is not a super key.
• To convert the given table into BCNF, we decompose it into two tables:

EMP_COUNTRY table:
EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

A mapping of EMP_ID to EMP_DEPT must also be kept (for example in a third table) so that the record of which employee works in which department is not lost.

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
Now, these tables are in BCNF because the left side of both functional dependencies is a key.
Boyce-Codd Normal Form is a stronger generalization of third normal form. A table is in Boyce-Codd Normal Form if and only if at least one of the following conditions is met for each functional dependency X → Y:
• X is a super key.
• X → Y is a trivial functional dependency.
• To determine the highest normal form of a given relation R with functional dependencies, the first step is to check whether the BCNF condition holds.
• If R is found to be in BCNF, it can be safely deduced that the relation is also in 3NF, 2NF and 1NF, as the hierarchy shows.
• 1NF has the least restrictive constraint: it only requires a relation R to have atomic values in each tuple.
• 3NF has a more restrictive constraint than the first two normal forms but is less restrictive than BCNF. In this manner, the restriction increases as we traverse the hierarchy from 1NF to BCNF.
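The BCNF condition above can be checked with an attribute-closure computation. The following is a minimal sketch, not a complete normalization tool: the function names `closure` and `bcnf_violations` are assumptions, FDs are written as (left side, right side) tuples of attribute names, and each decomposed table is checked against the FDs that the text lists for it.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the FDs that are non-trivial and whose left side is not a super key."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_super_key = closure(lhs, fds) >= set(schema)
        if not trivial and not is_super_key:
            bad.append((lhs, rhs))
    return bad


fd_country = (("EMP_ID",), ("EMP_COUNTRY",))
fd_dept = (("EMP_DEPT",), ("DEPT_TYPE", "EMP_DEPT_NO"))

employee_schema = ("EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO")
original = bcnf_violations(employee_schema, [fd_country, fd_dept])

emp_country = bcnf_violations(("EMP_ID", "EMP_COUNTRY"), [fd_country])
emp_dept = bcnf_violations(("EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"), [fd_dept])
```

For the original EMPLOYEE table both FDs are reported as violations, while each decomposed table reports none.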
LOSSLESS JOIN AND DEPENDENCY PRESERVING DECOMPOSITION
• A relation is decomposed when it is not in the appropriate normal form.
• When relation R is decomposed into two or more relations, the decomposition should be lossless join as well as dependency preserving.
Lossless Join Decomposition
• Suppose we decompose a relation R into relations R1 and R2.
• The decomposition is lossy if R1 ⋈ R2 ⊃ R.
• The decomposition is lossless if R1 ⋈ R2 = R.
To check for lossless join decomposition using the FD set, the following conditions must hold:
1) The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R must be in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2) The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3) The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1)
or
Att(R1) ∩ Att(R2) -> Att(R2)
FOR EXAMPLE, a relation R(A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD). This is a lossless join decomposition because:
1) The first condition holds, as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2) The second condition holds, as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = (A) ≠ Φ.
3) The third condition holds, as Att(R1) ∩ Att(R2) = A is a key of R1(ABC), because A->BC is given.
DEPENDENCY PRESERVING DECOMPOSITION
If we decompose a relation R into relations R1 and R2, every dependency of R must either be a part of R1 or R2, or be derivable from the combination of the FDs of R1 and R2.

For example: a relation R(A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD). This decomposition is dependency preserving because the FD A->BC is a part of R1(ABC).
GATE Question: Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D. Then the decomposition of R into R1(AB) and R2(CD) is [GATE-CS-2001]
A. dependency preserving and lossless join
B. lossless join but not dependency preserving
C. dependency preserving but not lossless join
D. not dependency preserving and not lossless join
• Answer: C. For lossless join decomposition, the three conditions listed earlier must hold:
1) Att(R1) U Att(R2) = ABCD = Att(R), so the first condition holds.
2) Att(R1) ∩ Att(R2) = Φ, which violates the second condition of lossless join decomposition.
Hence the decomposition is not lossless.
For dependency preservation, A->B can be ensured in R1(AB) and C->D can be ensured in R2(CD). Hence it is a dependency preserving decomposition.
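Both tests can be sketched in Python and applied to the worked example and to the GATE question. The function names are assumptions made for this sketch, and attributes are single letters so that a string such as "ABC" can stand for the attribute set {A, B, C}. The dependency-preservation routine uses the standard iterative closure-over-projections method, which is a more general procedure than the informal argument given above.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply the three conditions for a binary lossless join decomposition."""
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    if r1 | r2 != r or not common:
        return False  # condition 1 or condition 2 fails
    reach = closure(common, fds)
    return r1 <= reach or r2 <= reach  # condition 3


def preserves_dependencies(parts, fds):
    """Check that every FD can be enforced using only the decomposed relations."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                part = set(part)
                gained = closure(reached & part, fds) & part
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not set(rhs) <= reached:
            return False
    return True


# Worked example: R(ABCD), {A->BC}, R1(ABC), R2(AD).
example_lossless = is_lossless("ABCD", "ABC", "AD", [("A", "BC")])
example_preserving = preserves_dependencies(["ABC", "AD"], [("A", "BC")])

# GATE question: R(ABCD), {A->B, C->D}, R1(AB), R2(CD).
gate_fds = [("A", "B"), ("C", "D")]
gate_lossless = is_lossless("ABCD", "AB", "CD", gate_fds)
gate_preserving = preserves_dependencies(["AB", "CD"], gate_fds)
```

The results agree with the text: the worked example is lossless and dependency preserving, while the GATE decomposition is dependency preserving but not lossless.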
Fourth normal form (4NF)
Definition: A relation R is in 4NF if and only if the following conditions are satisfied:
1) It should be in Boyce-Codd Normal Form (BCNF).
2) The table should not have any multi-valued dependency.
• Conditions for MVD:
An attribute a multi-determines another attribute b (written a →→ b) if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
• t1[a] = t2[a]
there exist tuples t3 and t4 in r such that
• t1[a] = t2[a] = t3[a] = t4[a]
• t1[b] = t3[b]; t2[b] = t4[b]
• t1[c] = t4[c]; t2[c] = t3[c]
where c stands for the remaining attributes of R. In that case a multivalued dependency (MVD) exists.
To check for an MVD in a given table, we apply the conditions stated above to the values in the table.
• Condition-1 for MVD:
t1[a] = t2[a] = t3[a] = t4[a]. From the table, t1[a] = t2[a] = t3[a] = t4[a] = Geeks. So condition 1 is satisfied.
• Condition-2 for MVD:
t1[b] = t3[b] and t2[b] = t4[b]. From the table, t1[b] = t3[b] = MS and t2[b] = t4[b] = Oracle. So condition 2 is satisfied.
• Condition-3 for MVD:
t1[c] = t4[c] and t2[c] = t3[c]. From the table, t1[c] = t4[c] = Reading and t2[c] = t3[c] = Music. So condition 3 is satisfied.
• All conditions are satisfied; therefore a →→ b, which according to the table gives name →→ project.
• Similarly, for a →→ c we get name →→ hobby.
• Hence an MVD exists in the table, and it can be stated as name →→ project and name →→ hobby.
DECOMPOSE THE TABLE INTO MULTIPLE TABLES

NAME   PROJECT
GEEKS  MICROSOFT
GEEKS  ORACLE

NAME   HOBBY
GEEKS  READING
GEEKS  MUSIC
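The MVD conditions can be tested on a relation instance with a short Python sketch. The function name `satisfies_mvd` is hypothetical, and the four rows are the name, project and hobby values used in the condition checks above.

```python
def satisfies_mvd(rows, x, y):
    """Instance-level test of the MVD X ->-> Y over rows given as dicts."""
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # The required tuple takes its Y values from t1 and all
                # remaining values from t2 (X is the same in both).
                t3 = dict(t2)
                t3.update({a: t1[a] for a in y})
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


rows = [
    {"name": "Geeks", "project": "MS", "hobby": "Reading"},
    {"name": "Geeks", "project": "Oracle", "hobby": "Music"},
    {"name": "Geeks", "project": "MS", "hobby": "Music"},
    {"name": "Geeks", "project": "Oracle", "hobby": "Reading"},
]

name_project = satisfies_mvd(rows, ["name"], ["project"])
name_hobby = satisfies_mvd(rows, ["name"], ["hobby"])

# Dropping one combination breaks the MVD.
incomplete = satisfies_mvd(rows[:3], ["name"], ["project"])
```

With all four project and hobby combinations present the MVD holds; once one combination is missing, the test fails, which is why such a table has to store every combination redundantly.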
SURROGATE KEY
DEFINITION - WHAT DOES SURROGATE KEY MEAN?
• A surrogate key is a unique identifier used in databases for a modelled entity or object.
• It is a unique key whose only significance is to act as the primary identifier of an object or entity. It is not derived from any other data in the database, and it may or may not be used as the primary key.
• The usual surrogate key is a unique sequential number.
• A surrogate key's only purpose is to be a unique identifier in a database, for example an incremental key.
• A surrogate key has no actual meaning of its own and is used only to represent existence. It exists only for data analysis.
Example
<ProductPrice>

Key     ProductID  Price
505_92  1987       200
698_56  1256       170
304_57  1898       250
458_66  1666       110

Above, the surrogate key is the Key column of the <ProductPrice> table.
Other Examples
• A counter can also serve as a surrogate key.
• System date/time stamp.
• Random alphanumeric string.
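Two of these options, a counter and a random alphanumeric string, can be sketched with the Python standard library. This is only an illustration: the variable names are assumptions, and the product rows reuse the ProductID and Price values from the <ProductPrice> example.

```python
import itertools
import uuid

# (ProductID, Price) pairs that carry no surrogate key yet.
products = [(1987, 200), (1256, 170), (1898, 250), (1666, 110)]

# Option 1: a sequential counter used as the surrogate key.
next_key = itertools.count(start=1)
with_counter = [(next(next_key), pid, price) for pid, price in products]

# Option 2: a random alphanumeric string (32 hexadecimal characters).
with_random = [(uuid.uuid4().hex, pid, price) for pid, price in products]
```

In both cases the key is generated independently of the product data, which is the defining property of a surrogate key described above.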
FIFTH NORMAL FORM (5NF):
• DEFINITION: A relation is in 5NF if it is in 4NF, does not contain any join dependency, and any joining is lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
• 5NF is also known as Project-Join Normal Form (PJ/NF).
A JOIN DEPENDENCY (JD) is said to exist if the join of R1 and R2 over C is equal to relation R, where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R(A, B, C, D).
Alternatively, R1 and R2 form a lossless decomposition of R.
Properties of 5NF:
A relation R is in 5NF if and only if it satisfies the following conditions:
1) R should be in 4NF (no multi-valued dependency exists).
2) It cannot undergo further lossless decomposition (no join dependency exists).
Example:
Consider the relation R below having the schema R(supplier, product, consumer). The primary key is a combination of all three attributes of the relation.

Table-1 (supplier, product, consumer)
supplier  product  consumer
S1        P1       C1
S1        P2       C1
S2        P1       C1
S3        P3       C3

Table-2 (supplier, product)
supplier  product
S1        P1
S1        P2
S2        P1
S3        P3

Table-3 (supplier, consumer)
supplier  consumer
S1        C1
S2        C1
S3        C3

Table-4 (consumer, product)
consumer  product
C1        P1
C1        P2
C3        P3
Explanation:
• Table 2, Table 3 and Table 4, when joined, yield the original table (Table 1).
• Hence a join dependency exists in Table 1, and therefore Table 1 is not in 5NF (PJ/NF). However, Table 2, Table 3 and Table 4 satisfy 5NF, as they have no multivalued dependency and cannot be decomposed further (no join dependency exists).
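The claim that the three projections join back to Table 1 can be verified with a short Python sketch. The variable names are assumptions made for illustration; the rows are those of Table 1.

```python
# Table 1: (supplier, product, consumer).
table1 = {
    ("S1", "P1", "C1"),
    ("S1", "P2", "C1"),
    ("S2", "P1", "C1"),
    ("S3", "P3", "C3"),
}

# Tables 2, 3 and 4 are projections of Table 1.
supplier_product = {(s, p) for s, p, c in table1}
supplier_consumer = {(s, c) for s, p, c in table1}
consumer_product = {(c, p) for s, p, c in table1}

# Three-way natural join: keep (s, p, c) only if every pairwise
# combination appears in the corresponding projection.
rejoined = {
    (s, p, c)
    for s, p in supplier_product
    for s2, c in supplier_consumer
    if s2 == s and (c, p) in consumer_product
}
```

Since `rejoined` contains exactly the rows of Table 1, with no extra rows, the join dependency holds for this instance.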
