Cs501 fd nf

CS501: DATABASE AND DATA MINING
Functional Dependency and Normalization

SCHEMA FOR UNIVERSITY DATABASE
 Let’s consider the relations
 department <deptName, building, budget>
 instructor <ID,name, deptName, salary>
S bi i d d Suppose we combine instructor and department
into inst_dept
2

 Result is possible repetition of information
3

WHAT ABOUT SMALLER SCHEMAS?
 Suppose we had started with inst_dept. How would we know
to split up (decompose) it into instructor and department?
 Write a rule: "if there were a schema (dept_name, building, budget), then dept_name would be a candidate key"
 Denote as a functional dependency:
dept_name → building, budget
 In inst_dept, because dept_name is not a candidate key, the
building and budget of a department may have to be repeated.
 This indicates the need to decompose inst_dept
 Not all decompositions are good. Suppose we decompose
employee(ID, name, street, city, salary) into
employee1 (ID, name)
employee2 (name, street, city, salary)
 This loses information when two employees share a name: joining
employee1 and employee2 on name produces spurious tuples, so we cannot
reconstruct the original employee relation. This is a
lossy decomposition
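The lossy decomposition above can be reproduced with a small sketch. The two employee rows are hypothetical sample data (not from the slides), chosen so that both employees share the name "Kim"; relations are modelled as Python sets of tuples.

```python
# Hypothetical sample data: two employees who share the name "Kim".
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Project onto employee1(ID, name) and employee2(name, street, city, salary).
employee1 = {(i, n) for (i, n, _, _, _) in employee}
employee2 = {(n, s, c, sal) for (_, n, s, c, sal) in employee}

# Natural join on the shared attribute, name.
rejoined = {
    (i, n1, s, c, sal)
    for (i, n1) in employee1
    for (n2, s, c, sal) in employee2
    if n1 == n2
}

# Tuples that appear after the join but were never in the original relation.
spurious = rejoined - employee
print(len(employee), len(rejoined), len(spurious))  # 2 4 2
```

The join returns more tuples than the original relation, and nothing in employee1 or employee2 tells us which ones are genuine.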

EXAMPLE OF LOSSLESS JOIN DECOMPOSITION
Lossless join decomposition
Decomposition of R = (A, B, C)
R1 = (A, B)   R2 = (B, C)

r
A  B  C
α  1  A
β  2  B

Π A,B (r)
A  B
α  1
β  2

Π B,C (r)
B  C
1  A
2  B

Π A,B (r) ⋈ Π B,C (r)
A  B  C
α  1  A
β  2  B

1ST NORMAL FORM
 Domain is atomic if its elements are considered
to be indivisible units
 Examples of non-atomic domains:
 Set of names, composite attributes
 Identification numbers like CS101 that can be broken up
into parts
 A relational schema R is in first normal form if
the domains of all attributes of R are atomic

1ST NORMAL FORM (CONTD)
 Atomicity is actually a property of how the
elements of the domain are used.
 Example: Strings would normally be considered
indivisible
 Suppose that students are given roll numbers which
are strings of the form CS0012 or EE1127
 If the first two characters are extracted to find the
department, the domain of roll numbers is not
atomic.
 Doing so is a bad idea: it leads to encoding of
information in the application program rather than in
the database.

GOAL- DEVISE A THEORY FOR THE
FOLLOWING
 Decide whether a particular relation r is in "good"
form.
 In the case that a relation r is not in "good" form,
decompose it into a set of relations {r1, r2, ..., rn}
such that
 each relation is in good form
 the decomposition is a lossless-join decomposition
 Our theory is based on:
 functional dependencies
 multivalued dependencies

FUNCTIONAL DEPENDENCY
 Constraints on the set of legal relations
 Require that the value for a certain set of
attributes determines uniquely the value for
another set of attributes
 A functional dependency is a generalization of
the notion of a key

 Let R be a relation schema
α ⊆ R and β ⊆ R
 The functional dependency
α → β
holds on R if and only if for any legal relation
r(R), whenever any two tuples t1 and t2 of r agree
on the attributes α, they also agree on the
attributes β. That is,
∀ t1, t2 ∈ r: t1[α] = t2[α] ⇒ t1[β] = t2[β]

EXAMPLE
 Consider r(A, B) with the following instance of r.
A  B
1  4
1  5
3  7
 On this instance, A → B does NOT hold, but B
→ A does hold
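The definition can be checked mechanically on a given instance. The sketch below is one possible way to do it; the function name holds and the list-of-dicts representation are assumptions made for illustration, not part of the slides.

```python
def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this instance.

    rows is a list of dicts mapping attribute name to value.
    """
    seen = {}  # value of lhs -> first value of rhs seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two tuples agreeing on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


# The instance of r(A, B) from the example.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(holds(r, ["A"], ["B"]))  # False
print(holds(r, ["B"], ["A"]))  # True
```

Note that this only tests one instance. Whether an FD holds on the schema is a statement about all legal instances and cannot be established from sample data alone.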

 K is a superkey for relation schema R if and only if K → R
 K is a candidate key for R if and only if
 K → R, and
 for no α ⊂ K, α → R
 Functional dependencies allow us to express constraints
that cannot be expressed using superkeys. Consider the
schema:
inst_dept (ID, name, salary, dept_name, building, budget).
We expect these functional dependencies to hold:
dept_name → building
and ID → building
but would not expect the following to hold:
dept_name → salary

USE OF FUNCTIONAL DEPENDENCY
 We use functional dependencies to:
 test relations to see if they are legal under a given set
of functional dependencies.
 If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
 specify constraints on the set of legal relations
 We say that F holds on R if all legal relations on R satisfy
the set of functional dependencies F.
 Note: A specific instance of a relation schema
may satisfy a functional dependency even if the
functional dependency does not hold on all legal
instances.
 For example, a specific instance of instructor may, by
chance, satisfy
name → ID.

FUNCTIONAL DEPENDENCY (CONTD)
 A functional dependency is trivial if it is
satisfied by all instances of a relation
 Example:
 ID, name → ID
 name → name
 In general, α → β is trivial if β ⊆ α

CLOSURE OF A SET OF FUNCTIONAL
DEPENDENCY
 Given a set F of functional dependencies, there
are certain other functional dependencies that
are logically implied by F.
 For example: If A → B and B → C, then we can
infer that A → C
 The set of all functional dependencies logically
implied by F is the closure of F.
 We denote the closure of F by F+.
 F+ is a superset of F.

CLOSURE OF A SET OF FUNCTIONAL
DEPENDENCY
 We can find F+, the closure of F, by repeatedly
applying
Armstrong's Axioms:
 if β ⊆ α, then α → β (reflexivity)
 if α → β, then γα → γβ (augmentation)
 if α → β, and β → γ, then α → γ (transitivity)
 These rules are
 sound (generate only functional dependencies that
actually hold), and
 complete (generate all functional dependencies that
hold).

EXAMPLE
 R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
 some members of F+
 A → H
 by transitivity from A → B and B → H
 AG → I
 by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
 CG → HI
 by augmenting CG → I to infer CG → CGI,
and augmenting CG → H to infer CGI → HI,
and then transitivity

PROCEDURE FOR COMPUTING F+
 To compute the closure of a set of functional
dependencies F:
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further

CLOSURE OF FDS
 Additional rules:
 If α → β holds and α → γ holds, then α → βγ holds
(union)
 If α → βγ holds, then α → β holds and α → γ holds
(decomposition)
 If α → β holds and γβ → δ holds, then αγ → δ holds
(pseudotransitivity)
 The above rules can be inferred from Armstrong's
axioms.

FUNCTIONAL DEPENDENCY EXAMPLE
 Flight <flight_no, c_arr, c_dept, pl_type>
 Seats_free <flight_no, date, seats_avl>
 The following FDs hold
 flight_no → c_arr
 flight_no → c_dept
 flight_no → pl_type
 flight_no, date → seats_avl

FUNCTIONAL DEPENDENCY EXAMPLE
 Stud_addr <name, address>
 Stud_grade <name, subject, grade>
 FDs that hold are
 name → address
 name, subject → grade

 Which FDs hold here?
X   Y   Z   W
x1  y1  z1  w1
x1  y2  z1  w2
x2  y2  z2  w2
x2  y3  z2  w3
x3  y3  z2  w4

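 x→y does not hold
 x→z holds
 x→w does not hold
 y→x does not hold
 y→z does not hold
 y→w does not hold
 z→x does not hold
 z→y does not hold
 z→w does not hold
 w→x does not hold
 w→y holds
 w→z does not hold
 xy→z holds
 yz→x does not hold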
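The answers for this instance can be confirmed by brute force. The following is a minimal sketch; the names ATTRS, ROWS and satisfied are assumptions introduced for illustration.

```python
from itertools import permutations

ATTRS = ("X", "Y", "Z", "W")
ROWS = [
    ("x1", "y1", "z1", "w1"),
    ("x1", "y2", "z1", "w2"),
    ("x2", "y2", "z2", "w2"),
    ("x2", "y3", "z2", "w3"),
    ("x3", "y3", "z2", "w4"),
]


def satisfied(lhs, rhs):
    """True if rows that agree on every lhs column also agree on the rhs column."""
    seen = {}
    for row in ROWS:
        key = tuple(row[ATTRS.index(a)] for a in lhs)
        val = row[ATTRS.index(rhs)]
        if seen.setdefault(key, val) != val:
            return False
    return True


# All FDs with a single attribute on each side that this instance satisfies.
single = [f"{a}->{b}" for a, b in permutations(ATTRS, 2) if satisfied((a,), b)]
print(single)                       # ['X->Z', 'W->Y']
print(satisfied(("X", "Y"), "Z"))   # True
print(satisfied(("Y", "Z"), "X"))   # False
```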

FULL FUNCTIONAL DEPENDENCY
 When the functional dependency is 'minimal' in
size (i.e., containing no redundant terms)
 FD X → A for which there is no proper subset Y of
X such that Y → A (A is said to be fully
functionally dependent on X)

CLOSURE OF ATTRIBUTE SETS
 The set of all attributes functionally determined
by α under a set F of FDs
 It is denoted by α+
 Let's consider the following example
 A → BC
 AC → D
 D → B
 AB → D
 So
 A+={A,B,C,D}, B+={B}, …
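Attribute closure is usually computed by a simple fixpoint loop rather than by enumerating F+. Below is a minimal sketch of that loop; the function name closure and the convention that each attribute is a single character (so "AC" means {A, C}) are assumptions made for brevity.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already reachable, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "BC"), ("AC", "D"), ("D", "B"), ("AB", "D")]
print(sorted(closure("A", F)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("B", F)))  # ['B']
```

The same function gives a membership test for F+: α → β is in F+ exactly when β is a subset of α+. It also gives a superkey test, since α is a superkey of R when α+ contains every attribute of R.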

COVER OF A SET OF FDS
Let f and g be two sets of FDs on a relation scheme R.
Then f is a cover of g if f+ = g+
This is also known as f is equivalent to g
[Table: example FD sets f and g over attributes A, B, C, built from the FDs A → BC, B → C, A → B and AB → C]
Here f+ = g+
So g covers f

MINIMAL COVER OR CANONICAL COVER
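 A cover is said to be minimal if it has no
redundant terms
 Denoted by Fc
 Example:
F
A → BC
AC → D
D → B
AB → D
Fc
A → BC
A → CD
D → B
 Note: in the Fc listed above, B in A → BC still follows from A → CD and D → B,
so a fully reduced cover is A → CD, D → B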
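A canonical cover can be computed by repeatedly merging FDs with the same left side and dropping extraneous attributes. The sketch below is one possible implementation under stated assumptions: attributes are single characters, the helper names closure and canonical_cover are illustrative, and attributes are tried in alphabetical order. Canonical covers are not unique in general, so a different order may give a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute collections)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    fds = [(frozenset(l), set(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge FDs that share a left side, drop empty right sides.
        merged = {}
        for lhs, rhs in fds:
            merged.setdefault(lhs, set()).update(rhs)
        fds = [(l, r) for l, r in merged.items() if r]
        for i, (lhs, rhs) in enumerate(fds):
            # Left side: a is extraneous if (lhs - a)+ under F already covers rhs.
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, fds):
                    fds[i] = (lhs - {a}, rhs)
                    changed = True
                    break
            if changed:
                break
            # Right side: a is extraneous if lhs+ still reaches a without it.
            for a in sorted(rhs):
                trial = fds[:i] + [(lhs, rhs - {a})] + fds[i + 1:]
                if a in closure(lhs, trial):
                    fds[i] = (lhs, rhs - {a})
                    changed = True
                    break
            if changed:
                break
    return {("".join(sorted(l)), "".join(sorted(r))) for l, r in fds}


F = [("A", "BC"), ("AC", "D"), ("D", "B"), ("AB", "D")]
print(sorted(canonical_cover(F)))  # [('A', 'CD'), ('D', 'B')]
```

After every single change the loop restarts from the merge step, which keeps the logic simple at the cost of some repeated work.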

EXTRANEOUS ATTRIBUTE
 An attribute of a FD is said to be extraneous if
we can remove it without changing the closure of
the set of FDs
 Formally,
 Consider a set F of FDs and α → β in F
 Attribute A is extraneous in α if A ∈ α, and F logically
implies (F - {α → β}) U {(α - A) → β}
 Attribute A is extraneous in β if A ∈ β, and the set
(F - {α → β}) U {α → (β - A)} logically implies F

EXAMPLE
 Suppose F: {AB → C and A → C}
 Then B is extraneous in AB → C
 Again F: {AB → CD and A → C}
 Then C is extraneous in AB → CD
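Both extraneous-attribute tests reduce to attribute closures. The sketch below assumes single-character attribute names; the helper names closure, extraneous_in_lhs and extraneous_in_rhs are illustrative only.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(a, lhs, rhs, fds):
    """a is extraneous in lhs if (lhs - a)+ computed under F contains rhs."""
    return set(rhs) <= closure(set(lhs) - {a}, fds)


def extraneous_in_rhs(a, lhs, rhs, fds):
    """a is extraneous in rhs if lhs+ still contains a once lhs -> rhs
    is replaced by lhs -> (rhs - a)."""
    reduced = [fd for fd in fds if fd != (lhs, rhs)]
    reduced.append((lhs, "".join(c for c in rhs if c != a)))
    return a in closure(lhs, reduced)


F1 = [("AB", "C"), ("A", "C")]
print(extraneous_in_lhs("B", "AB", "C", F1))   # True

F2 = [("AB", "CD"), ("A", "C")]
print(extraneous_in_rhs("C", "AB", "CD", F2))  # True
print(extraneous_in_rhs("D", "AB", "CD", F2))  # False
```

Note the asymmetry: the left-side test uses the original F, while the right-side test uses the reduced set.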

NORMAL FORMS
 First Normal Form (1NF)
 Included in the definition of relation
 Second Normal Form (2NF)
 Third Normal Form (3NF)
 Boyce-Codd Normal Form (BCNF)
 2NF, 3NF and BCNF are defined in terms of FDs
 Fourth Normal Form (4NF)
 Defined using MVDs
 Fifth Normal Form (5NF)
 Also known as Project Join Normal Form (PJNF)
 Defined using join dependency

2ND NORMAL FORM
 2NF: A relation schema R is in 2NF if
 it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key of R
 key attribute: An attribute that is part of some key
 non-key attribute: An attribute that is not part of any
key

EXAMPLE
 Let's consider the following supplier-parts
database system
 first <sno, status, city, pno, qty>
 Here a possible primary key is (sno, pno)
 FDs for relation first
sno → city
sno → status
city → status
{sno, pno} → qty

 Instance of relation first
sno  status  city     pno  qty
s1   20      mumbai   p1   300
s1   20      mumbai   p2   200
s1   20      mumbai   p3   400
s1   20      mumbai   p4   200
s1   20      mumbai   p5   100
s1   20      mumbai   p6   700
s2   10      chennai  p1   200
s2   10      chennai  p2   120
s3   10      chennai  p2   340
s4   20      mumbai   p2   230
s4   20      mumbai   p4   432
s4   20      mumbai   p5   120

ANOMALIES
 Insert:
 Insertion is not possible until a supplier has supplied some
items
 Ex. s5 located in Delhi cannot be inserted
 Delete:
 May lose some additional information
 Ex. if s3, p2 is deleted then we lose the information
that s3 is located in Chennai
 Update:
 Same city value appears in many places
 Ex. if s1 moves from Mumbai to Ahmedabad then
the update has to be done in many places

DECOMPOSITION
 The relation first must be decomposed in such a
way that the decomposed relations satisfy 2NF
 second <sno, status, city> and
 sp <sno, pno, qty>
 FDs for the above relations
second: sno → city, sno → status, city → status
sp: {sno, pno} → qty

EXAMPLE OF 2NF RELATIONS
second
sno  status  city
s1   20      mumbai
s2   10      chennai
s3   10      chennai
s4   20      mumbai
s5   30      delhi

sp
sno  pno  qty
s1   p1   300
s1   p2   200
s1   p3   400
s1   p4   200
s1   p5   100
s1   p6   700
s2   p1   200
s2   p2   120
s3   p2   340
s4   p2   230
s4   p4   432
s4   p5   120
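The 2NF decomposition is just two projections, and joining them on sno gives back the original rows. The sketch below uses a small subset of the rows of first as sample data; the variable names are illustrative.

```python
# A few rows of first <sno, status, city, pno, qty>, taken from the slides.
FIRST = [
    ("s1", 20, "mumbai", "p1", 300),
    ("s1", 20, "mumbai", "p2", 200),
    ("s2", 10, "chennai", "p1", 200),
    ("s4", 20, "mumbai", "p2", 230),
    ("s4", 20, "mumbai", "p4", 432),
]

# Projections: second <sno, status, city> and sp <sno, pno, qty>.
second = {(sno, status, city) for sno, status, city, _, _ in FIRST}
sp = {(sno, pno, qty) for sno, _, _, pno, qty in FIRST}

# Natural join of second and sp on sno.
rejoined = {
    (s1, status, city, pno, qty)
    for (s1, status, city) in second
    for (s2, pno, qty) in sp
    if s1 == s2
}

print(rejoined == set(FIRST))            # True
print(len(FIRST), len(second), len(sp))  # 5 3 5
```

Supplier details are now stored once per supplier in second, and a supplier with no shipments (such as s5) can be added to second without touching sp.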

 Thus in r(A,B,C,D) if (A,B) is a primary key and
A → D holds
 Then by 2NF r can be replaced by r1 and r2 as
follows
 r1(A,D) primary key {A}
 r2(A,B,C) primary key {A,B} and foreign key A
references r1(A)

3NF
 A relation is in 3NF iff
 it is in 2NF and
 every non-key attribute is non-transitively dependent
on the primary key
 No transitive dependency means no mutual
dependency among the non-key attributes
 Now consider relation second
sno → city
city → status
sno → status (transitive, through city)

ANOMALIES
 Insert:
 A particular city has a particular status
 Ex: any supplier in city Kanpur has status 10
 This cannot be inserted until there is actually a supplier
located in that city
 Delete:
 If we delete s5 then we lose the information that Delhi
has status 30
 Update:
 The status of a given city appears in many places
 So updating the status value may be problematic

 Now we decompose the relation second into two
relations such that they satisfy 3NF
 sc <sno, city>
 cs <city, status>
 The FDs of the above relations are
sc: sno → city
cs: city → status

 Thus if r(A,B,C) and A is a primary key and B →
C holds
 Then by 3NF r can be replaced by
 r1(B,C) and B is a primary key
 r2(A,B) and A is a primary key and foreign key B
references r1(B)

PROPERTIES OF DECOMPOSITION
 Decomposition 1: Relation second is decomposed
into
 sc <sno, city>
 cs <city, status>
 Decomposition 2: Relation second is decomposed
into
 sc <sno, city>
 ss <sno, status>
 Which of the above decompositions is
lossless and dependency preserving?

DESIRABLE PROPERTIES OF
DECOMPOSITION
 Lossless join
 When decomposing a relation into a number of smaller
ones, it is crucial that the decomposition be
lossless
 Dependency preservation
 The system must not create a relation that does not
satisfy all the given functional dependencies

LOSSLESS JOIN
 Let R be a relation schema and F be a set of
functional dependencies
 Let R1 and R2 form a decomposition of R
 The decomposition will be lossless if at least one
of the following functional dependencies is in F+
R1 ∩ R2 → R1
R1 ∩ R2 → R2
 In other words, R1 ∩ R2 forms a superkey of
either R1 or R2
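The test for a two-way decomposition can be sketched with an attribute closure. The helper names closure and is_lossless are assumptions for illustration, and the FD sets are the ones stated earlier for second and for the employee example (where only ID is assumed to determine the other attributes).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: R1 ∩ R2 must be a superkey of R1 or of R2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


F_SECOND = [({"sno"}, {"city"}), ({"city"}, {"status"})]
# Decomposition 1: sc <sno, city>, cs <city, status>
print(is_lossless({"sno", "city"}, {"city", "status"}, F_SECOND))  # True
# Decomposition 2: sc <sno, city>, ss <sno, status>
print(is_lossless({"sno", "city"}, {"sno", "status"}, F_SECOND))   # True

F_EMP = [({"ID"}, {"name", "street", "city", "salary"})]
print(is_lossless({"ID", "name"},
                  {"name", "street", "city", "salary"}, F_EMP))    # False
```

Both decompositions of second pass the lossless test, so it is dependency preservation that separates them.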

DEPENDENCY PRESERVATION
 Create legal relations preserving the
dependencies
 Let F be a set of functional dependencies on a
schema R and let R1, R2, …, Rn be a
decomposition of R
 The restriction of F to Ri is the set of all
functional dependencies in F+ that include only
attributes of Ri
 The set of restrictions F1, F2, …, Fn is the set of
dependencies that can be checked efficiently
 Now we ask: is testing only the
restrictions sufficient?

 Let F′ = F1 U F2 U … U Fn
 F′ is a set of functional dependencies on
schema R, but in general F′ ≠ F
 But if F′+ = F+ is satisfied then we say that it is a
dependency preserving decomposition

DEPENDENCY PRESERVING: EXAMPLE
 Example:
 Suppose F = {A → B, B → C} and the original relation is
r<A,B,C>
 And the decomposition is r1<A,B> and r2<A,C>
 Is it dependency preserving?

TESTING FOR DEPENDENCY PRESERVATION
 Input: D = {R1, R2, …, Rn}, the set of decomposed relation schemas
 Compute F+
 For each schema Ri in D do
 Begin
 Fi := restriction of F+ to Ri;
 End
 F′ := ∅
 For each restriction Fi do
 Begin
 F′ = F′ U Fi
 End
 Compute F′+
 If (F′+ = F+) then return true;
 else return false;
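The procedure can be sketched without materialising F+ in full. The version below is an assumption-laden variant, not the slide's literal algorithm: it builds each restriction from attribute closures of the subsets of Ri, and it replaces the comparison F′+ = F+ by the equivalent check that every FD of F follows from F′ (equivalent because F′ is already contained in F+). The names closure, restriction and preserves are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def restriction(ri, fds):
    """Nontrivial FDs of F+ that mention only attributes of ri."""
    ri = set(ri)
    out = []
    for k in range(1, len(ri) + 1):
        for lhs in combinations(sorted(ri), k):
            rhs = (closure(lhs, fds) & ri) - set(lhs)
            if rhs:
                out.append((set(lhs), rhs))
    return out


def preserves(decomposition, fds):
    """True if the union of the restrictions implies every FD in fds."""
    f_prime = [fd for ri in decomposition for fd in restriction(ri, fds)]
    return all(set(rhs) <= closure(lhs, f_prime) for lhs, rhs in fds)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves([{"A", "B"}, {"A", "C"}], F))  # False: B -> C is lost
print(preserves([{"A", "B"}, {"B", "C"}], F))  # True
```

Enumerating subsets is exponential in the size of each Ri, which is acceptable for classroom-sized schemas only.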

BOYCE/CODD NORMAL FORM (BCNF)
 Can handle relations with
 two or more candidate keys
 composite candidate keys
 overlapping candidate keys
 The above conditions might not occur very often
 For a relation where the above conditions do not hold,
3NF and BCNF are equivalent
 The BCNF definition is strictly stronger than the 3NF definition

DETERMINANT
 In a FD, the left side is termed the determinant,
whereas the right side is termed the dependent
 A relation is in BCNF iff every determinant
is a candidate key
 Assumption
 The determinants are not too big (left-irreducible)
 All FDs are nontrivial

 Let's check whether the following relations are in
BCNF
 Relation first <sno, status, city, pno, qty>
 Three determinants: {sno}, {city}, {sno, pno}
 Only the last one is a candidate key
 So not in BCNF
 Relation second <sno, status, city>
 Two determinants: {sno}, {city}
 Only sno is a candidate key
 So not in BCNF
 Relation sp <sno, pno, qty>
 One determinant: {sno, pno}
 That is also a candidate key
 It is in BCNF
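The three checks above can be sketched as a test that every determinant is a superkey. The helper names closure and bcnf_violations are assumptions for illustration; the function inspects only the FDs it is given, so each relation is passed the FDs listed for it on the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given nontrivial FDs whose left side is not a superkey of schema."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not schema <= closure(lhs, fds)
    ]


FD_FIRST = [({"sno"}, {"city"}), ({"city"}, {"status"}), ({"sno", "pno"}, {"qty"})]
FD_SECOND = [({"sno"}, {"city"}), ({"city"}, {"status"})]
FD_SP = [({"sno", "pno"}, {"qty"})]

print(len(bcnf_violations({"sno", "status", "city", "pno", "qty"}, FD_FIRST)))  # 2
print(len(bcnf_violations({"sno", "status", "city"}, FD_SECOND)))               # 1
print(len(bcnf_violations({"sno", "pno", "qty"}, FD_SP)))                       # 0
```

A relation is in BCNF with respect to the given FDs exactly when the returned list is empty.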

EXAMPLE 1
 Now let us consider relation suppliers <sno,
sname, status, city>
 Here both sno and sname are candidate keys
 So FDs of this relation
sno → sname, sname → sno
sno → status, sno → city
sname → status, sname → city
 So here suppliers is in BCNF

EXAMPLE 2
 Now consider the relation ssp <sno, sname, pno,
qty>
 Here the candidate keys are {sno, pno} and
{sname, pno}
 So here the candidate keys overlap
 But is it in BCNF?
 No, as the relation contains two other determinants
{sno} and {sname}
 And these are not candidate keys

EXAMPLE 2 (CONTD)
 So a possible decomposition will be
 ss <sno, sname>
 sp <sno, pno, qty>
 And another valid decomposition
 ss <sno, sname>
 sp <sname, pno, qty>

EXAMPLE 3
 Let's consider a relation sjt <s, j, t>
 Here attributes s: student, j: subject and t:
teacher
 The meaning of each tuple
 "student s is taught subject j by teacher t"
 Now the following constraints apply
1. For each subject, each student of the subject
is taught by only one teacher
2. Each teacher teaches only one subject
3. However, each subject is taught by several
teachers

FDs
{s,j} → t
t → j
j → t does not hold
What is a possible instance of relation sjt?
sjt
s       j        t
Sarala  Maths    Prof. Raj
Sarala  Physics  Prof. Atul
Uma     Maths    Prof. Raj
Uma     Physics  Prof. Pathak

 So what are the candidate keys?
 {s,j} and {s,t}
 But the relation is not in BCNF as the
determinant t is not a candidate key
 So how do we decompose sjt?
 Relation sjt can be decomposed into
 st <s, t>
 tj <t, j>

st
s       t
Sarala  Prof. Raj
Sarala  Prof. Atul
Uma     Prof. Raj
Uma     Prof. Pathak

tj
t             j
Prof. Raj     Maths
Prof. Atul    Physics
Prof. Pathak  Physics

There is a problem with this decomposition.
The decomposition is not independent,
because the FD {s,j} → t spans both relations and cannot be checked on either one alone.
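The lack of independence can be seen with one extra row. In the sketch below the inserted tuple (Sarala, Prof. Pathak) is hypothetical; it is acceptable to st on its own, yet the join of st and tj then violates {s,j} → t. The function names join and sj_determines_t are illustrative.

```python
st = {
    ("Sarala", "Prof. Raj"),
    ("Sarala", "Prof. Atul"),
    ("Uma", "Prof. Raj"),
    ("Uma", "Prof. Pathak"),
}
tj = {
    ("Prof. Raj", "Maths"),
    ("Prof. Atul", "Physics"),
    ("Prof. Pathak", "Physics"),
}


def join(st_rel, tj_rel):
    """Natural join of st and tj on teacher, giving (s, j, t) tuples."""
    return {(s, j, t) for (s, t) in st_rel for (t2, j) in tj_rel if t == t2}


def sj_determines_t(sjt):
    """True if the FD {s, j} -> t is satisfied by the given (s, j, t) tuples."""
    seen = {}
    return all(seen.setdefault((s, j), t) == t for (s, j, t) in sjt)


print(sj_determines_t(join(st, tj)))        # True

# Hypothetical update that touches only st.
st_after = st | {("Sarala", "Prof. Pathak")}
print(sj_determines_t(join(st_after, tj)))  # False
```

After the insert, Sarala is taught Physics by both Prof. Atul and Prof. Pathak, which constraint 1 forbids, and only a check across both relations can detect it.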

 So the main problem with the last decomposition
is that the relations cannot be independently
updated
 When a relation cannot be decomposed into
independent components then it is said to be
atomic
 So sometimes there may be a conflict between
 decomposing into BCNF components
 decomposing into independent components
 Thus it may not always be possible to satisfy both of
them at the same time
