CS501: DATABASE AND DATA MINING
Functional Dependency and Normalization
SCHEMA FOR UNIVERSITY DATABASE
 Let’s consider the relations
 department <deptName, building, budget>
 instructor <ID,name, deptName, salary>
 Suppose we combine instructor and department
into inst_dept
 Result is possible repetition of information
WHAT ABOUT SMALLER SCHEMAS?
 Suppose we had started with inst_dept. How would we know
to split up (decompose) it into instructor and department?
 Write a rule “if there were a schema (dept_name, building,
budget), then dept_name would be a candidate key”
 Denote as a functional dependency:
dept_name → building, budget
 In inst_dept, because dept_name is not a candidate key, the
building and budget of a department may have to be repeated.
 This indicates the need to decompose inst_dept
 Not all decompositions are good. Suppose we decompose
employee(ID, name, street, city, salary) into
employee1 (ID, name)
employee2 (name, street, city, salary)
 The next slide shows how we lose information: we cannot
reconstruct the original employee relation, and so this is a
lossy decomposition
LOSSY DECOMPOSITION
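A minimal sketch of how joining employee1 and employee2 back together produces spurious tuples. The two-row instance is a made-up illustration (the IDs, names, addresses and salaries are assumptions, not data from the slides):

```python
# Hypothetical employee instance: two different people share the name "Kim".
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Project onto employee1(ID, name) and employee2(name, street, city, salary).
employee1 = {(i, n) for (i, n, s, c, sal) in employee}
employee2 = {(n, s, c, sal) for (i, n, s, c, sal) in employee}

# Natural join on the only shared attribute, name.
joined = {
    (i, n, s, c, sal)
    for (i, n) in employee1
    for (n2, s, c, sal) in employee2
    if n == n2
}

# Tuples that appear after the join but were never in the original relation.
spurious = joined - employee
print(len(employee), len(joined), len(spurious))  # 2 4 2
```

The join returns more tuples than the original relation, so we can no longer tell which address belongs to which ID. That is the information lost.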
EXAMPLE OF LOSSLESS JOIN DECOMPOSITION
 Lossless join decomposition
 Decomposition of R = (A, B, C)
R1 = (A, B)   R2 = (B, C)

r              ΠA,B(r)      ΠB,C(r)
A  B  C        A  B         B  C
α  1  A        α  1         1  A
β  2  B        β  2         2  B

ΠA,B(r) ⋈ ΠB,C(r)
A  B  C
α  1  A
β  2  B
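The same check can be sketched in code, assuming the instance r = {(α, 1, A), (β, 2, B)} over R = (A, B, C). The helpers `project` and `natural_join` are illustrative names, not part of any library:

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row is a dict."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of rows encoded as tuples of (attr, value)."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            shared = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in shared):
                merged = {**ld, **rd}
                result.add(tuple(sorted(merged.items())))
    return result

r = [
    {"A": "alpha", "B": 1, "C": "A"},
    {"A": "beta", "B": 2, "C": "B"},
]
original = {tuple(sorted(row.items())) for row in r}
rejoined = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
print(rejoined == original)  # True
```

Joining the two projections gives back exactly r, so this decomposition loses nothing on this instance.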
1ST NORMAL FORM
 Domain is atomic if its elements are considered
to be indivisible units
 Examples of non-atomic domains:
 Set of names, composite attributes
 Identification numbers like CS101 that can be broken up
into parts
 A relational schema R is in first normal form if
the domains of all attributes of R are atomic
1ST NORMAL FORM (CONTD)
 Atomicity is actually a property of how the
elements of the domain are used.
 Example: Strings would normally be considered
indivisible
 Suppose that students are given roll numbers which
are strings of the form CS0012 or EE1127
 If the first two characters are extracted to find the
department, the domain of roll numbers is not
atomic.
 Doing so is a bad idea: it leads to encoding of
information in the application program rather than in
the database.
GOAL- DEVISE A THEORY FOR THE
FOLLOWING
 Decide whether a particular relation r is in “good”
form.
 In the case that a relation r is not in “good” form,
decompose it into a set of relations {r1, r2, ..., rn}
such that
 each relation is in good form
 the decomposition is a lossless-join decomposition
 Our theory is based on:
 functional dependencies
 multivalued dependencies
FUNCTIONAL DEPENDENCY
 Constraints on the set of legal relations
 Require that the value for a certain set of
attributes determines uniquely the value for
another set of attributes
 A functional dependency is a generalization of
the notion of a key
FUNCTIONAL DEPENDENCY
 Let R be a relation schema
α ⊆ R and β ⊆ R
 The functional dependency
α → β
holds on R if and only if for any legal relation
r(R), whenever any two tuples t1 and t2 of r agree
on the attributes α, they also agree on the
attributes β. That is,
∀ t1, t2 ∈ r (t1[α] = t2[α] ⇒ t1[β] = t2[β])
EXAMPLE
 Consider r(A, B) with the following instance of r.
A  B
1  4
1  5
3  7
 On this instance, A → B does NOT hold, but B → A does hold
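A minimal sketch of testing whether one instance satisfies a functional dependency, using the instance {(1, 4), (1, 5), (3, 7)} of r(A, B). The function name `fd_holds` is an illustrative assumption:

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the instance `rows` (a list of dicts) satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # a later, different rhs value for the same lhs violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: every B value is unique
```

This only tests one instance. Whether an FD holds on the schema depends on all legal instances, which code cannot decide from data alone.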
FUNCTIONAL DEPENDENCY
 K is a superkey for relation schema R if and only if K → R
 K is a candidate key for R if and only if
 K → R, and
 for no α ⊂ K, α → R
 Functional dependencies allow us to express constraints
that cannot be expressed using superkeys. Consider the
schema:
inst_dept (ID, name, salary, dept_name, building, budget).
We expect these functional dependencies to hold:
dept_name → building
and ID → building
but would not expect the following to hold:
dept_name → salary
USE OF FUNCTIONAL DEPENDENCY
 We use functional dependencies to:
 test relations to see if they are legal under a given set
of functional dependencies.
 If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
 specify constraints on the set of legal relations
 We say that F holds on R if all legal relations on R satisfy
the set of functional dependencies F.
 Note: A specific instance of a relation schema
may satisfy a functional dependency even if the
functional dependency does not hold on all legal
instances.
 For example, a specific instance of instructor may, by
chance, satisfy
name → ID.
FUNCTIONAL DEPENDENCY (CONTD)
 A functional dependency is trivial if it is
satisfied by all instances of a relation
 Example:
 ID, name → ID
 name → name
 In general, α → β is trivial if β ⊆ α
CLOSURE OF A SET OF FUNCTIONAL
DEPENDENCIES
 Given a set F of functional dependencies, there
are certain other functional dependencies that
are logically implied by F.
 For example: If A → B and B → C, then we can
infer that A → C
 The set of all functional dependencies logically
implied by F is the closure of F.
 We denote the closure of F by F+.
 F+ is a superset of F.
CLOSURE OF A SET OF FUNCTIONAL
DEPENDENCIES
 We can find F+, the closure of F, by repeatedly
applying
Armstrong’s Axioms:
 if β ⊆ α, then α → β (reflexivity)
 if α → β, then γα → γβ (augmentation)
 if α → β, and β → γ, then α → γ (transitivity)
 These rules are
 sound (generate only functional dependencies that
actually hold), and
 complete (generate all functional dependencies that
hold).
EXAMPLE
 R = (A, B, C, G, H, I)
F = { A → B
A → C
CG → H
CG → I
B → H }
 some members of F+
 A → H
 by transitivity from A → B and B → H
 AG → I
 by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
 CG → HI
 by augmenting CG → I to infer CG → CGI,
and augmenting of CG → H to infer CGI → HI,
and then transitivity
PROCEDURE FOR COMPUTING F+
 To compute the closure of a set of functional
dependencies F:
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
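A minimal sketch of computing F+ for a tiny schema. It does not follow the repeat loop above step by step. Instead it uses an equivalent shortcut: for every non-empty attribute set X it computes the attribute closure X+ and emits X → Y for every non-empty Y ⊆ X+. The schema R = (A, B, C), the set F = {A → B, B → C} and all function names are assumptions chosen for illustration:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def nonempty_subsets(attrs):
    items = sorted(attrs)
    return [frozenset(c) for n in range(1, len(items) + 1)
            for c in combinations(items, n)]

def fd_closure(schema, fds):
    """All X -> Y in F+ with X and Y non-empty subsets of the schema."""
    return {(x, y) for x in nonempty_subsets(schema)
            for y in nonempty_subsets(closure(x, fds))}

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
f_plus = fd_closure("ABC", F)
print(len(f_plus))  # 35
print((frozenset("A"), frozenset("C")) in f_plus)  # True, by transitivity
```

Even for three attributes F+ has 35 members, which is why later slides work with attribute closures rather than with F+ itself.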
CLOSURE OF FDS
 Additional rules:
 If α → β holds and α → γ holds, then α → βγ holds
(union)
 If α → βγ holds, then α → β holds and α → γ holds
(decomposition)
 If α → β holds and γβ → δ holds, then αγ → δ holds
(pseudotransitivity)
The above rules can be inferred from Armstrong’s
axioms.
FUNCTIONAL DEPENDENCY EXAMPLE
 Flight <flight_no, c_arr, c_dept, pl_type>
 Seats_free <flight_no, date, seats_avl>
 The following FDs hold
 flight_no → c_arr
 flight_no → c_dept
 flight_no → pl_type
 flight_no, date → seats_avl
FUNCTIONAL DEPENDENCY EXAMPLE
 Stud_addr <name, address>
 Stud_grade <name, subject, grade>
 FDs that hold are
 name → address
 name, subject → grade
 Which FDs hold here?
X   Y   Z   W
x1  y1  z1  w1
x1  y2  z1  w2
x2  y2  z2  w2
x2  y3  z2  w3
x3  y3  z2  w4
 x→y
 x→z holds
 x→w
 y→x
 y→z
 y→w
 z→x
 z→y
 z→w
 w→x
 w→y holds
 w→z
 xy→z holds
 yz→x
 FDs not marked “holds” are violated by the instance
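The answers can be checked by brute force. The sketch below assumes the five-row instance over X, Y, Z, W given on the previous slide, and the helper name `holds` is illustrative:

```python
from itertools import permutations

rows = [
    ("x1", "y1", "z1", "w1"),
    ("x1", "y2", "z1", "w2"),
    ("x2", "y2", "z2", "w2"),
    ("x2", "y3", "z2", "w3"),
    ("x3", "y3", "z2", "w4"),
]
cols = {"X": 0, "Y": 1, "Z": 2, "W": 3}

def holds(lhs, rhs):
    """True if lhs -> rhs is satisfied; lhs is a string of column letters, rhs one letter."""
    seen = {}
    for row in rows:
        key = tuple(row[cols[a]] for a in lhs)
        val = row[cols[rhs]]
        if seen.setdefault(key, val) != val:
            return False
    return True

# Test every FD with a single attribute on each side.
single = [f"{a}->{b}" for a, b in permutations("XYZW", 2) if holds(a, b)]
print(single)            # ['X->Z', 'W->Y']
print(holds("XY", "Z"))  # True
print(holds("YZ", "X"))  # False
```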
FULL FUNCTIONAL DEPENDENCY
 When the functional dependency is ‘minimal’ in
size (i.e., contains no redundant terms)
 FD X → A for which there is no proper subset Y of
X such that Y → A (A is said to be fully
functionally dependent on X)
CLOSURE OF ATTRIBUTE SETS
 The set of all attributes functionally determined
by α under a set F of FDs
 It is denoted by α+
 Let’s consider the following example
 A → BC
 AC → D
 D → B
 AB → D
 So
 A+={A,B,C,D}, B+={B}, …
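A minimal sketch of the usual fixpoint computation of α+, run on the example F = {A → BC, AC → D, D → B, AB → D}. The helper names `closure` and `fd` are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole left side is already in the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

F = [fd("A", "BC"), fd("AC", "D"), fd("D", "B"), fd("AB", "D")]
for x in "ABCD":
    print(x, sorted(closure(x, F)))
# A ['A', 'B', 'C', 'D']
# B ['B']
# C ['C']
# D ['B', 'D']
```

Since A+ contains every attribute, A is a superkey of a schema with attributes A, B, C, D.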
COVER OF A SET OF FDS
Let f and g be two sets of FDs on a relation scheme R.
Then f is a cover of g if f+ = g+
This is also known as f is equivalent to g
f: A → BC, B → C, A → B, AB → C
g: A → BC, B → C, AB → C
Here f+ = g+
So g covers f
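Equivalence of two FD sets can be tested without building f+ or g+: f covers g when every FD of g follows from f, which an attribute closure decides. A minimal sketch, assuming f = {A → BC, B → C, A → B, AB → C} and g = {A → BC, B → C, AB → C}. The function names are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, g):
    """True if every FD in g is logically implied by f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

f = [fd("A", "BC"), fd("B", "C"), fd("A", "B"), fd("AB", "C")]
g = [fd("A", "BC"), fd("B", "C"), fd("AB", "C")]
h = [fd("A", "B")]  # hypothetical weaker set, for contrast

print(equivalent(f, g))  # True
print(equivalent(f, h))  # False: h does not imply B -> C
```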
MINIMAL COVER OR CANONICAL COVER
 A cover is said to be minimal if it has no
redundant terms
 Denoted by Fc
 Example:
F: A → BC, AC → D, D → B, AB → D
Fc: A → CD, D → B
 In Fc, B is dropped from the right side of A → BC because
A → D and D → B already give A → B
EXTRANEOUS ATTRIBUTE
 An attribute of a FD is said to be extraneous if
we can remove it without changing the closure of
the set of FDs
 Formally,
 Consider a set F of FDs and α → β in F
 Attribute A is extraneous in α if A ∈ α, and F logically
implies (F − {α → β}) ∪ {(α − A) → β}
 Attribute A is extraneous in β if A ∈ β, and the set
(F − {α → β}) ∪ {α → (β − A)} logically implies F
EXAMPLE
 Suppose F: {AB → C and A → C}
 Then B is extraneous in AB → C
 Again F: {AB → CD and A → C}
 Then C is extraneous in AB → CD
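Both tests reduce to an attribute closure. A minimal sketch, with illustrative function names: an attribute A is extraneous on the left of α → β if (α − A)+ under F already contains β, and extraneous on the right if α+ under the reduced set still contains A.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def extraneous_in_lhs(fds, dep, attr):
    lhs, rhs = dep
    # Does F imply (lhs - attr) -> rhs?
    return rhs <= closure(lhs - {attr}, fds)

def extraneous_in_rhs(fds, dep, attr):
    lhs, rhs = dep
    # Replace lhs -> rhs by lhs -> (rhs - attr) and see if attr is still reachable.
    reduced = [d for d in fds if d != dep] + [(lhs, rhs - {attr})]
    return attr in closure(lhs, reduced)

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

F1 = [fd("AB", "C"), fd("A", "C")]
F2 = [fd("AB", "CD"), fd("A", "C")]
print(extraneous_in_lhs(F1, F1[0], "B"))  # True
print(extraneous_in_rhs(F2, F2[0], "C"))  # True
print(extraneous_in_rhs(F2, F2[0], "D"))  # False
```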
NORMAL FORMS
 First Normal Form (1NF)
 Included in the definition of relation
 Second Normal Form (2NF)
 Third Normal Form (3NF)
 Boyce-Codd Normal Form (BCNF)
 2NF, 3NF and BCNF are defined in terms of FDs
 Fourth Normal Form (4NF)
 Defined using MVDs
 Fifth Normal Form (5NF)
 Also known as Project Join Normal Form (PJNF)
 Defined using join dependency
2ND NORMAL FORM
 2NF: A relation schema R is in 2NF if
 it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key of R
 key attribute: An attribute that is part of some key
 non-key attribute: An attribute that is not part of any
key
EXAMPLE
 Let’s consider the following supplier-parts
database system
 first <sno, status, city, pno, qty>
 Here a possible primary key is (sno, pno)
 FDs for relation first
 {sno, pno} → qty
 sno → city
 sno → status
 city → status
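A minimal sketch of a 2NF check for first, assuming the FDs {sno, pno} → qty, sno → city, sno → status and city → status. It looks for non-key attributes determined by a proper subset of the primary key. As a simplification it treats the given primary key as the only candidate key, and the function names are illustrative:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(schema, key, fds):
    """Map each proper subset of `key` to the non-key attributes it determines."""
    non_key = set(schema) - set(key)  # assumes `key` is the only candidate key
    found = {}
    for n in range(1, len(key)):
        for part in combinations(sorted(key), n):
            determined = closure(part, fds) & non_key
            if determined:
                found[part] = sorted(determined)
    return found

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

fds = [fd(["sno", "pno"], ["qty"]), fd(["sno"], ["city"]),
       fd(["sno"], ["status"]), fd(["city"], ["status"])]
first = ["sno", "status", "city", "pno", "qty"]
print(partial_dependencies(first, ["sno", "pno"], fds))
# {('sno',): ['city', 'status']}
```

A non-empty result means the relation is not in 2NF: city and status depend on sno alone, not on the whole key.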
 Instance of relation first
sno  status  city     pno  qty
s1   20      mumbai   p1   300
s1   20      mumbai   p2   200
s1   20      mumbai   p3   400
s1   20      mumbai   p4   200
s1   20      mumbai   p5   100
s1   20      mumbai   p6   700
s2   10      chennai  p1   200
s2   10      chennai  p2   120
s3   10      chennai  p2   340
s4   20      mumbai   p2   230
s4   20      mumbai   p4   432
s4   20      mumbai   p5   120
ANOMALIES
 Insert:
 Insertion is not possible until a supplier has supplied some
items
 Ex. s5 located in Delhi cannot be inserted
 Delete:
 May lose some additional information
 Ex. if (s3, p2) is deleted then we lose the information
that s3 is located in Chennai
 Update:
 Same city value appears in many places
 Ex. if s1 moves from Mumbai to Ahmedabad then the
update is to be done in many places
DECOMPOSITION
 The relation first must be decomposed in such a
way that the decomposed relations satisfy 2NF
 second <sno, status, city> and
 sp <sno, pno, qty>
 FDs for the above relations
 second: sno → city, sno → status, city → status
 sp: {sno, pno} → qty
EXAMPLE OF 2NF RELATIONS
second
sno  status  city
s1   20      mumbai
s2   10      chennai
s3   10      chennai
s4   20      mumbai
s5   30      delhi

sp
sno  pno  qty
s1   p1   300
s1   p2   200
s1   p3   400
s1   p4   200
s1   p5   100
s1   p6   700
s2   p1   200
s2   p2   120
s3   p2   340
s4   p2   230
s4   p4   432
s4   p5   120
 Thus in r(A,B,C,D) if (A,B) is a primary key and
A → D holds
 Then by 2NF r can be replaced by r1 and r2 as
follows
 r1(A,D) primary key {A}
 r2(A,B,C) primary key {A,B} and foreign key A
references r1(A)
3NF
 A relation is in 3NF iff
 it is in 2NF and
 every non-key attribute is non-transitively dependent
on the primary key
 No transitive dependency means no mutual
dependency among non-key attributes
 Now consider relation second
 sno → city and city → status, so sno → status holds
transitively
ANOMALIES
 Insert:
 A particular city has a particular status
 Ex: any supplier in city Kanpur has status 10
 This cannot be inserted until there is actually a supplier
located in that city
 Delete:
 If we delete s5 then we lose the information that Delhi
has status 30
 Update:
 The status of a given city appears in many places
 So updating the status value may be problematic
 Now we decompose the relation second into two
relations such that they satisfy 3NF
 sc <sno, city>
 cs <city, status>
 The FDs of the above relations are
 sc: sno → city
 cs: city → status
 Thus if r(A,B,C) and A is a primary key and B →
C holds
 Then by 3NF r can be replaced by
 r1(B,C) and B is a primary key
 r2(A,B) and A is a primary key and foreign key B
references r1(B)
PROPERTIES OF DECOMPOSITION
 Decomposition1: Relation second is decomposed
into
 sc <sno, city>
 cs <city, status>
 Decomposition2: Relation second is decomposed
into
 sc <sno, city>
 ss <sno, status>
 Which of the above decompositions is
lossless and dependency preserving?
DESIRABLE PROPERTIES OF
DECOMPOSITION
 Lossless join
 When decomposing a relation into a number of smaller
ones, it is crucial that the decomposition be
lossless
 Dependency preservation
 The system must not create a relation that does not
satisfy all the given functional dependencies
LOSSLESS JOIN
 Let R be a relation schema and F be a set of
functional dependencies
 Let R1 and R2 form a decomposition of R
 The decomposition will be lossless if at least one
of the following functional dependencies is in F+
R1 ∩ R2 → R1
R1 ∩ R2 → R2
In other words, R1 ∩ R2 forms a superkey of
either R1 or R2
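The test can be sketched with an attribute closure: compute (R1 ∩ R2)+ and check whether it contains all of R1 or all of R2. The example applies it to decompositions of second <sno, status, city>, assuming the FDs sno → city, sno → status and city → status. The function names are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: R1 ∩ R2 must be a superkey of R1 or of R2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

F = [fd(["sno"], ["city"]), fd(["sno"], ["status"]), fd(["city"], ["status"])]

print(is_lossless(["sno", "city"], ["city", "status"], F))    # True  (sc, cs)
print(is_lossless(["sno", "city"], ["sno", "status"], F))     # True  (sc, ss)
print(is_lossless(["sno", "status"], ["city", "status"], F))  # False (shared attribute: status)
```

The third pair is a hypothetical split added for contrast. Its shared attribute status determines nothing else, so the join can produce spurious tuples.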
DEPENDENCY PRESERVATION
 Create legal relations preserving the
dependencies
 Let F be a set of functional dependencies on a
schema R and let R1, R2, …, Rn be a
decomposition of R
 The restriction of F to Ri is the set of all
functional dependencies in F+ that include only
attributes of Ri
 The set of restrictions F1, F2, …, Fn is the set of
dependencies that can be checked efficiently
 Now we check whether testing only the
restrictions is sufficient
 Let F′ = F1 ∪ F2 ∪ … ∪ Fn
 F′ is a set of functional dependencies on
schema R, but in general F′ ≠ F
 But if F′+ = F+ is satisfied then we say that it is a
dependency preserving decomposition
DEPENDENCY PRESERVING: EXAMPLE
 Example:
 Suppose F = {A → B, B → C} and the original relation is
r<A,B,C>
 And the decompositions are r1<A,B> and r2<A,C>
 Is it dependency preserving?
TESTING FOR DEPENDENCY PRESERVATION
 D is an input set and D = {R1, R2, …, Rn} of
decomposed relation schemas
 Compute F+
 For each schema Ri in D do
 Begin
 Fi := restriction of F+ to Ri;
 End
 F′ := ∅
 For each restriction Fi do
 Begin
 F′ = F′ ∪ Fi
 End
 Compute F′+
 If (F′+ = F+) then return true;
 else return false;
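A minimal sketch of the test. Rather than materialising F+ and F′+, it uses two equivalent shortcuts: each restriction Fi is generated from attribute closures of the subsets of Ri, and F′+ = F+ is decided by checking that every FD of F follows from F′ (the other direction holds by construction). The function names are illustrative, and the second decomposition is a hypothetical one added for contrast:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def restriction(ri, fds):
    """Restriction of F to Ri: for each non-empty X within Ri, keep X -> (X+ ∩ Ri)."""
    ri = sorted(ri)
    projected = []
    for n in range(1, len(ri) + 1):
        for x in combinations(ri, n):
            rhs = closure(x, fds) & set(ri)
            projected.append((frozenset(x), frozenset(rhs)))
    return projected

def preserves_dependencies(decomposition, fds):
    f_prime = [d for ri in decomposition for d in restriction(ri, fds)]
    return all(rhs <= closure(lhs, f_prime) for lhs, rhs in fds)

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(preserves_dependencies(["AB", "AC"], F))  # False: B -> C cannot be checked
print(preserves_dependencies(["AB", "BC"], F))  # True
```

Enumerating subsets of each Ri is exponential in the size of Ri, so this is only suitable for small textbook schemas.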
BOYCE/CODD NORMAL FORM (BCNF)
 Can handle relations with
 two or more candidate keys
 composite candidate keys
 overlapped keys
 The above conditions might not occur very often
 For a relation where the above does not hold,
3NF and BCNF are equivalent
 BCNF is strictly stronger than the 3NF definition
DETERMINANT
 In a FD, the left side is termed the determinant,
whereas the right side is termed the dependent
 A relation is in BCNF iff every determinant
is a candidate key
 Assumption
 The determinants are not too big (i.e., the FDs are
left-irreducible)
 All FDs are nontrivial
 Let’s check whether the following relations are in
BCNF
 Relation first <sno, status, city, pno, qty>
 Three determinants: {sno}, {city}, {sno, pno}
 Only the last one is a candidate key
 So not in BCNF
 Relation second <sno, status, city>
 Two determinants: {sno}, {city}
 Only sno is a candidate key
 So not in BCNF
 Relation sp <sno, pno, qty>
 One determinant: {sno, pno}
 That is also a candidate key
 It is in BCNF
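The three checks can be sketched in code. For non-trivial FDs it is enough to test that each determinant is a superkey, i.e. its attribute closure covers the whole schema. The FD sets below are the ones assumed for first, second and sp, and the function names are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violating_determinants(schema, fds):
    """Determinants of non-trivial FDs that are not superkeys of `schema`."""
    schema = set(schema)
    bad = {tuple(sorted(lhs)) for lhs, rhs in fds
           if not rhs <= lhs and not schema <= closure(lhs, fds)}
    return sorted(bad)

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

first_fds = [fd(["sno", "pno"], ["qty"]), fd(["sno"], ["city"]),
             fd(["sno"], ["status"]), fd(["city"], ["status"])]
second_fds = first_fds[1:]  # sno -> city, sno -> status, city -> status
sp_fds = first_fds[:1]      # {sno, pno} -> qty

print(violating_determinants(["sno", "status", "city", "pno", "qty"], first_fds))
# [('city',), ('sno',)]
print(violating_determinants(["sno", "status", "city"], second_fds))  # [('city',)]
print(violating_determinants(["sno", "pno", "qty"], sp_fds))          # []
```

An empty list means the relation is in BCNF with respect to the given FDs.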
EXAMPLE 1
 Now let us consider relation suppliers <sno,
sname, status, city>
 Here both sno and sname are candidate keys
 So FDs of this relation
 sno → sname, status, city
 sname → sno, status, city
So here suppliers is in BCNF
EXAMPLE 2
 Now consider the relation ssp <sno, sname, pno,
qty>
 Here the candidate keys are {sno, pno} and
{sname, pno}
 So here the candidate keys overlap
 But is it BCNF?
 No, as the relation contains two other determinants
{sno} and {sname}
 And these are not candidate keys
EXAMPLE 2 (CONTD)
 So a possible decomposition will be
 ss <sno, sname>
 sp <sno, pno, qty>
 And another valid decomposition
 ss <sno, sname>
 sp <sname, pno, qty>
EXAMPLE 3
 Let’s consider a relation sjt <s, j, t>
 Here attributes s: student, j: subject and t:
teacher
 The meaning of each tuple
 “student s is taught subject j by teacher t”
 Now the following constraints apply Now the following constraints apply
1. For each subject, each student of the subject
is taught by only one teacher
2. Each teacher teaches only one subject
3. However, each subject is taught by several
teachers
FDs
{s,j} → t
t → j
j → t does not hold
What is a possible instance of relation sjt?
sjt
s       j        t
Sarala  Maths    Prof. Raj
Sarala  Physics  Prof. Atul
Uma     Maths    Prof. Raj
Uma     Physics  Prof. Pathak
 So what are the candidate keys?
 {s,j} and {s,t}
 But the relation is not in BCNF as the
determinant t is not a candidate key
 So how do we decompose sjt?
 Relation sjt can be decomposed into
 st <s,t>
 tj <t,j>
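The candidate keys of sjt can be found mechanically. This minimal sketch tries attribute sets in order of increasing size and keeps those whose closure is the whole schema and that contain no smaller key. It assumes the FDs {s,j} → t and t → j, and the function names are illustrative:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    schema = sorted(schema)
    keys = []
    for n in range(1, len(schema) + 1):
        for combo in combinations(schema, n):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= set(schema):
                keys.append(frozenset(attrs))
    return keys

fds = [(frozenset("sj"), frozenset("t")), (frozenset("t"), frozenset("j"))]
keys = candidate_keys("sjt", fds)
print([sorted(k) for k in keys])  # [['j', 's'], ['s', 't']]
```

The determinant t is not among the keys, which is why t → j violates BCNF.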
st
s       t
Sarala  Prof. Raj
Sarala  Prof. Atul
Uma     Prof. Raj
Uma     Prof. Pathak

tj
t             j
Prof. Raj     Maths
Prof. Atul    Physics
Prof. Pathak  Physics

There is a problem with this decomposition.
The decomposition is not independent,
because the FD {s,j} → t cannot be checked on st or tj alone.
 So the main problem with the last decomposition
is that the relations cannot be independently
updated
 When a relation cannot be decomposed into
independent components then it is said to be
atomic
 So sometimes there may be conflicts between
 BCNF components
 Decomposing into independent components
 Thus it may not always be possible to satisfy both of
them at the same time

Cs501 fd nf

  • 1.
    CS501: DATABASE ANDDATA MINING Functional dependency and Normalization1
  • 2.
    SCHEMA FOR UNIVERSITYDATABASE  Let’s consider the relations  department <deptName, building, budget>  instructor <ID,name, deptName, salary> S bi i d d Suppose we combine instructor and department into inst_dept 2
  • 3.
     Result ispossible repetition of information 3
  • 4.
    WHAT ABOUT SMALLERSCHEMAS?  Suppose we had started with inst_dept. How would we know to split up (decompose) it into instructor and department?  Write a rule “if there were a schema (dept name building Write a rule if there were a schema (dept_name, building, budget), then dept_name would be a candidate key”  Denote as a functional dependency: dept name  building budgetdept_name  building, budget  In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated.  This indicates the need to decompose inst deptThis indicates the need to decompose inst_dept  Not all decompositions are good. Suppose we decompose employee(ID, name, street, city, salary) into employee1 (ID, name)employee1 (ID, name) employee2 (name, street, city, salary)  The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is areconstruct the original employee relation and so, this is a lossy decomposition 4
  • 5.
  • 6.
    EXAMPLE OF LOSSLESSJOIN DECOMPOSITION Lossless join decomposition Decomposition of R = (A, B, C) R = (A B) R = (B C)R1 = (A, B) R2 = (B, C) A B A B CBC   1 2   1 2 r  (r) A B 1 2 A B  (r)r B,C(r) A,B (r) B,C (r) A B C A A,B(r)   1 2 A B 6
  • 7.
    1ST NORMAL FORM Domain is atomic if its elements are considered to be indivisible units  Examples of non-atomic domains: Set of names composite attributes Set of names, composite attributes  Identification numbers like CS101 that can be broken up into parts  A relational schema R is in first normal form if the domains of all attributes of R are atomicf f 7
  • 8.
    1ST NORMAL FORM(CONTD)  Atomicity is actually a property of how they y p p y elements of the domain are used.  Example: Strings would normally be considered indivisibleindivisible  Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 f f f If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.  Doing so is a bad idea: leads to encoding of information in application program rather than in the database. 8
  • 9.
    GOAL- DEVISE ATHEORY FOR THE FOLLOWING  Decide whether a particular relation r is in “good”p g form.  In the case that a relation r is not in “good” form, d it i t t f l ti { }decompose it into a set of relations {r1, r2, ..., rn} such that  each relation is in good formg  the decomposition is a lossless-join decomposition  Our theory is based on: f i l d d i functional dependencies  multivalued dependencies 9
  • 10.
    FUNCTIONAL DEPENDENCYFUNCTIONAL DEPENDENCY Constraints on the set of legal relationsg  Require that the value for a certain set of attributes determines uniquely the value for th t f tt ib tanother set of attributes  A functional dependency is a generalization of the notion of a keythe notion of a key 10
  • 11.
    FUNCTIONAL DEPENDENCYFUNCTIONAL DEPENDENCY Let R be a relation schema   R and   R  R and   R  The functional dependency    h ld R if d l if f l l l iholds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes , they also agree on the ib  Th iattributes . That is, t1,t2  r (t1[] = t2 []  t1[ ] = t2 [ ] ) 11
  • 12.
    EXAMPLE  Consider r(A,B) with the following instance of r.( ) g 1 4 A B  On this instance, A  B does NOT hold, but B 1 4 1 5 3 7  On this instance, A  B does NOT hold, but B  A does hold 12
  • 13.
    FUNCTIONAL DEPENDENCYFUNCTIONAL DEPENDENCY K is a superkey for relation schema R if and only if K  R  K is a candidate key for R if and only ify y  K  R, and  for no   K,   R  Functional dependencies allow us to express constraints Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: inst_dept (ID, name, salary, dept_name, building, budget ). We expect these functional dependencies to hold: dept name buildingdept_name building and ID  building but would not expect the following to hold: dept_name  salary 13
  • 14.
    USE OF FUNCTIONALDEPENDENCYUSE OF FUNCTIONAL DEPENDENCY  We use functional dependencies to:  test relations to see if they are legal under a given setes e a o s o see ey a e ega e a g ve se of functional dependencies.  If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.  specify constraints on the set of legal relations  We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F. N t A ifi i t f l ti h Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal i tinstances.  For example, a specific instance of instructor may, by chance, satisfy IDname  ID. 14
  • 15.
    FUNCTIONAL DEPENDENCY (CONTD) A functional dependency is trivial if it isp y satisfied by all instances of a relation  Example: ID name  ID ID, name  ID  name  name  In general,    is trivial if    15
  • 16.
    CLOSURE OF ASET OF FUNCTIONAL DEPENDENCY  Given a set F of functional dependencies, therep , are certain other functional dependencies that are logically implied by F. F l If A  B d B  C th For example: If A  B and B  C, then we can infer that A  C  The set of all functional dependencies logically implied by F is the closure of F.  We denote the closure of F by F+. F+ i t f F F+ is a superset of F. 16
  • 17.
    CLOSURE OF ASET OF FUNCTIONAL DEPENDENCY  We can find F+, the closure of F, by repeatedly, y p y applying Armstrong’s Axioms: if  th   ( fl i it ) if   , then    (reflexivity)  if   , then      (augmentation)  if   , and   , then    (transitivity)  These rules are  sound (generate only functional dependencies that actually hold) andactually hold), and  complete (generate all functional dependencies that hold). 17
  • 18.
    EXAMPLE  R =(A, B, C, G, H, I) F = { A  B A  CA  C CG  H CG  I B  H}B  H}  some members of F+  A  H  by transitivity from A  B and B  H by transitivity from A  B and B  H  AG  I  by augmenting A  C with G, to get AG  CG and then transitivity with CG  I  CG  HI  by augmenting CG  I to infer CG  CGI, and augmenting of CG  H to infer CGI  HI, and then transitivityand then transitivity 18
  • 19.
    PROCEDURE FOR COMPUTINGF+  To compute the closure of a set of functional dependencies F: F + = F repeat for each functional dependency f in F+ apply reflexivity and augmentation rules on f add the resulting functional dependencies to F + f h i f f ti l d d i f d f i F +for each pair of functional dependencies f1and f2 in F + if f1 and f2 can be combined using transitivity then add the resulting functional dependency to F + until F + does not change any furtheruntil F does not change any further 19
  • 20.
    CLOSURE OF FDS Additional rules:  If    holds and    holds, then     holds (union)  If     holds then    holds and    holds If     holds, then    holds and    holds (decomposition)  If    holds and     holds, then     holds ( d t iti it )(pseudotransitivity) The above rules can be inferred from Armstrong’s axioms. 20
  • 21.
    FUNCTIONAL DEPENDENCY EXAMPLE Flight <flight no, c arr, c dept, pl type>g f g _ , _ , _ p , p _ yp  Seats_free <flight_no, date, seats_avl>  The following FDs hold  flight_no → c_arr  flight_no → c_dept  flight no → pl type flight_no → pl_type  flight_no, date → seats_avl 21
  • 22.
    FUNCTIONAL DEPENDENCY EXAMPLE Stud addr <name, address>_ ,  Stud_grade <name, subject, grade>  FDs that hold are  name → address  name, subject → grade 22
  • 23.
     Which FDshold here? X Y Z W x1 y1 z1 w11 y1 1 1 x1 y2 z1 w2 x2 y2 z2 w2 x2 y3 z2 w3 x3 y3 z2 w4 23
  • 24.
     x→y xy→z holds  x→z holds yz →x  x→w  y→x y→x  y→z  y→w  z→x  z→y  z→w z w  w→x  w→y holds  w→z 24
  • 25.
    FULL FUNCTIONAL DEPENDENCY When the functional dependency is ‘minimal’ inp y size (i.e., containing non redundant terms)  FD X →A for which there is no proper subset Y of X h th t Y A (A i id t b f llX such that Y →A (A is said to be fully functionally dependent on X) 25
  • 26.
    CLOSURE OF ATTRIBUTESETS  The set of all attributes functionally determinedy by α under a set F of FDs  It is denoted by α+  Let’s consider the following example  A → BC  AC → D AC → D  D → B  AB → D  So  A+={A,B,C,D}, B+={B}, … 26
  • 27.
    COVER OF ASET OF FDS Let f and g be two FDs on a relation scheme RLet f and g be two FDs on a relation scheme R. Then f is a cover of g if f+=g+ This is also known as f is equivalent to g f A→BC g A→BC B →C A →B AB →C A→BC B →C AB →C Here f+=g+ So g covers fSo g covers f 27
  • 28.
    MINIMAL COVER ORCANONICAL COVER  A cover is said to be minimal if it has no redundant terms  Denoted by Fc  Example: Fc F A → BC A → CD D → B A → BC AC → D D → B AB → D 28
  • 29.
    EXTRANEOUS ATTRIBUTE  Anattribute of a FD is said to be extraneous if we can remove it without changing the closure of the set of FDs F ll Formally,  Consider a set F of FDs and α→β in F  Attribute A is extraneous in α if A  α and F logically Attribute A is extraneous in α if A  α, and F logically implies (F-{α → β}) U{(α –A) → β}  Attribute A is extraneous in β if A  β, and F logically implies (F {α → β}) U{α → (β A)}implies (F-{α → β}) U{α → (β - A)} 29
  • 30.
    EXAMPLE  Suppose F:{AB→C and A →C}pp { }  Then B is extraneous in AB →C  Again F:{AB →CD and A →C}  Then C is extraneous in AB →CD 30
  • 31.
    Included in the definitionof NORMAL FORMS  First Normal Form (1NF) definition of relation ( )  Second Normal Form (2NF)  Third Normal Form (3NF) Defined in terms of FDs  Boyce-Codd Normal Form (BCNF)  Fourth Normal Form (4NF) Defined using MVDs  Fifth Normal Form (5NF)  Also known as Project Join Normal Form (PJNF) MVDs Defined usingDefined using join dependency 31
  • 32.
    2ND NORMAL FORM 2NF: A relation schema R is in 2NF if  it is 1NF and every non-key attribute is fully functionally dependent on the primary key of R  key attribute: An attribute that is part of some key key attribute: An attribute that is part of some key  non-key attribute: An attribute that is not part of any key 32
  • 33.
    EXAMPLE  Let’s considerthe following supplier-partsg pp p database system  first <sno, status, city, pno, qty>  Here a possible primary key is (sno, pno)  FDs for relation first citysno status qty pno 33
  • 34.
     Instance ofrelation first sno status city pno qty s1 20 mumbai p1 300 s1 20 mumbai p2 200 s1 20 mumbai p3 400s1 20 mumbai p3 400 s1 20 mumbai p4 200 s1 20 mumbai p5 100 1 20 b i 6 700s1 20 mumbai p6 700 s2 10 chennai p1 200 s2 10 chennai p2 120 s3 10 chennai p2 340 s4 20 mumbai p2 230 s4 20 mumbai p4 432s4 20 mumbai p4 432 s4 20 mumbai p5 120 34
  • 35.
    ANOMALIES  Insert:  Insertionnot possible until a supplier supplied some items  Ex s5 located in Delhi in cannot be inserted Ex. s5 located in Delhi in cannot be inserted  Delete:  May loose some additional informationy  Ex. if s3, p2 is deleted then we loose the information that s3 is located in Chennai  Update: Update:  Same city value appears in many places  Ex. if s1 moves from Mumbai to Ahmedabad then update is to be done in many places 35
  • 36.
    DECOMPOSITION  The relationfirst must be decomposed in such af p way so that the decomposed relations satisfy 2NF  second <sno, status, city> and  sp <sno, pno, qty>  FDs for the above relations sno city sno qty status pno qty 36
  • 37.
    EXAMPLE OF 2NFRELATIONS second sno pno qty sp sno status city s1 20 mumbai 2 10 h i s1 p1 300 s1 p2 200 s1 p3 400 s2 10 chennai s3 10 chennai s4 20 mumbai s1 p3 400 s1 p4 200 s1 p5 100 1 6 700 s5 30 delhi s1 p6 700 s2 p1 200 s2 p2 120 s3 p2 340 s4 p2 230 s4 p4 432 37 s4 p4 432 s4 p5 120
  • 38.
     Thus inr(A,B,C,D) if (A,B) is a primary key and( , , , ) ( , ) p y y A →D holds  Then by 2NF r can be replaced by r1 and r2 as f llfollows  r1(A,D) primary key {A}  r2(A,B,C) primary key {A,B} and foreign key A( , , ) p y y { , } g y references r1(A) 38
  • 39.
    3NF  A relationis in 3NF iff  it is in 2NF and  every non-key attribute is non-transitively dependent on the primary keyon the primary key  No transitive dependency means no mutual dependency  Now consider relation second city sno status 39
  • 40.
    ANOMALIES
    - Insert:
      - A particular city has a particular status
      - Ex: any supplier in the city Kanpur has status 10
      - This fact cannot be inserted until there is actually a supplier located in that city
    - Delete:
      - If we delete s5 then we lose the information that Delhi has status 30
    - Update:
      - The status of a given city appears in many places
      - So updating the status value may be problematic
  • 41.
    - Now we decompose the relation second into two relations such that they satisfy 3NF:
      sc <sno, city>
      cs <city, status>
    - The FDs of the above relations are
      sc: sno → city
      cs: city → status
  • 42.
    - Thus, if in r(A, B, C) A is the primary key and B → C holds,
      then by 3NF r can be replaced by
      - r1(B, C), where B is the primary key
      - r2(A, B), where A is the primary key and foreign key B references r1(B)
  • 43.
    PROPERTIES OF DECOMPOSITION
    - Decomposition 1: relation second is decomposed into
      sc <sno, city>
      cs <city, status>
    - Decomposition 2: relation second is decomposed into
      sc <sno, city>
      ss <sno, status>
    - Which of the above decompositions is lossless and dependency preserving?
  • 44.
    DESIRABLE PROPERTIES OF DECOMPOSITION
    - Lossless join
      - When decomposing a relation into a number of smaller ones, it is crucial that the decomposition be lossless
    - Dependency preservation
      - The system must not create a relation that does not satisfy all the given functional dependencies
  • 45.
    LOSSLESS JOIN
    - Let R be a relation schema and F be a set of functional dependencies
    - Let R1 and R2 form a decomposition of R
    - The decomposition will be lossless if at least one of the following functional dependencies is in F+
      R1 ∩ R2 → R1
      R1 ∩ R2 → R2
    - In other words, R1 ∩ R2 forms a superkey of either R1 or R2
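    The test above can be sketched in Python using attribute closure: R1 ∩ R2 → Ri is in F+ exactly when Ri is contained in the closure of R1 ∩ R2. The function names closure and is_lossless are assumptions made for this sketch. The FDs are those of relation second, plus the employee example from earlier in the lecture (assuming ID determines all other employee attributes).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: R1 ∩ R2 must be a superkey of R1 or of R2."""
    reachable = closure(r1 & r2, fds)
    return r1 <= reachable or r2 <= reachable


# FDs of relation second (sno → status follows transitively).
F_SECOND = [({"sno"}, {"city"}), ({"city"}, {"status"})]

# Decomposition 1: sc <sno, city>, cs <city, status>
d1 = is_lossless({"sno", "city"}, {"city", "status"}, F_SECOND)
# Decomposition 2: sc <sno, city>, ss <sno, status>
d2 = is_lossless({"sno", "city"}, {"sno", "status"}, F_SECOND)

# Lossy example: employee1 (ID, name), employee2 (name, street, city, salary).
# Assumption for this sketch: ID is the key of employee.
F_EMP = [({"ID"}, {"name", "street", "city", "salary"})]
d3 = is_lossless({"ID", "name"}, {"name", "street", "city", "salary"}, F_EMP)

print(d1, d2, d3)  # True True False
```

    Both decompositions of second pass the lossless test, so they can only differ in whether they preserve dependencies.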
  • 46.
    DEPENDENCY PRESERVATION
    - Create legal relations preserving the dependencies
    - Let F be a set of functional dependencies on a schema R and let R1, R2, …, Rn be a decomposition of R
    - The restriction of F to Ri is the set of all functional dependencies in F+ that include only attributes of Ri
    - The set of restrictions F1, F2, …, Fn is the set of dependencies that can be checked efficiently
    - Now we ask whether testing only the restrictions is sufficient
  • 47.
    - Let F′ = F1 ∪ F2 ∪ … ∪ Fn
    - F′ is a set of functional dependencies on schema R, but in general F′ ≠ F
    - But if F′+ = F+ is satisfied then we say that it is a dependency preserving decomposition
  • 48.
    DEPENDENCY PRESERVING: EXAMPLE
    - Suppose F = {A → B, B → C} and the original relation is r<A, B, C>
    - And the decomposition is r1<A, B> and r2<A, C>
    - Is it dependency preserving?
  • 49.
    TESTING FOR DEPENDENCY PRESERVATION
    Input: D = {R1, R2, …, Rn}, the set of decomposed relation schemas

    Compute F+
    For each schema Ri in D do
    Begin
        Fi := restriction of F+ to Ri;
    End
    F′ := ∅
    For each restriction Fi do
    Begin
        F′ := F′ ∪ Fi
    End
    Compute F′+
    If (F′+ = F+) then return true;
    else return false;
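    A minimal Python sketch of this test follows. It does not materialise F+ literally: each restriction Fi is generated from attribute closures (X → X+ ∩ Ri for every X ⊆ Ri), and since F′ ⊆ F+ always holds, F′+ = F+ is checked by testing that every FD of F follows from F′. These shortcuts and the function names are assumptions of the sketch, not the slide's procedure word for word.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def restriction(fds, ri):
    """Nontrivial FDs of F+ that mention only attributes of ri."""
    names = sorted(ri)
    restricted = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            lhs = set(subset)
            rhs = (closure(lhs, fds) & set(ri)) - lhs
            if rhs:
                restricted.append((lhs, rhs))
    return restricted


def preserves_dependencies(fds, decomposition):
    """True if every FD in fds follows from the union F' of the restrictions."""
    f_prime = [fd for ri in decomposition for fd in restriction(fds, ri)]
    return all(rhs <= closure(lhs, f_prime) for lhs, rhs in fds)


# Example slide: F = {A → B, B → C}, decomposition r1<A, B>, r2<A, C>.
F_ABC = [({"A"}, {"B"}), ({"B"}, {"C"})]
abc = preserves_dependencies(F_ABC, [{"A", "B"}, {"A", "C"}])

# Relation second with its two candidate decompositions.
F_SECOND = [({"sno"}, {"city"}), ({"sno"}, {"status"}), ({"city"}, {"status"})]
dec1 = preserves_dependencies(F_SECOND, [{"sno", "city"}, {"city", "status"}])
dec2 = preserves_dependencies(F_SECOND, [{"sno", "city"}, {"sno", "status"}])

print(abc, dec1, dec2)  # False True False
```

    In the A, B, C example the FD B → C is lost, because F′ only contains A → B and A → C. For relation second, decomposition 2 loses city → status, while decomposition 1 keeps every FD.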
  • 50.
    BOYCE/CODD NORMAL FORM (BCNF)
    - Can handle relations with
      - two or more candidate keys
      - composite candidate keys
      - overlapping candidate keys
    - The above conditions might not occur very often
    - For a relation where the above conditions do not hold, 3NF and BCNF are equivalent
    - The BCNF definition is strictly stronger than the 3NF definition
  • 51.
    DETERMINANT
    - In an FD, the left side is termed the determinant, whereas the right side is termed the dependent
    - A relation is in BCNF iff every determinant is a candidate key
    - Assumptions
      - The determinants are not too big (the FDs are left-irreducible)
      - All FDs are nontrivial
  • 52.
    - Let’s check whether the following relations are in BCNF
    - Relation first <sno, status, city, pno, qty>
      - Three determinants: {sno}, {city}, {sno, pno}
      - Only the last one is a candidate key
      - So not in BCNF
    - Relation second <sno, status, city>
      - Two determinants: {sno}, {city}
      - Only sno is a candidate key
      - So not in BCNF
    - Relation sp <sno, pno, qty>
      - One determinant: {sno, pno}
      - That is also the candidate key
      - It is in BCNF
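    The three checks above can be reproduced with a small Python sketch. A determinant is a candidate key here exactly when its attribute closure covers the whole relation (the determinants are assumed to be left-irreducible, as stated earlier). The function name bcnf_violations is an assumption of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Determinants of nontrivial FDs whose closure does not cover all attributes."""
    return {
        frozenset(lhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attrs)
    }


FIRST_FDS = [
    ({"sno", "pno"}, {"qty"}),
    ({"sno"}, {"city"}),
    ({"sno"}, {"status"}),
    ({"city"}, {"status"}),
]
SECOND_FDS = [({"sno"}, {"city"}), ({"sno"}, {"status"}), ({"city"}, {"status"})]
SP_FDS = [({"sno", "pno"}, {"qty"})]

bad_first = bcnf_violations({"sno", "status", "city", "pno", "qty"}, FIRST_FDS)
bad_second = bcnf_violations({"sno", "status", "city"}, SECOND_FDS)
bad_sp = bcnf_violations({"sno", "pno", "qty"}, SP_FDS)

for name, bad in [("first", bad_first), ("second", bad_second), ("sp", bad_sp)]:
    print(name, "in BCNF" if not bad else "not in BCNF", sorted(map(sorted, bad)))
```

    An empty result means the relation is in BCNF. Note that in second, {sno} passes the test (its closure is the whole relation) and only {city} is reported.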
  • 53.
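    EXAMPLE 1
    - Now let us consider relation suppliers <sno, sname, status, city>
    - Here both sno and sname are candidate keys
    - So the FDs of this relation are
      sno → sname, sname → sno
      sno → status, sno → city
      sname → status, sname → city
    - So here suppliers is in BCNF (every determinant, sno or sname, is a candidate key)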
  • 54.
    EXAMPLE 2
    - Now consider the relation ssp <sno, sname, pno, qty>
    - Here the candidate keys are {sno, pno} and {sname, pno}
    - So here the candidate keys overlap
    - But is it in BCNF?
    - No, as the relation contains two other determinants, {sno} and {sname}
    - And these are not candidate keys
  • 55.
    EXAMPLE 2 (CONTD)
    - So a possible decomposition will be
      ss <sno, sname>
      sp <sno, pno, qty>
    - And another valid decomposition is
      ss <sno, sname>
      sp <sname, pno, qty>
  • 56.
    EXAMPLE 3
    - Let’s consider a relation sjt <s, j, t>
    - Here the attributes are s: student, j: subject and t: teacher
    - The meaning of each tuple is
      “student s is taught subject j by teacher t”
    - Now the following constraints apply
      1. For each subject, each student of the subject is taught by only one teacher
      2. Each teacher teaches only one subject
      3. However, each subject is taught by several teachers
  • 57.
    FDs
      {s, j} → t
      t → j
      j → t does not hold

    What is a possible instance of relation sjt?

    sjt
    s       j        t
    Sarala  Maths    Prof. Raj
    Sarala  Physics  Prof. Atul
    Uma     Maths    Prof. Raj
    Uma     Physics  Prof. Pathak
  • 58.
    - So what are the candidate keys?
      {s, j} and {s, t}
    - But the relation is not in BCNF, as the determinant t is not a candidate key
    - So how do we decompose sjt?
    - Relation sjt can be decomposed into
      st <s, t>
      tj <t, j>
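    The candidate keys quoted above can be confirmed by brute force. The sketch below tries attribute sets in order of increasing size and keeps those whose closure is the whole relation and which contain no smaller key; the function name candidate_keys is an assumption. It is also run on ssp from Example 2, taking sno → sname, sname → sno and {sno, pno} → qty as its FDs.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    names = sorted(attrs)
    keys = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            candidate = set(subset)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


SJT_FDS = [({"s", "j"}, {"t"}), ({"t"}, {"j"})]
sjt_keys = candidate_keys({"s", "j", "t"}, SJT_FDS)

SSP_FDS = [({"sno"}, {"sname"}), ({"sname"}, {"sno"}), ({"sno", "pno"}, {"qty"})]
ssp_keys = candidate_keys({"sno", "sname", "pno", "qty"}, SSP_FDS)

print([sorted(k) for k in sjt_keys])  # [['j', 's'], ['s', 't']]
print([sorted(k) for k in ssp_keys])  # [['pno', 'sname'], ['pno', 'sno']]
```

    The search is exponential in the number of attributes, which is acceptable only for small classroom schemas like these.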
  • 59.
    st
    s       t
    Sarala  Prof. Raj
    Sarala  Prof. Atul
    Uma     Prof. Raj
    Uma     Prof. Pathak

    tj
    t             j
    Prof. Raj     Maths
    Prof. Atul    Physics
    Prof. Pathak  Physics

    There is a problem with this decomposition: it is not independent.
    The FD {s, j} → t spans both relations, so it cannot be checked in st or tj alone.
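    The problem can be made concrete with a short Python sketch on the instances above. The tuple (Sarala, Prof. Pathak) is a hypothetical update chosen for illustration: it is acceptable for st on its own, whose only key is {s, t}, yet the join then breaks {s, j} → t.

```python
st = {
    ("Sarala", "Prof. Raj"), ("Sarala", "Prof. Atul"),
    ("Uma", "Prof. Raj"), ("Uma", "Prof. Pathak"),
}
tj = {
    ("Prof. Raj", "Maths"), ("Prof. Atul", "Physics"), ("Prof. Pathak", "Physics"),
}


def join(st_rows, tj_rows):
    """Natural join of st and tj on teacher t, giving sjt tuples (s, j, t)."""
    return {(s, j, t) for (s, t) in st_rows for (t2, j) in tj_rows if t == t2}


def violates_sj_determines_t(sjt_rows):
    """True if some (student, subject) pair is paired with two different teachers."""
    teacher_of = {}
    for s, j, t in sjt_rows:
        if teacher_of.setdefault((s, j), t) != t:
            return True
    return False


before = violates_sj_determines_t(join(st, tj))

# Hypothetical update: legal for st alone, since st has no FD of its own to break.
st_updated = st | {("Sarala", "Prof. Pathak")}
after = violates_sj_determines_t(join(st_updated, tj))

print(before, after)  # False True
```

    After the update Sarala is taught Physics by both Prof. Atul and Prof. Pathak, so the constraint can only be enforced by looking at st and tj together.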
  • 60.
    - So the main problem with the last decomposition is that the relations cannot be independently updated
    - When a relation cannot be decomposed into independent components, it is said to be atomic
    - So sometimes there may be a conflict between
      - Decomposing into BCNF components
      - Decomposing into independent components
    - Thus it may not always be possible to satisfy both of them at the same time