Lecture7-Keys_and_FD

The document outlines a lecture on functional dependencies and keys in database design, emphasizing the importance of minimizing redundancy and detecting anomalies. It provides definitions, examples, and exercises related to functional dependencies, keys, and super keys, illustrating how to determine them within relational schemas. The content is based on the 7th edition of a database system concepts book and includes references from Dr. Sudeepa Roy.


▪ We start @ 13:15

▪ Remember to join MS Teams – code: 6esizxc


▪ You can download the lecture from MS Teams for your convenience
Lecture 7: Keys and FD
Instructor: Krystian Wojtkiewicz
School of Computer Science and Engineering
International University, VNU-HCMC
ACKNOWLEDGEMENT
The following slides are based on the book Database System
Concepts, 7th Edition.

The following slides also reference material from
Dr. Sudeepa Roy, Duke University.
• Functional Dependencies
• Keys/ Super keys
• Attribute closure
• Minimal cover
uid   uname     gid
142   Bart      dps
123   Milhouse  gov
857   Lisa      abc
857   Lisa      gov
456   Ralph     abc
456   Ralph     gov
…     …         …

• Why is UserGroup (uid, uname, gid) a bad design?
  o It has redundancy: uname is recorded multiple times, once for each
    group that a user belongs to
    ✓ Leads to update, insertion, and deletion anomalies
• Wouldn’t it be nice to have a systematic approach to detecting and
  removing redundancy in designs?
  o Dependencies, decompositions, and normal forms
A functional dependency (FD) on a relation
R is a statement of the form X → Y, where X
and Y are sets of attributes of R.

“If two tuples of R agree on X, then they
must also agree on Y”.

We write this FD formally as X → Y and say
that X functionally determines Y.
An FD tells us about any two tuples t and u in the relation R.
If two tuples u and t have the same values on the left side, then
they also have the same values on the right side.
It’s common for the right side of an FD to be a single attribute.
A1, A2, …, An → B1, B2, …, Bm is equivalent to the set of FD’s
A1, A2, …, An → B1
A1, A2, …, An → B2
…
A1, A2, …, An → Bm
In a relation r, a set of attributes Y is functionally
dependent upon another set of attributes X (X→Y) iff…
for all pairs of tuples t1 and t2 in r…

if t1[X]=t2[X]…

it MUST be the case that t1[Y]=t2[Y]


A    B    C    D
a1   b1   c1   d1
a1   b1   c1   d2
a2   b2   c2   d1
a2   b2   c2   d2

Which of the following FDs hold in the current state of this relation?
A → D; A,B → C; B,C → D
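The definition can be checked mechanically against a relation instance. The sketch below is only an illustration: the function name fd_holds and the list-of-dicts representation of tuples are assumptions made here, not part of the lecture. It records, for each value of the left side, the first right-side value seen, and reports a violation as soon as a second tuple disagrees.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular set of rows."""
    seen = {}  # left-side values -> right-side values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the value the first time and returns the stored
        # value afterwards, so a mismatch means two tuples agree on lhs
        # but disagree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The four-tuple relation over A, B, C, D shown above.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
]

print(fd_holds(rows, ["A"], ["D"]))       # False
print(fd_holds(rows, ["A", "B"], ["C"]))  # True
print(fd_holds(rows, ["B", "C"], ["D"]))  # False
```

Note that a single instance can only refute an FD. An FD that happens to hold in the current state is not thereby guaranteed to hold for the schema in general.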
FD EXAMPLE (1)

      StudentID  Year       Class    Instructor
t1    1          Sophomore  COMP355  Wu
t2    2          Sophomore  COMP285  Wu
t3    3          Junior     COMP355  Wu
t4    3          Junior     COMP285  Wu
t5    2          Sophomore  COMP355  Russo
t6    4          Sophomore  COMP355  Russo

What FDs hold in the current state of this relation?

{StudentID} → {Year}
{StudentID, Class} → {Instructor}
FD EXAMPLE (2)

For the same relation as in FD EXAMPLE (1):

{StudentID} → {Year}
    Every student is classified as either a Freshman, Sophomore,
    Junior, or Senior.
{StudentID, Class} → {Instructor}
    Students can take only a single section of a class, taught by a
    single instructor.
Key(s): {StudentID, Class}
FD EXAMPLE (3)

For the same relation as in FD EXAMPLE (1), the following do not hold:

{StudentID} ↛ {Instructor}     {Class} ↛ {Year}
{StudentID} ↛ {Class}          {Class} ↛ {StudentID}
{Year} ↛ {StudentID}           {Class} ↛ {Instructor}
{Year} ↛ {Instructor}          {Instructor} ↛ {Class}
{Year} ↛ {Class}               {Instructor} ↛ {Year}
{Instructor} ↛ {StudentID}
Title               Year  Length  Genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone With the Wind  1939  231     Drama   MGM         Vivien Leigh
Wayne’s World       1992  95      Comedy  Paramount   Dana Carvey
Wayne’s World       1992  95      Comedy  Paramount   Mike Meyers

The relation
▪ Movies1 (title, year, length, genre, studioName, starName)
▪ Movies1 is not a good design because it holds information that
  belongs to three different relations: Movies, Studio, and StarsIn.
▪ We claim that the following FD holds in this schema:
  title, year → length, genre, studioName (right)
▪ On the other hand, we observe that the statement
  title, year → starName (wrong, not an FD)
  does not hold.

Ex: title, year, length → genre, studioName, starName


FD EXERCISE

Consider the following visual depiction
of the functional dependencies of a
relational schema.
[Figure: attributes A B C D E with arrows depicting the FDs]
1. List all FDs in algebraic notation
2. Identify all key(s) of this relation

Functional Dependencies    Keys
A → B                      DA
CD → E                     DB
BD → A
D → C
We say a set of one or more attributes K is a key for a relation R if:
▪ Those attributes functionally determine all other attributes
of the relation. That is, it’s impossible for two distinct tuples
of R to agree on all attributes of K.
▪ No proper subset of K functionally determines all other
attributes of R, i.e., a key must be minimal.
When a key consists of a single attribute A, we often say that A
(rather than {A}) is a key.
A set of attributes 𝐾 is a key for a relation 𝑅 if
• 𝐾 → all (other) attributes of 𝑅
• That is, 𝐾 is a “super key”
• No proper subset of 𝐾 satisfies the above condition
• That is, 𝐾 is minimal
Ex: Attributes {title, year, starName}
form a key for the relation Movies1.
▪ Suppose two tuples agree on
these three attributes: title, year,
and starName.
▪ Because they agree on title and
year, they must agree on the other
attributes length, genre, and
studioName.
▪ We must also argue that no proper subset of
{title, year, starName} functionally
determines all other attributes.
▪ Title and year do not
determine starName, because
many movies have more than one
star.
▪ Thus {title, year} is not a key.
Similarly, neither {year, starName} nor
{title, starName} is a key.
▪ Sometimes a relation has more than one key. If so, it’s
common to designate one of the keys as the primary key
(PK).
▪ In commercial database systems, the choice of PK can
influence some implementation issues such as how the
relation is stored on disk. However, the theory of FD’s gives no
special role to the PK.
• Source attributes (𝑆𝐴): those that appear only on the
left side of the functional dependencies (FDs), or that
are not part of any FD.
• Intermediate attributes (𝐼𝐴): those that appear on both
sides of the FDs.
• Target attributes (𝑇𝐴): those that appear only on the
right side of the FDs.
Step 1: Determine the source attributes (SA) and the intermediate
attributes (IA).
Step 2: If IA = ∅ then
    K = SA is the only key.
    Return K;
Step 3: Determine all subsets 𝑋𝑖 of IA.
Step 4: Determine the super keys 𝑆𝑖: for every 𝑋𝑖 ⊆ 𝐼𝐴,
    IF (𝑆𝐴 ∪ 𝑋𝑖)+ = 𝑅+ THEN
        𝑆𝑖 = 𝑆𝐴 ∪ 𝑋𝑖
    (here 𝑅+ is the set of all attributes of 𝑅)
Step 5: Return all minimal 𝑆𝑖.
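The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's own code: the names closure and find_keys are made up here, attributes are assumed to be single letters, and each FD is written as a pair of strings (left side, right side). The closure helper uses the attribute-closure idea covered later in the lecture.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_keys(schema, fds):
    schema = set(schema)
    left = set().union(*(set(lhs) for lhs, _ in fds))
    right = set().union(*(set(rhs) for _, rhs in fds))
    source = schema - right        # Step 1: never on a right side (SA)
    intermediate = left & right    # Step 1: on both sides (IA)
    superkeys = []
    # Steps 3 and 4: try SA together with every subset Xi of IA.
    # When IA is empty this tries only SA itself, which covers Step 2.
    for size in range(len(intermediate) + 1):
        for extra in combinations(sorted(intermediate), size):
            candidate = source | set(extra)
            if closure(candidate, fds) == schema:
                superkeys.append(candidate)
    # Step 5: keep the super keys that contain no smaller super key.
    minimal = [s for s in superkeys if not any(t < s for t in superkeys)]
    return sorted("".join(sorted(k)) for k in minimal)


# FDs from the FD EXERCISE slide: A -> B, CD -> E, BD -> A, D -> C
print(find_keys("ABCDE", [("A", "B"), ("CD", "E"), ("BD", "A"), ("D", "C")]))
# ['AD', 'BD']
```

The result agrees with the keys DA and DB listed for that exercise. Target attributes are never tried because they cannot be part of a key.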


1. Consider a relation with schema R(A,B,C) and FD’s F
={AB →C, C →A}
What are all the keys of R?

2. Consider a relation with schema R(A,B,C,D,E,G) and FD’s


F ={E →C, A →D, AB →E, DG →B}
What are all the keys of R?
▪ A set of attributes that contains a key is called a super key,
short for “superset of a key”.
▪ Thus, every key is a super key.

▪ Every super key satisfies the first condition of a key: it


functionally determines all other attributes of the relation. It
does not need to satisfy minimality.
▪ Ex: In the relation Movies1, there are many super keys. Not only is the
key {title, year, starName} a super key, but so is every superset of it,
such as
▪ {title, year, starName, length}
▪ {title, year, starName, studioName}


Suppose R is a relation with attributes A1, A2, …, An. As a function
of n, tell how many super keys R has, if:
1. The only key is A1
2. The only keys are A1 and A2
3. The only keys are {A1, A2} and {A3, A4}
4. The only keys are {A1, A2} and {A1, A3}
1. The only key is A1: 2^(n-1)
2. The only keys are A1 and A2: 3*2^(n-2)
3. The only keys are {A1, A2} and {A3, A4}: 7*2^(n-4)
4. The only keys are {A1, A2} and {A1, A3}: 3*2^(n-3)
The only keys are A1 and A2: 3*2^(n-2)
▪ The number of super keys that have A1 is 2^(n-1).
▪ The number of super keys that have A2 is 2^(n-1).
▪ If we add them, we get 2*2^(n-1).
▪ But we would have counted some of these
super keys twice.
▪ The ones we double-counted are exactly
those that have both A1 and A2.
▪ So, we have to subtract these super keys, of
which there are 2^(n-2) (since they have both A1
and A2 in them).
▪ So, the final answer is: 2*2^(n-1) - 2^(n-2) = 3*2^(n-2).
The only keys are {A1, A2} and {A3, A4}: 7*2^(n-4)
▪ The number of super keys that have {A1, A2} is 2^(n-2).
▪ The number of super keys that have {A3, A4} is 2^(n-2).
▪ If we add them, we get 2*2^(n-2).
▪ But we would have counted some of these super keys
twice.
▪ The ones we double-counted are exactly those that
have both {A1, A2} and {A3, A4}.
▪ So, we have to subtract these super keys, of which there
are 2^(n-4).
▪ So, the final answer is: 2*2^(n-2) - 2^(n-4) = 7*2^(n-4).
The only keys are {A1, A2} and {A1, A3}: 3*2^(n-3)
▪ The number of super keys that have {A1, A2} is 2^(n-2).
▪ The number of super keys that have {A1, A3} is 2^(n-2).
▪ If we add them, we get 2*2^(n-2).
▪ But we would have counted some of these
super keys twice.
▪ The ones we double-counted are exactly
those that have both {A1, A2} and {A1, A3}.
▪ So, we have to subtract these super keys, of
which there are 2^(n-3) (since they have A1, A2, and
A3 in them).
▪ So, the final answer is: 2*2^(n-2) - 2^(n-3) = 3*2^(n-3).
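The four counting results can be sanity-checked by brute force for small n. The sketch below is only a check, under the assumption that attributes A1..An are represented by the integers 1..n; the function name count_superkeys is made up for this illustration. It enumerates every subset of the attributes and counts those that contain at least one key.

```python
from itertools import combinations


def count_superkeys(n, keys):
    """Count subsets of {1..n} that contain at least one of the given keys."""
    attrs = range(1, n + 1)
    count = 0
    for size in range(n + 1):
        for subset in combinations(attrs, size):
            if any(set(key) <= set(subset) for key in keys):
                count += 1
    return count


n = 6
print(count_superkeys(n, [{1}]) == 2 ** (n - 1))                  # True
print(count_superkeys(n, [{1}, {2}]) == 3 * 2 ** (n - 2))         # True
print(count_superkeys(n, [{1, 2}, {3, 4}]) == 7 * 2 ** (n - 4))   # True
print(count_superkeys(n, [{1, 2}, {1, 3}]) == 3 * 2 ** (n - 3))   # True
```

The enumeration is exponential in n, so it is useful only for confirming the closed-form answers on small cases.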
Given a relation 𝑅 and a set of FD’s ℱ
• Does another FD follow from ℱ?
• Are some of the FD’s in ℱ redundant (i.e., they follow
from the others)?
• Is 𝐾 a key of 𝑅?
• What are all the keys of 𝑅?
RULES OF FD’S
Armstrong’s axioms
• Reflexivity: If 𝑌 ⊆ 𝑋, then 𝑋 → 𝑌
• Augmentation: If 𝑋 → 𝑌, then 𝑋𝑍 → 𝑌𝑍 for any 𝑍
• Transitivity: If 𝑋 → 𝑌 and 𝑌 → 𝑍, then 𝑋 → 𝑍
Rules derived from axioms
• Splitting: If 𝑋 → 𝑌𝑍, then 𝑋 → 𝑌 and 𝑋 → 𝑍
• Combining: If 𝑋 → 𝑌 and 𝑋 → 𝑍, then 𝑋 → 𝑌𝑍
Using these rules, you can prove or disprove an FD
given a set of FDs ℱ
The set of FD’s:
Title, year → length
Title, year → genre
Title, year → studioName

is equivalent to the single FD


Title, year → length, genre, studioName
Consider one of the FD’s such as:
Title, year → length
If we try to split the left side into
Title → length
Year → length
Then we get false FD’s. That is, title does not
functionally determine length, since there can
be several movies with the same title.
▪ Trivial FD’s are the FD’s X → Y such that Y ⊆ X.
That is, a trivial FD has a right side that is a
subset of its left side.
▪ For example:
Title, year → title or title → title, are trivial FD’s
An FD X → Y is equivalent to X → Z,
where Z is the set of attributes of Y that do not belong to X
(so Z ⊆ Y).

For example:
Title, year → title, genre
is equivalent to
Title, year → genre
Given 𝑅, a set of FD’s ℱ that hold in 𝑅, and a set of
attributes X in 𝑅:
The closure of X (denote X+) with respect to ℱ
is the set of all attributes {𝐴1,𝐴2,…} functionally
determined by X (that is, X → 𝐴1𝐴2 …)
Input: A set of attributes X and a set of FD’s ℱ
Output: The closure X+
▪ Start with closure = X
▪ If Z → Y is in ℱ and Z is already in the closure, then also add Y to
the closure
▪ Repeat until no new attributes can be added.
Input: R, ℱ, X ⊆ R+
Output: X+
Step 1: Set X+ = X
Step 2: temp = X+
    for each f: Z → Y ∈ ℱ
        if (Z ⊆ X+)
            X+ = X+ ∪ Y
            ℱ = ℱ – f
Step 3: if (X+ = temp)
    “X+ is the result”, stop
else
    go back to Step 2
Let us consider a relation R(A,B,C,D,E,G) and FD’s
ℱ ={AB → C, BC → AD, D → E, CG → B} .
What is {A,B}+?
X ={A,B}
Next, X = {A,B,C} based on AB → C
X = {A,B,C,D} based on BC → AD
X = {A,B,C,D,E} based on D → E and no more changes to X are
possible.
Thus, {A,B}+ = {A,B,C,D,E}
▪ Let us consider a relation R(A,B,C,D,E,G) and FD’s
ℱ = {AB → C, BC → AD, D → E, CG → B} .
Suppose we wish to test whether AB → D follows from these
FD’s.
▪ We compute {A,B}+ = {A,B,C,D,E}. Since D is a member of
the closure, we conclude that AB → D does follow.
▪ However, D → A does not follow, because {D}+ = {D,E} does not
contain A.
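The closure algorithm and the implication test can be tried out with a short Python sketch. It is an illustration under stated assumptions rather than a reference implementation: the function name closure is chosen here, attributes are single letters, and each FD is a pair of strings (left side, right side). The loop follows the three steps of the algorithm, including the removal of FDs that have already been applied.

```python
def closure(attrs, fds):
    """Closure of attrs with respect to fds."""
    result = set(attrs)              # Step 1: X+ = X
    remaining = list(fds)
    while True:
        before = set(result)         # Step 2: temp = X+
        for fd in list(remaining):
            lhs, rhs = fd
            if set(lhs) <= result:   # the left side is already in X+
                result |= set(rhs)
                remaining.remove(fd) # an applied FD need not be revisited
        if result == before:         # Step 3: nothing new was added
            return result


# FD set of the worked example: AB -> C, BC -> AD, D -> E, CG -> B
fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CG", "B")]

print("".join(sorted(closure("AB", fds))))  # ABCDE
print("D" in closure("AB", fds))            # True:  AB -> D follows
print("A" in closure("D", fds))             # False: D -> A does not follow
```

An FD X → Y follows from the given set exactly when every attribute of Y appears in the closure of X, which is what the last two lines test.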
1. Let us consider a relation R(A,B,C,D,E,G) and FD’s
ℱ ={AB → D, AC → BD, D → G, CG →A} .
What is {A,C}+, {B,D}+?

2. Let us consider a relation R(A,B,C,D,E,G) and FD’s


ℱ ={AB → C, BC → D, D → EG, BE →C}
Suppose we wish to test whether AB → EG follows from these
FD’s.
If we are given a set of FD’s F, then any set
of FD’s equivalent to F is said to be a basis
for F. A minimal basis (minimal cover) for a
relation is a basis B that satisfies three
conditions.
1. If for any FD in B we remove one or
more attributes from the left side of
the FD, the result is no longer a basis.
2. All the FDs in B have singleton right
sides.
3. If any FD is removed from B, the
result is no longer a basis.
Consider a relation R(A,B,C) and FD’s
F= {A→ B,B →A, B →C, C →A, C →B, AB
→C, AC →B, BC →A}.

Relation R and its FD’s have several
minimal covers, such as:
{A→ B, B →A, B →C, C →B} and
{A→ B, B →C, C →A}
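One way to compute a minimal cover is to enforce the three conditions in turn. The sketch below is an assumption-laden illustration, not the lecture's procedure: the names closure and minimal_cover are chosen here, attributes are single letters, and FDs are pairs of strings. Because a set of FDs can have several minimal covers, the result depends on the order in which FDs and attributes are examined.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Condition 2: split every FD so that right sides are singletons.
    cover = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in cover:
                cover.append(fd)
    # Condition 1: drop left-side attributes that are not needed.
    reduced = []
    for lhs, a in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs.remove(b)
        fd = (frozenset(lhs), a)
        if fd not in reduced:
            reduced.append(fd)
    # Condition 3: drop every FD that follows from the remaining ones.
    result = list(reduced)
    for fd in reduced:
        others = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], others):
            result = others
    return [("".join(sorted(lhs)), a) for lhs, a in result]


# The eight FDs of the example above.
F = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"),
     ("C", "B"), ("AB", "C"), ("AC", "B"), ("BC", "A")]
print(minimal_cover(F))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

With this processing order the sketch arrives at the second minimal cover listed above. A different order could produce the first one instead.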
Given R(A,B,C,D) and F = {A → BC, B → C, AB → D}.

Find minimal cover of F (set of functional dependencies).


Given R(A,B,C,D) and F = {AB → CD, B → C, C → D}.

Find minimal cover of F (set of functional dependencies).


Suppose R(A,B,C,D) has FD’s F = {A → B, B → C, C → D}. Suppose also that
we wish to project out the attribute B, leaving a relation R1(A,C,D).

Find F1, the set of FD’s that hold in R1.
Input: Two relations R and R1, where R1 is computed by the projection
R1 = π_L(R). Also, a set of FD’s F that hold in R.
Output: The set of FD’s that hold in R1.
1. Let F1 be the eventual output set of FD’s. Initially, F1 is
empty.
2. For each set of attributes X that is a subset of the
attributes of R1, compute X+. This computation is
performed with respect to the set of FD’s F, and may
involve attributes that are in the schema of R but not R1.
Add to F1 all nontrivial FD’s X → A such that A is both in X+
and an attribute of R1.
3. Now, F1 is a basis for the FD’s that hold in R1 but may not
be a minimal cover. We may construct a minimal basis by
modifying F1 as follows:
a. If there is an FD f1 in F1 that follows from the other FD’s
in F1, remove f1 from F1.
b. Let Y → B be an FD in F1, with at least two attributes in
Y, and let Z be Y with one of its attributes removed. If Z
→ B follows from the FD’s in F1 (including Y → B), then
replace Y → B by Z → B.
c. Repeat the above steps in all possible ways until no
more changes to F1 can be made.
Suppose R(A,B,C,D) has FD’s F = {A → B, B → C, C → D}. Suppose also that we wish to
project out the attribute B, leaving a relation R1(A,C,D). Find F1.

Answer:

In principle, to find the FD’s for R1, we need to take the closure of all eight
subsets of {A,C,D}, using the full set of FD’s, including those involving B.
Subsets of {A,C,D}: ∅, A, C, D, AC, AD, CD, ACD.
Closures of all subsets: {}+ = ∅, {A}+ = {A,B,C,D}, {C}+ = {C,D},
{D}+ = {D}, {AC}+ = {A,B,C,D}, {AD}+ = {A,B,C,D}, {CD}+ = {C,D},
{ACD}+ = {A,B,C,D}.
Keeping only the nontrivial FD’s over the attributes of R1 with minimal
left sides gives
F1+ = π_R1(F) = {A → C, A → D, C → D}. We can observe that A → D
follows from the other two by transitivity. Therefore, a minimal
cover for the FD’s of R1 is
F1 = {A → C, C → D}.
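The projection procedure can be sketched as follows. This is an illustrative simplification under stated assumptions, not a definitive implementation: the names closure and project_fds are made up here, attributes are single letters, FDs are pairs of strings, and the empty subset is skipped because its closure is empty for FD sets like the ones in this lecture. Instead of generating every FD and shrinking left sides afterwards, the sketch visits subsets by increasing size and records X → A only when no proper subset of X already determines A.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, kept):
    """A minimal basis for the FDs that hold on the kept attributes."""
    kept = set(kept)
    found = []
    # Step 2: take the closure of each nonempty subset X of the kept
    # attributes with respect to the full FD set.
    for size in range(1, len(kept) + 1):
        for combo in combinations(sorted(kept), size):
            x = frozenset(combo)
            for a in sorted((closure(x, fds) & kept) - x):
                # Skip X -> A when a proper subset of X already gives A.
                if not any(y < x and b == a for y, b in found):
                    found.append((x, a))
    # Step 3a: remove FDs that follow from the others.
    basis = list(found)
    for fd in found:
        others = [g for g in basis if g != fd]
        if fd[1] in closure(fd[0], others):
            basis = others
    return [("".join(sorted(x)), a) for x, a in basis]


F = [("A", "B"), ("B", "C"), ("C", "D")]
print(project_fds(F, "ACD"))  # [('A', 'C'), ('C', 'D')]
```

The output matches the minimal cover F1 = {A → C, C → D} derived in the answer above. The number of subsets grows exponentially with the number of kept attributes, so this approach suits only small schemas.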
EXERCISES
1. Consider a relation with schema R(A,B,C,D) and FD’s
F = {AB → C, C → D, D → A}
a. What are all the nontrivial FD’s that follow from the
given FD’s? You should restrict yourself to FD’s with a
single attribute on the right side.
b. What are all the keys of R?
c. What are all the super keys for R that are not keys?
2. Consider a relation with schema R(A,B,C,D)
and FD’s
F = {AB → C, BC → D, CD → A, AD → B}
a. What are all the nontrivial FD’s that follow from
the given FD’s? You should restrict yourself to
FD’s with a single attribute on the right side.
b. What are all the keys of R?
c. What are all the super keys for R that are not
keys?
3. Suppose we have relation R(A,B,C,D,E), with
some set of FD’s, F = {AB → DE, C → E, D → C, E → A},
and we wish to project those FD’s onto
relation S(A,B,C). Find F1.
4. Suppose we have relation R(A,B,C,D,E), with
some set of FD’s, F = {AB → D, AC → E, BC → D, D → A,
E → B}, and we wish to project those FD’s
onto relation S(A,B,C). Find F1.
Thank you for your attention!
