DB Systems - Chapter 5 - Relational Database Design-New


Faculty of Computer Science and Engineering

Ho Chi Minh City University of Technology

Chapter 5:
Relational Database Design

Database Systems
(CO2013)
Computer Science Program
Assoc. Prof. Dr. Võ Thị Ngọc Châu
([email protected])
Semester 1 – 2022-2023
Content
Chapter 1: An Overview of Database Systems

Chapter 2: The Entity-Relationship Model

Chapter 3: The Relational Data Model

Chapter 4: The SQL Language

Chapter 5: Relational Database Design

Chapter 6: Physical Storage and Data Management

Chapter 7: Database Security

Chapter 5:
Relational Database Design
5.1. Design Guidelines for Relation Schemas
5.2. Functional Dependencies
5.3. Normal Forms Based on Primary Keys
5.4. Boyce-Codd Normal Form
5.5. Properties of Relational Decompositions
5.6. Algorithms for Relational Database Schema Design
Main References
Text:
[1] R. Elmasri, S. R. Navathe, Fundamentals of Database Systems, 6th Edition, Pearson/Addison-Wesley, 2011.
    R. Elmasri, S. R. Navathe, Fundamentals of Database Systems, 7th Edition, Pearson, 2016.
References:
[2] S. Chittayasothorn, Relational Database Systems: Language, Conceptual Modeling and Design for Engineers, Nutcha Printing Co. Ltd, 2017.
[3] A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, 7th Edition, McGraw-Hill, 2020.
[4] H. G. Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, 2nd Edition, Prentice-Hall, 2009.
[5] R. Ramakrishnan, J. Gehrke, Database Management Systems, 4th Edition, McGraw-Hill, 2018.
[6] M. P. Papazoglou, S. Spaccapietra, Z. Tari, Advances in Object-Oriented Data Modeling, MIT Press, 2000.
[7] G. Simsion, Data Modeling: Theory and Practice, Technics Publications, LLC, 2007.
Database design
[Figure: Phases of database design and implementation for large databases. DDL: data definition language; SDL: storage definition language. Source: [1]]
Relational database design
Two main database design approaches
A top-down design methodology (design by analysis)
Start with a number of groupings of attributes into relations
that exist together naturally (e.g. invoice, report, and form)
Analyze the relations individually and collectively, leading to
further decomposition until all desirable properties are met

A bottom-up design methodology (design by synthesis)
Consider the basic relationships among individual attributes
as the starting point to construct relation schemas
Is not very popular in practice because of the large number
of binary relationships between the attributes
5.1. Design Guidelines for
Relation Schemas
Informal measures of quality for relation schema
design
Semantics of the attributes
How to interpret the attribute values stored in a tuple of the
relation – how the attribute values in a tuple relate to one another

Reducing the redundant values in tuples
Storage space & update anomalies
Reducing the null values in tuples
Multiple interpretations of nulls
Disallowing the possibility of generating spurious tuples
Join relations with equality conditions on attributes that are either primary
keys or foreign keys in a way that guarantees that no spurious tuples are
generated
Semantics of the relation attributes
Semantics specifies how to interpret the attribute values stored
in a tuple of the relation, i.e. how the attribute values in a tuple
relate to one another.
It is assumed that attributes belonging to one relation have
certain real-world meaning and a proper interpretation
associated with them.
If the conceptual design is done carefully, followed by a
systematic mapping into relations, most of the semantics will
have been accounted for and the resulting design should have a
clear meaning.
Design a relation schema so that it is easy to explain its
meaning and avoid semantic ambiguities.
Do NOT combine attributes from multiple entity types and
relationship types into a single relation.
Semantics of the relation attributes

[Figure: example schemas labeled "Better Relations" and "Worse Relations". In the worse relations, each tuple captures the details of two real-world entities: employee, department.]
Reducing the redundant values
in tuples
Repeated data in one relation
More storage space
Sometimes accepted to reduce the execution time of certain queries
The anomalies
Cause more execution time for insertion, deletion, modification
Design the base relation schemas so that NO insertion, deletion,
or modification anomalies are present in the relations.
If any anomalies are present, note them clearly and ensure that the
programs that update the database will operate correctly.
It is advisable to use anomaly-free base relations and to specify
views that include the joins for placing together the attributes
frequently referenced in important queries.
Reducing the redundant values
in tuples

[Figure: "Better Relations" and "Worse Relations" examples. Where are the redundant values in the worse relations?]
Reducing the redundant values
in tuples
Where are redundant values? Worse Relations
Example tuples to insert:
John, Carter | 246813579 | 1970-03-01 | 123 Str, Spring, TX | 5
10 | Sales | 987987987
Redundancy
Insertion anomalies:
- Inserting a new employee into an existing department with dnumber=5: the details
of department 5 must be entered again exactly as they appear in the other
existing tuples. Otherwise, DATA INCONSISTENCY occurs.
- Inserting a new department with dnumber=10 that does not yet have any
employee: IMPOSSIBLE, because Ssn is the primary key and cannot be null.
Reducing the redundant values
in tuples
Where are redundant values? Worse Relations
Redundancy
Deletion anomalies:
- Deleting the last remaining employee of a department also deletes the
details of that department, so the department's data is lost.
- Deleting the tuple with Ssn = 888665555 → deletes the data of department 1
→ department 1's data is lost.


Reducing the redundant values
in tuples
Where are redundant values? Worse Relations
Redundancy
Modification anomalies:
- Changing a redundant value requires updating more than one tuple to keep the data
consistent.
- Changing Dname from “Research” to “Research & Development” → 4 tuples must be updated
→ data inconsistency if any of them is not updated.
Reducing the null values in tuples
Nulls can have multiple interpretations:
The attribute does not apply to this tuple.
The attribute value for this tuple is unknown.
The value is known but absent as it has not been recorded yet.
The problem of nulls
Waste storage space
Problems with understanding the meaning of the attributes and with
specifying JOIN operations at the logical level.
How to account for them when aggregate operations such as COUNT
or SUM are applied
Avoid placing attributes in a base relation whose values may
frequently be null. If nulls are unavoidable, make sure that they
apply in exceptional cases only and do not apply to a majority of
tuples in the relation.
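The effect of nulls on aggregate operations can be seen in a small experiment. This is only a sketch using Python's built-in sqlite3 module; the employee table and its three rows are hypothetical and are not part of the course database.

```python
import sqlite3

# Hypothetical table: one employee has an unknown (NULL) salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("111", 30000), ("222", None), ("333", 50000)],
)

# COUNT(*) counts rows; COUNT(salary), SUM and AVG skip the NULL salary.
count_rows, count_salary, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM employee"
).fetchone()
print(count_rows, count_salary, total, average)  # 3 2 80000 40000.0
conn.close()
```

The two counts disagree, and the average is taken over two employees instead of three, which is why frequently null attributes make aggregate results hard to interpret.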
Disallowing the possibility of
generating spurious tuples
Spurious tuples
NOT valid: they do NOT exist in the mini-world of the database.
Generated when relations from an improper design are joined with
each other, resulting in undesirable information.
Design relation schemas so that they can be joined with equality
conditions on attributes that are either primary keys or foreign
keys in a way that guarantees that no spurious tuples are
generated.
Avoid relations that contain matching attributes that are not
(foreign key, primary key) combinations, because joining on
such attributes may produce spurious tuples.
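A minimal sketch of how spurious tuples arise. The relation contents and the helper natural_join are hypothetical illustrations, not the lecture's example data: a relation is split into two parts that share only plocation, which is neither a primary key nor a foreign key, and is then joined back.

```python
def natural_join(r, s):
    """Join two lists of dict-tuples on all attribute names they share."""
    joined = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})
    return joined


# Hypothetical original relation: which employee works on which project.
works_on = [
    {"ssn": "111", "pnumber": 1, "plocation": "Houston"},
    {"ssn": "222", "pnumber": 2, "plocation": "Houston"},
]

# Improper decomposition: the parts share only plocation (not a key).
emp_locs = [{"ssn": t["ssn"], "plocation": t["plocation"]} for t in works_on]
proj_locs = [{"pnumber": t["pnumber"], "plocation": t["plocation"]} for t in works_on]

rejoined = natural_join(emp_locs, proj_locs)
spurious = [t for t in rejoined if t not in works_on]
print(len(works_on), len(rejoined), len(spurious))  # 2 4 2
```

The join returns four tuples although only two are valid; the other two pair each employee with a project they do not work on.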
Disallowing the possibility of
generating spurious tuples
[Figure: two designs. Proper design: PROJECT and WORKS_ON (the project list; employees work on projects). Improper design: PROJECT and EMPLOYEE_PLOCATION (the project list; employees work at the locations of their projects).]


Disallowing the possibility of
generating spurious tuples
[Figure: join results. PROJECT ⋈ (pnumber=pno) WORKS_ON: correct results. PROJECT ⋈ (plocation=plocation) EMPLOYEE_PLOCATION: incorrect results with SPURIOUS tuples.]
Summary of design guidelines

Good database design by minimizing the
number of problematic relation schemas
Mixtures in one single relation
Redundant data in a relation
Null values in a relation
Improperly related relations for joins
Summary of design guidelines
The problems with problematic relation schemas
Anomalies that cause redundant work to be done during
insertion into and modification of a relation, and that may
cause accidental loss of information during a deletion from
a relation
Waste of storage space due to nulls and the difficulty of
performing aggregation operations (e.g. count, sum, max,
min, average) and joins due to null values
Generation of invalid and spurious data during joins on
improperly related base relations
5.2. Functional Dependencies
Functional dependency (FD)
The single most important concept in relational schema
design theory
A constraint between two sets of attributes from the database
A property of the semantics of the attributes
An FD cannot be inferred automatically from a given relation extension r
but must be defined explicitly by someone who knows the semantics of
the attributes of a relation schema R.
The database designers will use their understanding of the semantics of
the attributes of a relation schema R - that is, how they relate to one
another - to specify the functional dependencies that should hold on all
relation states (extensions) r of R.
For example, the functional dependency Pnumber → Plocation
The value of a project's number (Pnumber) uniquely (functionally)
determines the project's location (Plocation).
A project's location (Plocation) is functionally dependent on the value of
the project's number (Pnumber).
There is a functional dependency from Pnumber to Plocation.
5.2. Functional Dependencies
Functional dependency (FD)
A constraint between two sets of attributes from the database
A constraint on the possible tuples that can form a relation state
r of a relation schema R = {A1, A2, …, An}
Denoted by X → Y
X, Y: two sets of attributes that are subsets of R
For any two tuples t1 and t2 in r that have t1[X] = t2[X], they
must also have t1[Y] = t2[Y].
The values of Y of a tuple in r depend on (are determined by)
the values of X.
The values of X of a tuple uniquely (functionally) determine
the values of Y.
If X → Y in R, this does not say whether or not Y → X in R.
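The definition can be turned into a check on one relation state. This is only a sketch: fd_holds and the sample rows are hypothetical. A True result says that the given state does not violate X → Y; it does not prove that the FD holds on the schema, which depends on the semantics of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y-value seen for each X-value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical relation state.
rows = [
    {"Teacher": "Smith", "Course": "Data Structures", "Text": "Bartram"},
    {"Teacher": "Smith", "Course": "Data Management", "Text": "Martin"},
    {"Teacher": "Hall", "Course": "Compilers", "Text": "Hoffman"},
    {"Teacher": "Brown", "Course": "Data Structures", "Text": "Horowitz"},
]

print(fd_holds(rows, ["Text"], ["Course"]))     # True: no violation in this state
print(fd_holds(rows, ["Teacher"], ["Course"]))  # False: Smith teaches two courses
```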
5.2. Functional Dependencies

[Figure: example relation state with tuples t2 and t8 highlighted.]
A functional dependency: Instructor → Course
t2[Instructor] = t8[Instructor] ⇒ t2[Course] = t8[Course]
5.2. Functional Dependencies

[Figure: example relation state with tuples t1 and t2 highlighted.]
A functional dependency: Text → Course
But Teacher → Course is NOT a functional dependency:
t1[Teacher] = t2[Teacher], but t1[Course] ≠ t2[Course]
5.2. Functional Dependencies
Inference Rules for Functional Dependencies
An FD X → Y is inferred from a set of dependencies F specified
on R if X → Y holds in every legal relation state r of R; that is,
whenever r satisfies all the dependencies in F, X → Y also
holds in r.
F ╞ X → Y denotes that the functional dependency X → Y is
inferred from the set of functional dependencies F.
To infer dependencies systematically, a set of
inference rules can be used to derive new dependencies from a
given set of dependencies.
5.2. Functional Dependencies
Inference Rules for Functional Dependencies
IR1 (Reflexive rule): if X ⊇ Y, then X → Y
IR2 (Augmentation rule): {X → Y} ╞ XZ → YZ
IR3 (Transitive rule): {X → Y, Y → Z} ╞ X → Z
IR4 (Decomposition, or projective, rule): {X → YZ} ╞ X → Y, X → Z
IR5 (Union, or additive, rule): {X → Y, X → Z} ╞ X → YZ
IR6 (Pseudotransitive rule): {X → Y, WY → Z} ╞ WX → Z
5.2. Functional Dependencies
Inference Rules for Functional Dependencies
IR1 through IR3 are Armstrong's inference rules (axioms, given facts). They are sound and complete.
Sound: Given a set of functional dependencies F
specified on a relation schema R, any dependency that
can be inferred from F by using IR1 through IR3 holds
in every relation state r of R that satisfies the
dependencies in F.
Complete: Using IR1 through IR3 repeatedly to infer
dependencies until no more dependencies can be
inferred results in the complete set of all possible
dependencies that can be inferred from F, called the
closure of F.
5.2. Functional Dependencies
The closure of F, the set of functional
dependencies specified on relation schema R
F+ = the set containing F together with all
dependencies that can be inferred from F
For example: Find the closure of the following F by
using the aforementioned inference rules.
F = { SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN} }
F+ = { SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN},
SSN → SSN, ENAME → ENAME, BDATE → BDATE, ADDRESS → ADDRESS,
DNUMBER → DNUMBER, DNAME → DNAME, DMGRSSN → DMGRSSN,
SSN → ENAME, SSN → BDATE, SSN → ADDRESS, SSN → DNUMBER,
DNUMBER → DNAME, DNUMBER → DMGRSSN,
SSN → DNAME, SSN → DMGRSSN, …
}
5.2. Functional Dependencies
The closure of X under F, where X is a set
of attributes of relation schema R
X = a set of attributes that appears as the left-hand
side of some functional dependency in F
X+ = the set of all attributes that are dependent
on X based on F
For example: Find the closure of {SSN} under the following F
by using the aforesaid inference rules.
F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}
{ SSN }+ = {SSN, ENAME, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN}
5.2. Functional Dependencies
An algorithm to determine X+ under F:

F = { SSN → ENAME, PNUMBER → {PNAME, PLOCATION}, {SSN, PNUMBER} → HOURS }

{ SSN } + = { SSN, ENAME }


{ PNUMBER } + = { PNUMBER, PNAME, PLOCATION }
{ SSN, PNUMBER } + = { SSN, ENAME, PNUMBER, PNAME, PLOCATION, HOURS }
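A minimal sketch of the usual fixed-point computation of X+, offered as an illustration and not necessarily the exact listing used in the lecture. The function name closure and the representation of each FD as a (left set, right set) pair are assumptions made for this example.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the result, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, F)))      # ['ENAME', 'SSN']
print(sorted(closure({"PNUMBER"}, F)))  # ['PLOCATION', 'PNAME', 'PNUMBER']
print(sorted(closure({"SSN", "PNUMBER"}, F)))
```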
5.2. Functional Dependencies
Closure X+ of a set of attributes X under F
where X is a set of attributes that appears as a
left-hand side of a functional dependency in F
F = { AB → C, A → DE, B → M, M → GH, D → IJ }

{A, B}+ = ???

{A}+ = ???

{B}+ = ???

{M}+ = ???

{D}+ = ???
5.2. Functional Dependencies
Closure X+ of a set of attributes X under F
where X is a set of attributes that appears as a
left-hand side of a functional dependency in F
F = { AB → C, A → DE, B → M, M → GH, D → IJ }

{A, B}+ = {A, B, C, D, E, M, G, H, I, J}

{A}+ = {A, D, E, I, J}

{B}+ = {B, M, G, H}

{M}+ = {M, G, H}

{D}+ = {D, I, J}
5.2. Functional Dependencies
Superkey of a relation schema R: a subset of
attributes with the property that no two tuples in any relation
state r of R should have the same combination of values for
these attributes.
One default superkey = the set of all its attributes
Key of a relation schema R (K): a superkey of R with the
additional property that removing any attribute A from K leaves
a set of attributes K' that is not a superkey of R.
A key is also called a minimal superkey.
Candidate key: any key of R.
Primary key: the candidate key of R that is chosen
for implementation.
Secondary key: the remaining
candidate keys of R, declared with NOT NULL and UNIQUE constraints.
Prime attribute: a member (attribute) of
some candidate key of R.
5.2. Functional Dependencies
An algorithm to determine one key K of relation schema R:
F = { SSN → ENAME, PNUMBER → {PNAME, PLOCATION}, {SSN, PNUMBER} → HOURS }
K = { SSN, ENAME, PNUMBER, PNAME, PLOCATION, HOURS }
(K – {SSN})+ = { ENAME, PNUMBER, PNAME, PLOCATION, HOURS } ≠ R ⇒ K is unchanged
(K – {ENAME})+ = { SSN, PNUMBER, PNAME, PLOCATION, HOURS, ENAME } = R ⇒ K = K – {ENAME}
(K – {PNUMBER})+ = { SSN, PNAME, PLOCATION, HOURS, ENAME } ≠ R ⇒ K is unchanged
(K – {PNAME})+ = { SSN, PNUMBER, PLOCATION, HOURS, ENAME, PNAME } = R ⇒ K = K – {PNAME}
(K – {PLOCATION})+ = { SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION } = R ⇒ K = K – {PLOCATION}
(K – {HOURS})+ = { SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS } = R ⇒ K = K – {HOURS}
Key K = {SSN, PNUMBER}
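The trace above can be sketched in code: start from all attributes and drop each attribute whose removal still leaves a superkey. The names closure and find_key and the (left set, right set) FD representation are assumptions for this example. The result depends on the order in which attributes are tried, so only one key is returned.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(attributes, fds):
    """Return one key of R by removing attributes one at a time, in list order."""
    key = list(attributes)
    for a in attributes:
        candidate = [b for b in key if b != a]
        # Drop a only if the remaining attributes still determine all of R.
        if closure(candidate, fds) == set(attributes):
            key = candidate
    return set(key)


R = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(sorted(find_key(R, F)))  # ['PNUMBER', 'SSN']
```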
5.2. Functional Dependencies

Identifying a key K of R based on a set of
given functional dependencies F
R = (A, B, C, D, E, G)
F = { C → E, A → EG, E → AD }
K = ???
5.2. Functional Dependencies
Identifying a key K of R based on a set of
given functional dependencies F
R = (A, B, C, D, E, G)
F = { C → E, A → EG, E → AD }
K = {A, B, C, D, E, G}
(K-{A})+ = {B, C, D, E, G, A} = R ⇒ K = {B, C, D, E, G}
(K-{B})+ = {C, D, E, G, A} ≠ R ⇒ K = {B, C, D, E, G}
(K-{C})+ = {B, D, E, G, A} ≠ R ⇒ K = {B, C, D, E, G}
(K-{D})+ = {B, C, E, G, A, D} = R ⇒ K = {B, C, E, G}
(K-{E})+ = {B, C, G, E, A, D} = R ⇒ K = {B, C, G}
(K-{G})+ = {B, C, E, A, D, G} = R ⇒ K = {B, C}
5.2. Functional Dependencies

Identifying a key K of R based on a set of
given functional dependencies F
R = (A, B, C, E, G)
F = { A → B, BC → E, EG → A }
K = ???
5.2. Functional Dependencies
Identifying a key K of R based on a set of
given functional dependencies F
R = (A, B, C, E, G)
F = { A → B, BC → E, EG → A }
K = {A, B, C, E, G}
(K-{A})+ = {B, C, E, G, A} = R ⇒ K = {B, C, E, G}
(K-{B})+ = {C, E, G, A, B} = R ⇒ K = {C, E, G}
(K-{C})+ = {E, G, A, B} ≠ R ⇒ K = {C, E, G}
(K-{E})+ = {C, G} ≠ R ⇒ K = {C, E, G}
(K-{G})+ = {C, E} ≠ R ⇒ K = {C, E, G}
5.2. Functional Dependencies
Identifying all keys K of R based on a set of given
functional dependencies F
Find N = U − ∪{right(f) : f ∈ F}
N is the set of attributes that are not on the right-hand side of any functional
dependency in F. U is the set of all the attributes of R.
Find N+, the closure of N under F
If N+ = U then there is only one key K = N.
Otherwise, find D = ∪{right(f) : f ∈ F} − ∪{left(f) : f ∈ F}
D is the set of attributes that appear only on the right-hand side of the functional
dependencies in F.
Find L = U − (N+ ∪ D)
L is the set of attributes that are neither in D nor functionally
determined by N.
Let Li be any non-empty subset of L. Find (N ∪ Li)+ under F.
If (N ∪ Li)+ = U then K = N ∪ Li.
∀ Ki, Kj such that Ki ⊂ Kj, remove Kj.
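The steps above can be sketched as follows. The names closure and all_keys and the (left set, right set) FD representation are assumptions for this example. Subsets Li are tried in order of increasing size, so a candidate that contains an already found key can be skipped instead of being removed afterwards.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def all_keys(U, fds):
    """Return all keys of a relation with attribute set U under fds."""
    U = set(U)
    rights = set().union(*(rhs for _, rhs in fds))
    lefts = set().union(*(lhs for lhs, _ in fds))
    N = U - rights                  # attributes never on a right-hand side
    N_plus = closure(N, fds)
    if N_plus == U:
        return [N]                  # N alone is the only key
    D = rights - lefts              # attributes only on right-hand sides
    L = sorted(U - (N_plus | D))
    keys = []
    for size in range(1, len(L) + 1):
        for Li in combinations(L, size):
            candidate = N | set(Li)
            if any(k <= candidate for k in keys):
                continue            # contains a smaller key, so not minimal
            if closure(candidate, fds) == U:
                keys.append(candidate)
    return keys


F = [(set("A"), set("B")), (set("BC"), set("E")), (set("EG"), set("A"))]
for key in all_keys("ABCEG", F):
    print(sorted(key))
```

For this F the sketch reports three keys, each consisting of C and G plus one of A, B, or E.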
5.2. Functional Dependencies
Identifying all keys K of R based on a set
of given functional dependencies F
R = (A, B, C, E, G)
F = { A → B, BC → E, EG → A }
All keys K ???
5.2. Functional Dependencies
R = (A, B, C, E, G)
F = { A → B, BC → E, EG → A }
N = R − ∪{right(f) : f ∈ F}
N = {A, B, C, E, G} – {A, B, E} = {C, G}
N+ = {C, G}
D = ∪{right(f) : f ∈ F} − ∪{left(f) : f ∈ F}
D = {A, B, E} – {A, B, C, E, G} = ∅
L = R – (N+ ∪ D) = {A, B, E}
Subsets of L: L1 = {A}, L2 = {B}, L3 = {E}, L4 = {A, B}, L5
= {A, E}, L6 = {B, E}, L7 = {A, B, E}
(N ∪ L1)+ = {C, G, A, B, E} = R ⇒ K1 = {C, G, A}
(N ∪ L2)+ = {C, G, B, E, A} = R ⇒ K2 = {C, G, B}
(N ∪ L3)+ = {C, G, E, A, B} = R ⇒ K3 = {C, G, E}
N ∪ L4 through N ∪ L7 contain K1, K2, or K3 and are therefore not keys.
5.2. Functional Dependencies

Identifying all keys K of R based on a set of
given functional dependencies F
R = (A, B, C, D, E, G)
F = { C → E, A → EG, E → AD }
K = ???
5.2. Functional Dependencies
Identifying all keys K of R based on a set
of given functional dependencies F
R = (A, B, C, D, E, G)
F = { C → E, A → EG, E → AD }
N = R − ∪{right(f) : f ∈ F}
N = {A, B, C, D, E, G} – {A, D, E, G} = {B, C}
N+ = {B, C, E, A, D, G} = U
Only one key K = N = {B, C}
5.2. Functional Dependencies

Identifying all keys K of R based on a set
of given functional dependencies F
R = (A, B, C, E, G)
F = { A → BC, CG → E, B → G, E → A }
All keys K ???
5.2. Functional Dependencies
Identifying all keys K of R based on a set of given functional dependencies F
R = (A, B, C, E, G)
F = { A → BC, CG → E, B → G, E → A }
All keys K ???
N = R − ∪{right(f) : f ∈ F}
N = {A, B, C, E, G} – {B, C, E, G, A} = ∅
D = ∪{right(f) : f ∈ F} − ∪{left(f) : f ∈ F}
D = {B, C, E, G, A} – {A, C, G, B, E} = ∅
L = U − (N+ ∪ D) = {A, B, C, E, G}
Subsets of L: L1 = {A}, L2 = {B}, L3 = {C}, L4 = {E}, L5 = {G}, L6 = {A, B}, L7 = {A, C},
L8 = {A, E}, L9 = {A, G}, L10 = {B, C}, L11 = {B, E}, L12 = {B, G}, L13 = {C, E}, L14 = {C, G},
L15 = {E, G}, L16 = {A, B, C}, L17 = {A, B, E}, L18 = {A, B, G}, L19 = {A, C, E}, L20 = {A, C, G},
L21 = {B, C, E}, L22 = {B, C, G}, L23 = {A, E, G}, L24 = {B, E, G}, L25 = {C, E, G},
L26 = {A, B, C, E}, L27 = {A, B, C, G}, L28 = {A, B, E, G}, L29 = {A, C, E, G},
L30 = {B, C, E, G}, L31 = {A, B, C, E, G}
(N ∪ L1)+ = {A, B, C, G, E} = R ⇒ K = {A}
(N ∪ L2)+ = {B, G} ≠ R
(N ∪ L3)+ = {C} ≠ R
(N ∪ L4)+ = {E, A, B, C, G} = R ⇒ K = {E}
(N ∪ L5)+ = {G} ≠ R
(N ∪ L10)+ = {B, C, G, E, A} = R ⇒ K = {B, C}
(N ∪ L12)+ = {B, G} ≠ R
(N ∪ L14)+ = {C, G, E, A, B} = R ⇒ K = {C, G}
All remaining subsets contain {A}, {E}, {B, C}, or {C, G} and are therefore not keys.
5.2. Functional Dependencies
F = {A → C, AC → D, E → AD, E → H}
R = {A, C, D, E, H}
Identify a key of R?
K = {A, C, D, E, H}
(K-{A})+ = {C, D, E, H, A} = R ⇒ K = {C, D, E, H}
(K-{C})+ = {D, E, H, A, C} = R ⇒ K = {D, E, H}
(K-{D})+ = {E, H, A, D, C} = R ⇒ K = {E, H}
(K-{E})+ = {H} ≠ R ⇒ K = {E, H}
(K-{H})+ = {E, A, D, H, C} = R ⇒ K = {E}
5.2. Functional Dependencies
F = {A → C, AC → D, E → AD, E → H}
R = {A, C, D, E, H}
Identify all the keys of R?
N = R − ∪{right(f) : f ∈ F} = {A, C, D, E, H} – {C, D, A, H}
N = {E}
N+ = {E, A, D, H, C} = R ⇒ only one key K = {E}
5.2. Functional Dependencies
Membership of a functional dependency X→Y
with respect to a set of functional
dependencies F, i.e. X→Y is inferred from F,
denoted by F╞ X→Y
Check if F╞ X→Y:
Find X+, the closure of X under F
If Y ⊆ X+ then F╞ X→Y
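The membership check can be sketched directly from the two steps above. The names closure and implies are assumptions for this example; attribute sets are written as strings of single-letter attribute names for brevity.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs → rhs can be inferred from fds (rhs ⊆ lhs+)."""
    return set(rhs) <= closure(lhs, fds)


F = [(set("A"), set("C")), (set("AC"), set("D")), (set("E"), set("AD")), (set("E"), set("H"))]
print(implies(F, "A", "CD"))  # True
print(implies(F, "E", "AH"))  # True
print(implies(F, "A", "H"))   # False
```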
5.2. Functional Dependencies
Membership of a functional dependency X→Y
with respect to a set of functional
dependencies F, i.e. X→Y is inferred from F,
denoted by F╞ X→Y
F = {A → C, AC → D, E → AD, E → H}
F ╞ A → CD ???
A+ under F = {A, C, D} ⊇ {C, D} ⇒ F ╞ A → CD
F ╞ E → AH ???
E+ under F = {E, A, D, H, C} ⊇ {A, H} ⇒ F ╞ E → AH
5.2. Functional Dependencies
A set of functional dependencies F is said to
cover another set of functional dependencies
E if every FD in E is also in F+; that is, if every
dependency in E can be inferred from F.
Alternatively, E is covered by F.
Two sets of functional dependencies E and F
are equivalent if E+ = F+; that is, every
dependency in E can be inferred from F and
every dependency in F can be inferred from E;
in other words, E covers F and F covers E.
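Both definitions reduce to closure computations, which the following sketch illustrates. The names closure, covers, and equivalent are assumptions for this example, and the sets F and G are small hypothetical inputs chosen for illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E if every FD X → Y in E satisfies Y ⊆ X+ under F."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E and F are equivalent if each covers the other."""
    return covers(E, F) and covers(F, E)


# Hypothetical inputs: G lacks any way to derive C from B.
F = [(set("A"), set("B")), (set("B"), set("C"))]
G = [(set("A"), set("BC"))]
print(covers(F, G))      # True: A+ under F = {A, B, C}
print(covers(G, F))      # False: B+ under G = {B}
print(equivalent(F, G))  # False
```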
5.2. Functional Dependencies

F = {A → C, AC → D, E → AD, E → H}
G = {A → CD, E → AH}
Are F and G equivalent?
* Check if F covers G.
* Check if G covers F.
5.2. Functional Dependencies
F = {A → C, AC → D, E → AD, E → H}
G = {A → CD, E → AH}
Are F and G equivalent?
* Check if F covers G.
F ╞ A → CD
A+ under F = {A, C, D} ⊇ {C, D}
F ╞ E → AH
E+ under F = {E, A, D, H, C} ⊇ {A, H}
F covers G.
5.2. Functional Dependencies
F = {A → C, AC → D, E → AD, E → H}
G = {A → CD, E → AH}
Are F and G equivalent?
* Check if G covers F.
G ╞ A → C
A+ under G = {A, C, D} ⊇ {C}
G ╞ AC → D
{A, C}+ under G = {A, C, D} ⊇ {D}
G ╞ E → AD
E+ under G = {E, A, H, C, D} ⊇ {A, D}
G ╞ E → H
E+ under G = {E, A, H, C, D} ⊇ {H}
G covers F. In the previous slide, F covers G.
F and G are equivalent.
5.2. Functional Dependencies
The minimal cover F of a set
of functional dependencies E is a set of
functional dependencies that satisfies the
property that every dependency in E is in the
closure F+ of F. In addition, this property is
lost if any dependency from the set F is
removed; F must have no redundancies in it,
and the dependencies in F are in a standard
form.
5.2. Functional Dependencies
The minimal cover F of a set of functional
dependencies E
1. Every dependency in F has a single attribute for its right-hand side.
2. We cannot replace any dependency X → A in F with a
dependency Y → A, where Y is a proper subset of X, and still
have a set of dependencies that is equivalent to F.
3. We cannot remove any dependency from F and still have a
set of dependencies that is equivalent to F.
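The three conditions suggest a three-step procedure, sketched below. This is one common way to compute a minimal cover and is not necessarily the listing used in the lecture; the names closure and minimal_cover are assumptions. A set of FDs can have several minimal covers, and which one is returned depends on the order in which attributes and dependencies are examined.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return one minimal cover of fds as a list of (lhs, rhs) frozenset pairs."""
    # Step 1: give every dependency a single attribute on its right-hand side.
    F = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: remove a left-hand-side attribute b from X → A when
    # A is already in (X − {b})+ under the current F.
    for i, (lhs, rhs) in enumerate(F):
        for b in sorted(lhs):
            reduced = F[i][0] - {b}
            if rhs <= closure(reduced, F):
                F[i] = (reduced, rhs)

    # Step 3: remove a dependency when the remaining ones still imply it.
    i = 0
    while i < len(F):
        lhs, rhs = F[i]
        rest = F[:i] + F[i + 1:]
        if rhs <= closure(lhs, rest):
            F = rest
        else:
            i += 1
    return F


E = [
    (set("A"), set("B")),
    (set("ABCD"), set("I")),
    (set("IJ"), set("G")),
    (set("IJ"), set("H")),
    (set("ACDJ"), set("GI")),
]
for lhs, rhs in minimal_cover(E):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

For this E the sketch yields the four dependencies A → B, ACD → I, IJ → G, and IJ → H.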
5.2. Functional Dependencies

The minimal cover F of a set of functional
dependencies E
E = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → GI}
The minimal cover F of E ???
5.2. Functional Dependencies
The minimal cover F of a set of functional dependencies E
E = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → GI}
The minimal cover F of E ???
F = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
Check for ABCD → I by removing A:
F = {A → B, BCD → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
E ╞ BCD → I? {B, C, D}+ under E = {B, C, D} ⊉ {I} ⇒ A cannot be removed
F ╞ ABCD → I: {A, B, C, D}+ under F = {A, B, C, D, I} ⊇ {I}, always true
Check for ABCD → I by removing B:
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
E ╞ ACD → I? {A, C, D}+ under E = {A, C, D, B, I} ⊇ {I} ⇒ B is removed
F ╞ ABCD → I: always true
Check for ACD → I by removing C:
F = {A → B, AD → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
E ╞ AD → I? {A, D}+ under E = {A, D, B} ⊉ {I} ⇒ C cannot be removed
F ╞ ACD → I: always true
Check for ACD → I by removing D:
F = {A → B, AC → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
E ╞ AC → I? {A, C}+ under E = {A, C, B} ⊉ {I} ⇒ D cannot be removed
F ╞ ACD → I: always true
5.2. Functional Dependencies
The minimal cover F of a set of functional dependencies E
E = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → GI}
The minimal cover F of E ???
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
Check for IJ → G by removing I:
F = {A → B, ACD → I, J → G, IJ → H, ACDJ → G, ACDJ → I}
E ╞ J → G? {J}+ under E = {J} ⊉ {G} ⇒ I cannot be removed
F ╞ IJ → G: {I, J}+ under F = {I, J, G, H} ⊇ {G}, always true
Check for IJ → G by removing J:
F = {A → B, ACD → I, I → G, IJ → H, ACDJ → G, ACDJ → I}
E ╞ I → G? {I}+ under E = {I} ⊉ {G} ⇒ J cannot be removed
F ╞ IJ → G: always true
Check for IJ → H by removing I:
F = {A → B, ACD → I, IJ → G, J → H, ACDJ → G, ACDJ → I}
E ╞ J → H? {J}+ under E = {J} ⊉ {H} ⇒ I cannot be removed
F ╞ IJ → H: always true
Check for IJ → H by removing J:
F = {A → B, ACD → I, IJ → G, I → H, ACDJ → G, ACDJ → I}
E ╞ I → H? {I}+ under E = {I} ⊉ {H} ⇒ J cannot be removed
F ╞ IJ → H: always true
5.2. Functional Dependencies
The minimal cover F of a set of functional dependencies E
E = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → GI}
The minimal cover F of E ???
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
Check for ACDJ → G by removing A:
F = {A → B, ACD → I, IJ → G, IJ → H, CDJ → G, ACDJ → I}
E ╞ CDJ → G? {C, D, J}+ under E = {C, D, J} ⊉ {G} ⇒ A cannot be removed
F ╞ ACDJ → G: {A, C, D, J}+ under F = {A, C, D, J, B, I, G, H} ⊇ {G}, always true
Check for ACDJ → G by removing C:
F = {A → B, ACD → I, IJ → G, IJ → H, ADJ → G, ACDJ → I}
E ╞ ADJ → G? {A, D, J}+ under E = {A, D, J, B} ⊉ {G} ⇒ C cannot be removed
F ╞ ACDJ → G: always true
Check for ACDJ → G by removing D:
F = {A → B, ACD → I, IJ → G, IJ → H, ACJ → G, ACDJ → I}
E ╞ ACJ → G? {A, C, J}+ under E = {A, C, J, B} ⊉ {G} ⇒ D cannot be removed
F ╞ ACDJ → G: always true
Check for ACDJ → G by removing J:
F = {A → B, ACD → I, IJ → G, IJ → H, ACD → G, ACDJ → I}
E ╞ ACD → G? {A, C, D}+ under E = {A, C, D, B, I} ⊉ {G} ⇒ J cannot be removed
F ╞ ACDJ → G: always true
5.2. Functional Dependencies
The minimal cover F of a set of functional dependencies E
E = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → GI}
The minimal cover F of E ???
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G, ACDJ → I}
Check for ACDJ → I by removing A:
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G, CDJ → I}
E ╞ CDJ → I? {C, D, J}+ under E = {C, D, J} ⊉ {I} ⇒ A cannot be removed
F ╞ ACDJ → I: {A, C, D, J}+ under F = {A, C, D, J, B, I, G, H} ⊇ {I}, always true
Check for ACDJ → I by removing C:
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G, ADJ → I}
E ╞ ADJ → I? {A, D, J}+ under E = {A, D, J, B} ⊉ {I} ⇒ C cannot be removed
F ╞ ACDJ → I: always true
Check for ACDJ → I by removing D:
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G, ACJ → I}
E ╞ ACJ → I? {A, C, J}+ under E = {A, C, J, B} ⊉ {I} ⇒ D cannot be removed
F ╞ ACDJ → I: always true
Check for ACDJ → I by removing J:
E ╞ ACD → I? {A, C, D}+ under E = {A, C, D, B, I} ⊇ {I} ⇒ J is removed; ACDJ → I becomes ACD → I, which is already in F
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G}
F ╞ ACDJ → I: always true
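5.2. Functional Dependencies
The minimal cover F of a set of functional dependencies E
E = {A → B, ABCD → I, IJ → G, IJ → H, ACDJ → GI}
The minimal cover F of E ???
Continue until you obtain the final minimal cover F. ACDJ → G is redundant because
{A, C, D, J}+ under the other dependencies = {A, C, D, J, B, I, G, H} ⊇ {G}, so it is removed:
F = {A → B, ACD → I, IJ → G, IJ → H}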
5.2. Functional Dependencies
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G}
R = {A, B, C, D, G, H, I, J}
Identify a key of R???
K = {A, B, C, D, G, H, I, J}
(K-{A})+ = {B, C, D, G, H, I, J} ≠ R ⇒ K = {A, B, C, D, G, H, I, J}
(K-{B})+ = {A, C, D, G, H, I, J, B} = R ⇒ K = {A, C, D, G, H, I, J}
(K-{C})+ = {A, D, G, H, I, J, B} ≠ R ⇒ K = {A, C, D, G, H, I, J}
(K-{D})+ = {A, C, G, H, I, J, B} ≠ R ⇒ K = {A, C, D, G, H, I, J}
(K-{G})+ = {A, C, D, H, I, J, B, G} = R ⇒ K = {A, C, D, H, I, J}
(K-{H})+ = {A, C, D, I, J, B, G, H} = R ⇒ K = {A, C, D, I, J}
(K-{I})+ = {A, C, D, J, B, I, G, H} = R ⇒ K = {A, C, D, J}
(K-{J})+ = {A, C, D, B, I} ≠ R ⇒ K = {A, C, D, J}
5.2. Functional Dependencies
F = {A → B, ACD → I, IJ → G, IJ → H, ACDJ → G}
R = {A, B, C, D, G, H, I, J}
Identify all keys of R???

N = R − ∪{right(f) : f ∈ F} = {A, B, C, D, G, H, I, J} – {B, I, G, H} = {A, C, D, J}
N+ = {A, C, D, J, B, I, G, H} = R
R has only one key K = N = {A, C, D, J}.
5.3. Normal Forms Based on
Primary Keys
Normalization
A process of analyzing the given relation schemas
based on their FDs and (primary) keys to achieve
the desirable properties of:
(1) minimizing redundancy
(2) minimizing the insertion, deletion, and update anomalies
Normal form of a relation
The highest normal form condition that the relation
meets, which indicates the degree to which it has been
normalized
5.3. Normal Forms Based on
Primary Keys
The process of normalization through
decomposition produces relation
schemas that have:
The lossless join (no information is lost by the join) or
non-additive join (no information is added by the join)
property, i.e. information preservation
No spurious tuple generation problem occurs.
The dependency preservation property
Each FD is represented in some individual relation.
5.3. Normal Forms Based on
Primary Keys
Normal forms based on primary keys and the corresponding normalization
The general definitions substitute all the keys for the primary key, prime attributes
for the key attributes, and nonprime attributes for the nonkey attributes.

Source: [1]
5.3. Normal Forms Based on
Primary Keys
Pure relations based on the relational data model are all in 1NF.
Atomic attributes: NO repeating groups or nested relations.

Relations in 2NF have non-key attributes functionally dependent
on the whole key.
Atomic attributes: NO repeating groups or nested relations.
Non-key attributes have NO partial functional dependency on the key.

Relations in 3NF have non-key attributes only functionally
dependent on the whole key.
Atomic attributes: NO repeating groups or nested relations.
Non-key attributes have NO partial functional dependency on the key.
Non-key attributes have NO transitive functional dependency on the key.
5.3. Normal Forms Based on
Primary Keys
DLOCATIONS: non-atomic
attribute (repeating group)

Unnormalized table

1NF relation

5.3. Normal Forms Based on
Primary Keys

PROJS: non-atomic
attribute (composite,
repeating group)

Unnormalized table and the resulting 1NF relations


5.3. Normal Forms Based on
Primary Keys
What problems exist with 1NF relations?

1NF relation

Primary key: {DNUMBER, DLOCATION}

Functional dependencies:
{DNUMBER, DLOCATION} → {DNAME, DMGRSSN}
DNUMBER → {DNAME, DMGRSSN}
5.3. Normal Forms Based on
Primary Keys
What problems exist with 1NF relations?

1NF relation

Redundancy

Primary key: {DNUMBER, DLOCATION}

Functional dependencies:
{DNUMBER, DLOCATION} → {DNAME, DMGRSSN}
DNUMBER → {DNAME, DMGRSSN} (partially functionally dependent on the key)
5.3. Normal Forms Based on
Primary Keys
What problems exist with 1NF relations?

1NF relation

Redundancy

Insertion anomalies: (4, Houston)? (Sales, 123456789, 10)?

Deletion anomalies: (1, Houston)? (Research, 333445555, 5)?

Modification anomalies: DMGRSSN of Department 5 to 123456789?

DNAME of Department 5 to “Research & Development”?


5.3. Normal Forms Based on
Primary Keys
What problems exist with 1NF relations?
NORMALIZATION to 2NF relations
1NF relation

Primary key: {DNUMBER, DLOCATION}

Functional dependencies:
{DNUMBER, DLOCATION} → {DNAME, DMGRSSN}
DNUMBER → {DNAME, DMGRSSN} (partially functionally dependent on the key) ⇒ new relation
5.3. Normal Forms Based on
Primary Keys
What problems exist with 1NF relations?
NORMALIZATION to 2NF relations
1NF relation

2NF relations

5.3. Normal Forms Based on
Primary Keys
What problems exist with 2NF relations?

Primary key: SSN

Functional dependencies:
SSN → {ENAME, BDATE, ADDRESS, DNUMBER}
DNUMBER → {DNAME, DMGRSSN}

Atomic attributes, no partial dependency on the key ⇒ 2NF relation
5.3. Normal Forms Based on
Primary Keys
Redundancy

Primary key: SSN

ANY PROBLEMS?
Functional dependencies:
SSN → {ENAME, BDATE, ADDRESS, DNUMBER}
DNUMBER → {DNAME, DMGRSSN} (transitively functionally dependent on the key)
Atomic attributes, no partial dependency on the key ⇒ 2NF relation
5.3. Normal Forms Based on
Primary Keys
Redundancy

Insertion anomalies: Employee 999777555 in Department 4?
Department 10 with name Sales?
Deletion anomalies: Employee 888665555? Department 5?
Modification anomalies: DMGRSSN of Department 4 to 123456789?
DNAME of Department 5 to “Research & Development”?

Atomic attributes, no partial dependency on the key ⇒ 2NF relation


5.3. Normal Forms Based on
Primary Keys
Redundancy

Primary key: SSN

ANY PROBLEMS?
NORMALIZATION to 3NF
Functional dependencies:
SSN → {ENAME, BDATE, ADDRESS, DNUMBER}
DNUMBER → {DNAME, DMGRSSN} (transitively functionally dependent on the key) ⇒ new relation
Atomic attributes, no partial dependency on the key ⇒ 2NF relation
5.3. Normal Forms Based on
Primary Keys

2NF relation

3NF relations

5.3. Normal Forms Based on
Primary Keys
What problems exist with 3NF relations?

3NF relation

Candidate key = {STUDENT, COURSE}


FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE
5.3. Normal Forms Based on
Primary Keys
What problems exist with 3NF relations?

3NF relation

Redundancy

Candidate key = {STUDENT, COURSE}


FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE (a prime attribute is functionally dependent on a NONprime attribute)
5.3. Normal Forms Based on
Primary Keys
What problems exist with 3NF relations?

3NF relation

Redundancy

Insertion anomalies: Elmasri teaches Database?
Deletion anomalies: Wong studies Database?
Modification anomalies: Mark teaches Algorithms instead of Database?
5.3. Normal Forms Based on
Primary Keys
What problems exist with 3NF relations?

3NF relation

Redundancy

NORMALIZATION

BCNF relations

Candidate key = {STUDENT, COURSE}


FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE (a prime attribute is functionally dependent on a NONprime attribute)
5.4. Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF)
A simpler form of 3NF, but stricter than 3NF
Every relation in BCNF is also in 3NF; however, a
relation in 3NF is not necessarily in BCNF.
For a relation in 3NF, each non-key attribute must be:
Fully functionally dependent on every key
NOT transitively dependent on any key
NOT functionally dependent on other nonkey attributes
For a relation in BCNF, all key/non-key attributes must be
functionally dependent only on superkeys.
If X is a superkey for every nontrivial X → A that holds in R, then R is in BCNF.
If X is not a superkey for some nontrivial X → A that holds in R, then
R is not in BCNF.
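
The conditions for 2NF, 3NF and BCNF can be checked mechanically once all keys are known. The sketch below uses the general definitions (all candidate keys, prime and nonprime attributes) and is applied to the three running examples of this chapter. The function names are hypothetical, and the key search is brute force, so this is only suitable for small schemas. Note that for TEACH it also finds {STUDENT, INSTRUCTOR} as a candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    attrs, keys = sorted(relation), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attrs):
                keys.append(s)
    return keys


def normal_form(relation, fds):
    relation = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    bcnf = third = True
    for lhs, rhs in fds:
        x = set(lhs)
        if closure(x, fds) == relation:
            continue  # determinant is a superkey: allowed everywhere
        for a in set(rhs) - x:
            bcnf = False  # nontrivial FD with a non-superkey determinant
            if a not in prime:
                third = False  # ... and a nonprime right-hand side
    if bcnf:
        return "BCNF"
    if third:
        return "3NF"
    for key in keys:  # 2NF: no nonprime attribute depends on part of a key
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) - prime:
                    return "1NF"
    return "2NF"


dept = ({"DNUMBER", "DLOCATION", "DNAME", "DMGRSSN"},
        [({"DNUMBER", "DLOCATION"}, {"DNAME", "DMGRSSN"}),
         ({"DNUMBER"}, {"DNAME", "DMGRSSN"})])
emp_dept = ({"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"},
            [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
             ({"DNUMBER"}, {"DNAME", "DMGRSSN"})])
teach = ({"STUDENT", "COURSE", "INSTRUCTOR"},
         [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})])
for r, f in (dept, emp_dept, teach):
    print(normal_form(r, f))  # 1NF, then 2NF, then 3NF
```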
5.4. Boyce-Codd Normal Form
3NF relation

Candidate key = {STUDENT, COURSE}
FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE
Normalization to BCNF?
DECOMPOSITION 1

3NF relation

BCNF relations

DECOMPOSITION 1

3NF relation

BCNF relations

How about FD1?
FD1: {STUDENT, COURSE} → INSTRUCTOR
DECOMPOSITION 2

3NF relation

BCNF relations

DECOMPOSITION 2

3NF relation

BCNF relations

How about FD1?
FD1: {STUDENT, COURSE} → INSTRUCTOR
5.5. Properties of
Relational Decompositions
Given a relation schema R and a set F of
functional dependencies that should hold on
the attributes of R specified by the database
designers, a decomposition of R is a set of
relation schemas D = {R1, R2, ... , Rm}
where no attributes are lost (the attribute
preservation condition).
$\bigcup_{i=1}^{m} R_i = R$
5.5. Properties of
Relational Decompositions

Properties of relational decompositions
Dependency preservation property of a decomposition
Lossless (nonadditive) join property of a decomposition
5.5. Properties of
Relational Decompositions
Dependency preservation property of a decomposition
The dependency preservation condition: each functional
dependency X → Y specified in F either appears directly in
one of the relation schemas Ri in the decomposition D or
can be inferred from the dependencies that appear in some
Ri.
If a decomposition is not dependency-preserving, some
dependency is lost in the decomposition.
To check that a lost dependency holds, the JOIN of two or more
relations in the decomposition must be performed to get a
relation that includes all left- and right-hand-side attributes of
the lost dependency, and then check that the dependency holds
on the result of the JOIN.
5.5. Properties of
Relational Decompositions
Dependency preservation property of a decomposition
Given a set of dependencies F on R, the projection of F on Ri,
denoted by $\pi_{R_i}(F)$ where Ri is a subset of R, is the set of
dependencies X → Y in F+ such that the attributes in X ∪ Y are all
contained in Ri. Hence, the projection of F on each relation schema
Ri in the decomposition D is the set of functional dependencies in F+,
the closure of F, such that all their left- and right-hand-side
attributes are in Ri. A decomposition D = {R1, R2, ..., Rm} of R is
dependency-preserving with respect to F if the union of the
projections of F on each Ri in D is equivalent to F; that is,
$(\pi_{R_1}(F) \cup \cdots \cup \pi_{R_m}(F))^+ = F^+$
5.5. Properties of
Relational Decompositions
Dependency preservation property of a decomposition
A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving
with respect to F if the union of the projections of F on
each Ri in D is equivalent to F; that is,
$(\pi_{R_1}(F) \cup \cdots \cup \pi_{R_m}(F))^+ = F^+$

In the previous examples, do their decompositions
possess the dependency preservation property?
5.5. Properties of
Relational Decompositions
Dependency preservation property of a decomposition: $(\pi_{R_1}(F) \cup \cdots \cup \pi_{R_m}(F))^+ = F^+$
In the previous examples, do their decompositions possess the dependency preservation
property?

BEFORE
Relation: TEACH (STUDENT, COURSE, INSTRUCTOR)
Primary key: STUDENT, COURSE
Functional dependencies
FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE

DECOMPOSITION 1
Relation: STUDENT_COURSE (STUDENT, COURSE)
Primary key: STUDENT, COURSE
Relation: COURSE_INSTRUCTOR (COURSE, INSTRUCTOR)
Primary key: INSTRUCTOR
Functional dependency: INSTRUCTOR → COURSE

DECOMPOSITION 2
Relation: STUDENT_INSTRUCTOR (STUDENT, INSTRUCTOR)
Primary key: STUDENT, INSTRUCTOR
Relation: COURSE_INSTRUCTOR (COURSE, INSTRUCTOR)
Primary key: INSTRUCTOR
Functional dependency: INSTRUCTOR → COURSE
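
The question for the two TEACH decompositions can be answered without computing F+ explicitly. The sketch below uses the standard polynomial-time test: for each X → Y in F, grow a set Z from X by repeatedly adding (Z ∩ Ri)+ ∩ Ri for every Ri, and report the dependency as lost if Y is not contained in the final Z. The function names and the frozenset encoding are assumptions for this illustration.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lost_dependencies(decomposition, fds):
    """Return the FDs of fds that the decomposition does not preserve."""
    lost = []
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Attributes derivable from Z while staying inside Ri.
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            lost.append((lhs, rhs))
    return lost


FD1 = (frozenset({"STUDENT", "COURSE"}), frozenset({"INSTRUCTOR"}))
FD2 = (frozenset({"INSTRUCTOR"}), frozenset({"COURSE"}))
F = [FD1, FD2]
D1 = [{"STUDENT", "COURSE"}, {"COURSE", "INSTRUCTOR"}]
D2 = [{"STUDENT", "INSTRUCTOR"}, {"COURSE", "INSTRUCTOR"}]

print(lost_dependencies(D1, F) == [FD1])  # True: only FD1 is lost
print(lost_dependencies(D2, F) == [FD1])  # True: only FD1 is lost
```

Under this test neither decomposition is dependency-preserving: FD2 survives in COURSE_INSTRUCTOR, but FD1 is lost in both.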
5.5. Properties of
Relational Decompositions
Lossless (nonadditive) join property of a decomposition
No spurious tuples are generated when a NATURAL JOIN
operation is applied to the relations in the decomposition.
A decomposition D = {R1, R2, ... , Rm} of R has the lossless
(nonadditive) join property with respect to the set of
dependencies F on R if, for every relation state r of R that
satisfies F, the following holds, where * is the NATURAL JOIN
of all the relations in D:
$*(\pi_{R_1}(r), \ldots, \pi_{R_m}(r)) = r$
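
To see what a spurious tuple looks like, the definition can be tried on a small relation state. The three TEACH rows below are hypothetical sample data chosen to satisfy FD1 and FD2, and `project` and `natural_join` are illustrative helpers, not a DBMS API. A single state can only refute the property; it cannot prove it for every state.

```python
def row(student, course, instructor):
    """A tuple is stored as a frozenset of (attribute, value) pairs."""
    return frozenset({"STUDENT": student, "COURSE": course,
                      "INSTRUCTOR": instructor}.items())


def project(rows, attrs):
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rows}


def natural_join(r1, r2):
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            # Tuples combine when they agree on all shared attributes.
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


# Hypothetical relation state r of TEACH.
r = {
    row("Narayan", "Database", "Mark"),
    row("Smith", "Database", "Navathe"),
    row("Smith", "Operating Systems", "Ammar"),
}

join1 = natural_join(project(r, {"STUDENT", "COURSE"}),
                     project(r, {"COURSE", "INSTRUCTOR"}))
join2 = natural_join(project(r, {"STUDENT", "INSTRUCTOR"}),
                     project(r, {"COURSE", "INSTRUCTOR"}))

print(len(r), len(join1), len(join2))  # 3 5 3
print(join2 == r)                      # True
```

Joining on COURSE pairs every student of Database with every instructor of Database, which adds two tuples that were never in r; joining on INSTRUCTOR returns exactly r for this state.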
5.5. Properties of
Relational Decompositions
Lossless (nonadditive) join property of a decomposition
Test the lossless (nonadditive) join property of a binary
decomposition
A decomposition D = {R1, R2} of R has the lossless
(nonadditive) join property with respect to a set of
functional dependencies F on R if and only if either:
- The FD ((R1 ∩ R2) → (R1 − R2)) is in F+, or
- The FD ((R1 ∩ R2) → (R2 − R1)) is in F+.
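
The binary test reduces to one attribute closure: compute (R1 ∩ R2)+ and check whether it covers R1 − R2 or R2 − R1. A minimal sketch, with hypothetical function names, applied to the two TEACH decompositions:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) is in F+."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


F = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

# Decomposition 1: the common attribute COURSE determines nothing else.
print(is_lossless_binary({"STUDENT", "COURSE"}, {"COURSE", "INSTRUCTOR"}, F))      # False
# Decomposition 2: the common attribute INSTRUCTOR determines COURSE.
print(is_lossless_binary({"STUDENT", "INSTRUCTOR"}, {"COURSE", "INSTRUCTOR"}, F))  # True
```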
5.5. Properties of
Relational Decompositions
Test the lossless (nonadditive) join property of
a binary decomposition
A decomposition D = {R1, R2} of R has the lossless
(nonadditive) join property with respect to a set of
functional dependencies F on R if and only if either:
- The FD ((R1 ∩ R2) → (R1 − R2)) is in F+, or
- The FD ((R1 ∩ R2) → (R2 − R1)) is in F+.

In the previous examples, do their decompositions
possess the lossless (nonadditive) join property?
5.5. Properties of
Relational Decompositions
Test the lossless (nonadditive) join property of a binary decomposition
A decomposition D = {R1, R2} of R … either:
- The FD ((R1 ∩ R2) → (R1 − R2)) is in F+, or
- The FD ((R1 ∩ R2) → (R2 − R1)) is in F+.

BEFORE
Relation: TEACH (STUDENT, COURSE, INSTRUCTOR)
Primary key: STUDENT, COURSE
Functional dependencies
FD1: {STUDENT, COURSE} → INSTRUCTOR
FD2: INSTRUCTOR → COURSE

DECOMPOSITION 1
Relation: STUDENT_COURSE (STUDENT, COURSE)
Primary key: STUDENT, COURSE
Relation: COURSE_INSTRUCTOR (COURSE, INSTRUCTOR)
Primary key: INSTRUCTOR
Functional dependency: INSTRUCTOR → COURSE
Test: COURSE → STUDENT or COURSE → INSTRUCTOR. Neither is in F+, so this decomposition is not lossless.

DECOMPOSITION 2
Relation: STUDENT_INSTRUCTOR (STUDENT, INSTRUCTOR)
Primary key: STUDENT, INSTRUCTOR
Relation: COURSE_INSTRUCTOR (COURSE, INSTRUCTOR)
Primary key: INSTRUCTOR
Functional dependency: INSTRUCTOR → COURSE
Test: INSTRUCTOR → STUDENT or INSTRUCTOR → COURSE. INSTRUCTOR → COURSE is in F+, so this decomposition is lossless.
5.6. Algorithms for
Relational Database Schema Design

Algorithm for dependency-preserving
decomposition into 3NF schemas
Algorithm for dependency-preserving and
nonadditive (lossless) join decomposition
into 3NF schemas
Algorithm for lossless (nonadditive) join
decomposition into BCNF schemas
5.6. Algorithms for
Relational Database Schema Design
Algorithm for dependency-preserving decomposition into 3NF schemas

5.6. Algorithms for
Relational Database Schema Design
Algorithm for dependency-preserving and nonadditive (lossless) join
decomposition into 3NF schemas

Find the minimal cover: F = {A → B, ACD → I, IJ → G, IJ → H}
Find all keys: Key = {A, C, D, J}
Start normalization: R (A, B, C, D, G, H, I, J)
F = {A → B, ACD → I, IJ → G, IJ → H}
Key = {A, C, D, J}
Identify the current normal form: 1NF, because of the partial dependencies A → B and ACD → I
Normalization according to the algorithm:
R1 (A, B)
R2 (A, C, D, I)
R3 (I, J, G, H)
Added for key: R4 (A, C, D, J)
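
The synthesis steps used in this example can be sketched as follows, assuming the minimal cover and one key have already been found. The function name `synthesize_3nf` and the string encoding are assumptions for the illustration: group the minimal cover by left-hand side, create one schema per group, add a key schema if no schema contains the key, and drop schemas contained in another.

```python
def synthesize_3nf(min_cover, key):
    """Build 3NF schemas from a minimal cover and one key of R."""
    groups = {}
    for lhs, rhs in min_cover:
        # One schema per distinct left-hand side X: X plus everything X determines.
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = list(groups.values())
    # Add a schema for the key if no schema already contains it
    # (this is what makes the decomposition lossless).
    if not any(set(key) <= s for s in schemas):
        schemas.append(set(key))
    # Drop duplicates and schemas that are proper subsets of another schema.
    result = []
    for s in schemas:
        if s not in result and not any(s < t for t in schemas):
            result.append(s)
    return result


F = [("A", "B"), ("ACD", "I"), ("IJ", "G"), ("IJ", "H")]
for schema in synthesize_3nf(F, "ACDJ"):
    print(sorted(schema))
```

For the example this yields {A, B}, {A, C, D, I}, {G, H, I, J} and {A, C, D, J}, that is, R1 to R4 above.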


5.6. Algorithms for
Relational Database Schema Design

Algorithm for lossless (nonadditive) join decomposition
into BCNF schemas
5.6. Algorithms for
Relational Database Schema Design
Algorithm for lossless (nonadditive) join decomposition
into BCNF schemas
Given a relation schema R = (A, B, C, D, E) and a set F of functional dependencies
as follows:
F = {A → BC, CD → E, B → D, E → A}
Perform nonadditive (lossless) join decomposition on R into BCNF schemas.

Find the keys of R: K = {A}, {B, C}, {C, D}, {E}

Create a schema from the functional dependencies with each distinct X on the left side
that violates BCNF (here only B → D, because B is not a superkey):
R1 = (B, D), Key1 = {B},
F1 = {B → D}
R2 = (A, B, C, E), Key2 = {A}, {B, C}, {E},
F2 = {A → BCE, BC → AE, E → ABC}

Lost FD = {CD → E}
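
The decomposition loop behind this example can be sketched as follows: while some schema Q has a determinant X that is not a superkey of Q, replace Q by X ∪ Y and Q − Y, where Y is everything X determines inside Q. The function names are hypothetical, violations are searched by brute force over subsets of Q (fine for small schemas only), and the result may depend on which violation is found first.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, Y) with X → Y nontrivial on schema and X not a superkey, or None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & schema
            if determined != x and determined != schema:
                return x, determined - x
    return None


def bcnf_decompose(relation, fds):
    todo, done = [set(relation)], []
    while todo:
        q = todo.pop()
        violation = find_violation(q, fds)
        if violation is None:
            done.append(q)  # q is already in BCNF
        else:
            x, y = violation
            todo.append(q - y)  # lossless split: X is the common part
            todo.append(x | y)
    return done


F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print([sorted(s) for s in bcnf_decompose("ABCDE", F)])  # [['B', 'D'], ['A', 'B', 'C', 'E']]
```

Each split is lossless because the common attributes X determine one side, so the binary test holds at every step; dependency preservation is not guaranteed, as the lost FD CD → E shows.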


5.7. Multivalued Dependencies and
Fourth Normal Form
Multivalued dependencies
A multivalued dependency X ↠ Y specified on
relation schema R, where X and Y are both
subsets of R, specifies the following constraint on
any relation state r of R: if two tuples t1 and t2
exist in r such that t1[X] = t2[X], then two tuples
t3 and t4 should also exist in r with the following
properties, where Z is used to denote (R − (X ∪ Y)):
t3[X] = t4[X] = t1[X] = t2[X]
t3[Y] = t1[Y] and t4[Y] = t2[Y]
t3[Z] = t2[Z] and t4[Z] = t1[Z]
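
The tuple condition can be checked directly on a relation state. In the sketch below, the EMP rows are hypothetical sample data, `satisfies_mvd` is an illustrative helper, and X and Y are assumed to be disjoint. Only the existence of t3 is tested for every ordered pair (t1, t2); t4 is then covered by the swapped pair (t2, t1).

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check X ↠ Y on a relation state given as a list of dicts."""
    z = [a for a in attrs if a not in x and a not in y]

    def pick(row, names):
        return tuple(row[a] for a in names)

    present = {(pick(r, x), pick(r, y), pick(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if pick(t1, x) == pick(t2, x):
                # t3 must carry Y from t1 and Z from t2.
                if (pick(t1, x), pick(t1, y), pick(t2, z)) not in present:
                    return False
    return True


ATTRS = ["ENAME", "PNAME", "DNAME"]
# Hypothetical state: every project of Smith is paired with every dependent.
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

print(satisfies_mvd(emp, ATTRS, ["ENAME"], ["PNAME"]))      # True
print(satisfies_mvd(emp[:3], ATTRS, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
```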


5.7. Multivalued Dependencies and
Fourth Normal Form

Multivalued dependencies
ENAME ↠ PNAME
ENAME ↠ DNAME
No association between PNAME and DNAME
5.7. Multivalued Dependencies and
Fourth Normal Form

Multivalued dependencies (MVDs)
A multivalued dependency X ↠ Y is a trivial MVD
if:
(a) Y is a subset of X, or
(b) X ∪ Y = R
A trivial MVD does not specify any significant or
meaningful constraint on R.
A nontrivial MVD satisfies neither (a) nor (b).
5.7. Multivalued Dependencies and
Fourth Normal Form
Fourth normal form
A relation schema R is in 4NF with respect to a
set of dependencies F (that includes functional
dependencies and multivalued dependencies) if,
for every nontrivial multivalued dependency X ↠ Y
in F+, X is a superkey for R.
MVDs: ENAME ↠ PNAME
ENAME ↠ DNAME
BCNF → 4NF
Summary
Database design process from conceptual
modeling:
Conceptual database design
The entity-relationship model
Choice of a DBMS → a representational data model
The relational data model
Data model mapping for a logical database schema
Database design process from synthesis and
normalization
Functional dependencies and multivalued
dependencies (constraints)
Summary
Normalization: 1NF, 2NF, 3NF, BCNF
Further reading: 4NF, 5NF, 6NF
Properties of relational decompositions
Attribute preservation property (always required)
Dependency preservation property
Lossless (nonadditive) join property
Algorithm for dependency-preserving
decomposition into 3NF schemas
Algorithm for dependency-preserving and
nonadditive (lossless) join decomposition into 3NF
schemas
Algorithm for lossless (nonadditive) join
decomposition into BCNF schemas
Chapter 5: Relational Database Design

Review
1. Discuss attribute semantics as an informal measure
of goodness for a relation schema. Give an example.
2. Discuss insertion, deletion, and modification
anomalies. Why are they considered bad? Illustrate
with examples.
3. Why should NULLs in a relation be avoided as
much as possible? Illustrate with examples.
4. Discuss the problem of spurious tuples and how we
may prevent it. Give an example.
5. State the informal guidelines for relation schema
design that we discussed. Illustrate how violation of
these guidelines may be harmful.
Review
6. What is a functional dependency? What are the possible
sources of the information that defines the functional
dependencies that hold among the attributes of a relation
schema?
7. Define first, second, and third normal forms when only
primary keys are considered. How do the general definitions of
2NF and 3NF, which consider all keys of a relation, differ from
those that consider only primary keys?
8. What undesirable dependencies are avoided when a relation
is in 2NF?
9. What undesirable dependencies are avoided when a relation
is in 3NF?
10. In what way do the generalized definitions of 2NF and 3NF
extend the definitions beyond primary keys?
11. Define Boyce-Codd normal form. How does it differ from
3NF? Why is it considered a stronger form of 3NF?
Review
12. What is meant by the completeness and soundness of Armstrong's
inference rules?
13. What is meant by the closure of a set of functional dependencies?
Illustrate with an example.
14. When are two sets of functional dependencies equivalent? How can
we determine their equivalence?
15. What is a minimal set of functional dependencies? Does every set
of dependencies have a minimal equivalent set? Is it always unique?
16. What is meant by the attribute preservation condition on a
decomposition?
17. What is the dependency preservation property for a
decomposition? Why is it important?
18. What is the lossless (or nonadditive) join property of a
decomposition? Why is it important?
19. Between the properties of dependency preservation and
losslessness, which one must definitely be satisfied? Why?
Review
20. Suppose that we have the following requirements for a university database that is used to
keep track of students' transcripts:
a. The university keeps track of each student's name (Sname), student number (Snum),
Social Security number (Ssn), current address (Sc_addr) and phone (Sc_phone), permanent
address (Sp_addr) and phone (Sp_phone), birth date (Bdate), sex (Sex), class (Class)
('freshman', 'sophomore', … , 'graduate'), major department (Major_code), minor department
(Minor_code) (if any), and degree program (Prog) ('b.a.', 'b.s.', … , 'ph.d.'). Both Ssn and
student number have unique values for each student.
b. Each department is described by a name (Dname), department code (Dcode), office
number (Doffice), office phone (Dphone), and college (Dcollege). Both name and code have
unique values for each department.
c. Each course has a course name (Cname), description (Cdesc), course number (Cnum),
number of semester hours (Credit), level (Level), and offering department (Cdept). The course
number is unique for each course.
d. Each section has an instructor (Iname), semester (Semester), year (Year), course
(Sec_course), and section number (Sec_num). The section number distinguishes different
sections of the same course that are taught during the same semester/year; its values are 1,
2, 3, … , up to the total number of sections taught during each semester.
e. A grade record refers to a student (Ssn), a particular section, and a grade (Grade).
Design a relational database schema for this database application. First show all the functional
dependencies that should hold among the attributes. Then design relation schemas for the
database that are each in 3NF or BCNF. Specify the key attributes of each relation. Note any
unspecified requirements, and make appropriate assumptions to render the specification
complete.
Review
21. Consider the relation DISK_DRIVE (Serial_number, Manufacturer,
Model, Batch, Capacity, Retailer). Each tuple in the relation
DISK_DRIVE contains information about a disk drive with a unique
Serial_number, made by a manufacturer, with a particular model
number, released in a certain batch, which has a certain storage
capacity and is sold by a certain retailer. For example, the tuple
Disk_drive ('1978619', 'WesternDigital', 'A2235X', '765234', 500,
'CompUSA') specifies that WesternDigital made a disk drive with
serial number 1978619 and model number A2235X, released in batch
765234; it is 500GB and sold by CompUSA.
Write each of the following dependencies as an FD:
a. The manufacturer and serial number uniquely identify the drive.
b. A model number is registered by a manufacturer and therefore can't
be used by another manufacturer.
c. All disk drives in a particular batch are the same model.
d. All disk drives of a certain model of a particular manufacturer have
exactly the same capacity.
Review
22.

Review
23. Consider the universal relation R = {A, B, C, D, E, F, G, H, I,
J} and the set of functional dependencies F = {{A, B}→{C},
{A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is
the key for R? Decompose R into 2NF, then 3NF and BCNF
relations.
24. Consider the following different set of functional dependencies G
= {{A, B}→{C}, {B, D}→{E, F}, {A, D}→{G, H}, {A}→{I},
{H}→{J}}. What is the key for R? Decompose R into 2NF, then
3NF and BCNF relations.
25. Consider a relation R(A, B, C, D, E) with the following
dependencies:
AB → C, CD → E, DE → B
Is AB a candidate key of this relation? If not, is ABD? Explain your
answer. How would you normalize this relation?
Review
26. Consider the relation R, which has attributes that hold
schedules of courses and sections at a university; R =
{Course_no, Sec_no, Offering_dept, Credit_hours,
Course_level, Instructor_ssn, Semester, Year, Days_hours,
Room_no, No_of_students}. Suppose that the following
functional dependencies hold on R:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no,
No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn,
Course_no, Sec_no}
Try to determine which sets of attributes form keys of R. How
would you normalize this relation?

Review
27. Consider the following relation:
CAR_SALE(Car#, Date_sold, Salesperson#,
Commission%, Discount_amt)
Assume that a car may be sold by multiple
salespeople, and hence {Car#, Salesperson#} is the
primary key. Additional dependencies are:
Date_sold → Discount_amt
Salesperson# → Commission%
Based on the given primary key, is this relation in 1NF,
2NF, 3NF, or BCNF? Why or why not? How would you
successively normalize it completely?
Review
28. Consider the following relation for published books:
BOOK (Book_title, Author_name, Book_type, List_price,
Author_affil, Publisher)
Author_affil refers to the affiliation of author. Suppose the
following dependencies exist:
Book_title → Publisher, Book_type
Book_type → List_price
Author_name → Author_affil
a. What normal form is the relation in? Explain your answer.
b. Apply normalization until you cannot decompose the relations
further. State the reasons behind each decomposition.
Review
29. Consider the following relation:
R (Doctor#, Patient#, Date, Diagnosis,
Treat_code, Charge)
In the above relation, a tuple describes a visit of a
patient to a doctor along with a treatment code and
daily charge. Assume that diagnosis is determined
(uniquely) for each patient by a doctor. Assume that
each treatment code has a fixed charge (regardless of
patient). Is this relation in 2NF? Justify your answer
and decompose if necessary. Then argue whether
further normalization to 3NF is necessary, and if so,
perform it.
Review
30. Consider the following relation:
CAR_SALE (Car_id, Option_type, Option_listprice,
Sale_date, Option_discountedprice)
This relation refers to options installed in cars (e.g.,
cruise control) that were sold at a dealership, and the
list and discounted prices of the options.
If Car_id → Sale_date
and Option_type → Option_listprice
and {Car_id, Option_type} → Option_discountedprice,
argue using the generalized definition of the 3NF that
this relation is not in 3NF. Then argue from your
knowledge of 2NF, why it is not even in 2NF.
Review
31. Consider the relation REFRIG(Model#, Year,
Price, Manuf_plant, Color), which is abbreviated as
REFRIG(M, Y, P, MP, C), and the following set F of
functional dependencies: F = {M → MP, {M, Y} → P,
MP → C}
a. Evaluate each of the following as a candidate key
for REFRIG, giving reasons why it can or cannot be a
key: {M}, {M, Y}, {M, C}.
b. Based on the above key determination, state
whether the relation REFRIG is in 3NF and in BCNF,
and provide proper reasons.
c. Consider the decomposition of REFRIG into D =
{R1(M, Y, P), R2(M, MP, C)}. Is this decomposition
lossless? Show why.
Review
32. Consider the following decompositions for the relation schema R =
{A, B, C, D, E, F, G, H, I, J} with the following functional
dependencies:
F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}.
Determine whether each decomposition has (1) the dependency
preservation property, and (2) the lossless join property, with respect to
F.
Also determine which normal form each relation in the decomposition is
in.
a. D1 = {R1, R2, R3, R4, R5}; R1 = {A, B, C}, R2 = {A, D, E}, R3 = {B,
F}, R4 = {F, G, H}, R5 = {D, I, J}
b. D2 = {R1, R2, R3}; R1 = {A, B, C, D, E}, R2 = {B, F, G, H}, R3 =
{D, I, J}
c. D3 = {R1, R2, R3, R4, R5}; R1 = {A, B, C, D}, R2 = {D, E}, R3 = {B,
F}, R4 = {F, G, H}, R5 = {D, I, J}
Next
Chapter 6: Physical Storage - Data Management

6.1. Physical Data Storage


6.2. Indexing
6.3. Complex Data Management
Approaches (Semi-structured and
Unstructured Data)
6.4. Massive Data Management Approaches
6.5. Quality Issues: Reliability, Scalability,
Effectiveness, Efficiency
