
Decomposition & Schema Normalization: CS 564-Fall 2016

The document discusses database design theory including schema normalization and decomposition. It introduces the concepts of functional dependencies, normal forms including BCNF, and lossless-join and dependency preserving decompositions. As an example, it shows how to decompose a relation that violates BCNF into two relations to eliminate the violation and achieve BCNF. The decomposition removes certain types of redundancy while remaining lossless-join, though it is not always dependency preserving.


DECOMPOSITION & SCHEMA NORMALIZATION

CS 564 - Fall 2016

ACKs: Dan Suciu, Jignesh Patel, AnHai Doan


HOW TO BUILD A DB APPLICATION
• Pick an application
• Figure out what to model (ER model)
– Output: ER diagram
• Transform the ER diagram to a relational schema

• Refine the relational schema (normalization)

• Now ready to implement the schema and load the data!

CS 564 [Fall 2016] - Paris Koutris 2


DB DESIGN THEORY

• Helps us identify the “bad” schemas and improve them
1. express constraints on the data: functional dependencies (FDs)
2. use the FDs to decompose the relations
• The process, called normalization, obtains a schema in a “normal form” that guarantees certain properties
– examples of normal forms: BCNF, 3NF, …


SCHEMA DECOMPOSITION



WHAT IS DECOMPOSITION?
We decompose R(A1, …, An) by creating
• R1(B1, …, Bm)
• R2(C1, …, Cl)
• where {B1, …, Bm} ∪ {C1, …, Cl} = {A1, …, An}
• The instance of R1 is the projection of R onto B1, …, Bm
• The instance of R2 is the projection of R onto C1, …, Cl
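As a minimal sketch of the definition above, the following Python snippet decomposes a relation instance by projection. The `project` helper and the list-of-dicts representation are illustrative assumptions, not part of the course material; the sample rows are the ones used in the decomposition example of these slides.

```python
def project(rows, attrs):
    """Project a relation instance (list of dicts) onto attrs, dropping duplicates."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:  # relations are sets: keep each tuple once
            result.append(projected)
    return result


R = [
    {"SSN": "934729837", "name": "Paris", "age": 24, "phoneNumber": "608-374-8422"},
    {"SSN": "934729837", "name": "Paris", "age": 24, "phoneNumber": "603-534-8399"},
    {"SSN": "123123645", "name": "John", "age": 30, "phoneNumber": "608-321-1163"},
    {"SSN": "384475687", "name": "Arun", "age": 20, "phoneNumber": "206-473-8221"},
]

R1 = project(R, ["SSN", "name", "age"])      # instance of R1
R2 = project(R, ["SSN", "phoneNumber"])      # instance of R2
print(len(R1), len(R2))
```

The duplicate (SSN, name, age) tuple for Paris collapses to a single row in `R1`, which is exactly the redundancy the decomposition removes.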


EXAMPLE: DECOMPOSITION
SSN        name   age  phoneNumber
934729837  Paris  24   608-374-8422
934729837  Paris  24   603-534-8399
123123645  John   30   608-321-1163
384475687  Arun   20   206-473-8221

Projection onto (SSN, name, age):
SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

Projection onto (SSN, phoneNumber):
SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221


DECOMPOSITION DESIDERATA

What should a good decomposition achieve?

1. minimize redundancy
2. avoid information loss (lossless-join)
3. preserve the FDs (dependency preserving)
4. ensure good query performance



EXAMPLE: INFORMATION LOSS
name   age  phoneNumber
Paris  24   608-374-8422
John   24   608-321-1163
Arun   20   206-473-8221

Decompose into:
• R1(name, age)
• R2(age, phoneNumber)

R1:
name   age
Paris  24
John   24
Arun   20

R2:
age  phoneNumber
24   608-374-8422
24   608-321-1163
20   206-473-8221

Can we put it back together?

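One way to answer the question is to join the two projections back and compare with the original instance. The sketch below does this in Python; the `natural_join` helper and the list-of-dicts representation are assumptions made for illustration.

```python
def natural_join(r1, r2):
    """Natural join of two relation instances given as lists of dicts."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            shared = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in shared):
                merged = {**t1, **t2}
                if merged not in joined:
                    joined.append(merged)
    return joined


original = [
    {"name": "Paris", "age": 24, "phoneNumber": "608-374-8422"},
    {"name": "John", "age": 24, "phoneNumber": "608-321-1163"},
    {"name": "Arun", "age": 20, "phoneNumber": "206-473-8221"},
]
R1 = [{"name": t["name"], "age": t["age"]} for t in original]
R2 = [{"age": t["age"], "phoneNumber": t["phoneNumber"]} for t in original]

recovered = natural_join(R1, R2)
spurious = [t for t in recovered if t not in original]
print(len(recovered), spurious)
```

The join returns more tuples than the original instance: Paris and John share the age 24, so each is paired with the other's phone number. The extra tuples mean the original instance can no longer be told apart from the recovered one, which is the information loss.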
LOSSLESS-JOIN DECOMPOSITION
R(A, B, C)
  ↓ decompose (projection)
R1(A, B), R2(B, C)
  ↓ recover (natural join)
R’(A, B, C)

A schema decomposition is lossless-join if for any initial instance R, R = R’

LOSSLESS-JOIN CRITERION
• relation R(A) + set F of FDs
• decomposition of R into R1(A1) and R2(A2)

A decomposition is lossless-join if and only if at least one of the following FDs is in F+ (the closure of F):
1. 𝑨𝟏 ∩ 𝑨𝟐 ⟶ 𝑨𝟏
2. 𝑨𝟏 ∩ 𝑨𝟐 ⟶ 𝑨𝟐
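
The criterion can be checked mechanically with attribute closures: compute the closure of A1 ∩ A2 and test whether it covers A1 or A2. The sketch below is one possible Python rendering; the function names and the representation of FDs as (left-hand side, right-hand side) pairs of sets are assumptions. The sample relation is R(A, B, C, D) with the FD A ⟶ B, C.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(a1, a2, fds):
    """True if splitting into attribute sets a1 and a2 is lossless-join."""
    common_closure = closure(set(a1) & set(a2), fds)
    return set(a1) <= common_closure or set(a2) <= common_closure


fds = [({"A"}, {"B", "C"})]
print(is_lossless({"A", "B", "C"}, {"A", "D"}, fds))
print(is_lossless({"A", "B", "C"}, {"D"}, fds))
```

Testing containment in the closure of A1 ∩ A2 is equivalent to asking whether the corresponding FD is in F+, so F+ itself never has to be materialized.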



EXAMPLE
• relation R(A, B, C, D)
• FD 𝐴 ⟶ 𝐵, 𝐶

Lossless-join:
• decomposition into R1(A, B, C) and R2(A, D), since the shared attribute A determines all of R1

Not lossless-join:
• decomposition into R1(A, B, C) and R2(D), since the two relations share no attributes


DEPENDENCY PRESERVING
Given R and a set of FDs F, we decompose R into R1
and R2. Suppose:
– R1 has a set of FDs F1
– R2 has a set of FDs F2
– F1 and F2 are computed from F

A decomposition is dependency preserving if by enforcing F1 over R1 and F2 over R2, we can enforce F over R

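A common textbook way to test this property avoids computing F1 and F2 explicitly: for each FD X ⟶ Y in F, grow X using only closures restricted to one relation at a time, then check whether Y was reached. The sketch below follows that approach; it is not taken from the slides, and the function names and set-based FD representation are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fds, parts):
    """True if every FD in fds can be enforced using only FDs local to each part."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # attributes derivable from `reached` without leaving `part`
                gained = closure(reached & part, fds) & part
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not rhs <= reached:
            return False
    return True


person_fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"canDrink"})]
print(preserves(person_fds, [{"SSN", "name", "age"}, {"age", "canDrink"}]))

abc_fds = [({"A"}, {"B"}), ({"B", "C"}, {"A"})]
print(preserves(abc_fds, [{"A", "B"}, {"A", "C"}]))
```

The two calls use the Person and R(A, B, C) schemas from the good and bad examples in these slides; the first decomposition preserves its FDs, the second loses B, C ⟶ A.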
GOOD EXAMPLE
Person(SSN, name, age, canDrink)
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘

decomposes into
• R1(SSN, name, age)
– 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• R2(age, canDrink)
– 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘

BAD EXAMPLE
R(A, B, C)
• 𝐴 ⟶ 𝐵
• 𝐵, 𝐶 ⟶ 𝐴

Decomposes into:
• R1(A, B)
– 𝐴 ⟶ 𝐵
• R2(A, C)
– no FDs here!!

R1:
A   B
a1  b
a2  b

R2:
A   C
a1  c
a2  c

Recover (natural join of R1 and R2):
A   B  C
a1  b  c
a2  b  c

The recovered table violates 𝐵, 𝐶 ⟶ 𝐴

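The violation can be confirmed on the instances themselves. The following sketch checks an FD against a concrete instance; the `satisfies` helper and the list-of-dicts representation are illustrative assumptions.

```python
def satisfies(rows, lhs, rhs):
    """True if the instance rows (list of dicts) satisfies the FD lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # two rows that agree on lhs must also agree on rhs
        if seen.setdefault(key, value) != value:
            return False
    return True


R1 = [{"A": "a1", "B": "b"}, {"A": "a2", "B": "b"}]
R2 = [{"A": "a1", "C": "c"}, {"A": "a2", "C": "c"}]

# natural join of R1 and R2 on their shared attribute A
recovered = [{**t1, **t2} for t1 in R1 for t2 in R2 if t1["A"] == t2["A"]]

print(satisfies(R1, ["A"], ["B"]))
print(satisfies(recovered, ["A"], ["B"]))
print(satisfies(recovered, ["B", "C"], ["A"]))
```

Both local instances are legal, yet the joined table contains two rows with the same (B, C) values and different A values, so a constraint check confined to R1 and R2 cannot catch the violation.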
NORMAL FORMS
A normal form represents a “good” schema design:
• 1NF (flat tables/atomic values)
• 2NF
• 3NF
• BCNF
• 4NF
• …

Each normal form in this list is more restrictive than the one before it.


BCNF DECOMPOSITION



BOYCE-CODD NORMAL FORM (BCNF)

A relation R is in BCNF if whenever 𝑋 ⟶ 𝐵 is a non-trivial FD, then X is a superkey in R

Equivalent definition: for every attribute set X
• either X+ = X
• or X+ = all attributes
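The equivalent definition translates directly into a brute-force test: enumerate attribute sets X and look for one whose closure is neither X nor all attributes. The sketch below is exponential in the number of attributes and meant only for small schemas; the function names and FD representation are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(attrs, fds):
    """Return some X with X != X+ != attrs, or None if the relation is in BCNF."""
    attrs = set(attrs)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            x_plus = closure(x, fds) & attrs  # closure restricted to this relation
            if x_plus != x and x_plus != attrs:
                return x
    return None


fds = [({"SSN"}, {"name", "age"})]
print(bcnf_violation({"SSN", "name", "age", "phoneNumber"}, fds))
print(bcnf_violation({"SSN", "name", "age"}, fds))
print(bcnf_violation({"SSN", "phoneNumber"}, fds))
```

Intersecting the closure with the relation's own attributes lets the same function be applied to a sub-relation while still using the FDs of the original schema.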


BCNF EXAMPLE 1
SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221

𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒


• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 is a “bad” FD
• The above relation is not in BCNF!



BCNF EXAMPLE 2
SSN name age
934729837 Paris 24
123123645 John 30
384475687 Arun 20

𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒


• key = {𝑆𝑆𝑁}
• The above relation is in BCNF!



BCNF EXAMPLE 3
SSN phoneNumber
934729837 608-374-8422
934729837 603-534-8399
123123645 608-321-1163
384475687 206-473-8221

• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}


• The above relation is in BCNF!
• Q: is it possible that a binary relation is not in BCNF?

BCNF DECOMPOSITION
• Find an FD that violates the BCNF condition
  A1, A2, …, An ⟶ B1, B2, …, Bm
• Decompose R to R1 and R2:
– R1 contains the A’s and the B’s
– R2 contains the A’s and the remaining attributes
• Continue until no BCNF violations are left

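The steps above can be sketched as a short recursive procedure. This version makes two assumptions beyond the slide: it takes the B’s to be everything in the closure of the A’s (a common variant), and it finds violations by brute-force enumeration of attribute sets, which is only practical for small schemas. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(attrs, fds):
    """Return some X with X != X+ != attrs (within attrs), or None."""
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            x_plus = closure(x, fds) & attrs
            if x_plus != x and x_plus != attrs:
                return x
    return None


def bcnf_decompose(attrs, fds):
    """Split attrs recursively until no relation has a BCNF violation."""
    attrs = set(attrs)
    x = bcnf_violation(attrs, fds)
    if x is None:
        return [attrs]
    x_plus = closure(x, fds) & attrs
    r1 = x_plus                    # the A's and the B's
    r2 = x | (attrs - x_plus)      # the A's and the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


books = {"author", "gender", "booktitle", "genre", "price"}
fds = [({"author"}, {"gender"}), ({"booktitle"}, {"genre", "price"})]
result = bcnf_decompose(books, fds)
print(result)
```

On the Books schema used later in these slides, the procedure produces the relations (author, gender), (booktitle, genre, price), and (author, booktitle). Different choices of the violating FD can lead to different, equally valid BCNF decompositions.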
EXAMPLE
SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221

• The FD 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 violates BCNF


• Split into two relations R1, R2 as follows:
– R1(SSN, name, age)
– R2(SSN, phoneNumber)


EXAMPLE CONT’D

R1(SSN, name, age) with FD 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
R2(SSN, phoneNumber)

R1:
SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

R2:
SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221


BCNF DECOMPOSITION PROPERTIES

BCNF decomposition:
– removes certain types of redundancy
– is lossless-join
– is not always dependency preserving



BCNF IS LOSSLESS-JOIN
Example:
R(A, B, C) with 𝐴 ⟶ 𝐵 decomposes into:
R1(A, B) and R2(A, C)

• BCNF decomposition satisfies the lossless-join criterion: the shared attribute A determines all of R1(A, B)


BCNF IS NOT DEPENDENCY PRESERVING

R(A, B, C)
• 𝐴⟶𝐵
• 𝐵, 𝐶 ⟶ 𝐴

The BCNF decomposition is:


• R1(A, B) with FD 𝐴 ⟶ 𝐵
• R2(A, C) with no FDs

There may not exist any BCNF decomposition that is FD preserving!

BCNF EXAMPLE (1)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

What is the candidate key?


• (author, booktitle) is the only one!

Is it in BCNF?
• No, because the left-hand side of each of the two non-trivial FDs is not a superkey!
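
The claim that (author, booktitle) is the only candidate key can be checked by enumerating attribute sets in order of size and keeping the minimal ones whose closure covers the whole relation. The sketch below is a brute-force illustration with hypothetical function names, suitable only for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure contains every attribute."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if any(key <= x for key in keys):
                continue  # contains a smaller key, so x is not minimal
            if closure(x, fds) >= attrs:
                keys.append(x)
    return keys


books = {"author", "gender", "booktitle", "genre", "price"}
fds = [({"author"}, {"gender"}), ({"booktitle"}, {"genre", "price"})]
keys = candidate_keys(books, fds)
print(keys)

# neither left-hand side is a superkey, so both FDs violate BCNF
print([closure(lhs, fds) >= books for lhs, _ in fds])
```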



BCNF EXAMPLE (2)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:


• Author (author, gender): in BCNF!
  FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• Books2 (author, booktitle, genre, price): not in BCNF!
  FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒


BCNF EXAMPLE (3)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:


• Author (author, gender): in BCNF!
  FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• Splitting Books2 (author, booktitle, genre, price):
– BookInfo (booktitle, genre, price): in BCNF!
  FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒
– BookAuthor (author, booktitle): in BCNF!


THIRD NORMAL FORM (3NF)



3NF DEFINITION

A relation R is in 3NF if whenever 𝑋 ⟶ 𝐴, one of the following is true:
• 𝐴 ∈ 𝑋 (trivial FD)
• X is a superkey
• A is part of some key of R (prime attribute)

BCNF implies 3NF



3NF CONT’D
• Example: R(A, B, C) with 𝐴, 𝐵 ⟶ 𝐶 and 𝐶 ⟶ 𝐴
– is in 3NF. Why?
– is not in BCNF. Why?

• Compromise used when BCNF not achievable: aim for BCNF and settle for 3NF
• Lossless-join and dependency preserving decomposition into a collection of 3NF relations is always possible!

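The two "Why?" questions for R(A, B, C) can be explored with a small checker. The sketch below tests the 3NF and BCNF conditions on the given FDs of a relation; the function names are hypothetical, and the candidate keys are found by brute force, so it is only meant for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure contains every attribute."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= attrs:
                keys.append(x)
    return keys


def is_bcnf(attrs, fds):
    """Every non-trivial FD must have a superkey on its left-hand side."""
    return all(rhs <= lhs or closure(lhs, fds) >= attrs for lhs, rhs in fds)


def is_3nf(attrs, fds):
    """Like BCNF, but prime attributes are allowed on the right-hand side."""
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # lhs is a superkey
        if any(a not in prime for a in rhs - lhs):
            return False
    return True


attrs = {"A", "B", "C"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"A"})]
print(candidate_keys(attrs, fds), is_3nf(attrs, fds), is_bcnf(attrs, fds))
```

The keys are {A, B} and {B, C}, so every attribute is prime and the relation is in 3NF; C ⟶ A has a left-hand side that is not a superkey, so the relation is not in BCNF.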
3NF ALGORITHM

1. Apply the algorithm for BCNF decomposition until all relations are in 3NF (we can stop earlier than BCNF)
2. Compute a minimal basis F’ of F
3. For each non-preserved FD 𝑋 ⟶ 𝐴 in F’, add a new relation R(X, A)
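
Step 3 can be sketched as follows, assuming the decomposition from step 1 and the minimal basis from step 2 are already available. The preservation test used here is a standard textbook one (grow the left-hand side using closures restricted to one relation at a time); it is an assumption, since the slides do not spell out how to detect a non-preserved FD. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd_preserved(lhs, rhs, parts, fds):
    """True if lhs -> rhs follows from FDs that are local to the given parts."""
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            gained = closure(reached & part, fds) & part
            if not gained <= reached:
                reached |= gained
                changed = True
    return set(rhs) <= reached


def add_missing_relations(parts, basis, fds):
    """Step 3: add a relation (X, A) for each FD X -> A that is not preserved."""
    parts = [set(p) for p in parts]
    for lhs, rhs in basis:
        if not fd_preserved(lhs, rhs, parts, fds):
            parts.append(set(lhs) | set(rhs))
    return parts


# R(A, B, C) with A -> B and B, C -> A, split into R1(A, B) and R2(A, C)
fds = [({"A"}, {"B"}), ({"B", "C"}, {"A"})]
print(add_missing_relations([{"A", "B"}, {"A", "C"}], fds, fds))
```

For this schema the FD B, C ⟶ A is not preserved, so a relation over (A, B, C) is added. A practical implementation would usually also drop relations that are contained in another one, which this sketch does not do.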


3NF EXAMPLE (1)
Start with relation R (A, B, C, D) with FDs:
• 𝐴⟶𝐷
• 𝐴, 𝐵 ⟶ 𝐶
• 𝐴, 𝐷 ⟶ 𝐶
• 𝐵⟶𝐶
• 𝐷 ⟶ 𝐴, 𝐵

Step 1: find a BCNF decomposition


• R1 (B, C)
• R2 (A, B, D)



3NF EXAMPLE (2)
Start with relation R (A, B, C, D) with FDs:
• 𝐴 ⟶𝐷
• 𝐴, 𝐵 ⟶ 𝐶
• 𝐴, 𝐷 ⟶ 𝐶
• 𝐵⟶𝐶
• 𝐷 ⟶ 𝐴, 𝐵

Step 2: compute a minimal basis of the original set of FDs:


• 𝐴⟶𝐷
• 𝐵⟶𝐶
• 𝐷⟶𝐴
• 𝐷⟶𝐵
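
The basis above can be reproduced with the usual three-phase procedure: split right-hand sides into single attributes, drop extraneous left-hand-side attributes, then drop redundant FDs. The sketch below is one possible rendering with hypothetical names; a minimal basis is not unique in general, so other inputs or orderings may yield a different but equivalent result.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_basis(fds):
    # 1. split every right-hand side into single attributes
    split = [(set(lhs), {a}) for lhs, rhs in fds for a in sorted(rhs)]

    # 2. remove left-hand-side attributes that are not needed
    reduced = []
    for lhs, rhs in split:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, split):
                lhs.discard(a)
        reduced.append((lhs, rhs))

    # 3. remove FDs implied by the remaining ones (this also drops duplicates)
    basis = list(reduced)
    for fd in reduced:
        rest = [f for f in basis if f is not fd]
        if fd[1] <= closure(fd[0], rest):
            basis = rest
    return basis


fds = [
    ({"A"}, {"D"}),
    ({"A", "B"}, {"C"}),
    ({"A", "D"}, {"C"}),
    ({"B"}, {"C"}),
    ({"D"}, {"A", "B"}),
]
basis = minimal_basis(fds)
for lhs, rhs in basis:
    print(sorted(lhs), "->", sorted(rhs))
```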



3NF EXAMPLE (3)
Start with relation R (A, B, C, D) with FDs:
• 𝐴⟶𝐷
• 𝐴, 𝐵 ⟶ 𝐶
• 𝐴, 𝐷 ⟶ 𝐶
• 𝐵⟶𝐶
• 𝐷 ⟶ 𝐴, 𝐵

Step 3: add a new relation for any FD in the basis that is not preserved:
• all the dependencies in F’ are preserved!
• the resulting decomposition R1, R2 is also BCNF!


IS NORMALIZATION ALWAYS GOOD?

• Example: suppose A and B are always used together, but normalization says they should be in different tables
– decomposition might produce unacceptable performance loss
• Example: data warehouses
– huge historical DBs, rarely updated after creation
– joins expensive or impractical


RECAP
• Bad schemas lead to redundancy
• To “correct” bad schemas: decompose relations
– lossless-join
– dependency preserving
• Desired normal forms
– BCNF: only superkey FDs
– 3NF: superkey FDs + dependencies with prime attributes on the RHS
