!! d) A manufacturer of a PC must also make a laptop with at least as great a
processor speed.

! e) If a laptop has a larger main memory than a PC, then the laptop must
also have a higher price than the PC.

Exercise 2.5.2: Express the following constraints in relational algebra. The
constraints are based on the relations of Exercise 2.3.2:

Classes(class, type, country, numGuns, bore, displacement)
Ships(name, class, launched)
Battles(name, date)
Outcomes(ship, battle, result)

You may write your constraints either as containments or by equating an
expression to the empty set. For the data of Exercise 2.4.3, indicate any violations
to your constraints.

a) No class of ships may have guns with larger than 16-inch bore.

b) If a class of ships has more than 9 guns, then their bore must be no larger
than 14 inches.

! c) No class may have more than 2 ships.

! d) No country may have both battleships and battlecruisers.

!! e) No ship with more than 9 guns may be in a battle with a ship having
fewer than 9 guns that was sunk.

! Exercise 2.5.3: Suppose R and S are two relations. Let C be the referential
integrity constraint that says: whenever R has a tuple with some values
$v_1, v_2, \ldots, v_n$ in particular attributes $A_1, A_2, \ldots, A_n$, there must be a tuple of S
that has the same values $v_1, v_2, \ldots, v_n$ in particular attributes $B_1, B_2, \ldots, B_n$.
Show how to express constraint C in relational algebra.

! Exercise 2.5.4: Another algebraic way to express a constraint is $E_1 = E_2$,
where both $E_1$ and $E_2$ are relational-algebra expressions. Can this form of
constraint express more than the two forms we discussed in this section?

2.6 Summary of Chapter 2


♦ Data Models: A data model is a notation for describing the structure of
the data in a database, along with the constraints on that data. The data
model also normally provides a notation for describing operations on that
data: queries and data modifications.
♦ Relational Model: Relations are tables representing information. Columns
are headed by attributes; each attribute has an associated domain, or
data type. Rows are called tuples, and a tuple has one component for
each attribute of the relation.
♦ Schemas: A relation name, together with the attributes of that relation
and their types, form the relation schema. A collection of relation schemas
forms a database schema. Particular data for a relation or collection of
relations is called an instance of that relation schema or database schema.
♦ Keys: An important type of constraint on relations is the assertion that
an attribute or set of attributes forms a key for the relation. No two
tuples of a relation can agree on all attributes of the key, although they
can agree on some of the key attributes.
♦ Semistructured Data Model: In this model, data is organized in a tree or
graph structure. XML is an important example of a semistructured data
model.
♦ SQL: The language SQL is the principal query language for relational
database systems. The current standard is called SQL-99. Commercial
systems generally vary from this standard but adhere to much of it.
♦ Data Definition: SQL has statements to declare elements of a database
schema. The CREATE TABLE statement allows us to declare the schema
for stored relations (called tables), specifying the attributes, their types,
default values, and keys.
♦ Altering Schemas: We can change parts of the database schema with an
ALTER statement. These changes include adding and removing attributes
from relation schemas and changing the default value associated with an
attribute. We may also use a DROP statement to completely eliminate
relations or other schema elements.
♦ Relational Algebra: This algebra underlies most query languages for the
relational model. Its principal operators are union, intersection, difference,
selection, projection, Cartesian product, natural join, theta-join,
and renaming.
♦ Selection and Projection: The selection operator produces a result
consisting of all tuples of the argument relation that satisfy the selection
condition. Projection removes undesired columns from the argument
relation to produce the result.
♦ Joins: We join two relations by comparing tuples, one from each relation.
In a natural join, we splice together those pairs of tuples that agree on all
attributes common to the two relations. In a theta-join, pairs of tuples
are concatenated if they meet a selection condition associated with the
theta-join.
♦ Constraints in Relational Algebra: Many common kinds of constraints can
be expressed as the containment of one relational algebra expression in
another, or as the equality of a relational algebra expression to the empty
set.
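
As a rough illustration of the operators summarized above, the following Python sketch models a relation as a list of dictionaries, one per tuple. The function names (select, project, natural_join) and the relations R(A, B) and S(B, C) are assumptions made up for this illustration; they are not notation from the chapter.

```python
def select(relation, condition):
    """Selection: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]

def project(relation, attributes):
    """Projection: keep only the listed attributes, dropping duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

def natural_join(r, s):
    """Natural join: splice pairs of tuples that agree on all common attributes."""
    result = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result

# Hypothetical sample relations R(A, B) and S(B, C).
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 5}, {"B": 2, "C": 6}]

print(select(R, lambda t: t["A"] > 1))  # [{'A': 3, 'B': 4}]
print(project(S, ["B"]))                # [{'B': 2}]
print(natural_join(R, S))
# [{'A': 1, 'B': 2, 'C': 5}, {'A': 1, 'B': 2, 'C': 6}]

# A constraint written as "expression = empty set": no tuple of R has B < 0.
print(select(R, lambda t: t["B"] < 0) == [])  # True
```

The last line shows the second form of constraint: the constraint holds exactly when the selection returns no tuples.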

2.7 References for Chapter 2


The classic paper by Codd on the relational model is [1]. This paper introduces
relational algebra, as well. The use of relational algebra to describe constraints
is from [2]. References for SQL are given in the bibliographic notes for Chapter 6.
The semistructured data model is from [3]. XML is a standard developed
by the World-Wide-Web Consortium. The home page for information about
XML is [4].

1. E. F. Codd, “A relational model for large shared data banks,” Comm.
ACM 13:6, pp. 377-387, 1970.
2. J.-M. Nicolas, “Logic for improving integrity checking in relational
databases,” Acta Informatica 18:3, pp. 227-253, 1982.
3. Y. Papakonstantinou, H. Garcia-Molina, and J. Widom, “Object exchange
across heterogeneous information sources,” IEEE Intl. Conf. on
Data Engineering, pp. 251-260, March 1995.
4. World-Wide-Web Consortium, http://www.w3.org/XML/

Chapter 3

Design Theory for Relational Databases

There are many ways we could go about designing a relational database schema
for an application. In Chapter 4 we shall see several high-level notations for
describing the structure of data and the ways in which these high-level designs
can be converted into relations. We can also examine the requirements for a
database and define relations directly, without going through a high-level
intermediate stage. Whatever approach we use, it is common for an initial relational
schema to have room for improvement, especially by eliminating redundancy.
Often, the problems with a schema involve trying to combine too much into
one relation.
Fortunately, there is a well developed theory for relational databases:
“dependencies,” their implications for what makes a good relational database
schema, and what we can do about a schema if it has flaws. In this chapter,
we first identify the problems that are caused in some relation schemas by the
presence of certain dependencies; these problems are referred to as “anomalies.”
Our discussion starts with “functional dependencies,” a generalization of the
idea of a key for a relation. We then use the notion of functional dependencies
to define normal forms for relation schemas. The impact of this theory, called
“normalization,” is that we decompose relations into two or more relations when
that will remove anomalies. Next, we introduce “multivalued dependencies,”
which intuitively represent a condition where one or more attributes of a relation
are independent from one or more other attributes. These dependencies also
lead to normal forms and decomposition of relations to eliminate redundancy.

3.1 Functional Dependencies


There is a design theory for relations that lets us examine a design carefully
and make improvements based on a few simple principles. The theory begins by
having us state the constraints that apply to the relation. The most common
constraint is the “functional dependency,” a statement of a type that generalizes
the idea of a key for a relation, which we introduced in Section 2.5.3. Later in
this chapter, we shall see how this theory gives us simple tools to improve our
designs by the process of “decomposition” of relations: the replacement of one
relation by several, whose sets of attributes together include all the attributes
of the original.

3.1.1 Definition of Functional Dependency


A functional dependency (FD) on a relation R is a statement of the form “If two
tuples of R agree on all of the attributes $A_1, A_2, \ldots, A_n$ (i.e., the tuples have
the same values in their respective components for each of these attributes),
then they must also agree on all of another list of attributes $B_1, B_2, \ldots, B_m$.”
We write this FD formally as $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ and say that
“$A_1, A_2, \ldots, A_n$ functionally determine $B_1, B_2, \ldots, B_m$.”
Figure 3.1 suggests what this FD tells us about any two tuples t and u in the
relation R. However, the A’s and B’s can be anywhere; it is not necessary for
the A’s and B’s to appear consecutively or for the A’s to precede the B’s.

[Diagram: if t and u agree on the A’s, then they must agree on the B’s.]

Figure 3.1: The effect of a functional dependency on two tuples.

If we can be sure every instance of a relation R will be one in which a given
FD is true, then we say that R satisfies the FD. It is important to remember
that when we say that R satisfies an FD f, we are asserting a constraint on R,
not just saying something about one particular instance of R.
It is common for the right side of an FD to be a single attribute. In fact,
we shall see that the one functional dependency $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ is
equivalent to the set of FD’s:

$A_1 A_2 \cdots A_n \to B_1$
$A_1 A_2 \cdots A_n \to B_2$
$\cdots$
$A_1 A_2 \cdots A_n \to B_m$
title               year  length  genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone With the Wind  1939  231     drama   MGM         Vivien Leigh
Wayne’s World       1992  95      comedy  Paramount   Dana Carvey
Wayne’s World       1992  95      comedy  Paramount   Mike Meyers

Figure 3.2: An instance of the relation Movies1(title, year, length,
genre, studioName, starName)

Example 3.1: Let us consider the relation

Movies1(title, year, length, genre, studioName, starName)

an instance of which is shown in Fig. 3.2. While related to our running Movies
relation, it has additional attributes, which is why we call it “Movies1” instead
of “Movies.” Notice that this relation tries to “do too much.” It holds
information that in our running database schema was attributed to three different
relations: Movies, Studio, and StarsIn. As we shall see, the schema for
Movies1 is not a good design. But to see what is wrong with the design, we
must first determine the functional dependencies that hold for the relation. We
claim that the following FD holds:

title year → length genre studioName

Informally, this FD says that if two tuples have the same value in their
title components, and they also have the same value in their year components,
then these two tuples must also have the same values in their length
components, the same values in their genre components, and the same values
in their studioName components. This assertion makes sense, since we believe
that it is not possible for there to be two movies released in the same year
with the same title (although there could be movies of the same title released
in different years). This point was discussed in Example 2.1. Thus, we expect
that given a title and year, there is a unique movie. Therefore, there is a unique
length for the movie, a unique genre, and a unique studio.
On the other hand, we observe that the statement

title year → starName

is false; it is not a functional dependency. Given a movie, it is entirely possible
that there is more than one star for the movie listed in our database. Notice
that even had we been lazy and only listed one star for Star Wars and one star
for Wayne’s World (just as we only listed one of the many stars for Gone With
the Wind), this FD would not suddenly become true for the relation Movies1.
The reason is that the FD says something about all possible instances of the
relation, not about one of its instances. The fact that we could have an instance
with multiple stars for a movie rules out the possibility that title and year
functionally determine starName. □
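
The two claims of Example 3.1 can be checked mechanically against the instance of Fig. 3.2. The sketch below is only an illustration: the helper name satisfies_fd and the list-of-dictionaries representation are assumptions, not part of the text. Note also that a result of True means only that this one instance does not violate the FD; as the example stresses, an FD is a statement about all possible instances, so an instance can refute an FD but never establish it.

```python
from itertools import combinations

def satisfies_fd(rows, lhs, rhs):
    """True if no two rows agree on every lhs attribute yet differ on some rhs attribute."""
    for t, u in combinations(rows, 2):
        if all(t[a] == u[a] for a in lhs) and any(t[b] != u[b] for b in rhs):
            return False
    return True

ATTRS = ("title", "year", "length", "genre", "studioName", "starName")
# The instance of Fig. 3.2.
movies1 = [dict(zip(ATTRS, row)) for row in [
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Harrison Ford"),
    ("Gone With the Wind", 1939, 231, "drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Mike Meyers"),
]]

# title year -> length genre studioName is not violated by this instance.
print(satisfies_fd(movies1, ["title", "year"], ["length", "genre", "studioName"]))  # True
# title year -> starName is violated, e.g. by the first two Star Wars tuples.
print(satisfies_fd(movies1, ["title", "year"], ["starName"]))  # False
```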

3.1.2 Keys of Relations


We say a set of one or more attributes $\{A_1, A_2, \ldots, A_n\}$ is a key for a relation
R if:
1. Those attributes functionally determine all other attributes of the relation.
That is, it is impossible for two distinct tuples of R to agree on all
of $A_1, A_2, \ldots, A_n$.
2. No proper subset of $\{A_1, A_2, \ldots, A_n\}$ functionally determines all other
attributes of R; i.e., a key must be minimal.
When a key consists of a single attribute A, we often say that A (rather than
{A}) is a key.
Example 3.2: Attributes {title, year, starName} form a key for the relation
Movies1 of Fig. 3.2. First, we must show that they functionally determine all
the other attributes. That is, suppose two tuples agree on these three attributes:
title, year, and starName. Because they agree on title and year, they must
agree on the other attributes — length, genre, and studioName — as we
discussed in Example 3.1. Thus, two different tuples cannot agree on all of
title, year, and starName; they would in fact be the same tuple.
Now, we must argue that no proper subset of {title, year, starName}
functionally determines all other attributes. To see why, begin by observing
that title and year do not determine starName, because many movies have
more than one star. Thus, {title, year} is not a key.
{year, starName} is not a key because we could have a star in two movies
in the same year; therefore
year starName → title

is not an FD. Also, we claim that {title, starName} is not a key, because two
movies with the same title, made in different years, occasionally have a star in
common.1 □
Sometimes a relation has more than one key. If so, it is common to desig­
nate one of the keys as the primary key. In commercial database systems, the
choice of primary key can influence some implementation issues such as how
the relation is stored on disk. However, the theory of FD’s gives no special role
to “primary keys.”
1 Since we asserted in an earlier book that there were no known examples of this
phenomenon, several people have shown us we were wrong. It’s an interesting challenge to
discover stars that appeared in two versions of the same movie.
What Is “Functional” About Functional Dependencies?

$A_1 A_2 \cdots A_n \to B$ is called a “functional” dependency because in principle
there is a function that takes a list of values, one for each of attributes
$A_1, A_2, \ldots, A_n$ and produces a unique value (or no value at all) for B.
For instance, in the Movies1 relation, we can imagine a function that
takes a string like "Star Wars" and an integer like 1977 and produces the
unique value of length, namely 124, that appears in the relation Movies1.
However, this function is not the usual sort of function that we meet in
mathematics, because there is no way to compute it from first principles.
That is, we cannot perform some operations on strings like "Star Wars"
and integers like 1977 and come up with the correct length. Rather, the
function is only computed by lookup in the relation. We look for a tuple
with the given title and year values and see what value that tuple has
for length.
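
The “function computed by lookup” can be made concrete with a dictionary. The sketch below is an illustration only; the helper name fd_lookup is an assumption, and the rows are the title, year, and length components of the tuples in Fig. 3.2.

```python
def fd_lookup(rows, lhs, rhs):
    """Build the lookup table behind the FD lhs -> rhs; raise if the rows violate it."""
    table = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the value on first sight and returns the stored one after.
        if table.setdefault(key, value) != value:
            raise ValueError(f"FD violated for {key}")
    return table

movies1 = [
    {"title": "Star Wars", "year": 1977, "length": 124},
    {"title": "Gone With the Wind", "year": 1939, "length": 231},
    {"title": "Wayne's World", "year": 1992, "length": 95},
]

length_of = fd_lookup(movies1, ["title", "year"], ["length"])
print(length_of[("Star Wars", 1977)])      # (124,)
print(length_of.get(("Star Wars", 1980)))  # None, i.e. "no value at all"
```

Nothing in the code derives 124 from the string and the year; the value comes only from the stored tuples, which is the point the box makes.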

3.1.3 Superkeys
A set of attributes that contains a key is called a superkey, short for “superset
of a key.” Thus, every key is a superkey. However, some superkeys are not
(minimal) keys. Note that every superkey satisfies the first condition of a key: it
functionally determines all other attributes of the relation. However, a superkey
need not satisfy the second condition: minimality.

Example 3.3: In the relation of Example 3.2, there are many superkeys. Not
only is the key

{title, year, starName}

a superkey, but any superset of this set of attributes, such as

{title, year, starName, length, studioName}

is a superkey. □
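
The definitions of key and superkey can be explored by brute force on a small schema. The sketch below is an illustration under two assumptions: the only FD given for Movies1 is the one from Example 3.1, and “functionally determines all attributes” is tested with the attribute-closure computation that the text develops in Section 3.2.4. The function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs; fds is a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def superkeys(all_attrs, fds):
    """Every nonempty attribute set that functionally determines all attributes."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            if closure(subset, fds) == set(all_attrs):
                found.append(frozenset(subset))
    return found

def keys(all_attrs, fds):
    """Superkeys that have no proper subset that is also a superkey."""
    sks = superkeys(all_attrs, fds)
    return [k for k in sks if not any(other < k for other in sks)]

ATTRS = {"title", "year", "length", "genre", "studioName", "starName"}
# Assumption: the FD of Example 3.1 is the only nontrivial FD.
FDS = [({"title", "year"}, {"length", "genre", "studioName"})]

print(keys(ATTRS, FDS))            # the single key {title, year, starName}
print(len(superkeys(ATTRS, FDS)))  # 8
```

The count of 8 reflects that each of the three attributes outside the key may or may not be added to it.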

3.1.4 Exercises for Section 3.1


Exercise 3.1.1: Consider a relation about people in the United States, including
their name, Social Security number, street address, city, state, ZIP code,
area code, and phone number (7 digits). What FD’s would you expect to hold?
What are the keys for the relation? To answer this question, you need to know
something about the way these numbers are assigned. For instance, can an area
code straddle two states? Can a ZIP code straddle two area codes? Can two
people have the same Social Security number? Can they have the same address
or phone number?

Other Key Terminology

In some books and articles one finds different terminology regarding keys.
One can find the term “key” used the way we have used the term “superkey,”
that is, a set of attributes that functionally determine all the
attributes, with no requirement of minimality. These sources typically use
the term “candidate key” for a key that is minimal, that is, a “key” in
the sense we use the term.

Exercise 3.1.2: Consider a relation representing the present position of
molecules in a closed container. The attributes are an ID for the molecule, the x, y,
and z coordinates of the molecule, and its velocity in the x, y, and z dimensions.
What FD’s would you expect to hold? What are the keys?

Exercise 3.1.3: Suppose R is a relation with attributes $A_1, A_2, \ldots, A_n$. As a
function of n, tell how many superkeys R has, if:

a) The only key is $A_1$.
b) The only keys are $A_1$ and $A_2$.
c) The only keys are $\{A_1, A_2\}$ and $\{A_3, A_4\}$.
d) The only keys are $\{A_1, A_2\}$ and

3.2 Rules About Functional Dependencies

In this section, we shall learn how to reason about FD’s. That is, suppose we
are told of a set of FD’s that a relation satisfies. Often, we can deduce that the
relation must satisfy certain other FD’s. This ability to discover additional FD’s
is essential when we discuss the design of good relation schemas in Section 3.3.

3.2.1 Reasoning About Functional Dependencies

Let us begin with a motivating example that will show us how we can infer a
functional dependency from other given FD’s.

Example 3.4: If we are told that a relation R(A, B, C) satisfies the FD’s
$A \to B$ and $B \to C$, then we can deduce that R also satisfies the FD $A \to C$.
How does that reasoning go? To prove that $A \to C$, we must consider two
tuples of R that agree on A and prove they also agree on C.
Let the tuples agreeing on attribute A be $(a, b_1, c_1)$ and $(a, b_2, c_2)$. Since R
satisfies $A \to B$, and these tuples agree on A, they must also agree on B. That
is, $b_1 = b_2$, and the tuples are really $(a, b, c_1)$ and $(a, b, c_2)$, where b is both $b_1$
and $b_2$. Similarly, since R satisfies $B \to C$, and the tuples agree on B, they
agree on C. Thus, $c_1 = c_2$; i.e., the tuples do agree on C. We have proved
that any two tuples of R that agree on A also agree on C, and that is the FD
$A \to C$. □

FD’s often can be presented in several different ways, without changing the
set of legal instances of the relation. We say:

• Two sets of FD’s S and T are equivalent if the set of relation instances
satisfying S is exactly the same as the set of relation instances satisfying
T.
• More generally, a set of FD’s S follows from a set of FD’s T if every
relation instance that satisfies all the FD’s in T also satisfies all the FD’s
in S.

Note then that two sets of FD’s S and T are equivalent if and only if S follows
from T, and T follows from S.
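
For a small schema, “follows from” and “equivalent” can be tested directly from these definitions. The sketch below is an illustration, not the method of the text. It relies on one simplifying observation: an FD is violated by a pair of tuples, and whether a pair violates it depends only on which attributes the two tuples agree on, so it is enough to examine pairs of tuples over the values 0 and 1, with the first tuple fixed at all zeros. The function names are assumptions.

```python
from itertools import product

def holds(fd, t, u):
    """Does the tuple pair (t, u) respect the FD lhs -> rhs?"""
    lhs, rhs = fd
    agree_left = all(t[a] == u[a] for a in lhs)
    return not agree_left or all(t[b] == u[b] for b in rhs)

def follows(attrs, target, given):
    """True if every two-tuple 0/1 instance satisfying `given` also satisfies `target`."""
    t = {a: 0 for a in attrs}
    for bits in product((0, 1), repeat=len(attrs)):
        u = dict(zip(attrs, bits))
        if all(holds(fd, t, u) for fd in given) and not holds(target, t, u):
            return False  # (t, u) is a counterexample instance
    return True

def equivalent(attrs, s_set, t_set):
    """S and T are equivalent iff S follows from T and T follows from S."""
    return (all(follows(attrs, fd, t_set) for fd in s_set)
            and all(follows(attrs, fd, s_set) for fd in t_set))

ATTRS = ("A", "B", "C")
GIVEN = [(("A",), ("B",)), (("B",), ("C",))]  # A -> B and B -> C

print(follows(ATTRS, (("A",), ("C",)), GIVEN))  # True, as in Example 3.4
print(follows(ATTRS, (("C",), ("A",)), GIVEN))  # False
print(equivalent(ATTRS, GIVEN, GIVEN + [(("A",), ("C",))]))  # True
```

The search is exponential in the number of attributes, so it is useful only for checking small examples by hand.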
In this section we shall see several useful rules about FD’s. In general, these
rules let us replace one set of FD’s by an equivalent set, or add to a set of
FD’s others that follow from the original set. An example is the transitive rule
that lets us follow chains of FD’s, as in Example 3.4. We shall also give an
algorithm for answering the general question of whether one FD follows from
one or more other FD’s.

3.2.2 The Splitting/Combining Rule

Recall that in Section 3.1.1 we commented that the FD:

$A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$

was equivalent to the set of FD’s:

$A_1 A_2 \cdots A_n \to B_1$, $A_1 A_2 \cdots A_n \to B_2$, $\ldots$, $A_1 A_2 \cdots A_n \to B_m$

That is, we may split attributes on the right side so that only one attribute
appears on the right of each FD. Likewise, we can replace a collection of FD’s
having a common left side by a single FD with the same left side and all the
right sides combined into one set of attributes. In either event, the new set of
FD’s is equivalent to the old. The equivalence noted above can be used in two
ways.

• We can replace an FD $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ by a set of FD’s
$A_1 A_2 \cdots A_n \to B_i$ for i = 1, 2, ..., m. This transformation we call the
splitting rule.
• We can replace a set of FD’s $A_1 A_2 \cdots A_n \to B_i$ for i = 1, 2, ..., m by the
single FD $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$. We call this transformation the
combining rule.
Example 3.5: In Example 3.1 the set of FD’s:

title year → length
title year → genre
title year → studioName

is equivalent to the single FD:

title year → length genre studioName

that we asserted there. □
The reason the splitting and combining rules are true should be obvious.
Suppose we have two tuples that agree in $A_1, A_2, \ldots, A_n$. As a single FD,
we would assert “then the tuples must agree in all of $B_1, B_2, \ldots, B_m$.” As
individual FD’s, we assert “then the tuples agree in $B_1$, and they agree in $B_2$,
and, ..., and they agree in $B_m$.” These two conclusions say exactly the same
thing.
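
The two rules are simple enough to write down as a pair of functions. The sketch below is an illustration; representing an FD as a pair of tuples (left side, right side) and the names split_fd and combine_fds are assumptions.

```python
def split_fd(lhs, rhs):
    """Splitting rule: one FD per right-side attribute, all with the same left side."""
    return [(tuple(lhs), (b,)) for b in rhs]

def combine_fds(fds):
    """Combining rule: merge FD's that share a left side into a single FD."""
    merged = {}
    for lhs, rhs in fds:
        key = frozenset(lhs)  # left sides are compared as sets of attributes
        merged.setdefault(key, (tuple(lhs), []))
        for b in rhs:
            if b not in merged[key][1]:
                merged[key][1].append(b)
    return [(lhs, tuple(rhs)) for lhs, rhs in merged.values()]

# The FD of Example 3.5.
single = (("title", "year"), ("length", "genre", "studioName"))
parts = split_fd(*single)
print(parts)
# [(('title', 'year'), ('length',)), (('title', 'year'), ('genre',)),
#  (('title', 'year'), ('studioName',))]
print(combine_fds(parts) == [single])  # True
```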
One might imagine that splitting could be applied to the left sides of FD’s
as well as to right sides. However, there is no splitting rule for left sides, as the
following example shows.
Example 3.6: Consider one of the FD’s such as:

title year → length

for the relation Movies1 in Example 3.1. If we try to split the left side into

title → length
year → length

then we get two false FD’s. That is, title does not functionally determine
length, since there can be several movies with the same title (e.g., King Kong)
but of different lengths. Similarly, year does not functionally determine length,
because there are certainly movies of different lengths made in any one year. □

3.2.3 Trivial Functional Dependencies


A constraint of any kind on a relation is said to be trivial if it holds for every
instance of the relation, regardless of what other constraints are assumed. When
the constraints are FD’s, it is easy to tell whether an FD is trivial. The trivial
FD’s are those FD’s $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ such that

$\{B_1, B_2, \ldots, B_m\} \subseteq \{A_1, A_2, \ldots, A_n\}$

That is, a trivial FD has a right side that is a subset of its left side. For example,
title year → title

is a trivial FD, as is

title → title

Every trivial FD holds in every relation, since it says that “two tuples that
agree in all of $A_1, A_2, \ldots, A_n$ agree in a subset of them.” Thus, we may assume
any trivial FD, without having to justify it on the basis of what FD’s are
asserted for the relation.
There is an intermediate situation in which some, but not all, of the attributes
on the right side of an FD are also on the left. This FD is not trivial,
but it can be simplified by removing from the right side of an FD those attributes
that appear on the left. That is:

• The FD $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ is equivalent to

$A_1 A_2 \cdots A_n \to C_1 C_2 \cdots C_k$

where the C’s are all those B’s that are not also A’s.

We call this rule, illustrated in Fig. 3.3, the trivial-dependency rule.

[Diagram: if t and u agree on the A’s, then they must agree on the B’s, so
surely they agree on the C’s.]

Figure 3.3: The trivial-dependency rule
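
Both the triviality test and the trivial-dependency rule reduce to set operations on the two sides of an FD. A minimal sketch follows, with hypothetical function names and FD’s represented as tuples of attribute names.

```python
def is_trivial(lhs, rhs):
    """An FD is trivial when its right side is a subset of its left side."""
    return set(rhs) <= set(lhs)

def simplify(lhs, rhs):
    """Trivial-dependency rule: drop right-side attributes that also appear on the left."""
    return tuple(lhs), tuple(b for b in rhs if b not in lhs)

print(is_trivial(("title", "year"), ("title",)))   # True
print(is_trivial(("title", "year"), ("length",)))  # False
print(simplify(("title", "year"), ("year", "length")))
# (('title', 'year'), ('length',))
```

In the last call the C’s of the rule are just length, since year is among the A’s.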

3.2.4 Computing the Closure of Attributes

Before proceeding to other rules, we shall give a general principle from which
all true rules follow. Suppose $\{A_1, A_2, \ldots, A_n\}$ is a set of attributes and S
is a set of FD’s. The closure of $\{A_1, A_2, \ldots, A_n\}$ under the FD’s in S is the
set of attributes B such that every relation that satisfies all the FD’s in set
S also satisfies $A_1 A_2 \cdots A_n \to B$. That is, $A_1 A_2 \cdots A_n \to B$ follows from
the FD’s of S. We denote the closure of a set of attributes $A_1 A_2 \cdots A_n$ by
$\{A_1, A_2, \ldots, A_n\}^+$. Note that $A_1, A_2, \ldots, A_n$ are always in $\{A_1, A_2, \ldots, A_n\}^+$
because the FD $A_1 A_2 \cdots A_n \to A_i$ is trivial when i is one of 1, 2, ..., n.

Figure 3.4: Computing the closure of a set of attributes

Figure 3.4 illustrates the closure process. Starting with the given set of
attributes, we repeatedly expand the set by adding the right sides of FD’s as
soon as we have included their left sides. Eventually, we cannot expand the set
any further, and the resulting set is the closure. More precisely:
Algorithm 3.7: Closure of a Set of Attributes.

INPUT: A set of attributes $\{A_1, A_2, \ldots, A_n\}$ and a set of FD’s S.

OUTPUT: The closure $\{A_1, A_2, \ldots, A_n\}^+$.

1. If necessary, split the FD’s of S, so each FD in S has a single attribute
on the right.
2. Let X be a set of attributes that eventually will become the closure.
Initialize X to be $\{A_1, A_2, \ldots, A_n\}$.
3. Repeatedly search for some FD
$B_1 B_2 \cdots B_m \to C$
such that all of $B_1, B_2, \ldots, B_m$ are in the set of attributes X, but C is not.
Add C to the set X and repeat the search. Since X can only grow, and
the number of attributes of any relation schema must be finite, eventually
nothing more can be added to X, and this step ends.
4. The set X, after no more attributes can be added to it, is the correct
value of $\{A_1, A_2, \ldots, A_n\}^+$.


Example 3.8: Let us consider a relation with attributes A, B, C, D, E, and
F. Suppose that this relation has the FD’s $AB \to C$, $BC \to AD$, $D \to E$, and
$CF \to B$. What is the closure of {A, B}, that is, $\{A, B\}^+$?
First, split $BC \to AD$ into $BC \to A$ and $BC \to D$. Then, start with
X = {A, B}. First, notice that both attributes on the left side of FD $AB \to C$
are in X, so we may add the attribute C, which is on the right side of that FD.
Thus, after one iteration of Step 3, X becomes {A, B, C}.
Next, we see that the left sides of $BC \to A$ and $BC \to D$ are now contained
in X, so we may add to X the attributes A and D. A is already there, but
D is not, so X next becomes {A, B, C, D}. At this point, we may use the FD
$D \to E$ to add E to X, which is now {A, B, C, D, E}. No more changes to X
are possible. In particular, the FD $CF \to B$ cannot be used, because its left
side never becomes contained in X. Thus, $\{A, B\}^+ = \{A, B, C, D, E\}$. □
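
The four steps of Algorithm 3.7 translate almost line for line into code. The sketch below is one straightforward rendering, not an optimized one; the function name and the convention of writing each attribute as a single character (so that "BC" stands for the attribute set {B, C}) are assumptions made for brevity.

```python
def closure(attributes, fds):
    """Closure of `attributes` under `fds`, a list of (lhs, rhs) pairs (Algorithm 3.7)."""
    # Step 1: split right sides so each FD has a single attribute on the right.
    split = [(frozenset(lhs), c) for lhs, rhs in fds for c in rhs]
    # Step 2: X starts as the given set of attributes.
    x = set(attributes)
    # Step 3: add C whenever some FD has its whole left side in X and C is not yet in X.
    grew = True
    while grew:
        grew = False
        for lhs, c in split:
            if lhs <= x and c not in x:
                x.add(c)
                grew = True
    # Step 4: nothing more can be added, so X is the closure.
    return x

# The FD's of Example 3.8: AB -> C, BC -> AD, D -> E, CF -> B.
FDS = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", FDS)))  # ['A', 'B', 'C', 'D', 'E']
```

The loop rescans all FD’s after each pass that added something, which is simple but can take time quadratic in the number of FD’s; that is adequate for examples of this size.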

By computing the closure of any set of attributes, we can test whether
any given FD $A_1 A_2 \cdots A_n \to B$ follows from a set of FD’s S. First compute
$\{A_1, A_2, \ldots, A_n\}^+$ using the set of FD’s S. If B is in $\{A_1, A_2, \ldots, A_n\}^+$, then
$A_1 A_2 \cdots A_n \to B$ does follow from S, and if B is not in $\{A_1, A_2, \ldots, A_n\}^+$, then
this FD does not follow from S. More generally, $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$
follows from set of FD’s S if and only if all of $B_1, B_2, \ldots, B_m$ are in

$\{A_1, A_2, \ldots, A_n\}^+$

Example 3.9: Consider the relation and FD’s of Example 3.8. Suppose we
wish to test whether $AB \to D$ follows from these FD’s. We compute $\{A, B\}^+$,
which is {A, B, C, D, E}, as we saw in that example. Since D is a member of
the closure, we conclude that $AB \to D$ does follow.
On the other hand, consider the FD $D \to A$. To test whether this FD follows
from the given FD’s, first compute $\{D\}^+$. To do so, we start with X = {D}.
We can use the FD $D \to E$ to add E to the set X. However, then we are stuck.
We cannot find any other FD whose left side is contained in X = {D, E}, so
$\{D\}^+ = \{D, E\}$. Since A is not a member of {D, E}, we conclude that $D \to A$
does not follow. □
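
The test used in Example 3.9 amounts to a subset check against the closure. The sketch below repeats a compact closure helper so that it runs on its own; the name fd_follows and the single-character attribute convention are assumptions.

```python
def closure(attributes, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs."""
    x = set(attributes)
    while True:
        # Right-side attributes of every FD whose left side is already inside X.
        new = {c for lhs, rhs in fds if set(lhs) <= x for c in rhs} - x
        if not new:
            return x
        x |= new

def fd_follows(lhs, rhs, fds):
    """lhs -> rhs follows from fds iff every attribute of rhs is in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

# The FD's of Example 3.8.
FDS = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(fd_follows("AB", "D", FDS))  # True
print(fd_follows("D", "A", FDS))   # False
```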

3.2.5 Why the Closure Algorithm Works

In this section, we shall show why Algorithm 3.7 correctly decides whether or
not an FD $A_1 A_2 \cdots A_n \to B$ follows from a given set of FD’s S. There are two
parts to the proof:

1. We must prove that Algorithm 3.7 does not claim too much. That is, we
must show that if $A_1 A_2 \cdots A_n \to B$ is asserted by the closure test (i.e.,
B is in $\{A_1, A_2, \ldots, A_n\}^+$), then $A_1 A_2 \cdots A_n \to B$ holds in any relation
that satisfies all the FD’s in S.
2. We must prove that Algorithm 3.7 does not fail to discover an FD that
truly follows from the set of FD’s S.

Why the Closure Algorithm Claims only True FD’s

We can prove by induction on the number of times that we apply the growing
operation of Step 3 that for every attribute D in X, the FD $A_1 A_2 \cdots A_n \to D$
holds. That is, every relation R satisfying all of the FD’s in S also satisfies
$A_1 A_2 \cdots A_n \to D$.
BASIS: The basis case is when there are zero steps. Then D must be one of
$A_1, A_2, \ldots, A_n$, and surely $A_1 A_2 \cdots A_n \to D$ holds in any relation, because it
is a trivial FD.
INDUCTION: For the induction, suppose D was added when we used the FD
$B_1 B_2 \cdots B_m \to D$ of S. We know by the inductive hypothesis that R satisfies
$A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$. Now, suppose two tuples of R agree on all of
$A_1, A_2, \ldots, A_n$. Then since R satisfies $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$, the two
tuples must agree on all of $B_1, B_2, \ldots, B_m$. Since R satisfies $B_1 B_2 \cdots B_m \to D$,
we also know these two tuples agree on D. Thus, R satisfies $A_1 A_2 \cdots A_n \to D$.

Why the Closure Algorithm Discovers All True FD’s

Suppose $A_1 A_2 \cdots A_n \to B$ were an FD that Algorithm 3.7 says does not follow
from set S. That is, the closure of $\{A_1, A_2, \ldots, A_n\}$ using set of FD’s S does
not include B. We must show that FD $A_1 A_2 \cdots A_n \to B$ really doesn’t follow
from S. That is, we must show that there is at least one relation instance that
satisfies all the FD’s in S, and yet does not satisfy $A_1 A_2 \cdots A_n \to B$.
This instance I is actually quite simple to construct; it is shown in Fig. 3.5.
I has only two tuples: t and s. The two tuples agree in all the attributes
of $\{A_1, A_2, \ldots, A_n\}^+$, and they disagree in all the other attributes. We must
show first that I satisfies all the FD’s of S, and then that it does not satisfy
$A_1 A_2 \cdots A_n \to B$.

      $\{A_1, A_2, \ldots, A_n\}^+$      Other Attributes

t:    1  1  1  ···  1  1      0  0  0  ···  0  0
s:    1  1  1  ···  1  1      1  1  1  ···  1  1

Figure 3.5: An instance I satisfying S but not $A_1 A_2 \cdots A_n \to B$

Suppose there were some FD $C_1 C_2 \cdots C_k \to D$ in set S (after splitting
right sides) that instance I does not satisfy. Since I has only two tuples, t
and s, those must be the two tuples that violate $C_1 C_2 \cdots C_k \to D$. That is, t
and s agree in all the attributes of $\{C_1, C_2, \ldots, C_k\}$, yet disagree on D. If we
3.2. RULES ABO U T FUNCTIONAL DEPENDENCIES 79

examine Fig. 3.5 we see that all of $C_1, C_2, \ldots, C_k$ must be among the attributes
of $\{A_1, A_2, \ldots, A_n\}^+$, because those are the only attributes on which t and s
agree. Likewise, D must be among the other attributes, because only on those
attributes do t and s disagree.
But then we did not compute the closure correctly. $C_1 C_2 \cdots C_k \to D$ should
have been applied when X was $\{A_1, A_2, \ldots, A_n\}$ to add D to X. We conclude
that $C_1 C_2 \cdots C_k \to D$ cannot exist; i.e., instance I satisfies S.
Second, we must show that I does not satisfy $A_1 A_2 \cdots A_n \to B$. However,
this part is easy. Surely, $A_1, A_2, \ldots, A_n$ are among the attributes on which t and
s agree. Also, we know that B is not in $\{A_1, A_2, \ldots, A_n\}^+$, so B is one of the
attributes on which t and s disagree. Thus, I does not satisfy $A_1 A_2 \cdots A_n \to B$.
We conclude that Algorithm 3.7 asserts neither too few nor too many FD’s; it
asserts exactly those FD’s that do follow from S.
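
The two-tuple construction of Fig. 3.5 can be tried out concretely. The following Python sketch is only an illustration under stated assumptions: the relation R(A, B, C, D), the FD set, and the helper names closure, satisfies, and two_tuple_instance are all hypothetical and are not part of Algorithm 3.7 as given in the text.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def satisfies(instance, lhs, rhs):
    """True if every two tuples that agree on lhs also agree on rhs."""
    for t in instance:
        for s in instance:
            agree_left = all(t[a] == s[a] for a in lhs)
            if agree_left and any(t[b] != s[b] for b in rhs):
                return False
    return True


def two_tuple_instance(all_attrs, start, fds):
    """Tuples t and s that agree exactly on the closure of start (as in Fig. 3.5)."""
    plus = closure(start, fds)
    t = {a: 1 if a in plus else 0 for a in all_attrs}
    s = {a: 1 for a in all_attrs}
    return [t, s]


# Hypothetical FD set S over R(A, B, C, D): A -> B and C -> D.
fds = [({"A"}, {"B"}), ({"C"}, {"D"})]
instance = two_tuple_instance("ABCD", {"A"}, fds)

print(closure({"A"}, fds))                              # the set {A, B}
print(all(satisfies(instance, l, r) for l, r in fds))   # I satisfies S
print(satisfies(instance, {"A"}, {"C"}))                # but not A -> C
```

Here C is outside the closure of {A}, so the instance agrees on A and disagrees on C, which is the witness the proof asks for.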

3.2.6 The Transitive Rule


The transitive rule lets us cascade two FD’s, and generalizes the observation of
Example 3.4.

• If $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ and $B_1 B_2 \cdots B_m \to C_1 C_2 \cdots C_k$ hold in
relation R, then $A_1 A_2 \cdots A_n \to C_1 C_2 \cdots C_k$ also holds in R.

If some of the C’s are among the A’s, we may eliminate them from the right
side by the trivial-dependencies rule.
To see why the transitive rule holds, apply the test of Section 3.2.4. To
test whether $A_1 A_2 \cdots A_n \to C_1 C_2 \cdots C_k$ holds, we need to compute the closure
$\{A_1, A_2, \ldots, A_n\}^+$ with respect to the two given FD’s.
The FD $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ tells us that all of $B_1, B_2, \ldots, B_m$ are
in $\{A_1, A_2, \ldots, A_n\}^+$. Then, we can use the FD $B_1 B_2 \cdots B_m \to C_1 C_2 \cdots C_k$
to add $C_1, C_2, \ldots, C_k$ to $\{A_1, A_2, \ldots, A_n\}^+$. Since all the C’s are in

$\{A_1, A_2, \ldots, A_n\}^+$

we conclude that $A_1 A_2 \cdots A_n \to C_1 C_2 \cdots C_k$ holds for any relation that satisfies
both $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ and $B_1 B_2 \cdots B_m \to C_1 C_2 \cdots C_k$.

Example 3.10: Here is another version of the Movies relation that includes
both the studio of the movie and some information about that studio.

title          year  length  genre   studioName  studioAddr
Star Wars      1977  124     sciFi   Fox         Hollywood
Eight Below    2005  120     drama   Disney      Buena Vista
Wayne’s World  1992  95      comedy  Paramount   Hollywood

Two of the FD’s that we might reasonably claim to hold are:

title year → studioName
studioName → studioAddr

Closures and Keys


Notice that $\{A_1, A_2, \ldots, A_n\}^+$ is the set of all attributes of a relation if
and only if $A_1, A_2, \ldots, A_n$ is a superkey for the relation. For only then
does $A_1, A_2, \ldots, A_n$ functionally determine all the other attributes. We
can test if $A_1, A_2, \ldots, A_n$ is a key for a relation by checking first that
$\{A_1, A_2, \ldots, A_n\}^+$ is all attributes, and then checking that, for no set X
formed by removing one attribute from $\{A_1, A_2, \ldots, A_n\}$, is $X^+$ the set
of all attributes.
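
The key test described in this box can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the helper names closure, is_superkey, and is_key are hypothetical, and the FD title year → length genre studioName is an assumption added here so that the six-attribute relation has a key to find.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A superkey is a set whose closure is the set of all attributes."""
    return closure(attrs, fds) == set(all_attrs)


def is_key(attrs, all_attrs, fds):
    """A key is a superkey from which no single attribute can be removed."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return not any(is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


ALL = {"title", "year", "length", "genre", "studioName", "studioAddr"}
FDS = [
    ({"title", "year"}, {"length", "genre", "studioName"}),  # assumed for this sketch
    ({"studioName"}, {"studioAddr"}),
]

print(is_key({"title", "year"}, ALL, FDS))                # a key
print(is_superkey({"title", "year", "genre"}, ALL, FDS))  # a superkey, but ...
print(is_key({"title", "year", "genre"}, ALL, FDS))       # ... not a key
```

Removing one attribute at a time is enough because shrinking a set of attributes can only shrink its closure.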

The first is justified because there can be only one movie with a given title
and year, and there is only one studio that owns a given movie. The second is
justified because studios have unique addresses.
The transitive rule allows us to combine the two FD’s above to get a new
FD:

title year —> studioAddr

This FD says that a title and year (i.e., a movie) determines an address — the
address of the studio owning the movie. □
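
The derivation in Example 3.10 can also be checked mechanically with the closure test of Section 3.2.4. The Python sketch below is an illustration only; the function names closure and follows are hypothetical, and the FD’s are the two stated in the example.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(lhs, rhs, fds):
    """The FD lhs -> rhs follows from fds exactly when rhs lies in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


FDS = [
    ({"title", "year"}, {"studioName"}),
    ({"studioName"}, {"studioAddr"}),
]

# The transitive consequence follows from the two given FD's ...
print(follows({"title", "year"}, {"studioAddr"}, FDS))
# ... while the reverse direction of the second FD does not.
print(follows({"studioAddr"}, {"studioName"}, FDS))
```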

3.2.7 Closing Sets of Functional Dependencies


Sometimes we have a choice of which FD’s we use to represent the full set of
FD’s for a relation. If we are given a set of FD’s S (such as the FD’s that hold
in a given relation), then any set of FD’s equivalent to S is said to be a basis
for S. To avoid some of the explosion of possible bases, we shall limit ourselves
to considering only bases whose FD’s have singleton right sides. If we have any
basis, we can apply the splitting rule to make the right sides be singletons. A
minimal basis for a relation is a basis B that satisfies three conditions:

1. All the FD’s in B have singleton right sides.

2. If any FD is removed from B, the result is no longer a basis.

3. If for any FD F in B we remove one or more attributes from the left side of
F, the result is no longer a basis.

Notice that no trivial FD can be in a minimal basis, because it could be removed
by rule (2).

Example 3.11: Consider a relation R(A, B, C) such that each attribute functionally
determines the other two attributes. The full set of derived FD’s thus
includes six FD’s with one attribute on the left and one on the right: A → B,
A → C, B → A, B → C, C → A, and C → B. It also includes the three

A Complete Set of Inference Rules

If we want to know whether one FD follows from some given FD’s, the
closure computation of Section 3.2.4 will always serve. However, it is
interesting to know that there is a set of rules, called Armstrong’s axioms,
from which it is possible to derive any FD that follows from a given set.
These axioms are:

1. Reflexivity. If $\{B_1, B_2, \ldots, B_m\} \subseteq \{A_1, A_2, \ldots, A_n\}$, then
$A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$. These are what we have called trivial
FD’s.

2. Augmentation. If $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$, then

$A_1 A_2 \cdots A_n C_1 C_2 \cdots C_k \to B_1 B_2 \cdots B_m C_1 C_2 \cdots C_k$

for any set of attributes $C_1, C_2, \ldots, C_k$. Since some of the C’s may
also be A’s or B’s or both, we should eliminate from the left side
duplicate attributes and do the same for the right side.

3. Transitivity. If

$A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ and $B_1 B_2 \cdots B_m \to C_1 C_2 \cdots C_k$

then $A_1 A_2 \cdots A_n \to C_1 C_2 \cdots C_k$.

nontrivial FD’s with two attributes on the left: AB → C, AC → B, and
BC → A. There are also FD’s with more than one attribute on the right, such
as A → BC, and trivial FD’s such as A → A.
Relation R and its FD’s have several minimal bases. One is

{A → B, B → A, B → C, C → B}

Another is {A → B, B → C, C → A}. There are several other minimal bases
for R, and we leave their discovery as an exercise. □
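
The three conditions for a minimal basis can be checked by brute force on small examples such as this one. The Python sketch below is an illustration under assumptions: the helpers fd, closure, follows, equivalent, and is_minimal_basis are hypothetical names, and the six single-attribute FD’s of Example 3.11 stand in for the full set of FD’s of R.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of one-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def follows(lhs, rhs, fds):
    return rhs <= closure(lhs, fds)


def equivalent(f, g):
    """Two FD sets are equivalent if each FD of one follows from the other."""
    return (all(follows(l, r, g) for l, r in f)
            and all(follows(l, r, f) for l, r in g))


def is_minimal_basis(basis, full):
    # Condition 1: singleton right sides; the basis must also be equivalent to full.
    if any(len(rhs) != 1 for _, rhs in basis) or not equivalent(basis, full):
        return False
    # Condition 2: removing any FD must destroy equivalence.
    for i in range(len(basis)):
        if equivalent(basis[:i] + basis[i + 1:], full):
            return False
    # Condition 3: removing one attribute from a left side must destroy equivalence
    # (if removing several attributes worked, removing just one of them would too).
    for i, (lhs, rhs) in enumerate(basis):
        for a in lhs:
            if equivalent(basis[:i] + [(lhs - {a}, rhs)] + basis[i + 1:], full):
                return False
    return True


FULL = [fd("A", "B"), fd("A", "C"), fd("B", "A"),
        fd("B", "C"), fd("C", "A"), fd("C", "B")]

print(is_minimal_basis([fd("A", "B"), fd("B", "A"), fd("B", "C"), fd("C", "B")], FULL))
print(is_minimal_basis([fd("A", "B"), fd("B", "C"), fd("C", "A")], FULL))
print(is_minimal_basis(FULL, FULL))   # FULL contains redundant FD's
```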

3.2.8 Projecting Functional Dependencies


When we study design of relation schemas, we shall also have need to answer
the following question about FD’s. Suppose we have a relation R with set of
FD’s S, and we project R by computing $R_1 = \pi_L(R)$, for some list of attributes
L. What FD’s hold in $R_1$?
The answer is obtained in principle by computing the projection of functional
dependencies S, which is all FD’s that:

a) Follow from S, and

b) Involve only attributes of $R_1$.

Since there may be a large number of such FD’s, and many of them may be
redundant (i.e., they follow from other such FD’s), we are free to simplify that
set of FD’s if we wish. However, in general, the calculation of the FD’s for
$R_1$ is exponential in the number of attributes of $R_1$. The simple algorithm is
summarized below.

Algorithm 3.12: Projecting a Set of Functional Dependencies.

INPUT: A relation R and a second relation $R_1$ computed by the projection
$R_1 = \pi_L(R)$. Also, a set of FD’s S that hold in R.

OUTPUT: The set of FD’s that hold in $R_1$.

METHOD:

1. Let T be the eventual output set of FD’s. Initially, T is empty.

2. For each set of attributes X that is a subset of the attributes of $R_1$,
compute $X^+$. This computation is performed with respect to the set of
FD’s S, and may involve attributes that are in the schema of R but not
$R_1$. Add to T all nontrivial FD’s X → A such that A is both in $X^+$ and
an attribute of $R_1$.

3. Now, T is a basis for the FD’s that hold in $R_1$, but may not be a minimal
basis. We may construct a minimal basis by modifying T as follows:

(a) If there is an FD F in T that follows from the other FD’s in T,
remove F from T.
(b) Let Y → B be an FD in T, with at least two attributes in Y, and let
Z be Y with one of its attributes removed. If Z → B follows from
the FD’s in T (including Y → B), then replace Y → B by Z → B.
(c) Repeat the above steps in all possible ways until no more changes to
T can be made.
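
A direct, unoptimized rendering of Algorithm 3.12 may help make the steps concrete. The Python sketch below is one possible reading, not the text’s own code: the names closure, follows, project_fds, and minimize are hypothetical, FD’s are represented as pairs of frozensets, and the input is the relation of Example 3.13.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def follows(lhs, rhs, fds):
    return rhs <= closure(lhs, fds)


def project_fds(sub_attrs, fds):
    """Steps 1 and 2: close every subset X of R1 and keep nontrivial X -> A inside R1."""
    sub = sorted(sub_attrs)
    t = []
    for size in range(len(sub) + 1):
        for xs in combinations(sub, size):
            x = frozenset(xs)
            for a in sorted((closure(x, fds) & set(sub)) - x):
                t.append((x, frozenset([a])))
    return t


def minimize(t):
    """Step 3: drop implied FD's and shrink left sides until nothing changes."""
    t = list(t)
    changed = True
    while changed:
        changed = False
        for i, (y, b) in enumerate(t):
            rest = t[:i] + t[i + 1:]
            if follows(y, b, rest):                      # step (a)
                t, changed = rest, True
                break
            smaller = [y - {a} for a in y
                       if len(y) >= 2 and follows(y - {a}, b, t)]
            if smaller:                                  # step (b)
                new_fd = (smaller[0], b)
                t = rest if new_fd in rest else rest[:i] + [new_fd] + rest[i:]
                changed = True
                break
    return t


# R(A, B, C, D) with A -> B, B -> C, C -> D, projected onto R1(A, C, D).
S = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("C"), frozenset("D"))]
basis = minimize(project_fds("ACD", S))
for lhs, rhs in basis:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

The sketch closes all subsets of the projected attributes, so it is exponential, as the text warns; it omits the shortcuts that Example 3.13 applies by hand.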

Example 3.13: Suppose R(A, B, C, D) has FD’s A → B, B → C, and C → D.
Suppose also that we wish to project out the attribute B, leaving a relation
$R_1(A, C, D)$. In principle, to find the FD’s for $R_1$, we need to take the closure
of all eight subsets of {A, C, D}, using the full set of FD’s, including those
involving B. However, there are some obvious simplifications we can make.

• Closing the empty set and the set of all attributes cannot yield a nontrivial
FD.

• If we already know that the closure of some set X is all attributes, then
we cannot discover any new FD’s by closing supersets of X.

Thus, we may start with the closures of the singleton sets, and then move
on to the doubleton sets if necessary. For each closure of a set X, we add the
FD X → E for each attribute E that is in $X^+$ and in the schema of $R_1$, but
not in X.
First, $\{A\}^+ = \{A, B, C, D\}$. Thus, A → C and A → D hold in $R_1$. Note
that A → B is true in R, but makes no sense in $R_1$ because B is not an attribute
of $R_1$.
Next, we consider $\{C\}^+ = \{C, D\}$, from which we get the additional FD
C → D for $R_1$. Since $\{D\}^+ = \{D\}$, we can add no more FD’s, and are done
with the singletons.
Since $\{A\}^+$ includes all attributes of $R_1$, there is no point in considering any
superset of {A}. The reason is that whatever FD we could discover, for instance
AC → D, follows from an FD with only A on the left side: A → D in this case.
Thus, the only doubleton whose closure we need to take is $\{C, D\}^+ = \{C, D\}$.
This observation allows us to add nothing. We are done with the closures, and
the FD’s we have discovered are A → C, A → D, and C → D.
If we wish, we can observe that A → D follows from the other two by
transitivity. Therefore a simpler, equivalent set of FD’s for $R_1$ is A → C and
C → D. This set is, in fact, a minimal basis for the FD’s of $R_1$. □

3.2.9 Exercises for Section 3.2


Exercise 3.2.1: Consider a relation with schema R(A, B, C, D) and FD’s
AB → C, C → D, and D → A.

a) What are all the nontrivial FD’s that follow from the given FD’s? You
should restrict yourself to FD’s with single attributes on the right side.
b) What are all the keys of R?
c) What are all the superkeys for R that are not keys?

Exercise 3.2.2: Repeat Exercise 3.2.1 for the following schemas and sets of
FD’s:

i) S(A, B, C, D) with FD’s A → B, B → C, and B → D.
ii) T(A, B, C, D) with FD’s AB → C, BC → D, CD → A, and AD → B.
iii) U(A, B, C, D) with FD’s A → B, B → C, C → D, and D → A.

Exercise 3.2.3: Show that the following rules hold, by using the closure test
of Section 3.2.4.

a) Augmenting left sides. If $A_1 A_2 \cdots A_n \to B$ is an FD, and C is another
attribute, then $A_1 A_2 \cdots A_n C \to B$ follows.

b) Full augmentation. If $A_1 A_2 \cdots A_n \to B$ is an FD, and C is another
attribute, then $A_1 A_2 \cdots A_n C \to BC$ follows. Note: from this rule, the
“augmentation” rule mentioned in the box of Section 3.2.7 on “A Complete
Set of Inference Rules” can easily be proved.

c) Pseudotransitivity. Suppose FD’s $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ and

$C_1 C_2 \cdots C_k \to D$

hold, and the B’s are each among the C’s. Then

$A_1 A_2 \cdots A_n E_1 E_2 \cdots E_j \to D$

holds, where the E’s are all those of the C’s that are not found among
the B’s.

d) Addition. If FD’s $A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ and

$C_1 C_2 \cdots C_k \to D_1 D_2 \cdots D_j$

hold, then FD $A_1 A_2 \cdots A_n C_1 C_2 \cdots C_k \to B_1 B_2 \cdots B_m D_1 D_2 \cdots D_j$ also
holds. In the above, we should remove one copy of any attribute that
appears among both the A’s and C’s or among both the B’s and D’s.

! Exercise 3.2.4: Show that each of the following are not valid rules about FD’s
by giving example relations that satisfy the given FD’s (following the “if”) but
not the FD that allegedly follows (after the “then”).

a) If A → B then B → A.
b) If AB → C and A → C, then B → C.
c) If AB → C, then A → C or B → C.

! Exercise 3.2.5: Show that if a relation has no attribute that is functionally
determined by all the other attributes, then the relation has no nontrivial FD’s
at all.

! Exercise 3.2.6: Let X and Y be sets of attributes. Show that if $X \subseteq Y$, then
$X^+ \subseteq Y^+$, where the closures are taken with respect to the same set of FD’s.

! Exercise 3.2.7: Prove that $(X^+)^+ = X^+$.

! Exercise 3.2.8: We say a set of attributes X is closed (with respect to a given
set of FD’s) if $X^+ = X$. Consider a relation with schema R(A, B, C, D) and an
unknown set of FD’s. If we are told which sets of attributes are closed, we can
discover the FD’s. What are the FD’s if:

a) All sets of the four attributes are closed.
b) The only closed sets are ∅ and {A, B, C, D}.
c) The closed sets are ∅, {A, B}, and {A, B, C, D}.

! Exercise 3.2.9: Find all the minimal bases for the FD’s and relation of
Example 3.11.

! Exercise 3.2.10: Suppose we have relation R(A, B, C, D, E), with some set
of FD’s, and we wish to project those FD’s onto relation S(A, B, C). Give the
FD’s that hold in S if the FD’s for R are:

a) AB → DE, C → E, D → C, and E → A.
b) A → D, BD → E, AC → E, and DE → B.
c) AB → D, AC → E, BC → D, D → A, and E → B.
d) A → B, B → C, C → D, D → E, and E → A.

In each case, it is sufficient to give a minimal basis for the full set of FD’s of S.

!! Exercise 3.2.11: Show that if an FD F follows from some given FD’s, then
we can prove F from the given FD’s using Armstrong’s axioms (defined in the
box “A Complete Set of Inference Rules” in Section 3.2.7). Hint: Examine
Algorithm 3.7 and show how each step of that algorithm can be mimicked by
inferring some FD’s by Armstrong’s axioms.

3.3 Design of Relational Database Schemas


Careless selection of a relational database schema can lead to redundancy and
related anomalies. For instance, consider the relation in Fig. 3.2, which we
reproduce here as Fig. 3.6. Notice that the length and genre for Star Wars
and Wayne’s World are each repeated, once for each star of the movie. The
repetition of this information is redundant. It also introduces the potential for
several kinds of errors, as we shall see.
In this section, we shall tackle the problem of design of good relation schemas
in the following stages:
1. We first explore in more detail the problems that arise when our schema
is poorly designed.
2. Then, we introduce the idea of “decomposition,” breaking a relation
schema (set of attributes) into two smaller schemas.
3. Next, we introduce “Boyce-Codd normal form,” or “BCNF,” a condition
on a relation schema that eliminates these problems.
4. These points are tied together when we explain how to assure the BCNF
condition by decomposing relation schemas.

title               year  length  genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone With the Wind  1939  231     drama   MGM         Vivien Leigh
Wayne’s World       1992  95      comedy  Paramount   Dana Carvey
Wayne’s World       1992  95      comedy  Paramount   Mike Meyers

Figure 3.6: The relation Movies1 exhibiting anomalies

3.3.1 Anomalies
Problems such as redundancy that occur when we try to cram too much into a
single relation are called anomalies. The principal kinds of anomalies that we
encounter are:

1. Redundancy. Information may be repeated unnecessarily in several tuples.
Examples are the length and genre for movies in Fig. 3.6.
2. Update Anomalies. We may change information in one tuple but leave
the same information unchanged in another. For example, if we found
that Star Wars is really 125 minutes long, we might carelessly change the
length in the first tuple of Fig. 3.6 but not in the second or third tuples.
You might argue that one should never be so careless, but it is possible
to redesign relation Movies1 so that the risk of such mistakes does not
exist.
3. Deletion Anomalies. If a set of values becomes empty, we may lose other
information as a side effect. For example, should we delete Vivien Leigh
from the set of stars of Gone With the Wind, then we have no more stars
for that movie in the database. The last tuple for Gone With the Wind
in the relation Movies1 would disappear, and with it information that it
is 231 minutes long and a drama.

3.3.2 Decomposing Relations


The accepted way to eliminate these anomalies is to decompose relations.
Decomposition of R involves splitting the attributes of R to make the schemas of
two new relations. After describing the decomposition process, we shall show
how to pick a decomposition that eliminates anomalies.
Given a relation $R(A_1, A_2, \ldots, A_n)$, we may decompose R into two relations
$S(B_1, B_2, \ldots, B_m)$ and $T(C_1, C_2, \ldots, C_k)$ such that:

1. $\{A_1, A_2, \ldots, A_n\} = \{B_1, B_2, \ldots, B_m\} \cup \{C_1, C_2, \ldots, C_k\}$.

2. $S = \pi_{B_1, B_2, \ldots, B_m}(R)$.

3. $T = \pi_{C_1, C_2, \ldots, C_k}(R)$.

Example 3.14: Let us decompose the Movies1 relation of Fig. 3.6. Our choice,
whose merit will be seen in Section 3.3.3, is to use:

1. A relation called Movies2, whose schema is all the attributes except for
starName.

2. A relation called Movies3, whose schema consists of the attributes title,
year, and starName.

The projection of Movies1 onto these two new schemas is shown in Fig. 3.7.

title               year  length  genre   studioName
Star Wars           1977  124     sciFi   Fox
Gone With the Wind  1939  231     drama   MGM
Wayne’s World       1992  95      comedy  Paramount

(a) The relation Movies2.

title               year  starName
Star Wars           1977  Carrie Fisher
Star Wars           1977  Mark Hamill
Star Wars           1977  Harrison Ford
Gone With the Wind  1939  Vivien Leigh
Wayne’s World       1992  Dana Carvey
Wayne’s World       1992  Mike Meyers

(b) The relation Movies3.

Figure 3.7: Projections of relation Movies1

Notice how this decomposition eliminates the anomalies we mentioned in
Section 3.3.1. The redundancy has been eliminated; for example, the length
of each film appears only once, in relation Movies2. The risk of an update
anomaly is gone. For instance, since we only have to change the length of Star
Wars in one tuple of Movies2, we cannot wind up with two different lengths
for that movie.
Finally, the risk of a deletion anomaly is gone. If we delete all the stars
for Gone With the Wind, say, that deletion makes the movie disappear from
Movies3. But all the other information about the movie can still be found in
Movies2.

It might appear that Movies3 still has redundancy, since the title and year
of a movie can appear several times. However, these two attributes form a key
for movies, and there is no more succinct way to represent a movie. Moreover,
Movies3 does not offer an opportunity for an update anomaly. For instance, one
might suppose that if we changed to 2008 the year in the Carrie Fisher tuple,
but not the other two tuples for Star Wars, then there would be an update
anomaly. However, there is nothing in our assumed FD’s that prevents there
being a different movie named Star Wars in 2008, and Carrie Fisher may star
in that one as well. Thus, we do not want to prevent changing the year in one
Star Wars tuple, nor is such a change necessarily incorrect.
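
The projections of Example 3.14 are easy to reproduce. The following Python sketch is illustrative only: the tuples are transcribed from Fig. 3.6 (with the genre spelled sciFi throughout), and the function name project is a hypothetical helper that applies set semantics by dropping duplicate tuples.

```python
ATTRS = ("title", "year", "length", "genre", "studioName", "starName")

MOVIES1 = [
    ("Star Wars", 1977, 124, "sciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "sciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "sciFi", "Fox", "Harrison Ford"),
    ("Gone With the Wind", 1939, 231, "drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Mike Meyers"),
]


def project(rows, attrs, keep):
    """Project rows onto the attributes in keep, eliminating duplicate tuples."""
    positions = [attrs.index(a) for a in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:
            result.append(projected)
    return result


movies2 = project(MOVIES1, ATTRS, ("title", "year", "length", "genre", "studioName"))
movies3 = project(MOVIES1, ATTRS, ("title", "year", "starName"))

print(len(movies2), len(movies3))   # one tuple per movie; one tuple per (movie, star)

# Deleting the only star of Gone With the Wind affects Movies3 alone.
movies3 = [t for t in movies3 if t[2] != "Vivien Leigh"]
print(any(t[0] == "Gone With the Wind" for t in movies2))
```

Each movie’s length is stored in exactly one tuple of movies2, which is why the update and deletion anomalies of Section 3.3.1 cannot arise there.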

3.3.3 Boyce-Codd Normal Form


The goal of decomposition is to replace a relation by several that do not exhibit
anomalies. There is, it turns out, a simple condition under which the anomalies
discussed above can be guaranteed not to exist. This condition is called Boyce-
Codd normal form, or BCNF.

• A relation R is in BCNF if and only if: whenever there is a nontrivial FD
$A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ for R, it is the case that $\{A_1, A_2, \ldots, A_n\}$ is
a superkey for R.

That is, the left side of every nontrivial FD must be a superkey. Recall that
a superkey need not be minimal. Thus, an equivalent statement of the BCNF
condition is that the left side of every nontrivial FD must contain a key.

Example 3.15: Relation Movies1, as in Fig. 3.6, is not in BCNF. To see why,
we first need to determine what sets of attributes are keys. We argued in
Example 3.2 why {title, year, starName} is a key. Thus, any set of attributes
containing these three is a superkey. The same arguments we followed in
Example 3.2 can be used to explain why no set of attributes that does not include
all three of title, year, and starName could be a superkey. Thus, we assert
that {title, year, starName} is the only key for Movies1.
However, consider the FD

title year → length genre studioName

which holds in Movies1 according to our discussion in Example 3.2.
Unfortunately, the left side of the above FD is not a superkey. In particular,
we know that title and year do not functionally determine the sixth attribute,
starName. Thus, the existence of this FD violates the BCNF condition and tells
us Movies1 is not in BCNF. □

Example 3.16: On the other hand, Movies2 of Fig. 3.7 is in BCNF. Since

title year → length genre studioName

holds in this relation, and we have argued that neither title nor year by itself
functionally determines any of the other attributes, the only key for Movies2
is {title, year}. Moreover, the only nontrivial FD’s must have at least title
and year on the left side, and therefore their left sides must be superkeys. Thus,
Movies2 is in BCNF. □
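
The reasoning of Examples 3.15 and 3.16 amounts to a closure test on the left side of each FD. The Python sketch below is a minimal illustration with hypothetical helper names (closure, bcnf_violations). It assumes that the FD list passed in is a basis for all the FD’s that hold in the relation; under that assumption it is enough to examine the listed FD’s, because any implied violation has a listed FD whose left side is also not a superkey.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Listed FD's that are nontrivial and whose left side is not a superkey."""
    attrs = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)          # the FD is nontrivial
            and closure(lhs, fds) != attrs]      # the left side is not a superkey


FD = [({"title", "year"}, {"length", "genre", "studioName"})]
MOVIES1 = {"title", "year", "length", "genre", "studioName", "starName"}
MOVIES2 = {"title", "year", "length", "genre", "studioName"}

print(bcnf_violations(MOVIES1, FD))   # the FD is reported: starName is not determined
print(bcnf_violations(MOVIES2, FD))   # no violations
```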
Example 3.17: We claim that any two-attribute relation is in BCNF. We
need to examine the possible nontrivial FD’s with a single attribute on the
right. There are not too many cases to consider, so let us consider them in
turn. In what follows, suppose that the attributes are A and B.
1. There are no nontrivial FD’s. Then surely the BCNF condition must hold,
because only a nontrivial FD can violate this condition. Incidentally, note
that {A, B} is the only key in this case.
2. A → B holds, but B → A does not hold. In this case, A is the only key,
and each nontrivial FD contains A on the left (in fact the left can only
be A). Thus there is no violation of the BCNF condition.
3. B → A holds, but A → B does not hold. This case is symmetric to
case (2).
4. Both A → B and B → A hold. Then both A and B are keys. Surely
any FD has at least one of these on the left, so there can be no BCNF
violation.
It is worth noticing from case (4) above that there may be more than one
key for a relation. Further, the BCNF condition only requires that some key be
contained in the left side of any nontrivial FD, not that all keys are contained in
the left side. Also observe that a relation with two attributes, each functionally
determining the other, is not completely implausible. For example, a company
may assign its employees unique employee ID’s and also record their Social
Security numbers. A relation with attributes empID and ssNo would have each
attribute functionally determining the other. Put another way, each attribute
is a key, since we don’t expect to find two tuples that agree on either attribute.

3.3.4 Decomposition into BCNF


By repeatedly choosing suitable decompositions, we can break any relation
schema into a collection of subsets of its attributes with the following important
properties:
1. These subsets are the schemas of relations in BCNF.
2. The data in the original relation is represented faithfully by the data in the
relations that are the result of the decomposition, in a sense to be made
precise in Section 3.4.1. Roughly, we need to be able to reconstruct the
original relation instance exactly from the decomposed relation instances.

Example 3.17 suggests that perhaps all we have to do is break a relation schema
into two-attribute subsets, and the result is surely in BCNF. However, such
an arbitrary decomposition will not satisfy condition (2), as we shall see in
Section 3.4.1. In fact, we must be more careful and use the violating FD’s to
guide our decomposition.
The decomposition strategy we shall follow is to look for a nontrivial FD
$A_1 A_2 \cdots A_n \to B_1 B_2 \cdots B_m$ that violates BCNF; i.e., $\{A_1, A_2, \ldots, A_n\}$ is not a
superkey. We shall add to the right side as many attributes as are functionally
determined by $\{A_1, A_2, \ldots, A_n\}$. This step is not mandatory, but it often
reduces the total amount of work done, and we shall include it in our algorithm.
Figure 3.8 illustrates how the attributes are broken into two overlapping relation
schemas. One is all the attributes involved in the violating FD, and the other
is the left side of the FD plus all the attributes not involved in the FD, i.e., all
the attributes except those B’s that are not A’s.

Figure 3.8: Relation schema decomposition based on a BCNF violation

Example 3.18: Consider our running example, the Movies1 relation of Fig.
3.6. We saw in Example 3.15 that

title year → length genre studioName

is a BCNF violation. In this case, the right side already includes all the
attributes functionally determined by title and year, so we shall use this BCNF
violation to decompose Movies1 into:

1. The schema {title, year, length, genre, studioName} consisting of all
the attributes on either side of the FD.

2. The schema {title, year, starName} consisting of the left side of the FD
plus all attributes of Movies1 that do not appear in either side of the FD
(only starName, in this case).

Notice that these schemas are the ones selected for relations Movies2 and
Movies3 in Example 3.14. We observed in Example 3.16 that Movies2 is in
BCNF. Movies3 is also in BCNF; it has no nontrivial FD’s. □

In Example 3.18, one judicious application of the decomposition rule is
enough to produce a collection of relations that are in BCNF. In general, that
is not the case, as the next example shows.

Example 3.19: Consider a relation with schema

{title, year, studioName, president, presAddr}

That is, each tuple of this relation tells about a movie, its studio, the president
of the studio, and the address of the president of the studio. Three FD’s that
we would assume in this relation are

title year → studioName
studioName → president
president → presAddr

By closing sets of these five attributes, we discover that {title, year} is the
only key for this relation. Thus the last two FD’s above violate BCNF. Suppose
we choose to decompose starting with

studioName → president

First, we add to the right side of this functional dependency any other attributes
in the closure of studioName. That closure includes presAddr, so our final
choice of FD for the decomposition is:

studioName → president presAddr

The decomposition based on this FD yields the following two relation schemas.

{title, year, studioName}
{studioName, president, presAddr}

If we use Algorithm 3.12 to project FD’s, we determine that the FD’s for
the first relation have a basis:

title year → studioName

while the second has:

studioName → president
president → presAddr

The sole key for the first relation is {title, year}, and it is therefore in BCNF.
However, the second has {studioName} for its only key but also has the FD:

president → presAddr

which is a BCNF violation. Thus, we must decompose again, this time using
the above FD. The resulting three relation schemas, all in BCNF, are:

{title, year, studioName}
{studioName, president}
{president, presAddr}

In general, we must keep applying the decomposition rule as many times as
needed, until all our relations are in BCNF. We can be sure of ultimate success,
because every time we apply the decomposition rule to a relation R, the two
resulting schemas each have fewer attributes than that of R. As we saw in
Example 3.17, when we get down to two attributes, the relation is sure to be
in BCNF; often relations with larger sets of attributes are also in BCNF. The
strategy is summarized below.

Algorithm 3.20: BCNF Decomposition Algorithm.

INPUT: A relation $R_0$ with a set of functional dependencies $S_0$.

OUTPUT: A decomposition of $R_0$ into a collection of relations, all of which are
in BCNF.

METHOD: The following steps can be applied recursively to any relation R and
set of FD’s S. Initially, apply them with $R = R_0$ and $S = S_0$.

1. Check whether R is in BCNF. If so, nothing more needs to be done.
Return {R} as the answer.
2. If there are BCNF violations, let one be X → Y. Use Algorithm 3.7 to
compute $X^+$. Choose $R_1 = X^+$ as one relation schema and let $R_2$ have
attributes X and those attributes of R that are not in $X^+$.
3. Use Algorithm 3.12 to compute the sets of FD’s for $R_1$ and $R_2$; let these
be $S_1$ and $S_2$, respectively.
4. Recursively decompose $R_1$ and $R_2$ using this algorithm. Return the union
of the results of these decompositions.
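
Algorithm 3.20 translates almost line for line into a recursive function. The Python sketch below is one possible rendering under assumptions, not a definitive implementation: the names closure, project_fds, and bcnf_decompose are hypothetical, FD’s are pairs of frozensets, the projection step closes every proper subset (so it is exponential) and skips the minimal-basis cleanup of Algorithm 3.12, and the violation chosen is simply the first one found.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(sub, fds):
    """FD's X -> (X+ restricted to sub, minus X) for each nonempty proper subset X of sub."""
    sub = frozenset(sub)
    out = []
    for size in range(1, len(sub)):
        for xs in combinations(sorted(sub), size):
            x = frozenset(xs)
            rhs = (closure(x, fds) & sub) - x
            if rhs:
                out.append((x, frozenset(rhs)))
    return out


def bcnf_decompose(attrs, fds):
    """Return a set of schemas (frozensets of attributes), each in BCNF."""
    attrs = frozenset(attrs)
    for lhs, rhs in fds:
        plus = frozenset(closure(lhs, fds)) & attrs
        if not rhs <= lhs and plus != attrs:          # step 2: a BCNF violation X -> Y
            r1 = plus                                 # R1 = X+
            r2 = lhs | (attrs - plus)                 # R2 = X plus attributes not in X+
            return (bcnf_decompose(r1, project_fds(r1, fds))      # steps 3 and 4
                    | bcnf_decompose(r2, project_fds(r2, fds)))
    return {attrs}                                    # step 1: already in BCNF


# The relation and FD's of Example 3.19.
R0 = {"title", "year", "studioName", "president", "presAddr"}
S0 = [(frozenset({"title", "year"}), frozenset({"studioName"})),
      (frozenset({"studioName"}), frozenset({"president"})),
      (frozenset({"president"}), frozenset({"presAddr"}))]

for schema in sorted(bcnf_decompose(R0, S0), key=sorted):
    print(sorted(schema))
```

On this input the sketch reproduces the three schemas of Example 3.19. A different choice of violating FD can lead to a different, equally valid, BCNF decomposition.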

3.3.5 Exercises for Section 3.3


Exercise 3.3.1: For each of the following relation schemas and sets of FD’s:

a) R(A, B, C, D) with FD’s AB → C, C → D, and D → A.
b) R(A, B, C, D) with FD’s B → C and B → D.
c) R(A, B, C, D) with FD’s AB → C, BC → D, CD → A, and AD → B.
d) R(A, B, C, D) with FD’s A → B, B → C, C → D, and D → A.
e) R(A, B, C, D, E) with FD’s AB → C, DE → C, and B → D.
f) R(A, B, C, D, E) with FD’s AB → C, C → D, D → B, and D → E.

do the following:

i) Indicate all the BCNF violations. Do not forget to consider FD’s that are
not in the given set, but follow from them. However, it is not necessary
to give violations that have more than one attribute on the right side.
ii) Decompose the relations, as necessary, into collections of relations that
are in BCNF.

Exercise 3.3.2: We mentioned in Section 3.3.4 that we would exercise our
option to expand the right side of an FD that is a BCNF violation if possible.
Consider a relation R whose schema is the set of attributes {A, B, C, D} with
FD’s A → B and A → C. Either is a BCNF violation, because the only key
for R is {A, D}. Suppose we begin by decomposing R according to A → B. Do
we ultimately get the same result as if we first expand the BCNF violation to
A → BC? Why or why not?

! Exercise 3.3.3: Let R be as in Exercise 3.3.2, but let the FD’s be A → B and
B → C. Again compare decomposing using A → B first against decomposing
by A → BC first.

! Exercise 3.3.4: Suppose we have a relation schema R(A, B, C) with FD
A → B. Suppose also that we decide to decompose this schema into S(A, B) and
T(B, C). Give an example of an instance of relation R whose projection onto
S and T and subsequent rejoining as in Section 3.4.1 does not yield the same
relation instance. That is, $\pi_{A,B}(R) \bowtie \pi_{B,C}(R) \neq R$.

3.4 Decomposition: The Good, Bad, and Ugly


So far, we observed that before we decompose a relation schema into BCNF,
it can exhibit anomalies; after we decompose, the resulting relations do not
exhibit anomalies. T h at’s the “good.” But decomposition can also have some
bad, if not downright ugly, consequences. In this section, we shall consider
three distinct properties we would like a decomposition to have.

1. Elimination of Anomalies by decomposition as in Section 3.3.

2. Recoverability of Information. Can we recover the original relation from
the tuples in its decomposition?

3. Preservation of Dependencies. If we check the projected FD’s in the relations
of the decomposition, can we be sure that when we reconstruct
the original relation from the decomposition by joining, the result will
satisfy the original FD’s?
94 CHAPTER 3. DESIGN THEORY FOR RELATIONAL DATABASES

It turns out that the BCNF decomposition of Algorithm 3.20 gives us (1) and
(2), but does not necessarily give us all three. In Section 3.5 we shall see another
way to pick a decomposition that gives us (2) and (3) but does not necessarily
give us (1). In fact, there is no way to get all three at once.

3.4.1 Recovering Information from a Decomposition


Since we learned that every two-attribute relation is in BCNF, why did we
have to go through the trouble of Algorithm 3.20? Why not just take any
relation R and decompose it into relations, each of whose schemas is a pair of
R’s attributes? The answer is that the data in the decomposed relations, even
if their tuples were each the projection of a relation instance of R, might not
allow us to join the relations of the decomposition and get the instance of R
back. If we do get R back, then we say the decomposition has a lossless join.
However, if we decompose using Algorithm 3.20, where all decompositions
are motivated by a BCNF-violating FD, then the projections of the original
tuples can be joined again to produce all and only the original tuples. We shall
consider why here. Then, in Section 3.4.2 we shall give an algorithm called the
“chase,” for testing whether the projection of a relation onto any decomposition
allows us to recover the relation by rejoining.
To simplify the situation, consider a relation R(A, B, C) and an FD B → C
that is a BCNF violation. The decomposition based on the FD B → C separates
the attributes into relations R1(A, B) and R2(B, C).
Let t be a tuple of R. We may write t = (a, b, c), where a, b, and c are the
components of t for attributes A, B, and C, respectively. Tuple t projects as
(a, b) in R1(A, B) = π_{A,B}(R) and as (b, c) in R2(B, C) = π_{B,C}(R). When we
compute the natural join R1 ⋈ R2, these two projected tuples join, because
they agree on the common B component (they both have b there). They give
us t = (a, b, c), the tuple we started with, in the join. That is, regardless of
what tuple t we started with, we can always join its projections to get t back.
However, getting back those tuples we started with is not enough to assure
that the original relation R is truly represented by the decomposition. Consider
what happens if there are two tuples of R, say t = (a, b, c) and v = (d, b, e).
When we project t onto R1(A, B) we get u = (a, b), and when we project v onto
R2(B, C) we get w = (b, e). These tuples also match in the natural join, and
the resulting tuple is x = (a, b, e). Is it possible that x is a bogus tuple? That
is, could (a, b, e) not be a tuple of R?
Since we assume the FD B → C for relation R, the answer is “no.” Recall
that this FD says any two tuples of R that agree in their B components must
also agree in their C components. Since t and v agree in their B components,
they also agree on their C components. That means c = e; i.e., the two values
we supposed were different are really the same. Thus, tuple (a, b, e) of R is
really (a, b, c); that is, x = t.
Since t is in R, it must be that x is in R. Put another way, as long as FD
B → C holds, the joining of two projected tuples cannot produce a bogus tuple.
Rather, every tuple produced by the natural join is guaranteed to be a tuple of
R.
This argument works in general. We assumed A, B, and C were each
single attributes, but the same argument would apply if they were any sets
of attributes X, Y, and Z. That is, if Y → Z holds in R, whose attributes are
X ∪ Y ∪ Z, then R = π_{X∪Y}(R) ⋈ π_{Y∪Z}(R).
We may conclude:

• If we decompose a relation according to Algorithm 3.20, then the original
relation can be recovered exactly by the natural join.

To see why, we argued above that at any one step of the recursive decomposition,
a relation is equal to the join of its projections onto the two components. If
those components are decomposed further, they can also be recovered by the
natural join from their decomposed relations. Thus, an easy induction on the
number of binary decomposition steps says that the original relation is always
the natural join of whatever relations it is decomposed into. We can also prove
that the natural join is associative and commutative, so the order in which we
perform the natural join of the decomposition components does not matter.
The FD Y → Z, or its symmetric FD Y → X, is essential. Without one of
these FD’s, we might not be able to recover the original relation. Here is an
example.

Example 3.21: Suppose we have the relation R(A, B, C) as above, but neither
of the FD’s B → A nor B → C holds. Then R might consist of the two tuples

A   B   C
1   2   3
4   2   5

The projections of R onto the relations with schemas {A, B} and {B, C}
are R1 = π_{A,B}(R) =

A   B
1   2
4   2

and R2 = π_{B,C}(R) =

B   C
2   3
2   5

respectively. Since all four tuples share the same B-value, 2, each tuple of one
relation joins with both tuples of the other relation. When we try to reconstruct
R by the natural join of the projected relations, we get R3 = R1 ⋈ R2 =

A   B   C
1   2   3
1   2   5
4   2   3
4   2   5

That is, we get “too much”; we get two bogus tuples, (1, 2, 5) and (4, 2, 3), that
were not in the original relation R. □

Is Join the Only Way to Recover?

We have assumed that the only possible way we could reconstruct a relation
from its projections is to use the natural join. However, might there
be some other algorithm to reconstruct the original relation that would
work even in cases where the natural join fails? There is in fact no such
other way. In Example 3.21, the relations R and R3 are different instances,
yet have exactly the same projections onto {A, B} and {B, C}, namely the
instances we called R1 and R2, respectively. Thus, given R1 and R2, no
algorithm whatsoever can tell whether the original instance was R or R3.
Moreover, this example is not unusual. Given any decomposition of
a relation with attributes X ∪ Y ∪ Z into relations with schemas X ∪ Y
and Y ∪ Z, where neither Y → X nor Y → Z holds, we can construct
an example similar to Example 3.21 where the original instance cannot be
determined from its projections.
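
The project-and-rejoin experiment of Example 3.21 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names project, natural_join, and rejoin are hypothetical, relations are modeled as sets of tuples, and the instance R_fd (in which B → C holds) is an assumed example that does not appear in the text.

```python
def project(relation, schema, attrs):
    """Return the projection of `relation` (a set of tuples) onto `attrs`."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}


def natural_join(r1, schema1, r2, schema2):
    """Natural join of two relations given as sets of tuples plus schemas."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    result = set()
    for t1 in r1:
        for t2 in r2:
            # Tuples join only if they agree on every shared attribute.
            if all(t1[schema1.index(a)] == t2[schema2.index(a)] for a in common):
                result.add(t1 + tuple(t2[schema2.index(a)] for a in extra))
    return result


def rejoin(relation):
    """Project R(A, B, C) onto {A, B} and {B, C}, then join the projections."""
    schema = ["A", "B", "C"]
    r1 = project(relation, schema, ["A", "B"])
    r2 = project(relation, schema, ["B", "C"])
    return natural_join(r1, ["A", "B"], r2, ["B", "C"])


R = {(1, 2, 3), (4, 2, 5)}      # Example 3.21: neither B -> A nor B -> C holds
R_fd = {(1, 2, 3), (4, 2, 3)}   # assumed instance in which B -> C does hold

print(sorted(rejoin(R)))        # contains the bogus tuples (1, 2, 5) and (4, 2, 3)
print(rejoin(R_fd) == R_fd)     # the join recovers the instance exactly
```

With R the join returns four tuples, as in the example; with R_fd, where B → C holds, the join returns exactly the original two tuples.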

3.4.2 The Chase Test for Lossless Join


In Section 3.4.1 we argued why a particular decomposition, that of R(A, B, C)
into {A, B} and {B, C}, with a particular FD, B → C, had a lossless join.
Now, consider a more general situation. We have decomposed relation R into
relations with sets of attributes S1, S2, ..., Sk. We have a given set of FD’s
F that hold in R. Is it true that if we project R onto the relations of the
decomposition, then we can recover R by taking the natural join of all these
relations? That is, is it true that π_{S1}(R) ⋈ π_{S2}(R) ⋈ ··· ⋈ π_{Sk}(R) = R? Three
important things to remember are:

• The natural join is associative and commutative. It does not matter in
what order we join the projections; we shall get the same relation as a
result. In particular, the result is the set of tuples t such that for all
i = 1, 2, ..., k, t projected onto the set of attributes S_i is a tuple in
π_{S_i}(R).

• Any tuple t in R is surely in π_{S1}(R) ⋈ π_{S2}(R) ⋈ ··· ⋈ π_{Sk}(R). The
reason is that the projection of t onto S_i is surely in π_{S_i}(R) for each i,
and therefore by our first point above, t is in the result of the join.

• As a consequence, π_{S1}(R) ⋈ π_{S2}(R) ⋈ ··· ⋈ π_{Sk}(R) = R when the FD’s
in F hold for R if and only if every tuple in the join is also in R. That is,
the membership test is all we need to verify that the decomposition has
a lossless join.

The chase test for a lossless join is just an organized way to see whether a
tuple t in π_{S1}(R) ⋈ π_{S2}(R) ⋈ ··· ⋈ π_{Sk}(R) can be proved, using the FD’s in
F, also to be a tuple in R. If t is in the join, then there must be tuples in R,
say t1, t2, ..., tk, such that t is the join of the projections of each t_i onto the
set of attributes S_i, for i = 1, 2, ..., k. We therefore know that t_i agrees with t
on the attributes of S_i, but t_i has unknown values in its components not in S_i.
We draw a picture of what we know, called a tableau. Assuming R has
attributes A, B, ... we use a, b, ... for the components of t. For t_i, we use the
same letter as t in the components that are in S_i, but we subscript the letter
with i if the component is not in S_i. In that way, t_i will agree with t for the
attributes of S_i, but have a unique value (one that can appear nowhere else
in the tableau) for other attributes.

Example 3.22: Suppose we have relation R(A, B, C, D), which we have
decomposed into relations with sets of attributes S1 = {A, D}, S2 = {A, C}, and
S3 = {B, C, D}. Then the tableau for this decomposition is shown in Fig. 3.9.

A    B    C    D
a    b1   c1   d
a    b2   c    d2
a3   b    c    d

Figure 3.9: Tableau for the decomposition of R into {A, D}, {A, C}, and
{B, C, D}

The first row corresponds to the set of attributes {A, D}. Notice that the
components for attributes A and D are the unsubscripted letters a and d.
However, for the other attributes, B and C, we add the subscript 1 to indicate that
they are arbitrary values. This choice makes sense, since the tuple (a, b1, c1, d)
represents a tuple of R that contributes to t = (a, b, c, d) by being projected onto
{A, D} and then joined with other tuples. Since the B- and C-components of
this tuple are projected out, we know nothing yet about what values the tuple
had for those attributes.
Similarly, the second row has the unsubscripted letters in attributes A and
C, while the subscript 2 is used for the other attributes. The last row has the
unsubscripted letters in components for {B, C, D} and subscript 3 on a. Since
each row uses its own number as a subscript, the only symbols that can appear
more than once are the unsubscripted letters. □
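
The construction of the initial tableau is mechanical, and a short Python sketch may make the rule concrete. The function name initial_tableau and the encoding of symbols as strings such as "b1" are assumptions made for illustration; attributes are taken to be single uppercase letters.

```python
def initial_tableau(attributes, decomposition):
    """Build the initial chase tableau.

    Row i gets the unsubscripted (lowercase) letter for each attribute in
    the i-th schema, and the letter subscripted with i everywhere else.
    """
    rows = []
    for i, schema in enumerate(decomposition, start=1):
        row = [a.lower() if a in schema else f"{a.lower()}{i}" for a in attributes]
        rows.append(row)
    return rows


# Decomposition of Example 3.22: {A, D}, {A, C}, {B, C, D}
tableau = initial_tableau("ABCD", [{"A", "D"}, {"A", "C"}, {"B", "C", "D"}])
for row in tableau:
    print(row)
```

The three printed rows correspond to the three rows of Fig. 3.9.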

Remember that our goal is to use the given set of FD’s F to prove that t is
really in R. In order to do so, we “chase” the tableau by applying the FD’s in
F to equate symbols in the tableau whenever we can. If we discover that one of
the rows is actually the same as t (that is, the row becomes all unsubscripted
symbols), then we have proved that any tuple t in the join of the projections
was actually a tuple of R.
To avoid confusion, when equating two symbols, if one of them is unsubscripted,
make the other be the same. If both symbols are subscripted,
you can change either to be the other. In either case,
remember that when equating symbols, you must change all occurrences of one
to be the other, not just some of the occurrences.

Example 3.23: Let us continue with the decomposition of Example 3.22, and
suppose the given FD’s are A → B, B → C, and CD → A. Start with the
tableau of Fig. 3.9. Since the first two rows agree in their A-components, the FD
A → B tells us they must also agree in their B-components. That is, b1 = b2.
We can replace either one with the other, since they are both subscripted. Let
us replace b2 by b1. Then the resulting tableau is:

A    B    C    D
a    b1   c1   d
a    b1   c    d2
a3   b    c    d

Now, we see that the first two rows have equal B-values, and so we may use
the FD B → C to deduce that their C-components, c1 and c, are the same.
Since c is unsubscripted, we replace c1 by c, leaving:

A    B    C    D
a    b1   c    d
a    b1   c    d2
a3   b    c    d

Next, we observe that the first and third rows agree in both columns C and
D. Thus, we may apply the FD CD → A to deduce that these rows also have
the same A-value; that is, a = a3. We replace a3 by a, giving us:

A    B    C    D
a    b1   c    d
a    b1   c    d2
a    b    c    d

At this point, we see that the last row has become equal to t, that is,
(a, b, c, d). We have proved that if R satisfies the FD’s A → B, B → C, and
CD → A, then whenever we project onto {A, D}, {A, C}, and {B, C, D} and
rejoin, what we get must have been in R. In particular, what we get is the same
as the tuple of R that we projected onto {B, C, D}. □
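
The chase procedure just illustrated can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name chase_lossless and the string encoding of attributes and FD’s are assumptions, attributes are single uppercase letters, and when two subscripted symbols are equated the sketch may keep a different one than the text does (either choice is allowed).

```python
def chase_lossless(attributes, decomposition, fds):
    """Chase test for a lossless join.

    attributes:    string of single-letter attribute names, e.g. "ABCD"
    decomposition: list of strings, each the attributes of one schema
    fds:           list of (lhs, rhs) pairs of attribute strings
    Returns (is_lossless, final_tableau).
    """
    col = {a: k for k, a in enumerate(attributes)}
    # Initial tableau: unsubscripted letter inside the schema, subscripted outside.
    rows = [[a.lower() if a in s else f"{a.lower()}{i}" for a in attributes]
            for i, s in enumerate(decomposition, start=1)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[col[a]] != r2[col[a]] for a in lhs):
                        continue
                    # r1 and r2 agree on the left side: equate the right side.
                    for a in rhs:
                        x, y = r1[col[a]], r2[col[a]]
                        if x == y:
                            continue
                        # Prefer to keep the unsubscripted symbol, if there is one.
                        keep, drop = (x, y) if x == a.lower() else (y, x)
                        for row in rows:  # replace ALL occurrences of `drop`
                            if row[col[a]] == drop:
                                row[col[a]] = keep
                        changed = True
    target = [a.lower() for a in attributes]
    return any(row == target for row in rows), rows


# Example 3.23: FD's A -> B, B -> C, CD -> A
lossless, final = chase_lossless("ABCD", ["AD", "AC", "BCD"],
                                 [("A", "B"), ("B", "C"), ("CD", "A")])
print(lossless)
```

Each equating step reduces the number of distinct symbols in the tableau, so the loop must terminate. The same function can be applied to any other decomposition and set of FD’s expressed in this encoding.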

3.4.3 Why the Chase Works


There are two issues to address:
1. When the chase results in a row that matches the tuple t (i.e., the tableau
is shown to have a row with all unsubscripted variables), why must the
join be lossless?
2. When, after applying FD’s whenever we can, we still find no row of all
unsubscripted variables, why must the join not be lossless?

Question (1) is easy to answer. The chase process itself is a proof that one
of the projected tuples from R must in fact be the tuple t that is produced by
the join. We also know that every tuple in R is sure to come back if we project
and join. Thus, the chase has proved that the result of projection and join is
exactly R.
For the second question, suppose that we eventually derive a tableau without
an unsubscripted row, and that this tableau does not allow us to apply any of
the FD’s to equate any symbols. Then think of the tableau as an instance of the
relation R. It obviously satisfies the given FD’s, because none can be applied
to equate symbols. We know that the ith row has unsubscripted symbols in the
attributes of S_i, the ith relation of the decomposition. Thus, when we project
this relation onto the S_i’s and take the natural join, we get the tuple with all
unsubscripted variables. This tuple is not in R, so we conclude that the join is
not lossless.
Example 3.24: Consider the relation R(A, B, C, D) with the FD B → AD
and the proposed decomposition {A, B}, {B, C}, and {C, D}. Here is the initial
tableau:

A    B    C    D
a    b    c1   d1
a2   b    c    d2
a3   b3   c    d

When we apply the lone FD, we deduce that a = a2 and d1 = d2. Thus, the
final tableau is:

A    B    C    D
a    b    c1   d1
a    b    c    d1
a3   b3   c    d

No more changes can be made because of the given FD’s, and there is no
row that is fully unsubscripted. Thus, this decomposition does not have a
lossless join. We can verify that fact by treating the above tableau as a relation
with three tuples. When we project onto {A, B}, we get {(a, b), (a3, b3)}.
The projection onto {B, C} is {(b, c1), (b, c), (b3, c)}, and the projection onto
{C, D} is {(c1, d1), (c, d1), (c, d)}. If we join the first two projections, we get
{(a, b, c1), (a, b, c), (a3, b3, c)}. Joining this relation with the third projection
gives {(a, b, c1, d1), (a, b, c, d1), (a, b, c, d), (a3, b3, c, d1), (a3, b3, c, d)}. Notice
that this join has two more tuples than R, and in particular it has the tuple
(a, b, c, d), as it must. □

3.4.4 Dependency Preservation


We mentioned that it is not possible, in some cases, to decompose a relation into
BCNF relations that have both the lossless-join and dependency-preservation
properties. Below is an example where we need to make a tradeoff between
preserving dependencies and BCNF.

Example 3.25: Suppose we have a relation Bookings with attributes:

1. title, the name of a movie.

2. theater, the name of a theater where the movie is being shown.

3. city, the city where the theater is located.

The intent behind a tuple (m, t, c) is that the movie with title m is currently
being shown at theater t in city c.
We might reasonably assert the following FD’s:

theater → city
title city → theater

The first says that a theater is located in one city. The second is not obvious
but is based on the common practice of not booking a movie into two theaters
in the same city. We shall assert this FD if only for the sake of the example.
Let us first find the keys. No single attribute is a key. For example, title
is not a key because a movie can play in several theaters at once and in several
cities at once.² Also, theater is not a key, because although theater functionally
determines city, there are multiscreen theaters that show many movies
at once. Thus, theater does not determine title. Finally, city is not a key
because cities usually have more than one theater and more than one movie
playing.
² In this example we assume that there are not two “current” movies with the same title,
even though we have previously recognized that there could be two movies with the same
title made in different years.

On the other hand, two of the three sets of two attributes are keys. Clearly
{title, city} is a key because of the given FD that says these attributes
functionally determine theater.
It is also true that {theater, title} is a key, because its closure includes
city due to the given FD theater → city. The remaining pair of attributes,
city and theater, do not functionally determine title, because of multiscreen
theaters, and are therefore not a key. We conclude that the only two keys are

{title, city}
{theater, title}

Now we immediately see a BCNF violation. We were given functional
dependency theater → city, but its left side, theater, is not a superkey. We
are therefore tempted to decompose, using this BCNF-violating FD, into the
two relation schemas:

{theater, city}
{theater, title}

There is a problem with this decomposition, concerning the FD

title city → theater

There could be current relations for the decomposed schemas that satisfy the
FD theater → city (which can be checked in the relation {theater, city})
but that, when joined, yield a relation not satisfying title city → theater.
For instance, the two relations

theater   city
Guild     Menlo Park
Park      Menlo Park

and

theater   title
Guild     Antz
Park      Antz

are permissible according to the FD’s that apply to each of the above relations,
but when we join them we get two tuples

theater   city         title
Guild     Menlo Park   Antz
Park      Menlo Park   Antz

that violate the FD title city → theater. □
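
The failure of dependency preservation in Example 3.25 can be checked directly on the two small relations. The following Python sketch assumes relations are lists of dictionaries keyed by attribute name; the helper satisfies_fd is a hypothetical name introduced only for this illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on `lhs` also agrees on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first right-side value seen for each left-side value.
        if seen.setdefault(key, val) != val:
            return False
    return True


theater_city = [
    {"theater": "Guild", "city": "Menlo Park"},
    {"theater": "Park", "city": "Menlo Park"},
]
theater_title = [
    {"theater": "Guild", "title": "Antz"},
    {"theater": "Park", "title": "Antz"},
]

# Natural join of the two relations on their common attribute, theater.
joined = [{**tc, **tt}
          for tc in theater_city
          for tt in theater_title
          if tc["theater"] == tt["theater"]]

print(satisfies_fd(theater_city, ["theater"], ["city"]))       # FD checkable locally
print(satisfies_fd(joined, ["title", "city"], ["theater"]))    # FD checked on the join
```

The first check succeeds on the decomposed relation, while the second fails on the join, which is exactly the problem the example describes: the FD title city → theater cannot be checked in either decomposed relation alone.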

3.4.5 Exercises for Section 3.4


Exercise 3.4.1: Let R(A, B, C, D, E) be decomposed into relations with the
following three sets of attributes: {A, B, C}, {B, C, D}, and {A, C, E}. For each
of the following sets of FD’s, use the chase test to tell whether the decomposition
of R is lossless. For those that are not lossless, give an example of an instance
of R that returns more than R when projected onto the decomposed relations
and rejoined.
a) B → E and CE → A.
b) AC → E and BC → D.
c) A → D, D → E, and B → D.
d) A → D, CD → E, and E → D.
Exercise 3.4.2: For each of the sets of FD’s in Exercise 3.4.1, are dependencies
preserved by the decomposition?

3.5 Third Normal Form


The solution to the problem illustrated by Example 3.25 is to relax our BCNF
requirement slightly, in order to allow the occasional relation schema that cannot
be decomposed into BCNF relations without our losing the ability to check
the FD’s. This relaxed condition is called “third normal form.” In this section
we shall give the requirements for third normal form, and then show how to
do a decomposition in a manner quite different from Algorithm 3.20, in order
to obtain relations in third normal form that have both the lossless-join and
dependency-preservation properties.

3.5.1 Definition of Third Normal Form


A relation R is in third normal form (3NF) if:
• Whenever A1 A2 ··· An → B1 B2 ··· Bm is a nontrivial FD, either
{A1, A2, ..., An}
is a superkey, or those of B1, B2, ..., Bm that are not among the A’s, are
each a member of some key (not necessarily the same key).
An attribute that is a member of some key is often said to be prime. Thus, the
3NF condition can be stated as “for each nontrivial FD, either the left side is a
superkey, or the right side consists of prime attributes only.”
Note that the difference between this 3NF condition and the BCNF condition
is the clause “is a member of some key (i.e., prime).” This clause “excuses”
an FD like theater → city in Example 3.25, because the right side, city, is
prime.
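
The difference between the two conditions can be seen by testing the FD’s of Example 3.25 against both. The Python sketch below is a simplification under stated assumptions: it examines only the FD’s it is given (not every FD that follows from them), finds keys by brute force, and uses hypothetical helper names (closure, candidate_keys, violations).

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under the given FD's."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """All minimal sets of attributes whose closure is every attribute."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) == set(all_attrs):
                keys.append(s)
    return keys


def violations(all_attrs, fds):
    """Return (BCNF violations, 3NF violations) among the given FD's only."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        new = set(rhs) - set(lhs)
        if not new or closure(lhs, fds) == set(all_attrs):
            continue  # trivial FD, or left side is a superkey
        bcnf.append((lhs, rhs))
        if not new <= prime:
            third.append((lhs, rhs))
    return bcnf, third


attrs = {"title", "theater", "city"}
fds = [(("theater",), ("city",)), (("title", "city"), ("theater",))]
bcnf_bad, third_bad = violations(attrs, fds)
print(bcnf_bad)   # theater -> city violates BCNF
print(third_bad)  # but nothing violates 3NF, since city is prime
```

For Bookings, every attribute belongs to one of the two keys, so the BCNF-violating FD theater → city is excused by the 3NF condition.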
3.5. THIRD NORMAL FORM 103

Other Normal Forms


If there is a “third normal form,” what happened to the first two “normal
forms”? They indeed were defined, but today there is little use for
them. First normal form is simply the condition that every component
of every tuple is an atomic value. Second normal form is a less restrictive
version of 3NF. There is also a “fourth normal form” that we shall meet
in Section 3.6.

3.5.2 The Synthesis Algorithm for 3NF Schemas


We can now explain and justify how we decompose a relation R into a set of
relations such that:

a) The relations of the decomposition are all in 3NF.

b) The decomposition has a lossless join.

c) The decomposition has the dependency-preservation property.

Algorithm 3.26: Synthesis of Third-Normal-Form Relations With a Lossless
Join and Dependency Preservation.
INPUT: A relation R and a set F of functional dependencies that hold for R.

OUTPUT: A decomposition of R into a collection of relations, each of which is
in 3NF. The decomposition has the lossless-join and dependency-preservation
properties.
METHOD: Perform the following steps:

1. Find a minimal basis for F, say G.

2. For each functional dependency X → A in G, use XA as the schema of
one of the relations in the decomposition.

3. If none of the relation schemas from Step 2 is a superkey for R, add
another relation whose schema is a key for R.


Example 3.27: Consider the relation R(A, B, C, D, E) with FD’s AB → C,
C → B, and A → D. To start, notice that the given FD’s are their own
minimal basis. To check, we need to do a bit of work. First, we need to verify
that we cannot eliminate any of the given dependencies. That is, we show,
using Algorithm 3.7, that no two of the FD’s imply the third. For example,
we must take the closure of {A, B}, the left side of the first FD, using only the
second and third FD’s, C → B and A → D. This closure includes D but not
C, so we conclude that the first FD AB → C is not implied by the second and
third FD’s. We get a similar conclusion if we try to drop the second or third
FD.
We must also verify that we cannot eliminate any attributes from a left
side. In this simple case, the only possibility is that we could eliminate A or
B from the first FD. For example, if we eliminate A, we would be left with
B → C. We must show that B → C is not implied by the three original FD’s,
AB → C, C → B, and A → D. With these FD’s, the closure of {B} is just B,
so B → C does not follow. A similar conclusion is drawn if we try to drop B
from AB → C. Thus, we have our minimal basis.
We start the 3NF synthesis by taking the attributes of each FD as a relation
schema. That is, we get relations S1(A, B, C), S2(B, C), and S3(A, D). It is
never necessary to use a relation whose schema is a proper subset of another
relation’s schema, so we can drop S2.
We must also consider whether we need to add a relation whose schema is
a key. In this example, R has two keys: {A, B, E} and {A, C, E}, as you can
verify. Neither of these keys is a subset of the schemas chosen so far. Thus, we
must add one of them, say S4(A, B, E). The final decomposition of R is thus
S1(A, B, C), S3(A, D), and S4(A, B, E). □
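
Steps 2 and 3 of Algorithm 3.26 can be sketched in Python as shown below. This sketch assumes the FD’s passed in are already a minimal basis (Step 1 is not implemented), represents attributes as single letters, and finds a key by brute force; the names closure, find_key, and synthesize_3nf are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under the given FD's."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(all_attrs, fds):
    """Return one smallest key, found by trying attribute sets in size order."""
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if closure(combo, fds) == set(all_attrs):
                return set(combo)


def synthesize_3nf(all_attrs, minimal_basis):
    """Steps 2 and 3 of 3NF synthesis; `minimal_basis` is assumed minimal."""
    # Step 2: one schema per FD, made of the FD's left and right sides.
    schemas = []
    for lhs, rhs in minimal_basis:
        s = set(lhs) | set(rhs)
        if s not in schemas:
            schemas.append(s)
    # Drop any schema that is a proper subset of another schema.
    schemas = [s for s in schemas if not any(s < other for other in schemas)]
    # Step 3: add a key if no schema is already a superkey.
    if not any(closure(s, minimal_basis) == set(all_attrs) for s in schemas):
        schemas.append(find_key(all_attrs, minimal_basis))
    return schemas


# Example 3.27: R(A, B, C, D, E) with AB -> C, C -> B, A -> D
result = synthesize_3nf("ABCDE", [("AB", "C"), ("C", "B"), ("A", "D")])
print([sorted(s) for s in result])
```

On the FD’s of Example 3.27 the sketch yields the schemas {A, B, C}, {A, D}, and {A, B, E}; which key is added in Step 3 depends on the search order, and {A, C, E} would serve equally well.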

3.5.3 Why the 3NF Synthesis Algorithm Works


We need to show three things: that the lossless-join and dependency-preservation
properties hold, and that all the relations of the decomposition are in
3NF.
1. Lossless Join. Start with a relation of the decomposition whose set of
attributes K is a superkey. Consider the sequence of FD’s that are used
in Algorithm 3.7 to expand K to become K+. Since K is a superkey,
we know K+ is all the attributes. The same sequence of FD applications
on the tableau causes the subscripted symbols in the row corresponding
to K to be equated to unsubscripted symbols in the same order as the
attributes were added to the closure. Thus, the chase test concludes that
the decomposition is lossless.
2. Dependency Preservation. Each FD of the minimal basis has all its attributes
in some relation of the decomposition. Thus, each dependency
can be checked in the decomposed relations.
3. Third Normal Form. If we have to add a relation whose schema is a key,
then this relation is surely in 3NF. The reason is that all attributes of this
relation are prime, and thus no violation of 3NF could be present in this
relation. For the relations whose schemas are derived from the FD’s of a
minimal basis, the proof that they are in 3NF is beyond the scope of this
book. The argument involves showing that a 3NF violation implies that
the basis is not minimal.
3.6. MULTIVALUED DEPENDENCIES 105

3.5.4 Exercises for Section 3.5


Exercise 3.5.1: For each of the relation schemas and sets of FD’s of Exercise 3.3.1:

i) Indicate all the 3NF violations.

ii) Decompose the relations, as necessary, into collections of relations that
are in 3NF.

Exercise 3.5.2: Consider the relation Courses(C, T, H, R, S, G), whose attributes
may be thought of informally as course, teacher, hour, room, student,
and grade. Let the set of FD’s for Courses be C → T, HR → C, HT → R,
HS → R, and CS → G. Intuitively, the first says that a course has a unique
teacher, and the second says that only one course can meet in a given room at
a given hour. The third says that a teacher can be in only one room at a given
hour, and the fourth says the same about students. The last says that students
get only one grade in a course.

a) What are all the keys for Courses?

b) Verify that the given FD’s are their own minimal basis.

c) Use the 3NF synthesis algorithm to find a lossless-join, dependency-preserving
decomposition of R into 3NF relations. Are any of the relations
not in BCNF?

Exercise 3.5.3: Consider a relation Stocks(B, O, I, S, Q, D), whose attributes
may be thought of informally as broker, office (of the broker), investor, stock,
quantity (of the stock owned by the investor), and dividend (of the stock). Let
the set of FD’s for Stocks be S → D, I → B, IS → Q, and B → O. Repeat
Exercise 3.5.2 for the relation Stocks.

Exercise 3.5.4: Verify, using the chase, that the decomposition of Example 3.27
has a lossless join.

!! Exercise 3.5.5: Suppose we modified Algorithm 3.20 (BCNF decomposition)
so that instead of decomposing a relation R whenever R was not in BCNF, we
only decomposed R if it was not in 3NF. Provide a counterexample to show that
this modified algorithm would not necessarily produce a 3NF decomposition
with dependency preservation.

3.6 Multivalued Dependencies


A “multivalued dependency” is an assertion that two attributes or sets of attributes
are independent of one another. This condition is, as we shall see,
a generalization of the notion of a functional dependency, in the sense that
every FD implies the corresponding multivalued dependency. However, there
are some situations involving independence of attribute sets that cannot be
explained as FD’s. In this section we shall explore the cause of multivalued
dependencies and see how they can be used in database schema design.

3.6.1 Attribute Independence and Its Consequent Redundancy

There are occasional situations where we design a relation schema and find it is
in BCNF, yet the relation has a kind of redundancy that is not related to FD’s.
The most common source of redundancy in BCNF schemas is an attempt to
put two or more set-valued properties of the key into a single relation.

Example 3.28: In this example, we shall suppose that stars may have several
addresses, which we break into street and city components. The set of addresses
is one of the set-valued properties this relation will store. The second set-valued
property of stars that we shall put into this relation is the set of titles and years
of movies in which the star appeared. Then Fig. 3.10 is a typical instance of
this relation.

name        street           city        title                 year
C. Fisher   123 Maple St.    Hollywood   Star Wars             1977
C. Fisher   5 Locust Ln.     Malibu      Star Wars             1977
C. Fisher   123 Maple St.    Hollywood   Empire Strikes Back   1980
C. Fisher   5 Locust Ln.     Malibu      Empire Strikes Back   1980
C. Fisher   123 Maple St.    Hollywood   Return of the Jedi    1983
C. Fisher   5 Locust Ln.     Malibu      Return of the Jedi    1983

Figure 3.10: Sets of addresses independent from movies

We focus in Fig. 3.10 on Carrie Fisher’s two hypothetical addresses and her
three best-known movies. There is no reason to associate an address with one
movie and not another. Thus, the only way to express the fact that addresses
and movies are independent properties of stars is to have each address appear
with each movie. But when we repeat address and movie facts in all combinations,
there is obvious redundancy. For instance, Fig. 3.10 repeats each of
Carrie Fisher’s addresses three times (once for each of her movies) and each
movie twice (once for each address).
Yet there is no BCNF violation in the relation suggested by Fig. 3.10. There
are, in fact, no nontrivial FD’s at all. For example, attribute city is not
functionally determined by the other four attributes. There might be a star
with two homes that had the same street address in different cities. Then there
would be two tuples that agreed in all attributes but city and disagreed in
city. Thus,

name street title year → city

is not an FD for our relation. We leave it to the reader to check that none of
the five attributes is functionally determined by the other four. Since there are
no nontrivial FD’s, it follows that all five attributes form the only key and that
there are no BCNF violations. □

3.6.2 Definition of Multivalued Dependencies


A multivalued dependency (abbreviated MVD) is a statement about some rela­
tion R th at when you fix the values for one set of attributes, then the values in
certain other attributes are independent of the values of all the other attributes
in the relation. More precisely, we say the MVD

A 1 A 2 ■■■A n —>4 B 1 B 2 ■■■B m


holds for a relation R if when we restrict ourselves to the tuples of R that have
particular values for each of the attributes among the ,4’s, then the set of values
we find among the B ’s is independent of the set of values we find among the
attributes of R th at are not among the ,4’s or B ’s. Still more precisely, we say
this MVD holds if

For each pair of tuples t and u of relation R that agree on all the
j4 ’s , we can find in R some tuple v that agrees:

1. W ith both t and u on the A’s,


2. W ith t on the B ’s, and
3. W ith u on all attributes of R th at axe not among the A's or
B ’s.

Note that we can use this rule with t and u interchanged, to infer the existence
of a fourth tuple w that agrees with u on the B’s and with t on the other
attributes. As a consequence, for any fixed values of the A’s, the associated
values of the B’s and the other attributes appear in all possible combinations
in different tuples. Figure 3.11 suggests how v relates to t and u when an MVD
holds. However, the A’s and B’s do not have to appear consecutively.
In general, we may assume that the A’s and B’s (left side and right side) of
an MVD are disjoint. However, as with FD’s, it is permissible to add some of
the A’s to the right side if we wish.
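The three-part definition above can be checked mechanically on a given relation instance. The following Python sketch is only an illustration: the function name `mvd_holds`, the tuple-based representation, and the tiny sample relation are our own assumptions, not notation from the text.

```python
from itertools import product

def mvd_holds(rows, attrs, lhs, rhs):
    """Return True if the MVD lhs ->> rhs holds in this relation instance.

    rows  : iterable of tuples, one value per attribute
    attrs : attribute names, in the same order as the tuple components
    """
    idx = {a: i for i, a in enumerate(attrs)}
    rel = set(rows)
    for t, u in product(rel, repeat=2):
        # Only pairs that agree on every attribute of the left side matter.
        if any(t[idx[a]] != u[idx[a]] for a in lhs):
            continue
        # Build v: t's values on the A's and B's, u's values everywhere else.
        v = tuple(t[idx[a]] if (a in lhs or a in rhs) else u[idx[a]]
                  for a in attrs)
        if v not in rel:
            return False
    return True

attrs = ("A", "B", "C")
r = [("a", "b1", "c1"), ("a", "b2", "c2")]
print(mvd_holds(r, attrs, ["A"], ["B"]))       # False: (a, b1, c2) is missing

r_full = r + [("a", "b1", "c2"), ("a", "b2", "c1")]
print(mvd_holds(r_full, attrs, ["A"], ["B"]))  # True
```

Because the loop visits every ordered pair (t, u), the interchanged case that produces the fourth tuple w is covered automatically.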

Example 3.29: In Example 3.28 we encountered an MVD that in our notation
is expressed:

name ↠ street city

That is, for each star’s name, the set of addresses appears in conjunction with
each of the star’s movies. For an example of how the formal definition of this
MVD applies, consider the first and fourth tuples from Fig. 3.10:
108 CHAPTER 3. DESIGN THEORY FOR RELATIONAL DATABASES

Figure 3.11: A multivalued dependency guarantees that v exists

name        street          city        title                 year
C. Fisher   123 Maple St.   Hollywood   Star Wars             1977
C. Fisher   5 Locust Ln.    Malibu      Empire Strikes Back   1980

If we let the first tuple be t and the second be u, then the MVD asserts
that we must also find in R the tuple that has name C. Fisher, a street and
city that agree with the first tuple, and other attributes (title and year) that
agree with the second tuple. There is indeed such a tuple; it is the third tuple
of Fig. 3.10.
Similarly, we could let t be the second tuple above and u be the first. Then
the MVD tells us that there is a tuple of R that agrees with the second in
attributes name, street, and city and with the first in name, title, and year.
This tuple also exists; it is the second tuple of Fig. 3.10. □

3.6.3 Reasoning About Multivalued Dependencies


There are a number of rules about MVD’s that are similar to the rules we
learned for FD’s in Section 3.2. For example, MVD’s obey

• Trivial MVD’s. The MVD

$A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$

holds in any relation if $\{B_1, B_2, \ldots, B_m\} \subseteq \{A_1, A_2, \ldots, A_n\}$.

• The transitive rule, which says that if $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ and
$B_1 B_2 \cdots B_m \twoheadrightarrow C_1 C_2 \cdots C_k$ hold for some relation, then so does

$A_1 A_2 \cdots A_n \twoheadrightarrow C_1 C_2 \cdots C_k$

Any C’s that are also A’s must be deleted from the right side.

On the other hand, MVD’s do not obey the splitting part of the
splitting/combining rule, as the following example shows.

Example 3.30: Consider again Fig. 3.10, where we observed the MVD:

name ↠ street city

If the splitting rule applied to MVD’s, we would expect

name ↠ street

also to be true. This MVD says that each star’s street addresses are independent
of the other attributes, including city. However, that statement is false.
Consider, for instance, the first two tuples of Fig. 3.10. The hypothetical MVD
would allow us to infer that the tuples with the streets interchanged:

name        street          city        title       year
C. Fisher   5 Locust Ln.    Hollywood   Star Wars   1977
C. Fisher   123 Maple St.   Malibu      Star Wars   1977

were in the relation. But these are not true tuples, because, for instance, the
home on 5 Locust Ln. is in Malibu, not Hollywood. □

However, there are several new rules dealing with MVD’s that we can learn.

• FD Promotion. Every FD is an MVD. That is, if

$A_1 A_2 \cdots A_n \rightarrow B_1 B_2 \cdots B_m$

then $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$.

To see why, suppose R is some relation for which the FD

$A_1 A_2 \cdots A_n \rightarrow B_1 B_2 \cdots B_m$

holds, and suppose t and u are tuples of R that agree on the A’s. To show
that the MVD $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ holds, we have to show that R
also contains a tuple v that agrees with t and u on the A’s, with t on the B’s,
and with u on all other attributes. But v can be u. Surely u agrees with t and
u on the A’s, because we started by assuming that these two tuples agree on
the A’s. The FD $A_1 A_2 \cdots A_n \rightarrow B_1 B_2 \cdots B_m$ assures us that u agrees with t
on the B’s. And of course u agrees with itself on the other attributes. Thus,
whenever an FD holds, the corresponding MVD holds.

• Complementation Rule. If $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ is an MVD for
relation R, then R also satisfies $A_1 A_2 \cdots A_n \twoheadrightarrow C_1 C_2 \cdots C_k$, where the
C’s are all attributes of R not among the A’s and B’s.

That is, swapping the B’s between two tuples that agree in the A’s has the
same effect as swapping the C’s.

Example 3.31: Again consider the relation of Fig. 3.10, for which we asserted
the MVD:

name ↠ street city

The complementation rule says that

name ↠ title year

must also hold in this relation, because title and year are the attributes not
mentioned in the first MVD. The second MVD intuitively means that each star
has a set of movies starred in, which are independent of the star’s addresses. □

An MVD whose right side is a subset of the left side is trivial — it holds
in every relation. However, an interesting consequence of the complementation
rule is that there are some other MVD’s that are trivial, but that look distinctly
nontrivial.

• More Trivial MVD’s. If all the attributes of relation R are

$\{A_1, A_2, \ldots, A_n, B_1, B_2, \ldots, B_m\}$

then $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$ holds in R.

To see why these additional trivial MVD’s hold, notice that if we take two
tuples that agree in $A_1, A_2, \ldots, A_n$ and swap their components in attributes
$B_1, B_2, \ldots, B_m$, we get the same two tuples back, although in the opposite
order.
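The two kinds of trivial MVD can be recognized from the schema alone, without looking at any tuples. A minimal sketch follows, in which the helper name `is_trivial_mvd` is hypothetical and attributes are plain strings:

```python
def is_trivial_mvd(all_attrs, lhs, rhs):
    """True if lhs ->> rhs holds in every relation over all_attrs."""
    all_attrs, lhs, rhs = set(all_attrs), set(lhs), set(rhs)
    # Case 1: the right side is a subset of the left side.
    # Case 2: left and right side together cover every attribute.
    return rhs <= lhs or (lhs | rhs) == all_attrs

five = ["name", "street", "city", "title", "year"]
print(is_trivial_mvd(five, ["name"], ["street", "city"]))       # False
print(is_trivial_mvd(five[:3], ["name"], ["street", "city"]))   # True
print(is_trivial_mvd(five, ["name", "city"], ["city"]))         # True
```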

3.6.4 Fourth Normal Form


The redundancy that we found in Section 3.6.1 to be caused by MVD’s can be
eliminated if we use these dependencies for decomposition. In this section we
shall introduce a new normal form, called “fourth normal form.” In this normal
form, all nontrivial MVD’s are eliminated, as are all FD’s that violate BCNF.
As a result, the decomposed relations have neither the redundancy from FD’s
that we discussed in Section 3.3.1 nor the redundancy from MVD’s that we
discussed in Section 3.6.1.
The “fourth normal form” condition is essentially the BCNF condition, but
applied to MVD’s instead of FD’s. Formally:

• A relation R is in fourth normal form (4NF) if whenever

$A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$

is a nontrivial MVD, $\{A_1, A_2, \ldots, A_n\}$ is a superkey.

That is, if a relation is in 4NF, then every nontrivial MVD is really an FD with
a superkey on the left. Note that the notions of keys and superkeys depend on
FD’s only; adding MVD’s does not change the definition of “key.”

Example 3.32: The relation of Fig. 3.10 violates the 4NF condition. For
example,

name ↠ street city

is a nontrivial MVD, yet name by itself is not a superkey. In fact, the only key
for this relation is all the attributes. □

Fourth normal form is truly a generalization of BCNF. Recall from
Section 3.6.3 that every FD is also an MVD. Thus, every BCNF violation is also
a 4NF violation. Put another way, every relation that is in 4NF is therefore in
BCNF.
However, there are some relations that are in BCNF but not 4NF.
Figure 3.10 is a good example. The only key for this relation is all five attributes,
and there are no nontrivial FD’s. Thus it is surely in BCNF. However, as we
observed in Example 3.32, it is not in 4NF.
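The 4NF condition combines a triviality test with a superkey test, and, as noted above, superkeys depend on FD’s only. The sketch below assumes the MVD passed in is already known to hold; the names `closure` and `violates_4nf` are ours, and FD’s are written as (left side, right side) pairs of attribute lists.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD's (pairs of attribute lists)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violates_4nf(schema, fds, lhs, rhs):
    """True if the MVD lhs ->> rhs (assumed to hold) is a 4NF violation."""
    schema, lhs, rhs = set(schema), set(lhs), set(rhs)
    trivial = rhs <= lhs or (lhs | rhs) == schema
    superkey = closure(lhs, fds) >= schema
    return not trivial and not superkey

stars = ["name", "street", "city", "title", "year"]
# No nontrivial FD's hold in the relation of Fig. 3.10.
print(violates_4nf(stars, [], ["name"], ["street", "city"]))      # True
print(violates_4nf(stars[:3], [], ["name"], ["street", "city"]))  # False (trivial)
```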

3.6.5 Decomposition into Fourth Normal Form


The 4NF decomposition algorithm is quite analogous to the BCNF
decomposition algorithm.

Algorithm 3.33: Decomposition into Fourth Normal Form.

INPUT: A relation $R_0$ with a set of functional and multivalued dependencies
$S_0$.

OUTPUT: A decomposition of $R_0$ into relations all of which are in 4NF. The
decomposition has the lossless-join property.

METHOD: Do the following steps, with $R = R_0$ and $S = S_0$:

1. Find a 4NF violation in R, say $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$, where

$\{A_1, A_2, \ldots, A_n\}$

is not a superkey. Note this MVD could be a true MVD in S, or it could
be derived from the corresponding FD $A_1 A_2 \cdots A_n \rightarrow B_1 B_2 \cdots B_m$ in S,
since every FD is an MVD. If there is none, return; R by itself is a suitable
decomposition.

2. If there is such a 4NF violation, break the schema for the relation R that
has the 4NF violation into two schemas:

(a) $R_1$, whose schema is the A’s and the B’s.

(b) $R_2$, whose schema is the A’s and all attributes of R that are not
among the A’s or B’s.

3. Find the FD’s and MVD’s that hold in $R_1$ and $R_2$ (Section 3.7 explains
how to do this task in general, but often this “projection” of dependencies
is straightforward). Recursively decompose $R_1$ and $R_2$ with respect to
their projected dependencies.

Example 3.34: Let us continue Example 3.32. We observed that

name ↠ street city

was a 4NF violation. The decomposition rule above tells us to replace the
five-attribute schema by one schema that has only the three attributes in the
above MVD and another schema that consists of the left side, name, plus the
attributes that do not appear in the MVD. These attributes are title and
year, so the following two schemas

{name, street, city}
{name, title, year}

are the result of the decomposition. In each schema there are no nontrivial
multivalued (or functional) dependencies, so they are in 4NF. Note that in the
relation with schema {name, street, city}, the MVD:

name ↠ street city

is trivial since it involves all attributes. Likewise, in the relation with schema
{name, title, year}, the MVD:

name ↠ title year

is trivial. Should one or both schemas of the decomposition not be in 4NF, we
would have had to decompose the non-4NF schema(s). □
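Step 2 of Algorithm 3.33 is a purely syntactic split of the schema. As a small sketch (the function name `split_on_mvd` is an assumption made for illustration), one decomposition step could look like this:

```python
def split_on_mvd(schema, lhs, rhs):
    """One 4NF decomposition step on the violating MVD lhs ->> rhs.

    Returns (R1, R2): R1 holds the A's and B's, R2 holds the A's plus every
    attribute that is neither an A nor a B. Attribute order is preserved.
    """
    lhs, rhs = set(lhs), set(rhs)
    r1 = [a for a in schema if a in lhs or a in rhs]
    r2 = [a for a in schema if a in lhs or a not in rhs]
    return r1, r2

schema = ["name", "street", "city", "title", "year"]
r1, r2 = split_on_mvd(schema, ["name"], ["street", "city"])
print(r1)  # ['name', 'street', 'city']
print(r2)  # ['name', 'title', 'year']
```

A full implementation of the algorithm would still have to project the dependencies onto each result and recurse, which is the subject of Section 3.7.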
As for the BCNF decomposition, each decomposition step leaves us with
schemas that have strictly fewer attributes than we started with, so eventually
we get to schemas that need not be decomposed further; that is, they are
in 4NF. Moreover, the argument justifying the decomposition that we gave
in Section 3.4.1 carries over to MVD’s as well. When we decompose a relation
because of an MVD $A_1 A_2 \cdots A_n \twoheadrightarrow B_1 B_2 \cdots B_m$, this dependency is enough to
justify the claim that we can reconstruct the original relation from the relations
of the decomposition.
We shall, in Section 3.7, give an algorithm by which we can verify that the
MVD used to justify a 4NF decomposition also proves that the decomposition
has a lossless join. Also in that section, we shall show how it is possible, although
time-consuming, to perform the projection of MVD’s onto the decomposed
relations. This projection is required if we are to decide whether or not further
decomposition is necessary.

3.6.6 Relationships Among Normal Forms


As we have mentioned, 4NF implies BCNF, which in turn implies 3NF. Thus,
the sets of relation schemas (including dependencies) satisfying the three normal
forms are related as in Fig. 3.12. That is, if a relation with certain
dependencies is in 4NF, it is also in BCNF and 3NF. Also, if a relation with certain
dependencies is in BCNF, then it is in 3NF.

[Diagram: nested regions. Relations in 4NF lie inside Relations in BCNF, which lie inside Relations in 3NF.]

Figure 3.12: 4NF implies BCNF implies 3NF

Another way to compare the normal forms is by the guarantees they make
about the set of relations that result from a decomposition into that normal
form. These observations are summarized in the table of Fig. 3.13. That is,
BCNF (and therefore 4NF) eliminates the redundancy and other anomalies
that are caused by FD’s, while only 4NF eliminates the additional redundancy
that is caused by the presence of MVD’s that are not FD’s. Often, 3NF is
enough to eliminate this redundancy, but there are examples where it is not.
BCNF does not guarantee preservation of FD’s, and none of the normal forms
guarantee preservation of MVD’s, although in typical cases the dependencies
are preserved.

Property                              3NF   BCNF   4NF
Eliminates redundancy due to FD’s     No    Yes    Yes
Eliminates redundancy due to MVD’s    No    No     Yes
Preserves FD’s                        Yes   No     No
Preserves MVD’s                       No    No     No

Figure 3.13: Properties of normal forms and their decompositions

3.6.7 Exercises for Section 3.6


Exercise 3.6.1: Suppose we have a relation R(A, B, C) with an MVD A ↠ B.
If we know that the tuples (a, b1, c1), (a, b2, c2), and (a, b3, c3) are in the
current instance of R, what other tuples do we know must also be in R?

Exercise 3.6.2: Suppose we have a relation in which we want to record for
each person their name, Social Security number, and birthdate. Also, for each
child of the person, the name, Social Security number, and birthdate of the
child, and for each automobile the person owns, its serial number and make.
To be more precise, this relation has all tuples

(n, s, b, cn, cs, cb, as, am)

such that

1. n is the name of the person with Social Security number s.

2. b is n’s birthdate.

3. cn is the name of one of n’s children.

4. cs is cn’s Social Security number.

5. cb is cn’s birthdate.

6. as is the serial number of one of n’s automobiles.

7. am is the make of the automobile with serial number as.

For this relation:

a) Tell the functional and multivalued dependencies we would expect to hold.


b) Suggest a decomposition of the relation into 4NF.

Exercise 3.6.3: For each of the following relation schemas and dependencies

a) R(A, B, C, D) with MVD’s A ↠ B and A ↠ C.

b) R(A, B, C, D) with MVD’s A ↠ B and B ↠ CD.

c) R(A, B, C, D) with MVD AB ↠ C and FD B → D.

d) R(A, B, C, D, E) with MVD’s A ↠ B and AB ↠ C and FD’s A → D
and AB → E.

do the following:

i) Find all the 4NF violations.

ii) Decompose the relations into a collection of relation schemas in 4NF.

Exercise 3.6.4: Give informal arguments why we would not expect any of the
five attributes in Example 3.28 to be functionally determined by the other four.

3.7 An Algorithm for Discovering MVD’s


Reasoning about MVD’s, or combinations of MVD’s and FD’s, is rather more
difficult than reasoning about FD’s alone. For FD’s, we have Algorithm 3.7 to
decide whether or not an FD follows from some given FD’s. In this section,
we shall first show that the closure algorithm is really the same as the chase
algorithm we studied in Section 3.4.2. The ideas behind the chase can be
extended to incorporate MVD’s as well as FD’s. Once we have that tool in
place, we can solve all the problems we need to solve about MVD’s and FD’s,
such as finding whether an MVD follows from given dependencies or projecting
MVD’s and FD’s onto the relations of a decomposition.

3.7.1 The Closure and the Chase


In Section 3.2.4 we saw how to take a set of attributes X and compute its
closure X⁺ of all attributes that functionally depend on X. In that manner, we
can test whether an FD X → Y follows from a given set of FD’s F, by closing
X with respect to F and seeing whether Y ⊆ X⁺. We could see the closure as
a variant of the chase, in which the starting tableau and the goal condition are
different from what we used in Section 3.4.2.
Suppose we start with a tableau that consists of two rows. These rows agree
in the attributes of X and disagree in all other attributes. If we apply the FD’s
in F to chase this tableau, we shall equate the symbols in exactly those columns
that are in X⁺ − X. Thus, a chase-based test for whether X → Y follows from
F can be summarized as:

1. Start with a tableau having two rows that agree only on X.

2. Chase the tableau using the FD’s of F.

3. If the final tableau agrees in all columns of Y, then X → Y holds;
otherwise it does not.

Example 3.35: Let us repeat Example 3.8, where we had a relation

R(A, B, C, D, E, F)

with FD’s AB → C, BC → AD, D → E, and CF → B. We want to test
whether AB → D holds. Start with the tableau:

A   B   C    D    E    F
a   b   c1   d1   e1   f1
a   b   c2   d2   e2   f2

We can apply AB → C to infer c1 = c2; say both become c1. The resulting
tableau is:

A   B   C    D    E    F
a   b   c1   d1   e1   f1
a   b   c1   d2   e2   f2

Next, apply BC → AD to infer that d1 = d2, and apply D → E to infer
e1 = e2. At this point, the tableau is:

A   B   C    D    E    F
a   b   c1   d1   e1   f1
a   b   c1   d1   e1   f2

and we can go no further. Since the two tuples now agree in the D column, we
know that AB → D does follow from the given FD’s. □
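The three-step chase test and the example above can be traced in a few lines of Python. This is a sketch under our own conventions: `chase_fd` is a hypothetical name, attributes are single uppercase letters, FD’s are pairs of strings such as ("AB", "C"), and symbols are spelled "c1", "c2" as in the tableaux.

```python
def chase_fd(attrs, fds, lhs, rhs):
    """Two-row chase: does the FD lhs -> rhs follow from fds?

    Returns (answer, final_rows).
    """
    # Both rows share one symbol in the columns of lhs; elsewhere row i
    # gets its own symbol, e.g. 'c1' in row 1 and 'c2' in row 2.
    rows = [[a.lower() if a in lhs else f"{a.lower()}{i}" for a in attrs]
            for i in (1, 2)]
    col = {a: k for k, a in enumerate(attrs)}
    changed = True
    while changed:
        changed = False
        for x, y in fds:
            # An FD applies when the two rows agree on its whole left side.
            if all(rows[0][col[a]] == rows[1][col[a]] for a in x):
                for b in y:
                    if rows[0][col[b]] != rows[1][col[b]]:
                        rows[1][col[b]] = rows[0][col[b]]  # equate symbols
                        changed = True
    return all(rows[0][col[b]] == rows[1][col[b]] for b in rhs), rows

fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
holds, rows = chase_fd("ABCDEF", fds, "AB", "D")
print(holds)    # True
print(rows[1])  # ['a', 'b', 'c1', 'd1', 'e1', 'f2']
```

The columns in which the two final rows agree are exactly the closure of {A, B}, here {A, B, C, D, E}, which is why this test and the closure test give the same answer.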

3.7.2 Extending the Chase to MVD’s


The method of inferring an FD using the chase can be applied to infer MVD’s
as well. When we try to infer an FD, we are asking whether we can conclude
that two possibly unequal values must indeed be the same. When we apply an
FD X → Y, we search for pairs of rows in the tableau that agree on all the
columns of X, and we force the symbols in each column of Y to be equal.
However, MVD’s do not tell us to conclude symbols are equal. Rather,
X ↠ Y tells us that if we find two rows of the tableau that agree in X, then
we can form two new tuples by swapping all their components in the attributes
of Y; the resulting two tuples must also be in the relation, and therefore in
the tableau. Likewise, if we want to infer some MVD X ↠ Y from given
FD’s and MVD’s, we start with a tableau consisting of two tuples that agree
in X and disagree in all attributes not in the set X. We apply the given
FD’s to equate symbols, and we apply the given MVD’s to swap the values in
certain attributes between two existing rows of the tableau in order to add new
rows to the tableau. If we ever discover that one of the original tuples, with
its components for Y replaced by those of the other original tuple, is in the
tableau, then we have inferred the MVD.
There is a point of caution to be observed in this more complex chase
process. Since symbols may get equated and replaced by other symbols, we may
not recognize that we have created one of the desired tuples, because some of
the original symbols may be replaced by others. The simplest way to avoid a
problem is to define the target tuple initially, and never change its symbols.
That is, let the target row be one with an unsubscripted letter in each
component. Let the two initial rows of the tableau for the test of X ↠ Y have the
unsubscripted letters in X. Let the first row also have unsubscripted letters in
Y, and let the second row have the unsubscripted letters in all attributes not
in X or Y. Fill in the other positions of the two rows with new symbols that
each occur only once. When we equate subscripted and unsubscripted symbols,
always replace a subscripted one by the unsubscripted one, as we did in
Section 3.4.2. Then, when applying the chase, we have only to ask whether the
all-unsubscripted-letters row ever appears in the tableau.

Example 3.36: Suppose we have a relation R(A, B, C, D) with given
dependencies A → B and B ↠ C. We wish to prove that A ↠ C holds in R. Start
with the two-row tableau that represents A ↠ C:

A   B    C    D
a   b1   c    d1
a   b    c2   d

Notice that our target row is (a, b, c, d). Both rows of the tableau have the
unsubscripted letter in the column for A. The first row has the unsubscripted
letter in C, and the second row has unsubscripted letters in the remaining
columns.
We first apply the FD A → B to infer that b = b1. We must therefore
replace the subscripted b1 by the unsubscripted b. The tableau becomes:

A   B   C    D
a   b   c    d1
a   b   c2   d

Next, we apply the MVD B ↠ C, since the two rows now agree in the B
column. We swap the C columns to get two more rows which we add to the
tableau, which becomes:

A   B   C    D
a   b   c    d1
a   b   c2   d
a   b   c2   d1
a   b   c    d

We have now a row with all unsubscripted symbols, which proves that A ↠ C
holds in relation R. Notice how the tableau manipulations really give a proof
that A ↠ C holds. This proof is: “Given two tuples of R that agree in A,
they must also agree in B because A → B. Since they agree in B, we can swap
their C components by B ↠ C, and the resulting tuples will be in R. Thus, if
two tuples of R agree in A, the tuples that result when we swap their C’s are
also in R; i.e., A ↠ C.” □

Example 3.37: There is a surprising rule for FD’s and MVD’s that says
whenever there is an MVD X ↠ Y, and any FD whose right side is a (not necessarily
proper) subset of Y, say Z, then X → Z. We shall use the chase process to
prove a simple example of this rule. Let us be given relation R(A, B, C, D) with
MVD A ↠ BC and FD D → C. We claim that A → C.
Since we are trying to prove an FD, we don’t have to worry about a target
tuple of unsubscripted letters. We can start with any two tuples that agree in
A and disagree in every other column, such as:

A   B    C    D
a   b1   c1   d1
a   b2   c2   d2

Our goal is to prove that c1 = c2.

The only thing we can do to start is to apply the MVD A ↠ BC, since
the two rows agree on A, but no other columns. When we swap the B and C
columns of these two rows, we get two new rows to add:

A   B    C    D
a   b1   c1   d1
a   b2   c2   d2
a   b2   c2   d1
a   b1   c1   d2

Now, we have pairs of rows that agree in D, so we can apply the FD D → C.
For instance, the first and third rows have the same D-value d1, so we can apply
the FD and conclude c1 = c2. That is our goal, so we have proved A → C. The
new tableau is:

A   B    C    D
a   b1   c1   d1
a   b2   c1   d2
a   b2   c1   d1
a   b1   c1   d2

It happens that no further changes are possible, using the given dependencies.
However, that doesn’t matter, since we already proved what we need. □
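The chase with both kinds of dependencies, as used in Examples 3.36 and 3.37, can be sketched as follows. Everything here is an illustrative assumption rather than the book’s own code: the function names, the restriction to single-letter attribute names, the encoding of dependencies as pairs of strings, and the spelling of symbols as "c", "c1", "c2". For simplicity the chase always runs to completion instead of stopping as soon as the goal is reached.

```python
def step(tableau, attrs, fds, mvds):
    """Apply one dependency once; return the new tableau, or None if stuck."""
    col = {a: i for i, a in enumerate(attrs)}
    pairs = [(t, u) for t in tableau for u in tableau]
    for x, y in fds:
        for t, u in pairs:
            if all(t[col[a]] == u[col[a]] for a in x):
                for b in y:
                    i = col[b]
                    if t[i] != u[i]:
                        # Equate the symbols, keeping the unsubscripted
                        # (shorter) one whenever there is a choice.
                        keep, drop = sorted((t[i], u[i]),
                                            key=lambda s: (len(s), s))
                        return {r[:i] + (keep,) + r[i + 1:] if r[i] == drop
                                else r for r in tableau}
    for x, y in mvds:
        for t, u in pairs:
            if all(t[col[a]] == u[col[a]] for a in x):
                # New row: t's components in Y, u's components elsewhere.
                v = tuple(t[col[a]] if a in y else u[col[a]] for a in attrs)
                if v not in tableau:
                    return tableau | {v}
    return None

def chase(attrs, fds, mvds, rows):
    tableau = {tuple(r) for r in rows}
    while True:
        nxt = step(tableau, attrs, fds, mvds)
        if nxt is None:
            return tableau
        tableau = nxt

def infers_mvd(attrs, fds, mvds, x, y):
    """Does X ->> Y follow? Look for the all-unsubscripted target row."""
    row1 = [a.lower() if (a in x or a in y) else a.lower() + "1" for a in attrs]
    row2 = [a.lower() + "2" if (a in y and a not in x) else a.lower()
            for a in attrs]
    target = tuple(a.lower() for a in attrs)
    return target in chase(attrs, fds, mvds, [row1, row2])

def infers_fd(attrs, fds, mvds, x, y):
    """Does X -> Y follow? Check that each Y column ends with one symbol."""
    rows = [[a.lower() if a in x else a.lower() + n for a in attrs]
            for n in "12"]
    final = chase(attrs, fds, mvds, rows)
    col = {a: i for i, a in enumerate(attrs)}
    return all(len({r[col[b]] for r in final}) == 1 for b in y)

# Example 3.36: A -> B and B ->> C imply A ->> C.
print(infers_mvd("ABCD", [("A", "B")], [("B", "C")], "A", "C"))  # True
# Example 3.37: A ->> BC and D -> C imply A -> C.
print(infers_fd("ABCD", [("D", "C")], [("A", "BC")], "A", "C"))  # True
```

The FD test relies on the fact that a column starts with two symbols and loses one only when the two are equated, so a single remaining symbol in a column of Y means the original pair was forced to be equal.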

3.7.3 Why the Chase Works for MVD’s


The arguments are essentially the same as we have given before. Each step of the
chase, whether it equates symbols or generates new rows, is a true observation
about tuples of the given relation R that is justified by the FD or MVD that
we apply in that step. Thus, a positive conclusion of the chase is always a proof
that the concluded FD or MVD holds in R.
When the chase ends in failure — the goal row (for an MVD) or the desired
equality of symbols (for an FD) is not produced — then the final tableau is a
counterexample. It satisfies the given dependencies, or else we would not be
finished making changes. However, it does not satisfy the dependency we were
trying to prove.
There is one other issue that did not come up when we performed the chase
using only FD’s. Since the chase with MVD’s adds rows to the tableau, how
do we know we ever terminate the chase? Could we keep adding rows forever,
never reaching our goal, but not sure that after a few more steps we would
achieve that goal? Fortunately, that cannot happen. The reason is that we

never create any new symbols. We start out with at most two symbols in each
of k columns, and all rows we create will have one of these two symbols in its
component for that column. Thus, we cannot ever have more than 2^k rows in
our tableau, if k is the number of columns. The chase with MVD’s can take
exponential time, but it cannot run forever.
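The counting argument can be written out as a one-line bound. The symbols $T$ (the tableau, viewed as a set of rows) and $\Sigma_i$ (the set of symbols that ever appear in column i) are our own notation, introduced only for this restatement.

```latex
% Every row picks one symbol per column, and column i never holds
% more than two distinct symbols, so
\[
  |T| \;\le\; \prod_{i=1}^{k} |\Sigma_i| \;\le\; 2^{k},
  \qquad |\Sigma_i| \le 2 \ \text{for each column } i .
\]
```

Each MVD step adds a row and each FD step removes a symbol, so only finitely many steps are possible.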

3.7.4 Projecting MVD’s
Recall that our reason for wanting to infer MVD’s was to perform a cascade of
decompositions leading to 4NF relations. To do that task, we need to be able
to project the given dependencies onto the schemas of the two relations that
we get in the first step of the decomposition. Only then can we know whether
they are in 4NF or need to be decomposed further.
In the worst case, we have to test every possible FD and MVD for each of
the decomposed relations. The chase test is applied on the full set of attributes
of the original relation. However, the goal for an MVD is to produce a row
of the tableau that has unsubscripted letters in all the attributes of one of
the relations of the decomposition; that row may have any letters in the other
attributes. The goal for an FD is the same: equality of the symbols in a given
column.

Example 3.38: Suppose we have a relation R(A, B, C, D, E) that we
decompose, and let one of the relations of the decomposition be S(A, B, C). Suppose
that the MVD A ↠ CD holds in R. Does this MVD imply any dependency
in S? We claim that A ↠ C holds in S, as does A ↠ B (by the
complementation rule). Let us verify that A ↠ C holds in S. We start with the
tableau:

A   B    C    D    E
a   b1   c    d1   e1
a   b    c2   d    e

Use the MVD of R, A ↠ CD, to swap the C and D components of these two
rows to get two new rows:

A   B    C    D    E
a   b1   c    d1   e1
a   b    c2   d    e
a   b1   c2   d    e1
a   b    c    d1   e

Notice that the last row has unsubscripted symbols in all the attributes of S,
that is, A, B, and C. That is enough to conclude that A ↠ C holds in S. □
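The projection test of this example differs from the ordinary MVD chase only in its goal: a row that is unsubscripted on the attributes of S. The sketch below handles given MVD’s only (no FD’s), which is all the example needs; the names `swap_closure` and `projected_mvd_holds` and the single-letter attribute convention are assumptions made for illustration.

```python
def swap_closure(attrs, mvds, rows):
    """Close a tableau under the row-adding rule for the given MVD's."""
    col = {a: i for i, a in enumerate(attrs)}
    tableau = {tuple(r) for r in rows}
    while True:
        new = set()
        for x, y in mvds:
            for t in tableau:
                for u in tableau:
                    if all(t[col[a]] == u[col[a]] for a in x):
                        # t's components in Y, u's components elsewhere.
                        v = tuple(t[col[a]] if a in y else u[col[a]]
                                  for a in attrs)
                        if v not in tableau:
                            new.add(v)
        if not new:
            return tableau
        tableau |= new

def projected_mvd_holds(attrs, mvds, s_attrs, x, y):
    """Does X ->> Y hold in the projection onto s_attrs?"""
    row1 = [a.lower() if (a in x or a in y) else a.lower() + "1" for a in attrs]
    row2 = [a.lower() + "2" if (a in y and a not in x) else a.lower()
            for a in attrs]
    final = swap_closure(attrs, mvds, [row1, row2])
    col = {a: i for i, a in enumerate(attrs)}
    # Goal: some row is unsubscripted in every attribute of S.
    return any(all(r[col[a]] == a.lower() for a in s_attrs) for r in final)

mvds = [("A", "CD")]
print(projected_mvd_holds("ABCDE", mvds, "ABC", "A", "C"))  # True
print(projected_mvd_holds("ABCDE", mvds, "ABC", "A", "B"))  # True
```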

Often, our search for FD’s and MVD’s in the projected relations does not
have to be completely exhaustive. Here are some simplifications.

1. It is surely not necessary to check the trivial FD’s and MVD’s.


2. For FD’s, we can restrict ourselves to looking for FD’s with a singleton
right side, because of the combining rule for FD’s.
3. An FD or MVD whose left side does not contain the left side of any given
dependency surely cannot hold, since there is no way for its chase test
to get started. That is, the two rows with which you start the test are
unchanged by the given dependencies.

3.7.5 Exercises for Section 3.7


Exercise 3.7.1: Use the chase test to tell whether each of the following
dependencies hold in a relation R(A, B, C, D, E) with the dependencies A ↠ BC,
B → D, and C → E.
a) A → D.
b) A ↠ D.
c) A → E.
d) A ↠ E.
! Exercise 3.7.2: If we project the relation R of Exercise 3.7.1 onto S(A, C, E),
what nontrivial FD’s and MVD’s hold in S?
! Exercise 3.7.3: Show the following rules for MVD’s. In each case, you can
set up the proof as a chase test, but you must think a little more generally than
in the examples, since the sets of attributes are arbitrary sets X, Y, Z, and the
other unnamed attributes of the relation in which these dependencies hold.
a) The Union Rule. If X, Y, and Z are sets of attributes, X ↠ Y, and
X ↠ Z, then X ↠ (Y ∪ Z).
b) The Intersection Rule. If X, Y, and Z are sets of attributes, X ↠ Y,
and X ↠ Z, then X ↠ (Y ∩ Z).
c) The Difference Rule. If X, Y, and Z are sets of attributes, X ↠ Y, and
X ↠ Z, then X ↠ (Y − Z).
d) Removing attributes shared by left and right side. If X ↠ Y holds, then
X ↠ (Y − X) holds.
! Exercise 3.7.4: Give counterexample relations to show why the following rules
for MVD’s do not hold. Hint: apply the chase test and see what happens.
a) If A ↠ BC, then A ↠ B.
b) If A ↠ B, then A → B.
c) If AB ↠ C, then A ↠ C.

3.8 Summary of Chapter 3
♦ Functional Dependencies: A functional dependency is a statement that
two tuples of a relation that agree on some particular set of attributes
must also agree on some other particular set of attributes.
♦ Keys of a Relation: A superkey for a relation is a set of attributes that
functionally determines all the attributes of the relation. A key is a
superkey, no proper subset of which is also a superkey.
♦ Reasoning About Functional Dependencies: There are many rules that let
us infer that one FD X → A holds in any relation instance that satisfies
some other given set of FD’s. To verify that X → A holds, compute the
closure of X, using the given FD’s to expand X until it includes A.
♦ Minimal Basis for a set of FD’s: For any set of FD’s, there is at least
one minimal basis, which is a set of FD’s equivalent to the original (each
set implies the other set), with singleton right sides, no FD that can be
eliminated while preserving equivalence, and no attribute in a left side
that can be eliminated while preserving equivalence.
♦ Boyce-Codd Normal Form: A relation is in BCNF if the only nontrivial
FD’s say that some superkey functionally determines one or more of the
other attributes. A major benefit of BCNF is that it eliminates
redundancy caused by the existence of FD’s.
♦ Lossless-Join Decomposition: A useful property of a decomposition is that
the original relation can be recovered exactly by taking the natural join of
the relations in the decomposition. Any decomposition gives us back at
least the tuples with which we start, but a carelessly chosen decomposition
can give tuples in the join that were not in the original relation.
♦ Dependency-Preserving Decomposition: Another desirable property of a
decomposition is that we can check all the functional dependencies that
hold in the original relation by checking FD’s in the decomposed relations.
♦ Third Normal Form: Sometimes decomposition into BCNF can lose the
dependency-preservation property. A relaxed form of BCNF, called 3NF,
allows an FD X → A even if X is not a superkey, provided A is a member
of some key. 3NF does not guarantee to eliminate all redundancy due to
FD’s, but often does so.
♦ The Chase: We can test whether a decomposition has the lossless-join
property by setting up a tableau, that is, a set of rows that represent tuples of
the original relation. We chase a tableau by applying the given functional
dependencies to infer that certain pairs of symbols must be the same. The
decomposition is lossless with respect to a given set of FD’s if and only if
the chase leads to a row identical to the tuple whose membership in the
join of the projected relations we assumed.

♦ Synthesis Algorithm for 3NF: If we take a minimal basis for a given set
of FD’s, turn each of these FD’s into a relation, and add a key for the
relation, if necessary, the result is a decomposition into 3NF that has the
lossless-join and dependency-preservation properties.
♦ Multivalued Dependencies: A multivalued dependency is a statement that
two sets of attributes in a relation have sets of values that appear in all
possible combinations.
♦ Fourth Normal Form: MVD’s can also cause redundancy in a relation.
4NF is like BCNF, but also forbids nontrivial MVD’s whose left side is
not a superkey. It is possible to decompose a relation into 4NF without
losing information.
♦ Reasoning About MVD’s: We can infer MVD’s and FD’s from a given set
of MVD’s and FD’s by a chase process. We start with a two-row tableau
that represents the dependency we are trying to prove. FD’s are applied by
equating symbols, and MVD’s are applied by adding rows to the tableau
that have the appropriate components interchanged.

3.9 References for Chapter 3


Third normal form was described in [6]. This paper introduces the idea of
functional dependencies, as well as the basic relational concept. Boyce-Codd
normal form is in a later paper [7].
Multivalued dependencies and fourth normal form were defined by Fagin in
[9]. However, the idea of multivalued dependencies also appears independently
in [8] and [11].
Armstrong was the first to study rules for inferring FD’s [2]. The rules for
FD’s that we have covered here (including what we call “Armstrong’s axioms”)
and rules for inferring MVD’s as well, come from [3].
The technique for testing an FD by computing the closure for a set of
attributes is from [4], as is the fact that a minimal basis provides a 3NF
decomposition. The fact that this decomposition provides the lossless-join and
dependency-preservation properties is from [5].
The tableau test for the lossless-join property and the chase are from [1].
More information and the history of the idea is found in [10].

1. A. V. Aho, C. Beeri, and J. D. Ullman, “The theory of joins in relational
databases,” ACM Transactions on Database Systems 4:3, pp. 297-314,
1979.

2. W. W. Armstrong, “Dependency structures of database relationships,”
Proceedings of the 1974 IFIP Congress, pp. 580-583.

3. C. Beeri, R. Fagin, and J. H. Howard, “A complete axiomatization for
functional and multivalued dependencies,” ACM SIGMOD Intl. Conf. on
Management of Data, pp. 47-61, 1977.

4. P. A. Bernstein, “Synthesizing third normal form relations from functional
dependencies,” ACM Transactions on Database Systems 1:4, pp. 277-298,
1976.

5. J. Biskup, U. Dayal, and P. A. Bernstein, “Synthesizing independent
database schemas,” ACM SIGMOD Intl. Conf. on Management of Data,
pp. 143-152, 1979.

6. E. F. Codd, “A relational model for large shared data banks,” Comm.
ACM 13:6, pp. 377-387, 1970.

7. E. F. Codd, “Further normalization of the data base relational model,” in
Database Systems (R. Rustin, ed.), Prentice-Hall, Englewood Cliffs, NJ,
1972.

8. C. Delobel, “Normalization and hierarchical dependencies in the relational
data model,” ACM Transactions on Database Systems 3:3, pp. 201-222,
1978.

9. R. Fagin, “Multivalued dependencies and a new normal form for relational
databases,” ACM Transactions on Database Systems 2:3, pp. 262-278,
1977.

10. J. D. Ullman, Principles of Database and Knowledge-Base Systems,
Volume I, Computer Science Press, New York, 1988.

11. C. Zaniolo and M. A. Melkanoff, “On the design of relational database
schemata,” ACM Transactions on Database Systems 6:1, pp. 1-47, 1981.
Chapter 4

High-Level Database Models

Let us consider the process whereby a new database, such as our movie database,
is created. Figure 4.1 suggests the process. We begin with a design phase, in
which we address and answer questions about what information will be stored,
how information elements will be related to one another, what constraints such
as keys or referential integrity may be assumed, and so on. This phase may last
for a long time, while options are evaluated and opinions are reconciled. We
show this phase in Fig. 4.1 as the conversion of ideas to a high-level design.

Ideas → High-Level Design → Relational Database Schema → Relational DBMS

Figure 4.1: The database modeling and implementation process

Since the great majority of commercial database systems use the relational
model, we might suppose that the design phase should use this model too.
However, in practice it is often easier to start with a higher-level model and
then convert the design to the relational model. The primary reason for doing so
is that the relational model has only one concept (the relation) rather than
several complementary concepts that more closely model real-world situations.
Simplicity of concepts in the relational model is a great strength of the model,
especially when it comes to efficient implementation of database operations.
Yet that strength becomes a weakness when we do a preliminary design, which
is why it often is helpful to begin by using a high-level design model.
There are several options for the notation in which the design is expressed.
The first, and oldest, method is the “entity-relationship diagram,” and here is
where we shall start in Section 4.1. A more recent trend is the use of UML
(“Unified Modeling Language”), a notation that was originally designed for
describing object-oriented software projects, but which has been adapted to
describe database schemas as well. We shall see this model in Section 4.7. Finally,
in Section 4.9, we shall consider ODL (“Object Description Language”), which
was created to describe databases as collections of classes and their objects.
The next phase shown in Fig. 4.1 is the conversion of our high-level design
to a relational design. This phase occurs only when we are confident of the
high-level design. Whichever of the high-level models we use, there is a fairly
mechanical way of converting the high-level design into a relational database
schema, which then runs on a conventional DBMS. Sections 4.5 and 4.6 discuss
conversion of E/R diagrams to relational database schemas. Section 4.8 does
the same for UML, and Section 4.10 serves for ODL.

4.1 The Entity/Relationship Model

In the entity-relationship model (or E/R model), the structure of data is
represented graphically, as an “entity-relationship diagram,” using three principal
element types:

1. Entity sets,

2. Attributes, and

3. Relationships.

We shall cover each in turn.

4.1.1 Entity Sets


An entity is an abstract object of some sort, and a collection of similar entities
forms an entity set. An entity in some ways resembles an “object” in the sense of
object-oriented programming. Likewise, an entity set bears some resemblance
to a class of objects. However, the E/R model is a static concept, involving the
structure of data and not the operations on data. Thus, one would not expect
to find methods associated with an entity set as one would with a class.

Example 4.1: Let us consider the design of our running movie-database
example. Each movie is an entity, and the set of all movies constitutes an entity
set. Likewise, the stars are entities, and the set of stars is an entity set. A
studio is another kind of entity, and the set of studios is a third entity set that
will appear in our examples. □

4.1.2 Attributes
Entity sets have associated attributes, which are properties of the entities in
that set. For instance, the entity set Movies might be given attributes such
as title and length. It should not surprise you if the attributes for the entity
set Movies resemble the attributes of the relation Movies in our example. It
is common for entity sets to be implemented as relations, although not every
relation in our final relational design will come from an entity set.
In our version of the E/R model, we shall assume that attributes are of
primitive types, such as strings, integers, or reals. There are other variations of
this model in which attributes can have some limited structure; see the box on
“E/R Model Variations.”

E/R Model Variations
In some versions of the E/R model, the type of an attribute can be one of
the following:

1. A primitive type, as in the version presented here.

2. A “struct,” as in C, or tuple with a fixed number of primitive components.

3. A set of values of one type: either primitive or a “struct” type.

For example, the type of an attribute in such a model could be a set of
pairs, each pair consisting of an integer and a string.

4.1.3 Relationships
Relationships are connections among two or more entity sets. For instance,
if Movies and Stars are two entity sets, we could have a relationship Stars-in
that connects movies and stars. The intent is that a movie entity m is related
to a star entity s by the relationship Stars-in if s appears in movie m. While
binary relationships, those between two entity sets, are by far the most common
type of relationship, the E/R model allows relationships to involve any number
of entity sets. We shall defer discussion of these multiway relationships until
Section 4.1.7.

4.1.4 Entity-Relationship Diagrams


An E/R diagram is a graph representing entity sets, attributes, and
relationships. Elements of each of these kinds are represented by nodes of the graph,
and we use a special shape of node to indicate the kind, as follows:

• Entity sets are represented by rectangles.

• Attributes are represented by ovals.

• Relationships are represented by diamonds.


Edges connect an entity set to its attributes and also connect a relationship to
its entity sets.

Example 4.2: In Fig. 4.2 is an E/R diagram that represents a simple database
about movies. The entity sets are Movies, Stars, and Studios.

Figure 4.2: An entity-relationship diagram for the movie database

The Movies entity set has four of our usual attributes: title, year, length,
and genre. The other two entity sets Stars and Studios happen to have the
same two attributes: name and address, each with an obvious meaning. We
also see two relationships in the diagram:

1. Stars-in is a relationship connecting each movie to the stars of that movie.
This relationship consequently also connects stars to the movies in which
they appeared.

2. Owns connects each movie to the studio that owns the movie. The arrow
pointing to entity set Studios in Fig. 4.2 indicates that each movie is
owned by at most one studio. We shall discuss uniqueness constraints
such as this one in Section 4.1.6.

4.1.5 Instances of an E/R Diagram

E/R diagrams are a notation for describing schemas of databases. We may
imagine that a database described by an E/R diagram contains particular data,
an “instance” of the database. Since the database is not implemented in the
E/R model, only designed, the instance never exists in the sense that a relation’s
instances exist in a DBMS. However, it is often useful to visualize the database
being designed as if it existed.
For each entity set, the database instance will have a particular finite set
of entities. Each of these entities has particular values for each attribute. A
relationship R that connects n entity sets $E_1, E_2, \ldots, E_n$ may be imagined to
have an “instance” that consists of a finite set of tuples $(e_1, e_2, \ldots, e_n)$, where
each $e_i$ is chosen from the entities that are in the current instance of entity set
$E_i$. We regard each of these tuples as “connected” by relationship R.
This set of tuples is called the relationship set for R. It is often helpful to
visualize a relationship set as a table or relation. However, the “tuples” of a
relationship set are not really tuples of a relation, since their components are
entities rather than primitive types such as strings or integers. The columns of
the table are headed by the names of the entity sets involved in the relationship,
and each list of connected entities occupies one row of the table. As we shall
see, however, when we convert relationships to relations, the resulting relation
is not the same as the relationship set.

Example 4.3: An instance of the Stars-in relationship could be visualized as
a table with pairs such as:

Movies            Stars
Basic Instinct    Sharon Stone
Total Recall      Arnold Schwarzenegger
Total Recall      Sharon Stone

The members of the relationship set are the rows of the table. For instance,
(Basic Instinct, Sharon Stone) is a tuple in the relationship set for the current
instance of relationship Stars-in. □
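
A relationship set can be made concrete with a short Python sketch. The class names Movie and Star and the helper stars_of are illustrative assumptions, not part of the E/R model; the data is the instance of Example 4.3. The point to notice is that the components of each pair are entities (objects), not primitive values such as strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    """A movie entity (hypothetical class; only the title is modeled)."""
    title: str


@dataclass(frozen=True)
class Star:
    """A star entity (hypothetical class; only the name is modeled)."""
    name: str


basic_instinct = Movie("Basic Instinct")
total_recall = Movie("Total Recall")
stone = Star("Sharon Stone")
schwarzenegger = Star("Arnold Schwarzenegger")

# The relationship set for Stars-in: a finite set of (movie, star) pairs.
stars_in = {
    (basic_instinct, stone),
    (total_recall, schwarzenegger),
    (total_recall, stone),
}


def stars_of(movie, relationship_set):
    """Return the star entities connected to `movie` by the relationship."""
    return {s for (m, s) in relationship_set if m == movie}
```

For instance, stars_of(total_recall, stars_in) collects the two star entities paired with Total Recall in the table above.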

4.1.6 Multiplicity of Binary E/R Relationships


In general, a binary relationship can connect any member of one of its entity
sets to any number of members of the other entity set. However, it is common
for there to be a restriction on the “multiplicity” of a relationship. Suppose R
is a relationship connecting entity sets E and F. Then:

• If each member of E can be connected by R to at most one member of F,
then we say that R is many-one from E to F. Note that in a many-one
relationship from E to F, each entity in F can be connected to many
members of E. Similarly, if instead a member of F can be connected by
R to at most one member of E, then we say R is many-one from F to E
(or equivalently, one-many from E to F).

• If R is both many-one from E to F and many-one from F to E, then we
say that R is one-one. In a one-one relationship an entity of either entity
set can be connected to at most one entity of the other set.

• If R is neither many-one from E to F nor from F to E, then we say R is
many-many.
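
The three cases above can be checked against a particular relationship set with a minimal Python sketch. The function names is_many_one and classify are assumptions introduced here for illustration. Keep in mind that multiplicity is a constraint on every possible instance, so a check like this can only show that one given instance is consistent with (or violates) a declared multiplicity.

```python
from collections import defaultdict


def is_many_one(pairs):
    """True if each left member is paired with at most one right member."""
    partners = defaultdict(set)
    for left, right in pairs:
        partners[left].add(right)
    return all(len(rights) <= 1 for rights in partners.values())


def classify(pairs):
    """Classify a set of (e, f) pairs, with e drawn from E and f from F."""
    e_to_f = is_many_one(pairs)
    f_to_e = is_many_one({(f, e) for (e, f) in pairs})
    if e_to_f and f_to_e:
        return "one-one"
    if e_to_f:
        return "many-one from E to F"
    if f_to_e:
        return "many-one from F to E"
    return "many-many"


# The Stars-in instance of Example 4.3, written with plain strings for brevity.
stars_in = {
    ("Basic Instinct", "Sharon Stone"),
    ("Total Recall", "Arnold Schwarzenegger"),
    ("Total Recall", "Sharon Stone"),
}

# A hypothetical Owns instance: two movies owned by the same studio.
owns = {("movie1", "studioA"), ("movie2", "studioA")}
```

Here classify(stars_in) reports a many-many instance, since Total Recall has two stars and Sharon Stone appears in two movies, while classify(owns) reports an instance that is many-one from movies to studios.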

As we mentioned in Example 4.2, arrows can be used to indicate the
multiplicity of a relationship in an E/R diagram. If a relationship is many-one from
entity set E to entity set F, then we place an arrow entering F. The arrow
indicates that each entity in set E is related to at most one entity in set F.
Unless there is also an arrow on the edge to E, an entity in F may be related
to many entities in E.

Example 4.4: A one-one relationship between entity sets E and F is
represented by arrows pointing to both E and F. For instance, Fig. 4.3 shows two
entity sets, Studios and Presidents, and the relationship Runs between them
(attributes are omitted). We assume that a president can run only one studio
and a studio has only one president, so this relationship is one-one, as indicated
by the two arrows, one entering each entity set.

Studios ← Runs → Presidents

Figure 4.3: A one-one relationship

Remember that the arrow means “at most one”; it does not guarantee
existence of an entity of the set pointed to. Thus, in Fig. 4.3, we would expect
that a “president” is surely associated with some studio; how could they be a
“president” otherwise? However, a studio might not have a president at some
particular time, so the arrow from Runs to Presidents truly means “at most one”
and not “exactly one.” We shall discuss the distinction further in Section 4.3.3.

4.1.7 Multiway Relationships


The E/R model makes it convenient to define relationships involving more than
two entity sets. In practice, ternary (three-way) or higher-degree relationships
are rare, but they occasionally are necessary to reflect the true state of affairs.
A multiway relationship in an E/R diagram is represented by lines from the
relationship diamond to each of the involved entity sets.

Example 4.5: In Fig. 4.4 is a relationship Contracts that involves a studio,
a star, and a movie. This relationship represents that a studio has contracted
with a particular star to act in a particular movie. In general, the value of
an E/R relationship can be thought of as a relationship set of tuples whose
components are the entities participating in the relationship, as we discussed in
Section 4.1.5. Thus, relationship Contracts can be described by triples of the
form (studio, star, movie).

Figure 4.4: A three-way relationship

In multiway relationships, an arrow pointing to an entity set E means that if
we select one entity from each of the other entity sets in the relationship, those
entities are related to at most one entity in E. (Note that this rule generalizes
the notation used for many-one, binary relationships.) Informally, we may think
of a functional dependency with E on the right and all the other entity sets of
the relationship on the left.
In Fig. 4.4 we have an arrow pointing to entity set Studios, indicating that
for a particular star and movie, there is only one studio with which the star has
contracted for that movie. However, there are no arrows pointing to entity sets
Stars or Movies. A studio may contract with several stars for a movie, and a
star may contract with one studio for more than one movie. □
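
The arrow rule for multiway relationships can be sketched in Python as a check on one relationship set. The function name arrow_allowed and the sample triples are hypothetical, chosen only to mirror the situation of Fig. 4.4; tuples follow the (studio, star, movie) order of Example 4.5.

```python
def arrow_allowed(tuples, target):
    """Check whether an instance is consistent with an arrow into one entity set.

    `tuples` is a relationship set; `target` is the position of the entity
    set E that the arrow would point to. The arrow is consistent with the
    instance if every choice of the other components occurs with at most
    one entity in position `target`.
    """
    seen = {}
    for t in tuples:
        others = t[:target] + t[target + 1:]
        # Record the first target entity seen for this combination of
        # other entities; a different one later violates the arrow.
        if seen.setdefault(others, t[target]) != t[target]:
            return False
    return True


# Hypothetical Contracts instance, as (studio, star, movie) triples.
contracts = [
    ("studioA", "starX", "movie1"),
    ("studioA", "starY", "movie1"),  # several stars for one movie
    ("studioA", "starX", "movie2"),  # one star, one studio, several movies
]
```

With this instance, an arrow into Studios (position 0) is consistent, while arrows into Stars (position 1) or Movies (position 2) are not, which matches the discussion of Fig. 4.4.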

4.1.8 Roles in Relationships


It is possible that one entity set appears two or more times in a single
relationship. If so, we draw as many lines from the relationship to the entity set as the
entity set appears in the relationship. Each line to the entity set represents a
different role that the entity set plays in the relationship. We therefore label the
edges between the entity set and relationship by names, which we call “roles.”

Example 4.6: In Fig. 4.5 is a relationship Sequel-of between the entity set
Movies and itself. Each relationship is between two movies, one of which is
the sequel of the other. To differentiate the two movies in a relationship, one
line is labeled by the role Original and one by the role Sequel, indicating the
original movie and its sequel, respectively. We assume that a movie may have
many sequels, but for each sequel there is only one original movie. Thus, the
relationship is many-one from Sequel movies to Original movies, as indicated
by the arrow in the E/R diagram of Fig. 4.5. □

Figure 4.5: A relationship with roles

Limits on Arrow Notation in Multiway Relationships

There are not enough choices of arrow or no-arrow on the lines attached to
a relationship with three or more participants. Thus, we cannot describe
every possible situation with arrows. For instance, in Fig. 4.4, the studio
is really a function of the movie alone, not the star and movie jointly,
since only one studio produces a movie. However, our notation does not
distinguish this situation from the case of a three-way relationship where
the entity set pointed to by the arrow is truly a function of both other
entity sets. To handle all possible situations, we would have to give a set
of functional dependencies involving the entity sets of the relationship.

Example 4.7: As a final example that includes both a multiway relationship
and an entity set with multiple roles, in Fig. 4.6 is a more complex version of
the Contracts relationship introduced earlier in Example 4.5. Now, relationship
Contracts involves two studios, a star, and a movie. The intent is that one
studio, having a certain star under contract (in general, not for a particular
movie), may further contract with a second studio to allow that star to act in
a particular movie. Thus, the relationship is described by 4-tuples of the form
(studio1, studio2, star, movie), meaning that studio2 contracts with studio1 for
the use of studio1’s star by studio2 for the movie.
We see in Fig. 4.6 arrows pointing to Studios in both of its roles, as “owner”
of the star and as producer of the movie. However, there are no arrows pointing
to Stars or Movies. The rationale is as follows. Given a star, a movie, and a
studio producing the movie, there can be only one studio that “owns” the
star. (We assume a star is under contract to exactly one studio.) Similarly,
only one studio produces a given movie, so given a star, a movie, and the
star’s studio, we can determine a unique producing studio. Note that in both
cases we actually needed only one of the other entities to determine the unique
entity (for example, we need only know the movie to determine the unique
producing studio), but this fact does not change the multiplicity specification
for the multiway relationship.

Figure 4.6: A four-way relationship

There are no arrows pointing to Stars or Movies. Given a star, the star’s
studio, and a producing studio, there could be several different contracts
allowing the star to act in several movies. Thus, the other three components in a
relationship 4-tuple do not necessarily determine a unique movie. Similarly, a
producing studio might contract with some other studio to use more than one
of their stars in one movie. Thus, a star is not determined by the three other
components of the relationship. □

Figure 4.7: A relationship with an attribute



4.1.9 Attributes on Relationships


Sometimes it is convenient, or even essential, to associate attributes with a
relationship, rather than with any one of the entity sets that the relationship
connects. For example, consider the relationship of Fig. 4.4, which represents
contracts between a star and studio for a movie.1 We might wish to record the
salary associated with this contract. However, we cannot associate it with the
star; a star might get different salaries for different movies. Similarly, it does
not make sense to associate the salary with a studio (they may pay different
salaries to different stars) or with a movie (different stars in a movie may receive
different salaries).
However, we can associate a unique salary with the (star, movie, studio)
triple in the relationship set for the Contracts relationship. In Fig. 4.7 we see
Fig. 4.4 fleshed out with attributes. The relationship has attribute salary, while
the entity sets have the same attributes that we showed for them in Fig. 4.2.
In general, we may place one or more attributes on any relationship. The
values of these attributes are functionally determined by the entire tuple in the
relationship set for that relationship. In some cases, the attributes can be
determined by a subset of the entity sets involved in the relationship, but presumably
not by any single entity set (or it would make more sense to place the attribute
on that entity set). For instance, in Fig. 4.7, the salary is really determined by
the movie and star entities, since the studio entity is itself determined by the
movie entity.
It is never necessary to place attributes on relationships. We can instead
invent a new entity set, whose entities have the attributes ascribed to the
relationship. If we then include this entity set in the relationship, we can omit the
attributes on the relationship itself. However, attributes on a relationship are
a useful convention, which we shall continue to use where appropriate.

Example 4.8: Let us revise the E/R diagram of Fig. 4.7, which has the
salary attribute on the Contracts relationship. Instead, we create an entity
set Salaries, with attribute salary. Salaries becomes the fourth entity set of
relationship Contracts. The whole diagram is shown in Fig. 4.8.
Notice that there is an arrow into the Salaries entity set in Fig. 4.8. That
arrow is appropriate, since we know that the salary is determined by all the other
entity sets involved in the relationship. In general, when we do a conversion
from attributes on a relationship to an additional entity set, we place an arrow
into that entity set. □

Figure 4.8: Moving the attribute to an entity set

1 Here, we have reverted to the earlier notion of three-way contracts in Example 4.5, not
the four-way relationship of Example 4.7.

4.1.10 Converting Multiway Relationships to Binary

There are some data models, such as UML (Section 4.7) and ODL (Section 4.9),
that limit relationships to be binary. Thus, while the E/R model does not
require binary relationships, it is useful to observe that any relationship
connecting more than two entity sets can be converted to a collection of binary,
many-one relationships. To do so, introduce a new entity set whose entities we
may think of as tuples of the relationship set for the multiway relationship. We
call this entity set a connecting entity set. We then introduce many-one
relationships from the connecting entity set to each of the entity sets that provide
components of tuples in the original, multiway relationship. If an entity set
plays more than one role, then it is the target of one relationship for each role.

Example 4.9: The four-way Contracts relationship in Fig. 4.6 can be replaced
by an entity set that we may also call Contracts. As seen in Fig. 4.9, it
participates in four relationships. If the relationship set for the relationship Contracts
has a 4-tuple (studio1, studio2, star, movie) then the entity set Contracts has
an entity e. This entity is linked by relationship Star-of to the entity star in
entity set Stars. It is linked by relationship Movie-of to the entity movie in
Movies. It is linked to entities studio1 and studio2 of Studios by relationships
Studio-of-star and Producing-studio, respectively.
Note that we have assumed there are no attributes of entity set Contracts,
although the other entity sets in Fig. 4.9 have unseen attributes. However, it is
possible to add attributes, such as the date of signing, to entity set Contracts. □
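
The conversion just described is mechanical enough to sketch in Python. The function name to_connecting_entities, the generated entity labels, and the sample 4-tuples are all hypothetical; the role names are those of Example 4.9, listed in the same order as the components (studio1, studio2, star, movie).

```python
from itertools import count


def to_connecting_entities(relationship_set, roles):
    """Replace a multiway relationship by a connecting entity set.

    Returns the list of new connecting entities and, for each role, a
    binary relationship set of (connecting entity, component) pairs.
    """
    ids = count(1)
    connecting = []
    binary = {role: set() for role in roles}
    for tup in sorted(relationship_set):
        entity = f"contract{next(ids)}"  # one new entity per original tuple
        connecting.append(entity)
        for role, component in zip(roles, tup):
            binary[role].add((entity, component))
    return connecting, binary


ROLES = ["Studio-of-star", "Producing-studio", "Star-of", "Movie-of"]

# Hypothetical instance of the four-way Contracts relationship.
contracts = {
    ("studioA", "studioB", "starX", "movie1"),
    ("studioA", "studioB", "starX", "movie2"),
}

entities, binary = to_connecting_entities(contracts, ROLES)
```

Each connecting entity appears exactly once in each binary relationship set, so every one of the four relationships is many-one from Contracts to its target entity set, as the construction requires.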

Figure 4.9: Replacing a multiway relationship by an entity set and binary
relationships

4.1.11 Subclasses in the E/R Model

Often, an entity set contains certain entities that have special properties not
associated with all members of the set. If so, we find it useful to define certain
special-case entity sets, or subclasses, each with its own special attributes and/or
relationships. We connect an entity set to its subclasses using a relationship
called isa (i.e., “an A is a B” expresses an “isa” relationship from entity set A
to entity set B).
An isa relationship is a special kind of relationship, and to emphasize that
it is unlike other relationships, we use a special notation: a triangle. One side
of the triangle is attached to the subclass, and the opposite point is connected
to the superclass. Every isa relationship is one-one, although we shall not draw
the two arrows that are associated with other one-one relationships.

Example 4.10: Among the special kinds of movies we might store in our
example database are cartoons and murder mysteries. For each of these special
movie types, we could define a subclass of the entity set Movies. For instance, let
us postulate two subclasses: Cartoons and Murder-Mysteries. A cartoon has, in
addition to the attributes and relationships of Movies, an additional relationship
called Voices that gives us a set of stars who speak, but do not appear in the
movie. Movies that are not cartoons do not have such stars. Murder-mysteries
have an additional attribute weapon. The connections among the three entity
sets Movies, Cartoons, and Murder-Mysteries are shown in Fig. 4.10. □

Figure 4.10: Isa relationships in an E/R diagram

While, in principle, a collection of entity sets connected by isa relationships
could have any structure, we shall limit isa-structures to trees, in which there
is one root entity set (e.g., Movies in Fig. 4.10) that is the most general, with
progressively more specialized entity sets extending below the root in a tree.
Suppose we have a tree of entity sets, connected by isa relationships. A
single entity consists of components from one or more of these entity sets, as
long as those components are in a subtree including the root. That is, if an
entity e has a component c in entity set E, and the parent of E in the tree is
F, then entity e also has a component d in F. Further, c and d must be paired
in the relationship set for the isa relationship from E to F. The entity e has
whatever attributes any of its components has, and it participates in whatever
relationships any of its components participate in.

Example 4.11: The typical movie, being neither a cartoon nor a
murder-mystery, will have a component only in the root entity set Movies in Fig. 4.10.
These entities have only the four attributes of Movies (and the two relationships
of Movies, namely Stars-in and Owns, that are not shown in Fig. 4.10).
A cartoon that is not a murder-mystery will have two components, one in
Movies and one in Cartoons. Its entity will therefore have not only the four
attributes of Movies, but the relationship Voices. Likewise, a murder-mystery
will have two components for its entity, one in Movies and one in
Murder-Mysteries, and thus will have five attributes, including weapon.
Finally, a movie like Roger Rabbit, which is both a cartoon and a
murder-mystery, will have components in all three of the entity sets Movies, Cartoons,
and Murder-Mysteries. The three components are connected into one entity by
the isa relationships. Together, these components give the Roger Rabbit entity
all four attributes of Movies plus the attribute weapon of entity set
Murder-Mysteries and the relationship Voices of entity set Cartoons. □

The E/R View of Subclasses

There is a significant resemblance between “isa” in the E/R model and
subclasses in object-oriented languages. In a sense, “isa” relates a subclass
to its superclass. However, there is also a fundamental difference between
the conventional E/R view and the object-oriented approach: entities are
allowed to have representatives in a tree of entity sets, while objects are
assumed to exist in exactly one class or subclass.
The difference becomes apparent when we consider how the movie
Roger Rabbit was handled in Example 4.11. In an object-oriented
approach, we would need for this movie a fourth entity set,
“cartoon-murder-mystery,” which inherited all the attributes and relationships of Movies,
Cartoons, and Murder-Mysteries. However, in the E/R model, the effect
of this fourth subclass is obtained by putting components of the movie
Roger Rabbit in both the Cartoons and Murder-Mysteries entity sets.
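
The rule that an entity collects the attributes and relationships of all its components can be illustrated with a small Python sketch of the hierarchy in Fig. 4.10. The dictionary names and the function properties are assumptions made for this illustration; the attribute and relationship names are those used in Examples 4.2, 4.10, and 4.11.

```python
# Attributes and relationships declared at each entity set of Fig. 4.10.
ATTRIBUTES = {
    "Movies": {"title", "year", "length", "genre"},
    "Cartoons": set(),
    "Murder-Mysteries": {"weapon"},
}
RELATIONSHIPS = {
    "Movies": {"Stars-in", "Owns"},
    "Cartoons": {"Voices"},
    "Murder-Mysteries": set(),
}
# The isa tree: each entity set maps to its parent (None for the root).
PARENT = {"Movies": None, "Cartoons": "Movies", "Murder-Mysteries": "Movies"}


def properties(components):
    """Return (attributes, relationships) of an entity with these components.

    The components must form a subtree that includes the root, i.e. the
    set is nonempty and contains the parent of each of its members.
    """
    if not components:
        raise ValueError("an entity needs at least one component")
    for c in components:
        if PARENT[c] is not None and PARENT[c] not in components:
            raise ValueError(f"component in {c} requires one in {PARENT[c]}")
    attrs = set().union(*(ATTRIBUTES[c] for c in components))
    rels = set().union(*(RELATIONSHIPS[c] for c in components))
    return attrs, rels


roger_rabbit = {"Movies", "Cartoons", "Murder-Mysteries"}
```

Calling properties(roger_rabbit) yields the four attributes of Movies plus weapon, together with the relationships Stars-in, Owns, and Voices, whereas a component set such as {"Cartoons"} alone is rejected because it omits the root.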

4.1.12 Exercises for Section 4.1


Exercise 4.1.1: Design a database for a bank, including information about
customers and their accounts. Information about a customer includes their
name, address, phone, and Social Security number. Accounts have numbers,
types (e.g., savings, checking) and balances. Also record the customer(s) who
own an account. Draw the E/R diagram for this database. Be sure to include
arrows where appropriate, to indicate the multiplicity of a relationship.

Exercise 4.1.2: Modify your solution to Exercise 4.1.1 as follows:

a) Change your diagram so an account can have only one customer.

b) Further change your diagram so a customer can have only one account.

! c) Change your original diagram of Exercise 4.1.1 so that a customer can
have a set of addresses (which are street-city-state triples) and a set of
phones. Remember that we do not allow attributes to have nonprimitive
types, such as sets, in the E/R model.

! d) Further modify your diagram so that customers can have a set of
addresses, and at each address there is a set of phones.

Exercise 4.1.3: Give an E/R diagram for a database recording information
about teams, players, and their fans, including:

1. For each team, its name, its players, its team captain (one of its players),
and the colors of its uniform.

2. For each player, his/her name.

3. For each fan, his/her name, favorite teams, favorite players, and favorite
color.

Remember that a set of colors is not a suitable attribute type for teams. How
can you get around this restriction?

Exercise 4.1.4: Suppose we wish to add to the schema of Exercise 4.1.3 a
relationship Led-by among two players and a team. The intention is that this
relationship set consists of triples (player1, player2, team) such that player1
played on the team at a time when some other player2 was the team captain.

a) Draw the modification to the E/R diagram.

b) Replace your ternary relationship with a new entity set and binary
relationships.

! c) Are your new binary relationships the same as any of the previously
existing relationships? Note that we assume the two players are different,
i.e., the team captain is not self-led.

Exercise 4.1.5: Modify Exercise 4.1.3 to record for each player the history of
teams on which they have played, including the start date and ending date (if
they were traded) for each such team.

! Exercise 4.1.6: Design a genealogy database with one entity set: People. The
information to record about persons includes their name (an attribute), their
mother, father, and children.

! Exercise 4.1.7: Modify your “people” database design of Exercise 4.1.6 to
include the following special types of people:

1. Females.

2. Males.

3. People who are parents.

You may wish to distinguish certain other kinds of people as well, so
relationships connect appropriate subclasses of people.

Exercise 4.1.8: An alternative way to represent the information of
Exercise 4.1.6 is to have a ternary relationship Family with the intent that in the
relationship set for Family, triple (person, mother, father) is a person, their
mother, and their father; all three are in the People entity set, of course.

a) Draw this diagram, placing arrows on edges where appropriate.

b) Replace the ternary relationship Family by an entity set and binary
relationships. Again place arrows to indicate the multiplicity of relationships.

Exercise 4.1.9: Design a database suitable for a university registrar. This
database should include information about students, departments, professors,
courses, which students are enrolled in which courses, which professors are
teaching which courses, student grades, TA’s for a course (TA’s are students),
which courses a department offers, and any other information you deem
appropriate. Note that this question is more free-form than the questions above, and
you need to make some decisions about multiplicities of relationships,
appropriate types, and even what information needs to be represented.

! Exercise 4.1.10: Informally, we can say that two E/R diagrams “have the
same information” if, given a real-world situation, the instances of these two
diagrams that reflect this situation can be computed from one another. Consider
the E/R diagram of Fig. 4.6. This four-way relationship can be decomposed
into a three-way relationship and a binary relationship by taking advantage
of the fact that for each movie, there is a unique studio that produces that
movie. Give an E/R diagram without a four-way relationship that has the
same information as Fig. 4.6.

4.2 Design Principles


We have yet to learn many of the details of the E/R model, but we have enough
to begin study of the crucial issue of what constitutes a good design and what
should be avoided. In this section, we offer some useful design principles.

4.2.1 Faithfulness
First and foremost, the design should be faithful to the specifications of the
application. That is, entity sets and their attributes should reflect reality. You
can’t attach an attribute number-of-cylinders to Stars, although that attribute
would make sense for an entity set Automobiles. Whatever relationships are
asserted should make sense given what we know about the part of the real
world being modeled.

Example 4.12: If we define a relationship Stars-in between Stars and Movies,
it should be a many-many relationship. The reason is that an observation of the
real world tells us that stars can appear in more than one movie, and movies
can have more than one star. It is incorrect to declare the relationship Stars-in
to be many-one in either direction or to be one-one. □

Example 4.13: On the other hand, sometimes it is less obvious what the
real world requires us to do in our E/R design. Consider, for instance, entity
sets Courses and Instructors, with a relationship Teaches between them. Is
Teaches many-one from Courses to Instructors? The answer lies in the policy
and intentions of the organization creating the database. It is possible that
the school has a policy that there can be only one instructor for any course.
Even if several instructors may “team-teach” a course, the school may require
that exactly one of them be listed in the database as the instructor responsible
for the course. In either of these cases, we would make Teaches a many-one
relationship from Courses to Instructors.
Alternatively, the school may use teams of instructors regularly and wish
its database to allow several instructors to be associated with a course. Or,
the intent of the Teaches relationship may not be to reflect the current teacher
of a course, but rather those who have ever taught the course, or those who
are capable of teaching the course; we cannot tell simply from the name of the
relationship. In either of these cases, it would be proper to make Teaches be
many-many. □
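The distinction in Examples 4.12 and 4.13 can be made concrete with a small sketch. It treats a relationship set as a set of pairs and tests whether any left-hand entity is paired with more than one right-hand entity. The function name and the course and instructor data are hypothetical. Note that such a test can only refute a many-one declaration for a given instance; whether the declaration is right remains a matter of the organization's policy.

```python
from collections import defaultdict

def is_many_one(pairs):
    """True if no left-hand entity is paired with two different right-hand entities."""
    targets = defaultdict(set)
    for left, right in pairs:
        targets[left].add(right)
    return all(len(rights) <= 1 for rights in targets.values())

# Hypothetical instances of Teaches, as (course, instructor) pairs.
one_listed_instructor = {("CS101", "Ng"), ("CS102", "Ng"), ("CS103", "Lee")}
team_taught = {("CS101", "Ng"), ("CS101", "Lee")}

print(is_many_one(one_listed_instructor))  # True: each course has one instructor
print(is_many_one(team_taught))            # False: CS101 has two instructors
```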

4.2.2 Avoiding Redundancy


We should be careful to say everything once only. The problems we discussed
in Section 3.3 regarding redundancy and anomalies are typical of problems that
can arise in E/R designs. However, in the E/R model, there are several new
mechanisms whereby redundancy and other anomalies can arise.
For instance, we have used a relationship Owns between movies and studios.
We might also choose to have an attribute studioName of entity set Movies.
While there is nothing illegal about doing so, it is dangerous for several reasons.

1. Doing so leads to repetition of a fact, with the result that extra space
is required to represent the data, once we convert the E/R design to a
relational (or other type of) concrete implementation.

2. There is an update-anomaly potential, since we might change the relationship
but not the attribute, or vice-versa.

We shall say more about avoiding anomalies in Sections 4.2.4 and 4.2.5.
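To see the update anomaly in miniature, the following sketch stores the same fact twice: once in a relationship Owns and once in a studioName attribute of Movies. The dictionaries, the helper name, and the data values are assumptions made for illustration only.

```python
# Hypothetical data: Owns as a mapping from movie key to studio name,
# plus a redundant studioName attribute on each movie.
owns = {("Star Wars", 1977): "Fox"}
movies = {("Star Wars", 1977): {"length": 124, "studioName": "Fox"}}

def inconsistent_movies(owns, movies):
    """List movies whose studioName attribute disagrees with the Owns relationship."""
    return [key for key, attrs in movies.items()
            if owns.get(key) != attrs["studioName"]]

before = inconsistent_movies(owns, movies)

# Change the relationship but forget the attribute (studio name is made up).
owns[("Star Wars", 1977)] = "Another Studio"
after = inconsistent_movies(owns, movies)

print(before)  # []
print(after)   # [('Star Wars', 1977)]
```

With only one of the two representations in the design, no such disagreement could arise.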

4.2.3 Simplicity Counts


Avoid introducing more elements into your design than is absolutely necessary.

Example 4.14: Suppose that instead of a relationship between Movies and
Studios we postulated the existence of “movie-holdings,” the ownership of a
single movie. We might then create another entity set Holdings. A one-one
relationship Represents could be established between each movie and the unique
holding that represents the movie. A many-one relationship from Holdings to
Studios completes the picture shown in Fig. 4.11.

Figure 4.11: A poor design with an unnecessary entity set

Technically, the structure of Fig. 4.11 truly represents the real world, since
it is possible to go from a movie to its unique owning studio via Holdings.
However, Holdings serves no useful purpose, and we are better off without it.
It makes programs that use the movie-studio relationship more complicated,
wastes space, and encourages errors. □

4.2.4 Choosing the Right Relationships


Entity sets can be connected in various ways by relationships. However, adding
to our design every possible relationship is not often a good idea. Doing so
can lead to redundancy, update anomalies, and deletion anomalies, where the
connected pairs or sets of entities for one relationship can be deduced from
one or more other relationships. We shall illustrate the problem and what
to do about it with two examples. In the first example, several relationships
could represent the same information; in the second, one relationship could be
deduced from several others.

Example 4.15: Let us review Fig. 4.7, where we connected movies, stars,
and studios with a three-way relationship Contracts. We omitted from that
figure the two binary relationships Stars-in and Owns from Fig. 4.2. Do we
also need these relationships, between Movies and Stars, and between Movies
and Studios, respectively? The answer is: “we don’t know; it depends on our
assumptions regarding the three relationships in question.”
It might be possible to deduce the relationship Stars-in from Contracts. If
a star can appear in a movie only if there is a contract involving that star, that
movie, and the owning studio for the movie, then there truly is no need for
relationship Stars-in. We could figure out all the star-movie pairs by looking
at the star-movie-studio triples in the relationship set for Contracts and taking
only the star and movie components, i.e., projecting Contracts onto Stars-in.

However, if a star can work on a movie without there being a contract (or,
what is more likely, without there being a contract that we know about in our
database), then there could be star-movie pairs in Stars-in that are not part
of star-movie-studio triples in Contracts. In that case, we need to retain the
Stars-in relationship.
A similar observation applies to relationship Owns. If for every movie, there
is at least one contract involving that movie, its owning studio, and some star for
that movie, then we can dispense with Owns. However, if there is the possibility
that a studio owns a movie, yet has no stars under contract for that movie, or
no such contract is known to our database, then we must retain Owns.
In summary, we cannot tell you whether a given relationship will be redun­
dant. You must find out from those who wish the database implemented what
to expect. Only then can you make a rational decision about whether or not to
include relationships such as Stars-in or Owns. □
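The projection argument of Example 4.15 can be sketched as follows. Contracts is modeled as a set of (star, movie, studio) triples and Stars-in as a set of (star, movie) pairs; the function names and the sample data are hypothetical. The check only tells us whether Stars-in is recoverable from Contracts in one particular instance, not whether it will be in every instance the application allows.

```python
def project_star_movie(contracts):
    """Project (star, movie, studio) triples onto their star and movie components."""
    return {(star, movie) for star, movie, _studio in contracts}

def stars_in_is_deducible(stars_in, contracts):
    """True if every Stars-in pair also appears in the projection of Contracts."""
    return stars_in <= project_star_movie(contracts)

# Hypothetical instance: one star worked without a contract known to the database.
contracts = {("Carrie Fisher", "Star Wars", "Fox")}
stars_in = {("Carrie Fisher", "Star Wars"), ("Mark Hamill", "Star Wars")}

print(stars_in_is_deducible(stars_in, contracts))  # False: Stars-in must be kept
```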

Example 4.16: Now, consider Fig. 4.2 again. In this diagram, there is no


relationship between stars and studios. Yet we can use the two relationships
Stars-in and Owns to build a connection by the process of composing those
two relationships. That is, a star is connected to some movies by Stars-in, and
those movies are connected to studios by Owns. Thus, we could say that a star
is connected to the studios that own movies in which the star has appeared.
Would it make sense to have a relationship Works-for, as suggested in
Fig. 4.12, between Stars and Studios too? Again, we cannot tell without know­
ing more. First, what would the meaning of this relationship be? If it is to
mean “the star appeared in at least one movie of this studio,” then probably
there is no good reason to include it in the diagram. We could deduce this
information from Stars-in and Owns instead.

Figure 4.12: Adding a relationship between Stars and Studios

However, perhaps we have other information about stars working for stu­
dios that is not implied by the connection through a movie. In that case, a
relationship connecting stars directly to studios might be useful and would not
be redundant. Alternatively, we might use a relationship between stars and
studios to mean something entirely different. For example, it might represent
the fact that the star is under contract to the studio, in a manner unrelated
to any movie. As we suggested in Example 4.7, it is possible for a star to be
under contract to one studio and yet work on a movie owned by another studio.
In this case, the information found in the new Works-for relationship would
be independent of the Stars-in and Owns relationships, and would surely be
nonredundant. □
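The composition of Stars-in and Owns described in Example 4.16 might be sketched like this. Because Owns is many-one from Movies to Studios, it is modeled here as a mapping from movie to studio; the function name and the data are illustrative assumptions.

```python
def compose(stars_in, owns):
    """Pairs (star, studio) such that the star appears in some movie the studio owns."""
    return {(star, owns[movie]) for star, movie in stars_in if movie in owns}

# Illustrative instance; Owns maps each movie to its unique owning studio.
stars_in = {
    ("Carrie Fisher", "Star Wars"),
    ("Harrison Ford", "Star Wars"),
    ("Harrison Ford", "Raiders of the Lost Ark"),
}
owns = {"Star Wars": "Fox", "Raiders of the Lost Ark": "Paramount"}

works_for = compose(stars_in, owns)
print(sorted(works_for))
```

If Works-for is meant to carry exactly this information, it can be computed on demand and need not appear in the diagram. If it means "under contract to," nothing in this computation recovers it.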

4.2.5 Picking the Right Kind of Element


Sometimes we have options regarding the type of design element used to repre­
sent a real-world concept. Many of these choices are between using attributes
and using entity set/relationship combinations. In general, an attribute is sim­
pler to implement than either an entity set or a relationship. However, making
everything an attribute will usually get us into trouble.

Example 4.17: Let us consider a specific problem. In Fig. 4.2, were we wise
to make studios an entity set? Should we instead have made the name and
address of the studio be attributes of movies and eliminated the Studios entity
set? One problem with doing so is that we repeat the address of the studio for
each movie. We can also have an update anomaly if we change the address for
one movie but not another with the same studio, and we can have a deletion
anomaly if we delete the last movie owned by a given studio.
On the other hand, if we did not record addresses of studios, then there
is no harm in making the studio name an attribute of movies. We have no
anomalies in this case. Saying the name of a studio for each movie is not true
redundancy, since we must represent the owner of each movie somehow, and
saying the name of the studio is a reasonable way to do so. □

We can abstract what we have observed in Example 4.17 to give the con­
ditions under which we prefer to use an attribute instead of an entity set.
Suppose E is an entity set. Here are conditions that E must obey in order for
us to replace E by an attribute or attributes of several other entity sets.

1. All relationships in which E is involved must have arrows entering E.


That is, E must be the “one” in many-one relationships, or its general­
ization for the case of multiway relationships.
2. If E has more than one attribute, then no attribute depends on the other
attributes, the way address depends on name for Studios. That is, the
only key for E is all its attributes.
3. No relationship involves E more than once.

If these conditions are met, then we can replace entity set E as follows:

a) If there is a many-one relationship R from some entity set F to E, then remove
R and make the attributes of E be attributes of F, suitably renamed
if they conflict with attribute names for F. In effect, each F-entity takes,
as attributes, the name of the unique, related E-entity.2 For instance,
Movies entities could take their studio name as an attribute, should we
dispense with studio addresses.
b) If there is a multiway relationship R with an arrow to E, make the attributes
of E be attributes of R and delete the arc from R to E. An
example of this transformation is replacing Fig. 4.8, where there is an
entity set Salaries with a number as its lone attribute, by its original
diagram in Fig. 4.7.
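Transformation (a) can be sketched for the case where E has a single attribute, as Studios would if we dispensed with addresses. The function name, the dictionary layout, and the second movie are hypothetical. An F-entity with no related E-entity receives None, playing the role of a null value.

```python
def fold_into_attribute(f_entities, relationship, attr_name):
    """Drop a many-one relationship from F to E and store E's lone attribute on F.

    f_entities:   mapping from F-entity key to its attribute dictionary
    relationship: mapping from F-entity key to the related E-entity's attribute value
    """
    return {key: {**attrs, attr_name: relationship.get(key)}
            for key, attrs in f_entities.items()}

# Illustrative data; the second movie has no owning studio recorded.
movies = {("Star Wars", 1977): {"length": 124},
          ("Untitled Project", 2000): {"length": 90}}
owns = {("Star Wars", 1977): "Fox"}

folded = fold_into_attribute(movies, owns, "studioName")
print(folded[("Star Wars", 1977)])         # {'length': 124, 'studioName': 'Fox'}
print(folded[("Untitled Project", 2000)])  # {'length': 90, 'studioName': None}
```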
Example 4.18: Let us consider a point where there is a tradeoff between using
a multiway relationship and using a connecting entity set with several binary
relationships. We saw a four-way relationship Contracts among a star, a movie,
and two studios in Fig. 4.6. In Fig. 4.9, we mechanically converted it to an
entity set Contracts. Does it matter which we choose?
As the problem was stated, either is appropriate. However, should we change
the problem just slightly, then we are almost forced to choose a connecting entity
set. Let us suppose that contracts involve one star, one movie, but any set of
studios. This situation is more complex than the one in Fig. 4.6, where we
had two studios playing two roles. In this case, we can have any number of
studios involved, perhaps one to do production, one for special effects, one for
distribution, and so on. Thus, we cannot assign roles for studios.
It appears that a relationship set for the relationship Contracts must contain
triples of the form (star, movie, set-of-studios), and the relationship Contracts
itself involves not only the usual Stars and Movies entity sets, but a new entity
set whose entities are sets of studios. While this approach is possible, it seems
unnatural to think of sets of studios as basic entities, and we do not recommend
it.
A better approach is to think of contracts as an entity set. As in Fig. 4.9,
a contract entity connects a star, a movie and a set of studios, but now there
must be no limit on the number of studios. Thus, the relationship between
contracts and studios is many-many, rather than many-one as it would be if
contracts were a true “connecting” entity set. Figure 4.13 sketches the E /R
diagram. Note that a contract is related to a single star and to a single movie,
but to any number of studios. □
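One way to picture the design of Example 4.18 is as three relationship sets hanging off a Contracts entity set: two many-one (to Stars and to Movies) and one many-many (to Studios). In the sketch below the contract identifier, the studio names, and the helper name are all made up for illustration.

```python
# Many-one relationships are modeled as mappings from contract to a single entity.
star_of = {"c1": "Carrie Fisher"}
movie_of = {"c1": "Star Wars"}

# The many-many relationship to Studios is modeled as a set of pairs.
studios_of = {("c1", "Studio A"), ("c1", "Studio B"), ("c1", "Studio C")}

def studios_for(contract):
    """All studios connected to the given contract; there is no fixed limit."""
    return {studio for c, studio in studios_of if c == contract}

print(star_of["c1"], movie_of["c1"], sorted(studios_for("c1")))
```

No entity set whose members are sets of studios is needed; the set arises from the many-many relationship itself.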

4.2.6 Exercises for Section 4.2


Exercise 4.2.1: In Fig. 4.14 is an E/R diagram for a bank database involving
customers and accounts. Since customers may have several accounts, and
2 In a situation where an F-entity is not related to any E-entity, the new attributes of F
would be given special “null” values to indicate the absence of a related E-entity. A similar
arrangement would be used for the new attributes of R in case (b).

[Diagram: entity sets Stars, Contracts, Movies, and Studios, connected by relationships Star-of, Movie-of, and Studios-of]

Figure 4.13: Contracts connecting a star, a movie, and a set of studios

accounts may be held jointly by several customers, we associate with each cus­
tomer an “account set,” and accounts are members of one or more account sets.
Assuming the meanings of the various relationships and attributes are as expected
given their names, criticize the design. What design rules are violated?
Why? What modifications would you suggest?

Figure 4.14: A poor design for a bank database

Exercise 4.2.2: Under what circumstances (regarding the unseen attributes


of Studios and Presidents) would you recommend combining the two entity sets
and relationship in Fig. 4.3 into a single entity set and attributes?
Exercise 4.2.3: Suppose we delete the attribute address from Studios in
Fig. 4.7. Show how we could then replace an entity set by an attribute. Where
would that attribute appear?

Exercise 4.2.4: Give choices of attributes for the following entity sets in


Fig. 4.13 that will allow the entity set to be replaced by an attribute:

a) Stars.

b) Movies.

! c) Studios.

!! Exercise 4.2.5: In this and following exercises we shall consider two design
options in the E/R model for describing births. At a birth, there is one baby
(twins would be represented by two births), one mother, any number of nurses,
and any number of doctors. Suppose, therefore, that we have entity sets Babies,
Mothers, Nurses, and Doctors. Suppose we also use a relationship Births, which
connects these four entity sets, as suggested in Fig. 4.15. Note that a tuple of
the relationship set for Births has the form (baby, mother, nurse, doctor). If
there is more than one nurse and/or doctor attending a birth, then there will
be several tuples with the same baby and mother, one for each combination of
nurse and doctor.

Figure 4.15: Representing births by a multiway relationship

There are certain assumptions that we might wish to incorporate into our
design. For each, tell how to add arrows or other elements to the E/R diagram
in order to express the assumption.

a) For every baby, there is a unique mother.

b) For every combination of a baby, nurse, and doctor, there is a unique


mother.

c) For every combination of a baby and a mother there is a unique doctor.

! Exercise 4.2.6: Another approach to the problem of Exercise 4.2.5 is to connect
the four entity sets Babies, Mothers, Nurses, and Doctors by an entity set

Figure 4.16: Representing births by an entity set

Births, with four relationships, one between Births and each of the other entity
sets, as suggested in Fig. 4.16. Use arrows (indicating that certain of these
relationships are many-one) to represent the following conditions:
a) Every baby is the result of a unique birth, and every birth is of a unique
baby.
b) In addition to (a), every baby has a unique mother.
c) In addition to (a) and (b), for every birth there is a unique doctor.
In each case, what design flaws do you see?
!! Exercise 4.2.7: Suppose we change our viewpoint to allow a birth to involve
more than one baby born to one mother. How would you represent the fact
that every baby still has a unique mother using the approaches of Exercises
4.2.5 and 4.2.6?

4.3 Constraints in the E/R Model


The E/R model has several ways to express the common kinds of constraints
on the data that will populate the database being designed. As in the relational
model, there is a way to express the idea that an attribute or attributes are a key
for an entity set. We have already seen how an arrow connecting a relationship
to an entity set serves as a “functional dependency.” There is also a way to
express a referential-integrity constraint, where an entity in one set is required
to have an entity in another set to which it is related.

4.3.1 Keys in the E/R Model


A key for an entity set E is a set K of one or more attributes such that, given
any two distinct entities $e_1$ and $e_2$ in E, $e_1$ and $e_2$ cannot have identical values
for each of the attributes in the key K. If K consists of more than one attribute,
then it is possible for $e_1$ and $e_2$ to agree in some of these attributes, but never
in all attributes. Some important points to remember are:

• Every entity set must have a key, although in some cases (isa-hierarchies
and “weak” entity sets; see Section 4.4) the key actually belongs to another
entity set.

• There can be more than one possible key for an entity set. However, it
is customary to pick one key as the “primary key,” and to act as if that
were the only key.

• When an entity set is involved in an isa-hierarchy, we require that the root
entity set have all the attributes needed for a key, and that the key for
each entity is found from its component in the root entity set, regardless
of how many entity sets in the hierarchy have components for the entity.

In our running movies example, we have used title and year as the key for
Movies, counting on the observation that it is unlikely that two movies with
the same title would be released in one year. We also decided that it was safe
to use name as a key for MovieStar, believing that no real star would ever want
to use the name of another star.
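The definition of a key translates directly into a test on an instance of an entity set: no two distinct entities may agree on every attribute of K. The sketch below uses a hypothetical helper and a small list of movies. As with any instance-based test, it can show that a set of attributes is not a key, but it cannot prove that the set will remain a key for all future data.

```python
def is_key(entities, key_attrs):
    """True if no two entities agree on all attributes in key_attrs."""
    seen = set()
    for entity in entities:
        value = tuple(entity[attr] for attr in key_attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

# Two movies that share a title but differ in year.
movies = [
    {"title": "King Kong", "year": 1933},
    {"title": "King Kong", "year": 2005},
    {"title": "Star Wars", "year": 1977},
]

print(is_key(movies, ["title"]))          # False: two entities agree on title
print(is_key(movies, ["title", "year"]))  # True: they never agree on both
```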

4.3.2 Representing Keys in the E/R Model


In our E/R-diagram notation, we underline the attributes belonging to a key for
an entity set. For example, Fig. 4.17 reproduces our E/R diagram for movies,
stars, and studios from Fig. 4.2, but with key attributes underlined. Attribute
name is the key for Stars. Likewise, Studios has a key consisting of only its own
attribute name.

Figure 4.17: E/R diagram; keys are indicated by underlines


The attributes title and year together form the key for Movies. Note that
when several attributes are underlined, as in Fig. 4.17, then they are each
members of the key. There is no notation for representing the situation where
there are several keys for an entity set; we underline only the primary key. You
should also be aware that in some unusual situations, the attributes forming
the key for an entity set do not all belong to the entity set itself. We shall defer
this matter, called “weak entity sets,” until Section 4.4.

4.3.3 Referential Integrity


Recall our discussion of referential-integrity constraints in Section 2.5.2. These
constraints say that a value appearing in one context must also appear in
another. For example, let us consider the many-one relationship Owns from
Movies to Studios in Fig. 4.2. The many-one requirement simply says that no
movie can be owned by more than one studio. It does not say that a movie
must surely be owned by a studio, or that the owning studio must be present
in the Studios entity set, as stored in our database. An appropriate referential
integrity constraint on relationship Owns is that for each movie, the owning
studio (the entity “referenced” by the relationship for this movie) must exist in
our database.
The arrow notation in E/R diagrams is able to indicate whether a relationship
is expected to support referential integrity in one or more directions.
Suppose R is a relationship from entity set E to entity set F. A rounded arrow­
head pointing to F indicates not only that the relationship is many-one from E
to F, but that the entity of set F related to a given entity of set E is required
to exist. The same idea applies when R is a relationship among more than two
entity sets.

Example 4.19: Figure 4.18 shows some appropriate referential integrity constraints
among the entity sets Movies, Studios, and Presidents. These entity sets
and relationships were first introduced in Figs. 4.2 and 4.3. We see a rounded
arrow entering Studios from relationship Owns. That arrow expresses the refer­
ential integrity constraint that every movie must be owned by one studio, and
this studio is present in the Studios entity set.

Figure 4.18: E/R diagram showing referential integrity constraints

Similarly, we see a rounded arrow entering Studios from Runs. That arrow
expresses the referential integrity constraint that every president runs a studio
that exists in the Studios entity set.
Note that the arrow to Presidents from Runs remains a pointed arrow. That
choice reflects a reasonable assumption about the relationship between studios
and their presidents. If a studio ceases to exist, its president can no longer be
called a president, so we would expect the president of the studio to be deleted
from the entity set Presidents. Hence there is a rounded arrow to Studios. On
the other hand, if a president were fired or resigned, the studio would continue
to exist. Thus, we place an ordinary, pointed arrow to Presidents, indicating
that each studio has at most one president, but might have no president at
some time. □
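The rounded arrow from Owns to Studios in Example 4.19 asserts two things about every movie: it has an owning studio, and that studio is present in the Studios entity set. A sketch of the corresponding check on an instance follows; the function name and all data values are hypothetical.

```python
def rounded_arrow_violations(movies, owns, studios):
    """Movies with no owner, or with an owner that is absent from Studios.

    owns maps each movie to its (at most one) owning studio.
    """
    return {movie for movie in movies if owns.get(movie) not in studios}

# Hypothetical instance with one sound movie and two violations.
movies = {"Star Wars", "Orphan Film", "Ghost Film"}
owns = {"Star Wars": "Fox", "Ghost Film": "Defunct Pictures"}
studios = {"Fox"}

print(sorted(rounded_arrow_violations(movies, owns, studios)))
# ['Ghost Film', 'Orphan Film']
```

With an ordinary pointed arrow, only the second kind of problem (two owners for one movie) would be ruled out, and a movie without an owner would be acceptable.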

4.3.4 Degree Constraints


In the E/R model, we can attach a bounding number to the edges that connect
a relationship to an entity set, indicating limits on the number of entities that
can be connected to any one entity of the related entity set. For example, we
could choose to place a constraint on the degree of a relationship, such as that
a movie entity cannot be connected by relationship Stars-in to more than 10
star entities.

Figure 4.19: Representing a constraint on the number of stars per movie

Figure 4.19 shows how we can represent this constraint. As another example,
we can think of the arrow as a synonym for the constraint “≤ 1,” and we can
think of the rounded arrow of Fig. 4.18 as standing for the constraint “= 1.”
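A degree constraint such as "at most 10 stars per movie" can be checked on an instance by counting, for each movie, the star entities connected to it. The following sketch assumes Stars-in is a set of (star, movie) pairs; the function name and the generated data are illustrative.

```python
from collections import Counter

def over_limit(stars_in, limit=10):
    """Movies connected by Stars-in to more than `limit` star entities."""
    counts = Counter(movie for _star, movie in stars_in)
    return {movie for movie, n in counts.items() if n > limit}

# Hypothetical instance: one movie with 11 stars, one with 2.
stars_in = {(f"star{i}", "Big Cast") for i in range(11)}
stars_in |= {("star0", "Small Cast"), ("star1", "Small Cast")}

print(over_limit(stars_in))            # {'Big Cast'}
print(over_limit(stars_in, limit=11))  # set()
```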

4.3.5 Exercises for Section 4.3


Exercise 4.3.1: For your E/R diagrams of:
a) Exercise 4.1.1.
b) Exercise 4.1.3.
c) Exercise 4.1.6.
(i) Select and specify keys, and (ii) Indicate appropriate referential integrity
constraints.
Exercise 4.3.2: We may think of relationships in the E/R model as having
keys, just as entity sets do. Let R be a relationship among the entity sets
$E_1, E_2, \ldots, E_n$. Then a key for R is a set K of attributes chosen from the
attributes of $E_1, E_2, \ldots, E_n$ such that if $(e_1, e_2, \ldots, e_n)$ and $(f_1, f_2, \ldots, f_n)$
are two different tuples in the relationship set for R, then it is not possible that
these tuples agree in all the attributes of K. Now, suppose n = 2; that is, R
is a binary relationship. Also, for each i, let $K_i$ be a set of attributes that is a
key for entity set $E_i$. In terms of $E_1$ and $E_2$, give a smallest possible key for R
under the assumption that:

a) R is many-many.

b) R is many-one from $E_1$ to $E_2$.

c) R is many-one from $E_2$ to $E_1$.

d) R is one-one.

!! Exercise 4.3.3: Consider again the problem of Exercise 4.3.2, but with n
allowed to be any number, not just 2. Using only the information about which
arcs from R to the $E_i$’s have arrows, show how to find a smallest possible key
K for R in terms of the $K_i$’s.

4.4 Weak Entity Sets


It is possible for an entity set’s key to be composed of attributes, some or all
of which belong to another entity set. Such an entity set is called a weak entity
set.

4.4.1 Causes of Weak Entity Sets


There are two principal reasons we need weak entity sets. First, sometimes
entity sets fall into a hierarchy based on classifications unrelated to the “isa
hierarchy” of Section 4.1.11. If entities of set E are subunits of entities in set
F, then it is possible that the names of E-entities are not unique until we take
into account the name of the F-entity to which the E entity is subordinate.
Several examples will illustrate the problem.

Example 4.20: A movie studio might have several film crews. The crews
might be designated by a given studio as crew 1, crew 2, and so on. However,
other studios might use the same designations for crews, so the attribute number
is not a key for crews. Rather, to name a crew uniquely, we need to give
both the name of the studio to which it belongs and the number of the crew.
The situation is suggested by Fig. 4.20. The double-rectangle indicates a weak
entity set, and the double-diamond indicates a many-one relationship that helps
provide the key for the weak entity set. The notation will be explained further
in Section 4.4.3. The key for weak entity set Crews is its own number attribute
and the name attribute of the unique studio to which the crew is related by the
many-one Unit-of relationship. □

Example 4.21: A species is designated by its genus and species names. For
example, humans are of the species Homo sapiens; Homo is the genus name
and sapiens the species name. In general, a genus consists of several species,
each of which has a name beginning with the genus name and continuing with
the species name. Unfortunately, species names, by themselves, are not unique.

Figure 4.20: A weak entity set for crews, and its connections

Two or more genera may have species with the same species name. Thus, to
designate a species uniquely we need both the species name and the name of the
genus to which the species is related by the Belongs-to relationship, as suggested
in Fig. 4.21. Species is a weak entity set whose key comes partially from its
genus. □

Figure 4.21: Another weak entity set, for species

The second common source of weak entity sets is the connecting entity
sets that we introduced in Section 4.1.10 as a way to eliminate a multiway
relationship.3 These entity sets often have no attributes of their own. Their
key is formed from the attributes that are the key attributes for the entity sets
they connect.

Example 4.22: In Fig. 4.22 we see a connecting entity set Contracts that


replaces the ternary relationship Contracts of Example 4.5. Contracts has an
attribute salary, but this attribute does not contribute to the key. Rather, the
key for a contract consists of the name of the studio and the star involved, plus
the title and year of the movie involved. □

4.4.2 Requirements for Weak Entity Sets


We cannot obtain key attributes for a weak entity set indiscriminately. Rather,
if E is a weak entity set then its key consists of:

1. Zero or more of its own attributes, and


3 Remember that there is no particular requirement in the E/R model that multiway
relationships be eliminated, although this requirement exists in some other database design
models.

Figure 4.22: Connecting entity sets are weak

2. Key attributes from entity sets that are reached by certain many-one
relationships from E to other entity sets. These many-one relationships
are called supporting relationships for E, and the entity sets reached from
E are supporting entity sets.

In order for R, a many-one relationship from E to some entity set F, to be a


supporting relationship for E, the following conditions must be obeyed:

a) R must be a binary, many-one relationship4 from E to F.

b) R must have referential integrity from E to F. That is, for every E-entity,
there must be exactly one existing F-entity related to it by R. Put
another way, a rounded arrow from R to F must be justified.

c) The attributes that F supplies for the key of E must be key attributes of
F.

d) However, if F is itself weak, then some or all of the key attributes of F


supplied to E will be key attributes of one or more entity sets G to which
F is connected by a supporting relationship. Recursively, if G is weak,
some key attributes of G will be supplied from elsewhere, and so on.

4 Remember that a one-one relationship is a special case of a many-one relationship. When
we say a relationship must be many-one, we always include one-one relationships as well.

e) If there are several different supporting relationships from E to the same


entity set F, then each relationship is used to supply a copy of the key
attributes of F to help form the key of E. Note that an entity e from
E may be related to different entities in F through different supporting
relationships from E. Thus, the keys of several different entities from F
may appear in the key values identifying a particular entity e from E.
The intuitive reason why these conditions are needed is as follows. Consider
an entity in a weak entity set, say a crew in Example 4.20. Each crew is unique,
abstractly. In principle we can tell one crew from another, even if they have
the same number but belong to different studios. It is only the data about
crews that makes it hard to distinguish crews, because the number alone is not
sufficient. The only way we can associate additional information with a crew
is if there is some deterministic process leading to additional values that make
the designation of a crew unique. But the only unique values associated with
an abstract crew entity are:
1. Values of attributes of the Crews entity set, and
2. Values obtained by following a relationship from a crew entity to a unique
entity of some other entity set, where that other entity has a unique
associated value of some kind. That is, the relationship followed must be
many-one to the other entity set F, and the associated value must be part
of a key for F.
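The rule for assembling the key of a weak entity set is recursive: take the entity set's own key attributes, then add the keys of its supporting entity sets, which may themselves be weak. A minimal sketch follows, using the entity sets of Examples 4.20 and 4.22. The two dictionaries and the function name are our own devices, and the sketch assumes at most one supporting relationship to any given entity set, so it does not cover the several-copies case of condition (e).

```python
# Key attributes that each entity set supplies for itself.
own_key = {
    "Studios": ["name"],
    "Stars": ["name"],
    "Movies": ["title", "year"],
    "Crews": ["number"],
    "Contracts": [],
}

# Weak entity set -> entity sets reached by its supporting relationships.
supporting = {
    "Crews": ["Studios"],
    "Contracts": ["Stars", "Movies", "Studios"],
}

def full_key(entity_set):
    """Own key attributes plus, recursively, the keys of supporting entity sets."""
    key = [f"{entity_set}.{attr}" for attr in own_key.get(entity_set, [])]
    for supporter in supporting.get(entity_set, []):
        key.extend(full_key(supporter))
    return key

print(full_key("Crews"))      # ['Crews.number', 'Studios.name']
print(full_key("Contracts"))  # ['Stars.name', 'Movies.title', 'Movies.year', 'Studios.name']
```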

4.4.3 Weak Entity Set Notation


We shall adopt the following conventions to indicate that an entity set is weak
and to declare its key attributes.
1. If an entity set is weak, it will be shown as a rectangle with a double
border. Examples of this convention are Crews in Fig. 4.20 and Contracts
in Fig. 4.22.
2. Its supporting many-one relationships will be shown as diamonds with a
double border. Examples of this convention are Unit-of in Fig. 4.20 and
all three relationships in Fig. 4.22.
3. If an entity set supplies any attributes for its own key, then those attributes
will be underlined. An example is in Fig. 4.20, where the number
of a crew participates in its own key, although it is not the complete key
for Crews.
We can summarize these conventions with the following rule:
• Whenever we use an entity set E with a double border, it is weak. The key
for E is whatever attributes of E are underlined plus the key attributes of
those entity sets to which E is connected by many-one relationships with
a double border.
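
The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries are a hypothetical encoding of Fig. 4.20, and the function name key_of is invented here. The recursion assumes that a supporting entity set may itself be weak, in which case its key is formed in the same way.

```python
# Hypothetical encoding of Fig. 4.20 (the dictionary layout is an assumption).
# UNDERLINED: attributes each entity set supplies to its own key.
# SUPPORTERS: entity sets reached by double-bordered many-one relationships.
UNDERLINED = {"Studios": ["name"], "Crews": ["number"]}
SUPPORTERS = {"Crews": ["Studios"]}


def key_of(entity_set, underlined, supporters):
    """Return the key as (entity set, attribute) pairs.

    The key is the entity set's own underlined attributes plus the key
    attributes of every entity set that supports it.
    """
    key = [(entity_set, attr) for attr in underlined.get(entity_set, [])]
    for supporter in supporters.get(entity_set, []):
        # A supporter may itself be weak, so its key is computed the same way.
        key.extend(key_of(supporter, underlined, supporters))
    return key


crews_key = key_of("Crews", UNDERLINED, SUPPORTERS)
studios_key = key_of("Studios", UNDERLINED, SUPPORTERS)
print(crews_key)
```

For Crews, the result pairs the crew's own number with the name of its studio, which matches the key described for Fig. 4.20.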

We should remember that the double-diamond is used only for supporting
relationships. It is possible for there to be many-one relationships from a weak
entity set that are not supporting relationships, and therefore do not get a
double diamond.

Example 4.23: In Fig. 4.22, the relationship Studio-of need not be a supporting
relationship for Contracts. The reason is that each movie has a unique owning
studio, determined by the (not shown) many-one relationship from Movies
to Studios. Thus, if we are told the name of a star and a movie, there is at most
one contract with any studio for the work of that star in that movie. In terms
of our notation, it would be appropriate to use an ordinary single diamond,
rather than the double diamond, for Studio-of in Fig. 4.22. □

4.4.4 Exercises for Section 4.4


Exercise 4.4.1: One way to represent students and the grades they get in
courses is to use entity sets corresponding to students, to courses, and to
“enrollments.” Enrollment entities form a “connecting” entity set between students
and courses and can be used to represent not only the fact that a student is
taking a certain course, but the grade of the student in the course. Draw an
E/R diagram for this situation, indicating weak entity sets and the keys for the
entity sets. Is the grade part of the key for enrollments?

Exercise 4.4.2: Modify your solution to Exercise 4.4.1 so that we can record
grades of the student for each of several assignments within a course. Again,
indicate weak entity sets and keys.

Exercise 4.4.3: For your E/R diagrams of Exercise 4.2.6(a)-(c), indicate weak
entity sets, supporting relationships, and keys.

Exercise 4.4.4: Draw E/R diagrams for the following situations involving
weak entity sets. In each case indicate keys for entity sets.

a) Entity sets Courses and Departments. A course is given by a unique
department, but its only attribute is its number. Different departments
can offer courses with the same number. Each department has a unique
name.

! b) Entity sets Leagues, Teams, and Players. League names are unique. No
league has two teams with the same name. No team has two players with
the same number. However, there can be players with the same number
on different teams, and there can be teams with the same name in different
leagues.

4.5 From E/R Diagrams to Relational Designs


To a first approximation, converting an E/R design to a relational database
schema is straightforward:

• Turn each entity set into a relation with the same set of attributes, and

• Replace a relationship by a relation whose attributes are the keys for the
connected entity sets.

While these two rules cover much of the ground, there are also several special
situations that we need to deal with, including:

1. Weak entity sets cannot be translated straightforwardly to relations.

2. “Isa” relationships and subclasses require careful treatment.

3. Sometimes, we do well to combine two relations, especially the relation for
an entity set E and the relation that comes from a many-one relationship
from E to some other entity set.

4.5.1 From Entity Sets to Relations


Let us first consider entity sets that are not weak. We shall take up the
modifications needed to accommodate weak entity sets in Section 4.5.4. For each
non-weak entity set, we shall create a relation of the same name and with the
same set of attributes. This relation will not have any indication of the
relationships in which the entity set participates; we’ll handle relationships with
separate relations, as discussed in Section 4.5.2.

Example 4.24: Consider the three entity sets Movies, Stars and Studios from
Fig. 4.17, which we reproduce here as Fig. 4.23. The attributes for the Movies
entity set are title, year, length, and genre. As a result, this relation Movies
looks just like the relation Movies of Fig. 2.1 with which we began Section 2.2.
Next, consider the entity set Stars from Fig. 4.23. There are two attributes,
name and address. Thus, we would expect the corresponding Stars relation to
have schema Stars(name, address) and for

name            address
Carrie Fisher   123 Maple St., Hollywood
Mark Hamill     456 Oak Rd., Brentwood
Harrison Ford   789 Palm Dr., Beverly Hills

to be a typical instance. □
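
As a small illustration of the rule for non-weak entity sets, the Stars relation of Example 4.24 can be modeled in Python. This is a sketch, not part of the design method itself: the class name Star and the choice of a Python set to stand for a relation are assumptions made for the example.

```python
from typing import NamedTuple


class Star(NamedTuple):
    """One tuple of the relation Stars(name, address)."""
    name: str
    address: str


# A relation is a set of tuples, so a Python set is a reasonable stand-in.
stars = {
    Star("Carrie Fisher", "123 Maple St., Hollywood"),
    Star("Mark Hamill", "456 Oak Rd., Brentwood"),
    Star("Harrison Ford", "789 Palm Dr., Beverly Hills"),
}

# Adding a tuple that is already present leaves the relation unchanged.
stars.add(Star("Mark Hamill", "456 Oak Rd., Brentwood"))

# The schema is exactly the attribute list of the entity set.
schema = Star._fields
```

Note that nothing in this relation records the relationships in which Stars participates; those are handled by separate relations.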

Figure 4.23: E/R diagram for the movie database

4.5.2 From E/R Relationships to Relations


Relationships in the E/R model are also represented by relations. The relation
for a given relationship R has the following attributes:

1. For each entity set involved in relationship R, we take its key attribute
or attributes as part of the schema of the relation for R.

2. If the relationship has attributes, then these are also attributes of relation
R.

If one entity set is involved several times in a relationship, in different roles,
then its key attributes each appear as many times as there are roles. We must
rename the attributes to avoid name duplication. More generally, should the
same attribute name appear twice or more among the attributes of R itself and
the keys of the entity sets involved in relationship R, then we need to rename
to avoid duplication.

Example 4.25: Consider the relationship Owns of Fig. 4.23. This relationship
connects entity sets Movies and Studios. Thus, for the schema of relation Owns
we use the key for Movies, which is title and year, and the key of Studios, which
is name. That is, the schema for relation Owns is:

Owns(title, year, studioName)

A sample instance of this relation is:

title                year   studioName
Star Wars            1977   Fox
Gone With the Wind   1939   MGM
Wayne’s World        1992   Paramount

We have chosen the attribute studioName for clarity; it corresponds to the
attribute name of Studios. □

title                year   starName
Star Wars            1977   Carrie Fisher
Star Wars            1977   Mark Hamill
Star Wars            1977   Harrison Ford
Gone With the Wind   1939   Vivien Leigh
Wayne’s World        1992   Dana Carvey
Wayne’s World        1992   Mike Meyers

Figure 4.24: A relation for relationship Stars-in

Example 4.26: Similarly, the relationship Stars-in of Fig. 4.23 can be
transformed into a relation with the attributes title and year (the key for Movies)
and attribute starName, which is the key for entity set Stars. Figure 4.24 shows
a sample relation Stars-in. □

Figure 4.25: The relationship Contracts

Example 4.27: Multiway relationships are also easy to convert to relations.
Consider the four-way relationship Contracts of Fig. 4.6, reproduced here as
Fig. 4.25, involving a star, a movie, and two studios: the first holding the
star’s contract and the second contracting for that star’s services in that movie.
We represent this relationship by a relation Contracts whose schema consists
of the attributes from the keys of the following four entity sets:

1. The key starName for the star.

2. The key consisting of attributes title and year for the movie.

3. The key studioOfStar indicating the name of the first studio; recall we
assume the studio name is a key for the entity set Studios.

4. The key producingStudio indicating the name of the studio that will
produce the movie using that star.

That is, the relation schema is:

Contracts(starName, title, year, studioOfStar, producingStudio)

Notice that we have been inventive in choosing attribute names for our relation
schema, avoiding “name” for any attribute, since it would not be obvious whether
that referred to a star’s name or a studio’s name, and in the latter case, which
studio role. Also, were there attributes attached to the relationship Contracts, such
as salary, these attributes would be added to the schema of relation Contracts.
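
The construction used in Examples 4.25 through 4.27 can be sketched as a short Python function. This is a minimal sketch under stated assumptions: the function relationship_schema, the role names, and the policy of prefixing a clashing attribute with its role are all invented for illustration. The text instead picks descriptive names by hand, such as starName and studioOfStar.

```python
from collections import Counter

# Keys of the entity sets in Figs. 4.23 and 4.25, as stated in the text.
KEYS = {"Movies": ["title", "year"], "Stars": ["name"], "Studios": ["name"]}


def relationship_schema(roles, keys, rel_attrs=()):
    """Build the attribute list for the relation of a relationship.

    roles: (role name, entity set) pairs, one per participation.
    rel_attrs: attributes attached to the relationship itself.
    """
    candidates = [(role, attr) for role, es in roles for attr in keys[es]]
    counts = Counter(attr for _, attr in candidates) + Counter(rel_attrs)
    # Assumed renaming policy: prefix a clashing key attribute with its role.
    schema = [attr if counts[attr] == 1 else f"{role}_{attr}"
              for role, attr in candidates]
    schema.extend(rel_attrs)
    if len(set(schema)) != len(schema):
        raise ValueError("renaming policy still left duplicate names")
    return schema


owns = relationship_schema([("movie", "Movies"), ("studio", "Studios")], KEYS)
contracts = relationship_schema(
    [("star", "Stars"), ("movie", "Movies"),
     ("studioOfStar", "Studios"), ("producingStudio", "Studios")],
    KEYS,
    rel_attrs=["salary"],
)
```

For Owns no renaming is forced, since title, year, and name are all distinct. For Contracts the attribute name would otherwise appear three times, so each copy is qualified by its role; the hypothetical salary attribute is simply appended.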

4.5.3 Combining Relations


Sometimes, the relations that we get from converting entity sets and relationships
to relations are not the best possible choice of relations for the given data.
One common situation occurs when there is an entity set E with a many-one
relationship R from E to F. The relations from E and R will each have the
key for E in their relation schema. In addition, the relation for E will have
in its schema the attributes of E that are not in the key, and the relation for
R will have the key attributes of F and any attributes of R itself. Because R
is many-one, all these attributes are functionally determined by the key for E,
and we can combine them into one relation with a schema consisting of:

1. All attributes of E.

2. The key attributes of F.

3. Any attributes belonging to relationship R.

For an entity e of E that is not related to any entity of F, the attributes of


types (2) and (3) will have null values in the tuple for e.
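
The combination just described behaves like a left outer join of the relation for E with the relation for R. The following Python sketch illustrates it with Movies and Owns. The function combine, the dictionary representation of tuples, and the movie with no owning studio are assumptions introduced only to show the null case.

```python
def combine(e_rows, r_rows, e_key, added_attrs):
    """Merge the relation for E with the relation for a many-one relationship R.

    Because R is many-one from E, each key value of E matches at most one
    tuple of R, so a dictionary lookup suffices.
    """
    r_by_key = {tuple(r[a] for a in e_key): r for r in r_rows}
    combined = []
    for e in e_rows:
        match = r_by_key.get(tuple(e[a] for a in e_key))
        row = dict(e)
        for attr in added_attrs:
            # None plays the role of a null when e has no related entity.
            row[attr] = match[attr] if match is not None else None
        combined.append(row)
    return combined


movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "genre": "sciFi"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "genre": "comedy"},
    # Hypothetical movie with no owning studio, to show the null case.
    {"title": "Unowned Movie", "year": 2000, "length": 90, "genre": "drama"},
]
owns = [
    {"title": "Star Wars", "year": 1977, "studioName": "Fox"},
    {"title": "Wayne's World", "year": 1992, "studioName": "Paramount"},
]

movies_with_studio = combine(movies, owns, ["title", "year"], ["studioName"])
```

Each movie still contributes exactly one tuple, which is why combining along a many-one relationship introduces no redundancy.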

Example 4.28: In our running movie example, Owns is a many-one relationship
from Movies to Studios, which we converted to a relation in Example 4.25.
The relation obtained from entity set Movies was discussed in Example 4.24.
We can combine these relations by taking all their attributes and forming one
relation schema. If we do, the relation looks like that in Fig. 4.26. □

title year length genre studioName


Star Wars 1977 124 sciFi Fox
Gone With the Wind 1939 231 drama MGM
Wayne’s World 1992 95 comedy Paramount

Figure 4.26: Combining relation Movies with relation Owns

Whether or not we choose to combine relations in this manner is a matter
of judgement. However, there are some advantages to having all the attributes
that are dependent on the key of entity set E together in one relation, even
if there are a number of many-one relationships from E to other entity sets.
For example, it is often more efficient to answer queries involving attributes
of one relation than to answer queries involving attributes of several relations.
In fact, some design systems based on the E/R model combine these relations
automatically.
On the other hand, one might wonder if it made sense to combine the
relation for E with the relation of a relationship R that involved E but was not
many-one from E to some other entity set. Doing so is risky, because it often
leads to redundancy, as the next example shows.

Example 4.29: To get a sense of what can go wrong, suppose we combined the
relation of Fig. 4.26 with the relation that we get for the many-many relationship
Stars-in; recall this relation was suggested by Fig. 4.24. Then the combined
relation would look like Fig. 3.2, which we reproduce here as Fig. 4.27. As we
discussed in Section 3.3.1, this relation has anomalies that we need to remove
by the process of normalization. □
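
The redundancy can be made concrete with a short Python sketch that joins two of the movie tuples with their Stars-in tuples and counts how often each movie's length ends up stored. The list and dictionary representation is an assumption made for illustration; the data values are taken from the figures.

```python
from collections import Counter

# Two movies from Fig. 4.26 and their tuples of Stars-in from Fig. 4.24.
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124,
     "genre": "sciFi", "studioName": "Fox"},
    {"title": "Wayne's World", "year": 1992, "length": 95,
     "genre": "comedy", "studioName": "Paramount"},
]
stars_in = [
    ("Star Wars", 1977, "Carrie Fisher"),
    ("Star Wars", 1977, "Mark Hamill"),
    ("Star Wars", 1977, "Harrison Ford"),
    ("Wayne's World", 1992, "Dana Carvey"),
    ("Wayne's World", 1992, "Mike Meyers"),
]

# Join on the key of Movies; each movie tuple is repeated once per star.
combined = [
    dict(m, starName=star)
    for m in movies
    for (title, year, star) in stars_in
    if (m["title"], m["year"]) == (title, year)
]

# Count how many times each movie's length is stored in the combined relation.
copies = Counter((row["title"], row["year"], row["length"]) for row in combined)
```

Because Stars-in is many-many, the length, genre, and studio of a movie are repeated once for every star of that movie, which is the source of the anomalies mentioned in the example.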

title year length genre studioName starName


Star Wars 1977 124 SciFi Fox Carrie Fisher
Star Wars 1977 124 SciFi Fox Mark Hamill
Star Wars 1977 124 SciFi Fox Harrison Ford
Gone With the Wind 1939 231 drama MGM Vivien Leigh
Wayne’s World 1992 95 comedy Paramount Dana Carvey
Wayne’s World 1992 95 comedy Paramount Mike Meyers

Figure 4.27: The relation Movies with star information

4.5.4 Handling Weak Entity Sets


When a weak entity set appears in an E/R diagram, we need to do three things
differently.

1. The relation for the weak entity set W itself must include not only the
attributes of W but also the key attributes of the supporting entity sets.
The supporting entity sets are easily recognized because they are reached
by supporting (double-diamond) relationships from W.

2. The relation for any relationship in which the weak entity set W appears
must use as a key for W all of its key attributes, including those of other
entity sets that contribute to W’s key.

3. However, a supporting relationship R, from the weak entity set W to a
supporting entity set, need not be converted to a relation at all. The
justification is that, as discussed in Section 4.5.3, the attributes of
many-one relationship R's relation will either be attributes of the relation for
W, or (in the case of attributes on R) can be added to the schema for
W's relation.

Of course, when introducing additional attributes to build the key of a weak
entity set, we must be careful not to use the same name twice. If necessary, we
rename some or all of these attributes.
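
The first rule, together with the renaming caveat, can be sketched in Python as follows. The function weak_entity_schema and its rename parameter are hypothetical names chosen for this illustration; the attribute names come from the Crews entity set of Fig. 4.20.

```python
def weak_entity_schema(own_attrs, supporters, rename=None):
    """Build the schema for the relation of a weak entity set W.

    own_attrs: the attributes of W itself.
    supporters: (entity set, key attributes) pairs, one per supporting set.
    rename: optional map from (entity set, attribute) to a new name.
    """
    rename = rename or {}
    schema = list(own_attrs)
    for entity_set, key_attrs in supporters:
        for attr in key_attrs:
            new_name = rename.get((entity_set, attr), attr)
            if new_name in schema:
                # The same name may not be used twice; the caller must rename.
                raise ValueError(f"duplicate attribute name: {new_name}")
            schema.append(new_name)
    return schema


# Crews has attributes number and crewChief and is supported by Studios,
# whose key is name; that attribute is renamed studioName for clarity.
crews = weak_entity_schema(
    ["number", "crewChief"],
    [("Studios", ["name"])],
    rename={("Studios", "name"): "studioName"},
)
```

The order of attributes in the result is immaterial. No separate schema is produced for the supporting relationship, in line with the third rule.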

Example 4.30: Let us consider the weak entity set Crews from Fig. 4.20,
which we reproduce here as Fig. 4.28. From this diagram we get three relations,
whose schemas are:

Studios(name, addr)
Crews(number, studioName, crewChief)
Unit-of(number, studioName, name)

The first relation, Studios, is constructed in a straightforward manner from
the entity set of the same name. The second, Crews, comes from the weak entity
set Crews. The attributes of this relation are the key attributes of Crews and the
one nonkey attribute of Crews, which is crewChief. We have chosen studioName
as the attribute in relation Crews that corresponds to the attribute name in the
entity set Studios.

Figure 4.28: The crews example of a weak entity set

The third relation, Unit-of, comes from the relationship of the same name.
As always, we represent an E/R relationship in the relational model by a relation
whose schema has the key attributes of the related entity sets. In this case,
