Functional Dependencies
7.1 Introduction
7.2 Proofs and Functional Dependencies
7.3 Keys and Functional Dependencies
7.4 Covers
7.5 Tableaux
7.6 Exercises
7.7 Bibliographical Comments
7.1 Introduction
This chapter is centered around functional dependencies, the first to be introduced and the most important class of integrity constraints. The central issue examined is the possibility of effectively constructing the set of logical consequences of a set of functional dependencies. We need to be aware of every nontrivial functional dependency that follows from the set of dependencies identified in the design process in order to guarantee minimal data redundancy in the tables of the database and good behavior of these tables with respect to updates.
7.2 Proofs and Functional Dependencies
Whenever we have a set F of functional dependencies, we can ask the question "What other functional dependencies necessarily follow from F?" In other words, what other functional dependencies have the property that any table that satisfies the functional dependencies of F also satisfies these other functional dependencies?
To make this more precise, let H be a set of attributes. Recall that FD(H) denotes the set of all functional dependencies that can be written using the attributes of H; i.e., FD(H) = {X → Y | X, Y ⊆ H}. Let F ⊆ FD(H) be a set of functional dependencies. In Section 6.2.2, we introduced the semantic notion (logical consequence) that corresponds to this question. In the current section, we explore a way of determining syntactically which other functional dependencies are satisfied by every table of the schema S = (H, F). So, we examine methods for obtaining logical consequences of a set of functional dependencies. These methods are known as inference rules. The first author to consider this topic was W. W. Armstrong [Arm74]. Although equivalent to the ones we introduce below, his rules differ from ours. Nevertheless, it is common practice to refer to such collections of rules as Armstrong rules. After introducing these rules, we show in Section 7.2.3 that they are correct (sound) and that they allow us to find all functional dependencies that are logical consequences of F (complete). We denote functional dependencies by φ (the Greek letter phi), with or without subscripts.

Definition 7.2.1 An n-ary inference rule is a relation R ⊆ (FD(H))ⁿ × FD(H). If R is an n-ary rule, then we write

    φ1, ..., φn
    -----------  R
         φ

to mean ((φ1, ..., φn), φ) ∈ R. We refer to the pair ((φ1, ..., φn), φ) as an instance of the rule R. The functional dependencies φ1, ..., φn are the hypotheses or premises. The functional dependency φ is the conclusion of this instance of the rule R, and we say that φ is obtained by applying rule R to φ1, ..., φn. Following established practice in formal logic, we use the phrase "hypotheses of a rule of inference" rather than "hypotheses of an instance of a rule of inference," and similarly for the terms "premises" and "conclusion."

To be correct, any inference rule R must lead from true hypotheses to a true conclusion. Thus, for a correct rule R, ((φ1, ..., φn), φ) ∈ R means that from the fact that a table satisfies the functional dependencies φ1, ..., φn we may conclude that it satisfies the functional dependency φ.

Example 7.2.2 Suppose that a table τ = (T, H, ρ) satisfies the functional dependencies X → Y and Y → Z. We claim that it also satisfies the functional dependency X → Z. Indeed, let u, v be two tuples of ρ such that u[X] = v[X]. Since τ satisfies X → Y, we have u[Y] = v[Y]; thus, we infer that u[Z] = v[Z], which allows us to conclude that τ satisfies the functional dependency X → Z. This suggests the introduction of the transitivity rule ((X → Y, Y → Z), X → Z) for every X, Y, Z.

Definition 7.2.3 Let U be a set of attributes. The Armstrong rules of inference are:

    Rincl:   -------  if Y ⊆ X         (Inclusion Rule)
              X → Y

    Raug:      X → Y
             ---------                 (Augmentation Rule)
              XZ → YZ

    Rtrans:  X → Y,  Y → Z
             --------------            (Transitivity Rule)
                 X → Z

for every X, Y, Z ⊆ U.

Although the formal proof of the soundness of these rules is deferred to Section 7.2.3, it may help to note the following. The inclusion rule is a formal statement of the fact that for any table τ = (T, H, ρ) such that Y ⊆ X ⊆ H, τ satisfies the trivial functional dependency X → Y (see Theorem 6.2.23). The augmentation rule captures the fact that every table that satisfies a functional dependency X → Y also satisfies the functional dependency XZ → YZ for any set of attributes Z ⊆ H, as the reader can easily verify. Note that we do not distinguish between functional dependencies like U → VW and U → WV, since VW = WV = V ∪ W. Also, we frequently use the fact that YY = Y, which is the idempotency of set union written in the common database notation.

Using the Armstrong rules, we can formulate the notion of proof for a functional dependency.

Definition 7.2.4 Let F be a set of functional dependencies. A sequence (φ1, ..., φn) of functional dependencies is an F-proof if one of the following is true for each i, 1 ≤ i ≤ n: (i) φi ∈ F, or (ii) there exist j1, ..., jm, each less than i, such that ((φ_{j1}, ..., φ_{jm}), φi) is an instance of an Armstrong rule R. In the first case, we say that φi is an initial functional dependency; in the second case, we say that φ_{j1}, ..., φ_{jm} are used in the application of rule R. The length of the proof (φ1, ..., φn) is n. An F-proof of the functional dependency φ is a proof whose last entry is φ.
If there exists an F-proof of a functional dependency φ, we write F ⊢ φ and we say that φ is provable from F.

Definition 7.2.5 An F-proof (φ1, ..., φn) is nonredundant if it satisfies the following conditions:
1. Every step φj (where 1 ≤ j ≤ n − 1) is used in the application of a rule.
2. No functional dependency occurs more than once in the proof.

Theorem 7.2.6 For every F-proof of a functional dependency φ, there exists a nonredundant proof of φ.

Proof. The argument by strong induction on the length of proofs is straightforward, and we leave it to the reader.

Theorem 7.2.6 shows that, whenever needed, we can assume that if X → Y is provable from F, then there is a nonredundant F-proof of X → Y.

Example 7.2.7 Let F = {A → C, CD → AE, BE → A}. We have the following proof for F ⊢ AD → E:

    1. A → C       initial functional dependency
    2. AD → CD     Raug and (1)
    3. CD → AE     initial functional dependency
    4. AD → AE     Rtrans and (2), (3)
    5. AE → E      Rincl
    6. AD → E      Rtrans and (4), (5)

Thus, AD → E is provable from F.
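Checking an F-proof step by step is mechanical, which suggests a small program. The following sketch is our own illustration (the set-based representation of functional dependencies and the helper names are assumptions, not notation from the text): it accepts a step if it is an initial functional dependency, an instance of Rincl, or follows from earlier steps by Raug or Rtrans, and it validates the proof of Example 7.2.7.

```python
def check_proof(F, steps):
    """Check that `steps` is an F-proof (Definition 7.2.4): every step must be
    in F or be obtained from earlier steps by an Armstrong rule."""
    seen = []
    for lhs, rhs in steps:
        ok = any(lhs == X and rhs == Y for X, Y in F)     # initial FD
        ok = ok or rhs <= lhs                             # Rincl: Y subset of X
        for X, Y in seen:                                 # Raug: X->Y gives XZ->YZ
            Z = (lhs - X) | (rhs - Y)
            ok = ok or (X | Z == lhs and Y | Z == rhs)
        for X, Y in seen:                                 # Rtrans: X->Y, Y->Z give X->Z
            ok = ok or (X == lhs and any(Y == W and Z == rhs for W, Z in seen))
        if not ok:
            return False
        seen.append((lhs, rhs))
    return True

fd = lambda l, r: (set(l), set(r))
F = [fd('A', 'C'), fd('CD', 'AE'), fd('BE', 'A')]
proof = [fd('A', 'C'), fd('AD', 'CD'), fd('CD', 'AE'),
         fd('AD', 'AE'), fd('AE', 'E'), fd('AD', 'E')]
print(check_proof(F, proof))     # True: the proof of Example 7.2.7
```

The Raug test looks for a set Z with X ∪ Z equal to the step's left member and Y ∪ Z equal to its right member; taking Z = (lhs − X) ∪ (rhs − Y) suffices whenever such a Z exists.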
7.2.1 Derived Rules
The Armstrong rules we introduced are quite spartan; for writing actual proofs, it helps to have additional rules. The ones we introduce below may be thought of as proof macros. They are useful tools for simplifying the presentation of proofs of functional dependencies, but any use of one of these derived rules could be replaced by a suitable series of steps to obtain an F-proof that does not rely on the derived rule.

Definition 7.2.8 An n-ary derived rule of inference is a relation R ⊆ (FD(H))ⁿ × FD(H) such that if ((φ1, ..., φn), φ) ∈ R, then {φ1, ..., φn} ⊢ φ.

Example 7.2.9 The additivity rule Radd is defined by

    X → Y,  X → Y′
    --------------
       X → YY′

for all subsets X, Y, Y′ of the set of attributes H. Indeed, we have the proof:

    1. X → Y       initial functional dependency
    2. X → Y′      initial functional dependency
    3. X → XY      applying Raug to (1)
    4. XY → YY′    applying Raug to (2)
    5. X → YY′     applying Rtrans to (3) and (4)

Note that in step (3) of the proof we augment both sides of the functional dependency X → Y by X and then use the fact that XX = X.

Example 7.2.10 The projectivity rule Rproj is given by

    X → YZ
    ------
    X → Y

for all subsets X, Y, Z of H. To verify this derived rule, consider the proof:

    1. X → YZ      initial functional dependency
    2. YZ → Y      applying Rincl
    3. X → Y       applying Rtrans to (1) and (2)
The usefulness of derived rules in presenting proofs of functional dependencies can be seen in the following example.

Example 7.2.11 Consider the following proof of X → WYZ from the hypotheses X → YZ and Z → W:

    1. X → YZ       initial functional dependency
    2. Z → W        initial functional dependency
    3. YZ → Z       applying Rincl
    4. X → Z        applying Rtrans to (1) and (3)
    5. X → W        applying Rtrans to (4) and (2)
    6. X → XYZ      applying Raug to (1)
    7. XYZ → WYZ    applying Raug to (5)
    8. X → WYZ      applying Rtrans to (6) and (7)

Note that steps (3) and (4) are obtained by the same derivation we used in Example 7.2.10. Therefore, we can replace this derivation with its shorter variant:

    1. X → YZ       initial functional dependency
    2. Z → W        initial functional dependency
    3. X → Z        applying Rproj to (1)
    4. X → W        applying Rtrans to (3) and (2)
    5. X → XYZ      applying Raug to (1)
    6. XYZ → WYZ    applying Raug to (4)
    7. X → WYZ      applying Rtrans to (5) and (6)
Further, notice that steps (5), (6), and (7) represent the final part of the proof of the additivity rule. This allows us to generate the still shorter proof:

    1. X → YZ       initial functional dependency
    2. Z → W        initial functional dependency
    3. X → Z        applying Rproj to (1)
    4. X → W        applying Rtrans to (3) and (2)
    5. X → WYZ      applying Radd to (1) and (4)
Note that the argument presented in this example introduces a new derived rule:

    X → YZ,  Z → W
    --------------
       X → WYZ

We refer to this rule as the amplification rule, and we denote it by Rampl. From now on, we use derived rules in the same way as the basic rules Rincl, Raug, and Rtrans.
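Soundness of a candidate derived rule can also be checked semantically on a given instance: since a violation of a functional dependency is always witnessed by two tuples, it is enough to examine every two-tuple table, i.e., every possible set of attributes on which the two tuples agree. The following sketch (our own illustration, with W, X, Y, Z treated as four individual attributes) confirms this instance of Rampl and rejects a bogus rule.

```python
from itertools import combinations

ATTRS = 'WXYZ'

def two_row_satisfies(agree, lhs, rhs):
    """A two-row table whose rows agree exactly on `agree` satisfies
    lhs -> rhs iff lhs being inside `agree` forces rhs inside `agree`."""
    return not set(lhs) <= agree or set(rhs) <= agree

def rule_sound(premises, conclusion):
    """Check an inference rule against every two-row table over ATTRS."""
    for k in range(len(ATTRS) + 1):
        for agree in map(set, combinations(ATTRS, k)):
            if all(two_row_satisfies(agree, l, r) for l, r in premises):
                if not two_row_satisfies(agree, *conclusion):
                    return False
    return True

# Rampl: from X -> YZ and Z -> W infer X -> WYZ
print(rule_sound([('X', 'YZ'), ('Z', 'W')], ('X', 'WYZ')))   # True
# A bogus rule: from X -> Y infer Y -> X
print(rule_sound([('X', 'Y')], ('Y', 'X')))                  # False
```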
7.2.2 Closures of Sets of Attributes
The notion of closure of a set of attributes under a set F of functional dependencies provides us with a syntactic method for deciding whether a functional dependency X → Y is provable from F; that is, whether F ⊢ X → Y. Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). Starting from H, F, and X, we compute a set cl_{H,F}(X) such that F ⊢ X → Y if and only if Y ⊆ cl_{H,F}(X). As Corollary 7.2.21 below shows, the notion of provability of a functional dependency (F ⊢ X → Y) is equivalent to the semantic notion of logical consequence (F ⊨ X → Y). Hence, the notion of closure provides us with a syntactic device for deciding whether a functional dependency φ is a logical consequence of a set of functional dependencies F. This is very useful in the design and analysis of relational databases.

Definition 7.2.12 Let H be a finite set of attributes, and let X be a subset of H. If F ⊆ FD(H), we denote by D_{H,F}(X) the collection of sets that contains all sets of attributes Y such that Y ⊆ H and F ⊢ X → Y.

Theorem 7.2.13 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. For every subset X of H, the collection D_{H,F}(X) contains a unique largest set.

Proof. Note that X ∈ D_{H,F}(X), so D_{H,F}(X) is always nonempty. Suppose that D_{H,F}(X) = {Y0, Y1, ..., Y_{m−1}} with Y0 = X and m ≥ 1. Since F ⊢ X → Yi for 0 ≤ i ≤ m − 1, by applying the additivity rule we obtain F ⊢ X → Y0 ⋯ Y_{m−1},
so W = Y0 ⋯ Y_{m−1} ∈ D_{H,F}(X). Since every Y ∈ D_{H,F}(X) is included in W, it follows that W is the largest set of D_{H,F}(X).

The previous theorem justifies the next definition.

Definition 7.2.14 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. If X is a subset of H, the closure of X under the set F of functional dependencies is the largest set of D_{H,F}(X). We denote this set by cl_{H,F}(X). If the set H is understood from the context, we may write cl_F(X) instead of cl_{H,F}(X).¹

Corollary 7.2.15 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. For every subset X of H, we have F ⊢ X → cl_F(X).

Proof. This statement follows immediately from Theorem 7.2.13.

Theorem 7.2.16 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. For every subset X of H, we have F ⊢ X → Y if and only if Y ⊆ cl_F(X).

Proof. If Y ⊆ cl_F(X) then, by Corollary 7.2.15, F ⊢ X → cl_F(X). An application of the projectivity rule yields F ⊢ X → Y. Conversely, if F ⊢ X → Y, the definition of cl_{H,F}(X) implies Y ⊆ cl_{H,F}(X).

Theorem 7.2.17 Let F be a set of functional dependencies on the set of attributes H. We have
1. X ⊆ cl_{H,F}(X),
2. X1 ⊆ X2 implies cl_{H,F}(X1) ⊆ cl_{H,F}(X2),
3. cl_{H,F}(cl_{H,F}(X)) = cl_{H,F}(X),
for every X, X1, X2 ⊆ H.

Proof. The first inclusion follows immediately from the proof of Theorem 7.2.13. Next, observe that if X1 ⊆ X2, then we have F ⊢ X2 → X1. By Corollary 7.2.15, we have F ⊢ X1 → cl_{H,F}(X1). Therefore, by the transitivity rule, we obtain F ⊢ X2 → cl_{H,F}(X1). This implies cl_{H,F}(X1) ⊆ cl_{H,F}(X2). Finally, note that by the first property we have cl_{H,F}(X) ⊆ cl_{H,F}(cl_{H,F}(X)). To prove the reverse inclusion, note that F ⊢ X → cl_{H,F}(X) and F ⊢ cl_{H,F}(X) → cl_{H,F}(cl_{H,F}(X)), by
¹We prefer this notation for the closure of a set of attributes under a set F of functional dependencies to the more popular notations X⁺_F or X⁺ because it is clearly distinct from F⁺, the set of logical consequences of F, and avoids confusing the reader.
Corollary 7.2.15. An application of the transitivity rule gives F ⊢ X → cl_{H,F}(cl_{H,F}(X)), and this implies cl_{H,F}(cl_{H,F}(X)) ⊆ cl_{H,F}(X).
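The three properties of Theorem 7.2.17 are easy to check exhaustively on a small example. The sketch below (the sample FD set F and the fixpoint closure routine, which anticipates Algorithm 7.2.24, are our own choices) verifies extensivity, monotonicity, and idempotency over all subsets of H = ABCDE.

```python
from itertools import combinations

def closure(X, fds):
    """cl_F(X) by fixpoint iteration (cf. Algorithm 7.2.24)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

H = 'ABCDE'
F = [('AB', 'C'), ('CD', 'E'), ('AE', 'B')]      # an illustrative FD set

subsets = [set(s) for k in range(len(H) + 1) for s in combinations(H, k)]
for X in subsets:
    assert X <= closure(X, F)                            # property 1
    assert closure(closure(X, F), F) == closure(X, F)    # property 3
    for Y in subsets:
        if X <= Y:
            assert closure(X, F) <= closure(Y, F)        # property 2
print('Theorem 7.2.17 verified on all', len(subsets), 'subsets')
```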
7.2.3 Soundness and Completeness
In this section we show the equivalence of ⊨ and ⊢. Thus, we prove that {φ | F ⊨ φ} = {φ | F ⊢ φ} for every set of functional dependencies F. In other words, we show that the functional dependencies that are logical consequences of F are precisely those that are provable from F. We do this by proving that the existence of an F-proof of a functional dependency X → Y guarantees that X → Y is a logical consequence of F (the soundness of the Armstrong rules) and that every functional dependency that is a logical consequence of F has an F-proof (the completeness of the Armstrong rules). Soundness means that using the Armstrong rules we can generate only logical consequences, and completeness means that we can generate proofs for all such logical consequences.

Theorem 7.2.18 (Soundness Theorem) If F ⊢ X → Y, then F ⊨ X → Y.

Proof. The argument is by induction on the length n of the proof of X → Y in F. If n = 1, we have either X → Y ∈ F or Y ⊆ X. In either case, it is clear that F ⊨ X → Y. Suppose that the statement holds for each proof of length less than n and that (φ1, ..., φn) is an F-proof of X → Y. Then, φn = X → Y must fall into one of the following cases:

1. If X → Y belongs to F, then, as in the base case, F ⊨ X → Y.

2. If φn = X → Y is obtained from two predecessors φj = X → W and φi = W → Y (where i, j < n) by applying the transitivity rule, then, by the inductive hypothesis, F ⊨ X → W and F ⊨ W → Y. Let τ = (T, H, ρ) ∈ SAT(F), and let u, v be two tuples of ρ such that u[X] = v[X]. Since F ⊨ X → W, we have u[W] = v[W]. In turn, since F ⊨ W → Y, we obtain u[Y] = v[Y], so τ satisfies X → Y. Thus, F ⊨ X → Y.

3. If X → Y is obtained from a previous functional dependency X′ → Y′ by applying the augmentation rule, then there exists a set of attributes Z such that X = X′Z and Y = Y′Z. By the inductive hypothesis, F ⊨ X′ → Y′. Now, if u, v ∈ ρ and u[X′Z] = v[X′Z], we have u[X′] = v[X′] and u[Z] = v[Z]. The first equality implies u[Y′] = v[Y′] because F ⊨ X′ → Y′, so u[Y] = u[Y′Z] = v[Y′Z] = v[Y]. This shows that F ⊨ X → Y.
4. If X → Y is obtained by applying Rincl, then obviously F ⊨ X → Y.

To prove that F ⊨ X → Y implies F ⊢ X → Y, we need a preliminary result.

Lemma 7.2.19 Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). For every nonempty set of attributes X, X ⊆ H, there exists a table τ_{H,F,X} = (T_{H,F,X}, H, ρ) such that ρ consists of two tuples that coincide on X, and τ_{H,F,X} satisfies all functional dependencies of F.

Proof. Let H = A1 ... An. Recall that |Dom(Ai)| ≥ 2, and let ai, bi be two distinct values in Dom(Ai) for 1 ≤ i ≤ n. Define the tuple u by u[Ai] = ai for 1 ≤ i ≤ n, and the tuple v by

    v[Ai] = ai   if Ai ∈ cl_F(X),
    v[Ai] = bi   otherwise.

Without loss of generality, assume that cl_F(X) = A1 ... Ak. We prove that the table τ_{H,F,X} given by

    T_{H,F,X}    cl_F(X)             H − cl_F(X)
                 A1  ...  Ak         A_{k+1}  ...  An
    u            a1  ...  ak         a_{k+1}  ...  an
    v            a1  ...  ak         b_{k+1}  ...  bn

satisfies all functional dependencies of F. Suppose that Y → Z is a functional dependency of F that τ_{H,F,X} violates. Then, we have u[Y] = v[Y] and u[Z] ≠ v[Z]. By the construction of τ_{H,F,X}, this implies

    Y ⊆ cl_F(X)                                  (7.1)
    Z ⊄ cl_F(X)                                  (7.2)

By Theorem 7.2.17, inclusion (7.1) implies cl_F(Y) ⊆ cl_F(cl_F(X)), and thus, by part 3 of the same theorem, cl_F(Y) ⊆ cl_F(X). Now, since Y → Z ∈ F, we have Z ⊆ cl_F(Y) ⊆ cl_F(X), which contradicts (7.2). Thus, τ_{H,F,X} cannot violate any functional dependency of F.

We refer to τ_{H,F,X} as the Armstrong table on X.

Theorem 7.2.20 (Completeness Theorem) Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). If F ⊨ X → Y, then F ⊢ X → Y.
Proof. Suppose that X → W is a logical consequence of F but X → W is not provable from F. Then, W ⊄ cl_F(X). Let τ_{H,F,X} be the Armstrong table on X. By Lemma 7.2.19, τ_{H,F,X} satisfies all functional dependencies of F, and therefore it satisfies X → W. Since u[X] = v[X] and u[W] ≠ v[W], we have a contradiction. Therefore, X → W must be provable from F.

Corollary 7.2.21 Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). Then F ⊨ X → Y if and only if F ⊢ X → Y.

Proof. This follows immediately from Theorems 7.2.18 and 7.2.20.

We present an application of the notions discussed in this section that is useful in decomposing database schemas.

Theorem 7.2.22 Let S = (H, F) be a table schema, and let U, V ⊆ H be two sets of attributes such that U ∪ V = H. Then, ρ = ρ[U] ⋈ ρ[V] for every table τ = (T, H, ρ) of the schema S if and only if at least one of the functional dependencies U ∩ V → U or U ∩ V → V belongs to F⁺.

Proof. Suppose that we have ρ = ρ[U] ⋈ ρ[V] for every table τ = (T, H, ρ) of the table schema S and that neither U ∩ V → U nor U ∩ V → V belongs to F⁺. Choose τ to be an Armstrong table τ_{H,F,U∩V}. Our assumption implies that U ⊄ cl_F(U ∩ V) and V ⊄ cl_F(U ∩ V). Therefore, τ_{H,F,U∩V} violates both U ∩ V → U and U ∩ V → V. This means that τ_{H,F,U∩V} has the form:

    T_{H,F,U∩V}   U − cl_F(U ∩ V)   cl_F(U ∩ V)       V − cl_F(U ∩ V)
                  A1  ...  Ap       A_{p+1} ... Aq    A_{q+1} ... An
    u             a1  ...  ap       a_{p+1} ... aq    a_{q+1} ... an
    v             b1  ...  bp       a_{p+1} ... aq    b_{q+1} ... bn

Accordingly, we have the projections:

    T_{H,F,U∩V}[U]   A1  ...  Ap       A_{p+1} ... Aq
                     a1  ...  ap       a_{p+1} ... aq
                     b1  ...  bp       a_{p+1} ... aq

and

    T_{H,F,U∩V}[V]   A_{p+1} ... Aq    A_{q+1} ... An
                     a_{p+1} ... aq    a_{q+1} ... an
                     a_{p+1} ... aq    b_{q+1} ... bn

The join T_{H,F,U∩V}[U] ⋈ T_{H,F,U∩V}[V] is
    T_{H,F,U∩V}[U] ⋈ T_{H,F,U∩V}[V]
                  A1  ...  Ap       A_{p+1} ... Aq    A_{q+1} ... An
                  a1  ...  ap       a_{p+1} ... aq    a_{q+1} ... an
                  a1  ...  ap       a_{p+1} ... aq    b_{q+1} ... bn
                  b1  ...  bp       a_{p+1} ... aq    a_{q+1} ... an
                  b1  ...  bp       a_{p+1} ... aq    b_{q+1} ... bn

and so ρ ≠ ρ[U] ⋈ ρ[V].

Conversely, assume that one of U ∩ V → U or U ∩ V → V belongs to F⁺, say U ∩ V → U. Let τ = (T, H, ρ) be a table of the schema S; since τ satisfies all functional dependencies of F, it also satisfies U ∩ V → U. If r ∈ ρ[U] ⋈ ρ[V], then there exist r′ ∈ ρ[U] and r′′ ∈ ρ[V] such that r′ and r′′ are joinable and r′ ⋈ r′′ = r. In turn, this implies the existence of tuples s′, s′′ ∈ ρ such that r′ = s′[U] and r′′ = s′′[V]. The joinability of r′ and r′′ implies s′[U ∩ V] = r′[U ∩ V] = r′′[U ∩ V] = s′′[U ∩ V] and, since τ satisfies the functional dependency U ∩ V → U, we also obtain s′[U] = s′′[U]. Since r = r′ ⋈ r′′, we have r[U] = r′ and r[V] = r′′. We claim that r = s′′. Indeed, we have r[U] = r′ = s′[U] = s′′[U] and r[V] = r′′ = s′′[V]. Since U ∪ V = H, r and s′′ coincide on all attributes of H, so r = s′′. This proves that ρ[U] ⋈ ρ[V] ⊆ ρ, so ρ[U] ⋈ ρ[V] = ρ.

Corollary 7.2.23 If S = (H, F) and X → Y ∈ F⁺, then for every table τ = (T, H, ρ) of this schema, we have ρ = ρ[XY] ⋈ ρ[XZ], where Z = H − XY.
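Theorem 7.2.22 can be illustrated on concrete data. In the sketch below (our own example; the relation contents are arbitrary), H = ABC, the table satisfies A → B, and we decompose it as U = AB, V = AC, so that U ∩ V = A and U ∩ V → U ∈ F⁺; the join of the projections then restores the table exactly. Adding a tuple that violates A → B shows how the join can acquire spurious tuples.

```python
def project(rows, attrs):
    """Projection onto `attrs`, with duplicate elimination."""
    out = []
    for r in rows:
        p = {A: r[A] for A in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(r1, r2):
    """Natural join of two lists of dictionaries."""
    common = [A for A in r1[0] if A in r2[0]]
    return [{**s, **t} for s in r1 for t in r2
            if all(s[A] == t[A] for A in common)]

def same_rows(r1, r2):
    key = lambda r: tuple(sorted(r.items()))
    return set(map(key, r1)) == set(map(key, r2))

# This table satisfies A -> B, so the decomposition (AB, AC) is lossless.
rows = [{'A': 1, 'B': 1, 'C': 1},
        {'A': 1, 'B': 1, 'C': 2},
        {'A': 2, 'B': 5, 'C': 1}]
joined = natural_join(project(rows, 'AB'), project(rows, 'AC'))
print(same_rows(joined, rows))     # True

# If A -> B fails, the join contains spurious tuples:
bad = rows + [{'A': 1, 'B': 9, 'C': 3}]
joined_bad = natural_join(project(bad, 'AB'), project(bad, 'AC'))
print(same_rows(joined_bad, bad))  # False
```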
7.2.4 Closure Computation
It is helpful to be able to calculate cl_F(X) in order to compute F⁺; this is essential for determining whether relational schemas satisfy certain conditions known as normal forms (see Section 8.2). Let H be a set of attributes, F a set of functional dependencies, F ⊆ FD(H), and X a subset of H. The following algorithm computes the closure cl_F(X).

Algorithm 7.2.24 (Algorithm for Computing cl_F(X))
Input: A finite set H of attributes, a set F of functional dependencies over H, and a subset X of H.
Output: The closure cl_F(X) of the set X.
Method: Construct an increasing sequence CS_F(X) of subsets of H,

    X0 ⊆ X1 ⊆ ⋯ ⊆ Xk ⊆ ⋯

defined by

    X0      = X,
    X_{k+1} = Xk ∪ ⋃{Z | Y → Z ∈ F and Y ⊆ Xk}.
If X_{k+1} = Xk, then stop; we have cl_F(X) = Xk. Otherwise, continue with the next value of k.

We refer to CS_F(X) as the F-closure sequence of X. Let X, X′ be two subsets of H with CS_F(X) = (X0, ..., Xn) and CS_F(X′) = (X′0, ..., X′m). We write CS_F(X) ⊑ CS_F(X′) if for every i, 1 ≤ i ≤ n, there exists ji such that 1 ≤ ji ≤ m and Xi ⊆ X′_{ji}. Note that X ⊆ X′ implies CS_F(X) ⊑ CS_F(X′). Also, CS_F(Xi) is a suffix of the sequence CS_F(X) for every Xi in CS_F(X). Therefore, if CS_F(X) = (X0, ..., Xk), then CS_F(Xk) = (Xk).

Proof of Correctness: Note that the algorithm does indeed terminate, i.e., Xn = X_{n+1} for some n ∈ N, because the members of the sequence are all subsets of the finite set H. To prove that the algorithm correctly computes cl_F(X), suppose that there exists a proof of F ⊢ X → Y of length n. We prove, by strong induction on n ≥ 1, that Y ⊆ Xk, where CS_F(X) = (X0, ..., Xk). If n = 1, then Y ⊆ X = X0 ⊆ Xk, so the basis case is obviously true. Suppose that this holds for proofs of length less than n, and let φ1, ..., φn be a proof of length n, where φn = X → Y. We consider three cases:

1. If φn was produced by the inclusion rule, we have Y ⊆ X = X0 ⊆ Xk.

2. Suppose that φn was generated from φp (where p < n) by applying the augmentation rule. In this case, φp = U → V, and X = UZ, Y = VZ for some subset Z of H. By the inductive hypothesis, V ⊆ Uh, where CS_F(U) = (U0, U1, ..., Uh). Since CS_F(U) ⊑ CS_F(X), we have Uh ⊆ Xk, so V ⊆ Xk; thus, Y = VZ ⊆ Xk because Z ⊆ X ⊆ Xk.

3. If φn was obtained from φp, φq by transitivity, there exists a subset S of H such that φp = X → S and φq = S → Y. By the inductive hypothesis, S ⊆ Xk and Y ⊆ Sm, where CS_F(X) = (X0, ..., Xk) and CS_F(S) = (S0, ..., Sm). Since CS_F(S) ⊑ CS_F(Xk), and since CS_F(Xk) = (Xk), we have Sm ⊆ Xk. In turn, this implies Y ⊆ Xk.

This proves that Y ⊆ Xk for every Y such that F ⊢ X → Y, so cl_F(X) ⊆ Xk. The reverse inclusion is obtained immediately by showing, by induction on i, that F ⊢ X → Xi for every Xi in CS_F(X). This shows that Xi ⊆ cl_F(X) for every Xi; in particular, Xk ⊆ cl_F(X).
Example 7.2.25 Let H = ABCDE, and let F be the set of functional dependencies F = {AB → C, CD → E, AE → B}. Suppose that we wish to compute cl_F(AE). We build the sequence

    X0 = AE
    X1 = AEB
    X2 = AEBC
    X3 = AEBC

The algorithm stops when we detect that X2 = X3. So, the closure of AE is AEBC. A similar computation shows that the closure of AD is AD and that the closure of AED is ABCDE.
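Algorithm 7.2.24 translates directly into a program: iterate the defining equation for X_{k+1} until the sequence stabilizes. The sketch below (helper names are ours) reproduces the computations of Example 7.2.25.

```python
def closure(X, fds):
    """Compute cl_F(X) by Algorithm 7.2.24: repeatedly add the right members
    of all functional dependencies whose left members are already included."""
    Xk = set(X)
    while True:
        Xk1 = Xk | {A for lhs, rhs in fds if set(lhs) <= Xk for A in rhs}
        if Xk1 == Xk:          # X_{k+1} = X_k: the sequence has stabilized
            return Xk
        Xk = Xk1

F = [('AB', 'C'), ('CD', 'E'), ('AE', 'B')]   # the FD set of Example 7.2.25
print(sorted(closure('AE', F)))    # ['A', 'B', 'C', 'E']
print(sorted(closure('AD', F)))    # ['A', 'D']
print(sorted(closure('ADE', F)))   # ['A', 'B', 'C', 'D', 'E']
```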
7.3 Keys and Functional Dependencies
In Definition 2.1.12, we introduced a key of a table τ = (T, H, ρ) as a set of attributes K ⊆ H that satisfies two conditions:
1. If u[K] = v[K], then u = v for all tuples u, v ∈ ρ (unique identification property).
2. There is no proper subset L of K that has the unique identification property (minimality property).

The first condition requires the table to satisfy the functional dependency K → H; the second requires K to contain no proper subset L such that τ would satisfy L → H. Now, we formulate this notion in the context of table schemas.

Definition 7.3.1 Let S = (H, F) be a table schema with functional dependencies. A key of the schema S is a set K that satisfies the following conditions:
1. K → H ∈ F⁺ (unique identification property).
2. There is no proper subset L of K such that L → H ∈ F⁺ (minimality property).

Using Theorem 7.2.16, we obtain the following theorem, which can serve as an alternate characterization of keys.

Theorem 7.3.2 A set of attributes K is a key for a table schema with functional dependencies S = (H, F) if and only if cl_F(K) = H and, for every attribute A of K, cl_F(K − {A}) ≠ H.
Proof. The argument is straightforward and is left to the reader.

Example 7.3.3 Let S = (ABCDE, F) be a table schema with functional dependencies, where F = {AB → C, D → C, AE → BD}. We show how to determine the keys of this schema using F-closure sequences. Note that there is no functional dependency in F that has either A or E in its right member. Assume that X is a key of this schema; then A ∈ X, for if it were not, no set Xk in CS_F(X) would contain A. Similarly, E must be in X. Therefore, any key of this schema must contain A and E. The F-closure sequence of AE is:

    X0 = AE
    X1 = ABDE
    X2 = ABCDE
    X3 = ABCDE

The first condition of Theorem 7.3.2 is clearly satisfied. To verify the second condition, note that cl_F(A) = A and cl_F(E) = E. Therefore, AE is a key. Moreover, since every key must contain AE, it follows that AE is the only key of this schema.

In general, a table schema can have more than one key; in fact, it is possible to find table schemas that have a number of keys that is exponential in the number of attributes.

Example 7.3.4 Consider the table schema S = (A1 ⋯ An B1 ⋯ Bn, F), where

    F = {A1 → B1, ..., An → Bn, B1 → A1, ..., Bn → An}.

Note that each set K of n attributes, K = C1 ... Cn, where Ci ∈ {Ai, Bi} for 1 ≤ i ≤ n, is a key for S. Since there are 2ⁿ such sets, the number of keys of this schema grows exponentially with the number of attributes.

Definition 7.3.5 Each attribute A that belongs to a key of a table schema with functional dependencies S = (H, F) is referred to as a prime attribute. The notion of prime attribute is important in defining normal forms of table schemas.

Example 7.3.6 The prime attributes of the schema considered in Example 7.3.3 are A and E, since AE is the single key of this schema. On the other hand, every attribute of the schema considered in Example 7.3.4 is prime.
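By Theorem 7.3.2, key finding reduces to closure computations: K is a key iff cl_F(K) = H and no proper subset of K has this property. The brute-force sketch below (our own helpers; its running time is exponential in |H|, which Example 7.3.4 shows cannot be avoided in general) enumerates candidates by increasing size and recovers the single key AE of Example 7.3.3, as well as the 2ⁿ keys of Example 7.3.4 for n = 2.

```python
from itertools import combinations

def closure(X, fds):
    """cl_F(X), computed by fixpoint iteration (Algorithm 7.2.24)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def keys(H, fds):
    """All keys of the schema (H, F): minimal K with cl_F(K) = H.
    Candidates are enumerated by increasing size, so a candidate that
    contains an already-found key cannot be minimal and is skipped."""
    found = []
    for k in range(1, len(H) + 1):
        for K in combinations(sorted(H), k):
            if any(set(J) <= set(K) for J in found):
                continue
            if closure(K, fds) == set(H):
                found.append(K)
    return found

F = [('AB', 'C'), ('D', 'C'), ('AE', 'BD')]   # the schema of Example 7.3.3
print(keys('ABCDE', F))    # [('A', 'E')]

# Example 7.3.4 with n = 2 (A <-> C and B <-> D): 2^2 = 4 keys
G = [('A', 'C'), ('C', 'A'), ('B', 'D'), ('D', 'B')]
print(keys('ABCD', G))     # [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D')]
```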
Example 7.3.7 Consider the schema S = (stno cno empno sem year grade, F), where the set F consists of the functional dependencies

    cno sem year → empno
    stno cno sem year → grade

The table GRADES of the college database belongs to SAT(S). It is easy to see that the single key of this schema is stno cno sem year. So, the prime attributes of S are stno, cno, sem, and year.
7.4 Covers
Restricting and standardizing functional dependencies makes them easier to manipulate and compare.

Definition 7.4.1 Let F, G be two sets of functional dependencies, F, G ⊆ FD(H). F and G are equivalent if F⁺ = G⁺. In this case, we call F a cover for G, and G a cover for F.² If F, G are equivalent sets of functional dependencies, we write F ≡ G.

Theorem 7.4.2 Let F, G be two sets of functional dependencies, F, G ⊆ FD(H). The following three statements are equivalent:
(i) F ⊆ G⁺;
(ii) F⁺ ⊆ G⁺;
(iii) cl_F(X) ⊆ cl_G(X) for every subset X of H.

Proof. (i) implies (ii): Assume F ⊆ G⁺. The first part of Theorem 6.2.21 gives F⁺ ⊆ (G⁺)⁺. The second part of that theorem gives (G⁺)⁺ = G⁺, whence F⁺ ⊆ G⁺.

(ii) implies (iii): Suppose that (ii) holds. Since X → cl_F(X) ∈ F⁺, we have X → cl_F(X) ∈ G⁺, so cl_F(X) ⊆ cl_G(X) by the maximality of cl_G(X).

(iii) implies (i): If (iii) holds and X → Y ∈ F, then from Y ⊆ cl_F(X) ⊆ cl_G(X) it follows that X → Y ∈ G⁺. Therefore, (i) holds.

The next corollary gives us a useful instrument for proving the equivalence of sets of functional dependencies.

Corollary 7.4.3 Let F, G be two sets of functional dependencies, F, G ⊆ FD(H). The following three statements are equivalent:
²The choice of the term cover is regrettable because the usual English semantics of this word implies an asymmetry. Nevertheless, we use it here to adhere to standard terminology.
1. F ⊆ G⁺ and G ⊆ F⁺;
2. F, G are equivalent sets of functional dependencies;
3. cl_F(X) = cl_G(X) for every subset X of H.

Proof. The corollary is an immediate consequence of Theorem 7.4.2.

Definition 7.4.4 A unit functional dependency is a functional dependency whose right member consists of a single attribute. Unit functional dependencies in FD(H) are, of course, of the form X → A, where X is a subset of H and A is a member of H.

Theorem 7.4.5 For every set F of functional dependencies, F ⊆ FD(H), there exists an equivalent set G ⊆ FD(H) such that all dependencies of G are unit functional dependencies.

Proof. Define G as G = {X → A | X → Y ∈ F and A ∈ Y}. The projectivity rule implies that X → A ∈ F⁺ for every X → A ∈ G. On the other hand, if X → Y ∈ F and Y = A1 ... Am, then X → A1, ..., X → Am ∈ G, and the additivity rule implies that X → Y ∈ G⁺. Therefore, Corollary 7.4.3 implies the equivalence of F and G.

Definition 7.4.6 A set F of functional dependencies is nonredundant if there is no proper subset G of F such that G ≡ F. Otherwise, F is a redundant set of functional dependencies.

Clearly, a set F is nonredundant if and only if for every X → Y ∈ F we have (F − {X → Y})⁺ ≠ F⁺. Also, any subset of a nonredundant set of functional dependencies is nonredundant. Given a set F of functional dependencies, it is possible that more than one nonredundant cover for F can be found. For instance, the set of unit functional dependencies

    F = {A → B, B → A, B → C, C → B, A → C, C → A}

is clearly redundant. However,

    F1 = {A → B, B → A, B → C, C → B},
    F2 = {B → C, C → B, A → C, C → A},
    F3 = {A → B, B → A, A → C, C → A}

are each nonredundant and equivalent to F.

Algorithm 7.4.7 (Computation of a Nonredundant Cover)
Input: A finite set of attributes H and a set F of functional dependencies, F ⊆ FD(H).
Output: A nonredundant cover F′ of F.
Method: Let φ1, ..., φn be a sequence that consists of all functional dependencies of F, without repetitions. Construct a sequence of sets of functional dependencies F0, F1, ..., Fn, where F0 = F and

    F_{i+1} = Fi − {φ_{i+1}}   if Fi − {φ_{i+1}} ≡ Fi,
    F_{i+1} = Fi               otherwise,

for 0 ≤ i < n. Output the set F′ = Fn.

Proof of Correctness: It is immediate that the set Fn is nonredundant and equivalent to F.

The nonredundant set of functional dependencies obtained in Algorithm 7.4.7 depends on the order in which we consider the functional dependencies. This is not surprising in view of the remark that precedes the algorithm. Observe that even if F is a nonredundant set of functional dependencies, the set G of unit functional dependencies constructed in Theorem 7.4.5 may be redundant. For instance, starting from the nonredundant set F = {A → BC, C → B}, the constructed set G = {A → B, A → C, C → B} is redundant because G ≡ {A → C, C → B}.

For reasons that are made apparent in Section 8.2, it is desirable to have table schemas containing functional dependencies with the property that the smallest possible set of attributes determines the largest possible number of remaining attributes. Among other benefits, this helps reduce storage requirements. Thus, we seek to minimize the size of X in any functional dependency X → Y. The next definition formalizes this requirement.

Definition 7.4.8 Let F be a set of functional dependencies, and let X → Y be a functional dependency in F. X → Y is F-reduced if there exists no proper subset X′ of X such that (F − {X → Y}) ∪ {X′ → Y} ≡ F. The set F is reduced if it consists only of F-reduced functional dependencies.

Lemma 7.4.9 Let F be a set of functional dependencies, F ⊆ FD(H), and let X → Y ∈ F. If X ⊆ X′, then F⁺ ⊆ ((F − {X → Y}) ∪ {X′ → Y})⁺.

Proof. Let F′ = (F − {X → Y}) ∪ {X′ → Y}. Observe that the definition of F′ implies that for every set of attributes W we have

    ⋃{V | U → V ∈ F, U ⊆ W} ⊆ ⋃{V | U → V ∈ F′, U ⊆ W}.    (7.3)
To show that F⁺ ⊆ (F′)⁺, it suffices to show that cl_F(U) ⊆ cl_{F′}(U) for every set U ⊆ H. Let CS_F(U) = (U0, U1, ..., Un), and let CS_{F′}(U) = (U′0, U′1, ..., U′m). To prove that CS_F(U) ⊑ CS_{F′}(U), consider a set Ui
from CS_F(U). We show, by induction on i, that Ui ⊆ U′m. For i = 0 this statement is immediate because U0 = U′0 ⊆ U′m. Therefore, assume that Ui ⊆ U′m. We have

    U_{i+1} = Ui  ∪ ⋃{V | U → V ∈ F and U ⊆ Ui}
            ⊆ U′m ∪ ⋃{V | U → V ∈ F′ and U ⊆ U′m}
            = U′_{m+1} = U′m,
in view of inclusion (7.3). Since CS_F(U) ⊑ CS_{F′}(U), it follows that cl_F(U) ⊆ cl_{F′}(U).

Theorem 7.4.10 For every finite set F of functional dependencies there exists an equivalent, reduced, finite set F′ of functional dependencies.

Proof. The argument is constructive. For each functional dependency X → Y of F and each attribute A ∈ X, determine whether Y ⊆ cl_F(X − A); if this is the case, replace X → Y in F by (X − A) → Y. We claim that F is equivalent to (F − {X → Y}) ∪ {(X − A) → Y}. Note that F ⊆ ((F − {X → Y}) ∪ {(X − A) → Y})⁺ by Lemma 7.4.9. On the other hand, (F − {X → Y}) ∪ {(X − A) → Y} ⊆ F⁺ because Y ⊆ cl_F(X − A), so F and (F − {X → Y}) ∪ {(X − A) → Y} are equivalent by Corollary 7.4.3. Since F is finite, the procedure can be applied only a finite number of times. At the end, the remaining set of functional dependencies consists only of reduced functional dependencies.

Example 7.4.11 Let H = ABC, and let F = {AB → C, A → B} ⊆ FD(H). It is easy to verify the following equalities: cl_F(A) = ABC, cl_F(B) = B, cl_F(C) = C, cl_F(AB) = cl_F(AC) = ABC, and cl_F(BC) = BC. If we drop A from AB → C, we note that we cannot infer B → C from F because cl_F(B) = B. On the other hand, if we drop B from AB → C, we note that we can infer A → C from F since cl_F(A) = ABC. Therefore, {A → C, A → B} is an equivalent, reduced set of functional dependencies.

Lemma 7.4.12 If F is a reduced set of functional dependencies, F ⊆ FD(H), and F′ is a nonredundant set obtained from F by applying Algorithm 7.4.7, then F′ is a reduced set of functional dependencies.

Proof. The argument is straightforward and is left to the reader.
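Both Algorithm 7.4.7 and the reduction procedure of Theorem 7.4.10 are easily mechanized; note that the equivalence test Fi − {φ} ≡ Fi reduces to a single closure computation, since only φ itself could fail to be derivable from the smaller set. The sketch below (helper names and the closure routine are our own) reproduces the cover F2 of the redundant set F discussed before Algorithm 7.4.7, and the left-reduction of Example 7.4.11; scanning the dependencies in a different order would yield F1 or F3 instead.

```python
def closure(X, fds):
    """cl_F(X) by fixpoint iteration (cf. Algorithm 7.2.24)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def nonredundant_cover(fds):
    """Algorithm 7.4.7: drop each dependency that the rest still prove."""
    G = list(fds)
    for lhs, rhs in list(fds):
        rest = [g for g in G if g != (lhs, rhs)]
        if set(rhs) <= closure(lhs, rest):
            G = rest
    return G

def left_reduce(fds):
    """Reduction procedure of Theorem 7.4.10: shrink left members while the
    right member stays inside the closure of the smaller left member."""
    out = []
    for lhs, rhs in fds:
        L = set(lhs)
        for B in sorted(set(L)):
            if len(L) > 1 and set(rhs) <= closure(L - {B}, fds):
                L = L - {B}
        out.append((''.join(sorted(L)), rhs))
    return out

F = [('A', 'B'), ('B', 'A'), ('B', 'C'),
     ('C', 'B'), ('A', 'C'), ('C', 'A')]
print(nonredundant_cover(F))
# [('B', 'C'), ('C', 'B'), ('A', 'C'), ('C', 'A')] -- the cover F2 of the text

print(left_reduce([('AB', 'C'), ('A', 'B')]))
# [('A', 'C'), ('A', 'B')] -- Example 7.4.11
```

Chaining the two procedures after splitting right members into single attributes gives the canonical-form construction of Theorem 7.4.14 below.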
Definition 7.4.13 Let F be a set of functional dependencies, F ⊆ FD(H). A canonical form of F is a nonredundant and reduced set G of unit functional dependencies that is equivalent to F.
Theorem 7.4.14 For every finite set F of functional dependencies there exists a canonical form of F.
Proof. Starting from F, construct an equivalent set F1 of functional dependencies of the form X → A as in Theorem 7.4.5. Next, from F1 construct an equivalent set F2 that is reduced and consists of unit functional dependencies. Finally, from F2 construct an equivalent nonredundant set F3 by applying Algorithm 7.4.7. Lemma 7.4.12 implies that F3 is reduced.
Example 7.4.15 Let H = ABCDE be a set of attributes, and let F be the set of functional dependencies given by F = {A → BCD, AB → DE, BE → AC}. The set F1 is
F1 = {A → B, A → C, A → D, AB → D, AB → E, BE → A, BE → C}.
To build the reduced set F2 we need to examine the functional dependencies in F1 that have more than one attribute in their left members: AB → D, AB → E, BE → A, and BE → C. Note that clF1(A) = ABCDE. Therefore, we can eliminate B in the left member of AB → D; the resulting functional dependency A → D is already in F1. Since clF1(B) = B, note that A cannot be removed from AB → D. Starting from AB → E we obtain A → E. Since clF1(E) = E, no more functional dependencies can be obtained. Thus,
F2 = {A → B, A → C, A → D, A → E, BE → A, BE → C}.
Applying Algorithm 7.4.7, we obtain the set of unit functional dependencies
F3 = {A → B, A → C, A → D, A → E, BE → A}
that is a canonical cover for F.
The following theorem plays an essential role in synthesizing database schemas that satisfy certain normal forms. We use it in Section 8.3.
Theorem 7.4.16 Let S = (H, F) be a schema with functional dependencies, and let K be a key for S. If G = {Xi → Ai | 1 ≤ i ≤ n} is a canonical form for F, then:
1. no set XiAi is included in K;
2. K ∪ {Ai | 1 ≤ i ≤ n} = H;
3. H = (K, X1A1, ..., XnAn) is a lossless decomposition of every table of S.
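The three stages in the proof of Theorem 7.4.14 can be sketched directly (a brute-force illustration with names of our own choosing): split each dependency into unit dependencies, left-reduce each one, and then discard redundant dependencies.

```python
def closure(attrs, fds):
    """cl_F(attrs), computed by iterating the dependencies to a fixed point."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_form(fds):
    # Stage 1 (Theorem 7.4.5): unit right members
    unit = [(frozenset(l), a) for l, r in fds for a in r]
    as_sets = [(l, {a}) for l, a in unit]
    # Stage 2 (Theorem 7.4.10): left-reduce each dependency
    reduced = []
    for lhs, a in unit:
        lhs = set(lhs)
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            # attr is superfluous if the reduced left member still determines a
            if smaller and a in closure(smaller, as_sets):
                lhs = smaller
        if (frozenset(lhs), a) not in reduced:
            reduced.append((frozenset(lhs), a))
    # Stage 3 (Algorithm 7.4.7): remove redundant dependencies
    result = list(reduced)
    for fd in reduced:
        rest = [f for f in result if f != fd]
        if fd[1] in closure(fd[0], [(l, {b}) for l, b in rest]):
            result = rest
    return set(result)
```

Because the output of Algorithm 7.4.7 depends on the order in which dependencies are examined, this sketch may return {A → B, A → D, A → E, BE → A, BE → C} for the set F of Example 7.4.15 rather than the F3 shown there; both are equivalent five-element canonical forms.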
Proof. To prove the first part of the theorem, observe that if XiAi were a subset of K, then K − Ai would also be a key, thereby contradicting the minimality of K.
For the second part of the theorem, note that clG(K) = clF(K) = H because F and G are equivalent sets of functional dependencies and K is a key for F. Let CSG(K) = (K0, ..., Kℓ, ..., Km) be the G-closure sequence of K, where Km = H. For each A ∈ H, define the number pA by pA = min{ℓ | 0 ≤ ℓ ≤ m and A ∈ Kℓ}. Note that pA exists because clG(K) = H. If pA = 0, then A ∈ K. Otherwise, A ∈ KpA − KpA−1, which means that there exists a functional dependency Xi → Ai ∈ G such that Xi ⊆ KpA−1 and Ai = A ∈ KpA. So, in any case, we have A ∈ K ∪ {Ai | 1 ≤ i ≤ n}.
To prove the last part of the theorem, consider a table τ = (T, H, ρ) of the schema S. Let t, t1, ..., tn be n + 1 joinable tuples such that ti ∈ ρ[XiAi] for 1 ≤ i ≤ n and t ∈ ρ[K]. Then, ρ contains the tuples s, s1, ..., sn such that si[XiAi] = ti for 1 ≤ i ≤ n and s[K] = t. We assume that the attributes A1, ..., An are listed such that pAi ≤ pAj implies i ≤ j. Let L0 = K and Li = KA1···Ai for 1 ≤ i ≤ n, where Ln = H. We have Xi ⊆ Li−1 for 1 ≤ i ≤ n.
We prove by induction on i, 1 ≤ i ≤ n, that (t ⋈ t1 ⋈ ··· ⋈ ti)[Li] = s[Li]. For i = 1, the joinability of t and t1 implies that t[X1] = t1[X1], so s[X1] = s1[X1], which gives s[A1] = s1[A1]. Therefore, (t ⋈ t1)[L1] = s[L1]. Suppose that (t ⋈ t1 ⋈ ··· ⋈ ti)[Li] = s[Li]. We claim that (t ⋈ t1 ⋈ ··· ⋈ ti ⋈ ti+1)[Li+1] = s[Li+1]. Note that Xi+1 ⊆ Li. The tuple ti+1 is joinable with t ⋈ t1 ⋈ ··· ⋈ ti; this implies (t ⋈ t1 ⋈ ··· ⋈ ti)[Xi+1] = ti+1[Xi+1], so si+1[Ai+1] = s[Ai+1]. This gives the desired conclusion. For i = n we obtain t ⋈ t1 ⋈ ··· ⋈ tn = s, which proves that H is a lossless decomposition.
Example 7.4.17 Consider the schema S = (A1...A6, F), where A1A2 is a key for F. Let G be a canonical form for F:
G = {A1 → A3, A2 → A4, A1A4 → A5, A2A3 → A6}.
For any table of the schema S we have the lossless decomposition
H = (A1A2, A1A3, A2A4, A1A4A5, A2A3A6).
Example 7.4.18 Let S = (H, F) be the table schema introduced in Example 7.3.7. Let K = stno cno sem year. The set F that consists of the functional dependencies
cno sem year → empno
stno cno sem year → grade
is already in canonical form. Therefore, every table of S has the lossless decomposition H = (H1, H2, H3), where
H1 = stno cno sem year
H2 = cno sem year empno
H3 = stno cno sem year grade.
Further, since H1 ⊆ H3, we can drop H1 from this decomposition. Thus, H = (H2, H3) is also a lossless decomposition of any table of S.
In concluding this section, we stress that its results are independent of any specific table of a schema. In other words, they are applicable to all tables of a schema. Over time, tables change, but schema properties remain constant.
7.5
Tableaux
The notion of tableau that we introduce in this section enables us to study properties of functional and multivalued dependencies in a more efficient manner.
Let U be a set of relational attributes. For every attribute A ∈ U, consider a symbol dA called the distinguished symbol of the attribute A and a set VA = {nA0, nA1, ...} of nondistinguished symbols. We refer to the set DA = {dA} ∪ VA as the pseudodomain of the attribute A. We assume that if A ≠ A', then DA ∩ DA' = ∅. The set DA is equipped with an order relation whose diagram is given in Figure 7.1: dA < nA0 < nA1 < ···.
The notion of tableau is very similar to the notion of table. The major difference between tables and tableaux is that the values that occur in tableaux belong to the pseudodomains of the attributes rather than to their domains.
Definition 7.5.1 A tableau is a triple θ = (T, H, ρ), where T is a symbol called the tableau name, H = A1...An is a set of relational attributes called the heading of θ and denoted by heading(θ), and ρ ⊆ DA1 × ··· × DAn is a relation called the extension of θ.
Figure 7.1: Partial Order on the Set DA
Note that no symbol, distinguished or nondistinguished, may occur in more than one column of a tableau. The set of symbols that occur in a tableau θ is denoted by VAR(θ).
Example 7.5.2 The triple θ = (T, ABCD, ρ) given by

T    A    B    C    D
     dA   dB   dC   nD0
     nA1  dB   dC   dD
     dA   nB2  nC3  dD

is a tableau.
Definition 7.5.3 A valuation is a mapping v : DU → ∪{Dom(A) | A ∈ U} such that s ∈ DA implies v(s) ∈ Dom(A), for every symbol s ∈ DA and every A ∈ U.
We assume that valuations are extended from symbols to rows componentwise and, then, to the relations of tableaux, row by row, as shown in the next example.
Example 7.5.4 Let v be a valuation such that
v(dA) = a0    v(dB) = b1    v(dC) = c0    v(dD) = d2
v(nD0) = d1   v(nA1) = a1   v(nB2) = b2   v(nC3) = c1
The image of the tableau θ defined in Example 7.5.2 under the valuation v is the table:

T    A    B    C    D
     a0   b1   c0   d1
     a1   b1   c0   d2
     a0   b2   c1   d2

We denote the table that results from the application of the valuation v to the tableau θ = (T, H, ρ) by v(θ), where v(θ) = (T, H, v(ρ)).
Every tableau θ = (T, H, ρ) that has a distinguished symbol in every column generates a function that transforms a table in T(H) into another table in T(H), using the following definition.
Definition 7.5.5 Let θ = (T, H, ρ) be a tableau that has a distinguished symbol in every column. Assume that H = A1...An. A valuation v : DU → ∪{Dom(A) | A ∈ U} is based on a tuple (a1, ..., an) ∈ tupl(H) if v(dAi) = ai for 1 ≤ i ≤ n.
Since a valuation based on (a1, ..., an) depends only on the values assigned to the specified distinguished symbols, many quite different valuations may be based on (a1, ..., an).
Definition 7.5.6 Let θ = (T, H, ρ0) be a tableau, and let τ = (T, H, ρ) be a table. The relation ρ' ∈ rel(H), given by
ρ' = {(a1, ..., an) | there exists a valuation v that is based on (a1, ..., an) such that v(ρ0) ⊆ ρ},
defines the mapping θ̄ : T(H) → T(H) given by θ̄(τ) = (T', H, ρ'). Here T' is simply a symbol used to name the new table.
Note that θ̄(τ) is always defined since, for every table τ, there exist only a finite number of tuples (a1, ..., an) such that v(ρ0) ⊆ ρ for some valuation v that is based on (a1, ..., an). Note also that θ̄(τ) is empty only if ρ = ∅.
Example 7.5.7 Consider the table τ = (T, ABCD, ρ) given by
T    A    B    C    D
     a1   b2   c1   d1
     a1   b1   c0   d0
     a1   b2   c0   d1
     a2   b2   c1   d0
     a2   b2   c0   d1
     a2   b1   c1   d1

A valuation v can map dA to either a1 or a2; similarly, dB can be mapped to b1 or b2, etc. Therefore, there are at most 16 rows (v(dA), v(dB), v(dC), v(dD)) on which a valuation can be based. If θ is the tableau defined in Example 7.5.2, the reader can easily verify that the table θ̄(τ) = (T', ABCD, ρ') is:

T'   A    B    C    D
     a1   b2   c1   d1
     a1   b1   c0   d0
     a1   b2   c0   d1
     a2   b2   c1   d0
     a2   b2   c0   d1
     a2   b1   c1   d1
     a1   b2   c1   d0
     a2   b2   c1   d1

Clearly, every row of ρ generates a family of valuations based on that row such that the image of the tableau under any of these valuations is included in ρ. Therefore, ρ ⊆ ρ'.
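Definition 7.5.6 can be animated by brute force: enumerate the valuations whose symbol images are drawn from the values occurring in each column of τ, and collect the tuples of distinguished values. The encoding below (plain strings for symbols; all names are ours) reproduces Example 7.5.7:

```python
from itertools import product

# The tableau of Example 7.5.2, with symbols written as plain strings
tableau = [('dA', 'dB', 'dC', 'n0D'),
           ('n1A', 'dB', 'dC', 'dD'),
           ('dA', 'n2B', 'n3C', 'dD')]
distinguished = ('dA', 'dB', 'dC', 'dD')

# The table of Example 7.5.7
rho = {('a1', 'b2', 'c1', 'd1'), ('a1', 'b1', 'c0', 'd0'),
       ('a1', 'b2', 'c0', 'd1'), ('a2', 'b2', 'c1', 'd0'),
       ('a2', 'b2', 'c0', 'd1'), ('a2', 'b1', 'c1', 'd1')}

def tableau_mapping(tableau, distinguished, rho):
    """Brute-force the mapping of Definition 7.5.6: collect every tuple
    (v(dA1), ..., v(dAn)) over a valuation v with v(rows) inside rho."""
    n = len(distinguished)
    symbols = sorted({(s, c) for row in tableau for c, s in enumerate(row)})
    col_values = [sorted({t[c] for t in rho}) for c in range(n)]
    result = set()
    # enumerate valuations over the values occurring in each column
    for choice in product(*(col_values[c] for _, c in symbols)):
        v = {s: val for (s, _), val in zip(symbols, choice)}
        if all(tuple(v[s] for s in row) in rho for row in tableau):
            result.add(tuple(v[s] for s in distinguished))
    return result
```

The result contains the six rows of ρ together with the two new tuples, matching the table above.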
7.5.1 Tableaux and Project-Join Mappings
Tableaux provide an alternate way of studying properties of project-join mappings that allows us to determine easily whether tables of certain schemas have information-lossless decompositions.
Definition 7.5.8 Let H = (H1, ..., Hk) be a sequence of subsets of H such that H = ∪{Hi | 1 ≤ i ≤ k}. A tableau that describes the sequence H is a tableau θH = (T, H, ρH), where the relation ρH = {t1, ..., tk} and ti is given by
ti[Aj] = dAj if Aj ∈ Hi; otherwise, ti[Aj] is a nondistinguished symbol.
Example 7.5.9 Let H = ABCD, and let H = (AB, BC, ACD) be a decomposition. A tableau θH is given by

T    A    B    C    D
     dA   dB   nC0  nD0
     nA0  dB   dC   nD1
     dA   nB0  dC   dD

Theorem 7.5.10 Let H = (H1, ..., Hk) be a sequence of sets of attributes. The project-join mapping pjH equals θ̄H, where θH = (T, H, ρH) is the tableau of the sequence H and H = ∪{Hi | 1 ≤ i ≤ k}.
Proof. We must prove that pjH(τ) = θ̄H(τ) for every τ = (T, H, ρ) ∈ T(H), where H = ∪{Hi | 1 ≤ i ≤ k} = A1...An.
Let t = (a1, ..., an) ∈ pjH(ρ). There exist k tuples t1, ..., tk such that tℓ ∈ ρ[Hℓ] and t[Hℓ] = tℓ for 1 ≤ ℓ ≤ k. In turn, this implies that there exist u1, ..., uk ∈ ρ such that t[Hℓ] = uℓ[Hℓ] for 1 ≤ ℓ ≤ k. Suppose that ρH, the set of rows of θH, consists of w1, ..., wk, where wℓ represents the set Hℓ for 1 ≤ ℓ ≤ k. Consider a valuation v such that v(dAi) = t[Ai] for 1 ≤ i ≤ n, and v(n) = up[Aq] if the nondistinguished symbol n occurs in the p-th row under the attribute Aq in the tableau θH.
The image of the row wℓ under the valuation v is the tuple uℓ of ρ. Indeed, consider the component wℓ[Aq] of the row wℓ of θH. If wℓ[Aq] is the distinguished symbol dAq, then Aq belongs to Hℓ, and v(wℓ[Aq]) = t[Aq] = tℓ[Aq] = uℓ[Aq]. On the other hand, if wℓ[Aq] is a nondistinguished symbol, then v(wℓ[Aq]) = uℓ[Aq]; so, in any case, v(wℓ) = uℓ. Therefore, v(ρH) ⊆ ρ, so (a1, ..., an) ∈ θ̄H(ρ).
Conversely, let t = (a1, ..., an) ∈ θ̄H(ρ). There exists a valuation v such that v(dAq) = aq for 1 ≤ q ≤ n and v(ρH) ⊆ ρ. Let uℓ be the image of the row wℓ of θH under v. Observe that wℓ contains distinguished symbols for all attributes Aq ∈ Hℓ, so uℓ[Aq] = aq for every attribute Aq ∈ Hℓ. Therefore, we have t[Hℓ] = uℓ[Hℓ] for 1 ≤ ℓ ≤ k, which implies that t ∈ pjH(ρ).
Theorem 7.5.11 Let H = A1...An be a finite set of attributes, and let H = (H1, ..., Hk) be a sequence of subsets of H such that ∪{Hi | 1 ≤ i ≤ k} = H. The following three statements are equivalent:
(i) the set H occurs in the sequence H;
(ii) pjH(ρ) = ρ for every relation ρ ∈ rel(H); and
(iii) the tableau θH contains a row of distinguished symbols.
Proof. (i) implies (ii): If H occurs in H, then for any subset Hi of H that occurs in H we have ρ[Hi] ⋈ ρ[H] = ρ[Hi] ⋈ ρ = ρ. Therefore, using the idempotence, commutativity, and associativity of the join, we obtain
pjH(ρ) = (ρ[H1] ⋈ ρ) ⋈ ··· ⋈ (ρ[Hk] ⋈ ρ) = ρ.
The reverse inclusion, ρ ⊆ pjH(ρ), holds by Theorem 6.3.7. Consequently, pjH(ρ) = ρ.
(ii) implies (iii): Suppose that pjH(ρ) = ρ for every relation ρ ∈ rel(H). Note that the satisfaction of the equality pjH(ρ) = ρ does not depend on the actual domains of the attributes in H. Therefore, pjH(ρH) = ρH. Let r0 be the row on A1, ..., An defined by r0[Ai] = dAi for 1 ≤ i ≤ n. If ρH = {r1, ..., rk}, note that r0[Hi ∩ Hj] = ri[Hi ∩ Hj] = rj[Hi ∩ Hj] for every i ≠ j, 1 ≤ i, j ≤ k, because all these projections consist of distinguished symbols. So, the tuples r1[H1], ..., rk[Hk] are joinable and their join is r0. Thus r0 ∈ pjH(ρH), so r0 ∈ ρH.
(iii) implies (i): This implication is immediate in view of the definition of θH.
7.5.2 Tableaux and Functional Dependencies
In this section we show that tableaux provide an alternative to inference rules for finding the logical consequences of a set of functional dependencies.
Since tableaux are tables over attributes whose domains have been replaced by pseudodomains, constraints may be applied to tableaux just as they are applied to tables. We denote by TX(H) the set of tableaux whose heading is H. If S = (H, Γ) is a table schema, we denote by SATX(S) (or by SATX(H, Γ)) the set of all tableaux that have the heading H and satisfy all constraints of Γ.
Recall that Theorem 7.4.5 states that for every set of functional dependencies there exists an equivalent set of functional dependencies that have exactly one attribute in their right member. For the remainder of this section we use only sets of functional dependencies in which each right member consists of one attribute.
Definition 7.5.12 Let θ = (T, H, ρ) be a tableau, and let X → A be a functional dependency such that X ⊆ H and A ∈ H. A violation of X → A by θ is a 4-tuple (X, A, u, v), where u, v are rows of ρ such that u[X] = v[X] and u[A] ≠ v[A].
T    A    B    C    D
     dA   nB0  nC0  nD0
     dA   nB1  nC0  nD1
     nA1  nB0  nC1  nD2
     nA2  nB1  nC1  nD3
Figure 7.2: The tableau θ = (T, ABCD, ρ)
The tableau θ' obtained from θ by reducing the violation (X, A, u, v) of X → A is the tableau obtained from θ by replacing every occurrence of the larger of the symbols u[A], v[A] in the A-column of θ by the smaller one. If θ' is obtained from θ through the reduction of a violation of a functional dependency from F, we write θ ⊢F θ'.
Note that if θ' is obtained from θ by reducing a violation of a functional dependency, the number of distinct symbols of θ' is strictly smaller than the similar number for θ. Also, the number of rows of θ' is less than or equal to the number of rows of θ. If θ0, θ1, ..., θq is a sequence of tableaux such that θi ⊢F θi+1 for 0 ≤ i < q, we write θ0 ⊢*F θq; in general, θ ⊢*F θ' means that such a sequence with θ0 = θ and θq = θ' exists for some q ≥ 0.
Example 7.5.13 Let θ = (T, ABCD, ρ) be the tableau given in Figure 7.2, and let F = {A → B, BC → D}. Note that θ contains no violation of BC → D and that the first two rows of the tableau violate the functional dependency A → B. If we reduce this violation of A → B, the resulting tableau θ1 = (T1, ABCD, ρ1) is shown in Figure 7.3. The substitution of nB1 by nB0 affects not only the second, but also the fourth row. The tableau θ1 violates BC → D. By reducing the violation involving the first two rows we obtain the tableau θ2 = (T2, ABCD, ρ2) given in Figure 7.4. A new reduction of a violation of the same functional dependency gives the tableau shown in Figure 7.5.
Definition 7.5.14 A containment mapping between the tableaux θ and θ' is a mapping f : DU → DU such that every row of θ is mapped into a row of θ', and f(s) ≤ s for every s ∈ DU.
T1   A    B    C    D
     dA   nB0  nC0  nD0
     dA   nB0  nC0  nD1
     nA1  nB0  nC1  nD2
     nA2  nB0  nC1  nD3

Figure 7.3: The tableau θ1

T2   A    B    C    D
     dA   nB0  nC0  nD0
     nA1  nB0  nC1  nD2
     nA2  nB0  nC1  nD3

Figure 7.4: The tableau θ2

T3   A    B    C    D
     dA   nB0  nC0  nD0
     nA1  nB0  nC1  nD2
     nA2  nB0  nC1  nD2

Figure 7.5: The tableau θ3
Containment mappings are extended to rows componentwise and, then, to sets of rows, elementwise. If θ, θ', θ'' are tableaux in TX(H) and f, g are containment mappings between θ, θ' and θ', θ'', respectively, then it is easy to verify that g ∘ f is a containment mapping between θ and θ'' (cf. Exercise 25). Note that if θ' is obtained from θ by reducing a violation of a functional dependency, then there exists a containment mapping f from θ to θ' such that for every row t' of θ' we have t' = f(t) for some row t of θ.
We discuss an algorithm whose input is a table schema with functional dependencies S = (H, F) and a tableau θ, and whose output is a tableau θF that satisfies all functional dependencies of F and such that a containment mapping exists from θ to θF. The action of the algorithm consists of chasing violations of functional dependencies of F and successively reducing these violations; accordingly, the algorithm is named the Chase Algorithm for Functional Dependencies. This algorithm is extremely important because, among other things, it can be used to determine whether a functional dependency φ is a logical consequence of a set F of functional dependencies without using inference rules or closures. Briefly, a tableau based on φ is created and the functional dependencies of F are "chased" on the tableau; the form of the resulting tableau determines whether or not F ⊨ φ. This is presented in detail in Theorem 7.5.21. This algorithm can also be used to ascertain whether the tables of SAT(H, F) have H as a lossless decomposition, by chasing the functional dependencies of F on the tableau θH and by examining the resultant tableau (θH)F (see Theorem 7.5.20). There is a rich literature on other uses of the Chase Algorithm.
Algorithm 7.5.15 The Chase Algorithm for Functional Dependencies
Input: A table schema with functional dependencies S = (H, F) and a tableau θ.
Output: A tableau θF that satisfies all functional dependencies of F and such that a containment mapping exists from θ to θF.
Method: Construct a sequence of tableaux θ0, ..., θi, θi+1, ... defined by:
Stage 0: θ0 := θ.
Stage i + 1: θi+1 is obtained from θi by reducing a violation of a functional dependency from F if such a violation exists in θi; otherwise, that is, if no violation exists, stop, and let θF = θi.
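Algorithm 7.5.15 is short enough to prototype. In the sketch below (our own encoding, not the book's), a symbol is a pair ('d', col) or a triple ('n', col, i), ordered so that distinguished symbols are the smallest; a second function applies the two-row implication test established later in Theorem 7.5.21:

```python
def chase(rows, fds):
    """Chase Algorithm for functional dependencies (Algorithm 7.5.15).
    rows: a set of tuples of symbols; a symbol is ('d', col) for the
    distinguished symbol of a column or ('n', col, i) for a
    nondistinguished one.  fds: list of pairs (lhs_cols, rhs_col)."""
    def smaller(s, t):
        # distinguished symbols precede nondistinguished ones; among
        # nondistinguished symbols, a smaller index means a smaller symbol
        return s if (s[0],) + s[2:] < (t[0],) + t[2:] else t
    rows = set(rows)
    while True:
        violation = next(((u, v, A) for X, A in fds
                          for u in rows for v in rows
                          if all(u[c] == v[c] for c in X) and u[A] != v[A]),
                         None)
        if violation is None:
            return rows
        u, v, A = violation
        keep = smaller(u[A], v[A])
        drop = v[A] if keep == u[A] else u[A]
        # reduce the violation: replace the larger symbol everywhere
        rows = {tuple(keep if s == drop else s for s in r) for r in rows}

def implies(fds, U, V, n):
    """Theorem 7.5.21: F |= U -> V iff the chased two-row tableau that
    agrees exactly on U ends with rows that agree on V."""
    r = tuple(('d', c) if c in U else ('n', c, 0) for c in range(n))
    s = tuple(('d', c) if c in U else ('n', c, 1) for c in range(n))
    chased = chase({r, s}, fds)
    return all(len({row[c] for row in chased}) == 1 for c in V)

# Example 7.5.13: columns A, B, C, D = 0..3; F = {A -> B, BC -> D}
F = [({0}, 1), ({1, 2}, 3)]
theta = {(('d', 0), ('n', 1, 0), ('n', 2, 0), ('n', 3, 0)),
         (('d', 0), ('n', 1, 1), ('n', 2, 0), ('n', 3, 1)),
         (('n', 0, 1), ('n', 1, 0), ('n', 2, 1), ('n', 3, 2)),
         (('n', 0, 2), ('n', 1, 1), ('n', 2, 1), ('n', 3, 3))}
```

Chasing the tableau of Example 7.5.13 (Figure 7.2) with F = {A → B, BC → D} yields the three-row tableau of Figure 7.5, and implies confirms for Example 7.5.22 that AE → C follows from {AB → C, CD → E, AE → B} while AE → D does not.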
Proof of Correctness: Note that the Chase Algorithm is nondeterministic, since at each step we may have to choose among several violations of functional dependencies of F. Also, observe that the algorithm terminates, since at every step the number of distinct symbols of the tableau decreases by one and the set of symbols of the tableau is finite. Clearly, we always finish the sequence θ0, ..., θi, θi+1, ... with a tableau that satisfies all functional dependencies of F.
Definition 7.5.16 Let θ be a tableau and assume that we have the sequence of tableaux
θ = θ0 ⊢F θ1 ⊢F ··· ⊢F θp
obtained by applying the Chase Algorithm, where θi = (Ti, H, ρi), and let fi be the containment mapping such that fi(ρi−1) = ρi for 1 ≤ i ≤ p. A chase sequence is a sequence of tuples t0, t1, ..., tp such that ti ∈ ρi and fi(ti−1) = ti for 1 ≤ i ≤ p.
Theorem 7.5.17 Let S = (H, F) be a table schema with functional dependencies and let θ = (T, H, ρ) be a tableau. Suppose that θ' = (T', H, ρ') is a tableau in SATX(S), f is a containment mapping such that f(ρ) ⊆ ρ', and θ0, θ1, ..., θp is a sequence of tableaux obtained from θ = θ0 through the application of the Chase Algorithm, where θi = (Ti, H, ρi) for 0 ≤ i ≤ p. Then, for every chase sequence t = t0, t1, ..., tp we have f(ti) = f(t) ∈ ρ' and, therefore, f(ρi) ⊆ ρ' for 0 ≤ i ≤ p.
Proof. Let t = t0, t1, ..., tp be a chase sequence. The proof is by induction on i. The basis, i = 0, is immediate. Suppose that the statement holds for i. The tableau θi+1 is obtained by reducing a violation (X, A, ui, vi) of a functional dependency X → A in θi. Let u'i = f(ui) and v'i = f(vi) be the corresponding rows in ρ'. Clearly, we have u'i[A] = v'i[A] because θ' satisfies all functional dependencies of F. In other words, f maps the distinct symbols ui[A] and vi[A] into the same symbol u'i[A] = v'i[A]. Let ui+1, vi+1 be the rows (not necessarily distinct) that result from ui, vi by the reduction of the violation (X, A, ui, vi). The rows of θi+1 fall into three categories:
1. If the row ti+1 of θi+1 is unaffected by the reduction of the violation (X, A, ui, vi), then ti+1 = ti, so this row is also in θi and f(ti+1) = f(ti) = f(t) ∈ ρ'.
2. Suppose that ti+1 of θi+1 is affected by that reduction but is neither ui+1 nor vi+1. This could happen only if for the row ti in θi we have ti[A] = max{ui[A], vi[A]} and
ti+1[B] = ti[B] if B ≠ A, and ti+1[A] = min{ui[A], vi[A]}.
Since f(min{ui[A], vi[A]}) = f(max{ui[A], vi[A]}), it follows that f(ti+1) = f(ti) = f(t) ∈ ρ'.
3. If ti+1 is ui+1 or vi+1, then f(ui+1) = f(ui) ∈ ρ' and f(vi+1) = f(vi) ∈ ρ'.
Thus, we may conclude that f(ti+1) = f(ti) = f(t) ∈ ρ', so f(ρi+1) ⊆ ρ'.
Theorem 7.5.18 For any of the choices of reductions of violations of functional dependencies in a tableau θ = (T, H, ρ), the Chase Algorithm yields tableaux that have the same extension.
Proof. Let θ be a tableau and assume that we have the sequences of tableaux
θ = θ0 ⊢F θ1 ⊢F ··· ⊢F θp
and
θ = θ'0 ⊢F θ'1 ⊢F ··· ⊢F θ'q
obtained from θ by applying the Chase Algorithm and reducing violations of different sequences of functional dependencies, where θi = (Ti, H, ρi) for 0 ≤ i ≤ p and θ'j = (T'j, H, ρ'j) for 0 ≤ j ≤ q. Let f1, ..., fp be the containment mappings defined by the first sequence of reductions, where fi maps every row of θi−1 into a row of θi for 1 ≤ i ≤ p, and let f'1, ..., f'q be the similar sequence for θ'1, ..., θ'q. Consider the containment mappings f = fp ∘ ··· ∘ f1 and f' = f'q ∘ ··· ∘ f'1; the mapping f maps every row of θ into a row of θp, and f' maps every row of θ into a row of θ'q. Note that both θp and θ'q belong to SATX(S). Since f(ρ) = ρp and f'(ρ) = ρ'q, by a double application of Theorem 7.5.17, we also have f'(ρp) ⊆ ρ'q and f(ρ'q) ⊆ ρp.
Further, if t = t0, t1, ..., tp and t' = t'0, t'1, ..., t'q are chase sequences starting at the same tuple t of ρ, then f(t) = f(t0) = f(t1) = ··· = f(tp) and f'(t) = f'(t'0) = f'(t'1) = ··· = f'(t'q). Consequently, f'(tp) = f'(f(t)) = f'(t) and f(t'q) = f(f'(t)) = f(t) for every t ∈ ρ.
Let t'' ∈ ρ'q be such that t'' = f'(t). Then, f'(f(t'')) = f'(f(f'(t))) = f'(f(t)) = f'(t) = t''. Similarly, if t'' ∈ ρp and t'' = f(t), we have f(f'(t'')) = f(f'(f(t))) = f(f'(t)) = f(t) = t''. Therefore, the mapping f' ∘ f is the identity on the relation ρ'q and f ∘ f' is the identity on ρp. This shows that the restrictions of f' and f to ρp and ρ'q, respectively, are mutually inverse bijections.
Now, we can actually prove that f' is the identity mapping when restricted to VAR(θp). Since f and f' are both containment mappings, we have
t'' ≥ f'(t'') ≥ f(f'(t'')) componentwise for every t'' in ρp. Since f(f'(t'')) = t'', it follows that t'' = f'(t'') = f(f'(t'')), so f' is indeed the identity mapping. Therefore, ρp = ρ'q.
Since the sequence of reductions of violations of functional dependencies of F does not influence the extension ρF of the final tableau, we denote by θF the tableau θF = (TF, H, ρF).
Theorem 7.5.19 Let S = (H, F) be a table schema with functional dependencies and let θ = (T, H, ρ) be a tableau. If v is a valuation, v : DU → ∪{Dom(A) | A ∈ U}, such that v(ρ) ∈ rel(S), then v(ρF) = v(ρ).
Proof. Let θ0, ..., θn−1 be a sequence of tableaux obtained by applying the Chase Algorithm to θ, ending with θF. We have θ0 = θ and θn−1 = θF, where θℓ = (Tℓ, H, ρℓ) for 0 ≤ ℓ ≤ n − 1. We prove by induction on k (for 0 ≤ k ≤ n − 1) that v(ρk) = v(ρ). The base case, k = 0, is obvious. Suppose that v(ρk) = v(ρ) and that θk+1 is obtained from θk by reducing a violation of a functional dependency X → A. There exist two rows r, s in ρk such that r[X] = s[X] but r[A] ≠ s[A]. Suppose that r[A] > s[A]. Then, θk+1 is obtained from θk by replacing the symbol r[A] by s[A] in the A-column of θk. Let r', s' be the rows of v(ρk) given by r' = v(r) and s' = v(s), respectively. Since v(ρk) is a relation of S, it follows that it satisfies the functional dependency X → A, so r'[A] = s'[A]. In other words, we obtain v(r[A]) = v(s[A]). Therefore, v also maps ρk+1 onto v(ρ).
Using tableaux, we can determine if a decomposition of a table of the schema S is information lossless.
Theorem 7.5.20 Let S = (H, F) be a schema with functional dependencies, and let H = (H1, ..., Hk) be a sequence of sets of attributes such that H = ∪{Hi | 1 ≤ i ≤ k}. If θH = (T, H, ρH) is a tableau that describes H, then H is a lossless decomposition of every table of the schema S if and only if the tableau θH,F = (TH,F, H, ρH,F) obtained by applying the Chase Algorithm to θH and F has a row of distinguished symbols.
Proof. Let H = A1...An. Suppose that θH,F contains a row of distinguished symbols, and let τ = (T, H, ρ) be a table of the schema S. Since θH represents the sequence H, it follows that θ̄H = pjH. Therefore, to prove that H is a lossless decomposition it suffices to show that θ̄H(ρ) ⊆ ρ for every ρ ∈ rel(S). Let t be a tuple of θ̄H(ρ). There exists a valuation v such that t[Ai] = v(dAi) for 1 ≤ i ≤ n and v(ρH) ⊆ ρ. By Theorem 7.5.19, we also have v(ρH,F) ⊆ ρ. Since θH,F contains the row of distinguished symbols (dA1, ..., dAn), the image of this row under v belongs to ρ. Since this image is exactly the tuple t, it follows that t ∈ ρ, and this gives the desired inclusion.
Conversely, suppose that H is a lossless decomposition of every table of the schema S, and let θH = (T, H, ρH) be the tableau of H. Since the tableau θH,F satisfies all functional dependencies of F, it follows that it has the lossless decomposition H. This immediately implies that θH,F contains a row that consists of distinguished symbols, since such a row belongs to pjH(ρH,F).
Theorem 7.5.21 Let S = (H, F) be a schema with functional dependencies. Consider a two-row tableau θU = (TU, H, {r, s}), where r[A] = s[A] = dA for all A ∈ U and r[A] ≠ s[A] if A ∉ U. We have F ⊨ U → V if and only if r'[V] = s'[V], where r', s' are the rows of the tableau θUF obtained by applying the Chase Algorithm to θU.
Proof. Observe that if A ∈ H − U, then at least one of the symbols r[A], s[A] must be nondistinguished. Since θUF satisfies all functional dependencies of F, if F ⊨ U → V, then θUF satisfies U → V. The rows r', s' of θUF are equal on U because the Chase Algorithm does not affect distinguished symbols. This implies r'[V] = s'[V].
Conversely, suppose that r'[V] = s'[V]. Let τ = (T, H, ρ) be a table that satisfies the functional dependencies of F, and let t, w be two rows of ρ such that t[U] = w[U]. Let v be a valuation that maps the rows r, s of θU to the rows t and w, respectively. Theorem 7.5.19 implies that v({r', s'}) = v({r, s}) ⊆ ρ, and this, in turn, gives t[V] = w[V]. Therefore, every table that satisfies all functional dependencies of F also satisfies U → V, which means that F ⊨ U → V.
Example 7.5.22 In Example 7.2.25, we examined the closure generated by the set of functional dependencies F = {AB → C, CD → E, AE → B} on the set of attributes H = ABCDE. Since clF(AE) = AEBC, F ⊨ AE → C. Consider now the tableau θAE given by

TAE  A    B    C    D    E
     dA   nB1  nC2  nD3  dE
     dA   nB4  nC5  nD6  dE

Applying the Chase Algorithm to θAE gives the sequence of tableaux shown in Figure 7.6. The Chase Algorithm stops with a tableau containing two rows whose C-components are the same. Therefore, F ⊨ AE → C.
Using the previous developments, we can give a different argument for Theorem 7.2.22. Suppose that S = (H, F) is a table schema and that
U = A1···Am B1···Bn,  V = B1···Bn C1···Cp
TAE  A    B    C    D    E
     dA   nB1  nC2  nD3  dE
     dA   nB4  nC5  nD6  dE

reducing the violation of AE → B yields:

T1   A    B    C    D    E
     dA   nB1  nC2  nD3  dE
     dA   nB1  nC5  nD6  dE

reducing the violation of AB → C yields:

T2   A    B    C    D    E
     dA   nB1  nC2  nD3  dE
     dA   nB1  nC2  nD6  dE

Figure 7.6: Chasing the tableau θAE
are subsets of H such that U ∪ V = H. Also, suppose that H = (U, V) is a lossless decomposition of every table of the schema S. In this case, starting from the tableau θH, the Chase Algorithm yields a tableau θH,F that contains a row of distinguished symbols. Suppose, for instance, that the first row of θH,F consists of distinguished symbols (see Figure 7.7). Then, the two rows of the tableau θH,F coincide on all attributes of V − U. If we start from the tableau θU∩V, exactly the same reductions of violations of functional dependencies yield a tableau whose rows coincide on V − U and, therefore, coincide on V. By Theorem 7.5.21, this implies F ⊨ U ∩ V → V.
Similarly, if U ∩ V → U ∈ F+, the Chase Algorithm applied to the tableau θU∩V generates the tableau (θU∩V)F. For the rows r, s of (θU∩V)F we have r[U − V] = s[U − V]; therefore, we obtain r[U] = s[U]. The same algorithm, using the same reductions of violations, applied to the tableau θH gives a tableau θH,F that consists of two rows r', s' such that r'[U] = s'[U]. This implies that θH,F contains a row of distinguished symbols; that is, (U, V) is a lossless decomposition of any table of S.

            U − V                 U ∩ V               V − U
            A1   ···  Am          B1   ···  Bn        C1   ···  Cp
θH:         dA1  ···  dAm         dB1  ···  dBn       n1   ···  np
            np+1 ···  np+m        dB1  ···  dBn       dC1  ···  dCp

θH,F:       dA1  ···  dAm         dB1  ···  dBn       dC1  ···  dCp
            np+1 ···  np+m        dB1  ···  dBn       dC1  ···  dCp

Figure 7.7: Chasing the tableau θH for H = (U, V)
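The lossless-decomposition test of Theorem 7.5.20 can be prototyped end to end: build the tableau that describes the decomposition, chase it, and look for a row consisting entirely of distinguished symbols. A sketch with string-encoded symbols (all names are ours):

```python
def lossless(attrs, decomposition, fds):
    """Theorem 7.5.20 test: chase the tableau describing a decomposition
    and look for a row made entirely of distinguished symbols.
    Symbols are strings: 'd:A' (distinguished, minimal in the symbol
    order) or 'n:i:A' (nondistinguished, from row i)."""
    cols = sorted(attrs)
    rows = {tuple('d:' + a if a in h else 'n:%d:%s' % (i, a) for a in cols)
            for i, h in enumerate(decomposition)}
    while True:
        viol = next(((u, v, cols.index(rhs)) for lhs, rhs in fds
                     for u in rows for v in rows
                     if all(u[cols.index(a)] == v[cols.index(a)] for a in lhs)
                     and u[cols.index(rhs)] != v[cols.index(rhs)]), None)
        if viol is None:
            break
        u, v, A = viol
        keep, drop = sorted((u[A], v[A]))
        # reduce the violation: the larger symbol disappears everywhere
        rows = {tuple(keep if s == drop else s for s in r) for r in rows}
    return any(all(s.startswith('d:') for s in row) for row in rows)

# Example 7.4.18: H = (H2, H3) with F already in canonical form
H = [{'cno', 'sem', 'year', 'empno'},
     {'stno', 'cno', 'sem', 'year', 'grade'}]
F = [({'cno', 'sem', 'year'}, 'empno'),
     ({'stno', 'cno', 'sem', 'year'}, 'grade')]
```

For the schema of Example 7.4.18, a single reduction of the violation of cno sem year → empno already turns the first row into a row of distinguished symbols, so the test succeeds.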
7.6
Exercises
1. Prove Theorem 7.2.6.
2. Let S = (H, F) be a database schema. Prove that for X, X' ⊆ H,
X ⊆ X' implies that every set of CSF(X) is included in some set of CSF(X').
Hint. Assume CSF(X) = (X0, X1, ...) and CSF(X') = (X'0, X'1, ...). Then use induction on i to show that for every Xi there exists an index ji such that Xi ⊆ X'ji.
3. Find all keys for the table schema S = (ABCD, F), where F is a set of functional dependencies given by:
(a) F = {AB → C, C → D, D → BC}
(b) F = {AB → C, B → D}
(c) F = {ABC → D, ABD → C, ACD → B}
(d) F = {A → B, B → C, C → D, D → A}
(e) F = {A → B, BC → AD}
4. (a) Prove that φ is a trivial functional dependency on H if and only if ∅ ⊢ φ.
(b) Show that any ∅-proof of a trivial functional dependency U → V must use Rincl at least once. Conclude that the inclusion rule is independent of Raug and Rtrans; in other words, we cannot replace the use of Rincl by uses of Raug and Rtrans.
5. Let H be a finite set of attributes.
(a) If U → V ∈ FD(H) is obtained by applying Raug to X → Y ∈ FD(H), then show that V − U ⊆ Y − X and U − V ⊆ X − Y.
(b) If U → W ∈ FD(H) is obtained by applying Rtrans to U → V and V → W, then show that W − U ⊆ (V − U) ∪ (W − V).
(c) Let F = {X → Y} be a set that consists of a single functional dependency. Prove that F ⊢ U → V if and only if U → V is trivial, or X ⊆ U and V − U ⊆ Y − X.
Solution. The verification of the first two parts is a simple set-theoretical exercise and is left to the reader. We begin by showing the sufficiency of the condition of the third part. If X ⊆ U and V − U ⊆ Y − X, we have the trivial functional dependencies U → X and YX → V − U; together with X → Y, they give the following F-proof of U → V:
1. U → X (trivial functional dependency)
2. X → Y (initial functional dependency)
3. U → Y (Rtrans and (1), (2))
4. U → YX (Raug and (3), augmenting by X; note that UX = U)
5. YX → V − U (trivial functional dependency)
6. U → V − U (Rtrans and (4), (5))
7. U → V (Raug and (6), augmenting by U ∩ V)
so F ⊢ U → V. The necessity of the condition can be shown by induction on the length n ≥ 1 of the F-proof of the functional dependency U → V
by using the first two parts.
6. Consider the schema S = (ABCD, {AB → D, CD → A}).
(a) Prove that every attribute of S is prime.
(b) Prove that D is not a prime attribute in S[ABD]. Conclude that a prime attribute of a schema is not necessarily prime in a projection of the schema.
7. Prove that the set of rules that consists of Rincl, Radd, and Rtrans is complete.
Hint. Show that an application of Raug can be replaced by a proof that makes use of the rules mentioned above.
8. Consider the reflexivity rule
Rrefl: X → X
and the amplified transitivity rule
Ramptrans: X → Y, Y → Z / X → YZ,
which hold for every X, Y, Z ⊆ H. Prove that the set that consists of Rrefl, Ramptrans, and Rproj is sound and complete.
9. Let H be a set of attributes, and assume that φ1, ..., φn / φ is a rule of inference R that is not sound, where φ1, ..., φn, φ ∈ FD(H). Prove that there exists a table τ = (T, H, ρ) that is a counterexample to this rule such that |ρ| = 2.
10. Let S = (H, F) be a table schema. Prove that if F ⊨ X → Y, then H − (Y − X) is a candidate key for S.
Solution. Let τ = (T, H, ρ) be a table from SAT(S), and let t, s be two tuples such that t[H − (Y − X)] = s[H − (Y − X)]. Note that X ⊆ H − (Y − X), so t[X] = s[X]. Therefore, t[Y] = s[Y], and this, together with t[H − (Y − X)] = s[H − (Y − X)], implies t = s.
11. Prove that if φ0, ..., φn−1 = φ is an {X → Y}-proof of a nontrivial functional dependency U → V that uses only Rincl and Raug, then XY ⊆ UV. Conclude that Rtrans is independent of Rincl and Raug.
12. Prove that if φ0, ..., φn−1 = φ is an {X → Y}-proof of a nontrivial functional dependency U → V that uses only Rincl and Rtrans, then UV ⊆ XY. Conclude that Raug is independent of Rincl and Rtrans.
13. Prove that if F is a set of functional dependencies, F ⊆ FD(H), then clF(X') ⊆ clF(X) if and only if X → X' ∈ F+, for every X, X' ⊆ H.
14. Consider the schema S = (A1 Am , {A1 A2 , A2 A3 , . . . , Am1 Am }). (a) Prove that A1 is the unique key of S. (b) Show that the length of the sequence CSF (A1 ) is m. 15. Let S = (H, F ) be a table schema. (a) Prove that for every subset X of H the length of the sequence CSF (X ) does not exceed |H |. Conclude that Algorithm 7.2.24 takes O(|H |2 |F |) time. (b) Modify Algorithm 7.2.24 to avoid repeated use of functional dependencies whose right-hand member has been already added to a set of CSF (X ). Show that it is possible to compute clF (X ) in O(|H ||F |) time. 16. Prove that the schema with functional dependencies S = (H, F ) has a unique key if and only if H {Yi Xi | 1 i n} is a candidate key, where F = {Xi Yi | 1 i n}. Solution. Suppose that S has a unique key K . Then, by Exercise 10 we have K H (Yi Xi ) for 1 i n. This implies K {H (Yi Xi ) | 1 i n} = H {Yi Xi | 1 i n},
so H − ∪{Yi − Xi | 1 ≤ i ≤ n} is a candidate key for S.
Conversely, suppose that H − ∪{Yi − Xi | 1 ≤ i ≤ n} is a candidate key for S, and let L be an arbitrary candidate key for S. Suppose that there exists A ∈ H − ∪{Yi − Xi | 1 ≤ i ≤ n} such that A ∉ L. Note that there is no functional dependency Xi → Yi in F such that A ∈ Yi − Xi. Therefore, if we compute clF(L), it is impossible to add A to any set Xk in the sequence CSF(L) if A is not already in it. Consequently, clF(L) ⊆ H − {A}, so L cannot be a candidate key. This means that H − ∪{Yi − Xi | 1 ≤ i ≤ n} is included in any candidate key and, therefore, in any key of S; this implies that H − ∪{Yi − Xi | 1 ≤ i ≤ n} is the unique key of S.

17. Let F ⊆ FD(H), and let X → Y ∈ FD(H). Prove that F − {X → Y} ≡ F if and only if Y ⊆ clF−{X→Y}(X).

18. Let H = ABCDE be a set of attributes. For each of the following, either construct F-proofs of the functional dependency φ, or explain
why this is not possible.

        Set of Functional Dependencies F        φ
    (a) A → B, B → C                            A → C
    (b) A → B, B → C                            C → A
    (c) AB → C, D → C, AE → BD                  AE → BC
    (d) ABC → D, A → BD, CD → E                 BC → AE
    (e) B → C, D → E, A → BD, CE → B            A → B
19. Let S = (H, F) be a table schema. Prove that:
(a) for every subset Z of H we have clF(Z) ⊆ Z ∪ ∪{Y | X → Y ∈ F};
(b) if F consists of functional dependencies of the form X → A, where X ⊆ H and A ∈ H, then, for every subset Z of H, A ∈ clF(Z) − Z implies the existence of a functional dependency X → A in F such that X ⊆ clF(Z).
Solution. Let Z = Z0, Z1, . . . be the sequence CSF(Z). It suffices to prove (by induction on k ≥ 0) that Zk ⊆ Z ∪ ∪{Y | X → Y ∈ F} for every k ∈ N to obtain the first statement. For the second part, observe that if A ∈ clF(Z) − Z, then A ∈ Zj for some j ≥ 1. This implies the existence of a functional dependency X → A ∈ F such that X ⊆ Zj−1, so X ⊆ clF(Z).

20. Let θ = (T, ABC, ρ) be the table

        T
        A    B    C
        a0   b0   c1
        a1   b0   c0
        a0   b1   c1

and let θi = (Ti, ABC, ρi) (for 1 ≤ i ≤ 4) be tableaux over ABC whose rows combine the distinguished variables dA, dB, dC with nondistinguished variables such as nA0, nB0, nC0. Compute the tables θi(θ) for 1 ≤ i ≤ 4.

21. Let S = (H, F) be a table schema, H = (H1, . . . , Hn) be a decomposition of H, and τH = (T, H, ρH) be a tableau that describes H.
Prove that pjH(τH) contains a tuple of distinguished variables.

22. Prove that L is a candidate key for the table schema S = (H, F) if and only if the tableau obtained from τL by applying the Chase Algorithm relative to F has only one tuple.

23. Let H = (ABC, BCD, AE) be a decomposition of the table schema S = (ABCDE, F). Using tableaux, identify at least two distinct sets of functional dependencies such that H is a lossless decomposition of every table of SAT(S).

24. Let θ be the tableau

        T
        A     B     C     D
        dA    nB0   dC    nD0
        dA    nB1   nC0   dD
        nA1   dB    dC    nD0

Apply the Chase Algorithm to θ and to the set of functional dependencies F = {A → B, CD → B, B → D}. Verify that the order in which violations of the functional dependencies of F are reduced does not influence the final result.

25. Prove that if f, g are containment mappings between the tableaux θ, θ′ and θ′, θ″, then gf is a containment mapping between the tableaux θ and θ″, where gf(s) = g(f(s)) for every symbol s.

26. Using the Chase Algorithm, determine whether the functional dependency X → Y is a logical consequence of the set F of functional dependencies, where X → Y and F are as in Exercise 18.

27. Let S = (ABCDE, F) be a table schema, and let H1, . . . , Hk be a collection of subsets of ABCDE such that ∪{Hi | 1 ≤ i ≤ k} = ABCDE. Verify whether the following decompositions are lossless for every table of S using the Chase Algorithm.

        Set of Functional Dependencies F        H1, . . . , Hk
        A → B, B → C                            AC, BCDE
        AB → C, D → C, AE → BD                  AB, BC, CDE
        ABC → D, A → BD, CD → E                 ABC, CDE
        B → C, D → E, A → BD, CE → B            AB, BC, CD, DE
        AB → D, DE → B, A → C, BC → E           ABE, ABCD
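The lossless-decomposition tests of Exercises 24–27 can be mechanized. Below is a minimal Python sketch of the standard chase test for losslessness (the function name, the encoding of tableau symbols as tuples, and the representation of dependencies as pairs of attribute strings are our own conventions, not the book's): build the tableau of the decomposition, chase it with the functional dependencies, and accept if some row becomes fully distinguished.

```python
from itertools import combinations

def chase_lossless(H, F, decomposition):
    """Chase test: is the decomposition lossless for every table of (H, F)?"""
    # Row i carries the distinguished symbol ('d', A) for A in H_i and a
    # nondistinguished symbol ('n', A, i) elsewhere.
    rows = [{A: ('d', A) if A in Hi else ('n', A, i) for A in H}
            for i, Hi in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for X, Y in F:                       # each functional dependency X -> Y
            for r, s in combinations(rows, 2):
                if all(r[A] == s[A] for A in X):
                    for B in Y:
                        if r[B] != s[B]:
                            # Equate the two symbols everywhere, preferring
                            # the distinguished one (('d', ..) < ('n', ..)).
                            keep, drop = min(r[B], s[B]), max(r[B], s[B])
                            for t in rows:
                                if t[B] == drop:
                                    t[B] = keep
                            changed = True
    return any(all(row[A] == ('d', A) for A in H) for row in rows)
```

For example, the decomposition (AB, AC) of (ABC, {A → B}) chases to a fully distinguished row, while the first decomposition listed in Exercise 27 does not.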
7.7
Bibliographical Comments
Chapter 8
Normalization
8.1 Introduction
8.2 Normal Forms
8.3 Normalization
8.4 Exercises
8.5 Bibliographical Comments
8.1
Introduction
The design of our continuing example, the college database, has a number of problems. For instance, there are updates that cannot be applied to the table GRADES. At present, this table is the only place where we can record the teaching assignments of the faculty. However, if we decide to store these assignments before the semester begins (that is, before students register for courses), this table will be unable to accommodate these data, because any tuple inserted into this table must have a non-null stno component. This is known as an insertion anomaly. Similarly, if the last student in a course decides to withdraw, we must delete the corresponding tuple from the GRADES table, and we lose any trace of the assignment of the instructor to the course. This is known as a deletion anomaly. Another kind of problem arises when we update GRADES. If we change the instructor who teaches a course to some other instructor, we need to update all records that refer to that offering of the course. This may involve many tuples, and if a crash occurs while this update is being performed, the data will be left inconsistent: some records will show the new instructor while others will show the previous one. This is known as an update anomaly.
Note also that GRADES contains redundant data: we repeat the instructor's identification (empno) in each grade record. This, and other problems, could easily be eliminated if we separated teaching assignments from grades. In this chapter we provide the foundations necessary to resolve these kinds of problems. We then articulate the problems encountered with table design choices and offer solutions that eliminate or alleviate them.
8.2
Normal Forms
A normal form is a restriction placed on a table schema. Several normal forms are identified in the literature. The goal of using them is to ensure minimal redundancy and maximal consistency of the data in the tables that satisfy the restrictions of the database schemas. Starting with an unruly table schema S that violates some normal form (other than the first normal form defined below), we replace S with a number of smaller table schemas S1, . . . , Sp that are informationally equivalent to S and conform to the requirements of the normal form. This process, known as normalization by decomposition, is described in detail in Section 8.3. The normalization by decomposition of S generates a lossless decomposition of any table of S into a collection of tables θ1, . . . , θp, where θi ∈ SAT(Si) for 1 ≤ i ≤ p. Imposing normal forms is not without cost: using decomposed tables that satisfy normal forms increases the cost of computing those queries that require us to reconstruct the original, larger table. The common normal forms comprise a hierarchy (see Figure 8.1). We can trade off the advantages of using normal forms against performance by choosing how far to proceed with the normalization process.
8.2.1
The First Normal Form
The most general normal form for table schemas of relational databases is the first normal form. To introduce this normal form we recast Example 6.2.9. We could define a more compact way of storing the contents of the table REC BOOKS. Instead of the schema S = (cno empno author title, {cno → empno}) we could consider the schema S′ = (cno empno author-title, {cno → empno}).
In other words, the domain of the new attribute empno consists of sets of values of empno, and the domain of author-title consists of sets of pairs of values. Suppose that the table REC BOOKS of S contains the tuples:
        REC BOOKS
        cno     empno   author   title
        cs310   019     Cooper   Oh! Pascal!
        cs310   023     Dale     Pascal Plus Data Structures
        cs310   019     Dale     Pascal Plus Data Structures
        cs310   023     Cooper   Oh! Pascal!

and the corresponding table of S′ contains the single tuple:

        REC BOOKS
        cno     empno        author-title
        cs310   {019, 023}   { (Cooper, Oh! Pascal!), (Dale, Pascal Plus Data Structures) }
This is an interesting idea from the point of view of decreasing redundancy and improving storage usage. Unfortunately, at the current stage of development of computing, the cost of manipulating tables whose entries are sets of values rather than simple, atomic values is still prohibitive. The inherent cost of testing equality between two sets of integers, for example, exceeds by far the cost of testing the equality of two integers. Such considerations lead us to insist that the entries of the tables be atomic. We use the term atomic as a primary, undefined notion. Informally, we regard a value as atomic if nothing in the operation of the database requires that we take it apart and deal with the smaller individual pieces that constitute this value.

Definition 8.2.1 A table schema S = (H, Γ) is in first normal form (1NF) if for every A ∈ H, Dom(A) consists of atomic values. A database schema S = (S1, . . . , Sn) is in 1NF if every schema Si is in 1NF for 1 ≤ i ≤ n.

All table schemas that we have dealt with (with the exception of the schema S′ considered in this section) are in first normal form. Unless we state otherwise, we assume that all table schemas are in 1NF.
8.2.2
We now consider a number of normal forms that are more restrictive than 1NF. Their definition involves functional dependencies. Recall that a set of attributes X ⊆ H is a candidate key of a schema S = (H, Γ) if X contains a key of S.
Figure 8.1: Hierarchy of the Normal Forms

Definition 8.2.2 A table schema S = (H, Γ) is in Boyce-Codd normal form (BCNF) if for every nontrivial functional dependency X → A in Γ+, where X ⊆ H and A ∈ H, X is a candidate key.
S is in third normal form (3NF) if for every nontrivial functional dependency X → A in Γ+, where X ⊆ H and A ∈ H, either X is a candidate key, or A is a prime attribute.
S is in second normal form (2NF) if for every nontrivial functional dependency X → A in Γ+, where X ⊆ H and A ∈ H, either X is a candidate key, or A is a prime attribute, or X is not a proper subset of any key of S.
A database schema S = (S1, . . . , Sn) is in 2NF, 3NF, or BCNF if every schema Si is in 2NF, 3NF, or BCNF, respectively, for 1 ≤ i ≤ n.

The most restrictive of the normal forms of Definition 8.2.2 is the Boyce-Codd normal form. As we move from BCNF to 3NF, 2NF, and 1NF, the demands imposed by the normal forms become easier to satisfy.

Example 8.2.3 Let S0 = (ABC, F0) be a table schema, where F0 = {A → B}. The single key of this schema is AC, so S0 is not in second normal form because A is not a candidate key, B is not a prime attribute, and A is a proper subset of the key AC.

Example 8.2.4 Consider the schema S1 = (ABC, {A → B, B → C}). Note that S1 has A as its unique key. The functional dependency B → C violates third normal form because B is not a candidate key and C is not
a prime attribute. However, it is easy to see that S1 is in second normal form.

Example 8.2.5 Consider the table schema S2 = (ABC, F2), where F2 = {AB → C, C → B}. Note that this schema has two keys, namely AB and AC. Since all its attributes are prime, S2 is in third normal form. However, it is not in BCNF since C is not a candidate key.

Example 8.2.6 The table schema S3 = (ABC, F3), given by F3 = {A → B, A → C}, is in BCNF. Indeed, for this schema A is the single key, and for every nontrivial functional dependency X → Y of F3+, X contains the key A.

We examine the impact of imposing a normal form on a table schema by decomposing the table schema into smaller ones that both convey the same information as the original schema and obey the normal form.

Example 8.2.7 Let θ0 = (T0, ABC, ρ0) be the table given by
        T0
        A    B    C
        a1   b1   c1
        a1   b1   c2
        a1   b1   c3
        a2   b2   c4
        a2   b2   c5
        a2   b2   c6
        a3   b3   c7
It is easy to verify that θ0 ∈ SAT(S0), where S0 is the table schema introduced in Example 8.2.3. This table displays some of the problems mentioned in Section 4.1: redundancy and various anomalies. Using the functional dependency A → B and Theorem 6.3.13, we could replace θ0 by its projections θ0[AB] and θ0[AC], which form a lossless decomposition of θ0:
        T0[AB]          T0[AC]
        A    B          A    C
        a1   b1         a1   c1
        a2   b2         a1   c2
        a3   b3         a1   c3
                        a2   c4
                        a2   c5
                        a2   c6
                        a3   c7
Observe that while the B-values that correspond to the same A-value are repeated in T0, the projection T0[AB] contains only three pairs: the essential part of the association between the A-values and the B-values contained in T0. Thus, the redundancy of T0 is eliminated by this decomposition; instead of the 14 values that occur in the A and B columns of T0, we need
store only six values.
Update anomalies occur in T0. If we modify the next-to-last tuple of T0 to be (a2, b1, c6), the functional dependency A → B is violated. Since T0 has the unique key AC, we can make no insertion into T0 unless both the value of the A-component and that of the C-component are defined, which shows that T0 has insertion anomalies. Finally, suppose that we intend to delete all tuples whose C-component is c7; in θ0 this deletes the tuple (a3, b3, c7). This removes any trace of the fact that the B-value associated with a3 is b3 (a deletion anomaly). However, replacing T0 by T0[AB] and T0[AC] gives us a place to store the pair (a3, b3), even if this pair is not associated with c7. Thus, we see that in addition to a small decrease in required storage (which would be far more significant if the tables were larger) we gain control of the data in the tables, and we can record information that we otherwise could not.

Example 8.2.8 Consider the table θ1 = (T1, ABC, ρ1) ∈ SAT(S1) defined by
        T1
        A    B    C
        a1   b1   c1
        a2   b1   c1
        a3   b1   c1
        a4   b2   c2
        a5   b2   c2
        a6   b2   c2
where S1 is the table schema introduced in Example 8.2.4. This second normal form table may still contain redundant data. Indeed, using Theorem 6.3.13 and the functional dependency B → C, we can decompose θ1 into
        T1[BC]          T1[AB]
        B    C          A    B
        b1   c1         a1   b1
        b2   c2         a2   b1
                        a3   b1
                        a4   b2
                        a5   b2
                        a6   b2
The twelve data entries that occur in the columns B, C of T1 are replaced with four data items. Update anomalies may also occur: we cannot insert a tuple that associates a B-value with a C-value in T1 unless the corresponding A-component is not null.

Example 8.2.9 The table θ2 = (T2, ABC, ρ2) defined by
        T2
        A    B    C
        a1   b1   c1
        a1   b2   c2
        a2   b1   c1
        a2   b2   c2
belongs to SAT(S2), where S2 is the schema defined in Example 8.2.5. If we decompose the table using the functional dependency C → B, we obtain
        T2[BC]          T2[AC]
        B    C          A    C
        b1   c1         a1   c1
        b2   c2         a1   c2
                        a2   c1
                        a2   c2
The size of the projection T2[BC] shows that the relationship between B and C can be expressed in two tuples; it is redundant to spread it over four tuples in T2.

Example 8.2.10 Let θ3 = (T3, ABC, ρ3) ∈ SAT(S3) be the table
        T3
        A    B    C
        a1   b1   c1
        a2   b2   c2
        a3   b3   c3
The schema S3 is the one introduced in Example 8.2.6. Using the functional dependency A → B, we can decompose θ3 as
        T3[AB]          T3[AC]
        A    B          A    C
        a1   b1         a1   c1
        a2   b2         a2   c2
        a3   b3         a3   c3
The table θ3 does not contain redundant data, and the potential update anomalies that may occur in S3 are eliminated by replacing the table θ3 with θ3[AB] and θ3[AC].
It often helps to have alternate methods for recognizing when a schema is in a specific normal form. Thus, we present several characterizations of normal forms.

Theorem 8.2.11 Every two-attribute schema S = (AB, F) is in BCNF.
Proof. The nontrivial functional dependencies on AB are A → B and B → A. Several cases may occur:
1. If F contains both A → B and B → A, then both A and B are keys, and the BCNF requirement is satisfied.
2. If F contains only A → B, then A is a key, so again the BCNF requirement is satisfied. The same holds when F contains only B → A.
3. The case where F = ∅ also satisfies the requirements of BCNF.
Therefore, we conclude that S is in BCNF.
The next theorem contains a characterization of the second normal form.

Theorem 8.2.12 A table schema S = (H, F) is in second normal form if and only if, for every nonprime attribute A and every key K of S, the functional dependency K → A ∈ F+ is F-reduced.
Proof. Let S = (H, F) be a table schema in second normal form, and let K → A be a functional dependency, where K is a key and A is a nonprime attribute. Suppose that K → A is not F-reduced. Then there exists a proper subset K′ of K such that K′ → A ∈ F+, and this contradicts the definition of the second normal form.
Conversely, let S = (H, F) be a schema that satisfies the condition of the theorem, and let X → A be a functional dependency from F+. Suppose that X is not a candidate key and A is not a prime attribute. If X were a proper subset of a key K, then the functional dependency K → A would not be F-reduced. Therefore, S is in second normal form.

To give an additional characterization of schemas in third normal form we need the following definition:

Definition 8.2.13 Let S = (H, F) be a schema with functional dependencies. An attribute A is transitively dependent on a set X of attributes, where X ⊆ H and A ∈ H, if F+ contains the nontrivial functional dependencies X → Y and Y → A but does not contain Y → X. Note that in this case X → A ∈ F+ by Rtrans. A depends directly on X if X → A ∈ F+ and A is not transitively dependent on X.

Example 8.2.14 Let us expand, for the sake of this example, the schema Sstudents = (stno name addr city state zip, Fstudents) introduced in Example 6.2.14 by adding the attribute mayor, which gives the mayor of the city or town where the student lives. Since every city or town has exactly one mayor (for the purpose of this example we arbitrarily exclude any other form of city or town government), we should add city state → mayor to the set Fstudents. We obtain the schema
S′students = (stno name addr city state zip mayor, F′students), where F′students consists of the following dependencies:
stno → name addr city state zip
zip → city state
city state → mayor
Observe that the attribute mayor is transitively dependent on zip because zip → city state ∈ F′students, city state → zip ∉ (F′students)+, and city state → mayor ∈ F′students.
Using Definition 8.2.13 we have the following characterization of schemas in third normal form.

Theorem 8.2.15 The schema with functional dependencies S = (H, F) is in third normal form if and only if for every key K of S and every nonprime attribute A of H, A depends directly on K.
Proof. Suppose that S is in third normal form and that there is a nonprime attribute A that depends transitively on a key K. This means that there exists Y such that the nontrivial dependency Y → A belongs to F+ and Y → K ∉ F+. This implies that Y is not a candidate key, and therefore Y → A violates the third normal form. So A must depend directly on K.
Conversely, let S = (H, F) be a schema that satisfies the condition of the theorem, and let X → A be a functional dependency in F+. If X is not a candidate key and A is a nonprime attribute, then for any key K we have K → X ∈ F+. Observe that this means that A depends transitively on K, because X → K ∉ F+ (otherwise, X would be a candidate key). Thus, no such X and A exist, so S is in third normal form.

Actually, a condition that appears weaker because it refers only to a single key rather than to all keys also characterizes schemas in 3NF and so is, in fact, equivalent to the condition above.

Theorem 8.2.16 The schema with functional dependencies S = (H, F) is in third normal form if and only if there exists a key K of S such that for every nonprime attribute A of H, A depends directly on K.
Proof. Clearly, Theorem 8.2.15 implies that this condition is necessary for a schema to be in 3NF. To prove that it is sufficient, let S be a schema for which there is a key K such that every nonprime attribute depends directly on K. Suppose that K′ is another key of S and that an attribute A depends transitively on K′. In this case, there exists a set of attributes Y such that Y → A is a nontrivial dependency in F+, K′ → Y ∈ F+, and Y → K′ ∉ F+. Observe that we also have Y → K ∉ F+ since, otherwise, the fact that K is a key would imply Y → K′ ∈ F+.
Since K → Y ∈ F+ (because K is a key), this means that A depends transitively on K, which contradicts our initial assumption.

If we drop the condition that requires the attributes to be nonprime, we obtain a characterization of schemas in Boyce-Codd normal form.

Theorem 8.2.17 The schema with functional dependencies S = (H, F) is in Boyce-Codd normal form if and only if for every key K of S and every attribute A, A depends directly on K.
Proof. The argument is straightforward, and it is left to the reader.
Observe that the normal forms considered in this section tend to replace arbitrary functional dependencies with functional dependencies involving candidate keys. An advantage of this modification of table schemas is that current database management systems have facilities that allow the user to state and enforce the fact that a set of attributes is a candidate key for a table. These facilities are discussed in Section 6.4.
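The conditions of Definition 8.2.2 can be checked mechanically via attribute closures. The following Python sketch (the function names and the representation of F as a list of pairs of attribute strings are our own conventions) computes clF(X), enumerates the minimal keys by brute force, and tests the BCNF and 3NF conditions; it is exponential in |H| and intended only for small examples.

```python
from itertools import combinations

def closure(X, F):
    """cl_F(X): all attributes determined by X under F."""
    X = set(X)
    changed = True
    while changed:
        changed = False
        for L, R in F:
            if set(L) <= X and not set(R) <= X:
                X |= set(R)
                changed = True
    return X

def keys(H, F):
    """All minimal keys of the schema (H, F), by brute-force enumeration."""
    H = set(H)
    candidates = [set(c) for n in range(1, len(H) + 1)
                  for c in combinations(sorted(H), n)
                  if closure(c, F) == H]
    return [K for K in candidates if not any(K2 < K for K2 in candidates)]

def is_bcnf(H, F):
    """Every nontrivial X -> A must have a candidate key as left side."""
    ks = keys(H, F)
    for n in range(1, len(H)):
        for X in map(set, combinations(sorted(H), n)):
            if closure(X, F) - X and not any(K <= X for K in ks):
                return False
    return True

def is_3nf(H, F):
    """As BCNF, but a prime right side A is also allowed."""
    ks = keys(H, F)
    prime = set().union(*ks) if ks else set()
    for n in range(1, len(H)):
        for X in map(set, combinations(sorted(H), n)):
            if any(K <= X for K in ks):
                continue                      # X is a candidate key
            for A in closure(X, F) - X:
                if A not in prime:
                    return False
    return True
```

On the running examples, is_bcnf("ABC", [("A", "B")]) is false (Example 8.2.3), is_3nf("ABC", [("AB", "C"), ("C", "B")]) is true while the schema is not in BCNF (Example 8.2.5), and is_bcnf("ABC", [("A", "B"), ("A", "C")]) is true (Example 8.2.6).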
8.3
Normalization
Normalization is the process of modifying the design of a database so that the resulting design satisfies one of the normal forms we have discussed. There are two basic ways to approach normalization: normalization by synthesis and normalization by decomposition. Normalization by synthesis starts from a table schema S = (H, F) and generates an equivalent set of functional dependencies G that satisfies certain conditions. Then, starting from G, table schemas are synthesized for each functional dependency in a manner that guarantees that each of the resulting schemas satisfies the desired normal form.
Normalization by decomposition is the more common method. Schemas are successively fragmented into smaller pieces by eliminating the functional dependencies that violate the targeted normal form, using Corollary 7.2.23. Ideally, the smaller schemas satisfy the following goals:
1. the decomposition is lossless,
2. the resulting schemas satisfy the desired normal form, and
3. the constraints satisfied by the smaller schemas obtained by decomposition are equivalent to the constraints satisfied by the initial schema.

Definition 8.3.1 Let S = (H, F) be a schema with functional dependencies, and let L be a subset of H. The projection of F on L is the set F[L] given by
F[L] = {X → Y | XY ⊆ L, X → Y ∈ F+}.
The projection of the schema S is the schema S[L] = (L, F[L]).

We stress that F[L] consists not merely of the functional dependencies of F that use only attributes of L; instead, it includes every functional dependency in F+ that has this property. Note that it is necessary to compute F+ before taking the projection, as can be seen from the following example.

Example 8.3.2 Let S = (ABC, F) be the relational schema introduced in Example 8.2.4, where F = {A → B, B → C}. Its projection on AC,
F[AC], contains the functional dependency A → C, because A → C is the single nontrivial functional dependency from {A → B, B → C}+ whose set of attributes is included in AC.

Let H = (H1, . . . , Hn) be a sequence of subsets of H such that H = H1 ∪ · · · ∪ Hn. The set F[H1] ∪ · · · ∪ F[Hn] is denoted by F[H]. It is easy to see that F[H] ⊆ (F[H])+ ⊆ F+. In some instances, these inclusions can be strict.

Example 8.3.3 Let H = ABCD, H = (ACD, BC), and F = {AB → D, B → C, C → B}. Since clF(AC) = ABCD, it follows that AC → D ∈ F+, so AC → D ∈ F[ACD]. It is clear that AB → D does not belong to F[H] = F[ACD] ∪ F[BC]; however, we have AB → D ∈ (F[H])+ because B → C ∈ F[BC], which implies AB → AC ∈ (F[H])+ by Raug. Since AC → D belongs to F[ACD], using Rtrans, we obtain AB → D ∈ (F[H])+. This proves that the inclusion F[H] ⊆ (F[H])+ is strict.

Definition 8.3.4 Let H = (H1, . . . , Hn) be a sequence of subsets of H such that ∪1≤i≤n Hi = H. The mapping projH preserves the set of functional dependencies F if (F[H])+ = F+.

To determine whether H preserves a set of functional dependencies F, we need to verify that for every functional dependency X → Y ∈ F we have X → Y ∈ (F[H])+, or equivalently, that Y ⊆ clF[H](X). In principle, this requires the computation of F+ to determine the projections F[Hi]. However, it is possible to avoid this expensive computation through the use of an algorithm introduced by Beeri and Honeyman [BH81].

Algorithm 8.3.5 Algorithm for Computing clF[H](X)
Input: A finite set of attributes H, a set F of functional dependencies over H, and a sequence of subsets H = (H1, . . . , Hn) of H such that ∪1≤i≤n Hi = H.
Output: The closure clF[H](X) of the set X.
Method: Construct an increasing sequence of subsets of H, X0 ⊆ X1 ⊆ · · · ⊆ Xk ⊆ · · ·, defined by X0 = X and
Xk+1 = Xk ∪ ∪1≤i≤n (clF(Xk ∩ Hi) ∩ Hi).
If Xk+1 = Xk, then stop; we have clF[H](X) = Xk. Otherwise, continue with the next value of k.
Proof of Correctness: The algorithm stops because X0 ⊆ X1 ⊆ X2 ⊆ · · · and each set Xk is a subset of the finite set H.
We claim that if the algorithm stops with Xk, then Xk ⊆ clF[H](X). To justify this claim we prove by induction on j that Xj ⊆ clF[H](X). For j = 0, X0 = X, and we have X ⊆ clF[H](X). Suppose that Xj ⊆ clF[H](X) and that A ∈ Xj+1 − Xj. Since Xj ⊆ clF[H](X), we have X → Xj ∈ (F[H])+. On the other hand, there is a set Hi such that A ∈ clF(Xj ∩ Hi) ∩ Hi. Therefore, the functional dependency Xj ∩ Hi → A belongs to F[Hi], so it belongs to F[H]. Since X → Xj ∩ Hi and Xj ∩ Hi → A belong to (F[H])+, it follows that A ∈ clF[H](X). This justifies our claim.
To prove the converse inclusion, clF[H](X) ⊆ Xk, where Xk is the set on which the algorithm stops, we need to show that if A is an attribute in clF[H](X), then A is included in some set Xj. Since every Xj ⊆ Xk, this implies clF[H](X) ⊆ Xk.
Let X′0, . . . , X′ℓ, . . . be the sequence CSF[H](X) constructed by Algorithm 7.2.24. If A ∈ clF[H](X), there exists a set X′ℓ such that A ∈ X′ℓ and A ∉ X′p for p < ℓ. We prove by induction on ℓ that there exists Xj such that A ∈ Xj. If ℓ = 0, then A ∈ X′0 = X = X0. Suppose that A ∈ X′ℓ. This means that there exists a set Hi and a functional dependency U → V ∈ F[Hi] such that U ⊆ X′ℓ−1 and A ∈ V. For every attribute B of U, by the inductive hypothesis, there exists a set XqB such that B ∈ XqB. Therefore, U ⊆ Xm, where m = max{qB | B ∈ U}. Consequently, during the iteration of the algorithm that begins with Xm, the attribute A is added to Xm, and we have A ∈ Xm+1.

Example 8.3.6 Let H = ABCD, H = (ACD, BC), and F = {AB → D, B → C, C → B}, as in Example 8.3.3. H preserves F if
D ∈ clF[H](AB), C ∈ clF[H](B), and B ∈ clF[H](C).
To compute clF[H](AB), we execute the steps shown below:

    X0 = AB
    X1 = ABC      clF(AB ∩ ACD) ∩ ACD = A         clF(AB ∩ BC) ∩ BC = BC
    X2 = ABCD     clF(ABC ∩ ACD) ∩ ACD = ACD      clF(ABC ∩ BC) ∩ BC = BC
Therefore, D ∈ clF[H](AB). Similar computations give clF[H](B) = BC and clF[H](C) = BC. This allows us to conclude that F is preserved under the decomposition H.

We give a couple of examples of normalization by decomposition before we give a decomposition algorithm. The first is completely straightforward; the second, though, shows one of the difficulties that can arise from such a decomposition: the resulting tables may no longer support the original constraints.

Example 8.3.7 Let S0 = (ABC, F0) be the relational schema introduced in Example 8.2.3; S0 is not in 2NF. However, if H0 = (AB, AC), the schemas S0[AB] = (AB, {A → B}) and S0[AC] = (AC, ∅) are both in BCNF. Further, by Theorem 6.3.13, the decomposition H0 is lossless for any table of the schema S0 and also preserves the set of functional dependencies F0.

Example 8.3.8 Consider the schema S1 = (ABC, {AB → C, C → B}) defined in Example 8.2.5. We show that any attempt to decompose S1 in order to achieve BCNF fails to preserve the functional dependencies. Since there are two functional dependencies, there are two possible lossless decompositions: H1 = (BC, AC), which results from the functional dependency C → B, and H2 = (ABC), which follows from the functional dependency AB → C. Since the second decomposition is trivial, we discuss only H1. The schemas S1[BC] and S1[AC] obtained using the decomposition H1 have two attributes, so they are in BCNF. However, only the first two goals of normalization are achieved; the third, preservation of constraints, fails. Indeed, we have F1[BC] = {C → B} and F1[AC] = ∅; however, AB → C does not belong to (F1[BC] ∪ F1[AC])+. Thus, if we replace a table θ ∈ SAT(S1) by its projections on BC and AC, we lose the capability of directly enforcing the functional dependency AB → C simply because the attributes that belong to it are separated into two distinct schemas.
We can still choose to enforce this functional dependency by checking for possible violations when we insert tuples into the tables; this, however, is beyond the capabilities of SQL and is typically accomplished using SQL embedded in a programming language.
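Algorithm 8.3.5 can be sketched directly from its recurrence: at each step, intersect the current set with each Hi, close the intersection under F, and keep the part that lies inside Hi. The following is a rough Python rendering (the function names and the representation of dependencies as pairs of attribute strings are ours):

```python
def closure(X, F):
    """cl_F(X), by repeated application of the dependencies in F."""
    X = set(X)
    changed = True
    while changed:
        changed = False
        for L, R in F:
            if set(L) <= X and not set(R) <= X:
                X |= set(R)
                changed = True
    return X

def closure_under_projection(X, F, decomposition):
    """Algorithm 8.3.5 (Beeri-Honeyman): cl_{F[H]}(X) without computing F+.
    Iterates X_{k+1} = X_k joined with cl_F(X_k & H_i) & H_i for every H_i."""
    Xk = set(X)
    while True:
        Xk1 = set(Xk)
        for Hi in map(set, decomposition):
            Xk1 |= closure(Xk & Hi, F) & Hi
        if Xk1 == Xk:
            return Xk
        Xk = Xk1

def preserves(F, decomposition):
    """The decomposition preserves F iff each X -> Y in F is implied by F[H]."""
    return all(set(Y) <= closure_under_projection(X, F, decomposition)
               for X, Y in F)
```

On Example 8.3.6, closure_under_projection("AB", F, ["ACD", "BC"]) yields ABCD, so AB → D is preserved; on Example 8.3.8, the corresponding test fails for AB → C.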
Algorithm 8.3.9 Normalization by Decomposition
Input: A finite set of attributes H, a relational schema S = (H, F), and a normal form (BCNF, 3NF, or 2NF). We refer to this normal form as the prescribed normal form.
Output: A lossless decomposition H = (H0, . . . , Hn) such that S[Hi] is in the prescribed normal form for 0 ≤ i ≤ n.
Method: If the input schema is in the prescribed normal form, then halt; we have H = (H). Otherwise, nondeterministically select a functional dependency X → Y in F that violates the prescribed normal form. Decompose S into S[XY] and S[XZ], where Z = H − XY. Apply the method again to S[XY] and S[XZ].
Proof of Correctness: The algorithm must stop because each of the schemas S[XY] and S[XZ] contains fewer attributes than S. Since every schema that has two attributes is in BCNF, if the decomposition does not achieve the prescribed normal form before we obtain schemas having two attributes or before the projection of the set of functional dependencies contains only trivial functional dependencies, it will have achieved it when one of these two alternatives eventually occurs.

While Algorithm 8.3.9 will generate a lossless decomposition, we cannot guarantee that the decomposition will preserve the functional dependencies. Example 8.3.8 shows that, in certain cases, no such decomposition exists. In other cases, depending on the choice of the functional dependencies used in the decomposition, we may obtain some decompositions that preserve the functional dependencies and others that do not.

Example 8.3.10 Let S = (H, F) be the table schema introduced in Example 7.3.7. We recall that H = stno cno empno sem year grade, and that F consists of the functional dependencies:
cno sem year → empno
stno cno sem year → grade
The functional dependency cno sem year → empno violates the 2NF requirements because cno sem year is not a candidate key, empno is not a prime attribute, and cno sem year is a proper subset of the key stno cno sem year.
Therefore, we can consider the decomposition H = (cno sem year empno, cno sem year stno grade). The resulting decomposition is in BCNF. Moreover, it preserves the functional dependencies of F because cno sem year → empno belongs to F[cno sem year empno] and stno cno sem year → grade belongs to the set F[stno cno sem year grade].
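The selection step of Algorithm 8.3.9 can be made concrete for BCNF, where a violation is simply a nontrivial dependency X → Y such that X is not a candidate key of the current (sub)schema. A rough Python sketch follows (our own naming and representation; it tests X against clF(X) instead of explicitly projecting F onto each subschema, a standard shortcut):

```python
from itertools import combinations

def closure(X, F):
    """cl_F(X), by repeated application of the dependencies in F."""
    X = set(X)
    changed = True
    while changed:
        changed = False
        for L, R in F:
            if set(L) <= X and not set(R) <= X:
                X |= set(R)
                changed = True
    return X

def bcnf_decompose(H, F):
    """Algorithm 8.3.9 specialized to BCNF: find X -> Y violating BCNF in
    the current schema, split it into XY and XZ (Z = H - XY), and recurse."""
    H = set(H)
    for n in range(1, len(H)):
        for X in map(set, combinations(sorted(H), n)):
            Y = (closure(X, F) & H) - X
            if Y and not H <= closure(X, F):
                # X -> Y is nontrivial and X is not a candidate key here.
                XY = X | Y
                XZ = X | (H - XY)
                return bcnf_decompose(XY, F) + bcnf_decompose(XZ, F)
    return [H]          # H is already in BCNF
```

On Example 8.2.3, bcnf_decompose("ABC", [("A", "B")]) produces the decomposition (AB, AC) of Example 8.3.7; on Example 8.2.5 it produces (BC, AC), the decomposition H1 of Example 8.3.8.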
Example 8.3.11 Consider the entity/relationship design of the database of the town library introduced in Example 1.2.8. The representation of the entity set BOOKS in the relational model is a table that belongs to the schema S defined by S = (isbn invno title authors publ place year, F), where F consists of the following functional dependencies:
title authors publ place year → isbn
isbn → title authors publ place year
invno → isbn
The attribute invno represents the inventory number. If the library may have several copies of a book (which we assume in this example), then invno is the unique key of this schema.
S is in second normal form; indeed, if we list the nontrivial functional dependencies X → A of F+, we have:
invno → isbn, invno → title, invno → authors, invno → publ, invno → year, invno → place,
isbn → title, isbn → authors, isbn → publ, isbn → place, isbn → year,
title authors publ place year → isbn.
The left member of each such functional dependency either is the key invno or is not included in any key of the schema, so the schema is in second normal form. However, the schema is not in third normal form; for example, in the functional dependency title authors publ place year → isbn, the left member title authors publ place year is not a candidate key, and isbn is not a prime attribute. This schema can easily be normalized by decomposition using this functional dependency: replacing S with S[title authors publ place year isbn] and S[title authors publ place year invno] yields two schemas that are both in BCNF.

The basis for normalization by synthesis is Theorem 7.4.16, which proves that a schema with functional dependencies S = (H, F) has a lossless decomposition based on a key K and a canonical form of F. The next theorem extends this result and clarifies its significance.

Theorem 8.3.12 Let S = (H, F) be a table schema with functional dependencies, and let G be a canonical form of F. If K is a key of S, and G = {X1 → A1, . . . , Xn → An}, then H = (X1A1, . . .
, Xn An , K ) is a lossless decomposition of S that preserves the functional dependencies of F . Further, the schemas S[X1 A1 ], . . . , S[Xn An ], S[K ] are all in third normal form.
Proof. Theorem 7.4.16 shows that H is a lossless decomposition of S. Since G is a cover of F, the functional dependencies are obviously preserved. To verify that the schemas S[Xi Ai] are in third normal form (for 1 ≤ i ≤ n), consider a functional dependency Y → B ∈ F[Xi Ai]. Note that Xi is a key for S[Xi Ai] because Xi → Ai ∈ F[Xi Ai], and for no proper subset Z of Xi do we have Z → Ai ∈ F[Xi Ai]; otherwise, the fact that G is a canonical form of F would be contradicted. By Theorem 8.2.16 it suffices to show that every nonprime attribute of this schema depends directly on Xi. The only possible nonprime attribute of S[Xi Ai] is Ai. If Ai does not depend directly on Xi, there is a subset Y of Xi Ai such that Xi → Y ∈ F[Xi Ai], Y → Xi ∉ F[Xi Ai], and Y → Ai is thus a nontrivial functional dependency in F[Xi Ai]. The nontriviality of Y → Ai and the fact that Y → Xi ∉ F[Xi Ai] imply Y ⊂ Xi (indeed, Ai ∉ Y, so Y ⊆ Xi; and Y ≠ Xi because Xi → Xi ∈ F[Xi Ai]). Since Y → Ai holds for a proper subset Y of Xi, this contradicts the fact that G is a canonical form of F. Note that S[K] is also in 3NF.

Example 8.3.13 Let S = (A1 . . . A6, F) be the schema introduced in Example 7.4.17. Recall that A1 A2 is the only key of S. A canonical form of F is the set G of functional dependencies given by

G = {A1 → A3, A2 → A4, A1 A4 → A5, A2 A3 → A6}.

From this we obtain the schemas

S1 = (A1 A2, ∅)
S2 = (A1 A3, {A1 → A3})
S3 = (A2 A4, {A2 → A4})
S4 = (A1 A4 A5, {A1 A4 → A5})
S5 = (A2 A3 A6, {A2 A3 → A6})

that constitute a 3NF decomposition of S. It is not difficult to see that all the schemas Si are in BCNF.
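The synthesis step of Theorem 8.3.12 is mechanical: one schema per dependency of the canonical cover, plus one schema for the key. The Python sketch below reproduces the decomposition of Example 8.3.13; the pruning of schemas strictly contained in another schema is a small extra step often added in practice, not part of the theorem's statement:

```python
def synthesize(cover, key):
    """3NF synthesis in the spirit of Theorem 8.3.12: build one schema
    per FD of the canonical cover, add a schema for the key, and drop
    any schema strictly contained in another one."""
    schemas = [frozenset(lhs) | frozenset(rhs) for lhs, rhs in cover]
    schemas.append(frozenset(key))
    return [s for s in schemas if not any(s < t for t in schemas)]

# Canonical cover G and key A1 A2 from Example 8.3.13.
G = [
    ({"A1"}, {"A3"}),
    ({"A2"}, {"A4"}),
    ({"A1", "A4"}, {"A5"}),
    ({"A2", "A3"}, {"A6"}),
]

for schema in synthesize(G, {"A1", "A2"}):
    print(sorted(schema))
```

Running this yields the five attribute sets of the schemas S1 through S5 above.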
8.4
Exercises
1. Prove that if S = (H, F) is a table schema that has a unique key, then S is in BCNF if and only if it is in 3NF.
Solution. If S is in BCNF, then clearly it is in 3NF. Conversely, consider a schema S that is in 3NF, and let X → A be a nontrivial functional dependency in F+. In this case X is a candidate key or A is a prime attribute. Suppose that X is not a candidate key, and let K be the unique key of S. Since A is prime, A ∈ K. Because X → A, we have clF((K − A) ∪ X) ⊇ (K − A) ∪ {A} = K, so (K − A) ∪ X is a superkey and therefore includes the unique key K. This implies K ⊆ (K − A) ∪ X; since A ∈ K and A ∉ K − A, it follows that A ∈ X, thereby contradicting the nontriviality of X → A.

2. Prove that if S = (H, F) is a table schema such that every key of S consists of one attribute, then S is in BCNF if and only if it is in 3NF.

3. Let S = (H, F) be a table schema. An attribute A ∈ H is abnormal (cf. [JF82]) if there exists a subset X of H that is not a candidate key of S such that A ∉ X and X → A ∈ F+. Clearly, S is in BCNF if it has no abnormal attributes; also, S is in 3NF if every abnormal attribute is prime. Prove that an attribute A is abnormal if and only if there exists a functional dependency X → Y ∈ F+ such that X is not a candidate key and A ∈ Y − X.

4. Let S = (H, F) be a table schema. Prove that S is in BCNF if and only if every subset of H is a candidate key or is closed (that is, clF(X) = X).

5. Let F = {Xi → Ai | 1 ≤ i ≤ n} be a set of nontrivial functional dependencies such that Xi ⊆ H and Ai ∈ H for 1 ≤ i ≤ n. Prove the following statements:
(a) If S = (H, F) is not in BCNF, then there exists a functional dependency X → A in F (rather than in F+) such that X is not a candidate key.
(b) If S = (H, F) is not in 3NF, then there exists a functional dependency X → A in F (rather than in F+) such that X is not a candidate key and A is not a prime attribute.
Solution. To prove the first statement, let S = (H, F) be a table schema that is not in BCNF, and assume that for every functional dependency X → A ∈ F, X is a candidate key. This means that for any such X we have clF(X) = H. If Y → A ∈ F+ is a nontrivial functional dependency that violates BCNF, then clF(Y) ≠ H. Note that no set Xi that is a left member of a functional dependency of F may be included in Y, because this would imply clF(Y) = H. Therefore, clF(Y) = Y, and this contradicts the nontriviality of Y → A (because Y → A ∈ F+ implies A ∈ clF(Y) = Y). We conclude that F must contain a functional dependency that violates BCNF.
To prove the second part, let S = (H, F) be a table schema that is not in 3NF, and assume that for every functional dependency X → A ∈ F, X is a candidate key or A is a prime attribute. Suppose that Y → B is a nontrivial functional dependency in F+ such that Y is not a candidate key and B is not a prime attribute. We have B ∈ clF(Y) − Y, and clF(Y) ≠ H. By Exercise 19, clF(Y) ⊆ Y ∪ {A | X → A ∈ F}. This implies the existence of a functional dependency X → B ∈ F such that X is a candidate key (because B is not prime) and X ⊆ clF(Y). Thus, clF(X) ⊆ clF(clF(Y)) = clF(Y). Since X is a candidate key we have clF(X) = H, so clF(Y) = H, which means that Y is a candidate key. This contradiction shows that if S is not in 3NF, F itself must violate 3NF.

6. Consider the table schema S = (ABCD, F), where F is the set of functional dependencies F = {A → B, BC → D}.
(a) Find all keys of this schema.
(b) Suppose that a table of this schema is decomposed into its projections on ABC and BCD. Will this decomposition be lossless or not? Justify your answer.
(c) Determine whether S is in BCNF, 3NF, or 2NF.

7. Let S = (ABCD, F) be a table schema, where F = {AB → C, BD → A}, and let H = (ABC, CD).
(a) Find all keys and prime attributes of S.
(b) Determine whether S is in BCNF, 3NF, or 2NF.
(c) Is H a lossless decomposition of S? Identify the most restrictive normal forms to which S[ABC] and S[CD] belong.
(d) Does H preserve the set of functional dependencies F?
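Several of the exercises above ask whether a decomposition into two schemas is lossless. By the classical binary lossless-join criterion, a decomposition H = (H1, H2) is lossless if and only if the common attributes H1 ∩ H2 functionally determine H1 or H2. A minimal Python sketch of this test, using the dependency set of Exercise 6 as sample input (so running it reveals part of that exercise's answer):

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(h1, h2, fds):
    """Binary lossless-join test: the decomposition (h1, h2) is lossless
    iff the common attributes determine h1 or h2."""
    common = set(h1) & set(h2)
    cl = closure(common, fds)
    return set(h1) <= cl or set(h2) <= cl

# Dependency set of Exercise 6, with single-letter attribute names.
F6 = [({"A"}, {"B"}), ({"B", "C"}, {"D"})]

print(lossless_binary("ABC", "BCD", F6))
```

The same closure function also settles the key-finding parts of Exercises 6 and 7: a set of attributes is a superkey exactly when its closure is the whole heading.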
8.5
Bibliographical Comments
The interest in normal forms started with Codd's seminal work (see [Cod72a]), where the first three normal forms and the decomposition approach to normalization were introduced. The Boyce-Codd normal form was introduced in [Cod74]. Normalization by synthesis was developed by P. Bernstein [Ber76]. Important works in the area of normalization include [Ris77], [ZM81], and [Zan82].