Learning Bayesian Networks
Richard E. Neapolitan
Northeastern Illinois University
Chicago, Illinois
Preface ix
I Basics 1
1 Introduction to Bayesian Networks 3
1.1 Basics of Probability Theory . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Probability Functions and Spaces . . . . . . . . . . . . . . 6
1.1.2 Conditional Probability and Independence . . . . . . . . . 9
1.1.3 Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . 12
1.1.4 Random Variables and Joint Probability Distributions . . 13
1.2 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.1 Random Variables and Probabilities in Bayesian Applica-
tions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.2 A Definition of Random Variables and Joint Probability
Distributions for Bayesian Inference . . . . . . . . . . . . 24
1.2.3 A Classical Example of Bayesian Inference . . . . . . . . . 27
1.3 Large Instances / Bayesian Networks . . . . . . . . . . . . . . . . 29
1.3.1 The Difficulties Inherent in Large Instances . . . . . . . . 29
1.3.2 The Markov Condition . . . . . . . . . . . . . . . . . . . . 31
1.3.3 Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . 40
1.3.4 A Large Bayesian Network . . . . . . . . . . . . . . . . . 43
1.4 Creating Bayesian Networks Using Causal Edges . . . . . . . . . 43
1.4.1 Ascertaining Causal Influences Using Manipulation . . . . 44
1.4.2 Causation and the Markov Condition . . . . . . . . . . . 51
II Inference 121
3 Inference: Discrete Variables 123
3.1 Examples of Inference . . . . . . . . . . . . . . . . . . . . . . . . 124
3.2 Pearl’s Message-Passing Algorithm . . . . . . . . . . . . . . . . . 126
3.2.1 Inference in Trees . . . . . . . . . . . . . . . . . . . . . . . 127
3.2.2 Inference in Singly-Connected Networks . . . . . . . . . . 142
3.2.3 Inference in Multiply-Connected Networks . . . . . . . . . 153
3.2.4 Complexity of the Algorithm . . . . . . . . . . . . . . . . 155
3.3 The Noisy OR-Gate Model . . . . . . . . . . . . . . . . . . . . . 156
3.3.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . 156
3.3.2 Doing Inference With the Model . . . . . . . . . . . . . . 160
3.3.3 Further Models . . . . . . . . . . . . . . . . . . . . . . . . 161
3.4 Other Algorithms that Employ the DAG . . . . . . . . . . . . . . 161
3.5 The SPI Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 162
3.5.1 The Optimal Factoring Problem . . . . . . . . . . . . . . 163
3.5.2 Application to Probabilistic Inference . . . . . . . . . . . 168
3.6 Complexity of Inference . . . . . . . . . . . . . . . . . . . . . . . 170
3.7 Relationship to Human Reasoning . . . . . . . . . . . . . . . . . 171
3.7.1 The Causal Network Model . . . . . . . . . . . . . . . . . 171
3.7.2 Studies Testing the Causal Network Model . . . . . . . . 173
IV Applications 647
12 Applications 649
12.1 Applications Based on Bayesian Networks . . . . . . . . . . . . . 649
12.2 Beyond Bayesian networks . . . . . . . . . . . . . . . . . . . . . . 655
Bibliography 657
Index 686
Part I

Basics
Chapter 1

Introduction to Bayesian Networks
Consider the situation where one feature of an entity has a direct influence on
another feature of that entity. For example, the presence or absence of a disease
in a human being has a direct influence on whether a test for that disease turns
out positive or negative. For decades, Bayes’ theorem has been used to perform
probabilistic inference in this situation. In the current example, we would use
that theorem to compute the conditional probability of an individual having a
disease when a test for the disease came back positive. Consider next the situ-
ation where several features are related through inference chains. For example,
whether or not an individual has a history of smoking has a direct influence
both on whether or not that individual has bronchitis and on whether or not
that individual has lung cancer. In turn, the presence or absence of each of these
diseases has a direct influence on whether or not the individual experiences fa-
tigue. Also, the presence or absence of lung cancer has a direct influence on
whether or not a chest X-ray is positive. In this situation, we would want to do
probabilistic inference involving features that are not related via a direct influ-
ence. We would want to determine, for example, the conditional probabilities
both of bronchitis and of lung cancer when it is known an individual smokes, is
fatigued, and has a positive chest X-ray. Yet bronchitis has no direct influence
(indeed no influence at all) on whether a chest X-ray is positive. Therefore,
these conditional probabilities cannot be computed using a simple application
of Bayes’ theorem. There is a straightforward algorithm for computing them,
but the probability values it requires are not ordinarily accessible; furthermore,
the algorithm has exponential space and time complexity.
Bayesian networks were developed to address these difficulties. By exploiting
conditional independencies entailed by influence chains, we are able to represent
a large instance in a Bayesian network using little space, and we are often able
to perform probabilistic inference among the features in an acceptable amount
of time. In addition, the graphical nature of Bayesian networks gives us a much
1.1 Basics of Probability Theory

The concept of probability has a rich and diversified history that includes many different philosophical approaches. Notable among these approaches are the notions of probability as a ratio, as a relative frequency, and as a degree of belief. Next we review the probability calculus and, via examples, illustrate these three approaches and how they are related.
Definition 1.1 Suppose we have a sample space Ω = {e1, e2, . . . en} containing n distinct elements. A function P that assigns a real number P(E) to each event E ⊆ Ω is called a probability function if it satisfies the following conditions:

1. 0 ≤ P({ei}) ≤ 1 for 1 ≤ i ≤ n.

2. P({e1}) + P({e2}) + . . . + P({en}) = 1.

3. For each event E = {ei1, ei2, . . . eik} that is not an elementary event,

   P(E) = P({ei1}) + P({ei2}) + . . . + P({eik}).
Example 1.1 Let the experiment be drawing the top card from a deck of 52
cards. Then Ω contains the faces of the 52 cards, and using the principle of
indifference, we assign P ({e}) = 1/52 for each e ∈ Ω. Therefore, if we let kh
and ks stand for the king of hearts and king of spades respectively, P ({kh}) =
1/52, P ({ks}) = 1/52, and P ({kh, ks}) = P ({kh}) + P ({ks}) = 1/26.
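The conditions in Definition 1.1 are easy to check mechanically for a small sample space. The following Python sketch (all variable and function names are our own, not from the text) builds the card-deck space of Example 1.1, verifies the three conditions, and computes P({kh, ks}):

from fractions import Fraction
from itertools import product

# Sample space: the 52 cards, each an (rank, suit) pair.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'spades', 'clubs', 'diamonds']
omega = list(product(ranks, suits))

# Principle of indifference: assign 1/52 to each elementary event.
P_elem = {e: Fraction(1, 52) for e in omega}

# Condition 1: every elementary probability lies in [0, 1].
assert all(0 <= p <= 1 for p in P_elem.values())
# Condition 2: the elementary probabilities sum to 1.
assert sum(P_elem.values()) == 1

def P(event):
    # Condition 3: the probability of an event is the sum over its elements.
    return sum(P_elem[e] for e in event)

kh, ks = ('K', 'hearts'), ('K', 'spades')
print(P({kh}), P({ks}), P({kh, ks}))   # 1/52 1/52 1/26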
Example 1.2 Suppose we toss a thumbtack and consider as outcomes the two
ways it could land. It could land on its head, which we will call ‘heads’, or
it could land with the edge of the head and the end of the point touching the
ground, which we will call ‘tails’. Due to the lack of symmetry in a thumbtack,
we would not assign a probability of 1/2 to each of these events. So how can
we compute the probability? This experiment can be repeated many times. In
1919 Richard von Mises developed the relative frequency approach to probability
which says that, if an experiment can be repeated many times, the probability of
any one of the outcomes is the limit, as the number of trials approaches infinity,
of the ratio of the number of occurrences of that outcome to the total number of
trials. For example, if m is the number of trials,

    P({heads}) = lim_{m→∞} #heads / m.
So, if we tossed the thumbtack 10, 000 times and it landed heads 3373 times, we
would estimate the probability of heads to be about .3373.
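The relative frequency approach is also easy to simulate. In the sketch below, the value .35 assigned to the chance of landing heads is an arbitrary assumption standing in for the thumbtack's unknown asymmetry; the point is only that the ratio #heads/m settles down as m grows:

import random

random.seed(0)
TRUE_P_HEADS = 0.35          # assumed "physical" value, unknown to the experimenter

def toss():
    return 'heads' if random.random() < TRUE_P_HEADS else 'tails'

for m in (100, 1_000, 10_000, 100_000):
    heads = sum(1 for _ in range(m) if toss() == 'heads')
    print(m, heads / m)      # the ratio settles near the true value as m grows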
Probabilities obtained using the approach in the previous example are called
relative frequencies. According to this approach, the probability obtained is
not a property of any one of the trials, but rather it is a property of the entire
sequence of trials. How are these probabilities related to ratios? Intuitively,
we would expect if, for example, we repeatedly shuffled a deck of cards and
drew the top card, the ace of spades would come up about one out of every 52
times. In 1946 J. E. Kerrich conducted many such experiments using games of
chance in which the principle of indifference seemed to apply (e.g. drawing a
card from a deck). His results indicated that the relative frequency does appear
to approach a limit and that limit is the ratio.
patients with these exact same symptoms, to the actual relative frequency with
which they have lung cancer.
It is straightforward to prove the following theorem concerning probability
spaces.
Theorem 1.1 Let (Ω, P ) be a probability space. Then
1. P (Ω) = 1.
2. 0 ≤ P (E) ≤ 1 for every E ⊆ Ω.
3. For E and F ⊆ Ω such that E ∩ F = ∅,
P (E ∪ F) = P (E) + P (F).
Example 1.5 Suppose we draw the top card from a deck of cards. Denote by
Queen the set containing the 4 queens and by King the set containing the 4 kings.
Then

    P(Queen ∪ King) = P(Queen) + P(King) = 1/13 + 1/13 = 2/13

because Queen ∩ King = ∅. Next denote by Spade the set containing the 13
spades. The sets Queen and Spade are not disjoint; so their probabilities are not
additive. However, it is not hard to prove that, in general,
P (E ∪ F) = P (E) + P (F) − P (E ∩ F).
So

    P(Queen ∪ Spade) = P(Queen) + P(Spade) − P(Queen ∩ Spade)
                     = 1/13 + 1/4 − 1/52
                     = 4/13.
The initial intuition for conditional probability comes from considering prob-
abilities that are ratios. In the case of ratios, P (E|F), as defined above, is the
fraction of items in F that are also in E. We show this as follows. Let n be the
number of items in the sample space, nF be the number of items in F, and nEF
be the number of items in E ∩ F. Then
    P(E ∩ F) / P(F) = (nEF/n) / (nF/n) = nEF/nF,
which is the fraction of items in F that are also in E. As far as meaning, P (E|F)
means the probability of E occurring given that we know F has occurred.
Example 1.6 Again consider drawing the top card from a deck of cards, let
Queen be the set of the 4 queens, RoyalCard be the set of the 12 royal cards, and
Spade be the set of the 13 spades. Then
    P(Queen) = 1/13

    P(Queen|RoyalCard) = P(Queen ∩ RoyalCard) / P(RoyalCard) = (1/13) / (3/13) = 1/3

    P(Queen|Spade) = P(Queen ∩ Spade) / P(Spade) = (1/52) / (1/4) = 1/13.
Notice in the previous example that P (Queen|Spade) = P (Queen). This
means that finding out the card is a spade does not make it more or less probable
that it is a queen. That is, the knowledge of whether it is a spade is irrelevant
to whether it is a queen. We say that the two events are independent in this
case, which is formalized in the following definition.
Definition 1.3 Two events E and F are independent if one of the following
hold:
1. P(E|F) = P(E) and P(E) ≠ 0, P(F) ≠ 0.
2. P (E) = 0 or P (F) = 0.
Notice that the definition states that the two events are independent even
though it is based on the conditional probability of E given F. The reason is
that independence is symmetric. That is, if P(E) ≠ 0 and P(F) ≠ 0, then
P (E|F) = P (E) if and only if P (F|E) = P (F). It is straightforward to prove that
E and F are independent if and only if P (E ∩ F) = P (E)P (F).
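Conditional probability and the independence test P(E ∩ F) = P(E)P(F) can both be computed directly over a finite probability space. A minimal sketch reusing the deck of Example 1.1 and the events of Example 1.6 (the helper functions are ours):

from fractions import Fraction
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'spades', 'clubs', 'diamonds']
omega = set(product(ranks, suits))

def P(event):
    return Fraction(len(event & omega), len(omega))   # equiprobable outcomes

def P_given(E, F):
    # P(E|F) = P(E ∩ F) / P(F), defined only when P(F) != 0.
    return P(E & F) / P(F)

queen = {c for c in omega if c[0] == 'Q'}
royal = {c for c in omega if c[0] in ('J', 'Q', 'K')}
spade = {c for c in omega if c[1] == 'spades'}

print(P(queen))                                   # 1/13
print(P_given(queen, royal))                      # 1/3
print(P_given(queen, spade))                      # 1/13 -- equal to P(queen)
print(P(queen & spade) == P(queen) * P(spade))    # True, so Queen and Spade are independent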
The following example illustrates an extension of the notion of independence.
Example 1.7 Let E = {kh, ks, qh}, F = {kh, kc, qh}, G = {kh, ks, kc, kd},
where kh means the king of hearts, ks means the king of spades, etc. Then
    P(E) = 3/52

    P(E|F) = 2/3

    P(E|G) = 2/4 = 1/2

    P(E|F ∩ G) = 1/2.
So E and F are not independent, but they are independent once we condition on
G.
Definition 1.4 Two events E and F are conditionally independent given G if P(G) ≠ 0 and one of the following holds:
1. P(E|F ∩ G) = P(E|G) and P(E|G) ≠ 0, P(F|G) ≠ 0.
2. P(E|G) = 0 or P(F|G) = 0.
Example 1.8 Let Ω be the set of all objects in Figure 1.2. Suppose we assign
a probability of 1/13 to each object, and let Black be the set of all black objects,
White be the set of all white objects, Square be the set of all square objects, and
One be the set of all objects containing a ‘1’. We then have
    P(One) = 5/13

    P(One|Square) = 3/8

    P(One|Black) = 3/9 = 1/3

    P(One|Square ∩ Black) = 2/6 = 1/3

    P(One|White) = 2/4 = 1/2

    P(One|Square ∩ White) = 1/2.
So One and Square are not independent, but they are conditionally independent
given Black and given White.
Figure 1.2: Containing a '1' and being a square are not independent, but they are conditionally independent given the object is black and given it is white.
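The conditional independencies in Example 1.8 can be verified mechanically. In the sketch below, the composition of the 13 objects is reconstructed from the probabilities stated in the example (for instance, 6 black squares of which 2 contain a '1'); the label 'other' for the non-square shape is our own placeholder, since the text does not name that shape here:

from fractions import Fraction

# (color, shape, number) objects, counts reconstructed from Example 1.8's probabilities.
objects = []
objects += [('black', 'square', '1')] * 2 + [('black', 'square', '2')] * 4
objects += [('black', 'other',  '1')] * 1 + [('black', 'other',  '2')] * 2
objects += [('white', 'square', '1')] * 1 + [('white', 'square', '2')] * 1
objects += [('white', 'other',  '1')] * 1 + [('white', 'other',  '2')] * 1
assert len(objects) == 13

def P(pred):
    return Fraction(sum(1 for o in objects if pred(o)), len(objects))

def P_given(pred_E, pred_F):
    return P(lambda o: pred_E(o) and pred_F(o)) / P(pred_F)

one    = lambda o: o[2] == '1'
square = lambda o: o[1] == 'square'
black  = lambda o: o[0] == 'black'
white  = lambda o: o[0] == 'white'

print(P_given(one, black), P_given(one, lambda o: square(o) and black(o)))  # 1/3 1/3
print(P_given(one, white), P_given(one, lambda o: square(o) and white(o)))  # 1/2 1/2
print(P(one), P_given(one, square))                                         # 5/13 3/8 (not equal)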
Theorem 1.2 (Bayes) Given two events E and F such that P(E) ≠ 0 and P(F) ≠ 0, we have

    P(E|F) = P(F|E)P(E) / P(F).     (1.3)

Furthermore, given n mutually exclusive and exhaustive events E1, E2, . . . En such that P(Ei) ≠ 0 for all i, we have for 1 ≤ i ≤ n,

    P(Ei|F) = P(F|Ei)P(Ei) / [P(F|E1)P(E1) + P(F|E2)P(E2) + · · · + P(F|En)P(En)].     (1.4)
Proof. To obtain Equality 1.3, we first use the definition of conditional proba-
bility as follows:
    P(E|F) = P(E ∩ F) / P(F)     and     P(F|E) = P(F ∩ E) / P(E).
Next we multiply each of these equalities by the denominator on its right side to
show that
P (E|F)P (F) = P (F|E)P (E)
because they both equal P (E ∩ F). Finally, we divide this last equality by P (F)
to obtain our result.
To obtain Equality 1.4, we place the expression for F, obtained using the rule
of total probability (Equality 1.2), in the denominator of Equality 1.3.
Both of the formulas in the preceding theorem are called Bayes’ theorem
because they were originally developed by Thomas Bayes (published in 1763).
The first enables us to compute P (E|F) if we know P (F|E), P (E), and P (F), while
the second enables us to compute P (Ei |F) if we know P (F|Ej ) and P (Ej ) for
1 ≤ j ≤ n. Computing a conditional probability using either of these formulas
is called Bayesian inference. An example of Bayesian inference follows:
Example 1.9 Let Ω be the set of all objects in Figure 1.2, and assign each
object a probability of 1/13. Let One be the set of all objects containing a 1, Two
be the set of all objects containing a 2, and Black be the set of all black objects.
Then according to Bayes’ Theorem,
    P(One|Black) = P(Black|One)P(One) / [P(Black|One)P(One) + P(Black|Two)P(Two)]
                 = (3/5)(5/13) / [(3/5)(5/13) + (6/8)(8/13)]
                 = 1/3.
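The computation in Example 1.9 is a direct instance of Equality 1.4. The sketch below just plugs in the values that can be read off Figure 1.2, namely P(One) = 5/13, P(Two) = 8/13, P(Black|One) = 3/5, and P(Black|Two) = 6/8 (the function name is ours):

from fractions import Fraction as F

def bayes(likelihoods, priors):
    # Equality 1.4: posterior over mutually exclusive, exhaustive events E1..En.
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Events: One, Two.  Evidence: Black.
priors      = [F(5, 13), F(8, 13)]
likelihoods = [F(3, 5),  F(6, 8)]      # P(Black|One), P(Black|Two)

print(bayes(likelihoods, priors))      # [Fraction(1, 3), Fraction(2, 3)]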
Definition 1.5 Given a probability space (Ω, P), a random variable X is a function on Ω.

That is, a random variable assigns a unique value to each element (outcome)
in the sample space. The set of values random variable X can assume is called
the space of X. A random variable is said to be discrete if its space is finite
or countable. In general, we develop our theory assuming the random variables
are discrete. Examples follow.
Example 1.10 Let Ω contain all outcomes of a throw of a pair of six-sided
dice, and let P assign 1/36 to each outcome. Then Ω is the following set of
ordered pairs:
Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), . . . (6, 5), (6, 6)}.
Let the random variable X assign the sum of each ordered pair to that pair, and
let the random variable Y assign ‘odd’ to each pair of odd numbers and ‘even’
to a pair if at least one number in that pair is an even number. The following
table shows some of the values of X and Y :
e X(e) Y (e)
(1, 1) 2 odd
(1, 2) 3 even
··· ··· ···
(2, 1) 3 even
··· ··· ···
(6, 6) 12 even
The space of X is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, and that of Y is {odd, even}.
For a random variable X, we use X = x to denote the set of all elements
e ∈ Ω that X maps to the value of x. That is,
X =x represents the event {e such that X(e) = x}.
Note the difference between X and x. Small x denotes any element in the space
of X, while X is a function.
Example 1.11 Let Ω , P , and X be as in Example 1.10. Then
X=3 represents the event {(1, 2), (2, 1)} and
    P(X = 3) = 1/18.
It is not hard to see that a random variable induces a probability function
on its space. That is, if we define PX ({x}) ≡ P (X = x), then PX is such a
probability function.
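Because a random variable is just a function on Ω, the induced distribution PX can be computed by summing the probabilities of the outcomes mapped to each value. A sketch of Examples 1.10 and 1.11 (names are ours):

from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))       # the 36 ordered pairs
P_elem = {e: Fraction(1, 36) for e in omega}

X = lambda e: e[0] + e[1]                          # the sum of the pair
Y = lambda e: 'odd' if e[0] % 2 == 1 and e[1] % 2 == 1 else 'even'

def P(event):
    return sum(P_elem[e] for e in event)

# The event X = 3 and the induced distribution P(X = x).
print(P({e for e in omega if X(e) == 3}))          # 1/18
P_X = {x: P({e for e in omega if X(e) == x}) for x in range(2, 13)}
print(P_X[7])                                      # 1/6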
Example 1.12 Let Ω contain all outcomes of a throw of a single die, let P
assign 1/6 to each outcome, and let Z assign ‘even’ to each even number and
‘odd’ to each odd number. Then
    PZ({even}) = P(Z = even) = P({2, 4, 6}) = 1/2

    PZ({odd}) = P(Z = odd) = P({1, 3, 5}) = 1/2.
We rarely refer to PX ({x}). Rather we only reference the original probability
function P , and we call P (X = x) the probability distribution of the random
variable X. For brevity, we often just say ‘distribution’ instead of ‘probability
distribution’. Furthermore, we often use x alone to represent the event X = x,
and so we write P (x) instead of P (X = x) . We refer to P (x) as ‘the probability
of x’.
Let Ω, P , and X be as in Example 1.10. Then if x = 3,
    P(x) = P(X = x) = 1/18.
Given two random variables X and Y , defined on the same sample space Ω,
we use X = x, Y = y to denote the set of all elements e ∈ Ω that are mapped
both by X to x and by Y to y. That is,
X = x, Y = y represents the event
{e such that X(e) = x} ∩ {e such that Y (e) = y}.
Example 1.13 Let Ω, P , X, and Y be as in Example 1.10. Then
X = 4, Y = odd represents the event {(1, 3), (3, 1)}, and
P (X = 4, Y = odd) = 1/18.
Clearly, two random variables induce a probability function on the Cartesian
product of their spaces. As is the case for a single random variable, we rarely
refer to this probability function. Rather we reference the original probability
function. That is, we refer to P (X = x, Y = y), and we call this the joint
probability distribution of X and Y . If A = {X, Y }, we also call this the
joint probability distribution of A. Furthermore, we often just say ‘joint
distribution’ or ‘probability distribution’.
For brevity, we often use x, y to represent the event X = x, Y = y, and
so we write P (x, y) instead of P (X = x, Y = y). This concept extends in a
straightforward way to three or more random variables. For example, P (X =
x, Y = y, Z = z) is the joint probability distribution function of the variables
X, Y , and Z, and we often write P (x, y, z).
Example 1.14 Let Ω, P , X, and Y be as in Example 1.10. Then if x = 4 and
y = odd,
P (x, y) = P (X = x, Y = y) = 1/18.
If, for example, we let A = {X, Y } and a = {x, y}, we use
A=a to represent X = x, Y = y,
and we often write P (a) instead of P (A = a). The same notation extends to
the representation of three or more random variables. For consistency, we set
P (∅ = ∅) = 1, where ∅ is the empty set of random variables. Note that if ∅
is the empty set of events, P (∅) = 0.
For example, with A = {X, Y } and a = {x, y} where x = 4 and y = odd as in Example 1.14, P(A = a) = P(X = x, Y = y) = 1/18.
This notation entails that if we have, for example, two sets of random vari-
ables A = {X, Y } and B = {Z, W }, then
A = a, B = b represents X = x, Y = y, Z = z, W = w.
    P(X = x) = Σ_y P(X = x, Y = y),

where Σ_y means the sum as y goes through all values of Y. The probability
distribution P (X = x) is called the marginal probability distribution of X
because it is obtained using a process similar to adding across a row or column in
a table of numbers. This concept also extends in a straightforward way to three
or more random variables. For example, if we have a joint distribution P (X =
x, Y = y, Z = z) of X, Y , and Z, the marginal distribution P (X = x, Y = y) of
X and Y is obtained by summing over all values of Z. If A = {X, Y }, we also
call this the marginal probability distribution of A.
The following example reviews the concepts covered so far concerning ran-
dom variables:
Example 1.17 Let Ω be a set of 12 individuals, and let P assign 1/12 to each
individual. Suppose the sexes, heights, and wages of the individuals are as fol-
lows:
Define random variables S, H, and W whose values are an individual's sex, height, and wage, respectively. The joint distribution of S and H is then as follows:

s h P(s, h)
female 64 1/3
female 68 1/6
female 70 0
male 64 1/6
male 68 1/6
male 70 1/6
The following table also shows the joint distribution of S and H and illustrates
that the individual distributions can be obtained by summing the joint distribu-
tion over all values of the other variable:
h 64 68 70 Distribution of S
s
female 1/3 1/6 0 1/2
male 1/6 1/6 1/6 1/2
The table that follows shows the first few values in the joint distribution of S,
H, and W . There are 18 values in all, of which many are 0.
s h w P (s, h, w)
female 64 30, 000 1/6
female 64 40, 000 1/6
female 64 50, 000 0
female 68 30, 000 1/12
··· ··· ··· ···
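Marginalization is just this summation carried out value by value. The sketch below applies it to the joint distribution of S and H given above (the dictionary encoding is ours):

from fractions import Fraction as F
from collections import defaultdict

# Joint distribution P(s, h) from Example 1.17.
joint_SH = {
    ('female', 64): F(1, 3), ('female', 68): F(1, 6), ('female', 70): F(0),
    ('male',   64): F(1, 6), ('male',   68): F(1, 6), ('male',   70): F(1, 6),
}

marginal_S = defaultdict(F)
marginal_H = defaultdict(F)
for (s, h), p in joint_SH.items():
    marginal_S[s] += p       # sum over h
    marginal_H[h] += p       # sum over s

print(dict(marginal_S))      # female and male each get 1/2
print(dict(marginal_H))      # 64: 1/2, 68: 1/3, 70: 1/6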
Definition 1.6 Suppose we have a probability space (Ω, P ), and two sets A and
B containing random variables defined on Ω. Then the sets A and B are said to
be independent if, for all values of the variables in the sets a and b, the events
A = a and B = b are independent. That is, either P (a) = 0 or P (b) = 0 or
P (a|b) = P (a).
When the sets A and B are independent, we denote this by writing IP(A, B).
Example 1.18 Let Ω be the set of all cards in an ordinary deck, and let P
assign 1/52 to each card. Define random variables as follows:
Then we maintain the sets {R, T } and {S} are independent. That is,
IP ({R, T }, {S}).
(Note that we do not show brackets to denote sets in our probabilistic expressions because in such an expression a set represents the members of the set. See
the discussion following Example 1.14.) The following table shows this is the
case:
Definition 1.7 Suppose we have a probability space (Ω, P ), and three sets A,
B, and C containing random variables defined on Ω. Then the sets A and B are
said to be conditionally independent given the set C if, for all values of
the variables in the sets a, b, and c, whenever P(c) ≠ 0, the events A = a and
B = b are conditionally independent given the event C = c. That is, either
P (a|c) = 0 or P (b|c) = 0 or
P (a|b, c) = P (a|c).
We denote this conditional independence by writing IP(A, B|C).
Example 1.19 Let Ω be the set of all objects in Figure 1.2, and let P assign
1/13 to each object. Define random variables S (for shape), V (for value), and
C (for color) as follows:
Then we maintain that {V } and {S} are conditionally independent given {C}.
That is,
IP ({V }, {S}|{C}).
To show this, we need show for all values of v, s, and c that
P (v|s, c) = P (v|c).
The results in Example 1.8 show P (v1|s1, c1) = P (v1|c1) and P (v1|s1, c2) =
P (v1|c2). The table that follows shows the equality holds for the other values of
the variables too:
c s v P (v|s, c) P (v|c)
c1 s1 v1 2/6 = 1/3 3/9 = 1/3
c1 s1 v2 4/6 = 2/3 6/9 = 2/3
c1 s2 v1 1/3 3/9 = 1/3
c1 s2 v2 2/3 6/9 = 2/3
c2 s1 v1 1/2 2/4 = 1/2
c2 s1 v2 1/2 2/4 = 1/2
c2 s2 v1 1/2 2/4 = 1/2
c2 s2 v2 1/2 2/4 = 1/2
For the sake of brevity, we sometimes only say ‘independent’ rather than
‘conditionally independent’. Furthermore, when a set contains only one item,
we often drop the set notation and terminology. For example, in the preceding
example, we might say V and S are independent given C and write IP (V, S|C).
Finally, we have the chain rule for random variables, which says that given n random variables X1, X2, . . . Xn, defined on the same sample space Ω,

    P(x1, x2, . . . xn) = P(xn|xn−1, xn−2, . . . x1) · · · P(x2|x1)P(x1)

whenever P(x1, x2, . . . xn) ≠ 0.
when the average (over all individuals), of the individual daily average skin
contact, exceeds 6 grams of material, the clarity test is passed because the
clairvoyant can answer precisely whether the contact exceeds that. In the case
of a medical application, if we give SmokingHistory only the values yes and
no, the clarity test is not passed because we do not know whether yes means
smoking cigarettes, cigars, or something else, and we have not specified how
long smoking must have occurred for the value to be yes. On the other hand, if
we say yes means the patient has smoked one or more packs of cigarettes every
day during the past 10 years, the clarity test is passed.
After distinguishing the possible values of the random variables (i.e. their
spaces), we judge the probabilities of the random variables having their values.
However, in general we do not always determine prior probabilities; nor do we de-
termine values in a joint probability distribution of the random variables. Rather
we ascertain probabilities, concerning relationships among random variables,
that are accessible to us. For example, we might determine the prior probability
P (LungCancer = present), and the conditional probabilities P (ChestXray =
positive|LungCancer = present), P (ChestXray = positive|LungCancer =
absent), P (LungCancer = present| SmokingHistory = yes), and finally
P (LungCancer = present|SmokingHistory = no). We would obtain these
probabilities either from a physician or from data or from both. Thinking in
terms of relative frequencies, P (LungCancer = present|SmokingHistory =
yes) can be estimated by observing individuals with a smoking history, and de-
termining what fraction of these have lung cancer. A physician is used to judging
such a probability by observing patients with a smoking history. On the other
hand, one does not readily judge values in a joint probability distribution such as
P (LungCancer = present, ChestXray = positive, SmokingHistory = yes). If
this is not apparent, just think of the situation in which there are 100 or more
random variables (which there are in some applications) in the joint probability
distribution. We can obtain data and think in terms of probabilistic relation-
ships among a few random variables at a time; we do not identify the joint
probabilities of several events.
As to the nature of these probabilities, consider first the introduction of the
toxic chemical. The probabilities of the values of CarcinogenicP otential will
be based on data involving this chemical and similar ones. However, this is
certainly not a repeatable experiment like a coin toss, and therefore the prob-
abilities are not relative frequencies. They are subjective probabilities based
on a careful analysis of the situation. As to the medical application involv-
ing a set of entities, we often obtain the probabilities from estimates of rel-
ative frequencies involving entities in the set. For example, we might obtain
P (ChestXray = positive|LungCancer = present) by observing 1000 patients
with lung cancer and determining what fraction have positive chest X-rays.
However, as will be illustrated in Section 1.2.3, when we do Bayesian inference
using these probabilities, we are computing the probability of a specific individ-
ual being in some state, which means it is a subjective probability. Recall from
Section 1.1.1 that a relative frequency is not a property of any one of the trials
(patients), but rather it is a property of the entire sequence of trials. You may
feel that we are splitting hairs. Namely, you may argue the following: “This
subjective probability regarding a specific patient is obtained from a relative
frequency and therefore has the same value as it. We are simply calling it a
subjective probability rather than a relative frequency.” But even this is not
the case. Even if the probabilities used to do Bayesian inference are obtained
from frequency data, they are only estimates of the actual relative frequencies.
So they are subjective probabilities obtained from estimates of relative frequen-
cies; they are not relative frequencies. When we manipulate them using Bayes’
theorem, the resultant probability is therefore also only a subjective probability.
Once we judge the probabilities for a given application, we can often ob-
tain values in a joint probability distribution of the random variables. Theo-
rem 1.5 in Section 1.3.3 obtains a way to do this when there are many vari-
ables. Presently, we illustrate the case of two variables. Suppose we only
identify the random variables LungCancer and ChestXray, and we judge the
prior probability P (LungCancer = present), and the conditional probabili-
ties P (ChestXray = positive|LungCancer = present) and P (ChestXray =
positive|LungCancer = absent). Probabilities of values in a joint probability
distribution can be obtained from these probabilities using the rule for condi-
tional probability as follows:

    P(present, positive) = P(positive|present)P(present)
    P(present, negative) = P(negative|present)P(present)
    P(absent, positive) = P(positive|absent)P(absent)
    P(absent, negative) = P(negative|absent)P(absent),

where present and absent denote the values of LungCancer and positive and negative denote the values of ChestXray. These four joint probabilities determine a probability space whose sample space is

    Ω = {(present, positive), (present, negative), (absent, positive), (absent, negative)}.
We can consider each random variable a function on this space that maps
each tuple into the value of the random variable in the tuple. For example,
LungCancer would map (present, positive) and (present, negative) each into
present. We then assign each elementary event the probability of its correspond-
ing event in the joint distribution. For example, we assign

    P̂({(present, positive)}) = P(present, positive).

It is not hard to show that this does yield a probability function on Ω and
that the initially assessed prior probabilities and conditional probabilities are
the probabilities they notationally represent in this probability space (This is a
special case of Theorem 1.5.).
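The two-variable construction just described can be sketched directly: the rule for conditional probability turns a prior on LungCancer and a conditional distribution for ChestXray into every value of the joint distribution. The particular numbers below are the ones that appear later in Example 1.23 (prior .001, false negative rate .4, false positive rate .02) and are used here only for illustration:

# Prior on LungCancer and conditional distribution of ChestXray given LungCancer.
P_lc = {'present': 0.001, 'absent': 0.999}
P_xray_given_lc = {
    'present': {'positive': 0.6,  'negative': 0.4},
    'absent':  {'positive': 0.02, 'negative': 0.98},
}

# Joint distribution via P(l, x) = P(x | l) P(l).
joint = {(l, x): P_xray_given_lc[l][x] * P_lc[l]
         for l in P_lc for x in ('positive', 'negative')}

assert abs(sum(joint.values()) - 1.0) < 1e-12    # the values form a joint distribution
print(joint[('present', 'positive')])            # approximately 0.0006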
Since random variables are actually identified first and only implicitly be-
come functions on an implicit sample space, it seems we could develop the con-
cept of a joint probability distribution without the explicit notion of a sample
space. Indeed, we do this next. Following this development, we give a theorem
showing that any such joint probability distribution is a joint probability dis-
tribution of the random variables with the variables considered as functions on
an implicit sample space. Definition 1.1 (of a probability function) and Defi-
nition 1.5 (of a random variable) can therefore be considered the fundamental
definitions for probability theory because they pertain both to applications
where sample spaces are directly identified and ones where random variables
are directly identified.
Definition 1.8 Suppose we have a set of random variables V = {X1, X2, . . . Xn}, each with a specified space. A joint probability distribution of the variables in V is a specification of values P(X1 = x1, X2 = x2, . . . Xn = xn), one for each combination of values of the variables, such that the following two conditions hold:

1. 0 ≤ P(X1 = x1, X2 = x2, . . . Xn = xn) ≤ 1.

2. We have

   Σ_{x1, x2, . . . xn} P(X1 = x1, X2 = x2, . . . Xn = xn) = 1.

   The notation Σ_{x1, x2, . . . xn} means the sum as the variables x1, . . . xn go through all possible values in their corresponding spaces.
Example 1.20 Let V = {X, Y }, let X and Y have spaces {x1, x2} and {y1, y2}
respectively, and let the following values be specified:
P (X = x1) = .2 P (Y = y1) = .3
P (X = x2) = .8 P (Y = y2) = .7.
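As the discussion following Example 1.21 later notes, the text derives a joint distribution from these values using the rule for independence, P(X = x, Y = y) = P(X = x)P(Y = y). A sketch under that assumption:

P_X = {'x1': 0.2, 'x2': 0.8}
P_Y = {'y1': 0.3, 'y2': 0.7}

# Assuming independence: P(X = x, Y = y) = P(X = x) P(Y = y).
joint = {(x, y): P_X[x] * P_Y[y] for x in P_X for y in P_Y}

print(joint[('x1', 'y1')])                       # 0.06 (up to floating point)
print(abs(sum(joint.values()) - 1.0) < 1e-12)    # True -- Definition 1.8 is satisfied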
Theorem 1.3 Let a set of random variables V be given and let a joint proba-
bility distribution of the variables in V be specified according to Definition 1.8.
Let Ω be the Cartesian product of the sets of all possible values of the random
variables. Assign probabilities to elementary events in Ω as follows:
that tuple, then the joint probability distribution of the X̂i ’s is the same as the
originally specified joint probability distribution.
Proof. The proof is left as an exercise.
Example 1.21 Suppose we directly specify a joint probability distribution of X
and Y , with spaces {x1, x2} and {y1, y2} respectively, as done in Example
1.20. That is, we specify the following probabilities:
P (X = x1, Y = y1)
P (X = x1, Y = y2)
P (X = x2, Y = y1)
P (X = x2, Y = y2).
Next we let Ω = {(x1, y1), (x1, y2), (x2, y1), (x2, y2)}, and we assign

    P̂({(x1, y1)}) = P(X = x1, Y = y1)
    P̂({(x1, y2)}) = P(X = x1, Y = y2)
    P̂({(x2, y1)}) = P(X = x2, Y = y1)
    P̂({(x2, y2)}) = P(X = x2, Y = y2).

Theorem 1.3 says the joint probability distribution of these random variables is
the same as the originally specified joint probability distribution. Let’s illustrate
this:
P̂ (X̂ = x1, Ŷ = y1) = P̂ ({(x1, y1), (x1, y2)} ∩ {(x1, y1), (x2, y1)})
= P̂ ({(x1, y1)})
= P (X = x1, Y = y1).
which is the originally specified value. This result is a special case of Theorem
1.5.
Note that the specified probability values are not by necessity equal to the
probabilities they notationally represent in the marginal probability distribu-
tion. However, since we used the rule for independence to derive the joint
probability distribution from them, they are in fact equal to those values. For
example, if we had defined P (X = x1, Y = y1) = P (X = x2)P (Y = y1), this
would not be the case. Of course we would not do this. In practice, all specified
values are always the probabilities they notationally represent in the resultant
probability space (Ω, P̂ ). Since this is the case, we will no longer show carats
over P or X when referring to the probability function in this space or a random
variable on the space.
Example 1.22 Let V = {X, Y }, let X and Y have spaces {x1, x2} and {y1, y2}
respectively, and let the following values be specified:
P (Y = y1|X = x2) = .4
P (Y = y2|X = x2) = .6.
Example 1.23 Suppose Joe has a routine diagnostic chest X-ray required of
all new employees at Colonial Bank, and the X-ray comes back positive for lung
cancer. Joe then becomes certain he has lung cancer and panics. But should
he? Without knowing the accuracy of the test, Joe really has no way of knowing
how probable it is that he has lung cancer. When he discovers the test is not
absolutely conclusive, he decides to investigate its accuracy and he learns that it
has a false negative rate of .4 and a false positive rate of .02. We represent this
accuracy as follows. First we define the random variables Test, with values positive and negative, and LungCancer, with values present and absent. The false negative rate of .4 gives P(positive|present) = .6, the false positive rate gives P(positive|absent) = .02, and the prior probability of lung cancer for someone in Joe's situation is taken to be P(present) = .001. Bayes' theorem then yields

    P(present|positive) = P(positive|present)P(present) / [P(positive|present)P(present) + P(positive|absent)P(absent)]
                        = (.6)(.001) / [(.6)(.001) + (.02)(.999)]
                        = .029.
So Joe now feels that the probability of his having lung cancer is only about .03,
and he relaxes a bit while waiting for the results of further testing.
Example 1.24 Now suppose Sam is having the same diagnostic chest X-ray
as Joe. However, he is having the X-ray because he has worked in the mines
for 20 years, and his employers became concerned when they learned that about
10% of all such workers develop lung cancer after many years in the mines.
Sam also tests positive. What is the probability he has lung cancer? Based on
the information known about Sam before he took the test, we assign a prior
probability of .1 to Sam having lung cancer. Again using Bayes’ theorem, we
conclude that P (LungCancer = present|T est = positive) = .769 for Sam. Poor
Sam concludes it is quite likely that he has lung cancer.
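Examples 1.23 and 1.24 differ only in the prior probability, so a single routine makes the prior's influence on the posterior explicit (the parameter names are ours):

def posterior_given_positive(prior, sensitivity=0.6, false_positive=0.02):
    # P(present | positive) via Bayes' theorem for the chest X-ray test.
    num = sensitivity * prior
    den = sensitivity * prior + false_positive * (1 - prior)
    return num / den

print(round(posterior_given_positive(0.001), 3))   # 0.029  (Joe)
print(round(posterior_given_positive(0.1), 3))     # 0.769  (Sam)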
Note that we presented this same table at the beginning of this chapter, but we
called the random variables ‘features’. We had not yet defined random variable
at that point; so we used the informal term feature. If we knew the joint
probability distribution of these five variables, we could compute the conditional
probability of an individual having bronchitis given the individual smokes, is
fatigued, and has a positive chest X-ray as follows:
    P(b1|h1, f1, c1) = P(b1, h1, f1, c1) / P(h1, f1, c1) = [ Σ_l P(b1, h1, f1, c1, l) ] / [ Σ_{b,l} P(b, h1, f1, c1, l) ],     (1.5)

where Σ_{b,l} means the sum as b and l go through all their possible values. There
are a number of problems here. First, as noted previously, the values in the joint
probability distribution are ordinarily not readily accessible. Second, there are
an exponential number of terms in the sums in Equality 1.5. That is, there
are 2² terms in the sum in the denominator, and, if there were 100 variables
in the application, there would be 2⁹⁷ terms in that sum. So, in the case
of a large instance, even if we had some means for eliciting the values in the
Given a DAG G whose nodes are random variables and a joint probability distribution P of those variables, (G, P) satisfies the Markov condition if, for each variable X, {X} is conditionally independent of the set NDX of its nondescendents given the set PAX of its parents in G; that is, IP({X}, NDX|PAX). When (G, P) satisfies the Markov condition, we say G and P satisfy the Markov condition with each other.
If X is a root, then its parent set PAX is empty. So in this case the Markov
condition means {X} is independent of NDX . That is, IP ({X}, NDX ). It is
not hard to show that IP ({X}, NDX |PAX ) implies IP ({X}, B|PAX ) for any
B ⊆ NDX . It is left as an exercise to do this. Notice that PAX ⊆ NDX . So
we could define the Markov condition by saying that X must be conditionally independent of NDX − PAX given PAX.
Figure 1.3: Four DAGs, labeled (a) through (d), over the variables V, S, and C. The probability distribution in Example 1.25 satisfies the Markov condition only for the DAGs in (a), (b), and (c).
Example 1.25 Let Ω be the set of objects in Figure 1.2, and let P assign a
probability of 1/13 to each object. Let random variables V, S, and C be defined as in Example 1.19. That is, they are defined as follows:
Figure 1.4: A DAG in which H is a parent of B and L, both B and L are parents of F, and L is a parent of C.
Example 1.26 Consider the DAG G in Figure 1.4. If (G, P) satisfied the Markov condition for some probability distribution P, we would have the following conditional independencies:

    IP({C}, {H, B, F}|{L})
    IP({B}, {L, C}|{H})
    IP({L}, {B}|{H})
    IP({F}, {H, C}|{B, L}).
Recall from Section 1.3.1 that the number of terms in a joint probability
distribution is exponential in terms of the number of variables. So, in the
case of a large instance, we could not fully describe the joint distribution by
determining each of its values directly. Herein lies one of the powers of the
Markov condition. Theorem 1.4, which follows shortly, shows if (G, P ) satisfies
the Markov condition, then P equals the product of its conditional probability
distributions of all nodes given values of their parents in G, whenever these
conditional distributions exist. After proving this theorem, we discuss how this
means we often need ascertain far fewer values than if we had to determine all
values in the joint distribution directly. Before proving it, we illustrate what it
means for a joint distribution to equal the product of its conditional distributions
of all nodes given values of their parents in a DAG G. This would be the case
for a joint probability distribution P of the variables in the DAG in Figure 1.4
if, for all values of f, c, b, l, and h,

    P(f, c, b, l, h) = P(f|b, l)P(c|l)P(b|h)P(l|h)P(h)     (1.6)
whenever the conditional probabilities on the right exist. Notice that if one of
them does not exist for some combination of the values of the variables, then
P (b, l) = 0 or P (l) = 0 or P (h) = 0, which implies P (f, c, b, l, h) = 0 for that
combination of values. However, there are cases in which P (f, c, b, l, h) = 0 and
the conditional probabilities still exist. For example, this would be the case if
all the conditional probabilities on the right existed and P (f|b, l) = 0 for some
combination of values of f , b, and l. So Equality 1.6 must hold for all nonzero
values of the joint probability distribution plus some zero values.
We now give the theorem.
Theorem 1.4 If (G, P ) satisfies the Markov condition, then P is equal to the
product of its conditional distributions of all nodes given values of their parents,
whenever these conditional distributions exist.
Proof. We prove the case where P is discrete. Order the nodes so that if Y is
a descendent of Z, then Y follows Z in the ordering. Such an ordering is called
an ancestral ordering. Examples of such an ordering for the DAG in Figure
1.4 are [H, L, B, C, F ] and [H, B, L, F, C]. Let X1 , X2 , . . . Xn be the resultant
ordering. For a given set of values of x1 , x2 , . . . xn , let pai be the subset of
these values containing the values of Xi ’s parents. We need show that whenever
P(pai) ≠ 0 for 1 ≤ i ≤ n,

    P(xn, xn−1, . . . x1) = P(xn|pan)P(xn−1|pan−1) · · · P(x1|pa1).

We show this using induction on the number of variables in the network. Assume, for some combination of values of the xi's, that P(pai) ≠ 0 for 1 ≤ i ≤ n.
induction base: Since PA1 is empty,

    P(x1) = P(x1|pa1).

induction step: We need show for this combination of values of the xi's that

    P(xi+1, xi, . . . x1) = P(xi+1|pai+1)P(xi|pai) · · · P(x1|pa1).     (1.7)

There are two cases.
Case 1: For this combination of values

    P(xi, xi−1, . . . x1) = 0.     (1.8)

Then

    P(xi+1, xi, . . . x1) = 0.

Furthermore, due to Equality 1.8 and the induction hypothesis, there is some k, where 1 ≤ k ≤ i, such that P(xk|pak) = 0. So Equality 1.7 holds.
Case 2: For this combination of values

    P(xi, xi−1, . . . x1) ≠ 0.

In this case,

    P(xi+1, xi, . . . x1) = P(xi+1|xi, . . . x1)P(xi, . . . x1)
                          = P(xi+1|pai+1)P(xi, . . . x1)
                          = P(xi+1|pai+1)P(xi|pai) · · · P(x1|pa1).
The first equality is due to the rule for conditional probability, the second is due
to the Markov condition and the fact that X1 , . . . Xi are all nondescendents of
Xi+1 , and the last is due to the induction hypothesis.
Example 1.27 Recall that the joint probability distribution in Example 1.25
satisfies the Markov condition with the DAG in Figure 1.3 (a). Therefore, owing
to Theorem 1.4,
    P(v, s, c) = P(v|c)P(s|c)P(c),     (1.9)
and we need only determine the conditional distributions on the right in Equality
1.9 to uniquely determine the values in the joint distribution. We illustrate that
this is the case for v1, s1, and c1:
    P(v1, s1, c1) = P(One ∩ Square ∩ Black) = 2/13

and

    P(v1|c1)P(s1|c1)P(c1) = (3/9)(6/9)(9/13) = 2/13.
The joint probability distribution in Example 1.25 also satisfies the Markov
condition with the DAGs in Figures 1.3 (b) and (c). Therefore, the probability
distribution in that example equals the product of the conditional distributions
for each of them. You should verify this directly.
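That verification can be done mechanically. The sketch below reuses the object counts reconstructed from Example 1.8 (with C the color, S the shape, and V the value) and checks that P(v, s, c) = P(v|c)P(s|c)P(c) for every combination of values:

from fractions import Fraction
from itertools import product

# (color, shape, value) counts reconstructed from Example 1.8.
counts = {
    ('black', 'square', '1'): 2, ('black', 'square', '2'): 4,
    ('black', 'other',  '1'): 1, ('black', 'other',  '2'): 2,
    ('white', 'square', '1'): 1, ('white', 'square', '2'): 1,
    ('white', 'other',  '1'): 1, ('white', 'other',  '2'): 1,
}
n = sum(counts.values())            # 13

def P(c=None, s=None, v=None):
    match = lambda key: ((c is None or key[0] == c) and
                         (s is None or key[1] == s) and
                         (v is None or key[2] == v))
    return Fraction(sum(cnt for key, cnt in counts.items() if match(key)), n)

for c, s, v in product(['black', 'white'], ['square', 'other'], ['1', '2']):
    joint      = P(c=c, s=s, v=v)
    factorized = (P(v=v, c=c) / P(c=c)) * (P(s=s, c=c) / P(c=c)) * P(c=c)
    assert joint == factorized       # the factorization of Equality 1.9 holds exactly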
If the DAG in Figure 1.3 (d) and some probability distribution P satisfied
the Markov condition, Theorem 1.4 would imply
Figure 1.5: A Bayesian network in which C is the parent of V and S, with P(c1) = 9/13 and P(c2) = 4/13 specified at the root.
Theorem 1.5 Let a DAG G be given in which each node is a random variable,
and let a discrete conditional probability distribution of each node given values of
its parents in G be specified. Then the product of these conditional distributions
yields a joint probability distribution P of the variables, and (G, P ) satisfies the
Markov condition.
Proof. Order the nodes according to an ancestral ordering. Let X1 , X2 , . . . Xn
be the resultant ordering. Next define

    P(x1, x2, . . . xn) = P(xn|pan)P(xn−1|pan−1) · · · P(x2|pa2)P(x1|pa1),

where PAi is the set of parents of Xi in G and P(xi|pai) is the specified
conditional probability distribution. First we show this does indeed yield a joint
probability distribution. Clearly, 0 ≤ P (x1 , x2 , . . .xn ) ≤ 1 for all values of the
variables. Therefore, to show we have a joint distribution, Definition 1.8 and
Theorem 1.3 imply we need only show that the sum of P (x1 , x2 , . . . xn ), as the
variables range through all their possible values, is equal to one. To that end,
    Σ_{x1} Σ_{x2} · · · Σ_{xn−1} Σ_{xn} P(x1, x2, . . . xn)
        = Σ_{x1} Σ_{x2} · · · Σ_{xn−1} Σ_{xn} P(xn|pan)P(xn−1|pan−1) · · · P(x2|pa2)P(x1|pa1)
        = Σ_{x1} Σ_{x2} · · · Σ_{xn−1} [ Σ_{xn} P(xn|pan) ] P(xn−1|pan−1) · · · P(x2|pa2)P(x1|pa1)
        = Σ_{x1} Σ_{x2} · · · Σ_{xn−1} [1] P(xn−1|pan−1) · · · P(x2|pa2)P(x1|pa1)
        ⋮
        = Σ_{x1} [ Σ_{x2} P(x2|pa2) ] P(x1|pa1)
        = Σ_{x1} [1] P(x1|pa1) = 1.
Let
Dk = {Xk+1 , Xk+2 , . . . Xn }.
In what follows, Σ_{dk} means the sum as the variables in dk go through all
their possible values. Furthermore, notation such as x̂k means the variable has
a particular value; notation such as n̂dk means all variables in the set have
particular values; and notation such as pan means some variables in the set
may not have particular values. We have that
    P(x̂k|n̂dk) = P(x̂k, n̂dk) / P(n̂dk)

        = [ Σ_{dk} P(x̂1, x̂2, . . . x̂k, xk+1, . . . xn) ] / [ Σ_{dk ∪ {xk}} P(x̂1, x̂2, . . . x̂k−1, xk, . . . xn) ]

        = [ Σ_{dk} P(xn|pan) · · · P(xk+1|pak+1)P(x̂k|p̂ak) · · · P(x̂1|p̂a1) ] / [ Σ_{dk ∪ {xk}} P(xn|pan) · · · P(xk|pak)P(x̂k−1|p̂ak−1) · · · P(x̂1|p̂a1) ]

        = [ P(x̂k|p̂ak) · · · P(x̂1|p̂a1) Σ_{dk} P(xn|pan) · · · P(xk+1|pak+1) ] / [ P(x̂k−1|p̂ak−1) · · · P(x̂1|p̂a1) Σ_{dk ∪ {xk}} P(xn|pan) · · · P(xk|pak) ]

        = P(x̂k|p̂ak),

since each of the remaining sums equals 1 by the same telescoping argument used above.
Figure 1.6: A DAG containing random variables X, Y, and Z, along with specified conditional distributions.
Example 1.28 Suppose we specify the DAG G shown in Figure 1.6, along with
the conditional distributions shown in that figure. According to Theorem 1.5,
Note that the proof of Theorem 1.5 does not require that values in the
specified conditional distributions be nonzero. The next example shows what
can happen when we specify some zero values.
Example 1.29 Consider first the DAG and specified conditional distributions
in Figure 1.6. Because we have specified a zero conditional probability, namely
P (y1|x2), there are events in the joint distribution with zero probability. For
example,
Figure 1.7: A DAG in which W is a root with children X and Y, and Z has parents X and Y (as indicated by the conditioning sets of the distributions), together with these specified conditional distributions:

    P(w1) = .1          P(w2) = .9
    P(x1|w1) = 0        P(x2|w1) = 1
    P(x1|w2) = .6       P(x2|w2) = .4
    P(y1|w1) = .8       P(y2|w1) = .2
    P(y1|w2) = 0        P(y2|w2) = 1
    P(z1|x1,y1) = .3    P(z2|x1,y1) = .7
    P(z1|x1,y2) = .4    P(z2|x1,y2) = .6
    P(z1|x2,y1) = .1    P(z2|x2,y1) = .9
    P(z1|x2,y2) = .5    P(z2|x2,y2) = .5
Consider next the DAG and conditional distributions in Figure 1.7. Because P(x1|w1) = 0 and P(y1|w2) = 0, the event X = x1, Y = y1 has zero probability, which means Z can never be conditioned on it. This poses no problem; it simply means we have specified some meaning-
less values, namely P (zi|x1, y1). The Markov condition is still satisfied because
P(z|w, x, y) = P(z|x, y) whenever P(x, y) ≠ 0 (See the definition of conditional
independence for sets of random variables in Section 1.1.4.).
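Theorem 1.5 can be exercised on the conditional distributions listed for Figure 1.7. The sketch below assumes the DAG structure those distributions suggest (W a root with children X and Y, and Z with parents X and Y), multiplies the conditionals to obtain a joint distribution, and confirms both that the values sum to 1 and that P(X = x1, Y = y1) = 0, which is why the values P(z|x1, y1) are never actually used:

from itertools import product

P_w = {'w1': 0.1, 'w2': 0.9}
P_x_given_w = {'w1': {'x1': 0.0, 'x2': 1.0}, 'w2': {'x1': 0.6, 'x2': 0.4}}
P_y_given_w = {'w1': {'y1': 0.8, 'y2': 0.2}, 'w2': {'y1': 0.0, 'y2': 1.0}}
P_z_given_xy = {
    ('x1', 'y1'): {'z1': 0.3, 'z2': 0.7}, ('x1', 'y2'): {'z1': 0.4, 'z2': 0.6},
    ('x2', 'y1'): {'z1': 0.1, 'z2': 0.9}, ('x2', 'y2'): {'z1': 0.5, 'z2': 0.5},
}

# Theorem 1.5: the product of the conditionals yields a joint distribution.
joint = {}
for w, x, y, z in product(P_w, ['x1', 'x2'], ['y1', 'y2'], ['z1', 'z2']):
    joint[(w, x, y, z)] = (P_w[w] * P_x_given_w[w][x] *
                           P_y_given_w[w][y] * P_z_given_xy[(x, y)][z])

assert abs(sum(joint.values()) - 1.0) < 1e-12

# The event X = x1, Y = y1 has probability zero, so conditioning Z on it is meaningless.
p_x1_y1 = sum(p for (w, x, y, z), p in joint.items() if x == 'x1' and y == 'y1')
print(p_x1_y1)    # 0.0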
Example 1.30 Figure 1.8 shows a Bayesian network containing the probability
distribution discussed in Example 1.23.
Example 1.31 Recall the objects in Figure 1.2 and the resultant joint probability dis-
tribution P discussed in Example 1.25. Example 1.27 developed a Bayesian
network (namely the one in Figure 1.5) containing that distribution. Figure 1.9
shows another Bayesian network whose conditional distributions are obtained