Probability Theory
NORTH-HOLLAND SERIES IN
APPLIED MATHEMATICS
AND MECHANICS
EDITORS:
H. A. LAUWERIER
Institute of Applied Mathematics
University of Amsterdam
W. T. KOITER
Laboratory of Applied Mechanics
Technical University, Delft
VOLUME 10
1970
WAHRSCHEINLICHKEITSRECHNUNG
and
VALÓSZÍNŰSÉGSZÁMÍTÁS
English translation by
DR. LASZLO VEKERDI
PRINTED IN HUNGARY
PREFACE
One of the last works of Alfréd Rényi is presented to the reader in this
volume. Before his sudden death on the 1st February, 1970, he corrected
the first proof of the book, but he no longer had time for the final proofreading*
and for writing the preface he had planned.
This preface is, therefore, a brief memorial to a great mathematician,
mentioning a few features of Alfréd Rényi's professional career.
Professor Rényi lectured on probability theory at various universities
throughout an uninterrupted series of years, from 1948 till his untimely
death. His academic career started at the University of Debrecen and was
continued at the University of Budapest where he was professor of the
Chair of Theory of Probability. In the meantime he was invited lecturer for
shorter or longer terms in several scientific centres of the world. Thus he
was visiting professor at Stanford University, Michigan State University,
the University of Erlangen, and the University of North Carolina.
Besides his teaching activities, Professor Rényi was director of the Mathematical
Institute of the Hungarian Academy of Sciences for one and a half
decades. Under his direction the Institute developed into an important
research centre of the science of mathematics.
He participated in the editorial work of a number of journals. He was
the editor of Studia Scientiarum Mathematicarum Hungarica and a
member of the Editorial Board of: Acta Mathematica, Annales Sci. Math.,
Publicationes Math., Matematikai Lapok, Zeitschrift für Wahrscheinlichkeitstheorie,
Journal of Applied Probability, Journal of Combinatorial
Analysis, Information and Control.
The careful reader will certainly note how the long teaching experience
and keen interest in research are amalgamated in the present book. The
material of Professor Rényi's courses on probability theory was first published
in the form of lecture notes. It appeared as a book in Hungarian in
1954, and completely revised in German translation in 1962. The latter
book was the basis of a new Hungarian edition in 1965 and the French
* This was done by Mr. P. Bártfai, Mrs. A. Földes and Mrs. L. Rejtő.
CONTENTS
§ 1. Fundamental relations
§ 2. Some further operations and relations
§ 3. Axiomatical development of the algebra of events
§ 4. On the structure of finite algebras of events
§ 5. Representation of algebras of events by algebras of sets
§ 6. Exercises
TABLES
REMARKS AND BIBLIOGRAPHICAL NOTES
REFERENCES
AUTHOR AND SUBJECT INDEX
CHAPTER I
ALGEBRAS OF EVENTS
§ 1. Fundamental relations
$\overline{\overline{A}} = A.$ (1)
Let A denote the event that the hit lies in the upper half of the target and
B the one in the right side of it. In this case the statement "A and B both
occurred" means the fact that the hit lies in the right upper quadrant of the target
(Fig. 1).
Fig. 1 (the events A, B and their product AB)
AB = BA. (2)
Also obviously,
AA = A, (3)
and
A(BC) = (AB)C. (4)
Instead of A(BC) therefore we can write simply ABC. Clearly, the event
AB can occur only if A and B do not exclude each other. If A and B are
mutually exclusive, AB is an impossible event. It is useful to consider the
impossible event as an event too. It will be denoted by O. The fact, that A
and B are mutually exclusive, is thus expressed by AB = O. Since an event
and the complementary event obviously exclude each other, we have
AĀ = O. (5)
If A and B are two events of an algebra of events, one may ask whether
at least one of the events A and B did occur. Let A denote the event that the
hit lies in the upper half of the target and B the event that it lies in the right
half; the statement, that at least one of the events A and B occurred, means
then that the hit does not lie in the left lower quadrant of the target (Fig. 2).
The event occurring exactly when at least one of the events A and B occurs,
is said to be the sum of A and B and is denoted by A + B. It is easy to see
that
A + B = B + A (6)
A + (B + C) = (A + B) + C (7)
A + B = A + ĀB, (8)
where the two terms on the right hand side are now mutually exclusive.
By applying relation (8), every sum of events can be transformed in such
a way that the terms of the sum become pairwise mutually exclusive.
Clearly the formula
A + A = A (9)
is valid; furthermore
A + Ā = I. (10)
We agree further that
Ī = O, Ō = I, (11)
i.e. that the event complementary to the sure event is the impossible event
O and conversely.
Evidently, the following relations are also valid:
AO = O, (12)
A + O = A, (13)
AI = A, (14)
A + I = I. (15)
A(B + C) = AB + AC. (16)
From (14), (15), and (16) we obtain
A + AB = A; (17)
indeed,
A + AB = AI + AB = A(I + B) = AI = A.
Clearly, rule (17) can be verified directly as well; the direct verification is,
however, clumsy for some complicated relations, while by applying the formal
rules of operation one can readily get a formal proof. This is the reason
why the algebra of events is useful; it is therefore advisable to acquire a
certain practice in such formal proofs.
The distributive laws can be extended (just like in ordinary algebra) to
more than two terms. In the algebra of events there exists, however, still
another distributive law:
A + BC = (A + B) (A + C). (18)
Indeed, (A + B)(A + C) = A + AB + AC + BC = A + BC,
$\overline{AB} = \bar{A} + \bar{B},$ (19)
$\overline{A + B} = \bar{A}\bar{B}.$ (20)
The event $\overline{AB}$ occurs exactly if AB does not occur, hence if the events
A and B do not both occur; $\bar{A} + \bar{B}$ occurs exactly if A or B (or both) do
not occur. These two propositions evidently state the same thing; thus (19)
is valid. Formula (20) can be proved in the same way.
As to the rules of operation valid for the addition and multiplication
of events, we see that both have the same properties (commutativity, asso¬
ciativity, idempotency of every element) and that the relations between the
two kinds of rules of operation are symmetrical. Formulas (16) and (18)
are obtained from each other by interchanging everywhere the signs of
multiplication and addition. Such formulas are called dual to one another.
Thus for instance the relations
A + AB = A and A(A + B) = A
are dual to one another. Clearly, there exist relations which are, because of
their symmetry, self-dual; e.g. the relation
(A + B)(A + C)(B + C) = AB + AC + BC.
In what follows we write $\prod_{k=1}^{n} A_k$ instead of $A_1 A_2 \ldots A_n$ and $\sum_{k=1}^{n} A_k$ instead of $A_1 + A_2 + \cdots + A_n$.
B − A = BĀ. (1)
they are the two distributive laws of subtraction. Using the subtraction,
the complementary event may be written in the form
Ā = I − A. (3)
Fig. 3 (the events A − B and B − A)
The subtraction does not satisfy all the rules of operation known from
ordinary algebra. Thus for instance (A — B) + B is in general not equal
to A; further A + (B — C) is not always identical to (A + B) — C. Hence,
if in relations between events there figures the sign of subtraction too, the
brackets are not to be omitted without any consideration. There are, however,
cases when this omission is allowed, e.g.
A - (B + C) = (A - B) - C. (4)
The event A-B occurs exactly if A does and B does not occur; in the
same way, B — A occurs if B does but A does not occur. The meaning of the
expression (A - B) + (B - A) is therefore not O, but the event which
consists of the occurrence of one and only one of the events A and B. It is
reasonable to introduce for this event a new symbol. We put
(A − B) + (B − A) = A Δ B. (5)
1. O ⊆ A.
2. A ⊆ I.
3. A ⊆ A.
4. A ⊆ B and B ⊆ C imply A ⊆ C.
5. A ⊆ B and B ⊆ A imply A = B.
6. A ⊆ A + B.
7. AB ⊆ A.
8. A ⊆ B implies A = AB.
9. A ⊆ C and B ⊆ C imply A + B ⊆ C.
10. C ⊆ A and C ⊆ B imply C ⊆ AB.
11. A ⊆ B implies B = A + BĀ.
12. A ⊆ B implies B̄ ⊆ Ā.
13. A ⊆ B implies AC ⊆ BC.
14. A ⊆ B implies A + C ⊆ B + C.
15. AB = O and C ⊆ A imply BC = O.
B = BI = B(A + Ā) = BA + BĀ = A + BĀ.
From this it follows that for the validity of the relation A ⊆ B the validity
of one of the relations A = AB and B = A + BĀ is necessary and sufficient.
The latter relation can be stated in the following form: for the validity of
A ⊆ B a necessary and sufficient condition is the existence of a C such that
AC = O and B = A + C; indeed, from this it follows directly C = BĀ.
are valid. For instance {A, Ā} is a complete system of events, provided that
A ≠ O and A ≠ I.
AA = A (1.1)
AB = BA (1.2)
A(BC) = (AB)C (1.3)
A + A = A (2.1)
A + B = B + A (2.2)
A + (B + C) = (A + B) + C (2.3)
A(B + C) = AB + AC (3.1)
A + BC = (A + B)(A + C) (3.2)
AĀ = O (4.1)
A + Ā = I (4.2)
AI = A (5.1)
A + O = A (5.2)
AO = O (5.3)
A + I = I (5.4)
It is to be noted that these axioms are not all mutually independent; thus
for instance (3.2) can be deduced from the others. It is, however, not our
aim to examine here which axioms could be omitted from the system.
The totality of the outcomes of an experiment forms a Boolean algebra,
if we understand by the product AB of two events A, B the joint occurrence
of both events and by the sum A + B of two events the occurrence of at
least one of the two events; further, if we denote by Ā the event complementary
to A and by O and I the impossible and the sure events, respectively.
Indeed, the above 14 axioms are fulfilled in this case. More generally, every
subset of the set of the outcomes of an experiment is a Boolean algebra if it
contains the sure event, further for every event A its complementary event
Ā, and for every A and B the events AB and A + B.
Clearly, one can find other Boolean algebras as well. Thus for instance,
the totality of the subsets of a set H is also a Boolean algebra. We define the
sum of two sets as the union of the two sets and their product as the intersection
of the two sets. Let I mean the set H itself and O the empty set,
further Ā the set complementary to A with respect to H and thus B − A
the set complementary to A with respect to B. A direct verification of each
axiom shows that this system is indeed a Boolean algebra.
There exists a close connection between Boolean algebras of events and
algebras of sets. In our example of the target this connection is clearly vis¬
ible. This analogy between a Boolean algebra of sets and an algebra of
events has an important role in the calculus of probability.
In order to obtain a Boolean algebra, it is not necessary to consider all
subsets of a set. A collection T of the subsets of a set H is said to be an algebra
of sets, if addition can always be carried out in it, if H itself belongs to
T, and if for a set A its complementary set Ā = H − A belongs to T as well;
i.e. if the following conditions are satisfied:¹
1. H ∈ T.
2. A ∈ T, B ∈ T implies A + B ∈ T.
3. A ∈ T implies Ā ∈ T.
¹ The notation a ∈ M means here and in the following that a belongs to the set
M; a ∉ M means that a does not belong to the set M.
A = B + C, B ≠ A, C ≠ A.
A = AB + AB̄,
since the number of events is finite. It is evident from this proof that all
the A_i's are distinct. If not already known, this could easily be shown because
of the rule A + A = A and the commutativity of the addition. (The
deduction used above to prove the representability of B as a sum of elementary
events is nothing else than the so-called "descente infinie" known
from number theory.) It remains still to prove the uniqueness of the representation
(1). If there existed two essentially different representations
I = A₁ + A₂ + … + Aₙ. (1)
Thus always one and only one of the elementary events A₁, A₂, …, Aₙ occurs.
The elementary events form a complete system of events.
Consider now as an example the algebra of events which consists of the
possible outcomes of a game with two dice. Clearly, the number of elemen¬
tary events is 36; let us denote them by A_{ij} (i, j = 1, 2, …, 6), where A_{ij}
means that the result for the first die is i, that for the second, j. According
to Theorem 2 the number of events of this algebra of events is
2³⁶ = 68 719 476 736. It would thus be impossible to discuss all cases.
We choose therefore another example, namely the tossing a coin twice.
The possible elementary events are: 1. first head, second head as well (let
A₁₁ denote this case); 2. first head, next tail, denoted by A₁₂; 3. first tail,
next head, denoted by A₂₁; 4. first tail, second also tail, notation: A₂₂. The
number of all possible events is 2⁴ = 16. These are: I, O, the four elementary
events, further the six sums A₁₁ + A₁₂, A₁₁ + A₂₁, A₁₁ + A₂₂, A₁₂ + A₂₁,
A₁₂ + A₂₂, A₂₁ + A₂₂, and besides these the four events Ā₁₁, Ā₁₂, Ā₂₁, Ā₂₂
complementary to the elementary events.
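The sixteen events of this little algebra can also be enumerated mechanically; the following Python fragment (a sketch with illustrative names) lists every sum of elementary events:

```python
from itertools import combinations

# The four elementary events of two coin tosses: HH, HT, TH, TT.
elementary = ["A11", "A12", "A21", "A22"]

# Every event is a sum (union) of elementary events, so there are 2**4 = 16:
# O, the 4 elementary events, 6 two-term sums, the 4 complements, and I.
events = [frozenset(s) for r in range(5)
          for s in combinations(elementary, r)]
print(len(events))   # 16
```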
section A'B' of A' and B'; further that the collection of elementary events
from which the event A + B is composed is equal to the union A' + B'
of the sets A' and B'. In this assignment of events to sets, the elementary
events themselves correspond to the sets having only one element. Obviously
the empty set corresponds to the impossible event. To the sure event corresponds
the set of all possible elementary events (with respect to the same
experiment); this set will be denoted by H and will be called the sample
space. Further, it is easy to show that to the complementary event Ā corresponds
the complementary set of A′ with respect to H.
In the following paragraph we shall show that to every algebra of events
corresponds an algebra of the subsets of a set H such that there corresponds
to the event A + B the union of the sets belonging to A and B and to the
product AB the intersection of the sets belonging to A and B, finally to the
complementary event A the complementary set with respect to H of the
set belonging to A. In other words, one can find to every algebra of events
an algebra of sets which is isomorphic to it.
The proof of this theorem, due to Stone [1], is not at all simple; it will
be accomplished in the next paragraph; on first reading it can be omitted,
since Stone’s theorem will not be used in what follows. We give the proof
only in order to show that the basic assumption of Kolmogorov’s theory,
i.e. that events can always be represented by sets, does not restrict the
generality in any way.
In the case of a finite algebra of events this fact was already established
by means of Theorem 1. Here we even have a uniquely determined event
corresponding to every subset of the sample space.
The theory of Boolean algebras is a particular case of the theory of more
general structures called lattices (cf. e.g. G. Birkhoff [1]).
a) I ∈ α.
b) A ∈ α implies Ā ∉ α, and conversely.
c) If A + B ∈ α, then A or B belongs to α.
¹ In lattice theory such systems are called ultrafilters; ultrafilters are commonly
characterized as sets complementary to prime ideals. A nonempty subset β of a Boolean
algebra 𝒜 is called a prime ideal, if the following conditions are fulfilled: 1. A ∈ β
and B ∈ β imply A + B ∈ β. 2. A ∈ β and arbitrary B imply AB ∈ β. 3. If AB ∈ β, then
A ∈ β or B ∈ β (or both). Cf. e.g. G. Aumann [1].
Let us return to the proof of Lemma 2. Let 𝒟 denote the set of those systems
of events β in the algebra of events which fulfil conditions 1 and 2
of the crowds of events. If β < γ means that β is a proper subset of γ, 𝒟 is
a partially ordered set. If A ≠ O, the set β = {A} consisting only of the
element A evidently fulfils conditions 1 and 2. According to Lemma 3 there
exists a maximal chain containing β = {A} as a subset. Let α denote the
union of the subsets γ belonging to this chain. Clearly, α is a crowd of events,
since it is the union of sets β fulfilling the rules 1 and 2 defining the crowds
of events. Therefore no element of the chain contains the event O and thus
α does not contain O either. Further, if B₁ and B₂ belong to α, they belong to
a subset β₁ respectively a subset β₂ of α. Since either β₁ ⊆ β₂ or the contrary
must hold, B₁ and B₂ belong both to β₁ or both to β₂, and the same holds for B₁B₂.
Thus B₁B₂ belongs to α as well. Further we see that α cannot be extended.
This is a consequence of the requirement that the chain be a maximal chain.
Lemma 2 is thus proved.
Now we can construct to every algebra of events a field of sets isomorphic
to it. Let $\mathcal{S}$ be the set of all crowds of events α of the algebra of events.
We assign to every event A the subset $\mathcal{S}_A$ of $\mathcal{S}$ consisting of all crowds
of events α containing the event A. The set $\mathcal{S}_A$ will be called the representative
set of the event A. One can show that
$\mathcal{S}_{AB} = \mathcal{S}_A \mathcal{S}_B,$ (1)
$\mathcal{S}_{\bar{A}} = \overline{\mathcal{S}_A},$ (2)
$\mathcal{S}_{A+B} = \mathcal{S}_A + \mathcal{S}_B.$ (3)
Thus we have proved that the system of the representative sets is an algebra
of sets. In order to show that it is isomorphic to the algebra of events it
still remains to prove that the correspondence A → $\mathcal{S}_A$ is one-to-one. If
A ≠ B, we have A Δ B ≠ O. Hence at least one of the relations $\bar{A}B$ ≠ O and
$A\bar{B}$ ≠ O is valid as well. Suppose that $\bar{A}B$ ≠ O. Because of (1),
$\mathcal{S}_{\bar{A}B} = \mathcal{S}_{\bar{A}} \mathcal{S}_B$. Hence every crowd
of events belonging to $\mathcal{S}_{\bar{A}B}$ belongs to $\mathcal{S}_{\bar{A}}$ and also to $\mathcal{S}_B$, hence it belongs
to $\mathcal{S}_B$ and does not belong to $\mathcal{S}_A$. Thus we proved the existence of crowds of
events which belong to $\mathcal{S}_B$ but do not belong to $\mathcal{S}_A$. Hence $\mathcal{S}_B$ and $\mathcal{S}_A$
are different sets.
§ 6. Exercises
1. Prove
a) $\bar{A}\bar{B} + \bar{C}\bar{D} = \overline{(A + B)(C + D)}$,
b) $(A + B)(\bar{A} + B) + (A + \bar{B})(\bar{A} + \bar{B}) = I$,
c) $(A + B)(\bar{A} + B)(A + \bar{B})(\bar{A} + \bar{B}) = O$,
d) (A + B)(A + C)(B + C) = AB + AC + BC,
e) A − BC = (A − B) + (A − C),
f) A − (B + C) = (A − B) − C,
g) (A − B) + C = [(A + C) − B] + BC,
h) (A − B) − (C − D) = [A − (B + C)] + (AD − B),
i) A − {A − [B − (B − C)]} = ABC,
j) ABC + ABD + ACD + BCD =
= (A + B)(A + C)(A + D)(B + C)(B + D)(C + D),
k) A + B + C = (A − B) + (B − C) + (C − A) + ABC,
l) A Δ (B Δ C) = (A Δ B) Δ C,
m) (A + B) Δ (Ā + B̄) = Ā Δ B,
n) AB̄ Δ BĀ = A Δ B.
o) Prove the relations enumerated in § 2 (6) for the symmetric difference.
p) The relation (A + B) − B = A does not hold in general. Under what conditions
is it valid?
q) Prove that A Δ B = C Δ D implies A Δ C = B Δ D.
2. The elements of a Boolean algebra form a ring with respect to the operations
of symmetric difference and multiplication. The zero element is O, the unit element I.
3. In a finite algebra of events containing n elementary events one can give several
complete systems of events. Complete systems of events differing only in the order
of the terms are to be considered as identical. Let Tₙ denote the number of the different
complete systems of events.
a) Prove that T₁ = 1, T₂ = 2, T₃ = 5, T₄ = 15, T₅ = 52, T₆ = 203.
b) Prove the recursion formula
$T_{n+1} = \sum_{k=0}^{n} \binom{n}{k} T_k \qquad (T_0 = 1).$
c) Prove that
$T_n = \frac{1}{e} \sum_{k=1}^{\infty} \frac{k^n}{k!},$
further that
$1 + \sum_{n=1}^{\infty} \frac{T_n x^n}{n!} = e^{e^x - 1}.$
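(The numbers Tₙ are the so-called Bell numbers.) A short Python check of parts a)–c); the truncation of the infinite series at 50 terms is an arbitrary illustrative choice:

```python
from math import comb, e, factorial

# Recursion of part b): T[n+1] = sum over k of C(n, k) * T[k], with T[0] = 1.
T = [1]
for n in range(6):
    T.append(sum(comb(n, k) * T[k] for k in range(n + 1)))
print(T[1:])   # [1, 2, 5, 15, 52, 203], as stated in part a)

# Series of part c), truncated at 50 terms; it converges very fast.
n = 6
print(sum(k**n / factorial(k) for k in range(1, 50)) / e)   # ~203.0
```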
7. We can construct from the events A, B, and C by repeated addition and multiplication
eighteen, in general different, events, namely A, B, C, AB, AC, BC, A + B,
B + C, C + A, A + BC, B + AC, C + AB, AB + AC, AB + BC, AC + BC, ABC,
A + B + C, AB+ AC + BC. (The phrase “in general different” means here that
no two of these events are identical for all possible choices of the events A, B, C.)
Prove that from 4 events one can construct 166, from 5 events 7579 and from 6 events
7 828 352 events in this way. (No general formula is known for the number of events
which can be formed from n events.)
9. Verify that for the example of Exercise 8, our Theorem 1 is the same as the well-
known theorem on the unique representability of (square-free) integers as a product
of prime numbers.
10. The numbers 0, 1, …, 2ⁿ − 1 form a Boolean algebra if the rules of operation
are defined as follows: represent these numbers in the binary system. We understand
by the "product" of two numbers the number obtained by multiplying the corresponding
digits of both numbers place for place; by the sum the number obtained
by adding the digits place for place and by replacing everywhere the digit 2 obtained
in the course of addition by 1.
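A sketch in Python, assuming the bitwise reading of these rules (digit-wise product = AND, digit-wise sum with 2 replaced by 1 = OR, complement = digit-wise difference from 2ⁿ − 1):

```python
n = 4
I = (1 << n) - 1        # 0b1111, the unit element
O = 0                   # the zero element

A, B = 0b0110, 0b0101
print(bin(A & B))       # "product": multiply digits place for place
print(bin(A | B))       # "sum": add digits, replacing 2 by 1
comp_A = I ^ A          # complement of A
assert A & comp_A == O and A | comp_A == I
```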
11. Let A, B, C denote electric relays or networks of relays. Any two of these
may be connected in series or in parallel. Two such networks which are either closed
both (allowing current to pass) or both open (not allowing current to pass) are con¬
sidered as equivalent. Let A + B denote that A and B are coupled in parallel, AB
that they are coupled in series. Let A denote a network always closed if A is open
and conversely. Let O denote a network allowing no current to pass and / a network
always closed. Prove that all axioms of Boolean algebras are fulfilled.1
Hint. The relation (A + B)C = AC + BC has for instance the meaning that it comes
to the same thing to connect first A and B in parallel and couple the network so obtained
with C in series, or to couple first A and C in series, then B and C in series, and then
the two systems so obtained in parallel. Both systems are equivalent to each other in
the sense that they either both allow to pass the current or both do not. A similar
consideration holds for the other distributive law. Both distributive laws are illustrated
in Fig. 5.
Fig. 5 ((A + B)C = AC + BC)
12. A domain of the plane is said to be convex if it contains for any two of its
points the segment connecting these points as well. We understand by the “sum”
of two convex domains the least convex domain containing both, by their “product”
their intersection which, evidently, is convex as well. Let further / denote the entire
plane and O the empty set. The addition and multiplication fulfil axioms (1.1)–(2.3);
the distributive laws are, however, not valid and the complement Ā is not defined.
13. Let us understand by a linear form a point, a line or a plane of the 3-dimensional
affine space; further let the empty set and the entire 3-dimensional space be called
linear forms too. We define as the sum of a finite number of linear forms the least linear
form containing their set theoretical union; let their product be their (set theoretical)
intersection, which is evidently a linear form too. Prove the same propositions as in
Exercise 12.
¹ This example shows how Boolean algebra can be applied in the theory of networks
and why it is of great importance in communication theory and in the construction
of computers (cf. e.g. M. A. Gavrilov [1]).
14. Let A₁, A₂, …, Aₙ be arbitrary events. Form all products of these events
containing k distinct factors, and let S_k be the sum of all these products. Let P_k be the
product of all events representable as a sum of k distinct terms of A₁, A₂, …, Aₙ.
Prove the relation
$S_k = P_{n-k+1} \qquad (k = 1, 2, \ldots, n)$
in a formal way by applying the rules of operation of Boolean algebras and verify
it directly too (cf. the generalization of Exercise 1d).
Hint. S_k has the meaning that among the events A₁, A₂, …, Aₙ there are at least
k which occur, and the meaning of $P_{n-k+1}$ is that among these same events there are
no n − k + 1 which do not occur; these two statements are equivalent.
16. Show that condition b) of the preceding exercise cannot be replaced by b′)
"whenever A and B belong to T, A Δ B belongs to T as well".
Hint. Let H be a finite set of the elements a, b, c, d and let T consist of the following
subsets of H: {a, b}; {c, d}; {a, c}; {b, d}; O; H.
20. We call a nonempty system ℛ of subsets of a set H that contains with two sets A and
B also A + B and A − B, a ring of sets. A ring of sets ℛ is thus an algebra of sets
if and only if H belongs to ℛ. Prove that a nonempty system of sets containing with
A and B also AB and A − B is not necessarily a ring of sets. Show that the condition
"with two sets A and B, A + B and AB belong as well to 𝒮" is not sufficient for 𝒮
to be a ring of sets either.
CHAPTER II
PROBABILITY
with the same accuracy as most of the “deterministic” laws of nature. The
radioactive disintegration is thus a mass phenomenon described, as to its
regularity, by the theory of probability.
As seen in the above example, phenomena described by a stochastic
scheme are also subject to natural laws. But in these cases the complex of
the considered circumstances does not determine the exact course of the
events; it determines a probability law, giving a bird's-eye view of the outcome.
Probability theory aims at the study of random mass phenomena; this
explains its great practical importance. Indeed, we encounter random mass
phenomena in nearly all fields of science, industry, and everyday life.
Almost every "deterministic" scheme of the sciences turns out to be stochastic
on closer examination. The laws of Boyle, Mariotte, and Gay-Lussac,
for instance, are usually considered to be deterministic laws. But the
pressure of the gas is caused by the impacts of the molecules of the gas on
the walls of the container. The mean pressure of the gas is determined by
the number and the velocity of the molecules hitting the wall of the container
per time unit. In fact, the pressure of the gas shows small fluctuations,
which may, however, be neglected in the case of larger masses of gas. As another
example consider the chemical reaction of two substances A and B in an
aqueous solution. As is well known, the velocity of the reaction is at every
instant proportional to the product of the concentrations of A and B. This
law is commonly considered as a causal one, but in reality the situation is
as follows. The atoms (respectively the ions) of the two substances move
freely in the solution. The average number of "encounters" of an ion
of substance A with an ion of substance B is proportional to the product of
their concentrations; hence this law turns out to be essentially a stochastic
one too.
The development of modern science often makes it necessary to examine
small fluctuations in phenomena dealt with earlier only in outline
and considered at that level as causal. In what follows, we shall find several
occasions to illustrate these questions of principle with concrete examples.
ning of our century K. Pearson obtained from 24 000 tossings the value
0.5005 for the relative frequency.
There are thus random events showing a certain stability of the relative
frequency, i.e. the latter fluctuates about a well-determined value and the
more trials are performed, the smaller are, generally, the fluctuations. The
number, about which the relative frequency of an event fluctuates, is called
the probability of the event in question. Thus, for instance, the probability
§ 3. Probability algebras
a number lying between zero and one. Evidently, the probability of an event
must therefore lie between zero and one as well. It is further clear that the
relative frequency of the sure event is equal to one and that of the impossible
event is equal to zero. Hence also the probability of the “sure” event must
be equal to one and that of the “impossible” event equal to zero. If A and B
are two possible outcomes of the same experiment which mutually exclude
each other and if in n performances of the same experiment the event A
occurred kA times and the event B kB times, then clearly the event A + B
occurred k_A + k_B times. Hence, denoting by f_A, f_B and f_{A+B} the relative
frequencies of A, B, and A + B, respectively, we have
$f_{A+B} = f_A + f_B.$
In other words, the relative frequency of the sum of two mutually exclusive
events is always equal to the sum of the relative frequencies of these events.
Hence also the probability of the sum of two mutually exclusive events must
be equal to the sum of the probabilities of the events. We therefore take the
following axioms:
α) 0 ≤ P(A) ≤ 1;
β) P(I) = 1;
γ) if AB = O, then P(A + B) = P(A) + P(B).
From these axioms it follows that for every event A
P(A) + P(Ā) = 1
holds, corresponding to the relation
f_A + f_Ā = 1
for the relative frequencies.
Axiom γ) states that the probability of the sum of two mutually exclusive
events is equal to the sum of the probabilities of the two events. This leads
immediately to the fact that the probabilities of a complete system of events
add up to 1. Indeed, by assumption
A₁ + A₂ + … + Aₙ = I and A_i A_j = O (i ≠ j)
Theorem 6. If A ⊆ B, then P(A) ≤ P(B).
Proof. Because of
A Δ B = (A − B) + (B − A)
and
(A − B)(B − A) = O
we find
P(A Δ B) = P(A − B) + P(B − A),
$S_k^{(n)} = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} P(A_{i_1} A_{i_2} \cdots A_{i_k}),$
$P_r^{(n)} = \sum_{k=0}^{n-r} (-1)^k \binom{r+k}{k} S_{r+k}^{(n)} \qquad (r = 0, 1, \ldots, n), \quad (3)$
where $S_0^{(n)} = 1$ and
Theorem 11. Let A₁, A₂, …, Aₙ be n arbitrary events and 𝒜 the algebra
of events of all events expressible in terms of the events A_k. Let c₁, c₂, …, c_m
be real numbers and B₁, B₂, …, B_m a sequence of events such that B_k ∈ 𝒜
(k = 1, 2, …, m). The inequality
$\sum_{k=1}^{m} c_k P(B_k) \ge 0 \quad (6)$
holds in every probability algebra if and only if it holds in all cases when the
sequence of numbers P(A_j) consists of zeros and ones only.
¹ The events A_k (k = 1, …, n) are here considered as variables (indefinite events);
thus no relations are assumed between the A_k's.
$\lambda_\omega = \sum_{\omega \subseteq B_k} c_k,$
where the summation is over those values of k for which ω figures in the
representation of B_k. Since the nonnegative numbers P(ω) are submitted
to the only condition $\sum_\omega P(\omega) = 1$, (6) holds, in general, if and only if all
numbers $\lambda_\omega$ are nonnegative. But when the sequence of numbers P(A_j)
(j = 1, 2, …, n) consists of nothing but zeros and ones, one and only one
of the elementary events ω has probability 1 and all the others have probability
0. Thus the proposition that $\lambda_\omega \ge 0$ for all ω is equivalent to the proposition
that (6) is valid whenever the sequence of numbers P(A_k) consists
of zeros and ones only. Theorem 11 is now proved.
From Theorem 11 follows immediately
Theorem 12. If A₁, A₂, …, Aₙ are arbitrary events and B₁, B₂, …, B_m
are certain events expressible by A₁, A₂, …, Aₙ, then the relation
$\sum_{k=1}^{m} c_k P(B_k) = 0 \quad (7)$
holds in every probability algebra, if and only if it holds in all cases when the
sequence of numbers P(A_k) consists of zeros and ones only.
m
Proof. Apply Theorem 11 in turn to the inequalities £ ckP(Bk)> 0
i
tn
and Yj (—cf)P(Bk) > 0. Theorem 12 follows immediately.
/c = 1
Now we can prove Theorem 10 (and thus also Theorem 9). If l of
the numbers P(A_k) are equal to 1 and the remaining n − l are equal to
0 (l = 0, 1, 2, …, n), then (3) is reduced to the identity
$\sum_{k=0}^{n-r} (-1)^k \binom{r+k}{k} \binom{l}{r+k} = \begin{cases} 1, & \text{if } l = r, \\ 0, & \text{if } l \ne r. \end{cases} \quad (8)$
For l < r all terms of the left hand side of (8) are equal to 0; for l = r only
the term k = 0 is distinct from zero, namely 1; and for l > r the sum can be
transformed as follows:
$\sum_{k=0}^{n-r} (-1)^k \binom{r+k}{k} \binom{l}{r+k} = \binom{l}{r} \sum_{k=0}^{l-r} (-1)^k \binom{l-r}{k} = \binom{l}{r} (1-1)^{l-r} = 0.$
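Formula (3) can also be checked numerically by brute force. The following Python sketch (random events over a finite sample space of equally likely points; all names and parameters are illustrative) compares both sides for every r:

```python
from itertools import combinations
from math import comb
import random

random.seed(1)
n, N = 4, 200
# Random events A_1, ..., A_n over a sample space of N equally likely points.
A = [set(random.sample(range(N), random.randint(20, 120))) for _ in range(n)]
P = lambda s: len(s) / N

# S_k = sum of P(A_{i_1} ... A_{i_k}) over all k-tuples; S_0 = 1.
S = [1.0] + [sum(P(set.intersection(*c)) for c in combinations(A, k))
             for k in range(1, n + 1)]

for r in range(n + 1):
    exactly_r = sum(1 for w in range(N)
                    if sum(w in a for a in A) == r) / N
    rhs = sum((-1) ** k * comb(r + k, k) * S[r + k] for k in range(n - r + 1))
    assert abs(exactly_r - rhs) < 1e-9   # formula (3)
print("formula (3) verified for n =", n)
```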
Put $C_r^{(n)} = S_r^{(n)} \big/ \binom{n}{r}$; then
$C_{r+1}^{(n)} \le C_r^{(n)} \qquad (r = 1, 2, \ldots, n-1), \quad (10)$
which may be verified by means of the identities $\binom{n-1}{r} \big/ \binom{n}{r+1} = \frac{r+1}{n}$ and $\binom{n-1}{r-1} \big/ \binom{n}{r} = \frac{r}{n}$.
The probability P(A) of any event A is then uniquely determined by the values
of P for the sets which consist of exactly one element. Let {ω_i} be the set
consisting of ω_i only and let further P({ω_i}) = p_i (i = 1, 2, …, N).
Then we have for each event A
$P(A) = \sum_{\omega_i \in A} p_i,$
where the p_i are nonnegative numbers with
$\sum_{i=1}^{N} p_i = 1.$
to each other, that is, if they are all equal to 1/N. These special probability algebras
are called classical probability algebras, since the classical calculus of
probability dealt exclusively with these algebras.
At the early stages of the development of probability theory one wished to
reduce the solution of any kind of problem to this case. But this often turned
out to be either too artificial or unnecessarily involved. Since, however, in
games of chance (tossing of a coin, games of dice, roulette, card games,
etc.) the probabilities can indeed be determined in this manner, and since
many problems of science and technology may be reduced to the study of
classical probability algebras, it is worthwhile to deal with them separately.
In the case of classical probability algebras we have
$P(A) = \frac{N_A}{N},$
where N_A denotes the number of elementary events contained in A.
Example 1a. A person having N keys in his pocket wishes to open his
apartment. He takes one key after the other from his pocket at random and
tries to open the door. What is the probability that he finds the right key
at the k-th trial? Suppose that the N! possible sequences of the keys all have
the same probability. In this case the answer is very simple indeed: N elements
have (N − 1)! permutations with a fixed element occupying the
k-th place. The probability in question is therefore $\frac{(N-1)!}{N!} = \frac{1}{N}$; that
is, the probability of finding the right key at the first, second, …, N-th trial,
respectively, is always 1/N. If the keys are on the same key-ring and if the
same key may be tried more than once, the answer is different and will be
dealt with later (cf. Ch. III, § 3, 7).
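A quick Monte Carlo check of this example (a sketch; N and the number of trials are arbitrary illustrative parameters):

```python
import random
from collections import Counter

random.seed(0)
N, trials = 5, 100_000
# Position of the right key (key 0) in a random ordering of the N keys.
counts = Counter(random.sample(range(N), N).index(0) + 1
                 for _ in range(trials))
for k in range(1, N + 1):
    print(k, counts[k] / trials)   # each relative frequency is close to 1/N
```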
Example 1b. An urn contains M red and N − M white balls. Balls are
drawn from the urn one after the other without replacement. What is the
probability of obtaining the first red ball at the k-th drawing? In order to
answer the question we have to determine the total number of all permutations
of M red and N − M white balls having for their first k − 1 balls white
ones and for the k-th a red ball. The first k − 1 balls may be chosen in
$\binom{N-M}{k-1}$ different ways; hence the number of such permutations is
$\binom{N-M}{k-1} (k-1)! \, M (N-k)!$, and the probability in question is this number
divided by N!.
Obviously, the special case M = 1 is equivalent to Example 1a.
In order to make the calculations easier, P_k may be written in the
following form:
$P_k = \frac{M}{N-k+1} \prod_{j=1}^{k-1} \left( 1 - \frac{M}{N-j+1} \right).$
If N and M are large in comparison to k, and M/N is denoted by p (0 < p < 1),
then P_k is approximately equal to $p (1-p)^{k-1}$.
Solution. A sample of n elements may be chosen from N elements in
$\binom{N}{n}$ different ways. Suppose that every such combination is equally probable.
Then the probability of every combination is $\binom{N}{n}^{-1}$. Therefore we have only
to count how many combinations contain just k of the rejects. k elements
can be chosen from M in $\binom{M}{k}$ different ways, and n − k elements
from N − M in $\binom{N-M}{n-k}$ different ways. Therefore the probability in question
is:
$P_k = \frac{\binom{M}{k} \binom{N-M}{n-k}}{\binom{N}{n}}.$
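In Python the probabilities P_k can be computed directly with math.comb; the following sketch (the parameter values are illustrative) also confirms that they sum to 1:

```python
from math import comb

def hypergeom(N, M, n, k):
    """P(a sample of n without replacement from N items, M of them rejects,
    contains exactly k rejects)."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

N, M, n = 50, 10, 8
probs = [hypergeom(N, M, n, k) for k in range(min(M, n) + 1)]
print(sum(probs))   # the cases k = 0, 1, ... exhaust all samples: sum = 1.0
```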
occupying the cell can be chosen in $\binom{n}{k}$ different ways, and the number of
arrangements of the remaining n − k objects in the other N − 1 cells is
$(N-1)^{n-k}$; hence
$W_k = \binom{n}{k} \frac{(N-1)^{n-k}}{N^n}.$
Such "problems of arrangements" are of paramount importance in statistical
mechanics. There it is usual to examine the arrangements of certain
kinds of particles (molecules, photons, electrons, etc.) in the “phase space”.
The meaning of this is the following: If every particle can be characterized
by K data, then there corresponds to the state of the particle in question a
point of the phase space, having for coordinates the data characterizing
the particle. In subdividing the phase space into K-dimensional parallelepipeds
(cells) the physical system can be described approximately by giving
the number of particles in each cell. The assumption that all arrangements
have equal probabilities leads to the so-called Maxwell-Boltzmann sta¬
tistics. This can be applied for instance in statistical mechanics to systems
of the molecules of a gas. But in the case of photons, electrons, and other
elementary particles we must proceed in a different way. For systems of
photons, for instance, the following model was found to be valid: in distributing
n objects into N cells two arrangements containing in each of the
cells the same number of objects are not to be considered as distinct. That
is, the objects are to be considered as not distinguishable and thus only
arrangements having different numbers of objects in the cells can be distinguished
from each other. This assumption leads to the so-called Bose-Einstein
statistics.
Next we calculate the number of possible arrangements under Bose-Einstein
statistics. This problem is the same as the following question: In how
many ways can n shillings be distributed among N persons? (Of course,
only the number of shillings obtained is of interest, the individuality of the coins
being irrelevant.) This number is equal to the number of combinations of
N things, n at a time, repetitions allowed, i.e. to $\binom{N+n-1}{n}$. Another solution
of our problem is the following: Let the n objects correspond to n collinear
points. Let these n points be subdivided by N − 1 separating lines. Every
configuration thus obtained corresponds to one possible arrangement.
Every two consecutive lines signify one cell and the number of the points
lying between two consecutive lines represents the number of objects in the
corresponding cell. If there are no points between two consecutive lines,
the corresponding cell is empty. Figure 8 gives a possible arrangement of
eight objects into six cells; here the first cell contains one, the second two
objects, the third cell is empty, the fourth contains three objects, the fifth
is empty, and in the sixth are two objects.
Fig. 8
The probability that a given cell contains exactly k objects is therefore
$\frac{\binom{N+n-k-2}{n-k}}{\binom{N+n-1}{n}}.$
If every cell may contain at most one object (the model valid e.g. for electrons,
the so-called Fermi-Dirac statistics), the probability that a given cell is
occupied is
$\frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{n}{N}.$
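A small numerical check of the Bose-Einstein occupancy distribution (Python sketch; N = 6 cells and n = 8 objects are chosen to match Fig. 8):

```python
from math import comb

N, n = 6, 8                              # cells and objects, as in Fig. 8
total = comb(N + n - 1, n)               # number of Bose-Einstein arrangements
# Probability that a fixed cell holds exactly k objects, all arrangements
# being assumed equally probable.
probs = [comb(N + n - k - 2, n - k) / total for k in range(n + 1)]
print(total, sum(probs))                 # 1287 1.0
```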
The theory discussed up to now can deal only with the most elementary
problems of probability; those involving an infinite number of possible
events are not covered by it. To deal with them we need Kolmogorov's
theory, which will now be discussed.
In Kolmogorov's probability theory we assume that there is given an
algebra of sets, isomorphic to the algebra of events dealt with. This assumption,
as we have seen, does not restrict the generality. We assume further
that this algebra of sets contains not only the sum of any two sets belonging
to it but also the sum of denumerably many sets belonging to the algebra
of sets. Algebras of sets with this property are called σ-algebras or Borel
algebras.
In Kolmogorov’s theory we therefore assume the following axioms:
then also $\prod_{k=1}^{\infty} A_k$ belongs to it.
¹ Here and in what follows the sign ⇒ stands for (logical) implication.
V. P(Ω) = 1.
$P(A) = \sum_{\omega_k \in A} p_k.$
In this paragraph we shall discuss some results of set theory and measure
theory, used in probability theory. We shall not aim at completeness. We
48 PROBABILITY [II, § 7
assume that the reader is familiar with the fundamentals of measure theory
and of the theory of functions of a real variable. Accordingly, proofs are
merely sketched or even omitted, especially if dealing with often-used considerations
of the theory of functions of a real variable.¹
We have seen already in Chapter I that every algebra of events is isomorphic
to an algebra of sets. It is always assumed in Kolmogorov's theory that
the sets assigned to the elements of the algebra of events form a σ-algebra.
Hence the algebra of sets constructed to the algebra of events must be extended
into a σ-algebra, if it is not already a σ-algebra itself. This extension
is always possible, even in the case of a ring of sets.
A system ℛ of subsets of a set Ω is called a ring of sets if it contains
with any two sets A and B the sets A − B and A + B as well.
The ring of sets ℛ is an algebra of sets iff² the set Ω belongs to ℛ. In fact,
an algebra of sets can be characterized as a system ℛ of subsets of a set
Ω having the following properties:
I. A ∈ ℛ and B ∈ ℛ ⇒ A − B ∈ ℛ.
II. A ∈ ℛ and B ∈ ℛ ⇒ A + B ∈ ℛ.
III. Ω ∈ ℛ.
Proof. Obviously, there exists a σ-ring ℛ′ containing ℛ. Such is, for instance,
the collection of all subsets of Ω. Let now ℛ* be the intersection
of all σ-rings containing ℛ. It is asserted:
a) µ*(O) = 0;
b) if A ∈ ℛ, then µ*(A) = µ(A);
c) $\mu^*\left(\sum_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} \mu^*(A_n).$
Let the ring ℛ be the collection of all sets consisting of a finite number of
intervals closed to the left and open to the right. µ(A) will be defined as
follows: if A consists of the pairwise disjoint half-open intervals [a_k, b_k),
a₁ < b₁ ≤ a₂ < b₂ ≤ … ≤ a_r < b_r, let then be
$\mu(A) = \sum_{k=1}^{r} \left( F(b_k) - F(a_k) \right), \quad (1)$
where F(x) is a non-decreasing function continuous from the left.
if for every sequence of sets $B_n \in \mathcal{R}$ with $B_{n+1} \subseteq B_n$, $\mu(B_n) < +\infty$ (n = 1, 2, …) and $\prod_{n=1}^{\infty} B_n = O$ (i.e. for every decreasing
sequence of sets B_n having the empty set as their intersection) the relation
$\lim_{n \to \infty} \mu(B_n) = 0$
holds.
Suppose namely that the sets $A_n \in \mathcal{R}$ are pairwise disjoint and $\sum_{n=1}^{\infty} A_n \in \mathcal{R}$; then
$\mu\left(\sum_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{n-1} \mu(A_k) + \mu\left(\sum_{k=n}^{\infty} A_k\right);$
thus, from the fact that the sets $B_n = \sum_{k=n}^{\infty} A_k$ satisfy the conditions
of Theorem 3, it follows that
$\mu\left(\sum_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mu(A_k).$
Conversely, whenever $B_n \in \mathcal{R}$, $B_{n+1} \subseteq B_n$, $\prod_{n=1}^{\infty} B_n = O$ hold, one has $B_1 = \sum_{n=1}^{\infty} (B_n - B_{n+1})$
Now we shall prove that the set function µ defined by (1) satisfies the conditions
of Theorem 3. Let therefore be $B_n \in \mathcal{R}$, $B_{n+1} \subseteq B_n$ (n = 1, 2, …)
repeated with the set $B_1' B_2$ (which may contain closed intervals as well besides
the half-open intervals) such that the sum of the increments of the function
F(x) belonging to the removed intervals be at most ε/8. In this way we
obtain a set $B_2'$ which consists of a finite number of closed intervals. By
continuing the procedure, we obtain a sequence of sets $B_n'$ having the following
properties: a) $B_n'$ consists of finitely many closed intervals; b) $B_{n+1}' \subseteq B_n'$;
These properties are, however, contradictory, since from a), b), c) follows
the existence of a number N such that, for every n > N, $B_n'$ is the empty set
(indeed, from $B_n' = B_1' B_2' \cdots B_n' \ne O$ for every n, the sets $B_n'$ being closed,
the relation $\prod_{n=1}^{\infty} B_n' \ne O$ would follow). But this contradicts d), and thus
we proved our statement that the set function µ defined by (1) is a measure
on ℛ.
According to Theorem 2 the definition of the measure µ can be extended
to all Borel subsets of the real axis. Thus we obtain on these sets a measure
µ such that for A = [a, b) the relation µ(A) = F(b) − F(a) is valid.
Especially, if
F(x) = 0 for x ≤ 0, F(x) = x for 0 < x ≤ 1, F(x) = 1 for 1 < x,
then the above procedure assigns to every subinterval [a, b) of the interval
[0, 1] the value b — a.
We have seen that µ is a complete measure determined on a σ-ring ℛ*,
which contains the σ-ring generated by ℛ. If F(x) has the special form mentioned
above, this measure is just the ordinary Lebesgue measure defined on the
interval [0, 1]. Any measure µ constructed by means of a function F(x)
satisfying the above conditions is called a Lebesgue-Stieltjes measure defined
on the real axis.
The same construction can be applied in cases of more than one dimension.
Let F(x₁, x₂, …, xₙ) be a function of the real variables x₁, x₂, …, xₙ
having the following properties:
1. F(x₁, x₂, …, xₙ) is in any one of its variables a non-decreasing function
continuous from the left.
2. $\lim_{x_k \to -\infty} F(x_1, x_2, \ldots, x_n) = 0$ (k = 1, 2, …, n) and $\lim F(x_1, x_2, \ldots, x_n) = 1$ if all the variables tend to +∞.
If I denotes the interval $a_k \le x_k < b_k$ (k = 1, 2, …, n), put
$\mu(I) = \Delta_{a_1}^{b_1} \Delta_{a_2}^{b_2} \cdots \Delta_{a_n}^{b_n} F(x_1, x_2, \ldots, x_n), \quad (3)$
where $\Delta_{a_k}^{b_k}$ denotes the operation of taking the difference of the values
at $x_k = b_k$ and $x_k = a_k$.
Let ℛ be the set of all subsets A of the n-dimensional space which can
be represented as the union of finitely many pairwise disjoint intervals
$I_1, I_2, \ldots, I_r$, and put
$\mu(A) = \sum_{k=1}^{r} \mu(I_k). \quad (4)$
then the extension of the set function µ(A) defined above leads to the ordinary
n-dimensional Lebesgue measure defined on the n-dimensional cube
0 ≤ x_k < 1 (k = 1, 2, …, n).
§ 8. Conditional probabilities
have $f_{AB} = \frac{k}{n}$. Finally, if $f_{A|B}$ denotes the conditional relative frequency
of A with respect to the condition B, i.e. the relative frequency of A among
those experiments in which B occurred, then $f_{A|B} = \frac{f_{AB}}{f_B}$.
Since $f_{AB}$ fluctuates around P(AB) and $f_B$ around P(B), the conditional
relative frequency $f_{A|B}$ will fluctuate for P(B) > 0 around $\frac{P(AB)}{P(B)}$. This
number shall be called the conditional probability of the event A with
respect to the condition B; it is assumed that P(B) > 0. The notation for
the conditional probability is P(A | B); thus we put
$P(A \mid B) = \frac{P(AB)}{P(B)}, \quad (1)$
provided that P(B) > 0. If P(B) = 0, formula (1) has no sense; the conditional
probability P(A | B) is thus defined only for P(B) > 0.¹ Formula (1) may be
expressed in words by saying that the conditional probability of an event A
with respect to the condition B is nothing else than the ratio of the probability
of the joint occurrence of A and B and the probability of B.
Equality (1) is (in contrast to the standpoint of many older textbooks)
neither a theorem nor an axiom; it is the definition of conditional
probability.² But this definition is not arbitrary; it is a logical consequence
of the concept of probability as the number about which the value of the
relative frequency fluctuates.
In the older literature of probability theory, as well as in some popularizations
of modern physics, one often finds the misleading formulation that the
probability of an event A changes because of the observation of the occurrence
of an event B. It is, however, obvious that P(A | B) and P(A) do not
differ because the occurrence of the event B was observed, but because of
the adjunction of the occurrence of event B to the originally given complex
of conditions.
Let us now state some examples.
Example 1. In the screening of pebbles one may ask what part of the
pebbles is small enough to pass through a sieve S_A, i.e. what is the probability
that a pebble chosen at random passes through the sieve S_A. Let this event
be denoted by A. Assume now that the pebbles were already sieved through
another sieve S_B, and the pebbles which did not pass through the sieve
S_B were separated. What is the probability that a pebble chosen at random
from those sieved through the sieve S_B will pass through the sieve S_A as
well? Let B denote the event that a pebble passes through S_B, and let the probability
of this event be denoted by P(B). Let further AB denote the event that
a pebble passes through both S_B and S_A, and P(AB) the corresponding
probability. Then the probability that a pebble chosen at random from
those which passed S_B will pass S_A as well is, according to the above,
$P(A \mid B) = \frac{P(AB)}{P(B)}.$
Example 2. Two dice are thrown, a red one and a white one. What is the
probability of obtaining two sixes, provided that the white die showed a six?
From a sheer mathematical point of view the conditional probability
P(A | B) may be considered as a new probability measure. Indeed, let Ω be
an arbitrary set, 𝒜 a σ-algebra of the subsets of Ω, and P a probability
measure (i.e. a nonnegative, completely additive set function satisfying
P(Ω) = 1). Further let B be a fixed element of 𝒜 such that P(B) > 0. Then
P(A | B) is a probability measure on 𝒜 as well, i.e. P(A | B) satisfies Axioms
IV, V, and VI. Indeed, by introducing the notation P*(A) = P(A | B), we
have for pairwise disjoint sets A_n
$P^*\left(\sum_{n=1}^{\infty} A_n\right) = \frac{P\left(\sum_{n=1}^{\infty} A_n B\right)}{P(B)} = \frac{\sum_{n=1}^{\infty} P(A_n B)}{P(B)} = \sum_{n=1}^{\infty} P^*(A_n).$
$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}; \quad (2)$
hence P(B | A) can be expressed by means of P(A | B), P(A), and P(B). One
can write (2) in the following form, equivalent to it:
$\frac{P(B \mid A)}{P(B)} = \frac{P(A \mid B)}{P(A)}. \quad (2')$
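A small numerical illustration of (2) in Python (a hypothetical diagnostic-test example; all numbers are invented for illustration):

```python
# B: a condition is present; A: a test for it comes out positive.
P_B = 0.01                # P(B)
P_A_given_B = 0.95        # P(A | B)
P_A_given_notB = 0.05     # P(A | not B)

# P(A) by splitting A over the complete system (B, not B).
P_A = P_A_given_B * P_B + P_A_given_notB * (1 - P_B)
P_B_given_A = P_A_given_B * P_B / P_A     # formula (2)
print(round(P_B_given_A, 3))              # ~0.161
```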
Let A and B be two events of a probability algebra; assume that P(A) > 0
and P(B) > 0. In the preceding paragraph the conditional probability
P(A | B) was defined. Generally it is different from P(A). If, however, it is
not, i.e. if
P(A | B) = P(A), (1)
the event A is said to be independent of the event B; similarly, B is independent
of A if
P(B | A) = P(B). (1′)
Both relations are equivalent to
P(AB) = P(A) P(B). (2)
A and B being independent, (2) is valid; conversely, if (2) holds and P(A),
P(B) are both positive, then (1) and (1′) hold as well, thus A and B are independent.
Hence (2) is a necessary and sufficient condition of independence;
thus it may serve as a definition as well. Old textbooks of probability
theory used to call relation (2) the product rule of probabilities. However,
according to the interpretation followed in this book, (2) is not a theorem
but the definition of independence. (Since we take formula (2) as the definition
of independence, any event A with P(A) = 0 or P(A) = 1 is independent
of every event B.)
If A and B are independent, A and B̄ are independent as well. Namely,
from (2) it follows that
P(AB̄) = P(A) − P(AB) = P(A)(1 − P(B)) = P(A) P(B̄).
are valid for them. It is easy to see that from the m·n conditions figuring in
(3) every one containing A_m or B_n can be omitted. If the remaining mn −
− (m + n − 1) = (m − 1)(n − 1) conditions are fulfilled, the omitted
ones are necessarily fulfilled too, as is seen from the relations
$\sum_{j=1}^{m} P(A_j B_k) = P(B_k) \qquad (k = 1, 2, \ldots, n). \quad (5)$
are superfluous, since the validity of one implies necessarily the validity of
the remaining three. Thus the independence of the events A and B is equivalent
to the independence of the complete systems of events (A, Ā) and
(B, B̄). This follows also from the relation
are all equally probable; then each will have the probability ¼. We obtain
the same result by the concept of independence, assuming that head and
tail are equally probable at both tosses, both having probability ½, and that
the two tosses are independent of each other. Thus the probability of
each possibility is ½ · ½ = ¼.
Let us now extend the concept of independence to more than two events.
If A, B, and C are pairwise independent (i.e. A and B, A and C, B and C
are independent) events of the same probability algebra, the non-existence
of any dependence between the events A, B, and C does not follow. This
may be seen from the following example.
Let us throw two dice; let A denote the event of obtaining an even number
with the first die, B the event of throwing an odd number with the second,
finally C the event of throwing either both even or both odd numbers. Then
P(A) = P(B) = P(C) = ½.
further
P(ABC) = 0,
thus
P((AB)C) ≠ P(AB) · P(C).
The events A, B, and C are called completely independent, if the relations
P(AB) = P(A)P(B),
P(AC) = P(A)P(C),
P{BC) = P(B)P(C),
P(ABC) = P(A)P(B)P(C)
are valid. The first three of these relations express the pairwise independence
of A, B, and C, the fourth the fact that each of the events is independent
of the product of the remaining two. Indeed, from the first three
conditions we have:
The events A₁, A₂, …, Aₙ are said to be completely independent, if the relation
$P(A_{i_1} A_{i_2} \cdots A_{i_k}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_k}) \quad (7)$
is valid for any combination (i₁, i₂, …, i_k) from the numbers 1, 2, …, n.
Since from n objects one can choose k objects in $\binom{n}{k}$ ways, (7) consists of
2ⁿ − n − 1 conditions. In what follows, by saying for more than two events
that they are independent we shall mean that they are completely independent
in the sense just defined. If only pairwise independence is meant, this
will be stated explicitly. The independence of more than two complete systems
of events can be defined in a similar manner.
Combinatorial methods for the calculation of probabilities have already
been mentioned. They rested upon the assumption of the equiprobability
of certain events. By means of the concept of independence, however, this
assumption may often be reduced to simpler assumptions. Besides the
simplification of the assumptions, this reduction has the advantage that
the practical validity of our assumptions can sometimes be checked
more easily.
It was supposed that all combinations are equally probable; the probability
looked for is thus $\binom{n}{k}^{-1}$.
This result may also be obtained from the following simpler assumption:
at every drawing the conditional probability of drawing any object still in the
urn is the same. Here the probability that a given combination occurs in a
given order is $\frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{n-k+1}$. Namely, at the first drawing there
are in the urn n objects, the probability of choosing any one is $\frac{1}{n}$; at the
second drawing the conditional probability of choosing any one of the n − 1
objects which are still in the urn is $\frac{1}{n-1}$, etc. Since the elements of the combination
in question may be chosen from the urn in k! different orders,
the obtained result must be multiplied by k! and thus we get that the probability
of drawing a combination of k arbitrary elements is
$\frac{k!}{n(n-1)\cdots(n-k+1)} = \binom{n}{k}^{-1}.$
and the outcomes of the individual drawings are independent. Hence the
probability that in n drawings we obtain exactly k times a red ball is
$W_k = \binom{n}{k} p^k (1-p)^{n-k}. \quad (8)$
Indeed, let A_i denote the event of choosing a red ball at the i-th drawing
(i = 1, 2, …, n). These events are, because of the replacement, independent
of each other. The probability that at the i₁-th, i₂-th, …, i_k-th drawings a
red and at all the other (j₁-th, j₂-th, …, j_{n−k}-th) drawings a white ball will
be chosen is nothing else than the probability of the event
$A_{i_1} A_{i_2} \cdots A_{i_k} \bar{A}_{j_1} \bar{A}_{j_2} \cdots \bar{A}_{j_{n-k}}$.
As the events $A_{i_1}, A_{i_2}, \ldots, A_{i_k}, \bar{A}_{j_1}, \bar{A}_{j_2}, \ldots, \bar{A}_{j_{n-k}}$ are completely independent
and P(A_i) = p, P(Ā_j) = 1 − p, we get for this probability the value
$p^k (1-p)^{n-k}$; since the k drawings giving a red ball can be chosen in $\binom{n}{k}$
different ways, (8) follows.
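Formula (8) is easy to confront with simulated drawings; a Python sketch (p, n and the number of trials are illustrative choices):

```python
from math import comb
import random

random.seed(2)
n, p, trials = 10, 0.3, 200_000
freq = [0] * (n + 1)
for _ in range(trials):
    k = sum(random.random() < p for _ in range(n))   # red balls in n drawings
    freq[k] += 1

for k in range(n + 1):
    W_k = comb(n, k) * p ** k * (1 - p) ** (n - k)   # formula (8)
    print(k, round(freq[k] / trials, 4), round(W_k, 4))
```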
$P(A) = \frac{\mu(A)}{\mu(\Omega)}. \quad (1)$
It is easy to see from the results of § 7 that [Ω, 𝒜, P] is a Kolmogorov probability
space. In this probability space probabilities may be obtained by
geometric determination of measures. Probabilities were thus calculated
already in the eighteenth century.¹
Some simple examples will be presented here.
Example 1. In shooting at a square target we assume that every shot hits
the target (i.e. we consider only shots with this property). Let the probability
that the bullet hits a given part of the target be proportional to the area of
the part in question. What is the probability that the hit lies in the part A ?
Clearly we only have to determine the factor of proportionality. If Ω denotes
the entire target, the probability belonging to it must be equal to 1.
Hence
$P(A) = \frac{\mu(A)}{\mu(\Omega)},$
where µ(Ω) denotes the area of the entire target and µ(A) that of A. Thus
for instance the probability of hitting the left lower quadrant of the target is
equal to ¼.
As seen from this example, not every subset of the sample space can be
considered as an event. Indeed, one cannot assign an event to every subset
of the target, since the “area”, as it is well known, cannot be defined for
every subset such that it is completely additive and that the areas of congruent
figures are equal.
In general, the distribution of probability is said to be uniform, if the
probability that an object situated at random lies in a subset can be obtained
according to the definition (1) from a geometric measure p invariant under
displacement (e.g. volume, area, length of arc, etc.).
Example 2. A man forgot to wind up his watch and thus it stopped. What
is the probability that the minute hand stopped between 3 and 6 ? Suppose
1 Of course instead of Lebesgue measure the notion of the area (and volume) of
elementary geometry was applied.
the probability that the minute hand stops on a given arc of the circumference
of the face of the watch is proportional to the length of the arc in
question. Then the probability asked for will be equal to the quotient of
the length of the arc in question and the whole circumference of the face;
i.e. in our case to ¼.
In the above two examples the determination of the probabilities was
reduced to the determination of the area or of the length of the arc
in certain geometric configurations. Though this method is intuitively
very convincing it is nevertheless a very special method. Before applying
it to further examples, let us see its relation to the already described combinatorial
method. This relation is most evident in Example 2. If we neglect
the fractions of the minutes and are looking for the probability that the
minute hand stops between the zeroth and the first, the first and the second, …,
the k-th and (k + 1)-th minute (k = 0, 1, …, 59), then we have a sample
space consisting of 60 elementary events; the probability of every event is
the same, viz. 1/60. In the case of the example of the target let us assume,
for the sake of simplicity, that the sides of the square target are 1 m long. Let
us subdivide the target into n² congruent little squares with sides parallel
to the sides of the target. The probability that a hit lies in a set which can be
obtained as the union of a certain number of the little squares is obtained
by dividing the number of the little squares in question by n². Thus we
see that geometric probabilities can be approximately determined by a
combinatorial method. We must not, however, restrict ourselves to some
fixed n in the subdivision, for then we could not obtain the probability of a
hit lying in a domain limited by a general curve. If the mentioned subdivision
is performed for every n however large, then the probability of measurable
sets, or to be more precise, of every domain having an area in the sense
of Jordan, can be calculated by means of limits. For this calculation we
have to consider the quotient $\frac{k_n}{n^2}$, where $k_n$ means the number of small
squares lying in the domain if the large square is subdivided into n² congruent
small squares, and we have to determine the limit of $\frac{k_n}{n^2}$ for
n → ∞.
Probabilities obtained in a combinatorial way (without passing to the
limit) are always rational numbers; geometric probabilities, however, may
assume any value between 0 and 1. Thus for instance the probability that
the hit lies in the circle inscribed into the square target is equal to π/4.
Fig. 9
the answer according to this interpretation is thus
$\frac{(r/2)^2 \pi}{r^2 \pi} = \frac{1}{4}.$
Interpretation 2. Because of the symmetry
of the circle we may assume that the midpoint of the chord lies on a
fixed radius of the circle and choose the midpoint of the chord so that the
probability that it lies in a given segment of this fixed radius is assumed to
be proportional to the length of this segment. The chord will be longer
than the side of the inscribed regular triangle, if its midpoint has a distance
less than r/2 from the centre of the circle; the answer is thus 1/2 (cf. Fig. 10).
Interpretation 3. Because of the symmetry of the circle one of the endpoints
of the chord may be fixed, for instance in the point P₀; the other endpoint
can be chosen on the circle at random. Let the probability that this
other endpoint P lies on an arbitrary arc of the circle be proportional to the
length of this arc. The regular triangle inscribed into the circle having for
one of its vertices the fixed point P₀ divides the circumference into three
equal parts. A chord drawn from the point P₀ will be longer than the side
of the triangle, if its other endpoint lies on that one-third part of the circumference
which is opposite to the point P₀. Since the length of this latter is one
third of the circumference, the answer is, according to this interpretation, 1/3.
Fig. 11
1 Cf. W. Blaschke [1 ].
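That the three interpretations really lead to the values 1/4, 1/2 and 1/3 can be demonstrated by simulation; the following Python sketch (an illustration, not part of the text) samples chords according to each interpretation:

```python
import math, random

random.seed(3)
r = 1.0
side = r * math.sqrt(3)          # side of the inscribed regular triangle
trials = 100_000

def chord_length(interpretation):
    if interpretation == 1:      # midpoint uniform in the disk
        while True:
            x, y = random.uniform(-r, r), random.uniform(-r, r)
            d2 = x * x + y * y
            if d2 < r * r:
                return 2 * math.sqrt(r * r - d2)
    if interpretation == 2:      # midpoint uniform along a fixed radius
        d = random.uniform(0, r)
        return 2 * math.sqrt(r * r - d * d)
    if interpretation == 3:      # second endpoint uniform on the circle
        t = random.uniform(0, 2 * math.pi)
        return 2 * r * math.sin(t / 2)

for i in (1, 2, 3):
    hits = sum(chord_length(i) > side for _ in range(trials))
    print(i, round(hits / trials, 3))   # ~0.25, ~0.5, ~0.333
```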
case we only need to compute the area of the domain determined by the
inequalities (Fig. 12)
$0 < x < \frac{1}{2} < y < 1 \quad \text{and} \quad y - x < \frac{1}{2},$
or
Fig. 12
The method just applied is often used, for instance in statistical physics.
Here, to every state of the physical system a point of the "phase space"
may be assigned, having for its coordinates the characterizing data of the
state in question. Accordingly, the phase space has as many dimensions
as the state of the system has data to characterize it (the so-called degrees
of freedom of the system). In our example we assigned a point of the phase
space to a decomposition of the (0, 1) interval by two points; the number of
degrees of freedom of the "system" is here equal to 2. The analogy can be made
still more obvious by assigning to the decomposition of the (0, 1) interval a
physical system: two mass points moving in the interval (0, 1).
Clearly the phase space may be chosen in many ways; by solving problems
of probability in this way, however, one must not forget to verify in every
given case separately the assumption that the probabilities belonging to
the subdomains of the phase space are proportional to the area (volume).
Finally we shall discuss here a classical example, Buffon's needle problem
(1777).
the needle intersects the line x = 0 if x < (l/2) sin φ, and the line x = d if d − (l/2) sin φ < x < d. Thus the needle intersects the line x = 0 if and only if the point (x, φ) characterizing the position of the needle lies to the left of the sine curve x = (l/2) sin φ drawn over the line x = 0, and it intersects the line x = d if and only if this point lies to the right of the sine curve drawn over the line x = d with the same amplitude (Fig. 13).
Since the area under a half-wave of a sine curve of amplitude l/2 is equal to l, the area of the domain formed by the points which correspond to intersection will be 2l. The area of the whole rectangle being πd, the sought probability is 2l/(πd). Thus in many repetitions of Buffon's experiment one will find intersection in approximately a fraction 2l/(πd) of the experiments. It was tried more than once to determine approximately the value of π by this
determine the value of π with any prescribed precision. Of course this would have no practical importance, since there are more straightforward and reliable methods to compute the value of π. Still the question is of great
interest, since it shows that certain mathematical problems can be solved
approximately by performing experiments of a probabilistic nature. Now¬
adays difficult differential equations and other problems of numerical anal¬
ysis are treated in this manner (this is the so-called Monte Carlo method).
Questions dealt with in this paragraph are closely connected to integral
geometry.
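A Monte Carlo sketch of Buffon's experiment in Python may make the preceding remarks concrete; the needle length l and line distance d below are arbitrary choices with l ≤ d, and the whole fragment is only an illustration of the method, not part of the original text.

import math, random

def buffon(trials, l=1.0, d=1.0):
    # x: distance of the needle's midpoint from the nearest line,
    # phi: angle between the needle and the lines
    hits = 0
    for _ in range(trials):
        x = random.uniform(0.0, d / 2)
        phi = random.uniform(0.0, math.pi)
        if x <= (l / 2) * math.sin(phi):
            hits += 1
    return hits / trials

freq = buffon(1_000_000)
print("relative frequency:", freq, " 2l/(pi d) =", 2 / math.pi)
print("estimate of pi:", 2 / freq)  # freq fluctuates about 2l/(pi d)

With a million throws the estimate of π is usually correct to two or three decimal places only, which illustrates why the method has no practical importance for computing π.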
P(Σ_{n=1}^∞ A_n | B) = Σ_{n=1}^∞ P(A_n | B).

P(A | B) = P(AB | C) / P(B | C).
If the Axioms a), b), and c) are satisfied, we shall call the system [Ω, 𝒜, ℬ, P(A | B)] a conditional probability space.
P(A | B) < 1.
that BC ∈ ℬ and P*_C(B) > 0; further, if P*_C(A | B) is, as usual, defined by

P*_C(A | B) = P*_C(AB) / P*_C(B),

we have

P*_C(A | B) = P(A | BC).
Theorem 7. Suppose Ω ∈ ℬ and put P*(A) = P(A | Ω). Then [Ω, 𝒜, P*] is a Kolmogorov probability space. Further, if P*(B) > 0, we have

P(A | B) = P*(AB) / P*(B).

Remark. ℬ may contain sets B such that P*(B) = 0. On the other hand, sets B for which P*(B) > 0 may not belong to ℬ. Hence [Ω, 𝒜, ℬ, P(A | B)] is not necessarily identical to the Kolmogorov probability space [Ω, 𝒜, P(A | Ω)], not even in the case Ω ∈ ℬ.
From the theorems proved above one readily sees how the generalized
theory of probability can be deduced from our axioms.
Let us mention here some further examples.
Example 1. Let Ω be the n-dimensional Euclidean space; let the points of Ω be denoted by ω = (ω₁, ω₂, ..., ω_n). Let 𝒜 denote the class of all measurable subsets of Ω; let further f(ω) be a nonnegative, measurable function defined on Ω, and ℬ the set of all measurable sets B such that ∫_B f(ω) dω is finite and positive. Put

P(A | B) = ∫_{AB} f(ω) dω / ∫_B f(ω) dω.

[Ω, 𝒜, ℬ, P(A | B)] is then a conditional probability space. If ∫_Ω f(ω) dω < +∞, a conditional probability space generated by a Kolmogorov proba-
bility space is obtained; if, however, ∫_Ω f(ω) dω = +∞, this is not the case.
Especially when f(ω) ≡ 1, we obtain the uniform probability distribution in the whole n-dimensional space. In this case

P(A | B) = μ_n(AB) / μ_n(B),

where μ_n denotes the n-dimensional Lebesgue measure.
P(A | B) = Σ_{ω ∈ AB} 1 / Σ_{ω ∈ B} 1

is equal to the ratio of the numbers of elements of the sets AB and B.¹
Evidently the question arises how conditional probabilities are connected
with relative frequencies, i.e. whether the generalized theory does have a
frequency-interpretation too.
The answer is affirmative and even very simple. The conditional proba¬
bility P(A | B) can be interpreted in the generalized theory (as well as in the
theory of Kolmogorov) as the number about which the relative frequency of
A with respect to the condition B fluctuates. Thus the generalized theory
has the same relation to the empirical world as Kolmogorov’s theory.
1 In both cases, P(A | B) could have been represented as the ratio μ(AB)/μ(B), where μ is an unbounded measure. (With respect to the conditions for the existence of such measures cf. A. Császár [1] and A. Rényi [18].)
§ 12. Exercises
1. Let p₁, p₂, p₁₂ be given real numbers. Prove that the validity of the four inequalities

p₁₂ ≥ 0,  p₁₂ ≤ p₁,  p₁₂ ≤ p₂,  p₁ + p₂ − p₁₂ ≤ 1

is necessary and sufficient for the existence of two events A and B such that P(A) = p₁, P(B) = p₂, P(AB) = p₁₂. The numbers p₁, p₂, p₁₂ are therefore nonnegative and do not exceed 1.

Hint. Consider the interval I = (0, 1) and suppose that a random point P is uniformly distributed in this interval; i.e. let the probability that P lies in a subinterval of I be equal to the length of this subinterval. Let A denote the event that the point lies in the interval 0 < x < p₁, and B that it lies in the interval p₁ − p₁₂ < x < p₁ + p₂ − p₁₂. Then we have P(A) = p₁, P(B) = p₂, P(AB) = p₁₂.
4. How can the conditions of Exercise 2 be simplified if we assume that for every k = 2, 3, ..., n

p_{i_1 i_2 ... i_k} = p_{i_1} p_{i_2} ... p_{i_k} ?
5. Let A₁, A₂, ..., A_n be any n events and suppose that the probabilities P(A_{i_1} A_{i_2} ... A_{i_k}) (1 ≤ k ≤ n, 1 ≤ i_1 < i_2 < ... < i_k ≤ n) are known. Find the probability that at least k of the n events A₁, A₂, ..., A_n will occur.

Remark. If we define the "distance" d(A, B) of the events A and B as the probability P(A Δ B), then we have the "triangle inequality" d(A, C) ≤ d(A, B) + d(B, C).
7. If the distance of A and B is defined as

d*(A, B) = P(A Δ B) / P(A + B) for P(A + B) > 0, and d*(A, B) = 0 otherwise,

prove that the "triangle inequality" d*(A, C) ≤ d*(A, B) + d*(B, C) holds as well.
8. What is the probability that in n throws of a die the sum of the obtained numbers is equal to k?

Hint. Determine the coefficient of x^k in the expansion of the generating function (x + x² + x³ + x⁴ + x⁵ + x⁶)ⁿ.
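The hint can be carried out mechanically; a small Python sketch (the function name and parameters are ours) expands the generating function by repeated polynomial multiplication.

def dice_sum_distribution(n):
    # coefficients of (x + x^2 + ... + x^6)^n; coeff[k] is the number
    # of ways of obtaining the sum k with n dice
    coeff = {0: 1}
    for _ in range(n):
        new = {}
        for total, ways in coeff.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0) + ways
        coeff = new
    return {k: ways / 6 ** n for k, ways in coeff.items()}

dist = dice_sum_distribution(3)
print(sum(p for k, p in dist.items() if k > 10))  # 0.5, cf. Exercise 9 below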
9. What is the probability that the sum of the numbers thrown is larger than 10 in a throw with three dice?

Remark. This was the winning condition in the "passe-dix" game which was current in the seventeenth century.
10. What is more probable: to get at least one six with four dice, or at least one double six in 24 throws of two dice? (Chevalier de Méré's problem.)
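The two probabilities can be compared directly (a quick Python check, nothing more):

p_four_dice = 1 - (5 / 6) ** 4        # at least one six in 4 throws
p_double_six = 1 - (35 / 36) ** 24    # at least one double six in 24 throws
print(p_four_dice, p_double_six)      # ~0.5177 and ~0.4914

Thus the first event is the more probable one, as de Méré had observed empirically.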
12. An urn contains n white and m red balls, n ≥ m; balls are drawn from the urn at random without replacement. What is the probability that at some instant the numbers of white and red balls drawn are equal?
13. There is a queue of 100 men before the box-office of an exhibition. One ticket costs 1 shilling. 60 of the men in the queue have only 1 shilling coins, 40 only 2 shilling coins. The cash desk contains no money at the start. What is the probability that tickets can be sold without any trouble (i.e. that no man having only 2 shilling coins arrives at the cash desk at a moment when it contains no 1 shilling coin)?
14. A particle moves along the x-axis with unit velocity. If it reaches a point with integer abscissa it has one of two equiprobable possibilities: either it continues to proceed or it turns back. Suppose that at the moment t = 0 the particle was in the point x = 0. Find the probability that at a time t the particle will have a distance x from the origin (t is a positive integer, x an arbitrary integer).
15. Let the conditions of Exercise 14 be completed by the following: at the point with abscissa x₀ (a positive integer) there is an absorbing wall; if the particle arrives at the point of abscissa x₀ it will be absorbed and does not continue its movement. Answer the question of the preceding exercise for x ≤ x₀.
16. A box contains M red and N white balls which are drawn one after the other without replacement. Let P_k denote the probability that the first red ball will be drawn at the k-th drawing. Since there are N white balls, clearly k ≤ N + 1 and thus P₁ + P₂ + ... + P_{N+1} = 1. By substituting the explicit expression of P_k we obtain an identity. How can this identity be proved directly, without using probability theory?
17. Let us place eight rooks at random on a chessboard. What is the probability that no rook can take another?

Hint. One has to count the number of ways in which 8 rooks can be placed on a chessboard so that in every row and in every column there is exactly one rook.
18. Put

P_k(M, N) = \binom{M}{k} \binom{N−M}{n−k} / \binom{N}{n}

and

W_k = \binom{n}{k} p^k (1 − p)^{n−k}   (k = 0, 1, ..., n),

where p = M/N. Prove that if M and N tend to infinity so that M/N = p remains constant, then P_k(M, N) tends to W_k.
19. Put

Q_k(M, N) = \binom{N−M}{k−1} (k − 1)! M (N − k)! / N!

and

V_k = p(1 − p)^{k−1}   (k = 1, 2, ...),

where p = M/N. Prove that if M and N tend to infinity so that M/N = p remains constant, then Q_k(M, N) tends to V_k.
20. How many raisins are to be put into 20 ozs of dough in order that the proba¬
bility is at least 0.99 that a cake of 1 oz contains at least one raisin?
to be perfect, the vessel contains ■ gallons of water. Similarly, the number of the fishes in a pond may be determined as follows: 100 fishes are caught, marked (e.g. by rings) and replaced into the pond. After an interval of some days 100 fishes are caught again and the marked ones are counted. If their number is x > 0, then the pond contains about 100²/x fishes. If the pond contains 10 000 fishes, what is the probability that among the fishes of the second catch the number of marked fishes is 0, 1, 2, or 3?
22. A stick is broken at a random point and the longest part is again broken at
random. What is the probability that a triangle can be formed from the three pieces
so obtained? (Observe that the conditions of the breaking differ from those of Example
4 of § 10.)
23. Consider an undamped mathematical pendulum. Let the angle of the maximal
elongation be 2°. What is the probability that at a randomly chosen instant the
elongation will be greater than 1°?
24. Let Buffon’s problem be modified by throwing upon the plane a disk instead
of a needle. What is the probability that the disk will not cover any of the lines?
25. In a five storey building the first floor is 8 metres above the ground floor, while each subsequent storey is 6 metres high. Suppose that the elevator stops somewhere because of a short circuit. Let the height of the door of the elevator be 1.8 metres.
Compute the probability that at the time of the stopping only the wall of the elevator
shaft can be seen from the elevator.
26. What conditions must the numbers p, q, r, s satisfy in order that there exist
events A and B such that
27. A box contains 1000 screws. These are tested at random so that the probability
of a screw being tested is equal to . Suppose that 2 per cent of the screws are
defective; what is the probability that from the tested screws exactly two are defec¬
tive?
28. If A and B are independent events and A ⊂ B, prove that either P(A) = 0 or P(B) = 1.
29. Show by an example that it is possible that the event A is independent of both BC and B + C, while B and C are also independent, but A is not independent either of B or of C.
32. Let A₁, A₂, ..., A_n be any distinct events. Let P(A_k) = p_k (k = 1, 2, ..., n); further let U_r denote the probability that exactly r of the events A_k (k = 1, 2, ..., n) occur. Put

S_k = Σ P(A_{i_1} A_{i_2} ... A_{i_k})   (1 ≤ i_1 < i_2 < ... < i_k ≤ n; k = 1, 2, ..., n).

Prove that

U_r = S_r − \binom{r+1}{1} S_{r+1} + \binom{r+2}{2} S_{r+2} − ... + (−1)^{n−r} \binom{n}{n−r} S_n.

How will this expression be simplified if we assume that the events A₁, A₂, ..., A_n are completely independent and equiprobable?
S_r = U_r + \binom{r+1}{1} U_{r+1} + ... + \binom{n}{n−r} U_n.
36. Prove Theorem 10 of § 3 using the results of Exercise 35, and by determining the coefficients C_{i_1 i_2 ... i_r} in the particular case when the events A_k are independent. According to the statement of Exercise 35 the formula with these coefficients will be valid in the general case too.
e.g. for n → ∞ we have W_r(n) → 1/(e · r!).
38. The events A₁, A₂, ..., A_n are said to be exchangeable¹ if the value of P(A_{i_1} A_{i_2} ... A_{i_r}) depends only on r and does not depend on the choice of the different indices i_1, i_2, ..., i_r (r = 1, 2, ..., n). Thus if A₁, A₂, ..., A_n are independent and equiprobable, they are also exchangeable. Show that from the exchangeability of the events A₁, A₂, ..., A_n their independence does not follow.
39. a) Let an urn contain M red and N − M white balls; n balls are drawn without replacement, n ≤ min(M, N − M). Let A_k denote the event that the k-th drawing yields a red ball. Prove that the events A₁, A₂, ..., A_n are exchangeable.

b) Prove that the events A₁, A₂, ..., A_n defined in a) are even then exchangeable, if every replacement of a ball drawn from the urn is accompanied by throwing R balls of the same colour into the urn.
40. Each of N urns contains red and white balls. Let the number of the red balls in the r-th urn be a_r, that of the white balls b_r, and let v_r be the probability of drawing a red ball from the r-th urn; that is, we put v_r = a_r/(a_r + b_r). Perform the following experiment. Choose first one of the urns; suppose that the probability of choosing the r-th urn is p_r > 0 (r = 1, 2, ..., N). Draw from the chosen urn n balls with replacement. Let A_k denote the event that the k-th drawing yields a red ball. Prove now the following statements:

a) The events A₁, A₂, ..., A_n are exchangeable.

b) The events A_k are, generally, not even pairwise independent.

c) Let W_k denote the probability that from the n drawings exactly k yield red balls. Compute the value of W_k.

d) Let π_k denote the probability that the first red ball was drawn at the k-th drawing; compute the value of π_k.
41. Let A_k denote the event that given the conditions of Exercise 37 the k-th person draws his own visiting card. Prove that the events A_k are exchangeable.
42. Let N balls be distributed among n urns such that each ball can fall with the same probability into any one of the urns. Compute

a) the probability P₀(n, N) that at least one ball falls into every urn;

b) the probability P_k(n, N) that exactly k (k = 1, 2, ..., n − 1) of the urns remain empty.
43. Let A_k denote in the preceding exercise the event that the k-th urn does not remain empty; show that the events A_k are exchangeable and
44. Banach was a passionate smoker and used to put one box of matches in both pockets in order to be never without matches. Every time he needed a match, he chose at random either the box in his right or that in his left pocket with the same probability 1/2. One day he put into his pockets two full boxes, both containing n matches. Let P_k denote the probability that on first finding one of the boxes to be empty, the other box contained k matches. Calculate the value of P_k and find the value k which maximizes this probability.
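For a numerical look at this exercise one may use the closed form P_k = \binom{2n−k}{n} 2^{−(2n−k)}, the standard solution of the problem, which is taken here as an assumption of the sketch; the Python fragment below evaluates these probabilities.

from math import comb

def banach(n):
    # P_k: probability that, when a box is first found empty, the other
    # box contains exactly k matches; assumed closed form
    # P_k = C(2n - k, n) / 2**(2n - k)
    return [comb(2 * n - k, n) / 2 ** (2 * n - k) for k in range(n + 1)]

n = 50
p = banach(n)
print(abs(sum(p) - 1) < 1e-12)  # the P_k indeed sum to 1
print(p[0], p[1])               # note P_0 = P_1: the maximum is attained there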
45. An urn contains M red and N − M white balls, M/N = p. Let P_r denote the
46. Let an urn contain M red and N − M white balls. Draw all balls from the urn in turn without replacement and note the serial numbers of the red drawings. Let these serial numbers be k₁, k₂, ..., k_M, and put X = k₁ + k₂ + ... + k_M. Let P_n(M, N) denote the probability that X = n (A ≤ n ≤ B), where

A = M(M + 1)/2 and B = A + M(N − M).

Put

F(M, N, x) = Σ_{n=A}^{B} P_n(M, N) x^n.

Determine the polynomial F(M, N, x) and thence the probabilities P_n(M, N).¹ Prove that

P_{B−n}(M, N) = P_{A+n}(M, N).
47. Prove by means of probability theory that if φ(n) denotes the number of the positive integers less than n and relatively prime to n (n = 1, 2, ...), then²

φ(n) = n ∏_{p|n} (1 − 1/p),

where the product is to be taken over all distinct prime factors p of n.

Hint. Choose at random one of the numbers 1, 2, ..., n such that each of these numbers is equally probable. Let A_p denote the event that the number chosen can be divided by the prime number p. Show that if p₁, p₂, ... are the distinct prime factors of the number n, then the events A_{p₁}, A_{p₂}, ... are independent. The probability that the chosen number is relatively prime to n is, by definition, φ(n)/n. On the other hand we have P(A_p) = 1/p; hence, because of the independence of the events A_p,

φ(n)/n = P(∏_{p|n} Ā_p) = ∏_{p|n} P(Ā_p) = ∏_{p|n} (1 − 1/p).
48. a) Let Ω be a countably infinite set, let its elements be ω₁, ω₂, ..., ω_n, .... Let 𝒜 consist of all subsets of Ω and let the probability measure P be defined in the

p_n ≤ Σ_{k=n+1}^∞ p_k   (n = 1, 2, ...).

c) Given the conditions of Exercise 48 a), prove that to every r-tuple of numbers x₁, x₂, ..., x_r with

p_n ≤ (1/r) Σ_{k=n}^∞ p_k   (n = 1, 2, ...).
representable too. Indeed we can select from the sequence x_n an infinite subsequence x_{n_k} (k = 1, 2, ...) such that in the representation of each x_{n_k} the greatest member is p_{i_1}. Take now from this sequence an infinite subsequence having in its representation for second greatest member p_{i_2}. By progressing in this manner we obtain a sequence p_{i_s} (s = 1, 2, ...) and it is easy to verify that Σ_{s=1}^∞ p_{i_s} = x. The range of the function
P(A) is thus a closed set. Furthermore, if x is a number which can be represented as a sum of a finite number of the p_n's, e.g. x = Σ_{i=1}^N p_{l_i}, then

x = lim_{n→∞} (Σ_{i=1}^N p_{l_i} + p_n).
If x = Σ_{i=1}^∞ p_{l_i}, then

x = lim_{n→∞} Σ_{i=1}^n p_{l_i}.
in the following manner: suppose we have x₁ ≥ x₂ ≥ ... ≥ x_r. Then x₁ ≥ 1/r, and on the other hand p₁ ≤ 1/r, hence p₁ can be used for the representation of x₁. Let now x′₁ = max(x₁ − p₁, x₂); then x′₁ ≥ (1/r)(1 − p₁). Since p₂ ≤ (1/r)(1 − p₁), p₂ can therefore be used for the representation of x′₁, that is for one of the x_j's. Proceeding in this way we can see that every p_n can be used for the representation of an x_j. Since

Σ_{n=1}^∞ p_n = Σ_{j=1}^r x_j = 1,
49. The Kolmogorov probability space [Ω, 𝒜, P] is said to be non-atomic if there exists for every event A of positive probability an event B ⊂ A such that 0 < P(B) < P(A). Prove that in the case of a non-atomic probability space [Ω, 𝒜, P] the range of the function P(A) is the whole interval [0, 1].
Hint. Prove first that for any ε > 0, Ω can be decomposed into a finite number of disjoint subsets A_j (A_j ∈ 𝒜; j = 1, 2, ..., m) such that P(A_j) < ε. This can be seen as follows. If A ∈ 𝒜, P(A) > 0, then A contains a subset B ⊂ A such that 0 < P(B) < ε. Indeed, if P(A) < ε, we can choose B = A. If P(A) ≥ ε, then (since P is non-atomic) a B ⊂ A can be found such that B ∈ 𝒜 and 0 < P(B) < P(A); here either P(B) or P(A − B) is not greater than P(A)/2. If P(A)/2 < ε, we have completed the proof; if P(A)/2 ≥ ε, the procedure is continued. Since for large enough r we have P(A)/2^r < ε, there can be found in a finite number of steps a set B such that
According to what was said above, μ_ε(A) > 0 for P(A) > 0. Choose a set A₁ ∈ 𝒜 for which 0 < P(A₁) < ε, further a set A₂ ⊂ Ā₁ for which ε > P(A₂) ≥ ½ μ_ε(Ā₁), and then a set A₃ ⊂ (A₁ + A₂)¯ for which ε > P(A₃) ≥ ½ μ_ε((A₁ + A₂)¯); generally, if the sets A₁, A₂, ..., A_n are already chosen, we choose a set A_{n+1} such that the conditions

A_{n+1} ⊂ (A₁ + A₂ + ... + A_n)¯ and ε > P(A_{n+1}) ≥ ½ μ_ε((A₁ + A₂ + ... + A_n)¯)

are satisfied. Then A₁, A₂, ..., A_n, ... are disjoint sets, hence Σ_{n=1}^∞ P(A_n) ≤ 1 and thus lim_{n→∞} P(A_n) = 0. Since μ_ε(A) is a monotonic set function, we get, introducing the notation B = (Σ_{n=1}^∞ A_n)¯, that μ_ε(B) = 0. But from this it follows that P(B) = 0 and thus, introducing the notation A′₁ = A₁ + B, we obtain that

Ω = A′₁ + A₂ + A₃ + ....

Choose now N so large that Σ_{n=N}^∞ P(A_n) < ε. Then the sets A′₁, A₂, ..., A_{N−1} and A′_N = Σ_{n=N}^∞ A_n possess the required properties. Now we can construct for an arbitrary
number x (0 < x < 1) an A ∈ 𝒜 such that P(A) = x in the following manner: Ω is decomposed first into a number N₁ of disjoint subsets A_{1j} such that P(A_{1j}) < ε. Let x_{1,r} = P(Σ_{j=1}^r A_{1j}). Then x lies in one of the intervals [x_{1,r}, x_{1,r+1}), r = 1, 2, ..., N₁ − 1; let it be e.g. the interval [x_{1,r₁}, x_{1,r₁+1}). If x = x_{1,r₁}, we have finished the construction. If x_{1,r₁} < x < x_{1,r₁+1}, we decompose A_{1,r₁+1} into subsets A_{2j} (j = 1, 2, ..., N₂) such that P(A_{2j}) < ε/2. Let x_{2,r} = x_{1,r₁} + P(Σ_{j=1}^r A_{2j}). Then x lies in one of the intervals [x_{2,r}, x_{2,r+1}); e.g. x ∈ [x_{2,r₂}, x_{2,r₂+1}). By continuing this procedure we obtain a set

A = Σ_{j=1}^{r₁} A_{1j} + Σ_{j=1}^{r₂} A_{2j} + ... + Σ_{j=1}^{r_s} A_{sj} + ...

with P(A) = x.
50. Prove for an arbitrary probability space that the range of P(A) is a closed set.

Hint. A set A ∈ 𝒜 will be called an atom (with respect to P) if P(A) > 0 and if B ∈ 𝒜, B ⊂ A imply either P(B) = 0 or P(B) = P(A). Two atoms A and A′ are, a set of zero measure excepted, either identical or disjoint. From this it follows that there can always be found either a finite or a countably infinite number of disjoint atoms A_n (n = 1, 2, ...) such that the set Ω − Σ_{n=1}^∞ A_n contains no further atoms. Put

μ₁(A) = Σ_n P(A A_n),  μ₂(A) = P(A) − μ₁(A).

Then P(A) = μ₁(A) + μ₂(A). Here μ₁(A) can be considered as a measure on the class of all subsets of the set Ω′ having for its elements the sets A_n, and μ₂(A) is non-atomic. Hence the statement of Exercise 50 is reduced to the Exercises 48 a) and 49.
CHAPTER III

DISCRETE RANDOM VARIABLES
Σ_n p_n = 1.   (2)
Let B₁, B₂, ..., B_n, ... be a complete system of events and let P(B_i) > 0 (i = 1, 2, ...). Then an arbitrary event A ∈ 𝒜 can be decomposed accord-
P(AB_n) = P(A | B_n) P(B_n).
according to (2) the probability P(A) is the weighted mean of the conditional
probabilities P(A | Bn) taken with the weights P(Bn). From this it follows
immediately that
P(A_k) = M/N   (k = 2, 3, ..., N). According to the theorem of total probability

P(A₂) = P(A₂ | A₁) P(A₁) + P(A₂ | Ā₁) P(Ā₁),

hence

P(A₂) = (M − 1)/(N − 1) · M/N + M/(N − 1) · (N − M)/N = M/N.
P(B | A) = P(A | B) P(B) / P(A).   (4)

If {B_n} is a complete system of events and if in (4) B_k is substituted for B and Expression (2) for P(A), we have

P(B_k | A) = P(A | B_k) P(B_k) / Σ_n P(A | B_n) P(B_n).   (5)
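As a numerical illustration of Formula (5), consider the following Python sketch; the prior weights P(B_k) and the conditional probabilities P(A | B_k) are invented numbers, used only to show the mechanics of the formula.

# hypothetical data: three events B_1, B_2, B_3 of a complete system
prior = [0.5, 0.3, 0.2]        # P(B_k)
likelihood = [0.1, 0.4, 0.8]   # P(A | B_k)

total = sum(p * l for p, l in zip(prior, likelihood))   # P(A), total probability
posterior = [p * l / total for p, l in zip(prior, likelihood)]
print(posterior, sum(posterior))  # the P(B_k | A) of (5); they sum to 1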
P(B_{k₁, k₂, ..., k_r}) = n!/(k₁! k₂! ... k_r!) · p₁^{k₁} p₂^{k₂} ... p_r^{k_r}.   (2)

The name "polynomial distribution" comes from the fact that the terms P(B_{k₁, k₂, ..., k_r}) can be obtained by expanding (p₁ + p₂ + ... + p_r)ⁿ according to the polynomial theorem. If r = 3, we call the distribution the trinomial distribution.
where p_i is the probability that A occurred at the i-th experiment. The summation is to be taken over all combinations (i₁, i₂, ..., i_k) of the k-th order of the elements (1, 2, ..., n), and j₁, j₂, ..., j_{n−k} denote those numbers of the sequence 1, 2, ..., n which do not occur among i₁, i₂, ..., i_k. The numbers P(B_k) form a probability distribution. If for instance all probabilities p_i are equal to each other, we obtain as a particular case the binomial distribution (1).
The distribution (3) occurs for instance in the following practical problem: In a factory there are n machines which do not work all the time. They are switched on and switched off independently of each other. Let p_i denote the probability that the i-th machine is working at a given moment, and let P(B_k) be the probability that at this instant exactly k machines are working; then P(B_k) is given by Formula (3). The fact that Σ_{k=0}^n P(B_k) = 1 can be seen directly in the following manner: a simple calculation gives that

Σ_{k=0}^n P(B_k) x^k = ∏_{i=1}^n (1 − p_i + p_i x);
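This identity also yields a convenient algorithm: multiplying out the polynomial ∏ (1 − p_i + p_i x) factor by factor produces all the probabilities P(B_k) at once. A Python sketch, with made-up values of the p_i:

def working_machines(ps):
    # coefficients of prod_i (1 - p_i + p_i x); coeff[k] = P(B_k),
    # the probability that exactly k machines are working
    coeff = [1.0]
    for p in ps:
        new = [0.0] * (len(coeff) + 1)
        for k, c in enumerate(coeff):
            new[k] += (1 - p) * c   # the machine is idle
            new[k + 1] += p * c     # the machine is working
        coeff = new
    return coeff

dist = working_machines([0.2, 0.5, 0.9, 0.4])  # assumed p_i
print(dist, sum(dist))  # P(B_0), ..., P(B_n); the sum is 1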
P(C_k) = \binom{M}{k} \binom{N−M}{n−k} / \binom{N}{n}   (k = 0, 1, ..., n).   (4)
and let C_{k₁, k₂, ..., k_r} denote the event that among n balls drawn without replacement the first colour occurs k₁ times, the second k₂ times, ..., the r-th colour k_r times (k₁ + k₂ + ... + k_r = n). By a simple combinatorial consideration we obtain that

P(C_{k₁, k₂, ..., k_r}) = \binom{N₁}{k₁} \binom{N₂}{k₂} ... \binom{N_r}{k_r} / \binom{N}{n}.

This can be seen directly, if we compare the coefficient of xⁿ on both sides of the identity

∏_{i=1}^r (1 + x)^{N_i} = (1 + x)^N.
P(A₀) = M/N,

P(A_k) = M/(N − k) · ∏_{j=0}^{k−1} (N − M − j)/(N − j)   (k = 1, 2, ..., N − M).   (6)

Hence

M/N + Σ_{k=1}^{N−M} M/(N − k) ∏_{j=0}^{k−1} (N − M − j)/(N − j) = 1.
This identity also has a direct proof, but it is not quite simple. It happens
often that certain identities for which a mathematical proof may be rather
elaborate, are readily obtained by means of probability calculus.
Put Ω′ = Σ_{n=0}^∞ A_n. The probability that we obtain at the first k drawings white balls and at the (k + 1)-st drawing a red one is P(A_k) = pq^k; hence

Σ_{k=0}^∞ P(A_k) = p Σ_{k=0}^∞ q^k = p/(1 − q) = 1.
Hence the probability of Ω′ is 1 and thus P(Ω − Ω′) = 0. Though it is in principle possible that Ω − Ω′ occurs, this possibility can be neglected in practice. Hence the system {A_k} of events is, in a wider sense of the word, complete.

The distribution pq^k (k = 0, 1, ...) is often called the geometric distribution, since the sum of the terms pq^k is a geometric series. We shall see
where the meaning of E_n can be only A or Ā, the number x having the binary expansion x = 0.ε₁ε₂ ... ε_n ..., where

ε_n = 1 if E_n = A,  ε_n = 0 if E_n = Ā.
P(C) = p^k q^l.

From this we can compute P(C) for every C ∈ 𝒞₀. Clearly [Ω, 𝒞₀, P] is a probability algebra, but 𝒞₀ is not a σ-algebra. But if we consider the least σ-algebra 𝒞 containing 𝒞₀ and extend the set function P(C) defined over 𝒞₀ (readily seen to be a measure on 𝒞₀) to 𝒞, then we obtain the Kolmogorov probability space sought for (cf. Ch. II, § 6). In order to prove that P(C) is a measure on 𝒞₀, let us consider the above mapping of the sample space onto the interval [0, 1]; let the interval [0, 1] be denoted by Ω*. There corresponds to the algebra of sets 𝒞₀ the class of the subsets of Ω* consisting of a finite number of pairwise disjoint intervals with binary rational endpoints. Just as in Chapter II, § 7, there can be given a function F(x) so that the probability belonging to the interval [a, b) be equal to F(b) − F(a).
Indeed, if the interval [a, b) is of the form [m/2ⁿ, (m + 1)/2ⁿ) (m being odd) and

m/2ⁿ = 1/2^{l₁} + 1/2^{l₂} + ... + 1/2^{l_k}   (l₁ < l₂ < ... < l_k = n),

then we put

F(b) − F(a) = p^k q^{n−k}.
From this F(x) can be determined at every binary rational point x. Thus
for instance
F(0) = 0,  F(1) = 1,  F(1/2) = q,  F(3/4) = q + pq,  F(5/8) = q + pq²,  F(7/8) = q + pq + p²q,  etc.

In general, if

x = Σ_{k=1}^{r} 1/2^{l_k}   (l₁ < l₂ < ... < l_r),

then

F(x) = Σ_{k=1}^{r} p^{k−1} q^{l_k − k + 1}.
It is easy to see that F(x) is an increasing continuous function and F(0) = 0, F(1) = 1. Hence the result of Chapter II, § 7 can be applied here. The extension of 𝒞₀ is in this case the collection of all Borel-measurable
to show that Σ_{k=0}^∞ P(A_k^{(r)}) = 1. This follows from (8) in the following manner:

Σ_{k=0}^∞ \binom{k + r − 1}{r − 1} p^r q^k = p^r (1 − q)^{−r} = 1.
The distribution (8) will be called the negative binomial distribution of r-th order, since the probabilities in question can be obtained as terms of the binomial series (for a negative exponent) of the expression p^r (1 − q)^{−r}.
Since the events A_k^{(r)} (k = 0, 1, ...) form a complete system of events, the probability that the number of occurrences of an event in infinite repetitions of an alternative remains bounded has the value zero.
Indeed, if C_n denotes the event that in the infinite sequence of experiments A occurred exactly n times, then, as proved above, P(C_n) = 0. If therefore C denotes the event that A occurs in the infinite sequence of events only a finite number of times, then C = Σ_{n=0}^∞ C_n, and

P(C) = Σ_{n=0}^∞ P(C_n) = 0.

Thus the event A occurs infinitely many times with the probability 1.¹
9. Consider the following problem: let an urn contain M red and N - M
white balls. Draw a ball at random, replace the drawn ball and at the same
time place into the urn R extra balls with the same colour as the one drawn.2
Then we draw again a ball, and so on. What is the probability of the event
that in n drawings we obtain exactly k times a red ball? Let this event be
denoted by Ak. Of course we assume that at every drawing each ball of the
1 Later on we shall prove more: let k_n denote the number of occurrences of A in the first n experiments; then not only lim_{n→∞} k_n = +∞ with probability 1, but more precisely lim_{n→∞} k_n/n = p with probability 1.

2 R can be negative as well. In case of negative R we remove from the urn |R| balls of the same colour as the one drawn.
urn is selected with the same probability. We compute first the probability
that we obtain at each of the first k drawings a red ball, and white balls at
the remaining n — k drawings. Clearly this probability is
∏_{j=0}^{k−1} (M + jR) · ∏_{h=0}^{n−k−1} (N − M + hR) / ∏_{l=0}^{n−1} (N + lR).   (9)

Hence

P(A_k) = \binom{n}{k} ∏_{j=0}^{k−1} (M + jR) ∏_{h=0}^{n−k−1} (N − M + hR) / ∏_{l=0}^{n−1} (N + lR).   (10)

M/(N + kR) · ∏_{j=0}^{k−1} (N − M + jR)/(N + jR).   (11)
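Formula (10) is easily evaluated and compared with a direct simulation of the urn scheme; in the Python sketch below the values of M, N, R, n, k are arbitrary choices.

from math import comb, prod
import random

def polya_exact(M, N, R, n, k):
    # formula (10): probability of exactly k red balls in n drawings
    num = prod(M + j * R for j in range(k)) * \
          prod(N - M + h * R for h in range(n - k))
    den = prod(N + l * R for l in range(n))
    return comb(n, k) * num / den

def polya_simulated(M, N, R, n, k, trials=100_000):
    count = 0
    for _ in range(trials):
        red, total, reds_drawn = M, N, 0
        for _ in range(n):
            if random.random() < red / total:
                reds_drawn += 1
                red += R      # R extra balls of the drawn colour
            total += R
        count += reds_drawn == k
    return count / trials

print(polya_exact(3, 10, 2, 5, 2), polya_simulated(3, 10, 2, 5, 2))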
So far we have only considered whether a random event does or does not
occur. Qualitative statements like this are often insufficient and quantitative
investigations are necessary. In other words, for the description of random
mass phenomena one needs numerical data. These numerical data are not
constant, they show random fluctuations. Thus for instance the result of
a throw in dicing is such a random number. Another example is the number
of calls arriving at a telephone exchange during a given time-interval, or
the number of disintegrating atoms of a radioactive substance during a
given time-interval.
In order to characterize a random quantity we have to know its possible
values and the probabilities of these values. Such random quantities are
called random variables. In the present Chapter we shall discuss only random
variables having a countable set of values; these are called discrete random
variables. The random variables figuring in the above examples were all
of the discrete type. The life time of a radioactive atom is for instance also
a random variable but it is not a discrete one. General (not discrete) random
variables will be dealt with in the following Chapters. In what follows ran¬
dom variables will be denoted by the letters of the Greek alphabet.
Let A be an arbitrary event. Let the random variable ξ_A be defined in the following way:

ξ_A = 1 if A occurs, and ξ_A = 0 otherwise (i.e. if Ā occurs).

Then

P(ξ_A = 1) = P(A),

and similarly

P(ξ_A = 0) = P(Ā) = 1 − P(A).
that ξ = x_n; then clearly A_n A_m = O for n ≠ m and Σ_{n=1}^∞ A_n = Ω, hence Σ_{n=1}^∞ P(A_n) = 1. Conversely, there can be assigned (in several ways) to every complete system of events {A_n} a random variable ξ such that in case of the occurrence of A_n the value of ξ should depend on the index n only. ξ can for instance be defined in the following manner:

ξ = n if A_n occurs (n = 1, 2, ...).
The value n may be replaced by f(n), where f(n) is any function defined for the positive integers for which f(n) ≠ f(m) if n ≠ m. Thus we can see that
a complete system of events can be assigned to every discrete random va¬
riable in a unique manner, while there can be assigned infinitely many dif¬
ferent random variables to a complete system of events.
We shall deal in this Chapter with random variables assuming only real
values. It must be said that probability theory deals also with random vari¬
ables whose range does not consist of real numbers but for instance of
n-dimensional vectors. There are also random variables whose values are
not vectors of a finite dimension but infinite sequences of numbers or func¬
tions, etc. Later on, we shall also examine such cases.
Now let us see, how the notion of a random variable is dealt with in the
general theory of probability.
In Chapter II we were made familiar with Kolmogorov’s foundation of
probability theory. We started from a set Ω, the set of elementary events, and a σ-algebra 𝒜 consisting of subsets of Ω. Here 𝒜 consists of all events coming into our considerations. Further there was given a nonnegative σ-additive set function P defined on 𝒜 such that P(Ω) = 1. The value
P(A) of this function for the set A defines the probability of the event A. Nat¬
urally, we understand by a random variable a quantity depending on which
one of the elementary events in question occurs. A random variable is there¬
fore a function ξ = ξ(ω) assigning to every element ω of the set Ω (i.e., to every elementary event) a numerical value.
What kind of restrictions are to be prescribed for such a function? If we
have a probability field where every subset of Q corresponds to an event,
no restriction is necessary at all. But if this is not the case, then the definition
of a random variable calls for certain restrictions.
Since we consider in this Chapter discrete random variables only, we
confine ourselves (for the present) to the following definition:
Let [Ω, 𝒜, P] be a Kolmogorov probability space. A function ξ = ξ(ω) defined on Ω with a countable set of values is said to be a discrete random variable if the set on which ξ(ω) takes on a fixed value x belongs to 𝒜 for every choice of this fixed value x.

Let x₁, x₂, ... denote the different possible values of the random variable ξ = ξ(ω) and A_n the set of the elementary events ω ∈ Ω for which ξ(ω) = x_n; then A_n must belong to the algebra of sets 𝒜 for every n. Only in this case is the probability

P(ξ = x_n) = P(A_n)

defined.
A complete system of events associated with a discrete random variable
thus consists of those subsets of the space of events for which the random
variable takes on the same value. Especially, if ξ_A = ξ_A(ω) is the indicator of the event A, then ξ_A(ω) is a random variable having the value 1 or 0 according as ω does or does not belong to the set A.
The sequence of probabilities of a complete system of events is said to be
a probability distribution. Now that we have introduced the concept of
random variable this probability distribution can be considered as the set
of all probabilities corresponding to the different values taken on by a ran¬
dom variable. If for instance an experiment having the possible outcomes
A and Ā is independently repeated n times, then the number ξ of the experi-
ments showing the occurrence of the event A is a random variable with the
binomial distribution, i.e.
where the sum is to be extended over all values of k such that x_k < x.
B(α, β, x) is called Euler's incomplete beta integral of order (α, β). It is well known that

B(α, β) = B(α, β, 1) = Γ(α) Γ(β) / Γ(α + β),   (3)

where

Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx   (α > 0)

is the so-called gamma function. B(α, β) is called Euler's complete beta integral of order (α, β). It is readily verified through integration by parts that
for every n and m. Hence in case of two independent random variables the joint distribution of ξ and η is, according to (1′), determined by the distributions of ξ and η.

This definition can be generalized to the case of several random variables. The discrete random variables ξ₁, ξ₂, ..., ξ_r are said to be (completely) independent, if for every system of values x_{k₁}, x_{k₂}, ..., x_{k_r} the relation
Proof. The proof will be given in detail only for r = 2; for r > 2 the procedure is essentially the same.

Let {x_{jk}} be the sequence of the possible values of the random variable ξ_j (j = 1, 2) and {A_{jk}} the complete system of events belonging to the random variable ξ_j; A_{jk} is thus the set of those elementary events ω ∈ Ω for which ξ_j(ω) = x_{jk}.

If y_{jl} is one of the possible values of the random variable η_j = g_j(ξ_j), then the set B_{jl} defined by g_j(ξ_j) = y_{jl} can obviously be obtained as the union of finitely or denumerably many sets A_{jk}; B_{jl} is equal to the union of the sets A_{jk} whose indices satisfy the equation g_j(x_{jk}) = y_{jl}.

Since the complete systems of events {A_{1k}} and {A_{2k}} are independent, the sum of an arbitrary subsequence of the sets A_{1k} is independent of the sum of an arbitrary subsequence of the sets A_{2k}. From this our assertion follows.
We give a reformulation of the above theorem which we shall need later on. Let ξ(ω) be a discrete random variable with possible values x₁, x₂, ..., x_n, ... and let A_n denote the set of those elementary events ω for which ξ(ω) = x_n. Let further 𝒜_ξ be the least σ-algebra containing the sets A_n; 𝒜_ξ is called the σ-algebra generated by ξ. Clearly, 𝒜_ξ consists of the sets obtained as the union of finitely or denumerably many of the sets A_n.

Obviously, if ξ₁, ξ₂, ..., ξ_r are independent random variables, 𝒜_{ξ₁}, 𝒜_{ξ₂}, ..., 𝒜_{ξ_r} are the σ-algebras generated by ξ₁, ξ₂, ..., ξ_r, and B_j is an arbitrary element of 𝒜_{ξ_j} (j = 1, 2, ..., r), then the events B₁, B₂, ..., B_r are independent.
If g(x, y) is any real valued function of two real variables, then, as mentioned above, ζ = g(ξ, η) is a random variable.
The sum is here extended over those pairs (n, m) for which g(x_n, y_m) = z. If such pairs do not exist, the sum on the right hand side of (1) is zero.

In order to compute P(ζ = z) we have to know, therefore, in general, the joint distribution of ξ and η. If ξ and η are independent, then P(ξ = x_n, η = y_m) = P(ξ = x_n) P(η = y_m) and thus

Let us consider now the important special case when ξ and η are independent and g(x, y) = x + y, hence ζ = ξ + η. Then
P(ξ = k) = \binom{n₁}{k} p^k q^{n₁−k}   (k = 0, 1, ..., n₁),

P(η = l) = \binom{n₂}{l} p^l q^{n₂−l}   (l = 0, 1, ..., n₂),

then

P(ζ = k) = Σ_{j=0}^{k} \binom{n₁}{j} \binom{n₂}{k−j} p^k q^{n₁+n₂−k}.   (4)

Since

Σ_{j=0}^{k} \binom{n₁}{j} \binom{n₂}{k−j} = \binom{n₁+n₂}{k},

it follows that

P(ζ = k) = \binom{n}{k} p^k q^{n−k},   (5)

where n = n₁ + n₂.
Hence the random variable ζ has a binomial distribution too. This result can also be obtained without any computation as follows: consider an experiment with the possible outcomes A and Ā; let P(A) = p. In the above example ξ resp. η is equal to the number of occurrences of A in the course of n₁ resp. n₂ independent repetitions of the experiment. The assertion that ξ and η are independent means that we have two independent sequences of events. Perform a total number of n = n₁ + n₂ independent experiments; then ζ = ξ + η is the number of the occurrences of A in this sequence of experiments; hence ζ is a random variable having a binomial distribution of order n and parameter p; that is, Formula (5) is valid.
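The computation leading to (5) can be checked numerically; the following Python sketch convolves two binomial distributions and compares the result with the binomial distribution of order n₁ + n₂ (all parameters are arbitrary choices).

from math import comb

def binom(n, p):
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    # distribution of xi + eta for independent xi and eta
    c = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            c[i + j] += pa * pb
    return c

n1, n2, p = 4, 7, 0.3
lhs = convolve(binom(n1, p), binom(n2, p))
rhs = binom(n1 + n2, p)
print(max(abs(x - y) for x, y in zip(lhs, rhs)))  # ~0: Formula (5)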
We encounter a practical application of this result when estimating the
percentage of defective items. Consider a sampling with replacement from
the population investigated. According to the above this can be done also
by subdividing the whole population into two parts having the same per¬
centage of defective items and selecting from one part a sample of n1 ele¬
ments and from the other one a sample of n2 elements. This estimating pro¬
cedure is equivalent to that which consists of the choice of a sample of n = n₁ + n₂ elements from the whole population.
It is to be noted here that the distribution of the sum of two independent
random variables with hypergeometric distributions does not have a hyper¬
geometric distribution. Hence the former assertion is not valid if the sam¬
pling is done without replacement. The difference is, however, negligible in
practice, if the number of elements of the population is large with respect
to that of the sample.
one of such data is the expectation defined below (first for discrete distribu¬
tions only).
Let the possible values of the random variable ξ be x₁, x₂, ... with corresponding probabilities p_n = P(ξ = x_n) (n = 1, 2, ...). Perform N independent observations of ξ; if N is a large number then, according to the meaning of probability, at approximately Np₁ occasions we shall have ξ = x₁, at approximately Np₂ occasions ξ = x₂, and so on. Taking the arithmetic mean of the ξ-values obtained at the N observations, we obtain approximately the value Σ_k p_k x_k; this is the value about which the arithmetic mean of the observed values of ξ fluctuates. Hence we define the expectation E(ξ) of the discrete random variable ξ by the formula

E(ξ) = Σ_k x_k p_k.   (1)
Obviously, E(ξ) is the weighted arithmetic mean of the values x_k with weights p_k.¹ In order that the definition should be meaningful we have to assume the absolute convergence of the series figuring on the right side of (1). Otherwise, namely, a rearrangement of the x_k values would give different values for the expectation.
If ξ can take on infinitely many values, then E(ξ) does not always exist. E.g. if

P(ξ = 2^k) = 1/2^k   (k = 1, 2, ...),

then the series Σ_k p_k x_k is divergent. Clearly, the expectation of discrete and bounded random variables always exists.
Sometimes, instead of “expectation”, the expressions “mean value” or
“average” are used. But they may lead to confusion with the average of the
observed values. In order to discriminate the observed mean from the
number about which the observed mean fluctuates we always call the latter
“expectation”.
Obviously, the expectation E(ξ) depends only on the distribution of ξ; hence if ξ₁ and ξ₂ are two discrete random variables having the same distribution, then E(ξ₁) = E(ξ₂). Therefore E(ξ) can also be called the expectation of the distribution of ξ. The fluctuation about E(ξ) of the averages

1 Hence E(ξ) lies always between the lower and the upper limit of the possible values of ξ.
formed from the observed values of ξ is described more precisely by the laws of large numbers, which we shall discuss later on. Here we mention only that the average of the observed values of ξ and the expectation E(ξ) are essentially in the same relationship as the relative frequency and the probability of an event. This will be readily seen if we consider the indicator ξ_A of an event A having the probability p; indeed, E(ξ_A) = p · 1 + (1 − p) · 0 = p, and the average of the observed values of ξ_A is equal to the relative frequency of the event A.
Next we compute the expectations of some important distributions.
at the beginning of the time interval and p = 1 − e^{−λt} (λ is the disintegration constant). Hence the expected value of the atoms disintegrating during the time interval t is given by N(1 − e^{−λt}); thus the expected number of nondisintegrated atoms is Ne^{−λt}. This exponential law of radioactivity does
not state — as it is sometimes erroneously suggested — that the number of
the nondisintegrated atoms is an exponentially decreasing function of the
time; on the contrary, it only states that the number of nondisintegrated
atoms has an expectation which is an exponentially decreasing function of
the time.
P(ξ = r + k) = \binom{k + r − 1}{k} p^r q^k   (k = 0, 1, ...),
E(ξ) = n M/N.
the lot examined. We want to estimate p from a sample of size n. The number
of defective items has the same expectation np as in sampling with
replacement.
Theorem 1. If E(ξ) and E(η) exist, then E(ξ + η) exists too and

E(ξ + η) = E(ξ) + E(η).

This is heuristically clear: if the observed values of ξ are ξ₁, ξ₂, ..., ξ_n and those of η are η₁, η₂, ..., η_n, then (1/n) Σ_{k=1}^n ξ_k fluctuates about the number E(ξ) and (1/n) Σ_{k=1}^n η_k about the number E(η), hence (1/n) Σ_{k=1}^n (ξ_k + η_k) fluctuates about the number E(ξ) + E(η); in consequence E(ξ + η) = E(ξ) + E(η). Let us now
give the proof of the theorem. Let the possible values of ξ be x_j (j = 1, 2, ...) and those of η y_k (k = 1, 2, ...); let further A_{jk} denote the event that ξ = x_j and η = y_k. Clearly, the A_{jk} (j, k = 1, 2, ...) form a complete system of events. Further

Σ_j P(A_{jk}) = P(η = y_k)

and

Σ_k P(A_{jk}) = P(ξ = x_j).

On the other hand, the possible values of ξ + η are the numbers z representable as x_j + y_k. It may happen that a number z can be represented in more than one way in the form z = x_j + y_k; in this case
Since the sum of two absolutely convergent series is itself absolutely conver¬
gent, we obtain that
Theorem 3. Let c₁, c₂, ..., c_n be constants and ξ₁, ξ₂, ..., ξ_n random vari-
written in the form ξ = Σ_{j=1}^n ξ_j, where ξ_j is the indicator of the event A at the j-th experiment. Since E(ξ_j) = p, it follows from Theorem 2 that E(ξ) = np.
Similarly a random variable having a negative binomial distribution of
order r can be considered as the sum of r independent random variables
each having a negative binomial distribution of the first order, with the
same parameter p. Thus it follows from Theorem 2 that the negative binomial distribution of order r has the expectation r/p, as proved already.
Similarly, a random variable with a hypergeometric distribution can be
represented as the sum of n indicator variables whose expectation is p (cf. the example after the theorem of total probability). These indicator
variables are not independent, but this does not affect the validity of
Theorem 2.
Theorem 5. If ξ and η are discrete random variables such that the expectations E(ξ²) and E(η²) exist, then E(ξη) exists as well and

E²(ξη) ≤ E(ξ²) E(η²).   (1)

Proof. Put

ζ_λ = λξ − η.

Then

E(ζ_λ²) = λ² E(ξ²) − 2λ E(ξη) + E(η²).   (2)

Since ζ_λ² ≥ 0, we have E(ζ_λ²) ≥ 0 for every real λ; therefore the polynomial (2) in λ of degree 2 is nonnegative. But, as is well known, this is only possible if (1) holds, which is what we wished to prove.
Let ξ be a discrete random variable and A an event having positive probability. The conditional expectation of ξ with respect to the condition A is defined by the formula

E(ξ | A) = Σ_n x_n P(ξ = x_n | A),

provided that the series on the right side is absolutely convergent (which is always fulfilled if E(ξ) exists), where x_n (n = 1, 2, ...) denote the possible values of ξ. E(ξ | A) is therefore the expectation of the conditional distribution of ξ with respect to the condition A. If the events A_n (n = 1, 2, ...) form a complete system of events, then in view of the theorem of total probability
ζ = ξ₁ + ξ₂ + ... + ξ_n

Σ_{n=1}^∞ q_n (|E₁| + |E₂| + ... + |E_n|)

converges), we have

E(ξη) = E(ξ) E(η).   (7)
Proof. Let A_{jk} denote the event ξ = x_j, η = y_k (j, k = 1, 2, ...). Clearly, the possible values of ξη are the numbers which can be represented in the form z = x_j y_k. Further, z P(ξη = z) = z Σ_{x_j y_k = z} P(A_{jk}) = Σ_{x_j y_k = z} x_j y_k P(A_{jk}), hence

E(ξη) = Σ_j Σ_k x_j y_k P(A_{jk}).   (8)
§ 9. The variance
The expectation of a random variable is the value about which the random
variable fluctuates; but it does not give any information about the magni¬
tude of this fluctuation. If we compute the expectation of the difference
between a random variable and its expectation we obtain, as we have already
seen, always zero. This is so because the positive and negative deviations
from the expectation cancel each other. Thus it seems natural to consider
the quantity
d(ξ) = E(|ξ − E(ξ)|)   (1)

as a measure of the fluctuations. Since, however, this expression is difficult to handle, it is the positive square root of the expectation of the random variable (ξ − E(ξ))² which is most frequently used as a measure of the magnitude of fluctuation. This quantity, called the standard deviation of ξ, is thus defined by the expression

D(ξ) = √(E((ξ − E(ξ))²)).
D²(ξ) = E(ξ²) − [E(ξ)]².
This is the formula by which the standard deviation is most readily computed. If the discrete random variable ξ assumes the values x_n (n = 1, 2, ...) with probabilities p_n = P(ξ = x_n), then

D²(ξ) = Σ_n p_n (x_n − E(ξ))²   (3)

and, according to Theorem 1,

D²(ξ) = Σ_n p_n x_n² − (Σ_n p_n x_n)².   (4)
Theorems 2 and 3 are similar (from a formal point of view even equal)
to the well-known Steiner theorem in mechanics which states that the mo¬
ment of inertia of a linear mass-distribution about an axis perpendicular to
this line is equal to the sum of the moment of inertia about the axis through
the center of gravity and the square of the distance of the axis from the
center of gravity, provided that the total mass is unity; consequently, the
moment of inertia has its minimal value if the axis passes through the center
of gravity.
Theorem 3 exhibits an important relation between the expectation and
the variance.
Theorem 2 is mostly used if the values of ξ lie near to a simple number A but the expectation is not exactly this value. For computational reasons it is then more convenient to calculate the value of E((ξ − A)²).
d²(ξ) = E²(|ξ − E(ξ)|) ≤ D²(ξ).

Equality can occur in other cases besides the trivial case when ξ is with probability 1 a constant; thus e.g. if ξ takes on the values +1 and −1 with
D(aξ + b) = |a| · D(ξ).

Especially, we obtain that the standard deviation does not change if we add a constant to the random variable ξ, or multiply it by −1.
It is seen from (3) that the variance of a random variable depends on its
distribution only. Hence we can speak about the variance of a distribution.
We shall now compute the variances of certain discrete distributions and for sake of comparison we determine the values of d(ξ) as well.

The value of d(ξ), for sake of simplicity, will only be determined for a bi-
d(ξ) = N \binom{2N}{N} / 2^{2N}.

By Stirling's formula

lim_{N→∞} \binom{2N}{N} √(πN) / 2^{2N} = 1,

hence

d(ξ) ≈ √(N/π).

Since

D(ξ) = √(N/2),

it follows that the quotient d(ξ)/D(ξ) tends for N → ∞ to the limit √(2/π). We shall see later on that this holds for a whole class of distributions.
D²(ξ) = p Σ_{k=0}^∞ (k + 1)² q^k − 1/p²,

and therefore

D²(ξ) = q/p².
P(ξ = k) = \binom{M}{k} \binom{N−M}{n−k} / \binom{N}{n}   (k = 0, 1, ..., n).

As in the two preceding examples we obtain, introducing the notations M/N = p, q = 1 − p,

D(ξ) = √( npq (1 − (n − 1)/(N − 1)) ).
D(ζ_n) = D √n.

Let E denote the expectation of the distribution of the ξ_k; then

E(ζ_n) = nE,

hence the quotient

D(ζ_n)/E(ζ_n) = D/(E √n)
tends to zero for n → ∞, provided that E is distinct from zero. Consequences of this are dealt with in Chapter VII. If ξ is a positive random variable, the quotient D(ξ)/E(ξ) is called the coefficient of variation of ξ.
As an interesting consequence of Theorem 2 we mention that if ξ and η are independent, then

ζ_n = ξ₁ + ξ₂ + ... + ξ_n

D²(ζ_n) = npq.

D²(ζ) = rq/p².
R(ξ, η) = E([ξ − E(ξ)][η − E(η)]) / (D(ξ) D(η))   (1)

is said to be the correlation coefficient of ξ and η. (If ξ or η is constant, we put R(ξ, η) = 0.)

From this definition follows immediately that R(η, ξ) = R(ξ, η). If the possible values of ξ and η are x_m (m = 1, 2, ...) and y_n (n = 1, 2, ...), and r_{mn} = P(ξ = x_m, η = y_n), then

R(ξ, η) = (1/(D(ξ) D(η))) Σ_m Σ_n r_{mn} (x_m − E(ξ))(y_n − E(η)).
If ξ is any nonconstant random variable, the random variable

ξ* = (ξ − E(ξ)) / D(ξ)   (2)

satisfies

E(ξ*) = 0 and D(ξ*) = 1.
Theorem 1. We have

R(ξ, η) = (E(ξη) − E(ξ) E(η)) / (D(ξ) D(η)).   (4)

R(ξ, ξ) = +1

and

R(ξ, −ξ) = −1.
R(ξ, η) = 0.

E(ξη) = E(ξ) E(η).

P(AB) = P(A) P(B),

where 0 < p < 1. Then E(ξ) = E(η) = E(ξη) = 0, hence ξ and η are uncor-
η = aξ + b   (5)

with probability 1, where a and b are real constants and a ≠ 0; in this case R(ξ, η) = +1 or −1 according as a > 0 or a < 0.
Proof. Let E(ξ) = m. If the relation (5) holds between ξ and η, we have

R(ξ, η) = E(a(ξ − m)²) / (|a| D²(ξ)) = sgn a.¹
Suppose, for instance, R(ξ, η) = +1. (The case R(ξ, η) = −1 can be dealt with in the same manner.) Put

ξ′ = (ξ − m)/D(ξ),  η′ = (η − E(η))/D(η);

then by (3)

E(ξ′η′) = 1,

hence

E((ξ′ − η′)²) = 2 − 2 = 0.
P(ξ′ = η′) = 1,

that is

η = E(η) + D(η)(ξ − m)/D(ξ)

with probability 1.
Thus, unless a linear relation of the form (5) holds between f and rj, the
absolute value of their correlation coefficient is less than 1.
1 sgn x = +1 if x > 0, 0 if x = 0, −1 if x < 0.
Proof. Let

Σ_{i=1}^m Σ_{j=1}^n r_{ij} x_i^h y_j^k = (Σ_{i=1}^m p_i x_i^h)(Σ_{j=1}^n q_j y_j^k);

that is,

Σ_{i=1}^m Σ_{j=1}^n (r_{ij} − p_i q_j) x_i^h y_j^k = 0   (h = 0, 1, ..., m − 1; k = 0, 1, ..., n − 1).   (7)

Introducing the notation

d_{ik} = Σ_{j=1}^n (r_{ij} − p_i q_j) y_j^k,   (8)

we have for the unknowns d_{ik} (i = 1, 2, ..., m) the system of linear equations

Σ_{i=1}^m d_{ik} x_i^h = 0   (h = 0, 1, ..., m − 1).

Since the above consideration holds for every k = 0, 1, ..., n − 1, we obtain d_{ik} = 0. The same can be shown for every i = 1, 2, ..., m. From this follows

r_{ij} = p_i q_j,
be a polynomial distribution

P(ξ₁ = k₁, ξ₂ = k₂, ..., ξ_r = k_r) = n!/(k₁! k₂! ... k_r!) · p₁^{k₁} p₂^{k₂} ... p_r^{k_r},
W_k = \binom{n}{k} (1/N)^k (1 − 1/N)^{n−k}.   (1)

Introducing the notation λ = n/N, this can be written in the form

W_k = (λ^k / k!) (1 − λ/n)^{n−k} ∏_{j=1}^{k−1} (1 − j/n).   (2)
It is known that

lim_{n→∞} (1 − λ/n)^n = e^{−λ},   (3)

hence from (2)

lim_{n→∞} W_k = (λ^k / k!) e^{−λ}   (k = 0, 1, ...).   (4)
Let

P_k = (λ^k / k!) e^{−λ}   (k = 0, 1, ...).   (5)

Evidently

Σ_{k=0}^∞ (λ^k / k!) e^{−λ} = 1.   (6)
Thus the probabilities defined by (5) are the terms of a probability distri¬
bution, called the Poisson distribution with parameter λ; the meaning of λ in the above example is the average number of balls in one urn. It can be shown by direct calculation that λ is the expectation of the Poisson distribution (5). Namely, from the relation

E(ξ) = Σ_{k=1}^∞ k (λ^k / k!) e^{−λ} = λ Σ_{k=1}^∞ (λ^{k−1} / (k − 1)!) e^{−λ} = λ.

Thus the expectation of the Poisson distribution (5) is λ; hence the distribution (5) can be called the Poisson distribution with expectation λ. The variance of the Poisson distribution can easily be calculated:

E(ξ²) = Σ_{k=0}^∞ k² (λ^k / k!) e^{−λ} = λ² + λ,

hence

D²(ξ) = λ² + λ − λ² = λ;
that is, the standard deviation of the Poisson distribution (5) is D(ξ) = √λ. Thus the variance of a Poisson distribution is equal to the expectation.
In the passage to the limit in (4) no use was made of the property that the
Therefore our result can also be stated in the following form: The k-th term

W_k = \binom{n}{k} p^k q^{n−k}   (8)

of the binomial distribution tends to the k-th term of the Poisson distribution, i.e. to the limit

P_k = (λ^k / k!) e^{−λ},   (9)

if n → ∞ and p → 0 so that np = λ remains constant.
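The convergence stated in (8)-(9) can be observed numerically; a short Python sketch with λ = 2 (an arbitrary choice):

from math import comb, exp, factorial

lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    w = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(8)]
    poisson = [lam ** k * exp(-lam) / factorial(k) for k in range(8)]
    print(n, max(abs(a - b) for a, b in zip(w, poisson)))

The maximal difference between the first terms of the two distributions decreases roughly as 1/n.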
for x > 0, z > 0, denote the incomplete gamma function of Euler and

Γ(x) = ∫₀^∞ t^{x−1} e^{−t} dt

the complete gamma function of Euler.
Partial integration yields the formula

Σ_{k=0}^{r} (z^k / k!) e^{−z} = 1 − Γ(r + 1, z)/Γ(r + 1) = (1/Γ(r + 1)) ∫_z^∞ t^r e^{−t} dt.   (12)
Let us now return to our practical problem. Because of the relation be¬
tween relative frequency and probability, the ratio of defective bottles and
produced bottles is approximately equal to the probability of a bottle being
defective, provided the number of manufactured bottles is sufficiently large.
This probability, however, is 1 − W₀, hence approximately 1 − e^{−λ}. Since the beginning of this paragraph, but only 100(1 − e⁻¹) = 63.21%. Of course such a large fraction of defective items will not occur. If for instance x = 30, the fraction of defective items is 100(1 − e^{−0.3}) ≈ 25.92% instead of 30%.
Clearly, if the number of stones is large, it is more economical to produce
small bottles, provided of course that there is no way for clearing the liquid
glass. Using 0.25 kg glass per bottle instead of 1 kg, the fraction of defective
items decreases for x = 30 from 25.92% to 7.22%. As is seen from this
example, probability theory can give useful hints for practical problems
of production.
P(A_{s+t}) = P(A_{s+t} | A_s) P(A_s),   (1)
hence
G(s + t) = G(s) G(t). (2)
G((m/n) t) = [G(t)]^{m/n},

hence for every positive rational t

G(t) = e^{−λt}.   (9)

However, because of the monotonicity of G(t), (9) holds for every t. Therefore

F(t) = 1 − G(t) = 1 − e^{−λt}.   (10)
The left side of (11) is the probability that an atom which did not disintegrate until the moment t will disintegrate before the moment t + Δt. λ has
thus the following physical meaning: the probability that an atom disinte¬
grates during the time interval between t and t + Δt is (up to higher powers of Δt) equal to λΔt. The constant λ is called the constant of disintegration; it characterizes the radioactive element in question and may serve for its identification. It is attractive to give another interpretation of the number λ, which enables us to measure it. The time during which approximately
half of the mass of the radioactive substance disintegrates, is said to be the
half-life period. More exactly, this is the time interval T such that during it half of the atoms of the radioactive substance disintegrate, i.e.

e^{−λT} = 1/2.   (12)

P_k(t) = \binom{N}{k} (1 − e^{−λt})^k e^{−(N−k)λt}.   (13)
The half-life period of radium is 1580 years. Taking a year for unit we
obtain λ = 0.000439. If t is less than a minute, λt is of the order 10⁻⁹. For 1 g uranium mineral, containing approximately 10¹⁵ radium atoms, the relative errors committed in replacing P_k(t) by P_k*(t) are of the order 10⁻³.
If we restrict ourselves to the case where t is small with respect to the half-
life period, we can choose the model so that the Poisson distribution re¬
presents the exact distribution of the number of radioactive disintegrations.
Consider a certain mass of radioactive substance and assume
1. If t₁ < t₂ < t₃ and A_k(t₁, t₂) denotes the event that "during the time interval (t₁, t₂) k disintegrations occur", then the events A_k(t₁, t₂) and A_l(t₂, t₃) are independent for all nonnegative integer values of k and l.
lim_{t→0} (1 − W₀(t) − W₁(t)) / W₁(t) = 0,   (16)

or equivalently

lim_{t→0} (1 − W₀(t)) / W₁(t) = 1.   (17)
In words: the probability that there occurs at least one disintegration is,
in the limit, equal to the probability that there occurs exactly one.
lim_{Δt→0} W_k(Δt) / Δt = 0 if k = 2, 3, ....   (19)

Σ_{k=2}^∞ W_k(Δt) = 1 − W₀(Δt) − W₁(Δt).   (20)
V₁(t) = λt,

V₂(t) = λ²t²/2,

and, in general,

V_k(t) = (λt)^k / k!.

Hence

W_k(t) = (λt)^k e^{−λt} / k!   (k = 0, 1, ...).
The distribution of the stars thus follows the same law as the radioactive
disintegration; the only difference is that here the volume plays the role of time. The same reasoning holds for particular kinds of stars as well, e.g.
for double stars. In the same manner the distribution of red and white cells
in the blood can be determined. Let Ak denote the event that there are
exactly k cells to be seen in the visual field of the microscope, then we have

P(A_k) = ((λT)^k / k!) e^{−λT}   (k = 0, 1, ...),   (27)

where T is the area of the visual field and λ is the average number of cells per unit area.
q_k = Σ_{n=0}^∞ α_n p_{nk}.   (1)

Σ_{k=0}^∞ q_k = Σ_{n=0}^∞ α_n Σ_{k=0}^∞ p_{nk} = Σ_{n=0}^∞ α_n = 1.   (2)

𝒬 will be called the mixture of the probability distributions taken with the weights α_n.
the mixture of the binomial distributions ℬ_n(p) = { \binom{n}{k} p^k q^{n−k} } taken with the weights α_n = (λⁿ / n!) e^{−λ} is a Poisson distribution. In fact
H_n(M, N) = { \binom{M}{k} \binom{N−M}{n−k} / \binom{N}{n} }

Σ_{n=k}^{N−(M−k)} [ \binom{M}{k} \binom{N−M}{n−k} / \binom{N}{n} ] \binom{N}{n} p^n q^{N−n} = \binom{M}{k} p^k q^{M−k}.   (4)
r_k = Σ_{j=0}^{k} p_j q_{k−j}.   (5)

As it was seen in § 6, 𝒫𝒬 is the distribution of the sum ξ + η of two independent random variables ξ and η having the distributions 𝒫 and 𝒬 respectively. Even without the knowledge of this result, it is readily shown that 𝒫𝒬 is a probability distribution. In fact r_k ≥ 0 and

Σ_{k=0}^∞ r_k = (Σ_{j=0}^∞ p_j)(Σ_{k=0}^∞ q_k) = 1.   (6)

Since

Σ_{j=0}^{k} p_j q_{k−j} = Σ_{j=0}^{k} q_j p_{k−j},

we have

𝒫𝒬 = 𝒬𝒫.   (7)
Z
i+j+h=k
PliPyPvr
hence
•$m (p) 35n(p) = &m + n (p). (1°)
= {1,0, 0,..0,...}.
Obviously, for any distribution & one has
(12)
Thus the distribution plays the role of the unit element with respect to
the convolution operation.1 The distributions <%„, defined by pn = \,pm — 0
for m ^ n, are also degenerate distributions. It is easy to show that
(Z an(^nU)- (14)
n=0 n=0
oo
ity distributions can be defined in the following manner: Let g(z) = £ Wnzn
n=0
00
If for instance <%j is the degenerate distribution defined above and if g(z) =
= (pz + q)n (0 < p < 1), then, because of (13), we have
XkWl _ “ Xk \Xke~x
exp[A(^i - l)]=e A £ = e XY — &k=l
fc = o k\ k=0 k\ k\
f Pu = 1 (2)
k= 0
and represents an analytic function which is regular in the open unit disk.
The introduction of the generating function makes it possible to treat some
problems of probability theory by the methods of the theory of functions
of a complex variable.
Examples
From the generating function of a distribution one can obviously get all
characteristics (expectation, variance, etc.) of the distribution. We shall
now show that these quantities can all be expressed indeed by the deriva¬
tives of the generating function at the point z = 1. Since the generating
function is, in general, defined only for | z | < 1, we understand by the
“derivative at the point z = 1” always the left side derivative (provided it
exists).
If the derivatives G(p(z) of G5 (z) exist at z = 1, we have the following
relations:
<?;(')=z
k=l
kPl.
00
and, in general
00
1) = E Kk ~ 1) • • • (k - r + IK (r = 1,2,...), (4)
k=r
where the series on the right is convergent. Conversely, it is easy to show that
if the series in (4) converges, the derivative G^r)(l) exists and Formula (4) is
valid. The number
G'i(l) = M1,
where the are Stirling numbers of the first kind defined by the relation
Mx = G\ (1),
M2 = G\(X) + G\(X)
and, in general,
Ms = ±o<pGf{\), (7)
7=1
Equations (7) allow the calculation of the central moments of t;, i.e. the
moments of £ — E(f)\
In fact
(9)
(11)
or
(12)
The function Hfw) is called the moment generating function of the random
variable f In order to calculate central moments we put
(14)
Ifw) is called the central moment generating function of f Hfw) and Ifw)
exist only if G^w) is regular at z = 1, The necessary and sufficient condi¬
tion for this is the existence of all moments of £ and the finiteness of the ex¬
pression
This condition is always fulfilled for bounded random variables and also
in case of certain unbounded distributions, e.g. the Poisson distribution and
the negative binomial distribution.
If H^w) exists, then 7^(0) = 1, since G{(1) = 1. But then there can be
found a circle | w | < r in which Ifw) A 0, hence In Ifyv) is regular. Put
Kfw) — In f(w). Since 0) = 0 and 7^(0) = 7^(0) = 0, we have for | w | < r
(15)
Ill, § 15] GENERATING FUNCTIONS 139
k2 = m2 = D2 (£),
k3 = m3, (16)
kA = m4 — 3ml.
Gi(z) = e^~1\
f (w) = eX(e"-1-WJ,
hence
00 wl
K^w) = X(? - 1 - w) = X £ — . (17)
1=2 l-
and, consequently,
Remark. For 1=2 relation (20) is already well known to us: the variance
of the sum of independent random variables is equal to the sum of the
variances. For 1=3, relation (20) shows that this holds for the third cen¬
tral moments, too.
weights <xn (<x„ > 0, £ a„ = 1), of the random variables (n = 0, 1,. . .),
H=0
then
00
Proof. The probabihty that the quantity rj is equal to the random variable
£nis, by assumption, equal to a„. Thus, if qk = P(rj = k) and pnk = P(£„ =
= k), we have
00
dk^Y^nPnk- (22)
n-= 0
Consequently
00 00 00
Gn 0) = k=0
Z <lk zk = n—0
Z a« k=0
Z Pnk zk, (23)
where the order of the summations may be interchanged because of the ab¬
solute convergence of the double series. Relation (21) is herewith proved.
Theorem 3. Assume that the random variables £ls £2, have the
same distribution; let G(z) be their common generating function. Let further
be v a random variable taking on positive integer values only, which is inde¬
pendent from the £„-s. The generating function of the sum
*1 — £i + £2 + • • • + £* (24)
<?,(*)=I
Let A(z) be the generating function of v, then according to (30) and (33) we
have
G(x, y) = A(p(x - 1) + q(y - 1) + l).
But from this follows because of the regularity of g(z) that g{z) = eNz.
Hence A(z) = eN(z~i:>; that is v has a Poisson distribution.
Now we shall prove the following theorem:
and
00
z Pk = 1 (37)
k=0
are valid for
o
Pnk = P(£n = k) (38)
11
then the generating functions of the converge, in the closed unit circle, to
the generating function of the distribution {pk}. Hence we have
where
G„(z) = k=0
Z PnkZk (40)
and
Conversely, if the sequence G„(z) tends to a limit G{z) for every z with \ z | <
< 1, then (36) and (37) are valid, i.e. G{z) is the generating function of a
distribution {pk} and the distributions {pnk} converge to this distribution
{Pk}•
Remark. If (36) does hold while (37) does not, then (39) is valid only in
the interior of the unit circle. This can be seen from the following example.
Let = n, hence
1 for k = n,
0 otherwise;
consequently,
lim pnk = 0 (k = 0, 1,...),
n-+ oo
but
0 for | z | < 1,
lim Gn{z) = lim zn =
n-+ oo n-*~ co 1 for z = 1,
Proof of Theorem 4. First we show that (39) follows from (36) and (37).
144 DISCRETE RANDOM VARIABLES [III, § 15
00 £
E
k=N
Pk < ~r,
4
(42)
where pk has the sense given in (37); this will be always possible because of
(37). Choose next a number n so large that
k=N
E Pnk = 1 - E
k=0
Pnk ^ 1 - E
k=0
Pk + -T
4
= E
k=N
Pk + X <
4 ^
’
It follows from relations (42), (43) and (44) that for | z | < 1 and for suffi¬
ciently large n
N—1 00 oo
it follows according to the known theorem of Yitali that G(z) is regular for
| z | < 1 and that Gn(z) converges uniformly to G(z) in the entire circle
| z | < r < 1. Putting
oo
G00 = k=0
E Pk zk
and denoting by Cr the circle | z | = r < 1, we obtain that
Gn(z) G{z)
lim pnk — lim dz —
2ni yk +1 yk + 1 = Pk-
2ni
Cr Cr
Ill, § 15] GENERATING FUNCTIONS 145
From this (36) follows. Since G(l) = lim G„(l) = 1, we get (37).
<?.(*) =
n
Clearly
-e
lim Gn (z) = PKz-1)
and since eA(;r_1) is the generating function of the Poisson distribution X),
our statement follows from the second part of Theorem 4.
It can be proved in the same manner that the negative binomial dis¬
tribution &r(p) converges to the Poisson distribution &(X) for r -* oo, if
(1 — p)r = X is constant. In other words, if
(r + k - 1
Pitr = k) = Pr<f (k = 0,1,...),
k
X X
where p — 1 — -— and q = 1 — p = — , then the distribution of £r con-
r r
verges to the Poisson distribution X). Since the generating function Gn{z)
X
of the distribution 1 - is given by
rx _ Ay
r
C„(z)
Xz
~7 /
and
(i - A V
r = gA(z-l)
lim
n-*oo
in fact, the number of electrons emerging from the «-th screen is the sum of
the electrons liberated by those emerging from the (n — l)-th screen. Thus
the random variable t]n is exhibited as the sum of independent random vari¬
ables, the number of terms of the sum being equal to the random variable
Vn-v Put
G(z) = f pk zk (46)
fc = o
and let Gn(z) be the generating function of rjn. We have G1(z) = G(z) and
it follows from Theorem 3 that
The generating function G„(z) is thus the n-th iterate of G(z). Sometimes it
is convenient to employ the recursive formula
In general, we have
C„+m(z) = G,(G„(z)). (49)
Consequently
M„ — M" (n= 1,2,...). (52)
The expectation of the number of electrons emitted from the n-th screen
is thus the «-th power of the expectation of the number of electrons emitted
from the first screen. For M > 1 this expectation increases beyond every
bound for n oo; for M < 1 it tends to 0. In the latter case the process
stops sooner or later. Let us see now, what is the probability of this. Let
P„k be the probability that k electrons are emitted from the «-th screen;
particularly, we have
f„,o = C„(0)- (53)
the sequence P„ o is thus monotone increasing. Since for every n Pn>o — 1 >
the limit
lim Pnfi = P (55>
P = G(P). (57)
Since (7(1) = 1, 1 is also a root of this equation. We shall show that for
M < 1 there exist no other real roots. In this case therefore the probability
that no electrons are emitted from the n-th screen, tends to 1 if n -> oo. To
prove this draw the curve y = G(a). Since (7(a) is a power series with non¬
negative coefficients, the same holds for all its derivatives, (7(a) is therefore
monotone increasing in the interval 0 < x < 1 and is also convex. The
equation P = G(P) means that P is the abscissa of the intersection of the
curve y — G(x) and the line y = x. Since (7(0) > 0, (7(a) — a is positive
for a = 0. Now if (7'(1) — M > l, G(a) — a is, because of (7(1) = 1, nega¬
tive in an appropriate left hand side neighbourhood of the point a = 1
(see Fig. 14). As (7(a) is continuous, there exists a value P(0 < P < 1) satis¬
fying (57). Because of the convexity of (7(a) there can exist no further points
of intersection.
It can be proved in the same manner that for M < 1 Equation (57) has
no real roots other than P = 1. (There can of course exist complex roots
of (57).)
It is yet to be shown that for M > 1 the sequence Pn0 (n = 1, 2,. . .)
converges to the smaller of the two roots of Equation (57). This can be seen
immediately from Fig. 15 by relation (47) which gives in case of M > 1 for
HI, § 16] APPROXIMATION OF THE BINOMIAL DISTRIBUTION 149
Fig. 15
Thus the probability that from the n-th screen there are exactly k > 1 elec¬
trons issued tends to 0 for n —► co for each fixed value of k. From
00
it follows that for large enough n the number of the emitted electrons (pro¬
vided that the process did not stop) will be arbitrarily large with a (condi¬
tional) probability near to 1. This is in accordance with experience.
where [x] denotes as usually the integral part of x; i.e. [x] = k for k <
< x < k + 1 (k = 0, 1,...). Then we have
B B+Y B+'2
Z /(^) = .f /(*)dx + .f e(x)f'ix) dx. (4)
A-Y
k = np + z and n — k = nq — z. (6)
| np+z i nq — z
n
Wk = 1 - 1 + es (7)
2n (np + z) (nq — z) \ np + z nq — z
with
d ^ 0k 6n — k
12 n \2k 12 (n-kY
(8)
z
x= (9)
1 The proof of this formula can be found e.g. in K. Knopp [1],
HI, § 16] APPROXIMATION OF THE BINOMIAL DISTRIBUTION 151
remains bounded:
I | ^A (A = constant). (10)
For the different factors on the right hand side of (7) we obtain
n 1 x(q-p)
1 - Ml (11)
2n(np + z) (<nq - z) yjlnnpq 2\fnpq 1«,
and
np+z V nq—z
1 - 1 +
np + z nq
(q-P)*3 1
=e *ar1+
2 +o (12)
bjnpq n
(13)
3=0 (!)•
n pk qti k
W, = (k = 0,1,...,«), (14)
k
further if
k — np
x = <A, (15)
yjnpq
then
X2
' 2
(x3 - 3x) {q - p)
1 + (16)
sJlTinpq 6Jnpq +0|T
order not exceeding that of bN”.) If, however, lim = 0, this will be denoted by
/V—>-co Qpj
aN = o (bN). (Read: “aN is of smaller order than bN”.)
DISCRETE RANDOM VARIABLES [III, § 16
152
1
where the constant intervening in O depends on A only.
n
In practice, usually the following weaker form
(k — 7ip)2
exp
2npq ■1 11
1 + o (17)
W, =
yjlnnpq - ■y/n 1-
of Theorem 1 suffices.
function
Fig. 16
line either to the left or to the right, with the same probability Under
the last line of nails there follows a line of n + 1 boxes in which the balls are
accumulated. In order to fall into the A-th box (numbered from the left,
k = 0, 1,...,«) a ball has to be diverted k times to the right and n — k
times to the left. If the directions taken at each of the lines are independent,
the probability of this event will be 2~". By letting a large enough num¬
ber of balls roll down Gabon’s desk, their distribution in the boxes exhibits
quite neatly a curve similar to the Laplace-Gauss curve. Theorem 1 states
that the limit relation
pk qn~k
lim 1 (19)
exp [
=
n-+ oo (k — np)2 T
yjlnnpq 1 2 npq .
y/npq
should hold, where a and b (a < b) are two given real numbers. It follows
from Formula (16) that this probability is
W(n){a, b) z wk =
, k-np
a<, j-
l npq
< b
q-p
z *
(xzk - 3xk) + O (21)
Jlrcnpq a<,xic<b 6jnpq
k — np
where xk = was substituted. It will be seen that there exists a limit
Jnpq
lim W(,,) (a, b) = W(a, b) (22)
are integers; this can always be done without changing the value of
W^(a,b). It follows then from (4) that
b
2 (q - p) (x3 - 3x)
W(n)(a, b) 1 + dx -f O (23)
2n J bjnpq n
v/2
x8
Since je 2 (x3 — 3x)(£x can be given explicitly, we have
Theorem 2. If
I
A<,k<,B
pk q"~k
In
yjln J 2 dx + R (24a)
where
q-p
R = (24b)
bjnpq
for each given pair (a, /?) of real numbers (a < /j); it suffices in fact to replace
a by a„, /? by bn, where an is the least number such that
Obviously,
1 1
a„ — a = O and bn-P = 0
V ti ■
hence
bn x!
lim J e 2 dx = J e 2 dx.
Thus the right hand side of (25) gives an approximate value for the proba¬
bility that the number of occurrences of an event A (having the probability
P(A) = p) in an experiment consisting of n independent trials, lies between
the limits np + a Jnpq and np + /? Jnpq. To use this result we must have
the values of the integral
p
y
x/2
dx
for every pair (a, /?). The integral je 2 dx cannot be expressed by elemen-
DISCRETE RANDOM VARIABLES [III, § 16
156
is tabulated with a great precision and a table of its values is given at the
end of this volume (cf. Table 6). The curve y = $(x) is shown in Fi§- 18-
i r -xZ
<f>( + oo) = e ~ dx= 1. (27)
V 271J
+ 00+00
1 r r |
$2(+oo) = -^— e 2 dx dy —
= Jj re 2 dr = 1.
— 00 —00 0
pk qn-k =
lim £ (28)
«-*■<» k—np
I npQ <.y
HI, § 17] BERNOULLI’S LAW OF LARGE NUMBERS 157
and let fA(n) be the relative frequency of the event A in a sequence of n inde¬
pendent repetitions of the experiment. Given two arbitrarily small positive
numbers e and 5, there exists a number N depending on e and 5 only such that
for n > N
P(\fA(n)-P{A)\<e)>\-b. (1)
Proof. We have
P(\fA(n)-P<.A)\<£)= Z pk qn~k.
\k—np\ <ns KkJ
<P(Y)-<P(-Y)> 1- —. (2)
158 DISCRETE RANDOM VARIABLES [III, § 17
pkqtt-k
P( I /M ~ P(A) I < «) a I (3)
\k-np\<Y fnpq
from (2), (3) and (4) it follows that (1) is verified for n > N = max (A^, A2).
Bernoulli’s law of large numbers can also be proved directly, without the
use of the de Moivre-Laplace theorem.
Formula (1) is equivalent to
The identity
/ \
(given as relation (5) in § 9) states that the variance of the binomial distri¬
bution is equal to npq. Thus we have
and, consequently,
^ -~7~, it suffices to take for N the value N = —=—. We shall see in Chap-
4 4 e2 8
ter VI that one can take for N a much smaller value as well.
The method of proof employed above is often used in probability theory.
Later on (in Ch. VII) it will be formulated in a more general form as fhe
inequality of Tchebychev.
Finally, some remarks should be added here concerning Bernoulli’s law
of large numbers.
III. § 18] EXERCISES 159
§ 18. Exercises
1. Suppose a calculator is so good that he does not make more than three errors
in the average in doing 1000 additions. Suppose he checks his additions by testing
the addition modulo 9 and corrects the errors thus discovered. There can, however,
still remain undetected errors: in fact, it may occur that the erroneous result differs
from the exact sum by a multiple of 9. How many errors remain in the average among
his additions?
Hint. It can be assumed that, if the sum is erroneous, the error lies with an equal
the event “the sum is erroneous”, B the event “the error could be detected by testing
the sum modulo 9”. The probability sought is the conditional probability P(A i B);
2. A missing letter is to be found with the probability P in one of the eight drawers
of a secretary. Suppose that seven drawers were already tried in vain. What is the
probability to find the letter in the last drawer?
N,
lim ~ = p, > 0 (j = 1, 2,. . ., r) ,
N->- oo N
6. Deduce Formula (4) of § 4 from Formula (12) of § 12, using the convergence of
the binomial distribution to the Poisson distribution.
Xk e~x
7. Determine the maximal term of the Poisson distribution-(k — 0, 1, . . . ;
k\
k > 0).
8. If A is constant and N = n In n + A n, there exists a limit of the probabilities
Pk (n, N) (cf. Ch. II, § 12, Exercise 42.b) for « —> oo: we have for any fixed real value
of A and any fixed nonnegative integer k
(e x)k exp (— e *)
lim Pk («, n In n + A«) = (*=0,1,...).
n—+- oo kl
HI, § 18] EXERCISES 161
o T, M _ R
~ P> ~T7 ~ r> and n ~> 00 so that
N N
n (AT+jR) []
r (N — M + jR)
i=0 /=0
•
lim
n —*■ co
Ti(N+jR)
1=0
k — 1
P
1 + p- 1 + P ,
Thus under the above conditions, the Polya distribution tends to a negative binomial
Xk
distribution. p — 0, the above limit becomes ; the limit distribution is then a
k\
Poisson distribution.
10. A roll of a certain fabric contains in the average five faults per 100 yards .
The cloth will be cut into pieces of 3 yards. How many faultless pieces does one expect
to find?
Hint. It can be supposed that the number of faults has a Poisson distribution.
The probability of finding k faults in an x yards long piece is therefore equal to
(* = 0, 1,. ..).
11. In a forest there are on the average ten trees per 100 m2. For sake of simplicity
suppose that all trees have a circular section with a diameter of 20 cm. A gun is fired
in a direction in which the edge of the forest is 100 m away. What is the probability
that the shot will hit a trunk?
Hint. It can be assumed that the trees have a Poisson distribution; the probability
that on a surface area T m2 there are k trees is equal to
- - (4 = 0,1,...).
12. In a summer evening there can be observed on the average one shooting star
in every ten minutes. What is the probability to observe two during a quarter of an
hour?
DISCRETE RANDOM VARIABLES [III, § 18
162
13. At a certain post office 1017 letters without address were posted during one
year. Estimate the number of days on which more than two letters without address
were posted.
14. Let &x(p) = {p, 1 — P) be a binomial distribution of order 1; let g(z) =
1 _ „
=-. Determine the distribution g[^x(j>)]-
1 — az
15. Let A (p) be the same as in the preceding exercise. Show that
16. Let p be the probability of an event A. Perform n independent trials and denote
by/the relative frequency of A in this sequence of trials. With the aid of the approx¬
imation of the binomial distribution by the normal distribution answer the following
questions:
a) If p — 0.4 and n — 1500, what is the probability for / to lie between 0.40
and 0.44?
b) If p — 0.375, how many independent trials have to be performed in order
that the probability of \f— p I < 0.01 is at least 0.995?
7
c) Let p = —, n — 1200. How should £ be chosen in order that the probability
17. Put
X
oo
Ill, § 18] EXERCISES 163
for M = 1, 2, . . . .
21. The function of two variables W(x,y) = *_j fulfils the partial differential
equation of heat-conduction
dxp j dsy
dJ 2 dx2 ’
the function
"<*»> = z(l)4-
1
k<
. _ n+x * ^
AnU=-A\U
where
AnU = U (x, n + 1) - U(x, n),
1 (k — «P)2
pk qn~k exp
V2nnpq [- 2 npq
0k-npY n
lim ---= 0 .
n -f- oo ^
M N — M\
n — k (k - np)2
exp
y 2n npq 2 npq
where
and 2
| k — np | = o(n3).
Thus also the hypergeometric distribution can be approximated by the normal
distribution.
0(x) =
1 (- i)V*+1
+ 4- _|-1---b . .
=T+ 112-3 214-5 318- 7 ^ k\2k(2k + 1)
V 2n
How many terms are to be taken to calculate 0(2) with an accuracy of four decimal
digits?
How many terms are there to be taken to calculate 0(A) with an accuracy of 10-8?
1 + |1 - e 2
I + n -
< 0(x) < - for x>0.
be uniformly fulfilled for t in every finite interval. Show that for every x > 0, y > 0
we have
a+
\'nb
l~ nb r A r — u2
lim
n -*■ co J 2ti J
V
'Mrwt-jsrjI » 1)
e 2 du.
(nb
Ill, § 18] EXERCISES 165
Xk e~X
lim Z
A-*cx> k<X.+xl X
tt = ^w-
29. Show with the aid of the result of Exercise 27 and the relation
P* q" ' — (n — k) tk (1 - dt
Km Y
Y, ( ? / cf
cf k =
k = ®(x).
®(x).
r—> cd
t—s-cd kk —
— nn \
\ K .
deduced in course of the proof of Bernoulli’s law of large numbers, that for any
function f(x), continuous in the closed interval [0, 1 ], the so-called Bernstein poly¬
nomials of fix)
Nk being the number of particles having the energy Ek(k — 1,2,, «). Let Wk be
the probability of a particle being in the state k, W(NU N2,.. ., N„) the probability
that the system is in the state characterized by the occupation numbersNu N2,..., N„.
By assuming that the states of the particles are independent from each other, we have,
obviously
N\
W(NX ,N,,...,Nn) = w p... w* (1)
NX\N2\... N„\
with
N = Nx + N, + ... + IV, . (2)
The probabilities of the possible states have therefore a multinomial distribution.
If, however, the total energy E of the system is given, not all these states can actually
occur: besides condition (2) the following one must be fulfilled, too:
YJNkEk=E. ■ (3)
k=1
According to the definition of the conditional probability, probabilities (1) fulfilling (3)
are simply multiplied by a constant factor. Find the values of Nu N,,..., N„ fulfilling
(2) and (3) for which the expression (1) takes on its maximal value.
Hint. Consider the numbers Nk(k = 1, 2,..., n) as continuous variables and replace
in (1) the factorials Nkl by r(Nk + 1), then apply the well-known identity
g(x)dx
In .T (jV -f 1) In N - N+ In V2^ — /
= (w+4 .1 x + N ’
r’{N+1) i r e(x)dx
r (N + 1) n + 2N + J (x + N)2 '
Wk e-pE*
Nk&N n
£ W,e~PEt
/=i
where the constant /3 must be chosen so that (3) is satisfied. This is Boltzmann’s energy
distribution.
Let A be an event with probability p. Show that during the course of n independent
Y ,k
k=0 K
pkq"~k > £
k=m+ 1
pk qn~k.
By putting (1)
n _m-r -rt-m+r
Br = (r = 0,1,..., m),
m — r
n p>n + r qtt-m-r
cr = m + r
(r = 0, 1,..., n — m),
tB'< EC- (H
r=0 r=0
negative for r > s, where s is the least integer for which sC? + 1) > npq. As D0 = 1,
/) = = npq q > 1 there exists an integer k > 1 such that —- > 1 for
Cx npq + P
D
k— I
1 )Br> £ (*-r l)Cr (2)
!(*-»•
r=0 r=0
and
(k - 1) Y r=0
Br > (k - 1) Y C” ' =0
36. Prove the following asymptotic relation for the terms of the multinomial
distribution. For
38. Let £ and rj be nonnegative integer valued random variables such that if the
value of rj is fixed £ has a Poisson distribution and conversely. Show that
AV v'k
R,k = PG = J,V = k)=C U, k= 0, 1,...),
j\ k\
1 X' (/ vlk
C /= 0 k=0 j\ k\
For the independence of { and rj it is necessary and sufficient that v — 1 should hold.
The distribution Rik is therefore a generalization of the Poisson distribution for two
dimensions. (Distribution of N. G. Obreskov.)
39. Let £ and rj be two independent random variables both having a Poisson
distribution with expectation X. Determine the distribution of £ — r\.
40. Each of two urns contains N — 1 white balls and one red ball. Draw from
both urns n balls (n < N) without replacement. Put now all 2 N balls into one and
the same urn and draw 2n balls without replacement. In which one of the two cases
is it more probable to obtain at least one red ball?
41. Let X be the disintegration constant of a radioactive material. Let the proba¬
bility of observing the disintegration of any one of the atoms be denoted by c (c is
proportional to the solide angle under which the counter is seen from the point from
where the radiation starts). Let N denote the number of the atoms at the time t — 0,
£, the number of disintegrations observed in the time interval (0, /)• Prove by applying
the theorem of total probability that £, has a binomial distribution.
Hint. The probability that exactly n atoms disintegrate during the interval (0, t) is
_ e-foy e-XHN-n).
(1
Note that because of c (1 — e~h) < 1 — e~cXt somewhat fewer disintegrations are
observed than when the value of the disintegration constant would be Xc and all
disintegrations would be visible. But this difference is only important for large values
of t.
42. Let £ls £2, .. ., be independent random variables with the same negative
binomial distribution of order 1:
P^k = «)=d- p)p" 1 (n = 1, 2,..k = 1,2, ... ,r; 0 < p < 1).
£ — £i + £2 + • • • + ev+i.
, (1 - p) X
where x =-. It is known that
XZ co
l—e
Z L*X)*,
1 — z k=0
where the
ex dk
Lk(x) = (.xk e~x)
xk dxk
(1 ~P)
P(C = «) = (! — p)e~* Ln_x f- P) A] p”-' (n= 1,2,...).
43. Calculate the expectation of the number of marked fishes at the second capture
(cf. Ch. II, § 12, Exercise 21), if there are 10 000 fishes in the lake and if at the first
capture 100 fishes are marked.
44. Calculate the expectation of the number of matches in one of the boxes in
Banach’s pockets at the moment when he found the other box empty for the first
time (cf. Ch. II, § 12, Exercise 14).
45. Calculate the expectation of the sum defined in Chapter II, § 12, Exercise 46:
X = ky + k2 + . . . + kM.
Hint. Let &k be the distribution of a random variable which assumes the values
Show that the distribution of X - M(M + l)/2 can be written in the form
^N-M+ 1
1 ^2 • • • ^ M
M(N+ 1)
From this it follows that E(X) =
2
46. Suppose that a player gambles according to the following strategy at a play
of coin tossing: he bets always on “tail”; if “head” occurs, he doubles his stake in
the next tossing. He plays until tail occurs for the first time. What is the expectation
of his gain ?
Hint. If the tail occurs at the n-th toss for the first time (the probability of this
event is ), the gain of the player, if his bet at the first toss was 1 shilling, will be
1 shilling, since
n— 1
2” — £ 2k = 1.
k= 0
n=1 L
It seems that with this strategy the player could ensure for himself a gain. This,
however, would be true only if he would dispose over an infinite sum of money. His
fortune being limited, it is easy to show by a simple calculation that the expectation
of his gain is 0 even if he doubles the stake always when a head appears.
48. The chevalier de Mere asked Pascal the following. Two gamblers play a game
where the chances are equal. They deposit at the beginning of the game the same
amount of money. They agree that he who is the first to have won N games gets the
whole deposit. They are, however, obliged to interrupt the game at a moment when
the one player gained N — n times and the other N — m times (l<n<iV;l<m<
< N). How is the deposited money to be distributed? Calculate this proportion for
n = 2 and m = 3.
Hint. The distribution of the deposited money is said to be “fair” if the money
is distributed in the proportion p„ : pm, p„ denoting the probability that the first
gambler would win and pm the probability that the second. Thus each gambler receives
Ill, § 18] EXERCISES 171
a sum equal to this expectation. The problem is thus to calculate the probability that
the first (or the second) wins, under the condition that he already won N — n
(i. e. N — m) games.
49. In playing bridge, 52 cards are distributed among four players. The values
ot the cards distributed are measured by the number of “tricks” in the following
manner: If a player has the ace and the king of the same suit, this amounts to 2
ace to 1; ace alone to 1; king alone to — trick. What is the expectation of the total
z
number of tricks in the hand of a player ?
Hint. Obviously, the expectation of the number of tricks is the same for all players
and in each of the suits. Hence the expectation of the total number of tricks for a
player in all four suits is equal to the expectation of sum of tricks for the four players
in one suit. Thus it suffices to consider one suit only, e.g. spades. The expectation
of the tricks in the hand of one player is equal to the sum of the expectations of all
tricks present in the spades. However, this sum is equal to 2, except in the case when
the ace, the king, and the queen of spades are in the hands of different players; in
3
this case the sum of tricks is — . Hence the expectation looked for is 1.801.
M
50. a) There are M red and N — M white balls in an urn. We put -= p. Draw
N
n balls without replacement from the urn and let the random variables £,k (k =
= 1,2, ..., n) be defined as follows:
Proof. Let £~\A) be the set of those elementary events co £ 12 for which
£(co) £ A. We have clearly
(la)
n n
Let Ix denote the interval (— oo,x) and Iab the half-open interval [a, b).
By assumption, £~\IX) = Ax £ Hence,according to (lb), £-1(/a>6) £
for every pair of real numbeis (a, b), a < b. Since is a u-algebra, it follows
from (la) and (lb) that ^~\A) £ for every Borel-set A of the real line.
Theorem 1 follows immediately.
Let F(x) = P(£ < x) be the distribution function of the random variable
If the random variables £ and ij are almost surely equal (i.e. if P{f A rj) =
— 0), then their distribution functions are obviously identical. In what
f ollows we shall establish some properties of distribution functions.
lim P(5J = 0-
«— + 00
it follows that
lim F(xn) = 1.
tl~* + 00
lim F(x) = 0.
JC— — 00
The converse of this theorem is also true: Every function F(x) having
these properties can be considered as a distribution function. The proof
runs as follows: Let a = (7(y) be the inverse function of y — F(x). (The de¬
finition of G(y) is unique, if the following conventions are adopted: if F(x)
has a jump at x0, i.e. if F(x0) = a and F(x0 + 0) = b > a, we put G{y) = x0
for a < y < b; if F(x) is constant and equal to y0 in the interval c < x < d
but F(x) < jo for x < c, we put G(yn) = c.) If Q is the interval (0, 1), the
system of all Borel-measurable subsets of Q and P^jis for A £ the
Lebesgue measure of A, then the function rj{y) = G(y) defined for all y
is a random variable on the probability space [£>, P] and the distribu¬
tion function of rj{y) is
0 for x < c
1 otherwise.
the inequality
n
E ~ak)<s
implies
n
I I F(bk) — F(ak) | < s.
k=l
Example. We have already seen (Ch. Ill, § 13) that the function de¬
fined by
1 — e~Xx for > 0
F(x) =
0 otherwise,
Fix) =
— 00
J fit) dt (3)
— CO
1
/(*) =
v/2?r
where pL, p2, p3 are nonnegative numbers having sum 1 and Ft{x) (/ = 1,
2, 3) are the three distribution functions such that F^x) is the distribution
function of a discrete random variable, F2(x) is an absolutely continuous
distribution function and F3(x) is a singular distribution function. This
decomposition is evidently unique.
F(x1} x2,. . ., x„) = P(£i < xlf £2 < *2, • • •> £„ < *«)• (1)
The probability figuring on the right side of (1) is always defined; in fact,
let A^} denote the level set of all co £Q such that £*(co) < x {k = 1 2,. n), , . .,
then Aand
k=l
GENERAL THEORY OF RANDOM VARIABLES [IV, § 3
178
p( kri= l <d-
If a problem of probability theory involves n random variables, these can
always be considered as components of an 77-dimensional random vector.
In general, the function defined by (1) will be called the joint distribution
function of the random variables £lf £2, • • •> £«•
For example the value F(xlf = P(£ < x1? 77 < Ji) of the distribution
function of a 2-dimensional random vector represents the probability that
the endpoint of a random vector £ = (£, 77) beginning in (0, 0) lies in the
quadrant of the (x, y) plane defined by x < xl5 y < y-y.
Let us consider now some general properties of multidimensional distri¬
bution functions.
Pifl 1 ^ £1 < bx, a<l < < b2) = F(bu b2) - F(alt b2) - F(by, a.j) +
+ F(alt a2).
A = t Ak,
k=1 *=1
IV, § 3] MULTINOMIAL DISTRIBUTIONS 179
we find that
5. We have
n
v ^
where e1; e2, ...,£„ assume the values 0 ««<:/ 1 independently of each other
and ak < bk (k = 1,2,..., «) ore arbitrary real numbers.
Property 5 does not follow from properties 1-4. If for instance n = 2
and
F(x Xi) = f 1 if + ^2 > 0,
1’ 2 [O otherwise,
properties 1-4 are fulfilled but property 5 is not, since for instance
5'. We have
for hk> 0 for any real numbers xk (k = 1,2,..., n). Here the “product”
of the (commutative) operations means that they are to be performed
one after the other. It is easy to prove that if condition 5' holds for hx — /?2 =
= ... = hn = h > 0 it is valid in general.
Conversely, it can be shown that every function F(xx, x2,. . ., xn) fulfill¬
ing conditions 1-5 may be considered as a distribution function. This
follows from § 7 of Chapter II.
If the distribution function of the random vector £ = (£lt £2,..., £„) is
F(x1} x2,..., xn) and B is a Borel-set of the n-dimensional space, then
8nF(x1,x2,...,xn)
f(x1,x2,...,xn) - dXidx2 "dXn (3)
A<PA<P...A<pF(x x,x2,...,xn)
• • •? *^2) lim j
h-+ 0 n
Further we have
*1 xn
F(x1,...,xn)= j ... j f{h,..., t„) dh... dtn ; (4)
— 00 — 00
hence in particular
+ 00 +oo
f ... J f(xlf..., x„) dx1... dxn = 1. (5)
— oo —'oo
Further
r1 rn
P(ak < £k < bk; k = 1,2,..., n) = j ... J /(xx,..., xn) dx1... dxn, (6)
In other words: the probability that the endpoint of the random vector (
lies in a Borel-set B of the n-dimensional space is equal to the integral on B
of /(xl5. . ., xn).
Proof. Let £_1(B) be the set of those points co £ Q for which ((co) £ B,
where ( is an n-dimensional random vector and B is a Borel-set of the «-di-
mensional space; clearly (-1(B) From this Theorem 1 follows in the
same manner as Theorem 1 of § 1 was proved.
Let us remark that to every 3-dimensional probability distribution there
can be assigned a 3-dimensional distribution of the unit mass such that any
domain D contains the mass P(D). If f(x, y, z) is the density function of
the probability distribution in question, this same function will represent
the density of the corresponding mass distribution.
I
f 1 — e~x<-‘ fj) for t > t0,
F(t | B0) = Q otherwise
and
ne-W-^ for t > t0,
At\B0) =
lo for t < t0-
and
f(x) = YjP(Bn)f(x\Bn). (2)
182 GENERAL THEORY OF RANDOM VARIABLES [IV, § 5
i.e., if the two-dimensional distribution function of (£, rj) is equal to the prod¬
uct of the distribution functions of £ and r]. From (1) is readily deduced
that
P(a < £ < b, c < rj < d) = P(a < £ < b) P(c < rj < d) (2)
and, more generally, for any two Borel-sets A and B (cf. Theorem 2 below):
)
IV, § 5] INDEPENDENT RANDOM VARIABLES 183
Proof. If Bx,. . ., Bn are Borel subsets of the real axis, it follows from (3)
that
In fact, if Bx, . . ., Bn are unions of finitely many intervals, (4) follows from
(3). Let now B2, Bz,. . ., Bn be fixed and let B± alone be considered as vari¬
able: thus both sides of (4) represent a measure. The theorem about the
unique extension of a measure (Ch. TI, § 7, Theorem 2) can be applied here
and it follows that (4) is true for any Borel-set Bx. Let now be B1 an arbi¬
trary, fixed Borel-set and let B3,. . ., Bn be fixed sets, each of them being the
union of finitely many intervals. By repeating the preceding reasoning it
can be seen that (4) remains valid, if B2 too is an arbitrary Borel-set. By
progressing in this manner (4) can be proved. Theorem 2 follows immediately
from (4).
In particular it follows from Theorem 2 that the random variables
hk — ak£k + bk (k = 1,2,..., n)
Conversely, (5) implies the independence of the random variables <^x,. . ., <?„•
184 GENERAL THEORY OF RANDOM VARIABLES [IV, § 6
Proof. (5) follows from (3) because of Formula (3) of § 3. Conversely, (3)
is obtained by integrating (5).
are independent.
= ^(4 < *i, • • ., 4 < x„) PO/i < jl5 . . ., r\m < ym) (6)
m= 1 (1)
for a < x <b.
b—a
0 for x< a.
d— c
J fix) dx = (4)
C
b —a
The case when G is a parallelepiped with its edges parallel to the axes
deserves particular consideration: we have
Hn(G) = fl ih ~ ak)-
k=l
GENERAL THEORY OF RANDOM VARIABLES [IV, § 7
186
Hence
/(*!,.. x„) = fl fk(xk), (6)
k=1
where fk(pck) (k = 1,...,«) is the density function of a random variable uni¬
formly distributed on the interval (ak, &*); consequently £„ are inde¬
pendent. Conversely: if fk is uniformly distributed on (ak,bk) and if the
4 are independent, the vector C = (£i, • ••,£„) is uniformly distributed in
the parallelepiped ak < xk < bk (k = 1,2,..., n).
For an infinite interval (or for a domain of infinite volume) the uniform
distribution can be defined by means of the theory of conditional probability
spaces. We shall return to this in Chapter V.
x—m
F2(x) = Ft (la)
F2(x)=1-F1 (lb)
h (*) = (2)
IV, § 7] THE NORMAL DISTRIBUTION 187
l Ax - a
b —a b —a
where /(x) is the density function of the uniform distribution on (0, 1),
that is
[ 1 for 0 < x < 1
/(*) =
[ 0 for x < 0 and 1 < x.
where
n
where
1
<p(x) = (5b)
J2n
where
X
e 2dt. (6b)
— 00
mx , 1 ix - m2
<P and -cp -—-
<7i °2 l °2
then the density function of the random vector £ = (£, tj) is equal to the
product of the density functions of £ and rj; i.e. to
1 1 [(x-Wi)2 i (y - m2f 1]
h(x, y) exp O ‘ 1
(7a)
2nal u2
A random vector having a density function of the form (7a) or one similar
to it is said to be normally distributed (or Gaussian). Since all distributions
having density functions of type (7a) are similar to each other, the two-
dimensional normal distributions form a family. The density function (7a)
(with mx — m2 = 0) is represented on Fig. 19.
IV, § 7] THE NORMAL DISTRIBUTION 189
A simple calculation shows that the most general form of the two-dimen¬
sional normal density function is given by
J'AC-B2
exp - y (A (x - mx)2 + 2B(x- wj) (y - m2) + C(y - m2)2) ,(7b)
2n
where A and C are positive, B is a real number such that B2 < AC, m1 and
m2 are arbitrary real numbers. If B ^ 0, £ and t] are not independent. In
fact, in this case the density function cannot be decomposed into two factors,
one depending only on x and the other only on y.
We introduce now the concept of the projection of a probability distribu¬
tion. Let £ = be an n-dimensional random vector. The projec¬
tion of the distribution of £ upon the line g having the cosines of direction
n
9k (k = 1) 2,..., w, ^ git 1)>
k=1
Cg — Yj 9k^k-
k=1
+ 00 +00
the results
1 x — m1 y-m2
/(*) =— <P and g(y) = — cp — , (10)
(Jo <?2
where
C
0i =
AC-B2
and cr2
AC - If '
(11)
' The projections of a distribution in n-space on the coordinate axes are also called
its marginal distributions.
IV, § 7] THE NORMAL DISTRIBUTION 191
n
1 1
/(*!,..xn) = „ „ exp
2 z (12)
{In)2 n o-fe
fc=i
If the density function of a random vector has the form (12), it is said to
be normally distributed or Gaussian. Every distribution similar to this is
said to be an n-dimensional normal {or Gaussian) distribution. In order to
obtain the general form of the density of an H-dimensional normal distri¬
bution put
0 = k=i
Z cJk 4 + Wj, 03)
Is 1 i "
2'
dfali • • •> Xn) n n exp -vEtZ^ (xj - mj-) ’ (16)
a k=l °k l7=1
{2n)2 n ak
k=1
or, by putting
Cik Cjk
*,/= i 2 9
*=1 °k
(2 7i)2 n ^
A: = l
(17)
x exp V Z Z bu (*/ ~ m<) (*/ ~ mJ>
* 1=1 7=1
i
1 = 1 y=l
t°jk«;-«/)
j=i
n
has the form (12). Note that the factor 1 / J^[ tx* is equal to the positive square
*=1
root of the determinant | by |. The matrix B = (by) can be written as CSC*,
where C* is the transpose of C and S is the diagonal matrix
f o 0
l
o 0
S= <^2
0 0 ... 4r
i«i=isi “ ni-
Ar = l °k
Consequently, the density function (17) can be written as
j n n
sional normal distributions form a family. It has some interest to study the
case of an m-dimensional vector ( = (£1; . . 0,. . 0) where m > n,
and where the n-dimensional vector (£ls. . £„) has a density function of
the form (12). By applying the orthogonal transformation
n
?j = Z cJk Zk + Wj
k= 1
(7=1,2,..., m), (19)
Zk = z
j= 1
cik (£• - mj) for k -1,2,.. .,n (20)
0 = Z cjk
/=1
- mi) for k = n + 1,. .., m. (21)
Formula (21) expresses that the point (£[,. . ., £') lies in an ^-dimensional
subspace of the m-dimensional space. A distribution of this kind is said to
be a degenerate m-dimensional normal distribution.
P(?l =y)= Z
Hxk)=y
P{£ = *k),
where the summation extends over those values of k for which \l/(xk) = y.
Let us now consider the case of an absolutely continuous distribution
function. Let /(x) be the density function of Assume \p(x) to be monotonic
and differentiable and suppose i//(x) 9^ 0 for every x. If g(y) is the density
GENERAL THEORY OF RANDOM VARIABLES [IV, § 9
194
fin—]2 1
l \ M)_
exp for >- > 0,
y/2nay 2fer2
g(y) = (2)
0 for y < 0.
1
for — 1 < y < + 1,
g{y) = n^l-y2 (3)
o otherwise.
IV, § 9] THE CONVOLUTION OF DISTRIBUTIONS 195
Let two independent random variables £ and rj be given having the distri¬
bution functions F(x) and G(y) respectively. Consider the sum £ = £ + rj;
let H(z) be its distribution function. We have clearly
H{z) =
x+y<z
fj dF(x) dG(y) = (1)
= T F(z-y)dG(y)= +f G(z-x)dF(x).
— OO — 00
— 00 — CO
, +00 (3)
= J j f(x - y) dG(y) dx.
— 00 — 00
From (4) follows immediately (2). Further it can be seen that the distribu¬
tion of C = £ + n is absolutely continuous, provided that one of £ and r\
has such a distribution, regardless of the other distribution.
The function h{x) defined by (2) is called the convolution of the density
functions/(x) and g(x) and is denoted by h =/* g. It is easy to show that
h{x) is a density function; as a matter of fact (2) implies h{x) > 0 and
+ 00 + 00 + 00 + “? +00
x — (a + c)
for a + c < x < b + c,
(b — a){d — c)
h{x) - 1 (7)
for b + c < x < a + d,
d— c
(b T c?) — x
for a + d < x < b + d.
(ib — a){d — c)
The graph of the function y = h(x) is an isosceles trapezoid with its base
on the x-axis (Fig. 21 represents the case a = — 1, b = 0, c = — 1 ,d— +1).
Note that h{x) is everywhere continuous, though/(x) and #(x) have jumps.
(The convolution in general smoothes out discontinuities.)
IV, § 9] THE CONVOLUTION OF DISTRIBUTIONS 197
distribution. Thus for instance the density function of the sum of three
independent random variables uniformly distributed on (— 1, +1) is given by
(3 —|*|)2
for l<|x|<3,
h(x) — 16 (8)
3 - x2
— for 0 <lx| < 1.
The function h{x) (cf. Fig. 22) is not only continuous but also everywhere
differentiable. The curve has already a bell-shaped form as the Gaussian
curve; by adding more and more independent random variables with uni¬
form distribution on ( - 1, +1), this similarity becomes still closer: we have
here a particular case of the central limit theorem to be dealt with later. The
density function of the sum of n mutually independent random variables
with uniform distribution on (—1, +1) is
1 m
Z (-D‘ {n + x — 2k)n 1 for \x\ <n,
(9)
/»(*) = 2" («-!)! *=o
0 otherwise
GENERAL THEORY OF RANDOM VARIABLES [IV, § 9
198
mx 1 x — m?
f(x) = <P and g(x) = — cp
°2 <7o
1 x — (mx + mf)
h(x) =
/ 2 \ 2
(10)
+ a2 V 0"l + °2
3. Pearson’s y2-distribution}
The distribution of the sum of the squares of n independent random vari¬
ables £i,.. with the same normal distribution, plays an important role
in mathematical statistics. We shall determine the density function of this
sum for any n. Let cp(x) be the density function of the random variables
£k {k = 1,2,..., n). Let the sum of the squares of the be denoted by
2 W k2
In = X
fc = 1
ft- (11)
Let h„{x) be the density function of The statement
e -
for x > 0,
K(x) = 22 r (12)
o for x<0
for x > 0.
hi(x) = s/2nx (13)
0 for x < 0,
which shows that (12) is valid for n = 1. Suppose that (12) is valid for a
certain value of n. Given (2) and the induction assumption we have
!
1
K+i(*) = (14)
0
n+1
-1 -
Thus (12) holds with n+l instead of n; thus it holds for every n.
From (12) we obtain that the density function gn(x) of i„ =
= +... + £ is
^_I 2
2¥Ht)
The distribution with density function (12) is called Pearson's distribution
with n degrees of freedom. The distribution with density function (17) is
called the %- distribution with n degrees of freedom.
1 For the proof of this formula cf. e.g. F. Losch and F. Schoblik [1] or V. 1. Smir¬
nov [J ].
200 GENERAL THEORY OF RANDOM VARIABLES [IV, § 9
respectively. It is shown in the kinetic theory of gases that these three ran¬
dom variables are independent, normally distributed, and have the same
density function:
V = yj + rf + C2 • (18)
£ rj C
Clearly, —, —, — have the density function tp(x); hence, by (17), the density
'x ' 3 1 f i = —1 r
is —g3
a w
. SinceF
[2 =—r
2 UJ y/n, we have
2 2
—
‘><t’
»(*) = — —x e {x > 0). (19)
O" n
IV, § 9] THE CONVOLUTION OF DISTRIBUTIONS 201
(The curve representing y = v(x) is drawn on Fig. 23 for a = 1.) Note that
a has the physical meaning
x x yi -r 0j
= 1 - m, (21)
1 - F(s)
from which it foHows that
(*>0) (23)
II
Cn = £ 1 + £2 + • • • T ,
where £ls. . ., £„ are independent and every one of them has distribution (22).
Let Fn(t) be the distribution function and f,(t) the density function of £„.
GENERAL THEORY OF RANDOM VARIABLES [IV, § 9
202
By (23), Formula (24) holds for n = 1. Assume its validity for a certain value
of n. Since {„+i = {„ + £„+i and further C„ is independent of fB+1, For¬
mula (2) can be applied here. Thus we obtain
Substituting here for Fn{T) and Fn+1(T) and integrating by parts, we find
(XT)n e~XT
P(vT = n) = (28)
n\
n e~Xk<
9n (0 = (~ 1 )"_1 X1X2...X„^ TT--— for t > 0. (29)
k—1 11
i^k
let us consider the random variables (i = £t] and £2 — —. Let the density
n
functions of £ and g be f(x) and g{y); we have
£0
c= (4)
\J £>1 + • • • + %n
where £0, Ci,.. C„ are independent random variables having the same
normal distribution with density function
X*
1 2
<P(x) = e
J2n
Let qn(z) be the density function of £. We know already the density func¬
tion of the denominator of (4) (cf. Formula (17) of § 9), hence we obtain
from (3)
n + 1
r
qn 00 = n±1
(5)
2\ 2
x/71 r (i +z0
1
<h(?) = (6)
7T (1 + Z2)
section that
n +m \
r
2 J
h (z) = n+m
for z > 0. (7)
n) tw
r (1+*) 2
2)r W
3. The beta distribution.
If C is the ratio considered in the previous example, let t denote the ran-
c
dom variable t = and k(x) the density function of t. By (1) of § 8
we obtain
1 +c
rln + m\
——i —i
2 i x2 (1 - x) 2 for 0 <x< 1 . (8)
Kx) =
m\
r I—I r
UJ 2j
*(x)=-
i
t2 -1 (1-02
-l
4. Order statistics.
In nonparametric statistics the following problem is of importance: Let
£2,. .., be independent random variables with the same continuous
distribution: let F(x) be the distribution function of £k. Arrange the values
of £1}. . ., £„ in increasing order,1 and denote by £* the /c-th of these ordered
values: hence, in particular
£f = min 4, £*= max (11)
1 <,k<,n l<,k<,n
m = il>kFk(.x) (19)
fc = l
is also a distribution function. It is called the mixture of the distribution
functions Fk(x) (k = 1,2,...) taken with the weights pk. This concept was
already defined in the foregoing Chapter for the particular case where the
functions Fk(x) are discrete distribution functions.
Consider the following example: a physical quantity is measured by two
different procedures,the errors of the measurements being in both cases nor-
l x 1 lx
mally distributed with density functions —<p — and —cp\— . JVX mea-
°i cr2 l a2j
surements were performed by the first, and N2 measurements by the second
method without registering, which of the results was furnished by the first
and which by the second of the methods (the measurements were mixed).
What will be the distribution function of the error of a measurement chosen
at random from these N = N-i + N2 measurements? If
1
*(*) = dt.
yj'2 TC
it follows from the theorem of total probability that this distribution func¬
tion F(x) is given by
N, X n2 X
F(x) = -±* + — <P
N ^2
i.e. F(x) is the mixture of the distribution functions of the errors of the two
N1 j N2
methods, taken with the weights — and —-.
N N
It is easy to extend the notion of the mixture to a nondenumerable set
of distribution functions. If F(t, x) for each value of the parameter Ms a
distribution function and for each fixed value of x F(t, x) is a measurable
function of t and if G(t) is an arbitrary distribution function, the Stieltjes
integral
+ 00
/7(x) = J F(t, x) dG(t) (20)
t- CO
V = £1 + £2 + • • • + (21)
such that the number v of the terms is a random variable. Assume that the
are mutually independent and v is independent of the £k. Let Fk(x) de¬
note the distribution function of £*., Gr(x) the distribution function of £„ =
= £1 + ■ - • + £n and H(x) the distribution function of the random variable
t] defined by (21); let further be P(v = n) = pn (n - 1, 2,. . .). Then, by
the theorem of total probability,
00
Xn t*~1 e~u
GJLx) - dt,
in- 1)1
and, by (22)
tation of £ by
[x] denotes the entire part of the real number x; gh is a discrete random
variable and
+ O0 +00
+ 00
1
/(*) 7t(l + X2)
the expectation does not exist, since in this case the integral (5) does not
converge.
Let us now consider some examples.
E(0 = m.
™-t-
In particular, the ordinary exponential distribution with the distribution
x2 e~2
K (*) = —-n
2Tr
E(xl) = n.
Similarly, for the expectation of yn
n+ 1
E(X.) =sfl
hence
E(xl) * [E(Xn))2 if oo.
r(a + b)
x*"1 (1 - xf-1 (0 < x < 1).
K,b(X) mm
From this, by (5),
a
m= a + b
6. Order statistics.
Let £1}.. be independent random variables each uniformly distrib¬
uted on the interval (0, 1). Let be the random variable which assumes the
k-th of the values ranked according to increasing magnitude;
by Formula (14) of § 10
Fk(x) = Bk>n+1_k(x).
pr d\
Since P(B | A) < - , the existence of £(£) implies the existence of the
P(A)
conditional expectation E(£, | A) for any event A such that P(A) > 0.
If F(x | A) is the conditional distribution function of ^ with respect to
the condition A, then
Clearly, since
I {(oj) dP
= *\A)
— Af «“)<« = SI
where Q{B) = P(B | A), and Q{B) is a probability measure, all results valid
for ordinary expectations are also valid for conditional expectations.
We shall now give some often used theorems.
E(£ ck(k) = t
k=1 fc=1
holds for any random variables £,k with finite expectation and for any con¬
stants ck. Thus the functional E is linear.
The density function of the random variable £ + r\ is, as we have seen al¬
bution of parameter X (i.e. having the same expectation — ). The sum figur¬
ing in this example was one of independent random variables; one should,
however, realize that Theorem 1 holds for any random variables, without
any assumption about their independence.
E(St,)=E(QE(a). . (9)
Proof. Assume first £ > 0. Let Ak be the event kh < 17 < (k + 1 )h;
evidently, the events Ak (k = 0, +1, ±2, . . .) form a complete system of
events. Hence, by Theorem 2,
+ 00
E(t;)kh<E(Zri\Ak)<E(0(k+l)h. (11)
If we put this into (10), the series on the right side can be seen to converge,
thus E(£rj) exists; further (9) holds since the sums
tend to E(rj), if h -» 0. Thus (9) is proved for £ > 0. The restriction £ > 0
can be eliminated as follows: Put
t _ KI + £ E Id-f
(12)
- 2 ’ ^= ;
Etfrj) = E(£x n) - E(t;2 r,) = [E{fx) - £(£2)] E(r]) = E{f) E(rj) (13)
exists. Hence
+ 00
oo 0
E(0 = f (1 - F(y))dy - J F(y)dy. (16)
6 -00
If we add term by term Equations (17) and (18) and let x tend to infinity we
obtain, by (14) and (15), Formula (16).
Conversely, the existence of the integrals on the right-hand side of (16)
implies the existence of the expectation E(f). In fact, the convergence of the
integrals implies for x > 0
-
x( 1 - F(x)) < 2 f (1 - F(y)) dy and
;T
xF(- x) < 2 j F(y) dy,
x ~x
216 GENERAL THEORY OF RANDOM VARIABLES [IV, § 11
hence (14) and (15) are valid. Because of (17) and (18), the second part of
Theorem 5 follows.
Theorem 5 has the following graphical interpretation: Draw the curve
representing F(x) and the line y = 1. The expectation is equal to the differ-
Fig. 24
ence of the areas of the domains marked by + and — on Fig. 24. The
(evident) fact follows that a distribution symmetric with respect to x = a
has expectation a if this expectation exists. A distribution is said to be sym¬
metric with respect to a if
1 Relation (19) holds for every Borel function H(x) provided that its expectation
E[H(x)] exists; cf. § 17, Exercise 47.
IV, § 13] THE MEDIAN AND THE QUANTILES 217
and are called moments of order n(n = 1, 2,. . .) of the random variable £.
f - (Cl, • •O
expectation does not exist, but the median does and is evidently equal to zero.
We introduce the somewhat more general notion of a quantile. The
q-quantile (0 < q < 1) denoted by Q(q), of a random variable £ for, more
precisely, of the corresponding distribution function F(x), continuous and
strictly increasing for 0 < F(x) < 1, by assumption) is defined as that value
1
of x for which F(x) = q. In this notation the median is equal to Q
1 3
In particular, Q is called the lower quartile, Q — | the upper quartile.
4
x m
For the normal distribution with distribution function <P where
X
1
<P(x) = dt,
V2 n
the lower and upper quartiles are
l 3
Q — m — 0.6745 er and Q = m + 0.6745 a.
4~
1
P(( > XE(()) S —. (1)
(The inequality also holds for 0 < X < 1, but in this case it is trivial,
since every probability is at most equal to 1.)
Proof. From 00
m — E( fi) = | xdF(x)
IV, § 14] STANDARD DEVIATION AND VARIANCE 219
follows
00 00
< Xm.
In particular (for £ > 0), the upper quartile can never exceed the fourfold
of the expectation.
D\0 = J (x~E(0)2Ax) dx = [
— to — CO
X2f(x) dx-(
— 00
J xf(x) dx)2. (3)
1. Uniform distribution.
If £ is a random variable uniformly distributed on (a, b), then by (3)
b—a
0(0 =
'
220 GENERAL THEORY OF RANDOM VARIABLES [IV, § 14
2. Normal distribution.
Let l be a random variable with density function
1 1 (x — m) 2\
— ——— exP
a yjlrc a 2cr2
x—m
We know that E(t;) = m. By a transformation of the variable = u
we obtain
+ 00
(x — w)2
D\0 = —7=— f {x-mf exp dx —
■Jlu a J 2(7
+ 00
U2 E 2 C?W.
/ 2 7T
Z>2 (0 = a2.
3. Exponential distribution.
If the density function of the random variable £ is given by Xe~Xx for
00
^«> = aJ(x-T)L-*=T.
o
and
4. Student's distribution.
Let ^ be a random variable having Student’s distribution with n degrees
of freedom; its density function is given by Formula (5) of § 10. Since /(x)
is an even function, E(£) = 0 for n > 2. [For n = 1 (i.e. in the case of the
IV, § 14] STANDARD DEVIATION AND VARIANCE 221
*
(1 +X2) 2
— 00
X
Take for new variable of integration y =-; then
1 +
5. Beta distribution.
If £ has a beta distribution B(a, b), then £(£) = —-— as we have seen.
a+b
From this, by (3)
i
r(a + b) M +1
D2 (0 = (1 — x)b 1 dx —
mm a+b
ab
(a + b)2 (a + b + 1)
D\?) 2 dx - [E(J*)]2 = 2.
for the normal distribution give q{£) « 0.6745 cr. It is to be noted that in
some (chiefly older) books the density function of the normal distribution
is not given in the form
1
<*>(*) = dt,
y/2n J
— OO
but by
0.6745
where q « 0.477 ---yj 2. Anyone of these two forms can be taken as
bution is symmetric with respect to the origin and if its density function is
’
1 3_ 3
= -Q T hence q(f) = Q
Q
l4J
Theorem 1. For every random variable £ symmetrically distributed about
the origin with a continuous distribution function F(x) that is strictly increas¬
ing for 0 < F(x) <1, the inequality
(1)
is valid.
Proof. Let F(x) be the distribution function of £. As E, is symmetric with
2 /£\
respect to the origin, D2(E) = F(£2). Put A = —2^y an<^ aPPty Markov
1
Z\>Q (3)
4 ~2
GENERAL THEORY OF RANDOM VARIABLES [IV, § 15
224
1 D2 (0
From (2) and (3) it follows that — < yyy which proves 0).
The inequality (1) is sharp. This is shown by the following example: Let
the distribution of the random variable £ be the mixture, with weights
(m — o yj'i , m + cr ■J 3 ) (4)
<«) = £( | {-£«)!)
is also used as a measure of fluctuations. By Theorem 6 of § 11
43 =T I *-£«) | </£(*) .
— CO
40
40 = —
e
D\Q = E{Q.
If we put
Dtj = E((£i - m,) (£,- - mj% 2)
then
(3)
i=1 7=1
y Dni • • -Enn j
i i Dijxixj = c2
i=l i=l
is called the dispersion ellipsoid of the distribution. It is easy to see that the
dispersion matrix is invariant under a shift of the coordinate system. Under
the rotation of the coordinate system D is transformed as a matrix of a
tensor. Let in fact C = (c,7) be an orthogonal matrix and
Zk = Z ckj (£/ -
7 =1
then E(fk) = 0 and
D’,J = £(«{;) = i
k=1
clk ±CjhDkt.
h=1
D' = CDC*,
where E denotes the interior of the ellipse with Equation (5) and F its area.
Calculation of the integrals in (6) gives
B
dn — , du — d21-
AC-B2 Y 5 d22 — (7)
AC-B AC-B2 '
IV, § 16] VARIANCE IN HIGHER DIMENSIONAL CASE 227
Let C = (£1, £2) be any random vector. Choose the numbers A, B, C such
that the dispersion matrix of a random vector uniformly distributed in
the ellipse (5) coincides with that of £. We put, therefore
A>2
A = B= — (8)
~A~’
k(o = ~A (io)
471 yj A
i.e. the reciprocal of the area of the ellipse (9), is called the concentration
of £.
lA B
If A, B, C are chosen according to (8), the matrix is the inverse of
[BC
D11 D i2
,D2i D22
The case of higher dimensions turns out to be quite similar. The equation
of the ellipsoid of concentration is here
n n A ■ ■ x- x
(io
1=1 7=1 A
where A is the value of the determinant [ Di} \ and zd,7 the value of the co¬
factor of the element in the i-th row and y'-th column. The concentration,
that is to say the reciprocal of the volume of the ellipsoid (11), is equal to
K0 = (12)
(n + 2)2 n* JA
Of course, this holds only for A > 0. If A = 0, the point (£1?. . ., £„)
lies, with probability 1, on a hyperplane of at most n - 1 dimensions;
228 GENERAL THEORY OF RANDOM VARIABLES [IV, § 16
mi ^fe-"?/)]2)=o;
7=1
consequently the random vector (fl9..., £„) lies with probability 1 on the
hyperplane
7=1
Z *> «/-»*,) = 0.
Consider now in some detail the two-dimensional normal distribution.
Let
B
£>n = , £>12 — - 5 £>22
AC — B‘ AC — B"
—
AC-B2’
It follows that
D,22 D 12 D
A = B= - C = ——
£> i D\’ \D\’
£>,
712
@2 — (14)
■J D1XD 22
°1 n/£>11 > \]£>22 5
we find
f(x,y) = X
27T(T1(T2>/1 - £2
1 2pxy j/2
x exp
. 2(1 -e2) * o“ (15)
°la2
XV, § 16] VARIANCE IN HIGHER DIMENSIONAL CASE 229
The number q is the correlation coe fficient R(f, rj) of the random variables
£ ar*d rj- We have already introduced this quantity for discrete distributions.
It is similarly defined in the general case and its properties are the same.
Thus
... mi) rm - mm
- *«)] [i -
^- mm mm
—““ ' (I6)
Thus we obtain for the density function of the random vector (r\x,... , rj„)
n 2
la
-
1
-1
exp Z • (22)
tfOh, • • •> y„) =
- 2A I
k=l ak
Cjk (yj - ™j)
(2k)2 n
k=1
1
V ^ IV r ,)2 v (23)
I -ZT IE cik(yj - = I-
k~1 °k 0“l 2;
and consequently
1 J_ y (>’f ~ Mif
yn) = - exp
2 h 2,.
(2k)2 f[ ak
k=1
1 ,-x-\
h(x,y) = J2i + x/2
2n
(It is readily verified that h(x, y) is in fact a density function.) The density
functions f(x) and g(x) of £ and rj are
that R(£, rj) = 0. The random variables £ and rj, however, are not indepen¬
dent, since evidently h(x, y) ^ f(x) g(y); thus £ and rj are each normally
distributed and are uncorrelated, but they still are dependent.
From Theorem 2 follows
Proof. Since
n
Z akbk
R(*h, ri2) = k=1
Z *2 Z *2
k=l k=1
the necessary condition that rj1 and i/2 should be uncorrelated is that
Z akbk = 0.
A=1
We shall now show that the random vector (r}x , rj2) is normally distributed.
There can be found an orthogonal matrix (cw) such that
>7y = Z/c = 1
CJk Zk 0' = 1,2,...,n)
is thus a normal distribution and the same holds for the two-dimensional
fj' fj'2
distribution of q1 = —and rj2 — —— ■ Since
ak
and
232 GENERAL THEORY OF RANDOM VARIABLES [IV, § 17
£ ak bk = 0
fc=i
means that these two directions are orthogonal and r\x, rj2 are (up to a
numerical factor) the projections of the random vector (£x,. . ., £„) on
these directions. Our result may thus be formulated as follows: If £x ,. . . ,
are mutually independent random variables with the same normal distri¬
bution, then the projections of the random vector £ = (£x,. . . , £„) on
two lines dx, d2 are independent iff dx and d2 are orthogonal.
§ 17. Exercises
1. Let the distribution function F(x) of the random variable £ be continuous and
strictly increasing for — co < x < + oo . Determine the distribution function of the
following random variables:
t*
y = 0(x) 2 dt.
Jf
V-
2n
1 (x — m)2
y =/(*) = exp -
2ji a 2a2
and determine its points of inflexion. Let A and B be their abscissas. Calculate the
probability that the value of a random variable with density function f(x) lies between
A and B.
l (In x — rrif
exp for x > 0,
ax 2a2
y=f(*)= J271
0 for x < 0
of the lognormal distribution and calculate its extrema and points of inflexion. Calcu¬
late the expectation and standard deviation of the lognormal distribution.
4. a) Show that if the random variable ^ has a lognormal distribution, the same
holds for r] — c£a (c > 0; a ^ 0).
IV, § 17] EXERCISES 233
b) Suppose that the diameters of the particles of a certain kind of sand possess
a lognormal distribution; let /(x) be the density function of this distribution (cf. Exer¬
cise 3), with m — — 0.5, a = 0.3; x is measured in millimeters. The sand particles
are supposed to have spherical form. Find the total weight of the sand particles which
have diameters less than 0.5 mm, if the total weight of a certain amount of sand is
given.
5. Let the random variable r] have a lognormal distribution with density function
1 (In x — mY
/(*) = —-— exp la"-
for x > 0.
2n ax
If the curve of y = f(x) is drawn on a paper where the horizontal axis has a logarithmic
subdivision, then (apart from a numerical factor) one obtains a normal curve. It does
not coincide with the density function of In rj, but is shifted to the left over a dis¬
tance <72.
6. Let the random point (f, rj) have a normal distribution on the plane, with density
function
1 x2 + y2
f(x, y) = exp
Ina2 2a'1
7. a) Let the random point (f, rj) have the same distribution as in Exercise 6. Show
that the angle 6 between the vector f = (f, rj) and the x-axis is uniformly distributed
on the interval (0, 2tt).
b) Determine the density function of 6, if the point (f, rj) has density
1 r l[xi , xS|]
2na.a, £XP[ 2 { of + o% J ‘
8. Let the density function of the probability distribution of the life-time of the
tubes of a radio receiver with 6 tubes be X2t e~*‘ for t > 0, where X = 0.25 if the
unit of time is a year. Find the probability that during 6 years no one of the tubes
has to be replaced. (The life-times of the individual tubes are supposed to be indepen¬
dent of each other.)
is called a Pearson distribution. Show that the following are Pearson distributions:
10. a) Let the point (£, rj) be uniformly distributed in the interior of the unit circle.
We put
V
Q = yj t? + V~» <P = arc tan
0 < C < 1. Show that cp = arc tan y, g = £2 + rf, and C are independent.
d) Find the general theorem of which a), b) and c) are particular cases.
Hint. The independence of the new coordinates results, in the three cases, from
the fact that the functional determinant of the transformation can be decomposed
into factors each containing only one of the new variables.
11. Let £ and r\ be independent random variables with the same density function
13. Let £ and r\ be independent random variables with the same density function
/<*) = —■- , 1 „ •
n e+e
k= 1
IV, § 17] EXERCISES 235
16. Let the random variables ilt £2, . . be independent and uniformly distrib-
n
uted on the interval (0, 1). Determine the density function of C = V i\.
^_|
17. Let the random variables iu i2,.... i„ be independent and uniformly distrib¬
uted on the interval (0, 1). Let = Rk(£2, . . £„) A: = 1, 2.n be the A-th
among the values £u . . i„ arranged in increasing order. a* is called the A-th order
statistic of the sample ,.. ., £„).)
a) Find the distribution function of the random variable it_h — 1;% (1 < A < A +
+ h < ti) and show that it is independent of A.
b) Find the distribution function of the ratio —— (1 < A < A + h < n).
£*+ h
c) Show that —-, —E,. . are independent and that their «-dimensionaI
S2 S3 sn
density function is
Hint. The r)u %>•••» Vn+i are exchangeable random variables and we have
n +1
E %=L
k= 1
19. The mixture with equal weights of the distributions with distribution function
Bk_n+1-k (x) (A: = 1, 2,...,«) is the uniform distribution in the interval (0, 1). How
could this be shown without calculation?
20. If the probability that a car runs at least x miles without a puncture is e Xx
with A = 0.0001, is it worth while to carry three spare tires on a trip of 12 000 miles?
21. Lei the random variables £ and rj be independent, let £ have an exponential
distribution with density function ke~^x (a > 0), and let r] be uniformly distributed
on (0, 2n). Put C, = £ ■ cos rj, C2 = £ • sin rj. Show that Ci and f2 are indepen¬
dent and have the same density function
22. Let fu £2be independent random variables, let the density function
of 4 {k= 1,2,...,*) be
where A > 0 and h is a real number. Find the distribution function of the sum
n
rj = ^ Ik and show that C = exp (— A ??) has a beta distribution.
A=l
23. Let h„(x) be the density function of Student’s distribution with n degrees of
freedom. Show that
lim
n-+ co
25. Let A be the disintegration constant of a radioactive atom. Let there be N atoms
present at the time 0.
a) Calculate the standard deviation of the number of atoms disintegrated up to
the time t.
b) Calculate the expectation and the standard deviation of the half-period (i.e. of
N
the random time interval till the — -th disintegration, if N is even).
26. a) Let ijk (fc = 1, 2,...) be the time required for the transformation of a radio¬
active atom Ax into an Ak+l atom, through the intermediary states A2, .. Ak, i.e.
the duration of the process
A\-* A2-+ . . . -*■ Ak + l.
Let further Xk be the disintegration constant of the Ak atoms, gk(t) the density function
of % and £*(0 the number of Ak atoms which are present at the time t. It is assumed
IV, § 17] EXERCISES 237
that at the moment 0 there are only A, atoms present and their number is equal to IV.
Find the distribution function of iqk and of £*(/) (k = 1, 2, ...).
Hint. Let Pk{t) be the probability that at the time t an atom is in the state Ak.
These probabilities can be calculated in the following way: The probability that an
atom Ak changes into an atom Ak + l during a time interval (t, t + At) is, by definition
of gk(t), equal to gk{t)At + o(At). On the other hand, the probability of this event
is as well expressed by Pk(t)XkAt + o(At)\ the possibility that during the time interval
(t, t + At) an atom passes through several successive disintegrations can be neglected.
Hence we have
Pk(t) = ~ gk(t). 0)
Ak
The expectation and the standard deviation of the number of Ak atoms at the moment
t can now be calculated, since we know that
D(ik(t)) = yjMk(t) 11 -
Remark. The atoms An+l not being radioactive, Mn+l(t) is evidently an increasing
function of time, hence mn+l = +oo ,
d) Show that t = 0 is a zero of order k — 1 of the function Mk(t).
27. Let <*, rj, f be the components of the velocity of a molecule of a gas in a
container. Let the random variables f, rj, ‘Q be independent and uniformly distributed
on the interval ( — A, -\-A). Calculate the density function fx(x) of the energy of this
molecule. Determine further the limit
Hint. Let the mass of the molecule be denoted by m and its energy by E, then
E = — + rf + t2) ,
hence
(21\-\
P(E <t)= /// dxdydz = for \l< A,
6 A3 m m
(*2 +
since the integral is equal to the volume of a sphere with radius , Thus
V m
71 It
fA{t) = /1 for < A,
m A A3 V m
hence
w(0 = c yj t (c = constant).
Y Wje-PE>
/= 1
This result was obtained under the assumption that E can only take on the discrete
values Ek. Let now the energy be considered as a continuous random variable. For
the density function of the energy we obtain in a similar way the expression
wfr) e~P'
P(t)
/ w(u) e /3“ du
where c i& a positive constant and c =-j~ . Calculate under these conditions, for
2c'2
the limiting case c —> +°°> the value of (3, the function p(t), and the distribution
of the velocity of the molecule.
Hint. With the above notations we have for c' —> + <=<>
E _ 3
~N ~~ ~2/T *
3 kT
It is known from statistical mechanics that where k is Boltzmann's
N 2
1
constant and T the absolute temperature. So (3 = and
kT
t
2yjt exp
~kT
P(t) =
^n{kT)
Let the velocity of a molecule be denoted by v and its kinetic energy by Ekin, then the
density function of v will be given by
dE kin v2 ( m
/(») = p(E kin)
dv T\Wt
This derivation of the Maxwell distribution coincides essentially with one usually
given in textbooks of statistical mechanics. (We return to this question in Chapter V,
§ 3.)
29. a) Calculate from the Maxwell distribution the mean velocity of the molecules
of a gas having the absolute temperature T and consisting of molecules of mass m.
b) Show that the average kinetic energy at the absolute temperature T of the
3
molecules of a gas is equal to — kT (k is Boltzmann’s constant).
c) Compare the mean kinetic energy of a molecule with the kinetic energy of a
molecule moving with mean velocity. Which of the two is larger?
30. a) Consider a gas containing in 1 cm3 N molecules and calculate the mean
free path of a molecule.
Hint. The molecules are considered as spheres of radius r and are supposed to be
distributed in the space according to a Poisson distribution, i.e. the probability that
a volume V contains no molecules is expressed by e~NV. The probability that the
volume AV contains just one molecule is given by NAV + o (AV). The meaning
of the statement that 4ta molecule covers a distance s without collision and then collides
on a segment of length zls with another molecule is just the following, a cylindti
of radius 2r and height $ does not contain the center of any of the molecules and
another cylinder of radius 2r and height As contains the center of at least one of the
molecules. Thus the probability in question is
4 r2nNe-4nN,ts As + o (As),
240 GENERAL THEORY OF RANDOM VARIABLES [IV, § 17
i. e. the distribution of the free path is an exponential distribution with density function
4jiNr- e~*nN'Zs .
Hint. Let the length of the free path be denoted by s, the velocity of the molecule
s
by v, then r = —, where r denotes the time interval studied, s and v can be assumed
to be independent, thus ^(t) = iT(.s).E | — j ; the first of these two factors is known
from Exercise 30 a), the second can be computed from the Maxwell distribution.
31. Calculate the standard deviation of the velocity and kinetic energy of a gas
molecule, if the absolute temperature of the gas is T and the mass of its molecules m.
C and the positive x-axis. Show that the density function of d is given by ~m
33. Choose at random a chord in the unit circle and determine the expectation
of its length under the three suppositions considered in the discussion of Bertrand’s
paradox (Ch. II, § 10).
34. Let ..., cn+m be independent normally distributed random variables with
1 **_
density function —;= e 2. Calculate the expectation and the standard deviation of
s]2n
C= £ + ••• + £
£« + l + • ■ • + Sn + m
35. Let mk be the median of the gamma distribution of parameter X and order k.
36. Let the distribution of the random variable £ be the mixture of the distribution
of the random variables €u...,£n with weights pk(k = 1,2,Show that
k=i
X Pk D\tk) + DXfi),
1
where p is a random variable assuming the values Mu Af2, ..., M„with probabilities
IV, § 17] EXERCISES 241
D2(£)> £ Pk D\£k);
*= 1
equality holds iff M, = M2 — ... = M„.
(x — m)2
fix) = -■ ■- exp
yj 2nd 2a2
Deduce £(£) = m from the fact that the function y = /(x) satisfies the differential
equation a-y = — (x — m) y .
b) Let the density function of the random variable £. be given by
Xm~~1 X
X
fix) = e (x > 0),
(m - 2)!
where m > 3 is a positive integer and A > 0. Calculate £(0 from the fact that the
function y — fix) satisfies the differential equation
/= y-
c) Apply the same method in general to Pearson’s distributions (cf. Exercise 9).
39. Suppose that there are 9 barbers working in a hairdressing-saloon. One shaving
takes 10 minutes. Someone coming in sees that all barbers are working and 3 customers
are waiting for service. What waiting time can he expect till he is served?
Hint. Assume that the moments of the finishing of the individual shavings are
independent and are uniformly distributed on the time interval (0, 10')-
40. Let £1; .... be independent random variables having the same distribution.
Prove that
+ ••• + !* i _ (1 <k< n).
£i + ••■ + !(! i n
41. Prove that if the standard deviation of the random variable £, with the distri¬
bution function Fix) exists, then
and
Hint. Let the //-dimensional density function of the random variables ??„... ,r?„ be
B
ffiyi, ■ ■ ■, y„) =
(2nf
exp t(Z E b»y>y)
/= 1 i=\
(i)
242 GENERAL THEORY OF RANDOM VARIABLES [IV, § 17
where |Z?| is the determinant of the matrix B = (bu). There can be given independent
normally distributed random variables £,k such that E(£k) = 0 and
n
A; = k=l
X! c‘k c'k °k' (3)
Hence the matrix D — (Z>iV) can be written in the form D = CS-1 C*, where S'-1
denotes the inverse of the matrix S and thus BD == CSC*CS~1C* = E, where E is
the unit matrix of order n. Thus the dispersion matrix D of the normal distribution
is the inverse of the matrix B of the quadratic form figuring in the exponent of (1).
43. a) Using the result of the preceding exercise, find anew proof for Theorem 2
of § 16.
b) Let be independent normally distributed random variables with
E^k) = 0, D(£fc) = a; show that if the matrix C = (cs) is orthogonal, then the random
variables
n
Vi ^ cik %k
k= 1
are independent.
c) Determine the ellipsoid of concentration of the «-dimensional normal distri¬
bution and prove Formula (12) of § 16.
d) What is the geometric meaning of Exercise b) ?
45. Counters used in the study of cosmic rays, radioactivity and other physical
phenomena do not register all particles hitting the apparatus; in fact, the latter remains
in a passive state for some time interval h > 0 after a hit by a particle, and does not
IV, § 17] EXERCISES 243
register any particle arriving before the end of this time interval. The number of
particles counted is thus smaller than the number of the particles actually coming in.
The average number of particles registered during unit time is said to be the “virtual
density of events” and is denoted by P; the average number of the particles actually
arriving during the unit time is said to be the “actual density of events” and is denoted
by p. (Every arriving particle renders the apparatus insensitive for a time interval h,
regardless whether the particle was registered by the apparatus.) As to the arrival
of the particles, the usual assumption made in the study of radioactive radiations
is introduced, namely that the probability of the arrival of n particles during a time
(of)" e~p>
interval t is given by-(n = 0, 1, ...) .
nl
a) Determine the virtual density of events P.
b) Determine that value of the actual density of events which makes the virtual
density maximal.
Hint. The probability that a particle arrives during a time interval At and is registered
is equal to the probability that the particle arrived during the time interval considered
and no other particle did arrive during the preceding time interval of length h. This
probability is approximately pe~pl'At, hence P = pe~p". If the passive period h is
known and P was experimentally determined, the above transcendental equation is
1
obtained for p . By differentiating we find that P has its maximal value, if p = — ;
1
then P = —-— .
eh
c) Calculate the distribution, expectation and standard deviation of two consecutive
registered particle-arrivals.
Hint. Suppose that an arrival was registered at the 4ime t = 0 and let W(t) be the
probability that at least a time interval t will pass till the following registered arrival.
It is easy to see that lV(t) satisfies the following (retarded) difference-differential
equation
W\t) = P(l - W(t - /;)) (t> h) (1)
and fulfils the initial condition W(t) — 0 for 0 < t < h . The solution of (1) is given by
1 = P ftW'(t)dt.
6
Hence the expectation M of the time spent between two consecutive registered arrivals
1 D
Observe that for h = 0 we have D =' — = M. For h > 0, we have —r < 1. Hence
P M
244 GENERAL THEORY OF RANDOM VARIABLES [IV, § 17
the fact that the apparatus has a passive period diminishes the relative standard
deviation of the distribution.
d) If the radiation has a too high intensity, a “scaler” is commonly used in order
to make the observations more easy. This apparatus registers only every &-th particle.
(In practice A: is a power of 2.) Calculate the virtual density of events for this case too.
Hint. First calculate the probability that during the interval (t, t -f- At) there arrives
a “k-th particle”, i.e. a particle having in the list of arriving particles a serial number
which is divisible by k. Clearly, the probability of this event is
pnktnk~1 e~pt \
ink- 1)1 J At + o(At).
As the factor of At depends alsd on t, the process is not stationary. But this dependence
on t is very weak when t is large; in fact, it can be shown that
pnk jnk — \ e~pt
p_
lim I (1)
/ —► + co n= 1 " (nk - 1)! k
— 1, 2,k — 1) are negative and co0 — 1 = 0 . Hence the probability that a particle
arriving between t and t + At is a “A>th particle”, is given, for a sufficiently large t.
46. Suppose that the expectation of the random variable £ exists and let a be a
real number. Prove that E(|£ — a\) takes on its minimum if a is the median of £.
47. Let £ be a random variable with distribution function F(x). Show that
E(H(0) = I” H(x)dF(x)
— 00
holds without restriction for every Borel-measurable function H{x), such that the
expectation £(#(£)) exists.
Hint. The value of F(/7(£)) only depends on the distribution of 7L(£), hence
on the distribution of £, since for every Borel set B P(/f(£) £ B) = P(£ £ H(B)),
where H~\B) denotes the set of the real numbers x for which H(x) £ B . Hence'
E(H(£)) does not depend on the fact on what probability space [Q, cA, P] the random
variable £ is defined; thus let Q be the real axis, c^the set of all Borel subsets of Q
and P the Lebesgue-Stieltjes measure defined on D by P(Iab) = F(b) - F(a), where
/„(, is an arbitrary interval a < x < b . Under these conditions £(x) = x(— oo<x<
< + 00) has distribution function F(.v), hence
+ 00
E(H(&) = r imdp = J H(x) dF(x).
Q .— CO
CHAPTER V
Let Jr= [Q, P] be a conditional probability space (cf. Ch. II, § 11).
A real valued function £ = £(co) defined for co £ Q is said to be a random
variable on if the level sets Ax of £ (Ax is the set of all co £ Q such that
£(co) < x) belong to for every real x. A vector valued function ( =
= , . . . , Q on 13 is said to be a random vector on if all of its com¬
ponents ^1,... ,^r are random variables on JK Since, by assumption,
is a cr-algebra of subsets of Q it follows that for every Borel set B of
the r-dimensional Euclidean space the set £_1£8) of all co £ Q for which
C(co) £ B belongs to the cr-algebra
If C is any fixed element of «$, ^fc = [Q, P(A | C)] is a Kolmogorov
probability space (cf. Theorem 6, § 11, Ch. II). Since every random
variable £ on JFis an ordinary random variable on dPc, the usual notions
can be applied to the random variables on ^c. Thus there can be defined
for every random variables £ on & its conditional distribution function,
its conditional expectation, etc. with respect to the condition C £ M. All
theorems proved for ordinary random variables are valid for the random
quantities defined on Jr with respect to a conditional probability space
for every given C. New problems, however, arise if we let C vary.
Let £ be a (real) random variable on Iab the interval a < x < b and
dab) the set of all co £ Q with £(co) £ Iab. Clearly, for every Iab the set
£-1(/ai) belongs to but it does not necessarily belong to Let <_y#
be the set of all intervals Iab C. f0 with Iab) £ where I0 is a given,
(possibly infinite), interval. The following two conditions are assumed to
hold for ty#:
Condition Nx. The set oy# is not empty; for Ix £ ^y# and /2 £ ^y# there
exists an /3 £ o'# such that Ix + /2 Cl /3 .
Condition N2. For Ix £ o#, U £ o'#, and Ix c /2 we have
p(r1(/i)irU))>o.
Conditions Nx and N2 are evidently fulfilled if o-# consists of a single
element only. Let J be the union of all intervals / £ The set J C /0 is
<_y#.
246 MORE ABOUT RANDOM VARIABLES [V, § 1
and
P(Z-XlXc)\Z-\lgnbn))
W= an<x< c0.
{c < d and F„(d) - F„(c) > 0 follow from our assumptions). Furthermore,
for an< x < bn and for A > n
Fn(*) = 4W-
Therefore the value of P„(.x) does not depend on n and we can omit the
index n by wiiting simply
The function F(x) is defined everywhere on the interval (a, ft), it is non¬
decreasing and leftcontinuous; for Icd £ ^# we have F(d) - F(c) > 0 and
for c < a < b < d the relation
m-F(a)
F(d) ~ F(c)
(2)
in IQ; let a and ft denote the endpoints of J. Then there exists a nondecreasing,
leftcontinuous function Fix) defined on (a, /?), such that for Icd £ we have
F{d) - F(c) > 0 and for Iab C Icd the relation
Fib) - m
(3)
F{d) - F(c)
holds.
it follows that
f g(x) dx
AB
PiA\B) =
\g{x)dx
for A £ and B £ 3d. (9)
A
Put £(co) = co(—co < co < + oo). Then £ is a random variable on the
conditional probability space y
= [Q, txf, 3d, P], If 70 = (—oo, + oo),
is identical to 3d and all conditions of Theorem 1 are fulfilled; hence £
has a distribution function F(x) and indeed
F(h~\y)) = G(y)
.f{h-\y))
= 9(y)
\h' {h~\y)) I
is a density function of rj = h(c).
fix) =
b b
j xdF(x) j x /(x) dx
a a
E(£ \ a< ^ < b) = (ll)
m m -
j fix) dx
a
(Clearly, the value of E{f \ a < £ < b) does not depend on the choice of
Fix) or f(x).)
250 MORE ABOUT RANDOM VARIABLES [V, § 1
a+ b
E(Z\a<t;<b) = -^r
£2is thus y/x and its density function is —-L= for jc>0. Hence for 0 < a < b
V x
b2 _
j yjX dx
a2 + ab + b~
E(? | a < £ < b}=E(? | a2 < < b2) b°-
3
and consequently
a~ + ab + b1 (b - af
D\^\a<^<b)
12
in accordance with the fact that under the condition a < £ < b, c, is
uniformly distributed on the interval (a, b) and the standard deviation of
A,F = 4J... 4
(12)
I i
is valid.
and k is the number of the values of i for which x,- < xj0).
Like in the proof of Theorem 1, we see that F(xx,.. . , x,.) does not
depend on the choice of /2 . Clearly, F is nondecreasing with respect to
each of its variables, AjF> 0 and (12) is true. Theorem 4 is thus proved.
Every function F fulfilling (12) is said to be a distribution function of (
on J. The distribution function is not uniquely determined; if Fis a distri¬
bution function of £ and p is any nondecreasing function of r — 1 of the
variables x1#. . ., xr then for every X > 0
dr F
f(xx,...,xr) = (15)
dxx... dxr
^F^XOFM-FtiaS), (16)
i =1
/ = n /)(**) >
;=i
(17)
| g(pc)dx
AB
P(A | B) =
f 9(x) dx
B
where k is the number of the values of i for which x, < 0 and g(x) is the
density function of £.
In the case <?(x) = 1, £ is uniformly distributed on the whole space Er.
In this particular case we can put
P(C~1(QI£-1(£))
H{x)=\F(x-y)dG(y).
6
If
a<,x+y<b_
dF(x) dG(y)
P(a <^ + rj<b\c<^ + t]<d) =
JJ
c<,x+y<d
dF(x) dG{y)
H(b) - H{a)
H(d) - H(c) ’
Kx) = ]Ax~y)dG(y)
o
Example 10. Let the random vector (£l5. . . , £„) be uniformly distributed
on the ^-dimensional space, and put
Z* = £i + • • • +&
i (x - y)
We obtain by induction
"-1
hn{x) = x2 for x > 0.
(ij = P(B | Ak) for every co £ Q such that £(co) — Instead of (1), the
notation r/ = P^B) will be also used.
Let U denote any Borel set of real numbers and £_1(t/) the set of all
co £ Q such that £(co) £ U. Let further be the family of the sets £-1(£/).
We have thus The family $ is a cr-algebra since
Thus if we would have the inequality 1 - f(co) < 0 on a set C with positive
measure, this would imply P(CB) < 0, which is impossible.
We shall call the random variable /(to) the conditional probability of the
event B for a given value of £ and shall denote it by PAB). Thus we can
write
P(AB) = $P((B)dP (6)
A
P(A)= f1 -dP,
A
Ps(G)=l. (8)
One can prove in a similar manner that with probability 1
holds for every Borel set of U of the real axis. Here F(x) denotes the
distribution function of £ .
Obviously, the relation g(f(oo)) = f(cd) holds for almost every co £ Q .
If the random variable P(B \ £ — x) is defined by the function g{x) of
formula (9), then by definition it only depends on x; further for almost
every co £ Q P(B [ = x) = PfB; co) where x = £(co).
If A is a fixed set, A £ P(A) > 0, then P(B | A), considered as a
function of the set B £ tAf, is a probability measure. We shall now discuss,
how far this remains valid for PfB). Suppose Bk (k = 1,2,...),
00
£ and define the random variables PfBk) (k = 1,2,...) and P/fB) as above.
Then, for A £
P(ABk) — \ PfBk)dP (10)
a
and
P{AB) = \Pi(B)dP.
A
£ P(ABk)=P(AB)
k=1
it follows that
P(AB)=\(£pt(Bk))dP,
A k=1
OO
hence £ P?(Bk) fulfils relation (6) which defines PAB). Thus with proba-
/c = 1
bility 1
00
The elements co for which the relation (11) does not hold form thus a
set C of measure zero, i.e. P(C) = 0. Since PfB) is determined only almost
everywhere, one cannot expect to prove more than this. The exceptional
set C depends on the sets Bk and the union of the exceptional sets corre¬
sponding to the individual sequences is not necessarily a set of measure
zero since the set of all sequences {Bk} is nondenumerable if has infinitely
many elements. Thus we cannot state that for a fixed £, P^B) as a function
of B is a measure; in general this is not true.
In practice, however, this fact causes scarcely any difficulty at all. In most
cases, the conditional probability PfB) = P(B | £ = x) is studied simul¬
taneously for nondenumerably infinitely many B only when the con¬
ditional distribution of a random variable 17 is to be determined with respect
to the condition £ = X; i.e. if the probabilities
1 < y 1£ = x)
are to be considered for every real value of y . If these conditional proba¬
bilities can be defined in such a manner that P(r\ < y | £ = x) is a distri¬
bution function with probability 1, then this function is said to be the
conditional distribution function of r/ with respect to the condition ^ = x
and is denoted by F(y | x):
F(y I x) = J f(t | x) dt
— 00
+ 0°
/(*) j
= Kx> y) dy
be the density function of Let £~\U) and £-1(F) denote the events
^ 6 L and rj £ V respectively, where U and V are Borel sets on the real
axis. Assume that the function /(x) is positive for x ( L Then
hjx, j)
Pin € vff 6 h(x, y) dxdy dy\f(x)dx,
x£U
/(*)
x(.U y £V
y£Y
v, § 2] NOTION OF THE CONDITIONAL PROBABILITY 261
hence
P(n = vl« = *) = j
ykv
thus the conditional density function g(y | x) of rj with respect to the con¬
dition £ = x is given for the x values which fulfill f(x) > 0 by
*”'*>-7W- (15)
giy | x) is not defined for those x values for which /(x) = 0.
Similarly, if giy) is the density function of rj and/(x | y) is the conditional
density function of £ with respect to a given value y of ij (i.e. with respect
to the condition n = y), we find for g(y) > 0 that
a* w = ^f.
g(y)
06 )
2. Let £ and n be independent random variables and F(x) the distribution
function of £ . Then
exp
2n
Let q and $ (0 < & < 2n) be the polar coordinates of the point (£, n)-
Find the conditional distribution of # with respect to the condition
o — r > 0. We have
dx dy,
P(0 < 0 < <p, £2 + >?2 < *2) = (*2 + y2)
0<,&<<p
x2+y2^R2
262 MORE ABOUT RANDOM VARIABLES [V, § 2
Hence & is uniformly distributed in the interval [0, 2 n) under the condition
q — r for every r > 0 and thus § and q are independent.
1 for co £ B,
0 otherwise.
In fact, if A = ,,
P{AB) = $ dP = $XB(co)dP,
AB A
where
1 for co £ B,
Xb(«>) =
0 otherwise.
Pi(B) = Xb(co).
6. (Particular case of 5.) Let Q be the interval [0, 1], let be the set
of Borel subsets of Q and P the Lebesgue measure. Put
_ f 1 for to £ B,
PAB) =
[ 0 otherwise.
7. Let Q be the unit square of the plane (x, y), the class of the Borel
subsets of Q and P the two-dimensional Lebesgue measure. Put £(x, y) = x.
Since, for every B £ ^ and for any Borel set U of the real axis (according
to the theorem of Fubini),
P(BZ~\U))= f( f dy)dx,
U (x,y)£B
we find
P(B | £ = *o) = j dy = y(BXo),
(x0,y)£B
Let 3^= [Q, 3d, P(A | 5)] be a conditional probability space and £
a random variable on 3 . Let B £ and C (i 3d be given sets, with
P{B | C) > 0 and let be the least tr-algebra with respect to which f
is measurable. Consider the measures yc(A) = P(A | C) and vc(^4) =
= P(AB | C) on vc(A) is absolutely continuous with respect to yc(A);
there exists thus, by the Radon-Nikodym theorem, a function measurable
with respect to f{co) = PfB I C) such that
Let us point out the following circumstance. If A(x) is the set of all
co £ Q for which £(co) = x, it may happen that the sets CA(x) belong
to the family for some values of x or even for every one of its values
and thus P(B \ CA(x)) is defined. But a priori it is not at all certain that
P(B | CA(x)) coincides with P^B | C), i.e. that
This regularity property does not follow from the axioms and if necessary
it must be postulated as an additional axiom.
Consider now the following important particular case:
Let Q be an arbitrary set and a a-algebra of subsets of Q. Let further
P be a o--finite measure on and let be the family of sets
P(AC)
If C ( J and fiCC, it follows from (2) by putting pc(A)
that
KQ
p(AB)
P(AB | C) = fj(a>,B)dMc.
M(C) A
(3)
P{AB\C)=\P£B)dpc (4)
A
then g(y \x) will be called the conditional density function of ip with respect
to the condition £ = x.
Let £ and rp be random variables defined on y with the two-dimensional
density function h{x, y); assume that the integral
+ CO
exists for every jc. Then f(x) is a density function of In fact, if U and V
are two intervals, U c: V, we have
j f(x) dx
P(Z£U\Z£V) = (8)
$f(x)dx
if z-w
In this case the conditional density function of g with respect to the
condition £ = * is equal, for f(x) > 0, to
Kx- y)
g(y I x) = (9)
fix)
PjKx,y)dxdy i(ig(ylx)dy)Ax)dx
y(W u w (10)
P(rp£W,UU\UV) =
f f{x)dx f f(x)dx
266 MORE ABOUT RANDOM VARIABLES [V, § 3
+ OO
1. Let the point (£, rf) be uniformly distributed in the domain of ihe
plane defined by | x2 — y21 < 1. The density function h(x,y) of (<f. i/) is
+ 00
/(*)=! h(x,y)dy.
hence
Hence £ is. for >7 — y (| y [ < 1), uniformly distributed on the interval
(“ s/y2 4 1,+ Jy2 + 1).
is thus
h(x,y)=f(x)g(y),
9(y I x) = g(y).
The conditional density function g(y | x) does not depend on the value
of x.
3. Let (^l5 £2,.. ., £„) be a random vector uniformly distributed in the
whole ^-dimensional space and let rin = + £, 1 + . .. + <j^. Determine
the conditional density function of £k with respect to the condition rj„ =
"-i
= y(y > 0). We know already that the density function of rjn is y2
for y > 0 (§ 1, Example 10). It follows that the two-dimensional density
function of and r]k is
0 otherwise.
For the conditional density function of £k with respect to the condition
vjn = y we find thus
n-3
fn(x\y) =
~s~
for | x | < Jy, (12)
J fn (x I y) dx = 1. (13)
-fy
From (12) and (13) it follows
(14)
n |
L (x I y) =
l f tJ for s/y <x <+Jy. (15)
j*y r n— 1
2
268 MORE ABOUT RANDOM VARIABLES [V, § 3
hence every £k (/c fixed) has in the limit a normal (conditional) distribution,
if the condition imposed is rjn — no2 and n tends to infinity.
^ k=1
tfl + ll + Cl),
where m represents the mass of a particle of the gas. The conditional
density function of the distribution studied is, by the above example.
3n
r
2E m Jt2 7WV 3”-3
An r 1
m 2nE 3n — 1 2E
2
2E
for IaI < (17)
m
3
By taking into account that E = — kTn (k Boltzmann's constant, T the
absolute temperature of the gas) we find for the conditional density function
h„{x | T) of the velocity components £k, rjk, Ck at constant temperature T
3n |
i
r 3n-3
2 ) x2 m
hn{x\T) = 1 2
1 3nkTn ^ 3n — 1 3 kTn
(18)
yj m 2
for
hence
1 2kT
lim hn (x | T) — (19)
n-*- + co 2nkT
m
vk = \j^k + vik + C| j
Vk^s/tk + vl + Ck'*
if we put
bSn = Z
k=l
& + ’ll + Cl
and if hn(v, y) is the two-dimensional density function of vk and rj3n, we
have
3/1 — 5 _
3« —5
v2
yn ip\y) = Dn- 1 -
2
for 0 < v < Jy, (21)
y
y
the constant D„ being determined by
fy
V„ (v \y)dv = 1. (22)
270 MORE ABOUT RANDOM VARIABLES [V, § 4
Hence
(3 n
D„ = (23)
" -J* r f3" ~ 3
If W„(v | T) denotes the conditional density function of the velocity of the
particles at a given absolute temperature T, we have
2
f 2J 3n |
(v\ T) = Vn V x
m Jn 3 nkT.) [3n - 3
’ r
mv,2 \Sn — 5
3nkT
x 1 for 0 < v< (24)
3nkT) m
The distribution with density function (24) is called the Maxwell distribution
of order n. As we have already seen, it tends for n -> oo to the ordinary
Maxwell distribution, i.e.
v*m
lim Wn (u | T) = 2k T
v2, e (0 < V < + 00). (25)
\E(t1\OdP = YJE(rj\Ak)P(AAk),
A A =1
v, § 4] GENERALIZATION OF CONDITIONAL EXPECTATION 271
hence
\E(r,\OdP = Sr,dP, (1)
A A
v(A)=\t]dP (AZisfJ;
A
v(A) = f/(o>) dP
A
f 1 for co ^ R,
TbO^O | q
then
v(A)= f rjBdP = P(AB),
A
and
E{f!B \0 = ps (B).
The conditional probability P^B) of B for a given value of Z may thus
also be considered as a conditional expectation.
Of course one may ask whether E(r\ | Z) is with probability 1 equal to
the expectation of the conditional distribution of r\ for a given value of Z
(i.e. to the expectation of the distribution P^rj-^V)). The response is
affirmative, provided that P^{r]~\V)) is with probability 1 a probability
distribution. This can always be achieved, as we have already seen. In this
case
E(rj \0 = f ijdPt* (3)
h
272 MORE ABOUT RANDOM VARIABLES [V, § 4
with probability 1. In order to prove this it suffices to show that for every
A £ the relation
^\r,dPt)dP = \VdP (4)
A h A
holds. Obviously, this relation is fulfilled for // = t]B, where rjB is the
indicator of the set B\ indeed in this case
§Ps(E)dP = P(AB) -
A
Relation (6) furnishes a new proof of the fact that, for independent £ and
*1’ = E{^)- E{rj) (cf. Ch. IV). In fact, it follows from (2) and (6) that
jE(E(fj\0\ff(0)dP=jridP (10)
= f (Cl + C2 y]2)dP.
A
+ 00
f(x) = j h(x, y) dy, (1)
— 00
+ oo
g(y) = f k(x, y) dx, (2)
274 MORE ABOUT RANDOM VARIABLES [V, § 5
Kx\y) =
Kx>y) for g(y) > 0, otherwise arbitrary, (3)
9{y)
and
+ oo
9(y) = J g(y I x)f(x)dx. (6)
— oo
g(ylx)=m>m
Ax)
ror /(,)>„, (7)
hence by (5)
9(y I x) Ax\y)g(y)
(8)
I f(x I t)g(t)dt
J7(*l y)dG(y)
P(a<ri<b\Z = x)= —^-, (10)
J f(x\t)dG(t)
— oo
Let it be mentioned that h(x, y),f(x), g(y) are only defined up to a constant
factor. If f(pc | y) and g(y \ x) are computed by (3) and (4) or (8), this factor
disappears. The obtained density functions f(x | >>) and g(y | x) are already
so normed that their integral from — oo to +00 has the value 1.
E(E(r,\0)=m- (1)
For the variance of E(rj | f) we have
Proof. We have
rj - E{rj) = [17 - E(rj [ 0] + [E(rj | 0 - E(g)],
therefore
Remarks.
(6)
AC
Then by Theorem 1
0 <Kffrj) < 1. (7)
E(ln- £(>|lf)]2) = 0,
r, = E(rj\0; (8)
1/ is thus measurable with respect to | and therefore it can be written in
the form rj = g(£). Conversely, if r\ — gif), then *1 is measurable with
respect to thus (8) is valid with probability 1, therefore it follows
that Kfifi) = 1.
Unlike the correlation coefficient, the correlation ratio is not symmet¬
rical. To characterize the dependence between ^ and rj both quantities
Kfit]) and Kff) can be used, provided that the variances of both £ and rj
exist and are positive. The conditional expectation E(rj | if) can be charac¬
terized by the following property:
Theorem 4. If £ and i; are any two random variables and D2(fi) is finite
and if g(x) is a Borel-measurable function, then the expression
E([« - 0({)f I {) = K-l ' UK))2 dPe S K"! “ E(l I 0)* dpf <1 °)
It follows from (9) and (10) that
g(0=E(r,\{)
In particular, it follows from Theorem 4 that for any two real numbers
a and b
DM
U = h = E{r]) ~ ■ 03)
The line
D(n)
y-m = R(t, n) [x - E(0] (14)
where y = g(x) runs through the set of all Borel-measurable functions for
which the expectation and variance of g{£) exist. The relation
<r2 (£» n) = E X
'tP(Ak)P(Bj)
(2)
It is clear that <p(£, i/) is zero iff £ and tj are independent. If the number
of the values xk is n and that of the y,-s is m with m > n, then
(p2^,r])<n-\, (3)
1 <«>
It can be seen from (4) that in (3) the sign of equality holds iff for every
/c and for every j either P(Ak Bj) = P(Bj) or P(AkBj) = 0. Since, however
this cannot occur unless for one kj the relation P(AklBj) = P(Z?y) and for
the other k ± kj the relation P{AkBj) - 0 is valid. But then £ = xkj for
rj = yj and consequently £ = f{r\).
Conversely, if f = /(>?), rj) = n - 1. If both £ and 17 assume
infinitely many values, the series on the right of (1) may be divergent;
in this case <p(£, rf) = +00.
Before defining the contingency for arbitrary random variables, the
notion of regular dependence will be introduced. Let £ and 1/ be any two
random variables. If C is an arbitrary two-dimensional Borel set, we put
Let and B be Borel sets on the x-axis and the y-axis respectively; put
Let A x B denote the set of the points of the (x, y)-plane for which x £ A
and 7(5. Define the measure Q{C) for the two-dimensional Borel sets
of the form C = A x B by
holds. If F(x) and G(y) are the distribution functions of E, and 17, respectively,
and if A and B are any two Borel subsets of the real axis, then the function
k{x, y) satisfies the relation
P{AkBj)
for x = xk, y = yp
P(.Ak) P(Bj) (10)
Kx, y)
0 otherwise.
h(x, y)
k(x, y) (11)
.f(x)g(y) ’
or equivalently by
+ oo +00
<p2 (E, n) = f f k2 (x, y) dF(x) dG(y) - 1. (13a)
— oo—oo
282 MORE ABOUT RANDOM VARIABLES [V, § 7
h\x,y)
dxdy — 1. (13b)
f(x)g(y)
+ 00+00
But by assumption
+ 00+00
+ 00+00
i?(«(0, v(rj)) = J
— oo
j u{x) v(y) [/c(x, j) - l] dF(x) dG{y).
—00
(17)
where u(x) and v{y) assume all Borel-measurable functions for which
expectations and variances of u{f) and v(rj) exist, can also be considered
V, § 7] DEPENDENCE OF TWO RANDOM VARIABLES 283
a) =
b) 0 < tj)< 1;
c) tf y — a(.x) and y = b(x) are strictly monotonic functions, then
<A0(0, m) = v);
d) ij/(£, rj) = 0 iff c and tj are independent;
e) if there exists between £ and tj a relation of the form U(fi) = V(rj),
where U(x) and V(y) are Bor el-measurable functions with D(U(fi)) > 0,
then iK£, rj) = 1;
f) we have
I h) I ^ min (K< (rj), Kn (£)) < max (K\ (rj), Kn (0) < <K£, f) ^ rj).
Proof. Properties a), b), and c) are direct consequences of the definition.
If £ and rj are independent, clearly i//(£, rj) = 0. Conversely, if
As a and b are arbitrary this means that £ and rj are independent, hence
d) is proved. If U(f) = V(rj) with D(U(f)) > 0 we know that
R(U(0, V(rj)) = 1,
hence i//(£, rj) = 1. Property f) can be deduced by comparing the defini¬
tion of maximal correlation to Theorem 1 of this § and to Theorem 5
of § 6.
A further notion which we want to study here is that of the modulus
of (pairwise) dependence of a sequence of random variables. Let
£l5 . be a (finite or infinite) sequence of arbitrary random
variables. We define the modulus of dependence of the sequence {£„} as
284 MORE ABOUT RANDOM VARIABLES [V, § 7
the smallest positive real number A satisfying for all sequences {xn} with
Z *^ < +oo the inequality
11
n
Z
m
(J xn xm) I ^ A Z xl
n
(19)
Z Z W*
n m
xn Xm
n m
IEn m
I* PM; n
(2i>
then for every random variable rj for which Eiyf) exists we have
X£2070<£0r). (23)
= m,)- (24)
Obviously,
1
n - > 0. (25)
1
(27)
~ Z a\ S £(,=), (28)
£„=/„(£„)• (29)
I Z Z £(C. U *„
n m
| < ££n m
«{„, UI x.\ • I x„, I < A Z
n
(31)
Thus the lemma can be applied to the sequence {£„} with B = A, provided
286 MORE ABOUT RANDOM VARIABLES fV, § 8
E(n \U~E(r,)
(34)
JnKn) Din{r()
where D^irj) = B(E(rj | £„)) is the standard deviation of E(rj \ £„). Then
according to Theorem 5 of § 6
D\t,)Y.KUtj)<AD\,i). (36)
n
1 With the help of this generalization Renyi succeeded to prove that every positive
integer n can be written in the form n — p + P, where p is a prime and P is the product
of at most K prime factors; K denotes here a universal constant. Cf. A. Renyi [2],
v, § 8] FUNDAMENTAL THEOREM OF KOLMOGOROV 287
of real numbers. /7„ the function, defined on £2, projecting Q upon the
subspace Qtl on the first n coordinates ofco; i.e., for co = (col5 co2,... ,co„,...)
we put
nn CO = («1, (02,. . ., (On). (2)
For A c Q. let H„ A denote the set of all elements of C2„ which can be
brought to the form y = TItl(o with co £ A.
Let now d c .Q„ be any subset of Qn. We shall call the set of elements
0) — (&>!, <u2, . . . , co„ , . . .) such that IJnco = (col5. . . , co„) £ A an n-dimen¬
sional cylinder with base A; we shall denote this set by n~\A).
If A is Borel-measurable, the corresponding cylinder set is said to be
a Bor el cylinder set. Let be the set of all Borel cylinder sets; is an
algebra of sets. To see this let us remark that an n-dimensional cylinder
set is at the same time an (n + w)-dimensional cylinder set as well. In fact
A + B = TIf\A' + B'),
a — b = n^\Ar - By,
Hn^(An) — nN\M(AN+M)
Pn+m(An+m) = P/vC^nP
Consequently, the definition of P(A) does not depend on the base figuring
in the construction of A.
Clearly, the set function P(A) is nonnegative; it is easy to show that it
is (finitely) additive. If A £ B £ AB = 0, then, because of
A = n~N\AN), B = n~N\BN),
(We made use of the fact that the value of P(A) does not depend on
the dimension of the chosen base of A.) It is further clear that P{Q) =
— Pn{Qn) — 1. It remains to prove that P{A) is not only additive but
also or-additive on By Theorem 3, § 7 of Chapter II it suffices to show
that P has the following property:
CO
cannot be empty.
It can be assumed without restriction of generality that An is an n-dimen-
sional cylinder; in fact if dn denotes the exact (minimal) number of dimen¬
sions of A,., we have dn < dn+1. Further lim dn — + oo can be assumed,
n-*- + co
since in the case of d„ <, d our assertion would follow from Pa(A) being
v, § 8] FUNDAMENTAL THEOREM OF KOLMOGOROV 289
P(.A, - D.) = P(A. (C, + ... + C„)) < 2 PM, - C*) <
k=l
/c = 1 2 ’
hence
Thus the set Dn cannot be empty for any value of n. Choose now in Dn
a point ai(n) = (o/^, coty ,, ca("}). Then a sequence {«,} can be given
with
lim co(kni) — cok (k = 1,2,...)
y-^ + oo
(G. Cantor’s “diagonal method”). Since all Z„ are closed, for every n
(«1, • • • A-*n) £
OO
Z)„ is not empty. Similarly, A„ cannot be empty either, and thus our
77 = 1 77=1
assumption leads to a contradiction. Hence we must have p = lim P(^„) = 0
n ->- + 00
and P is cr-additive on it follows that the extension of P is cr-additive
on the a-algebra We have proved that [£?, P] is a Kolmogorov
probability space Put therefore
§ 9. Exercises
1. Let there be given in the plane a circle Cx of radius R with its center in the
origin, and a circle C2 concentrical with Cx having a radius r < R . Let us draw a
line d at random which intersects Cu so that if the equation of d is written in the form
x cos cp + y sin cp = q,
cp and q are independent random variables, <p being uniformly distributed in (0, n)
and q in (- R, +R) . Let £ denote the length of the chord of d inside C2. Determine
the distribution function, expectation, and standard deviation of £ .
for x < 0,
1 x‘
P(£ <x\(p) = r2-— for 0 < a < 2r,
~R
l for 2r < x.
At the point x = 0 the distribution function has thus a jump of the value 1 —— .
This leads to
r2n
£(0 =
2Ji
and n2
Hint. We have E(£) = E(E| 9?)), where 99 has the same meaning as in Exercise 1.
E(£ | (p) is equal to the integral along the chords of the domain B lying in a given
direction, divided by 2R; for fixation of 99 means restriction to the chords which
form an angle 99
71
+ — with the x-axis. Hence E(£ \ <p) 1*1 |2?| being the area
2R
of B. We see that E(0 = 'll . It is not necessary to require the convexity of B neither
2R
that it be simply connected.
3. Let there be given in the plane a curve L consisting of a finite number of convex
arcs and contained in a circle C of radius R. Choose at random (in the sense explained
in Exercise 1) a line d intersecting C. What is the expectation of the number of the
points of intersection of this line with L?
Hint. Consider first the particular case when L is a segment of length / of a straight
line. In this case the number v of points of intersection is 0 or 1. If 95 is the angle
between the normal to d and the segment L, the expectation under the condition of
/ cos 99
fixed 99 is E(v | 99) = -———. This leads to
71
2
From this it follows for polygons, and by a limit procedure for all piecewise convex
I £ I-
(or concave) curves L, that E(v) =-, where | L | is the length of the curve L.
tzR
Hint. We have
7V
2
2n rn+1 (
E(£n) = ——— I sinn+1 # d&.
0
Note. Exercises 1 to 4 present well-known results of integral geometry1 from a
probabilistic point of view.
5. Establish the law PV= RT for ideal gases on the basis of the kinetic theory
of gases. V denotes here the molar volume, P the pressure, T the absolute temperature
of the gas, further R = Nk where N is Avogadro’s number and k Boltzmann’s constant.
Hint. The pressure of the gas is equal to the expectation of the quantity of motion
imparted by the molecules of the gas during unit time to a unit surface of the vessel
wall. We assume that the shocks are perfectly elastic. If a molecule of mass m and
velocity v strikes the wall in a direction which forms an angle ft with the normal
vector of the wall, then the quantity of motion imparted by the molecule will be
2 mv cos ft. In order to strike a unit surface K of the wall during a time interval
(t, t + 1), the molecule of velocity v moving in a direction which makes an angle
with the normal vector to the wall has to be included at the time t in an oblique
cylinder of (unit) base^f and height v cos ft. Under the assumption that the molecules
are uniformly distributed in the recipient, the probability of the shock in question
v cos ft-
is ——— , where W denotes the volume of the vessel. Hence the expectation of
the quantity of motion imparted to the wall by the considered molecule will be
2 v~m cos2 ft 4 e cos2 ft
-—-= -—-, where e is the kinetic energy of the molecule. The
W W
4e cos2 ft
quantity-—— is a random variable. Hence we have to calculate its expectation.
W
(Here the relation | ??)) = £(c) is to be applied.) If the velocity components
are supposed to be independent and to have normal distributions with the density
1 f x2 \ jkT
function-= exp — ——- where a — . — , then ft and e are independent and
a^J In \ 2a I V m
the distribution of the direction of the velocity vector is uniform. Hence
4e
E cos-1 4r E(e) £(cos2 ft).
~W
3 1
We know already (Ch. IV, § 17, Exercise 29b) that£(e) = — kT. Since £(cos2 ft) =
eT’
we find
kT
E cos2 ft
W
for the expectation of the “pressure” exerted upon the wall by one molecule. Since
there are N molecules in a gram molecule of gas, we find for n gram molecules, because
of the additivity of the expectation, the value
p_ nNkT NkT RT
~ w ~ ~ ~V ’
w
where V —- is the molar volume and R — Nk the ideal gas constant.
0 otherwise.
V, § 9] EXERCISES 293
8. Let the random vector (£, rj) have the normal density function
V) =-7= .
^ AC
E(y | S) = - -|r I,
£■(11 y) = — -4-1?,
l*l_
*„(!) = (»?) = I m y) I
y/AC~'
9. If the random vector (£, 57) has a nondegenerate normal distribution, show that
10. If the functions a(x) and b(x) are strictly monotone, then
| 1 for to £ A,
£(«) =
[ 0 otherwise,
] for a> 6 B,
y( co)
0 otherwise,
then
V(£, y) = <P2(£,»?) = ATK*?) = K'jiS) = R2(l rj) =
13. Prove the following variant of Bayes’ theorem: Let £ be a random variable
with an absolutely continuous distribution with the density function f(x) and let r\
be a discrete random variable. Let yk (k — 1,2,...) denote the possible values of
rj and pk(x) the conditional probability P(i) = yk \ £ = x). Let f k(x) be the conditional
density function of £ given rj = yk. We have
fk(x) = -.
J Pk(Of(t) dt
— 00
Hint. By definition
hence
J Pk{t)f{t)di
P(g < x,rj = yk) — CO
P(£ < x\rj = yk) =
nv = yk)
J Pk(t)f{t) dt
14. Suppose that the probability of an event A is a random variable with density
function p{x) (p(x) — 0 for x < 0 and 1 < x). Perform n independent experiments
for which the value P(A) = l; is constant and denote by rjn the number of the experi¬
ments in which A occurred. Let pnk(x) be the conditional (a posteriori) density function
of £ with respect to the condition rjn = k {k = 0, 1, 2,... n); according to the preceding
exercise
xk(\ — x)n kp(x)
Pnk(x) =
J tk{ 1 - t)n-kp{t)dt
0
a) Show that if t, has a beta distribution of order (r, s), then ^ has under condition
V„ — k a beta distribution of order (k + r, n — k + s).
b) Ifp(x) is continuous and positive on (0, 1) and if /is a constant (0 </< 1),
then
■V2
/(i -/) /(I -/) 1
lim ■ Pn. [fit] /+ y e 2
n —► -f co
v'27r
15. Let C be a random variable and let £t, ^2» • • •, be random variables which
are for every fixed value of C independent and have a normal distribution with
expectation t and standard deviation a {o > 0 is a constant). Let p{x) be the density
function of £ . Study the conditional density function pn(x |y) of C under the con¬
dition
+ • • • +£„
and show that if p(x) is positive and continuous, we have, for fixed x and y,
Pn \y + -= y
lim V" 2(72 .
n —► -j- oa
Vn
V, § 9] EXERCISES 295
16. Let £ be a random variable with an exponential distribution. For every given
value of £, let £t, £2 , ..£„ be independent normally distributed random variables
with expectation £ and standard deviation a > 0 . Determine the conditional distri¬
bution of £ with respect to the condition
fi + £i + * • . +
17. Let /i be a random variable having the density function p(t). Let for every given
value of g the random variables £ls. . ., c„ be independent, normally distributed,
with expectation g and standard deviation a > 0. Show that £lf. .. are exchangeable
(cf. Ch. IV, § 17, Exercise 18).
18. Let , . .., be independent random variables having the same distribution
and finite variance. Put t]n= £t + £2 + • • • + 5« (« = 1,2,..., TV). Calculate the
correlation ratio Knn (?;N) (n < TV).
19. Let fj,. .., £n be independent random variables with the same distribution.
Put rj„ = £, + . . . + ln (n = 1,2,..., TV). Calculate the contingency
ditional probability space. Let g{t) = ---(0 < t < 1) be its density function.
*(1 - 0
Let p be constant during the course of n independent experiments and let r\„ denote
the number of those experiments in which the event A occurred. Calculate the
a posteriori density function and the conditional expectation of the random variable
p with respect to the condition tj„ = k (0 < k < n).
22. Let £ be a random variable with Poisson distribution and expectation A, where
A is a random variable with a logarithmically uniform distribution on (0, + 00).
Calculate the a posteriori density function and the conditional expectation of A with
respect to the condition £ = n > 1.
Hint. Bayes’ theorem gives for the conditional density function of A with respect
to the condition £ = n:
Xn-le~l
9n0) =
(« - 1)!
23. Let the random variable £ have a normal distribution N(ju, a), where [i is a
random variable uniformly distributed on the whole real axis. Determine the a posteriori
density function and the expectation of /li with respect to the condition £ — a.
1 0* - a)2
9(j* I £ = a) = exp
2(72
24. Let £[, £2) ..., £n be independent random variables having the same normal
Hint. By assumption, the n-dimensional density function of the £k(/c = 1,2, .... n) is
X exp E
2ff2 k= 1
The density function of C„ is
1 / n n(x — m)2
9{x) — exp
2a2
hence the conditional density function of the random vector (£t, . . ., £„) for £n = x is
1
7-=-exp
(cr j2ji)n~l j n 2^ <** “ *>’
This function does not depend on in; a property which is expressed by saying that
t„ is a sufficient statistic for the parameter m.
25. a) Let there be given n independent random variables £u with the same
normal distribution N(0, a). Put
£=~ y, ^ n k=1
and t = x & - o2-
k=1
{k = 1, 2, n). Put
n
Then — / n C and
E
/=
1
= E^ A= 1
+Ek—l
02.
hence
t= E 1=2
rf-
We know (cf. Ch. IV, § 17, Exercise 43) that rju...,rj„ are independent normally
distributed random variables with expectation 0 and standard deviation a; hence
£ = and r = Y rjf
V" ;=2
are independent; r has a /^distribution with (« -- 1) degrees of freedom.
b) Let ..., £„ be independent random variables with the same normal distribution.
Let the expectation [jl and the standard deviation o of the £k be independent random
variables on a conditional probability space, p. being uniformly distributed on the
whole real axis and a logarithmically uniformly distributed in the interval (0, + Co).
Put
C =
£l + £t + ...+JJL and T = £ (4 _ 0£.
k= l
Hint. The density function of the vector (p, o2) with respect to the condition
£ = x, r = z is, according to Bayes’ theorem and the result of Exercise 25 a)
n— 1 z n(x - ,m)2]
z 2 exp exp
n ' 2o2 2 a2
2n
2 2 on+'r
X • LL
thus o and - are independent.
a
If r) is the indicator of the event B (0 < P(B) < l), it follows from (1) that
hence, by Theorem 3 of § 7,
00 P(A )
I 1 1^1 A„) - E(V)r < DHv).
n=l 1 - P(A„)
Thus
lim
+CO
,
1 r\A„) Wv I An) - E(v)]2 = 0,
which proves (1).
Remark. Cf. Ch. VII, § 10, Theorem 1.
E
n= 1
P(A„) =
Let
denote the event that infinitely many of the events A„ occur simultaneously. Show
that P(B) = 1.
Hint. Let C be any event with 0 < P(C) < 1. Like in Exercise 26, it follows from
Theorem 3, § 7 that
Apply (1) to C — Ck — ^ A„. Obviously, P(Ck) > 0. It follows from (1), in view of
n=k
P(Ck | A„) = 1 for n~> k, that P(Ck) = 1; hence P(Ck) — 0. Since B = ]~| Ck, we
*=i
have B = ^ Ck and hence P(B) = 0, which is equivalent to P(B) = 1.
k=\
28. Let £ and r] be arbitrary random variables, f(x) and ^(x) Borel-measurable
functions such that
and
or to put it otherwise, suppose that R(u{0» t>0?)) assumes its maximal value for u = f
and v = g. Then the following equations hold with probability 1, where A = \p (£, rj)\
E(Ki) | v) = te(rj) 0)
and
hence also
Hint. We have
On the other hand, if /*(£) = E(g(^p\^ £ujgjs E^f*(£)^ = 0 and Z>(/*(0) — 1> then
E(nsM,» = = D,
we conclude that D- < Hd, rj). Hence Z)2 = ip2(Z, rf). Since in Schwarz’ inequality
equality holds only in the case of proportionality, we must have E^g(rj) | £) = A/TO
which proves (2). But
E(mE(g(rj) | 0) = A£ (/2(0) = A,
hence A = i/<£, i?). Equation (1) is proved in a similar way.
29. With the notations of Exercise 28 we have
£(/(0 I gTO) =
and
1/(0) = A/(0.
Hence the regression curves of £* = /(0 with respect to ??* = g(rj) as well as that
of rj* with respect to i* are straight lines (or, as it is expressed, the regression of i*
and r]* is linear).
300 MORE about random variables [V, § 9
30. Let L\ be the set of all random variablesM) such that f(x) is a Borel-measurable
function with £(/(£)),— 0 and £(/2(c)) is finite. If we put
AM) = E(E(M)\r]M).
Show that Af(f) = /,(£) belongs also to L\ and the linear transformation AM) of
the space L\ is positive and symmetric, i.e. it fulfils the relations
CHARACTERISTIC FUNCTIONS
E(Q=\CdP, (1)
h
which implies
E(0 = E(0 + iE(r1). (2)
e( fi q=n=1 %)•
k=1 k
w
If A(x) = a(x) + ib(x) is a complex valued Borel function of the real
variable x and £ is a real random variable, further if the expectation of
£ = A(£) exists, then the latter can be calculated by
It is easy to prove that for every random variable with complex values
|£(C)|<E(|C|). (5)
= PkeitXk- (3)
k=i v y
First of all, let it be noted that every distribution function has a charac¬
teristic function since the Stieltjes integral (2) exists always, in view of
1 eUl | = 1.
If £ assumes positive integer values only, with
=k) = pk (k = 0,1,...),
we see that
G’«(z) = Z Pkzk
Ar = 0
I 9>s(01 S £( I ew |) = 1.
Consequently,
1 Vsih) - (7)
From
1 _ eitt 11 < JL for 11, | < A and \t2 — tx | < =-• <5,
3
hence
9>„(0 = eibt
Proof.
Theorem 4. If tlf t2 ,... , tn are arbitrary real numbers and Zj,z2,... ,zn
arbitrary complex numbers, further if cpft) is the characteristic function of
a random variable c, and if z — x — iy is the conjugate of the complex
number z = x + iy, then we have
n /7
<pf-t)=E(e-iS') = E(eiil).
Vsi+c,+...+«■ (0 = k11
=1
v&CO-
Remarks:
1. Theorem 6 expresses a property of the characteristic functions which
exhibits their successful applicability to probability theory. Indeed, the
distribution of a sum of independent random variables is the convolution
of the distribution functions of the individual terms; the calculation of this
convolution is in most cases rather complicated. On the contrary, Theorem
6 allows a very simple calculation of the characteristic function of a sum
of independent random variables from the characteristic functions of its
terms, as it is just their product. Further, as we shall see in § 4, from the
properties of the characteristic function the properties of the corresponding
distribution function can be deduced.
<Pb+dO = <Pdt)<PdO
the independence of ^ and £2 does not follow. Let for instance be
gL = = £, where £ has a Cauchy distribution: (pft) = e~ul
306 CHARACTERISTIC FUNCTIONS [VI, § 2
I' I A I c7F(a)
— 00
+ 00
y'fij) =J — 00
lAe'*' dF(x);
in particular
<Pt(P ) = iMv (12)
By iterating the operation we obtain
0=1 l/wMI dx
1 It suffices to assume the finiteness of Ck; this implies the finiteness of Cu ..Ck_.
Cf. S. Bochner and K. Chandrasekharan [1], p. 29.
VI, § 2] BASIC PROPERTIES 307
we obtain
+ 00
1 n (0 ,< c‘ (17)
' M* '
Since by assumption ]fik\x)\ is integrable on (-oo, +oo), (14) follows
from (16) by Riemann’s lemma concerning the Fourier integral.1
Inequality (17) is obviously of interest for the study of the behaviour
of yfit) for large values of 111.
Remark. According to Theorem 7, the “smoothness” (differentiability)
of (pfit) is determined by the behaviour off(x) for \x\ + oo ; by Theorem 8
the “smoothness” of fix) determines the behaviour of 9off) for |t| -> 00 .
The two theorems are therefore in a certain sense dual.
1
lim sup (19)
«-►+00
h
« Mn (tty
<p4(0 = I —— (20)
«=o n-
q>ft) is even a holomorphic function in the whole band \ v\ < R of the complex
plane t = u + iv.
Proof. If the assumptions of the theorem are fulfilled, 9oft) is, because
of (11), arbitrarily often differentiable at the point t = 0 and we have
<p(lf(0) = inMn. From this (20) follows immediately.
Because of (13) for every real t0 and every n
\cpf\t0)\<M2n. (21)
+ 00
| qf-+» (to) |<J \x |2n+1 dF(x) < jM2n M2n+2 < Min +2M2* + 2. . (22)
— 00
1 y(w) (to) 1
lim sup <- (23)
«-► + 00
n\ R
Mn — E(bf) (n — 1,2,...).
= 1 for n = 0, + 1, + 2,... ;
if E, does not have a lattice distribution, we have |<p£t) \ < 1 for every t ± 0.
4- oo
2nn '
Vs = E Pk =i•
+ oo +00
j eKux-*)dF(^ = 1 = j dF(x),
hence
+ 00
| [1 - cos (t0x — a)] dF(x) = 0.
2kn a
Since 1 - cos (toX - a) is positive except for x = —-h — (A = U,
to 'o
±1,. . .) (for which values it is equal to 0), all jumps of F(x) must therefore
2k a
belong to the arithmetic progression dk + r with d = —— and r — —.
to ‘o
<p&) = 'LPkVtk(f)-
k
dz.
— oo L
■x _ zy x* |r|
j j e 2 dz \ <e 2 j e 2 du (1)
x—it o
implies
x _ _zy
lim | f e 2 dz | = 0. (2)
|x|-»oo x-it
Hence
+ 00
1
(3)
v'/2tz
— OO
and consequently
(0 = e “ . (5)
1
9^(0 = X j e x(X u) dx = (6)
■-4
sin At
At
+ 00
0ixt
9>«(0 dx — e (7)
= —
71 1 + x"
xi=i
k
& =1
X2
2
(1-2/0 . 1
<p&Q = e dx —
J\ - 2it
o
(One has to take that branch of the square root which is equal to 1 for
t = 0.) Hence Theorem 6 of § 2 leads to
Since the unicity Theorem lb follows from the inversion Formula (1),
it suffices to prove the latter. Before beginning the proof we have to make
first some remarks. It was pointed out in § 2 that cp(-t) = cp(t). Thus if
Re{z} denotes the real part of the complex number z, (1) can be rewritten
in the form
1 f f e~ita - e~ilb 1
m - m = 2^-) Re mo —Tt— dt. (2)
— 00
e-‘‘a_ e~itb
The real parts of q>(t) and of-are even functions, while their
it
imaginary parts are odd functions. Therefore the same holds for
e-na _ e-‘‘b
m = m —-—
u
(3)
hence by (2)
+T
e~ita - e~i,b
F(b) — F(a) = lim <?(0 dt. (5)
r-*oo 2n it
-T
1
7(0 dt. (6)
In it
— 00
But if this integral exists, its value is by Formula (5) equal to F(b) - F(a).
For the proof of Formula (2), we need two simple lemmas.
Lemma 1. Put
314 CHARACTERISTIC FUNCTIONS [VI, § 4
Proof. If we put
sin u
S(x) = -du,
7t u
we have
S(a, T) = S(xT). (10)
Put
sin u
-du,
-tJ u
then we have
sin u
cn=(-iy n I mz + u
du (n — 0,1,2,...); (11)
«-i
2 r sin u
S(X) — Yj ck~I- du for nn<x<(n+\)n (12)
k=0 71 u
„, . 2 (' sin u ,
5(oo) = — -du — l. (17)
n J u
Lemma 2. Put
+T
1 j' sin t(z — a) — sin t(z — b)
D(T, dt (18)
-T
and
+ OO
Proof. Since
Re cp(t) dt =
2n it
+ 00 +00
On the other hand, since a and b are points of continuity of F(x), we have
by Lemma 2
+ 00
F(b) - F(a) = f D(z, a, 6) JF(z). (23)
— 00
In order to prove (2) it suffices thus to prove that the order of integration
may be reversed in the right hand side of Formula (22). The difficulty is
that the integral (19) representing D(z, a, b) is not absolutely convergent.
But by Lemma 2 we know that D(T, z, a, b) — D(z, a, b) tends unifoimly
to zero on the whole real axi", except for the intervals a — 5 < z < a + 5
and b-5<z<b + 5, where 5 is a small positive number. Furthermore
on these intervals |D(T, z, a, b) \ < 2. Since a and b are continuity points
of F(x), we have
+ 00 +00
I D(T, z, a, b) dF{z) =
— 00
+T +co
sin t{z — a) — sin t(z — b)
dF{z)\ dt. (25)
t
T -oo
(26)
— CO
VI, § 4] SOME FUNDAMENTAL THEOREMS 317
J
— 00
I <7(0 I dt (27)
exists. Then
F(jc + h) - Fix - h)
fix) = lim
A-0 2/i
+ CO
sin th
lim I —~t— [7(0 + qp(—0 eitx (28)
A-0 47T J
sin th
[99(f) e itx + 9f—t) eltx] ^ 2 | 99(f) |,
th
the limit and the integration can be interchanged according to the theorem
of Lebesgue, hence
+ CO
It is easy to show that the integral figuring on the right hand side of (29)
is a uniformly continuous and bounded function of x. This leads to
hold, hence
+A
f*
+A
elxt d[Fn (x) - F(x)} | <
—x
Now |Fn(x) — F(x) | < 2 and according to the theorem of Lebesgue limit
and integration can be interchanged, hence the right hand side of (37),
and by (36) cp„(t) — <p{t) too, tend for n -a oo uniformly to zero if [t| < T.
Thus we proved that the condition of Theorem 3 is necessary.
We show now that it is sufficient as well, i.e. that from (32), with cp(t)
continuous for t = 0, follows (31). According to a well-known theorem
of Helly every sequence {Fn(x)} possesses a subsequence (F^x)} that
converges to a monotone nondecreasing function F(x) at all continuity
points of the latter.
We show first that this function F(x) is necessarily a distribution function.
It suffices to show that F(+oo) = 1, F(— oo) = 0, and that F(x) is left-
continuous. This latter condition can always be realized by a suitable
modification of F(x) at its points of discontinuity. Since F(x) is a limit
of distribution functions, we have always 0 < F(x) < 1. Hence it suffices
to prove that F(+ oo) — F(— oo) = 1. First we prove the following formula:
+
1 f 1 — cos xt , „ ,
[F„O0 - F„ (-+)] dy - — - - <P„(0 dt if x > 0.
~2 (38)
.) n J t
In fact
+ 00
, N 1 f 1 — cos xt ,N ,
d„(x) = — - - <P«(0 dt = -2
n J t
+ 00 +00
1 — cos xt
cos yt dt dFn(y) (39)
— 00 —CO
+ co
1 — cos xt
dt = | x |. (40)
r
— CO
0 for x < y.
320 CHARACTERISTIC FUNCTIONS [VI, § 4
Hence by (39)
i r \_cos xt
Fn(x) - Fn (-x) > — -3- Ut) dt, (43)
TC J Xt
— CO
or
+ 00
i 1 — cos u
FJL*) ~ U-x) > ~ du. (44)
ft u
Suppose that x and —x are both continuity points of F(x) and that n runs
through the sequence {nk }. Then from the theorem of Lebesgue concerning
the interchangeability of the limit and the integral it follows that
+ 00
1 f 1 — cos u
F(x) - F(-x) > (45)
— 00
(T > 0 fixed arbitrarily) follows from the already proved necessity of the
condition of Theorem 3. Herewith our theorem is completely proved.
Let us add some remarks.
F„(x) =
II for x > n,
0 for x < n.
For every finite x, lim Fn(x) = 0, nevertheless <pn(t) = emt does not tend
rt— oo
, , sm«t , , , .
then (pit) =-, and thus the limit
nt
1 for t = 0,
(f(t) =
0 otherwise,
thus (pit) is not continuous for t = 0. The sequence Fn(x) converges when
- a + 1) - — ilk + D) - (* = o,i,...).
We know that
* 1 n2
(47)
to (2n+\f “ 8 ’
hence
+ 00
x P({ = 2« + 1) = 1,
n= — co
and we find
8 * cos2«+l )t 2 11
^(0 = -r I for 111 < n, (48)
71 Mto (2« +t 1)
, \2
7t
P(*1 = 0) + +f P(r, = 4« + 2) = 1
/7= — 00
2 11 n
*,(0-i- + 4r£ c~<4* + 2>‘ - 1 - for \t\<~. (49)
7T A: = 0 (2k + 1) n
Vi (0 — (0 for u | . (50)
The function 9^(0 is periodic with period n. Let the real axis be partitioned
into subintervals
2k- 1 2k 4- 1
n < t < —• n (k = 0, + 1, + 2,...),
VI, § 5] ON THE NORMAL DISTRIBUTION 323
then we see that the functions cpft) and cpft) are identical on intervals
with an even index k and are of the opposite sign on intervals with an
odd index k.
Theorem 1. If E, and >] are independent random variables with the same
distribution and finite variance, further if £ + rj and £ — rj are independent,
then £ and q are normally distributed.
Now cpft) can never be zero. In fact, if for a value t0 we would have
(n = 1,2,...). (6)
The right side of (6) tends for n -a oo to <5'(0), i.e. to zero. Hence
<5(/) = 0. (7)
<K2o=mr (8)
This leads to
t
~2T ) (9)
1
t2 t 2
T"“J
The right hand side of (9) tends to —since i/i(0) = ip'(0) — 0, ip"(0) —
— — 1. Hence
o
\[f(t) =-—- and (f{t) - e 2
£+ Y]
By assumption, the characteristic function of —-=— is also equal to
v 2
cp(t). By Theorems 3 and 6 of § 2, however, the characteristic function of
£ + n . 2
— i=— is <r , hence
x/2 In/2
From this follows, as in the proof of Theorem 1, that y(t) ^ 0 for every t.
If we put again In (p(t) = )K0> then 'KO is twice differentiable,
W-4]
m \ 22 I
(12)
t \2
j” xdF(x) = 0, +f x2 dF(x) = 1.
x—m
If the family of distributions IF , a > 0, is closed with respect
to the operation of convolution, i.e. if for any real numbers mx, m2 and for
any positive numbers oq, cr2 there can be found constants m and o (m real,
a positive) such that
x — m1 x — m2 x — m
=F (13)
326 CHARACTERISTIC FUNCTIONS [VI, § 5
then
X
(14)
— 00
+,00
<p(t) = | e'xt dF(x),
— oo
then
For oy = o2 = —7= (15) reduces to (10); hence Theorem 2' follows from
V2
Theorem 2.
Theorem 2' explains to some extent the fact that errors of measurements
are usually normally distributed. In effect, the condition, that the sum of
two independent errors of measurement belongs to the same family of
distributions as the two errors themselves, cannot be fulfilled, in the case
of a finite variance, by other than normal distributions. The condition that
F(x) should have finite variance is necessary for the validity of Theorem 2'.
Thus, for instance, for the distribution function
1
arc tan x
71
(16) follows easily by taking into account that the characteristic function
of F(x) is equal to
as pointed out above. There exist, however, other stable distributions, e.g.
the Cauchy distribution. Stable distributions will be dealt with in § 8.
We deal now with some further remarkable properties of normal distri¬
butions. If £ and r] are independent normally distributed random variables,
their sum £ + r] is, as we know already, normally distributed too. We shall
now prove that the converse of this statement is also true: this result is
due to H. Cramer.
9>«(i)%{t) = e 2 . (17)
If F(x) and G(x) denote the distribution functions of t; and rj, respectively,
we have
-f OO +oo
We show now that the definition of cpft) and (pn(t) can be extended to
all complex values of t, so that (pft) and (p.ft) are entire functions of the
complex variable t. Let us first suppose t = iv (v real) and let A and B
be any two positive numbers. We have
and
+ 00
<pn (iv) = j e~vydG(y). (20b)
— 00
The definition of cp^t) and cpn(t) can thus be extended to every complex t.
It is easy to see that cp^(t) and cp^t) are holomorphic on the whole complex
plane, hence they are entire functions of t.
Because of (17), (p^t) ^ 0, cpn(t) # 0 for every t. Hence lnq9^(/) and
In 99,(0 are entire functions too, where that branch of the logarithmic
function is to be taken, for which In 1 = 0. If a > 0 and b > 0 are such
+ 00
j
-a\v\
and
+ 00
M
9\ O' ») = e~xv dG(x) > (23)
Hence, for t = u + i v,
v*_
.2
r+%1 14? +b\t\
I Vt 0) I ^ Vi 0 v) = < 2e <2e (24)
V« O' V)
Similarly, we obtain
HI’
+ a|r|
Vn 0) I— (25)
1
Re(ln <p{ (0) | = In < 1121 + max (a, b) \ 11 + In 2. (26)
I Vi (01
We have 9^(0) = 9\(0) = 1; furthermore we may suppose without
restricting generality, <^(0) - <^(0) = 0. Indeed if we would have <^(0) = a,
and consequently, <pn(0i) — — a, we could always consider instead of £
and rj the random variables £ — a and t] + a whose characteristic functions
satisfy the above conditions. From this we conclude that the functions
In g>{ (0 In <p (/)
and are everywhere holomorphic, furthermore
are, because of (25) and (26), bounded on the whole t-plane. According
to a well-known theorem of H. A. Schwarz the relation
2ji
/(z) = to (/(0)) + A.
J Mrt"*))**^*
0
f Reie 4- z
(27)
holds for | z | < i? for every function /(z) holomorphic on | z | < R. It follows
from (27) that -—^ ^ and ^—^',a^ ^1 are bounded on the whole plane;
n (0 = exp (2 8)
1
<N
1
and
r t2 ]
n (0 = exP (29)
2o\
h (/*(<))" = *■ ‘2 , (30)
k=1
then it follows from Cramer’s theorem that the functionsfk(t) (k = 1,2,,.., r)
are characteristic functions of normal distributions. In fact, if N denotes
the common denominator of the numbers ax,. .., a,, we have
r Nt‘
n(/*«=*-*',
k=l
(jo
where Nak (k = 1, 2,..., r) are integers. Hence
_ M2
/*(')«» (<) = «' ", (* = 1,•■■,'■), (32)
where we have put
fcw-wwr-'noswr1. j*k
gk(t) is also a characteristic function, hence by Cramer’s theorem fk(t)
330 CHARACTERISTIC FUNCTIONS [VI, § 5
Proof. The following proof is due to Yu. V. Linnik and A. A. Singer [1].
It consists of five steps.
Step 1. Put
(k = 1, 2,..., r).
(9k(t))*k = e~ 2 (34)
k=1
holds if 111 < 5. Furthermore^ (t) is a real and even function. If we prove
from (34) th&t gk(t) is the characteristic function of a normal distribution,
then the theorem of Cramer implies the same conclusion for fk{t) . It
lollows from (34) that^t) ^ 0 for ] t| <5, hence we may take the log¬
arithm of the two sides of Equation (34):
r
1
Z afc In (35)
k=l 9k (0
Let Gk (.x) be the distribution function corresponding to the characteristic
function gft). It follows from the assumptions concerning gk{t) that Gk(x)
is symmetric with respect to the origin; hence we have for any a > 0
+ 00 + fl
9k (0 = J COS tx ■ dGk (x) < 1- j (1 - cos tx) dGk (x).
— OO —a
VI, § 5] ON THE NORMAL DISTRIBUTION 331
n
Since for | / j < —— the relation
2a
l
holds and since for 0 < x < 1 we have jc < In , it follows from
1 — X
Tt
(35) for a > —— that
25
+a
r C t2 n
E (1 - cos tx) dGk (x) < — for \t\< (36)
k=l J £ 2a
—a
_ j
E
k=1
.f
-a
dGk (X) < 1. (37)
71
Since (37) holds for every a > , the integrals
25
f x2 dGk (x) (k r)
— 00
-i«kA
k=1
(0) = Z % T-2 daK W = I •
k=l —co
(39)
z = H(y), y = h(t)
and if we put
1 dv h(t)
h0(t) = h(t), hv(t) = (v= 1,2....),
v! df
then we have for every integer p
dpz ^ d’H(y)
(40)
dtp dyl h ^2 • • • ■ • 4 '
E 4 =z and
j=1
E 44
7=1
= p>
E 4 4 E 4 4 ~ ^<7*
7=1
~
7=1
exist. Suppose that this holds for a given integer q; from this it follows
that gk(t) (& = l,... ,r) is exactly 2q times differentiable and that
[ill
d2q
U J = (2?)! E E (-!)'(/-l)!Efl
h 2«
9k (04
•1
!J (42)
k=l 1=1 7=1
E
k=1 a tin
2q
= (2?)! E
k=1
E (-l)'-'(' - 1)! [SbW - S«(0)],
/=2
(43)
with
"
■5«(o=En—(0:,,!
7=1 4!
VI, § 5] ON THE NORMAL DISTRIBUTION 333
7=1
= Z
7=1
hh = 2<l'
We show now that the right hand side of (43) has the order of magnitude
0{t2) when t -* 0. In fact if v < 2q is an odd number, then by the induction
hypothesis g^\t) — 0(\t\). Hence it suffices to consider terms for which
all the lj are even. If v is even, v < 2q — 2, then we have
- rt’Ho) = 0(t2),
9 kif)
- 1 - 0(t2)
9k (0
+f x2q+2dGk(x) ik = l,...,r)
exist and this means that gk(t) (k = 1,,r) is at least (2q + 2) times
differentiable. As we know already that the integrals
f x2dGk(x) (&=1,...,/•)
— 00
Step 3. We shall show now that the gk(t) are holomorphic in a circle
11! < R (R > 0). In order to show this we have to evaluate the order of
334 CHARACTERISTIC FUNCTIONS [VI, § 5
&t (0 = 9k
2q\ d2q
I 7?°(0)...yP(0) (45)
lx + ... + //• = 2q ~dt^
/!
It
*i!
n
7=1 V
(46)
where in the inner sum the summation is to be taken over the i}-s and
Ifs such that Yj ij = v, Y hb = ^ Because of
it follows that all nonzero terms on the left hand side of (45) have the
sign of (-1)9. The right hand side is
1
H2q 00 =
(48)
(2qf.
VI, § 5] ON THE NORMAL DISTRIBUTION 335
Thus
(-i)g
*M0) =
q\ 2q
Since on the left hand side of Equation (45) there occur the terms
2qcckg{^\0) too, the relation
(2qy(2q)l
^(0)<
Hi 2q
must hold, wherefrom
f (Q) | (49)
lim sup
q~*~ oo l (2q)\ I
Step 4. We show that the functions gk(t) are also entire functions. Put
hk{t) = gk i Jt. Since gk{t) is an even function, hk(t) is holomorphic in
the circle 11 \ < — . Suppose that not all gk(t) are entire functions; then
e
the same holds for the functions hk{t). Let hko(t) be the function hk(t) which
has the smallest radius of convergence, which radius we denote by R.
Take 0 < r < R; put k(t) — hk(r + t). Then
r+t
n
k=1
arm e 2 (50)
and
r
n ’. 3T4V)
^k(t)Yk_c
J
(51)
3Pk (t) is thus holomorphic in the circle 11 \ < —, i.e. hko(t) is holomorphic
336 CHARACTERISTIC FUNCTIONS [VI, } 5
Step 5. The proof of Theorem 4 can now be finished like that of Cramer’s
theorem. If we choose the numbers ak > 0 (k = 1, 2, .... r) such that
Ok
then
1112
^\9k{t)\<~-+ C\t\,
Z(Xk
Proof. Linnik has shown that the proof of this theorem can be reduced
to that of Theorem 4 as follows: Take ax = a2 = ... = a, = 1, which
does not restrict the generality, rjy and rj2 are by assumption 'independent
hence we have
E(ei(«>h + v,h_)) __ E(eiun^ E(em*y
11 + M = fl <pk(bkv). (54)
k=1 k=1
In a neighbourhood of the origin the cpk(t) are not all zero. Put therefore
M) = In (pk{t), then
We prove now that ij/*(t) = -ckt2. This means that (pk(t)(pk(-t) is the
characteristic function of a normal distribution and the proof of Theorem
5 is finished by Cramer’s theorem.
Multiply the two sides of (55b) by x — u and integrate from 0 to x, thus
X“
Z ( (x - u) <A*0 + bkv) du = ~2
Z 'l/t(bkv) | +
k=1 J k=l
If we put
x+buv
where
A(v) = £ bk\jj*(bkv).
k=1
n =e
where
<Pk(x) = = <Pk(*)
Since \ | ^ 1, the relation B"{0) < 0 must hold. Equality cannot hold
here, since then the £k would be constants with probability 1. Hence we
can put 5"(0) = — o2 < 0 and we have thus
«= E & and *1 = Yj 5*
k=1 k=1
are independent.
— =Z^2- tf2 = I
k=i n jt1 j=2
which shows the independence of £ and ?/.
2. The condition is sufficient. We may assume E(^k) — 0. By the assump¬
tion of the theorem
E{em+vn)) = E(eiui) E(eiv”). (62)
If we differentiate on both sides of (62) with respect to v (which is allowed
because of Theorem 7 of § 2), and substitute v = 0 afterwards, we obtain
E(neiui) = E{eiui) E{r,) = (cp(u))n E(r\), (63)
where
<p(u) = E(eiuik)
is the characteristic function of the random variables £k. From
n-1
1
rj = 1 - Z «!-— Z= 1 j<k
k=l
Z £A (64)
E(t;keiu^)=-icp'(u) (65)
and
ml em‘) = - ■?», (66)
(63) can be written in the form
= 0~ l)ffa(9»(“))" (67)
The left hand side of (68) is the second derivative of In cp(u). If we integrate
twice and consider that 0(0) = iE(gk) = 0, we find
G 2U2
\nrp(u) — — (69)
For sake of brevity we write F(x) = F(xx, x2,. . . , x„). We define the
characteristic function of £ by
+ 00 +00
<P«(0)=1.
2. For every t
1^(01 ^ 1-
3. qft) is a uniformly continuous function of t.
4. If
d2 <p&)
6. = - E(£j£k) when Egfa) exists.
dtjdtk t=o
<Pt(t) = (ipkei,k)N-
/c = l
% = Z cJk£k + mi
k=1
(J = 1,2,..., n). (3)
bhj — Z °kchkcjk-
k=1
(8)
Z Z
h=li=l
bhjth{j
is positive definite and the matrix B = (bhj) is the inverse of the matrix
A = (ahj), the elements of which are the coefficients in the expression of
the density function of ;/. The matrix B is thus the dispersion matrix of 17.
In fact, a simple calculation shows that the matrix B can be written in the
form B — CSC-1, where S is the diagonal matrix with elements erf.
On the other hand, we have proved (cf. Ch. IV, § 17, Exercise 42) that the
density function g{y) of t] with y = (ylt . . . , y„) is given by
J n n
n n
IZ avzkZj
h=1y=l
is positive definite and its determinant is denoted by \A\, then the characteristic
function of q is given by (7) where m = (mx , , m„) and where (bhj) — B —
— A~x is the dispersion matrix of q.
1 n -—t-
k=1itu
<pwdt> (io>
-T -T
Z «f = 1,
A:=l
PROOF. Let be any line passing through the origin with direction
cosines ak (k = l,... ,n), and put a = (al5. . . , a„). If £a, rja, are the
projections of >/, £ upon da, we have C« = + ?7a- Hence because of
the independence of £ and rj:
Proof. It follows from the assumption that the random variables eitkik
are also independent, hence
If on the other hand * - (Xl,. .., x„) and if F(x) is the distribution
function of Fk(x) the distribution function of £k(k = 1,... ,ri), further if
C(*) ^ FI Mxk)>
VI, § 6] MULTIDIMENSIONAL CHARACTERISTIC FUNCTIONS 345
then by (15) it follows that the characteristic functions of F(x) and G(x)
are identical. Hence, because of the uniqueness, F(x) = G(x) and the
random variables are independent.
As an application of the preceding theorem we prove now the following
theorem due to M. Kac (cf. Ch. Ill, § 11, Theorem 6).
” E^Xit.f E(rt)(it2y
Wi) = L ' and <pn(t2) = £
k=0 k\ 1=0 /!
for every complex value of t1 and t.z. If we put £ = (<£, q) and t = (tx, t.,),
it follows from (17) that
00 00
E(erjl)(ih)k (it-))1
<p&) = E Z 2>-
k=01=0 kin
Hence the theorem is proved.
Theorem 3 of § 4 may also be extended to the case of higher dimensions:
furthermore, for any A > 0 the convergence in (19) is uniform for \ t\< A,
where \ t | = sjt\ + ■ ■ - + t„ .
N\
P(£ni — kl, •• •■> £Nn — k„) — ■ r>kl
Pi Dk" :
Jrn (20)
kf
£nj ~ NPj
hNj — (21)
s/NPj
In (pnN(t) tjJPj) +
7=1
» I
( Ui
+ Ain 1 + I Pi - 1 (22)
7=1 lexp «
By substituting the expansions of ex and ln(l + x) we have
in effect the random variables r\Nk are connected by the linear relation
n _
X
k=i
>iNk \/Pk = Q;
X Xks/Pk =
k=1
o.
where y and a > 0 are real constants and G(u) is a nondecreasing bounded
function. (Formula of Levy and Khinchin.)
If the distribution has a finite variance, (1) may be written in the form
+ 00
dK(u)
In 9ft) = imt + cr2 (e'“f - 1 - iut)- (2)
1 This second definition is not completely exact, since it is not certain that the
£,k-s can in fact be realized on the probability space in question. However, if the words:
“for a suitable choice of the probability space” are added, then the second formulation
becomes correct and equivalent to the first one.
348 CHARACTERISTIC FUNCTIONS [VI, § 7
*(,o = |0
[ 1
u~h■
for u > h,
then
(p{t) = exp {X(eilh — 1)}.
§ 8. Stable distributions
\ x — mx' x — m2 x—m
* F —F (1)
1 °2
= (pfiJ) eiynt
i
with qn> 0; hence [^(t)]" (n = 1, 2, . . .) is again a characteristic
function.
For a detailed study of stable distributions we refer to the books cited
in the footnote of the preceding paragraph. Levy calls only those
nondegenerate distribution functions F(x) stable, for which to any two
positive numbers cx and c2 there exists a positive number c such that
F(cxx) * F(c2x) = F(cx). Distributions, which we called stable above,
are called quasi-stable by Levy. It can be shown that a distribution with
characteristic function cp(t) is stable in the sense of P. Levy, iff In <p(t)
may be written in the form (3) with y = 0 for a # 1 and /? == 0 for a = 1.
Thus the following result is valid:
where c0 is positive.
Co
(6b)
In particular, if cx = c2 = 1, we have
r (0 = 9>(?0- (8)
It may be supposed q # 1, since # = 1 would imply, because of <p(0) = 1
and the continuity of (p{t), the relation (p(t) = 1 and y(t) would then be
the characteristic function of the constant zero. Furthermore (8) implies
<p{t) 9^ 0 for every real t: since if q>(t0) = 0 could hold, (8) would lead to
h
(n = 1,2,...) (9a)
q
and
<p(qn t0) = 0 (n = 1,2,...) (9b)
which is impossible both for q > 1 (9a) and for q < 1 (9b) in view of
9?(0) = 1 and the fact that cp{t) is continuous at t = 0.
Thus i//(t) = In (f {t) is continuous too, and i/^(0) = 0. Let cl5 c2,. .., cn
be any n positive numbers; according to (7) there exists a c > 0 with
£ Wckt) = 'i'(ct)-
fc=i
dn)
If we put c -, we obtain
c(w)
— = 1A (11)
m
= (12)
We show now that c(r) is uniquely determined and the relation
for every t, (13) holds for any two positive rational numbers r, s. Let now
9 be a rational number q > 1 and t a real number such that # 0.
Then
'l'([c(q)]nt)=qn\l>(t). (14)
for every n, which contradicts the continuity of i/i(t) on every finite interval.
c{r) f r )
SmCe~c(yr “ C tI ’ r > S imPlies CW > C(J)- The function c(r) is thus
increasing. For every irrational X > 0 we define c(A) by
are valid for every positive value of A and /t and that c(A), as a function
of the real variable A > 0 is increasing. Put
and
jp
c(x) — x “ , for x > 0. (20)
This leads to
XaiP(t) = iKXt) (21)
for every real t and positive A. Thus for t > 0 we have
Put \J/(\) = — (c0 + ic]l). Since <p(—t) = cp{t), and thus also i//(—t) = \p(t),
we obtain for every real t
H0 = Cl 1 f |“, : (24)
Because of \<p(t) \ < 1, c0 > 0 holds. It remains to show that 0 < a < 2.
But a > 2 would imply 9/(0) = 0 and hence D%E) = 0, and thus £ would
be a constant. Herewith (6a) is proved.
for any nonnegative integers k and N. If f(x) £ C and if a, b are real, and
A is a complex number, then A/(x) £ C, f(ax + b) £ C, and f(k)(x) £ C
(k = 1,2,...); furthermore f(x) £ C, g(x) £ C imply f(x) + g(x) £ C. Let
K denote the set of all infinitely often differentiable functions g(x) such
that
g(k\x) = 0(\ x |^*) for | x | 00 and k = 1,2,...,
where the numbers Nk are integers. It is easy to see that f(x) £ C and
g(pc) £ K imply f(x) • g{x) £ C.
A sequence of functions {/„(X)} (fn(pc) £C; n = 1, 2,. . .) is said to be
regular, if for every //(*) £ C the limit
exists. Two regular sequences {/„(+)} and {gn{x)} are said to be equivalent
if for every h{x) £ C
+ 00 +00
f
— 00
F(x)h(x)dx
by
+ 00 + 00
J F(x)h(x)dx = lim j f, (x) h(x) dx, (4)
— 0° «-<- + oo —‘oo
where on the right hand side the limit exists by assumption a d remains
the same when {f„(x)} is replaced by another sequence equi' dent to it.
If F(x) ~ {/„(*)} is regular, the sequence {Affax + 6)} s evidently
regular for any two real numbers a, b and for any complex number A
We put therefore AF(ax + b) ~ {Af„(ax + £)}. If F(x) ~ fn(x)} and
G(x) ~ {gfx)}, the sequence {fn(x) + gn{x)} is again regul vr; wo put
F{x) + G(x) ~ {/„(*) + #„(*)}• Finally if {/„(*)} is regular and < g{x) 6 K,
the sequence {/„(*) g(x)} is regular. For F(x) ~ {/„(x)} we put
F(x)g(x) ~ {/„(*) • g(x)}.
{/«(*)} is regular, {/„'(*)} is also regular because if h(x) £ C we have
+~° +00
+ °0 +00
f' w ~ iax)}.
Thus we have
+ 00 + 00
j F' (x) h(x) dx — — [ F(x) h'{x) dx. (6)
(cF(x))' = cF'(x),
(F(ax))' = aF'(ax).
Example. Let us put
n nXl
"2
fn (x) =
2n
If h(x) 6 C, it is clear that
+ 00
lim j fn (x) h(x) dx = h{0).
n-+ oo —oo
Proof. f(x) = 0(|jc| k) for every integer k, hence the integral (7) exists
for every real number t; further
+ oo
J 1/00 I -\x\kdx
exists too. The function <p(t) is thus infinitely often differentiable and
+ 06
i'N
<K0 = | fN)(x)ei,xdx (N= 1,2,...). (9)
The integral on the right hand side exists, since f{x) £ C, hence y(t) =
= 0(|t|_iV) for [t| -> + co and for every integer iV. By (8) the same holds for
(p{k\t) since (ix)kf(x) £ C, hence Theorem 1 is proved.
The function rp(t) is the Fourier transform of f(x).
+ 00
+ 00
Proof. The preceding theorem guarantees the existence of j’ | and
— 00
<P(0= J f(x)eiudx,
— 00
00
7(0= j g(x)eitxdx,
VI, § 9] CONDITIONAL PROBABILITY DISTRIBUTIONS 357
then we have
\ + 00 + CX"
Proof. By Theorem 2
+ 00
Hence
+ 00 + 00 + 00
+ 00
1
27z
J
— 00
7iO<P(~Odt
+ 00
F(t) = \ F(x)ell3Cdx. (12)
—*00
J
+ 00 +00
+ 00 +00
Proof. Put
+ 0°
n f iiZi -n(x-yy
2^J Me" e 2 d>'-
— 00
+S° +oo
hm j fn(x)h(x)dx= f fix) h(x) dx,
n-*oo — oo —oo
q.e.d.
VI, § 9] CONDITIONAL PROBABILITY DISTRIBUTIONS 359
Let f{x) CD and let Fix) ~ {/„(*)} be the generalized function corre¬
sponding to it (in a unique manner) according to Theorem 5. Write
fix) ~ Fix).
Let now £ be a random variable on a conditional probability space with
density function fix') and assume that fix) £ D. The characteristic function
<Pft) of £ is defined by
with Fix) ~ fix) in the sense of (12); +*(?) is thus a generalized function. If
j fix) dx = 1
— 00
then <pfit) ~ Qft)-, the definition of Fft) is thus consistent with the
definition of ordinary characteristic functions 9oft). In fact, in this case
(pft) is continuous and j yft) | < 1, hence cpft) £ D. It suffices thus to
prove that for every lit) 6 C the relation
+ C0 +00
Furthermore, by Theorem 1,
By the definition of Fix), the right hand sides of (18) and (19) coincide;
we get thus (17).
360 CHARACTERISTIC FUNCTIONS [VI, § 9
hence for the generalized function F(x) corresponding to f(x), the relation
x2
F(x) ~ {exp is valid. Since
2n
+ 00 _ X2 _ nt2
| e 2" eltx dx = JInn e (21)
we obtain
+ CO
n nt2
<p{(0= F(x) e,xt dx ~ In 2tt5(0. (22)
In-
+ 00 • ;kx--*F _ n(fc + Oa
j e ’ 2“ eix,dx = Jinn e 2
— CO
\ I n - )
e 2 | = 2n5(k + /).
+r°
lim j Fk (x) h(x) dx = F(x)h(x)dx. (23)
k-t-oo -oo _oo v '
VI, § 9] CONDITIONAL PROBABILITY DISTRIBUTIONS 361
+ 00
#(0= J F(x)eitxdx.
— 00
Cn = £l + £2 + • • • + (25)
Proof. Let fn(x) denote the density function of C« and a the standard
deviation of the fix) < M implies fn(x) £ D. We show that for every
Kx) 6 C
+ 00 + 00
lim er
«-*- 00
J fn(x)h(x)dx =
1
2n
J* h(x)dx. (27)
-00
Relation (27) proves the theorem. Indeed if it holds for every h(x) £ C,
let then be h{a, b, e, x) a function of the class C such that
and
0 for x < a — s,
+ 00
J fn (X) KC + e, d — £, £, X) dx
Since
f fn(x)dx
P(a<Cn<b\c<{n<d)= JL- (31)
J fn (X) dx
+ 00 <1<L< + 00 (32)
J h(c, d, £, x)dx I h(c + £, d — £, £, x)dx
I, I \ i I x ® “f- ^ i t (x b
h(a, b,e,x) = k\-| — k
where we put
for x < 0 ,
X
1
J exp dt
0 -1(1 -0 .
*(*) = for 0 < x < 1,
J exp dt
-o
for x > 1.
VI, § 9] CONDITIONAL PROBABILITY DISTRIBUTIONS 363
where
/ = lim inf P(a < < b | c < C„ < d)
and
L = lim sup P(a<Cn<b\c<Cn< d).
When e 0, the first and the last member of the threefold inequality (32)
, b — a
tend to ———, hence (27) implies (26).
Let now be F„(x) ~f,(x) and let <ph(t) be the Fourier transform of.
o Jlizn fn(x). By Theorem 6 it suffices to prove that $n(t) -> <5(0, where
<P„(t) is the generalized function corresponding to cpn(t) (n = 1,2,...)
and <5(t) is Dirac’s delta.
Put
+ °°
<P(t) = j f(x)el,xdx.
We see that cp„(t) = a Jinn (pn(t). We have to show that for every x(0 6 C
one has
+ OO
The proof can be carried out by means of the method of Laplace (cf. Ch.
Ill, § 18, Exercise 27).
By Theorem 11 of § 2 we have | cp(t) | < 1 for t ^ 0; furthermore by
Theorem 8 of § 2
lim cp(t) = 0.
|/|- + CO
Hence there can be assigned to every s>0ag = q(e) with 0 < q(e) < 1
such that | 99(0 | < q{s) for 111 > e. But then we have
a
J >B
95" (0 x(0 dt <oJn[q{E)]n
J
— OO
\x(t)\dt. (34)
+ EG In
r
U
2 r u | u
1 f du.
l +1/ X
VS Jexp ~ T . Cyjn j,
—ea/n
Hm ° J
—£
^^ dt = <'35')
(35) and (34) lead to (33).
Let us remark that the assumptions of Theorem 7 can be considerably
weakened.
The product of two generalized functions is generally not defined. The
way which would seem quite natural to follow leads astray: the regularity
of {/„(*)} and {#„(*)} does not, in general, imply the regularity of
{f„{x)gn(x)}. Just take as an example
fn (*) = 9n O) =
+ 00.
— 00
+ 00
F(x) h(x) dx — j* h(x) ddF(x) (36)
S
— 00 ‘—00
VI, § 9] CONDITIONAL PROBABILITY DISTRIBUTIONS 365
for every h{x) £ C, where on the right hand side figures an ordinary Stieltjes
integral. Then the Fourier transform of F(x), will be considered as
the characteristic function of the random variable £.
Example. Suppose that £ is uniformly distributed on the set of the integers,
i.e. the distribution function of £ is given by [x] ([x] represents the integer
part of x, i.e. the largest integer smaller than or equal to x). In this case
x
k = — oo
m=k= — x oo
(4°)
where x(0 is the Fourier transform of h(x). (40) is Poisson's well-known
summation formula. In particular, if h{x) = exp (—x"/!2), then
Jn
x(0 = exp
X
+ 00 /XT + =° _
£ e-k'» = \rY Y e . (41)
k = -cc ^ k = - co
and their variance finite. Suppose further that the greatest common divisor
of the values assumed by - £2 with positive probabilities is equal to 1. Put
— £i + £2 + • • • + £n (n — 1, 2,. . .), then for any two integers k and l
P(Cn = k)
lim = + 1. (42)
«-*- 00 P(Cn = l)
Hence when n -> 00, the distribution of '(n tends to the uniform distribution
on the set of integers.
+ 00
<p{t)= £ P^x = k)eikt (44)
k= — co
+n
Since by assumption 99(0) = 1, 9/(0) = 0, /'(0) = -a2 and | <p (t) | < 1
for 0 < | t | < n, the method of Laplace (cf. Ch. Ill, § 18, Exercise 27)
leads immediately to the result.
This result can be rewritten in the following manner:
/-+ °° +00
for every h(x) £ C; hence o J2nn cp\t) tends for n 00 to the generalized
function (39). Thus if Fn(x) is a generalized function such that
Y __+00
_J Fn(x)h(x)dx = oj2nn £ P(£„ = k) h(k),
00 k = — 00
§ 10. Exercises
<Pt (0 = £ PK e‘Xn<
n= 1
is absolutely and uniformly convergent, hence it can be integrated term by term.
Furthermore, since for every nonzero real number x
T
■ lim ( eixt dt — 0,
T —► 03 J
-T
3. Prove the theorem of Moivre and Laplace by means of the result of the preceding
exercise.
Hint. By Exercise 2
For \k — np\ = 0(ji3 ) the method of Laplace leads after some calculations to
(k - np)
i.(< (pe" + qf e~iM dt =
Jlnnpq
exp
2 npq ]+0!v)-
4. Prove the following characteristic property of the normal distribution: Let
F(x) be an absolutely continuous distribution function, let F\x) = f(x) and
+ 00
J x2f(x) dx = 1.
If we put
+ °>
we have
< In ^2ne,
_ 1
where equality holds only for /(a) = (2ji) 2 exp . Hence //(/(a)) assumes
its largest value in the case of the normal distribution. (In information theory the
number H(f(x)) is called the entropy of the distribution with the density function/(a);
cf. Appendix.)
5. If £"(() exists, then we know that rp^t) is differentiable at t = 0 and tp'e(0) = iE(Q.
Show that the differentiability of cp?(t) does not necessarily imply the existence of £(£).
Hint. Put
6. Let £ be a random variable and AT. = £(|£|a), a > 0. Suppose that M is finite.
Show that if 0 < < a. ,
<(MJ\
bq
Hint. For positive a and b p > 1, q = —?— we have2 ab < — + —.
p — 1 Q
Apply this inequality with
a -
A ’ 6“ * - 7•
N\
Pklk*-kr = ~k[\ k2l .. . krl ^ • • • •P&r (*/ ^ 0, X k, = N
when A-a oo, with £ Pn, = 1 and lim NpNj = \ (J = 1, 2, . .., r - 1).
i— 1 TV —► co
(i + IX (^-i))",
/■=i
lim
N-+a>
(1 + X PnM'’ ~ '))'V = II exP
7=1 7=1
~ W
<KS) =7 — 00
1*1 dF(x).
= 1 j" 1 -Re(y(Q)
d(5) dt. (1)
Hint. From
co
follows
CO CO
L f 1 - Re (vO) ± _ 11 C ( C 1 — cos xt 3
if -J (J— CO —CO
——dt)
dt dF(x).
1 - Re (cp(t))
J dt.
b) If we add to the assumption a) the other one that the variance of £ exists, we
have further
= _± ( Re (y'(Q)
Re(y7(o; d!
n J t
9. Let £t, f2) be independent random variables having the same distribution
which is symmetric with respect to the origin and has variance 1. Consider the sums
t„ = £i + £2 +...+ £„• Show that
d(U
D(Cn)
370 CHARACTERISTIC FUNCTIONS [VI, § 10
Hint. If cp{t) is the characteristic function of the random variables £„, we have
95C„//«(0 = <P
V"
Since cp{t) is real for every real t, we obtain, by taking into account Exercise 8,
d(C„)
d(C„) Jn
Jn f
f n_ ,, , <P'
cp («)
iu
du.
DiZn) *1 .) U
10. If <?(/) is the Fourier transform of the generalized function F(x), the Fourier
transform of F{ax + b) is
itb
1 - ™ _(t\
—e 0 — .
a I 1 a)
11. With the same notations, the Fourier transform of F'(x) is — it0(t).
12. a) With the preceding notations, the Fourier transform of x"F(x) is (—i)"0<n\t).
b) If the conditional density function of i is x2n (n = 1,2,...), the (generalized)
characteristic function of £ is 2ti(—1)” <5t2n>(r), where 8(t) denotes Dirac’s delta
(cf. § 9, p. 355).
13. a) Let £lf £», be independent random variables having the same normal
distribution, E(^k) = m, l = — +... + {„), and
^= ./ ~ 02 + (^ - 02 + • ■ • + (4 - 02
n - 1
Show that
T = a/” ^ ~
I'
fc= *
- ^,i>)2 + X («* - Ar=/i-fl
S =
n + /7i — 2
Show that
^ - <f<2> / ~nm
t=
s \ n + m
14. Prove that the following property characterizes the normal distribution: If
f(x) ( —00 < x < + °o) is a continuously differentiable positive density function
such that for any three real numbers x, y, z the function
x y -f* z
has its maximum at t = -- , then
fix) = exp
yjlna 2a2
9 + g = o.
, , fix)
gix) - -ttt = Cx (C = constant).
fix)
Hence by integrating.
Cxi
fix) = A exp
15. If and are independent random variables having the same nondegenerate
distribution with finite variance and if the random variable a£,x + b£2 (0 < a < b < 1;
a2 + b2 = 1) has again the same distribution, then this distribution is normal with
expectation zero.
1 1
2x
fix) = X ix > 0)
V 2n
il/it) = —
= 4.1| (p(u) du
20. We know (Ch. IV, § 10) that the quotient of two independent N (0, 1) random
variables has a Cauchy distribution. Show that this property is not characteristic for
the normal distribution. If £ and rj are independent, have the same distribution of
zero expectation and if — has a Cauchy distribution, then it does not follow that
V
£ and rj are normally distributed.
1
fix) = v/2 i- '< x < +°°).
n 1 + a4
\y\dy
dix)
n2 ) (1 + y4)i 1 + *4y4) Ji(l + a2)
— 00
1 fix) is the density function of £ 2; where E, is V(0, 1); this distribution is some¬
times called the “inverse normal distribution”.
2 Cf. R. G. Laha [1].
CHAPTER VII
P(\£-M\>XD)<-^r. (1)
Ma = E(\Z-M D (2)
P(\(-M\>W)<d[L-. (3)
t + In o# (e)
P k > M + < e~ (6)
In order to get the sharpest possible bound we have to choose e such that
t 4- In M(e) , ...
the expression-is minimal or at least nearly minimal.
£
§ 2. Stochastic convergence
holds for every positive e however small, then the sequence gn (n = 1,2,.. .)
is said to converge stochastically (or in probability) to zero. If the random
variables £„ (n = 1.2,...) fulfil the relation
for any fixed e > 0 we shall say that the sequence Cn(n = 1,2,...) con¬
verges in probability (or stochastically) to the constant a and indicate this by
lim st („ = a (3)
or by
a. (4)
VII, § 2] STOCHASTIC CONVERGENCE 375
P (5)
Pd
P(\tn-P\>s)< (6)
ns
if now n tends to infinity, the expression on the right of (6) tends to 0, which
proves the theorem.
The definition of stochastic convergence can also be given in the follow¬
ing form: the sequence £„ (n = 1, 2,. . .) converges stochastically to the
number p when to every pair s, 8 of positive numbers (however small) there
can be chosen a number N = N(s, 8) so that for every n > N
0 for x < p,
lim F„ (x) = (8)
fl-*-oo
1 for x > p.
or
(12)
It is easy to prove the following
hence
P(C„ < x) < P(C„ <x\An) + P(An). (13b)
But
lim P((„ < x) < P(£ < x + e) for every e > 0. (15)
n-*-co v 7
P(C„ < x) > P(zl„) P(C„ < x \An) > P(C < x - £) - P(i„). (13c)
Since e can be chosen arbitrarily small, (15) and (16) imply the statement
of Theorem 1.
VII, § 3] GENERALIZATION OF BERNOULLI’S LAW 377
4+4+•••+4
5
n
lim st
4 + £2 + • • • + 4
n
= E(£k); (1)
n-*- co
i.e. that the empirical mean tends in probability to the common expectation
of the %k. It is easy to show that this property remains valid for arbitrary
independent identically distributed random variables with finite variance.
D2
jP(|C„-M|>e)< ,2 ’
ne
lim -
n—oo ft k=l
lim —- = 0,
n— oo ft
where
V k=i
Sn=jYDl
we have1
lim — = 0. (3)
1 "
k
n k=1
the relation
C„am
is valid.
1 This condition is certainly fulfilled e.g. if the random variables ^ (or at least
the numbers Dk) are uniformly bounded.
VII, § 3] GENERALIZATION OF BERNOULLI’S LAW 379
C*n=~ i (4 ~ Mk).
11 1
Taking into account that £(£*) = 0 and D(0 —-, we obtain the re¬
ft
lation
lim st C* -0.
n-*-co
Now
1 n
r*n = 'on
r - r - I Mk
n k=\
and by assumption
n
I Mk-M
follows
lim st ((„ — M) = 0. (4)
The assumptions of the above theorem can still be weakened. Instead of the
pairwise independence of £k it suffices to assume that there does not exist
a strong positive correlation between most pairs. More precisely, the follow¬
ing theorem holds, due essentially to S. N. Bernstein:
lim — £ Mk = M;
n—oo ft k= l
lim — Yj R(k) — 0-
oc n k=1
380 LAWS OF LARGE NUMBERS [VII, § 3
1 "
Then £„ = — Y £k converges in probability to M:
n k=i
(5)
if this is done, the remaining part of the proof can be repeated word by word.
We prove therefore (6). We have
hence
DtDi+k<Sl-
j=i
C2 9 c2 1
Z)2(C„) < -i- + —^ (8)
n n „
n k=1
Hence by condition b)
K
D\U <-+2K
n
n k=i
Thus we have
Theorem 4. Let 4 be pairwise independent and identically distributed
random variables and suppose that the expectation
mk) = m (9)
exists. Then for
i "
c„ = — 14
n i
one has
(10)
Proof. Without restricting generality we may assume M = 0. Put
+k
—
n
i mt)=fi I *dF(x). n fc=i '
(12)
—k
Since by assumption
+ co
we can write
(13)
lim — Yj E(£t) = 0-
n-+~oo ^ /c=l
D\ek)<E{0= t x2dF(x),
—k
hence
+n +
-n -jn \x\>Jn
and consequently
lim -4 £ ^(G) = 0-
(15)
„~co » fc-1
382 LAWS OF LARGE NUMBERS [VII, § 3
If we put
«* + £ «).
n *=i fc=,-+i
Theorem 2 implies
lim st £*r = 0.
#Q<
Hence for any £ > 0
C„I > <0=P(IC„ I > «,C*, # O + />(IC„I > «,c = £„) (17)
and thus
lim sup P( | Cn | > £) < <5. (18)
lim
n-*- oo
p( I C„ I > e) = 0,
which was to be proved.
Remark. When the random variables t;k are not only pairwise but com¬
pletely independent, Theorem 4 can be proved by the method of character¬
istic functions. We have seen in § 2 that the stochastic convergence of C„ to 0
is equivalent to the convergence of the distribution function Fn(x) of C to
the degenerate distribution function D0{x) of the constant 0 (i.e. to Tfor
-r > 0 and t0 0 for x < 0). Because of Theorem 3 (Ch. VI), § 4, it suffices
thus to show that the characteristic function cpn(t) of C„ tends, for every t,
to the characteristic function of the constant 0, i.e. to 1. If <p(t) is the charac¬
teristic function of the random variables then by assumption y'(0) = 0.
<p(t) - 1
£(t) = (19)
then
lim e(r) = 0. (20)
1-0
VII, § 3] GENERALIZATION OF BERNOULLI’S LAW 383
and that these conditions are not only sufficient but necessary as well for
1
t.
"
n k=1
then
n
Cn =
k=\
has the same Cauchy distribution as £k, can be interpreted as follows: When
we take a sample from a population with a Cauchy distribution with den¬
^f(e) = E{ec^~M)),
t + In (e)
P £>M + < e -t it > 0). (1)
t + In (e)
It was already observed that the choice of a minimizing
a
makes the inequality (1) as sharp as possible.
We prove now the following
e2 D2 eK
In (b) < 1 + 0eK
Proof. Since
(2)
n
-#(£) = f] E(e^k),
k=l
VII, § 4] BERNSTEIN’S IMPROVEMENT OF CHEBYSHEV’S INEQUALITY 385
co n zn
or-ik 8 Qk
= 1
n=0 n\
and the are bounded, the series is uniformly convergent and the expecta¬
tion of eE'k can be calculated term by term from the power series. Thus we
obtain
2 r)2 oo p()in
£(«*'*) = 1 + ip. + £ (3)
2 „T3 n\
E(£l)<DlKn~\
we obtain
n-2
n =3 n\
As
1 1 1
<
n\ 6 (« — 3)!
1 00 (eK) n —2 1
eJT
E(eEik) < 1 + e2 Dl < 1 +s2Z>2
T + „?3 »! T + _6
which leads to
1 eKesK
£(«*) < [I l+e2D2
fc = l T+ 6
Because of 1 + * < ex, we obtain
r e2 D2 eKeeK
(e) = E(eEi) < exp 1 +
/ £2 D2 eKe E K ~l
t+ 1+
< e -t (4)
386 LAWS OF LARGE NUMBERS [VII, § 4
Put
s/21
£ - (5)
D
Then (4) leads to
(6)
XK _X?_
XK ^
£ > XD 1+ < e 2
* (7)
Jd
XK
Thus if X is large and - small, we obtain a much sharper inequality than
that of Chebyshev’s.
If we apply the obtained result to — we find that
XK
XD
Z\>XD 1+
~6D
< 2e (8)
XK
e~b < e < 3.
UK
< H < X 1 +
2D,
«-*(€) = I [&-£(&)]•
fc=i
VII, § 4] BERNSTEIN’S IMPROVEMENT OF CHEBYSHEVS INEQUALITY 387
■ P2
M \ > pD) < 2 exp 2 (10)
2 [i+f|
- 2D
-
In this formula
n "
M = Y, Mk and j YDt,
k=1 k=1
D
while p is a positive number such that p <
1c
Let us apply now this result to the case where the £k have a common
distribution. Let Mx be the expectation of the £k and D\ their variance. Then
n
the expectation of the sum £ = £ is equal to nMx and its variance to D'\n.
k= 1
£>, /-
It follows from (10) for p < yjn that
K
PW„-p\>i)<f^.
388 LAWS OF LARGE NUMBERS [VII, § 4
1 1
Thus e.g. for p — q £ Chebyshev’s inequality guarantees
20 ’
the validity of
1 1
P >- <- (13)
20 100
only for n > 10 000, while by using (12) we find that (13) holds already for
n > 1283.
1
P > <
50 loo
is valid for n > 62 500; while applying (12) we see that it is valid already
for n >7164.
In these examples s > 0 and 5 > 0 were given and we wanted to estimate
the least number n0 = n0(s, 5) such that for n > n0
P(|£„-p|>e)<<5
9 in ~
-gfi2-=«o (e, <5).
Then
P(\£n-P\>s)<S. (14)
\ 2
2 pq 1 + In —
2 pq d
n>
| £
Since 2pq < —— and 1 -\- — ~7~ ^*or 0 < e < pq, 04) is always
2 2pq
(2)
390 LAWS OF LARGE NUMBERS rvn, § 5
then
P(AJ = 0. (3)
for every n. Because of (1), the right hand side of (4) tends to zero as n -* + oo,
hence we have (3).
If the An are completely independent, (1) is not only sufficient, but also
necessary in order that with probability 1 at most finitely many of the An
00
should occur. If £ P(A„) = + oo, thenP(Af) is not only positive but equal
n=1
to 1. Thus we have
P(AJ = 1. (5)
Proof. Evidently
oo oo
Am ~Yj 11 Ak , (6)
n=1 k=n
(7) and (6) imply P(AJ = 0, hence (5). But for N > n
00 _ N N N
The series £ P(Ak) being divergent, the right hand side of (8) tends to zero
Lemma C. If Ai, A2,. .An, . . . are arbitrary events, fulfilling the condi¬
tions
f;P04„)=+oo (9)
n=1
and
ti p(Ak A,)
lim inf - =1, (10)
(£p(Ak))2
k=l
then (5) holds; thus there occur with probability 1 infinitely many of the
events A„.
I I if A„ occurs,
0 otherwise.
D~ ( afc)
p(\i«k-ir(.A)\>£ p( ~— (ID
‘-1 j=i ‘-1 t'tiw)'
i
Now E(otkcc,) = P(AkAi), hence
If we put
n
tl
1
dn = P Z P(Ak) (14)
I «fe<
,k = l 2 k=1
we have
lim inf d„ = 0. (15)
«-*■ GO
392 LAWS OF LARGE NUMBERS [VII, § 6
It follows from this that one can choose an infinite subsequence of posi¬
tive integers nx < n2 < . . . < ns < ..., such that
00
ni j n>
X ^ TX P(Ak),
k=1 z k=1
00
except for a finite number of values of j. Thus by (9) the series £ ak is di-
*=i
vergent with probability 1, which proves our statement.
The just proved lemmas will serve us well in proofs dealing with improve¬
ments of the law of large numbers.
§ 6. Kolmogorov’s inequality
Theorem 1. Let the random variables rjl5 rj2, ...,rjn be completely inde¬
pendent, put further E(rjk) — Mk, D(r}f) = Dk. If e is an arbitrary positive
number, we have
k X Dl
P( max | X (nj - Mj) | > e) < __ . (1)
1 <,k<,n j=1 £“
Proof. Put r,* = rjk - Mk, ^ ^ r,f (k = 1,2,.. ., n). Let further Ak
denote the event that Ck is the first among the random variables A, Ca
Cn which is not less than e, i.e.
The right hand side of (3) becomes smaller if the term k — 0 is omitted from
the summation. Hence
rt n
Y.Ol>Y. P(Ak)E<ll\At). (4)
k=1 k=1
Let us now consider the conditional expectation E(£% | Ak). From £„ = (,k +
n
Now, by the definition of the events Ak, we have | C& | > s whenever Ak
occurs; hence E{C,k \ Ak) > e2 and thus by (8)
E(ejAk)>s‘.
t Dl > e2 £ P(Ak),
k=1 fc = l
P( lim tjn — 0) = 1.
n-+ao
lim st qn = 0. (3)
n-+ oo
Proof. We show first that (2) follows from (1). Let e > 0; let A„(e) denote
the event sup \ qm\> s and C the event lim rjn = 0; put further Bn(e) —
tn^>n n~*- co
00
= CAn(e). Then B„+1(e) c B„(s) and the set Bn(e) is obviously empty.
11 = 1
P(BM) = P(A„(e)).
holds as well. Conversely, assume that (2) holds. Let D(e) denote the event
lim sup | rjn \ > e (e > 0, arbitrary).
VII, § 7] THE STRONG LAW OF LARGE NUMBERS 395
proved.
Then
/’(lim — M) = 1. (4)
are simultaneously fulfilled with a probability > 1 — <5 for e > 0 and 5 > 0
however small, if the index n is larger than a number n0 depending on e
and <5.
Proof of Theorem 1. We consider
If the inequality AN > e is fulfilled for an N such that 2s < N < 2S+1, then
A2i 2i+ i > 6 is fulfilled for at least one l > s. Hence
CO
2/+1 D2 2D2
P( max k | Ck ~ M \ > e2‘) < 2- -- (7)
1^A:<2'+1 £ *•* £2-2'
x 2D2 ” 1 4D2
^ - e) - ^2 Ys ~ ^272? • (8)
l =S
N
If N -> oo, it follows from (6) and from 2s > — that
2
lim P(d„ > e) = 0.
N-+CO
C„ = 4- i (& - AQ,
n k=l
then
P(lim C„ = 0)= 1. (9)
n-»co
1 "
= —If*
n fe=I
VII, § 7] THE STRONG LAW OF LARGE NUMBERS 397
P( lim C„ — M) = 1.
n-+ oo
It follows from AN > e, 2s < N < 2S+1 that d2;)2;+i ^ £ for at least one
l > s; hence
oo
P(AN>e)<YJP(M 2m>£). (10)
l=S
1
P(d2,j2;+i > e) <P( max k\£k\>e • 2l)<
22/e2
E
k=1
Dl
l<,k<2,+ i
Hence by (10),
1 oo i 21+1-1
o i=s £ k=1
Now it can be shown that the right hand side of inequality (11) tends to
• £ Dl
zero as n increases (hence as N increases too) provided that the series 2^
k = l /v
is convergent. To show this we need the following lemma due to L. Kro-
necker.
00
Lemma 2. If the series E ak is convergent and if qnis an increasing sequence
k=\
of positive numbers, tending to + co for n-> Co, then
1
lim -- E akHk = 0. (12)
«-► oo din k=1
398 LAWS OF LARGE NUMBERS [VII, § 7
small positive number) large enough in order that n > n0 should imply j rn j <
1 " 1 "
— I ak <lk = — E rk (qk ~ qk-1) ” rn +i- (13)
1in k=l Qn k=1
1 ^ j Aq 2e
I akqk\<—^ + —
<ln k=l qn 3
Choose now nx > n0 such that —— < -—. Then for n >
qni 3
1 "
— E < 8,
cln k = 1
E^
lim -*=1. = 0;
hence the right hand side of (11) tends to 0 as iV -> oo. This and Lemma 1
lead to Theorem 2.
e* = f I^ k,
[ 0 otherwise
£**
Sfc
_ p _ Cat
— Qic
p* •
VII, § 7] THE STRONG LAW OF LARGE NUMBERS 399
Put
1 n 1 n 1 «
v 1 v-i e r** _ V k**
9n 2-i 2-i „ 2u •
n k=i n k=i n k■ = '1
Let F(x) denote the distribution function of the Since we have assumed
M = 0, we have
P( lim £* = 0) = 1, (14a)
Theorem 2 applies to the random variables ££, since it is easy to show that
00 D\Zt)
Z>(£*) exists and that X < + * oo. Indeed
k=1 k2
Dxek)<E(.et*)=f
—k
*?dF(x).
Hence, because of
1 1 1
k 2 “ Jr* Kki -1) r
we have
-7 + 1
+ °o
<2 f | x | dF(x).
— 00
Hence (14a) must hold. Now consider the random variables £**. We have
n — 1
lim lim Zn~ = C - C = 0.
n-*- oo n n-+oo n
Thus in
> 1 holds with probability 1 only for a finite number of values
+ 00
hence j | x \ dF(x) exists and M = E(£k) exists, too. Hence by the first part
Then
P(lim An= 0) = 1. (1)
Af-oo
1
An < max (A$, A$) + (2)
M ’
where
= max ! Fn (xM>fc) - F(xM<k) \,
1 <.k^M
and
P( lim Fn(x + 0) = F(x + 0)) = 1.
N-*- co
This particular form shows clearly that the strong law of large numbers
and Glivenko’s theorem have a definite meaning even for the practical
402 LAWS OF LARGE NUMBERS [VII, § 9
case when only finitely many observations are made. In fact, always when
a large sample is studied, this theorem is implicitly used; hence it has the
right to be called the fundamental theorem of mathematical statistics.
On the other hand it must be noticed that Glivenko’s theorem does not
give any information how N0 figuring in (4) depends on £ and 5. This ques¬
tion will be answered by a theorem of Kolmogorov dealt with later on
(cf. Ch. VII, § 10).
The strong law of large numbers can be still further improved. To get
acquainted with the methods needed for this, we give here first a new proof
of Theorem 1 of § 7 concerning bounded random variables. The proof rests
upon the Borel-Cantelli Lemma A.
Let £l5 £2,. . ., ... be independent and identically distributed bounded
random variables, and suppose \£n\ < K. Suppose further E(£n) = 0,
1 "
and put D = D(£n) and £„ = — Y £&• Then, according to Theorem 1 of
n k=1
§ 4, we have the inequality
P(\C„\>e)<2q" (1)
D2 a2
for 0 < e < ——■-, where q — exp 2 . From (1) follows
K f eK
2D2 1 +
. ID2 /
the convergence of the series
(2)
n— 1
Then
(4)
Ck = £i + £2 + • ■ • + £k — kM.
Then
(5)
are fulfilled. Let j?*. denote the event — Cl- > — 2 and yl the event
£n > x — 2 ,/nD. If both ^4^ and Bk occur, A occurs as well. The events
Ak (k = 1,2,...,/?) mutually exclude each other, thus the same holds for
the events AkBk; the events Ak and Bk are evidently independent since Bk
depends only on the random variables £fc+1,. . ., b,n and Ak depends only
on h,. . ., £k. Since A1B1 + . . . + AnBn £ A, the independence of Ak
and Bk implies
n n n
Y.P(Ak)P(Bk)=Y.p(AtBt) = P(YiAkBk)<,P(A). (6)
k—\ k=1 k=1
(8)
Thus
(9)
404 LAWS OF LARGE NUMBERS [Vir, § 9
In order to show that with probability 1 at most finitely many of the events
Ak(e) occur, it suffices, in view of the Borel-Cantelli lemma, to show that
00
k=kk = V +sf
V Nk +1
we conclude from (12) by applying Theorem 1 of § 4 that the following
relation holds if k is large enough:
9
kk
P(Ak 00) ^ y exp (13)
2 fl + ]
JJ
Since lim kk
k-+co 72 In InNk and lim = 0
there exists a number k0 depending only on e such that for k > k^we have
kl
^ (1 + e) In In Nk > (1 + £)ln (k In y).
HkK
2 1+
VII, § 9] THE LAW OF THE ITERATED LOGARITHM 405
The series ^ T(TA.(e)) is thus convergent for every positive £ and according
k=1
to the Borel-Cantelli lemma,
hn (15a)
lim sup_ < 1 = 1.
«-<» yj2n In In n
_ Jn_ (16a)
P lim sup
/!-*■ 00 yf 2n In In n
and
( Cn~P I (17)
P lim sup < 1 = 1.
n-*- co 2pq In In n
n
406 LAWS OF LARGE NUMBERS [VII, § 10
/ Ch~P
lim sup
2pq In In n
n
/ <s
\
P lim inf- . <sn ~ P = -1 (18)
j n-»oo / 2/R/ln In n
\ V /
Proof We use elements of the theory of Hilbert spaces. Let be the set
of all random variables £ for which E{?) exists. Put (& q) = Etfq) and 11 11 =
7 ^2 - ^ is then a Hilbert space. Let ct„ denote the indicator of the event
1 for co£An
an = V-n ifO) -
0 for co£A„ (» = 0, 1,...).
If/J is the indicator of B and if cc„ - d = y„ we can write (1) in the form
hm (fi, y„) = 0,
(3)
VII, § 10] SEQUENCES OF MIXING SETS 407
We show that (4) implies (3) for every p £ 3tf (hence not merely for the p
which are indicators of sets).
Let denote the set of those elements of ^ which are linear combinations
of the yn or limits of such elements, in the sense of strong convergence,
that is 8n -* 8 means that lim || 8n - 8 || = 0. In other words, 3t*x is the least
B P3tf,. In fact, in the latter case there exists for every a >0ay = £ ckyk
k=i
with I] ft — y || < a. Because of Schwarz’ inequality and of
we have
By (4) lim (y, y„) = 0, thus lim sup | (P, y„) \ < a, since for a > 0 there
n-*-cc «-*-oo _
can be chosen any positive number however small. (3) is theretore proved
for every p 0 3f x.
Let now 3T2 be the set of elements 8 of 3? such that (8, y„) = 0 for
n = 0,1, ... . 3t2 is then the subspace of 3? orthogonal to 3f For
P 0 3ft? 2 (3) is trivial. Now according to a well-known theorem of the theory
of Hilbert spaces1 P £ 3f? can be written in the form P = Pi + Pi, where
j5x £ 3ft? x and p2 £ 3ft? 2. Furthermore,
hence (3) holds for every P £3?. Theorem 1 is thus proved. As an applica¬
tion we prove now a theorem which shows new aspects of the laws ol large
numbers.
Proof. Choose two numbers a and b such that P(a < £ < b) > 0 and
a, b are two points of continuity of the distribution function of £. Then
P(a — C« < b) > 0 for n > n0. Let A0 = Q and let Ak denote the event
a — C«0 +k+i < b {k — 1,2,...). For k > 1 we have
P(An | Ak) =
for any e > 0 whenever n is large enough. Similarly, for sufficiently large n,
will be called a stable sequence of events} We shall prove first that the set
function Q(B) on the right hand side of (1) is always a measure, i.e. we prove
dQ
— tx(co) (2)
~dP
Let A'n be a mixing sequence of events in the probability space [Q, Px]
with density dx and A"n a mixing sequence of events in the probability space
[f3, ^€,P2] with density d2(0 < dx < d2 < 1); put An = A'nQ1 + A"Q2-
Then clearly we have for every event B £ ^
where
g(5) = d1P(BC21) + d2 P{BQ2).
if to £ Qx,
if co £ i22,
then
2(5) = f ocdP.
i
Thus the sequence of events {A,.} is stable but not mixing, since its density
is not constant but assumes two distinct values with positive probabilities.
Clearly, there can be constructed in a similar manner stable sequences of
events with densities having an arbitrary prescribed discrete distribution.
Now we shall prove the generalization of Theorem 1 of § 10 concerning
stable sequences of events.
exist, then the sequence of the events {An} is a stable sequence of events.
by (£> fi) = E(fq) and || £ || = (£, £)2, respectively. Let a„ be the indicator
of the event An. Let 3?\ be the subspace of the Hilbert space ^spanned by the
elements oq, a2,.. ., oq, . . .; thus Af*x consists of the finite linear combinations
(with real coefficients) of the elements of the sequence {a*} and of the
(strong) limits of these elements. It is easy to see that if £ £ jrx, then the limit
while if £ is the limit of linear combinations of the a.k, the limit (5) exists
again, since
L(0 = (L a),
i.e. the sequence a„ converges to a, in the sense of weak convergence in the
Hilbert space. (A sequence of elements a„ of a Hilbert space is said to con¬
verge weakly to a (a £ if for any element £ £
Theorem 3. Let a„ denote the indicator of the event An and the Hilbert
space formed by the random variables with finite second moments defined on
the probability space [Q,ts£,P]. The sequence of events {A„} belonging to
the probability space [Q, P] is stable, iff a„ converges weakly in 3? to an
element a £ . If the sequence of events {An} is stable and if a„ -*> a, then a
is the density of the sequence of events {A„}.
A stable sequence of events {A,,} is mixing, iff there exists a number
d(0 < d < 1) such that for every event A
Theorem 4. From any sequence of events one can select a stable subse¬
quence.
(1)
whenever < n2 < . . . < nk.
First we prove the following theorem:
hence
lim P(A„) = Pl,
VII, § 12] SEQUENCES OF EXCHANGEABLE EVENTS 413
further by assumption
Pz = ^2 ^i)> if n > 3,
hence
Pz = lim P(A„ A2 Afi = dP(A2 Ak) = d3,
n-*-oo
Pk = dk,
i.e.
P(Ani Ant... AJ = P(AJ P(AJ ... P(AJ,
whenever 1 < ^ < n2 < . . . < nk. But this means that the events An are
independent.
Now we prove a theorem due to B. de Finetti.
Proof. By Theorem 1 the sequence {An} is stable. Let a denote the den¬
sity of this sequence. Then
hence by Theorem 3 of § 11
Similarly,
p3 = P(An Ak A,) if n> k > l,
thus
Ps = lim P(An Ak A,) = j cexk a, dP,
n-<- oo O
hence — by taking the limit first for k -> oo, then for / -» oo, — we obtain
that
P'i = \ot3dP
h
Pk = f *k dP. (3)
n
Let now F(x) denote the distribution function of the random variable a.
Thus (cf. Theorem 6 of Ch. IV, § 11)
which was to be proved. The proof gives, however, somewhat more than
stated by Theorem 2; in fact we have proved the more general
i w-iy
' /'
= (-l )'A'pk+l, (6)
j=0 ,J ,
{-\)lAlpk+l>0 (7)
hold. Sequences of numbers having property (7) are called absolutely mono¬
tonic sequences. Hence an absolutely monotonic sequence is nonincreasing,
its first differences form a nondecreasing sequence (i.e. the sequence is
convex), its second differences form a nonincreasing sequence, etc. Note
that inequality (7) can be obtained from the representation of the sequence
of numbers pk in (2) or (3), since
Hence we can see from (5) that given the sequence pk, the joint distribution
function of a finite number of random variables chosen arbitrarily from
oq, a2,. . ., a„,. . . is given as well; the conditions of compatibility are,
obviously, fulfilled and thus the existence of th e(exchangeable) sequence of
events with the required properties is ensured by the fundamental theorem
of Kolmogorov.
416 LAWS OF LARGE NUMBERS [VII, § 12
«i + «2 + • • • + a„
lim = a. (9)
n-*- oo n
Proof. Let
E (a/t~a)
a-, + a2 + + fc=i
tn =
72 77
oo
Thus the series E -£((«) is convergent, hence (by the Beppo Levi theorem)
n—1
00
the limit lun ~ exists with probability 1 and is equal to the density of the
means of the random variables ock is, in general, not constant — contrary
to the case of independent random variables.
Let us now consider as an example Polya’s urn model (cf. Ch. Ill, § 3,
Point 9). Let there be in an urn M red and N — M white balls. Balls are
drawn at random, the drawn ball is replaced and simultaneously there are
added into the urn R > 1 balls having the same colour as the one drawn.
If An denotes the event that the ball drawn at the «-th drawing is a red one,
according to Formula (10) of § 3 in Chapter III the sequence of events
{An} is exchangeable and
k-1 M + 1R
Pk=
/=o N+1R
n (k= 1,2,...). (16)
IN
r M N-M
R
F(x) = tR (1 ~t) R dt. (18)
M' ' N — M'
F r
R
Thus by Theorem 4 in case of Polya’s urn model the relative frequency of
the drawings yielding a red ball among the first n drawings converges with
probability 1 to a random variable having a beta-distribution of order
M N M prom tkjs jt follows that the distribution of this relative fre-
R ’ R ,
quency converges to the mentioned beta-distribution. In fact, if a sequence
t]n of the random variables converges with probability 1 to a random vari¬
able ri, then (cf. § 7) also r\n tends in probability to rj and thus (cf. Theorem 1
of § 2) the distribution of rjn tends to that of r\. Hence we have
Theorem 5. Let in Polya’s urn scheme vn denote the number of red balls
drawn in the course of the first n drawings, then
N
r R PL _i N~M _i
i.e. the limit distribution of the relative frequency of the red balls drawn is
M N-M
a beta-distribution of order
~R* R
In particular, if M — R = 1 and N = 2, the relative frequency of red
balls will be in the limit uniformly distributed on the interval (0, 1).
Furthermore, it is easy to see that Formula (10) of Chapter III, § 3 is a
special case of the present Formula (5).
As is seen from this example, the general theory of stable sequences of
events permits a deeper insight into some particular problems already dis¬
cussed.
Proof. We need a lemma from set theory. For its formulation the follow¬
ing definition has to be introduced: If o# is a system of subsets of Q having
two properties:
Proof. It suffices to show that if (e^f) is the least monotone class con¬
taining it is identical with o(t^€). Let A be any subset of 12 and ^A the
VII, § 13] THE ZERO-ONE LAW 419
It follows that C 6 ~*r(C), hence P(CC) = P(C) P(C) or P(C) - P2(C). But
this is impossible, unless either P(C) = 0 or F(C) = 1. Thus Theorem 1 is
proved. Finally we mention a generalization of the above theorem.
or P(C) - 1.
The proof will only be sketched, because it is similar to that of Theorem 1.
Let ^be the collection of sets independent of C. As in the previous proof,
420 LAWS OF LARGE NUMBERS [VII, § 14
we show for the least a-algebra L$n, relative to which the random variables
£i> £2> • • in are measurable, that (n = 1,2,...). Hence, accord¬
ing to the lemma, *$(1) £ therefore C £ ^ and, consequently, P2(C) =
= .P(C). Notice that Theorem 2 of § 10 can also be deduced from this.
The series )T converges with probability 1 iff the following three series con-
n= 1
verge:
I £(*£),
n=l
(2)
I ^2(^)- (3)
n=l
Remark. It is easy to see from the zero-one law that r\t, converges
• • n=\
either with probability one or with probability zero.
We show first that the conditions (1), (2), (3) are sufficient. From
Proof.
(1) and the Borel-Cantelli lemma it follows with probability 1 that rjn = tj*
co oo
for sufficiently large values of n; hence the series £ rjn and £ q* are, with
prove this for the series £ <5„, where 5n = rj* - E(n*). We know that the
71=1
and £ Z)2(5«) < + co. Hence for an e > 0, however small, Kolmogorov’s
inequality gives
771 . N
P( max \pk\>t)& y, D\e,k).
n<,m<.N k=n £ k=n
(4)
VII, § 14] KOLMOGOROV’S THREE-SERIES THEOREM 421
Choose now from the sequence of all positive integers a subsequence rij
OO
(«! < n2 < • • •)> such that the series E dj converges, where
j=1
00
dj = Y.
k=nj
I I
k = nj
holds with probability 1 for sufficiently large j and m > nJr If n is an integer
between rij and m, we have with probability 1
m n-1 m
IE4I^I
k=n
E 41 + 1 kE
k=ni =m
41 — 2s. (7)
of denumerably many sets of zero measure being a set of zero measure too)
with probability 1 the relation
m 1
almost sure convergence of the series E V«• If this series converges with
n=l
probability 1, r]n 0 as n -> a. with probability 1. Hence we must have
with probability 1 | rjn \ < 1, i.e. f/„ = rj*, except for finitely many values
422 LAWS OF LARGE NUMBERS [VII, § 14
D~ = D\f), its characteristic function cpft) fulfils for | 11 < —the in¬
equality
DH2
3
<PfO | < e (9)
2.2
D~t < £ (2Mmra ^ p2t2
9>e* (0-1 +
n=3
i.e.
2# 2 2 ,2 r>2,:2
Z>T DU
I 9V(0 I ^ 9?^*(0 — 1 + + 1- < 1 - < e 3
N
tion function of Z 17* converges to the distribution function of if at
n— 1
every point of continuity ot the latter. Hence
OO
n w)=<ko>
n=1
exists an e > Owith \]/{t) + 0 for | t1 < e. Thus if | t \ < min (e, ~rr), then
N 4/1
- I ln I * ft) | tends to — ln | 1p(t) |. Because of (10) we have
n=1
fJE(n*) = Y 0i
n= 1 n= 1
Z E(dln) (12)
n=1
and
Z D% (^) (13)
n— \
1 1
if P Vn = ± - — ’ EQln) = 0, D (f]n) = —, , then the conditions of
n
however, the series E rjn converges with probability 1 for any rearrange-
n=1
ments of its terms; the sum and the set of its points of divergence depend
of course on the rearrangement in question.
00
to a well-known theorem of Beppo Levi, that £ E(\ rjn |) < + oo. On the
11 = 1
00
other hand, it is sufficient that the series (1) and £ E(\ rj*|) converge, where
«=i
)]* is defined as in Theorem 1. This condition is necessary as well, since if
Theorem 1 is applied to the sequence \r]*n\, it can be seen that the con-
00 CO
theses of Theorem 2 are fulfilled; hence the series ]T ijn converges with prob-
77 = 1
ability 1. According to Kronecker’s lemma (Lemma 2 of § 7) with qn = n it
follows that with probability 1
E krlk
1 n
lim
n-*- cg
k=1-
yt
lim — Y 4 = o,
Ti—oo ^ Ar=l
Pn = P(Bn\C) (3)
X Pn = +00 (4)
n =1
and
00 p D2
e < +»■ (5)
“-1 ( Eft
k-1
)2
Define
S,(V)= £ it and JV„(0 = £ 1. (6)
1 <J<<,n 1 <.k<,n
S„(V) is thus the sum of the b,k (1 < k < n) whose values belong to V and
Nn{V) is their number. Then
Sn(V)
P lim =M
72-*-CO Nn{V)
YPj
426 LAWS OF LARGE NUMBERS [VII, § 15
D\8k | C) =
(YpjY
7=1
Y Ek (f* - M)
lim _ c = 1. (7)
Z
k=1
Pk
Put now
£k ~ Pk
Vk= —k-
I/*
7=1
00
By repeating the preceding reasoning for the series Y *1k we find that
E(nk | C) = 0 and
f n
\
Y £k
p lim k=l
-1 c = 1. (8)
n-+ oo
V YPk
k-1 /
[ " \
E ekU
p lim fc=1 — M c (9)
n-+ co
l E £k /
x fc=i
Remarks.
1. If Dk is bounded, e.g. independent of k, the condition (5) is a conse¬
quence of (4) by the Abel-Dini theorem; hence in this case it suffices that
(4) is fulfilled.
00 E Pk I Mk - M |
y
^
*=i_< n
+00 (io)
n=1 v—\
E Pk
k=1
3. If the whole real axis is taken for V, then clearly Bn = C and pn= 1;
. . “ D2n < + 00.
hence (4) is trivially fulfilled and (5) reduces to the condition y —
n=1 72
428 LAWS OF LARGE NUMBERS [VII, § 16
4. Consider now the following special case: Let V be the set which con¬
sists only of the two elements 0 and 1, let further be P(£ = 1 | Bn) = p and
P{£ = 0 | Bn) = 1 — p = q. This situation can be described as follows: &
represents an infinite sequence of independent experiments, A and B are the
possible outcomes of the individual experiments. If at the k-th experiment
both A and B occur, we have t,k — 1; if A and B occur, we have £k = 0;
finally if B does not occur at the &-th experiment, takes on a value
distinct from 0 or 1. Then
M = E(UBn)=P(l;n=l\Bn)=p.
and (4) implies (5). The quotient fn{A \ B) = -*?" ^ is thus the conditional
§ 16. Exercises
1 ”
lim - Yj Mk= M;
n-fco n k=l
lim
n~*- co ^ X
*=1
D‘‘ =°
VII, § 16] EXERCISES 429
Z Z rh x‘ x> ^ c i'Z= l x‘
i=l /=1
cb
for every system of real values x, such that Z xf converges; C is a positive constant.
(=i
Given these conditions.
lim st — Y £k = M.
o « *= i
1 "
hence E(£n) — 0 and £>(?„) = .Therefore the condition lim — Z — 0
n f /~k _ t3
<pn{t) = J~[ cos —^— and lim (r) = e 4 ;
A: = 1 ^ n —► oo
Pttn = ±n6)= y-
Show that the law of large numbers holds for the sequence if 0 < 5 < — •
6. Let the events Au A2, . . ., An be the possible results of an experiment. Let there
be performed N such independent experiments. The probability that the event Ak
occurs exactly vk(N) times (k = 1,2and in a given order, is equal to
n Pltm>
k=l
The quantity H(d) = — ^ P^HzPk is called the entropy of the complete system of
k=l
events d — (Au A2,. . ., A„) (cf. Appendix). Prove the limit relation
vk(N)
lim st ——— = pk for k = 1,2,..., n.
N-+ 03 A
7. Let an urn contain a0 white and b0 red balls. If we draw from the urn a white
ball, we put it back and besides we add to the urn ax white balls and bx red balls.
If we draw a red ball we put it back and add to the urn a2 white and b2 red balls where
a\ + bx = a2 + b2, a2 > 0. The same procedure is repeated after all subsequent
drawings. Let C„ denote the number of white balls drawn in the first n drawings.
Prove the relation
lim st — =
bi +
8 a) Let r]n{n — 1, 2,...) be bounded random variables, |7?„| < C. The necessary
and sufficient condition that rj„ should converge in probability to zero is the fulfilment
of the relation
lim E( \Vn\ ) = 0. (1)
n —► oo
P( \r)„\ >£)<
e
By assumption lim P(A„(8)) = 0. Hence lim sup E(\Vn\) < 8. Since 6 can be arbitrarily
11 1 n-*CO
small, the necessity of (1) is proved.
b) Suppose that lim st £n = c and that/(a) is a Borel-measurable bounded function
n —*■ co
which is continuous at the point c. Then lim £(/(£„)) = /(c).
/!-> 03
Hint. Evidently, lim st </(£„) -/(c)) = 0. Since f(x) is bounded, it follows because
n —co
of Exercise 8.a) that
hence
lim £(/0 = /(c).
9. Let /(x) and ,g(x) be continuous functions on the closed interval [0, 1 ] which
fulfil the relation 0 < f(x) < Cg(x), where C is a positive constant. Then
dx
*. if...+ + dXidXt_I™.
J -J Axi) + Axt) + ■ • • + 9(xn) f-f g(x) dx
g(X)
Mi) + AO + • • • + AO
Vn =
and
.Ml) + fffe) + • • • + Mn)
n
We have thus
1 1
lim st r\n = J /(x) dx, lim st £„ = J g{x) dx,
1
and, since J g(x) dx > 0, we have by the result of Exercise 3,
J/(x) dx
Vn 6
lim st
n —cd (3/7
j gix) dx
q.e.d.
11. Let g(s) denote the Laplace transform of a function fix) which is bounded and
continuous in the interval [0, +°o):
oo
g(s) = J e~sxf(x) dx.
o
Prove the Post Widder inversion formula
(- l)"-i„y«-1> _
f(x) = lim for a > 0.
x"(n - 1)!
Hint. Let • ••» 4> • • • be independent random variables having the same
t
exponential distribution with expectation a, i.e. < /) = 1 — e~ x for t > 0 If we
1 \
put 4 — — ^ {*> then lim st £„ = a, hence (see Exercise 8.b))
n k=l «-* co
lim E(/(Q)=/(a).
n —► co
nt
n"tn 1 exp nn (— l)"-'^-»|J.
x
mo) = m in - 1)! a"
dt
A"(n - 1)!
12. Let r, be uniformly distributed on [0, 1], Let £„(/•) denote the number of occur¬
rences of the digit r ir= 0, 1, ..., 9) among the first n digits of the decimal expansion
of show that
Ur) 1
a) P lim = 1 Cr = 0, 1,..., 9),
\n —oo lo
( n
4(0
b) lim sijp f—
To
2/i In In n 3 /
V
Htnt. Let the random variable 4(r) be equal either to 1 or to 0 according to the
«-th digit in the decimal expansion of y being equal to r or distinct from it; then
4(0 in = 1,2,...) are independent and have the same distribution P(£ ir) = 11 = -i_
9 10’
p(Ur) = 0) = — (r= 0,1,..9). a) is obtained from the strong law of large numbers,
i*(£*=±D = y (k=
1 1 2 n\ „2n-2k
In - 1
E{q^) = ^r 1 +
k
+ n - 1
k=0 \
q +
E(q^) < 11 + j
c) Using b) show that
c„
P |lim sup < 1 = 1.
yj In In In n
d) Show that
( k - i \
k - 1
n—1
E(U = I 2k — for n = 2, 3, 4, . ..
k= 1
0 otherwise.
Hint. Let p„%k = P(Cn = k) (k = —1, 0, 1,We have the following recursive
formulas:
Pn + 1. —I — 2 1 Pn.o)
Pn +1.0 — 2 P"’l‘
a) follows by induction.
434 LAWS OF LARGE NUMBERS [VII, § 16
1C,
< e {m = n, n + 1,. . k — 1) and > £
m
hold and put A = Ak; then A, An, A„+1, ... is a complete system of events. Hence
k=n
l l
> £ £(£ | Ak)
m=k /n2 (m + l)2
If m > k, it can be shown as in the proof of Kolmogorov’s inequality, that
E{Fm\Ak)>E{tt\Ak)>k*e\
Hence
E(V | AJ > e2,
and
16. Prove the following generalization of Exercise 14. If f2, . . . are completely
independent and if £(£*) = 0, D\Zk) = D\ , further if 0 < Bk < B., < ... is a sequence
03 jy2
of positive numbers such that £ + oo, we have, for e > 0,
fc=i Ek
1 f 1 Dl
asS7§r £2 l E« + .E H
^ t=n+l Bk
17. Deduce from the inequality in Exercise 16 the following theorem: Let rju r]2,. . .,
r)„,. . . be completely independent random variables with expectations E(rjk) = Mk > 0
and with finite variances D2(rjk) = D\. Suppose that
«) Y, Mk= +°°,
k=1
Dl
(5) the series is convergent. Then with probability 1
k= I
(I M<)‘
/= 1
E Vk
k= 1
lim = 1.
E
k=1
Hint. This is a generalization of Theorem 2 of § 7; in fact, if % = £* — Mk + 1,
oo £)2
then E(j]k) = 1, Z)(%) = Dk\ thus if £ -77 < +°o then with probability 1
k=i *
1
lim E % =1
and thus
Thus if conditions (4) and (5) of Theorem 1 of § 15 are fulfilled, conditions a) and
/3) of Exercise 17 are fulfilled as well and it follows with conditional probability 1
with respect to the condition C that
Yek{Zk-M+ 1)
lim —-^-- = 1. (5)
CO
E Pk
k=1
If we apply the theorem in Exercise 17 to the sequence r\k — ek we have again, with
conditional probability 1 under condition C
E £*
k= 1
lim -V-- L (6)
k=1
±Pk
From (5) and (6) it follows that
/
E £^k
lim -^-4-= M = 1.
E £*
fe = l
436 LAWS OF LARGE NUMBERS [VII, § 16
19. If the random variables £ls f2> . are identically distributed and if
the fourth moment of q. exists, then for the validity of the strong law of large numbers
it is sufficient that the £* are four-by-four independent (instead of being completely
independent); thus, if any four of the random variables are independent and if
£(£*) = 0, Z)2(|„) = D-, E(Z£) = Mt, further if we put £„ = — £ £*, then we have
P( lim C„ = 0) = 1.
n—► co
PC I £„ | > e) < ^* + Mn - 1} Di .
e4 e4 «4
Hence the series £ P(| | > £) is convergent and we can use Lemma A of § 5. (The
n= I
idea of this proof is due to F. P. Cantelli.)
20. If £t, £2,.. ., are identically distributed random variables with finite variance,
it suffices for the validity of the strong law of large numbers the still weaker criterion
that the £k are pairwise uncorrelated (instead of completely independent).
4 = ^1 4-
12 k= 1
According to Chebyshev s inequality, P(|C„a | > £) <—-—— ; hence the series
n2
to-
e2
CO
max
»2+1£V<0i + 1)2
I
fc = «2+l
4 < en2 (8)
for a sufficiently large n. (7) and (8) lead with probability 1 for n2 < N < (n + l)2
to the inequality |£„| < 2f for a large enough n, which proves our statement.
Remark. By a different method1 it can be proved that even the convergence of the
® ]n2£
series D\ ——- is sufficient.
k— 1 K
22. Let us develop the positive number x lying between 0 and 1 into Cantor’s series
<7l q-l ■ ■ ■ Qn
belonging to the sequence qn (q„ > 2, qn integer), where the “digits” e„ (x) may take on
the values 0,1,..., Q„ - 1 (n = 1,2,...). If V is a random variable uniformly
distributed on the interval (0, 1), let £„(£) denote the number of digits e,(rj) equal to k
(j = 1,2,...,«). Assume that the sequence q„ fulfils the conditions lim qn = + 00 and
00 1 n~*‘00
V — = + oo. Show that
n=\ (In
Cn (k) for k = 0, 1,....
P lim n
n —*■ co l
Hint. Let
1 for e„ (rj) = k,
0 otherwise.
1 1
and D2(^nlc) - — the convergence of
Hence, for q„ > k, E(Z„k) —
<In Qn
follows from the Abel-Dini theorem. Thus we can apply the result of Exercise 17-
The statement of the present exercise can also be obtained as a particular case o
Theorem 1 of § 15.
23 Let ?? be the frequency of the event A in a sequence of n independent experi¬
ments, while P(A) = p(0 < p < 1; q = 1 - p). Prove the complete form of the law
of the iterated logarithm, i.e.
Vn - nP (9)
jP I lim sup- = 1=1
n—<x> 2npq In In n
and
Plton inf —-======== = — 1 j = 1.
(10)
I n—a. sj2npq\n\nn J
(ln In ri) 2
O
f h£** „a H=1 - -g) )+ Jn
Since we have (cf. Ch. Ill, § 18, Exercise 18):
K‘‘
1 - &{x) > - - 2
_/ 2nx
1 +
it follows that
Vn — nP
>l-e > for n > n.,
AJlnpq ln In n ln n
Vn* ~ nkp
"/— > 1 — e.
y/2nkpq ln Inn*
00
Thus the series P(Ak) is divergent. It is easy to show that the sequence Ak fulfils the
24. Let <Sj, c2, . . ., . . . be pairwise independent random variables with common
+n
a) lim J xdF(x) — 0,
w —► 00 —n
(Theorem of Kolmogorov).
Hint. Let
k for | {* | < n,
^nk
otherwise
and
1
„ Yj ^nk-
4=1
Then
Furthermore
-Uc-1)
+/z k
hence
lim D2(£*) = 0 ,
+ co it )"xdF(x) + 8n'
E(e“'n) — |l + j* (exP~-lj^F(*)j = l1 +
—n
n
where
+n
(' t2
8n I < n[F(-n) + 1 - *(»)] + — j x2 dF(x)
— £ 1 + £2 + • • • + £„•
where
X
- £(C„)
C„ - z Zk and Cn =
k=1 D(0
<P = 1 - + o (5)
Ps/n, 2n
K„ = + (9)
r _ v i ) r* Cn — ^(Cp) (im
2-1 £n — N • (10)
k=1
lim ^- = 0 (11)
Az-*- + co ^n
is fulfilled, then
Remark. The condition (11) is evidently fulfilled when all £,k have the same
distribution. In effect, in this case Dk = D, Hk = H, Sn = D^fn, Kn =
Later on it was proved by Markov that Liapunov’s theorem can also be proved
by the method of moments.
VIII, § 1] THE CENTRAL LIMIT THEOREMS 443
H\fn, hence
H
lim K" lim = = 0.
c
n-+- + oo ^n D n^ + oo %Jn
Hi < CD\,
hence
K,
!L <
lim = 0. (13)
+ O0
where
Lindeberg proved the central limit theorem under still more general con¬
ditions. His condition is, in a certain sense, necessary as well. It is formulated
in the following theorem due to Lindeberg:
Sn=iD\F (15)
k=1
and let Fk(x) be the distribution function of lk - Mk. If for every positive
s the so-called Lindeberg condition
Z «* - Mk)
r* _ *=i (17)
S n o
we have■
lim P(Cn <x) = 0(x). (18)
n-*- + oo
Remark. From Liapunov’s condition (11) one can deduce (16); indeed
we have
+ CO
AZ F*
1 ” 1 1
5isf t J ^VFfc(x)<—-g- LX
—1 ./
I* I3 <#*(*) =
£
(19)
n k~l J jaz At = 1 J
IJC! > — 00
Similarly, (16) can be deduced from (13), too. Hence it suffices to prove
Theorem 3 (Lindeberg’s theorem); then Liapunov’s theorem (Theorem 2)
will also be proved.
k-1
o«y
e - £ V- (21)
7=0 J k\ '
hence (21) holds for k — 1; if (21) holds for any k, it follows from
eiu _ £ c=
dv (23)
7=0 J 7=0 J'-
VIII, § 1] THE CENTRAL LIMIT THEOREMS 445
that (21) holds for k + 1 too; hence by induction (21) holds for every k.
Thus the lemma is proved.
We have therefore
~ itx x2;2
e Sn = 1 -— + 0X where | 91 \ < 1 (24)
2Sl
and
itx
itx x212 x3t3
eSn = 1 + + 02 where I 0, I < 1. (25)
Sn 2 52 6 Si
Now let e > 0 be given. The integral (20) can be separated into two parts:
UJ _ J1
+ sSrt
itx
t
<Pk
— ESn
e Sn dFk (x) +
I
|*|>eS7i
eSn dFk(x). (26)
Consider first the first integral on the right side of (26). Because of (25)
we have
J J J J
CaS/i ^ &S n
with
eSn
11
tfP | < —o \x\3dFk(x)^-^-Dl (28)
* “ 653 ,
—eSn
i
ItX
Sn dFk (x) = J dFk (x) + J xdFk (x) + Ff\ (29)
|x| >e<Sji \x\>eSn 1*1 >eSn
with
If we add (27) and (29), we obtain by (28), (30) and by taking into account
that E(rjk) = 0,
t2D\
<Pk = l + R(P, (31)
S„n 25?
446 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 1
with
1113 pD2 t2 r
J x2dFk(x)• (32)
|jc| > eS'rt
max Dk
lim^^-= 0. (33)
«->- 00
In fact
n
where
max D\ n
!S5fr-^2 + ^r I I *VFt(x). (34)
*=1 J
\x\ > sSfl
max Dk
1 <k<,n
lim sup < £. (35)
n-*- + oo
max Dk
1 <,k^n
< a. (37)
This can be done because of (16) and (33). Let further be a < ,thus
<, 1
2 2 Si
Hence from (39) and (40) it follows for n > rt0(s) that
t t*
[\t
n
*=i
<Pk —e < £ +r + (41)
Thus
p > 2.
Let us add that Lindeberg first proved his theorem by a different method,
viz. by a direct study of the convolution of the distributions (see § 12).
Lindeberg’s condition (16) is, as was shown by W. Feller, necessary as
well, in the following sense: If • • • are independent random
variables with finite expectation and finite standard deviation, if Fk(x) is the
448 THE LIMIT THEOREMS OF PROBABILITY THEORY rvni, § i
c. = E k=1
U-
By the same method which served to prove Theorem 3 we can prove the
following, somewhat more general, theorem:
lim V J x2dFnk(x) = 0
n-* + co k = l \x\>e
Vo! ? the books °f B- V- Gnedenko and A. N. Kolmogorov [1] and W Feller f71
ol. 2, containing the detailed discussion of many further results in this domain. ’
VIII, § 2] THE LOCAL FORM OF CENTRAL LIMIT THEOREM 449
In the preceding section we have seen that the distribution function F„(x)
of the standardized sum C* of n independent random variables £2,. . .,
. . . converges, under certain conditions, to the distribution function of
the normal distribution as n -* oo. It is therefore natural to ask under which
conditions the density function of (* (if it exists) tends to the density func¬
tion of the normal distribution. For this the conditions must certainly be
stronger, since it is known that F„(x) -> <P(x) does not- necessarily imply
F'n(x) -*<P'(x). We prove first in this respect a theorem due to B. V. Gnedenko:
£l + £2 +••• + £/!
r* = (1)
Djn
tends to the density function of the normal distribution; hence we have
1
1 o
lim /„ (x) = (2)
Jin
The convergence is uniform in x.1
1 Fig. 25. The figure represents the case when the random variables are uniformly
distributed on (- + yj 3)-
450 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 2
Proof. The supposition D = 1 does not restrict the generality. Let cp(t)
be the characteristic function of and (pn{t) that of £*. We know that
fi(x)
Fig. 25
Lemma. If the density function g(x) is bounded, g{x) < K, and the charac
teristic function
+ 00
HO = j g(x) e,tx dx (4)
- 00
+ 00
is nonnegative, then the integral j \p(t)dt exists.
— 00
J jJ
0 -v
HO dtj dv = 2 J g(pc) -
— cos 2Tx
dx. (6)
-T
J iHO dt <
J 9(-x)1~
— cos 2Tx
dx. (8)
VIII, § 2] THE LOCAL FORM OF CENTRAL LIMIT THEOREM 451
(10)
— 00
(11)
— 00
uniformly in x.
We show now that the integral
The second integral does not depend on n and becomes arbitrarily small by
choosing T sufficiently large. It suffices thus to study the first integral. In
order to evaluate it, we separate it into two parts. For w^Owe have
i/2
<f(u) = 1 - — + o(u2).
l t
V 1-7=
J l sjn
T T
dt (16)
J
n
tends to zero as n -*■ + oo. First we choose q = q(e) with 0 < q < 1, so
that | cp(t) | < q when | 11 > £ > 0.
In fact, according to Theorem 8 of Chapter VI, § 2,
lim | cp{t) | = 0.
t-*~ + CO
Since the £/t do not possess a lattice distribution, ! w(t) i ^ 1 for every
t # 0; therefore if we put
sup | 95(/) | = q
\‘\>e
+ 00 +°° +oo
r t \ n — r _ +-
J 99 [“^J dt = J I 9>(«) \ndu < y/n qn~2 j | cp(u) |2 du. (17)
v . L
VIII, § 3] DOMAIN OF ATTRACTION OF NORMAL DISTRIBUTION 453
-r '-a- / --
Since we have already shown that j | cp(u) |2 du is finite and lim sJnqn~2'—
-oo n-+ + co
= 0, the integral (16) tends also to zero as n -> +oo. All these restric¬
tions are valid uniformly inx. (2) holds thus uniformly for — oo < x < + oo.
Theorem 1 is herewith proved.
When f(x) is not bounded but for any given k fk(x) is, (2) remains still
valid. This can be shown by a slight modification of the above proof. The
condition that fk(x) be bounded for a value of k (and, consequently, also
for every n > k) is evidently necessary for the uniform convergence of
/»(*) to
c„ = E4
1 k=
limit relation
lim + =0 (2)
•^+0° J xflF(x)
-y
holds, then (1) is valid for every suitably chosen sequence of numbers (A„}
and {£„}.
Notes
1. Condition (2) is not only sufficient but also necessary for the validity
of (1). But this will not be proved here.
2. If the standard deviation of the random variables £k exists, i.e. if
+ OO
j* x2dF(x) is finite, (2) is evidently true; this follows immediately from the
— 00
inequality
r[^(-T)+0 -^00)] < J x2dF(x).
\*\>y
Thus we can see that Theorem 1 of the present section comprises Theorem 1
of § 1.
+ CO _(_ 00
and from this follows that D2 = J x2dF(x). Thus if j x2dF(x) does not
exist, the sequence of numbers {S„} for which (1) holds”cannot have the
order of magnitude ^fn. (Clearly, by the proof of Theorem 1, Sn tends to
infinity faster than jn.)
VIII, § 3] DOMAIN OF ATTRACTION OF NORMAL DISTRIBUTION 455
Proof of Theorem 1.
Lemma. If we have
y y
j xdF{x) = >’(1 — F(yf) — T(1 — F(Y)) + j (1 — F(x)) dx
and
-y
-y
f xdF(x) = YF(- Y) - yF{— y) - f F(x)dx,
J -Y
hence
and by (3)
Y +x
j
y<,(x\<,Y
| x | dF(x) <y{ 1 - F(y) + F(- y)) + a J j t2dF(t)
dx.
thus
+y
_|- t2 dF(t).
y
-y
456 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 3
Since the right hand side of (5) does not depend on Y, we conclude that
+c
J | x |dF(x) exists; the lemma is herewith proved.
— CO
(7)
j' x2dF(x)
o
Then by assumption lim 5(y) = o. Put further
<5 (y)
A(y) =
(1 -F(y))- (8)
It follows from
A(y) = y >
l
(9)
0 - F(y)) | dF{x) (1 - F(y)) F(y) - ~
that
lim A(y) = +oo. (10)
y-* + oo
^(C„) = n2.
(11)
VIII, § 3] DOMAIN OF ATTRACTION OF NORMAL DISTRIBUTION 457
n{ 1 - F(C„)) = . (12)
Put now
+ Cn
S~ = n | x2dF(x), (13)
-Cn
and let cp(t) be the characteristic function of tk. cpJt) the characteristic func¬
tion of (JSn. We have
itCn
<Pn(0 = E(e Sn ) = <P 1 r (14)
sj.
However, we have
+ Cn
itx
\x\>Cn
By the lemma of § 1
+ Cn
itx
t*
(eSn - 1 )dF(x) = - — + Rn (17)
-c„
holds with
+ Cn
r l3l*l3 , c, 1113 y^c„) (18)
I < dF(x)
' 653 6n 2n
-Cn
2n+—n-)
As regards the question whether other distributions than the normal also
have a domain of attraction, the following example shows that this is pos¬
sible. Let £1; £2,. . . . . be completely independent random variables
possessing a common stable distribution of order a(0 < a < 2) and charac-
and
OO
Rnk = Z Pnk(r).
r=2
(2)
kn
I'm Y, Pnk( 1) = A, (A)
n-*- + oo k=l
kn
Proof. Let gnk{z) denote the generating function of the random variable
Lk-
CO
Gnkifi) = X Pnk(r) Zr (|Z|<1). (4)
r=0
Clearly
we can write
The identity (38) of § 1 implies, since | gnk{z) \ < 1 and | 1 + p„k(l)(z - 1) | <
< 1,
kn
ri 9nk{f) - n 0 + Pnfif) (Z 1)) < 2 X Rnk- (8)
k=1 fc=l
which is because of (B) fulfilled for n > n0, then identity (38) of § 1 leads to
<
ll 0 +Pnk<fi)(z - 1)) - ri eXP - !))
k=1 k-t
Since fj g„k(z) is the generating function of rjn and e*z 1} that of the Poisson
k=1
460 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 5
then conditions (A), (B), (C) are evidently fulfilled. In this case rj„ = Y, £nk
k=1
has a binomial distribution:
X
PiVn =j) =
n
"i-j-r
n
n X V X \ n~j Xke~x
lim 1 -
n— + oo \j n k\
The statement of the central limit theorem is valid for certain sequences
of weakly dependent random variables. In the present and in the following
two sections we prove some results in this direction. These results have
practical importance, too, since in the applications the independence is
often only approximately true. The following theorem1 refers to samples
taken from a finite population, a situation very often encountered in prac¬
tice.
*=1
aN,k (1)
and
N
Mn\2
I
k=\
aN,k ~
AJ• (2)
n
Dfi/.n — D N Jv
1 -
N
(3)
Put further
dN,n (e)
If the condition
lim dN'„(e) = 0 (6)
N— +oo
lim P
Ca,« < X = <P(x). (7)
N-*- -f 00 D N,n
1 M,N D N,n ^ 2
< < Ne2 <m .
z aN,k n2
T n2
UN
aN.k-
Mn i , „
-N - <,eDN,n
N L)N
Hence, for N > N0 we have n = n(N) > and since e > 0 can be
We may assume
Mn = 0. (9)
462 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 5
In fact, if (9) is not fulfilled, consider instead of the numbers aN k the numbers
aN,k = aN,k--; for these (9) is clearly fulfilled and if Theorem 1 holds
N
i; (io)
the random variables (Nn and ~CNN-n have indeed the same distribution
N
and if n > — , we may take instead of n the number N - n.
(13)
and
1 for k = 0,
(15)
0 for k = ± 1, + 2,...
— 71
+ 7E
1 r N
<PN,n(0= ~2nBNn(X) j O [(1 + (16)
VIII, § 5] CENTRAL LIMIT THEOREM FOR FINITE POPULATION 463
Indeed if we calculate the value of the expression behind the sign of integra¬
tion, by taking in the product (N - m) times the first and m times the sec¬
ond term, we obtain a term multiplied by the factor e'(rn~ri)<f; such a term
vanishes therefore when the integration is carried out provided that m # n.
If N —► + oo and JV — w -> + oo, which is certainly fulfilled in our case because
of (8) and (10), it follows from Stirling’s formula1 that
1
(17)
N
1
<PN,n
D N,n ^/in
— J
n QkW* 0
ft-1
(18)
where we put
•A taNk
QtM, t) = (l-X) exp — iX + +
Jnx{\-x) &N,n
>A taN,k
(19)
+ X exp /(I - X) +
JNX{\ - X) DN,n
l
According to the lemma of § 1 we have
v2 X(l — X)
(1 _ X) e~Uv + XeKl~X)v = 1 - + Rx (20a)
with
X(l-X)\v\
R,\< (20b)
and
(1 - X) e~iXv + XeK1-X)0 =l+R2 (21a)
with
X(\-X)v2
\R*\< - —» (21b)
1 If A is fixed, (17) follows directly from the Moivre-Laplace theorem. In cur case
X depends on N, hence the latter theorem cannot be applied. But Stirling’s fprmula
leads easily to (17).
464 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 5
- COS V) <
2(1 - 2) f 1p 2
taN,k
e'ki'l', 0= 1 - + (1 + 9^) (23)
2 l ViV2(l - 2) DN,n
‘N ~ 1, (27)
\aN,k\> «^5-
lim rjN = 0.
N-^ + oo
(29)
2e< •, - <*
VNX(l-X)
Since
N-1n
( 2(1 — cos e) «(1 — COS £)
J'NX 1 < yjn exp + XlN\ ,
the right hand side of (31) tends to zero as N -» + oo because of (28) and
because
/— (( tt(l-cosa)
t/%( 1 r»r\ c «A
From (18), (25), and (31) we obtain, since e can be taken arbitrarily small
T 00
ta
2
lim cpKn =e e 2 dif/ = e (32)
N-+ + 00
D N,n! V2
M' ‘N-M
m n —m
(33)
P(&N,n = m) =
N\
n
nM
+ 00. (34)
If
.... M .
When — is constant or remains above a positive bound, this means that n
nM N
must tend to + oo with N. From-> + oo it follows, because of M < —
N 2
, N ,
and n < —, that N —► + oo, « -> + co, and M —► + oo. Theorem 1 con¬
(M N-M
k n—k
lim X = *(x). (35)
NpX~ + oo k<np + xlf np(l-p)(l — X)
by S. N. Bernstein.
M
Note further that if p — is constant and n increases more slowly than
h2
N (it, for instance-> 0), (35) can be derived from the Moivre-Laplace
c, = kt=l t*
Assume that there exist two sequences {C„} and {S',,} with Sn -*■ + oo and a
distribution function F(x) such that at every continuity point of F(x) the dis¬
tribution function of
in-Cn
hn =-c-
tends to F(x):
lim P(rj„ < x) = F(x).
«-*- + 00
By assumption
lim />( | e„ | > 5) = 0.
n-*- + oo
F(x - 8) -P(\en\>8) < P(9n + £„ < x| |£„ | < 8) < F(x + 8), (5)
F(x - 8) < Jim P(9n + £„ < x) < lim P(9n + en < x) < F(x + ,5). (6)
n~* + oo n~* + co
£n0 + k
lim P
«-*- + 00
*ln ~ < X = F(X).
(8)
Since
£nu + k Cn C, n0 + k n
tin ~
—
9„n = yi _ £"» + fc „ feC,0 +
~ 'In A > £„ = —--
VIII, § 6] APPLICATION OF MIXING THEOREMS 469
The conditions of Theorem 1 of Chapter VII, § 10 are thus satisfied and for
every B with P{B) > 0 the relation
holds. The theorem is therefore proved for every x such that F(x) > 0.
If x is a continuity point of F(x) such that F(x) = 0, we have
k=l
nn=
possess a limit distribution, then rj„ is in the limit independent of ary ran¬
dom variable 9 in the following sense: For every y such that P{9 < > 0
the relation
lim P(r]n < x, 0 < y) = lim P(rjn < x)P{9 < y) (11)
n-*- + oo n-*~ + oo
Theorem 2. Suppose that the random variables £i, • • •, £n, • • • are inde-
n
P(q00<x\B)=P(B\B) = P(B),
i.e. F(x) — P(B) = 1, which contradicts our assumption that 0 < F(x) < 1.
Hence Theorem 2 is proved.
Naturally, it follows from Theorem 2 that, under the conditions of the
theorem, the limit of rjn cannot exist almost everywhere. Still more is true:
the probability of the existence of the limit lim qn is equal to zero.
71—► + 00
The set C of the elements to £ Q for which lim qn{w) = qm(od) exists is
n-*- + oo
obviously measurable. Suppose we have P(C) > 0, then qn would con¬
verge on the probability space [12, P(A | C)], with probability 1
and therefore also in probability, which contradicts Theorem 2.
We now prove a lemma.
Q(A) = J X(a>)dP.
A
VIII, § 7] SUMS OF A RANDOM NUMBER OF RANDOM VARIABLES 471
C„ = ^=- (i)
Vn
the relation
lim P(Cn < x) = <P(x) (2)
n— + oo
1 The sequence of events {A„} is also mixing with respect to [Q, cA, Q] since
Q*(A) = Q(A | B) is also absolutely continuous with respect to P. Hence by Lemma 2
lim Q(A„ | B) = d.
rt-* + oo
2 Cf. P. Revesz [1].
472 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 7
Proof. Put
(6c) is a consequence of (3); (6a) and (6b) express the fact that
cni, cni, • ■ ; Cnk,. . . is a probability distribution. The three conditions (6)
can be expressed bv saying that (C„*) is a permanent Toeplitz matrix. A
theorem1 known from the theory of series permits to conclude that, if
lim Sn — S
n-+ -f oo
then
)im ktcnksk
=
w—*- + oo
= s.
1
Now
00
00
From (2) and the above-mentioned theorem from the theory of series we
obtain (4) and Theorem 1 is proved.
The situation is somewhat more complicated if we do not suppose that
v„ is independent of the variables £k. In this case a stronger condition than
(3) must be imposed upon v„. As an example we prove now a theorem
which is a particular case of Anscombe’s theorem.1 The reasoning is inspired
by W. Doeblin.
Proof. Put
n
(H)
K
Furthermore
I j hv„ hxn
Cv„
(12)
Let <5 > 0 be arbitrary. Choose N and nx so that for n>nx the inequality
P(| 9n | > N) < 8 should hold. Choose n2 > n2 such that for n > n2 the
< 2<5 for n > n2. Consequently, 6n (y„ - 1)L0 and the present lemma
follows from Lemma 1 of § 6.
According to these two lemmas it suffices for the proof of Theorem 2 to
show that
9Vfl P Q
(13)
*/y„ ~_nxn nk -
>£ = Y.P > £, vn = k\ (15)
JK k=l
max
<h - ’h,
> E < 2(5. (17)
| k — A/j| < £^<5A/i
A
Inequalities (16) and (17) prove (13).
Finally, as an application of Theorem 1 of § 6 we prove a theorem in
which v„ fulfills a condition of other type than (9).
= [«a], (18)
where [x] denotes the integral part of x. Under these conditions (4) is valid.
f P(Ak)<e, (22)
k=m +1
A a. (23)
n
Then (4) is valid.
The proof1 rests upon Theorem 3 and uses the same method as the proof
of Theorem 2.
1 Cf. A. Renyi [31]; later J. Mogyorodi [1], further J. R. Blum, L. Hanson and
J. Rosenblatt [1], have proved that in Theorem 4 the restriction that a should have
a discrete distribution can be omitted.
476 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 8
Similarly, we can show that for arbitrary integers 0 < n± < n2 < . . . <
< ns < n.
nm = (/#>)• (4)
Instead of pty we write simply pjk and instead of n1 simply 77.
VIII, § 8] LIMIT DISTRIBUTIONS FOR MARKOV CHAINS 477
Clearly for every positive integer m and for j > 0 the relation
£ /$?> = i (5)
k=0
holds. In fact, the terms of the sum are the probabilities belonging to a com¬
plete system of events. Hence the matrix /7m, which has nonnegative terms
only, has the property that the sum of terms in each row is equal to 1. Such
matrices with nonnegative elements are called stochastic matrices. The matrix
J7m can be computed from /7 as follows. According to the theorem of com¬
plete probability (cf. Chapter III, § 2, Formula (2)) we have for 1 < r < m
00
=E
1=0
co
Thus we have
The matrix of m-step transition probabilities is thus the ra-th power of the
matrix of one-step transition probabilities.
So far we have only considered transition probabilities, i.e. conditional
probabilities. In order to determine from these the probability distribution
of we must know the state of the system at the instant t = 0 or at least
the probabilities of the initial state of the system, i.e. the probability distri¬
bution P((o = k) {k = 0, 1,. ..). With the notation P(C„ = k) = Pn{k)
(n = 0, 1,. . .) one can thus write
If Co is constant, e.g. equal to j0, then P0(j0) = 1 and P0(j) = 0 for j ^ y0.
478 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 8
In this case
Pn(J<)=P(j:l (11)
A-A A |
n=
n 1-Aij’
A
W) - (12a)
A + /i
and
P
p«(°)=TVv+ c1 -k - t*r p o(o) A + p.
(12b)
where P0(l) and P0(0) are the probabilities that at time 0 the machine works
and does not work, respectively. Since 0<A<l,0</i<l,we have always
| 1 - A - ^ | < 1; hence (12a) and (12b) lead to
A
lim P„(l) = lim Pn(0) = (13)
A + p A + fi ’
initiaf itJ = ll™ have ^0) = A and Pn( 0) = ^ without any assumption on the
initial state, m this case the C„-s are independent from each other!
VIII, § 8 J LIMIT DISTRIBUTIONS FOR MARKOV CHAINS 479
on the initial distribution P0(j), then the Markov chain is called ergodic.
An initial distribution such that £„ has the same distribution for every value
of n, is called a stationary distribution. If the Markov chain is ergodic and
there exists a stationary distribution, the latter is evidently the limit distri¬
bution of It is easy to show that there exists a stationary distribution,
iff the system of equations
x0 = (1 - X) x0 + gxx,
X g
*i = -y~~— » *o = -y~— • (16)
A + fl A + fl
In this example there exists a stationary distribution and the Markov chain
is ergodic.
The following theorem, due essentially to A. A. Markov, shows that this
holds under rather general conditions.
Pt> o, (17)
i.e. that the matrix IJS has at least one column in which all elements are posi¬
tive. In this case the chain is ergodic; the limits
exist and do not depend on j. The sequence of numbers P0,. . .,PN is the unique
480 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 8
which satisfies
N
1Pj= I- (20)
j= 0
Proof. By assumption
Clearly
with
mk < Mk. (30)
If we can prove that mk = Mk for A: = 0, 1,. .N, then (18) will be proved.
Now for a suitable lx the equality
Ar?+'>=7>i,r> = £ (31)
j=O
a certain /0;
Let H be the set of all j (0 < j < N) for which p\s) - p$ ^ 0 and let H
be the complementary set of H, i.e. the set of those j (0 < j < N), for which
Pu ~ P(i?J < d holds. Put
(34)
A = Y(Ph) - Pu) and b = Yl(Pij-Pu)-
j£H HH
hence (36) is valid in both cases. It follows thus from (35) and (36) that
Qt = Z Qpf-
/=0
Qt = 1 Z0 QipP-
=
was to be proven.
The numbers Pk fulfill the equations
= 1l0 P,P&
=
(41)
Finally, let us mention the following particular case: assume that for the
matrix of the transition probabilities (pJk) the sum of all columns is equal
to 1:
E Pjk = 1 for k = 0, N.
j=o
The matrix 17 = (pjk) as well as its transpose II* — (pkj) are stochastic
matrices; such a matrix II is called a doubly stochastic matrix. In this case
(40) is fulfilled for Q — ——— (k = 0, 1,.. N); the solution of (19) being
N+ 1
1
unique, there follows Pk — . Thus for a doubly stochastic matrix II
N+ 1
fulfilling the conditions of Theorem 1 the relation
1
lim /#> holds for j, k = 0,1,..., N.
«-»-F oo N+ 1
It follows from (42) that the probabilities of the N + 1 states are in the limit
equal to each other, regardless of the initial distribution.
A particular class of the Markov chains is that of the so-called additive
Markov chains. If £0, g1}. . ... are independent random variables and
if we put („ = £0 + £1 + • • • + the random variables £„ (n = 0, 1,...)
form a Markov chain, since
P(Cn = k)
lim = 1 (44)
n-> + aO P{L = 0
holds, hence (n has in limit a uniform conditional distribution on the set of
all integers. (This may happen also for nonadditive chains.) Further, if the
expectation of £k is zero and their variance is equal to 1, then from
484 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 8
Cn
lim P < X = *(X).
n-+ + oo \Tr‘
For homogeneous Markov chains with a finite number of states the suit¬
ably standardized sum
Vn — Co + Cl + • • • + £„
I Cfc. (45)
k= 0
Then
\
Vn
2+ g
lim P < x = *(*). (46)
72-*-00 nXg{2 — X - g)
(A + gf
Xg( 2 — X — /<)
= 2(1 - X).
a + /o3~
Hence (46) reduces to
2/7
lim P < x = 4>(x).
72->-{-00 .>>2(1-2)
Theorem 2 is thus a generalization of the Moivre-Laplace theorem.
Proof. If z„ denotes the instant when the system returns for the n-th
time to the state Ax, we have 0<t1<t2<...<t < • r __ j.
VIII, § 8] LIMIT DISTRIBUTIONS FOR MARKOV CHAINS 485
= 0 for k < or xn < k < t,,+1 (n = 1,2,.. .)• Put <5t = t,, bn — xn —
— t„_i. It is easy to see that <5„ are independent and (<5! excepted) have the
same distribution. For by the definition of Markov chains, xn — x„_x is
independent of the random variables <5l9 52, ■ .., <S„_! which depend only
on the states of the system at instants t < t„_x. The fact that the random
variables 8n (n - 2, 3, . . .) are identically distributed follows from the ho¬
mogeneity of the Markov chain. Clearly for every n> 2
P(5„=l)=l-n
and
P(5„ = k) = pX( 1 - X)k~2 for k > 2.
( X+
x*~k^r
lim P < x *(x). (47)
k-* + oo Jkn{2 -X-fi)
X
Now obviously P(r]n < k) — P{xk > ri)\ in fact rj„ < k means that up
to the moment n the system was less than k times in the state Ay, thus its
k-th entrance into the state Ax occurs after the moment n, hence xk > n and
conversely. If we put
nXfi( 2 — X — ju)
—(X + Ixf~ ’
hence
f X \ l X + \x
V n~ *k - ■—« —k
X + /i X 1
P < x =P >- x + O (48)
nXn(2 - X — /x) yjkpi2 - X - /.l) A
v v a+nf
(
In ~ n
X + fx
lim P — < x = 1 - 0(—x) = d>(x). (49)
+00 nXn{2 — X - n)
/
(* + A*)3
Theorem 2 is thus proved.
the trivial solutions G(x) = 0 and G(x) = 1 excepted, are the functions of
the form G(x) = exp ( — Ax) with A > 0.
The meaning of (1) becomes particularly clear if we interpret £ as the dura¬
tion of an event which takes a certain lapse of time to occur. In this case (1)
expresses the fact that the future duration of an event which is still in course
at a moment y does not depend on the time passed already since the begin¬
ning of this event.
Arrange the random variables Ci, £2, ...,£„ in increasing order and let
Ct = Rk{M
be the A-th of the ranked variables Cj- Then1
. If we put Co — 0 and
nX
1 F(x) being continuous, the probability that two of the £, are equal is zero; this
possibility can thus be omitted.
488 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 9
where <5,- (J = 1,2,..., k) are independent and possess the same distribu¬
tion. Equation (7) shows that the variables (* form an additive Markov
chain. By means of (7) the distribution of £* can be determined explicitly.
Let the preceding result now be applied to the theory of order statistics.
Let Ci> C2, •••>£« t>e independent random variables with the same contin¬
uous distribution function F(x). As above, put = Rk(£lt f2,. . .,£J,
hence £*<£*< •••<£* are the variables ^arranged in increasing order.
The theory of order statistics deals with the study of £*; f * is called the
/c-th order statistic. This study can be reduced to the case when £k are expo¬
nentially distributed and by (7) we then have to consider sums of indepen¬
dent random variables only. In order to show this, put
1
Ck= In (k = 1,2,...,n) (8)
and
Ct = 7?, (Ci,..C„) (9)
1
Since In ~f\xj 1S nonincreasing, we have
1
Ct = ^ (k = 1,2,...,«) (10)
and as are independent, the same is valid for (k.
Consider now the distribution of £k. Let y = F~\x) (0 < * < 1) be the
inverse function of x - F(y) (- oo < y < + oo). Then the relation
is valid, i.e.
for 0 < x < +oo. Hence the random variables (k are exponentially distri¬
buted with expectation 1. Thus can be written in the form
Si <5«+l-k
cn
£* = F_1 (e-<«-n-*) = F~ 1 exp
n n— 1 k JJ . (12)
where <5t, d2,. . .,<5„ are independent random variables with expectation 1.
Our result implies the theorem of van Dantzig and Malmquist stating that
P(E* )
the ratios —v ----- (/c = 0, 1,. . ., ri) are independent of each other
F(£t)
(cf. Chapter IV, § 17, Exercise 17). Indeed we have according to (12)
F(xJ+l) (14)
= j In l<j<k-l,£t=xk
f(xj)
F(x)
= P\Sn+i-k<k In a=xk = P(Z*k+x<x \ek=xk). (15)
F(*k)
holds for 0 < x < 1, the random variables F(f*) are uniformly distributed
in the interval (0, 1). The random variables F{Ck) are thus the ordered ele¬
ments of a sample selected from a population uniformly distributed on
(0, 1).
490 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 9
Starting from this point of view, many results on order statistics can be
derived quite easily. As an example, consider the following problem: What
is the limit distribution of the random variables S* when both n and k tend
k
to infinity in such a way that— tends to a limit q(0 < q < 1)? In
'Zm-Q x
lim P = *(x). (18)
n-*+oo D y/n.
where
Q = F~\q) (19)
and
D AQ)^ (20)
Proof. We consider first the limit distribution of
C»jj
n+l-k(n) = In (21)
F(£*
k{n ))
By (12)
n+l-A:(n)
n+l-k(n) = I
n+l-j
(22)
7= 1
N 1 I \
£ In N+C + O — ,
k=i k liVj
M, = = In — + 0 (-1=1 . (24)
# LV*n)
Since
= J_1_
k=N1 k2 Nx N2 w)
we get
f 1
S2n = D2 (C*+1_fcW) = -5—^ + O (25)
nI ’
according to (23) and from
N• 1 1
1 +oM '
£NlJ? = JNl ~ In: TV?
it follows that
n+l-fc(n) J) _ 1 S'! 1
I ^ = O (26)
j= 1
n + 1 -y n,2 ’
IKn] 3 1
= 0 (27)
u, Jn)
Cn+l—k(n)
X
lim P = *(x). (28)
n-* + oo
C« + l-k(n) — In q < x
1 -q Jn
l-q
= P tin) >F 1 k exp —X (29)
nq
1 -q
f7-1 \q exp —x
nq
/I -q }
exp x . - 1
, \ nq
= F~\q) + (30)
JiQdn)
where lim 0n = 1; further
n-+ + oo
1
q exp —x x +0
nq I ) n n
Now (29), the continuity of/(x), and the lemmas of § 6 and § 7 imply (18),
hence the theorem is proved.
The theorem states that the empirical sample quantile of order q of a
sample of n elements is for sufficiently large n nearly normally distributed
0 for x< ,
k
F„(x) = < for F* < x <£*+1 (k = 1, 2,..., n — 1), (1)
n
1 for F*
*3n < x;
Theorem 1 (Smirnov).
1 — e 2y2 for y > 0,
lim P{Jn sup (Fn (x) - F(x)) < y) -
n-* + oo — co <x< -f co
0 otherwise.
Theorem 2 (Kolmogorov).
K(y) for y > o,
lim P( Jn sup | F„ (x) - F(x) \ < y) =
/2-^ + QO — CO < A < -f- CO
0 otherwise.
where
K(y) — +f (-\fe~2k2yt. (2)
k= — oo
Notice that in these two theorems the limit distributions do not depend
on F(x). It suffices that F(x) is continuous, this guarantees the validity of
these and all further theorems in this section. The values of the function
K{y) figuring in Kolmogorov’s theorem are given in Table 8 at the end of
this book.
The theorems of Smirnov and Kolmogorov may serve to test the hypothe¬
sis that a sample of size n was drawn from a population with a given con¬
tinuous distribution function F(x).
The theorems of Kolmogorov and Smirnov refer to the maximal deviation
between Fn{x) and F(x). Often it is more convenient to consider the maximum
F fix) — Fix)
for F(x) > a > 0 of the relative deviation —--. The follow-
F(x)
ing theorems are concerned with this relative deviation.1
Theorem 3. We have
Ffix) - F{x)
lim P
n-+ + oo H Xa^xK+vo
SUP ~^(x)-
x \A) <y
Vifr
x1
2
dx for y > 0,
0 otherwise,
Theorem 4. We have
The values of the function L(z) defined by (3) are tabulated in Table 9.
We may be interested in the maximum of the relative deviation over an
interval (xa, xb), where and xb are defined by F{xa) = a and F{xb) = b
(0 < a < b < 1). This problem is solved by
r Fn(x) - F(x)
lim P fin sup -— <y =
II-*--f CO V xa<,x<Xb F(x)
0-»|/S
1 bf2
is valid.
n 1 -b
exp
2(1-6)/ J du dt
i.e. the probability, that the empirical distribution function remains every¬
where under the theoretical distribution function, tends to zero. According
to Theorem 3 the same holds if we restrict ourselves to values of x superior
to xa (a > 0). However, if we consider an interval [xa, xb] with 0 < a <
< b < 1, then by Theorem 5,
0
-d/~
1' b-a
bt2
4vKj + 00
exp
2(1 ~b) J chi dt, (4)
1 la(\-b) _ 1 a{\ — b)
arc tan arc sm (5)
271 b —a 2n b{ 1 - a)
1 I a(l — b)
lira P ( sup (F„(x) - m) < 0) = — arc sin ^ (6)
n—+ oo xa-£x<,Xb
The theorems of Smirnov can be derived by passage to the limit from the
following theorems due to Gnedenko and Koroljuk, which give the exact
distributions of the quantities sup (F„(x) - G„0)) and sup | F,£x) - Gn(x) |
for finite values of n.
n
sup (F„(x)-G„(x))<z =
— 00 <X< -f 00
0 for z< 0,
' 2/7
n — e
l- for 0 < z <
In
. n .
1 otherwise.
VIII, § 10] EMPIRICAL DISTRIBUTION FUNCTIONS 497
1 otherwise.
The values of
1 ' In \
(-\)k
2n n — kc)
. n .
are tabulated in Table 7, for n < 30; for n > 30 Theorem 8 can already
be applied.
First we prove Theorems 9 and 10; Theorems 7 and 8 can then be derived
by passing to the limit. Collect the random variables . . . , rjx,. . . , rjn
into one sequence and arrange these 2n numbers in increasing order; let
denote the A>th number in this ordered sequence. One can suppose
that C* < C* < . . . < C*2„. Put
1 if £* is one of the
9k = — 1 otherwise.
Thus in the sequence 0X, 0,,. . ., d2n, n numbers are equal to 1 and n
numbers are equal to —1. Put Sk = 0X + 02 + • • • + 9k. We prove first
are valid.
498 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 10
and, similarly,
sup n I Fn(x) - Gn(x)| = max n \ F„(C* + 0) - Gn{Q + °) I = max \Sk\-
— co<x< + co l<tk<,2n l<Cik<Cl2n
find the number of the sequences 0ls. . . , 92n fulfilling this condition
2n
and then divide this number by . We arrived thus at a combinatorial
n
problem. Its solution will be facilitated by the following geometrical
representation: Assign to every sequence 9j_,... ,9„ a broken line in the
(x, y) plane starting from the point (0, 0) with the points (Sk, k) (k —
— \ ,2,..., 2n) as vertices. (Here (a, b) denotes the point with coordinates
x = a, y = b.) There corresponds thus to every sequence 9lt. . . , 92n a
“path” in the plane; all paths start from (0, 0) and end at (0, 2n); all are
composed of segments forming with the x-axis an angle either of +45°
or of —45°. We have to determine the number of those paths which do
not intersect the line x = z ^2n. Let this number be denoted by U+ (z).
If a path intersects the line x = z ■s/2n, it is clear that it reaches the line
x = {zj2n} = c, too.
Thus we have to count those paths which lie everywhere below the
line x = c. First we count the paths which intersect the line x — c.
If a path intersects the line x = c, we uniquely assign to it a path which
is identical with the original one up to the first intersection with the line
x = c and from this point on is the reflection of the original path with
VIII, § 10] EMPIRICAL DISTRIBUTION FUNCTIONS 499
respect to the line x = c. The new path ends at the point (2c, 2ri). By this
procedure, we assign to every path going from (0, 0) to (0, 2n) and inter¬
secting the line x = c in a one-to-one manner a path which goes from
(0, 0) to (2c, 2n) and is composed of segments which again form an angle
of +45° with the x-axis. The number of paths having one or more points
in common with the line x = c is thus equal to the total number of the
, . 2n
paths going from (0, 0) to (2c, 2ri). This number is ^ . Hence
2n
n + c
lim
«-»• + 00
2n)
n)
In this section we shall study limit theorems of another type than those
encountered so far. As we do not strive at the greatest possible generality
but rather wish to present the different types of limit distributions, we shall
restrict ourselves mainly to the simplest case, i.e. to the case of the one-
dimensional random walk (classical ruin problem). We shall find in the
study of this simple problem a lot of surprising laws which contribute to
a better understanding of the nature of chance. Theorems 1 and 2 are
concerned with the problem of random walk in zz-space.
Let the random variables g2,. . . , £n,... be independent and let
1
each of them assume the values +1 and -1 with probability .The
2
random variable
k=l
(1)
may be considered as a gambler’s gain in a game of coin tossing after n
tosses, provided that the stake is 1 unit of money. The value of which
is always an integer, can also be interpreted as the abscissa at the time
t — n of a point moving in a random manner on the real axis. This point
performs on the real axis a “random walk”, in the sense that it moves
during the time intervals (0, 1), (1, 2),.. . either one unit step to the right
or one unit step to the left, both with probability — . We shall deal with
lattice”. Imagine a point which moves “at random” over this lattice. We
understand by a “random walk” the following: If the moving point can
be found at a time t = n at a certain lattice point, then the probability
that at the time t = n + 1 it can be found at one of the adjacent points
of the lattice is equal to —— for all adjacent points which have r — 1 coor¬
dinates equal to those of the preceding point and one coordinate differing
by ±1. If the position of the point at the time t — n is given by the vector
C(„r), then the random vectors £(nr) (n = 0, 1,.. .) form a homogeneous
additive Markov chain, namely
C!,r) = tir) + t
k=i
where the random vector If# represents the displacement of the point
during the time interval (fc — 1 ,k); by assumption, the random vectors
& are independent and identically distributed. For r = 1 we obtain the
one-dimensional random walk problem discussed above; in this case we
write simply £„ and £k instead of and
We prove now first a famous theorem of G. Polya.1
(2 n)\
Pti = ‘In
(2 r) «! + ... + «, =n(nl\.,.nr\y
2 n] n\
(2 r)
in
n
z
nl+...+nr=n nf-.. .nr\
(2)
In particular
In}2
p(2) _ l n -
2n /|2« >
In
n\
p(3) _
2« — '2 n E k\ H(n — k — l)\
k+l<ji
1
(3a)
V nn
and
1
p(2) ~ (3b)
-r2n ~
7T«
«!
E = r\
n1+...+nr = n
On the other hand, it is easy to see that among the polynomial coefficients
the largest are those in which the numbers nl5 n2,. . ., nr differ at most
by +1 from each other (cf. Chapter III, § 18, Exercise 3). Hence
2n
pW _ n
r2n ~ E <
(2r) 2 n nl+... + nr = n
'2 n (3c)
n n\
< max = o r_
(4 r)n r ' n^. ... n<\
Unj—n n* >
1
(4)
X = + oo for r = 1 and r — 2,
n—1
In the latter case the Borel-Cantelli lemma permits to state that for
r j> 3 the moving point returns with probability 1 at most finitely many
times to its initial position.
For r = 1 and r = 2 we shall show that with probability 1 the
moving point will sooner or later (and therefore infinitely often) return
to its initial position. In order to prove this, consider the time interval
which passes until the first return of the moving point. Let Q(nr) denote
the probability that the point walking at random on the r-dimensional
lattice reaches its initial position for the time after n steps. Obviously,
n-1
p&>=ea+ i pnosi-a- (5)
k=i
Put
00
Gr(x) = X p2hk (6)
k=i
and
00
Hr(x) - X (7)
k=l
then from (5)
Gr(x) = Hr(x) + Gr(x)Hr(x), (8)
hence
Gr(x) (9a)
1 +G,(x)
and
Hr(x)
(9b)
W-l-H#)-
504 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 11
Clearly,
Gr(x)
ew= E Qtt = W) = lim r-^M , (10)
k=l jt-1-0 1 “r
where Q(r) denotes the probability that the moving point returns at least
+ 00
once to the origin. For r = 1 and r = 2 the series Z P$ is divergent, hence
k=l
g(r) = 1, while for r > 3
Z p®
k=1
Q(r) =
1 + I P&
k=l
hence 0 < 0(r) < 1. (E.g., for r = 3, g(3) a; 0.35.) Thus we have proved
Theorem 1; at the same time we have obtained
Theorem 2. For r > 3 a point performing random walk over the lattice
Gr has a probability less than 1 to return to its original position.
i 2k\
f 1\
UJ
4^ -**=
00
I
2
(- *)* = i;
/c = l \ k / x/l-
this and (9a) lead to
( 1 \
0$ = n>_2-P$. (12)
VIII, § 11] RANDOM WALK PROBLEMS 505
Let vx be the number of the steps in which the moving point first returns
to its initial position; hence vx is a random variable and P(v1 — 2k) = Q$.
It follows from the asymptotic behaviour of the sequence that the
expectation of vx is infinite. Let (p{t) be the characteristic function of vx:
<P(0 = 1 - V1 - elu,
hence
f/|" ,-
lim (p ^2 = exp (-7-2 it). (13)
n-* + co
But we have
+ 00
1
exp ixt-
1 2xi
exp(- <J- lit) = —= _3 dx, (14)
v/27I ./
v 0 X2
e 2x
—-3~ for x > 0,
fix) = y/2n x2
0 otherwise.
Theorem 3. If vl5 v2,.. •, v„ ,.. . denotes the moments when the moving
point performing a random walk on the line returns to its initial position,
i.e. when (,,k = 0, then for x > 0 the relation
is valid.
THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § II
506
Theorem 4.
|2<P(y) — 1 for y > 0,
lim P < y (18)
00
|0 otherwise;
9„
hence has in limit the same distribution as the absolute value of a random
I2k\ 2 n — 2k |
k , n—k )
P(n2n = 2k) = 2/1
(k = 0,1,...,«) (19)
22
holds.
Remark. Clearly, n2n cannot be an odd number; in fact, n2 is either 0
or 2, according as Ci = 1 or Ci = — 1. Similarly, n2n - 7r2„_2 is either Oor 2
since C2n-2 is an even number: if Can-2 ^ 0, we have necessarily Can-2 ^ 2,
hence n2n — 7r2„_2 = 2; if Can-2 < 0> we have necessarily Can-2 ^ —2,
hence n2n — n2n_2 = 0; if Can-2 = 0, then n2n - n2n_2 is equal to 0 or
to 2.
(2k\
1 = 1 for k = 0 .
VIII, § 11] RANDOM WALK PROBLEMS 507
1 " (2k 2 n — 2k
= 1 (20)
22n l k n — k
holds.
Remark. Relation (20) is a corollary of (19);
bilities P(n2n - 2k) for k = 0, 1, . . . , n, we c
n2„ is always even. But since we wish to use (20) for the proof of (19), we
have to prove (20) directly.
'2k)
= 1 XT. (21)
JT k=0
Let us take the square of both sides of (21); since on the left side we get
1 00
—-= £ xk, (20) is obtained by comparing the coefficients of xn on
1 ~x k=0
both sides.
Now we prove (19) by induction. Clearly, (19) is true for n — 1; in effect
1
P(n2 = 0) = P(u2 = 2) =
Suppose that (19) is valid for n < N and let denote the least index j
for which £/ = 0; vx is necessarily an even number. Furthermore
N
P(n2N = 2k) = Yp(nw = 2k, vx = 21) + P(n2N = 2k, vj > 2N).
i=i
But
P(n2N = 2k, vx = 21) =
According to (12)
'21-2
i-K
P(Vl = 21) = Q® =
22/"2
f (2/— 21 (21) \
1 N /— 1 J l
+ 22/-2 \P(n2(N-i) — 2k) +P(ji2(N-r>— 2(k — /))]. (22)
z 1=1 \ 22/ /
The probability P(n2n = 2k, vl > 2 N) is evidently zero for 0 < k < N.
If k = 0 or k = N,
[2 N\
N
P(n2N = 2N, > 2N) = P (n2N = 0,v1> 2N) = ±2N+1 (23)
Theorem 6.
nN
lim P < x = — arc sin ^Jx for 0 < x < 1. (24)
N-*- + oo N
2k
k 1
(25)
22" sfi~k '
1 [«•>’] 1 1
D I <<
(26)
P\X-~2n<y 71 fc=[/!A-]+l k fi k n
n n
VIII, § II] RANDOM WALK PROBLEMS 509
hence
dt
^ 712” ^
lim P
n-*-+ oo
x-^<y
TTo
cides with that of —— which proves Theorem 6.
2n
This theorem can be proved in a more elegant way, which, however,
requires more powerful tools. This rests upon the following generalization
of Lemma 1:
Lemma 2. We have
1 dn , 2 nn
Proof. We see that the left side of (28) is the coefficient of xn in the
power series expansion of
_1_ 1
^(1 - e“ x)(l -e-“x) Jl~2x cos t + x2
1 = y Pn (cos t) xn (29)
yj 1 — 2x COS t + X2 n=0
it cos 99]
lim e* Pn cos dtp =
n-*- + °o l 2 J
i f1 e'V(1+u)/2 1 \ eitxdx
* J
-1 0
D\n2n)^P'n{ 1) =
hence
'n(n + 1)
D(.K2n) = (30)
F\x) = - ‘
ti^Jx{\ - a:)
at this point. Consequently, this value is the least probable one for the
bourhood of a point x (0 < x < 1) is the greater the farther the number
x is away from One would expect rather the contrary: indeedit would
seem quite natural that the moving point would pass approximately half
of its time on the positive and the other half on the negative semiaxis.
However, Theorem 6 shows that this is not the case. Or, to put it in the
terms of coin tossing: One would consider as the most probable that both
players are leading during nearly of the whole time. But this is not so;
on the contrary, — is the least probable value for the fraction of time
In this case (24) is valid even if the variance does not exist.
We now determine the exact distribution of 9n (the number of the zeros
in the sequence (i, (2,..., C„) for even values of n. We prove first
2k 2n — k
P(d2n = k) = (/< = 0, 1,..., 2n) (32)
n
holds.
From this we derive (32) by the method used in the proof of Theorem 5.
We obtain from (32) by Stirling’s formula
2n
E(6„) (33)
n
£1 + + • • • + £« 1 +X
lim P < x\ = — arc sin (34)
n-* + co I 71
i i • £i T Go T . . . -f- £„
Consequently, the ratio -- does not tend to zero as
n
n -> +oo, though this would seem to be quite “plausible”. However the
following theorem due to Erdos and Hunt1 is valid:
Theorem 8.
y e* \
h k
lim —tt——
N
= 0 - 1. (35)
N— + oo
I — /
A:=1 k
^4inZn = k)P(-k<tm_n<0).
k=i
hence
\E(snem)\<C2 (36a)
here and in what follows C1; C2,. . . are positive constants. If m - n < n,
we use instead of (36a) the trivial inequality
|£(£„Em)|<l. (36b)
Thus we obtain
< C3 In N. (37)
If we put
N
E n
n=1
An — (38)
N
i
E
»=i n
we find
Q
E(AN) = 0 and E{Al) <
In TV
Hence, by applying Chebyshev’s inequality,
Cj
P(|^|>£)< (39)
e2 In N '
From this we obtain that the series ^ P(\A%k* \ > a) converges for every
k=i
\An\£A2*+-±.
Theorem 9.
2<P(x) - 1 for x > 0,
lim P ( max (fe < x
(40)
0 otherwise.
n- + °o 1 <,k<,n
514 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 11
Theorem 10.
0 otherwise.
Theorem 9 can be derived from the following formula (cf. Chapter III,
§18, Exercise 19):1
P( max (k < m) = 1 —
Jc_1_
(42)
1 ^k<Zn m + 2k 4k
It was shown by Erdos and Kac2 that Theorems 9 and 10 can be amply
generalized. They can be extended to the sums of independent, identically
distributed random variables.3
It is interesting to compare Theorems 9 and 10 with the results of the
preceding section. Those results can be put in the following form:
Theorem 11.
Theorem 12.
‘ For mother proof of Theorem 9 see Chapter VII, § 16, Exercise 13. e.
2 Cf. P. Erdos and M. Kac [1].
3 F°d -the ®^ension to random variables which are not identically distributed
see A. Renyi [9], ’
VIII § 1 2] PROOF OF LIMIT THEOREMS BY OPERATOR METHOD 515
The number ||/|| is called the norm of the function/ = f(pc). Clearly, if
/ £ C3 and g 6 C3 , then/ + g 6 C3 , further if/ £ C3 and a is a real number,
then af £ C3 . It is easy to see that if / 6 C3 and g 6 C3 , then 11/ + ^ 11 <
< ||/|| +||^|| and if/e C3 and a is a real number, then \\af\\ =
— Ial ||/|| • An operator A which assigns to every function / £ C3 an
element g = g{pc) = Af of C3 is called a linear operator if it possesses the
following properties:
A(B + C) — AB + AC.
-f CO
is a contraction operator.
+ 00
II-A/ll S ll/ll j' dF(y) = ||/||,
— 00
Lemma 2. Let F(x) and G(x) be any two distribution functions. The operators
AF and AG associated with them are commutative and AFAG = Ah, where
^ ~ H(x) is the convolution of the distribution functions F(x) and G{x), i.e.
+ 00
H(x) = j F(x - y)dG(y).
— 00
VIII, § 12] PROOF OF LIMIT THEOREMS BY OPERATOR METHOD 517
Proof. Clearly
lim —— = 0 (5)
w—oo
is fulfilled, then
1 c -—
lim Fn (x) = $(x) = \e 2 du- ^6)
«->co 2.71 d
518 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 12
standard deviation —^ . Then UnX Un2. . . Unn is nothing else than the
Now
+ °o +00
+ °° +00 + oo
we obtain
+ oo
+ 00 +00
and
-t- + GO
there follows
Because of the Holder inequality one has for every random variable
l i
3 (15)
<£(m3) 9
hence
— Knk, (16)
and thus
tDlk<Kl (17)
k=1
(7) and (14) lead to
M
\\AfJ- A,f || (18)
l-sj ’
and thus by (5)
lim || AFnf- A0f || = 0. (19)
Thus we proved that if / 6 C3 , then for any value of x (and even uniformly
in x)
+ 00 + 00
From this follows that (6) holds for every x. Indeed if e > 0 is arbitrary,
let /£(x) be a function belonging to C3 with the following properties:
/«(*) = 1 if X < 0,/£(x) = 0 if x > e and /£(x) is decreasing if x lies between
0 and e. Such a function can be given readily, e.g. the following function
has all the required properties:
1 for x < 0,
4
fe O) = for 0 < x < e, (21)
0 for e < x.
Then
+ 00
Hence
lim sup Fn (x) < 4>(x + e), (24)
n-*-oo
and
lim inf Fn (x + e) > <P(x). (25)
n-*- oo
4>(x - e) < lim inf Fn (x) < lim sup Fn (x) < <P(x + e). (27)
n-+- go n-*- oo
Since (27) is valid for every positive e, it follows that (6) is fulfilled for
every x. Theorem 1 is herewith proved.
Now we pass to the proof of the Lindeberg theorem by the operator
method. We prove the theorem in its most general form, i.e. we present
the proof of Theorem 4 of § 1.
As we have seen in the proof of Theorem 1, it then follows that for every
real x
lim Fn {x) — (P(x).
00
Let U„k denote the operator associated with the distribution function
Fnk(x) of the random variable and Vnk the operator associated to the
normal distribution with expectation 0 and standard deviation Dnk. Then
according to our assumptions
and
2 3
f(x + y) =/(x) + yf (x) + i-/' (x) + r (x + »2 y\ (32)
Use in the first integral on the right hand side of (33) the equality (32)
and in the second integral (31). We obtain
bl>«
522 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIH, § 12
Put
sup | /"(x) | = Mx and sup \f"' (x) | = M2,
then
1
Unkf-Ax)- — Dlkf\x) <-^ sM2Dl„ + M1 I y‘dFnk(x\ (35)
I
|J/|>£
On the other hand
+ CO
, (36)
^/=/W + 4'Z>-/'W + 4‘ J //"'(*+ 02 3-) D nk
hence
1-00
1 1 M2D3nk
Vnkf-f{x)-—Dlkf"{x) < — M2DAnk | \y\3d<P(y)< • (37)
i i M n
E II Unkf- Vnkf\\ < -- eM, + M, £ W+-ij; D^. (38)
ft=i 6 t=i J 3 k=1
b|>s
P(Znk=V=Pnk- (41)
Put
K = Z Pnk (42)
k=1
and suppose that
lim A„ = A (43)
/I-*-CO
and
lim max pnk = 0. (44)
n-*- oo 1 <,k<,n
Then the distribution of
Cn — £nl + £«2 + • • • + (45)
Proof. Let K denote the set of all real-valued bounded functions f(x)
(x = 0,1,2,.. .) defined on the nonnegative integers. Put ||/|| = sup|/(x)| .
Let there be associated with every probability distribution SA —
= {Po , Pi, ■ ■ • , Pn , • • •} an operator defined by
for every / £ K. Clearly, Arp maps the set K into itself, A& is a linear con¬
traction operator, further if SA and Q are any two distributions defined on
the nonnegative integers, then AppA^ = A^ where = fA • Q, i.e. J?
is the convolution of the distributions Aft and Q; that is, if SA = {pn} and
Q = {<In}, then = (r„), where
n
= ^ PkQn-k'
fc = 0
Let Unk denote the operator associated with the distribution S/'\k of the
random variable £nk and Vnk the operator associated with the Poisson
distribution with parameter pnk. Then Unl Un2. . • Unn is nothing else than
the operator Appn associated with the distribution £An of the random variable
while Vnl Vn2. .. V„n is the operator QAn associated with the Poisson
distribution with parameter A„ (taking into account that if QA is the
Poisson distribution with parameter A, then QAQU = Ox+n)-
524 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 12
holds. In fact, if (47) holds for every / £ K, choose for / the function for
which /(0) = 1 and /(x) = 0 for x > 1; then it follows from (47) that
for every r (and even uniformly in r)
2r„ e~Xn'
lim P(Zn = r) - = 0, (48)
ft-*-oo r\
00 Vr v 6
+/(* + I) te* - p„t c“'“) - E Ax + r) •£=*- (51)
,.2 rl
and thus
+ [l-e-»(l+/>*)]). (52)
Since
1 — x ^ e~x < 1 — x + x2 for x<l, (53)
there follows
(60)
j x2 dF(x)
then by assumption
lim 8(y) - 0. (61)
y-*+00
Put further
<500 y
(62)
A(y) =
(l-FO))2
(1 - F(y)) § x2 dF(x)
By our assumption A(y) is continuous for y > y0 . Let C„ denote the least
positive number for which
A(Cn) = n\ (64)
then Cn -»• 00, furthermore
yCn
Put 2 Cl
x2dF(x) = (66)
— Cn
then
c, (67)
S. V2
Now let £/„fc be the operator associated with the distribution of the random
variable — and Vnk the operator associated with the normal distribution
Now
Cn
On the other hand, if in the integral on the right hand side of (69) f(x + y)
is expanded into a Taylor series up to the third term and if it is taken into
account that by our assumption the distribution with the distribution
function F(y) is symmetric with respect to the point 0, then we have
where
+ Cn
— Cn
\Rn\<JL J |y|3dF(y)<
BC„ = B
3 nS„ 3n
*/W7)
J2
(73)
-Cn
hence because of <5(C„) -> 0 the validity of (68) follows. Herewith Theorem 3
is proved.
Finally we make some remarks concerning the relation between operator
and characteristic function methods.
The convergence of a sequence Fn of distribution functions to a distri¬
bution function F is proved by the operator method by showing that for
every / ( C3 one has AF„f-+ AFf. This implies that the characteristic
function yn of the distribution function Fn tends to the characteristic
function y of the distribution function F; in fact, if f(x) = e,xt, then
/<EC3 and Ahnf = 7'x J eity dFn(y), hence Atnf= ei,x <pn(t) and,
— 00
every x which is a continuity point of F(x)). On the other hand, the method
by which we proved (76) in each of the above discussed cases, can be applied
for distributions of sums of independent random variables only, while the
method of characteristic functions can be applied in other cases too (cf. e.g.
§ 5 or Exercise 26 of § 13).
§ 13. Exercises
1. Prove Theorem 2' of Chapter VI, § 5 by means of the central limit theorem
(Chapter VIII, § 1, Theorem 1).
Hint. If F(x) is a distribution function with expectation 0 and variance 1 such that
* F = F
J2;
VA? +
then F(x) is equal to the n-fold convolution of F(x yjn). This converges to the normal
distribution as n-> +oo.
2. Let £t, ^2> • • • > £n> • • • be independent random variables and suppose
Under what conditions on the positive numbers a„ does Liapunov’s condition of the
central limit theorem hold for the random variables £„?
Hint. Put
It follows that
<^<
s. s„ - Sn
£K„) = y • Show that the distribution function of ^J~n - 1 tends to the normal
distribution function with expectation 0 and standard deviation 1.
c) Let 4 be a random variable having a beta distribution of order {np, nq). Show
VIII. § 13] EXERCISES 529
4. Let e„(x) denote the n-th digit in the decimal expansion of x (0 < x < 1); the
n
values of en(x) are thus the numbers 0, 1, . . . , 9. Put Sn(x) = £ ek(x). If En(y) is the
k=l
25’„(.r) — 9 n
set of the numbers x for which < j,and if \E„(y)\ denotes the Lebesgue
33«
V
measure of En(y), show that
Hint. We choose a point r\ at random in (0, 1); i.e. rj is a random variable uniformly
distributed in the interval (0, 1). The random variables £„ = £„(??) are then independent
and identically distributed; the central limit theorem can be applied. We have:
E(Q = 1, D(Q = •
5. Let qu q2,. .., q„,... be a sequence of integers > 2. It is easy to show that
every number x (0 < x < 1) (a denumerable set of numbers expected) can be repre¬
sented in one and only one way in the following form:
v £n(x)
* = Y --,
»=i di q-i ■ ■ ■ qn
— £ £k(x). Now if E„(y) denotes the set of numbers x (0 < x < 1) such that
*=i
Sn(x) - (dk - 1)
n
<y
X (ql - i)
k=l
is fulfilled.
530 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 13
Hint. Choose at random a point t] € (0, 1) and put C„ — £n(rj). It is easy to see
that the random variables £„ are independent. Furthermore
6. Let .... £„ ,... be independent random variables, with the same normal
distribution. Put
I (f* - «„) /-
*=1
--- , and rn = — J n .
n *=1 n — 1
Show that
x
l c -u%
lim P(t„ < x) = —— e 2 du.
n —► -j* co Jln J
Hint. The distribution of t„ is Student’s distribution with n — 1 degrees of freedom
(cf. Ch. IV, § 10). Its density function is
n
1 2
S„-i(x) =
y/(n - 1) n
8. (Ehrenfest’s model of heat conduction.) Consider two urns and N balls labelled
from 1 to N. The balls are arbitrarily distributed between the two urns; assume that
the first contains M and the second N — M balls. We put into a box N cards labelled
also from 1 to N. We draw a card from the box and put the ball bearing the same
number from the urn in which it is contained into the other urn. After this the card
i s replaced into the box and the operation is repeated. Let Cn denote the number
of the balls in the first urn after the «-th step (i.e. after drawing n cards) (n =
= 1,2,...; Co = M). The states of the system consisting of the two urns form a
Markov chain. The transition probabilities are
PkJ = 0 for | k — /1 7^ 1.
Show that
9. Let Gabon’s desk be modified in the following manner (cf. Fig. 26): From the
N-th row on the number of pegs is alternatingly equal to the number in the (N — l)-th
row and in the N-th row. On the whole desk there are N + n rows of pegs. Determine
the distribution of the balls in the containers when the number n of balls is large.
10. The random variables f0, ft.f« • • • form a homogeneous Markov chain;
all Cn take on values in (0, 1); let the conditional distribution of £„+, under the
condition = y be absolutely continuous for every value of y (0 < y < 1); let
p(x, j) be the corresponding conditional density function. We assume that for
0 < x < 1 and 0 < y < 1 the function p(x, ^) is always positive and that for every
x (0<x < 1) 5 p(x,y) dy = 1 holds, further that p(x, y) is continuous. Let p„(x,y)
11. Let a moving point perform a random walk on a plane regular triangular
lattice. If the moving point is at the moment t = n at an arbitrary lattice-point, it
may pass at the moment t = n + 1 with the same probability to any of the 6 neigh¬
bouring lattice points. Show that the moving point will return with probability 1 to
its initial position, but that the expectation of the time passing until this return is
infinite.
The following Exercises 12 through 18 all deal with homogeneous Markov chains
with a finite number of states, fulfilling the conditions of Theorem 1, § 8. The notations
are the same. The states are denoted by A0, Au ..., AN. The random variable £„ is
equal to k if the system is in the state Ak at the time n (k = 0, 1,..., N). We put
Ptto = J) = P0(j), Pit = P(An + m =k\Zm = j), p% = plk and P„(k) = P(fn = k). We
assume that min plk = d > 0. According to Theorem 1 of § 8 the limits lim p)k = Pk
n —*■ CO
N
exist and are independent of j. Furthermore £ Pk = 1.
k=0
12. Let
i.e. the system passes approximately a fraction Pk of the whole time in the state Ak.
Hint. We have
£ N
lim st — = Y kPk.
n-*-+co n *=1
14. Assume that at t = 0 the system is in the state Ak . It returns to it for the first
time after a certain number of steps. Let this random number be denoted by .
Show that P(ym > «) < (1 — d)n.
Hint. We have P(v°° > 1) = 1 — pkk\ hence the inequality is true for n = 1.
Suppose for a proof by induction that the inequality is true for n . Then
P(ym > n + 1) = Y,
h^lc
Z
i±k
P(yik) > n>£n= J, £«+1 = h),
hence
P(vm > n + 1) = £
i 9tfc
P(y™ >n,'Qn= j) £
h+k
pm <
b) E(v= Z- •
k
Hint, a) follows from Exercise 14. Let further Vk(z) denote the generating function
of v(k); we have
1
Vk(z) = 1
Uk{zV
where
CO
Uk(z) = +Z
The relations
lead to
16. Let the numbers p,® (r = 0, 1,. . .) denote the values of n for which rf® = 1
(pm < pW < . . .); nT is defined here as in Exercise 12. Show that the standardized
distribution of pt** tends to the normal distribution as r -> + 00 •
17. Show that the distribution of the random variables C(„k) introduced in Exercise
12 tends, after standardization, to the normal distribution as n —► +oo. (Generaliza¬
tion of Theorem 2 in § 8.)
Hint. It is easy to see that P(fjjw < r) = P(/Ak) > n) ; for if the system passes less
than r times through the state Ak during the first n steps, then it will return to it for
the r-th time after the moment t = n, and conversely. Thus we are back to Exercise
16 and find
18. Put
Show that the limits lim P(lk"> = P/* exist and form a stochastic matrix.
Hint. We have
Hence
and
Z Pkpk, = P, U= o, 1,...,A0,
k=0
hence we find that £ Pfk = 1 ; the transition probabilities Pt define thus again a
k=0
Markov chain.
0
EXERCISES 535
VIII, § 13]
Hint. We have
n—r
and, consequently, Ax
co -Ax
{Xx)r 1
lim P(n
n-+ +oo
< x) = £
—
f-k
m L r l (* - 1 )'•
tk~l e 1 dt.
20. Suppose F(x) = x (0 < x < l). Show that »{** and n (1 - are
independent in the limit as n -» + 00 and have gamma distributions of order k and j,
respectively: s.
lim P(jt <C x, n(l ^n,n+i—D < y)
n-r+m
density function of
In In n + In 4jt
Z*k -m + a <2\nn-a
2x/2 In n
yj 2 In n
Remark. We can derive from this result the theorem of Smirnov (§ 10, Theorem 1),
24. (Wilcoxon’s test for the comparison of two samples.) Let and
Vu • ■ • > Vn be independent, identically distributed random variables with the common
continuous distribution function F(x). Let the numbers £k and rj, be united into a
single sequence, let them be arranged in increasing order and investigate the “places”
occupied by £1, • . ., €m. Let vu v2, . .., vm denote the ranks of the elements
<=x» • • •, Cm in this sequence. Put
m{m + 1)
— V1 + V2 + • • • + Vm
2
a) Show that W is equal to the number of pairs , ?],) such that £/> rj,.
Show that
Cn + m (Z)
(Z) =
C„(z)Cm(z)
where we have put
Q(z) = n
1=1
(1 ~ zp
7(1-z) •
d) Show that
I n m(n + m + 1)
D(W) =
V n
number of those triplets (/,/, k) for which rj, > £k, rj, > £k and i < j. We put
2 2
Show that if F(x) = G{x), then E(L) = — and if F(x) # G(x), then E(L) > — .
26. Let there be performed N independent experiments. Let the possible outcomes
of every experiment be the events Au . . ., Ar. Let pk = P{Ak) (k = 1, 2, ;
r
let vk denote the number of occurrences of the event Ak, where ^ vk = N. If
k=i
- Npky
Npk
r—3 _ t
exp
L (i <t-cz
k=1 <•>.)’)
(Cf. Ch. VI, § 6, (23)).
27. If C1( C2, • ■ ■ , C«, • • • are random variables such that the &-th order moment
of C„ tends as n —> + oo to the k-th order moment of the standard normal distribution,
i.e. if
-f co
_ t‘ 1.3 . . .{k — 1) for k even,
lim E(Ckn) = —= tk e 2 dt
»-*■+<» yj 2n 0 for k odd.
~ Y E(^}
i=o J•
lim E{eu^) = e 2 ,
N-+ + co
28. Let . .., be random variables which assume only the values 0 and 1
and let rjnk be the sum of all products of k distinct elements of the sequence
> ^nn •
rlnk = Z ^nii ^ni2 • • • ^nik-
xk
Show that if E(ri„k) tends to — (k = 1, 2, .. .) as n-> + oo, then the distribution
of the sum
Z Zm = Vnl
1=1
29. Let flf £2be independent random variables assuming the values
P(nin — f2„ — 0) =
(2;)
^ 1J22,1 ^°r k — 0, 1, . . /I
and
2n 21 1
P(7l2n — 2k, C2n 2/) = 4r
- Z
I<,l<,n- n l (n-l+ 1)/
for k = 0, 1, . . . , n and j — 1, %
30. By using the results of Exercise 29 show the following: If y„ is any sequence
yn
of integers such that y„ and n are of the same parity and lim —— = y (y is here
n-f + a> yj n
VIII, § 13] EXERCISES 539
7tn
lim
M -*■ +00
P\-H-<X £n yn
4 f(t! y)dt.
with
+ 00
2e
r e
_ 1/2
2 du
At | y) =
V 2n 1 - J'2 4
y
IT
7,
Remark. For y = 0 the conditional limit distribution of — with respect to the
n
condition Cnln —■► 0 is thus uniform on (0, 1). If we notice that n is, in the limit,
normally distributed, it follows that
In \
lim P — < x | £„ > 0
I ~r ^ 1 1 — t= i
V 1 - / + V t + /(1 - t)
I
CHAPTER IX
APPENDIX
INTRODUCTION TO INFORMATION THEORY
§ 1. Hartley’s formula
It follows
Yl
lim —— = log2 N.
fc-~00 k
Thus for every e > 0 we can find a number k such that if we take the
elements of E by ordered groups of k, then the identification of one element
requires on the average less than log2 N + e binary digits.
The formula
C. I(E2) = 1.
Postulate C is the definition of the unit; it is not more and not less
arbitrary than the choice of the unit of some physical quantity. The meaning
of Postulate B is evident: the larger a set, the more information is gained
by . the characterization of its elements. Postulate A may be justified as
follows.
A set Enm of NM element may be decomposed into N subsets each of
M elements; let these be denoted by Efy,. . . , E$\ In order to characterize
an element of ENM we can proceed in two steps. First we specify that subset
to which the element in question belongs. Let this subset be denoted by
£$. We need for this specification an information I(EN), since there are
N subsets. Next we identify the element in Ef}. The amount of information
needed for this purpose is equal to I(EM) since the subset contains M
elements. Now these two informations completely characterize an element
of Enm ; Postulate A expresses thus that the information is an additive
quantity.
IX, § 1] HARTLEY’S FORMULA 543
Proof. Let P be an integer larger than 2. Define for every integer r the
integer s(r) by
s(r) s(r) + 1
< log2 P < (3)
r r
Hence
s(r)
hm —- = log2 P. (4)
r-+ao r
thus
From (4) and (9) we conclude that f(P) = log2P for P > 2. Since /(2) = 1,
/(1) = 0, the theorem is herewith proved.
Postulate B can be replaced by the following one:
I(Enm) — I{En) +
Theorem 2.1(EN) = log2 N is the only function which satisfies the postulates
A*, B*, and C.
Proof. Let P > 1 be any power of a prime number and f(n) = I(En)
a function satisfying A*, B*, C. Put
RE) log2 n
g(n) =f(n) - (10)
log2 P
If we put
lim e„ - 0. (12)
/z—► CO
g(P) = o. (13)
i_i
i
a
R Q*,
for
■P)
)
(14)
i-1
i_i
n
^ 3
- 1 for
[ P ■p)
where (a, b) denotes the greatest common divisor of the integers a and b.
Cf. P. Erdos [2] and the article of D. K. Fadeev, The notion of entropy in a
finite probabilistic pattern (Arbeiten zur Informationstheorie, Vol. I). Fadeev found
this theorem independently from Erdos. The proof given here (cf. A. Renyi [29], [30],
[37]) is considerably simpler than that of the above two authors.
IX, § 11 HARTLEY’S FORMULA 545
Clearly
(15)
and
n = Pn' + /,
where (n',P) =1 and 0 < / < 2P. According to (13), g(Pri) = g{n'),
hence we can write
log2 n
hence we obtain rfk) — 0 after at most + 1 steps, hence for every n
log iP
bn
9(n) = £ ehp (17)
i=i
, log,n
bn < IP + 1 •
log2P
Thus, according to (12),
Let c denote the limit of the left hand side of (19). We conclude that for
every P > 1 which is a power of a prime number
§ 2. Shannon’s formula
E — E1 + E2 + ...+E„.
n
loga A — /i + I2 — I\ + E
k=l
Pk log 2Npk (2)
IX, § 2] SHANNON’S FORMULA 547
h= £, Pk log2 — • (3)
k=1 Pit
Formula (3) was first established by Shannon and in what follows we
shall call it Shannon’s formula. Simultaneously with and independently of
Shannon the same formula was also found by N. Wiener.
1 1
III. / - 1.
,22,
Furthermore, we require:
IV. The following relation holds:
I{Pl,Pl, • • ;Pn) =
Pi Pi
= I(Pl + Pi, P'1, • • •, Pn) + (Pi + Pi) I (4)
Pl+Pl Pl+Pl
/(l) = 0, (6)
i.e. that the occurrence of the sure event does not give us any information.
In fact, if n = 2, px = 1, p2 = 0, it follows from IV that
b) If we put
Ih
' Pm +1’ • • Pm + n) T Sm I (4')
$m
I(Pl+P2,P3, ■ • • 3 Pm + n) =
P1 + P2.
— Pm + 1’ • • •> Pm + n) T $m ^ (7a)
■ ■ • 5 Pm + n) = I(Pl+P*,P3, ■ • ;Pm+n) +
Pi P2
+ {Pl + P2) I (7b)
,7*1 + P2 P1+P2
and
P1+P2 Pi P2
smI + (Pi + P2) I
Jm Pi T P2 P1+P2)
Pi P,
= smI (7c)
tn m
(4"0
y=i y'=l
In fact, if in (4") all nij are equal to m and all pJt are equal to ——, the
mn
left hand side is equal to f(nm) and the right hand side to f(n) + /(m),
hence we get (9).
e) If we apply (4') to the case when all probabilities are equal and if we
unite them all except the first one, we obtain
[1 1 1) 1
/(«) = / 5 1 ~ - + 1 -— /(» - !)• (10)
n n ) n
IX, § 2] SHANNON’S FORMULA 551
lim 5„ — 0. (12)
d2 + d3 + ... + d„i
dn = d„ + (13)
n
^
N
J
E
n=2
r dk= N
(15)
N + 1A k=2
/r = 9.
E
«=2
n
Because of (12) the right hand side of (15) tends to zero for jV -> oo. Hence
we have
2 N
lim-— V dk = 0. (16)
n +oo y +1 k=2
552 INTRODUCTION TO INFORMATION THEORY [IX, § 2
lim dN = 0, (17)
N->- + oo
we find
a a a , a
logo b — I —— , 1 - + — log2 a + log2 (b-a). (19)
T b
hence (5) is proved for n — 2. We show now by induction that (5) holds in
the general case too. Suppose that (5) is valid for a certain integer n\ let
SP = (p1?. . .,pn+1) be any distribution having n + 1 terms. We conclude
from (4) and (20) that
n-1 i
7Ol, • • ;Pn +1) = £ Pk log-2 - +
k=\ Pk
1 f ! Pn Pn+1 |]
+ (.Pn+Pn + l) log..
- [Pn+Pn + 1 [Pn+Pt + 1 ’Pn+Pn +1).
lim + (1 - a) ] = j
ft —► oo l n J
(0 < a < 1), then we have also lim = s. We need only the particular case
n-+oo
n+1 J
However, (4) does not follow from (22). This is most easily demonstrated
by the quantity
which fulfills Postulates I-III and Formula (22), without fulfilling (4). (If it
” 1
fulfilled (4), it would be equal, by the just-proved theorem, to Z Pk^°^— >
k=l Pk
which is not the case.) We shall see in § 6 that the quantity (23) too can be
considered as a measure of the information associated with the distribution
SA = (px,. . .,pn). In fact, we shall define a class of information measures
depending on a parameter a which contains both Shannon’s information
(for a — 1) and the quantity (23) (for a = 2).
We add some further remarks.
mation or about uncertainty means essentially the same thing: in the first
case we consider an experiment which has been performed, in the second
case an experiment not yet performed. The two terminologies will be used
alternatively in order to obtain the simplest possible formulation of our re¬
sults.
2. The quantity (5) is frequently called the entropy of the distribution
£6 = (plf..pn). Indeed, there is a strong connection between the notion of
entropy in thermodynamics and the notion of information (or uncertainty).
L. Boltzmann was the first to emphasize the probabilistic meaning of the
thermodynamical entropy and thus he may be considered as a pioneer of
information theory. It would even be proper to call Formula (5) the Boltz-
mann-Shannon formula. Boltzmann proved that the entropy of a physical
system can be considered as a measure of the disorder in the system. In case
of a physical system having many degrees of freedom (e.g. a perfect gas)
the number measuring the disorder of the system measures also the uncer¬
tainty concerning the states of the individual particles.
3. In order to avoid possible misunderstandings it should be emphasized
that when we speak about information, what we have in mind is not the
subjective “information” possessed by a particular observer. The terminol¬
ogy is really somewhat misleading as it seems to support that the informa¬
tion depends somehow on the observer. In reality the information contained
in an observation is a quantity independent of the fact whether it does or
does not reach the perception of an observer (be it a man or some registering
device or a computer). The notion of uncertainty should also be interpreted
in an objective sense; what we have in mind is not the subjective “uncertain¬
ty” existing in the mind of the observer concerning the outcomes of an experi¬
ment; it is an uncertainty due to the fact that really several possibilities are
to be taken into account. The measure of uncertainty does not depend on
anything else than these possible events and in this sense it is entirely objec¬
tive. The above mentioned relation between information and thermodynam¬
ical entropy is noteworthy in this respect too.
x'i, x2,. . x'n. The observation of the random variable assuming the values
x{, x'2,. . ., x'n with probabilities pl} p2,. . .,pn contains the same amount
of information as the observation of Consequently, if h(x) is a function
such that h(x) / h(x') for x ^ x', we have I(h(£)) = /(£). However, with¬
out the condition h(x) A h(x) for x A x' we can state only that I{h{£)) <
< /(£). This follows from the evident inequality
and if wl5 vv2,.. \vn are positive numbers with £ wk = 1, then we have
k=1
n n
g( E wk xk) g{xk). (2)
k=1 k=l
I(pl,p2,...,p„)^log2n. (3)
It suffices for this to apply (2) to the convex function y = xlog2 x (x > 0);
Then
k=1
Z ?* = =1
Z j
pj Z
k=1
wjk = Z
j=1
pj=1’
If in (5) we sum over k, we obtain (4). Inequality (4) expresses that the un¬
certainty for a distribution is larger if the terms of the distribution are closer
to each other.
We introduce now the notion of conditional information. Let £ and q be
two random variables having finite discrete distributions. Let jcl5 x2,. . ., xm
be the distinct values taken on by £ with positive probabilities, yx, y2,. . ., yn
those by q. We write:
j = 1,2,..rri _
P(£ = Xj, q=yk) = rjk u_i ^ ; *2 = (/-in • • •> rmny (7)
X
k= 1
rik = Pi (j = 1, 2, ..., 777) (10a)
and
m
X rjk = dk (k = 1,2,..., n). (10b)
7=1
On the other hand, if /((£, 77)) denotes the information associated with the
two-dimensional distribution of ^ and 77:
m n I
/(«. 1)) = - I I log, —, (12)
7=1 fc=l rjk
then we have
It follows from the definition that /(£ | rj) = /(£) when £ and 77 are inde¬
pendent, hence (13) reduces in this case to the relation obtained in the pre¬
ceding section:
/(«, i,)) = 1(0 + /(,). (14)
/«{, 0) s ko + m (15)
558 INTRODUCTION TO INFORMATION THEORY [IX, § 3
holds, where the sign of equality occurs only it £ and rj are independent.
According to (13), relation (15) is equivalent to
/(£ I n) £ m (16)
which means that the “conditional” uncertainty of £ for a known value of
rj cannot exceed the “unconditional” uncertainty of £. By taking (11) into
account we can write
m n
K£\n) = - Z Z
j=1k=l
QkPj\k i^g2 Pj\k' (17)
Pj l°g2 Pj ^ Z
k= 1
9kPl\k ,Qg2 Pj\k‘ (18)
From (17) and (18) follows immediately (16), and hence (15) too. The sign
of equality in (18) can only hold if all pJlk (k = 1, 2,...,«) are equal, i.e.
when £ and r\ are independent. We conclude from (13) that
m - nt i n) = m - «>i i o- (20)
The left hand side of (19) may be interpreted as the decrease of uncertainty
due to the knowledge of 17, or as the information about £ which can be
gained from the value of 17. We call this the relative information given
by rj about £ and denote it by /(£, rj); we have thus
(We must not confuse /(£, 17) with the information /((<!;, 17)) associated with
the two-dimensional distribution of £ and 17.) According to (20)
hence the value of 17 gives the same amount of information about ^ as the
value of £ gives about 17.
/(£, r\) can also be defined by the symmetric expression
m n
Ki. n) = 1(0 + Hi) - /(«, „))=££ r,t log, -2-. (22)
j=lk=l Pj<lk
IX, § 3] CONDITIONAL AND RELATIVE INFORMATION 559
KL 0 ^ 0, (23)
where the equality sign holds only if £ and i/ are independent. Hence if ^
and rj are not independent, the value of rj gives always information about
On the other hand, from (21a) and (21b) follows
Here too, it is easy to find the cases in which the equality sign holds. In fact,
if I(£, 0 = 1(0, then /(£ | rj) = 0, which can occur only if the value of ^ is
uniquely determined by the value of rj, i.e. if f = f(rj). Similarly, 1(0 0 =
= I(rj) can occur only if ij = g(0- The quantity 1(0 0 can be considered
as a measure of the stochastic dependence between the random variables
^ and rj.
The relation 1(0 0 = I(l, 0, expressing that rj contains (on the average)
just as much information about £ as £ about rj, seems to be at the first glance
surprising, but a deeper consideration shows it to be quite natural.
The following example is enlightening. Let rj be a random variable sym¬
metrically distributed with respect to the origin, with P(rj = 0) = 0, and
put £= if. There corresponds to every value of rj one and only one value
of 0 while conversely £, determines rj only up to its sign. In spite of this,
£ gives just as much information on rj as rj gives on £ (viz. 1(0) ); the differ¬
ence is that this information suffices for the complete characterization of
£ but does not determine rj completely (only the value of \rj\). In fact, 1(0 =
= 1(0 + 1 (if we know already the absolute value of rj, rj can still take on
the values ±\tj\ with probability —, hence one unit of uncertainty must
be added).
We prove now the inequality
which is equivalent to
If instead of rj we observe a function f(0 of rj, then we obtain from the value
off(0 at most as much information on £ as from the value of g; the uncer¬
tainty of £ given the value of f(if) is thus not less than its uncertainty given
the value of rj.
560 INTRODUCTION TO INFORMATION THEORY [IX, § 4
The case when several values off(yk) are equal to each other can be dealt
with similarly. Thus we proved (26), hence (25) too.
The same example which served to derive Shannon’s formula can be used
to get a heuristic idea of the notion of gain of information. Let E be a set
containing N elements and let Ex,. . ,,En be a partition of this set. If Nk
n
is the number of elements of Ek, we have N — ]>] Nk and we put pk =
k=1
m = m + 7(c 1 q) (i)
where, clearly
i(C I n)=Y
k=1
Pk iog2 Nk.
IX, § 4] THE GAIN OF INFORMATION 561
Now let E' be a nonempty subset of E and let E'k(k = 1,2,...,«) denote
the intersection of Ek and E'. Let Nk be the number of elements of Ek,
n
N' the number of elements of E' and put qk . Then we have X N'k —
n k=1
— N', hence X 9k — 1- Suppose that we know about an element chosen
k=l
at random that it belongs to E'; what amount of information will be fur¬
nished hereby about rjl The original (a priori) distribution of q was
^(Pi> • • • 5 Pn) after the information telling us that the chosen element
belongs to E', rj has the (a posteriori) distribution Q = (qu q2,.. ., qn). At the
first sight one could think that the information gained is /(^) - 7(0. This,
however, cannot be true, since 7(«i^) - 7(Q) may be negative, while the gain
of information must always be positive. The quantity /(^) - I(Q) is the
decrease of uncertainty of //; we are, however, looking for the gain of infor¬
mation with respect to y\ resulting from the knowledge that e{ belongs to E'.
Let the quantity looked for be denoted by 7(Q || &)1’, it can be determined
by the following reasoning: The statement e4 £E' contains the information
l°g2 This information consists of two parts; first the information given
by the proposition e^ £ E' about the value of rj, next the information given
by this proposition about the value of £ if q is already known. The second
part is easy to calculate; in fact if q — k, the information obtained is equal
, Nk
to log2 —p- and since this information presents itself with probability qk,
£ 9k log2 -T77
&=i Nk
Hence
1 We use a double bar||in /(Qll-S0) in order to avoid confusion with the con¬
ditional information /(£ | rj).
562 INTRODUCTION TO INFORMATION THEORY [IX, § 4
/(Q||^>0. (4)
The equality sign occurs in (4) only if the distributions S3 and Q are identical.
/(Q || S3) is defined only if every pk is positive and if there exists a one-to-one
correspondence between the individual terms of the two distributions. The
quantity I(Q || 3s), defined by (3), will be called the gain of information
resulting from the replacement of the (a priori) distribution S3 by the (a pos¬
teriori) distribution Q.
The gain of information is one of the most important notions in informa¬
tion theory; it may even be considered as the fundamental one, from which
all others can be derived. In § 6 we shall build up information theory in this
fashion; the gain of information, as a basic concept, will be defined by pos¬
tulates.
The relative information introduced in the preceding section can be ex¬
pressed as follows by means of the gain of information. Let £ and rj be ran¬
dom variables assuming the distinct values xl5 x2,. . ., xm and yu y2; . .., y„
with positive probabilities pj = P{i; = X/) and qk = P(q — yk) respectively;
put {Pl> • • Q. ' ((7ij (?2> • • •! ^n)?
j=i Pj
hence, because of qkPj\k — rik
n m n
From this (5) can be derived by Formula (22) of § 3. Formula (5) means
that the amount of information on £ which is contained in the value of i/
is equal to the expectation of the gain of information obtained by replacing
the distribution tS/3 of £ by the conditional distribution Sfik.
If & = (Pl,...,Pn) is any distribution having n terms and if %n —
IX, § 4] THE GAIN OF INFORMATION 563
’ 1 1
we have
n ’ n
n
But only the sums on the two sides of (8) are equal; the single terms have not
necessarily the same value.
The following symmetric expression is also often considered in informa¬
tion theory:
Let us remark that while certain terms of the sum (3) defining I(Q || d?3)
may be negative and we know only that the sum itself is nonnegative, on
the contrary, on the right hand side of (10) all terms are nonnegative.
The relative information can be expressed by means of the gain of infor¬
mation in still another way. If <32 is the distribution {rJk}, Sft * Q the dis¬
tribution {pjqk}, then it follows from Formula (22) of § 3 that
t- Vk
lim st — = pk9
n-*.m ^
hence
lim P
n
log2 --I{£P)
7t„
< S = 1 (2)
n-*- oo
This is, in view of (2), always possible. This means that the sequences of
outcomes obtained by repeating n times the experiment ^ can be parti¬
tioned into two classes: the first consists of the sequences for which
the second of the remaining ones. According to (3) the probability that a
sequence belongs to the second class is less than 5. Let C„ denote the number
of sequences of the first class, let q±, q2,. . ., qCn be their probabilities. By
(4) we have
-n[l{Cp) +
)< {j — 2,..., C„) (5)
or, by adding these inequalities,
-»{l(Cfi)+ *)
c„
C„ • 2 < I cIj- (6)
/=!
The sum on the right hand side cannot exceed 1, since it represents precisely
the probability that a sequence belongs to the first class. Therefore we have
n {!(.&)+1)
C <2 (7)
566 INTRODUCTION TO INFORMATION THEORY [IX, § 5
Now let us number the events of the first class from 1 to Cn and write
these numbers in the binary system. For this n /(J?5) + + 1 binary digits
are needed. There can be found an n2 such that for n> n2 the inequality
holds. Put 770 = max (nx, n2); it is clear that n0 depends only on s, 5, and &
and satisfies the requirements of the theorem. It is easy to show that with
large probability - s) 0- 1-symbols are not sufficient to describe
the outcome of the sequence of experiments. To see this, subdivide again
the set of the sequences into two classes: let the first class contain the se¬
quences for which
and the second the remaining ones. Choose an n3 such that for n > n3 the
probability of (9) exceeds 1-5; this is possible because of (2). Let
Dn denote the number of sequences in the first class and let rlt r2,. . ., rDn
be the corresponding probabilities. We have then
Z n>\-8. (ii)
7=1
a probability > q > 0. If we choose <5 such that 25 < g, then this contradicts
what was just proved.
Theorem 1 is therefore completely proved. It can be sharpened in the
following manner:
Theorem 2. For every 5 > 0 there can be given an n0 such that for n > n0
the outcome of n independent experiments can be uniquely expressed, with
probability > 1 — 5, by at most nl{ffi) + K^Jh 0—1 -symbols; K is here a
positive constant which depends only on 5.
However, there corresponds to every o between 0 and 1 a constant K' and an
integer n'0 such that a unique characterization of the outcome of a sequence
of experiments becomes impossible (with a probability > g, for n > n'0) by
less than — Kf/n 0—1 -symbols.
( 1 1 ^ Vk ~ "Pk
Jn — log2 — - I(&) = I —7=— i°g2 —
1
n n„ k=1 fn Pk
tends to the normal distribution as n -*■ oo. There exists thus a constants
which depends only on <5 such that we have for sufficiently large n
K
— log2 —-/(^) >\-5. (12)
n n„ < Jn
The continuation of the proof runs exactly as that of Theorem 1.
Theorem 1 can also be considered as a justification of Shannon’s defini¬
tion of information. The statement of Theorem 1 can be translated into the
language of communication theory as follows: Let a message source at
the moment t (t = 1,2,...) emit a random signal let x1; x2,. . ., xr
be the possible signals; let pk = P(ft = xk) denote the probabilities of the
individual signals. These probabilities are supposed to be independent of t.
Assume that the signals are independent of each other. Assume further that
for transmission the signals must be transformed (encoded) since the
“channel” (transmission network) can only transmit two signs.2 (This is the
case e.g. if the channel works with electric current and at every instant only
two cases are possible: the current is either on or off.) Let 0 denote one of
the signs and 1 the other. The question is then, how many 0 or 1 symbols
are necessary for the transmission of the information contained in n signs
£i> £2, ■■ - An furnished by the source. According to Theorem 1 with proba¬
bility arbitrarily near to 1 less than n(I(^) + s) symbols are required, pro¬
vided that n is sufficiently large. This shows the importance of the quantity
for communication engineering.
ipk=i.
k=1
The discrete incomplete random variables £ and t] are said to be inde¬
pendent, if for any two sets A and B the events £ £ A and rj £ B are inde¬
pendent.
The distribution of an incomplete random variable will be called an in¬
complete probability distribution; in this sense the ordinary distributions can
be considered as a particular case of the latter. Thus if pk > 0 (k = 1,...,«)
n
yk — n
HPj
j=1
Let £ be an incomplete random variable taking on values xk with probabili-
570 INTRODUCTION TO INFORMATION THEORY [IX, § 6
(i)
Remark. This means that if we put SAi — (pa, . . ., pin), = (qn, . . ., qin)
(/ = 1 or 2), then there corresponds to the element pXj p2k of 6A the ele¬
ment qv q2k of Q. Postulate I is a general formulation of the additivity of
information.
Postulate II If pk < qk Oc = 1, 2,. .., n), then we have I(Q II SA) > 0;
for pk > qk (k = 1, 2,. . ., n) we have I(Q || S3) < 0.
Remark. It follows from this that I{fA1| fA) = 0. For complete distribu¬
tions fA and Q Postulate II asserts nothing more than this, as then the in¬
equalities pk < qk (k = 1,2,..., n) occur only if pk — qk (k = 1,2,..., n),
n n
If we put qx = p2 = 1, Pi — p, q2 = q, we obtain
g(l,p) = c log2 —
P
9
information is log.
572 INTRODUCTION TO INFORMATION THEORY [IX, § 6
, Qk
Qk— n
X Qj
7=1
F(Q_, Sfi, x) will be called the conditional distribution function of the gain of
information.
Now we can formulate our further requirements:
Postulate IV. I(Q \ \ ,9^') depends only on the function F(Q, 9, x).
Because of this postulate we can also write /[T(x)] instead of I(Q || 9s),
where F(x) = F(Q, 9, x).
1 otherwise.
Postulate IV is thus fulfilled and (8) expresses that for a degenerate distri-
IX, § 6] FURTHER MEASURES OF INFORMATION 573
bution function
0 for x < c,
F(x) = Dl (x) =
(10)
1 otherwise,
7* qk
ak = log2 — , wk =-- {k= 1,2,..n) (11)
Pk qi + <?2 + • • • + q„
hold. This is the case if we take qk = twk and pk = twk 2~ak. If we choose
the number t such that
\
0 < t < min (12)
\ I
A: = l
Wk-2~ak
7
2,..., 77) we have log2 — > 0, hence F(Q, x) < From this follows
Pk
by Postulate V that I(Q || ,9s) > 7[Z)0(x)] — 0; and if pk > qk, the inequal¬
ity is reversed. However, Postulate II is not superfluous, since in order to
state Postulate V we used relation (8) resting on Postulate II.
Postulate VI. Let Ft £ ■9r (i = 1, 2, 3) and I[F2\ — 7[7’3]. Then for every
574 INTRODUCTION TO INFORMATION THEORY [XX, § 6
( i £ qt \ (13a)
/a(QH^) = ^Tlog: „ct-l
. k=1 Pk
£ (ik
\ *-i
or fill || -FF) — fi(Q || F5) with
n
Proof.2 Instead of IfiQ || FF) we use also the notation fi(F(Q, FF, x.))
Then (13a) and (13b) are written as
+ 00
and
+ 00
h(F) = j xdF(x). (16)
— 00
From these formulae we see that IfQ ||^) satisfies for every a Postulates
I through VI. It remains still to show that no other functional can satisfy
all these Postulates. A simple calculation shows that
F(Qi * Q2, * ^2, x) - J F(Q1} < x-y) dF(Q2, &2, y), (17)
— oo
+ oo
F*G= j F(x-y)dG(y%
— 00
then we have
ni«’,Fl] = ni (i9)
i=l i=l
I[Dc(x)\ = e (20)
holds for every real c, where Dfx) is the degenerate distribution function
of the constant c (see Formula (10)).
Let
'l'a (0 = /[(!- t) D_a (x) + tDA (a)]. (21)
576 INTRODUCTION TO INFORMATION THEORY [IX, § 6
I[DU (*)] = u = ipA (0 = /[(l - (pA (m)) D_A{x) + cpA (u) DA (*)]. (22)
= («*))• (26)
k=1
Lemma. Let gx{x) and <pfx) be two continuous and strictly increasing func¬
tions in the interval [/, K\. Suppose that for arbitrarily chosen numbers
*i, x2,. . ., x„ in [J, K] and for positive numbers wx, w2,. . ., w„ with
n
Z wk = 1 we always have
k=1
n n
1 (Z
k =1
wk(Pi (Xk)) = 9?2_1 (
k
Z
=1
wk CP2 (xk)). (27)
holds, where a > 0 and P are two constants. (Conversely, (28) implies (27)).
IX, § 6] FURTHER MEASURES OF INFORMATION 577
T’sO) is thus a linear function, which proves the first part of the lemma.
The converse is trivial.
We show now that there can be found a function cp(x), independent of A,
such that (26) remains valid if <pA is replaced by cp. It suffices to prove that
= <Pb(x)~<Pb(-A)
for 0 < A < B. (30)
<Pb(A)-cpb(-A)
Now we investigate how cp(x) can be chosen such that it fulfils also Postu¬
late Put in F
Then we have
for all values of a and b and for 0 < t < 1. It follows from the lemma that
This relation being fulfilled for every y, we may interchange x and y, hence
with a ^ 1, hence
2(«-l)* _ i
3
(43)
II
'Zd
k=1
4(Qll^„) = log2 n - log2 (44)
1 —a
I <7*
'*=1
holds. Thus if we put for any incomplete distribution & = (p1}. . ., pn)
f n \
LPk
k=l
h (^) = -j-
l —a
10g2 n (45a)
\Y,Pkj
\fc=l '
we find that
lM\\$n) = h(%n)-lM- (46)
1
Z ^log2
A(Qll^„) = log2 n- (47)
Z dk
k=1
1
Z PO °g2
k = 1_Pk
h(&)- (45b)
ipk
k=1
1
-log2 Z Pk fora ^ 1. (45c)
1-a k=i
580 INTRODUCTION TO INFORMATION THEORY [IX, § 6
( n 1 1 —a
\ T^T
/YJ Pk
Pk)
4(^5) = log, k = 1
Z
k=1
Pk
(Z wk4y (Xk>0,wk>0,fjwk=l)
k =1 1
Theorem 3. If Sfi = (p1?. . .,/>„) and Q = (qt,. . ., qn) are any incomplete
n n
creasing function of a. Since I0(Q | | SP) = log, — , for the complete distribu-
s
tions and Q there follows the inequality
Proof. We have
, Z Qk y
from which Theorem 3 follows by the same theorem on mean values (cf.
foot-note) as above.
If a is negative or zero, the properties of and IfQ || differ essen¬
tially from those of Shannon's information. As can be seen from Theorem 3,
IfQ || J/3) is for complete distributions only then positive, when a is posi¬
tive. The following property is particularly undesirable: Let a < 0; modify
the complete distribution J?3 = (pl5 . . .,/?„) by letting tend to zero, then
/fSP) tends to infinity. On the other hand, 70(<&>) is always equal to log2 n
whenever & contains n positive terms. is thus very inadequate to meas¬
ure the information and we consider only IfidfP) with positive a as true meas¬
ures of information.
Let us now consider some distinctive features of Shannon’s information
among the informations of the family or of IfQ || d/3) among the
informations of the family IfQ || SP). One of these properties is given by
Theorem 4. If c and r\ are two random variables with the discrete finite
distributions Sfi and Q and if denotes the two-dimensional distribution of
the pair (£, rf), then
holds for every SP and Q with the mentioned properties if and only if a = 1.
7>(£ = 0, rj = 0) = pq + £,
P(£ = 0, rj = 1) = p(l - q) - £,
P(Z = 1, n = 0) = (1 -p)q ~ £,
P(f = 1, rj = 1) = (1 -p)(l - q) + £
582 INTRODUCTION TO INFORMATION THEORY [IX, § 6
with
1 1
0 <p < 1, 0 < q < 1, p # — , —
and
would have an extremum for e = 0. But this is not the case, since g'(0) ^ 0.
The quantity IfiQ || S75) is distinguished among the IfiQ, || &) e.g. by
the following property:
( ke= 1 ?*)2=(£
k=1
ly s c i Pk) (f a).
*=1 k=1
n
(v v-i Qk
si"
|iH
h _'a —1
1
It is easy to see that the right hand side of (54) is not identically zero; e.g.
it is different from 0 if we put n = 2, q1 = q2, px + p2.
Z Pk l0g2 4“
u x k =1 Pk
/(a) =-— (1)
Z Pk
Since
/ JL ,1 ( ' .. 1 \*\
/ k=1
Z Pfe log5 — Z P* °§2
fctl_P/c
—-
r (a) = - In 2
Pfc
(2)
v Z Pk Z Pk
k=1
k =1
•it follows from Cauchy’s inequality that /(a) is a strictly decreasing function;
further we have
1 1
/(1) = (^), lim /(a) = log2 — , lim /(a) = log2 — •
a^-oo Pi a^+0O Pr
we have
and
< ^(a) < pr for a > 1
584 INTRODUCTION TO INFORMATION THEORY [IX, § 7
Now let Bn(a) be the event n„ > p(a)n. Consider the conditional information
contained in the outcome of the sequence of experiments, under the condi¬
tion Bn(ot). Put for this
n\
C»= v (5)
nx\ n2\ ... nr\
n p"k > p(«f
*-« ,
Z nk = n
k= 1
s C„(x)p(a)"' (6)
and on the other hand
r
Pk )T
I (8)
,k=1 X«)J J
Pk
?*(<*) = r
-7 • (10)
YpI
7=1
Pk
log2
&?i lP(«)j
(11)
Furthermore, we have
n^(a)>x«)". (15)
A:=i
When v* = n*(a) (ic = 1, 2,. . r), the event 5„(a) occurs; hence
n\
C„(a)> r (16)
n n^y-
k=1
n\
(17)
n
/c=l
Relations (12), (16) and (17) lead to
In n
iM-o < — log2 C„ (a) < f (Qa). (18)
n n
Therewith we proved
P(Ak) = pk, 0 < < p2 < . • • < P,, I Pk = 1 and & = (Pi,p2, ■ • •,/>,)•
k=l
Let the experiment be repeated n times such that the repetitions are inde¬
pendent of each other. Put further
586 INTRODUCTION TO INFORMATION THEORY [IX, § 8
noc
nh (QJ = nla (&) + --Ix (Qa 11 &). (20)
I — a
If, however, g > 0 and e > 0 are arbitrarily small positive numbers and n
is large enough, then «(/i(Qa) — e) 0—1 -symbols are not sufficient with
probability > g.
b) if = NA * Q, we have
if the series on the right hand sides of (1) and (2) converge. The series (2)
does not always converge. For instance for
1
Pk = (k = 1,2,...)
ck log2 (/: + 1)
_ y *
° hi n log2 (n + 1)
However the series (1) converges always for a > 1. In case of discrete in¬
finite distributions the measure of order a of the amount of information is
thus always defined if a > 1.
Let i/ be a second random variable which takes on the same values as ^
but has a different probability distribution P(rj = xk) = qk {k = 1,2,.. .).
Let the gain of information of order a, obtained if the distribution
Q = (qx, <72,. . .) is replaced by = Cpl5 p2, ■ ■ •)> be defined by
1 / °° na
(4)
k=1 Pk
if the series on the right hand side of (3) or (4) converges (which is not always
the case). The series (3) converges according to Holder’s inequality always
for 0 < a < 1.
Let now t, be a random variable having continuous distribution. We want
now to extend the definition of the measure of order a of the amount of
information, i.e. /a(£), to this case. If we do this in a straightforward way we
obtain that this quantity is, in general, infinite. If for instance £ is uniformly
distributed on (0, 1), we know (cf. Ch. VII, § 14, Exercise 12) that the digits
of the binary expansion of £ are completely independent random variables
which take on the values 0 and 1 with probability Hence the exact
, _ m]
(5)
where [x] denotes the largest integer not exceeding jc. Suppose a > 0 and
let 4(£i) be finite (this is only a restriction for a < 1). It follows from
Jensen’s inequality that Ia(£N) is finite for every N and the inequality
f, k k+ 1 1
PN,k ~ P =P <£ <
(N ~ ~N. N N j
(k = 0, ±1, ±2, ..N = 1
then we have the inequality
+ 00 +00
£ Pn^hn1- £
k=— oo £=-oo
from which (6) follows; for a > 1 (6) can be proved in a similar manner.
When the distribution is continuous, the information 4(£a) tends to
infinity as N -> co; however, in many cases the limit
da (0 = lim
4 (h)
(7)
N-oo log2 N
exists. The quantity da{£) will be called the dimension of order a oft If not
only d = dff) exists but also the limit
[A^]
we put £n = (N = 1, 2, . . .) and if we suppose that 7a(<4) is finite
N
(a > 0) then
IAM
lim (9)
N -*■ 00 log 2N
+ 00
i.e. the dimension of order a of l; is equal to 1; if the integral j fixfdx (a / 1)
— 00
exists, then
+ 00
lim (4 (4v) - log2 N) = 4>a (0 = —log2 (
N-+ 00 1 CL
f fix)* dx);
J
(10)
if
+ 00
exists, then
+ 00
We have then
+ 00
'& + 1 ' k
-F
AT N
AW L (14)
1
~N
590 INTRODUCTION TO INFORMATION THEORY [IX, § 8
lim/v(x) =f(x)
N-*~CC
for almost every x. Now we shall use Jensen’s inequality in the following
form: If g(x) is a concave function and if p(x) and h(x) are measurable func-
b
tions with p(x) > 0 and \ p{x)dx = 1, then we have
a
b b
j g(h(x)) p(x) dx < g{§ h(x)p(x)dx). (15)
a a
This inequality can be proved in the same way as the usual form of Jensen’s
s fix) k k+ 1
P(x) = — f°r -rr^x<-—
pNk N N
then we get
k+1
N
(16)
Ik fix) log2 J(x) dx < pNk log2 —-—
NpNk
AT
+ 00
If/M < K, we have also f^ix) < K; thus the functions fu(x) are uniformly
bounded. Hence, by the convergence theorem of Lebesgue,
+A +A
+ 0C
r 1
Since we have assumed that If^) and J f(x) log2 - dx are finite, we
1
J
\x\>A
m
log2 fix)
dx < £ (22a)
and
1
E Pv log — < e-
2
(22b)
\1\>A Pll
(20), (22a) and (22b) show immediately that the theorem is true for a = 1.
Consider now the case a > 1. We get from Fatou’s lemma1 that
+ 00 + CO
lim inf j fN(x)adx> f f(x)a dx. (23)
N-oo J
— 00
J
—CO
hence (10) is proved for a > 1. We have still to examine the case 0 < a < 1.
+ 00 +00
Since we supposed 7a(£j) to be finite, we can find for every e > 0 an A > 0
such that
I Pli < £• (29)
\1\>A
From (27), (28) and (29) we conclude that (25) remains valid for 0 < a < 1.
Theorem 1 is thus completely proved.
The quantities
+ 00
These facts are explained by realizing that /M(£) is the limit of a difference
between two informations.
All what we have said can be extended to the case of r-dimensional ran¬
dom vectors (r = 2, 3,. . .) with an absolutely continuous distribution. Let
f(xlt. . xr) be the density function of the random vector (<!;(1),,.^(r)).
lim = r. (33)
N-» oo log,^
The dimension of the (absolutely continuous) distribution of a random vector
of r components is thus equal to r; the notion of dimension in information
theory is thus in accordance with the notion of geometrical dimension.
Furthermore, for α > 0, α ≠ 1 we have

    lim_{N→∞} [I_α((ξ_N⁽¹⁾, ..., ξ_N⁽ʳ⁾)) − r log₂ N] = h_α((ξ⁽¹⁾, ..., ξ⁽ʳ⁾))    (34a)

with

    h_α((ξ⁽¹⁾, ..., ξ⁽ʳ⁾)) = (1/(1−α)) log₂ ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} f(x₁, ..., x_r)^α dx₁ ··· dx_r,    (34b)

and

    lim_{N→∞} [I₁((ξ_N⁽¹⁾, ..., ξ_N⁽ʳ⁾)) − r log₂ N] = h₁((ξ⁽¹⁾, ..., ξ⁽ʳ⁾))    (35a)

with

    h₁((ξ⁽¹⁾, ..., ξ⁽ʳ⁾)) = ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} f(x₁, ..., x_r) log₂ (1/f(x₁, ..., x_r)) dx₁ ··· dx_r.    (35b)

¹ I_α((ξ_N⁽¹⁾, ..., ξ_N⁽ʳ⁾)) denotes the entropy of order α of the distribution of the random
vector (ξ_N⁽¹⁾, ..., ξ_N⁽ʳ⁾).
where

    h(ω) = dQ/d℘ ≥ 0

is the Radon–Nikodym derivative and

    ∫_Ω h(ω) d℘ = 1.

Formulas (37a) and (38a) remain valid in the discrete case too. The (ordinary)
discrete distributions ℘ = (p₁, ..., p_n) and Q = (q₁, ..., q_n) may indeed
be considered as measures defined on an algebra of events of n elements
ω₁, ω₂, ..., ω_n with ℘(ω_k) = p_k and Q(ω_k) = q_k (k = 1, 2, ..., n). The con-
dition that Q is absolutely continuous with respect to ℘ is here automatically
fulfilled whenever p_k > 0 (k = 1, 2, ..., n), and we have

    h(ω_k) = q_k/p_k    (k = 1, 2, ..., n).
The formulas

    I_α(Q ‖ ℘) = (1/(α−1)) log₂ Σ_{k=1}^n q_k^α/p_k^{α−1}    and    I₁(Q ‖ ℘) = Σ_{k=1}^n q_k log₂ (q_k/p_k)

of § 6 are thus special cases of (37a) and (38a).¹

¹ One could deduce Formulas (37a) and (38a) from a certain number of postulates,
as was done in the discrete case (§ 6). This will not be dealt with here.
If the distributions ℘ and Q are absolutely continuous with the density
functions p(x) and q(x), and Q is absolutely continuous with respect to ℘, then

    h(x) = q(x)/p(x).    (39)

In this case we obtain for the gain of information from (37) and (38)

    I_α(Q ‖ ℘) = (1/(α−1)) log₂ ∫_{−∞}^{+∞} q(x)^α/p(x)^{α−1} dx    for α ≠ 1    (37b)

and

    I₁(Q ‖ ℘) = ∫_{−∞}^{+∞} q(x) log₂ (q(x)/p(x)) dx,    (38b)

where I_α(Q ‖ ℘) is defined by (37b) for α ≠ 1 and by (38b) for α = 1, pro-
vided that I_α(Q ‖ ℘) exists.
Consider first the case 0 < α < 1. It is clear that p_N(x) → p(x) and
q_N(x) → q(x) almost everywhere, where p_N(x) and q_N(x) denote the step-
function densities formed from p(x) and q(x) as in (14); further, since
I_α(Q ‖ ℘) is finite, the tail integrals

    ∫_{|x|>A} q_N(x)^α p_N(x)^{1−α} dx

can be made arbitrarily small, uniformly in N, by choosing A large enough;
from this and from the convergence theorem of Lebesgue the statement
follows for 0 < α < 1. For α > 1 we have by Jensen's inequality

    ∫_{−∞}^{+∞} q(x)^α/p(x)^{α−1} dx ≥ ∫_{−∞}^{+∞} q_N(x)^α/p_N(x)^{α−1} dx,    (43)

while Fatou's lemma gives

    lim inf_{N→∞} ∫_{−∞}^{+∞} q_N(x)^α/p_N(x)^{α−1} dx ≥ ∫_{−∞}^{+∞} q(x)^α/p(x)^{α−1} dx;    (44)

(43) and (44) together prove the statement for α > 1. Finally, for α = 1 consider

    I₁(Q_N ‖ ℘_N) = ∫_{−∞}^{+∞} q_N(x) log₂ (q_N(x)/p_N(x)) dx.    (45)

Since t log₂ t ≥ −(log₂ e)/e for t > 0, we have

    q_N(x) log₂ (q_N(x)/p_N(x)) + ((log₂ e)/e) p_N(x) ≥ 0,    (46)

so that Fatou's lemma can be applied again; it gives

    lim inf_{N→∞} ∫_{−∞}^{+∞} q_N(x) log₂ (q_N(x)/p_N(x)) dx ≥ ∫_{−∞}^{+∞} q(x) log₂ (q(x)/p(x)) dx.    (47)
§ 9. Information theory of limit theorems

Theorem 1. If ℘ = (p₁, ..., p_r) and Q_n = (q_{n1}, ..., q_{nr}) are probability
distributions and if

    lim_{n→∞} I₁(Q_n ‖ ℘) = 0,    (1)

then

    lim_{n→∞} q_{nk} = p_k    (k = 1, 2, ..., r).    (2)

Proof. If (2) does not hold, there exists a subsequence n₁ < n₂ < ... <
< n_s < ... of the integers with |q_{n_s k} − p_k| ≥ ε for some k and some ε > 0.
    p_k > 0 (k = 0, 1, ..., N)    and    Σ_{k=0}^N p_k = 1.    (6)

The determinant of the system of linear equations (5a) is zero, since
Σ_{k} p_{jk} = 1 (j = 0, 1, ..., N); thus the system has a non-
trivial solution (x₀, x₁, ..., x_N). If (5a) is fulfilled, we have

    |x_k| ≤ Σ_{j=0}^N p_{jk} |x_j|,    (5b)

hence, summing over k,

    Σ_{k=0}^N |x_k| ≤ Σ_{j=0}^N |x_j|.

But this inequality is an equality; hence the same must hold for every in-
equality (5b), i.e.

    x_k^{(h+1)} = Σ_{j=0}^N p_{jk} x_j^{(h)}.

Since (5d) is valid for h = s, it follows that no p_k can be zero. Because of the
homogeneity of the equations, (5a) has thus a positive system of solutions¹ with

    Σ_{j=0}^N p_j p_{jk} = p_k

and

    Σ_{j=0}^N π_{jk} = 1.    (10)

¹ This solution is unique; this is a corollary of (7) and need not be proved here
separately.
Furthermore, by definition

    p_k^{(n+1)} = Σ_{i=0}^N p_i^{(n)} p_{ik},    (11)

hence

    I_α(℘^{(n+1)} ‖ ℘) = (1/(α−1)) log₂ Σ_{k=0}^N p_k (Σ_{i=0}^N π_{ik} p_i^{(n)}/p_i)^α.    (12)

Since Σ_{i=0}^N π_{ik} = 1, Jensen's inequality gives, for α > 1,

    (Σ_{i=0}^N π_{ik} p_i^{(n)}/p_i)^α ≤ Σ_{i=0}^N π_{ik} (p_i^{(n)}/p_i)^α,    (13)

and

    Σ_{k=0}^N p_k π_{ik} = p_i Σ_{k=0}^N p_{ik} = p_i.    (14)

If we multiply the inequality (13) by p_k and then take the sum over k, we
obtain

    I_α(℘^{(n+1)} ‖ ℘) ≤ I_α(℘^{(n)} ‖ ℘).
It remains to show that γ = 0. Choose a subsequence n₁ < ... < n_t < ...
of the integers such that the limits

    lim_{t→∞} p_{jk}^{(n_t)} = q_{jk}

exist; then

    lim_{t→∞} p_{jk}^{(n_t+1)} = Σ_{i=0}^N q_{ji} p_{ik} = q'_{jk}.

Obviously,
let Q_j and Q'_j denote the distributions (q_{j0}, ..., q_{jN}) and (q'_{j0}, ..., q'_{jN}),
respectively. If we put π_{jk} = p_j p_{jk}/p_k, Jensen's inequality implies

    I_α(Q'_j ‖ ℘) ≤ I_α(Q_j ‖ ℘)    (17)

by the same argument that led to (15). But, because of (16), the relations

    I_α(Q_j ‖ ℘) = lim_{t→∞} I_α(℘_j^{(n_t)} ‖ ℘) = γ    (18a)

and

    I_α(Q'_j ‖ ℘) = lim_{t→∞} I_α(℘_j^{(n_t+1)} ‖ ℘) = γ    (18b)

hold; hence there is equality in (17). Since (17) is derived from Jensen's
inequality, it follows that equality can hold only if q_{ji} = λp_i (i = 0, 1, ..., N).

Consider now a sequence of density functions p_n(x) for which

    lim_{n→∞} ∫_{−∞}^{+∞} p_n(x) log₂ (p_n(x)/φ(x)) dx = 0,    (19)

where

    φ(x) = (1/√(2π)) e^{−x²/2}    and    ∫_{−∞}^{+∞} x² p_n(x) dx = 1.
To say that the distribution with the density function p_n(x) tends to the nor-
mal distribution means, therefore, that the entropy of this distribution tends
to the entropy of the normal distribution. But we can prove that for a den-
sity function p(x) such that

    ∫_{−∞}^{+∞} x² p(x) dx = 1    (23)

the inequality

    ∫_{−∞}^{+∞} p(x) log₂ (1/p(x)) dx ≤ ∫_{−∞}^{+∞} φ(x) log₂ (1/φ(x)) dx    (24)

holds, since because of (21) and (23), (24) is equivalent to the well-known
inequality

    ∫_{−∞}^{+∞} p(x) log₂ (p(x)/φ(x)) dx ≥ 0.    (25)

The normal density thus furnishes the maxi-
mum of the entropy of all random variables with unit variance. Thus the
central limit theorem of probability theory is closely connected with the
second law of thermodynamics.
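The extremal property (24) can be checked numerically. In the following Python sketch (ours; the two competing unit-variance densities — the uniform on (−√3, √3) and the Laplace density e^{−√2|x|}/√2 — and the quadrature grid are chosen merely for illustration) both entropies come out smaller than that of φ, namely (1/2) log₂ (2πe) ≈ 2.047:

    import math

    def entropy_bits(p, a, b, n=100000):
        # numerical value of the entropy integral of the density p over [a, b]
        h = (b - a) / n
        s = 0.0
        for i in range(1, n):
            x = a + i * h
            px = p(x)
            if px > 0:
                s -= px * math.log2(px)
        return s * h

    phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    uniform = lambda x: 1 / (2 * math.sqrt(3)) if abs(x) < math.sqrt(3) else 0.0
    laplace = lambda x: math.exp(-abs(x) * math.sqrt(2)) / math.sqrt(2)

    print(entropy_bits(phi, -10, 10))      # 2.0471 = (1/2) log2(2*pi*e)
    print(entropy_bits(uniform, -2, 2))    # 1.7925
    print(entropy_bits(laplace, -15, 15))  # 1.9427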
    P(A) = Σ_{ω_n ∈ A} p_n    for every A ⊆ Ω.

Let ℬ be the set of those subsets B of Ω for which P(B) is finite and positive.
For A ⊆ Ω and B ∈ ℬ, P(A | B) is defined by

    P(A | B) = P(AB)/P(B).    (1)
If Ω_N is the set {ω₁, ω₂, ..., ω_N}, then Ω_N ∈ ℬ for N > N₀. We define the
entropy I₁(ξ) of ξ (in other words, the information contained in the value
of ξ) by

    I₁(ξ) = lim_{N→∞} I₁(ξ | Ω_N)    (3)

if this limit exists and is finite. If it does not exist, the information in ques-
tion will be characterized by the two quantities lim inf and lim sup of I₁(ξ | Ω_N).

As an example, let Ω be the set of the positive integers and put

    ε₀(n) = 1 if n is odd,  ε₀(n) = 0 if n is even.

Among the numbers 1, 2, ..., N there are N − [N/2] odd and [N/2] even ones;
hence

    I₁(ε₀(n) | Ω_N) = ((N − [N/2])/N) log₂ (N/(N − [N/2])) + ([N/2]/N) log₂ (N/[N/2]),    (4)

and by (3)

    I₁(ε₀(n)) = 1.    (5)
Take now an example for which the limit (3) does not exist. Consider again
the binary expansions of the positive integers; let [Ω, ℬ, P(A | B)] be
the same conditional probability space as in the previous example.
Let η(n) be the largest exponent of 2 in the binary expansion of n; hence

    n = Σ_{k=0}^{η(n)} ε_k(n) 2^k    with ε_{η(n)}(n) = 1.

If 2^r ≤ N < 2^{r+1}, we have

    P(η(n) = j | Ω_N) = 2^j/N    (j = 0, 1, ..., r − 1)

and

    P(η(n) = r | Ω_N) = (N − 2^r + 1)/N;

computing I₁(η(n) | Ω_N) from these expressions one finds that it oscillates
as N increases, so that the limit (3) does not exist in this case.
§ 11. Exercises
1.¹ a) How much information is contained in the licence number of a car, if this
consists of two letters and four decimal digits? (22.6)
b) How much information is needed to express the outcome of a game of "lotto",
in which 5 numbers are drawn at random from the first 90 numbers? (25.4)
c) What amount of information is needed to describe the hand of a player in bridge
(each player having 13 cards from 52)? (39.2)
d) How much information is contained in a Hollerith punch-card which has 80
columns and in each column one perforation in one of 12 possible positions? (286.8)

¹ The numbers in parentheses give the answers, in bits of information.
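All four answers are binary logarithms of the numbers of equally probable possibilities; a few lines of Python reproduce them (up to the rounding used above):

    from math import comb, log2

    print(log2(26 ** 2 * 10 ** 4))  # a) two letters, four digits: 22.69
    print(log2(comb(90, 5)))        # b) lotto: 25.39
    print(log2(comb(52, 13)))       # c) bridge hand: 39.21
    print(80 * log2(12))            # d) 80 columns, 12 positions each: 286.84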
2. a) Let some integer n (1 ≤ n ≤ 2000) be divided by 6, 10, 22, and 35 and let
the remainders be given, while we assume that the remainders are compatible. How
much information is thus given concerning the number n?

Hint. The information is equal to log₂ 2000 = 10.96 (i.e. we get full information
on n). In fact, the remainders mentioned determine n modulo the least common
multiple of 6, 10, 22 and 35, which is equal to 2310 > 2000; hence n is uniquely
determined.
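The uniqueness asserted in the hint can also be confirmed by brute force; a short Python check (bounds as in the exercise):

    from math import lcm

    print(lcm(6, 10, 22, 35))          # 2310 > 2000

    seen = {}
    for n in range(1, 2001):
        r = (n % 6, n % 10, n % 22, n % 35)
        assert r not in seen           # no two n <= 2000 share all four remainders
        seen[r] = n
    print("all 2000 remainder quadruples are distinct")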
b) Let the number n be expressed in the system of base (−2), i.e. put

    n = Σ_{k=0}^r b_k (−2)^k,

where b_k can take on the values 0 or 1 only. How much information on n is contained
in the digits b₀, b₁, ..., b_r?

Hint. Put N = Σ_{j=0}^{[(r−1)/2]} 2^{2j+1}; then

    Π_{k=0}^r (1 + x^{(−2)^k}) = Σ_{n=−N}^{2^{r+1}−N−1} x^n,

i.e. every integer n with −N ≤ n ≤ 2^{r+1} − N − 1 has exactly one such representation.
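A base (−2) expansion can be produced by repeated division with remainder, forcing each remainder into {0, 1}. A minimal Python sketch (ours), with a check of the representation:

    def to_base_minus_two(n):
        """Digits b_0, b_1, ... with n = sum of b_k * (-2)**k, each b_k in {0, 1}."""
        digits = []
        while n != 0:
            n, r = divmod(n, -2)
            if r < 0:          # force the remainder into {0, 1}
                n += 1
                r += 2
            digits.append(r)
        return digits or [0]

    for n in (-5, 0, 7, 2000):
        d = to_base_minus_two(n)
        assert n == sum(b * (-2) ** k for k, b in enumerate(d))
        print(n, d)            # e.g. 7 -> [1, 1, 0, 1, 1], since 1 - 2 - 8 + 16 = 7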
c) Let V(n) denote the number of prime factors of n counted with multiplicity
and U(n) the number of distinct prime factors of n. How much information
concerning n is contained in the value of V(n) − U(n)?

Hint. As is known,¹ for every positive integer k the asymptotic density of the
numbers n with V(n) − U(n) = k exists. Let this density be denoted by d_k; then

    Σ_{k=0}^∞ d_k z^k = Π_p (1 − 1/p)(1 + 1/(p − z))    for |z| < 2,

where p runs through all primes. Let N_k(x) denote the number of integers n smaller
than x with V(n) − U(n) = k; then

    lim_{x→+∞} N_k(x)/x = d_k.

3. a) Expand x (0 < x < 1) into the q-adic expansion

    x = Σ_{n=1}^∞ ε_n(x)/qⁿ,
where q is a positive integer ≥ 2, and ε_n(x) can take on the values 0, 1, ..., q − 1
(n = 1, 2, ...). How much information with respect to x is contained in the value
of ε_n(x)?

b) Expand x (0 < x < 1) into the Cantor series

    x = Σ_{n=1}^∞ ε_n(x)/(q₁q₂···q_n),

where q₁, q₂, ..., q_n, ... are positive integers ≥ 2, and ε_n(x) can take on the values
0, 1, ..., q_n − 1. How much information with respect to x is contained in the value
of ε_n(x)?
c) Expand x (0 < x < 1) into a regular continued fraction

    x = 1/(a₁(x) + 1/(a₂(x) + ···)),

where each a_n(x) can be an arbitrary positive integer. How much information about
x is contained in the value of a_n(x)?

Hint. Let m_n(k) denote the measure of the set of those x for which a_n(x) = k.
As is known,¹

    lim_{n→∞} m_n(k) = log₂ (1 + 1/(k(k + 2))).

Let it be remarked that, contrary to Exercises 3.a) and 3.b), the random variables a_n(x)
in this example are not independent; the total information contained in a sequence
of several digits a_n(x) is not equal to the sum of the informations contained in the
individual digits.
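The information contained in a single continued-fraction digit is the entropy of the limit distribution m(k) = log₂ (1 + 1/(k(k + 2))); a short Python computation (ours; the infinite sum is truncated at 10⁶, which is ample here) gives about 3.43 bits:

    import math

    def gauss_kuzmin(k):
        return math.log2(1 + 1 / (k * (k + 2)))

    H = total = 0.0
    for k in range(1, 10 ** 6):
        p = gauss_kuzmin(k)
        total += p
        H -= p * math.log2(p)
    print(total, H)   # total ~ 1, entropy H ~ 3.43 bits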
4. Let a differentiable function f(x) be defined in [0, A] and suppose f(0) = 0 and
|f′(x)| ≤ B. Find an upper bound for the information necessary in order to determine
the value of f(x) at every point of [0, A] with an error not exceeding ε > 0.

Hint. Put x_k = kε/B (k = 0, 1, ..., [AB/ε]) and x_{[AB/ε]+1} = A. Let the curve of
f(x) be approximated by a polygonal line y = φ(x) which has for its slope in
each of the intervals (x_k, x_{k+1}) either +B or −B. If φ(x) is already defined for
0 ≤ x ≤ x_k, then let the slope in (x_k, x_{k+1}) be so chosen that |f(x_{k+1}) − φ(x_{k+1})| ≤ ε.
Obviously, this is always possible. Since f(x) − φ(x) is in every interval (x_k, x_{k+1})
monotone, the inequality |f(x) − φ(x)| ≤ ε holds in the open intervals (x_k, x_{k+1})
(k = 0, 1, ...) as well. Clearly, the number of possible functions φ(x) is at most
2^{[AB/ε]+1}; hence [AB/ε] + 1 bits of information suffice.
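The construction of the hint is easily made concrete. The following Python sketch (ours; the test function sin x with B = 1 is an arbitrary choice) records the chosen sequence of ±B slopes — exactly the [AB/ε] + 1 bits of the estimate:

    import math

    def approximate(f, A, B, eps):
        """Greedy polygonal approximation with slopes +B or -B; returns the bit string."""
        n = int(A * B / eps)
        xs = [k * eps / B for k in range(n + 1)] + [A]
        bits, y = [], 0.0                      # phi(0) = f(0) = 0
        for x0, x1 in zip(xs, xs[1:]):
            up = y + B * (x1 - x0)
            down = y - B * (x1 - x0)
            # choose the slope keeping |f(x_{k+1}) - phi(x_{k+1})| <= eps
            if abs(f(x1) - up) <= abs(f(x1) - down):
                bits.append(1); y = up
            else:
                bits.append(0); y = down
        return bits

    bits = approximate(math.sin, A=3.0, B=1.0, eps=0.1)   # |sin'(x)| <= 1
    print(len(bits), "bits:", bits)                       # A*B/eps + 1 = 31 bits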
5. We have n apparently identical coins. One of them is false and heavier than the
others. We possess a balance with two scales but without weights. How many
weighings are necessary to find the false coin?

Hint. The amount of information needed is equal to log₂ n. Only weighings with
an equal number of coins in both scales are worth performing. Three cases
are possible: equilibrium, right scale heavier, and left scale heavier. One weighing
thus furnishes at most log₂ 3 bits of information; hence at least

    {log₂ n / log₂ 3} weighings

are needed ({x} denotes the smallest integer greater than or equal to x). It is easy to see that
this number of weighings is sufficient. In fact, let k be defined by 3^{k−1} < n ≤ 3^k.
At the first weighing we put an equal number of coins in the two scales, in such a way
that each of the three sets (left scale, right scale, coins left aside) contains at most
3^{k−1} coins; we know then to which of these three sets the false coin belongs. Proceeding
in this manner, the false coin will be found after at most k weighings.
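In Python the bound and the ternary search read as follows (a sketch of ours; the weights list and the faulty index are arbitrary test data):

    import math

    def weighings_needed(n):
        # smallest k with 3**k >= n, i.e. {log2 n / log2 3}
        return math.ceil(math.log(n, 3)) if n > 1 else 0

    def find_false_coin(weights):
        """Ternary search for the single heavier coin; returns (index, weighings used)."""
        coins, used = list(range(len(weights))), 0
        while len(coins) > 1:
            m = (len(coins) + 2) // 3            # equal pans of m coins each
            left, right, rest = coins[:m], coins[m:2 * m], coins[2 * m:]
            used += 1
            wl = sum(weights[i] for i in left)
            wr = sum(weights[i] for i in right)
            coins = left if wl > wr else right if wr > wl else rest
        return coins[0], used

    weights = [1.0] * 100
    weights[37] = 1.1
    print(weighings_needed(100), find_false_coin(weights))   # 5, (37, <= 5)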
6. a) In the Bar-Kochba game one player thinks of a number x (0 ≤ x ≤ N − 1);
the other has to find it out by questions answered by "yes" or "no". How many
questions are needed?

Hint. Obviously, at least {log₂ N} questions are needed, since each answer provides
at most one bit of information and we need log₂ N bits. An optimal system of questions
is to ask whether in the binary representation of the number x the first, the second, ...
digit is 0. The aim is achieved by {log₂ N} questions, since the binary representation
of an integer is unique.
b) Suppose N = 2^s. How many optimal systems of questions exist? That is:
how many systems of exactly s questions determine x, whatever it may be?

Hint. The number of the possible sequences of answers to s questions is evidently 2^s.
There corresponds thus to every integer x (x = 0, 1, ..., 2^s − 1) in a one-to-one
manner a sequence of s yes-or-no answers. Every question can be put in the following
form: does x belong to a subset A of the sequence 0, 1, ..., 2^s − 1? Thus to an
optimal sequence of s questions there correspond s subsets of the set M = {0, 1, ...,
2^s − 1}; let these be denoted by A₁, A₂, ..., A_s. According to what has been said,
A₁ has to contain exactly 2^{s−1} elements. Let Ā always denote the set complementary
to A with respect to M. Then A₁A₂ and Ā₁A₂ have to contain both 2^{s−2} elements;
A₁A₂A₃, Ā₁A₂A₃, A₁Ā₂A₃, and Ā₁Ā₂A₃ have to contain 2^{s−3} elements, and so on.
Conversely, if all sets Ã₁Ã₂···Ã_{k−1}A_k contain exactly 2^{s−k} elements (k = 1, 2, ..., s),
where Ã means either A or Ā, then the system of sets A₁, A₂, ..., A_s is optimal.
It follows from this that the number of optimal sequences of questions is (2^s)!.
If we regard the systems of questions which differ only in the order of questions as
identical, then the number looked for is (2^s)!/s!.
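For s = 2 the count (2^s)!/s! = 4!/2! = 12 can be verified by exhaustive search over all pairs of subsets; a minimal Python sketch (ours):

    from itertools import combinations

    s = 2
    M = range(2 ** s)
    subsets = [set(c) for r in range(2 ** s + 1) for c in combinations(M, r)]

    def optimal(system):
        # a system is optimal iff the answer patterns separate all elements of M
        patterns = {tuple(x in A for A in system) for x in M}
        return len(patterns) == 2 ** s

    count = sum(optimal(pair) for pair in combinations(subsets, s))
    print(count)   # 12 = (2**s)! / s!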
Remark. In the Bar-Kochba game the questions are, in general, formulated while
taking into account the answers already obtained. (In the language of set theory:
if the first answers have shown that the object belongs to a subset A of the set M of
all possible objects, then the next question is whether it belongs to some subset B
of the set A.) It follows from what has been said that the questioner suffers no
disadvantage by being obliged to put his questions simultaneously.
7. Suppose that in the Bar-Kochba game two players agree that the objects
allowed to be thought of are the n elements of a given set M. Suppose that the
questions are asked at random, or in other words, all possible questions have the
same probability, independently of the answers already obtained.

a) What is the probability that the questioner finds out the object by k questions?

b) Find the limit of the probability obtained in a) as n and k both tend to +∞ such
that

    lim (k − log₂ n) = c.

Hints. We may suppose that the elements of the set M are the numbers 1, 2, ..., n.
Each possible question is equivalent to asking whether the number thought of does
belong to a certain subset of M. The number of possible questions is thus equal to
the number of subsets of M, i.e. to 2ⁿ. (For the sake of simplicity there are included the
two trivial questions corresponding to the whole set and the empty set.) Let
A₁, A₂, ..., A_k be the sets chosen at random by the questioner, i.e. he asks whether
the number thought of does belong to these sets. By assumption, each of the sets
contains any given number with probability 1/2, independently of the others; thus,
putting ε_i(y) = 1 if y ∈ A_i and ε_i(y) = 0 otherwise, the variables ε_i(y) are independent of
each other and each takes on the values 0 and 1 with probability 1/2. The questioner
finds the number x when the sequence ε₁(x), ε₂(x), ..., ε_k(x) is different
from all sequences ε₁(y), ε₂(y), ..., ε_k(y) with y ≠ x. The sequences ε₁(y), ε₂(y), ..., ε_k(y)
are, with probability 2^{−k} and independently of each other, equal to any given
0–1 sequence of length k. We are thus led to an urn problem: n balls, one of them
red, are placed at random into 2^k urns; what is the probability
that the red ball is alone in an urn? The answer is evidently P_{n,k} = (1 − 1/2^k)^{n−1}.
Remarks. 1. For k = log₂ n + c the probability of success is approximately
exp(−2^{−c}); thus already k = log₂ n + 7 questions find the number with probability
greater than 0.99 even when the random strategy
is employed. In fact, exp(−2^{−7}) > 0.99. This result is surprising, since one would
be inclined to guess that the random strategy is much less advantageous than the
optimal strategy.

2. When the questions are asked at random it may happen that the same question
occurs twice. But the corresponding probability is so small, if n is large, that it is
not worth while to exclude this possibility, though of course this would slightly increase
the chances of success.
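The probability P_{n,k} = (1 − 2^{−k})^{n−1} and its limit exp(−2^{−c}) are immediate to tabulate; a small Python sketch (ours; n = 10⁶ chosen for illustration):

    import math

    def p_success(n, k):
        return (1 - 2.0 ** (-k)) ** (n - 1)

    n = 10 ** 6
    for c in (0, 3, 7):
        k = math.ceil(math.log2(n)) + c
        print(c, k, p_success(n, k), math.exp(-2.0 ** (-c)))
    # already c = 7 (27 questions for n = 10**6) gives success probability > 0.99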
8. Certain players play the Bar-Kochba game in the following manner: there are
r + 1 players; r players think of some object, the last player asks them questions.
The same question is addressed to every player, who answers by "yes" or "no",
according to what is true concerning the object he had thought of.

a) Each of the players thinks of one of the numbers 1, 2, ..., n (n ≥ r), but each
of a different number. The questions are asked at random, as in the preceding
exercise. What is the probability that the questioner finds all numbers by k questions?

b) n = r and the players agree to think each of a different number of the sequence
1, 2, ..., n; hence it is a permutation of the numbers which is to be found. What is the
probability that the questioner finds the permutation by k questions? Calculate ap-
proximately this probability for k = 2 log₂ n + c.

Hints. a) We are led to the following urn problem: we put n balls into 2^k urns,
independently of each other, each ball having the same probability of getting into
any one of the urns. Among the n balls there are r red balls, the others are white.
What is the probability that all the red balls get into different urns? This proba-
bility is

    Π_{j=1}^{r−1} (1 − j/2^k).
9. Let f(n) be a completely additive number-theoretical function, i.e. let

    f(nm) = f(n) + f(m)

for all pairs of integers n and m, and suppose that condition (B):

    lim_{n→∞} (f(n + 1) − f(n)) = 0

is fulfilled. Show that then f(n) = c log n.

Hint. Putting, for a fixed integer P ≥ 2,

    g(n) = f(n) − (f(P)/log P) log n,

one shows that

    lim_{P→∞} lim sup_{n→∞} |g(n)|/log n = 0,

hence the limit

    lim_{n→∞} f(n)/log n = c

exists. If we put now h(n) = f(n) − c log n, then h(n) is a completely additive function
for which

    lim_{n→∞} h(n)/log n = 0.    (1)

But this implies h(n) = 0, since otherwise there would exist an integer r with h(r) ≠ 0
and thus, because of the additivity, h(r^k) = k h(r) for k = 1, 2, ..., which contra-
dicts (1).

Remark. This problem is due to K. L. Chung, but his proof differs from ours.
If instead of the complete additivity only simple additivity is required, i.e. that
f(nm) = f(n) + f(m) if (n, m) = 1, then the condition (B) does not imply f(n) =
= c log n. (The last step of the proof cannot be carried out in this case.)
10. Let ℘ = (p₁, p₂, ...) be a probability distribution on the positive integers for
which

    Σ_{k=1}^∞ k p_k = A > 1.

Show that the entropy

    I₁(℘) = Σ_{k=1}^∞ p_k log₂ (1/p_k)

is then largest when ℘ is the geometric distribution p_k = (1/A)(1 − 1/A)^{k−1}
(k = 1, 2, ...).
11. Let ℘ and Q be two distributions, absolutely continuous with respect to Lebesgue
measure, with density functions p(x) and q(x), and further let Q be absolutely con-
tinuous with respect to ℘. It follows from Theorem 2 of § 8 that the gain of information
is nonnegative in this case too, i.e. we have the inequalities

    ∫_{−∞}^{+∞} q(x) log₂ (q(x)/p(x)) dx ≥ 0

and

    (1/(α−1)) log₂ ∫_{−∞}^{+∞} q(x)^α/p(x)^{α−1} dx ≥ 0.

Prove these inequalities directly (without passing to the limit) by Jensen's inequality
generalized for functions, i.e. by inequality (15) of § 8.
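A direct numerical check of the second inequality for a concrete pair of densities (a Python sketch of ours; the pair N(0, 1), N(1, 1/2) and the quadrature grid are arbitrary choices):

    import math

    def I_alpha(q, p, alpha, a=-12.0, b=12.0, n=20000):
        # the gain of information of order alpha, formula (37b) of § 8
        h = (b - a) / n
        s = sum(q(a + i * h) ** alpha * p(a + i * h) ** (1 - alpha) * h
                for i in range(n + 1) if p(a + i * h) > 0)
        return math.log2(s) / (alpha - 1)

    p = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)     # N(0, 1)
    q = lambda x: math.exp(-(x - 1) ** 2) / math.sqrt(math.pi)      # N(1, 1/2)

    for alpha in (0.5, 2.0, 3.0):
        print(alpha, I_alpha(q, p, alpha))   # all printed values are nonnegative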
12. a) Let f(x) be a density function vanishing outside the interval (a, b); show
that its entropy is largest when f(x) is the uniform density on (a, b). b) Let f(x) be
a density function vanishing for x < 0 such that

    ∫_0^∞ x f(x) dx = A,

and put

    g(x) = (1/A) exp(−x/A) for x > 0, g(x) = 0 otherwise;

show that the entropy of f(x) is not greater than that of g(x).

Hint. For a), let f(x) be a density function which vanishes outside (a, b) and put

    g(x) = 1/(b − a) for a < x < b, g(x) = 0 otherwise.

We have then

    ∫_a^b f(x) log₂ (1/g(x)) dx = log₂ (b − a) = ∫_a^b g(x) log₂ (1/g(x)) dx,

and the statement follows from the inequality of Exercise 11; b) can be treated similarly.
15. The relative information I(ξ, η) contained in the value of η concerning ξ (or
conversely) can be defined, when the pair (ξ, η) has an absolutely continuous distri-
bution, by

    I(ξ, η) = I_{1,1}(ξ) + I_{1,1}(η) − I_{1,2}((ξ, η)).

Show that if ℛ is the joint distribution of ξ and η and ℘ * Q denotes the direct
product of the distributions ℘ and Q of ξ and η, then Formula (11) of § 4 remains
valid; i.e. I(ξ, η) is equal to the gain of information obtained by replacing ℘ * Q
by ℛ.

Hint. If h(x, y) is the density function of the pair (ξ, η), and f(x) and g(y) are the
density functions of ξ and η, then we have

    I(ξ, η) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} h(x, y) log₂ (h(x, y)/(f(x) g(y))) dx dy.

It follows that

    I(ξ, η) = I₁(ℛ ‖ ℘ * Q),

because of Formula (38) of § 8.
16. In the following exercises we always use natural (Napier's) logarithms ln.

a) Calculate the entropy (of dimension 1 and of order 1) of the normal distribution;
i.e. show that

    ∫_{−∞}^{+∞} φ(x) ln (1/φ(x)) dx = ln (√(2πe) σ),

where

    φ(x) = (1/(√(2π) σ)) exp (−(x − m)²/(2σ²)).
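Numerically, in Python (a sketch of ours; m and σ chosen arbitrarily):

    import math

    m, sigma = 2.0, 3.0
    phi = lambda x: math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    a, b, n = m - 12 * sigma, m + 12 * sigma, 400000
    h = (b - a) / n
    H = -sum(phi(a + i * h) * math.log(phi(a + i * h)) * h for i in range(1, n))
    print(H, math.log(math.sqrt(2 * math.pi * math.e) * sigma))   # both ~ 2.5176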
b) Calculate the entropy (of dimension r and of order 1) of the r-dimensional normal
distribution.

Hint. Let the r-dimensional density function of the random variables ξ₁, ξ₂, ..., ξ_r
be

    f(x₁, ..., x_r) = (√‖B‖/(2π)^{r/2}) exp (−(1/2) Σ_{j=1}^r Σ_{i=1}^r b_{ij}(x_i − m_i)(x_j − m_j)),

where ‖B‖ denotes the determinant of the positive definite matrix B = (b_{ij}).
By a suitable orthogonal transformation

    η_k = Σ_{i=1}^r c_{ki}(ξ_i − m_i)

we obtain for the density function of the random variables η₁, η₂, ..., η_r

    (1/((2π)^{r/2} σ₁σ₂···σ_r)) exp (−Σ_{k=1}^r y_k²/(2σ_k²)),

with σ₁σ₂···σ_r = ‖B‖^{−1/2}. According to Exercise 13, the entropy is invariant under
such a transformation, since the absolute value of the determinant of an orthogonal
transformation is equal to 1. Hence, according to Exercise 14, the entropy is

    Σ_{k=1}^r ln (√(2πe) σ_k) = (r/2) ln (2πe) − (1/2) ln ‖B‖.
c) Let the pair (ξ, η) have a normal distribution with density function proportional
to exp (−(Ax² + 2Bxy + Cy²)/2), where AC − B² > 0; calculate I(ξ, η).

Hint. We find the desired information by subtracting from the sum of the informa-
tions contained in ξ and in η the information contained in the distribution of the
pair (ξ, η). Hence

    I(ξ, η) = ln √(2πe C/(AC − B²)) + ln √(2πe A/(AC − B²)) − ln (2πe/√(AC − B²))
            = (1/2) ln (AC/(AC − B²)).
1 [a — 1 «+i x^ \ ^
2 a_1 1 - — 1_a for | x | <c,
f*(x) = 7 \r(( a —“ 1
otherwise,
3a— 1
where we have put c — a
a - 1
Hint. Put
-f 00
(x — m)2
m = M( C) , „■ = / (x — m)2f(x) dx, (f(x) exp -
2a2
V 2na
We have then
+ 00 + 00
which implies a), b) can be proved in the same fashion. Let it be noticed that /a(x)
tends to
7 —exp
2jig (" +)
as a —> 1.
18. Let f(x) and f_n(x) be density functions such that f_n(x) = 0 (n = 1, 2, ...) for
every value of x for which f(x) = 0; suppose further that all integrals

    ∫_{−∞}^{+∞} f_n²(x)/f(x) dx    (n = 1, 2, ...)

exist and

    lim_{n→∞} ∫_{−∞}^{+∞} f_n²(x)/f(x) dx = 1.

Show that then

    lim_{n→∞} sup_E |∫_E f_n(x) dx − ∫_E f(x) dx| = 0,

where E runs through all measurable subsets of the set of real numbers.

Hint. By the Cauchy–Schwarz inequality

    (∫_E (f_n(x) − f(x)) dx)² ≤ ∫_{−∞}^{+∞} (f_n(x) − f(x))²/f(x) dx,

and clearly

    ∫_{−∞}^{+∞} (f_n(x) − f(x))²/f(x) dx = ∫_{−∞}^{+∞} f_n²(x)/f(x) dx − 1.
TABLES

Table 1
n, n!, log n!

Table 2
Binomial coefficients

n\k     0    1    2    3    4    5    6    7    8
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
7 1 7 21 35 35 21 7 1
8 1 8 28 56 70 56 28 8 1
9 1 9 36 84 126 126 84 36 9
10 1 10 45 120 210 252 210 120 45
11 1 11 55 165 330 462 462 330 165
12 1 12 66 220 495 792 924 792 495
13 1 13 78 286 715 1287 1716 1716 1287
14 1 14 91 364 1001 2002 3003 3432 3003
15 1 15 105 455 1365 3003 5005 6435 6435
16 1 16 120 560 1820 4368 8008 11440 12870
17 1 17 136 680 2380 6188 12376 19448 24310
18 1 18 153 816 3060 8568 18564 31824 43758
19 1 19 171 969 3876 11628 27132 50388 75582
20 1 20 190 1140 4845 15504 38760 77520 125970
21 1 21 210 1330 5985 20349 54264 116280 203490
22 1 22 231 1540 7315 26334 74613 170544 319770
23 1 23 253 1771 8855 33649 100947 245157 490314
24 1 24 276 2024 10626 42504 134596 346104 735471
25 1 25 300 2300 12650 53130 177100 480700 1081575
26 1 26 325 2600 14950 65780 230230 657800 1562275
27 1 27 351 2925 17550 80730 296010 888030 2220075
28 1 28 378 3276 20475 98280 376740 1184040 3108105
29 1 29 406 3654 23751 118755 475020 1560780 4292145
30 1 30 435 4060 27405 142506 593775 2035800 5852925
¹ For n > 15 values are given for k ≤ [n/2] only; the further values can be obtained
from the relation C(n, k) = C(n, n − k).

Table 2 (continued)

n\k     9    10    11    12    13    14    15
 2
 3
 4
 5
 6
 7
 8
 9        1
10       10        1
11       55       11        1
12      220       66       12        1
13      715      286       78       13        1
14     2002     1001      364       91       14        1
15     5005     3003     1365      455      105       15        1
16    11440     8008     4368     1820      560      120       16
17    24310    19448    12376     6188     2380      680      136
18    48620    43758    31824    18564     8568     3060      816
19    92378    92378    75582    50388    27132    11628     3876
20   167960   184756   167960   125970    77520    38760    15504
21   293930   352716   352716   293930   203490   116280    54264
22   497420   646646   705432   646646   497420   319770   170544
23   817190  1144066  1352078  1352078  1144066   817190   490314
24  1307504  1961256  2496144  2704156  2496144  1961256  1307504
25  2042975  3268760  4457400  5200300  5200300  4457400  3268760
26  3124550  5311735  7726160  9657700 10400600  9657700  7726160
27  4686825  8436285 13037895 17383860 20058300 20058300 17383860
28  6906900 13123110 21474180 30421755 37442160 40116600 37442160
29 10015005 20030010 34597290 51895935 67863915 77558760 77558760
30 14307150 30045015 54627300 86493225 119759850 145422675 155117520
Table 3
The Poisson probabilities λ^k e^{−λ}/k!

 k    λ=11      λ=12      λ=13      λ=14      λ=15
 0   0.00002   0.00001
 1   0.00018   0.00007   0.00003   0.00001
 2   0.00101   0.00044   0.00019   0.00008   0.00003
 3   0.00370   0.00177   0.00082   0.00038   0.00017
 4   0.01018   0.00530   0.00269   0.00133   0.00064
 5   0.02241   0.01274   0.00699   0.00373   0.00193
 6   0.04109   0.02548   0.01515   0.00869   0.00483
 7   0.06457   0.04368   0.02814   0.01739   0.01037
 8   0.08879   0.06552   0.04573   0.03043   0.01944
 9   0.10853   0.08736   0.06605   0.04734   0.03240
10   0.11938   0.10484   0.08587   0.06628   0.04861
11   0.11938   0.11437   0.10148   0.08435   0.06628
12   0.10943   0.11437   0.10994   0.09841   0.08285
13   0.09259   0.10557   0.10994   0.10599   0.09560
14   0.07275   0.09048   0.10209   0.10599   0.10244
15   0.05335   0.07239   0.08847   0.09892   0.10244
16   0.03668   0.05429   0.07188   0.08655   0.09603
17   0.02373   0.03832   0.05497   0.07128   0.08473
18   0.01450   0.02555   0.03970   0.05544   0.07061
19   0.00839   0.01613   0.02716   0.04085   0.05574
20   0.00461   0.00968   0.01765   0.02859   0.04181
21   0.00241   0.00553   0.01093   0.01906   0.02986
22   0.00121   0.00301   0.00645   0.01213   0.02036
23   0.00057   0.00157   0.00365   0.00738   0.01328
24   0.00026   0.00078   0.00197   0.00430   0.00830
25   0.00011   0.00037   0.00102   0.00241   0.00498
26   0.00004   0.00017   0.00051   0.00129   0.00287
27   0.00002   0.00007   0.00024   0.00067   0.00159
28             0.00003   0.00011   0.00033   0.00085
29             0.00001   0.00005   0.00016   0.00044
30                       0.00002   0.00007   0.00022
31                                 0.00003   0.00010
32                                 0.00001   0.00005
33                                           0.00002
34                                           0.00001
Table 3 (continued)

 k    λ=17      λ=18      λ=19      λ=20
 3   0.00003
 4   0.00014   0.00006   0.00003   0.00001
 5   0.00049   0.00024   0.00011   0.00005
 6   0.00138   0.00071   0.00036   0.00018
 7   0.00337   0.00185   0.00099   0.00052
 8   0.00716   0.00416   0.00236   0.00130
 9   0.01352   0.00832   0.00498   0.00290
10   0.02300   0.01498   0.00946   0.00581
11   0.03554   0.02452   0.01635   0.01057
12   0.05035   0.03678   0.02588   0.01762
13   0.06584   0.05092   0.03783   0.02711
14   0.07996   0.06548   0.05135   0.03874
15   0.09062   0.07857   0.06504   0.05165
16   0.09628   0.08839   0.07724   0.06456
17   0.09628   0.09359   0.08632   0.07595
18   0.09093   0.09359   0.09112   0.08439
19   0.08136   0.08867   0.09112   0.08883
20   0.06915   0.07980   0.08656   0.08883
21   0.05598   0.06840   0.07832   0.08460
22   0.04326   0.05596   0.06764   0.07691
23   0.03197   0.04380   0.05587   0.06688
24   0.02265   0.03285   0.04423   0.05573
25   0.01540   0.02365   0.03362   0.04458
26   0.01007   0.01637   0.02456   0.03429
27   0.00634   0.01091   0.01728   0.02540
28   0.00385   0.00701   0.01173   0.01814
29   0.00225   0.00435   0.00768   0.01251
30   0.00127   0.00261   0.00486   0.00834
31   0.00070   0.00151   0.00298   0.00538
32   0.00037   0.00085   0.00177   0.00336
33   0.00019   0.00046   0.00102   0.00203
34   0.00009   0.00024   0.00057   0.00119
35   0.00004   0.00012   0.00030   0.00068
36   0.00002   0.00006   0.00016   0.00038
37   0.00001   0.00003   0.00008   0.00020
38             0.00001   0.00004   0.00010
39                       0.00002   0.00005
40                                 0.00002
41                                 0.00001
Table 4

Γ_n(λ) = (1/(n − 1)!) ∫_0^λ t^{n−1} e^{−t} dt = Σ_{k=n}^∞ λ^k e^{−λ}/k!    (n = 1, 2, ...)

 n    λ=7.0    λ=7.5    λ=8.0    λ=8.5
1 0.99909 0.99945 0.99966 0.99980
2 99271 99530 99698 99807
3 97036 97975 98625 99072
4 91823 94085 95762 96989
5 82701 86794 90037 92564
6 69929 75856 80876 85040
7 55029 62184 68663 74382
8 40129 47536 54704 61440
9 27091 33803 40745 47689
10 16950 22359 28338 34703
11 09852 13776 18412 23664
12 05335 07924 11192 15134
13 02700 04267 06380 09092
14 01281 02157 03418 05141
15 00572 01026 01726 02743
16 00241 00461 00823 01383
17 00096 00196 00372 00661
18 00036 00079 00160 00300
19 00013 00031 00065 00130
20 00005 00011 00025 00054
21 00004 00010 00020
22 00001 00003 00008
23 00001 00003
24 00001
Table 5

The function φ(x) = (1/√(2π)) e^{−x²/2}

 x     φ(x)      x     φ(x)      x     φ(x)      x     φ(x)
0.00   0.3989
0.01 0.3989 0.41 0.3668 0.81 0.2874 1.21 0.1919
0.02 0.3989 0.42 0.3653 0.82 0.2850 1.22 0.1895
0.03 0.3988 0.43 0.3637 0.83 0.2827 1.23 0.1872
0.04 0.3986 0.44 0.3621 0.84 0.2803 1.24 0.1849
0.05 0.3984 0.45 0.3605 0.85 0.2780 1.25 0.1826
0.06 0.3982 0.46 0.3589 0.86 0.2756 1.26 0.1804
0.07 0.3980 0.47 0.3572 0.87 0.2732 1.27 0.1781
0.08 0.3977 0.48 0.3555 0.88 0.2709 1.28 0.1758
0.09 0.3973 0.49 0.3538 0.89 0.2685 1.29 0.1736
0.10 0.3970 0.50 0.3521 0.90 0.2661 1.30 0.1714
0.11 0.3965 0.51 0.3503 0.91 0.2637 1.31 0.1691
0.12 0.3961 0.52 0.3485 0.92 0.2613 1.32 0.1669
0.13 0.3956 0.53 0.3467 0.93 0.2589 1.33 0.1647
0.14 0.3951 0.54 0.3448 0.94 0.2565 1.34 0.1626
0.15 0.3945 0.55 0.3429 0.95 0.2541 1.35 0.1604
0.16 0.3939 0.56 0.3410 0.96 0.2516 1.36 0.1582
0.17 0.3932 0.57 0.3391 0.97 0.2492 1.37 0.1561
0.18 0.3925 0.58 0.3372 0.98 0.2468 1.38 0.1539
0.19 0.3918 0.59 0.3352 0.99 0.2444 1.39 0.1518
0.20 0.3910 0.60 0.3332 1.00 0.2420 1.40 0.1497
0.21 0.3902 0.61 0.3312 1.01 0.2396 1.41 0.1476
0.22 0.3894 0.62 0.3292 1.02 0.2371 1.42 0.1456
0.23 0.3885 0.63 0.3271 1.03 0.2347 1.43 0.1435
0.24 0.3876 0.64 0.3251 1.04 0.2323 1.44 0.1415
0.25 0.3867 0.65 0.3230 1.05 0.2299 1.45 0.1394
0.26 0.3857 0.66 0.3209 1.06 0.2275 1.46 0.1374
0.27 0.3847 0.67 0.3187 1.07 0.2251 1.47 0.1354
0.28 0.3836 0.68 0.3166 1.08 0.2227 1.48 0.1334
0.29 0.3825 0.69 0.3144 1.09 0.2203 1.49 0.1315
0.30 0.3814 0.70 0.3123 1.10 0.2179 1.50 0.1295
0.31 0.3802 0.71 0.3101 1.11 0.2155 1.51 0.1276
0.32 0.3790 0.72 0.3079 1.12 0.2131 1.52 0.1257
0.33 0.3778 0.73 0.3056 1.13 0.2107 1.53 0.1238
0.34 0.3765 0.74 0.3034 1.14 0.2083 1.54 0.1219
0.35 0.3752 0.75 0.3011 1.15 0.2059 1.55 0.1200
0.36 0.3739 0.76 0.2989 1.16 0.2036 1.56 0.1182
0.37 0.3725 0.77 0.2966 1.17 0.2012 1.57 0.1163
0.38 0.3712 0.78 0.2943 1.18 0.1989 1.58 0.1145
0.39 0.3697 0.79 0.2920 1.19 0.1965 1.59 0.1127
0.40 0.3683 0.80 0.2897 1.20 0.1942 1.60 0.1109
Table 6

The normal distribution function Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du

 x     Φ(x)      x     Φ(x)      x     Φ(x)      x     Φ(x)
0.00   0.5000
0.01 0.5040 0.41 0.6591 0.81 0.7910 1.21 0.8869
0.02 0.5080 0.42 0.6628 0.82 0.7939 1.22 0.8888
0.03 0.5120 0.43 0.6664 0.83 0.7967 1.23 0.8907
0.04 0.5160 0.44 0.6700 0.84 0.7995 1.24 0.8925
0.05 0.5199 0.45 0.6736 0.85 0.8023 1.25 0.8944
0.06 0.5239 0.46 0.6772 0.86 0.8051 1.26 0.8962
0.07 0.5279 0.47 0.6808 0.87 0.8078 1.27 0.8980
0.08 0.5319 0.48 0.6844 0.88 0.8106 1.28 0.8997
0.09 0.5359 0.49 0.6879 0.89 0.8133 1.29 0.9015
0.10 0.5398 0.50 0.6915 0.90 0.8159 1.30 0.9032
0.11 0.5438 0.51 0.6950 0.91 0.8186 1.31 0.9049
0.12 0.5478 0.52 0.6985 0.92 0.8212 1.32 0.9066
0.13 0.5517 0.53 0.7019 0.93 0.8238 1.33 0.9082
0.14 0.5557 0.54 0.7054 0.94 0.8264 1.34 0.9099
0.15 0.5596 0.55 0.7088 0.95 0.8289 1.35 0.9115
0.16 0.5636 0.56 0.7123 0.96 0.8315 1.36 0.9131
0.17 0.5675 0.57 0.7157 0.97 0.8340 1.37 0.9147
0.18 0.5714 0.58 0.7190 0.98 0.8365 1.38 0.9162
0.19 0.5753 0.59 0.7224 0.99 0.8389 1.39 0.9177
0.20 0.5793 0.60 0.7257 1.00 0.8413 1.40 0.9192
0.21 0.5832 0.61 0.7291 1.01 0.8438 1.41 0.9207
0.22 0.5871 0.62 0.7324 1.02 0.8461 1.42 0.9222
0.23 0.5910 0.63 0.7357 1.03 0.8485 1.43 0.9236
0.24 0.5948 0.64 0.7389 1.04 0.8508 1.44 0.9251
0.25 0.5987 0.65 0.7422 1.05 0.8531 1.45 0.9265
0.26 0.6026 0.66 0.7454 1.06 0.8554 1.46 0.9279
0.27 0.6064 0.67 0.7486 1.07 0.8577 1.47 0.9292
0.28 0.6103 0.68 0.7517 1.08 0.8599 1.48 0.9306
0.29 0.6141 0.69 0.7549 1.09 0.8621 1.49 0.9319
0.30 0.6179 0.70 0.7580 1.10 0.8643 1.50 0.9332
0.31 0.6217 0.71 0.7611 1.11 0.8665 1.51 0.9345
0.32 0.6255 0.72 0.7642 1.12 0.8686 1.52 0.9357
0.33 0.6293 0.73 0.7673 1.13 0.8708 1.53 0.9370
0.34 0.6331 0.74 0.7703 1.14 0.8729 1.54 0.9382
0.35 0.6368 0.75 0.7734 1.15 0.8749 1.55 0.9394
0.36 0.6406 0.76 0.7764 1.16 0.8770 1.56 0.9406
0.37 0.6443 0.77 0.7794 1.17 0.8790 1.57 0.9418
0.38 0.6480 0.78 0.7823 1.18 0.8810 1.58 0.9429
0.39 0.6517 0.79 0.7853 1.19 0.8830 1.59 0.9441
0.40 0.6554 0.80 0.7881 1.20 0.8849 1.60 0.9452
Table 6 (continued)

Table 7
The values of 100 P_n(c)
Table 8

 z     K(z)        z     K(z)        z     K(z)
0.28  0.000001   0.71  0.305471   1.14  0.851394
0.29  0.000004   0.72  0.322265   1.15  0.858038
0.30  0.000009   0.73  0.339113   1.16  0.864442
0.31  0.000021   0.74  0.355981   1.17  0.870612
0.32  0.000046   0.75  0.372833   1.18  0.876548
0.33  0.000091   0.76  0.389640   1.19  0.882258
0.34  0.000171   0.77  0.406372   1.20  0.887750
0.35  0.000303   0.78  0.423002   1.21  0.893030
0.36  0.000511   0.79  0.439505   1.22  0.898104
0.37  0.000826   0.80  0.455857   1.23  0.902972
0.38  0.001285   0.81  0.472041   1.24  0.907648
0.39  0.001929   0.82  0.488030   1.25  0.912132
0.40  0.002808   0.83  0.503808   1.26  0.916432
0.41  0.003972   0.84  0.519366   1.27  0.920556
0.42  0.005476   0.85  0.534682   1.28  0.924505
0.43  0.007377   0.86  0.549744   1.29  0.928288
0.44  0.009730   0.87  0.564546   1.30  0.931908
0.45  0.012590   0.88  0.579070   1.31  0.935370
0.46  0.016005   0.89  0.593316   1.32  0.938682
0.47  0.020022   0.90  0.607270   1.33  0.941848
0.48  0.024683   0.91  0.620928   1.34  0.944872
0.49  0.030017   0.92  0.634286   1.35  0.947756
0.50  0.036055   0.93  0.647338   1.36  0.950512
0.51  0.042814   0.94  0.660082   1.37  0.953142
0.52  0.050306   0.95  0.672516   1.38  0.955650
0.53  0.058534   0.96  0.684636   1.39  0.958040
0.54  0.067497   0.97  0.696444   1.40  0.960318
0.55  0.077183   0.98  0.707940   1.41  0.962486
0.56  0.087577   0.99  0.719126   1.42  0.964552
0.57  0.098656   1.00  0.730000   1.43  0.966516
0.58  0.110395   1.01  0.740566   1.44  0.968382
0.59  0.122760   1.02  0.750826   1.45  0.970158
0.60  0.135718   1.03  0.760780   1.46  0.971846
0.61  0.149223   1.04  0.770434   1.47  0.973448
0.62  0.163225   1.05  0.779794   1.48  0.974970
0.63  0.177753   1.06  0.788860   1.49  0.976412
0.64  0.192677   1.07  0.797636   1.50  0.977782
0.65  0.207987   1.08  0.806128   1.51  0.979080
0.66  0.223637   1.09  0.814342   1.52  0.980310
0.67  0.239582   1.10  0.822282   1.53  0.981476
0.68  0.255780   1.11  0.829950   1.54  0.982578
0.69  0.272189   1.12  0.837356   1.55  0.983622
0.70  0.288765   1.13  0.844502   1.56  0.984610
Table 9

The values of (4/π) Σ_{k=0}^∞ ((−1)^k/(2k+1)) exp (−(2k+1)²π²(1−a)/(8az²))

z\a    0.01    0.02    0.03    0.04    0.05    0.06    0.07
 0.1
 0.5
 1.0                                           0.0000  0.0000
 1.5                   0.0000  0.0000  0.0000  0.0002  0.0009
 2.0           0.0000  0.0001  0.0008  0.0036  0.0101  0.0212
 2.5           0.0001  0.0022  0.0112  0.0299  0.0578  0.0925
 3.0   0.0000  0.0015  0.0151  0.0474  0.0941  0.1487  0.2061
 3.5   0.0001  0.0092  0.0491  0.1136  0.1879  0.2628  0.3341
 4.0   0.0006  0.0291  0.1052  0.2001  0.2942  0.3804  0.4571
4.5 0.0031 0.0643 0.1776 0.2951 0.4001 0.4901 0.5665
5.0 0.0096 0.1134 0.2582 0.3895 0.4985 0.5873 0.6598
5.5 0.0225 0.1726 0.3406 0.4784 0.5863 0.6707 0.7374
6.0 0.0428 0.2375 0.4204 0.5591 0.6627 0.7409 0.8005
6.5 0.0707 0.3045 0.4952 0.6310 0.7282 0.7989 0.8509
7.0 0.1053 0.3708 0.5638 0.6940 0.7834 0.8461 0.8904
7.5 0.1452 0.4347 0.6258 0.7484 0.8294 0.8838 0.9207
8.0 0.1889 0.4959 0.6811 0.7951 0.8671 0.9135 0.9436
8.5 0.2348 0.5513 0.7301 0.8345 0.8977 0.9365 0.9606
9.0 0.2819 0.6031 0.7731 0.8676 0.9221 0.9540 0.9729
9.5 0.3290 0.6506 0.8104 0.8950 0.9414 0.9672 0.9817
10.0 0.3754 0.6938 0.8427 0.9175 0.9564 0.9770 0.9878
11.0 0.4640 0.7678 0.8939 0.9505 0.9768 0.9891 0.9949
12.0 0.5450 0.8270 0.9303 0.9714 0.9882 0.9951 0.9980
13.0 0.6174 0.8734 0.9555 0.9841 0.9943 0.9980 0.9993
14.0 0.6812 0.9090 0.9724 0.9915 0.9974 0.9992 0.9998
15.0 0.7367 0.9358 0.9833 0.9956 0.9988 0.9997 0.9999
16.0 0.7844 0.9555 0.9902 0.9978 0.9995 0.9999 1.0000
17.0 0.8249 0.9697 0.9944 0.9990 0.9998 1.0000
18.0 0.8591 0.9797 0.9969 0.9995 0.9999
19.0 0.8876 0.9867 0.9983 0.9998 1.0000
20.0 0.9112 0.9915 0.9991 0.9999
21.0 0.9304 0.9946 0.9996 1.0000
22.0 0.9459 0.9967 0.9998
23.0 0.9584 0.9980 0.9999
24.0 0.9683 0.9988 1.0000
25.0 0.9760 0.9993
30.0 0.9949 1.0000
35.0 0.9991
40.0 0.9999
43.0 1.0000
REMARKS AND BIBLIOGRAPHICAL NOTES
These notes wish to call attention to books and papers which may be useful to the
reader for further study of subjects dealt with in the present textbook, including
books and papers to which reference was made in the text. For topics which are
treated in detail in some current textbook, we mention only such books, where the
reader can find further references.
As regards topics not discussed in standard textbooks, the sources of the material
contained in this book are given in greater detail. These bibliographical notes
often contain remarks on the historical development of the problems dealt with, but
to give a full account of the history of probability theory was of course impossible.
For the history of Probability Calculus up to Laplace see Todhunter [1].
Concerning less-known theorems or methods from other branches of mathematics,
we refer to some current textbook readily accessible to the reader.
The notes are restricted to the most important methodical problems. On several
occasions the method of exposition chosen in the present book is compared in the
notes to that in other textbooks.
Chapter I
Glivenko was the first to stress in his textbook (Glivenko [3]; cf. also Kolmogorov
[9]) the advantage of discussing the algebra of events as a Boolean algebra before
the introduction of the notion of probability. It seems to us that the understanding
of Kolmogorov’s axiomatic theory is hereby facilitated. On the general theory of
measure and integration over a Boolean algebra instead of over a field of sets see
Caratheodory [1]. Recent results on probability as a measure on a Boolean algebra
are summarized in Kappos [1].
§§ 1-4. On Boolean algebras in general see Birkhoff [1], Glivenko [2]. We did
not give a system of independent axioms for Boolean algebras, since it seemed to
us of much more importance to present the rules of Boolean algebra in a way which
makes clear the duality of the two basic operations.
§ 5. See Stone [1]. We follow here Frink [1]; as to the Lemma see Hausdorff [1]
and Frink [2].
§ 6. The unsolved problem mentioned in Exercise 7 was first formulated by Dedekind
(cf. Birkhoff [1], p. 147). Concerning Exercise 11 see e.g. Gavrilov [1].
Chapter II
Chapter III
Chapter IV
Chapter V
§ 1. The content of this section appeared first in the present book (German edition,
1962).
§ 2. We follow here Kolmogorov [5]. For the Radon–Nikodym theorem, see e.g.
Halmos [1].
§ 3. Concerning the new deduction of the Maxwell distribution given here, see
Renyi [19].
§ 4. We follow here Kolmogorov [5].
§ 6 and § 7. See Gebelein [1], Renyi [26], [28], Csaki–Fischer [1], [2]. On the
Lemma of Theorem 3 of § 7, see Boas [1]. On Theorem 3, see Renyi [26]. On the
applications of the large sieve of Linnik in number theory, see Linnik [1], Renyi [2],
Bateman–Chowla–Erdos [1].
§ 7. Exercises 1–4 treat problems of integral geometry from the point of view of
probability theory (see Blaschke [1]). Exercise 6: cf. Hajos–Renyi [1]; Exercises
28–30: Renyi [28].
Chapter VI
Chapter VII
Chapter VIII
§ 1. Chebyshev [1], Markov [1], Liapunov [1], Lindeberg [1], Polya [1], Feller [1],
Khinchin [3], Gnedenko–Kolmogorov [1], Kolmogorov [5], [11], Prekopa–Renyi–
Urbanik [1].
§ 2. Gnedenko [2].
§ 3. Levy [4], Feller [1], Khinchin [5], and also Gnedenko–Kolmogorov [1].
§ 5. Erdos–Renyi [1]; for the particular case p = const, cf. Bernstein [4].
§ 6. For the lemma, cf. Cramer [3]. Theorem 3 was first, under certain restrictions,
proved by a different method (cf. Renyi [3]). This result was generalized by Kolmo-
gorov [10]. For the more simple proof given here, cf. Renyi [24]. Theorem 3 may be
applied to prove limit theorems for dependent random variables; cf. Revesz [1].
On the central limit theorem for dependent random variables, see the fundamental
paper of Bernstein [3].
§ 7. Anscombe [1], Doeblin [2], Renyi [22], [31].
§ 8. On the theory and applications of Markov chains and Markov processes
see Markov [1], Kolmogorov [3], [7], Doeblin [1], [2], Feller [2], [3], [6], [7],
Doob [2], Chung [1], Bartlett [1], Bharucha-Reid [1], Wiener [2], Chandrasekhar
[1], Einstein [1], Hostinsky [1], Levy [3], Renyi [11].
§ 9. Renyi [9], [10], van Dantzig [1], Malmquist [1]; further references are to
be found in Wilks [1], Wang [1].
§ 10. Kolmogorov [6], N. V. Smirnov [1], [2], Gnedenko [1], Gnedenko–Koroljuk
[1], Doob [1], Feller [5], Donsker [1].
§ 11. For Theorem 1: Polya [2]. See also Dvoretzky–Erdos [1]. On the arc sine
law (Theorem 6) cf. Levy [2], Erdos–Kac [2], Sparre-Andersen [1], [2], Chung–
Feller [1], Renyi [33]. On Lemma 2, Renyi [36]; for other generalizations, Spitzer
[1]. For Theorem 8, see Erdos–Hunt [1]; for Theorem 9, Erdos–Kac [1]; for a gen-
eralization of it, Renyi [9].
§ 12. Lindeberg [1], Krickeberg [1].
§ 13. Exercise 5: Renyi [17]; for a similar general system of independent functions,
see Steinhaus, Kac and Ryll-Nardzewski [1]–[10], Renyi [7]. Exercise 8: Kac [1];
Exercise 24: Wilcoxon [1] and Renyi [12]; Exercise 25: Lehmann [1] and Renyi
[12]; the equivalence of the problems considered in these two papers is proved in
E. Csaki [1]. Exercise 28: Erdos–Renyi [3]. The result of Exercise 30 is due to
Chung and Feller [1]; as regards the presentation given here, cf. Renyi [33].
Appendix
On the concepts of the entropy and information see Boltzmann [1], Hartley [1],
Shannon [1], Wiener [1], Shannon-Weaver [1], Woodward [1], Barnard [1], Jeffreys
[1], and the papers of Khinchin. Fadeev, Kolmogorov, Gelfand, Jaglom, etc., in
Arbeiten zur Informationstheorie I—III .On the role of the notion of information in sta¬
tistics, see the works of Fisher [1 ]—[3], and of Kullback [1 ]. The notion of the dimension
of a probability distribution and that of the entropy of the corresponding dimension
were introduced in a paper of Balatoni and Renyi (Arbeiten zur Informationstheorie I)
and were further developed in Renyi [27], [30]. Measures of information differing
from the Shannon-measure were already considered earlier, e.g. by Bhattacharyya
[1] and Schiitzenberger [1]; the theory of entropy and information of order a is
developed in Renyi [34], [37].
Part of the material appeared for the first time in the German edition of this book.
This appendix covers merely the basic notions of information theory; their appli-
cation to the transmission of information through a noisy channel, coding theory,
etc. are not dealt with here. Besides the already mentioned works of Shannon and
Khinchin let there be indicated those of Feinstein [1], [2], McMillan [1] and Wol-
fowitz [1], [2], [3].
§ 1. Concerning the theorem of Erdos on additive number-theoretical functions,
which was rediscovered by Fadeev, see Erdos [2]; the simple proof given in the text
is due to Renyi [29].
§ 2. For the theorem of Mercer, see Knopp [1].
§ 6. On the mean value theorem, see de Finetti [2], Kolmogorov [4], Nagumo [1],
Aczel [1], Hardy–Littlewood–Polya [1] (where further references can be found;
this book contains also all other inequalities used in the Appendix, e.g. the inequalities
of Jensen and of Hölder).
§ 9. The idea that quantities of information theory may be used for the proof of
limit theorems is due to Linnik [2]. On the theorem of Perron–Frobenius, see
Gantmacher [1].
§ 11. For Exercise 2c, see Renyi [16] and Kac [2]; Exercise 3c: Khinchin [6]; for
the generalizations: Renyi [21]. Exercise 4: Kolmogorov–Tikhomirov (Arbeiten zur
Informationstheorie III). The content of Exercise 9 is due to Chung (unpublished
communication); the proof given here differs from that of Chung. Exercise 17b:
cf. Moriguti [1].
Tables

REFERENCES
Aczel, J.
[1] On mean values, Bull. Amer. Math. Soc. 54, 393-400 (1948).
[2] On composed Poisson distributions, III, Acta Math. Acad. Sci. Hung 3, 719-
224 (1952).
Aczel, J., L. Janossy and A. Renyi
[1] On composed Poisson distributions, I, Acta Math. Acad. Sci. Hung. 1, 209-224
(1950).
Alexandrov, P. S. (Александров, П. С.)
[1] Введение в общую теорию множеств и функций (Introduction to the general
theory of sets and functions), OGIZ, Moscow–Leningrad 1948.
Anscombe, F. J.
[1] Large sample theory of sequential estimation, Proc. Cambridge Phil. Soc. 48,
600 (1952).
Arato, M. and A. Renyi
[1] Probabilistic proof of a theorem on the approximation of continuous functions
by means of generalized Bernstein polynomials, Acta Math. Acad. Sci. Hung. 8,
91-98 (1957).
ARBEITEN ZUR INFORMATIONSTHEORIE I–III (Teil I von A. J. Chintschin,
D. K. Faddejew, A. N. Kolmogoroff, A. Renyi und J. Balatoni; Teil II von
I. M. Gelfand, A. M. Jaglom, A. N. Kolmogoroff, Chiang Tse-Pei, I. P.
Zaregradski; Teil III von A. N. Kolmogoroff und W. M. Tichomirow),
VEB Deutscher Verlag der Wissenschaften, Berlin 1957 bzw. 1960.
Aumann, G.
[1] Reelle Funktionen, Springer-Verlag, Berlin-Gottingen-Heidelberg 1954.
Baticle, E.
[2] Sur une loi de probabilité a priori pour l'interprétation des résultats de tirages
dans une urne, C. R. Acad. Sci. Paris 228, 902-904 (1949).
Bauer, H.
[1] Wahrscheinlichkeitstheorie und Grundzüge der Masstheorie, Sammlung
Göschen 1216/1216a, de Gruyter, Berlin 1964.
Bayes, Th.
[1] Essay towards solving a problem in the doctrine of chances, "Ostwald's Klas-
siker der Exakten Wissenschaften", Nr. 169, W. Engelmann, Leipzig 1908.
Bernoulli, J.
[1] Ars Coniectandi (1713) I–II, III–IV, "Ostwald's Klassiker der Exakten
Wissenschaften", Nr. 108, W. Engelmann, Leipzig 1899.
Bernstein, S. N. (Бернштейн, С. Н.)
[1] Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités,
Soobshch. Charkovskovo Mat. Obshch. (2) 13, 1-2 (1912).
[2] Опыт аксиоматического обоснования теории вероятностей (An attempt at an
axiomatic foundation of probability theory), Zap. Charkovskovo Mat. ot-va 15,
209-274 (1917).
[3] Sur l'extension du théorème limite du calcul des probabilités aux sommes de
quantités dépendantes, Math. Ann. 97, 1-59 (1926).
[4] Теория вероятностей (Probability theory), 4th ed., Gostehizdat, Moscow 1946.
Bharucha-Reid, A. T.
[1 ] Elements of the theory of Markov processes and their applications, McGraw-
Hill, New York 1960.
Bhattacharyya, A.
[1] On some analogues of the amount of information and their use in statistical
estimation, Sankhya 8, 1-14 (1946).
Bienayme, M.
[1] Considerations a l’appui de la decouverte de Laplace sur la loi des probabilites
dans la methode des moindres carres, C. R. Acad. Sci. Paris 37, 309-324 (1853).
Birkhoff, G.
[1] Lattice theory, 3. ed., American Mathematical Society Colloquium Publications
25. AMS, Providence 1967.
Blanc-Lapierre, A., et R. Fortet
[1] Theorie des fonctions aleatoires, Masson et Cie., Paris 1953.
Blaschke, W.
[1] Vorlesungen iiber Integralgeometrie, 3. AufL, VEB Deutscher Verlag der
Wissenschaften, Berlin 1955.
Blum, J. R., D. L. Hanson and L. H. Koopmans
[1] On the strong law of large numbers for a class of stochastic processes, Zeit-
schrift für Wahrscheinlichkeitstheorie 2, 1-11 (1963).
Boas, R. P. Jr.
[1] A general moment problem, Amer. J. Math. 63, 361-370 (1941).
Bochner, S. and S. Chandrasekharan
[1] Fourier transforms, Princeton Univ. Press, Princeton 1949.
Boltzmann, L.
[1] Vorlesungen iiber Gastheorie, Johann Ambrosius Barth, Leipzig 1896.
Borel, E.
[1] Sur les probabilites denombrables et leurs applications arithmetiques, Rend.
Circ. Mat. Palermo 26, 247-271 (1909).
[2] Elements de la theorie des probabilites, Hermann et Fils, Paris 1909.
von Bortkiewicz, L.
[1] Das Gesetz der kleinen Zahlen, B. G. Teubner, Leipzig 1898.
de Bruijn, N. G.
[1] Asymptotic methods in analysis, North Holland Publ. Comp. Inc., Amsterdam
1958.
Cantelli, F. P.
[1] La tendenza ad un limite nel senso del calcolo delle probabilità, Rend. Circ.
Mat. Palermo 16, 191-201 (1916).
Caratheodory, C.
[1] Entwurf einer Algebraisierung des Integralbegriffes, Sitzungsber. Math.-Natur-
wiss. Klasse Bayer. Akad. Wiss., München 1938, S. 24-28.
Chandrasekhar, S.
[1 ] Stochastic problems in physics and astronomy, Rev. Mod. Phys. 15, 1-89 (1943).
Chebyshev, P. L. (Чебышев, П. Л.)
[1] Теория вероятностей (Theory of probability), Akad. izd., Moscow 1936.
Chung, K. L.
[1] Markov chains with stationary transition probabilities, Springer-Verlag, Berlin-
Gottingen-Heidelberg 1960.
Chung, K. L., and P. Erdos
[1] Probability limit theorems assuming only the first moment, I, Mem. Amer.
Math. Soc. 6, 1-19 (1950).
Chung, K. L. and W. Feller
[1] On fluctuations in coin-tossing, Proc. Acad. Sci. USA 35, 605-608 (1949).
Cramer, H.
[1] Über eine Eigenschaft der normalen Verteilungsfunktion, Math. Z. 41, 405-414
(1936).
[2] Random variables and probability distributions, Cambridge Univ. Press,
Cambridge 1937.
[3] Mathematical methods of statistics, Princeton Univ. Press, Princeton 1946.
Cramer, H. and H. Wold
[1] Some theorems on distribution functions, J. London Math. Soc. 11, 290-294
(1936).
Csaki, E.
[1] On two modifications of the Wilcoxon test, Publ. Math. Inst. Hung. Acad.
Sci. 4, 313-319 (1959).
Csaki, P. and J. Fischer
[1] On bivariate stochastic connection, Publ. Math. Inst. Hung. Acad. Sci. 5,
311-323 (1960).
[2] Contributions to the problem of maximal correlation, Publ. Math. Inst. Hung.
Acad. Sci. 5, 325-337 (1960).
Csaszar, A.
[1] Sur la structure des espaces de probabilité conditionnelle, Acta Math. Acad. Sci.
Hung. 6, 337-361 (1955).
[2] Sur une caractérisation de la répartition normale de probabilités, Acta Math.
Acad. Sci. Hung. 7, 359-382 (1956).
van Dantzig, D.
[1] Mathematische Statistiek, "Kadercursus Statistiek, 1947-1948", Mathema-
tisch Centrum, Amsterdam 1948.
Darmois, G.
[1] Analyse générale des liaisons stochastiques, Revue Inst. Internat. Stat. 21, 2-8
(1953).
Doeblin, W.
[1] Sur les propriétés asymptotiques de mouvements régis par certains types de
chaînes simples, Bull. Soc. Math. Roumaine Sci. 39:1, 57-115 (1937); 39:2,
3-61 (1937).
[2] Éléments d'une théorie générale des chaînes simples constantes de Markov,
Ann. Sci. École Norm. Sup. (3) 57, 61-111 (1940).
Donsker, M. D.
[1] Justification and extension of Doob's heuristic approach to the Kolmogorov-
Smirnov theorems, Ann. Math. Stat. 23, 277-281 (1952).
Doob, J. L.
[1] Heuristic approach to the Kolmogorov-Smirnov theorems, Ann. Math. Stat.
20, 393 (1949).
[2] Stochastic processes, Wiley-Chapman, New York-London 1953.
Dugue, D.
[1] Arithmetique de lois de probabilites, Mem. Sci. Math., No. 137, Gauthier-
Villars, Paris 1957.
Dumas, M.
[1] Sur les lois de probabilités divergentes et la formule de Fisher, Interméd. Rech.
Math. 9 (1947), Supplément, 127-130.
[2] Interprétation de résultats de tirages exhaustifs, C. R. Acad. Sci. Paris 228,
904-906 (1949).
(See also the note from E. Borel following Dumas' article.)
Dvoretzky, A. and P. Erdos
[1] Some problems on random walk in space, Proc. 2nd Berkeley Symp. Math.
Stat. Prob. 1950, Univ. California Press, Berkeley-Los Angeles 1951, 353-367.
[2] On Cantor's series with convergent Σ 1/q_n, Ann. Univ. Sci. Budapest, Rolando
Eotvos nom., Sect. Math. 2, 93-109 (1959).
Feinstein, A.
[1] A new basic theorem of information theory, Trans. Inst. Radio Eng., 2-22
(1954).
[2] Foundations of information theory, McGraw-Hill, New York 1958.
Feldheim, E.
[1] Étude de la stabilité des lois de probabilité, Dissertation, Univ. Paris, Paris 1937.
[2] Neuere Beweise und Verallgemeinerung der wahrscheinlichkeitstheoretischen
Sätze von Simmons, Mat. Fiz. Lapok 45, 99-114 (1938).
Feller, W.
[1] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung, Math. Z.
40, 521-559 (1935); 42, 301-312 (1937).
[2] Zur Theorie der stochastischen Prozesse (Existenz- und Eindeutigkeitssätze),
Math. Ann. 113, 113-160 (1936).
[3] On the integro-differential equations of purely discontinuous Markov processes,
Trans. Amer. Math. Soc. 48, 488-515 (1940). Errata: ibidem 58, 474 (1945).
[5] On the Kolmogorov-Smirnov limit theorems for empirical distributions, Ann.
Math. Stat. 19, 177-189 (1948).
[6] On the theory of stochastic processes, with particular reference to applications,
Proc. Berkeley Symp. Math. Stat. Prob. 1945, 1946, Univ. California Press,
Berkeley-Los Angeles 1949, 403-432.
[7] An introduction to probability theory and its applications, Vols 1-2, Wiley,
New York 1950-1966.
de Finetti, B.
[1] Funzione caratteristica di un fenomeno aleatorio, Mem. R. Accad. Lincei (6)
4, 85-133 (1930).
[2] Sul concetto di media, Giorn. Ist. Ital. Attuari 2, 369-396 (1931).
Fisher, R. A.
[1] Statistical methods for research workers, 10th edition, Oliver-Boyd Ltd.,
Edinburgh-London 1948.
[2] The design of experiments, Oliver-Boyd Ltd., London-Edinburgh 1949.
[3] Contributions to mathematical statistics, Wiley-Chapman, New York-London
1950.
Fisher, R. A. and F. Yates
[1] Statistical tables for biological, agricultural and medical research, Oliver-Boyd
Ltd., London-Edinburgh 1949.
Fisz, M.
[1] Probability theory and mathematical statistics, 3. ed. Wiley, New York 1963.
Florek, K., E. Marczewski and C. Ryll-Nardzewski
[1] Remarks on the Poisson stochastic process, I, Studia Math. 13,122-129(1953).
Frechet, M.
[1] Recherches théoriques modernes, Fascicule 3 du Tome 1 du Traité du calcul
des probabilités par E. Borel et divers auteurs, Gauthier-Villars, Paris 1937.
[2] Les probabilités associées à un système d'événements compatibles et dépendants,
I–II, Hermann et Cie., Paris 1940 and 1943.
Frink, O.
[1] Representations of Boolean algebras, Bull. Amer. Math. Soc. 47, 755-756
(1941).
[2] A proof of the maximal chain theorem, Amer. J. Math. 74, 676-678 (1952).
Jeffreys, H.
[1] Theory of probability, 2nd edition, Clarendon Press, Oxford 1948.
Jordan, Ch.
[1] On probability, Proc. Phys. Math. Soc. Japan 7, 96-109 (1925).
[2] Statistique mathematique, Gauthier-Villars, Paris 1927.
[3] Le theoreme de probability de Poincare, generalise au cas de plusieurs variables-
independantes, Acta Sci. Math. Szeged 7, 103-111 (1934).
[4] Calculus of finite differences, 2nd edition, Chelsea Publ. Comp., New York 1950.
[5] Fejezetek a klasszikus valoszinusegszamitasbol (Chapters from the classical
calculus of probabilities), Akademiai Kiado, Budapest 1956.
Kac, M.
[1] Random walk and the theory of Brownian motion, Amer. Math. Monthly 54,
369-391 (1947).
[2] A remark on the preceding paper by A. Renyi, Publ. Inst. Math. Beograd 8,
163-165 (1955).
[3] Probability and related topics in physical sciences (Lectures in Applied Mathe-
matics, Vol. I), Intersci. Publ., London–New York 1959.
[4] Statistical independence in probability, analysis and number theory, Math.
Assoc. America 1959.
Kantorovitch, L. V. (Канторович, Л. В.)
[1] Sur un problème de M. Steinhaus, Fund. Math. 14, 266-270 (1929).
Kappos, D. A.
[1] Strukturtheorie der Wahrscheinlichkeitsfelder und -räume, Springer-Verlag,
Berlin–Gottingen–Heidelberg 1960.
Kawata, T. and H. Sakamoto
[1] On the characterization of the normal population by the independence of the
sample mean and the sample variance, J. Math. Soc. Japan 1, 111-115 (1949).
Khinchin, A. J. (Хинчин, А. Я.)
[1] Über dyadische Brüche, Math. Z. 18, 109-118 (1923).
[2] Sur les classes d'événements équivalents, Mat. Sbornik 39:3, 40-43 (1932).
[3] Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Springer, Berlin 1933.
[4] Korrelationstheorie der stationären stochastischen Prozesse, Math. Ann. 109,
604-615 (1934).
[5] Sul dominio di attrazione della legge di Gauss, Giorn. Ist. Ital. Attuari 6, 378-393
(1935).
[6] Kettenbrüche, B. G. Teubner, Leipzig 1956.
[7] О классах эквивалентных событий (On classes of equivalent events), Dok-
ladi Akad. Nauk SSSR 85, 713-714 (1952).
Khinchin, A. J. und A. N. Kolmogorov (Хинчин, А. Я. и А. Н. Колмогоров)
[1] Über Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden,
Mat. Sbornik 32, 668-677 (1925).
(Khinchin) Chintschin, A. J. et P. Levy
[1] Sur les lois stables, C. R. Acad. Sci. Paris 202, 374-376 (1936).
Knopp, K.
[1] Theorie und Anwendung der unendlichen Reihen, Springer, Berlin 1924.
Koller, S.
[1] Graphische Tafeln zur Beurteilung statistischer Zahlen, Steinkopff, Dresden–
Leipzig 1943.
Kolmogorov, A. N. (Колмогоров, А. Н.)
[1] Über das Gesetz des iterierten Logarithmus, Math. Ann. 101, 126-136 (1929).
[2] Sur la loi forte des grandes nombres, C. R. Acad. Sci. Paris 191, 910-912 (1930).
[3] Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung, Math.
Ann. 104, 415-458 (1930).
[4] Sur la notion de la moyenne, Atti R. Accad. Naz. Lincei 12, 388-391 (1930).
[5] Foundations of the theory of probability, Chelsea, New York 1956.
[6] Sulla determinazione empirica di una legge di distribuzione, Giorn. Ist. Ital.
Attuari 4, 83-91 (1933).
[7] Цепи Маркова со счётным множеством возможных состояний (Markov
chains with a denumerable set of possible states), Bull. Mosk. Univ. 1:1 (1937).
[8] О логарифмически нормальном законе распределения размеров частиц при
дроблении (On the lognormal distribution law of the sizes of particles in chop-
ping), Dokl. Akad. Nauk SSSR 31, 99-101 (1941).
[9] Algèbres de Boole métriques complètes, VI. Zjazd Matematyków Polskich,
Warsaw 20-23. IX. 1948, Inst. Math. Univ. Krakow 1950, 22-30.
Laha, R. G.
[1] An example of a non-normal distribution where the quotient follows the Cauchy law, Proc. Nat. Acad. Sci. USA 44, 222-223 (1958).
Laplace, P. S.
Malmquist, S.
[1] On a property of order statistics from a rectangular distribution, Skand.
Aktuarietidskrift 33, 214-222 (1950).
Marczewski, E.
[1] Remarks on the Poisson stochastic process, II, Studia Math. 13, 130-136 (1953).
Markov, A. A. (Марков, А. А.)
[1] Wahrscheinlichkeitsrechnung, B. G. Teubner, Leipzig 1912.
McMillan, B.
[1] The basic theorems of information theory, Ann. Math. Stat. 24, 196-219 (1953).
Medgyessy, P.
[1] Decomposition of superpositions of distribution functions, Akad. Kiado,
Budapest 1961.
von Mises, R.
[1] Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, Deuticke, Leipzig-Wien 1931.
[2] Wahrscheinlichkeit, Statistik und Wahrheit, Springer-Verlag, Berlin 1952.
Mogyorodi, J.
[1] On a consequence of a mixing theorem of A. Renyi, MTA Mat. Kut. Int. Kozl.,
9, 263-267 (1964).
Molina, E. C.
[1] Poisson’s exponential binomial limit, van Nostrand, New York 1942.
Moriguti, S.
[1] A lower bound for a probability moment of an absolutely continuous distribution
with finite variance, Ann. Math. Stat. 23, 286-289 (1952).
Nagumo, M.
[1] Über eine Klasse von Mittelwerten, Japan. J. Math. 7, 71-79 (1930).
Neveu, J.
[1] Mathematical foundations of the calculus of probability, Holden-Day Inc.,
San Francisco 1965.
Neyman, J.
[1] L'estimation statistique traitee comme un probleme classique de probabilite,
Act. Sci. Industr., Nr. 739, Gauthier-Villars, Paris 1938.
[2] First course in probability and statistics, H. Holt et Co., New York 1950.
Onicescu, O. et G. Mihoc
[1] La dependance statistique. Chaines et familles de chaines discontinues, Act.
Sci. Industr., Nr. 503, Gauthier-Villars, Paris 1937.
Onicescu, O., G. Mihoc si C. T. Ionescu-Tulcea
[1] Calculul probabilitatilor si aplicatii, Bucuresti 1956.
Parzen, E.
[1] Modern probability theory and its applications, Wiley, New York 1960.
Pearson, E. S. and H. O. Hartley
[1] Biometrika tables for statisticians, Cambridge Univ. Press, Cambridge 1954.
Pearson, K.
[1] Early statistical papers, Cambridge Univ. Press, Cambridge 1948.
Poincare, H.
[1] Calcul des probabilites, Carre-Naud, Paris 1912.
Poisson, S. D.
[1] Recherches sur la probabilite des jugements, Bachelier, Paris 1837.
Polya, G.
[1] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentproblem, Math. Z. 8, 171-181 (1920).
[2] Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt im Straßennetz, Math. Ann. 84, 149-160 (1921).
Polya, G. und G. Szego
[1] Aufgaben und Lehrsätze aus der Analysis, I-II, Springer, Berlin 1925.
Popper, K.
[1] Philosophy of science: A personal report, British Philosophy in the Mid-Century,
ed. by C. A. Mace, 1956, p. 191.
[2] The logic of scientific discovery, Hutchinson, London 1959.
Prekopa, A.
[1] On composed Poisson distributions, IV, Acta Math. Acad. Sci. Hung. 3,
317-326 (1952).
[2] Valoszinusegelmelet muszaki alkalmazasokkal (Probability theory and its
applications in technology), Muszaki Konyvkiado, Budapest 1962.
Prekopa, A., A. Renyi and K. Urbanik
[1] О предельном распределении для сумм независимых случайных величин на бикомпактных коммутативных топологических группах (On the limit distribution of sums of independent random variables over bicompact commutative topological groups), Acta Math. Acad. Sci. Hung. 7, 11-16 (1956).
Reichenbach, H.
[1] Wahrscheinlichkeitslehre, Sijthoff, Leiden 1935.
Renyi, A.
[1] Simple proof of a theorem of Borel and of the law of the iterated logarithm, Mat. Tidsskrift B, 41-48 (1948).
[2] О представлении четных чисел в виде суммы простого и почти простого числа (On the representation of even numbers as sums of a prime and an almost prime number), Izvestia Akad. Nauk. SSSR, Ser. Mat. 12, 57-78 (1948).
[3] К теории предельных теорем для сумм независимых случайных величин (On limit theorems for sums of independent random variables), Acta Math. Acad. Sci. Hung. 1, 99-108 (1950).
[4] On the algebra of distributions, Publ. Math. Debrecen 1, 135-149 (1950).
[5] On composed Poisson distributions, II, Acta Math. Acad. Sci. Hung. 2, 83-98
(1951).
[6] On some problems concerning Poisson processes, Publ. Math. Debrecen 2,
66-73 (1951).
[7] On a conjecture of H. Steinhaus, Ann. Soc. Polon. Math. 25, 279-287 (1952).
[8] On projections of probability distributions, Acta Math. Acad. Sci. Hung. 3,
131-142 (1952).
[9] On the theory of order statistics, Acta Math. Acad. Sci. Hung. 4, 191-232
(1953).
[10] Eine neue Methode in der Theorie der geordneten Stichproben, Bericht über die Mathematiker-Tagung Berlin 1953, VEB Deutscher Verlag der Wissenschaften, Berlin 1953, 203-213.
[11] Kemiai reakciok targyalasa a sztochasztikus folyamatok elmelete segitsegevel
(On describing chemical reactions by means of stochastic processes), A Magyar
Tudomanyos Akademia Alkalmazott Matematikai Intezetenek Kozlemenyei 2,
596-600 (1953) (In Hungarian).
[12] Ujabb kriteriumok ket minta osszehasonlitasara (Some new criteria for com¬
parison of two samples), A Magyar Tudomanyos Akademia Alkalmazott
Matematikai Intezetenek Kozlemenyei 2, 243-265 (1953) (In Hungarian).
[13] Valoszinusegszamitas (Probability theory), Tankonyvkiado, Budapest 1954
(In Hungarian).
[14] Axiomatischer Aufbau der Wahrscheinlichkeitsrechnung, Bericht über die
Tagung Wahrscheinlichkeitsrechnung und Mathematische Statistik, VEB
Deutscher Verlag der Wissenschaften, Berlin 1954, 7-15.
[15] On a new axiomatic theory of probability, Acta Math. Acad. Sci. Hung. 6,
285-335 (1955).
[16] On the density of sequences of integers, Publ. Inst. Math. Beograd 8, 157-162
(1955).
[17] A szamjegyek eloszlasa valos szamok Cantor-fele eloallitasaiban (The distribution of the digits in Cantor's representation of the real numbers), Mat. Lapok 7, 77-100 (1956) (In Hungarian).
[18] On conditional probability spaces generated by a dimensionally ordered set of
measures, Teor. Verojatn. prim. 1, 61-71 (1956).
[19] A new deduction of Maxwell’s law of velocity distribution, Isv. Mat. Inst. Sofia 2,
45-53 (1957).
[20] A remark on the theorem of Simmons, Acta Sci. Math. Szeged. 18, 21-22
(1957).
[21] Representations for real numbers and their ergodic properties, Acta Math.
Acad. Sci. Hung. 8, 477-493 (1957).
[22] On the asymptotic distribution of the sum of a random number of independent
random variables, Acta Math. Acad. Sci. Hung. 8, 193-199 (1957).
[23] Quelques remarques sur les probabilites des evenements dependants, J. Math. pures appl. 37, 393-398 (1958).
[24] On mixing sequences of sets, Acta Math. Acad. Sci. Hung. 9, 215-228 (1958).
[25] Probabilistic methods in number theory, Proceedings of the International Congress of Mathematicians, Edinburgh 1958, 529-539.
[26] New version of the probabilistic generalization of the large sieve, Acta Math. Acad. Sci. Hung. 10, 217-226 (1959).
[27] On the dimension and entropy of probability distributions, Acta Math. Acad. Sci. Hung. 10, 193-215 (1959).
[28] On measures of dependence, Acta Math. Acad. Sci. Hung. 10, 441-451 (1959).
[29] On a theorem of P. Erdos and its applications in information theory, Mathematica Cluj 1 (24), 341-344 (1959).
[30] Dimension, entropy and information, Transactions of the IInd Prague Conference on Information theory, statistical decision functions, random processes, Praha 1960, 545-556.
[31] On the central limit theorem for the sum of a random number of independent random variables, Acta Math. Acad. Sci. Hung. 11, 97-102 (1960).
[32] Az apritas matematikai elmeleterol (On the mathematical theory of chopping),
Epitoanyag 1-8 (1960) (In Hungarian).
[33] Bolyongasi problemakra vonatkozo hatareloszlastetelek (Limit theorems in random walk problems), A Magyar Tudomanyos Akademia III (Matematikai es Fizikai) Osztalyanak Kozlemenyei 10, 149-170 (1960) (In Hungarian).
[34] Az informacioelmelet nehany alapveto kerdese (Some fundamental problems of information theory), A Magyar Tudomanyos Akademia III (Matematikai es Fizikai) Osztalyanak Kozlemenyei 10, 251-282 (1960) (In Hungarian).
[35] Egy altalanos modszer valoszinusegszamitasi tetelek bizonyitasara (A general
method for proving theorems in probability theory), A Magyar Tudomanyos
Akademia III (Matematikai es Fizikai) Osztalyanak Kozlemenyei 11, 79-105
(1961) (In Hungarian).
[36] Legendre polynomials and probability theory, Ann. Univ. Sci. Budapest, R. Eotvos nom., Sect. Math. 3-4, 247-251 (1961).
[37] On measures of entropy and information, Proc. Fourth Berkeley Symposium on Math. Stat. Prob. 1960, Vol. I, Univ. California Press, Berkeley-Los Angeles 1961, 547-561.
[38] On stable sequences of events, Sankhya A 25, 293-302 (1963).
[39] On certain representations of real numbers and on equivalent events, Acta Sci.
Math. Szeged 26, 63-74 (1965).
[40] Uj modszerek es eredmenyek a kombinatorikus analizisben (New methods and results in combinatorial analysis), A Magyar Tudomanyos Akademia III (Matematikai es Fizikai) Osztalyanak Kozlemenyei 16, 75-105, 159-177 (1966) (In Hungarian).
[41] Sur les espaces simples des probabilites conditionnelles, Ann. Inst. H. Poincare B 1, 3-19 (1964).
[42] On the foundations of information theory, Review of the International Statistical Institute 33, 1-14 (1965).
Renyi, A. and P. Revesz
[1] On mixing sequences of random variables, Acta Math. Acad. Sci. Hung. 9,
389-393 (1958).
[2] A study of sequences of equivalent events as special stable sequences, Publicationes Mathematicae Debrecen 10, 319-325 (1963).
Saxer, W.
[1] Versicherungsmathematik, II, Springer-Verlag, Berlin-Gottingen-Heidelberg
1958.
Schmetterer, L.
[1] Einfuhrung in die mathematische Statistik, Springer-Verlag, Wien 1956.
Schutzenberger, M. P.
[1] Contributions aux applications statistiques de la theorie de l’information, Inst.
Stat. Univ. Paris (A) 2575, 1-115 (1953).
Shannon, C. E.
[1] A mathematical theory of communication, Bell Syst. Techn. J. 27, 379-423,
623-653 (1948).
Shannon, C. E. and W. Weaver
[1] The mathematical theory of communication, Univ. Illinois Press, Urbana 1949.
Singer, A. A. (Зингер, А. А.)
[1] О независимых выборках из нормальной совокупности (On independent samples from a normal population), Uspehi Mat. Nauk 6, 172-175 (1951).
Skitovich, V. P. (Скитович, В. П.)
[1] Об одном свойстве нормального распределения (On a property of the normal distribution), Dokl. Akad. Nauk. SSSR 89, 217-219 (1953).
Slutsky, E.
[1] Uber stochastische Asymptoten und Grenzwerte, Metron 5, 1-90 (1925).
Smirnov, N. V. (Смирнов, Н. В.)
[1] Über die Verteilung allgemeiner Glieder in der Variationsreihe, Metron 12, 59-81 (1935).
[2] Приближение законов распределения случайных величин по эмпирическим данным (Approximation of the laws of distribution of random variables by means of empirical data), Uspehi Mat. Nauk 10, 179-206 (1944).
Smirnov, V. I. (Смирнов, В. И.)
[1] Lehrgang der hoheren Mathematik, Teil III, 3. Aufl., VEB Deutscher Verlag
der Wissenschaften, Berlin 1961.
von Smoluchowski, M.
[1] Drei Vorträge über Diffusion, Brownsche Molekularbewegung und Koagulation von Kolloidteilchen, Phys. Z. 17, 557-571, 585-599 (1916).
Sparre-Andersen, E.
[1] On the number of positive sums of random variables, Skand. Aktuarietidskrift,
1949, 27-36.
[2] On the fluctuations of sums of random variables, I-II, Math. Scand. 1, 263-285 (1953); 2, 193-223 (1954).
Spitzer, F.
[1] A combinatorial lemma and its application to probability theory, Trans. Amer. Math. Soc. 82, 323-339 (1956).
Steinhaus, H.
[1] Les probabilites denombrables et leur rapport a la theorie de la mesure, Fund. Math. 4, 286-310 (1923).
[2] Sur la probabilite de la convergence des series, Studia Math. 2, 21-39 (1951).
Steinhaus, H., M. Kac et C. Ryll-Nardzewski
[1]-[10] Sur les fonctions independantes, I, Studia Mathematica 6, 46-58 (1936); II, ibidem 6, 59-66 (1936); III, ibidem 6, 89-97 (1936); IV, ibidem 7, 1-15 (1938); V, ibidem 7, 96-100 (1938); VI, ibidem 9, 121-132 (1940); VII, ibidem 10, 1-20 (1948); VIII, ibidem 11, 133-144 (1949); IX, ibidem 12, 102-107 (1951); X, ibidem 13, 1-17 (1953).
Stone, M. H.
[1] The theory of representations for Boolean algebras, Trans. Amer. Math. Soc. 40, 37-111 (1936).
Student
[1] "Student's" Collected papers, edited by E. S. Pearson and J. Wishart, London 1942.
Szasz, G.
[1] Introduction to lattice theory (transl. from the Hungarian), Akad. Kiado,
Budapest 1963.
Szokefalvi-Nagy, B.
[1] Spektraldarstellung linearer Transformationen des Hilbertschen Raumes,
Springer, Berlin 1942.
Titchmarsh, E. C.
[1] Theory of functions, Clarendon Press, Oxford 1952.
Todhunter, I.
[1] History of the mathematical theory of probability, Macmillan, Cambridge-London 1865.
Widder, D. V.
[1] The Laplace-transform, Princeton Univ. Press, Princeton 1946.
Wiener, N.
[1] Cybernetics or control and communication in the animal and the machine,
Act. Sci. Indust., Nr. 1053, Hermann et Cie, Paris 1948.
[2] Extrapolation, interpolation and smoothing of stationary time series, Wiley,
New York 1949.
Wilcoxon, F.
[1] Individual comparisons by ranking methods, Biometrics Bull. 1, 80-83 (1945).
Wilks, S. S.
[1] Order statistics, Bull. Amer. Math. Soc. 54, 6-50 (1948).
Wolfowitz, J.
[1] The coding of messages subject to chance errors, Illinois J. Math. 1, 591-606
(1957).
[2] Information theory for mathematicians, Ann. Math. Stat. 29, 351-356 (1958).
[3] Coding theorems of information theory, Springer-Verlag, Berlin-Gottingen-
Heidelberg 1961.
Woodward, P. M.
[1] Probability and information theory with applications to radar, Pergamon Press,
London 1953.
Zygmund, A.
[1] Trigonometrical series, Warsaw 1935; Dover, New York 1955.
[2] Trigonometric series, I-II, Cambridge Univ. Press, Cambridge 1959.
AUTHOR AND SUBJECT INDEX
Feldheim, E., 167, 640, 641, 649
Feller, W., 447, 448, 453, 639, 641, 642
Fermi-Dirac statistics, 43
Finetti, B. de, 413, 639, 643, 649
Fischer, J., 640, 647
Fisher, R. A., 339, 642, 643, 649
Fisz, M., 639, 649
Florek, K., 640, 649
Fortet, R., 639, 646
Fourier-Stieltjes transform, 302
Fourier transform, 356, 357
Frechet, M., 37, 639, 650
frequency, 30
—, relative, 30
Frink, O., 21, 638, 650
Frobenius, 598, 643
fundamental theorem of mathematical statistics, 400
gain, conditional distribution function of, 572
—, measure of, 574
—, of information, 562
Galton's desk, 152
gamma distribution, 202
Gantmacher, F. R., 598, 643, 650
Gauss, C. F., 641, 650
Gauss curve, 152
Gaussian density function, 191
Gaussian distribution function, 157, 187
Gaussian random variable, 156
Gavrilov, M. A., 28, 638, 650
Geary, R. C., 339, 641, 650
Gebelein, H., 283, 640, 650
Gelfand, A. N., 642, 646
Gelfand-distributions, 353
generalized functions, 354
generating function, 135
geometric distribution, 90
Glivenko, V. I., 9, 401, 492, 638, 641, 650
Gnedenko, B. V., 348, 448, 449, 458, 496, 639, 641, 642, 650
—, theorem of, 449
Groshev, L., 640, 659
Gumbel, A. J., 37
Hajek, J., 434, 460, 641, 650
Hajos, G., 640, 651
half line period, 127
Halmos, P. R., 48, 639, 651
Hanson, L., 475
Hardy, G. H., 307, 368, 552, 574, 580, 640, 641, 643, 651
Harris, T. E., 651
Hartley, H. O., 643
Hartley, R. V., 642, 643, 651
Hartley's formula, 542
Hausdorff, F., 23, 415, 638, 651
Helly, E., 319, 641
Helmert, R., 198, 640, 651
Hille, E., 431, 641, 651
Hirschfeld, H. O., 283
Hostinsky, B., 642, 651
Hunt, G. A., 512, 642, 648
Hurwitz, A., 641, 651
hypergeometric distribution, 88
incomplete probability distribution, 569
incomplete random variable, 569
independent events, 57
independent random variables, 99, 182
infinitely divisible distribution, 347
infinitesimal random variable, 448
information, 540, 554, 592
—, of order alpha, 579, 586
integral geometry, 69
Ionescu-Tulcea, C. T., 639, 658
Isaev, B., 640, 659
Jaglom, A. M., 642, 645
Janossy, L., 640, 645
Jeffreys, H., 562, 639, 641, 642, 651
Jensen inequality, 555
joint distribution function, 178
Jordan, Ch., 37, 639, 651
Kac, M., 345, 511, 514, 639, 641, 642, 643, 648, 651, 659
Kantorovich, L. V., 120, 640
Kappos, D. A., 638, 639
Kawata, T., 339, 641
Khinchin, A. J., 347, 380, 453, 548, 607, 639, 641, 642, 643, 645
Knopp, K., 150, 426, 472, 491, 640, 641, 643
Koller, S., 643
Kolmogorov, A. N., 9, 33, 69, 276, 383, 396, 402, 420, 438, 448, 458, 493, 576, 638, 639, 640, 641, 642, 643, 645, 650, 652
Kolmogorov probability space, 97
Kolmogorov's formula, 348
Kolmogorov's fundamental theorem, 286
— inequality, 392
Koopmans, L. H., 646
Koroljuk, V. S., 496, 642, 650
Krickeberg, K., 642, 653
Kronecker, L., 397
Kullback, S., 642, 653
Ky Fan, 641, 653
Laguerre polynomials, 169
Laha, R. G., 372, 641, 653
Laplace curve, 152
—, method of, 164
Laplace, P. S., 153, 639, 653
large sieve, 286
lattice, 21
— distribution, 308
law of errors, 440
— of large numbers, due to Bernstein, 379
— — Khinchin, 380
— — Kolmogorov, 383
— — Markov, 378
— of the iterated logarithm, 402
Lebesgue measure, 52
Lebesgue-Stieltjes measure, 52
Legendre polynomials, 509
Lehmann, E. L., 642, 653
level set, 172
Levy, P., 348, 350, 453, 511, 639, 641, 642, 652, 653
Levy-Khinchin formula, 347
Liapunov, A. M., 517, 641, 642, 653
Liapunov's condition, 442
Lighthill, M. J., 353, 641, 653
Lindeberg, J. W., 642, 653
Lindeberg's condition, 443, 447
— theorem, 520
linear operator, 515
Linnik, Yu. V., 286, 329, 336, 605, 640, 641, 643, 653
Littlewood, J. E., 368, 574, 580, 643, 651
Lobachevski, N. I., 198, 640, 654
Loeve, M., 639, 654
logarithmically uniform distribution, 249
lognormal distribution, 194
Lomnicki, A., 639, 654
Losch, F., 199, 640, 654
Luce, R. D., 639, 654
Lukacs, E., 331, 339, 641, 654
Malmquist, S., 489, 640, 642, 654
Marczewski, E., 640, 649, 654
marginal distribution, 190
Markov, A. A., 442, 642, 654
—, theorem of, 479
Markov chain, 475
— —, additive, 483
— —, ergodic, 479
— —, homogeneous, 476
— —, reversible, 534
Markov inequality, 218
maximal correlation, 283
Maxwell distribution, 200, 239
— —, of order n, 269
Maxwell-Boltzmann statistics, 43
McMillan, B., 643, 654
measure, 49
—, complete, 50
—, outer, 50
—, σ-finite, 49
measurable set, 50
Medgyessy, P., 654
median, 217
Mensov, D. E., 641
Mercer theorem, 552, 643
Mihoc, G., 639, 640, 655
Mikusinski, J., 353, 641
Mises, R. von, 639, 654
mixing sequence of random variables, 467
mixture of distributions, 131, 207
modulus of dependence, 283
Mogyorodi, J., 475, 654
Moivre-Laplace theorem, 153
Molina, E. C., 643, 654
moment, 137, 217
— generating function, 138
monotone class, 418
Monte Carlo method, 69
Moriguti, S., 654
mutually independent random variables, 252
Nagumo, M., 576, 643, 654
n-dimensional cylinder, 287
negative binomial distribution, 92
negligible random variable, 448
Neveu, J., 655
Newton, I., 531
Neyman, J., 639, 655
non-atomic probability space, 81
normal curve, 152
Volume 1: I. N. Vekua
Volume 2: L. Berg
Volume 3: M. L. Rasulov
Methods of Contour Integration
Volume 4: N. Cristescu
Dynamic Plasticity
Volume 5: A. V. Bitsadze
Boundary Value Problems for Second
Order Elliptic Equations
Volume 6: G. Helmberg
Introduction to Spectral Theory
in Hilbert Space
Volume 8: J. W. Cohen
The Single Server Queue