CHAPTER 1
Probability Theory
1.1. Probability space
In this section we introduce the notion of a probability space (Ω, F, P) and fix some related notation.
Definition 1.1. (1) Possible outcomes ωα , α ∈ A, are called sample points.1
(2) The set Ω = {ωα : α ∈ A} = the collection of all possible outcomes, i.e., the set
of all sample points, is called a sample space.
Note that the sample space Ω can be quite an arbitrary set, as the following examples show.
Example 1.2. (1) Ω = N = the set of all natural numbers = {ωn : ωn =
n for all n ∈ N}.
(2) Ω = R = the set of all real numbers.
(3) Ω = the collection of all odd positive integers.
(4) Ω = the collection of all fruits, e.g., apple ∈ Ω, pineapple ∈ Ω.
(5) Ω = the collection of all colors.
Definition 1.3. A system F of subsets of Ω is called a σ-algebra if
(i) Ω ∈ F;
1 The set A is an index set. If A = N = {1, 2, 3, ...}, we write ωn instead of ωα . If A = R, the possible outcomes are uncountable.
(ii) Ac ∈ F whenever A ∈ F ;
(iii) An ∈ F for all n = 1, 2, 3, ... implies that ∪_{n=1}^∞ An ∈ F.
Why do we need σ-algebras? Roughly speaking, a σ-algebra specifies the collection of events to which probabilities will be assigned. First, let us look at some examples.
Example 1.4. (1) Let Ω = {1, 2, 3}. Then
(i) F1 = {∅, {1}, {2, 3}, Ω} is a σ-algebra.
(ii) F2 = {∅, {1}, {2}, {3}, Ω} is not a σ-algebra, since {1} ∈ F2 , but {1}c =
{2, 3} ∉ F2 .
(2) Let Ω = R and F = the collection of all subsets of R, then F is a σ-algebra.
(3) Let Ω = N. Then
(i) F1 = {∅, {1, 3, 5, 7, ...}, {2, 4, 6, 8, ...}, N} is a σ-algebra.
(ii) F2 = {∅, {3, 6, 9, ...}, {1, 4, 7, ...}, {2, 5, 8, ...}, {1, 3, 4, 6, 7, 9, ...}, {1, 2, 4, 5, 7, 8, ...},
{2, 3, 5, 6, 8, 9, ...}, N} is a σ-algebra.
(iii) F3 = {∅, {1, 2}, {3, 4}, {5, 6}, ..., {1, 2, 3, 4}, {1, 2, 5, 6}, ..., {1, 2, 3, 4, 5, 6}, ..., Ω}
is a σ-algebra.
(4) Let Ω = N. Then
(i) F1 = {A ⊆ N : A is finite or Ac is finite} is not a σ-algebra. For example,
the sets An = {2n} ∈ F1 for all n, but the set
∪_{n=1}^∞ An = {2, 4, 6, ...} ∉ F1 ,
since neither the set of even numbers nor its complement is finite.
(ii) F2 = {A ⊆ N : A is countable or Ac is countable } is a σ-algebra.
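On a finite sample space the axioms of Definition 1.3 can be checked by brute force, since closure under countable unions reduces to closure under pairwise unions. A small sketch (the helper name `is_sigma_algebra` is ours), applied to Example 1.4 (1):

```python
from itertools import combinations

def is_sigma_algebra(F, omega):
    """Check the sigma-algebra axioms of Definition 1.3 on a finite Omega.
    F is a collection of subsets of omega, given as frozensets."""
    F = set(F)
    omega = frozenset(omega)
    if omega not in F:                       # (i) contains Omega
        return False
    if any(omega - A not in F for A in F):   # (ii) closed under complements
        return False
    for A, B in combinations(F, 2):          # (iii) closed under (finite) unions
        if A | B not in F:
            return False
    return True

omega = {1, 2, 3}
F1 = [frozenset(), frozenset({1}), frozenset({2, 3}), frozenset(omega)]
F2 = [frozenset(), frozenset({1}), frozenset({2}), frozenset({3}), frozenset(omega)]
print(is_sigma_algebra(F1, omega))  # True
print(is_sigma_algebra(F2, omega))  # False
```

F2 fails exactly as in the example: the complement {1}c = {2, 3} is missing from F2.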
As these examples show, a given sample space can carry many different σ-algebras; compare F1 , F2 and F3 in Example 1.4 (3).
Definition 1.5. Let Ω be a non-empty set and let F be a σ-algebra on Ω, then (Ω, F)
is called a measurable space.
Definition 1.6. Let (Ω, F) be a measurable space. A probability measure is a real-valued function P : F −→ R satisfying
(i) P(E) ≥ 0 for all E ∈ F;
(ii) (Countable additivity) if (En) is a sequence of pairwise disjoint sets in F, then
P(∪_{n=1}^∞ En) = Σ_{n=1}^∞ P(En); (1.1)
(iii) P(Ω) = 1.2
This is one reason why a σ-algebra is required to be closed under countable unions: the condition that An ∈ F for all n = 1, 2, 3, ... implies ∪_{n=1}^∞ An ∈ F guarantees that the left-hand side of (1.1) is well defined.
Proposition 1.7. (1) P(E) ≤ 1 for all E ∈ F.
(2) P(∅) = 0.
(3) P(E c ) = 1 − P(E).
(4) P(E ∪ F ) = P(E) + P(F ) − P(E ∩ F ).
(5) If E ⊆ F , then P(E) ≤ P(F ).
(6) If (En) is a sequence of sets in F, then
P(∪_{n=1}^∞ En) ≤ Σ_{n=1}^∞ P(En).
(7) (i) If (En ) satisfies
E1 ⊆ E2 ⊆ · · · ⊆ En ⊆ · · · ,
2 If the requirement P(Ω) = 1 is dropped, such a P is called simply a measure.
then P(En) converges to P(∪_{n=1}^∞ En), i.e., lim_{n→∞} P(En) = P(∪_{n=1}^∞ En).
(ii) If (En ) satisfies
E1 ⊇ E2 ⊇ · · · ⊇ En ⊇ · · · ,
then P(En) converges to P(∩_{n=1}^∞ En), i.e., lim_{n→∞} P(En) = P(∩_{n=1}^∞ En).
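Continuity from below, property (7)(i), can be illustrated numerically. A small sketch (anticipating the Lebesgue measure m of Remark 1.13, with sets of our choosing): for the increasing sets En = [0, 1 − 1/n], whose union is [0, 1),

```python
from fractions import Fraction

# Continuity from below (Proposition 1.7 (7)(i)) for Lebesgue measure on [0, 1]:
# E_n = [0, 1 - 1/n] increases to [0, 1), and m(E_n) = 1 - 1/n -> m([0, 1)) = 1.
def m_En(n):
    return 1 - Fraction(1, n)

print([float(m_En(n)) for n in (1, 2, 10, 1000)])  # [0.0, 0.5, 0.9, 0.999]
```

so m(En) increases to 1 = m(∪ En), as the proposition asserts.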
Definition 1.8. The triple (Ω, F, P) is called a probability space.
Example 1.9. (1) Let
Ω = {H, T} (the two outcomes of a coin toss),
F = {∅, {H}, {T}, {H, T}},
and let P be given by
P(∅) = 0, P({H}) = P({T}) = 1/2, P({H, T}) = 1.
Then (Ω, F, P) is a probability space.
(2) Let
Ω = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅} (the six faces of a die),
F = the collection of all subsets of Ω,
and let P be the probability measure satisfying P({⚀}) = P({⚁}) = · · · = P({⚅}) = 1/6. Then (Ω, F, P) is a probability space.
(3) Let Ω = {ω1 , ω2 , ..., ωn , ...} be a countable set and let F be the collection of all
subsets of Ω. Assume that (pn) is a sequence of real numbers with
pn ≥ 0 for all n and Σ_{n=1}^∞ pn = 1.
Define a set function P : F −→ R by P({ωn}) = pn and
P(E) = Σ_{ωn ∈ E} P({ωn}) = Σ_{ωn ∈ E} pn .
Then P defines a probability measure.
We call (Ω, F, P) a discrete probability space and Ω a discrete sample space.
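For the die of Example 1.9 (2), with the faces labelled 1 to 6 here for convenience, the conditions of Definition 1.6 can be verified exhaustively, since F is finite. A sketch:

```python
from fractions import Fraction
from itertools import combinations

# Example 1.9 (2), faces labelled 1..6: F = all subsets of Omega,
# P assigns 1/6 to each singleton and extends additively.
omega = frozenset(range(1, 7))

def P(E):
    return Fraction(len(E), 6)

subsets = [frozenset(s) for r in range(7) for s in combinations(sorted(omega), r)]

assert all(P(E) >= 0 for E in subsets)               # Definition 1.6 (i)
assert all(P(A | B) == P(A) + P(B)                   # (ii), for disjoint pairs
           for A in subsets for B in subsets if not (A & B))
assert P(omega) == 1                                 # (iii)
print("P satisfies the axioms of Definition 1.6")
```

Exact rational arithmetic (`Fraction`) avoids any floating-point issues in the additivity check.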
These are the most basic examples of probability spaces. Before constructing more interesting probability spaces, we introduce one further piece of notation.
Question. Given a sample space Ω and a collection C of subsets of Ω, does there exist
a collection G of subsets of Ω such that
(i) C ⊆ G;
(ii) G is a σ-algebra?
Answer. Yes. We may take G to be the collection of all subsets of Ω.
A further question is whether there exists a smallest σ-algebra F containing C. The answer is again yes: take
F = ∩ {H : C ⊆ H and H is a σ-algebra}.
This intersection is a σ-algebra, since an arbitrary intersection of σ-algebras is again a σ-algebra.
Notation 1.10. If G is the smallest σ-algebra containing C, then we say that G is
generated by C and denote it by G = σ(C).
Example 1.11. Let
Ω = {1, 2, 3, 4} and C = {{1, 2}, {4}}.
Then
σ(C) = {∅, {1, 2}, {3}, {4}, {1, 2, 3}, {1, 2, 4}, {3, 4}, Ω}.
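On a finite Ω, σ(C) can be computed by brute force: start from C together with ∅ and Ω, then repeatedly close under complements and pairwise unions until nothing new appears. A sketch (the function name is ours), reproducing Example 1.11:

```python
from itertools import combinations

def generate_sigma_algebra(C, omega):
    """Smallest sigma-algebra on a finite omega containing C: iterate
    closure under complements and pairwise unions until stable."""
    omega = frozenset(omega)
    G = {frozenset(), omega} | {frozenset(A) for A in C}
    while True:
        new = {omega - A for A in G} | {A | B for A, B in combinations(G, 2)}
        if new <= G:
            return G
        G |= new

omega = {1, 2, 3, 4}
C = [{1, 2}, {4}]
G = generate_sigma_algebra(C, omega)
print(len(G), sorted(sorted(A) for A in G))  # the 8 sets of Example 1.11
```

The loop terminates because G only grows inside the (finite) power set of Ω.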
Example 1.12. Let Ω = R and let C be the collection of all open intervals (a, b) in R.
Then the sets in B = σ(C) are called Borel sets. This σ-algebra is very important; we will use it when defining random variables. For example, R, Q, (a, b), [a, b), (a, b], [a, b] are in σ(C). In fact, essentially every subset of R that one meets in practice is a Borel set; constructing a set that is not a Borel set requires tools from real analysis.
Remark 1.13. Let Ω = [0, 1] and let B1 be the collection of all Borel sets in [0, 1], i.e.,
B1 = B ∩ [0, 1] := {A ∩ [0, 1] : A ∈ B}.
For an interval (a, b) ⊆ [0, 1], define
m((a, b)) = b − a.
Then m extends to a probability measure m : B1 −→ R, called the Lebesgue measure; it extends the notion of length from intervals to all Borel sets.
([0, 1], B1 , m) is a probability space that will be used repeatedly in what follows.
Exercise
(1) Find the σ-algebra generated by the given collection of sets C.
(a) Ω = {1, 2, 3, 4}, C = {{1, 2, 3}, {4}};
(b) Ω = {1, 2, 3, 4}, C = {{2, 3, 4}, {3, 4}};
(c) Ω = {1, 2, 3, 4, 5}, C = {{1, 2, 4}, {1, 4, 5}};
(d) Ω = R, C = {[−1, 0), (1, 2)}.
(2) Let Ω = {1, 2, 3, 4, 5, 6} and let F = σ ({{1, 2, 3, 4}, {3, 4, 5}}). Find a probability
measure defined on (Ω, F).
(3) Consider a probability space (Ω, F, P), where Ω = {1, 2, 3, 4, 5, 6}, F is the collection
of all subsets of Ω, and
P({1}) = P({2}) = P({5}) = 1/4, P({3}) = P({4}) = P({6}) = 1/12.
(a) Let
X = 2I{1} + 3I{2,3} − 3I{4,5} + I{6} .
Find E[X] and E[X 2 ].
(b) Let
Y = I{1,2} + 3I{2,4,5} − 2I{4,5,6} .
Find E[Y ] and E[Y 3 ].
(4) Let Ω = R and let F be the collection of all subsets A of R such that A or Ac is
countable. Define P(A) = 0 in the first case and P(A) = 1 in the second. Show
that (Ω, F, P) is a probability space, i.e., show
that F is a σ-algebra and P is a probability measure.
1.2. Random variables
Let (Ω, F, P) be a probability space.
Definition 1.14. A function X : Ω −→ R is called a random variable (r.v.) if for every B ∈ B,
{ω : X(ω) ∈ B} ∈ F,
i.e., X is measurable with respect to F.
Notation 1.15. For every random variable X and B ∈ B,
{X ∈ B} := {ω ∈ Ω : X(ω) ∈ B}.
Example 1.16. Suppose that Ω = [0, 1] and F = B1 .
(1) X1 (ω) = ω. For B ∈ B,
{X1 ∈ B} = {ω ∈ [0, 1] : X1 (ω) ∈ B}
= {ω ∈ [0, 1] : ω ∈ B} = B ∩ [0, 1] ∈ B1 .
Thus, X1 is a random variable.
(2) X2(ω) = ω². For B ∈ B,
{X2 ∈ B} = {ω ∈ Ω : X2(ω) ∈ B} = {ω ∈ Ω : ω² ∈ B}.
For a general B ∈ B this set is not easy to describe directly, so checking the definition against every Borel set is inconvenient. A more practical criterion is given in the next theorem; the verification for X2 is carried out in Example 1.18 (1).
Theorem 1.17. The following statements are equivalent.
(1) X is a random variable on (Ω, F).
(2) {X ≤ r} ∈ F for all r ∈ R.
(3) {X < r} ∈ F for all r ∈ R.
(4) {X ≥ r} ∈ F for all r ∈ R.
(5) {X > r} ∈ F for all r ∈ R.
With this theorem, checking whether a given function is a random variable becomes much easier. For X2 in Example 1.16, instead of examining {X2 ∈ B} for every Borel set B, which is hard to do directly, it suffices to verify one of the conditions of Theorem 1.17.
Example 1.18. (1) Consider Ω = [0, 1], F = B1 and X(ω) = ω².
(i) If r < 0, {X ≤ r} = {ω ∈ [0, 1] : ω² ≤ r} = ∅ ∈ F.
(ii) If 0 ≤ r ≤ 1, {X ≤ r} = {ω ∈ [0, 1] : ω² ≤ r} = [0, √r] ∈ F.
(iii) If r > 1, {X ≤ r} = [0, 1] ∈ F.
Thus, X is a random variable.
(2) Next, an example of a function that is not a random variable, and one that is.
Let Ω = {1, 2, 3, 4} and F = σ({{1, 2}, {3}, {4}}).
(a) X1(1) = 2, X1(2) = 3, X1(3) = 4, X1(4) = 5. Since
{X1 ≤ 2} = {1} ∉ F,
X1 is not a random variable.
(b) X2 (1) = X2 (2) = 2, X2 (3) = 10, X2 (4) = −500.
(i) If r < −500, {X2 ≤ r} = ∅ ∈ F.
(ii) If −500 ≤ r < 2, {X2 ≤ r} = {4} ∈ F.
(iii) If 2 ≤ r < 10, {X2 ≤ r} = {1, 2, 4} ∈ F.
(iv) If r ≥ 10, {X2 ≤ r} = Ω ∈ F.
Thus, X2 is a random variable.
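When F is generated by finitely many disjoint atoms, as in part (2), a function is F-measurable exactly when it is constant on each atom; this standard criterion (not proved in the text) makes the two cases above immediate. A quick sketch:

```python
def is_measurable(X, atoms):
    """On a finite Omega with F = sigma(atoms), where the atoms are disjoint
    and cover Omega, X is F-measurable iff X is constant on every atom."""
    return all(len({X[w] for w in A}) == 1 for A in atoms)

atoms = [{1, 2}, {3}, {4}]                 # generators in Example 1.18 (2)
X1 = {1: 2, 2: 3, 3: 4, 4: 5}              # not constant on {1, 2}
X2 = {1: 2, 2: 2, 3: 10, 4: -500}          # constant on every atom
print(is_measurable(X1, atoms))  # False
print(is_measurable(X2, atoms))  # True
```

X1 fails precisely because it separates the atom {1, 2}, which is why {X1 ≤ 2} = {1} falls outside F.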
Theorem 1.19. (1) If X is a random variable, f is a Borel measurable function
on (R, B), then f (X) is a random variable.
(2) If X and Y are random variables, f is a Borel measurable function of two vari-
ables, then f (X, Y ) is a random variable.
(3) If (Xn )n≥1 is a sequence of random variables, then
inf_n Xn , sup_n Xn , lim inf_{n→∞} Xn , lim sup_{n→∞} Xn , lim_{n→∞} Xn
are random variables.
We omit the proofs of these results. The definitions of lim sup and lim inf are reviewed in Appendix A.
Example 1.20. (1) Let (Ω, F, P) be a discrete probability space. Then every
real-valued function on Ω is a random variable.
(2) Let (Ω, F, P) = ([0, 1], B1 , m). Then the random variables are exactly the Borel
measurable functions defined on ([0, 1], B1 ).
Exercise
(1) Let Ω = {1, 2, 3, 4, 5, 6}, and let
ω        1   2   3   4   5   6
X1(ω)    1   2   1   1   2   2
X2(ω)    3   2   3   3   2   2
X3(ω)    3   2   1   5   4   4
(a) Let F = σ ({{1}, {2}, {3}, {4}, {5}, {6}}), which of X1 , X2 , X1 +X2 , X1 +X3
and X3 are random variables on (Ω, F)?
(b) Let F = σ ({{1, 2, 3}, {4, 5}}), which of X1 , X2 , X1 + X2 , X1 + X3 and X3
are random variables on (Ω, F)?
(c) Let F = σ ({{1, 4}, {2, 5}, {3}}), which of X1 , X2 , X1 + X2 , X1 + X3 and
X3 are random variables on (Ω, F)?
(2) Suppose X and Y are random variables on (Ω, F, P) and let A ∈ F . Show that
if we let
Z(ω) = X(ω) if ω ∈ A, and Z(ω) = Y(ω) if ω ∈ Ac ,
then Z is a random variable.
1.3. Expectation
Definition 1.21. The function
IA(ω) = 1 if ω ∈ A, and IA(ω) = 0 if ω ∉ A,
is called the indicator function of A.
Remark 1.22. The indicator function IA is a random variable if and only if A ∈ F.
Definition 1.23. (1) Let Ai ∈ F for all i and let a random variable X be of the
form
X = Σ_{i=1}^∞ bi IAi . (1.2)
Then X is called a simple random variable.
(2) Let X be of the form (1.2); we define the expectation of X to be
E[X] = Σ_{i=1}^∞ bi P(Ai).
Strictly speaking, one should check that this definition does not depend on the chosen representation (1.2); in particular, the sets (An) are not required to be disjoint.
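On a finite space the expectation of a simple random variable is a finite sum. The following sketch computes E[X] = Σ bi P(Ai) on the space that reappears in Example 1.28 (1):

```python
from fractions import Fraction

# The probability space of Example 1.28 (1): Omega = {1, 2, 3, 4}.
P = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 6), 4: Fraction(1, 12)}

def expectation(terms):
    """terms = [(b_i, A_i), ...] represents X = sum_i b_i I_{A_i};
    E[X] = sum_i b_i P(A_i)."""
    return sum(b * sum(P[w] for w in A) for b, A in terms)

X = [(5, {1}), (2, {2}), (-4, {3, 4})]
print(expectation(X))   # 2, matching Example 1.28 (1)
```

Exact fractions keep the arithmetic identical to the hand computation.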
Example 1.24. Let (Ω, F, P) = ([0, 1], B1 , m) and consider
X = Σ_{i=1}^∞ (1/2^i) I_{[0, 2^{−i})} .
Then the expectation of X is given by
E[X] = Σ_{i=1}^∞ (1/2^i) P([0, 2^{−i})) = Σ_{i=1}^∞ 1/4^i = 1/3.
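The geometric series in Example 1.24 can be checked with exact rational arithmetic; the partial sums of Σ 1/4^i approach 1/3:

```python
from fractions import Fraction

# Partial sums of E[X] = sum_{i>=1} (1/2^i) * m([0, 2^{-i})) = sum_{i>=1} 1/4^i.
partial = sum(Fraction(1, 4) ** i for i in range(1, 40))
print(float(partial))            # ~ 0.3333...
print(Fraction(1, 3) - partial)  # the exact remainder, (1/3) * 4^{-39}
```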
Remark 1.25. Consider the generalization of the expectation.3 Let X be a positive
random variable. Define
Λmn = {ω : n/2^m ≤ X(ω) < (n + 1)/2^m} ∈ F, for all m, n ∈ N.
Let
Xm = Σ_{n=0}^∞ (n/2^m) I_{Λmn} .
(Xm is illustrated in Figure 1.3.) Due to the construction of Xm , we see that for all ω ∈ Ω,
Xm(ω) ↑ and
lim_{m→∞} Xm(ω) = X(ω).
(i) If E[Xm ] = +∞ for some m, we define E[X] = +∞.
3 This construction is essentially that of the Lebesgue integral, which generalizes the Riemann integral from calculus: the Riemann integral partitions the domain, while the Lebesgue integral partitions the range. Whenever the Riemann integral exists, the Lebesgue integral also exists and agrees with it. Both can be described via step functions / simple functions: the Lebesgue integral approximates f from below by simple functions. Compare Figure 1.1 and Figure 1.2.
Figure 1.1. The approximation used by the Riemann integral
Figure 1.2. The approximation used by the Lebesgue integral
(ii) If E[Xm] < ∞ for all m, define
E[X] = lim_{m→∞} E[Xm] = lim_{m→∞} Σ_{n=0}^∞ (n/2^m) P(n/2^m ≤ X < (n + 1)/2^m).
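The dyadic approximation above is simply Xm = ⌊2^m X⌋/2^m. A sketch with a sample positive random variable X(ω) = ω² on [0, 1] (our choice, purely for illustration) shows Xm increasing towards X:

```python
import math

def X(omega):
    # a sample positive random variable on [0, 1] (chosen for illustration)
    return omega ** 2

def Xm(omega, m):
    """The dyadic approximation of Remark 1.25: Xm takes the value n/2^m on
    {n/2^m <= X < (n+1)/2^m}, i.e. Xm = floor(2^m * X) / 2^m."""
    return math.floor(2 ** m * X(omega)) / 2 ** m

w = 0.7
for m in (1, 4, 8, 16):
    print(m, Xm(w, m))   # increases towards X(w) = 0.49
```

By construction Xm ≤ X always, and the error is less than 2^{−m}, which is why Xm(ω) ↑ X(ω).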
Having defined the expectation of a positive random variable, we now turn to the general case.
Definition 1.26. Consider a general random variable X. Then we can write X as
X = X+ − X− ,
where X+ = X ∨ 0, X− = (−X) ∨ 0.
Figure 1.3. X and its approximation Xm (the horizontal levels shown are 2^{−m}, 2 · 2^{−m}, 3 · 2^{−m}, 4 · 2^{−m})
(1) Unless both E[X+] and E[X−] are +∞, we define
E[X] = E[X+] − E[X−].
(2) If E|X| = E[X+] + E[X−] < ∞, we say X has a finite expectation, and we write
E[X] = ∫_Ω X dP = ∫_Ω X(ω) P(dω).
(3) For A ∈ F , define
∫_A X dP = E[X IA], (1.3)
which is called the integral of X with respect to P over A.
(4) X is integrable with respect to P over A if the integral (1.3) exists and is finite.
Remark 1.27. (1) If X has a cumulative distribution function (c.d.f.) F with
respect to P, then
E[X] = ∫_{−∞}^{∞} x dF(x).
Moreover, if g is a Borel measurable function on R,
E[g(X)] = ∫_{−∞}^{∞} g(x) dF(x).
(2) If X has a probability density function (p.d.f.) f with respect to P, then
E[X] = ∫_{−∞}^{∞} x f(x) dx
and
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.
(3) If X has a probability mass function p with respect to P, taking values (xn), then
E[X] = Σ_{n=1}^∞ xn p(xn),
E[g(X)] = Σ_{n=1}^∞ g(xn) p(xn).
Example 1.28. (1) Let Ω = {1, 2, 3, 4}, F = σ({{1}, {2}, {3}, {4}}) and
P({1}) = 1/2, P({2}) = 1/4, P({3}) = 1/6, P({4}) = 1/12.
Let
X = 5 I{1} + 2 I{2} − 4 I{3,4} .
Then
E[X] = 5 · (1/2) + 2 · (1/4) − 4 · (1/6 + 1/12) = 2,
E[X²] = 25 · (1/2) + 4 · (1/4) + 16 · (1/6 + 1/12) = 35/2.
(2) Suppose X is normally distributed on (Ω, F, P) with mean 0 and variance 1. Then
X has probability density function
(1/√(2π)) exp(−x²/2).
Thus,
E[X] = ∫_{−∞}^{∞} x · (1/√(2π)) exp(−x²/2) dx = 0,
E[X³] = ∫_{−∞}^{∞} x³ · (1/√(2π)) exp(−x²/2) dx = 0 (odd integrand),
E[e^X] = ∫_{−∞}^{∞} e^x · (1/√(2π)) exp(−x²/2) dx = (1/√(2π)) ∫_{−∞}^{∞} exp(−x²/2 + x) dx
= (1/√(2π)) ∫_{−∞}^{∞} exp(−(x − 1)²/2 + 1/2) dx = e^{1/2}.
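These three expectations can be confirmed numerically by integrating against the standard normal density. A sketch using the trapezoidal rule (the truncation at |x| = 12 and the helper name `E` are our choices; the tails beyond that are negligible):

```python
import math

def normal_pdf(x):
    # density of N(0, 1)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def E(g, a=-12.0, b=12.0, n=200_000):
    """Trapezoidal approximation of E[g(X)] = integral of g(x) * pdf(x) dx
    for X ~ N(0, 1), truncated to [a, b]."""
    h = (b - a) / n
    s = 0.5 * (g(a) * normal_pdf(a) + g(b) * normal_pdf(b))
    s += sum(g(a + k * h) * normal_pdf(a + k * h) for k in range(1, n))
    return s * h

print(E(lambda x: x))              # ~ 0
print(E(lambda x: x ** 3))         # ~ 0
print(E(math.exp), math.exp(0.5))  # both ~ 1.6487
```

The symmetric grid makes the odd integrands cancel almost exactly, mirroring the "odd integrand" argument above.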
Proposition 1.29. (1) (Absolute Integrability)
∫_A X dP < ∞ ⇐⇒ ∫_A |X| dP < ∞.4
(2) (Linearity)
∫_A (aX + bY) dP = a ∫_A X dP + b ∫_A Y dP.
(3) (Additivity over sets) If (An) is disjoint, then
∫_{∪n An} X dP = Σ_n ∫_{An} X dP.
(4) (Positivity) If X ≥ 0 P-a.e.5 on A, then
∫_A X dP ≥ 0.
(5) (Monotonicity) If X1 ≤ X ≤ X2 P-a.e. on A, then
∫_A X1 dP ≤ ∫_A X dP ≤ ∫_A X2 dP.
4 This property does not hold for the (improper) Riemann integral.
5 We say a property holds P-a.e. (almost everywhere), or P-a.s. (almost surely), if the probability that the property holds equals 1, i.e., the property is true except on a set of probability 0.
(6) (Modulus Inequality)
|∫_A X dP| ≤ ∫_A |X| dP.
Theorem 1.30. (1) (Dominated Convergence Theorem) If lim_{n→∞} Xn = X P-a.e.
on A and |Xn| ≤ Y P-a.e. on A for all n with ∫_A Y dP < ∞, then
lim_{n→∞} ∫_A Xn dP = ∫_A lim_{n→∞} Xn dP = ∫_A X dP.
(2) (Monotone Convergence Theorem) If Xn ≥ 0 and Xn ↑ X P-a.e. on A, then
lim_{n→∞} ∫_A Xn dP = ∫_A lim_{n→∞} Xn dP = ∫_A X dP.
(3) (Fatou's Lemma) If Xn ≥ 0 P-a.e. on A, then
∫_A lim inf_{n→∞} Xn dP ≤ lim inf_{n→∞} ∫_A Xn dP.
(4) (Jensen’s Inequality) If ϕ is a convex function, X and ϕ(X) are integrable, then
ϕ(E[X]) ≤ E[ϕ(X)].
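Jensen's inequality is easy to test on a discrete distribution: for convex ϕ, ϕ(Σ pi xi) ≤ Σ pi ϕ(xi). A sketch with an arbitrary four-point distribution (the values and weights are our choice):

```python
import math

def jensen_holds(phi, xs, ps):
    """Check phi(E[X]) <= E[phi(X)] for a finite distribution
    P(X = xs[i]) = ps[i]; a tiny slack absorbs float rounding."""
    mean = sum(p * x for p, x in zip(ps, xs))
    return phi(mean) <= sum(p * phi(x) for p, x in zip(ps, xs)) + 1e-12

xs = [-2.0, 0.0, 1.0, 3.0]
ps = [0.1, 0.4, 0.3, 0.2]
print(jensen_holds(math.exp, xs, ps))         # True
print(jensen_holds(lambda x: x * x, xs, ps))  # True
print(jensen_holds(abs, xs, ps))              # True
```

Each test function (exp, the square, the absolute value) is convex, so the inequality must hold, as the theorem guarantees.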
Exercise
(1) Let λ be a fixed number in R, and define the convex function ϕ(x) = eλx for
all x ∈ R. Let X be a normally distributed random variable with mean μ and
variance σ 2 , i.e., the probability density function of X is given by
f(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²)).
(a) Find E[eλX ].
(b) Verify that Jensen’s inequality holds (as it must):
Eϕ(X) ≥ ϕ(E[X]).
(2) For each positive integer n, define fn to be the normal density with mean zero
and variance n, i.e.,
fn(x) = (1/√(2nπ)) exp(−x²/(2n)).
(a) What is the function f(x) = lim_{n→∞} fn(x)?
(b) What is lim_{n→∞} ∫_{−∞}^{∞} fn(x) dx?
(c) Note that
lim_{n→∞} ∫_{−∞}^{∞} fn(x) dx ≠ ∫_{−∞}^{∞} f(x) dx.
Explain why this does not violate the "Monotone Convergence Theorem".
(3) Let P be the Lebesgue measure on Ω = [0, 1]. Define
Z(ω) = 0 if 0 ≤ ω < 1/2, and Z(ω) = 2 if 1/2 ≤ ω ≤ 1.
For A ∈ B1 , define
Q(A) = ∫_A Z(ω) dP(ω).
(a) Show that Q is a probability measure.
(b) Show that if P(A) = 0, then Q(A) = 0. (We say that Q is absolutely
continuous with respect to P.)
(c) Show that there is a set A for which Q(A) = 0 but P(A) > 0.