Notes
What is Probability?
Probability (Axiomatic Approach, A.N. Kolmogorov)

A probability space is a triple (Ω, F, P): Ω is the sample space, F is a σ-field of subsets of Ω (the events), and P : F → [0, 1] is a probability measure, i.e.,
• P(Ω) = 1.
• Countable additivity: if A1, A2, · · · ∈ F are pairwise disjoint, then P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · · .
Some Points

• Why σ-field?
• Why countable additivity?
Common Mistake

• If there are only two possible outcomes, and you don’t know which is true, the probability of each of these outcomes is 1/2.
Tossing two coins and Equally Likely

Suppose a cup containing two similar coins is shaken, then turned upside down on a table. What is the chance that the two coins show heads? Consider the following solutions to this problem.

• Either they both show heads, or they don’t. These are the two possible outcomes. Assuming these are equally likely, the chance of both heads is 1/2.
• Despite the fact that the coins are supposed to be similar, imagine that they are labeled in some way to distinguish them. Call one the first coin and the other the second. Now there are four outcomes which might be considered. Assume these four possible outcomes are equally likely. Then the event of both coins showing heads has a chance of 1/4.
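A quick way to see which assumption matches physical coins is a minimal Monte Carlo sketch (assuming fair, independent coins; the function name is ours, not from the notes):

    import random

    def both_heads_rate(trials=100_000):
        """Estimate P(both coins show heads) for two fair, independent coins."""
        hits = 0
        for _ in range(trials):
            if random.choice("HT") == "H" and random.choice("HT") == "H":
                hits += 1
        return hits / trials

    print(both_heads_rate())  # typically close to 0.25, not 0.5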
Another Problem

A fair coin is tossed twice. What is the probability of getting at least one Tail?

One argument: the number of Tails possible is 0, 1, or 2, so the probability is 2/3. Is this correct?

No: the sample space is Ω = {HH, HT, TH, TT}, and all four outcomes are equally likely. Therefore the probability of the desired event is 3/4.
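The correct answer also falls out of brute-force enumeration of the equally likely sample space (a sketch using only the standard library):

    from itertools import product

    # The 4 equally likely outcomes of two fair coin tosses.
    outcomes = list(product("HT", repeat=2))
    favourable = [w for w in outcomes if "T" in w]  # at least one Tail
    print(len(favourable), "/", len(outcomes))      # 3 / 4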
Consequences

• P(∅) = 0.
• For every A ∈ F, 0 ≤ P(A) ≤ 1.
• Monotonicity: if A ⊆ B, then P(A) ≤ P(B).
  – B = A ∪ (B \ A), and hence P(B) = P(A) + P(B \ A) ≥ P(A).
• P(A^c) = 1 − P(A).
  – 1 = P(A ∪ A^c) = P(A) + P(A^c).
Inclusion-Exclusion Principle

Theorem
For A, B ∈ F,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Inclusion-Exclusion Principle

Assume that the result is true for n; we will prove it for n + 1.

P(A1 ∪ A2 ∪ · · · ∪ An ∪ An+1)
  = P(A1 ∪ A2 ∪ · · · ∪ An) + P(An+1) − P((A1 ∪ A2 ∪ · · · ∪ An) ∩ An+1)
  = P(A1 ∪ A2 ∪ · · · ∪ An) + P(An+1) − P((A1 ∩ An+1) ∪ (A2 ∩ An+1) ∪ · · · ∪ (An ∩ An+1)).
Inclusion-Exclusion Principle

Applying the induction hypothesis to both unions and collecting terms gives the general formula

P(A1 ∪ A2 ∪ · · · ∪ An) = ∑_{i=1}^n P(Ai) − ∑_{1≤i1<i2≤n} P(Ai1 ∩ Ai2) + ∑_{1≤i1<i2<i3≤n} P(Ai1 ∩ Ai2 ∩ Ai3) − · · · + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An).

Theorem
Let A1, A2, · · · , An ∈ F and A = A1 ∪ A2 ∪ · · · ∪ An. Then
P(A) ≤ ∑_{k=1}^n P(Ak).

Define B1 = A1 and Bk+1 = Ak+1 \ (A1 ∪ · · · ∪ Ak). Then the Bk are disjoint, A = ∪_{k=1}^n Bk, and
P(A) = ∑_{k=1}^n P(Bk) ≤ ∑_{k=1}^n P(Ak).

Example
Let σ be a uniformly random permutation of {1, 2, · · · , n} and let Ai = {σ : σ(i) = i}. Then |Ai| = (n − 1)!. Moreover, for any subset I ⊆ {1, 2, · · · , n}, ∩_{i∈I} Ai has exactly (n − |I|)! elements. Using the inclusion-exclusion principle, we see that a random permutation has a fixed point (i.e., lies in ∪_{i∈{1,2,··· ,n}} Ai) with probability ∑_{k=1}^n (−1)^{k+1}/k!, which tends to 1 − 1/e as n → ∞.
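A simulation consistent with the 1 − 1/e limit (a sketch; the function name and trial count are ours):

    import math
    import random

    def fixed_point_rate(n, trials=100_000):
        """Estimate P(a uniform random permutation of n items has a fixed point)."""
        hits = 0
        perm = list(range(n))
        for _ in range(trials):
            random.shuffle(perm)
            if any(perm[i] == i for i in range(n)):
                hits += 1
        return hits / trials

    print(fixed_point_rate(10))  # close to
    print(1 - math.exp(-1))      # 0.6321...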
Continuity Property

Theorem
Let A1 ⊆ A2 ⊆ · · · ∈ F and A = ∪_{n=1}^∞ An. Then P(A) = lim_{n→∞} P(An).

Follows from
P(A) = P(A1 ∪ (A2 \ A1) ∪ (A3 \ A2) ∪ · · · )
     = P(A1) + P(A2 \ A1) + P(A3 \ A2) + · · ·
     = P(A1) + (P(A2) − P(A1)) + (P(A3) − P(A2)) + · · · = lim_{n→∞} P(An).
The last equality holds because the n-th partial sum of the series telescopes to P(An).
Continuity Property

Theorem
Let P be a finitely additive probability measure. Then P is countably additive if and only if the continuity property holds for P.
Conditional Probability

Definition
The conditional probability of A given that the event B has occurred is given by
P(A | B) = P(A ∩ B) / P(B),
provided P(B) > 0.
Theorem
If A1, A2, · · · , An ∈ F and P(A1 ∩ A2 ∩ · · · ∩ An) > 0, then
P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) · · · P(An | A1 ∩ · · · ∩ An−1).

Note that
P(A1 ∩ · · · ∩ An−1) ≥ P(A1 ∩ A2 ∩ · · · ∩ An) > 0,
so every conditional probability on the right-hand side is well defined. Therefore the right-hand side telescopes to the left-hand side.
Theorem (Law of Total Probability)
Let An, n ≥ 1 be a partition (finite or countable) of Ω with P(An) > 0 for each n. Then for A ∈ F,
P(A) = ∑_n P(A | An) P(An).

Note that
P(A) = P(∪_{n=1}^∞ (A ∩ An)) = ∑_{n=1}^∞ P(A ∩ An) = ∑_{n=1}^∞ P(A | An) P(An).
Bayes’ Theorem

Let An, n ≥ 1 be a partition (finite or countable) of Ω and suppose P(A) > 0. Then
P(An | A) = P(A | An) P(An) / ∑_m P(A | Am) P(Am).

This follows since
P(B | A) = P(A | B) P(B) / P(A);
take B = An and use the law of total probability in the denominator.
Independence

Definition
Two events A and B are said to be independent if
P(A ∩ B) = P(A) P(B).
Consequences

Theorem
If A and B are independent, so also are A and B^c, A^c and B, and A^c and B^c.
Consequences

Theorem
A and B are independent if and only if P(A | B) = P(A) (provided P(B) > 0).
Consequences

Theorem
A ↦ P(A | B) defines a new probability measure on F, called the conditional probability measure given B (provided P(B) > 0).
Some Examples
Boy or Girl paradox / Two Child Problem
Example
A family has two children, Jeet and Kiran. Kiran is a girl. What is the
probability that both children are girls?
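The answer depends on how the information "Kiran is a girl" was obtained; a short enumeration contrasts the two standard readings (the modelling below is our illustration, not from the notes):

    from itertools import product

    # All four equally likely (elder, younger) sex combinations.
    families = list(product("BG", repeat=2))

    # Reading 1: we only learn that at least one child is a girl.
    cond = [f for f in families if "G" in f]
    print(sum(f == ("G", "G") for f in cond) / len(cond))   # 1/3

    # Reading 2: we learn that one specific child is a girl.
    cond = [f for f in families if f[1] == "G"]
    print(sum(f == ("G", "G") for f in cond) / len(cond))   # 1/2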
Scandal of Arithmetic (The Paradox of the Chevalier De Méré)

Example
Which is more likely: obtaining at least one six in 4 tosses of a fair die (event A), or obtaining at least one double six in 24 tosses of a pair of dice (event B)?

• P(A) = (6^4 − 5^4)/6^4 ≈ 0.518
• P(B) = (36^24 − 35^24)/36^24 ≈ 0.491
• Why is this a paradox?
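The two probabilities can be computed exactly via the complements:

    # De Méré's two bets, computed exactly via the complements.
    p_a = 1 - (5 / 6) ** 4        # at least one six in 4 tosses of a die
    p_b = 1 - (35 / 36) ** 24     # at least one double six in 24 tosses of a pair
    print(round(p_a, 3), round(p_b, 3))   # 0.518 0.491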
Problem of Points / Division of Stakes
Birthday Paradox

Suppose each of n people independently has a birthday uniform over N (= 365) days; think of the birthdays as successive trials.

• After three trials, given that the first two outcomes are different, the conditional probability is 2/N that the third outcome will be equal to one of those two. Thus the conditional probability is 1 − 2/N that the third will differ from the first two. Thus
P(first three trials different) = (1 − 1/N)(1 − 2/N).
• Thus
P(first n trials different) = (1 − 1/N)(1 − 2/N) · · · (1 − (n − 1)/N).
Birthday Paradox

• Taking logarithms,
log P(first n trials different) = log(1 − 1/N) + log(1 − 2/N) + · · · + log(1 − (n − 1)/N).
• For 0 < x < 1,
log(1 − x) ≤ −x.
Birthday Paradox

• Hence
log(1 − 1/N) + log(1 − 2/N) + · · · + log(1 − (n − 1)/N) ≤ −(1/N + 2/N + · · · + (n − 1)/N).
• Thus
log P(n trials are different) ≤ −n(n − 1)/(2N).
• This gives
P(n trials are different) ≤ exp(−n(n − 1)/(2N)).
Birthday Paradox

• With 23 people, we can see that the probability of two people having the same birthday will be more than 1/2. And with 42 people, it will be more than 0.9.
• If n is a bit more than √N, the probability will be more than 1/2.
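The exact product and the exponential bound from the previous slide, side by side (a sketch with N = 365; names are ours):

    import math

    def p_all_different(n, N=365):
        """Exact P(n uniform birthdays over N days are all distinct)."""
        p = 1.0
        for k in range(1, n):
            p *= 1 - k / N
        return p

    for n in (23, 42):
        exact = 1 - p_all_different(n)                    # collision probability
        bound = 1 - math.exp(-n * (n - 1) / (2 * 365))    # lower bound from the slides
        print(n, round(exact, 3), round(bound, 3))
    # n = 23 gives about 0.507, n = 42 about 0.914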
Monty Hall Problem

Suppose you’re on a game show, and you’re given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat.
He then says to you, “Do you want to pick door No. 2?”
Is it to your advantage to switch your choice?
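A simulation under the standard rules (the host always opens a goat door other than your pick; the function name and tie-breaking are ours, and the tie-break does not affect the win rate):

    import random

    def monty_hall_win_rate(switch, trials=100_000):
        """Estimate the win probability with or without switching."""
        wins = 0
        for _ in range(trials):
            car = random.randrange(3)
            pick = random.randrange(3)
            # Host opens a door that is neither the pick nor the car.
            opened = next(d for d in range(3) if d != pick and d != car)
            if switch:
                pick = next(d for d in range(3) if d != pick and d != opened)
            wins += (pick == car)
        return wins / trials

    print(monty_hall_win_rate(switch=False))  # about 1/3
    print(monty_hall_win_rate(switch=True))   # about 2/3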
Change Problem

There are 2n people in a waiting line at a box office. n of them have ₹10 coins while the others have ₹5 coins. A ticket costs ₹5. Everyone in the line needs exactly one ticket. The cashbox is empty initially. What is the probability that no person will wait for change?
Change Problem

[Figure: each queue is drawn as a lattice path from (0, 0) to (2n, 0), one step per person, going up for a ₹10 coin and down for a ₹5 coin; the paths to be excluded touch the line y = 1, and their reflections end at (2n, 2).]
Change Problem

• For each of these lines draw a dummy line: it coincides with the original line until the first hit of the line y = 1, and from then on it mirrors the original line.
• Any dummy line then starts at the origin and ends at the point (2n, 2).
• It consists of n + 1 steps up and n − 1 steps down.
• Thus there are C(2n, n−1) dummy lines.
• Finally, there are C(2n, n) − C(2n, n−1) lines favourable to our event, and the probability is
p = (C(2n, n) − C(2n, n−1)) / C(2n, n) = 1/(n + 1).
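Both a simulation and the exact count agree with 1/(n + 1) (a sketch; math.comb needs Python 3.8+, and the function name is ours):

    import random
    from math import comb

    def no_wait_rate(n, trials=100_000):
        """Estimate P(no one waits for change) with n ₹5 and n ₹10 holders."""
        coins = [5] * n + [10] * n
        good = 0
        for _ in range(trials):
            random.shuffle(coins)
            fives, ok = 0, True
            for c in coins:
                fives += 1 if c == 5 else -1
                if fives < 0:        # a ₹10 arrives and no ₹5 is in the cashbox
                    ok = False
                    break
            good += ok
        return good / trials

    n = 5
    print(no_wait_rate(n))                                  # about 1/6
    print((comb(2*n, n) - comb(2*n, n-1)) / comb(2*n, n))   # exactly 1/(n + 1)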
Urn Problems

There are a blue balls and b red balls in an urn. Two balls are taken out one by one. What’s the probability that the first ball is red and the second ball is blue?

Let A be the event that the first ball is red and let B be the event that the second ball is blue. Then
P(A) = b/(a + b).
Now
P(B | A) = a/(a + b − 1).
Finally,
P(A ∩ B) = P(A) P(B | A) = ab/((a + b)(a + b − 1)).
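A quick check by sampling without replacement (function name, ball counts and trial count are ours):

    import random

    def red_then_blue_rate(a, b, trials=200_000):
        """Estimate P(first ball red, second ball blue) with a blue, b red balls."""
        urn = ["blue"] * a + ["red"] * b
        hits = 0
        for _ in range(trials):
            first, second = random.sample(urn, 2)   # ordered draw without replacement
            hits += (first == "red" and second == "blue")
        return hits / trials

    a, b = 3, 4
    print(red_then_blue_rate(a, b))           # close to
    print(a * b / ((a + b) * (a + b - 1)))    # 12/42 = 0.2857...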
Surprises with Conditional Probability

Suppose a disease affects one person in 10^6, and a test for it is 99% accurate: a diseased person tests positive with probability 0.99, and a healthy person tests positive with probability 0.01. Let A be the event that a person has the disease and B the event that the person tests positive. Then

P(A | B) = P(B | A) P(A) / (P(B | A) P(A) + P(B | A^c) P(A^c)) ≈ 0.000099.
Surprises with Conditional Probability

The test is quite an accurate one, but the person who tested positive has a really low chance of actually having the disease! Of course, one should observe that the chance of having the disease is now approximately 10^−4, which is considerably higher than 10^−6.

A calculation-free understanding of this surprising-looking phenomenon can be achieved as follows. Let everyone in the population undergo the test. If there are 10^9 people in the population, then there are only 10^3 people with the disease. The number of true positives is approximately 10^3 × 0.99 ≈ 10^3, while the number of false positives is (10^9 − 10^3) × 0.01 ≈ 10^7. In other words, among all positives, the false positives are way more numerous than the true positives.
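The Bayes computation behind 0.000099, spelled out with the numbers from the slide:

    # P(disease | positive test) via Bayes' rule.
    prior = 1e-6       # P(disease)
    sens = 0.99        # P(positive | disease)
    fp = 0.01          # P(positive | no disease)

    posterior = sens * prior / (sens * prior + fp * (1 - prior))
    print(posterior)   # about 9.9e-05, i.e. 0.000099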
Surprises with Conditional Probability

The surprise here comes from not taking into account the relative sizes of the sub-populations with and without the disease. Here is another manifestation of exactly the same fallacious reasoning.

A person X is introverted, very systematic in thinking and somewhat absent-minded. You are told that he is a doctor or a mathematician. What would be your guess: doctor or mathematician?

A common answer is “mathematician”. Even accepting the stereotype that a mathematician is more likely to have all these qualities than a doctor, this answer ignores the fact that there are perhaps a hundred times more doctors in the world than mathematicians! In fact, the situation is identical to the one in the example above, and the mistake is in confusing P(A | B) and P(B | A).
Random Variables

Discrete Random Variables

Independence of Random Variables

• Discrete random variables X and Y are independent if
P(X = x, Y = y) = P(X = x) P(Y = y)
for any x, y.
• Independence extends naturally to a family of random variables.
Distribution of Random Variables

Random variables X and Y are said to have the same distribution if
P(X = x) = P(Y = x)
for each x.
Discrete Random Variables

• The expectation of X is defined by EX = ∑_x x P(X = x).
• Since X takes at most countably many values, the above sum makes “sense”.
• The variance of X is defined by Var(X) = E(X − EX)^2.
Expectation

Proof.
E(X + Y) = ∑_z z P(X + Y = z) = ∑_z ∑_y z P(X + Y = z | Y = y) P(Y = y)
         = ∑_z ∑_y z P(X = z − y) P(Y = y)    (using independence of X and Y)
         = ∑_z ∑_y (z − y) P(X = z − y) P(Y = y) + ∑_z ∑_y y P(X = z − y) P(Y = y)
         = EX + EY.
Examples

• EX = 1/2.
• What is Var(X)?
Geometric Distribution

• X is geometric with parameter p if
P(X = n) = (1 − p)^{n−1} p
for n = 1, 2, · · · .
• X denotes the number of coin tosses required until the coin lands on heads.
• For a geometric random variable X with parameter p and n > 0, P(X > n) = (1 − p)^n.
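A sanity check of the tail formula and the standard mean 1/p (a sketch; the sampler and parameters are ours):

    import random

    def geometric_sample(p):
        """Tosses until the first head, with success probability p per toss."""
        n = 1
        while random.random() > p:
            n += 1
        return n

    p, trials = 0.3, 200_000
    samples = [geometric_sample(p) for _ in range(trials)]
    print(sum(samples) / trials)                 # about 1/p = 3.33...
    print(sum(s > 4 for s in samples) / trials)  # about (1 - p)**4 = 0.2401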
Jensen’s Inequality

For a convex function f,
f(E[X]) ≤ E[f(X)].
Conditional Expectation
Coupon Collector’s Problem
Bonferroni’s Inequalities

Continuous Random Variables

• For a random variable X,
(X ≤ x) = {ω ∈ Ω : X(ω) ≤ x} ∈ F
for each x ∈ R.
• Let F(x) = P(X ≤ x). F is called the distribution function of the random variable X.
• If F′(x) exists, then f(x) = F′(x) is called the density of X.
Continuous Random Variables

• The joint distribution function of X and Y is
F(x, y) = P(X ≤ x, Y ≤ y)
for each x, y ∈ R.
• X, Y are independent if
F(x, y) = FX(x) FY(y)
for each x, y ∈ R.
Uniform Distribution

• X ∼ Unif[a, b] if
FX(x) = 0 for x ≤ a,  (x − a)/(b − a) for a ≤ x ≤ b,  1 for x ≥ b.
Exponential Distribution

• X ∼ Exp(λ) if
FX(x) = 1 − e^{−λx} for x ≥ 0, and FX(x) = 0 for x < 0.
Expectation

• For a continuous random variable X with density f,
EX = ∫ x f(x) dx.
Markov’s Inequality

For a nonnegative random variable X and a > 0,
P(X ≥ a) ≤ EX/a.
Chebyshev’s Inequality

For a > 0,
P(|X − EX| ≥ a) ≤ Var(X)/a^2.
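A numerical illustration (the example, a sum of ten dice, is our choice, not from the notes):

    import random

    trials, a = 200_000, 10.0
    samples = [sum(random.randint(1, 6) for _ in range(10)) for _ in range(trials)]

    mean = sum(samples) / trials                          # about 35
    var = sum((x - mean) ** 2 for x in samples) / trials  # about 10 * 35/12 = 29.2
    tail = sum(abs(x - mean) >= a for x in samples) / trials
    print(tail, var / a ** 2)   # observed tail probability vs Chebyshev bound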