
IE621: Probability and Stochastic Processes

K.S. Mallikarjuna Rao

Industrial Engineering & Operations Research


Indian Institute of Technology Bombay

IE 621 Probability and Stochastic Processes


July - November, 2024
What is Probability?

- Classical: the first introduction to probability. It depends on the
  equally likely principle.
- Roll a fair die: there are six outcomes, all of which are equally
  likely. Therefore each outcome has probability 1/6.
- Advantage: it is conceptually simple.
What is Probability?

- Empirical (Frequentist): this perspective defines probability via a
  thought experiment and generalizes the first view.
- We are given a die, suspected of being unfair, but we do not know the
  weights. To deduce the probabilities, we roll the die again and again
  and track the empirical frequency of each outcome. The probability is
  defined to be the limit of this frequency as the number of rolls
  grows.
- Disadvantage: the thought experiment can never actually be carried
  out, since it requires infinitely many rolls.
- Disadvantage: it does not tell how large the number of rolls n has to
  be before we get a good approximation.
What is Probability?

- Subjective: subjective probability is an individual person’s measure
  of belief that an event will occur.
- One drawback of this approach is that one person’s view may be
  drastically different from another person’s view.
- Another drawback is that subjective probability can disobey coherence
  (consistency): the assigned probabilities need not sum to 1.
What is Probability?

- Axiomatic: this is a unifying perspective.
- The coherence conditions needed for subjective probability can be
  proved to hold for the classical and empirical definitions.
- The axiomatic perspective codifies these coherence conditions, so it
  can be used with any of the above three perspectives.
Probability
Probability (Axiomatic Approach, A.N. Kolmogorov)

- Ω: the set of possible outcomes, called the sample space.
- An event is a subset of Ω.
- F: a collection of events satisfying certain conditions:
  - ∅, Ω ∈ F.
  - If A ∈ F, then Aᶜ ∈ F.
  - If A1, A2, · · · ∈ F, then A1 ∪ A2 ∪ · · · ∈ F.
  - A collection of subsets satisfying the above three conditions is
    called a σ-field (σ-algebra).
Probability (Axiomatic Approach, A.N. Kolmogorov)

- Probability is a function P : F → R satisfying
  - P(A) ≥ 0 for every A ∈ F.
  - P(Ω) = 1.
  - If A1, A2, · · · ∈ F are pairwise disjoint, then

    P(∪_{n=1}^∞ An) = Σ_{n=1}^∞ P(An).    (Countable Additivity)

- (Ω, F, P) is called a probability space.
Some Points

- Why a σ-field?
- Why countable additivity?
Common Mistake

- If there are only two possible outcomes, and you don’t know which is
  true, the probability of each of these outcomes is 1/2.
Tossing two coins and Equally Likely

Suppose a cup containing two similar coins is shaken, then turned upside
down on a table. What is the chance that the two coins show heads?
Consider the following solutions to this problem.

Tossing two coins and Equally Likely

- Either they both show heads, or they don’t. These are the two
  possible outcomes. Assuming these are equally likely, the chance of
  both heads is 1/2.
Tossing two coins and Equally Likely

- Regard the number of heads showing on the coins as the outcome.
  There could be 0 heads, 1 head, or 2 heads. Now there are three
  possible outcomes. Assuming these are equally likely, the chance of
  both heads is 1/3.
Tossing two coins and Equally Likely

- Despite the fact that the coins are supposed to be similar, imagine
  that they are labeled in some way to distinguish them. Call one of
  them the first coin and the other the second. Now there are four
  outcomes which might be considered. Assume these four possible
  outcomes are equally likely. Then the event of both coins showing
  heads has a chance of 1/4.
Tossing two coins and Equally Likely

Question: Which of the solutions is correct?

All are correct as far as the formal theory is concerned.
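Which of these models applies to real coins is ultimately an empirical
question. Below is a minimal Monte Carlo sketch (illustrative, not part
of the original notes) that assumes the two coins land independently
and fairly, as in the fourth-outcome model; the observed frequency of
two heads comes out near 1/4.

import random

# Simulate shaking a cup with two physically distinct fair coins
# (assumption: independent fair coins) and count how often both
# show heads.
def both_heads_frequency(trials: int = 100_000) -> float:
    hits = sum(
        1
        for _ in range(trials)
        if random.random() < 0.5 and random.random() < 0.5
    )
    return hits / trials

print(both_heads_frequency())  # ≈ 0.25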
Another Problem

A fair coin is tossed twice. What’s the probability of getting at least
one Tail?
A first attempt: the number of Tails possible is 0, 1, or 2, so the
probability is 2/3. Is this correct?
No: those three outcomes are not equally likely. The sample space is
Ω = {HH, HT, TH, TT}, and these four outcomes are equally likely.
Therefore the probability of the desired event is 3/4.
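A tiny exact check (illustrative, not part of the original notes) by
enumerating the sample space:

from itertools import product

# Enumerate all outcomes of two fair coin tosses and count those
# containing at least one Tail.
outcomes = list(product("HT", repeat=2))      # HH, HT, TH, TT
favourable = [o for o in outcomes if "T" in o]
print(len(favourable) / len(outcomes))        # 0.75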
Consequences

- P(∅) = 0.
- For every A ∈ F, 0 ≤ P(A) ≤ 1.
- Monotonicity: if A ⊆ B, then P(A) ≤ P(B).
  - B = A ∪ (B \ A), a disjoint union, and hence

    P(B) = P(A) + P(B \ A) ≥ P(A).

- P(Aᶜ) = 1 − P(A).
  - 1 = P(Ω) = P(A ∪ Aᶜ) = P(A) + P(Aᶜ).
Inclusion-Exclusion Principle

Theorem
For A, B ∈ F,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Note that A ∪ B = A ∪ (B \ A), a disjoint union, and hence

P(A ∪ B) = P(A) + P(B \ A).

Also B = (B \ A) ∪ (A ∩ B), again a disjoint union, and hence

P(B) = P(B \ A) + P(A ∩ B).

Combining the two equalities, we get the result.
Inclusion-Exclusion Principle

The above can be generalized to multiple sets.


Theorem
For any events A1, A2, · · · , An ∈ F, we have

P(A1 ∪ A2 ∪ · · · ∪ An) = Σ_i P(Ai) − Σ_{i<j} P(Ai ∩ Aj)
    + Σ_{i<j<k} P(Ai ∩ Aj ∩ Ak) − · · · + (−1)^(n+1) P(A1 ∩ A2 ∩ · · · ∩ An).

We can use induction to prove this.
Inclusion-Exclusion Principle

Assume that the result is true for n; we will prove it for n + 1.

P(A1 ∪ A2 ∪ · · · ∪ An ∪ An+1)
  = P(A1 ∪ · · · ∪ An) + P(An+1) − P((A1 ∪ · · · ∪ An) ∩ An+1)
  = P(A1 ∪ · · · ∪ An) + P(An+1) − P((A1 ∩ An+1) ∪ (A2 ∩ An+1) ∪ · · · ∪ (An ∩ An+1)).

Now expand both unions of n sets using the induction hypothesis and
rearrange to complete the proof.
Inclusion-Exclusion Principle

Expanding the first term by the induction hypothesis,

P(A1 ∪ · · · ∪ An) = Σ_{i=1}^n P(Ai) − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2)
    + Σ_{1≤i1<i2<i3≤n} P(Ai1 ∩ Ai2 ∩ Ai3) − · · ·
    + (−1)^(n+1) P(A1 ∩ · · · ∩ An),

and expanding the last term likewise,

P((A1 ∩ An+1) ∪ · · · ∪ (An ∩ An+1)) = Σ_{i=1}^n P(Ai ∩ An+1)
    − Σ_{1≤i1<i2≤n} P(Ai1 ∩ Ai2 ∩ An+1) + · · ·
    + (−1)^(n+1) P(A1 ∩ · · · ∩ An ∩ An+1).

Collecting terms by the size of the intersections completes the proof.
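The formula can also be verified mechanically. A small brute-force
sketch (illustrative, not part of the original notes; the sample space
and events are arbitrary choices) checks inclusion-exclusion exactly on
a finite uniform space:

from itertools import combinations
from random import sample, seed

seed(0)
omega = range(20)
events = [set(sample(omega, 8)) for _ in range(4)]   # four arbitrary events
p = lambda s: len(s) / len(omega)                    # uniform probability

lhs = p(set.union(*events))
rhs = 0.0
for k in range(1, len(events) + 1):
    for combo in combinations(events, k):
        rhs += (-1) ** (k + 1) * p(set.intersection(*combo))
print(abs(lhs - rhs) < 1e-12)                        # True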
Boole’s Inequality (Union Bound)

Theorem
Let A1, A2, · · · , An ∈ F and let A = A1 ∪ A2 ∪ · · · ∪ An. Then

P(A) ≤ Σ_{k=1}^n P(Ak).

Define B1 = A1 and Bk+1 = Ak+1 \ (A1 ∪ · · · ∪ Ak). Then A = ∪_{k=1}^n Bk,
the Bk are pairwise disjoint, and Bk ⊆ Ak, so

P(A) = Σ_{k=1}^n P(Bk) ≤ Σ_{k=1}^n P(Ak).

This proves Boole’s inequality.

A more general inequality (the Bonferroni inequality) will be proved
later.
Probability of Derangements

A derangement is a permutation σ of {1, 2, · · · , n} such that σ(i) ≠ i
for each i ∈ {1, 2, · · · , n}. The probability that a random permutation
is a derangement is Σ_{k=0}^n (−1)^k / k!, which tends to 1/e as n → ∞.

Let Ai = {σ : σ(i) = i}. Then |Ai| = (n − 1)!. Moreover, for any subset
I ⊆ {1, 2, · · · , n}, ∩_{i∈I} Ai has exactly (n − |I|)! elements. Using
the inclusion-exclusion principle, we see that a random permutation has
a fixed point (i.e., lies in ∪_i Ai) with probability
Σ_{k=1}^n (−1)^(k+1) / k!. Thus the probability of a derangement is
Σ_{k=0}^n (−1)^k / k!, which converges to 1/e as n → ∞.
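A brute-force check (illustrative, not part of the original notes) of
the derangement probability for small n:

from itertools import permutations
from math import e, factorial

# Count permutations of {0, ..., n-1} without fixed points.
def derangement_probability(n: int) -> float:
    count = sum(
        1
        for p in permutations(range(n))
        if all(p[i] != i for i in range(n))
    )
    return count / factorial(n)

for n in (4, 6, 8):
    print(n, derangement_probability(n), 1 / e)  # already close at n = 6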
Continuity Property

Theorem
Let A1 ⊆ A2 ⊆ · · · ∈ F and A = ∪_{n=1}^∞ An. Then

P(A) = lim_{n→∞} P(An).

This follows from

P(A) = P(A1 ∪ (A2 \ A1) ∪ (A3 \ A2) ∪ · · · )
     = P(A1) + P(A2 \ A1) + P(A3 \ A2) + · · ·
     = P(A1) + (P(A2) − P(A1)) + (P(A3) − P(A2)) + · · · = lim_{n→∞} P(An),

where the last equality holds because the partial sums telescope to
P(An).
Continuity Property

P : F → R is said to be a finitely additive probability measure if it
satisfies

- P(A) ≥ 0 for every A ∈ F.
- P(Ω) = 1.
- If A1, A2, · · · , An ∈ F are pairwise disjoint, then

  P(A1 ∪ A2 ∪ · · · ∪ An) = P(A1) + P(A2) + · · · + P(An).   (Finite Additivity)
Continuity Property

Theorem
Let P be a finitely additive probability measure. Then P is countably
additive if and only if the continuity property holds for P.

We have already proved that if P is countably additive, then continuity
holds. For the converse, let A1, A2, · · · be disjoint sets in F and
consider Bn = A1 ∪ A2 ∪ · · · ∪ An for n ≥ 1. Then B1 ⊆ B2 ⊆ · · · and
∪An = ∪Bn. Therefore

P(∪_{n=1}^∞ An) = P(∪_{n=1}^∞ Bn) = lim_{n→∞} P(Bn)
               = lim_{n→∞} Σ_{m=1}^n P(Am) = Σ_{n=1}^∞ P(An).
Conditional Probability

Definition
The conditional probability of A given that the event B has occurred is
given by

P(A | B) = P(A ∩ B) / P(B),   provided P(B) > 0.
Theorem
If A1 , A2 , · · · , An ∈ F and if P(A1 ∩ A2 ∩ · · · ∩ An ) > 0, then

P(A1 ∩ A2 ∩ · · · ∩ An ) = P(A1 )P(A2 | A1 ) · · · P(An | A1 ∩ A2 ∩ · · · ∩ An−1 ).

Note that

P(A1 ∩ · · · ∩ An−1) ≥ P(A1 ∩ A2 ∩ · · · ∩ An) > 0,

so every conditional probability above is well defined, and

P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1 ∩ · · · ∩ An−1) P(An | A1 ∩ · · · ∩ An−1).

Iterating this identity on the first factor completes the proof.
Theorem (Law of Total Probability)
Let An, n ≥ 1 be a partition (finite or countable) of Ω with P(An) > 0
for each n. Then, for A ∈ F,

P(A) = Σ_n P(A | An) P(An).

Note that

P(A) = P(∪_n (A ∩ An)) = Σ_n P(A ∩ An) = Σ_n P(A | An) P(An).
Bayes’ Theorem
Let An, n ≥ 1 be a partition (finite or countable) of Ω and suppose
P(A) > 0. Then

P(An | A) = P(A | An) P(An) / Σ_m P(A | Am) P(Am).

Using the multiplication rule, we have

P(B | A) P(A) = P(A ∩ B) = P(A | B) P(B).

Therefore

P(B | A) = P(A | B) P(B) / P(A).

Take B = An and use the law of total probability in the denominator.
Independence

Definition
Two events A and B are said to be independent if

P(A ∩ B) = P(A)P(B).

Consequences

Theorem
If A and B are independent, then so are A and Bᶜ, Aᶜ and B, and Aᶜ and Bᶜ.
Consequences

Theorem
A and B are independent if and only if P(A | B) = P(A) (provided
P(B) > 0).
Consequences

Theorem
A ↦ P(A | B) defines a new probability measure on F, called the
conditional probability measure given B (provided P(B) > 0).
Some Examples
Boy or Girl paradox / Two Child Problem

Example
A family has two children, Jeet and Kiran. Kiran is a girl. What is the
probability that both children are girls?

Compare this with the following:


Example
A family has two children, Jeet and Kiran. At least one of them is a girl.
What is the probability that both children are girls?

Scandal of Arithmetic (The Paradox of the Chevalier de Méré)

Example
Which is more likely: obtaining at least one six in 4 tosses of a fair
die (event A), or obtaining at least one double six in 24 tosses of a
pair of dice (event B)?

- P(A) = (6⁴ − 5⁴)/6⁴ ≈ 0.518
- P(B) = (36²⁴ − 35²⁴)/36²⁴ ≈ 0.491
- Why is this a paradox?
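These two values can be computed directly; a quick sketch
(illustrative, not part of the original notes):

# Complement rule: P(no six in 4 rolls) = (5/6)^4, and
# P(no double six in 24 rolls of a pair) = (35/36)^24.
p_a = 1 - (5 / 6) ** 4
p_b = 1 - (35 / 36) ** 24
print(round(p_a, 3), round(p_b, 3))  # 0.518 0.491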
Problem of Points / Division of Stakes

- Two teams compete in a game of skill.
- They play until one wins six rounds.
- The winner will take the entire prize.
- How should the prize be divided if the game is interrupted with the
  score 5 to 2?
- Luca Pacioli considered this problem in his 1494 textbook Summa de
  arithmetica, geometria, proportioni et proportionalità.
- Pacioli’s solution was to divide the stakes in the ratio 5:2.
- In the mid 16th century, Niccolò Tartaglia noticed that this rule
  gives counterintuitive results (e.g., when the game is stopped after
  the first round).
- Girolamo Cardano proposed another solution, which was also not
  adequate.
Problem of Points / Division of Stakes

- The problem was solved by Pierre de Fermat and Blaise Pascal in a
  series of letters.
- They never met in person.
- This correspondence gave birth to probability theory.
- The problem was brought to the attention of Pascal and Fermat around
  1654 by the Chevalier de Méré, a famous gambler and nobleman in
  Paris.
Problem of Points / Division of Stakes

- Two players play a game of chance.
- Each player puts up an equal stake.
- The first player to win a certain number of rounds collects the
  entire stake.
- Suppose the game is interrupted before either player has won.
- How should the players divide the stake?
Birthday Paradox

- What is the probability that, in a group of n people, two of them
  share the same birthday?
- Model: consider N balls numbered 1 to N in an urn (think of N = 365
  possible birthdays). We want the probability that the numbers of n
  balls picked with replacement are all different.
- After two trials, the probability that the two numbers are different
  is 1 − 1/N.
Birthday Paradox

- After three trials, given that the first two outcomes are different,
  the conditional probability is 2/N that the third outcome equals one
  of the first two. Thus the conditional probability is 1 − 2/N that
  the third differs from the first two, and

  P(first three trials different) = (1 − 1/N)(1 − 2/N).

- Iterating,

  P(first n trials different) = (1 − 1/N)(1 − 2/N) · · · (1 − (n−1)/N).
Birthday Paradox

- Thus

  P(first n trials different) = (1 − 1/N)(1 − 2/N) · · · (1 − (n−1)/N).

- Taking logarithms,

  log P = log(1 − 1/N) + log(1 − 2/N) + · · · + log(1 − (n−1)/N).

- For 0 < x < 1,

  log(1 − x) ≤ −x.
Birthday Paradox

- Hence

  log(1 − 1/N) + · · · + log(1 − (n−1)/N) ≤ −(1/N + 2/N + · · · + (n−1)/N)
                                          = −n(n − 1)/(2N).

- Thus

  log P(n trials are different) ≤ −n(n − 1)/(2N).

- This gives

  P(n trials are different) ≤ exp(−n(n − 1)/(2N)).
Birthday Paradox

- With 23 people, the probability that two people share a birthday is
  more than 1/2, and with 42 people it is more than 0.9.
- More generally, once n exceeds √(2N log 2) ≈ 1.18 √N, the bound shows
  that the probability of a repetition is more than 1/2.
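A short computation sketch (illustrative, not part of the original
notes) comparing the exact birthday probability with the bound derived
above, for N = 365:

from math import exp, prod

# Exact probability that n birthdays are all different.
def p_all_different(n: int, N: int = 365) -> float:
    return prod(1 - k / N for k in range(1, n))

for n in (23, 42):
    exact = 1 - p_all_different(n)              # P(some shared birthday)
    lower = 1 - exp(-n * (n - 1) / (2 * 365))   # lower bound from the slide
    print(n, round(exact, 3), round(lower, 3))  # 23 -> ≈ 0.507, 42 -> ≈ 0.914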
Monty Hall Problem

Suppose you’re on a game show, and you’re given the choice of three
doors: behind one door is a car; behind the others, goats. You pick a
door, say No. 1, and the host, who knows what’s behind the doors, opens
another door, say No. 3, which has a goat.
He then says to you, “Do you want to pick door No. 2?”
Is it to your advantage to switch your choice?
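A simulation sketch (illustrative, not part of the original notes; it
assumes the host always opens a goat door different from your pick)
comparing the two strategies:

import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a goat door that is not the contestant's pick.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
print(sum(play(False) for _ in range(trials)) / trials)  # ≈ 1/3 (stay)
print(sum(play(True) for _ in range(trials)) / trials)   # ≈ 2/3 (switch)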
Change Problem

There are 2n people in a waiting line at a box office. n of them have
₹10 coins while the others have ₹5 coins. A ticket costs ₹5. Everyone
in the line needs exactly one ticket. The cashbox is empty initially.
What is the probability that no person will wait for change?
Change Problem

- Assume the ticket buyers are located at points 1, 2, · · · , 2n on
  the horizontal axis.
- The empty cashbox is at the origin.
- If a person has ₹10, the cash line goes one step up; otherwise it
  goes one step down.
- At both ends, the line has zero y-coordinate.
- Favourable lines are those which never go above the x-axis.
Change Problem

[Figure: lattice paths from the origin to (2n, 0); the reflected dummy
path ends at (2n, 2).]

- Let us count the number of lines crossing or touching the line y = 1.
- These are exactly the lines favourable to the opposite event, in
  which someone waits for change.
Change Problem

- For each of these lines, draw a dummy line: it coincides with the
  original line until the first hit of the line y = 1, and from then on
  it mirrors the original line.
- Any dummy line then starts at the origin and ends at the point
  (2n, 2).
- It consists of n + 1 steps up and n − 1 steps down.
- Thus there are C(2n, n−1) dummy lines (writing C(m, k) for the
  binomial coefficient).
- Finally, there are C(2n, n) − C(2n, n−1) lines favourable to our
  event, and the probability is

  p = [C(2n, n) − C(2n, n−1)] / C(2n, n) = 1/(n + 1).
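The answer 1/(n + 1) can be confirmed by brute force for small n; a
sketch (illustrative, not part of the original notes):

from itertools import permutations

# Enumerate all orders of n up-steps (₹10 payers) and n down-steps
# (₹5 payers) and count those where the cash line never goes above 0,
# i.e., every ₹10 payer finds a ₹5 coin in the box.
def change_probability(n: int) -> float:
    arrangements = set(permutations([+1] * n + [-1] * n))
    good = 0
    for arr in arrangements:
        level, ok = 0, True
        for step in arr:
            level += step
            if level > 0:
                ok = False
                break
        good += ok
    return good / len(arrangements)

for n in (2, 3, 4):
    print(n, change_probability(n), 1 / (n + 1))  # the two columns agree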
Urn Problems

There are a blue balls and b red balls in an urn. Two balls are taken
one by one. What’s the probability that the first ball is red and the
second ball is blue?
Let A be the event that the first ball is red and let B be the event
that the second ball is blue. Then

P(A) = b/(a + b).

Now

P(B | A) = a/(a + b − 1).
Urn Problems

Finally,

P(A ∩ B) = P(A) P(B | A) = ab / ((a + b)(a + b − 1)).
Urn Problems

Another way to solve this:

Let the balls be numbered 1 through a + b, with balls 1 through a blue
and the others red. Then

Ω = {(1, 2), (1, 3), · · · , (a + b, a + b − 1)},

where (x, y) denotes the outcome in which x is the number on the first
ball and y is the number on the second ball. Therefore

A ∩ B = {(a + 1, 1), (a + 1, 2), · · · , (a + b, a)}.

Note that |Ω| = (a + b)(a + b − 1) and |A ∩ B| = ab, solving the
problem.
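The same count can be checked by enumeration; a sketch (illustrative,
not part of the original notes) with arbitrary a = 3, b = 5:

from itertools import permutations

a, b = 3, 5
# Balls 1..a are blue, balls a+1..a+b are red; ordered pairs of
# distinct balls model the two draws.
pairs = list(permutations(range(1, a + b + 1), 2))
hits = [(x, y) for x, y in pairs if x > a and y <= a]  # red then blue
print(len(hits) / len(pairs))
print(a * b / ((a + b) * (a + b - 1)))                 # same value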
Surprises with Conditional Probability

Consider a rare disease X that affects one in a million people. A
medical test is used to test for the presence of the disease. The test
is 99% accurate in the sense that if a person does not have the
disease, the chance that the test shows positive is 1%, and if the
person has the disease, the chance that the test shows negative is
also 1%.
Suppose a person is tested for the disease and the test result is
positive. What is the chance that the person has the disease X?
Surprises with Conditional Probability

Let A be the event that the person has the disease X, and let B be the
event that the test shows positive.
Then P(A) = 10⁻⁶, P(B | A) = 0.99 and P(B | Aᶜ) = 0.01.
We need to find P(A | B).
Bayes’ rule implies

P(A | B) = P(B | A) P(A) / (P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ)) ≈ 0.000099.
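The computation is a one-liner to reproduce; a sketch (illustrative,
not part of the original notes):

# Posterior probability of disease given a positive test, by Bayes' rule.
p_disease = 1e-6
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.01

posterior = (p_pos_given_disease * p_disease) / (
    p_pos_given_disease * p_disease
    + p_pos_given_healthy * (1 - p_disease)
)
print(posterior)  # ≈ 9.9e-05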
Surprises with Conditional Probability

The test is quite an accurate one, but a person who tests positive has
a really low chance of actually having the disease! Of course, one
should observe that the chance of having the disease is now
approximately 10⁻⁴, which is considerably higher than 10⁻⁶.
A calculation-free understanding of this surprising-looking phenomenon
can be achieved as follows. Let everyone in the population undergo the
test. If there are 10⁹ people in the population, then only about 10³
of them have the disease. The number of true positives is approximately
10³ × 0.99 ≈ 10³, while the number of false positives is
(10⁹ − 10³) × 0.01 ≈ 10⁷. In other words, among all positives, the
false positives are far more numerous than the true positives.
Surprises with Conditional Probability

The surprise here comes from not taking into account the relative sizes
of the sub-populations with and without the disease. Here is another
manifestation of exactly the same fallacious reasoning.
A person X is introverted, very systematic in thinking and somewhat
absent-minded. You are told that he is a doctor or a mathematician.
What would be your guess: doctor or mathematician?
A common answer is “mathematician”. Even accepting the stereotype that
a mathematician is more likely to have all these qualities than a
doctor, this answer ignores the fact that there are perhaps a hundred
times more doctors in the world than mathematicians! In fact, the
situation is identical to the one in the example above, and the mistake
lies in confusing P(A | B) and P(B | A).
Random Variables
Discrete Random Variables

- Let (Ω, F, P) be a probability space.
- We assume that Ω is discrete (a finite or countably infinite set).
- Since Ω is discrete, F can be the entire power set of Ω, i.e., 2^Ω.
- X : Ω → R is a discrete random variable provided
  {ω ∈ Ω : X(ω) = x} ∈ F for each x ∈ R.
- Since Ω is a discrete set, X takes only countably many values.
- The smallest σ-field FX ⊆ F containing the events
  {ω ∈ Ω : X(ω) = x} for each x ∈ R is called the σ-field generated by
  X (or the information space given by X).
Independence of Random Variables

- Two random variables X and Y are said to be independent if the
  information spaces corresponding to them are independent;
  equivalently, if

  P(X = x, Y = y) = P(X = x) P(Y = y)

  for all x, y.
- Independence extends naturally to a family of random variables.
Distribution of Random Variables

- A random variable X is characterised by its distribution, i.e., the
  probabilities P(X = x) for the different values of x.
- Two random variables X and Y are identically distributed if their
  distributions are the same; in other words,

  P(X = x) = P(Y = x)

  for each x.
Discrete Random Variables

- The expectation of a discrete random variable X is defined by

  EX = Σ_x x P(X = x).

- Since X takes at most countably many values, the above sum makes
  “sense” (provided it converges absolutely).
- The variance of X is defined by Var(X) = E(X − EX)².
Expectation

- Let X, Y be two random variables. Then

  E(X + Y) = EX + EY.

Proof.
Summing over the joint distribution (no independence is needed),

E(X + Y) = Σ_{x,y} (x + y) P(X = x, Y = y)
         = Σ_x x Σ_y P(X = x, Y = y) + Σ_y y Σ_x P(X = x, Y = y)
         = Σ_x x P(X = x) + Σ_y y P(Y = y) = EX + EY.
Expectation

- Let X be a random variable and c a constant. Then E(cX) = c EX.
- Let X be a discrete random variable that takes on only nonnegative
  integer values. Then

  E(X) = Σ_{n=0}^∞ P(X > n).
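A numerical sanity check (illustrative, not part of the original notes)
of the tail-sum formula on a geometric random variable, whose mean is
1/p:

import random

p = 0.3
# Sample geometric variables: index of the first success.
samples = [
    next(k for k in range(1, 10_000) if random.random() < p)
    for _ in range(20_000)
]

mean = sum(samples) / len(samples)
tail_sum = sum(
    sum(1 for s in samples if s > n) / len(samples) for n in range(200)
)
print(mean, tail_sum, 1 / p)  # all ≈ 3.33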
Examples

- Bernoulli random variable: X ∼ Ber(p) if

  X = 1 with probability p, and X = 0 with probability 1 − p.

- EX = p.
- What is Var(X)?
Examples

- Binomial random variable: X ∼ Bin(n, p) if

  P(X = k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, · · · , n.

- Observe that X = X1 + X2 + · · · + Xn, where X1, X2, · · · , Xn are
  i.i.d. (independent and identically distributed) Ber(p) random
  variables.
- Find the expectation and variance.
Geometric Distribution

- X has the geometric distribution with parameter p if

  P(X = n) = (1 − p)^(n−1) p,  n = 1, 2, · · · .

- X models the number of tosses required until a coin with heads
  probability p first lands on heads.
- Memorylessness: for a geometric random variable X with parameter p,
  n ≥ 1 and k ≥ 0,

  P(X = n + k | X > k) = P(X = n).
Jensen’s Inequality

- For any convex function f,

  f(E[X]) ≤ E[f(X)].
Conditional Expectation

- Let X, Y be two random variables. Then

  E(X | Y = y) = Σ_x x P(X = x | Y = y).

- For any two random variables X and Y,

  E(X) = Σ_y P(Y = y) E(X | Y = y).

- E(X | Y) is the random variable f(Y) that takes the value
  E(X | Y = y) when Y = y.
- E(X) = E[E(X | Y)].
Coupon Collector’s Problem

- There are n coupon types. In each draw, a coupon type is obtained
  uniformly at random. How many draws do I need to collect all n coupon
  types (in expectation)?
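A standard way to answer (sketched here, not spelled out on the slide):
the wait for the i-th new coupon type is geometric with success
probability (n − i + 1)/n, so by linearity of expectation the answer is
n(1 + 1/2 + · · · + 1/n). An illustrative comparison with simulation:

import random

def expected_draws(n: int) -> float:
    # n * H_n from the linearity-of-expectation argument above.
    return n * sum(1 / k for k in range(1, n + 1))

def simulate(n: int, trials: int = 10_000) -> float:
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            draws += 1
        total += draws
    return total / trials

print(expected_draws(10), simulate(10))  # both ≈ 29.3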
Bonferroni’s Inequalities

Let A1, A2, · · · , An ∈ F and let A = A1 ∪ A2 ∪ · · · ∪ An. Let

S1 = Σ_{k=1}^n P(Ak),  S2 = Σ_{1≤i<j≤n} P(Ai ∩ Aj),  · · · ,
Sk = Σ_{1≤i1<i2<···<ik≤n} P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik).

Then S1 − S2 ≤ P(A) ≤ S1. More generally, for m = 1, 2, · · · , n, we
have

P(A) ≤ Σ_{k=1}^m (−1)^(k−1) Sk  for odd m,
P(A) ≥ Σ_{k=1}^m (−1)^(k−1) Sk  for even m.

The inequality P(A) ≤ S1 is known as Boole’s inequality (also the union
bound).
Continuous Random Variables

- Let (Ω, F, P) be a probability space.
- X : Ω → R is said to be a random variable if

  (X ≤ x) = {ω ∈ Ω : X(ω) ≤ x} ∈ F

  for each x ∈ R.
- Let F(x) = P(X ≤ x). F is called the distribution function of the
  random variable X.
- If F′(x) exists, then f(x) = F′(x) is called the density of X.
Continuous Random Variables

- Let X, Y be two random variables. Their joint distribution is given
  by

  F(x, y) = P(X ≤ x, Y ≤ y)

  for each x, y ∈ R.
- X, Y are independent if

  F(x, y) = FX(x) FY(y)

  for each x, y ∈ R.
Uniform Distribution

- X ∼ Unif[a, b] if

  FX(x) = 0 for x ≤ a,  (x − a)/(b − a) for a ≤ x ≤ b,  1 for x ≥ b,

  and its density function is given by

  fX(x) = 1/(b − a) for a ≤ x ≤ b,  and 0 otherwise.
Exponential Distribution

- The exponential distribution with parameter θ (denoted X ∼ exp(θ))
  is given by

  FX(x) = 1 − e^(−θx) for x ≥ 0,  and 0 otherwise.

- Memorylessness: let X ∼ exp(θ). Then

  P(X > s + t | X > t) = P(X > s).
Expectation

- For a continuous random variable X, its expectation is defined by

  EX = ∫ x dF(x) = ∫ x f(x) dx,

  where F is its distribution function and f is its density.
- The first integral is to be understood as a Riemann-Stieltjes
  integral; the second is a Riemann integral. A formal definition
  requires Lebesgue integration.
- The variance of X is defined as

  Var(X) = E(X − E(X))².
Markov’s Inequality

- Let X be a random variable that assumes only non-negative values.
  Then for all a > 0,

  P(X ≥ a) ≤ EX/a.

- Consider the indicator

  I = 1 if X ≥ a,  and I = 0 otherwise.

- Then I ≤ X/a. Therefore EI ≤ EX/a. But EI = P(X ≥ a), proving
  Markov’s inequality.
Chebyshev’s Inequality

- Let X be a random variable and a > 0. Then

  P(|X − EX| ≥ a) ≤ Var(X)/a².

- Proof: apply Markov’s inequality to the non-negative random variable
  (X − EX)² with threshold a².
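A numerical illustration (not part of the original notes; the
distribution and thresholds are arbitrary choices) showing both
inequalities in action:

import random

# Non-negative samples: sums of 10 uniform [0, 1] draws (mean ≈ 5).
samples = [sum(random.random() for _ in range(10)) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

a = 7.0
markov_lhs = sum(1 for s in samples if s >= a) / len(samples)
cheb_lhs = sum(1 for s in samples if abs(s - mean) >= a - mean) / len(samples)
print(markov_lhs, "<=", mean / a)             # Markov: P(X ≥ a) ≤ EX/a
print(cheb_lhs, "<=", var / (a - mean) ** 2)  # Chebyshev with a' = a − EX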
