Lecture Set 5
DNSC 6311
Stochastic Foundations: Probability
Korel Gundem
Lecture Outline
7. Uniform model.
Models Related to Bernoulli Trials
We can define other RVs from the Bernoulli trials.
For example, what is the distribution of the number of down days
before a particular stock goes up?
[Figure: relative frequency of the number of down days before an up day. x-axis: Days (0 to 8); y-axis: Relative Frequency (0.0 to 0.5).]
Distribution of Number of Failures Before the First Success
▶ The random variable X, the number of failures (down
days) before the first success (up day), can take the values
x = 0, 1, 2, . . .
▶ The random variable X has a geometric distribution with
parameter p,
$P(X = x) = p\,(1 - p)^x, \quad x = 0, 1, 2, \ldots$
Note that
$\sum_{x=0}^{\infty} (1 - p)^x = 1/p$
(geometric series), so the probabilities sum to one.
Back to the GE Data
How well does the geometric distribution fit the GE data on the number of
down days before the first up day?
We can evaluate the geometric probabilities using the R function
dgeom(x, p); for example, dgeom(0, 0.4854) = 0.4854.
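As a rough cross-check of the R call above, the geometric probabilities can be sketched in plain Python. The function name dgeom here is only a stand-in chosen to mirror the R function, and the value p = 0.4854 is the one quoted on the slide.

```python
def dgeom(x: int, p: float) -> float:
    """P(X = x): x failures before the first success, success probability p."""
    if x < 0:
        return 0.0
    return p * (1.0 - p) ** x


p = 0.4854  # up-day probability quoted on the slide

# Probabilities for 0..8 down days, the range shown on the slide's plot axis.
for x in range(9):
    print(x, round(dgeom(x, p), 4))
```

The first printed probability is p itself, which matches dgeom(0, 0.4854) = 0.4854.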
[Figure: Relative Frequencies of No. of Down Days Before GE is Up. x-axis: Days (0 to 8); y-axis: Relative Frequency (0.0 to 0.5).]
Properties of the Geometric Distribution
$E[X] = \frac{1-p}{p}$,
which is an intuitive result.
$V[X] = \frac{1-p}{p^2}$.
R Computations with Geometric Model
Let X denote the number of down days in the market before GE stock goes up.
▶ What is the probability that GE stock will be down more than 4 days before it
goes up?
P(X ≥ 5) = 1 − pgeom(4, 0.4854) = 0.0361.
▶ What is the probability that GE stock will be down at most 2 days before it
goes up?
P(X ≤ 2) = pgeom(2, 0.4854) = 0.8637.
▶ What is the expected number of down days before the first up day?
$E[X] = \frac{1-p}{p} = \frac{1 - 0.4854}{0.4854} = 1.06$
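The three answers above can be reproduced with a short Python sketch. The helper pgeom is a hypothetical stand-in for the R function of the same name, using the closed form P(X ≤ x) = 1 − (1 − p)^(x+1) that follows from the geometric series.

```python
def pgeom(x: int, p: float) -> float:
    """P(X <= x) for the number of failures before the first success."""
    if x < 0:
        return 0.0
    return 1.0 - (1.0 - p) ** (x + 1)


p = 0.4854  # up-day probability quoted on the slide

p_more_than_4 = 1.0 - pgeom(4, p)   # P(X >= 5)
p_at_most_2 = pgeom(2, p)           # P(X <= 2)
mean_down_days = (1.0 - p) / p      # E[X]

print(round(p_more_than_4, 4))    # 0.0361
print(round(p_at_most_2, 4))      # 0.8637
print(round(mean_down_days, 2))   # 1.06
```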
Negative Binomial Distribution
▶ What about the RV that counts the number of down days before the rth up day?
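The slide's formula did not survive, so the sketch below assumes the standard textbook form P(X = x) = C(x + r − 1, x) p^r (1 − p)^x for the number of failures before the rth success. The name dnbinom is a stand-in that mirrors R's dnbinom(x, size, prob), and r = 3 is only an illustrative choice.

```python
from math import comb


def dnbinom(x: int, r: int, p: float) -> float:
    """P(X = x): x failures before the r-th success (assumed standard form)."""
    if x < 0:
        return 0.0
    return comb(x + r - 1, x) * p ** r * (1.0 - p) ** x


p = 0.4854  # up-day probability from the GE example
r = 3       # illustrative choice, not from the slides

for x in range(6):
    print(x, round(dnbinom(x, r, p), 4))
```

With r = 1 this reduces to the geometric probabilities p(1 − p)^x.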
Properties of the Negative Binomial Distribution
Sampling Without Replacement
▶ Hypergeometric Experiment (DS Section 5.3)
- A random sample of size n is selected from N items without replacement.
- K of the N items can be classified as successes and (N − K ) are classified as
failures.
▶ The distribution of X = number of successes in the (hypergeometric)
experiment is given by
$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, n.$
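The hypergeometric formula above translates almost directly into Python. This is a minimal sketch; the function name and the numbers N = 20, K = 5, n = 4 are hypothetical, chosen only for illustration.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items, of which K are successes."""
    if x < 0 or x > n:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible values of x get probability zero.
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


N, K, n = 20, 5, 4  # hypothetical population and sample sizes
probs = [hypergeom_pmf(x, N, K, n) for x in range(n + 1)]
for x, prob in enumerate(probs):
    print(x, round(prob, 4))
```

The probabilities sum to one, and the mean works out to nK/N.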
Multivariate Extension of Binomial: Trinomial Distribution
▶ Consider an extension of the binomial experiment where each
of the n independent trials has three possible outcomes with
probabilities $p_1, p_2, p_3$ such that $\sum_{i=1}^{3} p_i = 1$.
▶ Note that the probabilities are constant across the trials and
we are interested in $X_i$, the number of outcomes of type i,
i = 1, 2, 3, such that $\sum_{i=1}^{3} X_i = n$.
▶ Then the joint distribution of $(X_1, X_2, X_3)$ is a trinomial
distribution
$p(x_1, x_2, x_3) = \frac{n!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3},$
where $\sum_{i=1}^{3} x_i = n$ and $\sum_{i=1}^{3} p_i = 1$.
Since $\sum_{i=1}^{3} x_i = n$, this is a bivariate distribution.
▶ Thus, we obtain
$P(X_1 = x_1 \mid X_2 = x_2) = \binom{n - x_2}{x_1} \left(\frac{p_1}{1 - p_2}\right)^{x_1} \left(1 - \frac{p_1}{1 - p_2}\right)^{n - x_1 - x_2}.$
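One way to check the conditional result is to divide the trinomial joint probability by the binomial marginal of X2 and compare with the closed form. The sketch below does this numerically; the values n = 6 and (p1, p2, p3) = (0.2, 0.3, 0.5) are hypothetical, and the marginal X2 ~ Binomial(n, p2) is assumed as the standard result.

```python
from math import comb, factorial


def dtrinom(x1: int, x2: int, x3: int, p1: float, p2: float, p3: float) -> float:
    """Trinomial joint probability with n = x1 + x2 + x3 trials."""
    n = x1 + x2 + x3
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1 ** x1 * p2 ** x2 * p3 ** x3


def cond_x1_given_x2(x1: int, x2: int, n: int, p1: float, p2: float) -> float:
    """Closed form: X1 | X2 = x2 is binomial(n - x2, p1 / (1 - p2))."""
    q = p1 / (1.0 - p2)
    return comb(n - x2, x1) * q ** x1 * (1.0 - q) ** (n - x2 - x1)


n, p1, p2, p3 = 6, 0.2, 0.3, 0.5  # hypothetical values
x2 = 2
marginal_x2 = comb(n, x2) * p2 ** x2 * (1.0 - p2) ** (n - x2)

for x1 in range(n - x2 + 1):
    direct = dtrinom(x1, x2, n - x1 - x2, p1, p2, p3) / marginal_x2
    formula = cond_x1_given_x2(x1, x2, n, p1, p2)
    print(x1, round(direct, 6), round(formula, 6))
```

The two printed columns agree for every x1.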
Poisson Model
▶ Related to the Poisson process (Poisson experiment).
$\Pr(X = x) = P(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, \ldots$
Fitting A Poisson Model to Number of Shopping Trips
In fitting a Poisson model it is important to check that the mean and the
variance are close to each other, that is, that their ratio is close to one.
We can check this in R via the var(x) and mean(x) functions.
From the data, we can estimate λ by the sample mean x̄ and compute probabilities
in R via dpois(x, λ).
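The fitting steps above can be sketched in Python as follows. The trip counts are made up for illustration (the shopping data are not reproduced in the slides), and dpois is a stand-in named after the R function.

```python
from collections import Counter
from math import exp, factorial
from statistics import mean, variance


def dpois(x: int, lam: float) -> float:
    """Poisson probability P(X = x) with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)


# Hypothetical weekly trip counts, NOT the data behind the slide's plot.
trips = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1, 4, 0, 1, 2, 1, 0, 3, 1]

lam_hat = mean(trips)                  # estimate of lambda (sample mean)
dispersion = variance(trips) / lam_hat  # sample variance over mean; near 1 suits a Poisson fit
print(lam_hat, round(dispersion, 3))

# Compare observed relative frequencies with the fitted probabilities.
counts = Counter(trips)
for x in range(max(trips) + 1):
    print(x, counts[x] / len(trips), round(dpois(x, lam_hat), 4))
```

statistics.variance uses the n − 1 divisor, the same convention as R's var(x).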
[Figure: Poisson Fit to the Distribution of Number of Weekly Trips. Actual versus Poisson relative frequencies; x-axis: N (0 to 7); y-axis: relfreq (0.0 to 0.5).]
Poisson Process (Experiment)
▶ RV X(t) denotes the number of occurrences of an outcome
during a time interval of length t such that
1. The number of occurrences in a time interval is independent of the number
of occurrences in any other disjoint time interval.
2. The probability of an occurrence in any very short time interval is
proportional to the length of the time interval.
3. The probability that more than one outcome will occur in any very
short time interval is negligible.
▶ Based on these assumptions it can be shown that X(t) has a
Poisson distribution with rate λt, that is, the rate is
proportional to the length of the interval.
▶ What is the probability of 0 occurrences in a time interval of length t?
$\Pr(X(t) = 0) = \frac{e^{-\lambda t} (\lambda t)^0}{0!} = e^{-\lambda t}.$
Note that "0 occurrences in a time interval t" ⇐⇒ "the next
occurrence will be later than t"; this leads to the exponential distribution.
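The link between zero occurrences and the waiting time can be illustrated with a small simulation. The sketch assumes the usual construction in which gaps between occurrences are exponential with rate λ; the values λ = 2, t = 1, the seed, and the number of trials are arbitrary choices.

```python
import random
from math import exp


def count_arrivals(lam: float, t: float, rng: random.Random) -> int:
    """Count occurrences in [0, t] when gaps are exponential with rate lam."""
    clock = rng.expovariate(lam)  # time of the first occurrence
    count = 0
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)  # add the next gap
    return count


rng = random.Random(1)  # fixed seed so the run is repeatable
lam, t, trials = 2.0, 1.0, 20000

zero_fraction = sum(count_arrivals(lam, t, rng) == 0 for _ in range(trials)) / trials

# The simulated share of intervals with no occurrence should be close to exp(-lam * t).
print(round(zero_fraction, 4), round(exp(-lam * t), 4))
```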
Example: Poisson Experiment
▶ Assume that the number of calls arriving at a call center is
Poisson with (mean) rate λ = 2 per minute during the period
9:00 am-12:00 noon.
This means that X(t) = number of calls arriving at the call
center during a period of t minutes is Poisson with rate
λt = 2t, that is, E[X(t)] = λt = 2t.
▶ What is the probability of having more than 25 calls during
10:00-10:10 am?
The expected number of calls during the ten minutes is
2 × 10 = 20.
[Figure: probability mass function of the number of calls. x-axis: calls (0 to 40); y-axis: Prob(calls) (0.00 to 0.08).]
Example Cont’d
▶ What is the probability of having more than 18 calls during
10:00-10:10am and more than 24 calls during 10:20-10:35am?
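Because the two intervals are disjoint, the counts are independent under the Poisson process assumptions, so the joint probability is a product. The sketch below assumes that reading of the question: the first interval is 10 minutes (mean 20) and the second is 15 minutes (mean 30). The helper names are stand-ins for the R functions.

```python
from math import exp, factorial


def dpois(x: int, lam: float) -> float:
    """Poisson probability P(X = x) with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)


def ppois(x: int, lam: float) -> float:
    """Poisson cumulative probability P(X <= x)."""
    return sum(dpois(k, lam) for k in range(x + 1))


rate_per_minute = 2.0
mean_first = rate_per_minute * 10    # 10:00-10:10, 10 minutes
mean_second = rate_per_minute * 15   # 10:20-10:35, 15 minutes

p_first = 1.0 - ppois(18, mean_first)     # P(more than 18 calls)
p_second = 1.0 - ppois(24, mean_second)   # P(more than 24 calls)

# Independence across disjoint intervals gives the product.
p_both = p_first * p_second
print(round(p_first, 4), round(p_second, 4), round(p_both, 4))
```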
Poisson Approximation to Binomial Distributions
The Poisson distribution with rate n × p can be obtained as a limiting case of the
binomial distribution when n is very large and p is very small.
[Figure: two pairs of panels comparing Poisson and Binomial probability mass functions, P(X=x) against x. Surviving parameter labels for the second pair: lambda=4.5 (Poisson) and n=5, p=0.9 (Binomial).]
Example: Suppose that in a large population the proportion of people who have a
certain disease is 0.01. What is the probability that in a random group of 200 people
at least four people will have the disease? Compute the probability using the binomial
and Poisson mass functions.
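A possible way to work this example is sketched below in Python, using n = 200 and p = 0.01 from the problem statement and the Poisson mean λ = np = 2. Both answers are computed as one minus the probability of at most three cases.

```python
from math import comb, exp, factorial

n, p = 200, 0.01
lam = n * p  # Poisson mean used for the approximation

# Exact binomial: P(X >= 4) = 1 - P(X <= 3)
binom_at_least_4 = 1.0 - sum(
    comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(4)
)

# Poisson approximation with mean lam
pois_at_least_4 = 1.0 - sum(
    exp(-lam) * lam ** k / factorial(k) for k in range(4)
)

print(round(binom_at_least_4, 4), round(pois_at_least_4, 4))
```

The two values differ only in the third decimal place, as expected for large n and small p.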
Continuous Random Variables
▶ A random variable (RV) X can take an uncountably infinite
number of values, so
P(X = x) = 0.
▶ An RV X is a continuous RV if there exists a nonnegative
function f(x) such that
$P(a < X < b) = \int_a^b f(x)\,dx.$
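To make the integral definition concrete, the probability of an interval can be approximated numerically. The sketch below uses the standard normal density purely as an illustrative choice of f(x) and a simple midpoint rule; neither is specified in the slides.

```python
from math import exp, pi, sqrt


def integrate(f, a: float, b: float, steps: int = 10000) -> float:
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


def std_normal_density(x: float) -> float:
    """Standard normal density, used here only as an example of f(x)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


# P(-1 < X < 1) as the area under the density between -1 and 1.
prob = integrate(std_normal_density, -1.0, 1.0)
print(round(prob, 4))  # 0.6827
```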
[Figure: three example density curves. y-axis: density; x-axes: x from −3 to 3, x from −3 to 3, and x from 0 to 50.]
Uniform Distribution
▶ Uniform distribution over (a, b):
$f(x) = \frac{1}{b-a}, \quad a < x < b,$
$F(x) = P(X < x) = \int_a^x \frac{1}{b-a}\,dy = \frac{x-a}{b-a}.$
▶ We can evaluate f(x) and F(x) using the R statements dunif(x, a, b) and
punif(x, a, b), respectively.
▶ For a = 0 and b = 1:
[Figure: two panels, Density and Cumulative Density, each plotted against x from 0.0 to 1.0.]
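The uniform density and distribution function can be sketched in a few lines of Python. The names dunif and punif are stand-ins that mirror the R functions, and the interval (2, 12) in the second pair of calls is a hypothetical example.

```python
def dunif(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Uniform density on (a, b): 1 / (b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def punif(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Uniform distribution function: (x - a) / (b - a), clipped to [0, 1]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Case a = 0, b = 1: the density is flat at 1 and F(x) = x.
print(dunif(0.3), punif(0.3))

# Hypothetical interval (2, 12): density 0.1 and F(7) = 0.5.
print(dunif(7.0, 2.0, 12.0), punif(7.0, 2.0, 12.0))
```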
Expected Value of a Continuous RV
▶ For a continuous random variable X the expected value is
given by
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$
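As an illustration of this integral, the sketch below approximates E(X) for a uniform density, where the integrand vanishes outside (a, b) and the exact answer is (a + b)/2. The interval (2, 12) and the midpoint rule are assumptions made only for this example.

```python
def integrate(f, a: float, b: float, steps: int = 10000) -> float:
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


a, b = 2.0, 12.0  # hypothetical interval for the uniform density


def uniform_density(x: float) -> float:
    """Uniform density on (a, b)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


# The density is zero outside (a, b), so the integral over the real line
# reduces to an integral over (a, b).
expected_value = integrate(lambda x: x * uniform_density(x), a, b)
print(round(expected_value, 6), (a + b) / 2)
```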