IPHS 402
Distributions of Random Variables
Bernoulli, Binomial and Poisson
Distributions
Biostats Lecture 7
Hua Yun Chen
Lester Arguelles
With acknowledgement to Dr. Dominic Reda for providing the original PP slides
1
Biostats Lecture 7 (Diez Chapter 4)
Bernoulli Distribution (4.2.1)
Binomial Distribution (4.3)
The Binomial Distribution (4.3.1)
Normal Approximation to the Binomial (4.3.2)
The Normal Approximation Breaks Down on
Small Intervals (4.3.3)
Poisson Distribution (4.5)
Not covering:
Geometric Distribution (4.2.2)
Negative Binomial Distribution (4.4)
2
Bernoulli & Binomial Distributions
3
Bernoulli Trial (Experiment, or Random
Variable)
Definition: A Bernoulli trial (experiment, or random
variable) results in one of two possible outcomes
(success/failure), where the success probability is p, and
the failure probability is q=(1-p)
Examples:
• Coin flip (head, tail)
• A person gets the flu in 2018 (yes, no)
• A terminally ill patient survives the next 5 years (yes, no)
• Answer to a true/false question
4
Bernoulli Distribution
If X is a random variable with a Bernoulli distribution,
then
x          0          1
P(X = x)   q = 1-p    p
For example:
1. X is the number of heads in a coin flip.
2. X is the number of persons having a disease in a
random draw of an individual from the population.
5
Bernoulli Random Variable Examples
Example 1: Let X be the breast cancer status of a 50+
year old woman with 0.1 prevalence of breast cancer in
this age group. Then X has a Bernoulli distribution with
p = 0.1.
Example 2: Let X indicate whether an adult gets influenza,
where the prevalence of influenza in this group is 0.6. Then
X has a Bernoulli distribution with p = 0.6.
Example 3: Let X indicate whether a student gets a final grade of A in the
class, where the probability that a student gets an A is 0.45. Then X has a
Bernoulli distribution with p = 0.45.
6
Expected Value & Variance of a
Bernoulli Random Variable
If X has a Bernoulli distribution with probability p, then
Mean of X: µ = E(X) = p
Variance of X: σ^2 = Var(X) = p(1-p)
Standard deviation of X: σ = sqrt(p(1-p))
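A minimal sketch of these quantities in Python, using the standard Bernoulli results (mean p, variance p(1-p)). The function name bernoulli_moments is hypothetical, and p = 0.1 is the breast cancer prevalence from Example 1.

```python
import math

def bernoulli_moments(p):
    """Return (mean, variance, standard deviation) of a Bernoulli(p) variable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    mean = p
    variance = p * (1 - p)
    return mean, variance, math.sqrt(variance)

# Example 1 from the earlier slide: prevalence p = 0.1
mean, variance, sd = bernoulli_moments(0.1)
print(mean, variance, sd)
```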
7
Example: Multiple Choice Quiz
A quiz includes 3 multiple choice questions each
with 4 choices. Suppose you have no clue about the
quiz or topics. What is the chance you answer all 3
correctly?
8
Setup and Answer
Let
X_i = 1, if question i is answered correctly
X_i = 0, if question i is answered incorrectly
where i = 1, 2, 3
Probability of guessing correctly on one question with four answer
choices is ¼. This is a Bernoulli random variable with p = ¼.
There is only one way to answer all three correctly.
Assuming all 3 questions are independent,
P(all 3 questions answered correctly) = (1/4) x (1/4) x (1/4) = (1/4)^3 = 1/64 = 0.015625
9
Multiple Choice Quiz Continues: From
Bernoulli to Binomial
Let’s look at this a slightly different way.
What is the chance you answer 3 questions correctly?
There is only one way to answer all 3 questions correctly:
P(3 correct) = 1 x (1/4)^3 = 0.015625
10
Multiple Choice Quiz Continues
What is the chance you answer 2 questions correctly?
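There are three ways to answer 2 of 3 questions
correctly: (correct, correct, incorrect), (correct, incorrect, correct),
(incorrect, correct, correct).
P(2 correct) = 3 x (1/4)^2 x (3/4) = 0.140625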
11
Multiple Choice Quiz Continues
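What is the chance you answer 1 question correctly?
There are three ways to answer 1 of 3 questions
correctly: (correct, incorrect, incorrect), (incorrect, correct, incorrect),
(incorrect, incorrect, correct).
P(1 correct) = 3 x (1/4) x (3/4)^2 = 0.421875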
12
Multiple Choice Quiz Continues
What is the chance you answer 0 questions correctly?
There is only one way to answer all 3 questions
incorrectly:
P(0 correct) = 1 x (3/4)^3 = 0.421875
Looking at the three previous slides and this one, what
do the probabilities sum to?
0.015625 + 0.140625 + 0.421875 + 0.421875 = ?
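One way to check the counting argument is to list all 2 x 2 x 2 = 8 answer patterns and add up their probabilities. The sketch below assumes independent guesses with success probability 1/4, as on the quiz slides; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 4)   # chance of guessing one question correctly
n_questions = 3

# prob_by_count[k] accumulates P(exactly k correct answers)
prob_by_count = {k: Fraction(0) for k in range(n_questions + 1)}
for outcome in product([1, 0], repeat=n_questions):  # 1 = correct, 0 = incorrect
    prob = Fraction(1)
    for answer in outcome:
        prob *= p if answer == 1 else 1 - p
    prob_by_count[sum(outcome)] += prob

for k in sorted(prob_by_count, reverse=True):
    print(k, "correct:", float(prob_by_count[k]))
print("total:", float(sum(prob_by_count.values())))
```

Exact fractions are used so that the total is not affected by rounding.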
13
Formula for calculating such
probabilities
General formula:
P(answering x questions correctly) = C(3, x) * (1/4)^x * (3/4)^(3-x)
Number of ways to answer x of 3 questions correctly = C(3, x) = 3!/(x!(3-x)!)
where C(3, x) denotes the number of ways to answer x
questions correctly out of 3. This is an example of
the Binomial Distribution.
14
Formula for calculating binomial
probabilities
More general formula:
P(x successes out of n trials) = C(n, x) * p^x * (1-p)^(n-x)
where C(n, x) = n!/(x!(n-x)!) denotes the number of ways to have x successes
out of n trials, and p is the success probability in one trial.
This is the general Binomial Distribution.
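The binomial formula, "number of ways" times p^x times (1-p)^(n-x), can be sketched in a few lines of Python. The function name binomial_pmf is hypothetical; math.comb supplies the number of ways to have x successes in n trials.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials, success probability p)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The 3-question quiz with p = 1/4: probabilities of 0, 1, 2, 3 correct answers
quiz = [binomial_pmf(x, 3, 0.25) for x in range(4)]
print(quiz)
print(sum(quiz))
```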
15
How can we get the binomial
probabilities and know they are correct?
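1. Set q = 1 - p, then
(p + q)^1 = p + q = 1.
p and q are the binomial probabilities for 1 and 0 successes, respectively, out
of 1 trial.
2. (p + q)^2 = p^2 + 2pq + q^2 = 1.
p^2, 2pq, and q^2 are the binomial probabilities for 2, 1, and 0 successes,
respectively, out of 2 trials.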
16
How can we get the binomial
probabilities and know they are correct?
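3. (p + q)^3 = p^3 + 3p^2 q + 3pq^2 + q^3 = 1.
p^3, 3p^2 q, 3pq^2, and q^3 are the binomial probabilities for 3, 2, 1, and 0
successes, respectively, out of 3 trials.
4. In general,
P(x successes out of n trials) = C(n, x) * p^x * q^(n-x)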
17
Choose Function (continued)
Computing The # Of Ways
The choose function is useful for calculating the number of ways
to choose k successes in n trials.
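• n! is “n factorial”: n! = n x (n-1) x … x 2 x 1
• By definition, 0! = 1 and 1! = 1
• C(n, k) = n!/(k!(n-k)!) is “n choose k”
Examples:
k = 1, n = 4: C(4, 1) = 4!/(1!(4-1)!) = 4
k = 2, n = 9: C(9, 2) = 9!/(2!(9-2)!) = 36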
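The two examples can be checked with a short sketch. The helper choose below is a hypothetical name that spells out the factorial formula n!/(k!(n-k)!); math.comb is Python's built-in equivalent.

```python
from math import comb, factorial

def choose(n, k):
    """Number of ways to choose k successes in n trials: n! / (k! (n-k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(choose(4, 1), comb(4, 1))   # k = 1, n = 4
print(choose(9, 2), comb(9, 2))   # k = 2, n = 9
```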
18
Practice (Choose Function)
Which of the following is false?
a) When k=1, n!/(1!(n-1)!) = n!/(n-1)! = n
b) When k=n, n!/(n!0!) = n!/n! = 1
c) When k=0, n!/(0!n!) = n!/n! = 1
d) When k=n-1, n!/((n-1)!1!) = n/1 = n
20
Binomial Probabilities
If p represents the probability of success, (1-p) represents
the probability of failure, n represents the number of independent
Bernoulli trials, and X represents the number of successes, then
P(X = x) = C(n, x) * p^x * (1-p)^(n-x), for x = 0, 1, …, n
A binomial random variable is the “sum of n independent and
identically distributed Bernoulli random variables.”
21
Conditions for Binomial Distribution
● n Bernoulli trials
● n is fixed (determined in advance and can’t
change)
● all n Bernoulli trials are independent
● success probability p is the same in all Bernoulli
trials
Then the number of successes X has a Binomial(n,p)
distribution or simply B(n,p).
Note: X ~ B(n,p) is the same as saying X is binomial with n
independent, identical Bernoulli trials with the same
“success” probability p.
22
Practice – Binomial Distribution
Which of the following is not a condition that needs to
be met for the binomial distribution to be applicable?
1. the trials must be independent
2. the number of trials, n, must be fixed
3. each trial outcome must be classified as a success
or a failure
4. the number of desired successes, k, must be greater
than the number of trials
5. the probability of success, p, must be the same for
each trial
23
Practice – Binomial Distribution (continued)
Answer: 4. The number of desired successes, k, cannot be greater than
the number of trials, n. The other four statements are conditions for the
binomial distribution.
24
Practice – Using the Binomial Distribution
A 2012 Gallup survey suggests that 26.2% of
Americans are obese. Among a random sample of 10
Americans, what is the probability that exactly 8 are
obese?
25
Practice – Using the Binomial Distribution (continued)
P(X = 8) = C(10, 8) * 0.262^8 * (1 - 0.262)^2
= 45 * 0.262^8 * 0.738^2
≈ 0.0005
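A minimal sketch of the obesity calculation in Python, assuming the 10 sampled Americans are independent and each has probability 0.262 of being obese. The variable names are illustrative.

```python
from math import comb

n, k, p = 10, 8, 0.262   # sample size, number obese, obesity rate

# C(10, 8) ways to pick which 8 are obese, times the probability of each way
prob_exactly_8 = comb(n, k) * p**k * (1 - p)**(n - k)
print(comb(n, k))
print(round(prob_exactly_8, 4))
```

The probability is small because 8 obese people in a sample of 10 is far above the expected count of about 2.6.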
26
Mean, Variance, and Standard Deviation of
Binomial Distribution
Mean: µ = np
Variance: σ^2 = np(1-p)
Standard deviation: σ = sqrt(np(1-p))
Note: Mean and standard deviation of a binomial might not always be whole
numbers. These values represent what we would expect to see on average.
27
Probability Distribution Function (pdf)
reviewed
Definition: If X is a discrete random variable, then P(X =
x) is the probability distribution function (pdf), where x
is in the sample space of X.
Since the binomial distribution is discrete, it has a
probability distribution function rather than a
probability density function
28
Cumulative Distribution Function (cdf)
Definition: If X is a random variable, then P(X ≤
x), where x is in the sample space of X, is the
cumulative distribution function (cdf).
From the cdf you can obtain the “less than or
equal to” or “at most” cumulative probability.
Note: P(X > x) = 1 - P(X ≤ x)
Why?
This works for both discrete and continuous
random variables
Why?
29
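For a binomial variable, the cdf is a running sum of the individual probabilities, and the "greater than" probability is its complement. The sketch below reuses the 3-question quiz (n = 3, p = 1/4); the function names are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add up P(X = 0), P(X = 1), ..., P(X = x)."""
    return sum(binomial_pmf(k, n, p) for k in range(0, x + 1))

n, p = 3, 0.25
at_most_1 = binomial_cdf(1, n, p)
more_than_1 = 1 - at_most_1                                   # complement rule
direct = binomial_pmf(2, n, p) + binomial_pmf(3, n, p)        # same event, summed directly
print(at_most_1, more_than_1, direct)
```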
Practice – Free Throw Probability
A consistent free throw (FT) basketball player has a 75%
FT percentage, i.e., 75% of the time the player scores on
free throws. Suppose in a game she was awarded 3 FTs.
Each successful free throw is 1 point.
1. What is the pdf of the number of points of 3 FTs?
2. What is the probability that she scores 2 points?
3. What is the probability that she scores at most 2
points?
4. What is the probability that she scores at least 1 point?
5. What are the mean, variance, standard deviation of the
distribution of the number of FT points?
30
Practice – Free Throw Probability
(continued)
Let X be the random variable for the number of points scored from 3 FTs.
1. What is the pdf of the number of points of 3 FTs?
X ~ Binomial(n, p) = Binomial(3, 0.75)
2. What is the probability that she scores 2 points?
P(X = 2) = C(3, 2) x 0.75^2 x 0.25^1
= 3 x 0.5625 x 0.25
= 0.421875
3. What is the probability that she scores at most 2 points?
P(X ≤ 2) = P(X=0) + P(X=1) + P(X=2)
= C(3, 0) x 0.75^0 x 0.25^3 + C(3, 1) x 0.75^1 x 0.25^2 + C(3, 2) x 0.75^2 x 0.25^1
= (1 x 1 x 0.015625) + (3 x 0.75 x 0.0625) + (3 x 0.5625 x 0.25)
= 0.015625 + 0.140625 + 0.421875
= 0.578125
31
Practice (continued)
4. What is the probability that she scores at least 1 point?
P(X ≥ 1)
= P(X=1) + P(X=2) + P(X=3)
= 1 - P(X=0)
= 1 - 0.0156
= 0.9844
5. What are the mean, variance, and standard deviation of the distribution
of the number of points?
µ = np = 3 x 0.75 = 2.25
σ^2 = np(1-p) = 2.25 x 0.25 = 0.5625
σ = sqrt(0.5625) = 0.75
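The free throw answers can be reproduced with a short sketch, assuming the 3 free throws are independent with success probability 0.75. Variable names are illustrative.

```python
from math import comb, sqrt

n, p = 3, 0.75   # 3 free throws, 75% free throw percentage

# pmf[x] = P(X = x) for x = 0, 1, 2, 3 points
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

p_exactly_2 = pmf[2]
p_at_most_2 = pmf[0] + pmf[1] + pmf[2]
p_at_least_1 = 1 - pmf[0]          # complement of scoring 0 points

mean = n * p
variance = n * p * (1 - p)
sd = sqrt(variance)

print(pmf)
print(p_exactly_2, p_at_most_2, p_at_least_1)
print(mean, variance, sd)
```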
32
Distribution of Number of Successes
in n Trials as n Increases
Below are histograms of samples from the binomial model
where p = 0.10 and n = 10, 30, 100, and 300.
What happens as n increases?
33
An Analysis of Facebook Users
A recent study found that “Facebook users get more than they
give”. For example:
1. 40% of Facebook users in our sample made a friend request, but
63% received at least one request.
2. Users in our sample pressed the like button next to friends'
content an average of 14 times, but had their content “liked” an
average of 20 times.
3. Users sent 9 personal messages, but received 12.
4. 12% of users tagged a friend in a photo, but 35% were
themselves tagged in a photo.
Any guesses for how this pattern can be explained?
Power users contribute much more content than the typical user.
http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx
34
Practice – Facebook Users
This study also found that approximately 25% of Facebook users
are considered power users. The same study found that the
average Facebook user has 245 friends. What is the probability
that the average Facebook user with 245 friends has 70 or more
friends who would be considered power users? Note any
assumptions you must make.
We are given that n = 245, p = 0.25, and we are asked for the
probability P(X ≥ 70). To proceed, we need to assume
independence among the Facebook users.
P(X ≥ 70) = P(X = 70 or X = 71 or X = 72 or … or X = 245)
= P(X = 70) + P(X = 71) + P(X = 72) + … + P(X = 245)
This seems like an awful lot of work...
35
Normal Approximation to the Binomial
36
Practice – Facebook Users (continued)
What is the probability that the average Facebook user
with 245 friends has 70 or more friends who would be
considered power users?
37
How Large is Large Enough for the Normal to
be a Good Approximation for the Binomial?
The sample size is considered large enough if the
expected number of successes (np) and failures (n(1-p))
are both at least 10:
np ≥ 10 and n(1-p) ≥ 10
In this example:
np = 245 x 0.25 = 61.25
n(1-p) = 245 x 0.75 = 183.75
So, the normal approximation can be used.
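The sketch below compares the exact binomial sum for the Facebook question with the normal approximation, assuming independence among friends as stated on the earlier slide. The normal cdf is built from math.erf, the helper name standard_normal_cdf is hypothetical, and no continuity correction is applied.

```python
from math import comb, erf, sqrt

n, p, threshold = 245, 0.25, 70

# Exact: add P(X = 70) + P(X = 71) + ... + P(X = 245)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(threshold, n + 1))

# Normal approximation with the binomial mean and standard deviation
mu = n * p
sigma = sqrt(n * p * (1 - p))
z = (threshold - mu) / sigma

def standard_normal_cdf(value):
    """P(Z <= value) for a standard normal Z."""
    return 0.5 * (1 + erf(value / sqrt(2)))

approx = 1 - standard_normal_cdf(z)
print(mu, round(sigma, 2), round(z, 2))
print(round(exact, 3), round(approx, 3))
```

The two results should be close but not identical, since the normal curve only approximates the discrete binomial probabilities.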
38
Example – HPV Risk and # of Sexual
Partners
Consider a random sample of n = 5 participants who all reported having
greater than or equal to 3 sex partners within the last 12 months. Using
the high-risk population prevalence for HPV, p = 0.6, answer these
questions.
a. What are the expected (mean) number of high-risk HPV cases in
this sample and the associated standard deviation?
b. Can we justify using the normal distribution to approximate this
probability? Explain.
39
Example – HPV Risk and # of Sexual
Partners (continued)
Consider a random sample of n = 5 participants who all reported having
greater than or equal to 3 sex partners within the last 12 months. Using the
high-risk population prevalence for HPV, p = 0.6, answer these questions.
a. What are the expected (mean) number of high-risk HPV cases in this
sample and the associated standard deviation?
µ = np = 5 x 0.6 = 3
σ = sqrt(np(1-p)) = sqrt (5*.6*.4) = sqrt (1.2) = 1.095
b. Can we justify using the normal distribution to approximate this
probability? No, because neither count is at least 10:
np = 5 x 0.6 = 3
n(1-p) = 5 x 0.4 = 2
40
Practice – When Can the Normal
Approximation be Used?
Below are four pairs of Binomial distribution parameters.
Which distribution can be approximated by the normal
distribution?
1. n = 100, p = 0.95
2. n = 25, p = 0.45
3. n = 150, p = 0.05
4. n = 500, p = 0.015
41
Practice – When Can the Normal
Approximation be Used?
Below are four pairs of Binomial distribution parameters.
Which distribution can be approximated by the normal
distribution?
1. n = 100, p = 0.95 (np = 95, n(1-p) = 5)
2. n = 25, p = 0.45 → 25 x 0.45 = 11.25, 25 x 0.55 = 13.75
3. n = 150, p = 0.05 (np = 7.5, n(1-p) = 142.5)
4. n = 500, p = 0.015 (np = 7.5, n(1-p) = 492.5)
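The check "np ≥ 10 and n(1-p) ≥ 10" is easy to automate. This is a minimal sketch; the function name normal_approx_ok and the minimum parameter are hypothetical, with the threshold of 10 taken from the rule stated earlier.

```python
def normal_approx_ok(n, p, minimum=10):
    """True when expected successes and expected failures are both at least `minimum`."""
    return n * p >= minimum and n * (1 - p) >= minimum

pairs = [(100, 0.95), (25, 0.45), (150, 0.05), (500, 0.015)]
results = [normal_approx_ok(n, p) for n, p in pairs]

for (n, p), ok in zip(pairs, results):
    print(f"n = {n}, p = {p}: np = {n * p:.2f}, n(1-p) = {n * (1 - p):.2f}, ok = {ok}")
```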
42
Example: Expected Value and Standard
Deviation of a Binomial Random Variable
A 2012 Gallup survey suggests that 26.2% of
Americans are obese. Among a random sample of 100
Americans, how many would you expect to be obese?
We would expect 26.2 out of 100 randomly sampled
Americans to be obese, with a standard deviation of 4.4.
43
When is the Sample Mean an Unusual
Observation? (Introduction to Inference)
Using the notion that observations that are more than 2 standard
deviations away from the mean are considered unusual, the mean
and the standard deviation we just computed can be used to calculate
a range for the plausible number of obese Americans in random
samples of 100.
26.2 ± (2 x 4.4) → (17.4, 35.0)
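A minimal sketch of the "mean ± 2 standard deviations" range for a binomial count. The function name usual_range is hypothetical. The result differs slightly from the slide because the slide rounds the standard deviation to 4.4 before multiplying.

```python
from math import sqrt

def usual_range(n, p, num_sd=2):
    """Return (low, high) = mean -/+ num_sd standard deviations for Binomial(n, p)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - num_sd * sigma, mu + num_sd * sigma

low, high = usual_range(100, 0.262)   # obesity example: n = 100, p = 0.262
print(round(low, 1), round(high, 1))
```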
44
Practice – Attitudes About Home Schooling
An August 2012 Gallup poll suggests that 13% of Americans
think home schooling provides an excellent education for
children. Would a random sample of 1,000 Americans where
100 share this opinion be considered unusual?
(a) Yes (b) No
http://www.gallup.com/poll/156974/private-schools-top-marks-educating-children.aspx
45
Practice – Attitudes About Home Schooling
(continued)
An August 2012 Gallup poll suggests that 13% of Americans
think home schooling provides an excellent education for
children. Would a random sample of 1,000 Americans where
100 share this opinion be considered unusual?
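(a) Yes, because 100 is an unusual observation (b) No
µ = np = 1,000 x 0.13 = 130
σ = sqrt(np(1-p)) = sqrt(1,000 x 0.13 x 0.87) ≈ 10.6
Range of usual observations:
130 ± (2 x 10.6)
= (108.8, 151.2)
Since 100 falls below 108.8, it is outside the range of usual observations.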
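The home schooling question can be checked the same way. The sketch assumes the "more than 2 standard deviations from the mean" rule for unusual observations used in this lecture; the function name is_unusual is hypothetical.

```python
from math import sqrt

def is_unusual(observed, n, p, num_sd=2):
    """True when `observed` is more than num_sd standard deviations from np."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return abs(observed - mu) > num_sd * sigma

n, p, observed = 1000, 0.13, 100
sigma = sqrt(n * p * (1 - p))
print(n * p, round(sigma, 1))
print(is_unusual(observed, n, p))
```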
46
Poisson distribution
47
Poisson Distribution
• Increasingly being used in public health and clinical research
• The random variable takes the form: the number of events in
a time interval
• There are multiple time intervals
• Examples:
number of heart attacks in a month
number of marriages in a year
number of people getting struck by lightning in a year
This is different from a Bernoulli or binomial random variable
since we are counting the number of events in an interval rather
than whether an event occurred (yes, no)
48
Poisson Distribution
Poisson was a French Mathematician, Statistician and Physicist (1781-1840)
From Diez, 4th
edition, pp 163-164
49
Poisson Distribution Shape
● The value of λ determines the shape of the Poisson
distribution
● λ is the expected number of events per time interval
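To see how λ changes the shape, the sketch below uses the standard Poisson probability formula P(X = k) = e^(-λ) λ^k / k!, which is not written out in the surviving slide text (see Diez, 4th edition, pp 163-164). The function names and the values of λ are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in an interval) when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_summary(lam, max_k=60):
    """Return (total probability, mean) computed over k = 0, ..., max_k - 1."""
    probs = [poisson_pmf(k, lam) for k in range(max_k)]
    total = sum(probs)
    mean = sum(k * prob for k, prob in enumerate(probs))
    return total, mean

for lam in (1, 5, 10):
    first_few = [round(poisson_pmf(k, lam), 3) for k in range(6)]
    print(lam, first_few, poisson_summary(lam))
```

For small λ most of the probability sits at 0 and 1; as λ grows, the distribution shifts right and spreads out, and the computed mean stays close to λ.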
50