Prob Stat
Probability
Post-Graduate Diploma in Data
Science in Health and Climate
Change for Social Impact
Random Variable
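[Figure: bar chart of p(x) against x for a fair die; each outcome x = 1, 2, 3, 4, 5, 6 has height 1/6]
\[ \sum_{\text{all } x} P(x) = 1 \]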
Probability mass function (pmf)
x      p(x)
1      p(x=1) = 1/6
2      p(x=2) = 1/6
3      p(x=3) = 1/6
4      p(x=4) = 1/6
5      p(x=5) = 1/6
6      p(x=6) = 1/6
Total  1.0
Cumulative distribution function (CDF)
[Figure: step plot of the cumulative probability P(x) against x = 1, 2, 3, 4, 5, 6, rising through 1/6, 1/3, 1/2, 2/3, 5/6 to 1.0]
Cumulative distribution function
x P(x≤A)
1 P(x≤1)=1/6
2 P(x≤2)=2/6
3 P(x≤3)=3/6
4 P(x≤4)=4/6
5 P(x≤5)=5/6
6 P(x≤6)=6/6
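The table can be reproduced by accumulating the pmf. This is a minimal sketch for the fair-die example; the variable names are illustrative and exact fractions are used so the running totals match the table.

```python
from fractions import Fraction
from itertools import accumulate

# pmf of a fair six-sided die: every face has probability 1/6
faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# The CDF at x is the running total of the pmf up to and including x
cdf = dict(zip(faces, accumulate(pmf)))

for x, p in cdf.items():
    print(f"P(x<={x}) = {p}")
```

The last entry, P(x≤6), equals 1 because the pmf sums to 1.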
Examples
12 .25
1.0
Answer (b)
x    f(x)
1    (3-1)/2 = 1.0
2    (3-2)/2 = .5
3    (3-3)/2 = 0
4    (3-4)/2 = -.5
Though this sums to 1, you can’t have a negative probability; therefore, it’s not a probability function.
Answer (c)
x      f(x)
0      1/25
1      3/25
2      7/25
3      13/25
Total  24/25
Doesn’t sum to 1. Thus, it’s not a probability function.
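Both answers apply the same two checks: no negative values, and a total of exactly 1. A small sketch of that test follows; the function name `is_probability_function` is hypothetical, and the inputs are the values from answers (b) and (c).

```python
from fractions import Fraction

def is_probability_function(probs):
    """Return True if every value is non-negative and the values sum to 1."""
    return all(p >= 0 for p in probs) and sum(probs) == 1

# Answer (b): f(x) = (3 - x)/2 for x = 1, 2, 3, 4
answer_b = [Fraction(3 - x, 2) for x in (1, 2, 3, 4)]

# Answer (c): 1/25, 3/25, 7/25, 13/25 for x = 0, 1, 2, 3
answer_c = [Fraction(n, 25) for n in (1, 3, 7, 13)]

# Fair die, for comparison
fair_die = [Fraction(1, 6)] * 6

print(is_probability_function(answer_b))  # False: sums to 1 but has a negative value
print(is_probability_function(answer_c))  # False: sums to 24/25
print(is_probability_function(fair_die))  # True
```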
Practice Problem:
x 1 2 3 4 5
P(x) .1 .1 .4 .3 .1
\[ f(x) = e^{-x} \]
This function integrates to 1:
\[ \int_{0}^{\infty} e^{-x}\,dx = \left. -e^{-x} \right|_{0}^{\infty} = 0 + 1 = 1 \]
Review: Continuous case
The normal distribution function also integrates to 1 (i.e., the area under
a bell curve is always 1):
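\[ \int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx = 1 \]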
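Both integrals can be checked numerically. The sketch below uses a plain midpoint rule rather than a library integrator. The helper names, the finite cut-offs (0 to 50, and -10 to 10) and the choice of a standard normal (μ = 0, σ = 1) are assumptions made for illustration.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def exp_density(x):
    """f(x) = e^(-x), defined for x >= 0."""
    return math.exp(-x)

def normal_density(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The tails beyond these cut-offs contribute a negligible amount of area
area_exp = midpoint_integral(exp_density, 0.0, 50.0)
area_norm = midpoint_integral(normal_density, -10.0, 10.0)

print(round(area_exp, 6), round(area_norm, 6))
```

Both areas come out equal to 1 up to numerical error, whatever values of μ and σ are passed to `normal_density` (provided the integration range is wide enough).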
Review: Continuous case
[Figure: bell curve marking the mean (μ) and one standard deviation from the mean (σ)]
Expected value, or mean
x 1 2 3 4 5
P(x) .1 .1 .4 .3 .1
Discrete case:
\[ E(X) = \sum_{\text{all } x} x_i \, p(x_i) \]
Continuous case:
\[ E(X) = \int_{\text{all } x} x_i \, p(x_i)\,dx \]
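For the discrete case, the sum is short enough to compute directly. This sketch applies it to the distribution tabulated on this slide (x = 1..5 with probabilities .1, .1, .4, .3, .1); the variable names are illustrative and exact fractions avoid floating-point rounding.

```python
from fractions import Fraction

# Distribution from the slide's table
xs = [1, 2, 3, 4, 5]
ps = [Fraction(s) for s in ("0.1", "0.1", "0.4", "0.3", "0.1")]

# Sanity check: the probabilities must sum to 1
assert sum(ps) == 1

# E(X) = sum over all x of x_i * p(x_i)
expected_value = sum(x * p for x, p in zip(xs, ps))
print(float(expected_value))  # 3.2
```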
Sample mean is a special case of expected value…
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i \left(\frac{1}{n}\right) \]
\[ \frac{1}{\binom{49}{6}} = \frac{1}{\frac{49!}{43!\,6!}} = \frac{1}{13{,}983{,}816} = 7.2 \times 10^{-8} \]
“49 choose 6”: out of 49 numbers, this is the number of distinct combinations of 6.
The probability function (note, sums to 1.0):
x$           p(x)
-1           .999999928
+2,000,000   7.2 x 10^-8
Expected Value
E(X) = P(win)*$2,000,000 + P(lose)*(-$1.00)
= (2.0 x 10^6)(7.2 x 10^-8) + (.999999928)(-1) = .144 - .999999928 = -$.86
If you play the lottery every week for 10 years, what are your
expected winnings or losses?
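One way to work this out is to multiply the per-ticket expected value by the number of tickets. The sketch below assumes one $1 ticket per week and 52 weeks per year, neither of which is stated on the slide, and uses the unrounded win probability, so the total differs slightly from 520 × (-$.86).

```python
from math import comb

# Number of distinct 6-number combinations out of 49
n_combinations = comb(49, 6)  # 13,983,816

p_win = 1 / n_combinations
p_lose = 1 - p_win

# Per-ticket expected value, as in E(X) = P(win)*$2,000,000 + P(lose)*(-$1.00)
expected_per_ticket = p_win * 2_000_000 + p_lose * (-1.00)

# Assumption: one ticket a week, 52 weeks a year, for 10 years
n_tickets = 52 * 10
expected_total = n_tickets * expected_per_ticket

print(f"per ticket: {expected_per_ticket:.2f}")
print(f"over {n_tickets} tickets: {expected_total:.2f}")
```

Under these assumptions the expected result is a loss of roughly $446 over the ten years.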
A roulette wheel has the numbers 1 through 36, as well as 0 and 00. If you bet $1
that an odd number comes up, you win or lose $1 according to whether or not that
event occurs. If random variable X denotes your net gain, X=1 with probability
18/38 and X= -1 with probability 20/38.
E(X) = (1)(18/38) + (-1)(20/38) = -2/38 ≈ -$.053
On average, the casino wins (and the player loses) about 5 cents per game.
If the cost is $10 per game, the casino wins an average of 53 cents per game. If
10,000 games are played in a night, that’s a cool $5300.
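The roulette figures can be checked with exact fractions. A minimal sketch with illustrative variable names; note that the slide's $5300 comes from rounding the per-game edge to 53 cents before multiplying.

```python
from fractions import Fraction

# Net gain X on a $1 bet on odd: +1 with probability 18/38, -1 with probability 20/38
p_win = Fraction(18, 38)
p_lose = Fraction(20, 38)

ev_one_dollar = 1 * p_win + (-1) * p_lose  # player's expected net gain per $1 game
ev_ten_dollars = 10 * ev_one_dollar        # same bet at $10 per game

# Casino's expected take over 10,000 games at $10 each
casino_nightly = -ev_ten_dollars * 10_000

print(float(ev_one_dollar))   # about -0.053
print(float(ev_ten_dollars))  # about -0.53
print(float(casino_nightly))  # about 5263
```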
Practice Problem
Let X = a random variable that is the number of tests you have to run per lot:
E(X) = (.90)^20 (1) + [1 - .90^20] (21) = 12.2% (1) + 87.8% (21) = 18.56
E(X) = (.90)^10 (1) + [1 - .90^10] (11) = 35% (1) + 65% (11) = 7.5 average per lot
c. 5 samples at a time?
E(X) = (.90)^5 (1) + [1 - .90^5] (6) = 59% (1) + 41% (6) = 3.05 average per lot
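The three calculations follow one pattern, sketched below. The reading is inferred from the formulas, since the problem statement is not shown here: n samples are pooled into one test, the pool passes with probability .90^n, and otherwise each sample is retested for n + 1 tests in total. The function name `expected_tests` is hypothetical.

```python
def expected_tests(pool_size, p_clean=0.90):
    """Expected number of tests per lot when pool_size samples are pooled.

    Assumption: the pooled test passes with probability p_clean ** pool_size
    (1 test in total); otherwise every sample is retested individually
    (pool_size + 1 tests in total).
    """
    p_all_clean = p_clean ** pool_size
    return p_all_clean * 1 + (1 - p_all_clean) * (pool_size + 1)

for n in (20, 10, 5):
    print(n, round(expected_tests(n), 2))
```

The unrounded results differ slightly from the slide's figures because the slide rounds the probabilities to percentages before multiplying.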
\[ \mathrm{Var}(x) = E[(x-\mu)^2] \]
\[ \sigma^2 = \sum_{\text{all } x} (x_i-\mu)^2 \, p(x_i) \]
Discrete case:
\[ \mathrm{Var}(X) = \sigma^2 = \sum_{\text{all } x} (x_i-\mu)^2 \, p(x_i) \]
Continuous case:
\[ \mathrm{Var}(X) = \sigma^2 = \int_{\text{all } x} (x_i-\mu)^2 \, p(x_i)\,dx \]
Sample variance is a special case…
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i-\bar{x})^2}{n-1} = \sum_{i=1}^{n} (x_i-\bar{x})^2 \left(\frac{1}{n-1}\right) \]
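\[
\begin{aligned}
\sigma^2 &= \sum_{\text{all } x} (x_i-\mu)^2 \, p(x_i) \\
&= (+1 - (-.053))^2 (18/38) + (-1 - (-.053))^2 (20/38) \\
&= (1.053)^2 (18/38) + (-1 + .053)^2 (20/38) \\
&= (1.053)^2 (18/38) + (-.947)^2 (20/38) \\
&= .997
\end{aligned}
\]
\[ \sigma = \sqrt{.997} \approx .99 \]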
Standard deviation is $.99. Interpretation: On average, you’re either 1
dollar above or 1 dollar below the mean, which is just under zero.
Makes sense!
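The same variance can be computed directly from the definition. A minimal sketch for the $1 roulette bet, with illustrative variable names; the unrounded standard deviation is about .9986, which the slide reports as $.99.

```python
import math

# Net gain on a $1 bet on odd: +1 with probability 18/38, -1 with probability 20/38
outcomes = {1: 18 / 38, -1: 20 / 38}

# Mean: sum of x * p(x)
mu = sum(x * p for x, p in outcomes.items())

# Variance: sum of (x - mu)^2 * p(x)
variance = sum((x - mu) ** 2 * p for x, p in outcomes.items())
sd = math.sqrt(variance)

print(round(mu, 3), round(variance, 3))
```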
Handy calculation formula!
\[ \mathrm{Var}(X) = \sum_{\text{all } x} (x_i-\mu)^2 \, p(x_i) = \sum_{\text{all } x} x_i^2 \, p(x_i) - (\mu)^2 \]
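\[ E(x) = \sum_{\text{all } x} x_i \, p(x_i) = (1)\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \tfrac{21}{6} = 3.5 \]
\[ E(x^2) = \sum_{\text{all } x} x_i^2 \, p(x_i) = (1)\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 9\left(\tfrac{1}{6}\right) + 16\left(\tfrac{1}{6}\right) + 25\left(\tfrac{1}{6}\right) + 36\left(\tfrac{1}{6}\right) = 15.17 \]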
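Plugging these two values for a fair die (E(x) = 3.5 and E(x²) = 15.17) into the handy calculation formula gives the variance in one step. The worked step below is added for completeness and is not taken from the slide.

```latex
\[
\sigma^2 = E(x^2) - [E(x)]^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12} \approx 2.92,
\qquad
\sigma = \sqrt{\frac{35}{12}} \approx 1.71
\]
```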
Find the variance and standard deviation for Rohan’s night wakings (recall that
we already calculated the mean to be 3.2):
x 1 2 3 4 5
P(x) .1 .1 .4 .3 .1
Answer:
x²    1  4  9  16 25
P(x)  .1 .1 .4 .3 .1
\[ E(x^2) = \sum_{i=1}^{5} x_i^2 \, p(x_i) = (1)(.1) + (4)(.1) + 9(.4) + 16(.3) + 25(.1) = 11.4 \]
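The remaining step is to subtract the squared mean from E(x²) and take the square root. A sketch using the shortcut formula Var(X) = E(x²) - μ² on the table above; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Night-wakings distribution from the table
pmf = {
    1: Fraction("0.1"),
    2: Fraction("0.1"),
    3: Fraction("0.4"),
    4: Fraction("0.3"),
    5: Fraction("0.1"),
}

mean = sum(x * p for x, p in pmf.items())       # E(x), already given as 3.2
e_x2 = sum(x ** 2 * p for x, p in pmf.items())  # E(x^2)

# Shortcut formula: Var(X) = E(x^2) - [E(x)]^2
variance = e_x2 - mean ** 2
sd = math.sqrt(variance)

print(float(mean), float(e_x2), float(variance))  # 3.2 11.4 1.16
```

The variance is 11.4 - 3.2² = 1.16, so the standard deviation is about 1.08.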
\[ \sigma_{xy} = E[(x-\mu_x)(y-\mu_y)] \]
\[ \sigma_{xy} = \sum_{i=1}^{N} (x_i-\mu_x)(y_i-\mu_y)\, P(x_i, y_i) \]
Interpreting Covariance
\[ \mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i-\bar{X})(y_i-\bar{Y})}{n-1} \]
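The sample formula translates directly into code. In this sketch the function name `sample_covariance` and the two small data sets are made up purely for illustration: a positive result means x and y tend to move together, a negative result means they move in opposite directions.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: y rises with x, then y falls as x rises
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 6, 8, 10]
ys_falling = [10, 8, 6, 4, 2]

print(sample_covariance(xs, ys_rising))   # 5.0
print(sample_covariance(xs, ys_falling))  # -5.0
```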