L10
Yibi Huang
Department of Statistics
University of Chicago
Outline
Continuous distributions
Frequency Scale of Histograms
[Figure: histogram of Height (cm), horizontal axis from 120 to 210; vertical axis Frequency, from 0 to 250000]
Density Scale of Histograms
[Figure: histogram of Height (cm), horizontal axis from 120 to 210; vertical axis Density, from 0.000 to 0.030]
In a density scale,
[Figure: density-scale histogram of Height (cm), horizontal axis from 120 to 210; vertical axis Density, from 0.000 to 0.030]
From Histograms to Density Curves
Continuous Random Variables & Density Curves
[Figure: density curve with the points a and b marked on the horizontal axis]
Example — Spinner
• P(X = 0.75) = ?
Density Curve for the Spinner Example
P(0.3 < X < 0.7) = 0.4
P(X < 0.5 or X > 0.8) = 0.7
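The two probabilities above can be checked in R. This is a minimal sketch that assumes the spinner outcome X is uniform on [0, 1]; the variable names are illustrative.

```r
# Assumption: X ~ Uniform(0, 1), so punif(q) gives P(X < q)
p_middle <- punif(0.7) - punif(0.3)        # P(0.3 < X < 0.7)
p_either <- punif(0.5) + (1 - punif(0.8))  # P(X < 0.5 or X > 0.8)
print(c(p_middle, p_either))
```

The second probability adds two areas because the events X < 0.5 and X > 0.8 cannot happen together.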
Expected Value (=Mean) and Variance for a Continuous Random Variable
Example — Spinner
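The mean of X is
\[
\mu_X = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x \cdot 1\,dx = \frac{1}{2}x^2 \Big|_0^1 = \frac{1}{2},
\]
the variance is
\[
V(X) = \int_{-\infty}^{\infty} \left(x - \frac{1}{2}\right)^2 f(x)\,dx = \int_0^1 \left(x - \frac{1}{2}\right)^2 \cdot 1\,dx = \frac{1}{3}\left(x - \frac{1}{2}\right)^3 \Big|_0^1 = \frac{1}{12}.
\]
The SD is
\[
SD(X) = \sqrt{V(X)} = \sqrt{1/12} \approx 0.289.
\]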
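The integrals above can also be checked numerically. The sketch below uses R's integrate() and assumes the spinner density is f(x) = 1 on [0, 1]; the names f, mu, v and sd_x are illustrative.

```r
# Assumption: the spinner density is f(x) = 1 on [0, 1] and 0 elsewhere
f <- function(x) dunif(x, min = 0, max = 1)

# Mean: integral of x * f(x); variance: integral of (x - mean)^2 * f(x)
mu   <- integrate(function(x) x * f(x), lower = 0, upper = 1)$value
v    <- integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = 1)$value
sd_x <- sqrt(v)

print(c(mean = mu, variance = v, sd = sd_x))
```

Numerical integration is only a check here; the closed-form answers are 1/2, 1/12 and about 0.289.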
Thank God ...
Normal distribution
Normal Distributions
a mean µ, and an SD σ
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}.
\]
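As a quick check, the normal density formula can be typed out and compared with R's built-in dnorm(). This is a sketch only: the function name normal_density and the values mu = 0, sigma = 1 are illustrative choices.

```r
# Normal density written out from the formula with mean mu and SD sigma
normal_density <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma)^2)
}

x <- seq(-3, 3, by = 0.5)
by_formula <- normal_density(x, mu = 0, sigma = 1)
built_in   <- dnorm(x, mean = 0, sd = 1)

# TRUE if the hand-written formula agrees with dnorm() at every x
print(isTRUE(all.equal(by_formula, built_in)))
```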
[Figures: normal density curves centered at µ with SD σ, illustrating the areas P(X < c), P(X > d), and P(a < X < b)]
E.g., for z = −0.83, look at the row −0.8 and the column 0.03.
P(Z < 1.573)
= between P(Z < 1.57) and P(Z < 1.58)
= between 0.9418 and 0.9429
Any value between 0.9418 and 0.9429 will be accepted in HWs and
exams.
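One way to pick a value inside that range is linear interpolation between the two table entries. This is an illustrative sketch, not a required method; the variable names are made up for the example.

```r
# Table values quoted above for z = 1.57 and z = 1.58
z_lo <- 1.57; p_lo <- 0.9418
z_hi <- 1.58; p_hi <- 0.9429

# Linear interpolation at z = 1.573 (an assumption, not the only valid choice)
z <- 1.573
p_interp <- p_lo + (z - z_lo) / (z_hi - z_lo) * (p_hi - p_lo)

print(p_interp)   # interpolated from the table
print(pnorm(z))   # computed directly, for comparison
```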
Find Normal Probabilities in R
> pnorm(-0.83)
[1] 0.2032694
> pnorm(1.573)
[1] 0.9421406
Finding Upper Tail Probabilities
P(Z > −0.83) = 1 − P(Z < −0.83)
= 1 − 0.2033 = 0.7967
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
−0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
−0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
−0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
> 1 - pnorm(-0.83)
[1] 0.7967306
> # another way to find upper tail area
> pnorm(-0.83, lower.tail=FALSE)
[1] 0.7967306
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
⋮
−0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
−0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
−0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
⋮
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
E.g., we want to find the first quartile of the standard normal, i.e., the z
such that P(Z < z) = 0.25.
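The first quartile is the value with lower-tail area 0.25, so in R it can be found with qnorm(), the inverse of pnorm(). A minimal sketch; the name q1 is illustrative.

```r
# First quartile of N(0, 1): the z with P(Z < z) = 0.25
q1 <- qnorm(0.25)
print(q1)

# Check: plugging q1 back into pnorm() should return 0.25
print(pnorm(q1))
```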
If P(Z > z) = 0.05, then z = ?
> qnorm(1-0.05)
[1] 1.644854
> qnorm(0.05, lower.tail=F) # alternative way
[1] 1.644854
Now we’ve learned how to find probabilities for the standard normal
N(0, 1). To compute probabilities for a general normal distribution
N(µ, σ), we need the Z score.
Example: SAT vs. ACT
SAT scores are distributed nearly normally with mean 1500 and
standard deviation 300. ACT scores are distributed nearly normally
with mean 21 and standard deviation 5. A college admissions officer
wants to determine which of the two applicants scored better on
their standardized test with respect to the other test takers: Pam,
who earned an 1800 on her SAT, or Jim, who scored a 24 on his
ACT?
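The comparison can be made by putting both scores on a common scale, Z = (score − mean) / SD. A minimal sketch with the numbers from the example; the names z_pam and z_jim are illustrative.

```r
# Z score = (observation - mean) / SD, using the values stated above
z_pam <- (1800 - 1500) / 300  # Pam, SAT
z_jim <- (24 - 21) / 5        # Jim, ACT

print(c(Pam = z_pam, Jim = z_jim))
```

The larger Z score belongs to the applicant who did better relative to the other test takers.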
[Figure: standard normal curve with Jim and Pam marked; horizontal axis from −2 to 2]
Standardizing with Z scores (cont.)
Recap: Ways to Detect Outliers
Calculating Normal Probabilities
[Figure: SAT scores, N(1500, 300), on an axis from 600 to 2400, shown together with the corresponding Z scores, N(0, 1), on an axis from −3 to 3]
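The link between the two scales can be illustrated in R: a probability computed on the SAT scale equals the probability of the matching Z score under N(0, 1). This sketch uses the score 1800 from the SAT example purely as an illustration.

```r
# Illustrative score on the SAT scale, N(mean = 1500, sd = 300)
x <- 1800
z <- (x - 1500) / 300  # the same score as a Z score

p_sat_scale <- pnorm(x, mean = 1500, sd = 300)  # P(X < 1800)
p_z_scale   <- pnorm(z)                         # P(Z < 1)

print(c(p_sat_scale, p_z_scale))  # the two agree
```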
Quality control
\[
Z = \frac{35.8 - 36}{0.11} = -1.82
\]
Second decimal place of Z
0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00 Z
.0183 .0188 .0192 .0197 .0202 .0207 .0212 .0217 .0222 .0228 −2.0
.0233 .0239 .0244 .0250 .0256 .0262 .0268 .0274 .0281 .0287 −1.9
.0294 .0301 .0307 .0314 .0322 .0329 .0336 .0344 .0351 .0359 −1.8
.0367 .0375 .0384 .0392 .0401 .0409 .0418 .0427 .0436 .0446 −1.7
.0455 .0465 .0475 .0485 .0495 .0505 .0516 .0526 .0537 .0548 −1.6
.0559 .0571 .0582 .0594 .0606 .0618 .0630 .0643 .0655 .0668 −1.5
In R:
# or
> pnorm(35.8, mean = 36, sd = 0.11)
[1] 0.03451817
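The small gap between the table value for Z = −1.82 and the R output comes from rounding the Z score to two decimals. A sketch that makes this visible; the names z_exact and z_rounded are illustrative.

```r
# Z score for 35.8 under N(mean = 36, sd = 0.11), before and after rounding
z_exact   <- (35.8 - 36) / 0.11
z_rounded <- round(z_exact, 2)  # -1.82, the value looked up in the table

print(pnorm(z_exact))    # same as pnorm(35.8, mean = 36, sd = 0.11)
print(pnorm(z_rounded))  # close to the table entry .0344
```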
Practice
\[
Z_{35.8} = \frac{35.8 - 36}{0.11} = -1.82, \qquad Z_{36.2} = \frac{36.2 - 36}{0.11} = 1.82
\]
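Assuming the quantity of interest here is the probability of falling between 35.8 and 36.2 under N(36, 0.11) (an assumption, since the question text is not shown above), it can be computed in R as a difference of two lower-tail areas.

```r
# Assumption: we want P(35.8 < X < 36.2) for X ~ N(mean = 36, sd = 0.11)
p_below_upper <- pnorm(36.2, mean = 36, sd = 0.11)
p_below_lower <- pnorm(35.8, mean = 36, sd = 0.11)

# Area between the two cutoffs = lower-tail area at 36.2 minus lower-tail area at 35.8
p_between <- p_below_upper - p_below_lower
print(p_between)
```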
In R:
> qnorm(0.03, mean = 98.2, sd = 0.73)
[1] 96.82702
Practice
Body temperatures of healthy humans are distributed nearly normally with
mean 98.2°F and standard deviation 0.73°F. What is the cutoff for the
highest 10% of human body temperatures?
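One way to approach this in R: the cutoff for the highest 10% is the value with 90% of the distribution below it, so qnorm() applies. A sketch using the mean and SD stated in the question; the names cutoff and cutoff_alt are illustrative.

```r
# Cutoff with 90% of temperatures below it (equivalently, 10% above it)
cutoff <- qnorm(0.90, mean = 98.2, sd = 0.73)

# Same cutoff, specified through the upper-tail area instead
cutoff_alt <- qnorm(0.10, mean = 98.2, sd = 0.73, lower.tail = FALSE)

print(c(cutoff, cutoff_alt))
```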
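[Figure: normal density curve with the horizontal axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ, µ + 4σ]
Within 1 SD of the mean: 68.27% ≈ 68%
Within 2 SDs of the mean: 95.45% ≈ 95%
Within 3 SDs of the mean: 99.73% ≈ all but 1/4 of 1%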
> pnorm(1) - pnorm(-1)
[1] 0.6826895
> pnorm(2) - pnorm(-2)
[1] 0.9544997
> pnorm(3) - pnorm(-3)
[1] 0.9973002