Lect9 Math231
Statistics
Sampling Distributions
Central Limit Theorem
Shaheena Bashir
FALL, 2019
Outline
Background
Sampling Distributions
Sample Mean
Difference Between 2 Sample Means
Sample Proportion
Difference Between 2 Sample Proportions
Sample variance
Background
Sample Mean
Let X1, X2, ..., Xn be the observations in a random sample of size n
from a population, and compute the sample mean x̄. Repeat the
process over and over, each time drawing a new sample and calculating
its sample mean.
• Population parameters are fixed, unknown quantities, e.g., the
population mean µ and the variance σ².
• Sample averages vary from sample to sample because of sampling variability.
• In many situations, it is natural to assume that a random
variable X has a particular probability distribution.
• A listing or graph of all possible values of the sample mean
and how often they occur is called the sampling distribution
of the sample mean.
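The definition above can be made concrete by listing every possible sample from a very small population. The sketch below is only an illustration: the four-value population and the sample size n = 2 are hypothetical, and sampling is assumed to be with replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical population of four values (not from the lecture).
population = [2, 4, 6, 8]
n = 2  # sample size

# Every ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Count how often each value of the sample mean occurs.
mean_counts = Counter(Fraction(sum(s), n) for s in samples)

# Sampling distribution: each possible mean paired with its probability.
sampling_dist = {
    m: Fraction(c, len(samples)) for m, c in sorted(mean_counts.items())
}

for m, prob in sampling_dist.items():
    print(f"sample mean = {float(m):.1f}  probability = {prob}")

# The mean of the sampling distribution equals the population mean.
mean_of_means = sum(m * p for m, p in sampling_dist.items())
population_mean = Fraction(sum(population), len(population))
```

With only 16 possible samples the whole sampling distribution can be listed exactly. For realistic population sizes it is usually approximated by repeated random sampling instead.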
[Figure: two plots, one with a Frequency axis (0 to 140) and one with a Probability axis (0.0 to 0.8).]
Standard Error
Central Limit Theorem (CLT)
Background Examples
Sample Mean X̄
The Central Limit Theorem basically says that for non-normal data,
the distribution of the sample means has an approximate
normal distribution, no matter what the distribution of the
original data looks like, as long as the sample size is large
enough (usually at least 30) and all samples have the same size.
(Statistics For Dummies, 2nd Edition, Deborah J. Rumsey)
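A small simulation can illustrate this statement. The sketch below assumes an exponential population with mean 1 as the non-normal data, n = 30 for every sample, and skewness as a rough measure of departure from a bell shape. All of these choices are illustrative, not taken from the lecture.

```python
import random
import statistics

random.seed(231)  # arbitrary seed so the run is repeatable

def skewness(values):
    """Sample skewness: about 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

n = 30              # size of every sample
num_samples = 5000  # number of repeated samples

# Original data: exponential with mean 1, strongly right-skewed.
raw = [random.expovariate(1.0) for _ in range(num_samples)]

# Means of many samples, all of the same size n.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

skew_raw = skewness(raw)
skew_means = skewness(sample_means)
print(f"skewness of raw data:     {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
print(f"mean of sample means:     {statistics.fmean(sample_means):.3f}")
```

The sample means should be far less skewed than the raw data and centered near the population mean of 1, which is the behaviour the theorem describes.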
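\[ X \sim N(\mu, \sigma^2), \qquad Z = \frac{x - \mu}{\sigma} \sim N(0, 1) \]
\[ \bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \]
σ/√n is called the standard error of the mean.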
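As a worked illustration of the standard error and the standardized sample mean, the sketch below plugs in hypothetical numbers (µ = 60, σ = 2, n = 16, x̄ = 61). None of these values are given in the lecture, apart from µ = 60 appearing on a plot.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values chosen for illustration.
mu = 60.0      # population mean
sigma = 2.0    # population standard deviation
n = 16         # sample size
x_bar = 61.0   # observed sample mean

standard_error = sigma / sqrt(n)    # sigma / sqrt(n)
z = (x_bar - mu) / standard_error   # standardized sample mean

# P(sample mean > x_bar), read from the standard normal distribution.
prob_above = 1 - NormalDist().cdf(z)

print(f"SE = {standard_error}, z = {z}, P = {prob_above:.4f}")
```

Quadrupling n halves the standard error, so the same one-unit gap between x̄ and µ would give a z-score twice as large.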
Example
[Figure: four frequency histograms. Top row: heights (left) and sample means x̄ (right), each marked at µ = 60. Bottom row: X (left) and sample means x̄ (right), both on the range 0.0 to 1.0.]
Sample Total
\[ X_1 + X_2 + \cdots + X_n \sim N(n\mu,\; n\sigma^2) \]
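The mean nµ and variance nσ² of the sample total can be checked by simulation. This is a minimal sketch assuming Uniform(0, 1) observations (so µ = 1/2 and σ² = 1/12) and n = 50, both chosen only for illustration.

```python
import random
import statistics

random.seed(9)  # arbitrary seed for repeatability

n = 50       # observations per sample
reps = 4000  # number of simulated totals

# Uniform(0, 1) population: mu = 1/2, sigma^2 = 1/12.
mu, sigma_sq = 0.5, 1 / 12

totals = [sum(random.random() for _ in range(n)) for _ in range(reps)]

sim_mean = statistics.fmean(totals)
sim_var = statistics.variance(totals)

print(f"simulated mean {sim_mean:.2f}  vs  n*mu = {n * mu}")
print(f"simulated var  {sim_var:.2f}  vs  n*sigma^2 = {n * sigma_sq:.2f}")
```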
Normal Approximation to Binomial
[Figure: binomial probability distribution for n = 40, p = 0.2, plotted over x = 0 to 40.]
As n → ∞,
\[ X \approx N\big(np,\; np(1 - p)\big) \]
\[ Z = \frac{X - np}{\sqrt{np(1 - p)}} \approx N(0, 1) \]
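To see how close the approximation is, the sketch below compares an exact binomial probability with its normal approximation for n = 40 and p = 0.2, the values shown on the binomial plot. The cutoff k = 10 is hypothetical, and the continuity correction is an addition for illustration that the slides do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.2  # values shown on the lecture's binomial plot
k = 10          # hypothetical cutoff chosen for illustration

mean = n * p
sd = sqrt(n * p * (1 - p))

# Exact binomial probability P(X <= k).
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Normal approximation using Z = (X - np) / sqrt(np(1 - p)).
approx_plain = NormalDist().cdf((k - mean) / sd)

# Same approximation with a continuity correction of 0.5
# (an assumption added here; not part of the slide).
approx_corrected = NormalDist().cdf((k + 0.5 - mean) / sd)

print(f"exact              P(X <= {k}) = {exact:.4f}")
print(f"normal, plain      P(X <= {k}) = {approx_plain:.4f}")
print(f"normal, corrected  P(X <= {k}) = {approx_corrected:.4f}")
```

For these values the corrected approximation should land noticeably closer to the exact probability than the uncorrected one.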
Example
Sampling Distributions
Sample Mean
Standard Error
Example
Difference Between 2 Sample Means
Background
• Do males and females spend the same amount of time exercising, on
average?
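A question like this is usually studied through the sampling distribution of the difference between two sample means. The slide does not state it, but a standard result for independent samples is that X̄1 − X̄2 has mean µ1 − µ2 and variance σ1²/n1 + σ2²/n2. The sketch below checks this by simulation. The exercise-hour parameters and the normal populations are hypothetical.

```python
import random
import statistics

random.seed(2019)  # arbitrary seed for repeatability

# Hypothetical weekly exercise hours (illustrative values only).
mu_m, sigma_m, n_m = 5.0, 2.0, 40  # males
mu_f, sigma_f, n_f = 4.5, 1.5, 50  # females

def sample_mean(mu, sigma, n):
    """Mean of one random sample of size n from N(mu, sigma^2)."""
    return statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])

# Repeat the two-sample experiment many times and keep each difference.
diffs = [
    sample_mean(mu_m, sigma_m, n_m) - sample_mean(mu_f, sigma_f, n_f)
    for _ in range(4000)
]

theory_mean = mu_m - mu_f
theory_var = sigma_m**2 / n_m + sigma_f**2 / n_f
sim_mean = statistics.fmean(diffs)
sim_var = statistics.variance(diffs)

print(f"mean of differences: {sim_mean:.3f} (theory {theory_mean})")
print(f"variance:            {sim_var:.3f} (theory {theory_var:.3f})")
```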
Example
Sample Proportion
Background
Population Proportion
\[ E(X) = np \]
\[ \mathrm{Var}(X) = np(1 - p) \]
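Assuming X counts the successes in n independent trials with success probability p (a binomial count), these two formulas can be verified directly from the probability mass function. The values n = 40 and p = 0.2 below are illustrative.

```python
from math import comb, isclose

n, p = 40, 0.2  # illustrative values

# Binomial probabilities P(X = x) for x = 0, 1, ..., n.
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean and variance computed from the definition of expectation.
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

print(f"E(X)   = {mean:.4f}   np       = {n * p}")
print(f"Var(X) = {variance:.4f}   np(1-p)  = {n * p * (1 - p):.4f}")
```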
Sampling Distribution of p̂
As n → ∞,
\[ X \approx N\big(np,\; np(1 - p)\big) \]
\[ \hat{p} \approx N\!\left(p,\; \frac{p(1 - p)}{n}\right) \]
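The mean and variance of p̂ can be checked by simulating many samples and computing p̂ = X/n for each. This is a rough sketch with hypothetical values n = 100 and p = 0.3.

```python
import random
import statistics

random.seed(31)  # arbitrary seed for repeatability

n, p = 100, 0.3  # hypothetical sample size and population proportion
reps = 5000      # number of simulated samples

def sample_proportion(n, p):
    """Proportion of successes in one sample of n Bernoulli(p) trials."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n

p_hats = [sample_proportion(n, p) for _ in range(reps)]

sim_mean = statistics.fmean(p_hats)
sim_var = statistics.variance(p_hats)
theory_var = p * (1 - p) / n

print(f"mean of p-hat: {sim_mean:.4f} (theory {p})")
print(f"variance:      {sim_var:.5f} (theory {theory_var:.5f})")
```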
[Figures: three frequency histograms of the sample proportion p̂.]
Example
Difference Between 2 Sample Proportions
Background
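\[ \hat{p}_1 - \hat{p}_2 \sim N\!\left(p_1 - p_2,\; \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right) \]
The second argument is the variance. The standard error of p̂1 − p̂2 is its square root:
\[ \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \]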
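As a worked illustration, the sketch below computes the standard error of p̂1 − p̂2 and a normal-approximation probability. The proportions and sample sizes are hypothetical, and the two samples are assumed to be independent.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population proportions and sample sizes.
p1, n1 = 0.5, 100
p2, n2 = 0.4, 100

mean_diff = p1 - p2
var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
se_diff = sqrt(var_diff)  # standard error of p1_hat - p2_hat

# Approximate probability that sample 1 shows the larger proportion,
# i.e. P(p1_hat - p2_hat > 0).
prob_positive = 1 - NormalDist(mean_diff, se_diff).cdf(0.0)

print(f"SE = {se_diff:.4f}, P(difference > 0) = {prob_positive:.4f}")
```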
Example
Sample Variance
\[ \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{(n-1)} \]
Here n − 1 is the degrees of freedom (df).
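This result can be checked by simulation, assuming the samples come from a normal population (the usual condition for it to hold exactly). A chi-square variable with n − 1 degrees of freedom has mean n − 1 and variance 2(n − 1), so the simulated statistic should show roughly those values. The choices n = 5 and σ = 2 are hypothetical.

```python
import random
import statistics

random.seed(42)  # arbitrary seed for repeatability

n = 5             # sample size, so df = n - 1 = 4
mu, sigma = 0.0, 2.0  # hypothetical normal population
reps = 10000      # number of simulated samples

def scaled_variance():
    """(n - 1) * S^2 / sigma^2 for one normal sample of size n."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s_sq = statistics.variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s_sq / sigma**2

stats = [scaled_variance() for _ in range(reps)]

df = n - 1
sim_mean = statistics.fmean(stats)
sim_var = statistics.variance(stats)

print(f"mean     {sim_mean:.2f}  (chi-square mean = df = {df})")
print(f"variance {sim_var:.2f}  (chi-square variance = 2*df = {2 * df})")
```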
[Figure: histogram of (n − 1)S²/σ². Vertical axis: Frequency (0 to 3000). Horizontal axis: var(y) (0 to 7).]
Example