Lect9 Math231

The central limit theorem states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the shape of the population distribution. For a random variable X with mean μ and variance σ^2, the sampling distribution of the sample mean X̄ is approximately normal with mean μ and variance σ^2/n. Similarly, the sampling distribution of the sample total (X1 + X2 + ... + Xn) is approximately normal with mean nμ and variance nσ^2. The binomial distribution can also be approximated as normal for large n and values of p not close to 0 or 1.

Uploaded by

Qasim Rafi
Statistics

Sampling Distributions
Central Limit Theorem

Shaheena Bashir

FALL, 2019
Outline

Background

Central Limit Theorem (CLT)


Sample Mean X̄
Sample Total
Normal Approximation to Binomial

Sampling Distributions
Sample Mean
Difference Between 2 Sample Means
Sample Proportion
Difference Between 2 Sample Proportions
Sample variance

Background

Sample Mean
Let X1, X2, ..., Xn be observations of a random sample of size n
from a population. Find the mean of the sample, x̄. Repeat the
process over and over again, drawing a new sample and calculating
the associated sample mean each time.
- Population parameters are fixed unknown quantities, e.g.,
  population mean µ, variance σ², etc.
- Sample averages are variable due to sampling variability.
- In many situations, it is natural to assume that a random
  variable X has a particular probability distribution.
- A listing or graph of all possible values of the sample mean
  and how often they occur is called the sampling distribution
  of the sample mean.
Shape of a Sampling Distribution


As with any other distribution, a sampling distribution has its own
shape, center, and measure of variability.

[Figure: left panel, probability distribution of X, the outcome of a single die roll (values 1 to 6); right panel, "Sampling Distribution of Means", a frequency histogram of the average outcome of 10 rolls of a die.]
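The die-rolling picture can be reproduced with a small simulation. This is only a sketch: the number of repetitions (2000), the seed, and the function name `simulate_mean_of_rolls` are assumptions chosen for illustration, not values taken from the slides.

```python
import random
import statistics

def simulate_mean_of_rolls(num_samples=2000, rolls_per_sample=10, seed=0):
    """Return one sample mean per simulated sample of fair die rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    means = []
    for _ in range(num_samples):
        rolls = [rng.randint(1, 6) for _ in range(rolls_per_sample)]
        means.append(sum(rolls) / rolls_per_sample)
    return means

means = simulate_mean_of_rolls()
center = statistics.mean(means)    # should be near the population mean 3.5
spread = statistics.pstdev(means)  # should be near sigma / sqrt(10), about 0.54
print(f"center of sample means: {center:.3f}")
print(f"spread of sample means: {spread:.3f}")
```

A single roll is uniform on 1 to 6, yet the averages pile up around 3.5 with a much smaller spread than a single roll (about 1.71).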

Standard Error

The standard deviation of a statistic (an estimator of a population
parameter) shows the variability of the statistic around the
population parameter; it is also called the standard error of the
estimator. It refers to the precision of the estimator.

- The standard deviation of X̄, given by σ/√n, is also called the
  standard error of the mean.

Central Limit Theorem (CLT)

Background Examples

How to model the chance behavior of

- the electricity consumption in a city at any given time, which is
  the sum of the demands of a large number of individual consumers?
- the quantity of water in a reservoir, which may be thought of as
  the sum of a very large number of individual contributions?
- the error of measurement in a physical experiment, which is
  composed of many unobservable small errors that may be
  considered additive?

Sample Mean X̄

The Central Limit Theorem basically says that for non-normal data,
the distribution of the sample means has an approximate
normal distribution, no matter what the distribution of the
original data looks like, as long as the sample size is large
enough (usually at least 30) and all samples have the same size.
(Statistics For Dummies, 2nd Edition, Deborah J. Rumsey)

Finding Probabilities for X̄

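$X \sim N(\mu, \sigma^2) \implies Z = \frac{x - \mu}{\sigma} \sim N(0, 1)$

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \implies Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

$\sigma/\sqrt{n}$ is called the standard error of the mean.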

Example

The numerical population of grade point averages at a college has
mean 2.61 and standard deviation 0.5. If a random sample of size
100 is taken from the population, what is the probability that the
sample mean will be between 2.51 and 2.71?
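One way to work this example is to standardize with the standard error σ/√n = 0.5/√100 = 0.05, so the limits 2.51 and 2.71 sit at z = −2 and z = +2. The sketch below does the same arithmetic with Python's standard library; the helper name `prob_sample_mean_between` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < X-bar < high) using X-bar ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

p_gpa = prob_sample_mean_between(mu=2.61, sigma=0.5, n=100, low=2.51, high=2.71)
print(round(p_gpa, 4))  # P(-2 < Z < 2), roughly 0.95
```

The answer is the familiar "within two standard errors" probability, about 0.954.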

Normal Population & 100K Samples of size n = 30

[Figure: left panel, "Population of Heights", a frequency histogram of heights centered at µ = 60; right panel, "Histogram of x̄", a frequency histogram of the sample means, also centered at µ = 60.]

Notice the variability of the 2 distributions.


Uniform Population & 100K Samples of size n = 30

[Figure: left panel, "Population Distribution", a frequency histogram of X on 0.0 to 1.0; right panel, "Histogram of x̄", a frequency histogram of the sample means on the same 0.0 to 1.0 axis.]

Sample Total

Let X be a random variable with finite mean µ and finite variance
σ². Suppose you repeatedly draw independent samples of size n
from the distribution of X. Then, as n → ∞, the distribution of
the sample total X1 + X2 + · · · + Xn is approximately normal:

$(X_1 + X_2 + \cdots + X_n) \sim N(n\mu, n\sigma^2)$

while

$\frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n\sigma^2}} \sim N(0, 1)$

$\sqrt{n\sigma^2} = \sigma\sqrt{n}$ is called the standard error of the total.
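The mean nµ and standard error σ√n of the total can be checked by simulation. The sketch below is an illustration under assumed settings: a Uniform(0, 1) population (µ = 1/2, σ² = 1/12), n = 30, 5000 repetitions, and a fixed seed. None of these numbers come from the slide.

```python
import random
import statistics
from math import sqrt

def simulate_totals(n=30, num_samples=5000, seed=1):
    """Return the sample total of n Uniform(0, 1) draws, repeated num_samples times."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) for _ in range(num_samples)]

n = 30
mu, var = 0.5, 1 / 12  # mean and variance of Uniform(0, 1)
totals = simulate_totals(n=n)
total_mean = statistics.mean(totals)
total_sd = statistics.pstdev(totals)
print(f"simulated mean {total_mean:.3f} vs n*mu = {n * mu:.3f}")
print(f"simulated sd   {total_sd:.3f} vs sqrt(n*var) = {sqrt(n * var):.3f}")
```

The simulated mean and standard deviation should land close to 15 and about 1.58, the values the theorem gives for this population.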

Normal Approximation to Binomial

If X ∼ Bin(n, p)

[Figure: probability mass function of Bin(n = 40, p = 0.2), plotted for x = 0 to 40, with probabilities up to about 0.15.]

Normal Approximation to Binomial Cont’d

If n > 20 and 0.05 < p < 0.95:

$E(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$

As n → ∞,

$X \approx N(np, np(1 - p))$

$Z = \frac{X - np}{\sqrt{np(1 - p)}} \approx N(0, 1)$

Example

The reliability of an electric fuse is the probability that a fuse,
chosen at random from production, will function under its designed
conditions. A random sample of 1000 fuses was tested and 27
defectives were observed. Calculate the approximate probability of
observing 27 or more defectives. Assume that the fuse reliability is 0.98.
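A possible way to set this up: with reliability 0.98, the chance that a fuse is defective is 0.02, so the number of defectives X is Bin(1000, 0.02) with np = 20 and np(1 − p) = 19.6. The sketch below evaluates the normal approximation both as stated on the previous slide and with a continuity correction. The continuity correction is an addition here, not something the slides mention.

```python
from math import sqrt
from statistics import NormalDist

n = 1000
p_defective = 1 - 0.98                          # probability a fuse is defective
mean = n * p_defective                          # np = 20
sd = sqrt(n * p_defective * (1 - p_defective))  # sqrt(19.6)

approx = NormalDist(mean, sd)
p_plain = 1 - approx.cdf(27)        # P(X >= 27) without continuity correction
p_corrected = 1 - approx.cdf(26.5)  # with continuity correction (an assumption)
print(f"without correction: {p_plain:.3f}")
print(f"with correction:    {p_corrected:.3f}")
```

Both versions give a probability of roughly 6 to 7 percent, so 27 or more defectives would be fairly unusual if the reliability really were 0.98.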

Sampling Distributions

- Population parameters are fixed unknown quantities, e.g.,
  population mean µ, variance σ², population proportion p, etc.
- Sample statistics are random variables (due to sampling
  variability).
- A listing or graph of all possible values of the sample statistic
  and how often they occur is called the sampling distribution
  of that statistic, e.g., sampling distribution of sample means.

Sample Mean

If X1, X2, ..., Xn are observations of a random sample of size n
from a
- N(µ, σ²) population, then the sample mean $\bar{X} = \frac{1}{n}\sum_i X_i$ is
  normally distributed with mean µ and variance σ²/n, i.e., the
  probability distribution of the sample mean is X̄ ∼ N(µ, σ²/n)
- non-normal population, then the sampling distribution of X̄ is
  approximately normal with mean µ and variance σ²/n for
  large samples (CLT)

Example

An electronic device has a life length T which is exponentially
distributed with parameter λ = 1/1000; that is, its pdf is
$f(t) = \frac{1}{1000} e^{-t/1000}$ for t ≥ 0. The mean and variance of the life
length of the device are 10³ hours and 10⁶ hours² respectively.
Suppose that 100 such devices are tested, yielding observed values
T1, ..., T100. What is the probability that 950 < T̄ < 1100?
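A sketch of the calculation, taking the stated mean of 10³ hours and variance of 10⁶ hours² (so σ = 1000 hours) and applying the CLT with n = 100. The standard error is 1000/√100 = 100 hours, which puts the limits at z = −0.5 and z = 1. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_life = 1000.0  # mean of T in hours, as stated in the example
sd_life = 1000.0    # square root of the stated variance 10^6
n = 100

standard_error = sd_life / sqrt(n)  # 100 hours
sampling_dist = NormalDist(mean_life, standard_error)
p_life = sampling_dist.cdf(1100) - sampling_dist.cdf(950)
print(f"P(950 < T-bar < 1100) is approximately {p_life:.3f}")
```

Since the exponential distribution is strongly skewed, this is an approximation that leans on n = 100 being large.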

Difference Between 2 Sample Means

Background
- Do males and females spend the same amount of time, on
  average, exercising?
- Mean length of a part manufactured at plant A (µ1) compared
  to mean length of the same part manufactured at plant B (µ2).

It is common to compare two groups on the mean parameters of
the groups, µ1 and µ2.
Difference Between Two Sample Means


- If $X_{11}, X_{12}, \ldots, X_{1n_1}$ are observations of a random sample of
  size n1 from a N(µ1, σ1²) population, then the sample mean
  $\bar{X}_1 = \frac{1}{n_1}\sum_i X_{1i}$ satisfies

  $\bar{X}_1 \sim N(\mu_1, \sigma_1^2/n_1)$

- If $X_{21}, X_{22}, \ldots, X_{2n_2}$ are observations of a random sample of
  size n2 from a N(µ2, σ2²) population, then the sample mean
  $\bar{X}_2 = \frac{1}{n_2}\sum_i X_{2i}$ satisfies

  $\bar{X}_2 \sim N(\mu_2, \sigma_2^2/n_2)$

- Take the difference between all possible means x̄1 − x̄2.
- How does x̄1 − x̄2, the estimator of µ1 − µ2 (itself a random
  variable), behave?
Distribution of x̄1 − x̄2

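If samples from original populations were independent:

$E[\bar{x}_1 - \bar{x}_2] = E[\bar{x}_1] - E[\bar{x}_2] = \mu_1 - \mu_2$

$\mathrm{Var}[\bar{x}_1 - \bar{x}_2] = \mathrm{Var}[\bar{x}_1] + \mathrm{Var}[\bar{x}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$

The standard deviation of x̄1 − x̄2, given by $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, is also called
the standard error of the difference between 2 means.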
Distribution of x̄1 − x̄2 Cont’d

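If both original populations were independent normal:

$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$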

Example

A random sample of n1 = 20 observations is taken from a normal
population with mean 30. A random sample of n2 = 25
observations is taken from a different normal population with
mean 27. Both populations have σ² = 8. What is the probability
that x̄1 − x̄2 exceeds 5?
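A sketch of the computation: the difference x̄1 − x̄2 has mean 30 − 27 = 3 and variance 8/20 + 8/25 = 0.72, so its standard error is √0.72 ≈ 0.85. Variable names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu1, mu2 = 30.0, 27.0
var1 = var2 = 8.0  # both populations have variance 8
n1, n2 = 20, 25

se_diff_means = sqrt(var1 / n1 + var2 / n2)  # sqrt(0.72)
diff_dist = NormalDist(mu1 - mu2, se_diff_means)
p_exceeds_5 = 1 - diff_dist.cdf(5.0)
print(f"standard error: {se_diff_means:.4f}")
print(f"P(x1-bar - x2-bar > 5) = {p_exceeds_5:.4f}")
```

The value 5 lies about 2.36 standard errors above the mean difference of 3, so the probability is small, a little under 1 percent.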

Sample Proportion

Background

- What proportion of a manufactured good is defective?
- What proportion of the U.S. is Republican?
- What proportion of students entering college successfully
  complete a degree?

Population Proportion

A binomial random variable x is commonly used in practical examples
like consumer preference or opinion polls. Here x ∼ Bin(n, p).

$E(x) = np$
$\mathrm{Var}(x) = np(1 - p)$

We use a random sample of n people to estimate the proportion p of
people in the population who have a specified characteristic. If x of
the sampled people have the characteristic, then the sample
proportion is $\hat{p} = x/n$.

Sampling Distribution of p̂

The distribution of the values of the sample proportions p̂ in
repeated samples (of the same size) is called the sampling
distribution of p̂.
The sampling distribution of p̂ is identical to the probability
distribution of x, except that it is rescaled along the x-axis, i.e.,

$E\left(\frac{x}{n}\right) = \frac{np}{n} = p$

$\mathrm{Var}\left(\frac{x}{n}\right) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$

Normal Approximation to Binomial

As n → ∞,

$X \approx N(np, np(1 - p))$

$\hat{p} \approx N(p, p(1 - p)/n)$

- The approximation would be adequate if np > 5 and
  n(1 − p) > 5.
- The quantity $\sqrt{p(1 - p)/n}$ is called the standard error of
  the proportion p̂.

$z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1)$

[Figure: "Sampling Distribution of p̂ from 500 flips of 20 coins", a frequency histogram of p̂ over roughly 0.2 to 0.8.]

[Figure: "Sampling Distribution of p̂ from 500 flips of 30 coins", a frequency histogram of p̂ over roughly 0.2 to 0.8.]

[Figure: "Sampling Distribution of p̂ from 500 flips of 50 coins", a frequency histogram of p̂ over roughly 0.3 to 0.7.]

Example

A certain company's customer base is made up of 43% women and 57%
men. An aggressive marketing campaign results in an increase of
women customers to 46%, according to a sample survey of 50
customers. If the company hadn't run the campaign, how likely is
it that 46% of the sampled customers are women? Was the campaign worth it?
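One reading of this question, and it is only an interpretation, is to ask for P(p̂ ≥ 0.46) in a sample of 50 when the true proportion is still p = 0.43. The adequacy check passes (np = 21.5 and n(1 − p) = 28.5 both exceed 5), so the normal approximation for p̂ can be sketched as follows.

```python
from math import sqrt
from statistics import NormalDist

p = 0.43          # proportion of women customers without the campaign
n = 50            # survey sample size
observed = 0.46   # sample proportion reported after the campaign

se_proportion = sqrt(p * (1 - p) / n)  # standard error of p-hat
p_hat_dist = NormalDist(p, se_proportion)
p_at_least_observed = 1 - p_hat_dist.cdf(observed)
print(f"standard error: {se_proportion:.4f}")
print(f"P(p-hat >= 0.46) = {p_at_least_observed:.3f}")
```

Under this reading the probability is about one in three, so a sample proportion of 46% would not be surprising even without the campaign.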

Difference Between 2 Sample Proportions

Background

- 'Are men, or women, better doctors?' The researchers
  randomly asked this question in their community with the
  following results:
  - Of 150 female respondents surveyed, 14% said men are better
    doctors.
  - Of 120 male respondents surveyed, 30% said men are better
    doctors.

It is common to compare two groups on the proportion parameters
of the groups, p1 and p2.

Distribution of p̂1 − p̂2


If samples of sizes n1 and n2 from original populations were
independent and large:

$E[\hat{p}_1 - \hat{p}_2] = E[\hat{p}_1] - E[\hat{p}_2] = p_1 - p_2$

$\mathrm{Var}[\hat{p}_1 - \hat{p}_2] = \mathrm{Var}[\hat{p}_1] + \mathrm{Var}[\hat{p}_2] = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$

The standard deviation of p̂1 − p̂2, given by $\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$, is
also called the standard error of the difference between 2
proportions.
Distribution of p̂1 − p̂2 Cont’d

$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$

$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \sim N(0, 1)$
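The formulas for p̂1 − p̂2 can be wrapped in a small helper. This is a sketch only: the function name `diff_proportion_dist` and the numbers p1 = 0.5, p2 = 0.4, n1 = n2 = 100 are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_dist(p1, n1, p2, n2):
    """Approximate normal sampling distribution of p-hat1 - p-hat2."""
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return NormalDist(p1 - p2, standard_error)

# Hypothetical population proportions and sample sizes, for illustration only.
dist = diff_proportion_dist(p1=0.5, n1=100, p2=0.4, n2=100)
se_diff = dist.stdev             # sqrt(0.0025 + 0.0024) = 0.07
p_positive = 1 - dist.cdf(0.0)   # P(p-hat1 - p-hat2 > 0)
print(f"standard error: {se_diff:.3f}")
print(f"P(p-hat1 > p-hat2) = {p_positive:.3f}")
```

With these assumed values the true difference of 0.10 is about 1.43 standard errors above zero, so the sample difference comes out positive a bit more than 92% of the time.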

Example

An experiment was conducted to test the effect of a new drug on a
viral infection. The infection was induced in 100 mice and the mice
were split into 2 groups of size 50 each. The first group received
no treatment for the infection. The 2nd group was treated with the drug.
After 30 days, the proportions of survivors in the 2 groups were
found to be 0.30 and 0.60 respectively. What is the probability that
the difference between the 2 group proportions is larger than 30%?

Sample variance

Let X1, X2, ..., Xn be observations of a random sample of size n
from a N(µ, σ²) population. The sample variance of these n
observations is calculated as

$S^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2$

If we take multiple samples of size n from the population, the
sample variance will vary from one sample to another around the
population variance.

$\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{(n-1)}$

Here n − 1 is the degrees of freedom (df).
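The chi-square result can be checked numerically, since a χ² variable with df degrees of freedom has mean df and variance 2·df. The sketch below assumes n = 5, σ = 2, 5000 repetitions, and a fixed seed. All of these are illustrative choices, not values from the slides.

```python
import random
import statistics

def scaled_sample_variances(n=5, sigma=2.0, num_samples=5000, seed=2):
    """Return (n - 1) * S^2 / sigma^2 for repeated normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divides by n - 1
        values.append((n - 1) * s2 / sigma**2)
    return values

values = scaled_sample_variances()
sim_mean = statistics.mean(values)      # should be near df = n - 1 = 4
sim_var = statistics.pvariance(values)  # should be near 2 * df = 8
print(f"simulated mean: {sim_mean:.2f}")
print(f"simulated variance: {sim_var:.2f}")
```

A histogram of `values` would show the right-skewed shape typical of a chi-square distribution with few degrees of freedom.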

[Figure: "Histogram of (n−1)S²/σ²", a frequency histogram over roughly 0 to 7, x-axis labeled var(y).]

Example
