09.1 Sampling Distributions Lecture

The document discusses the concepts of populations and samples in statistics, emphasizing the importance of random sampling for making inferences about population parameters. It explains the sampling distribution of the mean, including the Central Limit Theorem, and provides examples illustrating how to calculate probabilities and make statistical inferences based on sample data. Additionally, it covers the t-distribution and its application when the population standard deviation is unknown, highlighting its convergence to the normal distribution as sample size increases.

Uploaded by ckranock
Copyright © All Rights Reserved

Section 9 – Sampling Distributions

ENGR 3311 – Engineering Math Methods


Instructor: Michael Weeks, PhD
Spring 2025
Populations & Samples
• A population constitutes a collection of objects, actual or
conceptual, such as sets of measurements or observations
under investigation.
• Populations are often described by the distribution of their
values, i.e., in terms of their corresponding probability
distribution or density function $f(x)$.
• A population may be finite or infinite.
• A sample is used in situations where it is impossible,
impractical, or uneconomical to observe all the values of
the population.
• Results obtained from a sample can then be used to draw
inferences about the entire population.
• A sample is only useful if it is representative of the
population.
• This is best achieved through the use of random samples.
• The purpose of most statistical investigations is to
generalize from the information contained in random
samples about the population from which the samples were
obtained.
• We use statistics, such as $\bar{x}$ (or other quantities calculated
from the sample observations), to make inferences about
the parameters of the population, such as $\mu$.

Populations & Samples

• A set of observations $x_1, x_2, \dots, x_n$ constitutes a random
sample of size $n$ from a finite population of size $N$ if its
values are chosen so that each subset of $n$ of the $N$ elements
of the population has the same probability of being
selected.
• Observations are made independently and at random.

• A set of observations $X_1, X_2, \dots, X_n$ constitutes a random
sample of size $n$ from the infinite population described by a
probability density or distribution $f(x)$ if:
• Each $X_i$ is a random variable whose distribution is given by
$f(x)$.
• These $n$ random variables are independent, i.e.,

$$f(x_1, x_2, \dots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n)$$


• The selection of a sample that is at least approximately
random can be accomplished by:
• Numbering the elements of the population (e.g., by serial
number) and using a random number generator to choose
which elements to include.
• Using artificial devices, e.g., selecting 1 unit every hour
from a production line for testing (this avoids unconscious
biases).

• Inferences about the characteristics of the population that are
systematically incorrect are said to be biased. Bias can be due to:
• Selecting the most convenient members of the
population.
• Poor choice of artificial devices.
• Sampling from the wrong population.
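The first selection method above can be sketched in Python. This is a minimal illustration, assuming the population units carry serial numbers 1 to N; the values N = 500, n = 10 and the seed are hypothetical. The standard-library call `random.Random.sample` draws n distinct units without replacement, so every subset of size n is equally likely, as the finite-population definition requires.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Select n distinct serial numbers from 1..population_size so that
    every subset of size n is equally likely to be chosen."""
    if not 0 <= n <= population_size:
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # seeded generator, for reproducibility
    # Units are identified by hypothetical serial numbers 1..N.
    serial_numbers = range(1, population_size + 1)
    return sorted(rng.sample(serial_numbers, n))

# Hypothetical example: choose 10 of 500 serially numbered units for inspection.
chosen = simple_random_sample(500, 10, seed=1)
print(chosen)
```

Fixing the seed makes the selection reproducible for auditing; omitting it gives a fresh random sample on each run.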

Populations & Samples (Example 1)

• Consider the problem of sampling a block of land for
contaminants in the soil from pollution.
• One approach for homogeneous land areas is to sample at
random over the entire area, as shown in (a), with the
sampling sites indicated by circles.
• If a smelter is known to be located in the area shaded in
blue, then the soil contamination is not homogeneous.
• Using the average from the whole area as a summary
description downplays the seriousness of the
contamination around the smelter.
• The smelter runoff area should be treated as a separate
population, and samples should be taken around it, i.e., in
the “hot spot”.
• Owners of the smelter would favor having samples taken
across the entire block to lessen the apparent severity of the
contamination!

Sampling Distribution of the Mean

• For each random sample of $n$ observations taken from
some population, the computed mean $\bar{x}$ would generally be
different.
• Typically, no two of them would be exactly alike.

• If $X_1, X_2, \dots, X_n$ is a random sample of size $n$ from a
population having mean $\mu$ and variance $\sigma^2$, then
the sample mean $\bar{X}$ is given as follows.

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

• Since the value of the sample mean $\bar{X}$ is determined by the
values of the random variables in the sample, it follows
that $\bar{X}$ is also a random variable.

• The probability distribution of a statistic such as $\bar{X}$ is called a
sampling distribution.

Sampling Distribution (Example 2)

• Suppose that 50 random samples of size $n = 10$ are to be
taken from a population having the discrete uniform
distribution

$$f(x) = \begin{cases} \dfrac{1}{10} & \text{for } x = 0, 1, 2, \dots, 9 \\ 0 & \text{elsewhere} \end{cases}$$

• Assuming that sampling is with replacement, so that we are
effectively sampling from an infinite population, the means of
the 50 samples are as follows.

4.4 3.2 5.0 3.5 4.1 4.4 3.6 6.5 5.3 4.4
3.1 5.3 3.8 4.3 3.3 5.0 4.9 4.8 3.1 5.3
3.0 3.0 4.6 5.8 4.6 4.0 3.7 5.2 3.7 3.8
5.3 5.5 4.8 6.4 4.9 6.5 3.5 4.5 4.9 5.3
3.6 2.7 4.0 5.0 2.6 4.2 4.4 5.6 4.7 4.3

• The mean and the variance of these 50 sample means are
4.43 and 0.93, respectively.


Sampling Distribution (Example 2)

• The population from which the 50 samples of size $n = 10$
were obtained has the following mean and variance.

$$\mu = \sum_{x=0}^{9} x \cdot f(x) = 4.5 \qquad \sigma^2 = \sum_{x=0}^{9} (x - 4.5)^2 f(x) = 8.25$$

• This leads us to expect the sample means to have a mean of
$\mu_{\bar{x}} = 4.5$ and a variance of $\sigma^2_{\bar{x}} = 8.25/10 = 0.825$
(theoretical: this is what we expect).

                50 samples (observed)    Sampling distribution (theoretical)
Mean            4.43                     $\mu_{\bar{x}} = 4.5$
Variance        0.93                     $\sigma^2_{\bar{x}} = 0.825$
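A short Python check of the comparison above, using the 50 sample means listed in Example 2. The theoretical values are computed exactly with `fractions.Fraction`; using the n − 1 divisor (`statistics.variance`) for the observed variance is our assumption, chosen because it reproduces the quoted 0.93.

```python
import statistics
from fractions import Fraction

# The 50 sample means listed in Example 2 (each from a sample of n = 10).
sample_means = [
    4.4, 3.2, 5.0, 3.5, 4.1, 4.4, 3.6, 6.5, 5.3, 4.4,
    3.1, 5.3, 3.8, 4.3, 3.3, 5.0, 4.9, 4.8, 3.1, 5.3,
    3.0, 3.0, 4.6, 5.8, 4.6, 4.0, 3.7, 5.2, 3.7, 3.8,
    5.3, 5.5, 4.8, 6.4, 4.9, 6.5, 3.5, 4.5, 4.9, 5.3,
    3.6, 2.7, 4.0, 5.0, 2.6, 4.2, 4.4, 5.6, 4.7, 4.3,
]

# Observed: mean and (n - 1)-divisor variance of the 50 sample means.
observed_mean = statistics.mean(sample_means)
observed_var = statistics.variance(sample_means)

# Theoretical: discrete uniform f(x) = 1/10 on x = 0, ..., 9.
f = Fraction(1, 10)
mu = sum(x * f for x in range(10))                  # population mean
sigma2 = sum((x - mu) ** 2 * f for x in range(10))  # population variance
n = 10
theoretical_var_of_mean = sigma2 / n                # sigma^2 / n

print(f"observed:    mean = {observed_mean:.2f}, variance = {observed_var:.2f}")
print(f"theoretical: mean = {float(mu):.2f}, variance = {float(theoretical_var_of_mean):.3f}")
```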

Sampling Distribution of the Mean
For a random sample of size $n$ taken from a population with
mean $\mu$ and variance $\sigma^2$:
• The mean $\mu_{\bar{x}}$ and the variance $\sigma^2_{\bar{x}}$ of the theoretical
sampling distribution of the mean $\bar{X}$ are given as follows.

$$\mu_{\bar{x}} = \mu$$
$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \quad \text{(infinite population)}$$
$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \quad \text{(finite population of size } N\text{)}$$

• The expected value of the sample mean is the population
mean $\mu$, whereas (for an infinite population) its variance is $1/n$
times the population variance.
• The spread of the sampling distribution decreases with
increasing $n$.
• The reliability of the mean $\bar{x}$ as an estimate of $\mu$ is often
measured by $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
• This is called the standard error of the mean.
• These results give only partial information about the theoretical
sampling distribution of the mean (its mean and variance), and $\sigma$
must be known.
• In general, it is impossible to determine such a distribution
exactly without knowledge of the actual form of the
population (i.e., its distribution function).
• If sampling from a population with an unknown
distribution, it is still possible to find the limiting
distribution as $n \to \infty$ of a random variable $Z$ that is closely
related to $\bar{X}$, assuming that the variance $\sigma^2$ of the population
is known.
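A minimal sketch of the variance formulas above as a Python helper. The function name `standard_error` and its `N=None` convention for an infinite population are our own choices; the finite-population values in the second usage line (σ = 10, n = 25, N = 100) are hypothetical, while σ = 40, n = 36 come from Example 3.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard deviation of the sampling distribution of the mean.

    sigma: population standard deviation (assumed known)
    n:     sample size
    N:     population size when sampling without replacement from a finite
           population; None means an infinite population.
    """
    variance = sigma ** 2 / n
    if N is not None:
        # Finite population correction factor (N - n) / (N - 1).
        variance *= (N - n) / (N - 1)
    return math.sqrt(variance)

print(standard_error(40, 36))          # infinite population: 40 / 6
print(standard_error(10, 25, N=100))   # finite population of size 100
```

The correction factor $(N-n)/(N-1)$ never exceeds 1, so sampling without replacement from a finite population gives a standard error no larger than the infinite-population formula.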
Sampling Distribution of the Mean

• If $\bar{X}$ is the mean of a random sample of size $n$ taken from a
population with mean $\mu$ and finite variance $\sigma^2$, then the
random variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is known as the standardized sample mean.
• It is the difference between $\bar{X}$ and $\mu$ divided by the
standard error of the mean.
• The distribution function of $Z$ approaches that of the
standard normal distribution as $n \to \infty$.
• This is known as the Central Limit Theorem.
• The distribution of $\bar{X}$ is approximately normal with mean
$\mu$ and variance $\sigma^2/n$ whenever $n$ is large (a common rule of
thumb is $n \ge 30$), regardless of the form of the population distribution.
• The approximation can also be applied to smaller sample
sizes if the population closely follows a normal distribution.
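The Central Limit Theorem can be illustrated by simulation. This Python sketch assumes an exponential population with mean 1 (chosen only because it is strongly skewed; it is not from the source), draws 20,000 samples of size n = 30, and standardizes each sample mean. If the normal approximation is adequate, the Z values should have mean near 0, standard deviation near 1, and about 95% of them should fall within (−1.96, 1.96).

```python
import math
import random
import statistics

def standardized_means(n, reps, seed=0):
    """Simulate Z = (Xbar - mu) / (sigma / sqrt(n)) for `reps` samples of size n
    drawn from an exponential population with rate 1 (so mu = sigma = 1)."""
    rng = random.Random(seed)
    mu = sigma = 1.0
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = standardized_means(n=30, reps=20_000)
coverage = sum(-1.96 < z < 1.96 for z in zs) / len(zs)  # standard normal: about 0.95
print(f"mean of Z = {statistics.fmean(zs):.3f}, sd of Z = {statistics.stdev(zs):.3f}")
print(f"fraction of Z within (-1.96, 1.96) = {coverage:.3f}")
```

Raising `n`, or switching to a less skewed population, should bring the simulated results closer to the standard normal values.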

Sampling Distribution of the Mean

• The tendency toward normality is exhibited whenever $n$ is
large, regardless of the form of the population
distribution $f(x)$.

• In practice, the normal distribution provides an excellent
approximation to the sampling distribution of the mean $\bar{X}$
for $n$ as small as 25 or 30, with hardly any restrictions on the
shape of the population.

• When the random samples come from a normal population,
the sampling distribution of the mean is normal regardless of
the size of $n$.

The accompanying figure is the normal scores plot of the 50
sample means from Example 2, for $n = 10$.
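Since the plot itself is not reproduced here, the following Python sketch shows how a normal scores plot of the Example 2 means can be built: the sorted means are paired with approximate expected standard normal order statistics. Blom's plotting positions (i − 3/8)/(m + 1/4) are an assumption on our part (other conventions exist), and the correlation `r` is used only as a simple numerical summary of how straight the plot is, not as a formal test.

```python
from statistics import NormalDist, fmean

# The 50 sample means from Example 2.
sample_means = [
    4.4, 3.2, 5.0, 3.5, 4.1, 4.4, 3.6, 6.5, 5.3, 4.4,
    3.1, 5.3, 3.8, 4.3, 3.3, 5.0, 4.9, 4.8, 3.1, 5.3,
    3.0, 3.0, 4.6, 5.8, 4.6, 4.0, 3.7, 5.2, 3.7, 3.8,
    5.3, 5.5, 4.8, 6.4, 4.9, 6.5, 3.5, 4.5, 4.9, 5.3,
    3.6, 2.7, 4.0, 5.0, 2.6, 4.2, 4.4, 5.6, 4.7, 4.3,
]

def normal_scores(m):
    """Approximate expected standard normal order statistics for a sample
    of size m, using Blom's plotting positions (i - 3/8) / (m + 1/4)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (m + 0.25)) for i in range(1, m + 1)]

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

ordered = sorted(sample_means)
scores = normal_scores(len(ordered))
# Plotting the pairs (scores[i], ordered[i]) gives the normal scores plot;
# points close to a straight line (r near 1) suggest approximate normality.
r = pearson_r(scores, ordered)
print(f"r = {r:.3f}")
```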

Sampling Distribution (Example 3)

An electrical firm manufactures light bulbs that have an
average lifespan of 800 hours and a standard deviation of
40 hours. Find the probability that a random sample of 36
bulbs will have an average life of less than 790 hours.
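Example 3 as a Python sketch using `statistics.NormalDist`, assuming the normal approximation to the sampling distribution of the mean is adequate because n = 36 is large:

```python
import math
from statistics import NormalDist

mu, sigma, n = 800, 40, 36   # mean (hours), standard deviation (hours), bulbs
se = sigma / math.sqrt(n)    # standard error of the mean = 40 / 6
z = (790 - mu) / se          # standardized sample mean
p = NormalDist().cdf(z)      # P(Xbar < 790) under the normal approximation
print(f"z = {z:.2f}, P(Xbar < 790) = {p:.4f}")
```

The result is z = −1.5 and a probability of about 0.0668.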

Sampling Distribution (Example 4)

The weights of a population of workers have a mean of 167
and a standard deviation of 27.
a) If a sample of 36 workers is chosen, what is the
probability that the sample mean of their weights lies
between 163 and 170?
b) Repeat (a) when the sample is of size 144.
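Example 4 can be checked the same way. The helper name `prob_mean_between` is hypothetical, and the normal approximation is assumed for both sample sizes:

```python
import math
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < Xbar < hi) using the normal approximation to the sampling
    distribution of the mean (mean mu, standard error sigma / sqrt(n))."""
    xbar_dist = NormalDist(mu, sigma / math.sqrt(n))
    return xbar_dist.cdf(hi) - xbar_dist.cdf(lo)

p36 = prob_mean_between(163, 170, mu=167, sigma=27, n=36)
p144 = prob_mean_between(163, 170, mu=167, sigma=27, n=144)
print(f"n = 36:  {p36:.4f}")
print(f"n = 144: {p144:.4f}")
```

Quadrupling the sample size halves the standard error (from 4.5 to 2.25), which is why the probability rises from roughly 0.56 to roughly 0.87.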

REVIEW: Sampling Distribution
• In general, a statistic is calculated from a sample selected
from the population.
• These results are then used to make inferences concerning the
values of the population parameters.
• The values of these statistics are expected to fluctuate from
sample to sample.

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

• Since a statistic is a random variable that depends only on
the observed sample, it must have a probability distribution.

The probability distribution of a statistic is called a sampling
distribution.

• The probability distribution of $\bar{X}$ is called the sampling
distribution of the mean.
• The sampling distribution of a statistic will depend on:
• the size of the population,
• the size of the samples,
• and the method of choosing the samples.
• Sampling distributions of important statistics allow us to
learn about population parameters (see the next chapter,
“Inferences Concerning a Mean”, on hypothesis testing).

• If sampling from a population with an unknown distribution,
either finite or infinite, the sampling distribution of $\bar{X}$ will
still be approximately normal with mean $\mu$ and variance
$\sigma^2/n$, provided that the sample size $n$ is large.

REVIEW: Sampling Distribution
Central Limit Theorem:

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a
population with mean $\mu$ and finite variance $\sigma^2$, then the limiting
form of the distribution of the random variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

as $n \to \infty$ is the standard normal distribution.

• The normal approximation for $\bar{X}$ will generally be good if:
• $n \ge 30$, regardless of the shape of the population
distribution;
• $n < 30$, if the population distribution is not too different
from a normal distribution;
• $n$ is of any size (no matter how small), if the population is
known to be normally distributed.

Sampling Distribution (Example 5)

The breaking strength $X$ of a certain rivet used in a
machine engine has a mean of 5000 psi and a standard
deviation of 400 psi. A random sample of 36 rivets is
taken. Consider the distribution of $\bar{X}$, the sample mean
breaking strength.

a) What is the probability that the sample mean falls
between 4900 psi and 5100 psi?

b) What sample size $n$ would be necessary in order to have
$P(4900 < \bar{X} < 5100) = 0.99$?
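A Python sketch of both parts of Example 5, assuming the normal approximation. Part (b) solves $100/(\sigma/\sqrt{n}) \ge z_{0.005}$ for $n$ and rounds up to a whole number of rivets:

```python
import math
from statistics import NormalDist

mu, sigma, n = 5000, 400, 36                      # psi, psi, rivets
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))  # approximate distribution of Xbar
p_a = xbar_dist.cdf(5100) - xbar_dist.cdf(4900)   # part (a)

# Part (b): a central probability of 0.99 needs 100 / (sigma / sqrt(n)) >= z_{0.005}.
z = NormalDist().inv_cdf(0.995)
n_required = math.ceil((z * sigma / 100) ** 2)
print(f"(a) P = {p_a:.4f}")
print(f"(b) n = {n_required}")
```

This gives a probability of about 0.866 for part (a) and n = 107 for part (b).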

Sampling Distribution of the Mean

• The previous analysis required knowledge of the
population standard deviation $\sigma$.
• If $n$ is large, it does not matter much whether $\sigma$ is known: it is
reasonable to substitute the sample standard deviation $s$ for $\sigma$,
where

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and to treat

$$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

as approximately standard normal.

• For small values of $n$, the values of $s^2$ fluctuate
considerably from sample to sample.
• Without further assumptions about the population, not much is
known about the sampling distribution of this statistic for small $n$.
• If $\bar{X}$ is the mean of a random sample of size $n$ taken from a
normal population with mean $\mu$ and finite variance $\sigma^2$,
then

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

is a random variable having the t distribution with the
parameter $\nu = n - 1$.

• This theorem is more general in that it does not require
knowledge of the population $\sigma$.
• However, it requires the assumption of a normal population,
which is essential when $n$ is small.
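The two formulas above, $s^2$ with the $n - 1$ divisor and the t statistic, translate directly into Python. The function names are hypothetical; the demonstration reuses the battery lifetimes from Example 9 with a hypothesized mean of 3 years purely as input data.

```python
import math
import statistics

def sample_variance(data):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.fmean(data)
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def t_statistic(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)); it has a t distribution with
    nu = n - 1 degrees of freedom if the population is normal with mean mu0."""
    n = len(data)
    s = math.sqrt(sample_variance(data))
    return (statistics.fmean(data) - mu0) / (s / math.sqrt(n))

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]   # battery lifetimes (years) from Example 9
print(sample_variance(lifetimes))        # should agree with statistics.variance
print(t_statistic(lifetimes, mu0=3.0))
```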

Sampling Distribution of the Mean

• The t distribution is similar in shape to the normal distribution
(symmetric and bell-shaped) and has a mean of 0, with a variance
that depends on the parameter $\nu$ (the number of degrees of
freedom), and hence on the sample size; for $\nu > 2$ the variance
is $\nu/(\nu - 2)$.
• The variance of the t distribution approaches 1 as $\nu \to \infty$.
• As $\nu \to \infty$, the t distribution with $\nu$ degrees of freedom
approaches the standard normal distribution.

• A table can be used to determine the values of $t_\alpha$ for
various values of $\nu$, such that

$$\alpha = P(T \ge t_\alpha)$$

• The table contains selected values of $t_\alpha$ for various values of
$\nu$, where $t_\alpha$ is such that the area under the t distribution to
its right is equal to $\alpha$.
• Note that $t_{1-\alpha} = -t_\alpha$.
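Python's standard library has no t distribution, so this self-contained sketch (an illustrative substitute for the printed table, not the table's own method) integrates the t density with Simpson's rule and bisects for $t_\alpha$. The function names, the 2000 integration steps and the 45 bisection steps are our own assumptions.

```python
import math

def t_upper_tail(t, nu, steps=2000):
    """P(T >= t) for t >= 0, where T has a t distribution with nu degrees of
    freedom: 0.5 minus the area under the density from 0 to t (Simpson's rule)."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def density(x):
        return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

    h = t / steps                      # steps must be even for Simpson's rule
    area = density(0.0) + density(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    return 0.5 - area * h / 3

def t_critical(alpha, nu):
    """Find t_alpha with P(T >= t_alpha) = alpha (0 < alpha < 0.5) by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > alpha:
            lo = mid                   # tail area too large: t_alpha lies further right
        else:
            hi = mid
    return (lo + hi) / 2

for nu in (11, 14, 15):
    print(f"nu = {nu}: t_0.025 = {t_critical(0.025, nu):.3f}")
```

For ν = 11, 14 and 15 the results should agree with the tabulated $t_{0.025}$ values (2.201, 2.145 and 2.131) to about three decimals; by symmetry, $t_{1-\alpha}$ is simply the negative.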

Sampling Distribution of the Mean

• Exactly 95% of the values of a t distribution with $n - 1$
degrees of freedom lie between $-t_{0.025}$ and $t_{0.025}$.
• There are other t-values that contain 95% of the
distribution, such as $-t_{0.02}$ and $t_{0.03}$, but these values
don’t appear in the table.

• A t-value that falls below $-t_{0.025}$ or above $t_{0.025}$ would
tend to make us believe either that a very rare event has
taken place or that our assumption about $\mu$ is in error.
• A t-value that falls below $-t_{0.01}$ or above $t_{0.01}$
would provide even stronger evidence that the
assumed value of $\mu$ is quite unlikely.

Sampling Distribution (Example 6)

A treatment plant that sends effluent into the river claims
that the mean concentration of suspended solids is $\mu = 40$ mg/l.
Measurements of the suspended solids in river water on $n = 12$
Monday mornings yield $\bar{x} = 46$ and $s = 9.4$ mg/l.
If it can be assumed that the data constitute a random
sample from a normal population, do they tend to support or
refute the plant’s claim about the mean? (Assume 95%
confidence.)
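A Python sketch of Example 6, assuming normality as stated. The critical value $t_{0.025} = 2.201$ for ν = 11 is taken from a standard t table rather than computed:

```python
import math

mu0, n, xbar, s = 40, 12, 46, 9.4      # claimed mean, sample size, sample mean, sample sd (mg/l)
t = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 2.201                          # t_{0.025} with nu = n - 1 = 11, from a t table
supports_claim = -t_crit < t < t_crit
print(f"t = {t:.3f}; data support the claim: {supports_claim}")
```

Here t ≈ 2.21, just beyond 2.201, so the data tend to refute the claim, although only marginally.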

Sampling Distribution (Example 7)

A chemical engineer claims that the population mean yield of
a certain batch process is 500 grams per milliliter of raw
material. To check this claim he samples 15 batches each
month. If the computed t-value falls between $-t_{0.025}$ and
$t_{0.025}$, he is satisfied with this claim. What conclusion should
he draw from a sample that has a mean $\bar{x} = 518$ grams per
milliliter and a sample standard deviation $s = 40$ grams?
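Example 7 follows the same pattern. The critical value $t_{0.025} = 2.145$ for ν = 14 is read from a t table, and a normal population is assumed:

```python
import math

mu0, n, xbar, s = 500, 15, 518, 40     # claimed mean, batches, sample mean, sample sd
t = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 2.145                          # t_{0.025} with nu = n - 1 = 14, from a t table
satisfied = -t_crit < t < t_crit
print(f"t = {t:.3f}; satisfied with claim: {satisfied}")
```

With t ≈ 1.74, inside (−2.145, 2.145), the engineer should be satisfied with the claim.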

Sampling Distribution (Example 8)

A manufacturing firm claims that the batteries used in their
cell phones will last an average of 30 hours. To maintain this
average, 16 batteries are tested each month. If the computed
t-value falls between $-t_{0.025}$ and $t_{0.025}$, the firm is satisfied
with its claim.

What conclusion should the firm draw from a sample that has
a mean of $\bar{x} = 27.5$ hours and a standard deviation of $s = 5$
hours?
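For Example 8, this sketch assumes battery lifetimes are approximately normal (an assumption the problem statement leaves implicit) and uses the tabulated $t_{0.025} = 2.131$ for ν = 15:

```python
import math

mu0, n, xbar, s = 30, 16, 27.5, 5      # claimed mean, batteries, sample mean, sample sd (hours)
t = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 2.131                          # t_{0.025} with nu = n - 1 = 15, from a t table
satisfied = -t_crit < t < t_crit
print(f"t = {t:.3f}; satisfied with claim: {satisfied}")
```

Since t = −2.0 lies inside (−2.131, 2.131), the firm should be satisfied with its claim.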

Sampling Distribution of the Variance

• This section deals with the sampling distribution of the sample
variance $S^2$ for random samples from normal populations.
• Note that $S^2$ cannot be negative. Hence, its sampling
distribution cannot be exactly normal.
• If $S^2$ is the variance of a random sample of size $n$ taken
from a normal population having the variance $\sigma^2$, then

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$$

is a random variable having the chi-squared distribution
with $\nu = n - 1$ degrees of freedom.
• A table containing selected values of $\chi^2_\alpha$ for various values of
$\nu$ can be used to obtain the following results.

$$\alpha = P(\chi^2 \ge \chi^2_\alpha)$$

• The probability that a random sample produces a $\chi^2$ value
greater than some specified value is equal to the area under
the curve to the right of this value.
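The theorem can be illustrated by simulation. This Python sketch assumes a normal population with σ = 2 and samples of size n = 5 (both hypothetical). A chi-squared variable with ν degrees of freedom has mean ν and variance 2ν, so with ν = 4 the simulated values should have mean near 4 and variance near 8, and none should be negative.

```python
import random
import statistics

def simulate_chi_square(n, sigma, reps, seed=0):
    """Simulate (n - 1) * S^2 / sigma^2 for samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)   # equals (n - 1) * S^2
        values.append(ss / sigma ** 2)
    return values

vals = simulate_chi_square(n=5, sigma=2.0, reps=20_000)
print(f"mean = {statistics.fmean(vals):.2f}, variance = {statistics.variance(vals):.2f}")
print(f"smallest value = {min(vals):.4f}")
```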

Sampling Distribution of the Variance

$$\alpha = P(\chi^2 \ge \chi^2_\alpha)$$

Sampling Distribution of the Variance

• Exactly 95% of a chi-squared distribution lies between
$\chi^2_{0.975}$ and $\chi^2_{0.025}$.
• A $\chi^2$ value falling to the right of $\chi^2_{0.025}$ is not likely to occur
unless our assumed value of $\sigma^2$ is too small.
• Similarly, a $\chi^2$ value falling to the left of $\chi^2_{0.975}$ is unlikely
unless our assumed value of $\sigma^2$ is too large.

Sampling Distribution (Example 9)

A manufacturer of car batteries guarantees that the
batteries will last, on average, 3 years with a standard
deviation of 1 year. If five of these batteries have lifetimes of
1.9, 2.4, 3.0, 3.5, and 4.2 years, should the manufacturer still
be convinced that the batteries have a standard deviation of
1 year? Assume that the battery lifetime follows a normal
distribution.
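Example 9 in Python, with the tabulated points $\chi^2_{0.975} = 0.484$ and $\chi^2_{0.025} = 11.143$ for ν = 4 hard-coded from a chi-squared table:

```python
import statistics

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]  # years
sigma0 = 1.0                           # claimed standard deviation (years)
n = len(lifetimes)
s2 = statistics.variance(lifetimes)    # sample variance with the n - 1 divisor
chi2 = (n - 1) * s2 / sigma0 ** 2

chi2_975, chi2_025 = 0.484, 11.143     # table values for nu = n - 1 = 4
consistent = chi2_975 < chi2 < chi2_025
print(f"s^2 = {s2:.3f}, chi^2 = {chi2:.2f}; consistent with sigma = 1: {consistent}")
```

Here $s^2 = 0.815$ and $\chi^2 = 3.26$, well inside (0.484, 11.143), so there is no reason to doubt σ = 1 year.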

Sampling Distribution (Example 10)

Plastic sheeting produced by a machine is periodically monitored
for possible fluctuations in thickness. Based on experience, when the
machine is working well, an observation on thickness has a normal
distribution with standard deviation $\sigma = 1.3$ mm.
Samples of 20 thickness measurements are collected regularly. One
sample recorded a standard deviation exceeding 1.72 mm. Will this
sample signal concern about the product?
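A sketch for Example 10, taking s = 1.72 mm as the sample standard deviation. Because only increased variability signals trouble, this compares against the upper point $\chi^2_{0.025} = 32.852$ for ν = 19, hard-coded from a table; the choice of the 0.025 level is our assumption, matching the 95% convention used above.

```python
n, sigma0, s = 20, 1.3, 1.72            # sample size, in-control sd (mm), sample sd (mm)
chi2 = (n - 1) * s ** 2 / sigma0 ** 2   # chi-squared statistic with nu = n - 1 = 19
chi2_upper = 32.852                     # chi^2_{0.025} with nu = 19, from a table
concern = chi2 > chi2_upper
print(f"chi^2 = {chi2:.2f}; signals concern: {concern}")
```

The statistic is about 33.26, above 32.852, so the sample would signal concern.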

Sampling Distribution (Example 11)

The scores on a placement test given to college freshmen for
the past five years are approximately normally distributed
with a mean $\mu = 74$ and a variance $\sigma^2 = 8$. Would you still
consider $\sigma^2 = 8$ to be a valid value of the variance if a
random sample of 18 students who take the placement test
this year obtains a value of $s^2 = 14$?
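Example 11 in Python, with $\chi^2_{0.975} = 7.564$ and $\chi^2_{0.025} = 30.191$ for ν = 17 taken from a chi-squared table:

```python
n, sigma2_0, s2 = 18, 8, 14            # sample size, assumed variance, sample variance
chi2 = (n - 1) * s2 / sigma2_0          # chi-squared statistic with nu = n - 1 = 17
lower, upper = 7.564, 30.191            # chi^2_{0.975}, chi^2_{0.025} for nu = 17
plausible = lower < chi2 < upper
print(f"chi^2 = {chi2:.2f}; sigma^2 = 8 still plausible: {plausible}")
```

Since χ² = 29.75 lies inside (7.564, 30.191), σ² = 8 remains plausible at this level, though the value is close to the upper limit.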
