Sampling & Sampling Distribution
Outline of Lecture
Key Definitions of Basic Terms
Sampling and Sampling Techniques
Sampling Distribution
Hypothesis Testing
Reading Text:
Statistics for Business and Economics by
Anderson, Sweeney and Williams (Thomson
South-Western). Chapter 7
Learning Objectives
At the end of this lecture students should be
able to:
Understand the basic terms used in sampling.
Know the differences between the various sampling techniques, the reasons for using each, and when to adopt them.
Implement various sampling techniques.
Derive the sampling distribution of the sample mean.
Understand the basics of hypothesis testing.
Basic Terms
Population
Sample
Parameter
Statistic
Estimation
Estimate
Estimator
Sampling
Sampling Frame
Sampling and Sampling Techniques
Types of Sampling
Probability-based sampling
Non-probability-based sampling
Probability-Based Sampling
Simple Random Sampling
Stratified Random Sampling
Systematic Sampling
Cluster Sampling
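The four probability-based techniques can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation: the function names, the equal-interval rule for systematic sampling (k = N // n), and the strata and cluster dictionaries are hypothetical choices made for this sketch.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of n units is equally likely; units are drawn without replacement.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Sampling interval k = N // n; pick a random start in the first interval,
    # then take every k-th unit after it.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    # Draw an independent simple random sample inside each stratum.
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(units, n_per_stratum[name]))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every unit in the chosen clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)           # fixed seed so the illustration is repeatable
population = list(range(1, 101))  # hypothetical population of 100 numbered units
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 5, rng))
print(stratified_sample({"urban": population[:60], "rural": population[60:]},
                        {"urban": 3, "rural": 2}, rng))
print(cluster_sample({"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9]}, 2, rng))
```

Note the contrast: stratified sampling samples units from every group, while cluster sampling samples whole groups.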
Non-Probability-Based Sampling
Convenience Sampling
Purposive Sampling
Quota Sampling
Snowball Sampling
Sampling With and Without Replacement
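The difference shows up directly in how many distinct samples are possible. The sketch below, a hypothetical illustration using Python's `itertools`, counts ordered samples with replacement ($N^n$) and unordered samples without replacement ($\binom{N}{n}$) for the five ages used in the example later in this lecture, and shows the `random` module's calls for drawing one sample each way.

```python
import math
import random
from itertools import combinations, product

population = [6, 8, 10, 12, 14]
n = 2

# With replacement: a unit can be drawn again, so there are N**n ordered samples.
with_replacement = list(product(population, repeat=n))
# Without replacement: each unit appears at most once; unordered samples number C(N, n).
without_replacement = list(combinations(population, n))

print(len(with_replacement), len(population) ** n)             # 25 25
print(len(without_replacement), math.comb(len(population), n))  # 10 10

rng = random.Random(1)
print(rng.choices(population, k=n))  # one sample with replacement (repeats allowed)
print(rng.sample(population, n))     # one sample without replacement (no repeats)
```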
Errors In Sampling
There are two types of errors:
Sampling error: the discrepancy between the population value (parameter) and the sample value (statistic).
It arises because a sample, rather than the whole population, is observed, and it may be larger when an inappropriate sampling technique is applied.
Non-sampling errors: errors due to procedural bias, such as:
Incorrect responses
Measurement errors
Errors at different stages in processing the data
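As a minimal illustration of sampling error (assuming the five ages from the example later in this lecture as the population), the sketch below compares a sample mean with the population mean. For the sample (6, 8) the sample mean is 7 while the population mean is 10, so the sampling error is 7 - 10 = -3. The helper name `sampling_error` is hypothetical.

```python
import random
import statistics

def sampling_error(sample, population):
    # Discrepancy between the sample value (x-bar) and the population value (mu).
    return statistics.mean(sample) - statistics.mean(population)

population = [6, 8, 10, 12, 14]
print(sampling_error([6, 8], population))

# Different random samples give different sampling errors.
rng = random.Random(3)
for _ in range(3):
    sample = rng.sample(population, 2)
    print(sample, sampling_error(sample, population))
```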
Need for Sampling
Reduced cost
Greater speed
Greater accuracy
Greater scope
More detailed information can be
obtained.
Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean is a theoretical probability distribution that lists every possible value of the sample mean for samples of size n, together with the probability associated with each value, taken over all possible samples of size n drawn from a particular population.
Three properties of a sampling distribution are commonly of interest:
Its mean
Its variance
Its functional form
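For reference, the standard results for the first two properties can be written as follows, where $\mu$ and $\sigma^2$ are the population mean and variance and N is the population size. As for the functional form, the sampling distribution of $\bar{x}$ is normal when the population is normal and, by the central limit theorem, approximately normal for large n otherwise.

```latex
\begin{align*}
\mu_{\bar{x}} &= \mu \\
\sigma^2_{\bar{x}} &= \frac{\sigma^2}{n} && \text{(with replacement, or an infinite population)} \\
\sigma^2_{\bar{x}} &= \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} && \text{(without replacement from a population of size } N\text{)}
\end{align*}
```

For the ages example (population variance 8, n = 2) these give 4 with replacement and 4 x 3/4 = 3 without replacement.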
Steps for Constructing the Sampling Distribution of the Mean
Step 1: From a finite population of size N, draw all possible samples of size n.
Step 2: Calculate the mean of each sample.
Step 3: Summarize the means obtained in Step 2 in a frequency distribution or relative frequency distribution.
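The three steps can be sketched in Python by brute-force enumeration, which is only practical for small populations like the one in the example. This is a hypothetical helper, not a library function; it treats samples drawn with replacement as ordered (giving $N^n$ samples) and samples without replacement as unordered combinations (giving $\binom{N}{n}$), and uses `Fraction` so that sample means stay exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def sampling_distribution_of_mean(population, n, with_replacement=True):
    # Step 1: all possible samples (ordered with replacement, unordered without).
    if with_replacement:
        samples = product(population, repeat=n)
    else:
        samples = combinations(population, n)
    # Step 2: the mean of each sample, kept exact with Fraction.
    means = [Fraction(sum(s), n) for s in samples]
    # Step 3: frequency and relative frequency of each distinct mean.
    freq = Counter(means)
    total = len(means)
    return {m: (freq[m], Fraction(freq[m], total)) for m in sorted(freq)}

for mean, (f, rel) in sampling_distribution_of_mean([6, 8, 10, 12, 14], 2).items():
    print(mean, f, rel)
```

Counting ordered samples without replacement instead would double every frequency but leave the relative frequencies unchanged.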
Example: Suppose we have a population of size N = 5, consisting of the ages of five children: 6, 8, 10, 12 and 14. Take samples of size 2 with replacement and construct the sampling distribution of the sample mean.
Solution: N = 5, n = 2
⇒ We have $N^n = 5^2 = 25$ possible samples, since sampling is with replacement.
Step 1: Draw all possible samples (rows give the first observation, columns the second):
6 8 10 12 14
6 (6, 6) (6, 8) (6, 10) (6, 12) (6, 14)
8 (8,6) (8,8) (8,10) (8,12) (8,14)
10 (10,6) (10,8) (10,10) (10,12) (10,14)
12 (12,6) (12,8) (12,10) (12,12) (12,14)
14 (14,6) (14,8) (14,10) (14,12) (14,14)
Step 2: Calculate the mean of each sample (same layout as in Step 1):
6 8 10 12 14
6 6 7 8 9 10
8 7 8 9 10 11
10 8 9 10 11 12
12 9 10 11 12 13
14 10 11 12 13 14
Step 3: Summarize the means obtained in Step 2 in a frequency distribution.
Sample mean ($\bar{x}$)   Frequency
6 1
7 2
8 3
9 4
10 5
11 4
12 3
13 2
14 1
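As a check on the table, the mean and variance of the 25 sample means can be computed directly from the frequencies and compared with the population mean $\mu$ and with $\sigma^2/n$ (the variance of $\bar{x}$ when sampling with replacement). This sketch assumes the population variance is computed with divisor N.

```python
# Frequency table from Step 3: sample mean -> frequency
freq = {6: 1, 7: 2, 8: 3, 9: 4, 10: 5, 11: 4, 12: 3, 13: 2, 14: 1}
population = [6, 8, 10, 12, 14]
n = 2

total = sum(freq.values())  # number of possible samples
mean_of_means = sum(x * f for x, f in freq.items()) / total
var_of_means = sum(f * (x - mean_of_means) ** 2 for x, f in freq.items()) / total

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # population variance

print(total, mean_of_means, var_of_means)  # 25 10.0 4.0
print(mu, sigma2 / n)                      # 10.0 4.0
```

The mean of the sample means equals the population mean (10), and their variance (4) equals the population variance (8) divided by n = 2.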
Estimation
Estimation (and the estimator used) can be of two types: point or interval.
Often, we are concerned with how to estimate a population parameter with a sample statistic. A point estimate uses a single value.
To increase the level of confidence in estimation, we use a range of values (rather than a single value) as the estimate of a population parameter.
Example: Suppose someone asks you how long it takes to go from Akoka to Victoria Island. Which would be a more reliable estimate: 30 minutes, or between 25 and 40 minutes?
If you use 30 minutes as the estimate, you are using a point estimate. On the other hand, if you use between 25 and 40 minutes as the estimate, you are using an interval estimate.
Estimation
Interval estimation uses a range of values.
For a given sample, a wider range corresponds to a higher level of confidence.
Once a confidence level is specified, the interval estimate can be calculated using the appropriate formula in Chapter 8 of the book (Statistics for Business and Economics). If one wants 95% confidence, the interval estimate is called a 95% confidence interval.
Estimation
This lecture covers the following:
Interval Estimate for a Population Mean
(known σ)
Interval Estimate for a Population Mean
(unknown σ)
Determining Size of a Sample
Margin of Error and the Interval Estimate
The general form of an interval estimate of a population mean is
$$\bar{x} \pm \text{Margin of Error}$$
Interval Estimate of a Population Mean: σ Known
Interval estimate of μ:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where: $\bar{x}$ is the sample mean
$1 - \alpha$ is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of $\alpha/2$ in the upper tail of the standard normal probability distribution
$\sigma$ is the population standard deviation
$n$ is the sample size
Interval Estimate for Population Mean
(Known σ)
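Using the formula with $\bar{x} = 82$, $\sigma = 20$ and $n = 100$, the 95% confidence interval is:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 82 \pm 1.96\left(\frac{20}{\sqrt{100}}\right) = 82 \pm 3.92$$
that is, 78.08 to 85.92.
We are 95% confident that the average amount spent by all the department stores is between N78.08 ('000) and N85.92 ('000).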
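The calculation can be reproduced with Python's standard `statistics.NormalDist`, which supplies $z_{\alpha/2}$ as the $(1 - \alpha/2)$ quantile of the standard normal distribution. The function name `z_interval` is hypothetical. The exact quantile (about 1.95996) differs slightly from the rounded 1.96 used above, but the interval agrees to two decimal places.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    # x-bar +/- z_{alpha/2} * sigma / sqrt(n), for the sigma known case.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin

low, high = z_interval(82, 20, 100)
print(round(low, 2), round(high, 2))  # 78.08 85.92
```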
Interval Estimation of a Population Mean: σ Unknown
If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ.
This is the σ unknown case.
In this case, the interval estimate for μ is based on the t distribution.
(We'll assume for now that the population is normally distributed.)
t Distribution
The t distribution is a family of similar probability
distributions.
A specific t distribution depends on a parameter
known as the degrees of freedom.
For an interval estimate of a population mean, the degrees of freedom are the sample size minus 1, that is, n - 1.
t Distribution
A t distribution with more degrees of freedom has
less dispersion.
As the number of degrees of freedom increases, the
difference between the t distribution and the
standard normal probability distribution becomes
smaller and smaller.
t Distribution
[Figure: the standard normal distribution compared with t distributions with 10 and 20 degrees of freedom, all centered at 0 on the z, t axis]
t Distribution
Degrees of   Area in Upper Tail
Freedom      .20    .10    .05    .025   .01    .005
...          ...    ...    ...    ...    ...    ...
50           .849   1.299  1.676  2.009  2.403  2.678
60           .848   1.296  1.671  2.000  2.390  2.660
80           .846   1.292  1.664  1.990  2.374  2.639
100          .845   1.290  1.660  1.984  2.364  2.626
∞            .842   1.282  1.645  1.960  2.326  2.576
(The ∞ row gives the standard normal z values.)
Interval Estimation of a Population Mean: σ Unknown
Interval Estimate
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
where: $1 - \alpha$ = the confidence coefficient
$t_{\alpha/2}$ = the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 1$ degrees of freedom
$s$ = the sample standard deviation
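A minimal sketch of the σ unknown formula, assuming the critical value $t_{\alpha/2}$ is read from a t table for n - 1 degrees of freedom and passed in, since Python's standard library has no t quantile function (if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the same value). The helper `t_interval` and the five-value sample are hypothetical; 2.776 is the standard table value of $t_{0.025}$ for 4 degrees of freedom.

```python
import math
import statistics

def t_interval(sample, t_crit):
    # x-bar +/- t_{alpha/2} * s / sqrt(n); t_crit comes from a t table with n - 1 df.
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

sample = [1, 2, 3, 4, 5]               # hypothetical data: n = 5, so df = 4
low, high = t_interval(sample, 2.776)  # t_{0.025} with 4 degrees of freedom
print(round(low, 3), round(high, 3))
```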
Interval Estimate for Population Mean
(Unknown σ)
Interval Estimate (Continued)
9312 ± 955
8357 to 10,267
We are 95% confident that the average credit card balance of all customers in the country is between N8,357 and N10,267.
Summary of Interval Estimation Procedures for a Population Mean
Can the population standard deviation σ be assumed known?
Yes (σ known case): use $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
No (σ unknown case): use the sample standard deviation s to estimate σ, and use $\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
Sample Size for an Interval Estimate
of a Population Mean
Let E = the desired margin of error.
E is the amount added to and subtracted from the
point estimate to obtain an interval estimate.
Margin of Error
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Necessary Sample Size
$$n = \frac{(z_{\alpha/2})^2\sigma^2}{E^2}$$
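The sample-size formula translates directly into code. Because n must be a whole number, the sketch rounds up, the usual convention so that the margin of error is no larger than E; the small rounding step only guards against floating-point noise. The function name is hypothetical. With $z_{\alpha/2} = 1.96$, $\sigma = 20$ and E = 3.92 (the margin from the department-store example) it returns 100, the sample size used there; with a hypothetical E = 2 it returns 385.

```python
import math

def required_sample_size(z, sigma, E):
    # n = (z_{alpha/2})^2 * sigma^2 / E^2
    n = (z ** 2) * (sigma ** 2) / (E ** 2)
    # Strip float noise first, then round up to the next whole number.
    return math.ceil(round(n, 9))

print(required_sample_size(1.96, 20, 3.92))  # 100
print(required_sample_size(1.96, 20, 2))     # 385
```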
Determination of Sample Size
Interval Estimation
of a Population Proportion
The general form of an interval estimate of a
population proportion is
$$\bar{p} \pm \text{Margin of Error}$$
The sampling distribution of $\bar{p}$ plays a key role in computing the margin of error for this interval estimate.
The sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever $np > 5$ and $n(1 - p) > 5$.
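Normal Approximation of Sampling Distribution of $\bar{p}$
[Figure: sampling distribution of $\bar{p}$ with standard error $\sigma_{\bar{p}} = \sqrt{p(1-p)/n}$; the middle $1 - \alpha$ of all $\bar{p}$ values lies within $z_{\alpha/2}\sigma_{\bar{p}}$ of $p$, leaving an area of $\alpha/2$ in each tail]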
Interval Estimation of a Population Proportion
Interval Estimate
$$\bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
where: $1 - \alpha$ is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of $\alpha/2$ in the upper tail of the standard normal probability distribution
$\bar{p}$ is the sample proportion
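A minimal sketch of the proportion interval, using `statistics.NormalDist` for $z_{\alpha/2}$. The helper name and the data (50 successes in 100 trials) are hypothetical, and checking the normal-approximation condition with $\bar{p}$ standing in for the unknown p is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    p_bar = successes / n  # sample proportion
    # Normal-approximation condition from this lecture, with p_bar standing in for p.
    if n * p_bar <= 5 or n * (1 - p_bar) <= 5:
        raise ValueError("np and n(1 - p) should both exceed 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin

low, high = proportion_interval(50, 100)
print(round(low, 3), round(high, 3))  # 0.402 0.598
```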
Interval Estimate for a Population
Proportion
Hypothesis Testing
What is a Hypothesis?
A statement about a population parameter that may or may not be true.
Examples:
The average nicotine content of a new brand of cigarette is 0.05 milligrams.
The average life of a new battery is over 72 months.
The proportion of all registered voters in the country who favor intervention in Niger is 0.55.
What is Hypothesis Testing?
A procedure that uses sample data to decide whether a hypothesis about a population should be rejected.
Hypothesis Testing
Hypothesis testing is used in several situations, for example:
To test whether a new drug is more effective
To test whether a new process improves a product
To test whether employee training improves employee job ratings
To test whether a new bonus plan increases sales performance
To test whether new advertising increases sales volume
and in many similar situations.
Basic Terms
Null hypothesis
Alternative hypothesis
Sidedness of a test
Test statistic
Critical value
Type I error
Type II error
Level of significance
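To tie several of these terms together, here is a minimal sketch of a two-tailed test about a population mean with σ known. It assumes the hypotheses $H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$, the test statistic $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, and rejection of $H_0$ when $|z|$ reaches the critical value $z_{\alpha/2}$ for level of significance $\alpha$. The function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    z = (x_bar - mu0) / (sigma / math.sqrt(n))      # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    reject_h0 = abs(z) >= critical                  # rejecting a true H0 is a Type I error
    return z, critical, reject_h0

print(two_tailed_z_test(x_bar=82, mu0=78, sigma=20, n=100))
```

Here z = (82 - 78)/(20/10) = 2.0, which exceeds about 1.96, so $H_0$ would be rejected at the 5% level of significance.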
Hypothesis Testing
Developing Null and Alternative Hypotheses
Type I and Type II Errors
Population Mean: σ Known
Population Mean: σ Unknown
Developing Null and Alternative Hypotheses