SAMPLING & ESTIMATION
Main Issues
• Universe/Population
• Census study versus sample
• Sampling Unit
• Sampling Frame: a representation of the elements of the target
population. Examples of a sampling frame include the telephone book,
an association directory listing the firms in an industry, a customer
database, a mailing list on a database purchased from a commercial
organisation, a city directory, or a map. If a list cannot be compiled,
then at least some directions for identifying the target population
should be specified, such as random-digit dialling procedures in
telephone surveys.
• Sample Size
• Budgetary Constraints
• Sampling Procedure
Criteria of Sampling Design
Minimise the cost of sampling, which has two components:
• Cost of collecting & analysing data
• Cost of incorrect inferences, to which systematic bias and sampling error lead
Systematic bias: inherent in the system
• Design errors: selection error, sampling frame error, measurement scale error
• Administering errors: questioning error, recording error
• Response errors: data error (intentional/unintentional)
• Non-response errors: failure to contact all members, incomplete responses
Random/Sampling error: random variation, controllable by sample size. It is the
difference between the measure obtained from the sample and the true measure of
the population.
Sampling Methods
A. Non-random/Non-probability-based sampling: relies on the personal
judgement of the researcher rather than on chance to select sample
elements.
• Convenience sampling: selection of sampling units is left primarily to the
interviewer. Often, respondents are selected because they happen to be in
the right place at the right time. Examples: (1) use of students and members
of social organisations, (2) street interviews without qualifying the
respondents, (3) some forms of email and Internet survey, (4) tear-out
questionnaires included in a newspaper or magazine.
• Judgmental sampling: elements are selected based on the judgement of the
researcher because he/she believes that they are representative of the
population of interest or are otherwise appropriate. Examples: (1) test
markets selected to determine the potential of a new product, (2) purchase
engineers selected in industrial marketing research because they are
considered to be representative of the company, (3) product testing with
individuals who may be particularly fussy or who hold extremely high
expectations, (4) expert witnesses used in court.
• Quota sampling: a two-stage restricted judgemental sampling that is used
extensively in street interviewing.
• The first stage consists of developing control characteristics, or quotas,
of population elements such as age or gender. To develop these quotas,
the researcher lists relevant control characteristics and determines the
distribution of these characteristics in the target population, such as
Males 49%, Females 51% (resulting in 490 men and 510 women being
selected in a sample of 1,000 respondents). Often, the quotas are
assigned so that the proportion of the sample elements possessing the
control characteristics is the same as the proportion of population
elements with these characteristics. In other words, the quotas ensure
that the composition of the sample is the same as the composition of
the population with respect to the characteristics of interest.
• In the second stage, sample elements are selected based on
convenience or judgement.
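The first-stage quota arithmetic can be sketched in a few lines. This is only an illustration, assuming the 49%/51% split and the sample of 1,000 quoted above; the function name `quota_counts` is hypothetical.

```python
def quota_counts(proportions, sample_size):
    """Turn population proportions into per-group quotas.

    Rounding each group separately is an assumption of this sketch; with
    awkward proportions the rounded quotas may not sum to sample_size.
    """
    return {group: round(share * sample_size) for group, share in proportions.items()}


quotas = quota_counts({"Males": 0.49, "Females": 0.51}, 1000)
print(quotas)
```

The second stage (picking respondents by convenience or judgement until each quota is filled) is a fieldwork decision and is not modelled here.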
• Snowball sampling: an initial group of respondents is selected who
possess the desired characteristics of the target population. After being
interviewed, these respondents are asked to identify others who
belong to the target population. Subsequent respondents are selected
based on the referrals. By obtaining referrals from referrals, this
process may be carried out in waves, thus leading to a snowballing
effect. The main objective of snowball sampling is to estimate
characteristics that are rare in the wider population.
• Examples: users of particular government or social services, such as
parents who use nurseries or child minders, whose names cannot be
revealed; special census groups, such as widowed males under 35;
members of a scattered minority ethnic group; and industrial buyers using
some special equipment or technology.
B. Random/Probability-based sampling
1. Simple random sampling
• Each element/item has an equal chance of being included in the
sample (randomness).
• Sampling may be with or without replacement.
• Selection uses a random number table or a pseudo-random number generator.
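Simple random sampling with and without replacement might look like this with a pseudo-random number generator. It is a minimal sketch: the frame of 100 numbered elements, the sample size of 10, and the seed are all assumptions made for illustration.

```python
import random

rng = random.Random(42)        # seeded pseudo-random number generator, for repeatability
frame = list(range(1, 101))    # hypothetical sampling frame: elements numbered 1..100

# Without replacement: each element can be selected at most once.
without_replacement = rng.sample(frame, k=10)

# With replacement: the same element may be drawn more than once.
with_replacement = rng.choices(frame, k=10)

print(without_replacement)
print(with_replacement)
```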
2. Stratified sampling
• Each stratum is a homogeneous group and differs from the
other strata.
• Elements are selected randomly from each stratum, in proportion to its size.
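Proportionate selection from each stratum can be sketched as below. The strata (600 urban and 400 rural households), the total sample of 50, and the function name `proportional_stratified_sample` are hypothetical and serve only to show the allocation.

```python
import random


def proportional_stratified_sample(strata, sample_size, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(sample_size * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, k)          # random selection within the stratum
    return sample


# Hypothetical frame: 600 urban and 400 rural households.
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "rural": [f"R{i}" for i in range(400)],
}
chosen = proportional_stratified_sample(strata, 50, random.Random(1))
print({name: len(units) for name, units in chosen.items()})
```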
3. Cluster sampling
• Little or no variation among clusters.
• Clusters are selected randomly for further
analysis.
• Area sampling uses geographical clusters.
• Multi-stage sampling is a special case.
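A one-stage cluster sample, in which whole clusters are drawn at random and every element inside them is kept, might be sketched as follows. The ten city blocks of five households each and the function name `cluster_sample` are assumptions for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every element inside them."""
    picked = rng.sample(sorted(clusters), n_clusters)  # sorted() lists the cluster names
    return {name: clusters[name] for name in picked}


# Hypothetical geographical clusters: 10 city blocks of 5 households each.
blocks = {f"block_{b}": [f"b{b}_h{h}" for h in range(5)] for b in range(10)}
selected = cluster_sample(blocks, 3, random.Random(7))
print(sorted(selected))
```

A multi-stage design would add a further random draw of households within each selected block instead of keeping all of them.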
4. Systematic sampling
• Elements are selected at a uniform interval.
• Selection is evenly spread, costs less time and money, and is more
convenient.
• The sample is chosen by selecting a random starting
point and then picking every i-th element in succession
from the sampling frame.
• The sampling interval, i, is determined by dividing the
population size N by the sample size n and rounding to
the nearest whole number. For example, there are
100,000 elements in the population and a sample of
1,000 is desired. In this case, the sampling interval, i, is
100,000 / 1,000 = 100. A random number between 1 and 100 is selected.
If, for example, this number is 23, the sample consists
of elements 23, 123, 223, 323, 423, 523, and so on.
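The interval rule can be checked with a short sketch that reproduces the example (N = 100,000, n = 1,000, starting point 23). The function name `systematic_sample` and its parameters are hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return 1-based element positions chosen at the fixed interval i = round(N / n)."""
    interval = round(population_size / sample_size)
    if start is None:
        start = rng.randint(1, interval)  # random starting point between 1 and i
    return list(range(start, population_size + 1, interval))[:sample_size]


positions = systematic_sample(100_000, 1_000, start=23)
print(positions[:6])  # [23, 123, 223, 323, 423, 523]
```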
Sample Size Determination:
For the mean: $n = \dfrac{\sigma^2 z^2}{D^2}$
For a proportion: $n = \dfrac{p(1-p)\,z^2}{D^2}$
where D is the level of precision and z is the value associated with the confidence interval.
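A small calculator for the two sample-size cases is sketched below, assuming the usual formulas n = σ²z²/D² for a mean and n = p(1 − p)z²/D² for a proportion, with D the level of precision and z the two-sided standard normal value for the chosen confidence level. The inputs (σ = 55, D = 5; p = 0.64, D = 0.05) are hypothetical, and rounding up to the next whole unit is a choice made in this sketch.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_mean(sigma, precision, confidence=0.95):
    """n = sigma^2 * z^2 / D^2, rounded up."""
    z = z_value(confidence)
    return math.ceil(sigma ** 2 * z ** 2 / precision ** 2)


def sample_size_for_proportion(p, precision, confidence=0.95):
    """n = p * (1 - p) * z^2 / D^2, rounded up."""
    z = z_value(confidence)
    return math.ceil(p * (1 - p) * z ** 2 / precision ** 2)


print(sample_size_for_mean(sigma=55, precision=5))
print(sample_size_for_proportion(p=0.64, precision=0.05))
```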
SAMPLING DISTRIBUTION
• Sampling distribution: the distribution of a sample
statistic, usually the mean.
• Standard error ($\sigma_{\bar{x}}$): the standard deviation of the
sampling distribution.
• The mean of the sampling distribution of means ($\mu_{\bar{x}}$), taken
over all possible samples exhaustively, equals the
population mean (µ).
• As sample size increases, standard error decreases.
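Assuming Normal Population Distribution
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma_{\bar{x}}$ is the standard error of the mean, σ is the population standard deviation, and n = sample size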
Central Limit Theorem:
• Irrespective of the shape of the population distribution, the sampling
distribution of the mean approaches normal as the sample size increases.
• The mean of such a sampling distribution is the population mean.
Sample size trade-off: a larger sample lowers the standard error and so improves
the precision of estimation, but it raises the cost of sampling.
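A quick simulation can illustrate the theorem. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), a sample size of 30, and 2,000 repeated samples; all of these are arbitrary choices for illustration. It compares the observed spread of the sample means with the standard error σ/√n.

```python
import random
import statistics

rng = random.Random(0)
n, replications = 30, 2000

# Population: exponential with mean 1 and standard deviation 1 (far from normal).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)   # should sit close to the population mean, 1
observed_se = statistics.stdev(sample_means)     # spread of the sampling distribution
theoretical_se = 1.0 / n ** 0.5                  # sigma / sqrt(n)

print(round(mean_of_means, 3), round(observed_se, 3), round(theoretical_se, 3))
```

Raising `n` in this sketch shrinks both standard errors, at the price of drawing more observations per sample.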
Point Estimate
Interval Estimate
• Level of significance: α.
• Confidence level: the probability (1 − α) associated with an interval
estimate of any population parameter.
• Higher confidence level => wider confidence
interval.
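The link between confidence level and interval width can be seen numerically. This sketch assumes a normal sampling distribution and a hypothetical standard error of 2.0; the helper name `z_for_confidence` is also an assumption.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided critical value Z_{alpha/2} for confidence level 1 - alpha."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


standard_error = 2.0  # hypothetical standard error of the mean
for confidence in (0.90, 0.95, 0.99):
    z = z_for_confidence(confidence)
    print(f"{confidence:.0%}: z = {z:.3f}, interval width = {2 * z * standard_error:.2f}")
```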
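Estimation of mean from a large sample (usually n > 30):
As the sample size is large, the sampling distribution of the
mean is approximately normal.
1. Compute the standard error, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, from either the known σ or the estimated s.
2. Get the Z value from the standard normal distribution table
corresponding to the confidence level (1 − α).
3. The confidence interval is $\bar{x} \pm Z\,\sigma_{\bar{x}}$.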
Estimation of means from small samples (n ≤ 30):
t-distribution:
• Applicable for smaller sample sizes.
• Unimodal and almost bell-shaped.
• Flatter than the normal distribution.
• The larger the sample size, the less flat the distribution and the
closer it is to normal.
• The value of t varies with the degrees of freedom (d.f.), i.e. (n − 1), as the
distribution shape changes.
Step 1. Compute the standard error ($s_{\bar{x}} = s/\sqrt{n}$) as usual.
Step 2. Get the t value from the t-distribution table corresponding to
(n − 1) d.f. and (1 − confidence level) as the area under the curve (split equally between the two tails for a two-sided interval).
Step 3. $\bar{x} \pm t\,s_{\bar{x}}$ is the confidence interval/limit.
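| Case | Two-sided Confidence Interval (CI) |
| Population standard deviation σ known | $\bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ |
| Population standard deviation σ unknown, sample size n > 30 | $\bar{x} \pm Z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$ |
| Population standard deviation σ unknown, sample size n ≤ 30 | $\bar{x} \pm t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$ |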
Example 1: A sample of size 20 was collected
and the sample mean and standard deviation
are estimated as 9.8525 and 0.0965. Find 95%
two-sided CI for the mean.
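One way to work Example 1 is sketched below. Because n = 20 and σ is unknown, it assumes the small-sample t interval applies; the critical value t(0.025, 19) = 2.093 is read from a standard t table rather than computed.

```python
import math

n, mean, s = 20, 9.8525, 0.0965
t_crit = 2.093  # t_{0.025, 19}, taken from a standard t table (assumed input)

margin = t_crit * s / math.sqrt(n)        # t * s / sqrt(n)
lower, upper = mean - margin, mean + margin
print(f"95% two-sided CI: ({lower:.4f}, {upper:.4f})")
```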
• Example 2: The life in hours of a light bulb is
known to be approximately normally distributed
with standard deviation of 25 hours. A random
sample of 40 bulbs has a mean life of 1014 hours.
1. Construct a 95% two-sided CI on the mean life.
2. Construct a 95% one-sided lower CI of the mean life.
One-sided confidence interval: the appropriate lower or upper
confidence limit is found by replacing
$Z_{\alpha/2}$ by $Z_{\alpha}$ and $t_{\alpha/2,\,n-1}$ by $t_{\alpha,\,n-1}$.
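Example 2 can be worked as in the sketch below. Since σ = 25 is known, it uses the Z interval; the two-sided case uses Z at α/2 and the one-sided lower bound uses Z at α, with α = 0.05.

```python
import math
from statistics import NormalDist

n, mean, sigma = 40, 1014.0, 25.0
se = sigma / math.sqrt(n)                  # standard error of the mean

z_two_sided = NormalDist().inv_cdf(0.975)  # Z_{alpha/2} for alpha = 0.05
z_one_sided = NormalDist().inv_cdf(0.95)   # Z_{alpha} for alpha = 0.05

two_sided = (mean - z_two_sided * se, mean + z_two_sided * se)
lower_bound = mean - z_one_sided * se      # one-sided lower confidence limit

print(f"95% two-sided CI: ({two_sided[0]:.2f}, {two_sided[1]:.2f})")
print(f"95% one-sided lower bound: {lower_bound:.2f}")
```

The one-sided lower bound sits above the two-sided lower limit because all of α is placed in a single tail.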
• Example 3: The following result shows the
investigation of the haemoglobin level of hockey
players (in g/dl).
15.3 16.0 14.4 16.2 16.2
14.9 15.7 14.6 15.3 17.7
16.0 15.0 15.7 16.2 14.7
14.8 14.6 15.6 14.5 15.2
a) Find the 90% two-sided CI on the mean
haemoglobin level.
b) Also construct 90% Upper CI on the mean
haemoglobin level.
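Reported summary statistics: mean = 15.43684211, standard deviation = 0.83413996.
These figures correspond to the last 19 observations (the first value, 15.3, is omitted); all 20 listed values give a mean of 15.43 and a sample standard deviation of about 0.8125.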
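Example 3 might be worked as follows, using all 20 listed observations and the small-sample t interval. The critical values t(0.05, 19) = 1.729 and t(0.10, 19) = 1.328 are read from a standard t table and are assumed inputs of this sketch.

```python
import math
import statistics

haemoglobin = [
    15.3, 16.0, 14.4, 16.2, 16.2,
    14.9, 15.7, 14.6, 15.3, 17.7,
    16.0, 15.0, 15.7, 16.2, 14.7,
    14.8, 14.6, 15.6, 14.5, 15.2,
]

n = len(haemoglobin)
mean = statistics.mean(haemoglobin)
s = statistics.stdev(haemoglobin)     # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)

t_two_sided = 1.729  # t_{0.05, 19}: 90% two-sided interval (table value)
t_one_sided = 1.328  # t_{0.10, 19}: 90% one-sided bound (table value)

ci_lower, ci_upper = mean - t_two_sided * se, mean + t_two_sided * se
upper_bound = mean + t_one_sided * se

print(f"mean = {mean:.2f}, s = {s:.4f}")
print(f"90% two-sided CI: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"90% upper bound: {upper_bound:.3f}")
```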
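Confidence Interval on the Variance of a Normal Distribution
$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$, where $\chi^2_{a,\,n-1}$ is the chi-square value with (n − 1) d.f. that has area a to its right. The one-sided upper bound is $\sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}}$.
Confidence Intervals on a Population Proportion
$\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p} = x/n$ is the sample proportion.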
Example: An automatic filling machine is used to fill bottles with
liquid detergent. A random sample of 20 bottles results in a
sample variance of fill volume of 0.0153. If the variance of fill
volume is too large, an unacceptable proportion of bottles will
be under- or overfilled. We will assume that the fill volume is
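approximately normally distributed. Calculate the 95% upper
confidence bound for the variance.
Solution: $\sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{0.95,\,19}} = \dfrac{19(0.0153)}{10.117} = 0.0287$, so $\sigma \le 0.17$.
Therefore, at the 95% level of confidence, the data indicate that the process
standard deviation could be as large as 0.17.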
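The upper bound can be reproduced with a short calculation. This sketch assumes the one-sided bound σ² ≤ (n − 1)s²/χ², where χ² is the chi-square value with 19 d.f. that has 95% of the area to its right; that value, 10.117, is taken from a standard chi-square table.

```python
import math

n, s2 = 20, 0.0153
chi2_crit = 10.117  # chi-square value, 19 d.f., 95% of the area to its right (table value)

upper_variance = (n - 1) * s2 / chi2_crit  # upper confidence bound on the variance
upper_sd = math.sqrt(upper_variance)       # corresponding bound on the standard deviation

print(f"variance <= {upper_variance:.4f}, standard deviation <= {upper_sd:.2f}")
```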
Example: In a random sample of 85 automobile engine
crankshaft bearings, 10 have a surface finish that is rougher than
the specifications allow. Therefore, a point estimate of the
proportion of bearings in the population that exceeds the
roughness specification is $\hat{p} = x/n = 10/85 = 0.12$. Compute the 95%
two-sided confidence interval for p.
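A sketch of the calculation follows. It assumes the usual large-sample normal approximation, p̂ ± Z·√(p̂(1 − p̂)/n), which is one common choice for a proportion interval and is not spelled out in the text above.

```python
import math
from statistics import NormalDist

x, n = 10, 85
p_hat = x / n                               # point estimate of the proportion
z = NormalDist().inv_cdf(0.975)             # Z_{alpha/2} for a 95% two-sided interval

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}, 95% CI: ({lower:.4f}, {upper:.4f})")
```

Using the unrounded p̂ gives limits of roughly 0.05 and 0.19; starting from the rounded 0.12 shifts them slightly.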