Chapter 6

This document discusses statistical sampling and estimation. It defines key sampling concepts like sampling plans, sampling methods, population parameters, and estimators. It also covers interval estimates, confidence intervals, and how the central limit theorem and t-distribution relate to estimating population means from sample data. Estimating techniques like confidence intervals allow statisticians to quantify uncertainty and determine how likely it is that the sample mean represents the true population mean.

Uploaded by

storkydd
Copyright
© All Rights Reserved
Chapter 6

Sampling and Estimation

Statistical Sampling

• Sampling is the foundation of statistical analysis.

• Sampling plan - a description of the approach that is used to obtain samples from a
population prior to any data collection activity.

• A sampling plan states:

- its objectives

- target population

- population frame (the list from which the sample is selected)

- operational procedures for collecting data

- statistical tools for data analysis

Example 6.1: A Sampling Plan for a Market Research Study

• A company wants to understand how golfers might respond to a membership program that
provides discounts at golf courses.

◦ Objective - estimate the proportion of golfers who would join the program

◦ Target population - golfers over 25 years old

◦ Population frame - golfers who purchased equipment at particular stores

◦ Operational procedures - e-mail link to survey or direct-mail questionnaire

◦ Statistical tools - PivotTables to summarize data by demographic groups and estimate
likelihood of joining the program

Sampling Methods

• Subjective Methods

◦ Judgment sampling – expert judgment is used to select the sample

◦ Convenience sampling – samples are selected based on the
ease with which the data can be collected

• Probabilistic Sampling

◦ Simple random sampling – selecting items from a population so that every
subset of a given size has an equal chance of being selected

Additional Probabilistic Sampling Methods

• Systematic (periodic) sampling – a sampling plan that
selects every nth item from the population.

• Stratified sampling – applies to populations that are divided into natural
subsets (called strata) and allocates the appropriate proportion of samples to each stratum.

• Cluster sampling – based on dividing a population into
subgroups (clusters), sampling a set of clusters, and (usually) conducting a complete census
within the clusters sampled.

• Sampling from a continuous process

◦ Select a time at random; then select the next n items produced after that time.

◦ Select n times at random; then select the next item produced after each of these
times.
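
As a rough illustration of the probabilistic methods above, the following Python sketch draws a simple random sample, a systematic sample, and a proportionally allocated stratified sample. The population frame (ID numbers 1 to 100), the two strata, and all function names are made up for illustration; they are not part of the chapter.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every subset of size n has the same chance of being selected."""
    return rng.sample(frame, n)

def systematic_sample(frame, step, rng):
    """Random start within the first `step` items, then every step-th item."""
    start = rng.randrange(step)
    return frame[start::step]

def stratified_sample(strata, n, rng):
    """Allocate n across strata in proportion to stratum size.

    Rounding is used for simplicity, so the total can differ slightly
    from n when the proportions are not whole numbers.
    """
    total = sum(len(items) for items in strata.values())
    return {
        name: rng.sample(items, round(n * len(items) / total))
        for name, items in strata.items()
    }

rng = random.Random(42)            # fixed seed so the sketch is repeatable
frame = list(range(1, 101))        # hypothetical population frame
strata = {"under_40": frame[:60], "40_plus": frame[60:]}  # hypothetical strata

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(strata, 10, rng)

print(srs)
print(sys_sample)
print({name: len(items) for name, items in strat.items()})
```

With a 60/40 split between the strata, a stratified sample of 10 takes 6 items from the first stratum and 4 from the second.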

Estimating Population Parameters

• Estimation involves assessing the value of an unknown population parameter using sample
data.

• Estimators are the measures used to estimate population parameters.

◦ E.g., sample mean, sample variance, sample proportion

• A point estimate is a single number derived from sample data that is used to estimate the
value of a population parameter.

• If the expected value of an estimator equals the population parameter it is intended to
estimate, the estimator is said to be unbiased.

Sampling Error

• Sampling (statistical) error occurs because samples are only a subset of the total population.

◦ Sampling error is inherent in any sampling process, and although it can be
minimized, it cannot be totally avoided.

• Nonsampling error occurs when the sample does not represent the target population
adequately.

◦ Nonsampling error usually results from a poor sample design or inadequate data
reliability.

Sampling Distributions

• The sampling distribution of the mean is the distribution of the means of all possible
samples of a fixed size n from some population.

• The standard deviation of the sampling distribution of the mean is called the standard error
of the mean:

Standard error of the mean = σ / √n

where σ is the population standard deviation and n is the sample size.

• As n increases, the standard error decreases.

◦ Larger sample sizes have less sampling error.
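
The sampling distribution can be made concrete by enumerating every possible sample from a very small population. The sketch below is illustrative only: the four-value population is made up, and samples are drawn with replacement so that the relationship standard error = σ / √n holds exactly.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]   # hypothetical tiny population
n = 2                       # fixed sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples)
sample_means = [
    statistics.fmean(sample)
    for sample in itertools.product(population, repeat=n)
]

mu = statistics.fmean(population)        # population mean
sigma = statistics.pstdev(population)    # population standard deviation

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
standard_error = sigma / math.sqrt(n)

print(mu, mean_of_means)             # the two means agree
print(sd_of_means, standard_error)   # the two spreads agree
```

The mean of the sample means matches the population mean, and their standard deviation matches σ / √n.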


Central Limit Theorem

1. If the sample size is large enough, then the sampling distribution of the mean:

- is approximately normally distributed regardless of the distribution of the population

- has a mean equal to the population mean

2. If the population is normally distributed, then the sampling distribution is also normally
distributed for any sample size.

◦ The central limit theorem allows us to use the theory we learned about calculating
probabilities for normal distributions to draw conclusions about sample means.
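
A small simulation can make the central limit theorem tangible. The sketch below is only an illustration under assumed settings: a uniform population on [0, 1] (clearly not normal), a sample size of 30, 2000 repeated samples, and a fixed random seed.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 30                   # assumed sample size
num_samples = 2000       # assumed number of repeated samples

# Uniform population on [0, 1]: mean 0.5, standard deviation sqrt(1/12)
pop_mean = 0.5
pop_sd = math.sqrt(1 / 12)
standard_error = pop_sd / math.sqrt(n)

sample_means = [
    statistics.fmean(rng.random() for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means within one standard error of the population mean;
# a normal distribution would put roughly 68% of them there.
within_one_se = sum(
    abs(m - pop_mean) <= standard_error for m in sample_means
) / num_samples

print(mean_of_means, sd_of_means, standard_error, within_one_se)
```

The sample means should cluster around 0.5 with a spread close to the standard error, even though the individual observations are uniform rather than normal.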

Applying the Sampling Distribution of the Mean

• The key to applying the sampling distribution of the mean correctly is to understand whether the
probability that you wish to compute relates to an individual observation or to the mean of a
sample.

◦ If it relates to the mean of a sample, then you must use the sampling distribution of
the mean, whose standard deviation is the standard error, not the standard
deviation of the population.

Example 6.6: Using the Standard Error in Probability Calculations

• The purchase order amounts for books on a publisher’s Web site are normally distributed with
a mean of $36 and a standard deviation of $8.

• Find the probability that:

a) someone’s purchase amount exceeds $40.

Use the population standard deviation:

P(x > 40) = 1 − NORM.DIST(40, 36, 8, TRUE) = 0.3085

b) the mean purchase amount for 16 customers exceeds $40.

Use the standard error of the mean, 8 / √16 = 2:

P(x̄ > 40) = 1 − NORM.DIST(40, 36, 2, TRUE) = 0.0228
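
For readers working outside Excel, the two calculations in Example 6.6 can be reproduced with Python's standard library. This is a sketch using `statistics.NormalDist` in place of NORM.DIST; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 36, 8, 16

# (a) An individual purchase: use the population standard deviation
p_individual = 1 - NormalDist(mu, sigma).cdf(40)

# (b) The mean of 16 purchases: use the standard error, sigma / sqrt(n) = 2
standard_error = sigma / math.sqrt(n)
p_sample_mean = 1 - NormalDist(mu, standard_error).cdf(40)

print(round(p_individual, 4))    # 0.3085
print(round(p_sample_mean, 4))   # 0.0228
```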

Interval Estimates

• An interval estimate provides a range for a population characteristic based on a sample.

◦ Intervals specify a range of plausible values for the characteristic of interest and a
way of assessing “how plausible” they are.

• In general, a 100(1 − α)% probability interval is any interval [A, B] such that the probability of
falling between A and B is 1 − α.

◦ Probability intervals are often centered on the mean or median.

◦ Example: in a normal distribution, the mean plus or minus 1 standard deviation
describes an approximate 68% probability interval around the mean.

Example 6.7: Interval Estimates in the News

• A Gallup poll might report that 56% of voters support a certain candidate with a margin of
error of ±3%.

◦ We would have a lot of confidence that the candidate would win since the interval
estimate is [53%, 59%].

• Suppose the poll reported a 52% level of support with a ±4% margin of error.

◦ We would be less confident in predicting a win for the candidate since the interval
estimate is [48%, 56%].

Confidence Intervals

• A confidence interval is a range of values between which the value of the population
parameter is believed to be, along with a probability that the interval correctly estimates the
true (unknown) population parameter.

◦ This probability is called the level of confidence, denoted by 1 − α, where α is a
number between 0 and 1.

◦ The level of confidence is usually expressed as a percent; common values are 90%,
95%, or 99%.

• For a 95% confidence interval, if we chose 100 different samples, leading to 100 different
interval estimates, we would expect about 95 of them to contain the true population
mean.

Confidence Interval for the Mean with Known Population Standard Deviation

• Sample mean ± margin of error

• Margin of error is zα/2 × (standard error), where the standard error is σ / √n.

• zα/2 is the value of the standard normal random variable for an upper tail area of α/2
(or a lower tail area of 1 − α/2).

• zα/2 is computed as =NORM.S.INV(1 – α/2)

• Example: if α = 0.05 (for a 95% confidence interval), then NORM.S.INV(0.975) = 1.96.

• Example: if α = 0.10 (for a 90% confidence interval), then NORM.S.INV(0.95) = 1.645.

• The margin of error can also be computed by =CONFIDENCE.NORM(alpha,
standard_deviation, size).
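
The critical values quoted above can be checked outside Excel. The sketch below uses `statistics.NormalDist().inv_cdf` from Python's standard library as a stand-in for NORM.S.INV; the helper name `z_critical` is made up for illustration.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Standard normal value with an upper tail area of alpha / 2."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    print(f"{level:.0%} confidence: z = {z_critical(alpha):.3f}")
```

This prints 1.645, 1.960, and 2.576 for the 90%, 95%, and 99% levels.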

Example 6.8: Computing a Confidence Interval with a Known Standard Deviation

• A production process fills bottles of liquid detergent. The standard deviation in filling
volumes is constant at 15 mL. A sample of 25 bottles revealed a mean filling volume of 796
mL.

• A 95% confidence interval estimate of the mean filling volume for the population is

796 ± 1.96 (15 / √25) = 796 ± 5.88, or [790.12, 801.88]

Excel Workbook for Confidence Intervals

• The worksheet Population Mean Sigma Known in the Excel workbook Confidence Intervals
computes this interval using the CONFIDENCE.NORM function.
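
The interval for the detergent-filling data in Example 6.8 (mean 796, σ = 15, n = 25) can also be computed in Python. The sketch below is not the workbook's implementation; `confidence_norm` is a hypothetical helper written to take the same three arguments as CONFIDENCE.NORM.

```python
import math
from statistics import NormalDist

def confidence_norm(alpha, standard_deviation, size):
    """Margin of error: z(alpha/2) times the standard error of the mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_deviation / math.sqrt(size)

sample_mean, sigma, n = 796, 15, 25
margin = confidence_norm(0.05, sigma, n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"{sample_mean} ± {margin:.2f} -> [{lower:.2f}, {upper:.2f}]")
```

The margin of error comes out at about 5.88, giving an interval of roughly [790.12, 801.88].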

Confidence Interval Properties

• As the level of confidence, 1 − α, decreases, zα/2 decreases, and the confidence interval
becomes narrower.

◦ For example, a 90% confidence interval will be narrower than a 95% confidence
interval. Similarly, a 99% confidence interval will be wider than a 95% confidence
interval.

• Essentially, you must trade off a narrower (more precise) interval against the risk that the confidence
interval does not contain the true mean.

◦ To reduce the risk, you should consider increasing the sample size.
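
The trade-off can be seen numerically. The sketch below assumes a known standard deviation of 15 and a few illustrative sample sizes; `interval_width` is a made-up helper name. It computes the full width of the interval at several confidence levels and sample sizes.

```python
import math
from statistics import NormalDist

def interval_width(level, sigma, n):
    """Full width (2 x margin of error) of a z-based interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / math.sqrt(n)

sigma = 15  # assumed known population standard deviation

# Fixed n = 25: a higher confidence level gives a wider interval
widths_by_level = {
    level: interval_width(level, sigma, 25) for level in (0.90, 0.95, 0.99)
}

# Fixed 95% level: a larger sample gives a narrower interval
widths_by_n = {n: interval_width(0.95, sigma, n) for n in (25, 100, 400)}

print(widths_by_level)
print(widths_by_n)
```

Because the width is proportional to 1 / √n, quadrupling the sample size halves the width at a fixed confidence level.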

The t-Distribution

• The t-distribution is a family of probability distributions with a shape similar to the standard
normal distribution. Different t-distributions are distinguished by an additional parameter,
degrees of freedom (df).

◦ As the number of degrees of freedom increases, the t-distribution converges to the
standard normal distribution.

Confidence Interval for the Mean with Unknown Population Standard Deviation

• Sample mean ± tα/2 (s / √n)

where s is the sample standard deviation and tα/2 is the value of the t-distribution with
df = n − 1 for an upper tail area of α/2.

• t values are found in Table 2 of Appendix A or with the Excel function T.INV(1 – α/2, n – 1).

• The Excel function =CONFIDENCE.T(alpha, standard_deviation, size)
can be used to compute the margin of error.
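
Python's standard library has no t-distribution, so the sketch below approximates the left-tailed inverse numerically (Simpson's rule for the cumulative probability, bisection for the inverse). It is a rough stand-in for T.INV, not how Excel computes it. The helper names and the sample summary (n = 25, mean 796, sample standard deviation 15) are assumptions for illustration.

```python
import math

def t_cdf(t, df, steps=2000):
    """P(T <= t) for a t-distribution, via Simpson's rule and symmetry."""
    const = math.exp(
        math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    ) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3          # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_inv(p, df):
    """Approximate left-tailed inverse, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical sample summary: n = 25, mean = 796, sample std dev = 15
n, sample_mean, s = 25, 796.0, 15.0
t_crit = t_inv(1 - 0.05 / 2, n - 1)      # df = n - 1 = 24
margin = t_crit * s / math.sqrt(n)

print(f"t = {t_crit:.4f}, interval = "
      f"[{sample_mean - margin:.2f}, {sample_mean + margin:.2f}]")
```

With 24 degrees of freedom the critical value is about 2.064 rather than 1.96, so the t-based interval is somewhat wider than the z-based interval for the same numbers.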

Confidence Interval for a Proportion

• An unbiased estimator of a population proportion π (this is not the number pi = 3.14159 …) is
the statistic p̂ = x / n (the sample proportion), where x is the number in the sample having
the desired characteristic and n is the sample size.

• A 100(1 – α)% confidence interval for the proportion is

p̂ ± zα/2 √( p̂(1 − p̂) / n )
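
A minimal sketch of the proportion interval follows, assuming the usual large-sample form p̂ ± z(α/2) × √(p̂(1 − p̂)/n). The survey counts (560 of 1,000 respondents) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, alpha):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n                                  # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 560 of 1,000 respondents have the characteristic
lower, upper = proportion_interval(x=560, n=1000, alpha=0.05)
print(f"[{lower:.4f}, {upper:.4f}]")
```

For these made-up counts the 95% interval is roughly [0.529, 0.591], a margin of error of about three percentage points.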

Prediction Intervals

• A prediction interval is one that provides a range for predicting the value of a new
observation from the same population.

◦ A confidence interval is associated with the sampling distribution of a statistic, but a
prediction interval is associated with the distribution of the random variable itself.

• A 100(1 – α)% prediction interval for a new observation is

Sample mean ± tα/2 × s × √(1 + 1/n)

where s is the sample standard deviation and tα/2 has df = n − 1.


Confidence Intervals and Sample Size

• We can determine the appropriate sample size needed to estimate the population
parameter within a specified level of precision (± E).

• Sample size for the mean:

n ≥ (zα/2)² σ² / E²

• Sample size for the proportion:

n ≥ (zα/2)² π(1 − π) / E²

◦ Use the sample proportion from a preliminary sample as an estimate of the population proportion π, or set π =
0.5 for a conservative estimate to guarantee the required precision.
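
The sample-size calculations can be sketched as follows, assuming the standard formulas n ≥ z² σ² / E² for the mean and n ≥ z² π(1 − π) / E² for the proportion, rounded up to a whole observation. The function names and the inputs (σ = 15 with E = 3, and E = 0.03 for a proportion) are illustrative choices.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, E, alpha):
    """Smallest n giving a margin of error of at most E for the mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * sigma ** 2 / E ** 2)

def sample_size_proportion(E, alpha, pi=0.5):
    """Smallest n giving a margin of error of at most E for a proportion.

    pi = 0.5 is the conservative choice because pi * (1 - pi) is largest there.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * pi * (1 - pi) / E ** 2)

n_mean = sample_size_mean(sigma=15, E=3, alpha=0.05)
n_prop = sample_size_proportion(E=0.03, alpha=0.05)
print(n_mean, n_prop)   # 97 1068
```

A preliminary estimate of π other than 0.5 can only lower the required sample size, which is why 0.5 guarantees the stated precision.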
