
Ch.3-Estimation module

The document covers the concepts of statistical estimation, focusing on point and interval estimation for population means. It explains how to compute and interpret confidence intervals using sample statistics and discusses the properties of good estimators. The document also includes examples and exercises to illustrate the application of these concepts in real-world scenarios.

Uploaded by

Nahom
Copyright
© © All Rights Reserved

Topic 27: Statistical Estimation: Concepts of Estimation

Topic Learning Objectives:

By the end of this session students are expected to:

• Explain the concept of statistical estimation;
• Compute and interpret the point estimate of a population mean;
• Compute and interpret a confidence interval for a population mean.

Topic outline
1. Concepts of statistical estimation
2. Synopsis
3. Wrap up discussion questions
4. Next session’s assignment

Reading Assignment Discussion:

• How is a confidence interval developed for a population mean using the normal distribution?

Reading Text:

The sampling process is used to draw statistical inferences about the characteristics of a
population or process of interest. We often lack the information needed to calculate the
exact value of a population parameter (such as μ, σ and P), so we estimate it from the
corresponding sample statistic (such as x̄, s and p). Using sample statistics to draw
conclusions about population characteristics is one of the fundamental applications of
statistical inference in business and economics. For instance, statistical estimation could
be used in the following cases:
• A bank needs to know the proportion of consumers who are aware of its services and
credit schemes.
• A service centre needs to determine the average amount of time a customer spends in a
queue.
In all such cases, a decision-maker relies on estimation and hypothesis testing, the two
tools for drawing statistical inferences about unknown population or process parameters
from random samples. In this section we discuss methods to estimate unknown population
parameters and to determine the range of values (confidence interval) likely to contain the
parameter value. Estimation is the procedure of assigning a numerical value to a population
parameter based on information from a random sample. There are two types of estimates that
we can make about a population: a point estimate and an interval estimate. Point estimation
is a statistical procedure in which a single value is used to estimate an unknown population
parameter. A point estimate is that single number, computed from random sample data.

For example, suppose that in a sample of 5,000 households the mean housing expenditure per
month is Birr 450. Using x̄ as a point estimate of µ, we can state that the mean housing
expenditure per month µ for all households is about Birr 450. This is point estimation.
Alternatively, instead of saying that the mean housing expenditure per month for all
households is Birr 450, we can obtain an interval by subtracting a number from, and adding
the same number to, Birr 450, and then state that this interval is likely to contain the
population mean µ. For purposes of illustration, subtracting and adding Birr 50 gives the
interval Birr 400 to Birr 500. This procedure is called interval estimation. Birr 400 is the
lower limit of the interval, Birr 500 is the upper limit, and the interval itself is the
interval estimate.

A drawback of a point estimate compared with an interval estimate is that it is a single
value taken from a sampling distribution (not a range of values), and the unknown parameter
may lie above or below it. It also conveys little information about the accuracy of the
estimate: it does not tell us how confident we can be that the estimate is close to the
parameter it is estimating. In contrast, interval estimation gives the estimate as a range
and specifies the level of confidence in its reliability. When we make an estimate of a
population parameter, we use a sample statistic. This sample statistic is an estimator;
refer to Table 27.1.

Table 27.1 Estimator and Estimate

Population parameter         Point estimator (sample statistic)
Mean, µ                      $\bar{x} = \sum X_i / n$
Variance, σ²                 $s^2 = \sum (X_i - \bar{x})^2 / (n - 1)$
Standard deviation, σ        $s = \sqrt{s^2}$
Proportion, π or P           $p = X / n$

A good estimator should be highly reliable and have the following desirable properties:

1. Unbiasedness: the expected value of the estimator equals the population parameter. For
example, the mean of the sampling distribution of sample means equals the population mean,
E(x̄) = µ, so E(x̄) − µ = 0. A biased estimator deviates systematically from the parameter of
interest.
2. Efficiency: measured by the size of the standard error of the statistic. An efficient
estimator has a value that remains stable from sample to sample. The standard error of x̄
(σ/√n) is smaller than the population standard deviation for n > 1 and decreases as n
increases, so x̄ is an efficient estimator of µ.
3. Consistency: as n increases, the standard error decreases and the probability that the
estimator is close to the parameter it estimates increases.
4. Sufficiency: the estimator uses all the information about the population parameter that
is available in the sample. For example, the mean uses every observation, unlike the mode
and the median.
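
The link between sample size and standard error described under efficiency and consistency can be checked numerically. The following is a minimal sketch; the population standard deviation of 8 and the sample sizes are illustrative values chosen here, not taken from a data set.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 8.0  # assumed population standard deviation (illustrative)
for n in (25, 100, 400):
    # Each fourfold increase in n halves the standard error
    print(n, round(standard_error(sigma, n), 3))
```

Quadrupling the sample size halves the standard error, which is why estimates from larger samples vary less from sample to sample.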

To overcome the drawback of point estimation, interval estimation is used.

Interval estimation: a statistical procedure in which we find a random interval with a
specified probability of containing the parameter being estimated.

Interval estimate: the range of values within which the population parameter is expected
to lie (it has upper and lower bounds).

Confidence interval: a range of values constructed from sample data so that the parameter
lies within that range with a specified probability. The specified probability is called the
level of confidence and is denoted by 1 − α. The confidence level in decimal form is called
the confidence coefficient. Common values are 90%, 95% and 99%; the corresponding
confidence coefficients are 0.90, 0.95 and 0.99. α is called the significance level.

Confidence level (1 − α): the probability specifying the level of accuracy of the interval
estimate; it indicates the degree of sureness that the interval estimate contains the
parameter being estimated. Although any confidence level can be chosen, 90%, 95% and 99%
are the most popular.

Significance level (α): the probability specifying the degree of error in interval
estimates. It is the probability that the parameter lies outside the interval estimate being
developed. For the above confidence levels the significance levels are 10%, 5% and 1%
respectively.
A 95% confidence interval means that about 95% of similarly constructed intervals are
expected to contain the parameter being estimated. Equivalently, 95% of the sample means for
a specified sample size lie within 1.96 standard errors (1.96 × σ/√n) of the population
mean µ.
N.B.: To find the Z value for a given confidence level, we divide the confidence level by 2,
which gives the area on each side of the mean of the standard normal curve. For example,
0.95/2 = 0.4750, and the Z value corresponding to an area of 0.4750 in the Z-table is 1.96.
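
The table lookup described in the note can also be done in software. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is introduced here for illustration only.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical Z value for a confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # inv_cdf expects the cumulative area to the left of Z, i.e. 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The results agree with the usual Z-table values (about 1.645, 1.96 and 2.576), allowing for the two-decimal rounding of printed tables.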

The confidence interval estimate of a population parameter is obtained by applying the
formula: Point estimate ± Margin of error (E)
where Margin of error = Zc × Standard error of the statistic, and
Zc = the critical value of the standard normal variable for the chosen confidence level
(probability of being correct), such as 0.90 or 0.95. The margin of error is the number that
is subtracted from and added to a point estimate to obtain an interval estimate. It depends
on the standard deviation, the sample size and the confidence level.

Figure 27.1 The area for 95% Confidence Level (standard normal curve: the central area
1 − α = 0.95 lies between Z = −1.96 and Z = 1.96, with 0.4750 on each side of 0 and α/2 in
each tail)

Synopsis

• There are two kinds of estimation: point estimation and interval estimation.
  o Unbiasedness, efficiency, consistency, and sufficiency are properties of a good
    estimator.
• The confidence interval estimate of a population parameter is obtained by applying the
formula: Point estimate ± Margin of error (E)
• where Margin of error = Zc × Standard error of the statistic
• Zc = critical value of the standard normal variable for the chosen confidence level
(probability of being correct), such as 0.90, 0.95, and so on.

Wrap up Discussion Questions:

• Distinguish between point estimation and interval estimation; how is the latter better
than the former?
• What are the properties of a good estimator? Explain.
• How is the confidence interval estimate of a population parameter developed?
• What is the margin of error?
• What is the standard error?

Next Session’s Assignment:

• Read about computing and interpreting a confidence interval for a population mean using
the normal distribution

Topic 28: Statistical Estimation: Confidence Interval for the Population Mean Using Normal Distribution

Topic Learning Objectives:

By the end of this session students are expected to:

• Compute and interpret a confidence interval for a population mean using the normal
distribution

Topic outline

1. Point estimation for population mean
2. Developing confidence interval for the population mean using normal distribution
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:

• How is a confidence interval developed for a population mean using the normal
distribution?

Reading Text:
The use of the normal probability distribution to construct the confidence interval for
the population mean is warranted: (1) whenever n≥30, because of the central limit
theorem, (if population standard deviation σ is unknown, it can be approximated by
sample standard deviation s), or (2) when n< 30 but the population is normally
distributed and population standard deviation σ is known.
Accordingly, the confidence interval for the population mean at a given confidence level
(1 − α) is computed as follows:

$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \bar{x} \pm Z\frac{S}{\sqrt{n}}$$

where the standard error of the sampling distribution of sample means is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad\text{or}\quad S_{\bar{x}} = \frac{S}{\sqrt{n}}$$

and the margin of error is

$$E = Z\sigma_{\bar{x}} = Z\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad E = Z S_{\bar{x}} = Z\frac{S}{\sqrt{n}}$$

Here n = sample size; x̄ = sample mean; Z is based on the confidence level (divide the given
confidence level by two and read the corresponding Z value from the Z-table); σ =
population standard deviation; and S = sample standard deviation.

Note: as the sample size increases the standard error decreases. When sampling from
the same population, using a fixed sample size, the higher the confidence level, the
wider the confidence interval.
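
The formula and the note on confidence level and interval width can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed method: the function name `z_interval` is hypothetical, and the inputs (mean 27.2, standard deviation 8, n = 100) are those of the first problem in Example 28.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, std_dev, n, confidence_level=0.95):
    """Confidence interval for a mean: sample_mean ± Z * std_dev / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * std_dev / math.sqrt(n)  # margin of error E
    return sample_mean - margin, sample_mean + margin

low95, high95 = z_interval(27.2, 8, 100, 0.95)
low99, high99 = z_interval(27.2, 8, 100, 0.99)
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

With the sample size held fixed, the 99% interval comes out wider than the 95% interval, as the note states.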

Example 28
1. The sponsor of a TV program targeted at the children's market (age 4-10) wants
to find out the average amount of time children spend watching TV. A random
sample of 100 children indicated the average time spent watching TV per week
to be 27.2 hours. From previous experience, the population standard deviation
of weekly TV watching time is known to be 8 hours. A confidence level of 95% is
adequate.
a. What is the population mean of weekly TV watching time for children?
b. What is the best estimate of the population mean? What is this value
called?
c. Develop a 95% confidence interval for the population mean of weekly TV
watching time.
d. Interpret the confidence interval.

Solution:

Given: x̄ = 27.2 hours, σ = 8 hours, n = 100, (1 − α) = 95% = 0.95, $Z_{0.95/2} = Z_{0.4750} = 1.96$

a. Unknown. This is the value we wish to estimate.
b. The sample mean x̄ = 27.2 hours is the best estimate of the population mean weekly
TV watching time. It is called the point estimate.
c. $$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} = 27.2 \pm 1.96 \times \frac{8}{\sqrt{100}} = 27.2 \pm 1.96 \times 0.8 = 27.2 \pm 1.568$$
i.e. 25.632 ≤ µ ≤ 28.768
d. We can conclude with 95% confidence that children spend, on average, between 25.6
and 28.8 hours per week watching television. In other words, about 95% of the intervals
constructed in this way from all possible samples of 100 children would include the
population mean. However, about 5% of the time our conclusion would be wrong: in about
2.5% of samples the population mean would fall below the lower limit of the interval and
in about 2.5% above its upper limit.
2. It is desired to estimate the average age of students who graduate with an MBA
degree in the university system. A random sample of 64 graduating students
showed that the average age was 27 years with a standard deviation of 4 years.
a. Construct a 95% confidence interval estimate of the true average
(population mean) age of all such graduating students at the university.
b. How would the confidence interval limits change if the confidence level
were increased from 95% to 99%?

Solution: Given: x̄ = 27 years, S = 4 years, n = 64
For (1 − α) = 95% = 0.95, $Z_{0.95/2} = Z_{0.4750} = 1.96$
For (1 − α) = 99% = 0.99, $Z_{0.99/2} = Z_{0.4950} = 2.58$
Use $\bar{x} \pm Z\frac{S}{\sqrt{n}}$

a. 27 ± 1.96 × 4/√64 = 27 ± 1.96 × 0.5 = 27 ± 0.98, i.e. 26.02 ≤ µ ≤ 27.98
b. 27 ± 2.58 × 0.5 = 27 ± 1.29, i.e. 25.71 ≤ µ ≤ 28.29. The interval becomes wider.
3. In a certain small city, to estimate the mean monthly expenditure for food, a
random sample of 25 households was selected, yielding a mean of 200 Birr. From
experience, it is known that such expenditures are normally distributed with a
standard deviation of 50 Birr.
a. What is the point estimate of the mean monthly expenditure for food of
all households in the city?
b. Find a 98% confidence interval for the mean monthly expenditure for food
of all households in the city.

Solution: Given: x̄ = 200 Birr, σ = 50 Birr, n = 25, (1 − α) = 98% = 0.98, $Z_{0.98/2} = Z_{0.4900} = 2.33$
The population is normally distributed.
a. x̄ = 200 Birr
b. 200 ± 2.33 × 50/√25 = 200 ± 2.33 × 10 = 200 ± 23.3, i.e. 176.7 ≤ µ ≤ 223.3
We are 98% confident that the true mean monthly expenditure for food is
between 176.7 and 223.3 Birr.
Exercise 28
1. The X-burger is a franchise fast-food restaurant located in the city specializing in
half-pound hamburgers, fish sandwiches, and chicken sandwiches. Soft drinks
and French fries also are available. The Marketing Department of X-burger
reports that the distribution of daily sales for their restaurants follows the normal
distribution and that the population standard deviation is $3,000. A sample of 40
franchises showed the mean daily sales to be $20,000.
a. What is the population mean of daily sales for X-burger franchises?
b. What is the best estimate of the population mean? What is this value called?
c. Develop a 95% confidence interval for the population mean of daily sales.
d. Interpret the confidence interval.
2. A machine produces components, which have a standard deviation of 1.6 cm in
length. A random sample of 64 parts is selected from the output and this sample
has a mean length of 90 cm. The customer will reject the part if it is either less
than 88cm or more than 92cm. Does the 95% confidence interval for the true
mean length of all the components produced ensure acceptance by the
customer?
3. Suppose that you wish to estimate the mean sales amount per retail outlet for a
particular consumer product during the past year. The number of retail outlets is
large. Determine the 95 percent confidence interval given that the sales amounts
are assumed to be normally distributed, the sample mean was $3425 based on
random sample of 25; and the population standard deviation is $200.

Synopsis

• The confidence interval for the population mean µ at a given confidence level
(1 − α) is computed as follows:
  o $\bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$ or $\bar{x} \pm Z\frac{S}{\sqrt{n}}$
• Given the following assumptions:
  o whenever n ≥ 30, because of the central limit theorem (if the population
    standard deviation σ is unknown, it can be approximated by the sample
    standard deviation S), or
  o when n < 30 but the population is normally distributed and the population
    standard deviation σ is known.

Wrap up Discussion Questions:

• How is a confidence interval developed for µ based on large samples?
• What are the conditions under which the normal distribution is used to
construct the interval estimate for a population mean?

Next Session’s Assignment:

• Attempt Exercise 28, #1-3
• Read about developing a confidence interval for P using the normal probability
distribution

Topic 29: Statistical Estimation: Confidence Interval for the Population Proportion Using Normal Distribution

Topic Learning Objectives:

By the end of this session students are expected to:

• Compute and interpret a confidence interval for a population proportion using
the normal distribution

Topic outline

1. Developing confidence interval for the population proportion using normal distribution
2. Synopsis
3. Wrap up discussion questions
4. Next session’s assignment

Reading Assignment Discussion:

• How is a confidence interval developed for a population proportion using the
normal distribution?

Reading Text:

The estimator of the population proportion P is the sample proportion p. If the sample
size is large, p has an approximately normal sampling distribution. For estimating P, as
a rule of thumb, a sample is considered large enough when both n*p and n*q are
greater than 5, where q = 1 − p. The mean of the sampling distribution of p is the
population proportion P, and the standard error (the standard deviation of the sampling
distribution of p) is

$$\sigma_p = \sqrt{\frac{P(1-P)}{n}}$$

Since the standard deviation of the estimator depends on the unknown population
parameter P, its value is also unknown to us. It turns out, however, that for large samples
we may use our actual estimate p instead of the unknown parameter P in the formula
for the standard deviation. The (1 − α) confidence interval for the population proportion
P is given by

$$p \pm z\sqrt{\frac{p(1-p)}{n}}$$

where p is the sample proportion, z is based on the confidence level, and n is the
sample size.
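
As a rough illustration of the procedure above, the following Python sketch checks the rule of thumb and then applies the interval formula. The function name `proportion_interval` is hypothetical; the inputs (34 users out of 100 consumers) are those of Example 29.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence_level=0.95):
    """Large-sample confidence interval for a proportion: p ± z * sqrt(p*q/n)."""
    p = successes / n
    q = 1 - p
    # Rule of thumb from the text: both n*p and n*q should exceed 5
    if n * p <= 5 or n * q <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin

low, high = proportion_interval(34, 100)
print(round(low, 4), round(high, 4))
```

The limits differ from a hand calculation only in the last decimal places, because the code uses an unrounded z value rather than 1.96.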

Example 29

A market research firm wants to estimate the share that foreign companies have in the
U.S. market for certain products. A random sample of 100 consumers is obtained, and
34 people in the sample are found to be users of foreign-made products; the rest are
users of domestic products. Give a 95% confidence interval for the share of foreign
products in this market.
We have x = 34 and n = 100, so p = 34/100 = 0.34

$$p \pm z\sqrt{\frac{p(1-p)}{n}} = 0.34 \pm 1.96\sqrt{\frac{0.34(1-0.34)}{100}} = 0.34 \pm 1.96(0.04737) = 0.34 \pm 0.0928$$

i.e. 0.2472 ≤ P ≤ 0.4328

Exercise 29

1. In a health survey involving a random sample of 75 patients who developed a
particular illness, 70% of them are cured of this illness by a new drug. Establish
the 95% confidence interval for the population proportion of all the patients who
will be cured by the new drug. This would help a pharmaceutical company assess the
market potential for this new drug.
2. Suppose 1600 of 2000 union members sampled said they plan to vote for the
proposal to merge with a national union. Union bylaws state that at least 75% of
all members must approve for the merger to be enacted. Using the 0.95 degree
of confidence, what is the interval estimate for the population proportion?

Synopsis

• For large samples (np and nq > 5), the confidence interval for the population
proportion P is found by: $p \pm z\sqrt{\frac{p(1-p)}{n}}$

Wrap up Discussion Questions:

• How is a confidence interval developed for P based on large samples?

Next Session’s Assignment:

• Attempt Exercise 29, #1 & 2
• Read about developing a confidence interval for µ using the t-distribution

Topic 30: Statistical Estimation: Confidence Interval for the Population Mean Using t-distribution

Topic Learning Objectives:

By the end of this session students are expected to:

• Describe the characteristics of the t-distribution;
• Develop a confidence interval for a population mean using the t-distribution

Topic outline
1. Characteristics of t-distribution
2. Developing confidence interval for the population mean using t-distribution
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:

• List the features of the t-distribution
• When and how is it used to develop a confidence interval for a population
mean?

Reading Text:

So far, it was indicated that use of the normal distribution in estimating a population
mean is warranted for any large sample (n ≥ 30), and for a small sample (n < 30) only if
the population is normally distributed and the population standard deviation σ is known.
However, when the sample is small (n < 30) and the population is normal or
approximately normal, but σ is not known, we cannot use the normal distribution for
determining confidence intervals for the unknown population mean, but we can use the
t-distribution.
Note that when σ is approximated by the sample standard deviation (S), the standard error
($S_{\bar{x}}$, i.e. $S/\sqrt{n}$) will differ somewhat from sample to sample because of the
variability of S. As a result, when S is used in the Z conversion formula for small samples,
the converted values are not distributed as Z values. Instead, they are distributed
according to the t distribution. This distribution was developed by William S. Gosset in
1908. Gosset worked at the Guinness Brewery in Ireland and published a paper about the
t-distribution under the pen name ‘Student’. The t distribution shares many characteristics
with the z distribution.

Characteristics of t-distribution:

• It is a continuous distribution.
• It is bell shaped and symmetrical.
• There is no single t distribution, but rather a family of t-distributions. All have the
same mean of 0, but the standard deviation varies with the sample size; thus, a
different t distribution exists for each sample size (refer to Fig. 30.1).
• It is more spread out (flatter) and wider than the z distribution, thus:
  o Its standard deviation is greater than that of z, i.e. greater than one. The
    variance of the t-distribution is df/(df − 2) for df > 2, where df stands for
    degrees of freedom.
  o The value of t for a given level of confidence is larger in magnitude than
    the corresponding z value.
• It approaches the z distribution as the sample size n increases (refer to Fig. 30.1). In
other words, the t-distribution is approximately normal for n ≥ 30.
• The t-distribution is defined by the degrees of freedom (df), equal to n − 1, which is
its only parameter. The degrees of freedom are the number of items in a sample that are
free to vary. To illustrate the meaning of degrees of freedom: assume that the mean of
four numbers is known to be 5. The four numbers are 7, 4, 1, and 8. The deviations of
these numbers from the mean must total 0. The deviations of +2, −1, −4, and +3 do total 0.
If the deviations of +2, −1, and −4 are known, then the value of +3 is fixed (restricted)
in order to satisfy the condition that the sum of the deviations must equal 0. Thus, 1
degree of freedom is lost in a sampling problem involving the standard deviation of the
sample because one number (the arithmetic mean) is known.
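
The degrees-of-freedom illustration with the four numbers 7, 4, 1 and 8 can be reproduced directly. The short sketch below only restates that arithmetic; the variable names are chosen here for illustration.

```python
values = [7, 4, 1, 8]
mean = sum(values) / len(values)         # known mean of the four numbers
deviations = [v - mean for v in values]  # deviations from the mean

# Once the first n - 1 deviations are known, the last one is no longer free:
# it must make the deviations sum to zero.
known = deviations[:-1]
last = -sum(known)
print(deviations, last)
```

Only three of the four deviations can vary freely, which is the sense in which df = n − 1.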

Figure 30.1 t-distribution with Different Degrees of Freedom

Computing t values:

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$

where x̄ is the sample mean of n measurements, µ is the population mean, and S is the
sample standard deviation.

When the population standard deviation (σ) is not known and n < 30, we cannot simply
substitute S for σ and use the z distribution. In such cases, the t-distribution is
used to construct a confidence interval for the population mean µ using the
following formula:

The (1 − α) confidence interval for µ:

$$\bar{X} \pm t_{df} S_{\bar{X}} = \bar{X} \pm t_{df}\frac{S}{\sqrt{n}}$$

where $S_{\bar{X}}$ = standard error of the sampling distribution of sample means; X̄ =
sample mean; n = sample size; S = sample standard deviation. The value of t is obtained
from the t-distribution table for n − 1 degrees of freedom and the given confidence level.
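
When a t-table is not at hand, the critical value can be approximated numerically. The sketch below is one possible approach using only Python's standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. All function names are hypothetical, and the inputs to `t_interval` (mean 4000, S = 200, n = 10) are those of Example 30. Where SciPy is available, `scipy.stats.t.ppf` gives the same quantile directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for very large arguments, so keep df below about 340
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t > 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, confidence_level):
    """Two-sided critical t value, found by bisection on the CDF."""
    target = 1 - (1 - confidence_level) / 2
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def t_interval(sample_mean, s, n, confidence_level=0.95):
    """Confidence interval for a mean: sample_mean ± t(df = n - 1) * s / sqrt(n)."""
    margin = t_critical(n - 1, confidence_level) * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

t9 = t_critical(9, 0.95)
low, high = t_interval(4000, 200, 10)
print(round(t9, 3), round(low, 2), round(high, 2))
```

For df = 9 the computed critical value matches the table value of 2.262 to three decimals, and it is larger than the corresponding z value of 1.96, in line with the characteristics listed earlier.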

Example 30

The mean operating life for a random sample of n = 10 light bulbs is X̄ = 4,000 hours,
with the sample standard deviation S = 200 hours. The operating life of bulbs in general
is assumed to be approximately normally distributed.
Required: estimate the mean operating life for the population of bulbs from which this
sample was taken, using a 95 percent confidence interval.
Solution:
Given: n = 10; X̄ (a point estimate) = 4000; S = 200; confidence level = 95% or
0.95; α = 0.05; degrees of freedom = n − 1 = 10 − 1 = 9; area in each tail = 0.5 − (0.95/2) =
0.5 − 0.4750 = 0.025

From the t distribution table, the value of t for df = 9 and 0.025 area in the right tail
(an area of 0.05 in the two tails combined) is 2.262.

The 95% confidence interval for µ:

$$\bar{X} \pm t_{df}\frac{S}{\sqrt{n}} = 4000 \pm 2.262 \times \frac{200}{\sqrt{10}} = 4000 \pm 2.262 \times 63.25 = 4000 \pm 143.07$$

i.e. 3,856.93 ≤ µ ≤ 4,143.07
Thus, we can state with 95% confidence that the mean operating life for all bulbs lies
approximately between 3857 hours and 4143 hours.
Sometimes you might be provided with the raw data for the sample (10 bulbs). In that
case, you are first required to calculate the sample mean and the sample standard
deviation using

$$\bar{x} = \frac{\sum x_i}{n} \quad\text{and}\quad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

respectively.
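
A brief sketch of this first step, starting from raw data, is given below. The ten operating lives are invented purely for illustration and do not come from the example; `statistics.stdev` uses the n − 1 divisor, which is checked against the formula written out by hand.

```python
from statistics import mean, stdev

# Hypothetical operating lives (hours) of 10 bulbs, invented for illustration
lives = [3800, 4100, 3900, 4200, 4000, 3700, 4300, 4000, 3900, 4100]

x_bar = mean(lives)  # sum of x_i divided by n
s = stdev(lives)     # sample standard deviation with the n - 1 divisor

# The same standard deviation written out from the formula
manual_s = (sum((x - x_bar) ** 2 for x in lives) / (len(lives) - 1)) ** 0.5
print(x_bar, round(s, 2), round(manual_s, 2))
```

The resulting x̄ and s are then used in the t-based interval formula exactly as in the worked example.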

Exercise 30

1. A doctor wanted to estimate the mean cholesterol level for all adult males. He
took a sample of 25 adult males and found that the mean cholesterol level for
this sample is 186 with a standard deviation of 12. Assume that the cholesterol
levels for all adult males are (approximately) normally distributed. Construct a
95% confidence interval for the population mean µ.
2. The high cost of health care is a matter of major concern for a large number of
families. A random sample of 25 families selected from an area showed that they
spend an average of Birr 30 per month on health care with a standard deviation
of Birr 10. Make a 98% confidence interval for the mean health care expenditure
per month incurred by all families in this area. Assume that the monthly health
care expenditures of all families in this area have a normal distribution.

Synopsis

• To determine the confidence limits when the population is normal or approximately
normal, the population standard deviation is unknown, and n < 30, we use the t
distribution. The formula is $\bar{X} \pm t_{df}\frac{S}{\sqrt{n}}$
• The major characteristics of the t distribution are:
  o It is a continuous distribution.
  o It is mound-shaped and symmetrical.
  o It is flatter, or more spread out, than the standard normal distribution.
  o There is a family of t distributions, depending on the number of degrees of
    freedom.

Table 30.1 Interval Estimation of the Population mean

Wrap up Discussion Questions:

• Briefly explain the similarities and the differences between the standard
normal distribution and the t distribution.
• What are the parameters of a normal distribution and a t-distribution?
• Briefly explain the meaning of the degrees of freedom for a t distribution.
• What assumptions must hold true to use the t distribution to make a
confidence interval for µ?

Next Session’s Assignment:

• Attempt Exercise 30, #1 & 2
• Read about developing confidence intervals for finite populations

Topic 31: Statistical Estimation: Confidence Interval for Finite Populations

Topic Learning Objectives:

By the end of this session students are expected to:

• Adjust a confidence interval for finite populations

Topic outline

1. Constructing confidence interval for finite populations
2. Synopsis
3. Wrap up discussion questions
4. Next session’s assignment

Reading Assignment Discussion:

• How is a confidence interval for finite populations adjusted?

Reading Text:
The populations sampled so far have been very large or infinite. What if the sampled
population is not very large? Then the way the standard error of the sample means and the
standard error of the sample proportions are computed must be adjusted. Thus, for a finite
population, where the total number of objects or individuals is N and the number in the
sample is n, we need to adjust the standard errors in the confidence interval formulas for
the population mean and proportion. This adjustment is called the finite-population
correction factor (FPC). It is needed when sampling is done without replacement from a
small population and the sample constitutes more than 5% of the population (n/N > 0.05).

$$FPC = \sqrt{\frac{N-n}{N-1}}$$

Multiplying the standard error by this correction factor reduces the standard error.
Logically, if the sample is a substantial percentage of the population, the estimate
of the population parameter is more precise. As N becomes larger relative to n,
n/N becomes small and the FPC approaches unity. If n/N ≤ 0.05, in other words if the
sample size is not more than 5% of the population size, the FPC may be omitted.

Accordingly, the confidence interval for the mean of a finite population with
unknown population standard deviation is:

$$\bar{X} \pm t_{df}\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

For a proportion in similar cases, the confidence interval is:

$$p \pm z\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}$$
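
The correction can be sketched in Python as follows. The function names are hypothetical, the 5% threshold is the rule stated above, and the inputs (15 of 40 sampled families from a population of 250) are those of the second problem in Example 31.

```python
import math
from statistics import NormalDist

def fpc(N, n):
    """Finite-population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def finite_proportion_interval(x, n, N, confidence_level=0.95):
    """Proportion interval, corrected when the sample exceeds 5% of the population."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p * (1 - p) / n)
    if n / N > 0.05:
        se *= fpc(N, n)  # the correction shrinks the standard error
    margin = z * se
    return p - margin, p + margin

factor = fpc(250, 40)
low, high = finite_proportion_interval(15, 40, 250)
print(round(factor, 4), round(low, 3), round(high, 3))
```

Because the factor is below 1 whenever n > 1, the corrected interval is narrower than the uncorrected one, and the effect fades as N grows relative to n.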

Example 31

1. Suppose, 250 families reside around Unity University; and a random sample of 40 of
these families revealed their mean annual community contribution was $450 and the
standard deviation of this was $75.
a. What is the population mean? What is the best estimate of the population mean
annual contribution?
b. Develop a 90% confidence interval for the population mean. What are the
endpoints of the confidence interval?
c. Using the confidence interval, explain why the population mean could be $445.
Could the population mean be $425? Why?
Solution:
a. We do not know the population mean. This is the value we wish to estimate. The
best estimate we have of the population mean is the sample mean, which is
$450.
b. Given: X̄ = $450, S = $75, N = 250, n = 40, df = 40 − 1 = 39, 1 − α = 0.90, t39 = 1.685

$$\bar{X} \pm t_{df}\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 450 \pm 1.685 \times \frac{75}{\sqrt{40}} \times \sqrt{\frac{250-40}{250-1}} = 450 \pm 19.98 \times \sqrt{0.8434} = 450 \pm 19.98 \times 0.9184 = 450 \pm 18.35$$

Thus, the endpoints of the confidence interval are $431.65 and $468.35.

c. A population mean of $445 is plausible because it lies within the confidence interval;
$425 is not likely because it lies outside the interval.
2. The same study on community contributions, in the above case, revealed that 15 of
the 40 families sampled participate in community wide green initiatives regularly.
Construct the 95% confidence interval for the proportion of families participating in
community wide green initiatives regularly.
Given: N = 250, n = 40, p = 15/40 = 0.375

$$p \pm z\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}} = 0.375 \pm 1.96 \times \sqrt{\frac{0.375(1-0.375)}{40}} \times \sqrt{\frac{250-40}{250-1}}$$

= 0.375 ± 1.96(0.0765)(0.9184) = 0.375 ± 0.138, i.e. 0.237 ≤ P ≤ 0.513


Exercise 31

1. The attendance at a research conference yesterday was 400. A random sample of
50 of those in attendance revealed that the mean number of soft drinks consumed
per person was 1.86, with a standard deviation of 0.50. Develop a 99% confidence
interval for the mean number of soft drinks consumed per person.
2. There are 300 welders employed at METEC. A sample of 30 welders revealed that
18 graduated from a registered welding course. Construct the 95% confidence
interval for the proportion of all welders who graduated from a registered welding
course.

Synopsis:

Adjust (multiply) the standard errors in the confidence interval formulas for the
population mean and proportion by the finite-population correction factor (FPC):
• for sampling done without replacement from a small population;
• when the sample constitutes more than 5% of the population (n/N > 0.05).

$$FPC = \sqrt{\frac{N-n}{N-1}}$$

Wrap up Discussion Questions:

• What are the conditions to use the finite-population correction factor while
developing confidence intervals for µ and P?
• How is the FPC used?

Next Session’s Assignment:

• Attempt Exercise 31, #1 and 2
• Read about sample size determination to estimate µ & P

Topic 32: Statistical Estimation: Sample Size Determination


Topic Learning Objectives:

By the end of this session students are expected to:

 Calculate the required sample size to estimate a population proportion or
population mean.
Topic outline

1. Determining sample size to estimate population mean or population proportion


2. Sample size determination for finite population
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:

 Explain sample size determination for estimating µ & P

Reading Text:

Because the resources at a researcher's disposal are limited, we usually cannot take a
census or a very large sample, and we need not do so as long as a smaller sample can
satisfactorily achieve the research objective. Too large a sample wastes resources; too
small a sample may not be representative, making the resulting conclusions uncertain.

The correct sample size depends on three factors:


1. Level of confidence desired
2. The margin of error the researcher will tolerate (maximum allowable error)
3. The variation of the population being studied

The degree of confidence is usually 95% or 99%, but it could be any level. It is directly
related to sample size (n): higher levels of confidence require larger samples (and more
time and money to collect them). The maximum allowable error (E) is the sampling error
the researcher will tolerate at the specified level of confidence; it is the largest
acceptable difference between the estimate and the parameter, and it equals one half the
width of the corresponding confidence interval. A small allowable error requires a large
sample, and vice versa. If the population standard deviation is large, a large sample is
required; if the population is homogeneous, a small sample is enough. To obtain an
estimate of the population standard deviation, use a comparable study, use a range-based
approximation (one-sixth of the range), or conduct a pilot study. The standard error of a
statistic shrinks as the sample size increases.

From previous sections we know that the standard errors

\[
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \sigma_{\bar{p}} = \sqrt{\frac{PQ}{n}}
\]

of the sampling distributions of the sample statistics x̄ and p̄ are inversely related to
the sample size, n. An equation for determining sample size can be derived from the
margin of error (E) formula by solving for n.

Sample size determination for estimating a population mean, µ:

\[
E = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}; \quad \text{accordingly,} \quad n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}
\]

N.B.: n=sample size;


Z𝛼/2 = standard normal value for corresponding confidence level
E= margin of error (the maximum allowable error)
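
A minimal Python sketch of the sample size formula for a mean. The function name sample_size_mean and the inputs (Z = 1.96, σ = 400, E = 120 and E = 60) are illustrative assumptions; the result is rounded up, since a fractional observation cannot be sampled.

```python
import math

def sample_size_mean(z, sigma, margin_of_error):
    """Smallest whole n satisfying n >= (z * sigma / E)^2."""
    n_exact = (z * sigma / margin_of_error) ** 2
    # Round up; note that floating-point error could add one
    # when n_exact happens to be an exact whole number.
    return math.ceil(n_exact)

# 95% confidence (z = 1.96), sigma = 400, estimate within 120
n_wide = sample_size_mean(1.96, 400, 120)

# Halving the margin of error roughly quadruples the required sample
n_narrow = sample_size_mean(1.96, 400, 60)

print(n_wide, n_narrow)
```

Because E is squared in the denominator, tightening the margin of error is far more expensive in sample size than a comparable change in σ or Z would suggest at first glance.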

Sample size determination for estimating the population proportion:

\[
E = Z_{\alpha/2} \cdot \sqrt{\frac{PQ}{n}}; \quad \text{accordingly,} \quad n = \frac{Z_{\alpha/2}^{2}\,P\,Q}{E^{2}}
\]

N.B.: P = population proportion (if it cannot be approximated from a comparable study
or from a small pilot survey, then a value of 0.50 can be used for P); Q = 1 − P
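
A minimal Python sketch of the sample size formula for a proportion, which also shows why 0.50 is the safe default: the product PQ is largest at P = 0.50, so that choice gives the largest (most conservative) sample size. The function name sample_size_proportion and the inputs (Z = 1.96, E = 0.05) are illustrative assumptions.

```python
import math

def sample_size_proportion(z, margin_of_error, p=0.5):
    """Smallest whole n satisfying n >= z^2 * p * (1 - p) / E^2.

    p defaults to 0.5, the value to use when no prior estimate exists.
    """
    n_exact = z ** 2 * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n_exact)

# No prior estimate of P: fall back on 0.5
n_default = sample_size_proportion(1.96, 0.05)

# A prior estimate away from 0.5 lowers the requirement
n_prior = sample_size_proportion(1.96, 0.05, p=0.3)

# P = 0.5 is the worst case over a grid of candidate proportions
grid = [i / 100 for i in range(1, 100)]
n_worst = max(sample_size_proportion(1.96, 0.05, p) for p in grid)

print(n_default, n_prior, n_worst)
```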

Sample Size Determination for Finite Population

When samples are drawn without replacement from a finite population of size N, the
finite population correction factor reduces the standard error by a factor of
\(\sqrt{(N-n)/(N-1)}\). Accordingly, the sample size formulas for estimating the
population mean and proportion are adjusted for the size of the population.
The revised sample size is given by

\[
n = \frac{n_0\,N}{n_0 + (N-1)}
\]

n = revised sample size

n0 = the sample size without adjustment (without using the finite population
correction factor)

N = population size
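
The revised formula can be recovered by putting the correction factor into the margin of error and solving for n. The following is a sketch for the mean, writing n_0 for the unadjusted sample size; the proportion case runs the same way with PQ in place of σ².

```latex
\begin{aligned}
E &= Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\,\sqrt{\frac{N-n}{N-1}} \\
E^{2} &= \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{n}\cdot\frac{N-n}{N-1} \\
n\,(N-1) &= n_0\,(N-n), \qquad \text{where } n_0 = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} \\
n\,(n_0 + N - 1) &= n_0\,N \\
n &= \frac{n_0\,N}{n_0 + (N-1)}
\end{aligned}
```

When N is very large relative to n_0, the denominator is close to N and n is close to n_0, so the adjustment matters only for small populations.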

Example 32

1. A marketing research firm wants to conduct a survey to estimate the average
amount spent on entertainment by each person visiting a popular resort. The
people who plan the survey would like to determine the average amount spent
by all people visiting the resort to within $120, with 95% confidence. From past
operation of the resort, an estimate of the population standard deviation is $400.
What is the minimum required sample size?
2. The manufacturers of a sports car want to estimate the proportion of people in a
given income bracket who are interested in the model. The company wants to
estimate the population proportion, P, to within 0.10 with 99% confidence.
Current company records indicate that the proportion P may be around 0.25.
What is the minimum required sample size for this survey?
3. For a population of 1000, what sample size is necessary to estimate the
population mean at 95 per cent confidence with a sampling error of 5 and a
standard deviation equal to 20?
Solution:
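1. \[
n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \frac{(1.96)^{2}(400)^{2}}{120^{2}} = 42.684 \approx 43
\]

2. \[
n = \frac{Z_{\alpha/2}^{2}\,P\,Q}{E^{2}} = \frac{(2.576)^{2}(0.25)(0.75)}{(0.10)^{2}} = 124.42 \approx 125
\]

3. First find the unadjusted sample size, then apply the finite population adjustment:
\[
n_0 = \frac{(1.96)^{2}(20)^{2}}{5^{2}} = 61.4656
\]
\[
n = \frac{n_0\,N}{n_0 + (N-1)} = \frac{(61.4656)(1000)}{61.4656 + (1000-1)} = \frac{61465.6}{1060.4656} = 57.96 \approx 58
\]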
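
The arithmetic in solution 3 can be rechecked with a short Python sketch that recomputes n0 from the inputs of the problem (Z = 1.96, σ = 20, E = 5, N = 1000) without intermediate rounding. The function name finite_population_sample_size is a hypothetical helper, not part of the module.

```python
import math

def finite_population_sample_size(n0, N):
    """Revised sample size n = n0 * N / (n0 + (N - 1)), before rounding."""
    return n0 * N / (n0 + (N - 1))

# Unadjusted sample size for the mean: (z * sigma / E)^2
n0 = (1.96 * 20 / 5) ** 2

# Adjust for a population of 1000, then round up to a whole unit
n_revised = finite_population_sample_size(n0, 1000)
n_required = math.ceil(n_revised)

print(round(n0, 4), round(n_revised, 2), n_required)
```

The adjustment trims only a few units here because the unadjusted sample is about 6% of the population; the saving grows as n0 approaches N.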

Exercise 32
1. A student in public administration wants to estimate the mean monthly earnings
of city council members in large cities. She can tolerate a margin of error of $100
in estimating the mean. She would also prefer to report the interval estimate
with a 95% level of confidence. The student found a report by the Department
of Labor that reported a standard deviation of $1,000. What is the required
sample size?
2. A university’s office of research wants to estimate the arithmetic mean grade
point average (GPA) of all graduating seniors during the past 10 years. GPAs
range between 2.0 and 4.0. The estimate of the population mean GPA should be
within plus or minus 0.05 of the population mean. Based on prior experience, the
population standard deviation is 0.279. Using a 99% level of confidence, how
many student records need to be selected?
3. Suppose the U.S. president wants to estimate the proportion of the population
that supports his current policy toward revisions in the health care system. The
president wants the estimate to be within .04 of the true proportion. Assume a
95% level of confidence. The president’s political advisors found a similar survey
from two years ago that reported that 60% of people supported health care
revisions.
a. How large of a sample is required?
b. How large of a sample would be necessary if no estimate were available
for the proportion supporting current policy?
4. For a population of 500, what sample size is necessary to estimate
the population mean at 95 per cent confidence with a sampling error of 5 and
a standard deviation equal to 10?

Synopsis:

To determine the sample size for estimating µ and P:

 The desired level of confidence, the margin of error, and the variation of the
population being studied are required.
 \( n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} \) (for µ)
 \( n = \frac{Z_{\alpha/2}^{2}\,P\,Q}{E^{2}} \) (for P)
 If the population is finite, \( n = \frac{n_0\,N}{n_0 + (N-1)} \)

Wrap up Discussion Questions:

 What are the three determinants of sample size?


 How is sample size related to the margin of error?
 How is sample size calculated for finite population cases?

Next Session’s Assignment:

 Attempt Exercise 32, #1-4


 Read about the theory of hypothesis testing
