5 Estimation and Hypothesis Testing
Estimation of parameters
The statistical technique of estimating unknown population parameters
based on a value of the corresponding sample statistic.
Estimate
The value(s) assigned to a population parameter based on the value
of a sample statistic.
Estimator
The sample statistic that is used to estimate a population parameter.
Chapter 5 – Page 1
CONFIDENCE INTERVAL FOR THE POPULATION MEAN
❑ The maximum error of estimate for $\mu$ is $Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.
Example:
To determine the mean waiting time for his customers, a bank manager
took a random sample of 50 customers and found that the mean waiting
time was 7.2 minutes. Assuming that the population standard deviation is
known to be 5 minutes, find the 90% confidence interval of the mean
waiting time for all of the bank’s customers.
Solution:
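Let $\mu$ ≡ Population mean waiting time of customers in minutes
$n = 50$, $\bar{x} = 7.2$, $\sigma = 5$, $\alpha = 0.10$, $\alpha/2 = 0.05$, $Z_{\alpha/2} = Z_{0.05} = 1.6449$
The 90% confidence interval for $\mu$ is
$\bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} = 7.2 \pm 1.6449\left(\dfrac{5}{\sqrt{50}}\right) = 7.2 \pm 1.1631$, that is, $6.0369 < \mu < 8.3631$.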
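The bank example can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `z_confidence_interval` and its parameters are illustrative and not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sd, n, confidence):
    """Confidence interval x_bar ± Z_{alpha/2} * sd / sqrt(n) (illustrative helper)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value Z_{alpha/2}
    margin = z * sd / sqrt(n)                 # maximum error of estimate
    return x_bar - margin, x_bar + margin

# Bank example: n = 50, sample mean 7.2 minutes, sigma = 5 minutes, 90% confidence
lower, upper = z_confidence_interval(x_bar=7.2, sd=5, n=50, confidence=0.90)
print(f"90% CI for the mean waiting time: ({lower:.4f}, {upper:.4f}) minutes")
```

The same helper can be used for the large-sample case where $\sigma$ is unknown by passing the sample standard deviation $s$ as `sd`, provided $n \ge 30$.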
Example:
In a random sample of 70 students in a large university, a dean found that
the mean weekly time spent doing homework was 14.3 hours. If we
assume that homework time is normally distributed with a standard
deviation of 4.0 hours, find the 99% confidence interval estimate of the
mean weekly time spent doing homework for all the university’s students.
Solution:
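❑ The $100(1-\alpha)\%$ confidence interval for the population mean $\mu$, when
the population variance $\sigma^2$ is unknown and the sample size $n$ is large
($n \ge 30$), is given by
$\bar{x} \pm Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$ or $\bar{x} - Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$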
Example:
Measurements of the diameters of a random sample of 200 ball bearings
made by a certain machine during 1 week showed a mean of 8.24 mm
and a standard deviation of 0.42 mm. Find the 95% confidence interval for
the mean diameter of all the ball bearings.
Solution:
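Let $\mu$ ≡ Population mean diameter of ball bearings in mm
$n = 200$, $\bar{x} = 8.24$, $s = 0.42$, $\alpha = 0.05$, $\alpha/2 = 0.025$, $Z_{0.025} = 1.96$
The 95% confidence interval for $\mu$ is
$\bar{x} \pm Z_{\alpha/2}\,\dfrac{s}{\sqrt{n}} = 8.24 \pm 1.96\left(\dfrac{0.42}{\sqrt{200}}\right) = 8.24 \pm 0.0582$, that is, $8.1818 < \mu < 8.2982$.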
Example:
A random sample of 35 drums of a wax-base floor cleaner has a standard
deviation of 12 pounds and a mean weight of 240 pounds. Construct a
95% confidence interval for the actual mean weight of all these drums.
Solution:
CONFIDENCE INTERVAL FOR THE POPULATION PROPORTION
❑ The maximum error of estimate for $p$ is $Z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.
Example:
A manufacturer wants to assess the proportion of defective items in a
large batch produced by a particular machine. He tests a random sample
of 300 items and finds that 45 are defective. Calculate a 98% confidence
interval for the proportion of defective items in the complete batch.
Solution:
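One way to check a hand solution for the defective-items example is the short sketch below. It assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative and not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

# Defective-items example: 45 defective out of 300, 98% confidence
n, defective, confidence = 300, 45, 0.98

p_hat = defective / n                                # sample proportion
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2} = Z_{0.01}
margin = z * sqrt(p_hat * (1 - p_hat) / n)           # maximum error of estimate
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.2f}, Z = {z:.4f}")
print(f"98% CI for the proportion defective: ({lower:.4f}, {upper:.4f})")
```

The interval is built around $\hat{p}$ with the standard error estimated from $\hat{p}$ itself, which matches the maximum error of estimate given for this section.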
Example:
In an opinion poll conducted with a sample of 1000 people chosen at
random, 30% said that they support a certain political party. Find a 95%
confidence interval for the actual proportion of the population who
support this party.
Solution:
HYPOTHESIS TESTING
Statistical Hypothesis
A statement, assumption or belief about parameter(s) of one or
more populations.
Experimental /sample evidence is required to verify the statement
Null Hypothesis ($H_0$)
A claim (or statement) about a population parameter that is
assumed to be true until it is declared false
Is the hypothesis that we hope to reject
Specifies the value of the population parameter to be tested
Alternative Hypothesis ($H_1$)
A claim (or statement) about a population parameter that will be true
if the null hypothesis is false
Rejecting $H_0$ means accepting $H_1$
There are only four possible results when we test a given hypothesis.
1. We accept a true hypothesis
a correct decision
2. We reject a false hypothesis
a correct decision
3. We reject a true hypothesis
an incorrect decision
known as a Type I error (its probability is denoted by α, “alpha”)
4. We accept a false hypothesis
an incorrect decision
known as a Type II error (its probability is denoted by β, “beta”)
Significance Level ($\alpha$)
The maximum probability of making a Type I error in hypothesis testing
Usually specified before a hypothesis test is made
The value of 5% ($\alpha = 0.05$) or 1% ($\alpha = 0.01$) is frequently used
e.g. If we select a 5% significance level, the probability of rejecting the
null hypothesis when it is true is at most 5%. In other words, when the null
hypothesis is true, we can be about 95% confident of making the correct
decision, although we could be wrong with a probability of 5%.
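The meaning of a 5% significance level can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population values (`mu0 = 50`, `sigma = 10`), the sample size, the number of trials and the seed are all hypothetical, and a right-tailed Z test is used. Samples are drawn from a population where the null hypothesis is true, so every rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulated_type_i_rate(alpha=0.05, mu0=50.0, sigma=10.0, n=36,
                          trials=5000, seed=2024):
    """Share of right-tailed Z tests that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value Z_alpha
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean really is mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > z_crit:                          # falls in the rejection region
            rejections += 1
    return rejections / trials

rate = simulated_type_i_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

The observed share should be close to the chosen $\alpha$, which is what "a 5% chance of a Type I error" means in the long run.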
Critical value(s)
Value(s) that separate the rejection region from the acceptance region
Examples: $Z_{\alpha}$, $Z_{\alpha/2}$, etc.
The critical region may be represented by a portion of the area under the
normal curve in two ways:
1. Two tails under the curve
2. One tail under the curve which is either the right tail or left tail
Two-tailed test
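A test of hypothesis whose critical region is represented by both
tails under the normal curve, t-curve, etc.
[Figure: two-tailed test, with a rejection region of area $\alpha/2$ in each tail and the acceptance region between the two critical values]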
One-tailed test
A test of hypothesis whose critical region is represented by only
one tail under the normal curve.
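Right-tailed test
[Figure: right-tailed test, with the acceptance region to the left of the critical value and the rejection region in the right tail]
Left-tailed test
[Figure: left-tailed test, with the rejection region in the left tail and the acceptance region to the right of the critical value]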
2. The null hypothesis always has a statement of equality in it. Hence, the
statement of a hypothesis can be of three types:
(i) $H_0: \mu = 123$
(ii) $H_0: \mu \le 123$
(iii) $H_0: \mu \ge 123$
EXAMPLE:
For each of the statements given below, identify $H_0$ and $H_1$.
(a) The mean height of females in a country is 156 cm.
(c) The mean life of a car battery is not more than 40 months.
Hypothesis Testing about a Population Mean: Large Sample
• Test statistic
(i) $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ if $\sigma$ is known
(ii) $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ if $\sigma$ is unknown and $n \ge 30$
EXAMPLE:
A company markets car tires. Their lives are normally distributed with a
mean of 40,000 km and standard deviation of 3,000 km. A change in the
production process is believed to result in a better product. A test sample
of 64 new tires has a mean life of 41,200 km. Can you conclude that the
new product is significantly better than the current one? (𝛼 = 0.05)
Solution:
Given $\mu_0 = 40{,}000$; $\sigma = 3{,}000$; $n = 64$; $\bar{X} = 41{,}200$
$H_0: \mu \le 40{,}000$ (Mean life of the new tires is no longer than that of the old tires)
$H_1: \mu > 40{,}000$ (Mean life of the new tires is longer than that of the old tires)
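At $\alpha = 0.05$, critical value $= Z_{\alpha} = Z_{0.05} = 1.6449$
Rejection region: $Z > 1.6449$
$Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{41200 - 40000}{3000/\sqrt{64}} = 3.2$
Since $Z = 3.2 > 1.6449$, reject $H_0$. At the 5% level of significance, there is sufficient evidence that the new tires have a longer mean life than the current ones.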
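The critical-value steps used in the tire example can be sketched in code. This is a minimal illustration, assuming Python 3.8 or later for `statistics.NormalDist`; the function `z_test_mean` and its `tail` argument are hypothetical names introduced here, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sd, n, alpha, tail):
    """Large-sample Z test for a mean; tail is 'right', 'left' or 'two'."""
    z = (x_bar - mu0) / (sd / sqrt(n))          # test statistic
    std_normal = NormalDist()
    if tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)            # Z > Z_alpha
    elif tail == "left":
        reject = z < std_normal.inv_cdf(alpha)                # Z < -Z_alpha
    else:
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)   # |Z| > Z_{alpha/2}
    return z, reject

# Tire example: H0: mu <= 40000 against H1: mu > 40000 at alpha = 0.05
z, reject = z_test_mean(x_bar=41200, mu0=40000, sd=3000, n=64,
                        alpha=0.05, tail="right")
print(f"Z = {z:.4f}, reject H0: {reject}")
```

When $\sigma$ is unknown and $n \ge 30$, the sample standard deviation $s$ can be passed as `sd`, in line with test statistic (ii).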
EXAMPLE:
The expected lifetime of electric light bulbs produced by a given process
was 1500 hours. To test a new batch, a sample of 40 was taken which
showed a mean lifetime of 1410 hours and a standard deviation of 90
hours. Test the hypothesis that the mean lifetime of the electric light bulbs
has not changed, using a level of significance of 0.05.
Solution:
Given $\mu_0 = 1500$; $n = 40$; $\bar{X} = 1410$; $s = 90$
$H_0$:
$H_1$:
$Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} =$
EXAMPLE:
It is thought that a certain Normal population has a mean of 1.6. A sample
of 50 gives a mean of 1.55 and a standard deviation of 0.3. Does this
provide evidence, at the 5% level, that the population mean is less than
1.6?
Solution:
Given $\mu_0 = 1.6$; $n = 50$; $\bar{X} = 1.55$; $s = 0.3$
$H_0$:
$H_1$:
$Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} =$
Hypothesis Testing on a Population Proportion: Large Sample
Test statistic
$Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
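Note:
1. $p_0$ is the hypothesized population proportion (a predetermined constant)
2. $\hat{p}$ is the sample proportion
3. The population proportion $p_0$ is used in the standard error instead of the
sample proportion $\hat{p}$ because the test statistic is calculated assuming that
$H_0$ is true, so the population proportion is taken as known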
EXAMPLE:
ABC Mailing Company sells computers and computer parts by mail. The
company claims that at least 90% of all orders are mailed within 72 hours
after they are received. The quality control department at the company
often takes samples to check if this claim is valid. A recently taken sample
of 150 orders showed that 129 of them were mailed within 72 hours. Do
you think the company’s claim is true? Use a 2.5% significance level.
Solution:
Given $n = 150$; $x = 129$; $p_0 = 0.9$; $\hat{p} = \dfrac{x}{n} = \dfrac{129}{150} = 0.86$
Let $p$ ≡ true proportion of orders that are mailed within 72 hours after they are received.
$H_0: p \ge 0.9$
$H_1: p < 0.9$
$Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \dfrac{0.86 - 0.9}{\sqrt{\dfrac{0.9(1 - 0.9)}{150}}} = -1.6330$
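The remaining decision step for the ABC Mailing example can be sketched as follows. This is a hedged illustration, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative. Because $H_1: p < 0.9$, the test is left-tailed and the critical value at the 2.5% level is $-Z_{0.025}$.

```python
from math import sqrt
from statistics import NormalDist

# ABC Mailing example: 129 of 150 orders mailed within 72 hours
n, x, p0, alpha = 150, 129, 0.9, 0.025

p_hat = x / n                                   # sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # standard error uses p0 (H0 assumed true)
z_crit = NormalDist().inv_cdf(alpha)            # left-tailed critical value, about -1.96
reject = z < z_crit

print(f"Z = {z:.4f}, critical value = {z_crit:.4f}, reject H0: {reject}")
```

Under these assumptions the statistic does not fall in the rejection region, so the sample does not provide sufficient evidence against the company's claim at the 2.5% level.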
EXAMPLE:
In an investigation into ownership of calculators, 200 randomly chosen
school students were interviewed; 163 of them owned a calculator. Using
the evidence of this sample, test at the 5% level of significance, the
hypothesis that the proportion of school students owning a calculator is
more than 80%.
Solution:
Given $n = 200$; $x = 163$; $p_0 = 0.8$; $\hat{p} = \dfrac{x}{n} =$
Let $p$ ≡ true proportion of students owning a calculator.
$H_0$:
$H_1$:
$Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} =$
EXAMPLE:
An election candidate claims that 60 percent of the voters support him. A
random sample of 2500 voters shows that 1400 support him. Test his claim
at 0.10 level of significance.
Solution:
Given $n = 2500$; $x = 1400$; $p_0 = 0.6$; $\hat{p} = \dfrac{x}{n} =$
Let $p$ ≡ true proportion of voters that support him.
$H_0$:
$H_1$:
$Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} =$
The p-value Approach
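The p-value approach is the hypothesis testing procedure that compares a
probability, called the p-value, with the significance level $\alpha$: reject $H_0$
if the p-value is less than or equal to $\alpha$; otherwise, do not reject $H_0$.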
Definition:
A p-value is the probability that the test statistic would assume a value as
extreme as, or more extreme than the observed value of the test statistic
(in the direction of the alternative hypothesis) when the null hypothesis is
true.
EXAMPLE:
A standard intelligence examination has been given to the students for
several years and it is assumed that the scores are normally distributed
with an average of 80 and a standard deviation of 7. A group of 25
students obtained a mean grade of 77 in the examination this year. Are this
year’s students inferior in intelligence to the past years’ students? Test at
$\alpha = 0.05$. Calculate the p-value. Use the p-value to draw a conclusion
regarding the statistical test.
𝐻0 : 𝜇 ≥ 80
𝐻1 : 𝜇 < 80
$Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{77 - 80}{7/\sqrt{25}} = -2.14$
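The p-value for this left-tailed test is the area under the standard normal curve to the left of the observed $Z$. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist` and using the sample mean of 77 stated in the example (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

# Intelligence examination example: H0: mu >= 80 against H1: mu < 80
mu0, sigma, n, x_bar, alpha = 80, 7, 25, 77, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))   # observed test statistic
p_value = NormalDist().cdf(z)           # left-tailed p-value: P(Z <= z)
reject = p_value <= alpha

print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The p-value comes out at roughly 0.016, which is below $\alpha = 0.05$, so $H_0$ is rejected. A value read from a printed normal table at $Z = -2.14$ may differ slightly in the fourth decimal place because the table rounds $Z$ to two decimals.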
CHI-SQUARE TEST
Example:
1. In market research, we count the number of people who prefer a
particular brand of detergent powder
2. In quality control, we count the number of defectives produced by a
machine during a certain period
• There are many situations of this type where measurements are made
by counting the numbers or frequency in each category.
• The $\chi^2$ test is applied to compare such observed frequencies of occurrence
against the expected ones.
(1) Goodness-of-fit test
Test whether a given set of data actually follows an assumed
distribution or not
(2) Test of independence
For more than one row or column in the form of a contingency table
concerning several attributes
Test for dependence between two variables
• The chi-square test can be applied to more than one variable and more
than one characteristic
• Often data are collected on several variables at a time. For example, a
questionnaire will usually contain more than one question.
• Another important application of the $\chi^2$ distribution is in testing for the
independence of two criteria of classification.
Contingency Table
$r \times c$ Contingency Table

                    B
Class        B1     B2     …    Bc     Total
A    A1      f11    f12    …    f1c    f1+
     A2      f21    f22    …    f2c    f2+
     ⋮
     Ar      fr1    fr2    …    frc    fr+
Total        f+1    f+2    …    f+c    n

r rows, c columns
$f_{i+} = \sum_{j=1}^{c} f_{ij}$ for $i = 1, 2, \ldots, r$; $f_{+j} = \sum_{i=1}^{r} f_{ij}$ for $j = 1, 2, \ldots, c$
If the criteria are independent, then the joint probability for each
combination can be expressed as the product of the separate marginal
probabilities: $P(A_i \cap B_j) = P(A_i)\,P(B_j)$
The chi-square procedure will be used to see how well the data fit this
assumption.
Estimate the marginal probabilities from the row and column totals:
$P(A_i) \approx \dfrac{f_{i+}}{n}$, $P(B_j) \approx \dfrac{f_{+j}}{n}$
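The estimated marginal probabilities lead directly to the expected frequency for each cell. The step below is the standard derivation, added here as a reading aid; the symbol $E_{ij}$ is the expected frequency that appears in the test statistic, and $n$ is the grand total of the contingency table.

```latex
E_{ij} = n \, P(A_i \cap B_j)
       = n \, P(A_i) \, P(B_j)
       \approx n \cdot \frac{f_{i+}}{n} \cdot \frac{f_{+j}}{n}
       = \frac{f_{i+} \, f_{+j}}{n}
```

In words, each expected frequency is (row total × column total) / grand total, computed under the assumption that the two criteria are independent.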
Test statistic
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Example:
An accident inspector makes spot checks on working practices during
visits to industrial sites chosen at random. At one large construction site,
the numbers of accidents occurring per week were counted for a period of
three years, and each week was also classified as to whether or not the
inspector had visited the site during the previous week. The results are
shown as follows.
            Number of accidents
            0      1      2      3
Visit       33     8      5      4
No visit    67     42     15     6
Solution:
$H_0$: The number of accidents is independent of the inspector’s visits
$H_1$: The number of accidents depends on the inspector’s visits
Observed frequencies, with expected frequencies in parentheses:

            Number of accidents
            0              1              2              3             Total
Visit       33 (27.7778)   8 (13.8889)    5 (5.5556)     4 (2.7778)    50
No visit    67 (72.2222)   42 (36.1111)   15 (14.4444)   6 (7.2222)    130
Total       100            50             20             10            180
$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$= \dfrac{(33 - 27.7778)^2}{27.7778} + \dfrac{(8 - 13.8889)^2}{13.8889} + \dfrac{(5 - 5.5556)^2}{5.5556} + \dfrac{(4 - 2.7778)^2}{2.7778}$
$\quad + \dfrac{(67 - 72.2222)^2}{72.2222} + \dfrac{(42 - 36.1111)^2}{36.1111} + \dfrac{(15 - 14.4444)^2}{14.4444} + \dfrac{(6 - 7.2222)^2}{7.2222}$
$= 0.9818 + 2.4969 + 0.0556 + 0.5378 + 0.3776 + 0.9603 + 0.0214 + 0.2068$
$= 5.6382$
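The expected frequencies and the test statistic for the accident data can be reproduced with a short sketch in plain Python. The function name `chi_square_independence` is illustrative. The degrees of freedom $(r-1)(c-1)$ are the standard value for a test of independence and are not stated in the notes; the comparison with a critical value is left out because it requires a chi-square table and a chosen significance level.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, expected frequencies, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency for each cell: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)   # standard (r-1)(c-1)
    return chi2, expected, df

# Rows: visit / no visit; columns: 0, 1, 2, 3 accidents per week
accidents = [[33, 8, 5, 4],
             [67, 42, 15, 6]]
chi2, expected, df = chi_square_independence(accidents)
print(f"chi-square = {chi2:.4f}, degrees of freedom = {df}")
```

The same function accepts any $r \times c$ table of observed counts, so it can also be used to check hand calculations for other contingency tables.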
Example:
A sample of hotels in a particular country was selected. The following table
shows the number of hotels in each region of the country and in each of
four grades.
            Region
Grade       Eastern   Central   Western
1 star      29        22        29
2 star      67        38        55
3 star      53        32        35
4 star      11        8         21
Solution:
$H_0$: There is no association between the region and the grade of hotel
$H_1$: There is an association between the region and the grade of hotel
Rejection region:
            Region
Grade       Eastern     Central     Western     Total
1 star      29 (    )   22 (    )   29 (    )   80
2 star      67 (    )   38 (    )   55 (    )   160
3 star      53 (    )   32 (    )   35 (    )   120
4 star      11 (    )   8 (    )    21 (    )   40
Total       160         100         140         400
$\chi^2 = \sum_{i=1}^{4} \sum_{j=1}^{3} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$=$
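Formula:

Confidence Interval
  Population mean:
    $\bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ if $\sigma$ is known
    $\bar{x} \pm Z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$ if $\sigma$ is unknown and $n \ge 30$
  Population proportion:
    $\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$

Test statistics
  Population mean:
    $Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ if $\sigma$ is known
    $Z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ if $\sigma$ is unknown and $n \ge 30$
  Population proportion:
    $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$

Chi-Square Test of Independence
    $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$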
AAMS1773 QUANTITATIVE STUDIES
Tutorial 5 (Estimation and Hypothesis Testing)
8. A theory predicts that the probability of an event is 0.4. The theory
is tested experimentally and in 400 independent trials the event
occurred 140 times. Is the number of occurrences significantly less
than that predicted by the theory? Test at the 1% level of
significance.
10. An article in the Washington Post stated that nearly 45% of the U.S.
population is born with brown eyes, although they don’t necessarily
stay that way. To test the newspaper’s claim, a random sample of
80 people was selected, and 32 had brown eyes. Is there sufficient
evidence to dispute the newspaper’s claim regarding the proportion
of brown-eyed people in the United States? Use α = 0.01.
                    Number of Flights
                    Internal   Regional   International
Fully booked        154        171        275
Not fully booked    96         79         225
13. The machines in a factory were classified according to the observed
degree of defectiveness over the previous year’s operations, as in
the following table:
                            Number of Machines
% defective                 Cutting machine   Grinding machine   Millers
1% or less                  22                74                 102
Over 1% and less than 2%    31                102                143
2% and more                 7                 64                 55
Answers: