
5 Estimation and Hypothesis Testing

Chapter 5 discusses estimation and hypothesis testing in statistics, focusing on estimating population parameters from sample statistics and the procedures for hypothesis testing. It covers point and interval estimates, confidence intervals, and the concepts of null and alternative hypotheses, including Type I and Type II errors. The chapter outlines the steps for hypothesis testing and provides examples for both confidence intervals and hypothesis tests.

Uploaded by

KIN WEI NG
Copyright
© All Rights Reserved

AAMS1773 QUANTITATIVE STUDIES

CHAPTER 5: ESTIMATION AND HYPOTHESIS TESTING

Estimation of parameters
• The statistical technique of estimating unknown population parameters
based on the value of the corresponding sample statistic.

The estimation procedure involves the following steps:

1. Select a sample
2. Collect the required information from the members of the sample
3. Calculate the value of the sample statistic
4. Assign value(s) to the corresponding population parameter

Estimate
• The value(s) assigned to a population parameter based on the value
of a sample statistic.

Estimator
• The sample statistic that is used to estimate a population parameter.

Two types of estimates

1. Point estimate
• The value (single number) of a sample statistic that is used to
estimate a population parameter.
• Example: $\hat{\mu} = \bar{x} = 77$, $\hat{\sigma}^2 = s^2 = 6$
2. Interval estimate / Confidence interval
• An estimate of a population parameter given by two numbers
between which the parameter may be considered to lie.
• An interval that is constructed with a given confidence level.
• Example: $66 < \mu < 88$

The confidence level associated with a confidence interval states how
much confidence we have that this interval contains the true population
parameter. The confidence level is denoted by (1 − α)100%.

Consider a population with unknown parameter θ. If we can find an interval
(a, b) such that P(a < θ < b) = 0.95, we say that (a, b) is a 95% confidence
interval for θ. In this case, 0.95 is the probability that an interval
constructed in this way includes θ.

CONFIDENCE INTERVAL FOR THE POPULATION MEAN

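❑ The 100(1 − α)% confidence interval for the population mean μ, when
the population variance σ² is known, is given by

$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

❑ The maximum error of estimate for μ is $Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.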

Example:
To determine the mean waiting time for his customers, a bank manager
took a random sample of 50 customers and found that the mean waiting
time was 7.2 minutes. Assuming that the population standard deviation is
known to be 5 minutes, find the 90% confidence interval of the mean
waiting time for all of the bank’s customers.

Solution:
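Let μ ≡ population mean waiting time of customers in minutes
n = 50, $\bar{x} = 7.2$, σ = 5, α = 0.10, α/2 = 0.05, $Z_{\alpha/2} = Z_{0.05} = 1.6449$
The 90% confidence interval for μ is
$$7.2 \pm 1.6449\left(\frac{5}{\sqrt{50}}\right) = 7.2 \pm 1.1631, \quad\text{i.e.}\quad 6.0369 < \mu < 8.3631$$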
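
The bank example can also be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `mean_confidence_interval` is an illustrative choice, and the critical value is taken from the standard library's `statistics.NormalDist` rather than from a printed table, so the last decimal may differ slightly from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, sd, n, confidence):
    """Return (lower, upper) for x_bar ± z * sd / sqrt(n).

    Hypothetical helper for illustration; sd is sigma when it is known,
    or the sample standard deviation s when n is large.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    margin = z * sd / sqrt(n)                # maximum error of estimate
    return x_bar - margin, x_bar + margin


# Bank example: n = 50, sample mean 7.2 minutes, sigma = 5 minutes, 90% level.
lower, upper = mean_confidence_interval(7.2, 5, 50, 0.90)
print(f"{lower:.4f} < mu < {upper:.4f}")
```

The same function can be reused for the large-sample case by passing the sample standard deviation in place of σ.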

Example:
In a random sample of 70 students in a large university, a dean found that
the mean weekly time spent doing homework was 14.3 hours. If we
assume that homework time is normally distributed with a standard
deviation of 4.0 hours, find the 99% confidence interval estimate of the
weekly time spent doing homework for all the university’s students.

Solution:

❑ The 100(1 − α)% confidence interval for the population mean μ, when
the population variance σ² is unknown and the sample size n is large
(n ≥ 30), is given by

$$\bar{x} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} \quad\text{or}\quad \bar{x} - Z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{S}{\sqrt{n}}$$

where S is the sample standard deviation.

❑ The maximum error of estimate for μ is $Z_{\alpha/2}\dfrac{S}{\sqrt{n}}$.

Example:
Measurements of the diameters of a random sample of 200 ball bearings
made by a certain machine during 1 week showed a mean of 8.24 mm
and a standard deviation of 0.42 mm. Find the 95% confidence interval for
the mean diameter of all the ball bearings.

Solution:
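Let μ ≡ population mean diameter of ball bearings in mm
n = 200, $\bar{x} = 8.24$, s = 0.42, α = 0.05, α/2 = 0.025, $Z_{0.025} = 1.96$
The 95% confidence interval for μ is
$$8.24 \pm 1.96\left(\frac{0.42}{\sqrt{200}}\right) = 8.24 \pm 0.0582, \quad\text{i.e.}\quad 8.1818 < \mu < 8.2982$$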

Example:
A random sample of 35 drums of a wax-base floor cleaner has a standard
deviation of 12 pounds and a mean weight of 240 pounds. Construct a
95% confidence interval for the actual mean weight of all these drums.

Solution:

CONFIDENCE INTERVAL FOR THE POPULATION PROPORTION

Let X ≡ the number of ‘successes’ in n trials
    p ≡ the probability of success at each trial
Then X ~ Bin(n, p).

Sample proportion: $\hat{p} = \dfrac{x}{n}$

For large n, approximately,
$$X \sim N\big(np,\; np(1-p)\big) \quad\text{and}\quad \hat{P} \sim N\left(p,\; \frac{p(1-p)}{n}\right)$$

❑ The 100(1 − α)% confidence interval for the population proportion p
with a large sample size (n ≥ 30) is given by

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{or}\quad \hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

❑ The maximum error of estimate for p is $Z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.

Example:
A manufacturer wants to assess the proportion of defective items in a
large batch produced by a particular machine. He tests a random sample
of 300 items and finds that 45 are defective. Calculate a 98% confidence
interval for the proportion of defective items in the complete batch.

Solution:
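
A hand calculation for the defective-items example can be checked with a short Python sketch. This is an illustration only: the function name `proportion_confidence_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` instead of a printed Z table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Return (lower, upper) for p_hat ± z * sqrt(p_hat * (1 - p_hat) / n).

    Hypothetical helper; uses the large-sample normal approximation.
    """
    p_hat = successes / n                               # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)          # maximum error of estimate
    return p_hat - margin, p_hat + margin


# Defective-items example: 45 defectives in a sample of 300, 98% level.
lower, upper = proportion_confidence_interval(45, 300, 0.98)
print(f"{lower:.4f} < p < {upper:.4f}")
```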

Example:
In an opinion poll conducted with a sample of 1000 people chosen at
random, 30% said that they support a certain political party. Find a 95%
confidence interval for the actual proportion of the population who
supports this party.

Solution:

HYPOTHESIS TESTING

In hypothesis testing, we test a given theory or belief about a
population parameter. Using sample information, we may want to
know whether a given claim (or statement) about a population parameter
is true or not. We then either accept or reject the given theory or
belief. This chapter discusses how to make such tests of hypothesis about
the population mean μ and the population proportion p.

Statistical Hypothesis
• A statement, assumption or belief about parameter(s) of one or
more populations.
• Experimental/sample evidence is required to verify the statement.

Null Hypothesis (H0)
• A claim (or statement) about a population parameter that is
assumed to be true until it is declared false
• Is the hypothesis that we hope to reject
• Specifies the value of the population parameter to be tested

Alternative Hypothesis (H1)
• A claim (or statement) about a population parameter that will be true
if the null hypothesis is false
• Rejecting H0 means accepting H1

There are only four possible results when we test a given hypothesis.
1. We accept a true hypothesis
• a correct decision
2. We reject a false hypothesis
• a correct decision
3. We reject a true hypothesis
• an incorrect decision
• known as a Type I error (its probability is denoted by α, “alpha”)
4. We accept a false hypothesis
• an incorrect decision
• known as a Type II error (its probability is denoted by β, “beta”)

Significance Level (α)
• The maximum probability of making a Type I error in hypothesis testing
• Usually specified before a hypothesis test is made
• The value of 5% (α = 0.05) or 1% (α = 0.01) is frequently used

e.g. If we select a 5% significance level, the probability of rejecting
the hypothesis when it is in fact true is at most 5%. In other words,
when H0 is true we make the correct decision about 95% of the time,
although we could be wrong with a probability of 5%.

A test statistic is a number calculated from the sample data that
determines the acceptance or rejection of H0.

Critical Region / Rejection Region
• The region which corresponds to a predetermined level of
significance
• If the test statistic falls in the acceptance region, H0 is accepted.
Otherwise, H0 is rejected.

Critical value(s)
• Value(s) that separate the rejection region from the acceptance region
• Examples: $Z_{\alpha}$, $Z_{\alpha/2}$, etc.

The critical region may be represented by a portion of the area under the
normal curve in two ways:
1. Two tails under the curve
2. One tail under the curve which is either the right tail or left tail

Two-tailed test
• A hypothesis test whose critical region lies in both tails under
the normal curve, t-curve, etc.

[Figure: two-tailed test. A rejection region of area α/2 lies in each tail, with the acceptance region between the two critical values.]

One-tailed test
• A hypothesis test whose critical region lies in only one tail under
the normal curve.

• Right-tailed test
[Figure: right-tailed test. The rejection region lies in the right tail, beyond the critical value; the acceptance region lies to its left.]

• Left-tailed test
[Figure: left-tailed test. The rejection region of area α lies in the left tail, below the critical value; the acceptance region lies to its right.]

Sign of H1    Type of test
>             Right-tailed test
≠             Two-tailed test
<             Left-tailed test

The key problem in a hypothesis test is to decide when to use a one-sided
test and when to use a two-sided test. In deciding which test to use, we
first have to know two key characteristics of a hypothesis test.

1. In conducting a formal statistical hypothesis test, we always test the null
hypothesis, whether it corresponds to the original claim or not.
Sometimes the null hypothesis corresponds to the original claim and
sometimes it corresponds to the opposite of the original claim. Since
we always test the null hypothesis, we will be testing the original claim
in some cases and the opposite of the original claim in other cases.

2. The null hypothesis always has a statement of equality in it. Hence, the
statement of a hypothesis can be of three types:
(i) H0: μ = 123
(ii) H0: μ ≥ 123
(iii) H0: μ ≤ 123

The corresponding alternative hypotheses would be:
(i) H1: μ ≠ 123 (Two-sided test)
(ii) H1: μ < 123 (One-sided test to the left)
(iii) H1: μ > 123 (One-sided test to the right)

EXAMPLE:
For each of the statements given below, identify $H_0$ and $H_1$.
(a) The mean height of females in a country is 156 cm.

(b) The mean annual household income is at least $12,000.

(c) The mean life of a car battery is not more than 40 months.

(d) The mean life of a car battery is above 40 months.

(e) A television executive claims that the majority of teenagers are in
favor of sport shows on television.

(f) A maximum of 3% of mailings handled by mail order companies will be
returned as “address unknown” or “not known at this address”.

Steps to perform a hypothesis test

1. Identify the specific claim or hypothesis to be tested. State the null
and alternative hypotheses.
2. Determine the significance level α and the critical value.
3. Select the distribution (test statistic) to use.
4. Determine the rejection and non-rejection regions. Set up a decision
rule based on the critical value. Draw a distribution curve if
necessary.
5. Calculate the value of the test statistic.
6. Make a decision (reject $H_0$ or fail to reject $H_0$).

Hypothesis Testing about a Population Mean: Large Sample

• The null and alternative hypotheses

        H0                       H1          Type of test
(i)     μ = μ0                   μ ≠ μ0      Two-tailed test
(ii)    μ = μ0 or μ ≥ μ0         μ < μ0      Left-tailed test
(iii)   μ = μ0 or μ ≤ μ0         μ > μ0      Right-tailed test

• Test statistic
(i) $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ if σ is known
(ii) $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ if σ is unknown and n ≥ 30

• Critical value and rejection region

        H0          H1          Critical value        Critical region
(i)     μ = μ0      μ ≠ μ0      $\pm Z_{\alpha/2}$    $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
(ii)    μ ≥ μ0      μ < μ0      $-Z_{\alpha}$         $Z < -Z_{\alpha}$
(iii)   μ ≤ μ0      μ > μ0      $Z_{\alpha}$          $Z > Z_{\alpha}$

EXAMPLE:
A company markets car tires. Their lives are normally distributed with a
mean of 40,000 km and standard deviation of 3,000 km. A change in the
production process is believed to result in a better product. A test sample
of 64 new tires has a mean life of 41,200 km. Can you conclude that the
new product is significantly better than the current one? (α = 0.05)

Solution:
Given $\mu_0 = 40{,}000$; σ = 3,000; n = 64; $\bar{X} = 41{,}200$

Let μ ≡ true mean life of the new tires.

H0: μ ≤ 40,000 (Mean life of the new tires is no better than that of the old tires)
H1: μ > 40,000 (Mean life of the new tires is better than that of the old tires)

At α = 0.05, critical value = $Z_{\alpha} = Z_{0.05} = 1.6449$
rejection region: Z > 1.6449

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{41200 - 40000}{3000/\sqrt{64}} = 3.2$$

Since Z = 3.2 > 1.6449, we reject H0 and conclude that at the 5%
significance level, the new tires are better than the old tires.
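
The six steps listed earlier, applied to the tire example, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `z_test_mean` and its `tail` argument are hypothetical, and the critical value is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sd, n, alpha, tail):
    """One-sample Z test for a mean; tail is 'left', 'right' or 'two'.

    Hypothetical helper: returns (test statistic, critical value, reject H0?).
    """
    z = (x_bar - mu0) / (sd / sqrt(n))  # step 5: test statistic
    std_normal = NormalDist()
    if tail == "right":                 # H1: mu > mu0
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":                # H1: mu < mu0
        critical = -std_normal.inv_cdf(1 - alpha)
        reject = z < critical
    elif tail == "two":                 # H1: mu != mu0
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, critical, reject          # step 6: decision


# Tire example: H0: mu <= 40000 against H1: mu > 40000 at alpha = 0.05.
z, critical, reject = z_test_mean(41200, 40000, 3000, 64, 0.05, "right")
print(f"Z = {z:.4f}, critical value = {critical:.4f}, reject H0: {reject}")
```

Passing the sample standard deviation s as `sd` gives the large-sample version of the test used when σ is unknown.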

EXAMPLE:
The expected lifetime of electric light bulbs produced by a given process
was 1500 hours. To test a new batch, a sample of 40 was taken which
showed a mean lifetime of 1410 hours and a standard deviation of 90
hours. Test the hypothesis that the mean lifetime of the electric light bulbs
has not changed, using a level of significance of 0.05.

Solution:
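Given $\mu_0 = 1500$; n = 40; $\bar{X} = 1410$; s = 90

Let μ ≡ the true mean lifetime of the electric light bulbs.

H0: μ = 1500
H1: μ ≠ 1500

At α = 0.05, critical values = $\pm Z_{\alpha/2} = \pm Z_{0.025} = \pm 1.96$
rejection regions: Z < −1.96 or Z > 1.96

$$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{1410 - 1500}{90/\sqrt{40}} = -6.3246$$

Since Z = −6.3246 < −1.96, we reject H0 and conclude that at the 5%
level of significance, there is some evidence to suggest that the mean
lifetime of the electric light bulbs has changed.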

EXAMPLE:
It is thought that a certain Normal population has a mean of 1.6. A sample
of 50 gives a mean of 1.55 and a standard deviation of 0.3. Does this
provide evidence, at the 5% level, that the population mean is less than
1.6?

Solution:
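Given $\mu_0 = 1.6$; n = 50; $\bar{X} = 1.55$; s = 0.3

Let μ ≡ the true mean.

H0: μ ≥ 1.6
H1: μ < 1.6

At α = 0.05, critical value = $-Z_{\alpha} = -Z_{0.05} = -1.6449$
rejection region: Z < −1.6449

$$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{1.55 - 1.6}{0.3/\sqrt{50}} = -1.1785$$

Since Z = −1.1785 > −1.6449, we do not reject H0 at α = 0.05 and conclude
that there is insufficient evidence that the population mean is less than 1.6.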

Hypothesis Testing on a Population Proportion: Large Sample

• The null and alternative hypotheses

        H0                       H1          Type of test
(i)     p = p0                   p ≠ p0      Two-tailed test
(ii)    p = p0 or p ≥ p0         p < p0      Left-tailed test
(iii)   p = p0 or p ≤ p0         p > p0      Right-tailed test

• Test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

• Critical value and rejection region

        H0          H1          Critical value        Critical region
(i)     p = p0      p ≠ p0      $\pm Z_{\alpha/2}$    $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
(ii)    p ≥ p0      p < p0      $-Z_{\alpha}$         $Z < -Z_{\alpha}$
(iii)   p ≤ p0      p > p0      $Z_{\alpha}$          $Z > Z_{\alpha}$

Note:
1. p0 ≡ the hypothesized population proportion (a predetermined constant)
2. p̂ ≡ the sample proportion
3. The hypothesized population proportion p0, rather than the sample
proportion p̂, is used in the standard error because the test statistic is
computed under H0, where the population proportion is taken to be p0.

EXAMPLE:
ABC Mailing Company sells computers and computer parts by mail. The
company claims that at least 90% of all orders are mailed within 72 hours
after they are received. The quality control department at the company
often takes samples to check if this claim is valid. A recently taken sample
of 150 orders showed that 129 of them were mailed within 72 hours. Do
you think the company’s claim is true? Use a 2.5% significance level.

Solution:
Given n = 150; x = 129; p0 = 0.9; $\hat{p} = \dfrac{x}{n} = \dfrac{129}{150} = 0.86$
Let p ≡ true proportion of orders that are mailed within 72 hours of receipt.

H0: p ≥ 0.9
H1: p < 0.9

At α = 0.025, critical value = $-Z_{\alpha} = -Z_{0.025} = -1.96$
rejection region: Z < −1.96

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.86 - 0.9}{\sqrt{\dfrac{0.9(1 - 0.9)}{150}}} = -1.6330$$

Since Z = −1.6330 > −1.96, we do not reject H0 at α = 0.025 and
conclude that there is insufficient evidence to dispute the company’s claim.
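
The ABC Mailing Company calculation can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the variable names are illustrative and the critical value is obtained from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

# ABC Mailing Company example: H0: p >= 0.9 against H1: p < 0.9.
n, x, p0, alpha = 150, 129, 0.90, 0.025

p_hat = x / n                               # sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
critical = NormalDist().inv_cdf(alpha)      # left-tail critical value (negative)
reject = z < critical                       # left-tailed decision rule

print(f"p_hat = {p_hat:.2f}, Z = {z:.4f}, critical value = {critical:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```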

EXAMPLE:
In an investigation into ownership of calculators, 200 randomly chosen
school students were interviewed, 163 of them owned a calculator. Using
the evidence of this sample, test at the 5% level of significance, the
hypothesis that the proportion of school students owning a calculator is
more than 80%.

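Solution:
Given n = 200; x = 163; p0 = 0.8; $\hat{p} = \dfrac{x}{n} = \dfrac{163}{200} = 0.815$
Let p ≡ true proportion of students owning a calculator.

H0: p ≤ 0.8
H1: p > 0.8

At α = 0.05, critical value = $Z_{\alpha} = Z_{0.05} = 1.6449$
rejection region: Z > 1.6449

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.815 - 0.8}{\sqrt{\dfrac{0.8(1 - 0.8)}{200}}} = 0.5303$$

Since Z = 0.5303 < 1.6449, we do not reject H0 and conclude that at the
5% significance level, there is insufficient evidence to suggest that the
proportion of school students owning a calculator is more than 80%.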

EXAMPLE:
An election candidate claims that 60 percent of the voters support him. A
random sample of 2500 voters shows that 1400 support him. Test his claim
at 0.10 level of significance.

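Solution:
Given n = 2500; x = 1400; p0 = 0.6; $\hat{p} = \dfrac{x}{n} = \dfrac{1400}{2500} = 0.56$
Let p ≡ true proportion of voters that support him.

H0: p = 0.6
H1: p ≠ 0.6

At α = 0.10, critical values = $\pm Z_{\alpha/2} = \pm Z_{0.05} = \pm 1.6449$
rejection regions: Z < −1.6449 or Z > 1.6449

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.56 - 0.6}{\sqrt{\dfrac{0.6(1 - 0.6)}{2500}}} = -4.0825$$

Since Z = −4.0825 < −1.6449, we reject H0 at α = 0.10 and conclude that
there is sufficient evidence that the proportion of voters who support him
is not 60%.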

The p-value Approach

The p-value approach is the hypothesis test process that compares the
probability, called the p-value, with the significance level 𝛼.

Definition:
A p-value is the probability that the test statistic would assume a value as
extreme as, or more extreme than the observed value of the test statistic
(in the direction of the alternative hypothesis) when the null hypothesis is
true.

Decision rule using p-value approach:


If p-value ≤ 𝛼, then reject 𝑯𝟎
If p-value > 𝛼, then do not reject 𝑯𝟎

EXAMPLE:
A standard intelligence examination has been given to the students for
several years and it is assumed that the scores are normally distributed
with an average of 80 and a standard deviation of 7. A group of 25
students obtained a mean grade of 77 in the examination this year. Are this
year’s students inferior in intelligence to the past years’ students? Test at
α = 0.05. Calculate the p-value. Use the p-value to draw a conclusion
regarding the statistical test.

Solution: Given n = 25; $\bar{X} = 77$; σ = 7; $\mu_0 = 80$; α = 0.05

Let μ ≡ population average score.

H0: μ ≥ 80
H1: μ < 80

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{77 - 80}{7/\sqrt{25}} = -2.14$$

p-value = P(Z < −2.14) = 0.01618

Since p-value = 0.01618 < α = 0.05, we reject H0 at α = 0.05. There is
sufficient evidence that this year’s students are inferior in intelligence
to the past years’ students.
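
The p-value in this example can be computed directly instead of being read from a table. The sketch below is illustrative only; it uses `statistics.NormalDist().cdf` for the lower-tail probability, and because it does not round Z to −2.14 first, the p-value it prints is slightly smaller than the table-based 0.01618.

```python
from math import sqrt
from statistics import NormalDist

# Intelligence examination example: H0: mu >= 80 against H1: mu < 80.
n, x_bar, sigma, mu0, alpha = 25, 77, 7, 80, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))  # test statistic, about -2.14
p_value = NormalDist().cdf(z)          # left-tailed test: P(Z < z)
reject = p_value <= alpha              # decision rule of the p-value approach

print(f"Z = {z:.4f}, p-value = {p_value:.5f}")
print("Reject H0" if reject else "Do not reject H0")
```

For a right-tailed test the p-value would be `1 - NormalDist().cdf(z)`, and for a two-tailed test twice the smaller tail probability.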

CHI-SQUARE TEST

• ‘Chi’ is the Greek letter χ, pronounced ‘kye’.
• The chi-square distribution is a continuous distribution and it has a
positive integer parameter v, which determines its shape.
• As its name implies, χ² cannot take a negative value.
• The parameter v is known as the degrees of freedom of the distribution
and we refer to a ‘chi-square distribution with v degrees of freedom’.
For simplicity, we write this as $\chi^2_v$.
• There are many χ² distributions, one for each number of degrees of
freedom. As the degrees of freedom become fewer, the distribution becomes
more positively skewed. Conversely, as the number of degrees of freedom
increases, the distribution becomes approximately normal.

[Figure: Chi-square distribution curve]

• The χ² statistic plays an important role in many business problems
dealing with count data, where information is obtained by counting
rather than by measuring.

Example:
1. In market research, we count the number of people who prefer a
particular brand of detergent powder
2. In quality control, we count the number of defectives produced by a
machine during a certain period

• There are many situations of this type where measurements are made
by counting the numbers or frequency in each category.
• The χ² test compares such observed frequencies of occurrence against
the expected ones.

The χ² test is used broadly for:
(1) Test of goodness-of-fit (not included in our syllabus)
• For one-way classification or for one variable only
• Test whether a given set of data actually follows an assumed
distribution or not
(2) Test of independence
• For more than one row or column in the form of a contingency table
concerning several attributes
• Test for dependence between two variables

CONTINGENCY TABLE ANALYSIS (Test of independence)

• The chi-square test can be applied to data involving more than one
variable or more than one characteristic.
• Often data are collected on several variables at a time. For example, a
questionnaire will usually contain more than one question.
• Another important application of the χ² distribution is in testing for the
independence of two variables on the basis of sample data.
• If the distribution of one variable differs across the categories of the
other, the variables are said to be associated, whereas if there are no
such differences the variables are said to be independent.

Contingency Table
• A table that gives the frequencies for two or more variables
simultaneously
• Only two-way contingency tables (two variables) are considered here

r × c Contingency Table

                        B
         Class    B1     B2     …     Bc     Total
         A1       f11    f12    …     f1c    f1+
   A     A2       f21    f22    …     f2c    f2+
         ⋮        ⋮      ⋮            ⋮      ⋮
         Ar       fr1    fr2    …     frc    fr+
         Total    f+1    f+2    …     f+c    n

• r rows, c columns
• $f_{i+} = \sum_{j=1}^{c} f_{ij}$ for $i = 1, 2, \ldots, r$
• $f_{+j} = \sum_{i=1}^{r} f_{ij}$ for $j = 1, 2, \ldots, c$

If the criteria are independent, then the joint probability for each
combination can be expressed as the product of the separate marginal
probabilities: $P(A_i \cap B_j) = P(A_i)P(B_j)$

The chi-square procedure will be used to see how well the data fit this
assumption.

Estimate the marginal probabilities from the row and column totals:
$$P(A_i) \approx \frac{f_{i+}}{n}, \qquad P(B_j) \approx \frac{f_{+j}}{n}$$

Expected number $E_{ij}$ for each combination when A and B are
independent:
$$E_{ij} = nP(A_i \cap B_j) = nP(A_i)P(B_j) \approx n\left(\frac{f_{i+}}{n}\right)\left(\frac{f_{+j}}{n}\right) = \frac{f_{i+}f_{+j}}{n}$$

$$E = \frac{\text{Row total} \times \text{Column total}}{n}$$

Test for independence between two variables

• The null and alternative hypotheses
H0: The two variables are independent
H1: The two variables are not independent

• Test statistic
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

• Critical value and rejection region
Critical value: $\chi^2_{\alpha;\,(r-1)(c-1)}$
Critical region: $\chi^2 > \chi^2_{\alpha;\,(r-1)(c-1)}$

Example:
An accident inspector makes spot checks on working practices during
visits to industrial sites chosen at random. At one large construction site,
the numbers of accidents occurring per week were counted for a period of
three years, and each week was also classified as to whether or not the
inspector had visited the site during the previous week. The results are
shown as follows.

              Number of accidents
              0      1      2      3
Visit         33     8      5      4
No visit      67     42     15     6

Does the number of accidents depend on the visits by the inspector? Use
α = 0.05.

Solution:
H0: The number of accidents is independent of the inspector’s visits
H1: The number of accidents depends on the inspector’s visits

At α = 0.05, critical value = $\chi^2_{0.05;\,(2-1)(4-1)} = \chi^2_{0.05;\,3} = 7.815$
Rejection region: $\chi^2 > 7.815$

Observed frequencies, with expected frequencies in brackets:

              Number of accidents
              0               1               2               3              Total
Visit         33 (27.7778)    8 (13.8889)     5 (5.5556)      4 (2.7778)     50
No visit      67 (72.2222)    42 (36.1111)    15 (14.4444)    6 (7.2222)     130
Total         100             50              20              10             180

$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{4}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$= \frac{(33 - 27.7778)^2}{27.7778} + \frac{(8 - 13.8889)^2}{13.8889} + \frac{(5 - 5.5556)^2}{5.5556} + \frac{(4 - 2.7778)^2}{2.7778}$$
$$+ \frac{(67 - 72.2222)^2}{72.2222} + \frac{(42 - 36.1111)^2}{36.1111} + \frac{(15 - 14.4444)^2}{14.4444} + \frac{(6 - 7.2222)^2}{7.2222}$$
= 0.9818 + 2.4969 + 0.0556 + 0.5378 + 0.3776 + 0.9603 + 0.0214 + 0.2068
= 5.6382

Since χ² = 5.6382 < 7.815, we fail to reject H0 at α = 0.05 and
conclude that there is no significant evidence that the number of accidents
depends on the visits by the inspector.
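
The expected frequencies and the χ² statistic in this example can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the function name `chi_square_independence` is hypothetical, only the standard library is used, and the critical value 7.815 is copied from the notes rather than computed.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table.

    Hypothetical helper; observed is a list of rows of observed frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E_ij = (row total x column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


accidents = [
    [33, 8, 5, 4],    # weeks following a visit
    [67, 42, 15, 6],  # weeks following no visit
]
chi2, dof, expected = chi_square_independence(accidents)
critical_value = 7.815  # table value for alpha = 0.05 with 3 d.f., as given in the notes
print(f"chi-square = {chi2:.4f}, d.f. = {dof}, reject H0: {chi2 > critical_value}")
```

The function accepts any r × c table of counts, so it can also be used to check the other contingency-table exercises once the appropriate critical value has been looked up.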

Example:
A sample of hotels in a particular country was selected. The following table
shows the number of hotels in each region of the country and in each of
four grades.

              Region
Grade         Eastern    Central    Western
1 star        29         22         29
2 star        67         38         55
3 star        53         32         35
4 star        11         8          21

Show whether there is any evidence of a significant association between
region and grade of hotel in this country. Use α = 0.05.

Solution:
H0: There is no association between the region and the grade of hotel
H1: There is an association between the region and the grade of hotel

At α = 0.05, critical value = $\chi^2_{0.05;\,(4-1)(3-1)}$ =

Rejection region:

              Region
Grade         Eastern     Central     Western     Total
1 star        29 (   )    22 (   )    29 (   )    80
2 star        67 (   )    38 (   )    55 (   )    160
3 star        53 (   )    32 (   )    35 (   )    120
4 star        11 (   )    8 (   )     21 (   )    40
Total         160         100         140         400

$$\chi^2 = \sum_{i=1}^{4}\sum_{j=1}^{3}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
=

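Formula:

Confidence interval
• Population mean: $\bar{x} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ if σ is known; $\bar{x} \pm Z_{\alpha/2}\dfrac{s}{\sqrt{n}}$ if σ is unknown and n ≥ 30
• Population proportion: $\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$

Test statistics
• Population mean: $Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ if σ is known; $Z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ if σ is unknown and n ≥ 30
• Population proportion: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$

Chi-Square Test of Independence
• $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$

Summary for hypothesis testing:

        H0                     H1                     Type of test         Critical value        Rejection region
(i)     μ = μ0 / p = p0        μ ≠ μ0 / p ≠ p0        Two-tailed test      $\pm Z_{\alpha/2}$    $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
(ii)    μ ≥ μ0 / p ≥ p0        μ < μ0 / p < p0        Left-tailed test     $-Z_{\alpha}$         $Z < -Z_{\alpha}$
(iii)   μ ≤ μ0 / p ≤ p0        μ > μ0 / p > p0        Right-tailed test    $Z_{\alpha}$          $Z > Z_{\alpha}$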

AAMS1773 QUANTITATIVE STUDIES
Tutorial 5 (Estimation and Hypothesis Testing)

1. Suppose that the standard deviation of the life (normally distributed)
for a particular brand of light bulbs is known to be 500 hours, but
that the mean operating life is not known. A random sample of 35
bulbs was checked and the mean life was found to be 8900 hours.
Determine the 95% confidence interval for the mean life of the
particular brand of bulbs.

2. A sample of 50 college students showed a mean height of 167.16 cm
and a standard deviation of 6.86 cm. Construct a 98% confidence
interval for the mean height of all college students.

3. For a random sample of 100 households in a large metropolitan
area, the number of households in which at least one adult is
currently unemployed and seeking a full-time job is 12. Estimate the
proportion of households in the area in which at least one adult is
unemployed, using a 99% confidence interval.

4. In a survey of 1000 electors, 20% were found to favor party A.
Construct a 90% confidence interval for the percentage in favor of
party A.

5. The manufacturer of a certain oil additive claims that the mean net
weight of jars of his product is 1 kg. A random sample of size 49 of a
large consignment supplied to your company is found to have a
mean weight of 0.98 kg with a standard deviation of 0.02 kg. Test the
manufacturer’s claim at the 0.05 significance level.

6. In the past, the average IQ of freshmen in XYZ University was 124.
This year, a sample of 100 freshmen students was taken from this
university and it was found that their average IQ was 125 with a
standard deviation of 8. Does this indicate a significant
improvement? (use α = 0.05)

7. Suppose that hourly wages in the chemical industry are normally
distributed, with a mean of $7.60 and a standard deviation of $0.60.
A large company in this industry took a random sample of 50 of its
workers and determined that their average hourly wage was $7.50.
Can we conclude at the 10% level of significance that this
company’s average hourly wage is less than that of the entire
industry?

8. A theory predicts that the probability of an event is 0.4. The theory
is tested experimentally and in 400 independent trials the event
occurred 140 times. Is the number of occurrences significantly less
than that predicted by the theory? Test at the 1% level of
significance.

9. It is assumed that over half of the employees in a large organization
are in favor of a proposed new wage structure. A random sample of
340 employees found that 56% were in favor. Does this sample
verify the assumption? (use the 5% significance level)

10. An article in the Washington Post stated that nearly 45% of the U.S.
population is born with brown eyes, although they don’t necessarily
stay that way. To test the newspaper’s claim, a random sample of
80 people was selected, and 32 had brown eyes. Is there sufficient
evidence to dispute the newspaper’s claim regarding the proportion
of brown-eyed people in the United States? Use α = 0.01.

11. A commonly prescribed drug on the market for relieving nervous
tension is believed to be only 60% effective. Experimental results
with a new drug administered to a random sample of 100 adults who
were suffering from nervous tension showed that 70 received relief.
Is this sufficient evidence to conclude that the new drug is superior
to the one commonly prescribed? Use a 0.05 level of significance.
Calculate the p-value. Use the p-value to draw a conclusion regarding
the statistical test.

12. 1,000 flights from a major airport were classified according to
intensity of booking and the following table was prepared.

                      Number of Flights
                      Internal    Regional    International
Fully booked          154         171         275
Not fully booked      96          79          225

Is there any evidence of significant association between the type of
flight and the intensity of bookings? (use α = 0.01)

13. The machines in a factory were classified according to the observed
degree of defectiveness over the previous year’s operations, as in
the following table:

                              Number of Machines
% defective                   Cutting machine    Grinding machine    Millers
1% or less                    22                 74                  102
Over 1% and less than 2%      31                 102                 143
2% and more                   7                  64                  55

Is there any evidence to show a significant association between
degree of defectiveness and type of machine? (use α = 0.02)

Answers:

1. [8734.35, 9065.65] hours
2. [164.9031, 169.4169] cm
3. [0.0363, 0.2037]
4. [17.92%, 22.08%]

       Type of test used    Test statistic    Conclusion
5.     Two-tailed           −7                Reject H0
6.     Right-tailed         1.25              Do not reject H0
7.     Left-tailed          −1.1785           Do not reject H0
8.     Left-tailed          −2.0412           Do not reject H0
9.     Right-tailed         2.2127            Reject H0
10.    Two-tailed           −0.8989           Do not reject H0
11.    Right-tailed         2.04              Reject H0
12.                         12.825            Reject H0
13.                         9.0904            Do not reject H0
