Elementary Statistics Ch.9

Chapter 9

Chi-Square Tests and the F-Distribution
§ 9.1
Goodness of Fit
Multinomial Experiments
A multinomial experiment is a probability experiment consisting of a fixed number of independent trials in which there are more than two possible outcomes for each trial. This differs from a binomial experiment, which has only two possible outcomes per trial.

Example:
A researcher claims that the distribution of favorite pizza toppings among teenagers is as shown below.

Topping      Frequency, f
Cheese       41%
Pepperoni    25%
Sausage      15%
Mushrooms    10%
Onions        9%

Each outcome is classified into categories, and the probability for each possible outcome is fixed.
Chi-Square Goodness-of-Fit Test
A chi-square goodness-of-fit test is used to test whether a frequency distribution fits an expected distribution.
The test statistic for the chi-square goodness-of-fit test is calculated from the observed frequencies and the expected frequencies.
The observed frequency O of a category is the frequency for the category observed in the sample data.
The expected frequency E of a category is the calculated frequency for the category. Expected frequencies are obtained assuming the specified (or hypothesized) distribution. The expected frequency for the ith category is
\[ E_i = n p_i \]
where n is the number of trials (the sample size) and \(p_i\) is the assumed probability of the ith category.
Observed and Expected Frequencies
Example:
200 teenagers are randomly selected and asked what their favorite pizza topping is. The results are shown below. Find the observed frequencies and the expected frequencies.

Topping     Results (n = 200)   % of teenagers   Observed frequency   Expected frequency
Cheese      78                  41%              78                   200(0.41) = 82
Pepperoni   52                  25%              52                   200(0.25) = 50
Sausage     30                  15%              30                   200(0.15) = 30
Mushrooms   25                  10%              25                   200(0.10) = 20
Onions      15                   9%              15                   200(0.09) = 18
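As a minimal sketch (Python is our choice here; the deck itself contains no code), the rule \(E_i = n p_i\) can be applied to the claimed percentages in this example. The function name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(n, probabilities):
    """Return E_i = n * p_i for each hypothesized category probability p_i."""
    return [n * p for p in probabilities]


# Claimed distribution: cheese, pepperoni, sausage, mushrooms, onions.
claimed = [0.41, 0.25, 0.15, 0.10, 0.09]
expected = expected_frequencies(200, claimed)
print([round(e, 2) for e in expected])  # [82.0, 50.0, 30.0, 20.0, 18.0]
```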
Chi-Square Goodness-of-Fit Test
For the chi-square goodness-of-fit test to be used, the following must be true.
1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.

The Chi-Square Goodness-of-Fit Test
If the conditions listed above are satisfied, then the sampling distribution for the goodness-of-fit test is approximated by a chi-square distribution with k – 1 degrees of freedom, where k is the number of categories. The test statistic for the chi-square goodness-of-fit test is
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where O represents the observed frequency of each category and E represents the expected frequency of each category. The test is always a right-tailed test.
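The test statistic and degrees of freedom above can be sketched in Python as follows. The helper names are hypothetical, and the die-roll counts are made-up toy data, not taken from this chapter.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_df(k):
    """Degrees of freedom for a goodness-of-fit test with k categories."""
    return k - 1


# Toy data (hypothetical): a die rolled 60 times; a fair die gives E = 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(round(chi_square_statistic(observed, expected), 4), goodness_of_fit_df(len(observed)))  # 1.0 5
```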
Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H0 and Ha.
2. Specify the level of significance.
   In symbols: Identify α.
3. Identify the degrees of freedom.
   In symbols: d.f. = k – 1
4. Determine the critical value.
   In symbols: Use Table 6 in Appendix B.
5. Determine the rejection region.
6. Calculate the test statistic.
   In symbols: \(\chi^2 = \sum \frac{(O - E)^2}{E}\)
7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If \(\chi^2\) is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.
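One way to sanity-check a Table 6 critical value (steps 3 and 4) is to compute the chi-square upper-tail area directly. The sketch below assumes an integer number of degrees of freedom and uses the standard recurrence for the regularized upper incomplete gamma function; `chi2_sf` and `decide` are hypothetical helpers (SciPy's `scipy.stats.chi2.sf` serves the same purpose if it is available). The tail area beyond a table critical value should come out close to α.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X > x) for X ~ chi-square with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), the recurrence
    for the regularized upper incomplete gamma function, with y = x / 2 and
    a stepped up to df / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def decide(statistic, critical_value):
    """Right-tailed decision rule from steps 5 through 7."""
    return "reject H0" if statistic > critical_value else "fail to reject H0"


# 13.277 is the d.f. = 4, alpha = 0.01 entry; the area beyond it should be about 0.01.
print(round(chi2_sf(13.277, 4), 4))  # 0.01
print(decide(2.025, 13.277))         # fail to reject H0
```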
Chi-Square Goodness-of-Fit Test
Example:
A researcher claims that the distribution of favorite pizza toppings among teenagers is 41% cheese, 25% pepperoni, 15% sausage, 10% mushrooms, and 9% onions. In a survey of 200 randomly selected teenagers, the responses were distributed as shown below.

Topping      Percent of sample (n = 200)
Cheese       39%
Pepperoni    26%
Sausage      15%
Mushrooms    12.5%
Onions        7.5%

Using α = 0.01 and the observed and expected frequencies previously calculated, test the researcher’s claim using a chi-square goodness-of-fit test.
Chi-Square Goodness-of-Fit Test
Example continued:
H0: The distribution of pizza toppings is 41% cheese, 25% pepperoni, 15% sausage, 10% mushrooms, and 9% onions. (Claim)
Ha: The distribution of pizza toppings differs from the claimed or expected distribution.

Because there are 5 categories, the chi-square distribution has k – 1 = 5 – 1 = 4 degrees of freedom.

With d.f. = 4 and α = 0.01, the critical value is \(\chi^2_0 = 13.277\).
Chi-Square Goodness-of-Fit Test
Example continued:

Topping      Observed frequency   Expected frequency
Cheese       78                   82
Pepperoni    52                   50
Sausage      30                   30
Mushrooms    25                   20
Onions       15                   18

\[ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(78 - 82)^2}{82} + \frac{(52 - 50)^2}{50} + \frac{(30 - 30)^2}{30} + \frac{(25 - 20)^2}{20} + \frac{(15 - 18)^2}{18} \approx 2.025 \]

[Figure: rejection region α = 0.01 to the right of \(\chi^2_0 = 13.277\).]
Because \(\chi^2 \approx 2.025 < 13.277\), the test statistic is not in the rejection region. Fail to reject H0.
There is not enough evidence at the 1% level to reject the researcher’s claim.
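A short Python sketch of the pizza-topping test, using only the counts, claimed percentages, and critical value stated above.

```python
# Pizza-topping example: observed counts and the researcher's claimed distribution.
toppings = ["Cheese", "Pepperoni", "Sausage", "Mushrooms", "Onions"]
observed = [78, 52, 30, 25, 15]
claimed = [0.41, 0.25, 0.15, 0.10, 0.09]
n = sum(observed)                    # 200 teenagers
expected = [n * p for p in claimed]  # 82, 50, 30, 20, 18

# Condition 2: every expected frequency must be at least 5.
assert min(expected) >= 5

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(toppings) - 1               # k - 1 = 4
critical = 13.277                    # Table 6 value for d.f. = 4, alpha = 0.01

for name, o, e in zip(toppings, observed, expected):
    print(f"{name:<10} O={o:>3} E={e:>5.1f} (O-E)^2/E={(o - e) ** 2 / e:.4f}")
print(f"chi-square = {chi2:.3f}, d.f. = {df}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```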
§ 9.2
Independence
Contingency Tables
An r × c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.

The following contingency table shows a random sample of 321 fatally injured passenger vehicle drivers by age and gender. (Adapted from Insurance Institute for Highway Safety)

                                   Age
Gender    16 – 20   21 – 30   31 – 40   41 – 50   51 – 60   61 and older
Male         32        51        52        43        28          10
Female       13        22        33        21        10           6
Expected Frequency
Assuming the two variables are independent, you can use the contingency table to find the expected frequency for each cell.

Finding the Expected Frequency for Contingency Table Cells
The expected frequency for a cell \(E_{r,c}\) in a contingency table is
\[ \text{Expected frequency } E_{r,c} = \frac{(\text{Sum of row } r) \times (\text{Sum of column } c)}{\text{Sample size}}. \]
Expected Frequency
Example:
Find the expected frequency for each “Male” cell in the contingency table for the sample of 321 fatally injured drivers. Assume that the variables, age and gender, are independent.

                                   Age
Gender    16 – 20   21 – 30   31 – 40   41 – 50   51 – 60   61 and older   Total
Male         32        51        52        43        28          10          216
Female       13        22        33        21        10           6          105
Total        45        73        85        64        38          16          321
Expected Frequency
Example continued:
(Sum of row r ) (Sum of column c)


Expected frequency E r ,c 
Sample size
216 45 216 73 216 85
E 1,1  30.28 E 1,2  49.12 E 1,3  57.20
321 321 321

216 64 216 38 216 16


E 1,4  43.07 E 1,5  25.57 E 1,6  10.77
321 321 321
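The expected-frequency formula for a whole contingency table can be sketched as follows; `expected_table` is a hypothetical helper, and the counts are the 321 drivers from the example.

```python
def expected_table(table):
    """E[r][c] = (row r total) * (column c total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Fatally injured drivers by gender (rows) and age group (columns).
drivers = [
    [32, 51, 52, 43, 28, 10],  # Male
    [13, 22, 33, 21, 10, 6],   # Female
]
for label, row in zip(["Male", "Female"], expected_table(drivers)):
    print(label, [round(e, 2) for e in row])
# Male [30.28, 49.12, 57.2, 43.07, 25.57, 10.77]
# Female [14.72, 23.88, 27.8, 20.93, 12.43, 5.23]
```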
Chi-Square Independence Test
A chi-square independence test is used to test the independence of two variables. Using a chi-square test, you can determine whether the occurrence of one variable affects the probability of the occurrence of the other variable.

For the chi-square independence test to be used, the following must be true.
1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.
Chi-Square Independence Test
The Chi-Square Independence Test
If the conditions listed are satisfied, then the sampling distribution for the chi-square independence test is approximated by a chi-square distribution with (r – 1)(c – 1) degrees of freedom, where r and c are the number of rows and columns, respectively, of a contingency table. The test statistic for the chi-square independence test is
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where O represents the observed frequencies and E represents the expected frequencies. The test is always a right-tailed test.
Chi-Square Independence Test
Performing a Chi-Square Independence Test
1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H0 and Ha.
2. Specify the level of significance.
   In symbols: Identify α.
3. Identify the degrees of freedom.
   In symbols: d.f. = (r – 1)(c – 1)
4. Determine the critical value.
   In symbols: Use Table 6 in Appendix B.
5. Determine the rejection region.
6. Calculate the test statistic.
   In symbols: \(\chi^2 = \sum \frac{(O - E)^2}{E}\)
7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If \(\chi^2\) is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.
Chi-Square Independence Test
Example:
The following contingency table shows a random sample of 321 fatally injured passenger vehicle drivers by age and gender. The expected frequencies are displayed in parentheses. At α = 0.05, can you conclude that the drivers’ ages are related to gender in such accidents?

                                   Age
Gender    16 – 20   21 – 30   31 – 40   41 – 50   51 – 60   61 and older   Total
Male         32        51        52        43        28          10          216
          (30.28)   (49.12)   (57.20)   (43.07)   (25.57)     (10.77)
Female       13        22        33        21        10           6          105
          (14.72)   (23.88)   (27.80)   (20.93)   (12.43)      (5.23)
Total        45        73        85        64        38          16          321
Chi-Square Independence Test
Example continued:
Because each expected frequency is at least 5 and the drivers were randomly selected, the chi-square independence test can be used to test whether the variables are independent.
H0: The drivers’ ages are independent of gender.
Ha: The drivers’ ages are dependent on gender. (Claim)
d.f. = (r – 1)(c – 1) = (2 – 1)(6 – 1) = (1)(5) = 5
With d.f. = 5 and α = 0.05, the critical value is \(\chi^2_0 = 11.071\).
Chi-Square Independence Test
Example continued:

O     E        O − E    (O − E)²   (O − E)²/E
32    30.28     1.72     2.9584     0.0977
51    49.12     1.88     3.5344     0.0720
52    57.20    −5.20    27.0400     0.4727
43    43.07    −0.07     0.0049     0.0001
28    25.57     2.43     5.9049     0.2309
10    10.77    −0.77     0.5929     0.0551
13    14.72    −1.72     2.9584     0.2010
22    23.88    −1.88     3.5344     0.1480
33    27.80     5.20    27.0400     0.9727
21    20.93     0.07     0.0049     0.0002
10    12.43    −2.43     5.9049     0.4751
 6     5.23     0.77     0.5929     0.1134

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \approx 2.84 \]

[Figure: rejection region α = 0.05 to the right of \(\chi^2_0 = 11.071\).]
Because \(\chi^2 \approx 2.84 < 11.071\), the test statistic is not in the rejection region. Fail to reject H0.
There is not enough evidence at the 5% level to conclude that age is dependent on gender in such accidents.
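The whole independence test can be sketched in Python; `chi_square_independence` is a hypothetical helper. Because it keeps the expected frequencies unrounded, the statistic comes out at about 2.83 rather than the 2.84 obtained from the rounded table above, and the decision is the same.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E = (row sum)(column sum) / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)           # (r - 1)(c - 1)
    return chi2, df


drivers = [
    [32, 51, 52, 43, 28, 10],  # Male
    [13, 22, 33, 21, 10, 6],   # Female
]
chi2, df = chi_square_independence(drivers)
critical = 11.071  # Table 6 value for d.f. = 5, alpha = 0.05
print(f"chi-square = {chi2:.3f}, d.f. = {df}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```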
§ 9.3
Comparing Two Variances
F-Distribution
Let \(s_1^2\) and \(s_2^2\) represent the sample variances of two different populations. If both populations are normal and the population variances \(\sigma_1^2\) and \(\sigma_2^2\) are equal, then the sampling distribution of
\[ F = \frac{s_1^2}{s_2^2} \]
is called an F-distribution.
There are several properties of this distribution.
1. The F-distribution is a family of curves, each of which is determined by two types of degrees of freedom: the degrees of freedom corresponding to the variance in the numerator, denoted d.f.N, and the degrees of freedom corresponding to the variance in the denominator, denoted d.f.D.
F-Distribution
Properties of the F-distribution continued:
2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is equal to 1.
4. F-values are always greater than or equal to 0.
5. The mean value of F is approximately equal to 1; more precisely, it equals d.f.D/(d.f.D – 2) whenever d.f.D > 2.

[Figure: F-distribution curves for d.f.N = 1 and d.f.D = 8; d.f.N = 8 and d.f.D = 26; d.f.N = 16 and d.f.D = 7; d.f.N = 3 and d.f.D = 11, plotted against F.]
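The definition of the F-distribution can be illustrated with a small simulation: draw samples from two normal populations with equal variance and record \(s_1^2/s_2^2\). This is a sketch under assumed settings (standard normal populations, n1 = 9 and n2 = 27, 20,000 repetitions, a fixed seed). The simulated ratios should be nonnegative and positively skewed (mean above median), with mean near d.f.D/(d.f.D – 2) = 26/24 ≈ 1.083.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance s^2 (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_f(n1, n2, reps=20000, seed=1):
    """Return reps draws of s1^2 / s2^2 from two N(0, 1) populations (equal variances)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n2)])
        ratios.append(s1_sq / s2_sq)
    return ratios


# n1 = 9 and n2 = 27 give d.f.N = 8 and d.f.D = 26, one of the curves in the figure.
ratios = simulate_f(9, 27)
mean_f = sum(ratios) / len(ratios)
median_f = sorted(ratios)[len(ratios) // 2]
print(f"mean = {mean_f:.3f} (theory: {26 / 24:.3f}), median = {median_f:.3f}")
```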
Critical Values for the F-Distribution
Finding Critical Values for the F-Distribution
1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use Table 7 in Appendix B to find the critical value. If the hypothesis test is
   a. one-tailed, use the α F-table.
   b. two-tailed, use the ½α F-table.
Critical Values for the F-Distribution
Example:
Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 5 and d.f.D = 28.

Appendix B, Table 7: F-Distribution, α = 0.05 (excerpt; rows are d.f.D, columns are d.f.N)
d.f.D \ d.f.N      1       2       3       4       5       6
1              161.4   199.5   215.7   224.6   230.2   234.0
2              18.51   19.00   19.16   19.25   19.30   19.33
...
27              4.21    3.35    2.96    2.73    2.57    2.46
28              4.20    3.34    2.95    2.71    2.56    2.45
29              4.18    3.33    2.93    2.70    2.55    2.43

The critical value is \(F_0 = 2.56\).
Critical Values for the F-Distribution
Example:
Find the critical F-value for a two-tailed test when α = 0.10, d.f.N = 4 and d.f.D = 6.
Because the test is two-tailed, use the ½α F-table: ½α = ½(0.10) = 0.05.

Appendix B, Table 7: F-Distribution, α = 0.05 (excerpt; rows are d.f.D, columns are d.f.N)
d.f.D \ d.f.N      1       2       3       4       5       6
1              161.4   199.5   215.7   224.6   230.2   234.0
2              18.51   19.00   19.16   19.25   19.30   19.33
3              10.13    9.55    9.28    9.12    9.01    8.94
4               7.71    6.94    6.59    6.39    6.26    6.16
5               6.61    5.79    5.41    5.19    5.05    4.95
6               5.99    5.14    4.76    4.53    4.39    4.28
7               5.59    4.74    4.35    4.12    3.97    3.87

The critical value is \(F_0 = 4.53\).
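The table-selection rule (the α table for a one-tailed test, the ½α table for a two-tailed test) can be sketched as a lookup. The dictionary `F_TABLE` is a hypothetical stand-in that holds only a few entries copied from the Table 7 excerpts above; a real implementation would need the full table or an F quantile function.

```python
# Hypothetical stand-in for Table 7: only entries reproduced in the excerpts above,
# keyed first by the table's alpha and then by (d.f.N, d.f.D).
F_TABLE = {
    0.05: {
        (4, 6): 4.53,
        (4, 7): 4.12,
        (5, 27): 2.57,
        (5, 28): 2.56,
        (5, 29): 2.55,
    },
}


def f_critical_value(alpha, df_n, df_d, two_tailed=False):
    """Right-tail critical value F0 from the alpha table (one-tailed) or alpha/2 table (two-tailed)."""
    table_alpha = round(alpha / 2 if two_tailed else alpha, 4)  # round to avoid float noise in the key
    try:
        return F_TABLE[table_alpha][(df_n, df_d)]
    except KeyError:
        raise KeyError(f"no entry for alpha={table_alpha}, d.f.N={df_n}, d.f.D={df_d}") from None


print(f_critical_value(0.05, 5, 28))                  # 2.56 (right-tailed, alpha = 0.05)
print(f_critical_value(0.10, 4, 6, two_tailed=True))  # 4.53 (two-tailed, uses the 0.05 table)
```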
Two-Sample F-Test for Variances
A two-sample F-test is used to compare two population variances \(\sigma_1^2\) and \(\sigma_2^2\) when a sample is randomly selected from each population. The populations must be independent and normally distributed. The test statistic is
\[ F = \frac{s_1^2}{s_2^2} \]
where \(s_1^2\) and \(s_2^2\) represent the sample variances with \(s_1^2 \ge s_2^2\). The degrees of freedom for the numerator is d.f.N = n1 – 1 and the degrees of freedom for the denominator is d.f.D = n2 – 1, where n1 is the size of the sample having variance \(s_1^2\) and n2 is the size of the sample having variance \(s_2^2\).
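A minimal sketch of the two-sample F statistic computed from raw data, following the rule above of putting the larger sample variance in the numerator. The helper names and the toy samples are hypothetical.

```python
def sample_variance(xs):
    """Unbiased sample variance s^2 (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def two_sample_f(sample_a, sample_b):
    """Return (F, d.f.N, d.f.D) with the larger sample variance in the numerator."""
    var_a, var_b = sample_variance(sample_a), sample_variance(sample_b)
    if var_a >= var_b:
        s1_sq, n1, s2_sq, n2 = var_a, len(sample_a), var_b, len(sample_b)
    else:
        s1_sq, n1, s2_sq, n2 = var_b, len(sample_b), var_a, len(sample_a)
    return s1_sq / s2_sq, n1 - 1, n2 - 1


# Toy data (hypothetical): the second sample is more spread out, so it becomes sample 1.
f, df_n, df_d = two_sample_f([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(round(f, 4), df_n, df_d)  # 2.6667 3 4
```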
Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare \(\sigma_1^2\) and \(\sigma_2^2\)
1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H0 and Ha.
2. Specify the level of significance.
   In symbols: Identify α.
3. Identify the degrees of freedom.
   In symbols: d.f.N = n1 – 1 and d.f.D = n2 – 1
4. Determine the critical value.
   In symbols: Use Table 7 in Appendix B.
5. Determine the rejection region.
6. Calculate the test statistic.
   In symbols: \(F = \frac{s_1^2}{s_2^2}\)
7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If F is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.
Two-Sample F-Test
Example:
A travel agency’s marketing brochure indicates that the standard deviations of hotel room rates for two cities are the same. A random sample of 13 hotel room rates in one city has a standard deviation of $27.50 and a random sample of 16 hotel room rates in the other city has a standard deviation of $29.75. Can you reject the agency’s claim at α = 0.01?
Because 29.75 > 27.50, \(s_1^2 = 29.75^2 \approx 885.06\) and \(s_2^2 = 27.50^2 = 756.25\).
H0: \(\sigma_1^2 = \sigma_2^2\) (Claim)
Ha: \(\sigma_1^2 \neq \sigma_2^2\)
Two-Sample F-Test
Example continued:
This is a two-tailed test with ½α = ½(0.01) = 0.005. The sample with the larger variance has n1 = 16, so d.f.N = 16 – 1 = 15 and d.f.D = 13 – 1 = 12. The critical value is \(F_0 = 4.72\).
The test statistic is
\[ F = \frac{s_1^2}{s_2^2} = \frac{885.06}{756.25} \approx 1.17. \]

[Figure: rejection region of area ½α = 0.005 to the right of \(F_0 = 4.72\).]
Because 1.17 < 4.72, F is not in the rejection region. Fail to reject H0.
There is not enough evidence at the 1% level to reject the claim that the standard deviations of the hotel room rates for the two cities are the same.
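The hotel example can be reproduced from its summary statistics alone. This sketch squares the given standard deviations, orders them so that \(s_1^2 \ge s_2^2\), and compares F with the Table 7 value stated above (4.72 for ½α = 0.005, d.f.N = 15, d.f.D = 12).

```python
# Hotel-rate example: only standard deviations and sample sizes are given.
s_a, n_a = 27.50, 13   # first city
s_b, n_b = 29.75, 16   # second city

# Put the larger variance in the numerator so that s1^2 >= s2^2.
(s1, n1), (s2, n2) = sorted([(s_a, n_a), (s_b, n_b)], reverse=True)
F = s1 ** 2 / s2 ** 2
df_n, df_d = n1 - 1, n2 - 1
critical = 4.72        # Table 7 value for alpha/2 = 0.005, d.f.N = 15, d.f.D = 12

print(f"s1^2 = {s1 ** 2:.2f}, s2^2 = {s2 ** 2:.2f}")
print(f"F = {F:.2f}, d.f.N = {df_n}, d.f.D = {df_d}")
print("reject H0" if F > critical else "fail to reject H0")
```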
§ 9.4
Analysis of Variance
One-Way ANOVA
One-way analysis of variance is a hypothesis-testing
technique that is used to compare means from three
or more populations. Analysis of variance is usually
abbreviated ANOVA.

In a one-way ANOVA test, the following must be true.


1. Each sample must be randomly selected from a normal, or approximately normal,
population.
2. The samples must be independent of each other.
3. Each population must have the same variance.
One-Way ANOVA
\[ \text{Test statistic} = \frac{\text{Variance between samples}}{\text{Variance within samples}} \]

1. The variance between samples \(MS_B\) measures the differences related to the treatment given to each sample and is sometimes called the mean square between.
2. The variance within samples \(MS_W\) measures the differences related to entries within the same sample. This variance, sometimes called the mean square within, is usually due to sampling error.
One-Way ANOVA
One-Way Analysis of Variance Test
If the conditions listed are satisfied, then the sampling distribution for the test is approximated by the F-distribution. The test statistic is
\[ F = \frac{MS_B}{MS_W}. \]
The degrees of freedom for the F-test are d.f.N = k – 1 and d.f.D = N – k, where k is the number of samples and N is the sum of the sample sizes.
Test Statistic for a One-Way ANOVA
Finding the Test Statistic for a One-Way ANOVA Test
1. Find the mean and variance of each sample.
   In symbols: \(\bar{x} = \frac{\sum x}{n}\), \(s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}\)
2. Find the mean of all entries in all samples (the grand mean).
   In symbols: \(\bar{\bar{x}} = \frac{\sum x}{N}\)
3. Find the sum of squares between the samples.
   In symbols: \(SS_B = \sum n_i (\bar{x}_i - \bar{\bar{x}})^2\)
4. Find the sum of squares within the samples.
   In symbols: \(SS_W = \sum (n_i - 1) s_i^2\)
5. Find the variance between the samples.
   In symbols: \(MS_B = \frac{SS_B}{k - 1} = \frac{SS_B}{\text{d.f.}_N}\)
6. Find the variance within the samples.
   In symbols: \(MS_W = \frac{SS_W}{N - k} = \frac{SS_W}{\text{d.f.}_D}\)
7. Find the test statistic.
   In symbols: \(F = \frac{MS_B}{MS_W}\)
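Steps 1 through 7 can be sketched as one Python function; `one_way_anova` is a hypothetical name, and the three toy samples are made up for illustration.

```python
def one_way_anova(samples):
    """Return (F, d.f.N, d.f.D, MS_B, MS_W) following steps 1 through 7."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    N = sum(sizes)
    means = [sum(s) / len(s) for s in samples]                             # step 1
    variances = [sum((x - m) ** 2 for x in s) / (len(s) - 1)
                 for s, m in zip(samples, means)]                           # step 1
    grand_mean = sum(sum(s) for s in samples) / N                           # step 2
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))     # step 3
    ss_w = sum((n - 1) * v for n, v in zip(sizes, variances))              # step 4
    df_n, df_d = k - 1, N - k
    ms_b = ss_b / df_n                                                      # step 5
    ms_w = ss_w / df_d                                                      # step 6
    return ms_b / ms_w, df_n, df_d, ms_b, ms_w                              # step 7


# Toy data (hypothetical): three small samples.
f, df_n, df_d, ms_b, ms_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 3), df_n, df_d)  # 13.0 2 6
```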
Performing a One-Way ANOVA Test
Performing a One-Way Analysis of Variance Test
1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H0 and Ha.
2. Specify the level of significance.
   In symbols: Identify α.
3. Identify the degrees of freedom.
   In symbols: d.f.N = k – 1 and d.f.D = N – k
4. Determine the critical value.
   In symbols: Use Table 7 in Appendix B.
5. Determine the rejection region.
6. Calculate the test statistic.
   In symbols: \(F = \frac{MS_B}{MS_W}\)
7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If F is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.
ANOVA Summary Table
A table is a convenient way to summarize the results in a one-way ANOVA test.

Variation   Sum of squares   Degrees of freedom   Mean squares                       F
Between     \(SS_B\)         d.f.N                \(MS_B = SS_B / \text{d.f.}_N\)     \(MS_B / MS_W\)
Within      \(SS_W\)         d.f.D                \(MS_W = SS_W / \text{d.f.}_D\)
Performing a One-Way ANOVA Test
Example:
The following table shows the salaries of randomly selected individuals from four large metropolitan areas. At α = 0.05, can you conclude that the mean salary is different in at least one of the areas? (Adapted from US Bureau of Economic Analysis)

Pittsburgh   Dallas    Chicago   Minneapolis
27,800       30,000    32,000    30,000
28,000       33,900    35,800    40,000
25,500       29,750    28,000    35,000
29,150       25,000    38,900    33,000
30,295       34,055    27,245    29,805
Performing a One-Way ANOVA Test
Example continued:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least one mean is different from the others. (Claim)

Because there are k = 4 samples, d.f.N = k – 1 = 4 – 1 = 3.

The sum of the sample sizes is N = n1 + n2 + n3 + n4 = 5 + 5 + 5 + 5 = 20, so d.f.D = N – k = 20 – 4 = 16.

Using α = 0.05, d.f.N = 3, and d.f.D = 16, the critical value is \(F_0 = 3.24\).
Performing a One-Way ANOVA Test
Example continued:
To find the test statistic, the following must be calculated.
\[ \bar{\bar{x}} = \frac{\sum x}{N} = \frac{140745 + 152705 + 161945 + 167805}{20} = 31160 \]
\[ MS_B = \frac{SS_B}{\text{d.f.}_N} = \frac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1} = \frac{5(28149 - 31160)^2 + 5(30541 - 31160)^2 + 5(32389 - 31160)^2 + 5(33561 - 31160)^2}{4 - 1} \approx 27874206.67 \]
Performing a One-Way ANOVA Test
Example continued:
\[ MS_W = \frac{SS_W}{\text{d.f.}_D} = \frac{\sum (n_i - 1) s_i^2}{N - k} = \frac{(5 - 1)(3192130) + (5 - 1)(13813030) + (5 - 1)(24975855) + (5 - 1)(17658605)}{20 - 4} = 14909905 \]
\[ F = \frac{MS_B}{MS_W} = \frac{27874206.67}{14909905} \approx 1.870 \]
Test statistic 1.870 < critical value 3.24, so F is not in the rejection region.
Fail to reject H0.
There is not enough evidence at the 5% level to conclude that the mean salary is different in at least one of the areas.
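A sketch that reproduces the salary example and prints it in the ANOVA summary-table layout. It computes \(SS_W\) as the total of squared deviations within each sample, which equals \(\sum (n_i - 1) s_i^2\); the critical value 3.24 is the Table 7 value stated above.

```python
salaries = {
    "Pittsburgh":  [27800, 28000, 25500, 29150, 30295],
    "Dallas":      [30000, 33900, 29750, 25000, 34055],
    "Chicago":     [32000, 35800, 28000, 38900, 27245],
    "Minneapolis": [30000, 40000, 35000, 33000, 29805],
}
samples = list(salaries.values())
k = len(samples)
N = sum(len(s) for s in samples)
means = [sum(s) / len(s) for s in samples]
grand_mean = sum(map(sum, samples)) / N
ss_b = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
# Sum of squared deviations within each sample; equals sum of (n_i - 1) * s_i^2.
ss_w = sum(sum((x - m) ** 2 for x in s) for s, m in zip(samples, means))
df_n, df_d = k - 1, N - k
ms_b, ms_w = ss_b / df_n, ss_w / df_d
F = ms_b / ms_w
critical = 3.24  # Table 7 value for alpha = 0.05, d.f.N = 3, d.f.D = 16

# ANOVA summary table
print(f"{'Variation':<10}{'SS':>14}{'d.f.':>6}{'MS':>14}{'F':>8}")
print(f"{'Between':<10}{ss_b:>14.2f}{df_n:>6}{ms_b:>14.2f}{F:>8.3f}")
print(f"{'Within':<10}{ss_w:>14.2f}{df_d:>6}{ms_w:>14.2f}")
print("reject H0" if F > critical else "fail to reject H0")
```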
