Cebu Institute of Technology - University
Department of industrial engineering
CHI SQUARE
TESTS
Prepared by: Engr. Kristan Ian D. Cabaña
OUTLINE
Two Non-Parametric Hypothesis Tests
• Chi-square Test for Goodness-of-Fit
• Chi-square Test for Independence
PARAMETRIC AND
NON-PARAMETRIC
TESTS
PARAMETRIC AND NON-PARAMETRIC TESTS
MODULE 7
PARAMETRIC TEST
Previous examples of hypothesis tests, such as the t tests
and analysis of variance, are parametric tests and they do
include assumptions about parameters and hypotheses
about parameters.
NON-PARAMETRIC TEST
The term "non-parametric" refers to the fact that the
chi-square tests do not require assumptions about
population parameters nor do they test hypotheses about
population parameters.
PARAMETRIC AND NON-PARAMETRIC TESTS
MODULE 7
The most obvious difference between the
chi-square tests and the other hypothesis tests
we have considered (t and ANOVA) is the
nature of the data.
For chi-square, the data are frequencies rather
than numerical scores.
CHI SQUARE TESTS
CHI-SQUARE TESTS
MODULE 7
CHI-SQUARE TESTS
MODULE 7
CHI-SQUARE TESTS
MODULE 7
GOODNESS OF FIT TEST
GOODNESS-OF-FIT TEST
MODULE 7
GOODNESS-OF-FIT TEST
The Chi-Square Statistic can be used to see whether a
frequency distribution fits a specific pattern. The chi-square
goodness-of-fit test is used for this test.
GOODNESS-OF-FIT TEST
MODULE 7
GOODNESS-OF-FIT TEST
Chi-Square goodness of fit test is a non-parametric test that
is used to find out how the observed value of a given
phenomena is SIGNIFICANTLY DIFFERENT from the
expected value. In Chi-Square goodness of fit test, the term
goodness of fit is used to compare the observed sample
distribution with the expected probability distribution.
GOODNESS-OF-FIT TEST
MODULE 7
GOODNESS-OF-FIT TEST
For Example, a researcher wants to determine whether
consumers have any preference among the flavors of ice
cream. A sample of 100 people provided the data:
Flavor Frequency
Chocolate 32
Strawberry 28
Mango 16
Cookies ‘n Cream 14
Vanilla 10
GOODNESS-OF-FIT TEST
MODULE 7
GOODNESS-OF-FIT TEST
Flavor Observed Frequency Expected Frequency
Chocolate 32 20
Strawberry 28 20
Mango 16 20
Cookies ‘n Cream 14 20
Vanilla 10 20
OBSERVED FREQUENCY – frequency obtained from the sample
EXPECTED FREQUENCY – frequency obtained from calculation
GOODNESS-OF-FIT TEST
MODULE 7
ASSUMPTIONS
The test statistic has a chi-square distribution
when the following assumptions are met
1. The data obtained from a random sample
must be independent.
2. The expected frequency for each category
must be at least 5.
GOODNESS-OF-FIT TEST
MODULE 7
PROPERTIES
The following are properties of the goodness-of-fit
test:
1. The data are the observed frequencies. This means that there is
only one data value for each category.
2. The degrees of freedom is one less than the number of
categories, not one less than the sample size.
3. It is always a right tail test.
4. It has a chi-square distribution.
5. The value of the test statistic doesn't change if the order of the
categories is switched.
GOODNESS-OF-FIT TEST
MODULE 7
Chi-Square Test Statistic
!
(𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠 − 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠)
𝜒! = #
𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠
𝑤𝑖𝑡ℎ 𝑑𝑒𝑔𝑟𝑒𝑒𝑠 𝑜𝑓 𝑓𝑟𝑒𝑒𝑑𝑜𝑚 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑒𝑠 − 1
GOODNESS-OF-FIT TEST
MODULE 7
HYPOTHESIS TESTING
Before computing the test statistic, the
hypotheses must be stated first.
𝐻! : 𝐶𝑜𝑛𝑠𝑢𝑚𝑒𝑟𝑠 𝑠ℎ𝑜𝑤 𝑛𝑜 𝑝𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒.
𝐻" : 𝐶𝑜𝑛𝑠𝑢𝑚𝑒𝑟𝑠 𝑠ℎ𝑜𝑤 𝑎 𝑝𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒.
GOODNESS-OF-FIT TEST
MODULE 7
ILLUSTRATIVE EXAMPLE
A clothing manufacturer wants to determine whether
customers prefer any specific color over other colors in
shirts. He selects a random sample of 102 shirts sold notes
the color. The table shows the results. At alpha=0.10, is there
a color preference?
Flavor Frequency
White 43
Blue 22
Black 16
Red 10
Yellow 6
Green 5
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 1: State the null and alternative hypotheses.
H0: Customers have no color preference.
Ha: Customers show a color preference.
STEP 2: Choose the level of significance, type of test, and sample size.
Level of significance, α = 0.10 ; Right-Tailed Test, n= 6; df = 6-1 = 5
STEP 3: Identify the critical value of the test statistic.
Test Statistic, 𝜒!" = +9.236
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 4: Determine the rejection region.
Rejection Region
X2
9.236
STEP 5: Calculate the standardized test statistic.
$ (&'("))! ($$("))! +(") !
𝜒# = + + ⋯+ = 59.7647
") ") ")
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 6: State the Decision Rule.
Using the standardized and critical values of the test statistic.
|𝑋 ! "#$"%$#&'( | ≥ |𝑋 ! ")*&*"#$ | : Reject H0
|𝑋 ! "#$"%$#&'( | < |𝑋 ! ")*&*"#$ | : Do not reject H0
|𝑋 ! " = 59.7647| > |𝑋 ! & = 9.236| : Reject H0
Using the P-value.
𝑃 − 𝑣𝑎𝑙𝑢𝑒 ≤ 𝛼 : Reject H0
𝑃 − 𝑣𝑎𝑙𝑢𝑒 > 𝛼 : Do not reject H0
𝑃 − 𝑣𝑎𝑙𝑢𝑒 < 𝛼 = 0.05 : Reject H0
HYPOTHESIS TESTING
MODULE 3
ILLUSTRATIVE EXAMPLE
STEP 7: Conclusion.
At 0.10 level of significance, there is enough evidence to reject the
claim that customers show no preference for the color of shirts.
GOODNESS-OF-FIT TEST
MODULE 7
ILLUSTRATIVE EXAMPLE
The dean of student affairs of a college wishes to test the claim that
the distribution of students is as follows: 40% nursing; 25% business;
15% computer science; 10% engineering; 5% humanities; and 5%
education. Last semester, the program enrollment was distributed as
shown below. At alpha=0.05, is the distribution of students the same
as hypothesized?
MAJOR NUMBER
Nursing 72
Business 53
Computer Science 32
Engineering 20
Humanities 16
Education 7
GOODNESS-OF-FIT TEST
MODULE 7
ILLUSTRATIVE EXAMPLE
MAJOR OBSERVED EXPECTED
Nursing 72 200*40%=80
Business 53 50
Computer Science 32 30
Engineering 20 20
Humanities 16 10
Education 7 10
TOTAL 200 200
GOODNESS-OF-FIT TEST
MODULE 7
ILLUSTRATIVE EXAMPLE
PRINTED CIRCUIT BOARD DEFECTS: The number of defects in printed
circuit boards is hypothesized to follow a Poisson distribution. A
random sample of 60 printed circuit boards has been collected, and
the following number of defects observed:
NO. OF DEFECTS FREQUENCY
0 32
1 15
2 9
3 4
Using alpha=0.05, can we claim that the number of defects follows a
Poisson distribution?
GOODNESS-OF-FIT TEST
MODULE 7
ILLUSTRATIVE EXAMPLE
𝑒 9: 𝜆; 32 ∗ 0 + 15 ∗ 1 + 9 ∗ 2 + (4 ∗ 3)
𝑃 𝑥 = 𝜆= = 0.75
𝑥! 60
OBSERVED EXPECTED
NO. OF DEFECTS
FREQUENCY FREQUENCY
𝑒 !".$% 0.75"
0 32 60*0.472 = 28.32 𝑃 𝑥=0 =
0!
1 15 21.24 𝑃 𝑥 = 0 = 0.472
2 9 7.98
3 4 2.46
GOODNESS-OF-FIT TEST
MODULE 7
ILLUSTRATIVE EXAMPLE
OBSERVED EXPECTED
NO. OF DEFECTS
FREQUENCY FREQUENCY
0 32 28.32
1 15 21.24
2 OR MORE 13 10.44
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 1: State the null and alternative hypotheses.
H0: The number of defects does not form a Poisson distribution.
Ha: The number of defects does form a Poisson distribution.
STEP 2: Choose the level of significance, type of test, and sample size.
Level of significance, α = 0.05 ; Right-Tailed Test, k = 3;
df = k-p-1 = 3-1-1=1 where p is the number of parameters estimated
STEP 3: Identify the critical value of the test statistic.
Test Statistic, 𝜒!" = +3.841
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 4: Determine the rejection region.
Rejection Region
X2
+3.841
STEP 5: Calculate the standardized test statistic.
𝜒#$ = 2.94
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 6: State the Decision Rule.
Using the standardized and critical values of the test statistic.
|𝑋 ! "#$"%$#&'( | ≥ |𝑋 ! ")*&*"#$ | : Reject H0
|𝑋 ! "#$"%$#&'( | < |𝑋 ! ")*&*"#$ | : Do not reject H0
|𝑋 ! " = 2.94| < |𝑋 ! & = 3.84| : Do not reject H0
Using the P-value.
𝑃 − 𝑣𝑎𝑙𝑢𝑒 ≤ 𝛼 : Reject H0
𝑃 − 𝑣𝑎𝑙𝑢𝑒 > 𝛼 : Do not reject H0
𝑃 − 𝑣𝑎𝑙𝑢𝑒 > 𝛼 = 0.05 : Do not reject H0
HYPOTHESIS TESTING
MODULE 3
ILLUSTRATIVE EXAMPLE
STEP 7: Conclusion.
At 0.05 level of significance, there is no enough evidence to support
the claim that the number of defects follows a Poisson distribution.
TEST FOR INDEPENDENCE
TEST FOR INDEPENDENCE
MODULE 7
TEST FOR INDEPENDENCE
There are times when we might be interested in observing
more than one variable on each individual to find if a
relationship exists between these variables.
TEST FOR INDEPENDENCE
MODULE 7
TEST FOR INDEPENDENCE
As an example, for each person we might observe his blood
type and eye color and investigate if these characteristics
are related in any way. Our goal is a TEST OF
INDEPENDENCE, or to find whether two observed
characteristics of a member of a population are
independent.
TEST FOR INDEPENDENCE
MODULE 7
TEST FOR INDEPENDENCE
Column 1 Column 2 Column 3
Row 1 C11 C12 C13
Row 2 C21 C22 C23
Contingency Table – a table for determining whether the distribution according to
one variable is contingent on the distribution of the other
Cell – the name for each block in the contingency table and is designated by its row
and column position
TEST FOR INDEPENDENCE
MODULE 7
TEST FOR INDEPENDENCE
The chi-square independence test can be used to test the
independence of two variables. The hypotheses are stated
as follows:
𝐻" : 𝑇ℎ𝑒 𝑓𝑖𝑟𝑠𝑡 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 𝑖𝑠 𝑖𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑜𝑓 𝑡ℎ𝑒 𝑠𝑒𝑐𝑜𝑛𝑑 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒.
𝐻# : 𝑇ℎ𝑒 𝑓𝑖𝑟𝑠𝑡 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 𝑖𝑠 𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑜𝑓 𝑡ℎ𝑒 𝑠𝑒𝑐𝑜𝑛𝑑 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒.
GOODNESS-OF-FIT TEST
MODULE 7
Chi-Square Test Statistic
!
(𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠 − 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠)
𝜒! = #
𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑖𝑒𝑠
𝑤𝑖𝑡ℎ 𝑑𝑒𝑔𝑟𝑒𝑒𝑠 𝑜𝑓 𝑓𝑟𝑒𝑒𝑑𝑜𝑚 = 𝑛𝑜. 𝑜𝑓 𝑟𝑜𝑤𝑠 − 1 𝑥 (𝑛𝑜. 𝑜𝑓 𝑐𝑜𝑙𝑢𝑚𝑛𝑠 − 1)
TEST FOR INDEPENDENCE
MODULE 7
EXPECTED FREQUENCY
1. Find the sum of each row and each column, and find the
grand total.
2. For each cell, multiply the corresponding row sum by the
column sum and divide by the grand total, to get the
expected value.
𝑅𝑜𝑤 𝑆𝑢𝑚 𝑥 𝐶𝑜𝑙𝑢𝑚𝑛 𝑆𝑢𝑚
𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑣𝑎𝑙𝑢𝑒 =
𝐺𝑟𝑎𝑛𝑑 𝑇𝑜𝑡𝑎𝑙
3. Place the expected values in the corresponding cells
along with the observed values.
TEST FOR INDEPENDENCE
MODULE 7
ILLUSTRATIVE EXAMPLE
A study is being conducted to determine whether there is a
relationship between jogging and blood pressure. A random sample of
210 subjects is selected, and they are classified as shown in the table
that follows. Use alpha=0.05.
Blood Pressure
Jogging Status Low Moderate High Total
Joggers 34 57 21 112
Nonjoggers 15 63 20 98
Total 49 120 41 210
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 1: State the null and alternative hypotheses.
H0: The blood pressure of a person does not depend on whether he jogs or not.
Ha: The blood pressure of a person depends on whether he jogs or not.
STEP 2: Choose the level of significance, type of test, and sample size.
Level of significance, α = 0.05 ; Right-Tailed Test,
df = (r-1)(c-1)=(2-1)(3-1)=2
STEP 3: Identify the critical value of the test statistic.
Test Statistic, 𝜒!" = +5.991
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 4: Determine the rejection region.
Rejection Region
X2
+5.991
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 5: Calculate the standardized test statistic.
(112)(49) (98)(49)
𝐸77 = = 26.13 𝐸"7 = = 22.87
210 210
(112)(120) (98)(120)
𝐸7" = = 64 𝐸"" = = 56
210 210
(112)(41) (98)(41)
𝐸78 = = 21.87 𝐸"8 = = 19.13
210 210
TEST FOR INDEPENDENCE
MODULE 7
ILLUSTRATIVE EXAMPLE
Blood Pressure
Jogging Status Low Moderate High Total
O E O E O E
Joggers 34 26.13 57 64 21 21.87 112
Nonjoggers 15 22.87 63 56 20 19.13 98
Total 49 120 41 210
$ ('&($,."')! ("+($$..))! $!("/."' !
𝜒# = + + ⋯+ = 6.79
$,."' $$..) "/."'
HYPOTHESIS TESTING
MODULE 7
ILLUSTRATIVE EXAMPLE
STEP 6: State the Decision Rule.
Using the standardized and critical values of the test statistic.
|𝑋 ! "#$"%$#&'( | ≥ |𝑋 ! ")*&*"#$ | : Reject H0
|𝑋 ! "#$"%$#&'( | < |𝑋 ! ")*&*"#$ | : Do not reject H0
|𝑋 ! " = 6.79| > |𝑋 ! & = 5.991| : Reject H0
Using the P-value.
𝑃 − 𝑣𝑎𝑙𝑢𝑒 ≤ 𝛼 : Reject H0
𝑃 − 𝑣𝑎𝑙𝑢𝑒 > 𝛼 : Do not reject H0
𝑃 − 𝑣𝑎𝑙𝑢𝑒 < 𝛼 = 0.05 : Reject H0
HYPOTHESIS TESTING
MODULE 3
ILLUSTRATIVE EXAMPLE
STEP 7: Conclusion.
At 0.05 level of significance, there is enough evidence to support that
the blood pressure depends on whether a person jogs or not.
END OF PRESENTATION