
ES031 M5 ChiSquare

This document provides an overview of chi-square tests, including the chi-square test for goodness of fit and independence. It discusses the key differences between parametric and non-parametric tests, noting that chi-square tests use frequency data rather than numerical scores. The document outlines the assumptions, properties, test statistic, and hypothesis testing process for the chi-square goodness of fit test. Examples are provided to demonstrate how to perform a goodness of fit test to analyze whether observed data fits an expected distribution.

Uploaded by

Gabriel

Cebu Institute of Technology - University

Department of Industrial Engineering

CHI-SQUARE TESTS
Prepared by: Engr. Kristan Ian D. Cabaña
OUTLINE

Two Non-Parametric Hypothesis Tests


• Chi-square Test for Goodness-of-Fit
• Chi-square Test for Independence
PARAMETRIC AND NON-PARAMETRIC TESTS
MODULE 7

PARAMETRIC TEST
Previous examples of hypothesis tests, such as the t tests
and analysis of variance, are parametric tests: they make
assumptions about population parameters and test hypotheses
about those parameters.

NON-PARAMETRIC TEST
The term "non-parametric" refers to the fact that the
chi-square tests do not require assumptions about
population parameters nor do they test hypotheses about
population parameters.
The most obvious difference between the chi-square tests and the
other hypothesis tests we have considered (t and ANOVA) is the
nature of the data.

For chi-square, the data are frequencies rather than numerical scores.
CHI SQUARE TESTS
GOODNESS-OF-FIT TEST

The chi-square statistic can be used to see whether a frequency
distribution fits a specific pattern. The test used for this purpose
is the chi-square goodness-of-fit test.

The chi-square goodness-of-fit test is a non-parametric test that
is used to find out whether the observed values of a given
phenomenon are significantly different from the expected values.
The term "goodness of fit" refers to comparing the observed sample
distribution with the expected probability distribution.

For example, a researcher wants to determine whether consumers
have any preference among the flavors of ice cream. A sample of
100 people provided the following data:

Flavor              Frequency
Chocolate           32
Strawberry          28
Mango               16
Cookies 'n Cream    14
Vanilla             10

Flavor              Observed Frequency    Expected Frequency
Chocolate           32                    20
Strawberry          28                    20
Mango               16                    20
Cookies 'n Cream    14                    20
Vanilla             10                    20

OBSERVED FREQUENCY – frequency obtained from the sample
EXPECTED FREQUENCY – frequency obtained by calculation, assuming the null hypothesis is true (here, 100/5 = 20 per flavor if there is no preference)
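
The expected frequencies in the table can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes that "no preference" means every flavor is equally likely.

```python
# Observed counts from the ice cream example (sample of 100 people).
observed = {
    "Chocolate": 32,
    "Strawberry": 28,
    "Mango": 16,
    "Cookies 'n Cream": 14,
    "Vanilla": 10,
}

n = sum(observed.values())   # total sample size
k = len(observed)            # number of categories

# Assumption: under "no preference", each flavor gets an equal share of n.
expected = {flavor: n / k for flavor in observed}

for flavor, count in observed.items():
    print(f"{flavor:18s} observed={count:3d} expected={expected[flavor]:.1f}")
```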

ASSUMPTIONS

The test statistic has a chi-square distribution when the following
assumptions are met:
1. The data are obtained from a random sample, and the
observations are independent.
2. The expected frequency for each category
must be at least 5.

PROPERTIES

The following are properties of the goodness-of-fit test:
1. The data are the observed frequencies. This means that there is
only one data value for each category.
2. The degrees of freedom are one less than the number of
categories, not one less than the sample size.
3. It is always a right-tailed test.
4. The test statistic has a chi-square distribution.
5. The value of the test statistic doesn't change if the order of the
categories is switched.

Chi-Square Test Statistic

$$\chi^2 = \sum \frac{(\text{Observed Frequencies} - \text{Expected Frequencies})^2}{\text{Expected Frequencies}}$$

with degrees of freedom = number of categories − 1
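
As a rough illustration of the formula, the sketch below applies it to the ice cream data shown earlier. The function name `chi_square_statistic` is a hypothetical helper, not something defined in the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Ice cream example: 5 flavors, 100 respondents, 20 expected per flavor.
observed = [32, 28, 16, 14, 10]
expected = [20, 20, 20, 20, 20]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # number of categories minus 1

print(f"chi-square = {chi2:.2f}, df = {df}")
```

Reordering the categories leaves the result unchanged, because the statistic is a plain sum over categories.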


HYPOTHESIS TESTING

Before computing the test statistic, the hypotheses must be stated first.

H0: Consumers show no preference.
Ha: Consumers show a preference.

ILLUSTRATIVE EXAMPLE
A clothing manufacturer wants to determine whether
customers prefer any specific color over other colors in
shirts. He selects a random sample of 102 shirts sold and notes
the color of each. The table shows the results. At α = 0.10, is there
a color preference?

Color     Frequency
White     43
Blue      22
Black     16
Red       10
Yellow    6
Green     5

STEP 1: State the null and alternative hypotheses.
H0: Customers have no color preference.
Ha: Customers show a color preference.
STEP 2: Choose the level of significance, type of test, and sample size.
Level of significance, α = 0.10; right-tailed test; n = 102; k = 6 categories; df = k − 1 = 6 − 1 = 5
STEP 3: Identify the critical value of the test statistic.

Critical value, $\chi^2_{\text{critical}} = 9.236$

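STEP 4: Determine the rejection region.

Rejection region: $\chi^2 \ge 9.236$ (right tail of the chi-square distribution with df = 5)

STEP 5: Calculate the test statistic. With no color preference, each of the 6 colors has an expected frequency of 102/6 = 17.

$$\chi^2_{\text{calculated}} = \frac{(43-17)^2}{17} + \frac{(22-17)^2}{17} + \cdots + \frac{(5-17)^2}{17} = 59.7647$$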

STEP 6: State the Decision Rule.

Using the calculated and critical values of the test statistic:

$\chi^2_{\text{calculated}} \ge \chi^2_{\text{critical}}$ : Reject H0
$\chi^2_{\text{calculated}} < \chi^2_{\text{critical}}$ : Do not reject H0
$\chi^2_{\text{calculated}} = 59.7647 > \chi^2_{\text{critical}} = 9.236$ : Reject H0

Using the P-value:

P-value ≤ α : Reject H0
P-value > α : Do not reject H0
P-value < α = 0.10 : Reject H0
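
The shirt-color computation and the critical-value decision rule can be checked with a short script. This is a sketch only: the critical value 9.236 is taken from the slides rather than computed, and the variable names are illustrative.

```python
colors = ["White", "Blue", "Black", "Red", "Yellow", "Green"]
observed = [43, 22, 16, 10, 6, 5]

n = sum(observed)              # 102 shirts
expected = n / len(observed)   # equal share per color under H0

# Contribution of each color to the chi-square statistic.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi2 = sum(contributions)

critical_value = 9.236         # from the slides: alpha = 0.10, df = 5
reject_h0 = chi2 >= critical_value

for color, c in zip(colors, contributions):
    print(f"{color:7s} contributes {c:.4f}")
print(f"chi-square = {chi2:.4f}, reject H0: {reject_h0}")
```

Listing the contributions shows where the departure from "no preference" comes from: White alone accounts for roughly two thirds of the statistic.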

STEP 7: Conclusion.

At the 0.10 level of significance, there is enough evidence to reject the
claim that customers show no preference for the color of shirts.

ILLUSTRATIVE EXAMPLE
The dean of student affairs of a college wishes to test the claim that
the distribution of students is as follows: 40% nursing; 25% business;
15% computer science; 10% engineering; 5% humanities; and 5%
education. Last semester, the program enrollment was distributed as
shown below. At alpha=0.05, is the distribution of students the same
as hypothesized?
MAJOR NUMBER
Nursing 72
Business 53
Computer Science 32
Engineering 20
Humanities 16
Education 7

MAJOR               OBSERVED    EXPECTED
Nursing             72          200 × 40% = 80
Business            53          50
Computer Science    32          30
Engineering         20          20
Humanities          16          10
Education           7           10
TOTAL               200         200
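
The slides stop at the table of expected frequencies for this example. The sketch below carries the same procedure through to a decision. Note the assumptions: the critical value 11.070 (α = 0.05, df = 5) is a standard chi-square table value that does not appear in the slides, and the variable names are illustrative.

```python
# Claimed distribution in percent, and last semester's observed enrollment.
claimed_percent = {
    "Nursing": 40, "Business": 25, "Computer Science": 15,
    "Engineering": 10, "Humanities": 5, "Education": 5,
}
observed = {
    "Nursing": 72, "Business": 53, "Computer Science": 32,
    "Engineering": 20, "Humanities": 16, "Education": 7,
}

n = sum(observed.values())  # 200 students

# Expected frequency = sample size x claimed proportion.
expected = {major: n * pct / 100 for major, pct in claimed_percent.items()}

chi2 = sum((observed[m] - expected[m]) ** 2 / expected[m] for m in observed)
df = len(observed) - 1

critical_value = 11.070     # assumption: table value for alpha = 0.05, df = 5
reject_h0 = chi2 >= critical_value

print(f"chi-square = {chi2:.4f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the statistic is about 5.61, which is below the critical value, so the claimed distribution would not be rejected.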
ILLUSTRATIVE EXAMPLE
PRINTED CIRCUIT BOARD DEFECTS: The number of defects in printed
circuit boards is hypothesized to follow a Poisson distribution. A
random sample of 60 printed circuit boards has been collected, and
the following number of defects observed:
NO. OF DEFECTS FREQUENCY
0 32
1 15
2 9
3 4

Using α = 0.05, can we claim that the number of defects follows a
Poisson distribution?
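
The Poisson probability of x defects is computed with the mean λ estimated from the sample:

$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad \lambda = \frac{(32)(0) + (15)(1) + (9)(2) + (4)(3)}{60} = 0.75$$

For example,

$$P(x = 0) = \frac{e^{-0.75}\,(0.75)^{0}}{0!} = 0.472$$

NO. OF DEFECTS    OBSERVED FREQUENCY    EXPECTED FREQUENCY
0                 32                    60 × 0.472 = 28.32
1                 15                    21.24
2                 9                     7.98
3                 4                     2.46

The last expected frequency is the remainder, 60 − 28.32 − 21.24 − 7.98 = 2.46, so it covers 3 or more defects.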
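
Because the expected frequency of the last category (2.46) is less than 5, the last two categories are combined so that every expected frequency is at least 5:

NO. OF DEFECTS    OBSERVED FREQUENCY    EXPECTED FREQUENCY
0                 32                    28.32
1                 15                    21.24
2 OR MORE         13                    10.44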
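
The Poisson expected frequencies and the pooled "2 or more" category can be sketched in Python as follows. Two caveats: the pooling is hard-coded to match the slides rather than chosen automatically, and the script uses unrounded probabilities, so its values differ slightly from the slide figures (which round each probability to three decimals).

```python
import math

# Observed number of boards with 0, 1, 2, 3 defects (60 boards in total).
defect_counts = {0: 32, 1: 15, 2: 9, 3: 4}
n = sum(defect_counts.values())

# Estimate the Poisson mean from the sample.
lam = sum(x * f for x, f in defect_counts.items()) / n


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Pool the categories as in the slides: 0, 1, and "2 or more".
observed = [defect_counts[0], defect_counts[1],
            defect_counts[2] + defect_counts[3]]
p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
expected = [n * p0, n * p1, n * (1 - p0 - p1)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"lambda = {lam:.2f}")
print("expected:", [round(e, 2) for e in expected])
print(f"chi-square = {chi2:.2f}")
```

With unrounded probabilities the statistic comes out near 2.96 instead of 2.94. Both are below the critical value 3.841 used in the slides, so the decision is the same.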

STEP 1: State the null and alternative hypotheses.
H0: The number of defects follows a Poisson distribution.
Ha: The number of defects does not follow a Poisson distribution.
STEP 2: Choose the level of significance, type of test, and sample size.
Level of significance, α = 0.05; right-tailed test; k = 3 categories;
df = k − p − 1 = 3 − 1 − 1 = 1, where p is the number of parameters estimated from the sample (here, λ)
STEP 3: Identify the critical value of the test statistic.

Critical value, $\chi^2_{\text{critical}} = 3.841$

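STEP 4: Determine the rejection region.

Rejection region: $\chi^2 \ge 3.841$ (right tail of the chi-square distribution with df = 1)

STEP 5: Calculate the test statistic.

$$\chi^2_{\text{calculated}} = \frac{(32-28.32)^2}{28.32} + \frac{(15-21.24)^2}{21.24} + \frac{(13-10.44)^2}{10.44} = 2.94$$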

STEP 6: State the Decision Rule.

Using the calculated and critical values of the test statistic:

$\chi^2_{\text{calculated}} \ge \chi^2_{\text{critical}}$ : Reject H0
$\chi^2_{\text{calculated}} < \chi^2_{\text{critical}}$ : Do not reject H0
$\chi^2_{\text{calculated}} = 2.94 < \chi^2_{\text{critical}} = 3.841$ : Do not reject H0

Using the P-value:

P-value ≤ α : Reject H0
P-value > α : Do not reject H0
P-value > α = 0.05 : Do not reject H0
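
The slides state the P-value rule without giving the P-value itself. For df = 1 it has a closed form, because a chi-square variable with one degree of freedom is the square of a standard normal variable, so P(χ² > x) = erfc(√(x/2)). The sketch below relies on that identity. It applies only to df = 1, and the function name is hypothetical.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with df = 1.

    Uses P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


alpha = 0.05
p_value = chi2_p_value_df1(2.94)        # calculated statistic from Step 5
p_at_critical = chi2_p_value_df1(3.841)  # sanity check: should be close to alpha

print(f"P-value = {p_value:.4f}")
print(f"P-value at the critical value = {p_at_critical:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```

The P-value is roughly 0.086, which is greater than 0.05, in agreement with the critical-value comparison.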

STEP 7: Conclusion.

At the 0.05 level of significance, there is not enough evidence to reject
the claim that the number of defects follows a Poisson distribution.

TEST FOR INDEPENDENCE

There are times when we might be interested in observing
more than one variable on each individual to find out whether a
relationship exists between these variables.

As an example, for each person we might observe blood
type and eye color and investigate whether these characteristics
are related in any way. Our goal is a TEST OF
INDEPENDENCE: to find out whether two observed
characteristics of a member of a population are
independent.

         Column 1    Column 2    Column 3
Row 1    C11         C12         C13
Row 2    C21         C22         C23

Contingency Table – a table for determining whether the distribution according to
one variable is contingent on the distribution of the other

Cell – each block in the contingency table, designated by its row
and column position

The chi-square independence test can be used to test the
independence of two variables. The hypotheses are stated
as follows:

H0: The first variable is independent of the second variable.
Ha: The first variable is dependent on the second variable.

Chi-Square Test Statistic

$$\chi^2 = \sum \frac{(\text{Observed Frequencies} - \text{Expected Frequencies})^2}{\text{Expected Frequencies}}$$

with degrees of freedom = (no. of rows − 1) × (no. of columns − 1)


EXPECTED FREQUENCY

1. Find the sum of each row and each column, and find the
grand total.
2. For each cell, multiply the corresponding row sum by the
column sum and divide by the grand total to get the
expected value:

$$\text{Expected value} = \frac{\text{Row Sum} \times \text{Column Sum}}{\text{Grand Total}}$$

3. Place the expected values in the corresponding cells
along with the observed values.
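
The three steps above can be sketched in Python. The counts are borrowed from the jogging and blood pressure example in this module, and the function name `expected_frequencies` is a hypothetical helper.

```python
def expected_frequencies(table):
    """Return the expected count for each cell: row sum * column sum / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]


# Rows: joggers, nonjoggers. Columns: low, moderate, high blood pressure.
observed = [
    [34, 57, 21],
    [15, 63, 20],
]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected values adds up to the same row total as the observed data, which is a quick way to check the arithmetic.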
ILLUSTRATIVE EXAMPLE
A study is being conducted to determine whether there is a
relationship between jogging and blood pressure. A random sample of
210 subjects is selected, and they are classified as shown in the table
that follows. Use alpha=0.05.

                  Blood Pressure
Jogging Status    Low    Moderate    High    Total
Joggers           34     57          21      112
Nonjoggers        15     63          20      98
Total             49     120         41      210

STEP 1: State the null and alternative hypotheses.
H0: The blood pressure of a person does not depend on whether he jogs or not.
Ha: The blood pressure of a person depends on whether he jogs or not.

STEP 2: Choose the level of significance, type of test, and sample size.
Level of significance, α = 0.05; right-tailed test; n = 210;
df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
STEP 3: Identify the critical value of the test statistic.

Critical value, $\chi^2_{\text{critical}} = 5.991$

STEP 4: Determine the rejection region.

Rejection region: $\chi^2 \ge 5.991$ (right tail of the chi-square distribution with df = 2)

STEP 5: Calculate the test statistic. First compute the expected frequency of each cell:

$$E_{11} = \frac{(112)(49)}{210} = 26.13 \qquad E_{21} = \frac{(98)(49)}{210} = 22.87$$

$$E_{12} = \frac{(112)(120)}{210} = 64 \qquad E_{22} = \frac{(98)(120)}{210} = 56$$

$$E_{13} = \frac{(112)(41)}{210} = 21.87 \qquad E_{23} = \frac{(98)(41)}{210} = 19.13$$

                  Blood Pressure
                  Low            Moderate       High
Jogging Status    O      E       O      E       O      E       Total
Joggers           34     26.13   57     64      21     21.87   112
Nonjoggers        15     22.87   63     56      20     19.13   98
Total             49             120            41             210

$$\chi^2_{\text{calculated}} = \frac{(34-26.13)^2}{26.13} + \frac{(15-22.87)^2}{22.87} + \cdots + \frac{(20-19.13)^2}{19.13} = 6.79$$

STEP 6: State the Decision Rule.

Using the calculated and critical values of the test statistic:

$\chi^2_{\text{calculated}} \ge \chi^2_{\text{critical}}$ : Reject H0
$\chi^2_{\text{calculated}} < \chi^2_{\text{critical}}$ : Do not reject H0
$\chi^2_{\text{calculated}} = 6.79 > \chi^2_{\text{critical}} = 5.991$ : Reject H0

Using the P-value:

P-value ≤ α : Reject H0
P-value > α : Do not reject H0
P-value < α = 0.05 : Reject H0
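
The whole independence test for the jogging data can be run end to end as a check. This is a sketch under stated assumptions: the critical value 5.991 comes from the slides, and the P-value uses the closed form P(χ² > x) = e^(−x/2), which holds only for df = 2. The slides do not report a P-value, so that number is an addition.

```python
import math

# Rows: joggers, nonjoggers. Columns: low, moderate, high blood pressure.
observed = [
    [34, 57, 21],
    [15, 63, 20],
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
grand_total = sum(row_sums)

# Sum (O - E)^2 / E over every cell, with E = row sum * column sum / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_sums[i] * col_sums[j] / grand_total
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 5.991           # from the slides: alpha = 0.05, df = 2
p_value = math.exp(-chi2 / 2.0)  # valid for df = 2 only

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.4f}")
print("Reject H0" if chi2 >= critical_value else "Do not reject H0")
```

The P-value is roughly 0.034, below α = 0.05, so both decision rules lead to rejecting H0.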

STEP 7: Conclusion.

At the 0.05 level of significance, there is enough evidence to support the
claim that blood pressure depends on whether a person jogs or not.

END OF PRESENTATION
