Math13 - Advanced Statistics Lecture Note: Case of K-Samples

The document discusses the Kruskal-Wallis H test, a non-parametric test used to compare more than two independent samples. It describes the assumptions, method, and example use of the test. The test compares the medians of multiple groups and can be used as an alternative to one-way ANOVA when the data is not normally distributed or variances are unequal. An example calculation demonstrates how to perform the test and post-hoc analyses to determine which group medians differ significantly.

MATH13 – ADVANCED STATISTICS

Lecture Note: Case of k-samples

Lesson No. 1: Kruskal-Wallis H Test

Learning Objectives:
At the end of the lesson, you will be able to:
1. discuss the assumption and conditions of using the Kruskal-Wallis test
2. perform the Kruskal-Wallis test
3. explain the results of the Kruskal-Wallis test

The Kruskal-Wallis H Test is a non-parametric alternative to the one-way analysis of variance.
It is an extension of the Mann-Whitney Test to more than two independent samples. It is useful
in the following instances:
a. group samples strongly deviate from normality; this is especially relevant when sample
sizes are small and unequal, and the data are not symmetric.
b. group variances are quite different because of the presence of outliers.

Characteristics

• The assumptions are similar to those for the Mann-Whitney test: independent group
samples, data in each group are randomly selected, and data are at least ordinal.
• No assumptions are made about the type of underlying distribution.
• Each group sample has at least 5 elements.
• No population parameters are estimated, and so there are no confidence intervals.

Method
1. State the hypotheses.
Ho: The group populations have equal dominance. (The group population medians are equal.)
Ha: At least one of the group populations is dominant over the others. (At least one group
population median is not equal to the others.)

2. State the rejection rule.


With degrees of freedom df = k − 1, where k is the number of groups, use Table C (Chi-square
Distribution). Reject Ho if $H \ge \chi^2_{\alpha,\,df}$.

3. Find the value of the test statistic.

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

But for a data set with many tied ranks, use

$$H = \frac{\dfrac{12}{N(N+1)} \displaystyle\sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)}{1 - \dfrac{\sum T}{N^3 - N}}$$

where N – total number of observations, $R_i$ – sum of the ranks in the ith group, $n_i$ – sample
size of the ith group, $T = t^3 - t$, and t – number of tied observations sharing a rank.

The corresponding p-value of H is computed based on the chi-square distribution.

4. State the conclusion and interpretation.

After rejecting the null hypothesis, pairwise comparisons of the groups may be performed.
If the inequality of the chosen test is satisfied, then the two populations, i and j, appear to be
different. In this section, we will use the Schaich-Hamerle test (based on the chi-square
distribution) and the Conover test (based on Student's t-distribution).

Schaich-Hamerle test: The difference between the ith and jth groups is significant if

$$\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right| > \sqrt{\chi^2_{\alpha,\,df} \cdot \frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$

Conover test: The difference between the ith and jth groups is significant if

$$\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right| > t_{\alpha/2,\,N-k} \sqrt{S^2 \cdot \frac{N-1-H}{N-k} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$

No ties:

$$S^2 = \frac{N(N+1)}{12}$$

With ties:

$$S^2 = \frac{1}{N-1} \left( \sum R(X_{ij})^2 - \frac{N(N+1)^2}{4} \right)$$

where $\sum R(X_{ij})^2$ is the sum of the squared ranks over all observations, $X_{ij}$ being the
ith element of the jth group.
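
The test statistic of step 3 can be sketched in Python as follows. This is only a minimal illustration, not a library routine: the function name `kruskal_wallis_h` and its argument names are assumptions made for this sketch, and it expects the rank sums, group sizes, and tie counts to have been worked out by hand already. The call at the end uses the figures from the quiz bee example in this lesson.

```python
def kruskal_wallis_h(rank_sums, sizes, tie_counts=()):
    """Kruskal-Wallis H from rank sums R_i, group sizes n_i and tie counts t.

    tie_counts (hypothetical argument name) lists, for every set of tied
    observations, how many observations share that value.
    """
    n_total = sum(sizes)
    # Uncorrected statistic: 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / n for r, n in zip(rank_sums, sizes)
    ) - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
    sum_t = sum(t ** 3 - t for t in tie_counts)
    return h / (1 - sum_t / (n_total ** 3 - n_total))


# Rank sums, sizes and ties from the quiz bee example (three schools).
h_example = kruskal_wallis_h([199, 96.5, 82.5], [10, 9, 8], tie_counts=[2, 2, 2, 2, 2])
print(round(h_example, 4))  # 8.8003
```

When `tie_counts` is empty the divisor equals 1, so the same function covers the formula without ties.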

Example: The following table shows the scores of students from three schools in a quiz bee.
Test at the 5% level of significance the claim that there is a significant difference among the
median scores of the three schools.

School A   81  32  42  62  37  44  38  47  49  41
School B   48  31  25  22  30  30  32  15  40
School C   18  49  33  19  24  17  48  22
Solution:
By means of a box plot, we can check that the data are not normally distributed. Hence, we can
use the Kruskal-Wallis test.

Step 1: Hypotheses
Ho: The schools have equal dominance. (The median scores of the three schools are equal.)
Ha: At least one of the schools is dominant over the others. (At least one median score is not
equal to the others.)

Step 2: Rejection rule


Reject Ho if $H \ge \chi^2_{0.05,\,2} = 5.991$.

Step 3: Test statistic


Rank the pooled scores from lowest to highest, assigning the average rank to tied scores. Then,
get the sum of the ranks per school.

School A   81    32    42   62   37   44   38    47   49    41
Rank A     27    12.5  19   26   15   20   16    21   24.5  18      Sum = 199    n = 10
School B   48    31    25   22   30   30   32    15   40
Rank B     22.5  11    8    5.5  9.5  9.5  12.5  1    17            Sum = 96.5   n = 9
School C   18    49    33   19   24   17   48    22
Rank C     3     24.5  14   4    7    2    22.5  5.5                Sum = 82.5   n = 8

Note that there are many tied scores/observations.

From the rankings, we have the following information:
$N = 27$, $R_1 = 199$, $R_2 = 96.5$, $R_3 = 82.5$
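
The ranking step can be checked with a short Python sketch. The helper name `average_ranks` is an assumption of this illustration; it ranks the pooled scores and gives tied scores the mean of the ranks they occupy, which is the convention used in the table above.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of equal values starting at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


school_a = [81, 32, 42, 62, 37, 44, 38, 47, 49, 41]
school_b = [48, 31, 25, 22, 30, 30, 32, 15, 40]
school_c = [18, 49, 33, 19, 24, 17, 48, 22]

pooled = school_a + school_b + school_c
ranks = average_ranks(pooled)

# Split the pooled ranks back into the three schools and sum them.
bounds = [0, len(school_a), len(school_a) + len(school_b), len(pooled)]
rank_sums = [sum(ranks[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]
print(rank_sums)  # [199.0, 96.5, 82.5]
```

A quick consistency check is that the rank sums add up to N(N+1)/2 = 378.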

For the tied ranks:

Score 22: t = 2 tied observations, $T = 2^3 - 2 = 6$
Score 30: t = 2 tied observations, $T = 2^3 - 2 = 6$
Score 32: t = 2 tied observations, $T = 2^3 - 2 = 6$
Score 48: t = 2 tied observations, $T = 2^3 - 2 = 6$
Score 49: t = 2 tied observations, $T = 2^3 - 2 = 6$
$\sum T = 30$
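
As a cross-check of the tie bookkeeping, the tied scores and $\sum T$ can be counted programmatically. This is a sketch using only the Python standard library; the variable names are chosen for this illustration.

```python
from collections import Counter

scores = [
    81, 32, 42, 62, 37, 44, 38, 47, 49, 41,   # School A
    48, 31, 25, 22, 30, 30, 32, 15, 40,       # School B
    18, 49, 33, 19, 24, 17, 48, 22,           # School C
]

# Keep only the scores that occur more than once, with their multiplicity t.
tie_sizes = {score: t for score, t in Counter(scores).items() if t > 1}
sum_t = sum(t ** 3 - t for t in tie_sizes.values())

# Divisor of the tie-corrected H: 1 - sum(T) / (N^3 - N)
n_total = len(scores)
correction = 1 - sum_t / (n_total ** 3 - n_total)
print(sorted(tie_sizes), sum_t)  # [22, 30, 32, 48, 49] 30
```

The divisor stays very close to 1 here, which is why the correction changes H only slightly.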
The test statistic H is computed as

$$H = \frac{\dfrac{12}{27(27+1)} \left( \dfrac{199^2}{10} + \dfrac{96.5^2}{9} + \dfrac{82.5^2}{8} \right) - 3(27+1)}{1 - \dfrac{30}{27^3 - 27}} = 8.800347328$$

The corresponding p-value is 0.012275.
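
For df = 2 the chi-square upper-tail probability has the closed form $e^{-x/2}$, so the p-value and the critical value can be verified without tables. The sketch below relies on that special case only; the function name `chi2_sf_df2` is an assumption of this illustration, and other degrees of freedom would need a statistics library or Table C.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)


h = 8.800347328                  # test statistic from Step 3
p_value = chi2_sf_df2(h)
critical = -2 * math.log(0.05)   # solves exp(-x / 2) = 0.05

print(round(p_value, 6))   # 0.012275
print(round(critical, 3))  # 5.991
```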

Step 4: Decision and conclusion/interpretation


Since $H = 8.800 > \chi^2_{0.05,\,2} = 5.991$, reject Ho. At least one of the schools is dominant over the
others. (At least one median score is not equal to the others.)

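Post-hoc Analysis using Schaich-Hamerle test

A pair is significant if the mean-rank difference $\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right|$ exceeds the critical value
$\sqrt{\chi^2_{\alpha,\,df} \cdot \frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$.

Pair   Mean-rank difference       Critical value   Significance
A–B    9.1778                 >   8.9264           Significant
A–C    9.5875                 >   9.2153           Significant
B–C    0.4097                 <   9.4401           Not Significant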
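
The entries of the Schaich-Hamerle table can be reproduced with the sketch below. The critical value 5.991 is taken from the rejection rule above, and the function and variable names are assumptions of this illustration.

```python
import math
from itertools import combinations

N = 27
CHI2_CRIT = 5.991  # chi-square critical value for alpha = 0.05, df = 2 (Table C)
groups = {"A": (199.0, 10), "B": (96.5, 9), "C": (82.5, 8)}  # (rank sum, size)


def schaich_hamerle(g1, g2):
    """Return (|mean-rank difference|, critical difference) for one pair."""
    (r1, n1), (r2, n2) = groups[g1], groups[g2]
    diff = abs(r1 / n1 - r2 / n2)
    crit = math.sqrt(CHI2_CRIT * N * (N + 1) / 12 * (1 / n1 + 1 / n2))
    return diff, crit


results = {pair: schaich_hamerle(*pair) for pair in combinations("ABC", 2)}
for (g1, g2), (diff, crit) in results.items():
    verdict = "significant" if diff > crit else "not significant"
    print(f"{g1}-{g2}: {diff:.4f} vs {crit:.4f} -> {verdict}")
```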

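Post-hoc Analysis using Conover test

A pair is significant if the mean-rank difference $\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right|$ exceeds the critical value
$t_{\alpha/2,\,N-k} \sqrt{S^2 \cdot \frac{N-1-H}{N-k} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$.

Pair   Mean-rank difference       Critical value   Significance
A–B    9.1778                 >   6.3671           Significant
A–C    9.5875                 >   6.5732           Significant
B–C    0.4097                 <   6.7335           Not Significant

Note: $t_{\alpha/2,\,N-k} = t_{0.025,\,24} = 2.063899$, $S^2 = 62.903846$
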
Based on the post-hoc analysis, School A has dominance over the other two schools.
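
The Conover critical values, including $S^2$ for tied data, can be recomputed as in the following sketch. The ranks are copied from the Step 3 table, H and the t critical value are taken from the text, and all names are assumptions of this illustration.

```python
import math

N, K = 27, 3
H = 8.800347328     # tie-corrected statistic from Step 3
T_CRIT = 2.063899   # t_{0.025, 24}, taken from the note under the table

# Ranks from the Step 3 table (tied scores share an average rank).
ranks = {
    "A": [27, 12.5, 19, 26, 15, 20, 16, 21, 24.5, 18],
    "B": [22.5, 11, 8, 5.5, 9.5, 9.5, 12.5, 1, 17],
    "C": [3, 24.5, 14, 4, 7, 2, 22.5, 5.5],
}

# S^2 with ties: (sum of squared ranks - N(N+1)^2 / 4) / (N - 1)
sum_sq = sum(r * r for group in ranks.values() for r in group)
s2 = (sum_sq - N * (N + 1) ** 2 / 4) / (N - 1)


def conover_critical(n_i, n_j):
    """Critical difference of mean ranks for groups of size n_i and n_j."""
    return T_CRIT * math.sqrt(s2 * (N - 1 - H) / (N - K) * (1 / n_i + 1 / n_j))


mean_rank = {g: sum(r) / len(r) for g, r in ranks.items()}
for a, b in [("A", "B"), ("A", "C"), ("B", "C")]:
    diff = abs(mean_rank[a] - mean_rank[b])
    crit = conover_critical(len(ranks[a]), len(ranks[b]))
    print(f"{a}-{b}: {diff:.4f} vs {crit:.4f}")
```

Without ties the sum of squared ranks would be 6930 and $S^2$ would equal N(N+1)/12 = 63; the five tied pairs lower it slightly.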

Lesson No. 2: Friedman Test

Learning Objectives:
At the end of the lesson, you will be able to:
1. discuss the assumption and conditions of using the Friedman test
2. perform the Friedman test
3. explain the results of the Friedman test

The Friedman test is a non-parametric alternative to two-way analysis of variance with repeated
measures. It is an extension of the Wilcoxon Signed-Rank test to more than two related samples.
No normality assumption is required. The test only requires that the data be measured on at
least an ordinal scale; continuous data satisfy this requirement.

Method
1. State the hypotheses.
Ho: The group populations have equal dominance. (The group population medians are equal.)
Ha: At least one of the group populations is dominant over the others. (At least one group
population median is not equal to the others.)

2. State the rejection rule.


With degrees of freedom df = k − 1, where k is the number of groups (treatments), use Table C
(Chi-square Distribution). Reject Ho if $\chi_r^2 \ge \chi^2_{\alpha,\,df}$.

3. Find the value of the test statistic.

$$\chi_r^2 = \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3N(k+1)$$

where N – number of subjects (blocks), k – number of groups (treatments), $R_i$ – sum of the ranks
in the ith group (treatment), $R_{ij}$ – rank of the element in the ith group (column) and jth
subject (row).

4. State the conclusion and interpretation.


After rejecting the null hypothesis, pairwise comparisons of the groups may be performed.
If the inequality below is satisfied, then the two populations, a and b, appear to be different.
In this section, we will use the Conover post-hoc test.

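$$\left| R_a - R_b \right| > t_{\alpha/2,\,(n-1)(k-1)} \sqrt{\frac{2 \left( n \sum R_{ij}^2 - \sum_{i=1}^{k} R_i^2 \right)}{(n-1)(k-1)}}$$

where $R_a$ and $R_b$ are the rank sums of treatments a and b, n is the number of subjects
(denoted N in the test statistic), and $\sum R_{ij}^2$ is the sum of all squared ranks. Because every
treatment is ranked by the same n subjects, the rank sums can be compared directly.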
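
The Friedman statistic of step 3 can be sketched as a small Python function. The name `friedman_statistic` and its arguments are assumptions of this illustration; it takes rank sums that have already been computed and, like the formula above, applies no tie correction. The call at the end uses the rank sums from the cake example in this lesson.

```python
def friedman_statistic(rank_sums, n_subjects):
    """Friedman chi-square statistic from the k treatment rank sums R_i."""
    k = len(rank_sums)
    sum_sq = sum(r * r for r in rank_sums)
    # 12 / (N k (k+1)) * sum(R_i^2) - 3 N (k+1)
    return 12.0 / (n_subjects * k * (k + 1)) * sum_sq - 3 * n_subjects * (k + 1)


# Rank sums of the three cakes as judged by 12 people.
chi2_r = friedman_statistic([27.5, 23.5, 21], 12)
print(round(chi2_r, 4))  # 1.7917
```

If all treatments have the same rank sum the statistic is 0, and if every subject ranks the treatments in the same order it reaches its maximum N(k − 1).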

Example: A pastry chef invited 12 people to find out their preferred cakes (chocolate, carrot, red
velvet). Assuming that the order of tasting the cakes is randomly chosen and that there are
suitable time intervals between tastings, test at the 5% level of significance the claim that
there is no significant difference in the cake preferences.

Person Chocolate Carrot Red Velvet


1 10 7 8
2 8 5 5
3 7 8 6
4 9 6 4
5 7 5 7
6 4 7 5
7 5 9 3
8 6 6 7
9 5 4 6
10 10 6 4
11 4 7 4
12 7 3 3

Solution:
Using a box plot, we can determine that the normality of the data is not satisfied.
Step 1: Hypotheses
Ho: There is no significant difference in the cake preferences.
Ha: At least one cake is more preferred than the rest.

Step 2: Rejection rule


Reject Ho if $\chi_r^2 \ge \chi^2_{0.05,\,2} = 5.991$.

Step 3: Test statistic


a. Rank the observations of each subject (row) from lowest to highest, assigning the average
rank to tied observations.
b. Compute the sum of the ranks for each group (column). Then, square each sum.

         Data                                Ranks
Person   Chocolate   Carrot   Red Velvet     Chocolate   Carrot   Red Velvet
1        10          7        8              3           1        2
2        8           5        5              3           1.5      1.5
3        7           8        6              2           3        1
4        9           6        4              3           2        1
5        7           5        7              2.5         1        2.5
6        4           7        5              1           3        2
7        5           9        3              2           3        1
8        6           6        7              1.5         1.5      3
9        5           4        6              2           1        3
10       10          6        4              3           2        1
11       4           7        4              1.5         3        1.5
12       7           3        3              3           1.5      1.5
$R_i$                                        27.5        23.5     21
$R_i^2$                                      756.25      552.25   441
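
The within-subject ranking of steps a and b can be sketched as follows. The helper name `row_ranks` is an assumption of this illustration; it gives tied scores within a row the average of the ranks they occupy.

```python
def row_ranks(row):
    """Rank one subject's scores from lowest to highest, averaging ties."""
    ordered = sorted(row)
    # A value's average rank is the mean of the 1-based positions it occupies.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in row]


# Scores per person: (chocolate, carrot, red velvet)
scores = [
    (10, 7, 8), (8, 5, 5), (7, 8, 6), (9, 6, 4), (7, 5, 7), (4, 7, 5),
    (5, 9, 3), (6, 6, 7), (5, 4, 6), (10, 6, 4), (4, 7, 4), (7, 3, 3),
]

ranked = [row_ranks(row) for row in scores]
rank_sums = [sum(col) for col in zip(*ranked)]
print(rank_sums)  # [27.5, 23.5, 21.0]
```

Each row of ranks sums to k(k + 1)/2 = 6, so the three rank sums must total 12 × 6 = 72.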

The test statistic is

$$\chi_r^2 = \frac{12}{12(3)(3+1)} (756.25 + 552.25 + 441) - 3(12)(3+1) = 1.7917$$

Step 4: Decision and conclusion/interpretation.


Since $\chi_r^2 = 1.7917 < \chi^2_{0.05,\,2} = 5.991$, fail to reject Ho. There is no significant difference in the cake
preferences.

Lesson No. 3: Cochran’s Q Test

Learning Objectives:
At the end of the lesson, you will be able to:
1. discuss the assumption and conditions of using the Cochran’s Q test
2. perform the Cochran’s Q test
3. explain the results of the Cochran’s Q test

The Cochran’s Q test is a nonparametric test for two-way analysis of variance with repeated
measures where the dependent variable is dichotomous. It is the extension of the McNemar test
for two related samples.

Assumptions
1. Responses are binary and from k matched samples.
2. The subjects are independent of one another and were selected at random from a larger
population.
3. The sample size is sufficiently large. (As a rule of thumb, the number of subjects for which the
responses are not all 0's or all 1's, n, should be greater than or equal to 4, and nk should be
greater than or equal to 24. This assumption is not required for the exact binomial McNemar test.)

For large samples, the test statistic, Q, is distributed as chi-square with k – 1 degrees of
freedom. As in the McNemar test, only subjects who do not have the same response in all
categories contribute to the overall Q statistic.

Method
1. State the hypotheses.
Ho: The groups have equal dominance. (The population proportions of successes are equal across
the groups.)
Ha: At least one of the groups is dominant over the others. (At least one population proportion
of successes is not equal to the others.)

2. State the rejection rule.


With degrees of freedom df = k − 1, where k is the number of groups (treatments), use Table C
(Chi-square Distribution). Reject Ho if $Q \ge \chi^2_{\alpha,\,df}$.

3. Find the value of the test statistic.

$$Q = \frac{(k-1) \left[ k \sum_{j=1}^{k} G_j^2 - \left( \sum_{j=1}^{k} G_j \right)^2 \right]}{k \sum_{i=1}^{N} L_i - \sum_{i=1}^{N} L_i^2}$$

where $G_j$ – total number of successes in the jth column (group), $L_i$ – total number of successes
in the ith row, N – number of rows (subjects), k – number of groups

4. State the conclusion and interpretation.

After rejecting the null hypothesis, pairwise comparisons of the groups may be performed
using the McNemar Test.
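
The Q statistic of step 3 can be sketched in Python as follows. The function name `cochran_q` is an assumption of this illustration; it takes one row of 0/1 responses per subject. The table built at the end condenses the 20 pass/fail rows of the exam example in this lesson into their four distinct response patterns.

```python
def cochran_q(table):
    """Cochran's Q for a list of rows, each row holding k binary (0/1) responses."""
    k = len(table[0])
    col_totals = [sum(col) for col in zip(*table)]   # G_j
    row_totals = [sum(row) for row in table]         # L_i
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - sum(col_totals) ** 2)
    denominator = k * sum(row_totals) - sum(l * l for l in row_totals)
    if denominator == 0:
        raise ValueError("every subject gave the same response in all groups")
    return numerator / denominator


# Exam example: (surprise, mock, final) with 1 = passed, 0 = failed.
exam_table = [[0, 1, 1]] * 7 + [[0, 0, 0]] * 8 + [[0, 0, 1]] * 3 + [[1, 1, 1]] * 2
q = cochran_q(exam_table)

# Dropping the subjects whose responses are all 0 or all 1 leaves Q unchanged.
informative = [row for row in exam_table if 0 < sum(row) < len(row)]
q_informative = cochran_q(informative)
print(round(q, 4), round(q_informative, 4))  # 15.8 15.8
```

The second call illustrates the earlier remark that only subjects whose responses differ across the groups contribute to Q.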

Example: A college professor wanted to examine whether pass rates increased as students
had more time to study. In the study, 60 students enrolled in the review class took part. All
students were first given a surprise exam to test their current knowledge. They were then given
a mock exam two weeks later, before they took a final exam a further two weeks later. Students'
performance in each exam was assessed in terms of a pass or a fail. Test the appropriate
hypothesis at the 5% level of significance.

Student Surprise Exam Mock Exam Final Exam


1 Failed Passed Passed
2 Failed Passed Passed
3 Failed Failed Failed
4 Failed Passed Passed
5 Failed Failed Passed
6 Failed Failed Failed
7 Failed Failed Failed
8 Passed Passed Passed
9 Failed Failed Failed
10 Failed Failed Failed
11 Failed Failed Passed
12 Failed Passed Passed
13 Failed Passed Passed
14 Failed Passed Passed
15 Failed Failed Failed
16 Passed Passed Passed
17 Failed Failed Failed
18 Failed Failed Failed
19 Failed Failed Passed
20 Failed Passed Passed

Solution:
Step 1: Hypotheses
Ho: The proportions of students who pass are equal across the three exams.
Ha: At least one proportion of students who pass is not equal to the others.

Step 2: Rejection rule


Reject Ho if $Q \ge \chi^2_{0.05,\,2} = 5.991$.

Step 3: Test statistic


a. Convert the pass/fail strings to 0's (failed) and 1's (passed).
b. Get the total number of successes per column ($G_j$) and per row ($L_i$).
c. Square the number of successes per column and per row, and get the totals.
Surprise Exam   Mock Exam   Final Exam   $L_i$   $L_i^2$
0               1           1            2       4
0               1           1            2       4
0               0           0            0       0
0               1           1            2       4
0               0           1            1       1
0               0           0            0       0
0               0           0            0       0
1               1           1            3       9
0               0           0            0       0
0               0           0            0       0
0               0           1            1       1
0               1           1            2       4
0               1           1            2       4
0               1           1            2       4
0               0           0            0       0
1               1           1            3       9
0               0           0            0       0
0               0           0            0       0
0               0           1            1       1
0               1           1            2       4
$G_1 = 2$       $G_2 = 9$   $G_3 = 12$   $\sum_{i=1}^{N} L_i = 23$   $\sum_{i=1}^{N} L_i^2 = 49$
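
Steps a to c can be sketched in Python as follows, starting from the pass/fail strings of the data table. The abbreviations `F` and `P` and the variable names are assumptions of this illustration.

```python
F, P = "Failed", "Passed"

# One tuple per student: (surprise exam, mock exam, final exam)
results = [
    (F, P, P), (F, P, P), (F, F, F), (F, P, P), (F, F, P),
    (F, F, F), (F, F, F), (P, P, P), (F, F, F), (F, F, F),
    (F, F, P), (F, P, P), (F, P, P), (F, P, P), (F, F, F),
    (P, P, P), (F, F, F), (F, F, F), (F, F, P), (F, P, P),
]

# a. Convert the strings to 0 (failed) and 1 (passed).
binary = [[1 if r == "Passed" else 0 for r in row] for row in results]

# b. Totals per column (G_j) and per row (L_i).
col_totals = [sum(col) for col in zip(*binary)]
row_totals = [sum(row) for row in binary]

# c. Sums needed for the denominator of Q.
sum_l = sum(row_totals)
sum_l_sq = sum(l * l for l in row_totals)
print(col_totals, sum_l, sum_l_sq)  # [2, 9, 12] 23 49
```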

$$Q = \frac{(3-1) \left[ 3 \left( 2^2 + 9^2 + 12^2 \right) - (2 + 9 + 12)^2 \right]}{3(23) - 49} = 15.8$$

Step 4: State the conclusion and interpretation.


Since $Q = 15.8 > \chi^2_{0.05,\,2} = 5.991$, reject Ho. At least one proportion of students who pass is not
equal to the others.

The post-hoc analysis is left to the readers as part of the practice set.

WORKSHEET NO. 3

Directions: Solve the following problems completely.

1. Does physical exercise alleviate depression? In an experiment, 24 persons clinically
diagnosed with depression were randomly assigned to one of three groups: no exercise, 20
minutes of jogging per day, or 60 minutes of jogging per day. At the end of the month, each person
was asked to rate how depressed they now felt (on a Likert scale: 1 – totally miserable to 100 –
ecstatically happy). Perform the appropriate test at the 5% level of significance. Perform post-hoc
analysis, if necessary.

No Exercise   Jogging for 20 minutes   Jogging for 60 minutes
23 22 59
26 27 66
51 39 38
49 29 49
58 46 56
37 48 60
29 49 62
44 65 56

2. Nine experts rated the top four research papers in a research conference. A rating on a 7-
point scale (1 – needs improvement to 7 – excellent work) is given for each of the following four
criteria: Background of the Problem, Methodology, Results and Findings, Conclusions and
Implications. The following table displays the final ratings.
         Research Paper
Expert   A    B    C    D
AA 24 26 25 22
BB 27 27 26 24
CC 19 22 20 16
DD 24 27 25 23
EE 22 25 22 21
FF 26 27 24 24
GG 27 26 22 23
HH 25 27 24 21
II 22 23 20 19

At 5% level of significance, is there evidence of a difference in the median ratings of the four
research papers? Perform a post-hoc analysis, if necessary.

3. Determine if the proportion of students who pass a test is equal when using three different
studying techniques. Use the 5% level of significance for the appropriate test. (1 – Passed,
0 – Failed). Perform post-hoc analysis, if necessary.

Student Surprise Exam Mock Exam Final Exam


1 Failed Passed Passed
2 Failed Passed Passed
3 Failed Failed Failed
4 Failed Passed Passed
5 Failed Failed Passed
6 Failed Failed Failed
7 Failed Failed Failed
8 Passed Passed Passed
9 Failed Failed Failed
10 Failed Failed Failed
11 Failed Failed Passed
12 Failed Passed Passed
13 Failed Passed Passed
14 Failed Passed Passed
15 Failed Failed Failed
16 Passed Passed Passed
17 Failed Failed Failed
18 Failed Failed Failed
19 Failed Failed Passed
20 Failed Passed Passed

REFLECTION LOG

Directions:
1. Look for at least 5 research articles that used nonparametric tests for k-samples (Kruskal-
Wallis Test, Friedman Test, and Cochran’s Q Test) in their data analysis.
2. Discuss why they used these nonparametric tests and the results/findings associated with
the data analysis.
3. What have you learned from these studies regarding the use of nonparametric tests for
k-samples?
