Math13 - Advanced Statistics Lecture Note: Case of K-Samples
Learning Objectives:
At the end of the lesson, you will be able to:
1. discuss the assumptions and conditions for using the Kruskal-Wallis test
2. perform the Kruskal-Wallis test
3. explain the results of the Kruskal-Wallis test
Characteristics
The assumptions are similar to those for the Mann-Whitney test: the group samples are
independent, the data in each group are randomly selected, and the data are at least ordinal.
No assumptions are made about the type of underlying distribution.
Each group sample has at least 5 elements.
No population parameters are estimated, and so there are no confidence intervals.
Method
1. State the hypotheses.
Ho: The group populations have equal dominance. (The group population medians are equal.)
Ha: At least one of the group populations is dominant over the others. (At least one group
population median is not equal to the others.)
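$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$
where N – total number of observations, k – number of groups, $R_i$ – sum of the ranks in the
ith group, and $n_i$ – sample size of the ith group. When ties are present, H is divided by the
correction factor $1 - \frac{\sum T}{N^3 - N}$, where $T = t^3 - t$ and t – number of tied
observations in a rank.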
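The computation of H can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `kruskal_wallis_h` is hypothetical, ranks are assigned with `scipy.stats.rankdata` (tied values share the average rank), and the tie correction is assumed to be the division by 1 − ΣT/(N³ − N) that appears in the worked example later in this lesson.

```python
import numpy as np
from scipy.stats import rankdata


def kruskal_wallis_h(*groups):
    """Return (H, tie-corrected H) for k independent samples.

    Hypothetical helper written for illustration only.
    """
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = pooled.size

    # Rank the pooled data; tied values receive the average of their ranks.
    ranks = rankdata(pooled)

    # Split the ranks back into their groups and sum them to get each R_i.
    split_points = np.cumsum(sizes)[:-1]
    rank_sums = [r.sum() for r in np.split(ranks, split_points)]

    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r ** 2 / n for r, n in zip(rank_sums, sizes)
    ) - 3.0 * (n_total + 1)

    # Tie correction: T = t^3 - t for every set of t tied observations
    # (untied values have t = 1 and contribute 0).
    _, counts = np.unique(pooled, return_counts=True)
    tie_sum = np.sum(counts ** 3 - counts)
    h_corrected = h / (1.0 - tie_sum / (n_total ** 3 - n_total))
    return h, h_corrected


# Hypothetical data: three groups of five observations each.
h, h_corrected = kruskal_wallis_h(
    [1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]
)
print(h, h_corrected)
```

When there are no ties the two returned values coincide, since every T is zero.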
After rejecting the null hypothesis, a pairwise comparison of medians may be performed.
If the inequality of a test is satisfied, then the two populations, i and j, appear to be different.
In this section, we will use the Schaich-Hamerle test (based on the chi-square distribution) and
the Conover test (based on Student's t-distribution).
Schaich-Hamerle test: The difference between the ith and jth groups is significant if
$$\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right| > \sqrt{ \chi^2_{\alpha,\,df} \cdot \frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }$$
Conover test: The difference between the ith and jth groups is significant if
$$\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right| > t_{\alpha/2,\,N-k} \sqrt{ S^2 \cdot \frac{N-1-H}{N-k} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }$$
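No ties
$$S^2 = \frac{N(N+1)}{12}$$
With ties
$$S^2 = \frac{1}{N-1} \left( \sum R(X_{ij})^2 - \frac{N(N+1)^2}{4} \right)$$
where $\sum R(X_{ij})^2$ is the sum of the squared ranks over all observations, and $R(X_{ij})$ is
the rank of the ith element of the jth group.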
Example: The following table shows the scores of students from three schools in a quiz bee.
Test, at the 5% level of significance, the claim that there is a significant difference among the
median scores of the three schools.
School A    81  32  42  62  37  44  38  47  49  41
School B    48  31  25  22  30  30  32  15  40
School C    18  49  33  19  24  17  48  22
Solution:
Using a box plot, we can check that the data are not normally distributed. Hence, we can
use the Kruskal-Wallis test.
Step 1: Hypotheses
Ho: The schools have equal dominance. (The median scores of the three schools are equal.)
Ha: At least one of the schools is dominant over the others. (At least one median score is not
equal to the others.)
$$H = \frac{ \dfrac{12}{27(27+1)} \left( \dfrac{199^2}{10} + \dfrac{96.5^2}{9} + \dfrac{82.5^2}{8} \right) - 3(27+1) }{ 1 - \dfrac{30}{27^3 - 27} } = 8.800347328$$
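The value of H can be cross-checked with SciPy. The sketch below is an illustration rather than part of the original solution: `scipy.stats.kruskal` returns the tie-corrected statistic together with a p-value, and the decision rule shown assumes the usual large-sample chi-square approximation with k − 1 degrees of freedom, which these notes do not spell out for this test.

```python
from scipy.stats import chi2, kruskal

# Quiz bee scores from the example.
school_a = [81, 32, 42, 62, 37, 44, 38, 47, 49, 41]
school_b = [48, 31, 25, 22, 30, 30, 32, 15, 40]
school_c = [18, 49, 33, 19, 24, 17, 48, 22]

# kruskal() ranks the pooled data and applies the tie correction itself.
h_stat, p_value = kruskal(school_a, school_b, school_c)

alpha = 0.05
df = 3 - 1  # assumed degrees of freedom: k - 1
critical = chi2.ppf(1 - alpha, df)

print(f"H = {h_stat:.6f}, p-value = {p_value:.4f}, critical value = {critical:.4f}")
print("Reject Ho" if h_stat > critical else "Fail to reject Ho")
```

Comparing the p-value with 0.05 and comparing H with the critical value are equivalent ways of reaching the same decision under this approximation.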
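Schaich-Hamerle test
Pair    $\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right|$    $\sqrt{ \chi^2_{\alpha,\,df} \cdot \frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }$    Significance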
Conover test
Pair    $\left| \frac{R_i}{n_i} - \frac{R_j}{n_j} \right|$    $t_{\alpha/2,\,N-k} \sqrt{ S^2 \cdot \frac{N-1-H}{N-k} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) }$    Significance
A–B     9.1778   >   6.3671   Significant
A–C     9.5875   >   6.5732   Significant
B–C     0.4097   <   6.7335   Not Significant
Note: $t_{\alpha/2,\,N-k} = t_{0.025,\,24} = 2.063899$, $S^2 = 62.903846$
Based on the post-hoc analysis, School A has dominance over the other two schools.
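The pairwise comparisons can be reproduced with a short script. This is a sketch under stated assumptions: the rank sums, H, and S² are copied from the example, the variable names are illustrative, and the Schaich-Hamerle critical value assumes df = k − 1 for the chi-square quantile.

```python
import math
from itertools import combinations

from scipy.stats import chi2, t

# Rank sums, sample sizes, H, and S^2 are copied from the worked example.
rank_sums = {"A": 199.0, "B": 96.5, "C": 82.5}
sizes = {"A": 10, "B": 9, "C": 8}
n_total, k, alpha = 27, 3, 0.05
h_stat = 8.800347328
s_squared = 62.903846

chi2_crit = chi2.ppf(1 - alpha, k - 1)      # assumes df = k - 1
t_crit = t.ppf(1 - alpha / 2, n_total - k)  # t_{0.025, 24}

results = {}
for i, j in combinations(rank_sums, 2):
    # Absolute difference between the mean ranks of groups i and j.
    diff = abs(rank_sums[i] / sizes[i] - rank_sums[j] / sizes[j])
    inv_n = 1 / sizes[i] + 1 / sizes[j]

    # Critical differences for the two tests.
    sh_crit = math.sqrt(chi2_crit * n_total * (n_total + 1) / 12 * inv_n)
    conover_crit = t_crit * math.sqrt(
        s_squared * (n_total - 1 - h_stat) / (n_total - k) * inv_n
    )

    results[(i, j)] = (diff, sh_crit, conover_crit)
    print(f"{i}-{j}: |diff| = {diff:.4f}, "
          f"Schaich-Hamerle = {sh_crit:.4f}, Conover = {conover_crit:.4f}")
```

A pair is flagged as significant when its difference exceeds the critical value of the chosen test. The Conover column should agree closely with the table in the example; small differences in the last decimal place can come from rounding of H and S².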
Learning Objectives:
At the end of the lesson, you will be able to:
1. discuss the assumptions and conditions for using the Friedman test
2. perform the Friedman test
3. explain the results of the Friedman test
Method
1. State the hypotheses.
Ho: The group populations have equal dominance. (The group population medians are equal.)
Ha: At least one of the group populations is dominant over the others. (At least one group
population median is not equal to the others.)
$$\chi_r^2 = \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3N(k+1)$$
where N – number of subjects (blocks), k – number of groups (treatments), $R_i$ – sum of the
ranks in the ith group (treatment), and $R_{ij}$ – rank in the ith group (column) for the jth
subject (row).
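The statistic can be computed directly from an N × k table of scores. The sketch below is illustrative only: the function name `friedman_statistic` is hypothetical, ranks are assigned within each subject using `scipy.stats.rankdata` with average ranks for ties, and no tie correction is applied, which matches the formula as written here (library routines such as `scipy.stats.friedmanchisquare` apply a tie correction and can therefore return a different value when ties occur).

```python
import numpy as np
from scipy.stats import rankdata


def friedman_statistic(data):
    """Return (chi_r^2, rank sums) for an N x k table.

    Rows are subjects (blocks) and columns are treatments (groups).
    Hypothetical helper written for illustration only.
    """
    data = np.asarray(data, dtype=float)
    n_subjects, k = data.shape

    # Rank the k scores within each subject; ties get average ranks.
    ranks = rankdata(data, axis=1)

    # R_i: sum of the ranks in each treatment column.
    rank_sums = ranks.sum(axis=0)

    stat = (12.0 / (n_subjects * k * (k + 1)) * np.sum(rank_sums ** 2)
            - 3.0 * n_subjects * (k + 1))
    return stat, rank_sums


# Hypothetical ratings: 4 subjects (rows) by 3 treatments (columns).
example = [[3, 1, 2], [2, 1, 3], [3, 2, 1], [3, 1, 2]]
stat, rank_sums = friedman_statistic(example)
print(stat, rank_sums)
```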
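After rejecting the null hypothesis, groups a and b are significantly different if
$$\left| \frac{R_a}{n_a} - \frac{R_b}{n_b} \right| > t_{\alpha/2,\,(n-1)(k-1)} \cdot \frac{1}{n} \sqrt{ \frac{ 2 \left( n \sum R_{ij}^2 - \sum_{i=1}^{k} R_i^2 \right) }{ (n-1)(k-1) } }$$
where $n = n_a = n_b$ is the number of subjects (N in the test statistic).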
Example: A pastry chef invited 12 people to find out their preferred cakes (chocolate, carrot, red
velvet). Assuming that the order of tasting the cakes is randomly chosen and that there are suitable
time intervals between tastings, test at the 5% level of significance the claim that there is no
significant difference in the cake preferences.
Solution:
Using a box plot, we can determine that the normality of the data is not satisfied.
Step 1: Hypotheses
Ho: There is no significant difference in the cake preferences.
Ha: At least one cake is more preferred than the rest.
          Data                                 Ranks
Person    Chocolate  Carrot  Red Velvet        Chocolate  Carrot  Red Velvet
1         10         7       8                 3          1       2
2         8          5       5                 3          1.5     1.5
3         7          8       6                 2          3       1
4         9          6       4                 3          2       1
5         7          5       7                 2.5        1       2.5
6         4          7       5                 1          3       2
7         5          9       3                 2          3       1
8         6          6       7                 1.5        1.5     3
9         5          4       6                 2          1       3
10        10         6       4                 3          2       1
11        4          7       4                 1.5        3       1.5
12        7          3       3                 3          1.5     1.5
$R_i$                                          27.5       23.5    21
$R_i^2$                                        756.25     552.25  441
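As a worked check, the rank sums in the table can be substituted into the test statistic from the Method with N = 12 and k = 3. This assumes that no tie correction is applied, since the formula as given has none:

$$\chi_r^2 = \frac{12}{(12)(3)(3+1)} \left( 756.25 + 552.25 + 441 \right) - 3(12)(3+1) = \frac{1749.5}{12} - 144 \approx 1.7917$$

If the statistic is referred to the chi-square distribution with k − 1 = 2 degrees of freedom (an assumption, as the decision rule is not stated here), the 5% critical value is about 5.991, so a value of 1.7917 would not lead to rejecting Ho.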
Learning Objectives:
At the end of the lesson, you will be able to:
1. discuss the assumptions and conditions for using Cochran’s Q test
2. perform Cochran’s Q test
3. explain the results of Cochran’s Q test
Cochran’s Q test is a nonparametric test for two-way analysis of variance with repeated
measures where the dependent variable is dichotomous. It is the extension of the McNemar test
for two related samples to k related samples.
Assumptions
1. Responses are binary and from k matched samples.
2. The subjects are independent of one another and were selected at random from a larger
population.
3. The sample size is sufficiently large. (As a rule of thumb, the number of subjects for which the
responses are not all 0’s or all 1’s, n, should be greater than or equal to 4, and nk should be
greater than or equal to 24. This assumption is not required for the exact binomial McNemar test.)
For large samples, the test statistic, Q, is distributed as chi-square with k – 1 degrees of
freedom. As in the McNemar test, only subjects who do not have the same response in all
categories contribute to the overall Q statistic.
Method
1. State the hypotheses.
Ho: The group populations have equal proportions of successes.
Ha: At least one group population has a proportion of successes that differs from the
others.
$$Q = \frac{ (k-1) \left[ k \sum_{j=1}^{k} G_j^2 - \left( \sum_{j=1}^{k} G_j \right)^2 \right] }{ k \sum_{i=1}^{N} L_i - \sum_{i=1}^{N} L_i^2 }$$
where $G_j$ – total number of successes in the jth column (group), $L_i$ – total number of
successes in the ith row, N – number of rows (subjects), k – number of groups
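The formula for Q translates directly into a few array operations. The following is a minimal sketch, not the notes' own implementation: the function name `cochran_q` and the small data set are hypothetical, and the p-value uses the large-sample chi-square approximation with k − 1 degrees of freedom described above.

```python
import numpy as np
from scipy.stats import chi2


def cochran_q(data):
    """Return (Q, p-value) for an N x k table of 0/1 responses.

    Rows are subjects and columns are groups.
    Hypothetical helper written for illustration only.
    """
    data = np.asarray(data, dtype=int)
    n_subjects, k = data.shape

    col_totals = data.sum(axis=0)  # G_j: successes in each group
    row_totals = data.sum(axis=1)  # L_i: successes for each subject

    numerator = (k - 1) * (k * np.sum(col_totals ** 2) - col_totals.sum() ** 2)
    # The denominator is zero if every subject answers all 0's or all 1's,
    # in which case Q is undefined.
    denominator = k * row_totals.sum() - np.sum(row_totals ** 2)

    q = numerator / denominator
    p_value = chi2.sf(q, k - 1)
    return q, p_value


# Hypothetical responses: 8 subjects (rows) by 3 groups (columns), 1 = success.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
q, p_value = cochran_q(responses)
print(f"Q = {q:.4f}, p-value = {p_value:.4f}")
```

Rows that are all 0's or all 1's add the same amount to every term they touch and cancel out of both numerator and denominator, which is why only subjects with mixed responses contribute to Q.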
After rejecting the null hypothesis, a pairwise comparison of proportions may be performed
using the McNemar test.
Example: A college professor wanted to examine whether pass rates increased as students
had more time to study. In the study, 60 students enrolled in the review class took part. All
students were first given a surprise exam to test their current knowledge. They were then given
a mock exam two weeks later, before taking a final exam a further two weeks later. Students’
performance in each exam was assessed in terms of a pass or fail. Test the appropriate
hypothesis at the 5% level of significance.
Solution:
Step 1: Hypotheses
Ho: The proportions of students who pass are equal across the three exams.
Ha: At least one exam has a proportion of students who pass that differs from the others.
The post-hoc analysis is left to the readers as part of the practice set.
WORKSHEET NO. 3
2. Nine experts rated the top four research papers in a research conference. A rating on a 7-
point scale (1 – needs improvement to 7 – excellent work) is given for each of the following four
criteria: Background of the Problem, Methodology, Results and Findings, Conclusions and
Implications. The following table displays the final ratings.
Expert    Research Paper
          A     B     C     D
AA        24    26    25    22
BB        27    27    26    24
CC        19    22    20    16
DD        24    27    25    23
EE        22    25    22    21
FF        26    27    24    24
GG        27    26    22    23
HH        25    27    24    21
II        22    23    20    19
At 5% level of significance, is there evidence of a difference in the median ratings of the four
research papers? Perform a post-hoc analysis, if necessary.
3. Determine if the proportion of students who pass a test is equal when using three different
studying techniques. Use 5% level of significance to test the appropriate statistics. (1 – Passed,
0 – Failed). Perform post-hoc
REFLECTION LOG
Directions:
1. Look for at least 5 research articles that used nonparametric tests for k-samples
(Kruskal-Wallis Test, Friedman Test, and Cochran’s Q Test) in their data analysis.
2. Discuss why they used these nonparametric tests and the results/findings associated with
the data analysis.
3. What have you learned from these studies regarding the use of nonparametric tests for
k-samples?