Rank Sum Test
By Jayasiri Deshabandu (B.Sc., M.Sc., C.E.I.)
Background: When two samples are drawn from two distributions in order to compare the medians of those
distributions, we use the Wilcoxon rank sum test. The underlying population distributions are
unknown, but we need to assume that they have the same shape, and therefore the same variance, so
that they can differ only in location. The null hypothesis is usually that the medians are the same.
This means that, if the null hypothesis is true, we are assuming the two samples are drawn from the
same distribution. The second assumption is that the two samples are independent. You need to
remember these two assumptions for your examination.
Example: In 2016, nine economists made predictions for the growth rate in Indian gross domestic
product (GDP) for 2017. Their predictions are shown below, together with seven predictions made at the
same time by other economists for the growth rate in Chinese GDP. Note that the data are unpaired.
The formal statement is:
H0 : The population median predictions for India and China are equal.
H1 : The population median prediction for China is greater than that for India.
Significance level: 5%
Step 1: We rank the combined data together
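A minimal sketch of the ranking in Step 1 is given here. The function name rank_combined is hypothetical, the toy values are placeholders and not the economists' predictions, and giving tied values the average of the ranks they occupy is an assumption that this handout does not discuss.

```python
def rank_combined(sample_a, sample_b):
    """Rank the pooled values from 1 (smallest) upward.

    Returns the ranks belonging to each sample, listed in increasing order
    of value. Tied values share the average of the ranks they occupy
    (an assumption; the handout does not cover ties).
    """
    pooled = [(v, "a") for v in sample_a] + [(v, "b") for v in sample_b]
    pooled.sort(key=lambda pair: pair[0])

    ranks_a, ranks_b = [], []
    i = 0
    while i < len(pooled):
        # Find the end of the run of values tied with pooled[i].
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j, whose mean is (i + 1 + j) / 2.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            if pooled[k][1] == "a":
                ranks_a.append(avg_rank)
            else:
                ranks_b.append(avg_rank)
        i = j
    return ranks_a, ranks_b


# Placeholder values only, NOT the data from the example.
toy_a = [1.2, 3.4, 2.0]
toy_b = [2.0, 5.1]
ranks_a, ranks_b = rank_combined(toy_a, toy_b)
print(ranks_a, ranks_b)
```

The rank sum of the smaller sample is then sum(ranks_b) for these toy values.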
Step 2: Take the sum Rm = 81 of the ranks for the sample with the smaller sample size, m = 7. Since
under the null hypothesis we assume that the data come from the same distribution with the same
median, for any sample of size 7 the minimum possible sum is 1 + 2 + · · · + 7 = 28 and the maximum
possible sum is 10 + 11 + · · · + 16 = 91. By symmetry, a value of Rm at a given distance below the
maximum possible is as likely as a value at the same distance above the minimum possible. The
Wilcoxon table has been constructed using test statistic values close to the minimum possible. We
find the value close to the minimum possible by taking the minimum of Rm and m(m + n + 1) − Rm.
For this example n = 9, so we take the minimum of 81 and 119 − 81. So the test statistic is 38.
Keep in mind that the required process is explained in the formula sheet.
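The calculation in Step 2 can be checked with a short sketch. The function name rank_sum_statistic is hypothetical; the values 81, 7 and 9 are the rank sum and the two sample sizes from the example.

```python
def rank_sum_statistic(r_m, m, n):
    """Return min(Rm, m(m + n + 1) - Rm), the value looked up in the table.

    r_m is the rank sum of the smaller sample (size m); n is the size of
    the other sample.
    """
    return min(r_m, m * (m + n + 1) - r_m)


t = rank_sum_statistic(81, 7, 9)
print(t)  # 38, since 7 * 17 - 81 = 119 - 81 = 38
```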
Explanation: The minimum possible value for the sum of the ranks Rm is
Rmin = 1 + 2 + 3 + · · · + m = m(m + 1)/2.
The maximum possible value for the sum of the ranks Rm is
Rmax = (n + 1) + (n + 2) + · · · + (n + m) = m(m + 2n + 1)/2.
Note that Rmax + Rmin = m(m + n + 1). Suppose Rm is close to Rmax. We find the corresponding number L
close to Rmin.
Due to symmetry we have Rmax − Rm = L − Rmin. Hence L = Rmax + Rmin − Rm = m(m + n + 1) − Rm.
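As a quick numerical check of this explanation, the sketch below adds up the smallest and largest m ranks directly and compares them with the closed forms m(m + 1)/2 and m(m + 2n + 1)/2. The function name rank_sum_bounds is hypothetical; m = 7, n = 9 and Rm = 81 come from the example.

```python
def rank_sum_bounds(m, n):
    """Return (Rmin, Rmax) for the rank sum of a sample of size m among m + n ranks."""
    r_min = sum(range(1, m + 1))          # ranks 1, 2, ..., m
    r_max = sum(range(n + 1, n + m + 1))  # ranks n + 1, ..., n + m
    return r_min, r_max


m, n, r_m = 7, 9, 81
r_min, r_max = rank_sum_bounds(m, n)

# The direct sums agree with the closed forms.
assert r_min == m * (m + 1) // 2
assert r_max == m * (m + 2 * n + 1) // 2

# Mirror Rm about the midpoint of the range: L = Rmax + Rmin - Rm.
mirrored = r_max + r_min - r_m
print(r_min, r_max, mirrored)  # 28 91 38
```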
Step 3: Use the Wilcoxon Rank-Sum Test table to find the critical value 43 for a one-tailed test at
the 5% level with m = 7 and n = 9. This means that if the null hypothesis is true, the chance of
getting a test statistic less than or equal to 43 is at most 5%. Since our test statistic 38 is less
than 43, we reject the null hypothesis that India and China have the same median prediction: there is
sufficient evidence at the 5% level that the median prediction for China is greater than that for India.
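The table value can be checked by brute force. Under the null hypothesis every choice of m ranks out of m + n is equally likely, so a tail probability is just a count of rank subsets. This is a sketch that assumes there are no tied ranks; the function name lower_tail_probability is hypothetical.

```python
from itertools import combinations
from math import comb


def lower_tail_probability(t, m, n):
    """P(rank sum <= t) under H0, by enumerating all C(m + n, m) rank subsets.

    Assumes no ties, so the ranks are exactly 1, 2, ..., m + n.
    """
    count = sum(
        1
        for ranks in combinations(range(1, m + n + 1), m)
        if sum(ranks) <= t
    )
    return count / comb(m + n, m)


p_critical = lower_tail_probability(43, 7, 9)  # at the table value
p_observed = lower_tail_probability(38, 7, 9)  # at the observed statistic
print(round(p_critical, 4), round(p_observed, 4))
```

With m = 7 and n = 9 there are only 11440 subsets, so the enumeration is quick. The first probability comes out below 0.05, which is consistent with 43 being the one-tailed 5% table value, and the second is smaller still.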
When the value of n > 10, it is not possible to use the Wilcoxon Rank-Sum Table. Instead we use the
normal approximation: Rm is approximately normally distributed with mean µ = m(m + n + 1)/2 and
variance σ² = mn(m + n + 1)/12.
When you use the test statistic T = min(Rm, m(m + n + 1) − Rm) with n > 10, do not forget to apply a
continuity correction to T, as it is a discrete variable.
For a one-tailed test at the 5% significance level, you reject the null hypothesis if
P(Z < (T + 0.5 − µ)/σ) < 0.05.
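The normal approximation can be sketched as follows, using a mean of m(m + n + 1)/2, a variance of mn(m + n + 1)/12, a continuity correction of 0.5 and a 5% significance level. The function names are hypothetical, and the sample sizes and statistic in the usage lines are made-up illustrative numbers, not data from the example.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_approx_p_value(t, m, n):
    """Approximate P(T <= t) under H0 with a continuity correction of 0.5."""
    mu = m * (m + n + 1) / 2
    sigma = sqrt(m * n * (m + n + 1) / 12)
    z = (t + 0.5 - mu) / sigma
    return normal_cdf(z)


# Hypothetical illustration: m = 10, n = 12 and T = 85 are made-up numbers.
p = normal_approx_p_value(t=85, m=10, n=12)
reject_h0 = p < 0.05  # one-tailed test at the 5% significance level
print(round(p, 4), reject_h0)
```

As a rough cross-check, applying the same function to the table value from the example, normal_approx_p_value(43, 7, 9), gives a probability a little under 0.05, in line with 43 being the 5% table value even though the approximation is not needed there.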