
Tests for Rank Correlation Coefficients.

E. C. Fieller; H. O. Hartley; E. S. Pearson

Biometrika, Vol. 44, No. 3/4. (Dec., 1957), pp. 470-481.

Stable URL:
http://links.jstor.org/sici?sici=0006-3444%28195712%2944%3A3%2F4%3C470%3ATFRCCI%3E2.0.CO%3B2-6

TESTS FOR RANK CORRELATION COEFFICIENTS. I

BY E. C. FIELLER, H. O. HARTLEY AND E. S. PEARSON


Statistical Advisory Unit, Ministry of Supply, London;
Iowa State College, Ames; University College, London

(1.1) The measures considered


The following is a first report on an investigation which became possible with the availability of the 25,000 sets of correlated random normal deviates, 3000 of which were published in Fieller, Lewis & Pearson's (1955) Tracts for Computers, no. XXVI. The object which we set ourselves was to study with the aid of these data the sampling distributions of, and relationships between, three measures of rank correlation in the case where the basic variables which have been ranked follow bivariate normal distributions.
We shall use the following notation. Suppose that there are n pairs of associated rankings

u_1, u_2, \ldots, u_n   and   v_1, v_2, \ldots, v_n,

where the integers u_i (i = 1, 2, \ldots, n) may be taken in ascending order 1, 2, \ldots, n and the v_i are a permutation of these integers. We shall consider in the present paper the two following measures of correlation between these rankings:
(a) Spearman's coefficient, which we denote by r_s. This is simply the product moment correlation coefficient of u_i, v_i and may be computed from the sum of squared differences

S_s = \sum_{i=1}^{n} (u_i - v_i)^2,   (1)

where

r_s = 1 - 6S_s/(n^3 - n).   (2)


(b) Kendall's coefficient, τ, which we denote by r_K. This may be computed as follows. For every integer u_i count the number of v_j with v_j > v_i and j > i; then add these counts to obtain the positive score P_K. Then

r_K = \frac{4P_K}{n(n-1)} - 1.   (3)

Both r_s and r_K lie between +1 and -1. We shall not be concerned here with ties among the u's or v's.
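As an illustration of equations (1)-(3), the short Python sketch below computes r_s and r_K directly from a pair of untied rankings. The function names and the example rankings are illustrative choices, not part of the original paper.

```python
# Sketch of equations (1)-(3) for a pair of untied rankings u, v (permutations of 1..n).
# Function names and the example data are illustrative only.

def spearman_rs(u, v):
    """Spearman's r_s = 1 - 6*S_s/(n^3 - n), S_s being the sum of squared rank differences."""
    n = len(u)
    s_s = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return 1 - 6 * s_s / (n ** 3 - n)

def kendall_rk(u, v):
    """Kendall's r_K = 4*P_K/(n*(n-1)) - 1, P_K being the positive (concordant) score."""
    n = len(u)
    # Arrange the v's in the ascending order of the u's, then count, for each position i,
    # the later v's that exceed v_i.
    v_by_u = [vi for _, vi in sorted(zip(u, v))]
    p_k = sum(1 for i in range(n) for j in range(i + 1, n) if v_by_u[j] > v_by_u[i])
    return 4 * p_k / (n * (n - 1)) - 1

u = [1, 2, 3, 4, 5]
v = [2, 1, 4, 3, 5]
print(spearman_rs(u, v), kendall_rk(u, v))  # 0.8 and 0.6 for this small example
```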
The following is a third coefficient which has been computed for all the sampling data and which we hope to consider later:
(c) The Fisher-Yates coefficient. Let ξ(i | n) be a so-called normal order statistic, i.e. the expected value of the ith largest standardized deviate in a sample of n observations from a normal population. Then we may attach these score values to both the u rankings and the v rankings. Fisher & Yates (1938, p. 50) have suggested that a measure of rank correlation might be obtained from the product moment correlation coefficient of these scores, namely

r_F = \frac{\sum_{i=1}^{n} \xi(i \mid n)\, \xi(v_i \mid n)}{\sum_{i=1}^{n} \xi^2(i \mid n)}.   (4)
Convenient tables of the individual ξ(i | n) as well as of \sum_i ξ^2(i | n) are given, for example, in Fisher & Yates (1938, Tables XX and XXI). As an approximation to the actual product-moment correlation coefficient, r_{xy}, in a normal sample r_F clearly has much to recommend it; but the only discussions of this coefficient of which we are aware are those by Jeffreys (1948, pp. 209-10) and Hoeffding (1951, pp. 86-9).
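The scores ξ(i | n) are taken by the authors from the Fisher & Yates tables. A present-day reader without the tables can approximate them, for instance with Blom's formula Φ^{-1}((i - 0.375)/(n + 0.25)); the sketch below uses that substitute and is therefore only an approximation to the tabled coefficient. The function names are illustrative, not from the paper.

```python
# Approximate sketch of the Fisher-Yates coefficient r_F of equation (4).
# Blom's formula replaces the tabled expected normal order statistics used in the paper.
import numpy as np
from scipy.stats import norm

def normal_scores(n):
    """Approximate expected normal order statistics, in ascending order (Blom's formula).
    By symmetry of the scores about zero, the ascending/descending convention
    does not affect the value of r_F."""
    i = np.arange(1, n + 1)
    return norm.ppf((i - 0.375) / (n + 0.25))

def fisher_yates_rf(u, v):
    """r_F = sum_i xi(u_i|n) xi(v_i|n) / sum_i xi(i|n)^2 for untied rankings u, v."""
    u, v = np.asarray(u), np.asarray(v)
    xi = normal_scores(len(u))
    return float(np.sum(xi[u - 1] * xi[v - 1]) / np.sum(xi ** 2))

print(fisher_yates_rf([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```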
(1.2) Some known results on the distribution theory of r_s and r_K
For a comprehensive summary of the older results, the reader may consult Kendall
(1948) and Moran (1950). Briefly these are as follows:
For independent random rankings (i.e. for random permutations of the vi) the complete
distributions of rs and rK have been obtained for small n by combinatorial enumeration.
Adequate approximations have been evolved for larger n.
In the case of correlated rankings it is first necessary to specify the nature of the dependence. A discussion of this problem of appropriate population models was given by Daniels (1950), and very recently Mallows (1957) has developed a new form of approach related to paired-comparison theory. In the present paper we start from the assumption that the n pairs of rankings u_i, v_i have arisen as the rank numbers in a sample of n pairs of correlated normal variates. Thus, if x_i, y_i (i = 1, 2, \ldots, n) denote a random sample of n paired observations from a bivariate normal population having correlation coefficient ρ, we suppose that the x_i are arranged in order of magnitude and that v_i is the rank of y_i. This model has received considerable attention and a certain number of theoretical results are known.
Thus we have

E(r_s) = \frac{6}{(n+1)\pi}\left\{\sin^{-1}\rho + (n-2)\sin^{-1}\tfrac{1}{2}\rho\right\}   (Moran, 1948),   (5)

var(r_s) \simeq \frac{1}{n}\left(1 - 1.563465\rho^2 + 0.304743\rho^4 + 0.155286\rho^6 + 0.061552\rho^8 + 0.022099\rho^{10} + \ldots\right).   (6)
Equation (6) is a large sample approximation due to Kendall (1949) and David, Kendall & Stuart (1951). As we shall see below, it does not appear to be very accurate when the sample size is as small as 10. Turning to r_K, we have
E(r_K) = \frac{2}{\pi}\sin^{-1}\rho   (Greiner, 1909),   (7)

var(r_K) = \frac{2}{n(n-1)}\left[1 - \left(\frac{2}{\pi}\sin^{-1}\rho\right)^2 + 2(n-2)\left\{\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho\right)^2\right\}\right]   (Esscher, 1924).   (8)

As far as we are aware, no results are available for the higher moments or cumulants of
rs or r,, but Sundrum (1953) showed how the third and fourth moments of rK might be
obtained in the general case. He also used some random sampling results to give empirical
values for these moments, assuming underlying normal correlation, in the single casep = 1/42.
As can be seen from equations (6) and (8), the standard deviations of r_s and r_K change with ρ. Further, as might be anticipated from the parallel case of the product moment correlation coefficient r_{xy}, the shapes of the sampling distributions are found to change with ρ. Thus, when we get away from the problem of using rank correlation coefficients in tests of independence, we at once run into difficulties. The lack of results for dependent rankings has made it difficult to compare the relative merits of different rank coefficients in detecting dependence, nor has it been possible to use these coefficients for a comparison of correlation in different populations. If we accept the underlying bivariate normal structure, then we are faced with the distributional problem; if we do not accept this, then we have also to look for a simple definition of non-parametric dependence.
(1.3) The present results and their bearing on these difficulties
While we do not claim to have solved all these difficulties we hope, in this paper, to have compiled evidence which shows that the problem is capable of a simple solution provided the rankings arise from the class of population models specified below. We proceed as follows:
(A) We start with rankings generated by sampling from a bivariate normal parent with correlation ρ. With the help of extensive sampling experiments backed by analytical approximation, we show that if n is not too large the z-transforms

z_s = \tanh^{-1} r_s, \qquad z_K = \tanh^{-1} r_K   (9)

are approximately normally distributed with variances nearly independent of ρ. In fact

var(z_s) \simeq \frac{1.060}{n-3}, \qquad var(z_K) \simeq \frac{0.437}{n-4}.   (10)
The expectation of z_K can be expressed approximately as a simple function of ρ, making use of the expressions for E(r_K) and var(r_K) given in (7) and (8). The approximation to the expectation of z_s is less satisfactory in small samples owing to the inadequacy of the expression (6) for var(r_s)*. It should be noted, however, that, just as in using the z-transformation for r_{xy}, a knowledge of the precise expectation of the transformed variable is not necessary in a number of the test procedures that become available.
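A minimal sketch of (9) and (10), written out as Python helpers: the z-transform and the two suggested variance approximations. The function names are not from the paper, and the approximations are intended, as the authors state, only for moderate sample sizes.

```python
# Sketch of the z-transforms (9) and the empirical variance approximations (10).
import math

def z_transform(r):
    """z = tanh^{-1} r = (1/2) log((1 + r)/(1 - r))."""
    return math.atanh(r)

def var_zs(n):
    """Approximate var(z_s) = 1.060/(n - 3)."""
    return 1.060 / (n - 3)

def var_zk(n):
    """Approximate var(z_K) = 0.437/(n - 4)."""
    return 0.437 / (n - 4)

# Example: approximate standard deviations of z_s and z_K for n = 30.
print(math.sqrt(var_zs(30)), math.sqrt(var_zk(30)))
```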
(B) The results in A can clearly be extended to a much wider class of parental distributions. If we start from a bivariate normal distribution of x, y and introduce new variates X = f(x), Y = g(y), the rankings of X and Y will clearly be identical with those of x and y provided the functions f and g are monotonic. Thus, the simple results under A will also apply to rankings generated by the wider class of bivariate distributions of X, Y. Conversely, starting from any bivariate distribution of X, Y we can always find monotonic transformations X = f(x), Y = g(y) to standardized normal variates x and y. The resulting bivariate distribution of x, y will not necessarily be bivariate normal, but we think it likely that in practical situations it would not differ greatly from this form.† This is a field in which further investigation would be of considerable interest.
(C) Summarizing the results of A and B, we may state that if the rankings are generated by one of a wide class of distributions of paired variables‡ X, Y, then the z-transforms of the rank correlation measures, z_s, z_K, can be regarded as normal variates with variances dependent only on the sample size, and given approximately in equations (10). Further, within this class of bivariate populations, either of the z-transforms is an unbiased estimate of a function of the correlation ρ. This is the correlation in the bivariate distribution obtained after distortion to normality of the marginal distributions of X and Y. ρ may be regarded as a non-parametric measure of dependence. Without the need to specify ρ, simple tests of significance may be applied to the z values to determine whether two or more samples are likely to have come from populations with a common ρ.
* The approximation to the expectation of z contains the variance of r; see equation (13) below.
† Johnson (1949) considered a particular case of surfaces having the property of being convertible into the bivariate normal form through the application of his S_B and S_U transformations to the marginal distributions.
‡ It is of course realized that other models of non-parametric dependence have been suggested in which the ranks are not generated by a parental bivariate distribution. Such models are not considered here.
(D) Within these conditions it is possible to make approximate comparisons of the relative merits of the rank coefficients r_s and r_K (and later we hope of r_F). In particular, we may compare their power in detecting differences in population ρ values.
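As an illustration of the test procedure described under (C), the sketch below compares Kendall's coefficient in two independent samples by treating each z_K as a normal variate with variance 0.437/(n - 4). The numerical inputs are invented for the example, and scipy is assumed only for the normal tail probability.

```python
# Sketch of a two-sample test of a common population rho based on z_K.
# The sample values below are invented purely for illustration.
import math
from scipy.stats import norm

def compare_zk(rk1, n1, rk2, n2):
    """Two-sided normal test that two independent samples share a common rho."""
    z1, z2 = math.atanh(rk1), math.atanh(rk2)
    se = math.sqrt(0.437 / (n1 - 4) + 0.437 / (n2 - 4))
    stat = (z1 - z2) / se
    return stat, 2 * norm.sf(abs(stat))

stat, p = compare_zk(rk1=0.55, n1=30, rk2=0.30, n2=50)
print(stat, p)
```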

2. THE EXPERIMENTAL DISTRIBUTIONS OF SPEARMAN'S AND KENDALL'S COEFFICIENTS

(2.1) The distributions of r_s and r_K


The experimental sampling made full use of the 25,000 sets of correlated normal deviates referred to in § 1.1. Thus, we had 2500 samples with n = 10, 833 with n = 30 and 500 with n = 50. For each value of n we had samples from nine bivariate normal populations, namely those with ρ = 0.1 (0.1) 0.9. The samples of 10, 30 and 50 were independent in the sense that the 25,000 cards containing the basic data were re-shuffled between each of the three experiments. The basic calculations for our study were all carried out in the Mathematics Division of the National Physical Laboratory. The samples were formed and ranked on the Division's punched card installation under the supervision of Miss M. U. Thomas. She was responsible, also, for the calculation of all the values of S_s (of equation (1)), of the numerator on the right-hand side of equation (4), and of P_K (of equation (3)) for samples of size 10. The values of P_K for samples of sizes 30 and 50 were obtained on the Deuce digital computer by Mr T. Vickers and Mr B. W. Munday. An account of the methods used will be given in a later paper; we plan also to print the observed frequency distributions corresponding to the various coefficients.
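The original experiment drew its samples from the published tables of correlated deviates and was carried out on punched-card equipment and the Deuce; a reader can approximate the design on a much smaller scale by direct simulation. The sketch below does so with an arbitrary replicate count and seed, and uses scipy's rank-correlation routines rather than the hand computations described above.

```python
# Small-scale sketch of the sampling experiment: draw n pairs from a bivariate normal
# population with correlation rho, rank them, and accumulate r_s and r_K.
# Replicate count and seed are arbitrary; the paper used 2500, 833 and 500 samples
# of sizes 10, 30 and 50 respectively.
import numpy as np
from scipy.stats import rankdata, spearmanr, kendalltau

def simulate(rho, n, n_rep, rng):
    cov = [[1.0, rho], [rho, 1.0]]
    rs, rk = [], []
    for _ in range(n_rep):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        u, v = rankdata(x), rankdata(y)          # rank numbers of the normal deviates
        rs.append(spearmanr(u, v).correlation)
        rk.append(kendalltau(u, v).correlation)
    return np.mean(rs), np.var(rs, ddof=1), np.mean(rk), np.var(rk, ddof=1)

rng = np.random.default_rng(1)
print(simulate(rho=0.5, n=10, n_rep=2000, rng=rng))
```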
Comparison of the observed mean values of r_s with the theoretical values of equation (5), and of the means and variances of r_K with equations (7) and (8), is only useful as a check on the representative character of the random samples. This check has been made and passed satisfactorily; the observed values are not reproduced here. Examination of the variance of r_s is however necessary, equation (6) giving an approximation only to the true value.

(2.2) The variance of r_s


The Kendall formula (6) does not give the correct values of 1/(n-1) and 0 for var(r_s) when ρ = 0 and 1, respectively. A purely empirical adjustment is obtained by substituting n-1 for n as divisor and adding a term +0.019785ρ^{12} which reduces the variance to zero when ρ = 1, so that we have

var(r_s) = \frac{1}{n-1}\left(1 - 1.563465\rho^2 + 0.304743\rho^4 + 0.155286\rho^6 + 0.061552\rho^8 + 0.022099\rho^{10} + 0.019785\rho^{12}\right).   (11)
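A one-line transcription of the adjusted formula (11), with a check of the two boundary values 1/(n - 1) at ρ = 0 and 0 at ρ = 1 that motivated the adjustment. The function name is an illustrative choice.

```python
# Sketch of the empirically adjusted variance formula (11) for var(r_s).
def var_rs_adjusted(rho, n):
    """Equation (11): the series of (6) with divisor n - 1 and an added 0.019785*rho**12 term."""
    return (1.0 / (n - 1)) * (1 - 1.563465 * rho**2 + 0.304743 * rho**4 + 0.155286 * rho**6
                              + 0.061552 * rho**8 + 0.022099 * rho**10 + 0.019785 * rho**12)

# Boundary checks: 1/(n-1) at rho = 0 and zero at rho = 1 (up to floating-point rounding).
print(var_rs_adjusted(0.0, 10), round(var_rs_adjusted(1.0, 10), 10))
```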

Table 1 contains, for each of the three sample sizes, (a) the estimated variance from equation (11), (b) the observed variance from the sampling experiment, (c) smoothed values of (b) obtained by a rough graphical process. These last values are made use of in § 3.2 below. It will be seen that for n = 10, the modified Kendall formula (11) gives values which for ρ ≥ 0.3 are consistently smaller than the observed values. The theoretical approximation is also too small, but less noticeably so, when n = 30. It seems clear that for small samples var(r_s) cannot be accurately expressed as the product of a function of n and a function of ρ. Below, when approximating to the variance of z_s = tanh^{-1} r_s we have therefore used the smoothed observed values of var(r_s) taken from the third columns of Table 1.

Table 1. Variance of Spearman's r_s

[Columns, for each of n = 10, 30 and 50: variance from equation (11); observed variance; smoothed observed variance.]

(3.1) The transformation and its justification


Our object is to find transformations of r_s and r_K which will give variances approximately independent of ρ and will at the same time make the distributions roughly normal. The basic distributions of r_s and r_K become increasingly skew as |ρ| → 1. It is natural that we should consider R. A. Fisher's z-transform which proved so successful in the case of the product moment correlation coefficient r_{xy} in normal samples. If we write in general

z = \tanh^{-1} r = \tfrac{1}{2}\log_e\frac{1+r}{1-r},   (12)

then the cumulants of z may be expanded in series in terms of the cumulants of r. The leading terms of the expansions for the mean and variance of z are given in equations (13) and (14):

E(z) = \tanh^{-1}\bar{r} + \frac{\bar{r}\,\kappa_2(r)}{(1-\bar{r}^2)^2} + \ldots,   (13)

var(z) = \frac{\kappa_2(r)}{(1-\bar{r}^2)^2} + \ldots,   (14)

where \bar{r} = \kappa_1(r) = E(r).
The distribution of r depends only on the single parameter ρ and it will be seen that to a first approximation the z-transformation may be expected to stabilize the variance of a statistic r if the ratio of var(r) to (1 - \bar{r}^2)^2 is independent of ρ, or nearly so. We have given these ratios in Table 2, \bar{r}_s, \bar{r}_K and var(r_K) being obtained exactly from equations (5), (7) and (8), respectively, and for var(r_s) we have used the smoothed observed values from Table 1. The ratios are least constant for n = 10 where, in particular, there is a definite increase for ρ = 0.9. Further useful comment must await the calculation of κ_3(r) and κ_4(r) and a fuller study of the expansions for the cumulants of z, but in the meantime we have felt no hesitation in going further with the use of the z-transforms.
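The ratio var(r)/(1 - \bar{r}^2)^2 tabulated in Table 2 can be computed exactly for r_K from (7) and (8). A sketch, with an illustrative check against the first tabled entry; the function name is ours.

```python
# Sketch of the first-approximation ratio var(r_K)/(1 - rbar_K^2)^2 underlying Table 2.
import math

def zk_var_first_approx(rho, n):
    rbar = 2.0 / math.pi * math.asin(rho)                                          # equation (7)
    b = 2.0 / math.pi * math.asin(rho / 2)
    var_rk = 2.0 / (n * (n - 1)) * (1 - rbar**2 + 2 * (n - 2) * (1.0 / 9 - b**2))  # equation (8)
    return var_rk / (1 - rbar**2) ** 2

# For n = 10, rho = 0.1 this should agree with the tabled 0.0618 to the quoted accuracy.
print(round(zk_var_first_approx(0.1, 10), 4))
```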
Table 2. First approximations to the variance of z_s and z_K

            var(r_s)* / (1 - \bar{r}_s^2)^2       var(r_K)† / (1 - \bar{r}_K^2)^2
  ρ         n = 10    n = 30    n = 50       n = 10    n = 30    n = 50
  0.1       0.111     0.035     0.021        0.0618    0.0166    0.00952
  0.2       0.112     0.036     0.021        0.0619    0.0166    0.00950
  0.3       0.115     0.037     0.022        0.0622    0.0166    0.00947
  0.4       0.120     0.037     0.022        0.0627    0.0165    0.00943
  0.5       0.124     0.037     0.022        0.0634    0.0165    0.00937
  0.6       0.126     0.037     0.023        0.0644    0.0164    0.00930
  0.7       0.130     0.038     0.022        0.0662    0.0164    0.00920
  0.8       0.141     0.039     0.022        0.0697    0.0164    0.00910
  0.9       0.155     0.043     0.022        0.0787    0.0168    0.00906

  Mean, ρ = 0.1-0.8
            0.1224    0.0370    0.0219       0.06404   0.01649   0.00936

* var(r_s) obtained from smoothing the experimental values.
† var(r_K) is the correct theoretical value.

Frequency tables of the distributions of zs and zK have been obtained and the following
sections are concerned with comments on the mean values and variances obtained from
these tables and with the normality of the distributions.

(3.2) The mean values of z_s = tanh^{-1} r_s and z_K = tanh^{-1} r_K


In Table 3 we compare
(a) the observed mean values of z_s found from the experimental data;
(b) the approximate theoretical value of E(z_s) given by the first two terms of (13), namely

E(z_s) \simeq \tanh^{-1}\bar{r}_s + \frac{\bar{r}_s\, var(r_s)}{(1-\bar{r}_s^2)^2},   (15)

where \bar{r}_s is calculated from (5) and var(r_s) is the smoothed observed value already referred to;
(c) the second or 'corrective term' from the right-hand side of (15).
Owing to the fact that, in a few samples of 10, the rankings of the two variates were in perfect agreement, some values of r_s (and r_K) are unity and the corresponding z_s (and z_K) become infinite.* The means and variances tabled omit these observations, which in any case form a very small part of a distribution of 2500 observations. We first, however, made estimates of the mean and variance of z_s using the technique for a censored distribution, but the difference in results was not large enough to be of importance. Having regard to the standard errors quoted below the table† it will be seen that the differences between observation and approximate theory are not significant except perhaps in the case of ρ = 0.9. The corrective term is of some importance in small samples with large ρ, but is steadily reduced in importance as n increases. In the case of the transformed product moment correlation coefficient a similar, if less important, effect is present. Table 4 gives similar results for the mean values of z_K, except that in this case the true values of var(r_K) may be derived from equation (8).

* This happened in seven cases for ρ = 0.9, in three cases for ρ = 0.8 and once for ρ = 0.7.
† The standard error of \bar{z} is σ_z/√N, where the averaged values of σ_z given below Table 5 for z_s and Table 6 for z_K have been used and N = 2500, 833 and 500, respectively.
Table 3. Mean values of z_s

[Columns, for each of n = 10, 30 and 50: observed mean; approximate theory; corrective term.]

Approximate theory = tanh^{-1}\bar{r}_s + \bar{r}_s var(r_s)/(1-\bar{r}_s^2)^2, using \bar{r}_s from equation (5) and the smoothed observed var(r_s).
Corrective term = second term in expression for approximate theory.
Standard errors of observed means: for n = 10 about 0.008; for n = 30, 50, about 0.007.

Table 4. Mean values of z_K

[Columns, for each of n = 10, 30 and 50: observed mean; approximate theory; corrective term. Of these, the observed and approximate-theory columns for n = 30 read:]

  ρ      Obs.     Approx. theory
  0.1    0.066    0.065
  0.2    0.131    0.131
  0.3    0.199    0.200
  0.4    0.275    0.273
  0.5    0.346    0.352
  0.6    0.439    0.442
  0.7    0.551    0.549
  0.8    0.692    0.688
  0.9    0.917    0.905

Approximate theory = tanh^{-1}\bar{r}_K + \bar{r}_K var(r_K)/(1-\bar{r}_K^2)^2, where \bar{r}_K and var(r_K) are derived from equations (7) and (8).
Corrective term = second term in expression for approximate theory.
Standard errors of observed means: for n = 10 about 0.0055; for n = 30, 50 about 0.0045.
* Ignoring the few infinite values.
Table 5. Observed variance and standard deviation of z_s

                       Variance                           S.D.
  ρ          n = 10     n = 30     n = 50       n = 10   n = 30   n = 50
  0.1        0.1380     0.0365     0.0204       0.371    0.191    0.143
  0.2        0.1407     0.0378     0.0239       0.375    0.194    0.155
  0.3        0.1473     0.0405     0.0238       0.384    0.201    0.154
  0.4        0.1528     0.0374     0.0226       0.391    0.193    0.150
  0.5        0.1537     0.0407     0.0230       0.392    0.202    0.152
  0.6        0.1507     0.0389     0.0246       0.388    0.197    0.157
  0.7        0.1423*    0.0380     0.0213       0.377    0.195    0.146
  0.8        0.1643*    0.0409     0.0227       0.405    0.202    0.161
  0.9        0.1700*    0.0465     0.0235       0.412    0.216    0.153

  Mean, ρ = 0.1-0.8
             0.14872    0.03884    0.02279      0.385    0.197    0.151
  1.0296/√(n-3)                                 0.389    0.198    0.150

Standard errors of S.D.: n = 10, 0.0055; n = 30, 0.0049; n = 50, 0.0047.
* Ignoring the few infinite values.

Table 6. Observed variance and standard deviation of z_K

                       Variance                            S.D.
  ρ          n = 10     n = 30     n = 50       n = 10    n = 30    n = 50
  0.1        0.06830    0.01700    0.00933      0.2613    0.1304    0.0966
  0.2        0.06884    0.01758    0.01074      0.2624    0.1327    0.1036
  0.3        0.07290    0.01830    0.01049      0.2700    0.1353    0.1024
  0.4        0.07446    0.01639    0.00991      0.2729    0.1280    0.0996
  0.5        0.07443    0.01712    0.00966      0.2728    0.1308    0.0983
  0.6        0.07384    0.01628    0.00985      0.2717    0.1276    0.0992
  0.7        0.06949*   0.01514    0.00822      0.2636    0.1230    0.0907
  0.8        0.08126*   0.01551    0.00824      0.2851    0.1245    0.0908
  0.9        0.08910*   0.01712    0.00780      0.2985    0.1308    0.0883

  Mean, ρ = 0.1-0.8
             0.07294    0.01667    0.00956      0.2700    0.1290    0.0977
  0.6611/√(n-4)                                 0.2699    0.1297    0.0975

Standard errors of S.D.: n = 10, 0.0038; n = 30, 0.0032; n = 50, 0.0031.
* Ignoring the few infinite values.
Again, the differences between observation and the approximation appear only to be significant when ρ = 0.9. The corrective term is important for n = 10 and ρ large; it is smaller in proportion than the corresponding term in the approximation to mean z_s.
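For z_K the two-term approximation used in Table 4 is available entirely in closed form, since \bar{r}_K and var(r_K) come from (7) and (8). A sketch, with illustrative arguments; the function name is ours.

```python
# Sketch of the two-term approximation to E(z_K):
# tanh^{-1}(rbar_K) + rbar_K var(r_K)/(1 - rbar_K^2)^2, with rbar_K and var(r_K) from (7), (8).
import math

def mean_zk_approx(rho, n):
    rbar = 2.0 / math.pi * math.asin(rho)
    b = 2.0 / math.pi * math.asin(rho / 2)
    var_rk = 2.0 / (n * (n - 1)) * (1 - rbar**2 + 2 * (n - 2) * (1.0 / 9 - b**2))
    corrective = rbar * var_rk / (1 - rbar**2) ** 2
    return math.atanh(rbar) + corrective, corrective

# Approximate mean of z_K and the corrective term for rho = 0.5, n = 10 (illustrative values).
print(mean_zk_approx(0.5, 10))
```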
(3.3) The variances and standard deviations of z_s and z_K
Tables 5 and 6 contain the observed variances and standard deviations for the transformed variables*. Comparison with Table 2 shows that the first term in the expansions for var(z_s) and var(z_K) is definitely not adequate when n = 10 and still somewhat in defect for the larger samples. These points are brought out by a comparison of the mean values given at the bottom of the tables for the eight cases ρ = 0.1 to 0.8. Apart from the extreme case with ρ = 0.9, the change in the variance of z with ρ is not very great. We shall not attempt now to discuss the changes further. The figures, however, suggest that for most practical purposes if ρ < 0.8 it will be justifiable to assume a constant variance for z, for any given sample size not greatly exceeding 50. The expressions given below are not, however, to be regarded as asymptotic results.

* For n = 30, 50 the variances tabled are m_2, but for n = 10 they are k_2 = m_2 N/(N-1).
Assuming that we may use the observed mean values given at the bottom of Tables 5 and 6, we may then look for a general empirical expression for the variance of the form

var(z) = a/(n - b),

where b is an integer. We suggest the use of the following:
For Spearman's coefficient

var(z_s) = \frac{1.060}{n-3}, \qquad \sigma_z = \frac{1.03}{\sqrt{n-3}};

For Kendall's coefficient

var(z_K) = \frac{0.437}{n-4}, \qquad \sigma_z = \frac{0.66}{\sqrt{n-4}}.
The resulting approximations for the standard deviations of z_s and z_K when n = 10, 30 and 50 are given at the bottom of the right-hand side of Tables 5 and 6, where they may be compared with the individual sampled values and the means of the latter for ρ = 0.1 to 0.8. We think that, except for ρ > 0.8, the approximation can be safely used in tests of significance for 10 ≤ n ≤ 50, provided, of course, that the underlying conditions discussed in § 1.3 are applicable to the data.
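As one example of such a test, the sketch below forms an approximate normal-theory interval for E(z_s) from a single observed r_s, treating z_s as normal with variance 1.060/(n - 3), and maps the endpoints back through tanh. The numerical inputs are invented for illustration.

```python
# Sketch of an approximate interval for E(z_s) from a single observed r_s.
# The observed r_s and the sample size are invented for illustration.
import math
from scipy.stats import norm

def zs_interval(rs, n, level=0.95):
    z = math.atanh(rs)
    half = norm.ppf(0.5 + level / 2) * math.sqrt(1.060 / (n - 3))
    return z - half, z + half

lo, hi = zs_interval(rs=0.6, n=30)
print((lo, hi), (math.tanh(lo), math.tanh(hi)))  # on the z scale, and mapped back by tanh
```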

In the case of n = 10 three difficulties arise in examining the fit of a normal curve to the experimental distributions. In the first place, as mentioned above, in a few samples there was complete agreement between the two rankings, so that r_s and r_K were unity and z_s and z_K were consequently infinite. These observations have to be omitted in calculating the moments of z; alternatively, moments could be estimated using the technique for dealing with truncated observations. Secondly, while possible values for r_s and r_K are equally spaced, the possible values of z_s and z_K occur at intervals which increase with the z value. For n = 10, where the number of permissible values is relatively small, it is a little difficult to know what criterion of normality to adopt. Finally, the distributions of r_s exhibit, particularly for low values of ρ, the 'saw-edged' character noted by Kendall (1948, p. 47) in the case of independence. These factors all make it difficult to know how to assess the importance of excessive values of χ² when comparing the observed distributions with fitted normal curves. Although we have not yet available all the values of β₂(z), it appears, as in the case of the z-transform of the product moment correlation coefficient, that the z distributions are somewhat leptokurtic (β₂ > 3).
For n = 30 and 50 we have fitted a certain number of normal curves to the z distributions. The result of applying the χ² test for goodness of fit is summarized in Table 7. Apart from three values of χ² which are over 30, the fits appear very reasonable. It is clear that the matter needs further investigation, but we doubt whether even in samples as small as 10, the assumption of a normal z distribution will lead to any serious misinterpretation of a significance test.
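A reader can repeat a check of this kind by simulation. The sketch below fits a normal curve to simulated z_K values at n = 30 and applies a χ² comparison of binned counts, in the spirit of Table 7; the bin choice, replicate count and seed are arbitrary, and the reported degrees of freedom ignore the two fitted parameters.

```python
# Sketch of a chi-squared check of normality for simulated z_K values at n = 30.
import numpy as np
from scipy.stats import kendalltau, norm, chisquare

rng = np.random.default_rng(2)
rho, n, n_rep = 0.5, 30, 800                      # arbitrary illustrative choices
cov = [[1.0, rho], [rho, 1.0]]

zk = []
for _ in range(n_rep):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    zk.append(np.arctanh(kendalltau(x, y).correlation))
zk = np.array(zk)

edges = np.quantile(zk, np.linspace(0, 1, 11))    # ten equal-count bins
obs, _ = np.histogram(zk, bins=edges)
cdf = norm.cdf(edges, loc=zk.mean(), scale=zk.std(ddof=1))
exp = np.diff(cdf)
exp *= obs.sum() / exp.sum()                      # match totals for the chi-squared comparison
# chisquare reports k - 1 degrees of freedom; strictly two more are lost to the
# fitted mean and standard deviation.
print(chisquare(obs, exp))
```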

Table 7. Normal curve fits to observed distributions of z_s and z_K

In broad terms the power of discrimination of any one of the possible correlation measures depends upon the rapidity with which its sampling distributions draw clear of one another as the population ρ changes. If, for example, for a given value of n, the distribution of r_s for ρ = 0.2 does not sensibly overlap the distribution for ρ = 0.8, then if a single sample of n is drawn from each population a test of significance will always establish a difference in population ρ values. The amount of overlap can of course be seen most directly in the distributions of r_s and r_K (or S_s and P_K) which we hope to publish later.

If the distributions of the z's were normal with a standard deviation σ_z which is fixed for a given sample size, the efficiency of discrimination would depend on the way in which the scale of mean z expressed in standard measure (i.e. E(z)/σ_z) opened out as ρ is increased from 0 to 1. Without assuming a constant σ_z, we can obtain a rough measure of local sensitivity by calculating the ratios (\bar{z}_2 - \bar{z}_1)/\sqrt{s_1^2 + s_2^2} of
(a) the differences between pairs of consecutive observed means given in Table 3 (or 4), to
(b) the square roots of the sums of the corresponding pairs of observed variances from Table 5 (or 6).
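The same ratio can be formed from simulated data. Below is a sketch for Kendall's z at n = 30 and a step from ρ = 0.4 to ρ = 0.5, with arbitrary replicate counts and seed; the result should be broadly comparable in magnitude with the corresponding entry of Table 8, though the tabled values use the observed means and variances of the original experiment.

```python
# Sketch of the sensitivity ratio (zbar_2 - zbar_1)/sqrt(s_1^2 + s_2^2) from simulated z_K values.
import numpy as np
from scipy.stats import kendalltau

def zk_sample(rho, n, n_rep, rng):
    cov = [[1.0, rho], [rho, 1.0]]
    out = []
    for _ in range(n_rep):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        out.append(np.arctanh(kendalltau(x, y).correlation))
    return np.array(out)

rng = np.random.default_rng(3)
z1 = zk_sample(rho=0.4, n=30, n_rep=500, rng=rng)
z2 = zk_sample(rho=0.5, n=30, n_rep=500, rng=rng)
print((z2.mean() - z1.mean()) / np.sqrt(z1.var(ddof=1) + z2.var(ddof=1)))
```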
These ratios are given in Table 8 for both Spearman's and Kendall's z. We have also given corresponding ratios for the product moment correlation coefficient, taking E(z) and σ_z from the full Fisher expansions as corrected by Gayen (1951, p. 236). Having regard to sampling fluctuations, it is clear that we cannot establish any difference in sensitivity between the two rank coefficients for n = 10. At n = 30 and 50 and for ρ > 0.6 the ratio for z_K is consistently larger than for z_s, which suggests a possible advantage for Kendall's coefficient. More detailed examination of this point is however needed. It will be noted, as expected, that the product moment coefficient is throughout more sensitive to changes in ρ than either of the rank coefficients. In all cases, for a given difference in ρ, the power of discrimination increases with ρ.

Table 8. Sensitivity ratios (\bar{z}_2 - \bar{z}_1)/\sqrt{s_1^2 + s_2^2} for different coefficients

                       n = 10                         n = 30                         n = 50
  ρ1, ρ2     Product  Spearman  Kendall     Product  Spearman  Kendall     Product  Spearman  Kendall
             moment                         moment                         moment
  0.1, 0.2    0.205    0.191     0.188       0.383    0.359     0.352       0.501    0.452     0.447
  0.2, 0.3    0.214    0.202     0.198       0.399    0.362     0.359       0.523    0.489     0.484
  0.3, 0.4    0.228    0.206     0.208       0.427    0.405     0.406       0.558    0.503     0.508
  0.4, 0.5    0.250    0.198     0.188       0.469    0.384     0.387       0.615    0.542     0.538
  0.5, 0.6    0.286    0.263     0.270       0.537    0.508     0.508       0.704    0.655     0.664
  0.6, 0.7    0.345    0.315     0.308       0.649    0.622     0.636       0.851    0.809     0.828
  0.7, 0.8    0.456    0.345     0.348       0.861    0.776     0.805       1.13     1.04      1.08
  0.8, 0.9    0.732    0.592     0.592       1.39     1.20      1.24        1.82     1.68      1.74

Concluding remarks
Besides putting on record the basic sampling distributions we hope in a further paper to carry our investigations further in a number of directions, in particular to give parallel results for the coefficient r_F of equation (4).

We should like to express our great indebtedness to Miss M. U. Thomas and Mr T. Vickers, whose work has already been mentioned, to Mrs Esmé Hill, formerly of the Statistical Advisory Unit, Ministry of Supply, and to Mrs Maxine Merrington and Miss Janet Hall of University College London. Finally, we should like to say how much we owe to the co-operation of the Mathematics Division of the National Physical Laboratory for the facilities which made this investigation possible.

REFERENCES
DANIELS, H. E. (1950). J. R. Statist. Soc. B, 12, 171-81.
DAVID, S. T., KENDALL, M. G. & STUART, A. (1951). Biometrika, 38, 131-40.
ESSCHER, F. (1924). Skand. Aktuartidskr. 7, 201-19.
FIELLER, E. C., LEWIS, T. & PEARSON, E. S. (1955). Correlated Random Normal Deviates. Tracts for Computers, no. XXVI. Cambridge University Press.
FISHER, R. A. & YATES, F. (1938). Statistical Tables for Biological, Agricultural and Medical Research (5th edition, 1957). Edinburgh: Oliver and Boyd.
GAYEN, A. K. (1951). Biometrika, 38, 219-47.
GREINER, R. (1909). Z. Math. Phys. 57, 121, 225, 337.
HOEFFDING, W. (1951). Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 83-92. University of California Press.
JEFFREYS, H. (1948). Theory of Probability. Oxford University Press.
JOHNSON, N. L. (1949). Biometrika, 36, 297-304.
KENDALL, M. G. (1948). Rank Correlation Methods (2nd edition, 1955). London: Charles Griffin and Co., Ltd.
KENDALL, M. G. (1949). Biometrika, 36, 177-93.
MALLOWS, C. L. (1957). Biometrika, 44, 114-30.
MORAN, P. A. P. (1948). Biometrika, 35, 203-6.
MORAN, P. A. P. (1950). J. R. Statist. Soc. B, 12, 153-62.
SUNDRUM, R. M. (1953). Biometrika, 40, 409-20.
