
Sec 8-1

The document discusses Spearman's rank correlation coefficient and its application in analyzing relationships between ranked data. It includes exercises for ranking preferences, calculating differences, and determining correlation coefficients for various datasets. Additionally, it contrasts Spearman's method with Pearson's correlation, highlighting the types of data suitable for each and the implications of their results.

Uploaded by

san.naikar
Copyright
© © All Rights Reserved

8 TESTING FOR VALIDITY: SPEARMAN’S, HYPOTHESIS TESTING AND TEST FOR INDEPENDENCE

8.1 Spearman’s rank correlation coefficient
Ranking data
For each of the following lists, rank the
items in order of how much you like them,
with a "1" being the one you like most and
a "6" the one you like least.
You must rank them all, and ties are not
allowed!

Subject    Rank    Film genre     Rank    Sport                Rank
Maths              Sci-fi                 Soccer
English            Romance                Basketball
PE                 Documentary            Tennis
Art                Comedy                 Baseball
Science            Adventure              American football
Drama              Thriller               Table tennis

Now, find someone to compare your choices with.


In the following table, write down your ranks and your partner's ranks for the same items.
Then, in the other columns, work out the difference between the two
ranks (d), and the square of the difference (d²).
Finally, add up the values in the last column (d²) to get the total.

Item    You    Them    d    d²

                       Total:

Once you have this total, do the following:
• multiply it by 6
• divide the answer by 210
• subtract the answer from 1.
Your answer should be a number between -1 and +1.
Which other mathematical calculation in statistics has an answer
between -1 and +1?

International mindedness
Charles Spearman (1863-1945) was an English psychologist
who developed rank correlation, usually shown as the Greek letter ρ
(rho) or r_s, as a tool for psychiatry.

What do you think an answer close to -1 means?
What do you think an answer close to +1 means?
What do you think an answer close to 0 means?
What can you say about your answer?
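The three steps in the activity (multiply by 6, divide by 210, subtract from 1) match the usual formula for rankings without ties, r_s = 1 - 6Σd² / (n(n² - 1)), where n = 6 gives the divisor 6 × 35 = 210. The following is a minimal sketch of that calculation. The function name and the two rankings are made up for illustration and are not data from the text.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's coefficient from two untied rankings of the same n items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length (at least 2 items)")
    # Sum of squared rank differences, the "Total" of the d² column.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical rankings of six items by two people (assumed values).
you = [1, 2, 3, 4, 5, 6]
them = [2, 1, 4, 3, 6, 5]

n = len(you)
divisor = n * (n ** 2 - 1)  # 6 * 35 = 210 for six items
r_s = spearman_from_ranks(you, them)
print(divisor, round(r_s, 3))  # prints: 210 0.829
```

This shortcut assumes that no ties are allowed, as in the activity. When ranks are tied, the coefficient is found as the PMCC of the ranks instead.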
Consider the following sets of data:

Hours of study, x     0   1   2   3   4   5   6   7   8
Test results, y      43  50  52  70  68  75  81  78  92

[Scatter graph: test results, y, against hours of study, x]

A scatter graph of the data is shown here, and Pearson's
correlation coefficient is 0.97. So, there is a strong, positive
relationship between the hours of study and the test results.

Now consider this data:

Number of minutes, x          0    5   10   15   20   25   30   35   40   45
Temperature of oven, y °C     0  140  165  125  180  180  180  180  180  180

[Scatter graph: temperature of oven, y °C, against number of minutes, x]

A scatter graph of this data is shown here, and Pearson's
correlation coefficient is 0.73. So, there is only a moderate, positive
relationship between the number of minutes and the temperature
of the oven. However, the scatter graph indicates that there is an
exponential relationship between the data points. Pearson's
correlation coefficient only measures the strength of linear relationships.
So, is there another test to find the strength of relationships that
are not linear?


Investigation 1
Mould is grown in eight different petri dishes with different amounts of nutrients (X), and the area of the dish
covered in mould after 48 hours (Y) is recorded. The results are given in the table and also shown on the graph.

Dish    X       Y
A       5.68    6.00
B       1.04    0.50
C       2.22    0.26
D       4.20    2.84
E       3.66    1.44
F       6.22    8.20
G       4.22    4.20
H       8.00    8.60

[Graph: scatter plot of Y against X, with the points labelled A to H]
1 Use your GDC to graph these results.

2 Calculate Pearson's product moment correlation coefficient (PMCC) for this data and comment on
your results.
Now give each data point a rank, which is the position of the point if the data were listed in order of size for
each of the variables. For example, H would be ranked 1 for both X and Y. (It does not matter if we rank from
largest to smallest, like this, or from smallest to largest; the result will be the same.)

3 a Complete the following table showing the ranks for each of the data points.

          A   B   C   D   E   F   G   H
X rank                                1
Y rank                                1

b Use your GDC to graph these ranks.
c Calculate the value of PMCC for these ranks.
d Comment on your result, relating it to the particular shape of the graph.
In another experiment the temperature (T) is varied and the area of the petri dish covered after 48 hours
(Y) is recorded.

Dish    T        Y
A       4.95     4.50
B       10.49    4.86
C       16.40    4.36
D       19.80    3.86
E       23.90    3.38
F       22.20    3.14
G       32.30    3.06
H       36.40    0.22

4 a Use your GDC to graph these results.
b Calculate the value of PMCC for this data and comment on your results.
c Complete the following table showing the ranks for each of the data points.

          A   B   C   D   E   F   G   H
T rank                                1
Y rank                                8

d Use your GDC to graph these results.
e Calculate the value of PMCC for this data and comment on your results.
f Discuss the features of the data that led to this value.
The PMCC of the rank values is called Spearman’s rank correlation coefficient.

5 Factual What type of data is used for Spearman’s?

6 Factual What type of data is used for Pearson’s?

7 What do correlation coefficients tell you about the relationship between two
variables?
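The investigation notes that it does not matter whether you rank from largest to smallest or from smallest to largest, provided both variables are ranked in the same direction. The sketch below checks this on a small made-up data set. The helper names `average_ranks` and `pmcc` and the data values are assumptions for illustration, not part of the investigation.

```python
def average_ranks(values, largest_first=True):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(values, reverse=largest_first)
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical measurements (assumed values, not data from the investigation).
x = [3.1, 1.2, 4.8, 2.0, 5.5]
y = [2.0, 0.4, 7.9, 0.3, 6.1]

# Rank both variables largest-first, then both smallest-first.
r_largest_first = pmcc(average_ranks(x, True), average_ranks(y, True))
r_smallest_first = pmcc(average_ranks(x, False), average_ranks(y, False))
print(round(r_largest_first, 3), round(r_smallest_first, 3))  # prints: 0.8 0.8
```

Reversing the direction replaces every rank k by n + 1 - k in both lists, which leaves the PMCC of the ranks unchanged.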

The product moment correlation coefficient of the ranks of a set of data is
called Spearman’s rank correlation coefficient. The IB notation is r_s.

Spearman's correlation coefficient shows the extent to which one
variable increases or decreases as the other variable increases.

An r_s value of 1 means the set of data is strictly increasing, and a value of -1
means it is strictly decreasing. Data that is only increasing or only decreasing
is known as monotonic.
A value close to 0 suggests that the data is not consistently increasing or
decreasing.
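To illustrate the idea of monotonic data, the sketch below compares the PMCC of some raw values with the PMCC of their ranks. The data (powers of 2, and values of 100/(x + 1)) and the helper names are assumptions chosen for illustration: both sets are monotonic but clearly not linear.

```python
def average_ranks(values, largest_first=True):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(values, reverse=largest_first)
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's coefficient: the PMCC of the ranks."""
    return pmcc(average_ranks(xs), average_ranks(ys))


# Hypothetical monotonic data (assumed values).
x = list(range(8))
y_up = [2 ** v for v in x]          # strictly increasing, not linear
y_down = [100 / (v + 1) for v in x]  # strictly decreasing, not linear

pearson_up = pmcc(x, y_up)
rs_up = spearman(x, y_up)
rs_down = spearman(x, y_down)
print(round(pearson_up, 2), round(rs_up, 2), round(rs_down, 2))
```

In this sketch the PMCC of the raw increasing data is well below 1, while r_s is 1 for the increasing set and -1 for the decreasing set, because only the order of the values matters.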

Example 1
1 Find Spearman's rank correlation coefficient for the following sets of data.

a
Time spent training, x hours    23  34  17  23  29  45
Time to run 2 km, y min         12  10  14  11  11   8

b
Number of pets, x                                1   2   3   4   5
Time spent each week caring for them, y hours    6   7   8   8  16

2 A student was asked to rank nine different makes of burger in terms of which she liked best
to which she liked least. She put 1 for the one she liked best and 9 for the one she liked least.
These rankings and the costs of the burgers are given in the table.

Burger        A     B     C     D     E     F     G     H     I
Taste rank    7     3     4     6     1     9     2     5     8
Cost, US $    3.50  7.45  6.50  4.50  8.50  2.65  3.95  4.35  1.45

a Explain why you cannot use Pearson's in this example.
b Find Spearman's rank correlation coefficient for this data and comment on your answer.

1 a The ranks are

x    4.5   2   6   4.5   3     1
y    2     5   1   3.5   3.5   6

If we order the values of x by their size, we get their rank.
When more than one piece of data have the same value, the rank given to each is the
average of the ranks. For example, the two values of x equalling 23 here would have
ranks 4 and 5; hence, each is given a rank of (4 + 5)/2 = 4.5.

r_s = -0.956

The ranked data is put into a GDC and the PMCC obtained.

So, there is a strong, negative correlation. The more hours that you
train the faster you can run the 2 km.

b The ranks are

x    5   4   3     2     1
y    5   4   2.5   2.5   1

Often, when one of the variables increases at a fixed rate, for example measurements
taken at one minute intervals, the order of the ranks will be the reverse of the order of
the data.

r_s = 0.975

So, there is a strong, positive correlation. The more pets you have
the more hours it takes each week to look after them.

2 a Because the taste is given as ranks rather than as quantifiable data.
b Ranking the costs, you get:

Taste    7   3   4   6   1   9   2   5   8
Cost     7   2   3   4   1   8   6   5   9

and r_s = 0.8. So, there is a moderately strong relationship between the taste
and the cost of the burgers.
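The working for question 1 of Example 1 can be checked without a GDC. The sketch below ranks the data from largest to smallest, gives tied values the average of their ranks, and then finds the PMCC of the ranks. The helper names are assumptions for illustration; the data is taken from question 1.

```python
def average_ranks(values):
    """Rank values from largest (1) to smallest; ties share the average rank."""
    order = sorted(values, reverse=True)
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Question 1 a: time spent training (hours) and time to run 2 km (min).
training = [23, 34, 17, 23, 29, 45]
run_time = [12, 10, 14, 11, 11, 8]

# Question 1 b: number of pets and weekly hours spent caring for them.
pets = [1, 2, 3, 4, 5]
care_hours = [6, 7, 8, 8, 16]

ranks_training = average_ranks(training)   # the two 23s share rank 4.5
ranks_run_time = average_ranks(run_time)   # the two 11s share rank 3.5
r_a = pmcc(ranks_training, ranks_run_time)
r_b = pmcc(average_ranks(pets), average_ranks(care_hours))
print(round(r_a, 3), round(r_b, 3))  # prints: -0.956 0.975
```

Because both data sets contain ties, the coefficient is computed here as the PMCC of the averaged ranks rather than from the sum of squared rank differences.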

Spearman's rank correlation coefficient is only valid for the data given
in the question. If some data points are similar then any small changes
could affect the value of r_s.

TOK
What practical problems can or does mathematics try to solve?
1 Write down the value of Spearman's rank correlation coefficient for each of the sets of
data shown.
a [Graph: scatter plot with x-axis and y-axis both marked from 1 to 7; the plotted points are not recoverable]

2 A group of students is asked to rank six snack foods by taste and value for money.
The ranks are averaged and recorded in the following table.
Calculate Spearman's rank correlation coefficient for the data and comment on
your results.

         Popcorn   Crisps   Chocolate bar   Chews   Chocolate-chip cookie
Taste    2         4        1               5       3
Value    5         3        2               4       1

3 Find Spearman's rank correlation coefficient
for the following data sets.

x     0    5   10   15   20   25   30
y    23   18   10    9    ?    ?    ?

x    10   12    9    6    3   14    8
y    12   11    8    5    ?   14    9

4 A sports scientist is testing the relationship
between the speed of muscle movement
and the force produced. In 10 tests the
following data is collected.

[Graph: scatter plot with vertical axis Force (N/kg), points labelled by letter]

Point
• A = (0.25, 30.4)   • E = (1.52, 13.4)   • I = (2.93, 10.8)
• B = (0.51, 25.1)   • F = (2.02, 13.4)   • J = (3.17, 10.3)
• C = (0.69, 20)     • G = (2.43, 11.2)   • K = (-0.29, 25.07)
• D = (1.09, 15.6)   • H = (2.67, 10.2)   • L = (3.09, -2.44)

a Explain why it might not be appropriate
to use the PMCC in this case.
b Calculate Spearman's rank correlation
coefficient (r_s) for this data.
c Interpret the value of r_s and comment
on its validity.


5 A class took a mathematics test (marked
out of 80) and an English test (marked out
of 100), and the results are given in the
following table.

Maths      15  25  32  45  60  22  24  28  28  29  29
English    44  42  42  49  52  44  54  59  69  28  89

a Calculate the PMCC for this data and
comment on the result.
b Use graphing software to plot these
points on a scatter diagram and
comment on your result from a.
c Calculate Spearman's rank correlation
coefficient for this data and comment on
your result.
d State which is the more valid measure of
correlation, and give a reason.

6 In a blind tasting, customers are asked to rank
six different brands of coffee in terms of taste.
These rankings and the costs of the different
brands are given in the following table.

Brand         A    B    C    D    E    F
Taste rank    1    2    3    4    5    6
Cost          450  360  390  320  350  300

a Explain why you cannot use PMCC in
this case.
b Find Spearman's rank correlation
coefficient for this data and comment on
your answer.

7 Consider the following data set:

[Graph: scatter plot of the points A to J]

Point
• A = (0.82, 0.86)   • E = (2.46, 0.84)   • I = (2.98, 0.62)
• B = (1.28, 1.56)   • F = (2.48, 1.76)   • J = (7.46, 4.98)
• C = (1.78, 1.22)   • G = (2.02, 1.82)
• D = (1.46, 0.62)   • H = (3.02, 1.42)

For this data, calculate the PMCC:
i with the outlier J
ii without the outlier J.
Calculate Spearman's rank correlation
coefficient:
i with the outlier J
ii without the outlier J.
Comment on the results.

The advantages of Spearman’s rank correlation coefficient over the PMCC are:
• It can be used on data that is not linear.
• It can be used on data that has been ranked even if the original data is
unknown or cannot be quantified.
• It is not greatly affected by outliers.
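The last advantage can be illustrated numerically. The sketch below uses a small made-up data set with little relationship between x and y, plus one extreme point, and compares how much the PMCC and Spearman's coefficient change when that point is removed. The data values and helper names are assumptions for illustration only.

```python
def average_ranks(values):
    """Rank values from largest (1) to smallest; ties share the average rank."""
    order = sorted(values, reverse=True)
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]


def pmcc(xs, ys):
    """Pearson's product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's coefficient: the PMCC of the ranks."""
    return pmcc(average_ranks(xs), average_ranks(ys))


# Hypothetical data (assumed values); the last point is an outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 30]
y = [5, 3, 6, 2, 7, 4, 6.5, 3.5, 40]

r_with, r_without = pmcc(x, y), pmcc(x[:-1], y[:-1])
rs_with, rs_without = spearman(x, y), spearman(x[:-1], y[:-1])
print("PMCC:", round(r_with, 2), round(r_without, 2))
print("Spearman:", round(rs_with, 2), round(rs_without, 2))
```

In this sketch the single outlier moves the PMCC from a weak value to a very strong one, while Spearman's coefficient changes far less, because the outlier can only ever take the top rank no matter how extreme its value is.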
Developing inquiry skills
Can you use the PMCC or Spearman’s rank correlation coefficient to
compare the data in the opening scenario of this chapter, which looked at
tree heights in different forest areas?
Why, or why not?

Developing your toolkit
Now do the Modelling and investigation activity on page 418.
