Correlation
Correlation coefficient:
A correlation coefficient is a numerical measure of correlation, that is, of a
statistical relationship between two variables.
Methods of studying correlation:
1. Scatter diagram
2. Karl Pearson’s coefficient of correlation
3. Spearman’s rank coefficient of correlation
4. Least squares method
Scatter diagram method:
A scatter diagram is a useful technique for visually examining the form of a
relationship without calculating any numerical value. In this technique, the values
of the two variables are plotted as points on graph paper. The scatter diagram shows
how closely the two variables are related and indicates the direction of change in
the respective variables: the pattern of the plotted points shows the degree of
correlation.
[Figure: scatter diagrams of Y against X showing perfect positive, perfect negative, highly positive, highly negative, low positive, low negative and no correlation.]
In a scatter diagram, the closeness of the points to one another and their overall
direction enable us to examine the relationship.
If all the points lie on a line, the correlation is perfect and is said to be unity
(r = +1 or r = -1). The points are then no longer scattered around an upward rising
or downward falling line; they lie on the line itself. These cases are referred to as
perfect positive correlation and perfect negative correlation, respectively.
The correlation is said to be linear if the scatter points lie near a line or on a line.
If the points are scattered around an upward rising line, the variables move in the
same direction: when X rises, Y also rises. This is positive correlation.
If the points are scattered around a downward sloping line, the variables move in
opposite directions: when X rises, Y falls, and vice versa. This is negative correlation.
If the scatter points are widely dispersed around the line, the correlation is low.
If there is no upward rising or downward sloping line around which the points are
scattered, there is no correlation.
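A scatter diagram can be sketched without a graphics library by placing one mark per (X, Y) pair on a text grid. The helper `ascii_scatter` below is hypothetical, written only for illustration, and assumes each axis is scaled linearly onto a fixed grid; like the method itself, it draws the diagram and computes no correlation value.

```python
def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Return a text scatter diagram of the pairs (xs[i], ys[i]).

    X increases to the right and Y increases upwards, as on graph paper.
    Each value is scaled linearly onto a grid of width x height cells.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all X are equal
    y_span = (y_max - y_min) or 1  # likewise for Y
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top (largest Y)
        grid[row][col] = mark
    # "|" marks the Y axis on the left; the last line is the X axis
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Price (X) and demand (Y) from Problem 1 below
for line in ascii_scatter([55, 23, 37, 28, 42], [100, 120, 165, 35, 180], width=30, height=8):
    print(line)
```

Points drifting from the lower left to the upper right suggest positive correlation; points drifting from the upper left to the lower right suggest negative correlation.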
Probable Error:
Probable error arises because results are obtained from a sample taken from the
population rather than from the population as a whole. Since it is usually not
possible to consider the entire population in a statistical analysis, the sample
result differs slightly from the actual (population) result.
Definition:
The probable error (P.E.) is the value which is added to or subtracted from the
coefficient of correlation (r) to get the upper limit and the lower limit, respectively,
within which the population value of the correlation is expected to lie.
The probable error of the correlation coefficient can be obtained by applying the
following formula:
$$\text{P.E.} = 0.6745\,\frac{1 - r^2}{\sqrt{n}}$$
where n is the number of pairs of observations.
The factor 0.6745 is used because, in a normal distribution, 50% of the
observations lie in the range μ ± 0.6745σ, where μ is the population mean and σ is the
standard deviation.
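The formula can be evaluated directly, and the factor 0.6745 can be checked against the standard normal distribution: it is the 75th percentile z, so μ ± zσ encloses the middle 50%. This is a minimal sketch; `probable_error` is a hypothetical helper name, while `NormalDist` comes from Python's standard `statistics` module (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r**2) / sqrt(n) for a correlation r from n pairs."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# 25% of a standard normal lies between the mean and z, so 50% lies within +/- z
z = NormalDist().inv_cdf(0.75)
print(round(z, 4))  # 0.6745

# Probable error for r = 0.24479 computed from n = 5 pairs
print(probable_error(0.24479, 5))
```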
If the value of r is less than 6(P.E.), the coefficient of correlation is not
significant, so no correlation between the variables can be inferred.
If the value of r is more than six times the probable error (r > 6(P.E.)), the
correlation is said to be certain, i.e. the value of r is significant.
By adding the value of P.E. to r and subtracting it from r, we get the upper limit
and the lower limit, respectively, within which the population correlation
coefficient is expected to lie:
$$\rho = r \pm \text{P.E.}$$
where ρ (rho) denotes the correlation in the population.
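The limits and the significance rule above can be sketched as two small functions. The names `correlation_limits` and `is_significant` are hypothetical, the inputs r = 0.9 and n = 100 are illustrative values not taken from the text, and comparing |r| (rather than r) against 6(P.E.) is an assumption so that the rule also covers negative correlation.

```python
from math import sqrt


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def correlation_limits(r, n):
    """Lower and upper limits r - P.E. and r + P.E. for the population rho."""
    pe = probable_error(r, n)
    return r - pe, r + pe


def is_significant(r, n):
    """Rule from the text: r is significant when it exceeds 6 * P.E.

    |r| is used so the rule also applies to negative r (an assumption).
    """
    return abs(r) > 6 * probable_error(r, n)


lower, upper = correlation_limits(0.9, 100)
print(round(lower, 4), round(upper, 4))  # 0.8872 0.9128
print(is_significant(0.9, 100))
```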
The probable error can be used only when the following three conditions are
fulfilled:
1. The data must approximate a bell-shaped curve, i.e. a normal frequency
curve.
2. The statistical measure for which the probable error is computed must have been
calculated from a sample.
3. The sample must have been selected in an unbiased (random) manner, and the
individual items must be independent.
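Assumed Mean:
When deviations are taken from assumed means, r can be found without calculating
the standard deviations:
$$r = \frac{\sum d_x d_y - \dfrac{(\sum d_x)(\sum d_y)}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$$
where dx = X - A and dy = Y - B are the deviations of the X and Y values from their
assumed means A and B, and N is the number of pairs of observations.
or, with deviations taken from the actual means,
$$r = \frac{\sum d_x d_y}{n\,\sigma_x \sigma_y}$$
where dx = X - X̄
dy = Y - Ȳ
σx = standard deviation of the X variable (computed with divisor n)
σy = standard deviation of the Y variable (computed with divisor n)
n = number of items.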
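Both forms of the formula can be checked numerically. In this sketch the function names are hypothetical, the data are the price and demand figures of Problem 1 below, and the assumed means 40 and 100 are arbitrary choices; any assumed means give the same r. `fmean` and `pstdev` are from Python's standard `statistics` module, and `pstdev` divides by n, matching the n in the second formula.

```python
from math import sqrt
from statistics import fmean, pstdev


def pearson_r_assumed_mean(xs, ys, a, b):
    """Karl Pearson's r from deviations dx = X - a, dy = Y - b about assumed means."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator


def pearson_r_sd(xs, ys):
    """r = sum(dx*dy) / (n * sigma_x * sigma_y) with deviations from the actual means."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    sum_dxdy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # pstdev divides by n, matching the n in the denominator
    return sum_dxdy / (n * pstdev(xs) * pstdev(ys))


price = [55, 23, 37, 28, 42]
demand = [100, 120, 165, 35, 180]
print(pearson_r_assumed_mean(price, demand, 40, 100))
print(pearson_r_sd(price, demand))
```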
Problems:
1. Find Karl Pearson’s coefficient of correlation between price and demand from the
following data, and interpret the result.
Price (₹)        55    23    37    28    42
Demand (tons)   100   120   165    35   180
Solution:
Price X (₹)   Demand Y (tons)   dx = X - X̄   dy = Y - Ȳ   dx.dy   dx²   dy²
55            100               18           -20          -360    324   400
23            120               -14          0            0       196   0
37            165               0            45           0       0     2025
28            35                -9           -85          765     81    7225
42            180               5            60           300     25    3600
∑X = 185      ∑Y = 600                                    ∑dx.dy = 705   ∑dx² = 626   ∑dy² = 13250

X̄ = ∑X/n = 185/5 = 37
Ȳ = ∑Y/n = 600/5 = 120
$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{705}{\sqrt{626 \times 13250}} = \frac{705}{2880.02} = 0.24479$$
r = 0.24479.
Conclusion:
There is a low positive correlation between the two variables.
Practice:
1. Find Karl Pearson’s coefficient of correlation from the following data and
interpret the result.
X   12   25   26   14   18
Y   35   40   13   27   25
Problems:
1. Find the rank correlation for the following data.
X   58   69   85   45   36   82   95   110   26   84
Y   62   89   54   75   81   24   67    99   42   76
Solution:
X     Y    Rx   Ry   d = Rx - Ry   d²
58    62   4    4     0            0
69    89   5    9    -4            16
85    54   8    3     5            25
45    75   3    6    -3            9
36    81   2    8    -6            36
82    24   6    1     5            25
95    67   9    5     4            16
110   99   10   10    0            0
26    42   1    2    -1            1
84    76   7    7     0            0
                     ∑d = 0        ∑d² = 128
Here the ranks are not repeated and n = 10.
$$\rho = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6(128)}{10^3 - 10} = 1 - \frac{768}{990} = 1 - 0.776 = 0.224$$
ρ = 0.224
Conclusion:
There is a very low positive correlation between the two variables.
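The ranking and the formula can be sketched as follows, assuming (as in this problem) that no value is repeated, with rank 1 given to the smallest value. The function names are hypothetical.

```python
def ranks_no_ties(values):
    """Rank 1 = smallest value; valid only when no value is repeated."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n^3 - n), with d = Rx - Ry."""
    rx, ry = ranks_no_ties(xs), ranks_no_ties(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(xs)
    return sum_d2, 1 - 6 * sum_d2 / (n ** 3 - n)


x = [58, 69, 85, 45, 36, 82, 95, 110, 26, 84]
y = [62, 89, 54, 75, 81, 24, 67, 99, 42, 76]
sum_d2, rho = spearman_rho(x, y)
print(sum_d2, round(rho, 3))  # 128 0.224
```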
2. Third-year B.Sc. Nursing students obtained the following marks in statistics and
language. Find the rank correlation between the two subjects.
Statistics   85   98   85   69   73   99   100   68   100   95   86   100
Language     75   62   45   95   75   68    95   82    46   51   89    95
Solution:
Statistics (x)   Language (y)   Rx                  Ry                  d = Rx - Ry   d²
85               75             4.5 = (4+5)/2       6.5 = (6+7)/2       -2            4
98               62             8                   4                    4            16
85               45             4.5 = (4+5)/2       1                    3.5          12.25
69               95             2                   11 = (10+11+12)/3   -9            81
73               75             3                   6.5 = (6+7)/2       -3.5          12.25
99               68             9                   5                    4            16
100              95             11 = (10+11+12)/3   11 = (10+11+12)/3    0            0
68               82             1                   8                   -7            49
100              46             11 = (10+11+12)/3   2                    9            81
95               51             7                   3                    4            16
86               89             6                   9                   -3            9
100              95             11 = (10+11+12)/3   11 = (10+11+12)/3    0            0
                                                                                      ∑d² = 296.5
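Here the ranks are repeated: in Statistics, 85 occurs twice (m1 = 2) and 100 occurs three
times (m2 = 3); in Language, 75 occurs twice (m3 = 2) and 95 occurs three times (m4 = 3).
m1 = 2, m2 = 3, m3 = 2, m4 = 3 and n = 12.
$$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \frac{m_3^3 - m_3}{12} + \frac{m_4^3 - m_4}{12}\right]}{n^3 - n}$$
n³ - n = n(n² - 1) = 12(144 - 1) = 1716
$$= 1 - \frac{6\left[296.5 + \frac{8 - 2}{12} + \frac{27 - 3}{12} + \frac{8 - 2}{12} + \frac{27 - 3}{12}\right]}{1728 - 12}$$
$$= 1 - \frac{6\,[296.5 + 0.5 + 2 + 0.5 + 2]}{1716} = 1 - \frac{6(301.5)}{1716} = 1 - \frac{1809}{1716} = 1 - 1.05$$
ρ = -0.05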
Conclusion:
There is a very low negative correlation between the two subjects.
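With repeated values, each tied value receives the mean of the ranks it occupies, and (m³ - m)/12 is added to ∑d² for every group of m repeated values, as in the calculation above. A minimal sketch follows; the function names are hypothetical, and `Counter` from Python's standard `collections` module is used to find the tied groups.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1       # lowest rank position taken by v
        m = ordered.count(v)               # number of times v is repeated
        ranks.append(first + (m - 1) / 2)  # mean of first, ..., first + m - 1
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho_ties(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(xs)
    correction = tie_correction(xs) + tie_correction(ys)
    return sum_d2, 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


stats_marks = [85, 98, 85, 69, 73, 99, 100, 68, 100, 95, 86, 100]
lang_marks = [75, 62, 45, 95, 75, 68, 95, 82, 46, 51, 89, 95]
sum_d2, rho = spearman_rho_ties(stats_marks, lang_marks)
print(sum_d2, round(rho, 2))  # 296.5 -0.05
```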
3. Ten competitors in an I.Q. contest are ranked by three judges in the following
order.
Judge 1   10   8   3   1   5   7   4   6    9   2
Judge 2    9   7   8   5   6   8   2   3   10   1
Judge 3    9   8   6   3   2   7   5   1   10   4
Using rank correlation, determine which pair of judges has the nearest approach
to common taste in I.Q.
Solution:
Judge 1   Judge 2   Judge 3   R1 - R2      R2 - R3      R1 - R3
(R1)      (R2)      (R3)      d     d²     d     d²     d     d²
10        9         9          1    1       0    0       1    1
8         7         8          1    1      -1    1       0    0
3         8         6         -5    25      2    4      -3    9
1         5         3         -4    16      2    4      -2    4
5         6         2         -1    1       4    16      3    9
7         8         7         -1    1       1    1       0    0
4         2         5          2    4      -3    9      -1    1
6         3         1          3    9       2    4       5    25
9         10        10        -1    1       0    0      -1    1
2         1         4          1    1      -3    9      -2    4
                                    ∑d² = 60     ∑d² = 48     ∑d² = 54
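$$\rho_{12} = 1 - \frac{6(60)}{10^3 - 10} = 1 - \frac{360}{990} = 0.636$$
$$\rho_{23} = 1 - \frac{6(48)}{10^3 - 10} = 1 - \frac{288}{990} = 0.709$$
$$\rho_{13} = 1 - \frac{6(54)}{10^3 - 10} = 1 - \frac{324}{990} = 0.673$$
Since ρ23 is the largest, Judges 2 and 3 have the nearest approach to common taste in I.Q.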
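The three pairwise coefficients and the pair with the largest value can be computed in one pass over all pairs of judges. The ranks are entered exactly as printed in the problem; the dictionary layout and the function name are illustrative assumptions, and `combinations` comes from Python's standard `itertools` module.

```python
from itertools import combinations

# Ranks given by the three judges to the ten competitors, as printed in the problem
judges = {
    1: [10, 8, 3, 1, 5, 7, 4, 6, 9, 2],
    2: [9, 7, 8, 5, 6, 8, 2, 3, 10, 1],
    3: [9, 8, 6, 3, 2, 7, 5, 1, 10, 4],
}


def spearman_from_ranks(r1, r2):
    """rho = 1 - 6 * sum(d^2) / (n^3 - n) for two lists of ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


rho = {(i, j): spearman_from_ranks(judges[i], judges[j])
       for i, j in combinations(judges, 2)}
for pair, value in rho.items():
    print(pair, round(value, 3))
best_pair = max(rho, key=rho.get)
print("Nearest approach to common taste:", best_pair)
```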
(ii)
X   68   72   58   48   95   57   68   72   58   68
Y   28   45   68   75   84   75   45   94   93   29