Correlation

The document discusses different types of correlation between variables including positive, negative, and partial correlation. It also discusses methods for determining correlation coefficients including scatter diagrams, Karl Pearson's coefficient, and Spearman's rank coefficient.

Uploaded by

nitprincyj
Copyright
© © All Rights Reserved

CORRELATION

Correlation is the degree of relationship between two or more variables.


Types of correlation:
On the basis of direction:
1. Positive or Direct: If the values of the variables deviate in the same direction,
the correlation is said to be positive or direct, i.e. if one value
increases (or decreases), the other value also increases (or decreases).
Example: height and weight of a person.
2. Negative or Indirect: If the values of the variables deviate in opposite directions,
the correlation is said to be negative or indirect, i.e. if one
value increases, the other value decreases, and vice versa.
Example: speed and time.
3. Partial: When the relationship between two variables is studied while the effect
of the other variables is kept constant, it is called partial correlation.
On the basis of number of variables:
1. Simple: When only two variables are studied, it is called simple correlation.
2. Multiple: When three or more variables are studied, it is called multiple
correlation.
On the basis of change:
1. Linear (straight line): The amount of change in one variable tends to bear
a constant ratio to the amount of change in the other variable, i.e. if
the variables under study are graphed and the plotted points form a
straight line, the correlation is said to be linear.
Example:
X 2 4 6 8 10 12 14
Y 5 10 15 20 25 30 35
2. Non-linear (curve): If the variables under study are graphed and
the plotted points do not form a straight line, the correlation is said to be
non-linear.
Example:
X 2 4 6 8 10 12 14
Y 5 8 4 6 12 4 9
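
The "constant ratio" idea can be checked numerically. The following is a minimal sketch in Python, using the two example tables above; the function names `change_ratios` and `is_linear` are illustrative choices, not part of the original notes.

```python
def change_ratios(xs, ys):
    """Ratio (change in Y) / (change in X) for each pair of consecutive points."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive ratio equals the first one (within tol)."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


x = [2, 4, 6, 8, 10, 12, 14]
y_linear = [5, 10, 15, 20, 25, 30, 35]
y_nonlinear = [5, 8, 4, 6, 12, 4, 9]

print(change_ratios(x, y_linear))   # every ratio is 2.5
print(is_linear(x, y_linear))       # True
print(is_linear(x, y_nonlinear))    # False
```

In the first table every step changes Y by 2.5 units per unit of X, so the points fall on a straight line; in the second table the ratio changes from step to step.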

Method of determining correlation (interpretation):

The correlation coefficient lies between +1 and -1: positive values range
from +1 to 0 and negative values range from 0 to -1.

Degree of correlation   Positive range      Negative range
                        From     To         From     To
Perfect                 +1                  -1
Very high               +1       +0.90      -0.90    -1
High                    +0.90    +0.75      -0.75    -0.90
Moderate                +0.75    +0.50      -0.50    -0.75
Low                     +0.50    +0.25      -0.25    -0.50
No correlation          0                   0
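
The interpretation table can be turned into a small lookup. This is only a sketch under stated assumptions: the table shares its boundary values between neighbouring bands (for example 0.90), so a boundary value is assigned here to the higher band, and values with magnitude below 0.25, which the table does not name, are labelled "very low" to match the wording used in the worked conclusions of these notes. The function name `degree_of_correlation` is illustrative.

```python
def degree_of_correlation(r):
    """Describe a correlation coefficient using the bands of the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        label = "perfect"
    elif size >= 0.90:      # assumption: boundary values go to the higher band
        label = "very high"
    elif size >= 0.75:
        label = "high"
    elif size >= 0.50:
        label = "moderate"
    elif size >= 0.25:
        label = "low"
    else:                   # assumption: band not named in the table
        label = "very low"
    return f"{label} {sign}"


print(degree_of_correlation(0.95))   # very high positive
print(degree_of_correlation(-0.8))   # high negative
print(degree_of_correlation(0.1))    # very low positive
```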

Correlation co-efficient:
A correlation coefficient is a numerical measure of some type of correlation,
meaning a statistical relationship between two variables.
Methods of studying correlation:
1. Scatter diagram
2. Karl Pearson's co-efficient of correlation
3. Spearman's rank co-efficient of correlation
4. Least squares method
Scatter diagram method:
A scatter diagram is a useful technique for visually examining the form of a
relationship without calculating any numerical value. In this technique, the values
of the two variables are plotted as points on graph paper. The scatter diagram shows
how closely the two variables are related and indicates the direction of change in
each variable. The pattern of the plotted points shows the degree of
correlation.
[Figure: scatter diagrams with X on the horizontal axis and Y on the vertical axis.
Positive correlation: perfect positive, highly positive, low positive.
Negative correlation: perfect negative, highly negative, low negative.
A final panel shows no correlation.]

In a scatter diagram, the closeness of the scatter points and their overall
direction enable us to examine the relationship.
If all the points lie on a line, the correlation is perfect and is said to be unity.
The points are no longer scattered around an upward rising or downward falling
line; they lie on the line itself. These cases are referred to as
perfect positive correlation and perfect negative correlation respectively.
The correlation is said to be linear if the scatter points lie on or near a line.
If the points are scattered around an upward rising line, the variables move
in the same direction: when X rises, Y also rises.
This is positive correlation.
If the points are scattered around a downward sloping line, the variables
move in opposite directions: when X rises, Y falls, and vice versa.
This is negative correlation.
If the scatter points are widely dispersed around the line, the correlation is low.
If there is no upward rising or downward sloping line around
which the points are scattered, there is no correlation.
Probable Error:
Probable error is the difference that arises because samples are taken from the
population. It is usually not possible to consider the entire population in a
statistical analysis, so the sample result carries a small error compared with the
actual result.
Definition:
The probable error (P.E.) is the value which is added to or subtracted from the
coefficient of correlation (r) to get the upper limit and the lower limit respectively,
within which the value of the correlation is expected to lie.
The probable error of the correlation coefficient can be obtained by applying the
following formula:
\[ \text{P.E.} = 0.6745 \, \frac{1 - r^2}{\sqrt{n}} \]
where r is the coefficient of correlation and n is the number of pairs of observations.
The factor 0.6745 is used because in a normal distribution 50% of the
observations lie in the range μ ± 0.6745σ, where μ is the population mean and σ is the
standard deviation.
• If the value of r is less than 6(P.E.), there is no evidence of correlation
between the variables; the coefficient of correlation is not significant.
• If the value of r is more than six times the probable error (r > 6(P.E.)),
the correlation is said to be certain; the value of r is significant.
• By adding the value of P.E. to r and subtracting it from r, we get the
upper limit and the lower limit respectively, within which the coefficient of
correlation is expected to lie. This can be expressed as
\[ \rho = r \pm \text{P.E.}(r) \]
where ρ (rho) denotes the correlation in the population.
The probable error can be used only when the following three conditions are
fulfilled:

1. The data must approximate a bell-shaped curve, i.e. a normal frequency
curve.

2. The statistical measure for which the probable error is computed must have been
calculated from a sample.

3. The sample items must be selected in an unbiased manner and must
be independent of each other.
Thus, the probable error is calculated to check the reliability of the value of the
coefficient calculated from a random sample.
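
The probable error rules can be sketched in a few lines of Python. The values r = 0.8 and n = 25 below are hypothetical, chosen only for illustration, and the function names are not from the original notes. Comparing the magnitude of r with 6(P.E.) is an assumption made here so that negative coefficients are handled too.

```python
import math


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def interpret_with_pe(r, n):
    """Return P.E., the limits r - P.E. and r + P.E., and the 6(P.E.) check."""
    pe = probable_error(r, n)
    return {
        "pe": pe,
        "lower_limit": r - pe,
        "upper_limit": r + pe,
        # assumption: |r| is used so that negative r is treated symmetrically
        "significant": abs(r) > 6 * pe,
    }


result = interpret_with_pe(0.8, 25)   # hypothetical sample values
print(round(result["pe"], 4))         # 0.0486
print(result["significant"])          # True
```

With these illustrative numbers, 6(P.E.) is about 0.29, so r = 0.8 would be regarded as significant under the rule above.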
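Karl Pearson Co-efficient of Correlation:
\[ r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} \] (without standard deviation)

Assumed Mean:
\[ r = \frac{\sum dx\,dy - \frac{(\sum dx)(\sum dy)}{n}}{\sqrt{\left(\sum dx^2 - \frac{(\sum dx)^2}{n}\right)\left(\sum dy^2 - \frac{(\sum dy)^2}{n}\right)}} \] (without standard deviation)
or
\[ r = \frac{\sum dx\,dy}{n\,\sigma_x\,\sigma_y} \]

where dx = x - x̄ and dy = y - ȳ (in the assumed mean formula, dx and dy are
deviations from the assumed means instead),
σx = standard deviation of the X variable,
σy = standard deviation of the Y variable,
n = number of items.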
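
The form with standard deviations and the form without them give the same value. A short derivation, writing the deviations from the actual means as dx and dy and assuming the standard deviations are computed with divisor n:

```latex
\sigma_x = \sqrt{\frac{\sum dx^2}{n}}, \qquad
\sigma_y = \sqrt{\frac{\sum dy^2}{n}}

n\,\sigma_x\,\sigma_y
  = n \sqrt{\frac{\sum dx^2}{n}} \sqrt{\frac{\sum dy^2}{n}}
  = \sqrt{\sum dx^2 \cdot \sum dy^2}

\therefore\quad
r = \frac{\sum dx\,dy}{n\,\sigma_x\,\sigma_y}
  = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}
```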
Problems:
1. Find Karl Pearson's co-efficient of correlation between price and demand from
the following data and interpret the result.
Price(₹) 55 23 37 28 42
Demand(tons) 100 120 165 35 180

Solution:
Price (₹) X   Demand (tons) Y   dx = x - x̄   dy = y - ȳ   dx.dy   dx²    dy²
55            100               18           -20          -360    324    400
23            120               -14          0            0       196    0
37            165               0            45           0       0      2025
28            35                -9           -85          765     81     7225
42            180               5            60           300     25     3600
∑X = 185      ∑Y = 600                                    ∑dx.dy = 705   ∑dx² = 626   ∑dy² = 13250

\[ \bar{X} = \frac{\sum x}{n} = \frac{185}{5} = 37 \]

\[ \bar{Y} = \frac{\sum y}{n} = \frac{600}{5} = 120 \]

\[ r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{705}{\sqrt{626 \times 13250}} = \frac{705}{25.02 \times 115.11} = 0.24479 \]

r = 0.24479.
Conclusion:
There is a low positive correlation between the two variables.
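
The worked solution can be checked with a short Python sketch that applies both the actual-mean formula and the assumed-mean formula to the price and demand data. The function names and the assumed means 40 and 100 are illustrative choices, not part of the original notes.

```python
import math


def pearson_r(xs, ys):
    """r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


def pearson_r_assumed_mean(xs, ys, assumed_x, assumed_y):
    """r from deviations about assumed means (any convenient values)."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy) / n
    var_x = sum(a * a for a in dx) - sum(dx) ** 2 / n
    var_y = sum(b * b for b in dy) - sum(dy) ** 2 / n
    return numerator / math.sqrt(var_x * var_y)


price = [55, 23, 37, 28, 42]
demand = [100, 120, 165, 35, 180]

r = pearson_r(price, demand)
r_assumed = pearson_r_assumed_mean(price, demand, 40, 100)  # illustrative assumed means
print(round(r, 5))          # 0.24479
print(round(r_assumed, 5))  # 0.24479
```

Both routes give the same coefficient, which is why the assumed-mean form is convenient when the actual means are not whole numbers.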
Practice:
1. Find Karl Pearson's co-efficient of correlation from the following and
interpret the result.

X 12 25 26 14 18
Y 35 40 13 27 25

Spearman's Rank Correlation Co-efficient:

Under this method, the individual items of each variable are arranged in order
of their ranks. This method applies only to individual observations,
not to frequency distributions.
In the process of ranking, the original values themselves are not used in the
calculation; only the ranks assigned on the basis of the original values are used.
The co-efficient is denoted by the letter ρ (rho).
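If the values are not repeated (there is no tie):
\[ \rho = 1 - \frac{6 \sum d^2}{n^3 - n} \]
If the values are repeated (there is a tie):
\[ \rho = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \dots\right]}{n^3 - n} \]
where d is the difference between the two ranks of an item, n is the number of
pairs, and m1, m2, m3, ... are the numbers of items sharing each repeated value.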

Problems:
1.Find the rank correlation for the following.
X 58 69 85 45 36 82 95 110 26 84
Y 62 89 54 75 81 24 67 99 42 76

Solution:
X     Y     Rx    Ry    d = Rx - Ry   d²
58    62    4     4     0             0
69    89    5     9     -4            16
85    54    8     3     5             25
45    75    3     6     -3            9
36    81    2     8     -6            36
82    24    6     1     5             25
95    67    9     5     4             16
110   99    10    10    0             0
26    42    1     2     -1            1
84    76    7     7     0             0
                        ∑d = 0        ∑d² = 128

Here ranks are not repeated and n = 10.
\[ \rho = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 128}{10^3 - 10} = 1 - \frac{768}{990} = 1 - 0.776 = 0.224 \]
ρ = 0.224
Conclusion:
There is a very low positive correlation between two variables.
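
A minimal Python sketch of the no-tie case, applied to the data of this problem. Ranks are assigned in ascending order (smallest value gets rank 1), as in the solution table; the function names are illustrative.

```python
def ranks_no_ties(values):
    """Rank values from smallest (1) to largest; ties are not supported here."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values need the tie-corrected formula")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n^3 - n), valid when there are no ties."""
    n = len(xs)
    rx, ry = ranks_no_ties(xs), ranks_no_ties(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


x = [58, 69, 85, 45, 36, 82, 95, 110, 26, 84]
y = [62, 89, 54, 75, 81, 24, 67, 99, 42, 76]

print(ranks_no_ties(x))              # [4, 5, 8, 3, 2, 6, 9, 10, 1, 7]
print(round(spearman_rho(x, y), 3))  # 0.224
```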

2.Third year B.Sc. Nursing students obtained the following marks in statistics and
language. Find out the rank correlation between the subjects.
Statistics 85 98 85 69 73 99 100 68 100 95 86 100
Language 75 62 45 95 75 68 95 82 46 51 89 95

Solution:
Statistics (x)   Language (y)   Rx     Ry     d = Rx - Ry   d²
85               75             4.5    6.5    -2            4
98               62             8      4      4             16
85               45             4.5    1      3.5           12.25
69               95             2      11     -9            81
73               75             3      6.5    -3.5          12.25
99               68             9      5      4             16
100              95             11     11     0             0
68               82             1      8      -7            49
100              46             11     2      9             81
95               51             7      3      4             16
86               89             6      9      -3            9
100              95             11     11     0             0
                                                            ∑d² = 296.5

Repeated values share the average of the ranks they occupy:
4.5 = (4 + 5)/2, 6.5 = (6 + 7)/2 and 11 = (10 + 11 + 12)/3.

Here some values are repeated and n = 12.
In Statistics, 85 occurs twice and 100 occurs three times; in Language, 75 occurs
twice and 95 occurs three times. So
m1 = 2, m2 = 3, m3 = 2, m4 = 3

\[ \rho = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \frac{1}{12}(m_4^3 - m_4)\right]}{n^3 - n} \]

\[ = 1 - \frac{6\left[296.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{12^3 - 12} \]

n³ - n = n(n² - 1) = 12(144 - 1) = 1716

\[ = 1 - \frac{6\left[296.5 + \frac{1}{12}(8 - 2) + \frac{1}{12}(27 - 3) + \frac{1}{12}(8 - 2) + \frac{1}{12}(27 - 3)\right]}{1728 - 12} \]

\[ = 1 - \frac{6\,[296.5 + 0.5 + 2 + 0.5 + 2]}{1716} \]

\[ = 1 - \frac{6 \times 301.5}{1716} = 1 - \frac{1809}{1716} = 1 - 1.05 \]

ρ = -0.05
Conclusion:
There is a very low negative correlation between two subjects.
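
The tie-corrected calculation can be sketched in Python as follows. Repeated values receive the average of the ranks they occupy, and each group of m repeated values adds (m³ - m)/12 to ∑d². The function names are illustrative and the data are the marks from this problem.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; repeated values share the average of their positions."""
    counts = Counter(values)
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    # a value occupying positions p .. p+m-1 gets rank p + (m - 1) / 2
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


statistics_marks = [85, 98, 85, 69, 73, 99, 100, 68, 100, 95, 86, 100]
language_marks = [75, 62, 45, 95, 75, 68, 95, 82, 46, 51, 89, 95]

print(tie_correction(statistics_marks) + tie_correction(language_marks))  # 5.0
print(round(spearman_rho_ties(statistics_marks, language_marks), 2))      # -0.05
```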
3. Ten competitors in an I.Q. contest are ranked by three judges in the following
order.
Judge 1 10 8 3 1 5 7 4 6 9 2
Judge 2 9 7 8 5 6 8 2 3 10 1
Judge 3 9 8 6 3 2 7 5 1 10 4
Using rank correlation, determine which pair of judges has the nearest approach
to common taste in I.Q.
Solution:
Judge 1   Judge 2   Judge 3   R1 - R2        R2 - R3        R1 - R3
(R1)      (R2)      (R3)      d      d²      d      d²      d      d²
10        9         9         1      1       0      0       1      1
8         7         8         1      1       -1     1       0      0
3         8         6         -5     25      2      4       -3     9
1         5         3         -4     16      2      4       -2     4
5         6         2         -1     1       4      16      3      9
7         8         7         -1     1       1      1       0      0
4         2         5         2      4       -3     9       -1     1
6         3         1         3      9       2      4       5      25
9         10        10        -1     1       0      0       -1     1
2         1         4         1      1       -3     9       -2     4
                              ∑d² = 60       ∑d² = 48       ∑d² = 54

The formula for ranks without ties is applied to each pair of judges (Judge 2's
ranks, as given, list rank 8 twice; no tie correction is made).

\[ \rho_{12} = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 60}{10^3 - 10} = 1 - \frac{360}{990} = 1 - 0.3636 = 0.6364 \]

\[ \rho_{23} = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 48}{10^3 - 10} = 1 - \frac{288}{990} = 1 - 0.2909 = 0.7091 \]

\[ \rho_{13} = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6 \times 54}{10^3 - 10} = 1 - \frac{324}{990} = 1 - 0.3273 = 0.6727 \]
Conclusion:
The co-efficient of correlation between the second and third judges is the highest,
so Judge 2 and Judge 3 have the nearest approach to common taste in I.Q.
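
The pairwise comparison can be sketched in Python. The ranks are copied exactly as printed in the problem (including rank 8 appearing twice for Judge 2), and, mirroring the worked solution, no tie correction is applied; that choice is an assumption of this sketch. The names `judges` and `rho_from_ranks` are illustrative.

```python
from itertools import combinations

# Ranks exactly as given in the problem statement.
judges = {
    "Judge 1": [10, 8, 3, 1, 5, 7, 4, 6, 9, 2],
    "Judge 2": [9, 7, 8, 5, 6, 8, 2, 3, 10, 1],
    "Judge 3": [9, 8, 6, 3, 2, 7, 5, 1, 10, 4],
}


def rho_from_ranks(r1, r2):
    """rho = 1 - 6 * sum(d^2) / (n^3 - n), using the ranks as supplied."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


results = {
    (a, b): rho_from_ranks(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}

for pair, rho in results.items():
    print(pair, round(rho, 4))

closest = max(results, key=results.get)
print(closest)   # ('Judge 2', 'Judge 3')
```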
Practice:
1.Find the co-efficient of rank correlation from the following:
(i) X 62 55 87 95 48 65 72 84 61 75 85 92
Y 25 65 74 84 56 75 38 49 86 96 97 99

(ii) X 68 72 58 48 95 57 68 72 58 68
Y 28 45 68 75 84 75 45 94 93 29
