Correlation
DEFINITION
Correlation refers to the relationship between two or more variables, e.g. the relation between
the heights of fathers and sons, yield and rainfall, or wages and the price index.
USES OF CORRELATION
1. It is used in physical and social sciences.
2. It is useful for economists to study the relationship between variables like price, quantity
etc. Businesspeople estimate costs, sales, price etc. using correlation.
3. It is helpful in measuring the degree of relationship between the variables like income and
expenditure, price and supply, supply and demand etc.
4. Sampling error can be calculated.
5. It is the basis for the concept of regression.
COMPUTATION OF CORRELATION
Steps:
1. Find the mean of the two series x and y.
2. Take the deviations of the two series from their respective means:
X = x − x̄, Y = y − ȳ
3. Square the deviations and total the squares for x and y separately; denote the totals by
∑X² and ∑Y² respectively.
4. Multiply the deviations of x and y pairwise, total the products and divide by n. This is the covariance.
5. Substitute the values in the formula.
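$$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y}) / n}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} \cdot \sqrt{\dfrac{\sum (y - \bar{y})^2}{n}}} $$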
The above formula is simplified as follows:
$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}}, \qquad X = x - \bar{x}, \quad Y = y - \bar{y} $$
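The simplification works because the numerator and the denominator each carry a factor of 1/n, and these cancel. A short derivation in LaTeX, using the deviations X and Y defined above:

```latex
\begin{aligned}
r &= \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}
   = \frac{\frac{1}{n}\sum XY}{\sqrt{\frac{1}{n}\sum X^2} \cdot \sqrt{\frac{1}{n}\sum Y^2}} \\[4pt]
  &= \frac{\frac{1}{n}\sum XY}{\frac{1}{n}\sqrt{\sum X^2 \cdot \sum Y^2}}
   = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}}
\end{aligned}
```

In practice this means only the three totals ∑XY, ∑X² and ∑Y² are needed; n does not have to appear in the final calculation.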
Example 1
Find Karl Pearson's coefficient of correlation from the following data on the heights of
fathers (x) and sons (y).
x 64 65 66 67 68 69 70
y 66 67 65 68 70 68 72
Required:
Comment on the result.
Solution
  x      y     X = x − x̄     X²     Y = y − ȳ     Y²     XY
              (= x − 67)            (= y − 68)
  64     66       −3          9        −2          4      6
  65     67       −2          4        −1          1      2
  66     65       −1          1        −3          9      3
  67     68        0          0         0          0      0
  68     70        1          1         2          4      2
  69     68        2          4         0          0      0
  70     72        3          9         4         16     12
Total:
 469    476        0         28         0         34     25
$$ \bar{x} = \frac{469}{7} = 67; \qquad \bar{y} = \frac{476}{7} = 68 $$
$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}} = \frac{25}{\sqrt{28 \times 34}} = 0.81 $$
Comment: Since r = +0.81, the variables are highly positively correlated, i.e. tall fathers
tend to have tall sons.
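The hand calculation in Example 1 can be cross-checked with a few lines of Python. This is a minimal sketch of the deviation method described in the steps above; the function name `pearson_r` and the variable names are illustrative choices, not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations: sum(XY) / sqrt(sum(X^2) * sum(Y^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    X = [x - x_bar for x in xs]  # deviations of x from its mean
    Y = [y - y_bar for y in ys]  # deviations of y from its mean
    sum_xy = sum(a * b for a, b in zip(X, Y))
    sum_x2 = sum(a * a for a in X)
    sum_y2 = sum(b * b for b in Y)
    # r is undefined (division by zero) if either series is constant.
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Data from Example 1.
father = [64, 65, 66, 67, 68, 69, 70]
son = [66, 67, 65, 68, 70, 68, 72]

r = pearson_r(father, son)
print(round(r, 2))  # 0.81
```

The intermediate totals inside the function correspond to the column totals of the solution table (∑XY = 25, ∑X² = 28, ∑Y² = 34).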
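For grouped data arranged in a two-way (bivariate) frequency table, r is computed from the step
deviations u and v, where N is the total frequency:
$$ r = \frac{N \sum fuv - (\sum fu)(\sum fv)}{\sqrt{\left[N \sum fu^2 - (\sum fu)^2\right] \cdot \left[N \sum fv^2 - (\sum fv)^2\right]}} $$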
Steps:
1. Take the step deviations of the variable x and denote these deviations by u.
2. Take the step deviations of the variable y and denote these deviations by v.
3. For each cell, multiply u, v and the cell frequency f, and write the product fuv in the
bottom right-hand corner of the cell.
4. Add all the corner values calculated in step 3 to obtain the total Σfuv.
5. Multiply the frequencies of the variable x by the step deviations u and obtain the total Σfu.
6. Square the step deviations of the variable x, multiply them by the respective
frequencies and obtain the total Σfu².
7. Similarly obtain Σfv and Σfv². Then substitute these values in the formula to get the value
of r.
Example 2
The following are the marks obtained by 132 students in two tests.
(Columns: Test 1 marks; rows: Test 2 marks; a dash marks an empty cell.)

Test 2 \ Test 1    30–40    40–50    50–60    60–70    70–80    Total
20–30                2        5        3        -        -        10
30–40                1        8       12        6        -        27
40–50                -        5       22       14        1        42
50–60                -        2       16        9        2        29
60–70                -        1        8        6        1        16
70–80                -        -        2        4        2         8
Total                3       21       63       39        6       132
Required
Calculate the correlation coefficient.
Solution
Let x denote Test 1 marks and y denote Test 2 marks.
$$ u = \frac{x - a}{b} = \frac{x - 55}{10}, \qquad v = \frac{y - c}{d} = \frac{y - 45}{10} $$
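Each cell shows the frequency f followed, in parentheses, by the corner value fuv. A dash marks an empty cell.

Mid y    v  |  Mid x:  35       45       55       65       75   |    f     fv    fv²    fuv
            |      u:  −2       −1        0        1        2   |
 25     −2  |       2 (8)    5 (10)   3 (0)      -        -     |   10    −20     40     18
 35     −1  |       1 (2)    8 (8)   12 (0)    6 (−6)     -     |   27    −27     27      4
 45      0  |         -      5 (0)   22 (0)   14 (0)    1 (0)   |   42      0      0      0
 55      1  |         -      2 (−2)  16 (0)    9 (9)    2 (4)   |   29     29     29     11
 65      2  |         -      1 (−2)   8 (0)    6 (12)   1 (4)   |   16     32     64     14
 75      3  |         -        -      2 (0)    4 (12)   2 (12)  |    8     24     72     24
 f          |         3       21       63       39        6     |  132     38    232     71
 fu         |        −6      −21        0       39       12     |   24
 fu²        |        12       21        0       39       24     |   96
 fuv        |        10       14        0       27       20     |   71   (check: equals the total of the fuv column)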
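Substituting N = 132, Σfu = 24, Σfv = 38, Σfu² = 96, Σfv² = 232 and Σfuv = 71:
$$ r = \frac{132(71) - (24)(38)}{\sqrt{\left[132(96) - 24^2\right] \cdot \left[132(232) - 38^2\right]}} = \frac{9372 - 912}{\sqrt{(12672 - 576)(30624 - 1444)}} = \frac{8460}{\sqrt{12096 \times 29180}} \approx 0.45 $$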
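The grouped-data calculation of Example 2 can be checked in Python. The sketch below assumes the cell positions implied by the fuv entries of the working table, with empty cells entered as 0; the function name `grouped_r` and the dictionary keys are illustrative, not from the original notes.

```python
import math

# Class mid-points: x is Test 1 (columns), y is Test 2 (rows).
x_mid = [35, 45, 55, 65, 75]
y_mid = [25, 35, 45, 55, 65, 75]

# Cell frequencies; rows follow y_mid, columns follow x_mid (empty cells = 0).
freq = [
    [2, 5, 3, 0, 0],
    [1, 8, 12, 6, 0],
    [0, 5, 22, 14, 1],
    [0, 2, 16, 9, 2],
    [0, 1, 8, 6, 1],
    [0, 0, 2, 4, 2],
]

def grouped_r(freq, x_mid, y_mid, a, b, c, d):
    """Correlation from a bivariate frequency table using step deviations."""
    u = [(x - a) / b for x in x_mid]  # step deviations of x
    v = [(y - c) / d for y in y_mid]  # step deviations of y
    col_f = [sum(row[j] for row in freq) for j in range(len(x_mid))]
    row_f = [sum(row) for row in freq]
    n_total = sum(row_f)
    sum_fu = sum(f * ui for f, ui in zip(col_f, u))
    sum_fu2 = sum(f * ui ** 2 for f, ui in zip(col_f, u))
    sum_fv = sum(f * vi for f, vi in zip(row_f, v))
    sum_fv2 = sum(f * vi ** 2 for f, vi in zip(row_f, v))
    sum_fuv = sum(freq[i][j] * u[j] * v[i]
                  for i in range(len(y_mid)) for j in range(len(x_mid)))
    numerator = n_total * sum_fuv - sum_fu * sum_fv
    denominator = math.sqrt((n_total * sum_fu2 - sum_fu ** 2)
                            * (n_total * sum_fv2 - sum_fv ** 2))
    totals = {"N": n_total, "fu": sum_fu, "fv": sum_fv,
              "fu2": sum_fu2, "fv2": sum_fv2, "fuv": sum_fuv}
    return numerator / denominator, totals

r, totals = grouped_r(freq, x_mid, y_mid, a=55, b=10, c=45, d=10)
print(totals)
print(round(r, 2))  # 0.45
```

The `totals` dictionary reproduces the marginal totals of the working table (N = 132, Σfu = 24, Σfv = 38, Σfu² = 96, Σfv² = 232, Σfuv = 71), which is a convenient way to locate an arithmetic slip before computing r.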
Properties of Correlation
1. Correlation coefficient lies between –1 and +1
2. r is independent of change of origin and scale.
3. It is a pure number independent of units of measurement.
4. Independent variables are uncorrelated but the converse is not true.
5. The correlation coefficient is the geometric mean of the two regression coefficients, taken with their common sign.
6. The correlation coefficient of x and y is symmetric: r_xy = r_yx.
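Properties 2, 5 and 6 can be checked numerically. The following is a small illustrative sketch using the Example 1 data; the helper names, the particular change of origin and scale (positive scale factors are assumed), and the regression-coefficient formula ∑XY / ∑X² are choices made for this illustration rather than part of the original notes.

```python
import math

def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def pearson_r(xs, ys):
    X, Y = deviations(xs), deviations(ys)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    return sum_xy / math.sqrt(sum(a * a for a in X) * sum(b * b for b in Y))

def regression_coefficient(ys, xs):
    """Slope of the regression of ys on xs: sum(XY) / sum(X^2)."""
    X, Y = deviations(xs), deviations(ys)
    return sum(a * b for a, b in zip(X, Y)) / sum(a * a for a in X)

x = [64, 65, 66, 67, 68, 69, 70]
y = [66, 67, 65, 68, 70, 68, 72]
r = pearson_r(x, y)

# Property 6: swapping the variables leaves r unchanged.
r_swapped = pearson_r(y, x)

# Property 2: change of origin and (positive) scale leaves r unchanged.
x_new = [(xi - 60) / 2 for xi in x]
y_new = [(yi - 50) * 5 for yi in y]
r_transformed = pearson_r(x_new, y_new)

# Property 5: |r| is the geometric mean of the two regression coefficients.
b_yx = regression_coefficient(y, x)
b_xy = regression_coefficient(x, y)
r_from_slopes = math.sqrt(b_yx * b_xy)

print(math.isclose(r, r_swapped),
      math.isclose(r, r_transformed),
      math.isclose(r, r_from_slopes))  # True True True
```

Here both regression coefficients are positive, so the positive square root is the right one; when both are negative, r takes the negative root.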
Limitations
1. The correlation coefficient assumes a linear relationship, regardless of whether that assumption is
correct or not.
2. Extreme values of the variables unduly influence the correlation coefficient.
3. The existence of correlation does not necessarily indicate a cause-and-effect relation.
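The first two limitations are easy to see numerically. In the sketch below, all data are hypothetical values made up for illustration: the first pair of series has an exact but nonlinear relationship (y = x²), and the second shows how a single extreme observation changes r.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    X = [x - x_bar for x in xs]
    Y = [y - y_bar for y in ys]
    sum_xy = sum(a * b for a, b in zip(X, Y))
    return sum_xy / math.sqrt(sum(a * a for a in X) * sum(b * b for b in Y))

# Limitation 1: y is completely determined by x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
r_curve = pearson_r(x, y)

# Limitation 2: a weakly correlated series, then the same series
# with one extreme pair (20, 20) appended.
x_base = [1, 2, 3, 4, 5]
y_base = [3, 1, 4, 2, 3]
r_base = pearson_r(x_base, y_base)
r_outlier = pearson_r(x_base + [20], y_base + [20])

print(round(r_curve, 2), round(r_base, 2), round(r_outlier, 2))  # 0.0 0.14 0.97
```

The first result also illustrates why uncorrelated variables need not be independent: r is zero even though y depends entirely on x.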
Interpretation
The following rules help in interpreting the value of r.
1. When r = 1, there is a perfect positive relationship between the variables.
2. When r = –1, there is a perfect negative relationship between the variables.
3. When r = 0, there is no linear relationship between the variables.
4. If r is close to +1 or –1, there is a high degree of correlation (positive or negative)
between the two variables.
5. If r is near zero (e.g. 0.1, –0.1 or 0.2), there is little correlation.