
Correlation

Correlation refers to the relationship between two or more variables. Karl Pearson's coefficient of correlation, r, measures the strength and direction of a linear relationship on a scale from -1 to 1. An r of 0 means no linear correlation, 1 means perfect positive correlation, and -1 means perfect negative correlation. The document provides examples showing how to compute r from raw data points and from grouped frequency tables.

Uploaded by

Martin Kobimbo
Copyright © All Rights Reserved

CORRELATION

DEFINITION
Correlation refers to the relationship between two or more variables, e.g. the relationship between the height of a father and that of his son, yield and rainfall, wages and a price index, etc.

USES OF CORRELATION
1. It is used in both the physical and the social sciences.
2. It is useful for economists studying the relationship between variables such as price and quantity. Businesses use correlation to estimate costs, sales, prices, etc.
3. It helps measure the degree of relationship between variables such as income and expenditure, price and supply, or supply and demand.
4. The sampling error of the correlation coefficient can be calculated.
5. It is the basis for the concept of regression.

COMPUTATION OF CORRELATION

Karl Pearson's Coefficient of Correlation


When a relationship exists between two variables, we need to measure its degree. This measure is called the coefficient of correlation, or correlation coefficient, and is denoted by r.

Steps:
1. Find the means $\bar{x}$ and $\bar{y}$ of the two series x and y.
2. Take the deviations of each series from its mean: $X = x - \bar{x}$, $Y = y - \bar{y}$.
3. Square the deviations and total them for each series; denote these totals by $\sum X^2$ and $\sum Y^2$ respectively.
4. Multiply each pair of deviations, total the products to get $\sum XY$, and divide by n. This is the covariance, $\operatorname{Cov}(x, y)$.
5. Substitute the values in the formula:
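$$
r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}
  = \frac{\sum (x - \bar{x})(y - \bar{y}) / n}{\sqrt{\sum (x - \bar{x})^2 / n} \cdot \sqrt{\sum (y - \bar{y})^2 / n}}
$$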
The factors of n cancel, so the above formula simplifies to:
$$
r = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}}, \qquad X = x - \bar{x}, \quad Y = y - \bar{y}
$$
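The five steps can be sketched in Python as below. This is a minimal illustration, not the author's method: the function names `deviation_sums`, `r_covariance_form` and `r_simplified` are hypothetical, and the two small test series are made up to show the extreme values +1 and -1. Both functions compute the same r; the second shows why the factors of n can be dropped.

```python
import math

def deviation_sums(x, y):
    """Steps 1-4: return (sum_XY, sum_X2, sum_Y2, n) using deviations from the means."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_bar = sum(x) / n                          # step 1: means
    y_bar = sum(y) / n
    X = [xi - x_bar for xi in x]                # step 2: deviations
    Y = [yi - y_bar for yi in y]
    sum_X2 = sum(d * d for d in X)              # step 3: sums of squared deviations
    sum_Y2 = sum(d * d for d in Y)
    sum_XY = sum(a * b for a, b in zip(X, Y))   # step 4: sum of products
    if sum_X2 == 0 or sum_Y2 == 0:
        raise ValueError("r is undefined when either series is constant")
    return sum_XY, sum_X2, sum_Y2, n

def r_covariance_form(x, y):
    """Step 5: r = Cov(x, y) / (sigma_x * sigma_y), with every sum divided by n."""
    sum_XY, sum_X2, sum_Y2, n = deviation_sums(x, y)
    cov = sum_XY / n
    return cov / (math.sqrt(sum_X2 / n) * math.sqrt(sum_Y2 / n))

def r_simplified(x, y):
    """Simplified form: r = sum(XY) / sqrt(sum(X^2) * sum(Y^2)); the n's cancel."""
    sum_XY, sum_X2, sum_Y2, _ = deviation_sums(x, y)
    return sum_XY / math.sqrt(sum_X2 * sum_Y2)

print(r_simplified([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0  (exact increasing line)
print(r_simplified([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0 (exact decreasing line)
```

The covariance form may differ from the simplified form in the last floating-point digit, so compare the two with a tolerance rather than with `==`.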

Example 1
Find Karl Pearson's coefficient of correlation from the following data on the heights of fathers (x) and their sons (y).

x    64   65   66   67   68   69   70
y    66   67   65   68   70   68   72

Required:
Comment on the result.

Solution
x     y     X = x - 67   X²    Y = y - 68   Y²    XY
64    66    -3           9     -2           4     6
65    67    -2           4     -1           1     2
66    65    -1           1     -3           9     3
67    68    0            0     0            0     0
68    70    1            1     2            4     2
69    68    2            4     0            0     0
70    72    3            9     4            16    12
469   476   0            28    0            34    25

$$
\bar{x} = \frac{469}{7} = 67; \qquad \bar{y} = \frac{476}{7} = 68
$$

$$
r = \frac{\sum XY}{\sqrt{\sum X^2 \cdot \sum Y^2}} = \frac{25}{\sqrt{28 \times 34}} = \frac{25}{30.85} \approx 0.81
$$

Comment: Since r = +0.81, the variables are highly positively correlated, i.e. tall fathers tend to have tall sons.
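The solution table for Example 1 can be reproduced with a short Python sketch. It uses only the father/son data given above; the variable names are illustrative. Each row holds x, y, X, X², Y, Y² and XY in the same order as the table, and the column totals feed the simplified formula.

```python
import math

# Example 1 data: heights of fathers (x) and sons (y)
x = [64, 65, 66, 67, 68, 69, 70]
y = [66, 67, 65, 68, 70, 68, 72]

n = len(x)
x_bar = sum(x) / n   # 469 / 7 = 67.0
y_bar = sum(y) / n   # 476 / 7 = 68.0

rows = []
for xi, yi in zip(x, y):
    X = xi - x_bar               # deviation of x from its mean
    Y = yi - y_bar               # deviation of y from its mean
    rows.append((xi, yi, X, X * X, Y, Y * Y, X * Y))

# Column totals, in the same order as the solution table
totals = [sum(col) for col in zip(*rows)]
sum_X2, sum_Y2, sum_XY = totals[3], totals[5], totals[6]

r = sum_XY / math.sqrt(sum_X2 * sum_Y2)
print(totals)        # [469, 476, 0.0, 28.0, 0.0, 34.0, 25.0]
print(round(r, 2))   # 0.81
```

The deviations sum to zero, as they must for deviations taken from the mean; this is a handy check on hand calculations.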

Correlation of Grouped Bivariate Data


When the number of observations is very large, the data are classified into a two-way frequency distribution, or correlation table. The class intervals for y are placed in the column headings and those for x in the stubs (row labels); the order can also be reversed. The frequency of each cell of the table is then obtained. The formula for calculating the correlation coefficient r is

$$
r = \frac{N \sum fuv - (\sum fu)(\sum fv)}{\sqrt{\left[N \sum fu^2 - (\sum fu)^2\right]\left[N \sum fv^2 - (\sum fv)^2\right]}}
$$

where $N = \sum f$ is the total frequency.

Steps:
1. Take the step deviations of the variable x, $u = (x - a)/b$, where x is the class midpoint, a is an assumed mean (a convenient class midpoint) and b is the class width.
2. Similarly take the step deviations of the variable y, $v = (y - c)/d$.
3. For each cell, multiply u, v and the cell frequency f, and write the product fuv in the bottom right-hand corner of the cell.
4. Add all the products obtained in step 3 to get the total $\sum fuv$.
5. Multiply the frequencies of x by the corresponding step deviations u and add them to get $\sum fu$.
6. Square the step deviations of x, multiply them by the respective frequencies, and add them to get $\sum fu^2$.
7. Similarly obtain $\sum fv$ and $\sum fv^2$. Then substitute these values in the formula to get the value of r.
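Steps 1 to 7 can be sketched as a single Python function. This is an illustrative sketch, not a standard library routine: `grouped_r` and its parameter names are hypothetical, `freq[i][j]` is assumed to hold the frequency of the cell in the i-th y-class (row) and the j-th x-class (column), and the two 3 x 3 tables are made-up examples chosen so that the answer is known in advance.

```python
import math

def grouped_r(freq, x_mids, y_mids, a, b, c, d):
    """Correlation coefficient from a two-way frequency table (sketch).

    freq[i][j] is the frequency of the cell in y-class i (row) and x-class j (column).
    a, b are the assumed mean and class width for x; c, d are the same for y.
    """
    u = [(xm - a) / b for xm in x_mids]   # step 1: step deviations of x
    v = [(ym - c) / d for ym in y_mids]   # step 2: step deviations of y
    N = s_fu = s_fu2 = s_fv = s_fv2 = s_fuv = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            N += f
            s_fuv += f * u[j] * v[i]      # steps 3-4: f*u*v for each cell, summed
            s_fu += f * u[j]              # step 5
            s_fu2 += f * u[j] ** 2        # step 6
            s_fv += f * v[i]              # step 7: the same totals for y
            s_fv2 += f * v[i] ** 2
    denominator = math.sqrt((N * s_fu2 - s_fu ** 2) * (N * s_fv2 - s_fv ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable does not vary")
    return (N * s_fuv - s_fu * s_fv) / denominator

# Hypothetical 3 x 3 tables with class midpoints 10, 20, 30 for both variables
mids = [10, 20, 30]
diagonal = [[3, 0, 0], [0, 2, 0], [0, 0, 4]]       # y rises exactly with x
anti_diagonal = [[0, 0, 3], [0, 2, 0], [4, 0, 0]]  # y falls exactly as x rises
print(grouped_r(diagonal, mids, mids, 20, 10, 20, 10))       # 1.0
print(grouped_r(anti_diagonal, mids, mids, 20, 10, 20, 10))  # -1.0
```

Any choice of assumed means and class widths gives the same r, because r is unaffected by a change of origin and (positive) scale.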

Example 2
The following are the marks obtained by 132 students in two tests.
Test 2 \ Test 1   30–40   40–50   50–60   60–70   70–80   Total
20–30                 2       5       3                      10
30–40                 1       8      12       6              27
40–50                         5      22      14       1      42
50–60                         2      16       9       2      29
60–70                         1       8       6       1      16
70–80                                 2       4       2       8
Total                 3      21      63      39       6     132

Required
Calculate the correlation coefficient.

Solution
Let x denote Test 1 marks and y denote Test 2 marks.

$$
u = \frac{x - a}{b} = \frac{x - 55}{10}, \qquad v = \frac{y - c}{d} = \frac{y - 45}{10}
$$
Each cell shows the frequency f, followed by the product fuv in brackets.

Mid x       35       45       55       65       75
u           -2       -1       0        1        2
Mid y  v                                                 f     fv    fv²   fuv
25     -2   2 (8)    5 (10)   3 (0)                      10    -20   40    18
35     -1   1 (2)    8 (8)    12 (0)   6 (-6)            27    -27   27    4
45     0             5 (0)    22 (0)   14 (0)   1 (0)    42    0     0     0
55     1             2 (-2)   16 (0)   9 (9)    2 (4)    29    29    29    11
65     2             1 (-2)   8 (0)    6 (12)   1 (4)    16    32    64    14
75     3                      2 (0)    4 (12)   2 (12)   8     24    72    24
f           3        21       63       39       6        132   38    232   71
fu          -6       -21      0        39       12       24
fu²         12       21       0        39       24       96
fuv         10       14       0        27       20       71 (check)
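The totals in the solution table can be checked with a short Python sketch. It uses only the cell frequencies of Example 2 (empty cells entered as 0); the variable names are illustrative. Computing the fuv totals both along rows and along columns reproduces the "check" in the table: both routes must give the same grand total.

```python
# Cell frequencies from Example 2: rows are Test 2 classes 20-30 ... 70-80 (y),
# columns are Test 1 classes 30-40 ... 70-80 (x); empty cells are 0.
freq = [
    [2, 5, 3, 0, 0],
    [1, 8, 12, 6, 0],
    [0, 5, 22, 14, 1],
    [0, 2, 16, 9, 2],
    [0, 1, 8, 6, 1],
    [0, 0, 2, 4, 2],
]
x_mids = [35, 45, 55, 65, 75]
y_mids = [25, 35, 45, 55, 65, 75]
u = [(x - 55) // 10 for x in x_mids]   # step deviations of x: -2 .. 2
v = [(y - 45) // 10 for y in y_mids]   # step deviations of y: -2 .. 3

n_rows, n_cols = len(freq), len(freq[0])
row_f = [sum(row) for row in freq]
col_f = [sum(freq[i][j] for i in range(n_rows)) for j in range(n_cols)]

# fuv totals computed along rows and along columns
row_fuv = [sum(freq[i][j] * u[j] * v[i] for j in range(n_cols)) for i in range(n_rows)]
col_fuv = [sum(freq[i][j] * u[j] * v[i] for i in range(n_rows)) for j in range(n_cols)]

N = sum(row_f)
sum_fu = sum(f * uj for f, uj in zip(col_f, u))
sum_fu2 = sum(f * uj * uj for f, uj in zip(col_f, u))
sum_fv = sum(f * vi for f, vi in zip(row_f, v))
sum_fv2 = sum(f * vi * vi for f, vi in zip(row_f, v))

print(row_fuv)   # [18, 4, 0, 11, 14, 24]
print(col_fuv)   # [10, 14, 0, 27, 20]
print(N, sum_fu, sum_fu2, sum_fv, sum_fv2, sum(row_fuv), sum(col_fuv))  # 132 24 96 38 232 71 71
```

Integer division is safe here because every midpoint differs from the assumed mean by an exact multiple of the class width.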

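$$
\begin{aligned}
r &= \frac{N \sum fuv - (\sum fu)(\sum fv)}{\sqrt{\left[N \sum fu^2 - (\sum fu)^2\right]\left[N \sum fv^2 - (\sum fv)^2\right]}} \\
  &= \frac{(132 \times 71) - (24 \times 38)}{\sqrt{\left[(132 \times 96) - 24^2\right]\left[(132 \times 232) - 38^2\right]}} \\
  &= \frac{9372 - 912}{\sqrt{(12672 - 576)(30624 - 1444)}} \\
  &= \frac{8460}{\sqrt{12096} \times \sqrt{29180}}
   = \frac{8460}{109.98 \times 170.82}
   = \frac{8460}{18786.78} \approx 0.4503
\end{aligned}
$$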
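As a cross-check, the grouped formula should give exactly the same r as the ordinary Pearson formula applied to the 132 individual (x, y) pairs, if every student is placed at the midpoints of his or her cell. The Python sketch below does both computations; treating each student as sitting at the class midpoints is the same assumption the grouped method already makes, and the variable names are illustrative.

```python
import math

# Example 2 cell frequencies (rows: Test 2 classes, columns: Test 1 classes)
freq = [
    [2, 5, 3, 0, 0],
    [1, 8, 12, 6, 0],
    [0, 5, 22, 14, 1],
    [0, 2, 16, 9, 2],
    [0, 1, 8, 6, 1],
    [0, 0, 2, 4, 2],
]
x_mids = [35, 45, 55, 65, 75]        # Test 1 class midpoints
y_mids = [25, 35, 45, 55, 65, 75]    # Test 2 class midpoints

# Expand the table: each cell contributes f copies of its (x, y) midpoint pair
pairs = [(x_mids[j], y_mids[i])
         for i, row in enumerate(freq)
         for j, f in enumerate(row)
         for _ in range(f)]
n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n
sum_XY = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
sum_X2 = sum((x - x_bar) ** 2 for x, _ in pairs)
sum_Y2 = sum((y - y_bar) ** 2 for _, y in pairs)
r_raw = sum_XY / math.sqrt(sum_X2 * sum_Y2)

# Grouped (step-deviation) formula with the totals from the solution table
N, s_fuv, s_fu, s_fu2, s_fv, s_fv2 = 132, 71, 24, 96, 38, 232
r_grouped = (N * s_fuv - s_fu * s_fv) / math.sqrt(
    (N * s_fu2 - s_fu ** 2) * (N * s_fv2 - s_fv ** 2))

print(n, round(r_raw, 4), round(r_grouped, 4))   # 132 0.4503 0.4503
```

The two results agree because u and v are obtained from x and y by a change of origin and scale, which leaves r unchanged.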

Properties of Correlation
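1. The correlation coefficient lies between −1 and +1.
2. r is independent of change of origin and scale: replacing x by (x − a)/b and y by (y − c)/d leaves r unchanged when b and d have the same sign (if they have opposite signs, only the sign of r is reversed).
3. It is a pure number, independent of the units of measurement.
4. Independent variables are uncorrelated, but the converse is not true.
5. The correlation coefficient is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{yx} \, b_{xy}}$, where $b_{yx}$ is the regression coefficient of y on x and $b_{xy}$ that of x on y, and r takes their common sign.
6. The correlation coefficient is symmetric in x and y: $r_{xy} = r_{yx}$.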
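Several of these properties can be checked numerically on the Example 1 data with the Python sketch below. The helper names are hypothetical, and the constants used for the change of origin and scale (60, 2.5, 5, 4) are arbitrary illustrative values, not from the original.

```python
import math

def deviation_sums(x, y):
    """Return (sum_XY, sum_X2, sum_Y2) with X, Y the deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_XY = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sum_X2 = sum((a - x_bar) ** 2 for a in x)
    sum_Y2 = sum((b - y_bar) ** 2 for b in y)
    return sum_XY, sum_X2, sum_Y2

def pearson_r(x, y):
    sum_XY, sum_X2, sum_Y2 = deviation_sums(x, y)
    return sum_XY / math.sqrt(sum_X2 * sum_Y2)

# Example 1 data (fathers' and sons' heights)
x = [64, 65, 66, 67, 68, 69, 70]
y = [66, 67, 65, 68, 70, 68, 72]
r = pearson_r(x, y)

# Property 1: r lies between -1 and +1
print(-1 <= r <= 1)                                      # True

# Properties 2 and 3: change of origin and positive scale leaves r unchanged
x_new = [(a - 60) / 2.5 for a in x]
y_new = [(b + 5) * 4 for b in y]
print(math.isclose(pearson_r(x_new, y_new), r))          # True
# ...whereas a negative scale factor reverses the sign of r
print(math.isclose(pearson_r([-a for a in x], y), -r))   # True

# Property 5: |r| is the geometric mean of the two regression coefficients
sum_XY, sum_X2, sum_Y2 = deviation_sums(x, y)
b_yx = sum_XY / sum_X2    # regression coefficient of y on x
b_xy = sum_XY / sum_Y2    # regression coefficient of x on y
print(math.isclose(math.sqrt(b_yx * b_xy), abs(r)))      # True

# Property 6: symmetry, r_xy = r_yx
print(math.isclose(pearson_r(y, x), r))                  # True
```

`math.isclose` is used instead of `==` because rescaling introduces tiny floating-point differences.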

Limitations
1. The correlation coefficient assumes a linear relationship, whether or not that assumption is correct.
2. Extreme values (outliers) can unduly influence the correlation coefficient.
3. The existence of correlation does not necessarily indicate a cause-and-effect relationship.
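The first two limitations can be illustrated with the Python sketch below. All data here are hypothetical: a perfect quadratic relationship that r does not detect at all, and a perfect negative linear relationship whose r is reversed by adding a single extreme item.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_XY = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sum_X2 = sum((a - x_bar) ** 2 for a in x)
    sum_Y2 = sum((b - y_bar) ** 2 for b in y)
    return sum_XY / math.sqrt(sum_X2 * sum_Y2)

# Limitation 1: y is an exact (quadratic) function of x, yet r is zero
x = [-3, -2, -1, 0, 1, 2, 3]
y = [a * a for a in x]
print(pearson_r(x, y))                  # 0.0

# Limitation 2: one extreme item can reverse the sign of r
x2 = list(range(1, 11))
y2 = [11 - a for a in x2]               # exact negative linear relationship
print(pearson_r(x2, y2))                # -1.0
x3, y3 = x2 + [30], y2 + [30]           # add a single hypothetical outlier
print(round(pearson_r(x3, y3), 2))      # 0.74
```

In the first case r = 0 even though y is completely determined by x, which is why r = 0 should be read as "no linear relationship" rather than "no relationship".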

Interpretation
The following rules help in interpreting the value of r.
1. When r = +1, there is a perfect positive relationship between the variables.
2. When r = −1, there is a perfect negative relationship between the variables.
3. When r = 0, there is no linear relationship between the variables.
4. If r is close to +1 or −1, there is a high degree of correlation (positive or negative) between the two variables.
5. If r is close to zero (e.g. 0.1, −0.1 or 0.2), the correlation is weak.
