B.Com III SEM
Correlation:
Correlation is a statistical technique that studies the
relationship between two or more variables.
Co-efficient of Correlation:
It is the numerical measure of the amount of correlation
existing between two variables X and Y (the subject and
the relative, respectively). It is denoted by ‘r’.
The value of ‘r’ ranges from -1.0 to +1.0.
Types of Correlation:
On the basis of direction: Positive Correlation, Negative Correlation
On the basis of number of sets: Simple Correlation, Multiple Correlation, Partial Correlation, Total Correlation
On the basis of change: Linear Correlation, Non-linear Correlation
Uses of correlation:
1. To determine how strongly the scores of two variables are
associated or correlated with each other.
2. It helps in making important decisions.
3. It helps in estimating the value of one variable given the value of
another by studying the relationship between the two variables.
4. It can be used as a step before conducting a statistical
experiment, as it determines the degree of relationship between
the two variables.
Methods of Correlation:
A. Karl Pearson’s Coefficient of Correlation Method.
B. Spearman’s Rank Correlation.
Karl Pearson’s Coefficient of Correlation Method:
Karl Pearson (1857-1936)
British biometrician and statistician
Karl Pearson's coefficient of correlation is denoted by ‘r’.
Interpretation of correlation coefficient:
According to Karl Pearson, the coefficient of correlation lies between
two limits, +1 and -1. Within these limits, the value of the correlation
coefficient is interpreted as follows:
Degree of Correlation    Positive Range        Negative Range
1. Perfect               +1                    -1
2. Very High             +0.90 to +1.00        -0.90 to -1.00
3. High                  +0.75 to +0.90        -0.75 to -0.90
4. Moderate              +0.60 to +0.75        -0.60 to -0.75
5. Low                   +0.30 to +0.60        -0.30 to -0.60
6. Very Low              0 to +0.30            0 to -0.30
7. No Correlation        0                     0
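The interpretation table can be sketched as a small lookup function. This is only an illustration: the function name `degree_of_correlation` is hypothetical, and because adjacent bands in the table share their end points, the sketch assumes that a boundary value (for example 0.75) belongs to the higher band.

```python
def degree_of_correlation(r):
    """Return the degree label from the interpretation table for a coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    # Assumption: a boundary value is placed in the higher band.
    if size == 1:
        label = "Perfect"
    elif size >= 0.90:
        label = "Very high"
    elif size >= 0.75:
        label = "High"
    elif size >= 0.60:
        label = "Moderate"
    elif size >= 0.30:
        label = "Low"
    else:
        label = "Very low"
    return f"{label} {direction} correlation"


print(degree_of_correlation(-0.45))
print(degree_of_correlation(0.832))
```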
Utility of Probable Error (PE):
1. If |r| > 6 PE, the correlation is taken to be significant.
2. If |r| < 6 PE, the correlation is taken to be insignificant. This means that there is no
evidence of the existence of correlation between the two series.
3. It determines the upper and lower limits within which the correlation of a randomly selected
sample will fall.
Upper limit = r + PE
Lower limit = r - PE
Problems:
Calculate the coefficient of correlation between the
variables and comment.
X Y
6 9
2 11
10 5
4 8
8 7
Solution:
x     y     dx = (x-6)   dy = (y-8)   dxdy   dx²   dy²
6     9      0            1             0      0     1
2     11    -4            3           -12     16     9
10    5      4           -3           -12     16     9
4     8     -2            0             0      4     0
8     7      2           -1            -2      4     1
∑x = 30   ∑y = 40   ∑dxdy = -26   ∑dx² = 40   ∑dy² = 20
r = ∑dxdy / √(∑dx² × ∑dy²) = -26 / √(40 × 20) = -26 / 28.28 = -0.919
Therefore there is a very high degree of negative correlation.
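The deviation method used in this solution can be checked with a short script. This is a minimal sketch, assuming deviations are taken from the actual means (6 and 8 here); the function name `pearson_r` is illustrative and not part of the notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r using deviations from the means of x and y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


# Data of the problem above.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
r = pearson_r(x, y)
print(round(r, 3))  # -0.919
```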
Find the coefficient of correlation between the variables and comment
on the result.
X y
6 10
8 12
12 15
15 15
18 18
20 25
24 22
28 26
31 28
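Solution:
X     Y     dx = x-18   dy = y-19   dxdy   dx²   dy²
6     10    -12         -9          108    144   81
8     12    -10         -7          70     100   49
12    15    -6          -4          24     36    16
15    15    -3          -4          12     9     16
18    18     0          -1          0      0     1
20    25     2           6          12     4     36
24    22     6           3          18     36    9
28    26     10          7          70     100   49
31    28     13          9          117    169   81
∑x = 162   ∑y = 171   ∑dxdy = 431   ∑dx² = 598   ∑dy² = 338
r = ∑dxdy / √(∑dx² × ∑dy²) = 431 / √(598 × 338) = 431 / 449.58 = 0.959
Therefore there is a very high degree of positive correlation.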
Spearman’s Rank Correlation:
Charles Edward Spearman, a British psychologist, developed a formula to obtain the rank
correlation coefficient in 1904. It measures the correlation between
the ranks of ‘n’ individuals in two variables.
Spearman’s rank correlation is denoted by ‘rs’ and is based on the ranks of the variables. Values
are assigned ranks according to their size.
E.g.: fashion show, cooking contest.
STEPS IN CALCULATING RANK CORRELATION
STEP 1: Assign ranks to the values of each variable.
STEP 2: Take the difference between the two ranks (Rx - Ry) and denote it by ‘d’.
STEP 3: Square this difference to get d².
STEP 4: Apply the following formula.
If ranks are not repeated:
rs = 1 - 6∑d² / (n³ - n)
If ranks are repeated:
rs = 1 - 6{∑d² + (1/12)(m³ - m) + (1/12)(m³ - m) + ...} / (n³ - n)
where m is the number of times a value is repeated.
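The four steps for the case without repeated ranks can be sketched as follows. The helper names and the small data set are hypothetical and serve only as an illustration; the sketch assumes rank 1 goes to the largest value and that no value is repeated.

```python
def ranks_descending(values):
    """STEP 1: rank 1 goes to the largest value (assumes no repeated values)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rs(rank_x, rank_y):
    """STEPS 2 to 4: rs = 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)


# Hypothetical scores, invented only for this illustration.
scores_x = [50, 70, 60, 40]
scores_y = [55, 65, 70, 45]
rx = ranks_descending(scores_x)  # [3, 1, 2, 4]
ry = ranks_descending(scores_y)  # [3, 2, 1, 4]
rs = spearman_rs(rx, ry)
print(round(rs, 2))  # 0.8
```

Ranking in ascending order instead would give the same result, as long as both variables are ranked in the same direction.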
Problems:
In a beauty contest, 2 judges ranked 12 participants. What is the degree of agreement between the 2
judges?
Judge 1 Judge 2
3 6
4 10
1 12
5 3
2 9
10 2
6 5
9 8
8 7
7 4
12 1
11 11
Solution:
x     y     Rx    Ry    d = Rx-Ry    d²
3     6     10    7      3           9
4     10    9     3      6           36
1     12    12    1      11          121
5     3     8     10    -2           4
2     9     11    4      7           49
10    2     3     11    -8           64
6     5     7     8     -1           1
9     8     4     5     -1           1
8     7     5     6     -1           1
7     4     6     9     -3           9
12    1     1     12    -11          121
11    11    2     2      0           0
∑d² = 416
rs = 1 - 6∑d² / (n³ - n)
rs = 1 - (6 × 416) / (12³ - 12)
rs = 1 - 2496 / (1728 - 12)
rs = 1 - 2496 / 1716
rs = 1 - 1.45
rs = -0.45
Therefore there is a low degree of negative rank
correlation.
Calculate Rank Correlation co-efficient for the following.
x y
60 75
34 32
40 35
50 40
45 45
41 33
22 12
43 30
42 36
66 72
64 41
46 57
Solution:
X     Y     Rx    Ry    d = Rx-Ry    d²
60    75    3     1      2           4
34    32    11    10     1           1
40    35    10    8      2           4
50    40    4     6     -2           4
45    45    6     4      2           4
41    33    9     9      0           0
22    12    12    12     0           0
43    30    7     11    -4           16
42    36    8     7      1           1
66    72    1     2     -1           1
64    41    2     5     -3           9
46    57    5     3      2           4
∑d² = 48
rs = 1 - 6∑d² / (n³ - n)
rs = 1 - (6 × 48) / (12³ - 12)
rs = 1 - 288 / (1728 - 12)
rs = 1 - 288 / 1716
rs = 1 - 0.1678
rs = 0.832
Therefore there is a high degree of positive
rank correlation.
Calculate rank correlation co-efficient between the ranks given for x and y variables.
X Y
6 5
4 6
5 3
3 4
1 1
2 2
Solution:
Rx    Ry    d = Rx-Ry    d²
6     5      1           1
4     6     -2           4
5     3      2           4
3     4     -1           1
1     1      0           0
2     2      0           0
∑d² = 10
rs = 1 - 6∑d² / (n³ - n)
rs = 1 - (6 × 10) / (6³ - 6)
rs = 1 - 60 / (216 - 6)
rs = 1 - 60 / 210
rs = 1 - 0.286
rs = 0.714
Therefore there is a moderate degree
of positive correlation.
Calculate rank correlation co-efficient for the following data.
Marks in Marks in
Statistics Accounts
115 75
109 73
112 85
87 70
98 76
120 82
98 65
100 73
98 68
118 80
Solution:
x      y     Rx    Ry     d = Rx-Ry    d²
115    75    3     5      -2           4
109    73    5     6.5    -1.5         2.25
112    85    4     1       3           9
87     70    10    8       2           4
98     76    8     4       4           16
120    82    1     2      -1           1
98     65    8     10     -2           4
100    73    6     6.5    -0.5         0.25
98     68    8     9      -1           1
118    80    2     3      -1           1
∑d² = 42.5
In x, 98 is repeated 3 times (m = 3): rank = (7+8+9) / 3 = 8
In y, 73 is repeated 2 times (m = 2): rank = (6+7) / 2 = 6.5
rs = 1 - 6{∑d² + (1/12)(m³ - m) + (1/12)(m³ - m)} / (n³ - n)
rs = 1 - 6[42.5 + (1/12)(3³ - 3) + (1/12)(2³ - 2)] / (10³ - 10)
rs = 1 - 6[42.5 + (1/12)(27 - 3) + (1/12)(8 - 2)] / (1000 - 10)
rs = 1 - 6[42.5 + (1/12)(24) + (1/12)(6)] / 990
rs = 1 - 6[42.5 + 2 + 0.5] / 990
rs = 1 - (6 × 45) / 990 = 1 - 270 / 990 = 1 - 0.27 = 0.73
Therefore there is a moderate degree of positive correlation.
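The repeated-rank calculation can be reproduced with a short script. This is a minimal sketch, assuming rank 1 goes to the highest mark and that tied values share the average of the positions they occupy; the function names are illustrative only.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value that is repeated m times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rs_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n ** 3 - n)


# Marks from the problem above.
marks_statistics = [115, 109, 112, 87, 98, 120, 98, 100, 98, 118]
marks_accounts = [75, 73, 85, 70, 76, 82, 65, 73, 68, 80]
rs = spearman_rs_with_ties(marks_statistics, marks_accounts)
print(round(rs, 2))  # 0.73
```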
Calculate rank correlation co-efficient for the following data.
x Y
60 75
34 32
40 35
50 40
45 45
41 33
22 45
43 50
42 45
66 40
Solution:
x     y     Rx    Ry     d = Rx-Ry    d²
60    75    2     1       1           1
34    32    9     10     -1           1
40    35    8     8       0           0
50    40    3     6.5    -3.5         12.25
45    45    4     4       0           0
41    33    7     9      -2           4
22    45    10    4       6           36
43    50    5     2       3           9
42    45    6     4       2           4
66    40    1     6.5    -5.5         30.25
∑d² = 97.5
In y, 45 is repeated 3 times (m = 3): rank = (3+4+5) / 3 = 4
In y, 40 is repeated 2 times (m = 2): rank = (6+7) / 2 = 6.5
rs = 1 - 6{∑d² + (1/12)(m³ - m) + (1/12)(m³ - m)} / (n³ - n)
rs = 1 - 6[97.5 + (1/12)(3³ - 3) + (1/12)(2³ - 2)] / (10³ - 10)
rs = 1 - 6[97.5 + (1/12)(27 - 3) + (1/12)(8 - 2)] / (1000 - 10)
rs = 1 - 6[97.5 + (1/12)(24) + (1/12)(6)] / 990
rs = 1 - 6[97.5 + 2 + 0.5] / 990
rs = 1 - (6 × 100) / 990 = 1 - 600 / 990 = 1 - 0.606 = 0.394
Therefore there is a low degree of positive correlation.
5. Calculate the coefficient of correlation from the following data and calculate its probable
error.
Marks in Marks in
statistics Accountancy
30 06
60 36
30 12
66 48
72 30
24 06
18 24
12 36
42 30
06 12
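Solution:
x     y     dx = x-36   dy = y-24   dxdy   dx²    dy²
30    06    -6          -18         108    36     324
60    36     24          12         288    576    144
30    12    -6          -12         72     36     144
66    48     30          24         720    900    576
72    30     36          6          216    1296   36
24    06    -12         -18         216    144    324
18    24    -18          0          0      324    0
12    36    -24          12        -288    576    144
42    30     6           6          36     36     36
06    12    -30         -12         360    900    144
∑x = 360   ∑y = 240   ∑dxdy = 1728   ∑dx² = 4824   ∑dy² = 1872
r = ∑dxdy / √(∑dx² × ∑dy²) = 1728 / √(4824 × 1872) = 1728 / 3005.08 = 0.575
PE = 0.6745 × (1 - r²) / √n = 0.6745 × (1 - 0.575²) / √10 = 0.143
Since |r| = 0.575 is less than 6 PE = 6 × 0.143 = 0.858, the correlation is insignificant.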
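The coefficient and its probable error for this data can be checked with a short script. This is a sketch under one stated assumption: the notes do not spell out the probable error formula, so the usual textbook definition PE = 0.6745 × (1 - r²) / √n is used here. The function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r using deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def probable_error(r, n):
    """Assumed textbook formula: PE = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# Marks from the problem above.
statistics_marks = [30, 60, 30, 66, 72, 24, 18, 12, 42, 6]
accountancy_marks = [6, 36, 12, 48, 30, 6, 24, 36, 30, 12]

r = pearson_r(statistics_marks, accountancy_marks)
pe = probable_error(r, len(statistics_marks))
significant = abs(r) > 6 * pe  # rule: significant only if |r| > 6 PE

print(round(r, 3), round(pe, 3), significant)
print("limits:", round(r - pe, 3), "to", round(r + pe, 3))
```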
Regression Analysis
• It is a technique for predicting the value of a dependent variable on the basis of an independent
variable.
• In a cause and effect relationship, the independent variable is the cause and the dependent
variable is the effect. Regression is the statistical procedure for determining the relationship between
the values of the dependent and independent variables.
• The concept was introduced by the British biometrician Sir Francis Galton in the 19th century.
• The statistical technique that expresses the relationship between two or more variables
in the form of an equation to estimate the value of a variable, based on the given
value of another variable, is called Regression Analysis.
In Regression Analysis there are two types of variables:
Dependent Variable (Y)
The variable whose value is estimated using the algebraic equation is called
Dependent Variable.
Independent Variable (X)
The variable whose value is used to estimate or predict another variable is
called Independent Variable.
Uses of Regression:
I. It is used to predict a continuous dependent variable from a number of
independent variables.
II. It is used by the management accountant for both planning and control
purposes.
III. It is useful in indicating the degree of association or correlation that exists between
the two variables.
IV. It provides estimates of values of the dependent variables from the values of
independent variables.
Difference between correlation and regression:
1. Correlation: It measures the degree and direction of the relationship between the variables.
   Regression: It measures the nature and extent of the relationship between the two variables.
2. Correlation: It tests the closeness between the two variables.
   Regression: It estimates future values of the dependent variable.
3. Correlation: It does not indicate a cause and effect relationship between the variables.
   Regression: It indicates a cause and effect relationship between the variables.
4. Correlation: Both the variables are interdependent.
   Regression: One variable is independent and the other is dependent.
5. Correlation: There may be nonsense correlation between two variables.
   Regression: There is no such nonsense regression.
6. Correlation: It has limited application.
   Regression: It has wider application.
Problems:
1. Find the two regression equations from the following data.
X Y
2 10
4 20
6 25
8 30
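Solution:
Mean of x = 20 / 4 = 5, Mean of y = 85 / 4 = 21.25
x     y     X = x-5   Y = y-21.25   XY      X²   Y²
2     10    -3        -11.25        33.75   9    126.56
4     20    -1        -1.25         1.25    1    1.56
6     25     1         3.75         3.75    1    14.06
8     30     3         8.75         26.25   9    76.56
∑x = 20   ∑y = 85   ∑XY = 65   ∑X² = 20   ∑Y² = 218.74
bxy = ∑XY / ∑Y² = 65 / 218.74 = 0.297
byx = ∑XY / ∑X² = 65 / 20 = 3.25
Regression equation of x on y: x - 5 = 0.297 (y - 21.25), i.e. x = 0.297y - 1.311
Regression equation of y on x: y - 21.25 = 3.25 (x - 5), i.e. y = 3.25x + 5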
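The two regression coefficients and the resulting lines can be computed directly from the raw data. This is a minimal sketch, assuming deviations are taken from the actual means; the function name `regression_lines` is illustrative. Because the script does not round the squared deviations, its figures may differ slightly in the third decimal place from a hand calculation.

```python
def regression_lines(xs, ys):
    """Return ((bxy, a_x), (byx, a_y)) for x = bxy*y + a_x and y = byx*x + a_y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    X = [x - mean_x for x in xs]  # deviations of x from its mean
    Y = [y - mean_y for y in ys]  # deviations of y from its mean
    sum_XY = sum(a * b for a, b in zip(X, Y))
    sum_X2 = sum(a * a for a in X)
    sum_Y2 = sum(b * b for b in Y)
    bxy = sum_XY / sum_Y2  # regression coefficient of x on y
    byx = sum_XY / sum_X2  # regression coefficient of y on x
    a_x = mean_x - bxy * mean_y
    a_y = mean_y - byx * mean_x
    return (bxy, a_x), (byx, a_y)


# Data of the problem above.
(bxy, a_x), (byx, a_y) = regression_lines([2, 4, 6, 8], [10, 20, 25, 30])
print(f"x on y: x = {bxy:.3f}y {a_x:+.3f}")
print(f"y on x: y = {byx:.3f}x {a_y:+.3f}")
```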
2. Calculate,
a. Two regression equations.
b. Estimate the value of x when y is 20.
c. Determine the value of co-efficient of correlation through regression co-efficient.
X Y
10 5
12 6
13 7
17 9
18 13
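Solution:
Mean of x = 70 / 5 = 14, Mean of y = 40 / 5 = 8
x     y     X = x-14   Y = y-8   XY   X²   Y²
10    5     -4         -3        12   16   9
12    6     -2         -2        4    4    4
13    7     -1         -1        1    1    1
17    9      3          1        3    9    1
18    13     4          5        20   16   25
∑x = 70   ∑y = 40   ∑XY = 40   ∑X² = 46   ∑Y² = 40
a. Two regression equations
bxy = ∑XY / ∑Y² = 40 / 40 = 1
byx = ∑XY / ∑X² = 40 / 46 = 0.869
Regression equation of x on y: x - 14 = 1 (y - 8), i.e. x = 1y + 6
Regression equation of y on x: y - 8 = 0.869 (x - 14), i.e. y = 0.869x - 4.166
b. Estimate the value of x when y is 20
x = 1y + 6
x = 1 (20) + 6
x = 20 + 6
x = 26
c. Determine the value of the co-efficient of correlation through the regression co-efficients
r = √(bxy × byx) = √(1 × 0.869) = 0.932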
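Part c relies on the standard relation that r is the square root of the product of the two regression coefficients, taking their common sign. A minimal sketch follows; the function name is illustrative, and the inputs are the coefficients bxy = 40/40 and byx = 40/46 obtained from the sums in this problem.

```python
from math import copysign, sqrt


def r_from_regression_coefficients(bxy, byx):
    """r = sqrt(bxy * byx), carrying the common sign of the two coefficients."""
    if bxy * byx < 0:
        raise ValueError("both regression coefficients must have the same sign")
    return copysign(sqrt(bxy * byx), bxy)


r = r_from_regression_coefficients(40 / 40, 40 / 46)
print(round(r, 2))  # 0.93
```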
Calculate the two regression equations from the following data.
x y
Mean 36 85
Standard deviation 11 8
r = 0.66
Solution:
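When only the means, standard deviations and r are given, the regression coefficients follow from the standard relations bxy = r × (SD of x / SD of y) and byx = r × (SD of y / SD of x). The sketch below applies them to the figures of this problem; the function name `regression_from_summary` is illustrative, and the relations are the usual textbook ones rather than something stated explicitly in these notes.

```python
def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((bxy, a_x), (byx, a_y)) for x = bxy*y + a_x and y = byx*x + a_y."""
    bxy = r * sd_x / sd_y  # regression coefficient of x on y
    byx = r * sd_y / sd_x  # regression coefficient of y on x
    a_x = mean_x - bxy * mean_y
    a_y = mean_y - byx * mean_x
    return (bxy, a_x), (byx, a_y)


# Means 36 and 85, standard deviations 11 and 8, r = 0.66.
(bxy, a_x), (byx, a_y) = regression_from_summary(36, 85, 11, 8, 0.66)
print(f"x on y: x = {bxy:.4f}y {a_x:+.4f}")
print(f"y on x: y = {byx:.4f}x {a_y:+.4f}")
```

The same function can be reused for the other summary-data problems in this section by changing the five arguments.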
Obtain two regression equations from the following data.
x y
Mean 20 120
Standard deviation 5 125
r = 0.8
Find x when y = 25 and find y when x = 150.
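Solution:
bxy = r × (SD of x / SD of y) = 0.8 × 5 / 125 = 0.032
byx = r × (SD of y / SD of x) = 0.8 × 125 / 5 = 20
Regression equation of x on y: x - 20 = 0.032 (y - 120), i.e. x = 0.032y + 16.16
Regression equation of y on x: y - 120 = 20 (x - 20), i.e. y = 20x - 280
Find x when y = 25
x = 0.032y + 16.16
x = 0.032 (25) + 16.16
x = 0.8 + 16.16
x = 16.96
Find y when x = 150
y = 20x - 280
y = 20 (150) - 280
y = 3000 - 280
y = 2720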
Calculate two regression equation for the following.
x y
Mean 22 36
SD 13 10
Co-efficient of correlation= 0.8
Solution:
16. Given below is the information about the advertising expenses and sales of a company.
a) Calculate the 2 regression equations.
b) Find the likely sales when advertising expenses are Rs. 20.
c) What should the advertising expenses be when the company's sales are 190 lakhs?
Advertising expenses(x) Sales(y)
Mean 10 90
Variance 9 144
Co-efficient of correlation = 0.8
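Solution:
a) Calculate the 2 regression equations
SD of x = √9 = 3, SD of y = √144 = 12
byx = r × (SD of y / SD of x) = 0.8 × 12 / 3 = 3.2
bxy = r × (SD of x / SD of y) = 0.8 × 3 / 12 = 0.2
Regression equation of y on x: y - 90 = 3.2 (x - 10), i.e. y = 3.2x + 58
Regression equation of x on y: x - 10 = 0.2 (y - 90), i.e. x = 0.2y - 8
b) Find the likely sales when advertising expenses are Rs. 20
y = 3.2x + 58
y = 3.2 (20) + 58
y = 64 + 58
y = 122
Therefore the likely sales are Rs. 122 lakhs when advertising expenses are Rs. 20.
c) What should the advertising expenses be when the company's sales are 190 lakhs?
x = 0.2y - 8
x = 0.2 (190) - 8
x = 38 - 8
x = 30
Therefore the advertising expenses should be Rs. 30 when the company's sales are 190 lakhs.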