Correlation and Regression

The document provides an overview of correlation, including its definition, types, and methods for calculating correlation coefficients such as Karl Pearson's and Spearman's Rank correlation. It outlines the interpretation of correlation coefficients, their significance, and practical applications in decision-making and statistical analysis. Additionally, the document includes examples and problems to illustrate the calculation of correlation coefficients and their interpretation.

Uploaded by

backup4sudarshan

B.Com III SEM


Correlation:

Correlation is a statistical technique that studies the relationship between two or more variables.

Co-efficient of Correlation:

It is the numerical measure of the degree of correlation existing between two variables X and Y, the subject and the relative respectively. It is denoted by ‘r’.

The value of ‘r’ ranges from -1.0 to +1.0.


Types of Correlation on the basis of:

Direction               Number of Sets          Change
Positive Correlation    Simple Correlation      Linear Correlation
Negative Correlation    Multiple Correlation    Non-linear Correlation
                        Partial Correlation
                        Total Correlation
Uses of correlation:

1. To determine how strongly the scores of two variables are associated or correlated with each other.
2. It helps in making important decisions.
3. It helps in estimating the value of one variable, given the value of another, by studying the relationship between the two variables.
4. It can be used as a step before conducting a statistical experiment, as it determines the degree of relationship between the two variables.
Methods of Correlation:

A. Karl Pearson’s Coefficient of Correlation Method.

B. Spearman’s Rank Correlation.


Karl Pearson’s Coefficient of Correlation Method:

• Karl Pearson (1857-1936)

• British biometrician and statistician

• Karl Pearson's coefficient of correlation is denoted by ‘r’.


Interpretation of correlation coefficient:

According to Karl Pearson, the coefficient of correlation lies between two limits, +1 and -1. Within these limits, the value of the correlation coefficient is interpreted as follows:

Degree of Correlation    Positive Range       Negative Range
                         (from +1 to 0)       (from -1 to 0)
1. Perfect               +1                   -1
2. Very High             +1.00 to +0.90       -0.90 to -1.00
3. High                  +0.90 to +0.75       -0.75 to -0.90
4. Moderate              +0.75 to +0.60       -0.60 to -0.75
5. Low                   +0.60 to +0.30       -0.30 to -0.60
6. Very Low              +0.30 to +0.00       -0.00 to -0.30
7. No Correlation        0                    0
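The table can be read as a lookup from the size of r to a label. A minimal Python sketch follows. The function name `describe_correlation` is hypothetical, and because adjacent ranges in the table share their end points (0.90 appears under both "Very High" and "High"), placing each shared boundary in the higher band is an assumption made here, not something the table states.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the degree labels of the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    # Assumption: a shared boundary value belongs to the higher band.
    if size == 1.0:
        degree = "perfect"
    elif size >= 0.90:
        degree = "very high"
    elif size >= 0.75:
        degree = "high"
    elif size >= 0.60:
        degree = "moderate"
    elif size >= 0.30:
        degree = "low"
    else:
        degree = "very low"
    direction = "positive" if r > 0 else "negative"
    return f"{degree} {direction}"


print(describe_correlation(-0.92))  # very high negative
print(describe_correlation(0.714))  # moderate positive
```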
Utility of Probable error:

1. If |r| > 6 PE, the correlation is taken to be significant.

2. If |r| < 6 PE, the correlation is taken to be insignificant. This means that there is no evidence of the existence of correlation in the two series.

3. It determines the upper and lower limits within which the correlation of a randomly selected sample will fall.
Upper limit = r + PE
Lower limit = r - PE
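The rules above use PE without showing how it is obtained. The usual textbook definition, assumed here because the source does not display it, is PE = 0.6745 × (1 - r²) / √n, where n is the number of pairs. A minimal Python sketch under that assumption; the function names and the sample values r = 0.8, n = 25 are illustrative only:

```python
import math


def probable_error(r, n):
    """Probable error of r for n pairs (assumed textbook formula)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def is_significant(r, n):
    """Rules 1 and 2: significant only when |r| exceeds 6 * PE."""
    return abs(r) > 6 * probable_error(r, n)


def limits(r, n):
    """Rule 3: (lower, upper) = (r - PE, r + PE)."""
    pe = probable_error(r, n)
    return r - pe, r + pe


# Hypothetical values, not taken from the source.
print(round(probable_error(0.8, 25), 4))
print(is_significant(0.8, 25))
print(limits(0.8, 25))
```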
Problems:

Calculate the coefficient of correlation between the variables and comment.

X     Y
6     9
2     11
10    5
4     8
8     7

Solution:

x          y          dx = (x-6)    dy = (y-8)    dxdy    dx²    dy²
6          9          0             1             0       0      1
2          11         -4            3             -12     16     9
10         5          4             -3            -12     16     9
4          8          -2            0             0       4      0
8          7          2             -1            -2      4      1
∑x = 30    ∑y = 40                                -26     40     20

Therefore there is a very high degree of negative correlation.
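The value of r behind this conclusion is not shown. A minimal Python sketch of the deviation method implied by the column headings, assuming the formula r = ∑dxdy / √(∑dx² × ∑dy²) with deviations taken from the actual means; the function name `pearson_r` is illustrative:

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations taken about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
print(round(pearson_r(x, y), 3))  # -0.919
```

With the column totals of the table this is -26 / √(40 × 20), which lies in the -0.90 to -1.00 band.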
Find the coefficient of correlation between the variables and comment on the result.

X     Y
6     10
8     12
12    15
15    15
18    18
20    25
24    22
28    26
31    28

Solution:

X           Y           dx = x-18    dy = y-19    dxdy    dx²    dy²
6           10          -12          -9           108     144    81
8           12          -10          -7           70      100    49
12          15          -6           -4           24      36     16
15          15          -3           -4           12      9      16
18          18          0            -1           0       0      1
20          25          2            6            12      4      36
24          22          6            3            18      36     9
28          26          10           7            70      100    49
31          28          13           9            117     169    81
∑x = 162    ∑y = 171                              431     598    338
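The final step of this solution is not shown. As a sketch, assuming the same deviation formula that the column headings imply, the column totals give:

```latex
r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}
  = \frac{431}{\sqrt{598 \times 338}}
  = \frac{431}{449.58}
  \approx 0.959
```

On the interpretation scale given earlier, a value of about +0.96 would indicate a very high degree of positive correlation.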
Spearman’s Rank correlation:

Charles Edward Spearman, a British psychologist, developed a formula for the rank correlation coefficient in 1904. It measures the correlation between the ranks of ‘n’ individuals in two variables.

Spearman’s Rank Correlation is denoted by ‘rs’. It is based on the ranks of the variables, which are assigned according to their size.

E.g.: fashion show, cooking contest.


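STEPS IN CALCULATING RANK CORRELATION

STEP 1: Assign ranks to the given variables.

STEP 2: Take the difference of the two ranks (Rx - Ry) and denote the difference by ‘d’.

STEP 3: Square this difference to get d².

STEP 4: Apply the following formula.

If ranks are not repeated:

$$r_s = 1 - \frac{6\sum d^2}{n^3 - n}$$

If ranks are repeated:

$$r_s = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right\}}{n^3 - n}$$

Here n is the number of pairs and m is the number of items sharing a repeated rank, with one correction term for each group of repeated ranks.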
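The four steps can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `spearman_rs`, the `tie_sizes` parameter and the five-entry example ranks are all hypothetical.

```python
def spearman_rs(rank_x, rank_y, tie_sizes=()):
    """Rank correlation from two lists of ranks.

    tie_sizes holds m for every group of repeated ranks (in either series);
    leave it empty when no rank is repeated.
    """
    n = len(rank_x)
    # Steps 2 and 3: differences of ranks, squared and summed.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # One (m^3 - m) / 12 term per group of repeated ranks.
    correction = sum((m ** 3 - m) / 12 for m in tie_sizes)
    # Step 4: apply the formula.
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


# Hypothetical ranks given by two judges to five entries (not from the source).
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rs(judge_a, judge_b))  # 0.8
```

Passing the tie group sizes separately keeps the function close to the hand calculation, where the correction terms are read off while ranking.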
Problems:

In a beauty contest, 2 judges ranked 12 participants. What is the degree of agreement between the 2 judges?

Judge 1 Judge 2
3 6
4 10
1 12
5 3
2 9
10 2
6 5
9 8
8 7
7 4
12 1
11 11
Solution:

x     y     Rx    Ry    d = Rx-Ry    d²
3     6     10    7     3            9
4     10    9     3     6            36
1     12    12    1     11           121
5     3     8     10    -2           4
2     9     11    4     7            49
10    2     3     11    -8           64
6     5     7     8     -1           1
9     8     4     5     -1           1
8     7     5     6     -1           1
7     4     6     9     -3           9
12    1     1     12    -11          121
11    11    2     2     0            0
                        ∑d² =        416

$$r_s = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 416}{12^3 - 12} = 1 - \frac{2496}{1728 - 12} = 1 - \frac{2496}{1716} = 1 - 1.45 = -0.45$$

Therefore there is a low degree of negative rank correlation.
Calculate Rank Correlation co-efficient for the following.

x y
60 75
34 32
40 35
50 40
45 45
41 33
22 12
43 30
42 36
66 72
64 41
46 57
Solution:

X     Y     Rx    Ry    d = Rx-Ry    d²
60    75    3     1     2            4
34    32    11    10    1            1
40    35    10    8     2            4
50    40    4     6     -2           4
45    45    6     4     2            4
41    33    9     9     0            0
22    12    12    12    0            0
43    30    7     11    -4           16
42    36    8     7     1            1
66    72    1     2     -1           1
64    41    2     5     -3           9
46    57    5     3     2            4
                        ∑d² =        48

$$r_s = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 48}{12^3 - 12} = 1 - \frac{288}{1728 - 12} = 1 - \frac{288}{1716} = 1 - 0.1678 = 0.832$$

Therefore there is a high degree of positive rank correlation.
Calculate rank correlation co-efficient between the ranks given for x and y variables.

X Y

6 5

4 6

5 3

3 4

1 1

2 2
Solution:

Rx    Ry    d = Rx-Ry    d²
6     5     1            1
4     6     -2           4
5     3     2            4
3     4     -1           1
1     1     0            0
2     2     0            0
            ∑d² =        10

$$r_s = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 10}{6^3 - 6} = 1 - \frac{60}{216 - 6} = 1 - \frac{60}{210} = 1 - 0.286 = 0.714$$

Therefore there is a moderate degree of positive correlation.
Calculate rank correlation co-efficient for the following data.

Marks in Marks in
Statistics Accounts
115 75

109 73

112 85

87 70

98 76

120 82

98 65

100 73

98 68

118 80
Solution:

x      y     Rx    Ry     d = Rx-Ry    d²
115    75    3     5      -2           4
109    73    5     6.5    -1.5         2.25
112    85    4     1      3            9
87     70    10    8      2            4
98     76    8     4      4            16
120    82    1     2      -1           1
98     65    8     10     -2           4
100    73    6     6.5    -0.5         0.25
98     68    8     9      -1           1
118    80    2     3      -1           1
                          ∑d² =        42.5

Repeated ranks: the value 98 occurs three times in x and shares ranks 7, 8 and 9, so each gets (7+8+9)/3 = 8 (m = 3). The value 73 occurs twice in y and shares ranks 6 and 7, so each gets (6+7)/2 = 6.5 (m = 2).

$$r_s = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m)\right\}}{n^3 - n}$$

$$r_s = 1 - \frac{6\left[42.5 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10}$$

$$= 1 - \frac{6\left[42.5 + \frac{1}{12}(27 - 3) + \frac{1}{12}(8 - 2)\right]}{1000 - 10}$$

$$= 1 - \frac{6\left[42.5 + \frac{1}{12}(24) + \frac{1}{12}(6)\right]}{990}$$

$$= 1 - \frac{6\,[42.5 + 2 + 0.5]}{990} = 1 - \frac{6 \times 45}{990} = 1 - \frac{270}{990} = 1 - 0.27 = 0.73$$

Therefore there is a moderate degree of positive correlation.
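The averaging of repeated ranks used in this solution can be sketched in Python. This is a minimal illustration assuming ranks are assigned in descending order (largest value gets rank 1), as in the table; the helper names `average_ranks`, `tie_sizes` and `spearman_with_ties` are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank in descending order; repeated values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_sizes(values):
    """Size m of every group of repeated values."""
    return [count for count in Counter(values).values() if count > 1]


def spearman_with_ties(xs, ys):
    """Rank correlation with one (m^3 - m) / 12 term per group of repeats."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12 for m in tie_sizes(xs) + tie_sizes(ys))
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


statistics_marks = [115, 109, 112, 87, 98, 120, 98, 100, 98, 118]
accounts_marks = [75, 73, 85, 70, 76, 82, 65, 73, 68, 80]
print(average_ranks(statistics_marks))
print(round(spearman_with_ties(statistics_marks, accounts_marks), 2))  # 0.73
```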


Calculate rank correlation co-efficient for the following data.

x Y

60 75

34 32

40 35

50 40

45 45

41 33

22 45

43 50

42 45

66 40
Solution:

x     y     Rx    Ry     d = Rx-Ry    d²
60    75    2     1      1            1
34    32    9     10     -1           1
40    35    8     8      0            0
50    40    3     6.5    -3.5         12.25
45    45    4     4      0            0
41    33    7     9      -2           4
22    45    10    4      6            36
43    50    5     2      3            9
42    45    6     4      2            4
66    40    1     6.5    -5.5         30.25
                         ∑d² =        97.5

Repeated ranks: the value 45 occurs three times in y and shares ranks 3, 4 and 5, so each gets (3+4+5)/3 = 4 (m = 3). The value 40 occurs twice in y and shares ranks 6 and 7, so each gets (6+7)/2 = 6.5 (m = 2).

$$r_s = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m)\right\}}{n^3 - n}$$

$$r_s = 1 - \frac{6\left[97.5 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10}$$

$$= 1 - \frac{6\left[97.5 + \frac{1}{12}(27 - 3) + \frac{1}{12}(8 - 2)\right]}{1000 - 10}$$

$$= 1 - \frac{6\left[97.5 + \frac{1}{12}(24) + \frac{1}{12}(6)\right]}{990}$$

$$= 1 - \frac{6\,[97.5 + 2 + 0.5]}{990} = 1 - \frac{6 \times 100}{990} = 1 - \frac{600}{990} = 1 - 0.606 = 0.394$$

Therefore there is a low degree of positive correlation.


5. Calculate the coefficient of correlation from the following data and calculate its probable
error.

Marks in Marks in
statistics Accountancy
30 06

60 36

30 12

66 48

72 30

24 06

18 24

12 36

42 30

06 12
Solution:

x           y           dx = x-36    dy = y-24    dxdy    dx²     dy²
30          06          -6           -18          108     36      324
60          36          24           12           288     576     144
30          12          -6           -12          72      36      144
66          48          30           24           720     900     576
72          30          36           6            216     1296    36
24          06          -12          -18          216     144     324
18          24          -18          0            0       324     0
12          36          -24          12           -288    576     144
42          30          6            6            36      36      36
06          12          -30          -12          360     900     144
∑x = 360    ∑y = 240                              1728    4824    1872
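The remaining steps of this solution are not shown. A sketch of how they would go, assuming the deviation formula implied by the column headings and the usual textbook definition of probable error (the source does not display either formula), with n = 10 pairs:

```latex
r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}
  = \frac{1728}{\sqrt{4824 \times 1872}}
  = \frac{1728}{3005.08}
  \approx 0.575

PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}
   = 0.6745 \times \frac{1 - 0.3307}{\sqrt{10}}
   \approx 0.143
```

Under these assumptions 6 PE ≈ 0.857, which exceeds |r| ≈ 0.575, so by the rule stated under the utility of probable error the correlation would be taken as insignificant.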
Regression Analysis

• It is a technique for predicting the value of a dependent variable on the basis of an independent variable.

• In a cause and effect relationship, the independent variable is the cause and the dependent variable is the effect. It is the statistical procedure for determining the relationship between the values of the dependent and independent variables.

• The term was introduced by the British biometrician Sir Francis Galton in the 19th century.

• The statistical technique that expresses the relationship between two or more variables in the form of an equation, in order to estimate the value of one variable from the given value of another, is called Regression Analysis.
In Regression Analysis there are two types of variables:

• Dependent Variable (Y)

The variable whose value is estimated using the algebraic equation is called the Dependent Variable.

• Independent Variable (X)

The variable whose value is used to estimate or predict another variable is called the Independent Variable.
Uses of Regression:

I. It is used to predict a continuous dependent variable from a number of independent variables.

II. It is used by the management accountant for both planning and control purposes.

III. It is useful in indicating the degree of association or correlation that exists between the two variables.

IV. It provides estimates of values of the dependent variable from the values of the independent variables.
Difference between correlation and regression:

1. Correlation: It measures the degree and direction of relationship between the variables.
   Regression: It measures the nature and extent of relationship between the two variables.
2. Correlation: It tests the closeness between the two variables.
   Regression: It estimates future values of the dependent variable.
3. Correlation: It does not indicate a cause and effect relationship between the variables.
   Regression: It indicates a cause and effect relationship between the variables.
4. Correlation: Both the variables are interdependent.
   Regression: One variable is independent and the other is dependent.
5. Correlation: There may be nonsense correlation between two variables.
   Regression: There is no such nonsense regression.
6. Correlation: It has limited application.
   Regression: It has wider application.
Problems:

1. Find the two regression equations from the following data.

X Y

2 10

4 20

6 25

8 30
Solution:

x          y          X = x-5    Y = y-21.25    XY       X²    Y²
2          10         -3         -11.25         33.75    9     126.56
4          20         -1         -1.25          1.25     1     1.56
6          25         1          3.75           3.75     1     14.06
8          30         3          8.75           26.25    9     76.56
∑x = 20    ∑y = 85                              65       20    218.74

$$b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{65}{218.74} = 0.297$$

$$b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{65}{20} = 3.25$$
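The two regression equations themselves are not written out in this solution. A minimal Python sketch of how they follow from the coefficients, assuming the standard forms x - x̄ = bxy (y - ȳ) and y - ȳ = byx (x - x̄); the function name `regression_lines` is illustrative. The code uses unrounded squares, so its ∑Y² is 218.75 rather than the rounded 218.74 of the table.

```python
def regression_lines(xs, ys):
    """Return (bxy, a_x, byx, a_y) so that x = bxy*y + a_x and y = byx*x + a_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from the means, as in the X and Y columns of the table.
    X = [x - mean_x for x in xs]
    Y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(X, Y))
    sum_x2 = sum(a * a for a in X)
    sum_y2 = sum(b * b for b in Y)
    bxy = sum_xy / sum_y2  # regression coefficient of x on y
    byx = sum_xy / sum_x2  # regression coefficient of y on x
    return bxy, mean_x - bxy * mean_y, byx, mean_y - byx * mean_x


bxy, a_x, byx, a_y = regression_lines([2, 4, 6, 8], [10, 20, 25, 30])
print(f"x on y: x = {bxy:.3f}y + ({a_x:.3f})")
print(f"y on x: y = {byx:.2f}x + {a_y:.2f}")  # y on x: y = 3.25x + 5.00
```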
2. Calculate,
a. Two regression equations.
b. Estimate the value of x when y is 20.
c. Determine the value of co-efficient of correlation through regression co-efficient.

X Y

10 5

12 6

13 7

17 9

18 13
Solution:

x          y          X = x-14    Y = y-8    XY    X²    Y²
10         5          -4          -3         12    16    9
12         6          -2          -2         4     4     4
13         7          -1          -1         1     1     1
17         9          3           1          3     9     1
18         13         4           5          20    16    25
∑x = 70    ∑y = 40                           40    46    40

$$b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{40}{40} = 1$$

$$b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{40}{46} = 0.869$$
b. Estimate the value of x when y is 20

x= 1y + 6
x = 1 (20) + 6
x = 20 + 6

x = 26

c. Determine the value of co-efficient of correlation through regression co-efficient
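The working for part (c) is not shown. As a sketch, using the standard relation that r is the square root of the product of the two regression coefficients and takes their common sign (both are positive here):

```latex
r = \pm\sqrt{b_{xy} \times b_{yx}} = +\sqrt{1 \times 0.869} \approx 0.932
```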


Calculate the two regression equations from the following data.

x y
Mean 36 85
Standard deviation 11 8
r = 0.66

Solution:
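The worked solution is not reproduced here. A minimal Python sketch of how regression equations can be obtained from means, standard deviations and r, assuming the standard relations bxy = r σx / σy and byx = r σy / σx; the function name `regression_from_summary` is illustrative.

```python
def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((bxy, a_x), (byx, a_y)) for x = bxy*y + a_x and y = byx*x + a_y."""
    bxy = r * sd_x / sd_y  # regression coefficient of x on y
    byx = r * sd_y / sd_x  # regression coefficient of y on x
    # Each line passes through the point of means (mean_x, mean_y).
    a_x = mean_x - bxy * mean_y
    a_y = mean_y - byx * mean_x
    return (bxy, a_x), (byx, a_y)


(bxy, a_x), (byx, a_y) = regression_from_summary(36, 85, 11, 8, 0.66)
print(f"x on y: x = {bxy:.4f}y + ({a_x:.4f})")
print(f"y on x: y = {byx:.2f}x + {a_y:.2f}")
```

Under these assumptions the coefficients come out as bxy = 0.9075 and byx = 0.48, and their product 0.4356 equals r² = 0.66², which is a quick consistency check.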
Obtain two regression equations from the following data.

x y
Mean 20 120
Standard deviation 5 125
r = 0.8
Find x when y = 25 and find y when x = 150.
Solution:
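The derivation of the two equations used in this solution is not shown. A sketch, assuming the standard relations between the regression coefficients, r and the standard deviations:

```latex
b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{5}{125} = 0.032
\qquad
b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{125}{5} = 20

x - 20 = 0.032\,(y - 120) \;\Rightarrow\; x = 0.032y + 16.16

y - 120 = 20\,(x - 20) \;\Rightarrow\; y = 20x - 280
```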
• Find x when y = 25.

x = 0.032y + 16.16
x = 0.032 (25) + 16.16
x = 0.8 + 16.16
x = 16.96

• Find y when x = 150.

y = 20x - 280
y = 20 (150) - 280
y = 3000 - 280
y = 2720
Calculate the two regression equations for the following.
x y
Mean 22 36
SD 13 10
Co-efficient of correlation= 0.8

Solution:
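No working is given for this problem. A sketch of one way to complete it, assuming the standard relations between the regression coefficients, r and the standard deviations (values rounded):

```latex
b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{13}{10} = 1.04
\qquad
b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{10}{13} \approx 0.6154

x - 22 = 1.04\,(y - 36) \;\Rightarrow\; x = 1.04y - 15.44

y - 36 = 0.6154\,(x - 22) \;\Rightarrow\; y \approx 0.6154x + 22.46
```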
16. Given below is the information about the advertising expenses and sales of a company.
a) Calculate the 2 regression equations.
b) Find the likely sales when advertising expenses are Rs 20.
c) What should the advertising expenses be when the company's sales are 190 lakhs?
Advertising expenses(x) Sales(y)
Mean 10 90
Variance 9 144
Co-efficient of correlation = 0.8
Solution:
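The working for part (a) is not shown. A sketch, assuming the standard relations between the regression coefficients, r and the standard deviations, and taking each standard deviation as the square root of the given variance:

```latex
\sigma_x = \sqrt{9} = 3, \qquad \sigma_y = \sqrt{144} = 12

b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{12}{3} = 3.2
\qquad
b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{3}{12} = 0.2

y - 90 = 3.2\,(x - 10) \;\Rightarrow\; y = 3.2x + 58

x - 10 = 0.2\,(y - 90) \;\Rightarrow\; x = 0.2y - 8
```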
b) Find the likely sales when advertising expenses are Rs 20.
y = 3.2x + 58
y = 3.2 (20) + 58
y = 64 + 58
y = 122

Therefore the likely sales are Rs. 122 lakhs when advertising expenses are Rs. 20.

c) What should the advertising expenses be when the company's sales are 190 lakhs?

x = 0.2y - 8
x = 0.2 (190) - 8
x = 38 - 8
x = 30

Therefore the advertising expenses should be Rs. 30 when the company's sales are 190 lakhs.
