Correlation Regression
BACHELOR OF TECHNOLOGY
in
Prepared By
Dr Mihir Suthar
1. Correlation
• Correlation is the relationship that exists between two or more variables. Two variables are said to be
correlated if a change in one variable is accompanied by a change in the other variable. Data connecting two
variables is called bivariate data.
• Correlation measures the closeness of the relationship between the variables.
• Some examples of a relationship are as follows:
▪ Relationship between heights and weights
▪ Relationship between price and demand of a commodity
▪ Relationship between age of husband and age of wife
➢ Positive correlation
If the value of one variable increases, the value of the other variable also increases, or, if the value of one
variable decreases, the value of the other variable also decreases. This type of correlation is called
positive correlation.
e.g. The correlation between heights and weights of a group of persons
➢ Negative correlation
If the value of one variable increases, the value of the other variable decreases, or, if the value of one variable
decreases, the value of the other variable increases. This type of correlation is called negative
correlation.
e.g. The correlation between the price and demand of a commodity
Price (Rs per unit)    15   10    8    7    6    3
Demand (units)        150  200  220  260  300  320
➢ Simple correlation
The relationship between only two variables is described as simple correlation.
e.g. The quantity of money and price level, demand and price
➢ Multiple correlation
The relationship between more than two variables is described as multiple correlation.
e.g. Relationship between price, demand and supply of a commodity
➢ Partial correlation
When more than two variables are studied excluding some other variables, the relationship is termed as
partial correlation.
➢ Total correlation
When more than two variables are studied without excluding any variables, the relationship is termed as
total correlation.
➢ Linear correlation
If the ratio of change between two variables is constant, the correlation is said to be linear.
The graph of a linear relationship will be a straight line.
e.g.
Milk (l)     5  10  15  20  25  30
Curd (kg)    2   4   6   8  10  12
➢ Nonlinear correlation
If the ratio of change between two variables is not constant, the correlation is said to be nonlinear.
The graph of a nonlinear relationship will be a curve.
e.g.
Price (Rs per unit)    15   10    8    7    6    3
Demand (units)        150  200  220  260  300  320
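The distinction between linear and nonlinear correlation can be checked numerically. The following is a minimal sketch, using the two example tables above; the helper names `change_ratios` and `is_linear` are illustrative and not part of the original notes.

```python
def change_ratios(x, y):
    """Return the ratio of change (delta y / delta x) between consecutive observations."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


def is_linear(x, y, tol=1e-9):
    """True if the ratio of change is the same between every pair of consecutive points."""
    ratios = change_ratios(x, y)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


# Data from the two example tables above
milk = [5, 10, 15, 20, 25, 30]
curd = [2, 4, 6, 8, 10, 12]
price = [15, 10, 8, 7, 6, 3]
demand = [150, 200, 220, 260, 300, 320]

print(change_ratios(milk, curd))     # every ratio equals 0.4
print(is_linear(milk, curd))         # True: the points lie on a straight line
print(is_linear(price, demand))      # False: the ratio of change varies
```

With real measurements the ratios are rarely exactly equal, so a tolerance (the assumed parameter `tol`) or a scatter diagram is usually a better guide than an exact comparison.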
❖ Important
• The various relationships between two variables are represented by the following scatter diagrams.
Perfect positive correlation: If all the plotted points lie on a straight line rising from the lower left-hand corner to
the upper right-hand corner, the correlation is said to be perfect positive correlation.
Perfect negative correlation: If all the plotted points lie on a straight line from the upper left-hand corner to the
lower right-hand corner, the correlation is said to be perfect negative correlation.
High degree of negative correlation: If all the plotted points lie in a narrow strip, falling from the upper
left-hand corner to the lower right-hand corner, it indicates the existence of a high degree of negative correlation.
No correlation: If all the plotted points lie on a straight line parallel to the x- axis or y- axis, it indicates the
absence of any relationship between the variables.
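The coefficient of correlation, denoted by $r$, is the measure of correlation between two random variables $X$ and $Y$:
\[ r = \frac{cov(X, Y)}{\sigma_X \sigma_Y} \qquad (1) \]
Where,
$cov(X, Y)$ = the covariance of variables $X$ and $Y$ = $\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$
$\sigma_X$ = the standard deviation of variable $X$ = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
$\sigma_Y$ = the standard deviation of variable $Y$ = $\sqrt{\frac{\sum (y - \bar{y})^2}{n}}$
So,
\[ r = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{\sum (x - \bar{x})^2}{n}}\,\sqrt{\frac{\sum (y - \bar{y})^2}{n}}} \]
By simplifying,
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \]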
Solution: Here, 𝑛 = 9
𝒙 𝒚 𝒙𝟐 𝒚𝟐 𝒙𝒚
9 15 81 225 135
8 16 64 256 128
7 14 49 196 98
6 13 36 169 78
5 11 25 121 55
4 12 16 144 48
3 10 9 100 30
2 8 4 64 16
1 9 1 81 9
∑𝒙 =45 ∑𝒚 =108 ∑𝒙𝟐 =285 ∑𝒚𝟐 =1356 ∑𝒙𝒚 =597
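\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{9(597) - (45)(108)}{\sqrt{9(285) - (45)^2}\,\sqrt{9(1356) - (108)^2}} = \frac{513}{\sqrt{540}\,\sqrt{540}} = 0.95 \]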
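The calculation in the table above can be reproduced with a few lines of Python. This is a minimal sketch of the raw-sums formula for $r$; the function name `pearson_r` is an illustrative choice, not something defined in the notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Coefficient of correlation from the raw sums of x, y, x^2, y^2 and xy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# x and y columns of the table above (n = 9)
x = [9, 8, 7, 6, 5, 4, 3, 2, 1]
y = [15, 16, 14, 13, 11, 12, 10, 8, 9]

print(round(pearson_r(x, y), 2))  # 0.95
```

The intermediate sums printed by hand in the table (45, 108, 285, 1356, 597) are the same quantities the function accumulates, so it can also be used to check each column total.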
𝒙 𝒚 𝒅𝒙 𝒅𝒚 𝒅𝟐𝒙 𝒅𝟐𝒚 𝒅𝒙 𝒅𝒚
17 23 -6 -4 36 16 24
19 27 -4 0 16 0 0
21 25 -2 -2 4 4 4
26 26 3 -1 9 1 -3
20 27 -3 0 9 0 0
28 25 5 -2 25 4 -10
26 30 3 3 9 9 9
27 33 4 6 16 36 24
∑𝒙 =184 ∑𝒚 =216 ∑𝒅𝒙 =0 ∑𝒅𝒚 =0 ∑𝒅𝟐𝒙 =124 ∑𝒅𝟐𝒚 =70 ∑𝒅𝒙 𝒅𝒚 =48
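Here, $n = 8$, $d_x = x - 23$ and $d_y = y - 27$ (deviations from the means $\bar{x} = 23$ and $\bar{y} = 27$), so
\[ r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n\sum d_x^2 - (\sum d_x)^2}\,\sqrt{n\sum d_y^2 - (\sum d_y)^2}} = \frac{8(48)}{\sqrt{8(124)}\,\sqrt{8(70)}} = 0.515 \]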
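The deviation form of the formula gives the same $r$ whichever origins are subtracted from $x$ and $y$, which is why deviations from the means (or from any convenient assumed mean) may be used. The sketch below illustrates this on the data of the table above; the function name and the second pair of origins (20 and 25) are assumptions chosen only for the demonstration.

```python
from math import sqrt


def pearson_r_from_deviations(x, y, a, b):
    """Coefficient of correlation using deviations dx = x - a and dy = y - b."""
    n = len(x)
    dx = [xi - a for xi in x]
    dy = [yi - b for yi in y]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator


# x and y columns of the table above (n = 8)
x = [17, 19, 21, 26, 20, 28, 26, 27]
y = [23, 27, 25, 26, 27, 25, 30, 33]

r_means = pearson_r_from_deviations(x, y, 23, 27)   # deviations from the means
r_other = pearson_r_from_deviations(x, y, 20, 25)   # arbitrary origins (assumed)

print(round(r_means, 3))  # 0.515
print(round(r_other, 3))  # 0.515
```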
Exercise:
1. From the following information relating to the stock exchange quotations for two shares 𝐴 and 𝐵,
ascertain by using Pearson’s coefficient of correlation how shares 𝐴 and 𝐵 are correlated in their prices?
Price share (A) Rs. 160 164 172 182 166 170 178
Price share (B) Rs. 292 280 260 234 266 254 230
Ans. – 0.96
2. For the following data, show that $cov(x, x^2) = 0$.
𝑥     -3  -2  -1   0   1   2   3
𝑥²     9   4   1   0   1   4   9
3. The following data give the growth of employment (in lacs) in the organized sector in India between 1988
and 1995.
Year 1988 1989 1990 1991 1992 1993 1994 1995
Public sector 98 101 104 107 113 120 125 128
Private sector 65 65 67 68 68 69 68 68
Find the coefficient of correlation between the employment in public and private sectors.
Ans. 0.77
4. The coefficient of correlation between two variables 𝑋 and 𝑌 is 0.48. The covariance is 36. The variance
of 𝑋 is 16. Find the standard deviation of 𝑌.
Ans. 18.75
5. Calculate the coefficient of correlation between 𝑥 and 𝑦 from the following data.
$n = 10$, $\sum x = 140$, $\sum y = 150$, $\sum (x - 10)^2 = 180$,
$\sum (y - 15)^2 = 215$, $\sum (x - 10)(y - 15) = 60$
Ans. 0.915
2. Rank Correlation
• Let a group of 𝑛 individuals be arranged in order of merit with respect to some characteristics. The same
group would give a different rank for different characteristics.
• Considering the orders corresponding to two characteristics 𝐴 and 𝐵, the correlation between these 𝑛 pairs
of ranks is called the rank correlation in the characteristics 𝐴 and 𝐵 for that group of individuals.
Example 1: Ten students got the following percentage of marks in Mathematics and English:
Mathematics(x) 8 36 98 25 75 82 92 62 65 35
English (y) 84 51 91 60 68 62 86 58 35 49
Find the rank correlation coefficient.
Solution: Here, 𝑛 = 10
𝒙 𝒚 Rank in Mathematics Rank in English 𝒅 (difference of ranks) 𝒅𝟐
8 84 10 3 7 49
36 51 7 8 -1 1
98 91 1 1 0 0
25 60 9 6 3 9
75 68 4 4 0 0
82 62 3 5 -2 4
92 86 2 2 0 0
62 58 6 7 -1 1
65 35 5 10 -5 25
35 49 8 9 -1 1
∑𝒅 = 𝟎 ∑𝒅𝟐 = 𝟗𝟎
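The final step of Example 1 does not survive in the text above. Assuming the standard Spearman formula for untied ranks, $r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, the table gives $r = 1 - \frac{6(90)}{10(99)} \approx 0.455$. The sketch below reproduces the ranks and this value; the function names are illustrative, and the ranking helper assumes there are no tied values.

```python
def ranks_descending(values):
    """Rank 1 for the largest value, rank n for the smallest (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rank_correlation(x, y):
    """Spearman's rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Marks from Example 1
maths = [8, 36, 98, 25, 75, 82, 92, 62, 65, 35]
english = [84, 51, 91, 60, 68, 62, 86, 58, 35, 49]

print(ranks_descending(maths))    # [10, 7, 1, 9, 4, 3, 2, 6, 5, 8]
print(ranks_descending(english))  # [3, 8, 1, 6, 4, 5, 2, 7, 10, 9]
print(round(spearman_rank_correlation(maths, english), 3))  # 0.455
```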
• If there is a tie between the ranks of two or more individuals, the tied ranks are shared equally among them. e.g. if
two items tie for fourth rank, the 4th and 5th ranks are divided equally between them, and each of them is given the
rank $\frac{4 + 5}{2} = 4.5$.
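The tie rule above can be sketched as a small ranking routine. The function name `average_ranks` and the sample scores are hypothetical, chosen so that two items tie for fourth place as in the explanation.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1     # first rank position occupied by this value
        count = order.count(v)         # how many items are tied at this value
        ranks.append(first + (count - 1) / 2)  # mean of first, ..., first + count - 1
    return ranks


# Hypothetical scores: the two 70s tie for the 4th and 5th places
scores = [90, 85, 80, 70, 70, 60]
print(average_ranks(scores))  # [1.0, 2.0, 3.0, 4.5, 4.5, 6.0]
```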
Exercise:
1. Two judges gave the following ranks to a series of eight one-act plays in a drama competition. Examine
the relationship between their judgments.
Judge A 8 7 6 3 2 1 5 4
Judge B 7 5 4 1 3 2 6 8
Ans. 0.62
2. Compute Spearman’s rank correlation coefficient from the following data:
𝑥 18 20 34 52 12
𝑦 39 23 35 18 46
Ans. – 0.9
3. Regression
• Regression is defined as a method of estimating the value of one variable when that of the other is known
and the variables are correlated.
• Regression analysis is used to predict or estimate one variable in terms of the other variable.
• It is useful in statistical estimation of demand curves, supply curves, production function, cost function etc.
➢ Simple regression: The regression analysis for studying only two variables at a time is known as simple
regression.
➢ Multiple regression: The regression analysis for studying more than two variables at a time is known as
multiple regression.
➢ Linear regression: If the regression curve is a straight line, then the regression is said to be linear.
➢ Nonlinear regression: If the regression curve is not a straight line i.e. not a first-degree equation in the
variables 𝑥 and 𝑦, the regression is said to be nonlinear regression.
• If all the points in the scatter diagram cluster around a straight line, the line is called the line of regression.
• The line of regression is the line of best fit and is obtained by the principle of least squares.
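i) Regression coefficient of $y$ on $x$:
\[ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \]
Regression coefficient of $x$ on $y$:
\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} \]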
iii) Regression coefficient of $y$ on $x$: $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$
Regression coefficient of $x$ on $y$: $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$
\[ |r| = \sqrt{b_{yx}\, b_{xy}} = \sqrt{\left(-\frac{1}{6}\right)\left(-\frac{2}{3}\right)} = \frac{1}{3} \]
Here, both $b_{yx}$ and $b_{xy}$ are negative. So, $r$ is also negative.
Therefore, the coefficient of correlation is $r = -\frac{1}{3}$.
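The sign rule used here follows from $b_{yx}\, b_{xy} = r^2$: the magnitude of $r$ is the square root of the product, and $r$ takes the common sign of the two regression coefficients. A minimal sketch, with an illustrative function name:

```python
from math import copysign, sqrt


def r_from_regression_coefficients(b_yx, b_xy):
    """Recover r from the two regression coefficients (they must share a sign)."""
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    # magnitude from the product, sign taken from the coefficients
    return copysign(sqrt(b_yx * b_xy), b_yx)


print(r_from_regression_coefficients(-1 / 6, -2 / 3))  # approximately -0.3333
```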
Example 4: The number of bacterial cells (y) per unit volume in a culture at different hours (x) is given
below:
𝑥 0 1 2 3 4 5 6 7 8 9
𝑦 43 46 82 98 123 167 199 213 245 272
Fit lines of regression of 𝑦 on 𝑥 and 𝑥 on 𝑦. Also, estimate the number of bacterial cells after
15 hours.
Solution: Here, 𝑛 = 10
𝒙 𝒚 𝒙𝟐 𝒙𝒚 𝒚𝟐
0 43 0 0 1849
1 46 1 46 2116
2 82 4 164 6724
3 98 9 294 9604
4 123 16 492 15129
5 167 25 835 27889
6 199 36 1194 39601
7 213 49 1491 45369
8 245 64 1960 60025
9 272 81 2448 73984
∑𝒙 = 𝟒𝟓 ∑𝒚 = 𝟏𝟒𝟖𝟖 ∑𝒙𝟐 = 𝟐𝟖𝟓 ∑𝒙𝒚 = 𝟖𝟗𝟐𝟒 ∑𝒚𝟐 = 𝟐𝟖𝟐𝟐𝟗𝟎
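The remaining steps of Example 4 are not present in the text above. As a hedged sketch of how the totals in the table lead to the two regression lines, the code below applies the raw-sums formulas $b_{yx} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}$ and $b_{xy} = \frac{n\sum xy - \sum x\sum y}{n\sum y^2 - (\sum y)^2}$ and then evaluates the line of $y$ on $x$ at $x = 15$. The function names are illustrative only.

```python
def regression_coefficients(x, y):
    """Return (x_bar, y_bar, b_yx, b_xy) computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    common = n * sum_xy - sum_x * sum_y
    b_yx = common / (n * sum_x2 - sum_x ** 2)
    b_xy = common / (n * sum_y2 - sum_y ** 2)
    return sum_x / n, sum_y / n, b_yx, b_xy


# Data of Example 4: hours (x) and bacterial cells per unit volume (y)
hours = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
cells = [43, 46, 82, 98, 123, 167, 199, 213, 245, 272]

x_bar, y_bar, b_yx, b_xy = regression_coefficients(hours, cells)


def predict_cells(hour):
    """Line of y on x: y - y_bar = b_yx * (x - x_bar)."""
    return y_bar + b_yx * (hour - x_bar)


print(x_bar, y_bar)                    # 4.5 148.8
print(round(b_yx, 3), round(b_xy, 4))  # 27.006 0.0366
print(round(predict_cells(15), 1))     # 432.4
```

Since 15 hours lies outside the observed range of 0 to 9 hours, the estimate is an extrapolation and assumes the linear trend continues.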
Here, 𝑛 = 10
𝒙 𝒚 𝒅𝒙 𝒅𝒚 𝒅𝟐𝒙 𝒅𝟐𝒚 𝒅𝒙 𝒅𝒚
25 18 -1 1 1 1 -1
22 15 -4 -2 16 4 8
28 20 2 3 4 9 6
26 17 0 0 0 0 0
35 22 9 5 81 25 45
20 14 -6 -3 36 9 18
22 16 -4 -1 16 1 4
40 21 14 4 196 16 56
20 15 -6 -2 36 4 12
18 14 -8 -3 64 9 24
∑𝒙 = 𝟐𝟓𝟔 ∑𝒚 = 𝟏𝟕𝟐 ∑𝒅𝒙 = −𝟒 ∑𝒅𝒚 = 𝟐 ∑𝒅𝟐𝒙 = 𝟒𝟓𝟎 ∑𝒅𝟐𝒚 = 𝟕𝟖 ∑𝒅𝒙 𝒅𝒚 = 𝟏𝟕𝟐
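and
\[ b_{yx} = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{n\sum d_x^2 - (\sum d_x)^2} = \frac{10(172) - (-4)(2)}{10(450) - (-4)^2} = \frac{1728}{4484} = 0.385 \]
The regression line of $y$ on $x$ is
\[ y - \bar{y} = b_{yx}\,(x - \bar{x}) \]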
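The value $b_{yx} = 0.385$ can be checked from the deviation table above. In this sketch the assumed origins 26 (for $x$) and 17 (for $y$) are inferred from the rows where the deviations are zero, and the third $y$ value is taken as 20, which is consistent with its deviation of 3 and with $\sum y = 172$. The function name is illustrative.

```python
def slope_y_on_x_from_deviations(x, y, a, b):
    """b_yx computed from deviations dx = x - a and dy = y - b."""
    n = len(x)
    dx = [xi - a for xi in x]
    dy = [yi - b for yi in y]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    return (n * sum_dxdy - sum_dx * sum_dy) / (n * sum_dx2 - sum_dx ** 2)


# x and y columns of the deviation table above (n = 10)
x = [25, 22, 28, 26, 35, 20, 22, 40, 20, 18]
y = [18, 15, 20, 17, 22, 14, 16, 21, 15, 14]  # third value taken as 20 (see note above)

b_yx = slope_y_on_x_from_deviations(x, y, a=26, b=17)
x_bar = sum(x) / len(x)            # 25.6
y_bar = sum(y) / len(y)            # 17.2
intercept = y_bar - b_yx * x_bar   # line of y on x: y = intercept + b_yx * x

print(round(b_yx, 3))       # 0.385
print(round(intercept, 2))  # 7.33
```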
Exercise:
1. The following are the lines of regression: 4𝑦 = 𝑥 + 38 and 9𝑦 = 𝑥 + 288. Estimate 𝑦 when 𝑥 = 99 and 𝑥
when 𝑦 = 30. Also, find the means of 𝑥 and 𝑦.
Ans. $y = 43$, $x = 82$, $\bar{x} = 162$, $\bar{y} = 50$
2. In a partially destroyed laboratory record of an analysis of correlation data, the following results are legible:
variance of 𝑥 = 9, and the equations of the lines of regression 4𝑥 − 5𝑦 + 33 = 0, 20𝑥 − 9𝑦 − 107 = 0. Find (i) the
mean values of 𝑥 and 𝑦 (ii) the standard deviation of 𝑦 and (iii) the coefficient of correlation between 𝑥 and 𝑦.
Ans. (i) $\bar{x} = 13$, $\bar{y} = 17$ (ii) $\sigma_y = 4$ (iii) $r = 0.6$
3. Find the likely production corresponding to a rainfall of 40 cm from the following data:
Rainfall (in cm) Output (in quintals)
Mean 30 50
SD 5 10
𝑟 = 0.8
Ans. 66 quintals