Business Statistics by Gupta 365 379
[Fig. 9·3(a) to Fig. 9·3(e): five diagrams, each drawn with origin O and horizontal axis X.]
Remark. The sign to be taken before the square root is the same as that of the regression coefficients. If
the regression coefficients are positive, we take the positive sign in (9·36), and if they are negative, we
take the negative sign in (9·36).
Theorem 9·2. If one of the regression coefficients is greater than unity (one), the other must be less
than unity.
Proof. If one of the regression coefficients is greater than 1, then the other must be less than 1
because otherwise, on using (9·33), we shall get :
r² = byx · bxy > 1,
which is impossible, since 0 ≤ r² ≤ 1.
Theorem 9·3. The arithmetic mean of the modulus values of the regression coefficients is not less than
the modulus value of the correlation coefficient, i.e.,
$$\frac{1}{2}\left[\,|b_{yx}| + |b_{xy}|\,\right] \ge |r|$$ …(9·37)
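The text states Theorem 9·3 without proof. A short sketch of one possible argument follows, assuming only the relation r² = byx · bxy already used in the proof of Theorem 9·2 and the standard inequality between the arithmetic and geometric means of two non-negative numbers:

```latex
\[
\frac{|b_{yx}| + |b_{xy}|}{2}
\;\ge\; \sqrt{|b_{yx}|\,|b_{xy}|}
\;=\; \sqrt{r^{2}}
\;=\; |r| .
\]
```

Equality can hold only when |byx| = |bxy|, so the arithmetic mean is strictly greater than |r| whenever the two regression coefficients differ in magnitude.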
Theorem 9·4. Regression coefficients are independent of change of origin but not of scale.
Symbolically, if we transform from x and y to new variables u and v by change of origin and scale,
viz.,
$$u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k},$$ where a, b, h (> 0) and k (> 0) are constants, …(9·38)
then
$$b_{yx} = \frac{k}{h}\, b_{vu} \qquad \text{and} \qquad b_{xy} = \frac{h}{k}\, b_{uv}$$ …(9·39)
In particular if we take h = k = 1, i.e., we transform the variables x and y to u and v by the relation :
u=x–a and v=y–b …(9·40)
i.e., by change of origin only, then from (9·39), we get
$$b_{xy} = b_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2}$$ …(9·40a)
and
$$b_{yx} = b_{vu} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2}$$ …(9·40b)
These formulae are very useful for obtaining the equations of the lines of regression if the mean values
$\bar{x}$ and/or $\bar{y}$ come out to be fractions or if the values of x and y are large.
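The effect of a change of origin and scale can be checked numerically. The sketch below is illustrative only: it uses the sales and purchases figures of Example 9·1, and the constants a, h, b, k as well as the helper name `regression_coefficients` are assumptions chosen for the demonstration, not values from the text.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) from raw sums, in the form of (9.40a) and (9.40b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(p * q for p, q in zip(xs, ys))
    sxx = sum(p * p for p in xs)
    syy = sum(q * q for q in ys)
    num = n * sxy - sx * sy
    return num / (n * sxx - sx ** 2), num / (n * syy - sy ** 2)

# Data of Example 9.1 (sales = x, purchases = y)
x = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]

# Assumed constants for the change of origin (a, b) and scale (h, k)
a, h = 90, 3
b, k = 70, 2
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

b_yx, b_xy = regression_coefficients(x, y)
b_vu, b_uv = regression_coefficients(u, v)

# Relation (9.39): b_yx = (k/h) b_vu and b_xy = (h/k) b_uv
print(b_yx, (k / h) * b_vu)
print(b_xy, (h / k) * b_uv)
```

With h = k = 1 the two pairs of coefficients coincide, which is the change-of-origin case (9·40).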
Example 9·1. From the following data, obtain the two regression equations :
Sales : 91 97 108 121 67 124 51 73 111 57
Purchases : 71 75 69 97 70 91 39 61 80 47
Solution. Let us denote the sales by the variable X and the purchases by the variable Y.
CALCULATIONS FOR REGRESSION EQUATIONS
      x        y     dx = x – x̄    dy = y – ȳ      dx²      dy²     dx dy
     91       71          1             1            1        1         1
     97       75          7             5           49       25        35
    108       69         18            –1          324        1       –18
    121       97         31            27          961      729       837
     67       70        –23             0          529        0         0
    124       91         34            21         1156      441       714
     51       39        –39           –31         1521      961      1209
     73       61        –17            –9          289       81       153
    111       80         21            10          441      100       210
     57       47        –33           –23         1089      529       759
 ∑x = 900  ∑y = 700   ∑dx = 0       ∑dy = 0   ∑dx² = 6360  ∑dy² = 2868  ∑dx dy = 3900
We have
$$\bar{x} = \frac{\sum x}{n} = \frac{900}{10} = 90 \; ; \qquad \bar{y} = \frac{\sum y}{n} = \frac{700}{10} = 70$$
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\, dy}{\sum dx^2} = \frac{3900}{6360} = 0·6132$$
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\, dy}{\sum dy^2} = \frac{3900}{2868} = 1·3598$$
Regression Equations
Equation of line of regression of y on x is : $y - \bar{y} = b_{yx}\,(x - \bar{x})$
Equation of line of regression of x on y is : $x - \bar{x} = b_{xy}\,(y - \bar{y})$
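The calculation for Example 9·1 can be reproduced with a few lines of Python. This is a minimal sketch working from the deviations about the means, as in the table; the variable names are illustrative, and the printed equations are simply what these data give when substituted into the two regression equations above.

```python
sales = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]     # x
purchases = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]     # y

n = len(sales)
x_bar = sum(sales) / n
y_bar = sum(purchases) / n

# Deviations from the means, as in the calculation table
dx = [x - x_bar for x in sales]
dy = [y - y_bar for y in purchases]

s_dxdy = sum(p * q for p, q in zip(dx, dy))
s_dx2 = sum(p * p for p in dx)
s_dy2 = sum(q * q for q in dy)

b_yx = s_dxdy / s_dx2    # regression coefficient of y on x
b_xy = s_dxdy / s_dy2    # regression coefficient of x on y

# Each line passes through (x_bar, y_bar), which fixes the intercepts
intercept_y_on_x = y_bar - b_yx * x_bar
intercept_x_on_y = x_bar - b_xy * y_bar

print(f"y = {b_yx:.4f} x {intercept_y_on_x:+.4f}")
print(f"x = {b_xy:.4f} y {intercept_x_on_y:+.4f}")
```

The three sums agree with the column totals of the table (3900, 6360 and 2868), and the ratio 3900/2868 evaluates to about 1·3598.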
Coefficient of regression of x on y =
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\, dy}{\sum dy^2} = \frac{-93}{398} = -0·2337$$
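Then
$$\bar{x} = \frac{\sum x}{n} = \frac{180}{9} = 20 \; ; \qquad \bar{y} = \frac{\sum y}{n} = \frac{360}{9} = 40$$
byx = Coefficient of regression of y on x
$$= \frac{\sum dx\, dy}{\sum dx^2} = \frac{193}{120} = 1·6083$$
bxy = Coefficient of regression of x on y
$$= \frac{\sum dx\, dy}{\sum dy^2} = \frac{193}{346} = 0·5578$$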
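Karl Pearson’s correlation coefficient r between x and y is given by :
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}} = +\sqrt{1·6083 \times 0·5578} = \sqrt{0·8971} = 0·9472,$$
the positive sign being taken since both regression coefficients are positive.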
Thus, we see that there is a very high degree of positive correlation between the test scores (x) and the
sales (’000 Rs.) (y). This justifies the proposal for the termination of service of those with low test scores.
Regression Equations
To obtain the test score (x) for given sales (y), we use the equation of the line of regression of
x on y. The equation of line of regression of x on y is :
$x - \bar{x} = b_{xy}\,(y - \bar{y})$
⇒ x – 20 = 0·5578 (y – 40) = 0·5578y – 22·312
⇒ x = 0·5578y – 22·312 + 20
⇒ x = 0·5578y – 2·312 …(*)
Hence to ensure the continuation of service, the minimum test score (x) corresponding to a
minimum sales volume (y) of Rs. 30,000 = 30 (’000 Rs.) is obtained on putting y = 30 in (*) and
is given by :
x = 0·5578 × 30 – 2·312 = 16·734 – 2·312
= 14·422 ≈ 14
To estimate the sales volume (y) of a salesman with given test score (x), we use the line of
regression of y on x, which is given by :
$y - \bar{y} = b_{yx}\,(x - \bar{x})$
⇒ y – 40 = 1·6083 (x – 20) = 1·6083x – 32·1660
⇒ y = 1·6083x – 32·1660 + 40
⇒ y = 1·6083x + 7·8340
Hence the estimated sales volume of a salesman with test score of 28 is (in ’000 Rs.)
y = 1·6083 × 28 + 7·8340
= 45·0324 + 7·8340
= 52·8664 (’000 Rs.)
= Rs. 52,866.40
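The two estimates above use two different lines, one for each direction of prediction. A minimal sketch, using only the means and regression coefficients quoted in this example (the function names are hypothetical):

```python
x_bar, y_bar = 20, 40           # mean test score, mean sales ('000 Rs.)
b_yx, b_xy = 1.6083, 0.5578     # regression coefficients from the example

def sales_from_score(x):
    """Line of regression of y on x: estimated sales for a given test score."""
    return y_bar + b_yx * (x - x_bar)

def score_from_sales(y):
    """Line of regression of x on y: estimated test score for given sales."""
    return x_bar + b_xy * (y - y_bar)

est_sales = sales_from_score(28)    # sales for a test score of 28
min_score = score_from_sales(30)    # test score for sales of 30 ('000 Rs.)

print(round(est_sales, 4), round(min_score, 3))
```

Solving the y on x line for x at y = 30 would give a different figure from `score_from_sales(30)`, which is why each direction of estimation has its own line.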
Example 9·5. The data about the sales and advertisement expenditure of a firm is given below :
                              Sales                   Advertisement expenditure
                              (in crores of Rs.)      (in crores of Rs.)
Means                         40                      6
Standard deviations           10                      1·5
Coefficient of correlation = r = 0·9
(i) Estimate the likely sales for a proposed advertisement expenditure of Rs. 10 crores.
(ii) What should be the advertisement expenditure if the firm proposes a sales target of 60 crores of
rupees ?
Solution. Let the variable x denote the sales (in crores of Rs.) and the variable y denote the
advertisement expenditure (in crores of Rs.). Then, in usual notations, we are given :
$$\bar{x} = 40, \quad \bar{y} = 6, \quad \sigma_x = 10, \quad \sigma_y = 1·5, \quad r = r_{xy} = 0·9$$
(i) To estimate the likely sales (x) for given advertisement expenditure (y), we need the regression
equation of x on y which is given by :
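$$x - \bar{x} = \frac{r\,\sigma_x}{\sigma_y}\,(y - \bar{y}) \;\Rightarrow\; x = \frac{r\,\sigma_x}{\sigma_y}\,(y - \bar{y}) + \bar{x} \;\Rightarrow\; x = \frac{0·9 \times 10}{1·5}\,(y - 6) + 40 = 6\,(y - 6) + 40$$ …(*)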
Hence the estimated sales (x) for a proposed advertisement expenditure (y) of Rs. 10 crores are
obtained on putting y = 10 in (*) and are given by :
x = 6(10 – 6) + 40 = 6 × 4 + 40 = 64 crores of Rs.
(ii) To estimate the advertisement expenditure (y) for proposed sales (x), we need the equation of line
of regression of y on x which is given by :
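$$y - \bar{y} = \frac{r\,\sigma_y}{\sigma_x}\,(x - \bar{x}) \;\Rightarrow\; y = \frac{r\,\sigma_y}{\sigma_x}\,(x - \bar{x}) + \bar{y} \;\Rightarrow\; y = \frac{0·9 \times 1·5}{10}\,(x - 40) + 6 = 0·135\,(x - 40) + 6$$ …(**)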
Hence the likely advertisement expenditure (y) of the firm for proposed sales target (x) of 60 crores of
Rs. is obtained on taking x = 60 in (**) and is given by :
y = 0·135 (60 – 40) + 6 = 0·135 × 20 + 6 = 2·7 + 6 = 8·7 crores of Rs.
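When only means, standard deviations and r are known, both regression coefficients follow directly from them. The following sketch reworks Example 9·5 on that basis; the variable names are illustrative.

```python
# Summary statistics of Example 9.5 (x = sales, y = advertisement expenditure,
# both in crores of Rs.)
x_bar, y_bar = 40.0, 6.0
sd_x, sd_y = 10.0, 1.5
r = 0.9

b_xy = r * sd_x / sd_y    # regression coefficient of x on y
b_yx = r * sd_y / sd_x    # regression coefficient of y on x

# (i) sales for an advertisement expenditure of Rs. 10 crores (x on y)
est_sales = x_bar + b_xy * (10 - y_bar)

# (ii) advertisement expenditure for a sales target of Rs. 60 crores (y on x)
est_adv = y_bar + b_yx * (60 - x_bar)

print(round(est_sales, 2), round(est_adv, 2))
```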
Example 9·6. Point out the inconsistency, if any, in the following statement.
“The regression equation of y on x is 2y + 3x = 4 and the correlation coefficient between x
and y is 0·8”. [I.C.W.A. (Intermediate), Dec. 1998]
Solution. Line of regression of y on x is :
$$2y + 3x = 4 \;\Rightarrow\; y = -\frac{3}{2}\,x + 2$$
∴ byx = Coefficient of regression of y on x = $-\frac{3}{2}$.
Also rxy = 0·8 (Given).
Since byx and rxy have different signs, the given statement is wrong (inconsistent).
Remark. The correlation coefficient (rxy) and the regression coefficients byx and bxy must all have the
same sign, since each takes the sign of the covariance term Cov (x, y).
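The sign rule in the Remark can be turned into a simple consistency check. This is only a sketch; the helper name `signs_agree` is hypothetical.

```python
def signs_agree(b_yx, r):
    """Return True when the regression coefficient and r have the same sign
    (both positive, both negative, or both zero)."""
    return (b_yx > 0) == (r > 0) and (b_yx < 0) == (r < 0)

# Statement of Example 9.6: 2y + 3x = 4, i.e. y = -(3/2)x + 2, with r = 0.8
b_yx = -3 / 2
r = 0.8
print(signs_agree(b_yx, r))   # False: the statement is inconsistent
```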
Example 9·7. The following is an estimated supply regression for sugar :
Y = 0·025 + 1·5X
where Y is supply in kilos and X is price (Rs.) per kilo.
(i) Interpret the coefficient of variable X.
(ii) Predict the supply when price is Rs. 20 per kilo.
(iii) Given that r(x, y) = 1 in the above case, interpret the implied relationship between price and
quantity supplied. [Delhi Univ. B.A. (Econ. Hons.), 1998]
Solution. The regression equation of Y (supply in kgs.) on X (price in Rupees per kg.) is given to be :
Y = 0·025 + 1·5 X = a + bX, (say) …(*)
(i) The coefficient of the variable X viz., b = 1·5, is the coefficient of regression of Y on X. It
gives the change in the estimated value of Y for a unit change in the value of X.
This means that if the price of sugar goes up by Re. 1 per kg., the estimated supply of sugar
goes up by 1·5 kg.
(ii) From (*), the estimated supply of sugar when its price is Rs. 20 per kg. is given by :
$\hat{Y}$ = 0·025 + 1·5 × 20 = 30·025 kg.
(iii) r (X, Y) = 1, implies that the relationship between X and Y is exactly linear. This means that all
the observed values (X, Y) lie on a straight line.
Example 9·8. (a) The coefficient of regression of Y on X is bYX = 1·2. If
$$U = \frac{X - 100}{2} \quad \text{and} \quad V = \frac{Y - 200}{3},$$
find bVU. [Delhi Univ. B.A. (Econ. Hons.), 1998]
(b) The covariance between X and Y is 900 and the standard deviations of X and Y are 15 and 80
respectively.
If two variables S and T are defined as :
$$S = \frac{20 - X}{5} \quad \text{and} \quad T = \frac{50 + Y}{8},$$
find the slope coefficients of the regressions of : (i) S on T and (ii) T on S.
[Delhi Univ. B.A. (Econ. Hons.), 2005]
Solution. (a) Using formula (9·39), we get :
$$b_{YX} = \frac{k}{h} \cdot b_{VU} = \frac{3}{2}\, b_{VU} \; ; \qquad (h = 2,\ k = 3)$$
$$\Rightarrow\; b_{VU} = \frac{2}{3}\, b_{YX} = \frac{2}{3} \times 1·2 = 0·8$$
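(b) We are given : Cov (X, Y) = 900 ; σX = 15 ; σY = 80 …(1)
$$S = \frac{20 - X}{5} \;\Rightarrow\; \mathrm{Var}(S) = \mathrm{Var}\left[-\frac{1}{5}(X - 20)\right] = \left(-\frac{1}{5}\right)^2 \mathrm{Var}(X) = \frac{1}{25} \times 15^2 = 9$$ [From (1)]
and
$$T = \frac{50 + Y}{8} \;\Rightarrow\; \mathrm{Var}(T) = \mathrm{Var}\left[\frac{50 + Y}{8}\right] = \left(\frac{1}{8}\right)^2 \mathrm{Var}(Y) = \frac{80^2}{64} = 100$$ [From (1)]
[∵ Var (aX) = a² Var (X) and Var (X ± A) = Var (X)]
$$\mathrm{Cov}(S, T) = \mathrm{Cov}\left(\frac{20 - X}{5},\ \frac{50 + Y}{8}\right) = \frac{1}{40}\,\mathrm{Cov}(-X, Y) = -\frac{1}{40}\,\mathrm{Cov}(X, Y) = -\frac{900}{40} = -\frac{45}{2}$$ [From (1)]
The slope coefficients of regression of S on T and T on S are given respectively by
$$\text{(i)}\;\; b_{ST} = \frac{\mathrm{Cov}(S, T)}{\mathrm{Var}(T)} = \frac{-45/2}{100} = -\frac{9}{40} \qquad \text{and} \qquad \text{(ii)}\;\; b_{TS} = \frac{\mathrm{Cov}(S, T)}{\mathrm{Var}(S)} = \frac{-45/2}{9} = -\frac{5}{2}$$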
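Part (b) rests on two rules for linear transformations: Var (aX + c) = a² Var (X) and Cov (aX + c, bY + d) = ab Cov (X, Y). A minimal sketch with exact fractions, in which the names `a_s` and `a_t` for the multipliers of X and Y are assumptions made for the illustration:

```python
from fractions import Fraction

# Given in Example 9.8(b)
cov_xy = Fraction(900)
var_x = Fraction(15) ** 2
var_y = Fraction(80) ** 2

# S = (20 - X)/5 has multiplier -1/5 on X; T = (50 + Y)/8 has multiplier 1/8 on Y.
# The additive constants do not affect variances or covariances.
a_s = Fraction(-1, 5)
a_t = Fraction(1, 8)

var_s = a_s ** 2 * var_x
var_t = a_t ** 2 * var_y
cov_st = a_s * a_t * cov_xy

b_st = cov_st / var_t    # slope of the regression of S on T
b_ts = cov_st / var_s    # slope of the regression of T on S

print(var_s, var_t, cov_st, b_st, b_ts)
```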
Example 9·9. By using the following data, find out the two lines of regression and from them compute
the Karl Pearson’s coefficient of correlation.
∑X = 250 ; ∑Y = 300 ; ∑XY = 7,900 ; ∑X² = 6,500 ; ∑Y² = 10,000 ; and N = 10.
Solution. We have :
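$$\bar{X} = \frac{\sum X}{N} = \frac{250}{10} = 25 \; ; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{300}{10} = 30$$
bYX = Coefficient of regression of Y on X
$$= \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 6500 - (250)^2} = \frac{79000 - 75000}{65000 - 62500} = \frac{4000}{2500} = 1·6$$
bXY = Coefficient of regression of X on Y
$$= \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum Y^2 - (\sum Y)^2} = \frac{79000 - 75000}{100000 - 90000} = \frac{4000}{10000} = 0·4$$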
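Regression Equations
Regression equation of Y on X :
$Y - \bar{Y} = b_{YX}\,(X - \bar{X})$
⇒ Y – 30 = 1·6 (X – 25)
⇒ Y = 1·6X – 40 + 30
⇒ Y = 1·6X – 10
Regression equation of X on Y :
$X - \bar{X} = b_{XY}\,(Y - \bar{Y})$
⇒ X – 25 = 0·4 (Y – 30)
⇒ X = 0·4 Y – 12 + 25
⇒ X = 0·4 Y + 13
Since both regression coefficients are positive, Karl Pearson’s coefficient of correlation is
$r = +\sqrt{b_{YX} \cdot b_{XY}} = \sqrt{1·6 \times 0·4} = \sqrt{0·64} = 0·8$.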
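The whole of Example 9·9 can be carried out from the given sums alone. The sketch below is one way to do it; the variable names are illustrative, and r is obtained as the square root of the product of the two regression coefficients, with the sign of the coefficients.

```python
import math

# Sums given in Example 9.9
n = 10
sum_x, sum_y = 250, 300
sum_xy, sum_x2, sum_y2 = 7900, 6500, 10000

x_bar, y_bar = sum_x / n, sum_y / n

num = n * sum_xy - sum_x * sum_y             # common numerator
b_yx = num / (n * sum_x2 - sum_x ** 2)       # regression coefficient of Y on X
b_xy = num / (n * sum_y2 - sum_y ** 2)       # regression coefficient of X on Y

# r carries the same sign as the regression coefficients
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"Y = {b_yx:.2f} X {y_bar - b_yx * x_bar:+.2f}")
print(f"X = {b_xy:.2f} Y {x_bar - b_xy * y_bar:+.2f}")
print(f"r = {r:.2f}")
```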
Example 9·10. In the estimation of regression equations of two variables X and Y the following results
were obtained :
∑X = 900, ∑Y = 700, n = 10 ; ∑x² = 6360, ∑y² = 2860, ∑xy = 3900,
where x and y are deviations from respective means. Obtain the two regression equations.
[Delhi Univ. B.Com (Hons.), 2008]
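Solution. The coefficients of regression of Y on X, and X on Y are given respectively by :
$$b_{YX} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum xy}{\sum x^2} = \frac{3900}{6360} = 0·6132$$
$$b_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_y^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{\sum xy}{\sum y^2} = \frac{3900}{2860} = 1·3636$$
$$\bar{X} = \frac{\sum X}{n} = \frac{900}{10} = 90, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{700}{10} = 70$$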
Regression Equations
Regression equation of Y on X :
$Y - \bar{Y} = b_{YX}\,(X - \bar{X})$
⇒ Y – 70 = 0·6132 (X – 90)
⇒ Y = 0·6132X – 55·188 + 70
⇒ Y = 0·6132X + 14·812
Regression equation of X on Y :
$X - \bar{X} = b_{XY}\,(Y - \bar{Y})$
⇒ X – 90 = 1·3636 (Y – 70)
⇒ X = 1·3636Y – 95·452 + 90
⇒ X = 1·3636Y – 5·452
Example 9·11. For a set of 10 pairs of values of x and y, the regression line of x on y is x – 2y + 12 =
0; mean and standard deviation of y being 8 and 2 respectively. Later it is known that a pair (x = 3, y = 8)
was wrongly recorded and the correct pair detected is (x = 8, y = 3). Find the correct regression line of x
on y. [I.C.W.A. (Intermediate), June 1998]
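Solution. In the usual notations we are given : n = 10, $\bar{y} = 8$, $\sigma_y = 2$ …(*)
The equation of the line of regression of x on y is : x – 2y + 12 = 0 (Given). Since the lines of
regression pass through the point $(\bar{x}, \bar{y})$, we get
$$\bar{x} - 2\bar{y} + 12 = 0 \;\Rightarrow\; \bar{x} = 2\bar{y} - 12 = 2 \times 8 - 12 = 4$$ [Using (*)]
Also x – 2y + 12 = 0 ⇒ x = 2y – 12 ⇒ bxy = 2
$$\therefore\; \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = 2 \;\Rightarrow\; \mathrm{Cov}(x, y) = 2 \times 2^2 = 8$$ [From (*)]
$$\Rightarrow\; \frac{\sum xy}{n} - \bar{x}\,\bar{y} = 8 \;\Rightarrow\; \sum xy = 10\,(8 + 4 \times 8) = 10 \times 40 = 400$$
$$\sigma_y = 2 \;\Rightarrow\; \sigma_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = 4 \;\Rightarrow\; \sum y^2 = 10\,(4 + 8^2) = 680$$
∴ We have $\bar{x} = 4$, $\bar{y} = 8$, ∑y² = 680, ∑xy = 400
Wrong pair = (x = 3, y = 8) ; Correct pair = (x = 8, y = 3)
Corrected Values. [Suffix c stands for corrected values]
$$\bar{x}_c = \frac{n\bar{x} - 3 + 8}{n} = \frac{10 \times 4 + 5}{10} = \frac{9}{2} \; ; \qquad \bar{y}_c = \frac{n\bar{y} - 8 + 3}{n} = \frac{10 \times 8 - 5}{10} = \frac{15}{2}$$
$$(\textstyle\sum y^2)_c = \sum y^2 - 8^2 + 3^2 = 680 - 64 + 9 = 625 \; ; \qquad (\sum xy)_c = \sum xy - 3 \times 8 + 8 \times 3 = 400 - 24 + 24 = 400$$
$$(\sigma_y^2)_c = \frac{(\sum y^2)_c}{n} - (\bar{y}_c)^2 = \frac{625}{10} - \frac{225}{4} = \frac{1250 - 1125}{20} = \frac{25}{4}$$
$$[\mathrm{Cov}(x, y)]_c = \frac{(\sum xy)_c}{n} - \bar{x}_c\,\bar{y}_c = \frac{400}{10} - \frac{9}{2} \times \frac{15}{2} = 40 - \frac{135}{4} = \frac{25}{4}$$
$$\therefore\; (b_{xy})_c = \frac{[\mathrm{Cov}(x, y)]_c}{(\sigma_y^2)_c} = \frac{25/4}{25/4} = 1.$$
Corrected line of regression of x on y becomes :
$$x - \bar{x}_c = (b_{xy})_c\,(y - \bar{y}_c) \;\Rightarrow\; x - \frac{9}{2} = 1 \times \left(y - \frac{15}{2}\right) \;\Rightarrow\; x = y - 3.$$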
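The correction procedure of Example 9·11 amounts to recovering the raw sums from the summary figures, swapping the wrong pair for the right one, and recomputing. A sketch with exact fractions follows; the variable names are illustrative.

```python
from fractions import Fraction

# Given: n, mean and variance of y, and the line x = 2y - 12 (so b_xy = 2)
n = 10
y_bar, var_y = Fraction(8), Fraction(4)
b_xy = Fraction(2)
x_bar = 2 * y_bar - 12          # the line passes through (x_bar, y_bar)

# Recover the raw sums implied by the (uncorrected) summary figures
sum_x = n * x_bar
sum_y = n * y_bar
sum_y2 = n * (var_y + y_bar ** 2)
sum_xy = n * (b_xy * var_y + x_bar * y_bar)

# Replace the wrongly recorded pair by the correct one
wrong, right = (3, 8), (8, 3)
sum_x += right[0] - wrong[0]
sum_y += right[1] - wrong[1]
sum_y2 += right[1] ** 2 - wrong[1] ** 2
sum_xy += right[0] * right[1] - wrong[0] * wrong[1]

# Recompute the corrected quantities
x_bar_c = sum_x / n
y_bar_c = sum_y / n
var_y_c = sum_y2 / n - y_bar_c ** 2
cov_c = sum_xy / n - x_bar_c * y_bar_c
b_xy_c = cov_c / var_y_c
intercept_c = x_bar_c - b_xy_c * y_bar_c

print(f"x = {b_xy_c} y + ({intercept_c})")
```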
EXERCISE 9·1
1. (a) Explain the concept of regression and point out its usefulness in dealing with business problems.
(b) What is a scatter diagram ? Indicate by means of suitable scatter diagrams different types of correlation that
may exist between the variables in bivariate data. What are regression lines ? Write down the main points of distinction
between correlation analysis and regression analysis.
2. Distinguish between correlation and regression analysis and indicate the utility of regression analysis in
economic activities. [C.A. (Foundation), Nov. 1996]
3. (a) What is regression analysis ? How does it differ from correlation ? Why are there, in general, two regression
equations ?
(b) Comment on the following :
“Regression equations are irreversible”. [Delhi Univ. B.Com. (Hons.), 2002]
4. Given a scatter diagram of bivariate data involving variables X and Y, find the conditions of minimisation of
∑(Yi – Ye)² and hence derive normal equations for the linear regression of Y upon X. What sum is to be minimised when
X is regressed upon Y and what are the normal equations in this case ?
5. Derive the normal equations for the regression of Y on X for data comprising n pairs of values of X and Y.
Show that the mean of the error terms is zero. [Delhi Univ. B.A. (Econ. Hons.), 2005]
Hint. Y = a + bX …… (i) (Regression equation of Y on X)
Normal equations are :
∑Y = na + b∑X …(ii) and ∑XY = a∑X + b∑X² …(iii)
Mean of error terms is given by :
$$\bar{e} = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \hat{Y}_i) = \frac{1}{n}\sum_{i=1}^{n} (Y_i - a - bX_i)$$ [From (i)]
$$= \frac{1}{n}\left[\sum Y_i - na - b\sum X_i\right] = 0.$$ [From (ii)]
6. What is linear regression ? Why are there, in general, two regression lines ? When do they coincide ? Explain
the use of regression equations in economic enquiry.
7. (a) It is said that regression equations are irreversible meaning thereby that you cannot find out the regression
equation of x on y from that of y on x. Justify the comment with special reference to the principle of least squares.
(b) Explain the term ‘Regression’. Why do we take, in general, two regression lines ? When are the regression
lines (i) perpendicular to each other and (ii) coincident ?
8. What are regression lines ? Why is it necessary to consider two lines of regression ? In case the two lines are
identical, prove that the correlation coefficient is +1 or –1. If the two variables are independent, show that the two
regression lines are perpendicular.
9. What is the angle between the two lines of regression ? Discuss the nature of the lines for the following
particular cases :
(i) r = ± 1. (ii) r = 0.
10. What is the difference between correlation and regression coefficients ? Can correlation coefficient be
computed out of regression coefficients ? If yes, how ?
11. (a) Define regression coefficients. What information do they supply ?
(b) Let byx and bxy stand for the coefficients of regression of Y on X and X on Y respectively. Show that :
$r_{xy} = \sqrt{b_{xy} \times b_{yx}}$ [Delhi Univ. B.A. (Econ. Hons.), 1997]
12. Given the following values of x and y :
x : 3 5 6 8 9 11
y : 2 3 4 6 5 8
find the equation of regression of
(i) y on x and (ii) x on y.
Interpret the results.
Ans. y = 0·7143x – 0·3334 ; x = 1·2857y + 1·0001.
13. Obtain the equations of the two lines of regression for the data given below :
X : 1 2 3 4 5 6 7 8 9
Y : 9 8 10 12 11 13 14 16 15
Ans. Y = 0·95X + 7·25 ; X = 0·95Y – 6·4.
14. From the following data of the age of husband and the age of wife, form two regression lines and calculate the
husband’s age when the wife’s age is 16.
Husband’s age : 36 23 27 28 28 29 30 31 33 35
Wife’s age : 29 18 20 22 27 21 29 27 29 28
Ans. Husband’s age : x ; Wife’s age : y
y = 0·95x – 3·5 ; x = 0·8y + 10 ; when y = 16, x = 22·8.
15. Find the regression equation of y on x where y and x are the marks obtained by 10 students as given below :
y : 20 60 55 45 75 35 25 90 10 50
x : 20 45 65 40 55 35 15 80 25 50
[C.A. (Foundation), May 2002]
Ans. byx = 1·105 ; y = 1·105x – 1·015.
16. The following data give the experience of machine operators and their performance ratings as given by the
number of good parts turned out per 100 pieces :
Operator : 1 2 3 4 5 6 7 8
Experience (in years) (X) : 16 12 18 4 3 10 5 12
Performance Ratings (Y) : 87 88 89 68 78 80 75 83
Calculate the regression line of performance ratings on experience and estimate the probable performance if an
operator has 7 years experience. [Himachal Pradesh Univ. B.Com., 1996]
Ans. Y = 69·67 + 1·133 X ; 77·601.
17. You are given the data relating to purchases and sales. Obtain the two regression equations by the method of
least squares and estimate the likely sales when the purchases equal 100.
Purchases : 62 72 98 76 81 56 76 92 88 49
Sales : 112 124 131 117 132 96 120 136 97 85
Ans. Purchase : x ; Sales : y ; x = 0·6515y + 0·0775
y = 0·7825x + 56·3125 ; 134·5625.
18. The height of fathers and sons is given in the following table. Find the two lines of regression and estimate the
expected average height of the son when the height of the father is 67·5 inches.
Height of father (in inches) : 65 66 67 67 68 69 71 73
Height of son (in inches) : 67 68 64 68 72 70 69 70
Ans. y = 0·4242x + 39·5484 ; x = 0·525y + 32·2875; 68·18 inches.
19. The following table gives the ages and blood pressure of 10 women.
Age (X) : 56 42 36 47 49 42 60 72 63 55
Blood Pressure (Y) : 147 125 118 128 145 140 155 160 149 150
(i) Find the correlation coefficient between X and Y.
(ii) Determine the least square regression equation of Y on X.
(iii) Estimate the blood pressure of a woman whose age is 45 years.
Ans. (i) r = 0·89, (ii) Y = 83·758 + 1·11X, (iii) When X = 45, Y = 134.
20. A panel of two judges P and Q graded seven dramatic performances by independently awarding marks as
follows :
Performance : 1 2 3 4 5 6 7
Marks by P : 46 42 44 40 43 41 45
Marks by Q : 40 38 36 35 39 37 41
The eighth performance, which Judge Q could not attend, was awarded 37 marks by Judge P. If Judge Q had also
been present, how many marks would be expected to have been awarded by him to the eighth performance ?
Ans. 33·5 ≈ 34.
21. The following table gives the normal weight of a baby during the first six months of life :
Age in months : 0 2 3 5 6
Weight in lbs. : 5 7 8 10 12
Estimate the weight of a baby at the age of 4 months.
Ans. 9·2982 lbs.
22. You are given the following data :
                              x        y
Arithmetic Mean               36       85
Standard Deviation            11       8
Correlation coefficient between x and y = 0·66
(i) Find two regression equations. (ii) Estimate value of x when y = 75.
Ans. (i) y = 0·48x + 67·72 ; x = 0·9075y – 41·1375, (ii) 26·925.
23. Given the information : Sum of X = 5 ; Sum of Y = 4
Sum of squares of deviations from the mean of X = 40 ; Sum of squares of deviations from the mean of Y = 50
Sum of the products of deviations from the means of X and Y = 32; Number of pairs of observations = 10
Calculate :
(i) regression coefficient of Y on X ; (ii) regression coefficient of X on Y ;
(iii) Karl Pearson’s coefficient of correlation. [Delhi Univ. B.A. (Econ. Hons.), 1999]
Ans. bYX = 0·80 ; bXY = 0·64 ; r (X, Y) = 0·7155.
24. For some bi-variate data, the following results were obtained :
Mean value of variable X = 53·2 and of Y = 39·5.
Regression Coefficient of Y on X = – 1·5 and of X on Y = – 0·38.
What should be the most likely value of X when Y = 50?
Also find the coefficient of correlation between two variables. [Delhi Univ. B.Com. (Hons.), 2005]
Ans. $\hat{X}$ = 53·2 + (– 0·38) (50 – 39·5) = 49·21 ; $r = -\sqrt{(-1·5)(-0·38)} = -\sqrt{0·57} = -0·7550$
25. For a particular product, the sales (y) and the advertisement expenditure (x) for 10 years, provide the results
∑x = 15, ∑y = 110, ∑xy = 400, ∑x² = 250, ∑y² = 3200.
Find the regression line of y on x and the estimated value of y for x = 10. [I.C.W.A (Intermediate), Dec. 2001]
Ans. $\hat{y}$ = 1·033x + 9·4505 ; when x = 10, $\hat{y}$ = 19·781.
26. Calculate the correlation coefficient from the following results :
N = 10, ∑X = 350, ∑Y = 310, ∑(X – 35)² = 162, ∑(Y – 31)² = 222, ∑(X – 35) (Y – 31) = 92.
Also find the regression line of Y on X. [Delhi Univ. B.A. (Econ. Hons.), 2007]
Hint. $\bar{X} = 35$, $\bar{Y} = 31$ ⇒ $\sum (X - 35)^2 = \sum (X - \bar{X})^2 = 162$, and so on.
9·5. TO FIND THE MEAN VALUES $(\bar{x}, \bar{y})$ FROM THE TWO LINES OF
REGRESSION
Let us suppose that the two lines of regression are :
$a_1 x + b_1 y + c_1 = 0$ …(9·41)
and $a_2 x + b_2 y + c_2 = 0$ …(9·42)
We have already discussed that both the lines of regression pass through the point $(\bar{x}, \bar{y})$. In other
words, $(\bar{x}, \bar{y})$ is the point of intersection of the two lines of regression. Hence, solving (9·41) and (9·42)
simultaneously, we get
$$\frac{\bar{x}}{b_1 c_2 - b_2 c_1} = \frac{\bar{y}}{c_1 a_2 - c_2 a_1} = \frac{1}{a_1 b_2 - a_2 b_1} \;\Rightarrow\; \bar{x} = \frac{b_1 c_2 - b_2 c_1}{a_1 b_2 - a_2 b_1}, \quad \bar{y} = \frac{c_1 a_2 - c_2 a_1}{a_1 b_2 - a_2 b_1}$$ …(9·43)
Let (9·41) and (9·42) be the given lines of regression and let us suppose that (9·41) is the line of
regression of y on x and (9·42) is the line of regression of x on y. To obtain byx, the coefficient of regression
of y on x, write the regression equation of y on x in the form y = a + bx. Then b, the coefficient of x gives
the value of byx. Similarly to obtain bxy, write the equation of regression of x on y in the form x = A + By.
Then B, the coefficient of y gives bxy. Therefore, re-writing (9·41), we get the regression equation of y on x :
$$y = -\frac{a_1}{b_1}\,x - \frac{c_1}{b_1} \;\Rightarrow\; b_{yx} = -\frac{a_1}{b_1}$$ …(9·44)
Similarly re-writing (9·42), we get regression equation of x on y as :
$$x = -\frac{b_2}{a_2}\,y - \frac{c_2}{a_2} \;\Rightarrow\; b_{xy} = -\frac{b_2}{a_2}$$ …(9·45)
The correlation coefficient r between x and y can now be obtained by using the formula
$$r^2 = b_{yx} \cdot b_{xy} = \left(-\frac{a_1}{b_1}\right) \times \left(-\frac{b_2}{a_2}\right) = \frac{a_1 b_2}{a_2 b_1} \;\Rightarrow\; r = \pm\sqrt{\frac{a_1 b_2}{a_2 b_1}},$$ …(9·46)
the sign to be taken before the square root being the same as that of the regression coefficients. If the regression
coefficients are positive, we take the positive sign, and if they are negative, we take the negative sign in (9·46).
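Formulae (9·43) to (9·46) can be collected into one small routine. The sketch below is illustrative: the function name `summarize_regression_lines` is hypothetical, and it assumes the caller already knows which equation is the line of y on x and which is the line of x on y. As a check it uses the two lines obtained in Example 9·9, Y = 1·6X – 10 and X = 0·4Y + 13, cleared of decimals.

```python
import math

def summarize_regression_lines(line_y_on_x, line_x_on_y):
    """Each line is given as (a, b, c) for a*x + b*y + c = 0.

    Returns (x_bar, y_bar, b_yx, b_xy, r) following (9.43) to (9.46).
    """
    a1, b1, c1 = line_y_on_x
    a2, b2, c2 = line_x_on_y

    det = a1 * b2 - a2 * b1
    x_bar = (b1 * c2 - b2 * c1) / det     # (9.43)
    y_bar = (c1 * a2 - c2 * a1) / det     # (9.43)

    b_yx = -a1 / b1                       # (9.44)
    b_xy = -b2 / a2                       # (9.45)

    product = b_yx * b_xy                 # equals r squared
    if not 0 <= product <= 1:
        raise ValueError("lines are not a valid (y on x, x on y) pair")
    r = math.copysign(math.sqrt(product), b_yx)   # (9.46), sign of the coefficients
    return x_bar, y_bar, b_yx, b_xy, r

# Lines of Example 9.9: 8x - 5y - 50 = 0 (y on x) and 5x - 2y - 65 = 0 (x on y)
x_bar, y_bar, b_yx, b_xy, r = summarize_regression_lines((8, -5, -50), (5, -2, -65))
print(x_bar, y_bar, b_yx, b_xy, round(r, 4))
```

The `ValueError` branch reflects the fact that r² must lie between 0 and 1; passing the two lines in the wrong order typically violates this condition.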
Remark. Given the two lines of regression (9·41) and (9·42) how to determine which is the line of
regression of y on x and which is the line of regression of x on y ? Incidentally, the above discussion
enables us to answer this question. By supposing (9·41) and (9·42) to be equations of the lines of regression