Business Statistics by Gupta 365 379

1) The document discusses the relationship between the correlation coefficient (r) and regression coefficients (byx and bxy) for two variables x and y. 2) It states that the correlation coefficient r is equal to the geometric mean of the two regression coefficients. 3) The sign of the correlation coefficient r is the same as the sign of the regression coefficients - if the regression coefficients are positive, r is positive, and if the regression coefficients are negative, r is negative.

Uploaded by

Kowsalya
Copyright
© © All Rights Reserved

9·6 BUSINESS STATISTICS

If r = 0, then from (9·28), $\theta = \tan^{-1}(\infty) = \pi/2$,
i.e., if the variables are uncorrelated, the two lines of regression are perpendicular to each other.
Remarks 1. When r = 0, i.e., when x and y are uncorrelated, the lines of regression of y on x and of x on y are given respectively by [from (9·13) and (9·24)] :
$$y - \bar{y} = 0 \;\Rightarrow\; y = \bar{y} \qquad \text{and} \qquad x - \bar{x} = 0 \;\Rightarrow\; x = \bar{x}$$
The line $y = \bar{y}$ is parallel to the X-axis at a distance of $\bar{y}$ units from the origin, and the line $x = \bar{x}$ is parallel to the Y-axis at a distance of $\bar{x}$ units from the origin.
Hence, if r = 0, the two lines of regression are perpendicular to each other and are parallel to the X-axis and Y-axis respectively, as shown in Fig. 9·2(a).
[Fig. 9·2(a) : the lines $y = \bar{y}$ and $x = \bar{x}$, intersecting at the point $(\bar{x}, \bar{y})$.]
2. We have seen above that if r = 0 (variables uncorrelated), the two lines of regression are perpendicular to each other, and if r = ± 1, θ = 0, i.e., the two lines coincide. This leads us to the conclusion that the higher the degree of correlation between the variables, the smaller the angle between the lines, i.e., the nearer the two lines of regression are to each other. On the other hand, the angle between the lines increases, i.e., the lines of regression move apart, as the numerical value of the correlation coefficient decreases. In other words, a larger angle between the lines of regression indicates a poorer degree of correlation between the variables, and in the limit θ = π/2 the lines are perpendicular and no correlation exists between the variables. Thus, by plotting the lines of regression on graph paper, we can get an approximate idea of the degree of correlation between the two variables under study. Some illustrations are given below in Fig. 9·3(a) to Fig. 9·3(e).
[Fig. 9·3(a) : two lines coincide (r = –1). Fig. 9·3(b) : two lines coincide (r = 1). Fig. 9·3(c) : two lines perpendicular (r = 0), being the lines $y = \bar{y}$ and $x = \bar{x}$. Fig. 9·3(d) : two lines apart (low degree of correlation). Fig. 9·3(e) : two lines closer (high degree of correlation).]
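The relation between r and the angle between the two lines can be checked numerically. The following is a minimal sketch, not taken from the text: the function name `regression_angle` is hypothetical, and it assumes the standard slopes r σy/σx (for y on x) and σy/(r σx) (for x on y), working with direction vectors so that r = 0 needs no division.

```python
import math

def regression_angle(r, sigma_x, sigma_y):
    """Acute angle (in radians) between the two lines of regression.

    Direction vectors in the (x, y) plane:
      y on x has slope r*sigma_y/sigma_x   -> direction (sigma_x, r*sigma_y)
      x on y has slope sigma_y/(r*sigma_x) -> direction (r*sigma_x, sigma_y)
    Using directions instead of slopes avoids dividing by r when r = 0.
    """
    d1 = (sigma_x, r * sigma_y)
    d2 = (r * sigma_x, sigma_y)
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    norm = math.hypot(*d1) * math.hypot(*d2)
    # Clamp to guard against rounding slightly above 1.
    cos_theta = min(1.0, abs(dot) / norm)
    return math.acos(cos_theta)

# Illustrative standard deviations (assumed values, not from the text).
for r in (0.0, 0.3, 0.9, 1.0):
    print(r, round(math.degrees(regression_angle(r, 1.0, 2.0)), 2))
```

Under these assumptions the printed angle shrinks from 90 degrees towards 0 as r moves from 0 to 1, which is the behaviour described in Remark 2.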

9·4. COEFFICIENTS OF REGRESSION


Let us consider the line of regression of y on x, viz.,
y = a + bx
The coefficient ‘b’ which is the slope of the line of regression of y on x is called the coefficient of
regression of y on x. It represents the increment in the value of the dependent variable y for a unit change
in the value of the independent variable x. In other words, it represents the rate of change of y w.r.t. x. For
notational convenience, the slope b, i.e., coefficient of regression of y on x is written as byx.
Similarly in the regression equation of x on y, viz.,
x = A + By,
the coefficient B represents the change in the value of dependent variable x for a unit change in the value of
independent variable y and is called the coefficient of regression of x on y. For notational convenience, it is
written as bxy.
Notations
byx = Coefficient of regression of y on x.
bxy = Coefficient of regression of x on y.
LINEAR REGRESSION ANALYSIS 9·7
From (9·10), the coefficient of regression of y on x is given by
$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{\sigma_x^2} = \frac{r\,\sigma_y}{\sigma_x} \qquad [\because\ \operatorname{Cov}(x, y) = r\,\sigma_x \sigma_y] \tag{9·26}$$
Similarly, from (9·21), the coefficient of regression of x on y is given by :
$$b_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_y^2} = \frac{r\,\sigma_x}{\sigma_y} \tag{9·27}$$
Accordingly, the equation of the line of regression of y on x becomes
$$y - \bar{y} = b_{yx}\,(x - \bar{x}) \tag{9·28}$$
and the equation of the line of regression of x on y becomes :
$$x - \bar{x} = b_{xy}\,(y - \bar{y}) \tag{9·29}$$
Remarks 1. For numerical computation of the equations of the lines of regression of y on x and of x on y, the following formulae for the regression coefficients byx and bxy are very convenient to use :
$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{\sigma_x^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \tag{9·30}$$
and
$$b_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_y^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} \tag{9·31}$$
Formulae (9·30) and (9·31) are very useful for computing the values of regression coefficients from a given set of n points (x1, y1), (x2, y2), …, (xn, yn).
Other convenient formulae for finding the regression coefficients in numerical problems are :
$$b_{yx} = \frac{r\,\sigma_y}{\sigma_x} \qquad \text{and} \qquad b_{xy} = \frac{r\,\sigma_x}{\sigma_y} \tag{9·32}$$
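The raw-sum forms of (9·30) and (9·31) translate directly into code. The following is a minimal sketch; the function name `regression_coefficients` and the five data points are illustrative assumptions, not part of the text.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) from paired data using the raw-sum formulae."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy            # common numerator of both formulae
    b_yx = num / (n * sxx - sx ** 2)   # coefficient of regression of y on x
    b_xy = num / (n * syy - sy ** 2)   # coefficient of regression of x on y
    return b_yx, b_xy

# Hypothetical data, chosen only so the sums are easy to follow by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(xs, ys)
print(b_yx, b_xy)
```

For this data the numerator is 5 × 66 – 15 × 20 = 30, and the denominators are 50 and 30, so the two coefficients differ even though they share the same covariance term.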
2. Correlation coefficient between two variables x and y is a symmetrical function between x and y, i.e.,
rxy = ryx. However, the regression coefficients are not symmetric functions of x and y, i.e., byx ≠ bxy.
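3. We have :
$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{\sigma_x^2} \ \ldots(*), \qquad b_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_y^2} \ \ldots(**), \qquad r_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \ \ldots(***)$$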
From (*) and (**), we observe that the sign of each regression coefficient byx and bxy depends on the
covariance term, since σ x > 0 and σ y > 0. If Cov (x, y) is positive, both the regression coefficients are
positive and if Cov (x, y) is negative, both the regression coefficients are negative.
4. Further, since σx > 0 and σy > 0, the sign of each of r, byx and bxy depends on the covariance term. If
Cov (x, y) is positive, all the three are positive and if Cov (x, y) is negative, all the three are negative. This
result can be stated slightly differently as follows :
The sign of correlation coefficient is same as that of the regression coefficients. If regression
coefficients are positive, r is positive and if regression coefficients are negative, r is negative.
9·4·1. Theorems on Regression Coefficients
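Theorem 9·1. The correlation coefficient is the geometric mean between the regression coefficients, i.e.,
$$r^2 = b_{yx} \cdot b_{xy} \tag{9·33}$$
Proof. We have
$$b_{yx} = \frac{\operatorname{Cov}(x, y)}{\sigma_x^2} = r \cdot \frac{\sigma_y}{\sigma_x} \tag{9·34}$$
and
$$b_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_y^2} = r \cdot \frac{\sigma_x}{\sigma_y} \tag{9·35}$$
Multiplying (9·34) and (9·35), we get
$$r^2 = b_{yx} \cdot b_{xy} \;\Rightarrow\; r = \pm\sqrt{b_{yx} \cdot b_{xy}} \tag{9·36}$$
which establishes the result.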
Remark. The sign to be taken before the square root is same as that of regression coefficients. If the
regression coefficients are positive, we take positive sign in (9·36) and if regression coefficients are
negative, we take negative sign in (9·36).
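The sign rule in this remark can be made explicit in a short sketch. The function name `correlation_from_coefficients` is hypothetical, and the input checks are an added assumption: they reject coefficient pairs that cannot come from the same data.

```python
import math

def correlation_from_coefficients(b_yx, b_xy):
    """r as the signed geometric mean of the two regression coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    # The square root gives |r|; the sign is that of the regression coefficients.
    return math.copysign(math.sqrt(product), b_yx)

print(correlation_from_coefficients(1.6, 0.4))
print(correlation_from_coefficients(-0.5, -0.5))
```

The first call gives a positive r (approximately 0·8) and the second a negative r, because the sign is copied from the coefficients rather than chosen by hand.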
Theorem 9·2. If one of the regression coefficients is greater than unity (one), the other must be less
than unity.
Proof. If one of the regression coefficients is greater than 1, then the other must be less than one
because otherwise, on using (9·33), we shall get :
r2 = byx . bxy > 1,
which is impossible, since 0 ≤ r2 ≤ 1.
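Theorem 9·3. The arithmetic mean of the modulus values of the regression coefficients is not less than the modulus value of the correlation coefficient, i.e.,
$$\tfrac{1}{2}\left[\,|b_{yx}| + |b_{xy}|\,\right] \ge |r| \tag{9·37}$$
This follows from Theorem 9·1, since the arithmetic mean of two non-negative numbers is never less than their geometric mean; equality holds only when $|b_{yx}| = |b_{xy}|$.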
Theorem 9·4. Regression coefficients are independent of change of origin but not of scale.
Symbolically, if we transform from x and y to new variables u and v by change of origin and scale, viz.,
$$u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k}, \tag{9·38}$$
where a, b, h (> 0) and k (> 0) are constants, then
$$b_{yx} = \frac{k}{h}\, b_{vu} \qquad \text{and} \qquad b_{xy} = \frac{h}{k}\, b_{uv} \tag{9·39}$$
In particular, if we take h = k = 1, i.e., we transform the variables x and y to u and v by the relation :
$$u = x - a \qquad \text{and} \qquad v = y - b \tag{9·40}$$
i.e., by change of origin only, then from (9·39), we get
$$b_{xy} = b_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2} \tag{9·40a}$$
and
$$b_{yx} = b_{vu} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} \tag{9·40b}$$
These formulae are very useful for obtaining the equations of the lines of regression if the mean values $\bar{x}$ and/or $\bar{y}$ come out to be fractions or if the values of x and y are large.
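Theorem 9·4 can be checked numerically on any data set. The following is a minimal sketch; the helper `slope`, the data and the constants a, b, h, k are all illustrative assumptions.

```python
def slope(xs, ys):
    """Regression coefficient of ys on xs, i.e. Cov(x, y) / Var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data and transformation constants.
x = [10, 20, 30, 40, 50]
y = [12, 15, 21, 24, 33]
a, b, h, k = 30, 20, 10, 3

u = [(xi - a) / h for xi in x]   # change of origin and scale for x
v = [(yi - b) / k for yi in y]   # change of origin and scale for y

b_yx, b_vu = slope(x, y), slope(u, v)
b_xy, b_uv = slope(y, x), slope(v, u)

print(b_yx, (k / h) * b_vu)   # the two values should agree
print(b_xy, (h / k) * b_uv)   # the two values should agree
```

Changing a and b alone leaves both coefficients unchanged, while changing h or k rescales them by k/h and h/k respectively.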
Example 9·1. From the following data, obtain the two regression equations :
Sales : 91 97 108 121 67 124 51 73 111 57
Purchases : 71 75 69 97 70 91 39 61 80 47
Solution. Let us denote the sales by the variable X and the purchases by the variable Y.
CALCULATIONS FOR REGRESSION EQUATIONS
    x       y     dx = x – x̄   dy = y – ȳ     dx²      dy²     dx·dy
   91      71          1            1           1        1         1
   97      75          7            5          49       25        35
  108      69         18           –1         324        1       –18
  121      97         31           27         961      729       837
   67      70        –23            0         529        0         0
  124      91         34           21        1156      441       714
   51      39        –39          –31        1521      961      1209
   73      61        –17           –9         289       81       153
  111      80         21           10         441      100       210
   57      47        –33          –23        1089      529       759
∑x = 900  ∑y = 700  ∑dx = 0     ∑dy = 0   ∑dx² = 6360  ∑dy² = 2868  ∑dx·dy = 3900

We have
$$\bar{x} = \frac{\sum x}{n} = \frac{900}{10} = 90 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{700}{10} = 70$$
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{3900}{6360} = 0·6132$$
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{3900}{2868} = 1·3598$$
Regression Equations
Equation of the line of regression of y on x is
$$\begin{aligned} y - \bar{y} &= b_{yx}\,(x - \bar{x}) \\ \Rightarrow\ y - 70 &= 0·6132\,(x - 90) = 0·6132x - 55·188 \\ \Rightarrow\ y &= 0·6132x - 55·188 + 70·000 \\ \Rightarrow\ y &= 0·6132x + 14·812 \end{aligned}$$
Equation of the line of regression of x on y is
$$\begin{aligned} x - \bar{x} &= b_{xy}\,(y - \bar{y}) \\ \Rightarrow\ x - 90 &= 1·3598\,(y - 70) = 1·3598y - 95·186 \\ \Rightarrow\ x &= 1·3598y - 95·186 + 90·000 \\ \Rightarrow\ x &= 1·3598y - 5·186 \end{aligned}$$
Remark. We have
$$r^2 = b_{yx}\, b_{xy} = 0·6132 \times 1·3598 = 0·8338 \;\Rightarrow\; r = \pm\sqrt{0·8338} = \pm\,0·9131$$
But since both the regression coefficients are positive, r must be positive. Hence, r = 0·9131.
Example 9·2. From the data given below find :
(a) The two regression coefficients. (b) The two regression equations.
(c) The coefficient of correlation between the marks in Economics and Statistics.
(d) The most likely marks in Statistics when marks in Economics are 30.
Marks in Economics : 25 28 35 32 31 36 29 38 34 32
Marks in Statistics : 43 46 49 41 36 32 31 30 33 39
[Himachal Pradesh Univ. M.A. (Econ.), 2003]
Solution. Let us denote the marks in Economics by the variable X and the marks in Statistics by the
variable Y.
CALCULATIONS FOR REGRESSION EQUATIONS
    x       y     dx = x – x̄ = x – 32   dy = y – ȳ = y – 38    dx²     dy²    dx·dy
   25      43            –7                      5               49      25     –35
   28      46            –4                      8               16      64     –32
   35      49             3                     11                9     121      33
   32      41             0                      3                0       9       0
   31      36            –1                     –2                1       4       2
   36      32             4                     –6               16      36     –24
   29      31            –3                     –7                9      49      21
   38      30             6                     –8               36      64     –48
   34      33             2                     –5                4      25     –10
   32      39             0                      1                0       1       0
∑x = 320  ∑y = 380     ∑dx = 0               ∑dy = 0        ∑dx² = 140  ∑dy² = 398  ∑dx·dy = –93

Here,
$$\bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{380}{10} = 38$$
(a) Regression Coefficients
Coefficient of regression of y on x :
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{-93}{140} = -0·6643$$
Coefficient of regression of x on y :
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{-93}{398} = -0·2337$$
(b) Regression Equations
Equation of the line of regression of x on y is :
$$\begin{aligned} x - \bar{x} &= b_{xy}\,(y - \bar{y}) \\ \Rightarrow\ x - 32 &= -0·2337\,(y - 38) = -0·2337y + 0·2337 \times 38 = -0·2337y + 8·8806 \\ \Rightarrow\ x &= -0·2337y + 32 + 8·8806 \\ \Rightarrow\ x &= -0·2337y + 40·8806 \end{aligned}$$
Equation of the line of regression of y on x is :
$$\begin{aligned} y - \bar{y} &= b_{yx}\,(x - \bar{x}) \\ \Rightarrow\ y - 38 &= -0·6643\,(x - 32) \\ \Rightarrow\ y &= -0·6643x + 38 + 0·6643 \times 32 = -0·6643x + 38 + 21·2576 \\ \Rightarrow\ y &= -0·6643x + 59·2576 \qquad \ldots(*) \end{aligned}$$
(c) Correlation Coefficient. We have
$$r^2 = b_{yx} \cdot b_{xy} = (-0·6643) \times (-0·2337) = 0·1552 \;\Rightarrow\; r = \pm\sqrt{0·1552} = \pm\,0·394$$
Since both the regression coefficients are negative, r must be negative. Hence, we get r = – 0·394.
(d) In order to estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we shall use the line of regression of y on x, viz., the equation (*). Taking x = 30 in (*), the required estimate is given by
$$y = -0·6643 \times 30 + 59·2576 = -19·929 + 59·2576 = 39·3286$$
Hence, the most likely marks in Statistics when marks in Economics are 30 are 39·3286 ≈ 39.
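The estimate in part (d) can be reproduced with a few lines of code. This is a minimal sketch using the marks given in the example; the function name `fit_line` is hypothetical, and because it works at full floating-point precision its output may differ from the hand calculation in the last decimal places.

```python
def fit_line(xs, ys):
    """Least-squares line of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

a, b = fit_line(economics, statistics)   # line of regression of y on x
estimate = a + b * 30                    # most likely Statistics marks at x = 30
print(round(b, 4), round(a, 4), round(estimate, 4))
```

Note that the prediction of y from x uses the line of y on x only; swapping the argument order in `fit_line` gives the other line, which answers the reverse question.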
Example 9·3. A panel of judges A and B graded seven debaters and independently awarded the following marks :
Debater    : 1 2 3 4 5 6 7
Marks by A : 40 34 28 30 44 38 31
Marks by B : 32 39 26 30 38 34 28
An eighth debater was awarded 36 marks by Judge A while Judge B was not present.
If Judge B had also been present, how many marks would you expect him to award to the eighth debater, assuming that the same degree of relationship exists in their judgement ?
[Delhi Univ. B.Com (Hons.), 1993; Himachal Pradesh Univ. M.A. (Econ.), June 1999,
Allahabad Univ. M.Com. 2002]
Solution. Let the marks awarded by Judge ‘A’ be denoted by the variable X and the marks awarded by Judge ‘B’ by the variable Y.
CALCULATIONS FOR REGRESSION EQUATIONS
Debater     x      y    u = x – A = x – 35   v = y – B = y – 30     u²     v²     uv
   1       40     32            5                    2              25      4     10
   2       34     39           –1                    9               1     81     –9
   3       28     26           –7                   –4              49     16     28
   4       30     30           –5                    0              25      0      0
   5       44     38            9                    8              81     64     72
   6       38     34            3                    4               9     16     12
   7       31     28           –4                   –2              16      4      8
 Total                       ∑u = 0               ∑v = 17       ∑u² = 206  ∑v² = 185  ∑uv = 121

The marks awarded by Judge A to the eighth debater are given to be 36, i.e., we are given x = 36. We want to find the marks which would have been given to the eighth debater by Judge B, if he had been present. In other words, we want to find y when x = 36. To do this we need the equation of the line of regression of y on x.
In the usual notations we have :
$$\bar{x} = A + \frac{\sum u}{n} = 35 + \frac{0}{7} = 35, \qquad \bar{y} = B + \frac{\sum v}{n} = 30 + \frac{17}{7} = 32·4286$$
$$b_{yx} = b_{vu} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} = \frac{7 \times 121 - 0 \times 17}{7 \times 206 - 0} = \frac{121}{206} = 0·5874$$
The equation of the line of regression of y on x is given by
$$\begin{aligned} y - \bar{y} &= b_{yx}\,(x - \bar{x}) \\ \Rightarrow\ y - 32·4286 &= 0·5874\,(x - 35) = 0·5874x - 0·5874 \times 35 \\ \Rightarrow\ y &= 0·5874x - 20·5590 + 32·4286 \\ \Rightarrow\ y &= 0·5874x + 11·8696 \end{aligned}$$
When x = 36, y = 0·5874 × 36 + 11·8696 = 21·1464 + 11·8696 = 33·016
Hence, if Judge B had also been present, he would have given about 33 marks to the eighth debater.
Example 9·4. A departmental store gives in-service training to its salesmen which is followed by a test.
It is considering whether it should terminate the service of any salesman who does not do well in the test.
The following data give the test scores and sales made by nine salesmen during a certain period :
Test scores : 14 19 24 21 26 22 15 20 19
Sales (’000 Rs.) : 31 36 48 37 50 45 33 41 39
Calculate the coefficient of correlation between the test scores and the sales. Does it indicate that the
termination of services of low test scores is justified ? If the firm wants a minimum sales volume of
Rs. 30,000, what is the minimum test score that will ensure continuation of service ? Also estimate the most
probable sales volume of a salesman making a score of 28. [Delhi Univ. B.Com. (Hons.), 2003]
Solution. Let x denote the test scores of the salesmen and y denote their corresponding sales (in ’000
Rs.)
CALCULATIONS FOR REGRESSION LINES
    x       y     dx = x – x̄ = x – 20   dy = y – ȳ = y – 40    dx²     dy²    dx·dy
   14      31            –6                     –9               36      81      54
   19      36            –1                     –4                1      16       4
   24      48             4                      8               16      64      32
   21      37             1                     –3                1       9      –3
   26      50             6                     10               36     100      60
   22      45             2                      5                4      25      10
   15      33            –5                     –7               25      49      35
   20      41             0                      1                0       1       0
   19      39            –1                     –1                1       1       1
∑x = 180  ∑y = 360     ∑dx = 0               ∑dy = 0        ∑dx² = 120  ∑dy² = 346  ∑dx·dy = 193

Then
$$\bar{x} = \frac{\sum x}{n} = \frac{180}{9} = 20 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{360}{9} = 40$$
Coefficient of regression of y on x :
$$b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{193}{120} = 1·6083$$
Coefficient of regression of x on y :
$$b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{193}{346} = 0·5578$$
Karl Pearson’s correlation coefficient r between x and y is given by :
$$r^2 = b_{yx} \cdot b_{xy} = 1·6083 \times 0·5578 = 0·8971 \;\Rightarrow\; r = \pm\sqrt{0·8971} = \pm\,0·9471$$
Since the regression coefficients are positive, r is also positive. ∴ r = + 0·9471.
Aliter.
$$r_{xy} = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{193}{\sqrt{120 \times 346}} = \frac{193}{\sqrt{41520}} = \frac{193}{203·7646} = 0·9472$$
Thus, we see that there is a very high degree of positive correlation between the test scores (x) and the sales (’000 Rs.) (y). This justifies the proposal for the termination of service of those with low test scores.
Regression Equations
To obtain the test score (x) for given sales (y), we use the equation of the line of regression of x on y, which is :
$$\begin{aligned} x - \bar{x} &= b_{xy}\,(y - \bar{y}) \\ \Rightarrow\ x - 20 &= 0·5578\,(y - 40) = 0·5578y - 22·312 \\ \Rightarrow\ x &= 0·5578y - 22·312 + 20 \\ \Rightarrow\ x &= 0·5578y - 2·312 \qquad \ldots(*) \end{aligned}$$
Hence, to ensure the continuation of service, the minimum test score (x) corresponding to a minimum sales volume (y) of Rs. 30,000 = 30 (’000 Rs.) is obtained on putting y = 30 in (*) and is given by :
x = 0·5578 × 30 – 2·312 = 16·734 – 2·312 = 14·422 ≈ 14
To estimate the sales volume (y) of a salesman with given test score (x), we use the line of regression of y on x, which is given by :
$$\begin{aligned} y - \bar{y} &= b_{yx}\,(x - \bar{x}) \\ \Rightarrow\ y - 40 &= 1·6083\,(x - 20) = 1·6083x - 32·1660 \\ \Rightarrow\ y &= 1·6083x - 32·1660 + 40 \\ \Rightarrow\ y &= 1·6083x + 7·8340 \end{aligned}$$
Hence the estimated sales volume of a salesman with a test score of 28 is (in ’000 Rs.)
y = 1·6083 × 28 + 7·8340 = 45·0324 + 7·8340 = 52·8664 (’000 Rs.) = Rs. 52,866·40
Example 9·5. The data about the sales and advertisement expenditure of a firm is given below :
Sales Advertisement expenditure
(in crores of Rs.) (in crores of Rs.)
Means 40 6
Standard deviations 10 1·5
Coefficient of correlation = r = 0·9
(i) Estimate the likely sales for a proposed advertisement expenditure of Rs. 10 crores.
(ii) What should be the advertisement expenditure if the firm proposes a sales target of 60 crores of
rupees ?
Solution. Let the variable x denote the sales (in crores of Rs.) and the variable y denote the advertisement expenditure (in crores of Rs.). Then, in usual notations, we are given :
$$\bar{x} = 40, \quad \bar{y} = 6, \quad \sigma_x = 10, \quad \sigma_y = 1·5, \quad r = r_{xy} = 0·9$$
(i) To estimate the likely sales (x) for given advertisement expenditure (y), we need the regression equation of x on y, which is given by :
$$x - \bar{x} = \frac{r\,\sigma_x}{\sigma_y}\,(y - \bar{y}) \;\Rightarrow\; x = \frac{r\,\sigma_x}{\sigma_y}\,(y - \bar{y}) + \bar{x} \;\Rightarrow\; x = \frac{0·9 \times 10}{1·5}\,(y - 6) + 40 = 6\,(y - 6) + 40 \qquad \ldots(*)$$
Hence the estimated sales (x) for a proposed advertisement expenditure (y) of Rs. 10 crores are obtained on putting y = 10 in (*) and are given by :
x = 6(10 – 6) + 40 = 6 × 4 + 40 = 64 crores of Rs.
(ii) To estimate the advertisement expenditure (y) for proposed sales (x), we need the equation of the line of regression of y on x, which is given by :
$$y - \bar{y} = \frac{r\,\sigma_y}{\sigma_x}\,(x - \bar{x}) \;\Rightarrow\; y = \frac{r\,\sigma_y}{\sigma_x}\,(x - \bar{x}) + \bar{y} \;\Rightarrow\; y = \frac{0·9 \times 1·5}{10}\,(x - 40) + 6 = 0·135\,(x - 40) + 6 \qquad \ldots(**)$$
Hence the likely advertisement expenditure (y) of the firm for a proposed sales target (x) of 60 crores of Rs. is obtained on taking x = 60 in (**) and is given by :
y = 0·135 (60 – 40) + 6 = 0·135 × 20 + 6 = 2·7 + 6 = 8·7 crores of Rs.
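When only the means, standard deviations and r are known, both lines follow from the formulae b_yx = r σy/σx and b_xy = r σx/σy. The following is a minimal sketch with the figures of this example; the function and variable names are hypothetical.

```python
def lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return two predictors built from summary statistics only.

    predict_y(x) uses the line of regression of y on x;
    predict_x(y) uses the line of regression of x on y.
    """
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y

    def predict_y(x):
        return mean_y + b_yx * (x - mean_x)

    def predict_x(y):
        return mean_x + b_xy * (y - mean_y)

    return predict_y, predict_x

# x = sales, y = advertisement expenditure (both in crores of Rs.).
predict_adv, predict_sales = lines_from_summary(40, 6, 10, 1.5, 0.9)
print(predict_sales(10))   # likely sales for an expenditure of 10
print(predict_adv(60))     # likely expenditure for a sales target of 60
```

Keeping the two predictors as separate functions mirrors the point of the example: each question is answered by its own line of regression, not by inverting the other one.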
Example 9·6. Point out the inconsistency, if any, in the following statement.
“The regression equation of y on x is 2y + 3x = 4 and the correlation coefficient between x
and y is 0·8”. [I.C.W.A. (Intermediate), Dec. 1998]
Solution. The line of regression of y on x is :
$$2y + 3x = 4 \;\Rightarrow\; y = -\tfrac{3}{2}\,x + 2$$
∴ byx = Coefficient of regression of y on x = – 3/2.
Also rxy = 0·8 (Given).
Since byx and rxy have different signs, the given statement is wrong (inconsistent).
Remark. The sign of the correlation coefficient (rxy) and the regression coefficients byx and bxy must be
same, each depending on the sign of the covariance term Cov (x, y).
Example 9·7. The following is an estimated supply regression for sugar :
Y = 0·025 + 1·5X
where Y is supply in kilos and X is price (Rs.) per kilo.
(i) Interpret the coefficient of variable X.
(ii) Predict the supply when price is Rs. 20 per kilo.
(iii) Given that r(x, y) = 1 in the above case, interpret the implied relationship between price and
quantity supplied. [Delhi Univ. B.A. (Econ. Hons.), 1998]
Solution. The regression equation of Y (supply in kgs.) on X (price in Rupees per kg.) is given to be :
Y = 0·025 + 1·5 X = a + bX, (say) …(*)
(i) The coefficient of the variable X, viz., b = 1·5, is the coefficient of regression of Y on X. It gives the change in the value of Y for a unit change in the corresponding value of X. This means that if the price of sugar goes up by Re. 1 per kg., the estimated supply of sugar goes up by 1·5 kg.
(ii) From (*), the estimated supply of sugar when its price is Rs. 20 per kg. is given by :
$$\hat{Y} = 0·025 + 1·5 \times 20 = 30·025 \text{ kg.}$$
(iii) r (X, Y) = 1, implies that the relationship between X and Y is exactly linear. This means that all
the observed values (X, Y) lie on a straight line.
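Example 9·8. (a) The coefficient of regression of Y on X is bYX = 1·2. If
$$U = \frac{X - 100}{2} \qquad \text{and} \qquad V = \frac{Y - 200}{3},$$
find bVU. [Delhi Univ. B.A. (Econ. Hons.), 1998]
(b) The covariance between X and Y is 900 and the standard deviations of X and Y are 15 and 80 respectively. If two variables S and T are defined as :
$$S = \frac{20 - X}{5} \qquad \text{and} \qquad T = \frac{50 + Y}{8},$$
find the slope coefficients of the regressions of : (i) S on T and (ii) T on S.
[Delhi Univ. B.A. (Econ. Hons.), 2005]
Solution. (a) Using formula (9·39) with h = 2 and k = 3, we get :
$$b_{YX} = \frac{k}{h} \cdot b_{VU} = \frac{3}{2}\, b_{VU} \;\Rightarrow\; b_{VU} = \frac{2}{3}\, b_{YX} = \frac{2}{3} \times 1·2 = 0·8$$
(b) We are given : Cov (X, Y) = 900 ; σX = 15 ; σY = 80 …(1)
$$S = \frac{20 - X}{5} \;\Rightarrow\; \operatorname{Var}(S) = \operatorname{Var}\left[-\frac{1}{5}(X - 20)\right] = \left(-\frac{1}{5}\right)^2 \operatorname{Var}(X) = \frac{1}{25} \times 15^2 = 9 \qquad [\text{From (1)}]$$
$$T = \frac{50 + Y}{8} \;\Rightarrow\; \operatorname{Var}(T) = \operatorname{Var}\left[\frac{50 + Y}{8}\right] = \left(\frac{1}{8}\right)^2 \operatorname{Var}(Y) = \frac{80^2}{64} = 100 \qquad [\text{From (1)}]$$
[∵ Var (aX) = a² Var (X) and Var (X ± A) = Var (X)]
$$\therefore\ \operatorname{Cov}(S, T) = \operatorname{Cov}\left[\frac{20 - X}{5},\ \frac{50 + Y}{8}\right] = \frac{1}{5 \times 8} \operatorname{Cov}\,[20 - X,\ 50 + Y] = \frac{1}{40} \operatorname{Cov}(-X, Y) = -\frac{1}{40} \operatorname{Cov}(X, Y) = -\frac{900}{40} = -\frac{45}{2} \qquad [\text{From (1)}]$$
The slope coefficients of regression of S on T and T on S are given respectively by
$$\text{(i)}\ \ b_{ST} = \frac{\operatorname{Cov}(S, T)}{\operatorname{Var}(T)} = \frac{-45/2}{100} = -\frac{9}{40} \qquad \text{and} \qquad \text{(ii)}\ \ b_{TS} = \frac{\operatorname{Cov}(S, T)}{\operatorname{Var}(S)} = \frac{-45/2}{9} = -\frac{5}{2}$$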
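Part (b) can be verified with exact arithmetic. The following is a minimal sketch, assuming S and T are written as S = c_s·X + constant and T = c_t·Y + constant, so that only the scale factors c_s = –1/5 and c_t = 1/8 matter; the function name `transformed_slopes` is hypothetical.

```python
from fractions import Fraction

def transformed_slopes(cov_xy, var_x, var_y, c_s, c_t):
    """Slopes for S = c_s*X + const and T = c_t*Y + const.

    Additive constants drop out of variances and covariances,
    so only the scale factors enter. Returns (b_ST, b_TS).
    """
    cov_st = c_s * c_t * cov_xy
    var_s = c_s ** 2 * var_x
    var_t = c_t ** 2 * var_y
    return cov_st / var_t, cov_st / var_s

b_st, b_ts = transformed_slopes(
    Fraction(900),       # Cov(X, Y)
    Fraction(15 ** 2),   # Var(X)
    Fraction(80 ** 2),   # Var(Y)
    Fraction(-1, 5),     # scale factor of X in S = (20 - X)/5
    Fraction(1, 8),      # scale factor of Y in T = (50 + Y)/8
)
print(b_st, b_ts)
```

Using `Fraction` keeps the results as exact ratios, which makes them directly comparable with the hand-derived values.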
Example 9·9. By using the following data, find out the two lines of regression and from them compute
the Karl Pearson’s coefficient of correlation.
∑X = 250 ; ∑Y = 300 ; ∑XY = 7,900 ; ∑X2 = 6,500 ; ∑Y 2 = 10,000 ; and N = 10.
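Solution. We have :
$$\bar{X} = \frac{\sum X}{N} = \frac{250}{10} = 25 \ ; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{300}{10} = 30$$
bYX = Coefficient of regression of Y on X :
$$b_{YX} = \frac{N \sum XY - (\sum X)(\sum Y)}{N \sum X^2 - (\sum X)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 6500 - (250)^2} = \frac{79000 - 75000}{65000 - 62500} = \frac{4000}{2500} = 1·6$$
bXY = Coefficient of regression of X on Y :
$$b_{XY} = \frac{N \sum XY - (\sum X)(\sum Y)}{N \sum Y^2 - (\sum Y)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 10000 - (300)^2} = \frac{79000 - 75000}{100000 - 90000} = \frac{4000}{10000} = 0·4$$
Hence the correlation coefficient rXY between X and Y is given by :
$$r_{XY}^2 = b_{YX} \cdot b_{XY} = 1·6 \times 0·4 = 0·64 \;\Rightarrow\; r_{XY} = \pm\sqrt{0·64} = \pm\,0·8$$
Since the regression coefficients are positive, we take r = + 0·8.

Regression Equations
Regression equation of Y on X :
$$\begin{aligned} Y - \bar{Y} &= b_{YX}\,(X - \bar{X}) \\ \Rightarrow\ Y - 30 &= 1·6\,(X - 25) \\ \Rightarrow\ Y &= 1·6X - 40 + 30 \\ \Rightarrow\ Y &= 1·6X - 10 \end{aligned}$$
Regression equation of X on Y :
$$\begin{aligned} X - \bar{X} &= b_{XY}\,(Y - \bar{Y}) \\ \Rightarrow\ X - 25 &= 0·4\,(Y - 30) \\ \Rightarrow\ X &= 0·4Y - 12 + 25 \\ \Rightarrow\ X &= 0·4Y + 13 \end{aligned}$$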
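When only the totals ∑X, ∑Y, ∑XY, ∑X² and ∑Y² are available, both lines can be obtained without the individual observations. The following is a minimal sketch using the totals of this example; the function name `lines_from_sums` is hypothetical.

```python
from fractions import Fraction

def lines_from_sums(n, sx, sy, sxy, sxx, syy):
    """Both regression lines from raw totals.

    Returns ((a, b_yx), (c, b_xy)) where Y = a + b_yx*X and X = c + b_xy*Y.
    """
    num = n * sxy - sx * sy
    b_yx = Fraction(num, n * sxx - sx ** 2)
    b_xy = Fraction(num, n * syy - sy ** 2)
    mx, my = Fraction(sx, n), Fraction(sy, n)
    # Each line passes through the point of means (mx, my).
    return (my - b_yx * mx, b_yx), (mx - b_xy * my, b_xy)

y_on_x, x_on_y = lines_from_sums(10, 250, 300, 7900, 6500, 10000)
print(y_on_x)   # (intercept, slope) of the line of Y on X
print(x_on_y)   # (intercept, slope) of the line of X on Y
```

The product of the two slopes returned here is r², so the correlation coefficient can be read off from the same totals.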
Example 9·10. In the estimation of regression equations of two variables X and Y the following results
were obtained :
∑X = 900, ∑Y = 700, n = 10 ; ∑ x2 = 6360, ∑ y2 = 2860, ∑ xy = 3900,
where x and y are deviations from respective means. Obtain the two regression equations.
[Delhi Univ. B.Com (Hons.), 2008]
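Solution. The coefficients of regression of Y on X and of X on Y are given respectively by :
$$b_{YX} = \frac{\operatorname{Cov}(X, Y)}{\sigma_x^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum xy}{\sum x^2} = \frac{3900}{6360} = 0·6132$$
$$b_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_y^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{\sum xy}{\sum y^2} = \frac{3900}{2860} = 1·3636$$
$$\bar{X} = \frac{\sum X}{n} = \frac{900}{10} = 90, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{700}{10} = 70$$
Regression Equations
Regression equation of Y on X :
$$\begin{aligned} Y - \bar{Y} &= b_{YX}\,(X - \bar{X}) \\ \Rightarrow\ Y - 70 &= 0·6132\,(X - 90) \\ \Rightarrow\ Y &= 0·6132X - 55·188 + 70 \\ \Rightarrow\ Y &= 0·6132X + 14·812 \end{aligned}$$
Regression equation of X on Y :
$$\begin{aligned} X - \bar{X} &= b_{XY}\,(Y - \bar{Y}) \\ \Rightarrow\ X - 90 &= 1·3636\,(Y - 70) \\ \Rightarrow\ X &= 1·3636Y - 95·452 + 90 \\ \Rightarrow\ X &= 1·3636Y - 5·452 \end{aligned}$$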
Example 9·11. For a set of 10 pairs of values of x and y, the regression line of x on y is x – 2y + 12 =
0; mean and standard deviation of y being 8 and 2 respectively. Later it is known that a pair (x = 3, y = 8)
was wrongly recorded and the correct pair detected is (x = 8, y = 3). Find the correct regression line of x
on y. [I.C.W.A. (Intermediate), June 1998]

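Solution. In the usual notations we are given : n = 10, $\bar{y} = 8$, $\sigma_y = 2$ …(*)
The equation of the line of regression of x on y is : x – 2y + 12 = 0 (Given). Since the lines of regression pass through the point $(\bar{x}, \bar{y})$, we get
$$\bar{x} - 2\bar{y} + 12 = 0 \;\Rightarrow\; \bar{x} = 2\bar{y} - 12 = 2 \times 8 - 12 = 4 \qquad [\text{Using (*)}]$$
Also x – 2y + 12 = 0 ⇒ x = 2y – 12 ⇒ bxy = 2
$$\therefore\ \frac{\operatorname{Cov}(x, y)}{\sigma_y^2} = 2 \;\Rightarrow\; \operatorname{Cov}(x, y) = 2 \times 2^2 = 8 \qquad [\text{From (*)}]$$
$$\Rightarrow\ \frac{\sum xy}{n} - \bar{x}\,\bar{y} = 8 \;\Rightarrow\; \sum xy = 10\,(8 + 4 \times 8) = 10 \times 40 = 400$$
$$\sigma_y = 2 \;\Rightarrow\; \sigma_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = 4 \;\Rightarrow\; \sum y^2 = 10\,(4 + 8^2) = 680$$
∴ We have $\bar{x} = 4$, $\bar{y} = 8$, $\sum y^2 = 680$, $\sum xy = 400$.
Wrong pair = (x = 3, y = 8) ; Correct pair = (x = 8, y = 3)
Corrected Values. [Suffix c stands for corrected values]
$$\bar{x}_c = \frac{n\bar{x} - 3 + 8}{n} = \frac{10 \times 4 + 5}{10} = \frac{9}{2} \ ; \qquad \bar{y}_c = \frac{n\bar{y} - 8 + 3}{n} = \frac{10 \times 8 - 5}{10} = \frac{15}{2}$$
$$(\textstyle\sum y^2)_c = \sum y^2 - 8^2 + 3^2 = 680 - 64 + 9 = 625 \ ; \qquad (\sum xy)_c = \sum xy - 3 \times 8 + 8 \times 3 = 400 - 24 + 24 = 400$$
$$(\sigma_y^2)_c = \frac{(\sum y^2)_c}{n} - (\bar{y}_c)^2 = \frac{625}{10} - \frac{225}{4} = \frac{1250 - 1125}{20} = \frac{25}{4}$$
$$[\operatorname{Cov}(x, y)]_c = \frac{(\sum xy)_c}{n} - \bar{x}_c\,\bar{y}_c = \frac{400}{10} - \frac{9}{2} \times \frac{15}{2} = 40 - \frac{135}{4} = \frac{25}{4}$$
$$\therefore\ (b_{xy})_c = \frac{[\operatorname{Cov}(x, y)]_c}{(\sigma_y^2)_c} = \frac{25/4}{25/4} = 1$$
The corrected line of regression of x on y becomes :
$$x - \bar{x}_c = (b_{xy})_c\,(y - \bar{y}_c) \;\Rightarrow\; x - \frac{9}{2} = 1 \cdot \left(y - \frac{15}{2}\right) \;\Rightarrow\; x = y - 3$$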
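The correction procedure amounts to adjusting the totals for the wrong pair and recomputing. The following is a minimal sketch, starting from the totals implied by the given line (∑x = 10 × 4 = 40, ∑y = 10 × 8 = 80, ∑xy = 400, ∑y² = 680); the function name `corrected_x_on_y` is hypothetical.

```python
from fractions import Fraction

def corrected_x_on_y(n, sum_x, sum_y, sum_xy, sum_yy, wrong, right):
    """Line of regression of x on y after replacing one wrongly recorded pair.

    Returns (intercept, slope) of x = intercept + slope * y.
    """
    (xw, yw), (xr, yr) = wrong, right
    # Remove the wrong pair's contribution and add the correct pair's.
    sum_x += xr - xw
    sum_y += yr - yw
    sum_xy += xr * yr - xw * yw
    sum_yy += yr ** 2 - yw ** 2
    mx, my = Fraction(sum_x, n), Fraction(sum_y, n)
    cov = Fraction(sum_xy, n) - mx * my
    var_y = Fraction(sum_yy, n) - my ** 2
    b_xy = cov / var_y
    return mx - b_xy * my, b_xy

intercept, slope = corrected_x_on_y(10, 40, 80, 400, 680, wrong=(3, 8), right=(8, 3))
print(intercept, slope)
```

Only the totals that enter the line of x on y are tracked here; correcting the line of y on x would also need ∑x², which the example does not supply.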

EXERCISE 9·1
1. (a) Explain the concept of regression and point out its usefulness in dealing with business problems.
(b) What is a scatter diagram ? Indicate by means of suitable scatter diagrams different types of correlation that
may exist between the variables in bivariate data. What are regression lines ? Write down the main points of distinction
between correlation analysis and regression analysis.
2. Distinguish between correlation and regression analysis and indicate the utility of regression analysis in
economic activities. [C.A. (Foundation), Nov. 1996]
3. (a) What is regression analysis ? How does it differ from correlation ? Why there are, in general, two regression
equations ?
(b) Comment on the following :
“Regression equations are irreversible”. [Delhi Univ. B.Com. (Hons.), 2002]
4. Given a scatter diagram of bivariate data involving variables X and Y. Find the conditions of minimisation of
∑(Yi – Ye)2 and hence derive normal equations for the linear regression of Y upon X. What sum is to be minimised when
X is regressed upon Y and what are the normal equations in this case ?
5. Derive the normal equations for the regression of Y on X for a data comprising of n pairs of values of X and Y.
Show that the mean of the error terms is zero. [Delhi Univ. B.A. (Econ. Hons.), 2005]
Hint. Y = a + bX …… (i) (Regression equation of Y on X)
Normal equations are :
∑Y = na + b∑X …(ii) and ∑XY = a∑X + b∑X2 …(iii)
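Mean of error terms is given by :
$$\bar{e} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i) = \frac{1}{n} \sum_{i=1}^{n} (Y_i - a - bX_i) \qquad [\text{From (i)}]$$
$$\phantom{\bar{e}} = \frac{1}{n} \left[\sum Y_i - na - b \sum X_i\right] = 0. \qquad [\text{From (ii)}]$$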
6. What is linear regression ? Why are there, in general, two regression lines ? When do they coincide ? Explain
the use of regression equations in economic enquiry.
7. (a) It is said that regression equations are irreversible meaning thereby that you cannot find out the regression
equation of x on y from that of y on x. Justify the comment with special reference to the principle of least squares.
(b) Explain the term ‘Regression’. Why do we take, in general, two regression lines ? When are the regression
lines (i) perpendicular to each other and (ii) coincident ?
8. What are regression lines ? Why is it necessary to consider two lines of regression ? In case the two lines are
identical, prove that the correlation coefficient is +1 or –1. If the two variables are independent, show that the two
regression lines are perpendicular.
9. What is the angle between the two lines of regression ? Discuss the nature of the lines for the following
particular cases :
(i) r = ± 1. (ii) r = 0.
10. What is the difference between correlation and regression coefficients ? Can correlation coefficient be
computed out of regression coefficients ? If yes, how ?
11. (a) Define regression coefficients. What information do they supply ?
(b) Let byx and bxy stand for the coefficients of regression of Y on X and X on Y respectively. Show that :
$$r_{xy} = \sqrt{b_{xy} \times b_{yx}}$$
[Delhi Univ. B.A. (Econ. Hons.), 1997]
12. Given the following values of x and y :
x : 3 5 6 8 9 11
y : 2 3 4 6 5 8
find the equation of regression of
(i) y on x and (ii) x on y.
Interpret the results.
Ans. y = 0·7143x – 0·3334 ; x = 1·2857y + 1·0001.
13. Obtain the equations of the two lines of regression for the data given below :
X : 1 2 3 4 5 6 7 8 9
Y : 9 8 10 12 11 13 14 16 15
Ans. Y = 0·95X + 7·25 ; X = 0·95Y – 6·4.
14. From the following data of the age of husband and the age of wife, form two regression lines and calculate the
husband’s age when the wife’s age is 16.
Husband’s age : 36 23 27 28 28 29 30 31 33 35
Wife’s age : 29 18 20 22 27 21 29 27 29 28
Ans. Husband’s age : x ; Wife’s age : y
y = 0·95x – 3·5 ; x = 0·8y + 10 ; when y = 16, x = 22·8.
15. Find the regression equation of y on x where y and x are the marks obtained by 10 students as given below :
y : 20 60 55 45 75 35 25 90 10 50
x : 20 45 65 40 55 35 15 80 25 50
[C.A. (Foundation), May 2002]
Ans. byx = 1·105 ; y = 1·105x – 1·015.
16. The following data give the experience of machine operators and their performance ratings as given by the
number of good parts turned out per 100 pieces :
Operator : 1 2 3 4 5 6 7 8
Experience (in years) (X) : 16 12 18 4 3 10 5 12
Performance Ratings (Y) : 87 88 89 68 78 80 75 83
Calculate the regression line of performance ratings on experience and estimate the probable performance if an
operator has 7 years experience. [Himachal Pradesh Univ. B.Com., 1996]
Ans. Y = 69·67 + 1·133 X ; 77·601.
17. You are given the data relating to purchases and sales. Obtain the two regression equations by the method of
least squares and estimate the likely sales when the purchases equal 100.
Purchases : 62 72 98 76 81 56 76 92 88 49
Sales : 112 124 131 117 132 96 120 136 97 85
Ans. Purchases : x ; Sales : y ; x = 0·6515y + 0·0775 ;
y = 0·7825x + 56·3125 ; 134·5625.
18. The height of fathers and sons is given in the following table. Find the two lines of regression and estimate the
expected average height of the son when the height of the father is 67·5 inches.
Height of father (in inches) : 65 66 67 67 68 69 71 73
Height of son (in inches) : 67 68 64 68 72 70 69 70
Ans. y = 0·4242x + 39·5484 ; x = 0·525y + 32·2875; 68·18 inches.
19. The following table gives the ages and blood pressure of 10 women.
Age (X) : 56 42 36 47 49 42 60 72 63 55
Blood Pressure (Y) : 147 125 118 128 145 140 155 160 149 150
(i) Find the correlation coefficient between X and Y.
(ii) Determine the least square regression equation of Y on X.
(iii) Estimate the blood pressure of a woman whose age is 45 years.
Ans. (i) r = 0·89, (ii) Y = 83·758 + 1·11X, (iii) When X = 45, Y = 134.
20. A panel of two judges P and Q graded seven dramatic performances by independently awarding marks as
follows :
Performance : 1 2 3 4 5 6 7
Marks by P : 46 42 44 40 43 41 45
Marks by Q : 40 38 36 35 39 37 41
The eighth performance, which Judge Q could not attend, was awarded 37 marks by Judge P. If Judge Q had also
been present, how many marks would be expected to have been awarded by him to the eighth performance ?
Ans. 33·5 ≈ 34.
21. The following table gives the normal weight of a baby during the first six months of life :
Age in months : 0 2 3 5 6
Weight in lbs. : 5 7 8 10 12
Estimate the weight of a baby at the age of 4 months.
Ans. 9·2982 lbs.
22. You are given the following data :
x y
Arithmetic Mean 36 85
Standard Deviation 11 8
Correlation coefficient between x and y = 0·66
(i) Find two regression equations. (ii) Estimate value of x when y = 75.
Ans. (i) y = 0·48x + 67·72 ; x = 0·9075y – 41·1375, (ii) 26·925.
23. Given the information : Sum of X = 5 ; Sum of Y = 4
Sum of squares of deviations from the mean of X = 40 ; Sum of squares of deviations from the mean of Y = 50
Sum of the products of deviations from the means of X and Y = 32; Number of pairs of observations = 10
Calculate :
(i) regression coefficient of Y on X ; (ii) regression coefficient of X on Y ;
(iii) Karl Pearson’s coefficient of correlation. [Delhi Univ. B.A. (Econ. Hons.), 1999]
Ans. bYX = 0·80 ; bXY = 0·64 ; r (X, Y) = 0·7156.
24. For some bi-variate data, the following results were obtained :
Mean value of variable X = 53·2 and of Y = 39·5.
Regression Coefficient of Y on X = – 1·5 and of X on Y = – 0·38.
What should be the most likely value of X when Y = 50?
Also find the coefficient of correlation between two variables. [Delhi Univ. B.Com. (Hons.), 2005]

Ans. X = 53·2 + (– 0·38) (50 – 39·5) = 49·21 ; r = – √[(– 1·5) (– 0·38)] = – √0·57 = – 0·7549.
25. For a particular product, the sales (y) and the advertisement expenditure (x) for 10 years, provide the results
∑ x = 15, ∑ y = 110, ∑ xy = 400, ∑ x² = 250, ∑ y² = 3200.
Find the regression line of y on x and the estimated value of y for x = 10. [I.C.W.A (Intermediate), Dec. 2001]
Ans. y = 1·033x + 9·4505 ; ŷ = 19·781 when x = 10.
26. Calculate the correlation coefficient from the following results :
N = 10, ∑X = 350, ∑Y = 310, ∑(X – 35)² = 162, ∑(Y – 31)² = 222, ∑(X – 35) (Y – 31) = 92.
Also find the regression line of Y on X. [Delhi Univ. B.A. (Econ. Hons.), 2007]
Hint. X̄ = 35, Ȳ = 31 ⇒ ∑(X – 35)² = ∑(X – X̄)² = 162 and so on.

Ans. r(X, Y) = 0·485 ; Y = 0·568X + 11·12.


27. For bivariate data, you are given the following :
∑(X – 58) = 46 ; ∑(Y – 58) = 9 ; ∑(X – 58)² = 3086 ; ∑(Y – 58)² = 483 ; ∑(X – 58) (Y – 58) = 1095.
Number of pairs of observations is 7. You are required to determine the two regression equations and the
coefficient of correlation between X and Y. [Delhi Univ. B.Com. (Hons.), 2000]
Hint. Let U = X – 58, V = Y – 58. Then we are given ∑U, ∑V, ∑U², ∑V² and ∑UV.
X̄ = 58 + Ū ; Ȳ = 58 + V̄ ; bYX = bVU and bXY = bUV.
Ans. Regression Equations
Y on X : Y = 0·372 X + 35·266 ; X on Y : X = 2·197Y – 65·680 ; r(X, Y) = 0·904.
28. If the two regression lines corresponding to two variables X and Y meet at a point (2, 3), V(X) = 4, V(Y) = 1 and
correlation coefficient between X and Y is 1/2, the estimated value of Y for X = 6 is :
(i) 2, (ii) 4, (iii) 7, (iv) None of these.
[I.C.W.A. (Intermediate), Dec. 1999]
Hint. Lines of regression intersect at the point (x̄, ȳ) = (2, 3).
Ans. (ii).
29. Let the two variables X and Y have the covariance and correlation coefficient between them as 2 and 0·5
respectively and V(X) = 2V(Y), then the regression coefficient of X on Y is
(i) 1, (ii) 1/2, (iii) 1/4, (iv) None of these.
[I.C.W.A. (Intermediate), June 2001]
Ans. (iv) bxy = 1/√2.
30. For a bivariate data the mean value of X is 20 and the mean value of Y is 45. The regression coefficient of Y on
X is 4 and that of X on Y is 1/9. Find
(i) The coefficient of correlation.
(ii) The standard deviation of X if the standard deviation of Y is 12.
(iii) Also write down the equations of regression lines.
Ans. (i) 0·67, (ii) σx = 2, (iii) Regression equations of y on x and x on y are respectively : y = 4x – 35, 9x = y + 135.
31. From the following results, obtain the two regression equations and estimate the yield of crops when the
rainfall is 22 cms. and the rainfall when the yield is 600 kgs.
Yield in kgs. Rainfall in cms.
(X) (Y)
Mean 508·4 26·7
S.D. 36·8 4·6
Coefficient of correlation between yield and rainfall is 0·52. [C.A. (Foundation), Nov. 2001]
Ans. X = 4·16Y + 397·328 ; Y = 0·065X – 6·346 ; 488·85 kgs. ; 32·654 cms.
32. The following table shows the mean and standard deviation of the prices of two shares in a stock exchange.
Share Mean (in Rs.) Standard deviation (in Rs.)
A Ltd. 39·5 10·8
B Ltd. 47·5 16·0
If the coefficient of correlation between the prices of two shares is 0·42, find the most likely price of share A
corresponding to a price of Rs. 55 observed in the case of share B. [Delhi Univ. (FMS), M.B.A. Oct. 2002]
Ans. X = 0·27Y + 26·675 ; Rs. 41·52.
33. Given the following information :
X Y
Mean : 6 8
Standard Deviation : 5 13
Coefficient of Determination = 0·64
Find : (i) bYX and bXY and (ii) Value of Y when X = 100. [Delhi Univ. B.A. (Econ. Hon.) 2009]
Ans. (i) r² = 0·64 ⇒ r = ± 0·8 ;
r = 0·8 ⇒ bYX = r (σY / σX) = 2·08 ; bXY = r (σX / σY) = 0·31
r = – 0·8 ⇒ bYX = – 2·08 ; bXY = – 0·31.
(ii) Ŷ (at X = 100) = Ȳ + bYX (100 – X̄) = 8 + 2·08 (100 – 6) = 203·52 ; [Assume : bYX > 0].
34. A survey was conducted to study the relationship between expenditure on accommodation (X) and expenditure
on food and entertainment (Y) and the following results were obtained :
Mean S.D.
Expenditure on accommodation Rs. 173 63·15
Expenditure on food and entertainment Rs. 47·8 22·98
Coefficient of correlation = + 0·57
Write down the equation of regression of Y on X and estimate the expenditure on food and entertainment, if the
expenditure on accommodation is Rs. 200. [Bangalore Univ. B.Com., 1998]
Ans. Y = 0·207X + 11·99 ; Y = Rs. 53·39 when X = 200.
35. Find out the regression coefficients of Y on X , and X on Y on the basis of the following data :
∑X = 50, X̄ = 5, ∑Y = 60, Ȳ = 6, ∑XY = 350, Variance of X = 4, Variance of Y = 9.
Ans. byx = 1·25, bxy = 0·56.
36. In order to find the correlation coefficient between two variables X and Y from 12 pairs of observations, the
following calculations were made :
∑X = 30 ; ∑X² = 670 ; ∑Y = 5 ; ∑Y² = 285 ; ∑XY = 344
On subsequent verification it was discovered that the pair (X = 11, Y = 4) was copied wrongly, the correct values
being (X = 10, Y = 14). After making necessary correction, find :
(a) the two regression coefficients ; (b) the two regression equations ; (c) the correlation coefficient.
[Delhi Univ. B.Com. (Hons.), 1990]
Ans. (a) byx = 0·694 ; bxy = 0·898 (b) : Y on X : y = 0·694x – 0·427 ; X on Y : x = 0·898y + 1·294
(c) r (x, y) = 0·7894 ≈ 0·79.
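Remark. Exercises such as 25 above start from raw sums rather than from the observations themselves. The following is a minimal Python sketch, not part of the original text, of how such an answer may be checked. The function name `regression_from_sums` and its parameter names are assumptions made for illustration; the arithmetic is the usual least-squares formula for the line of y on x.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Line of regression of y on x, y = a + b*x, from raw sums.

    Hypothetical helper for illustration; it is not taken from the text.
    """
    mean_x = sum_x / n
    mean_y = sum_y / n
    # b_yx = (sum xy - n * mean_x * mean_y) / (sum x^2 - n * mean_x^2)
    b_yx = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    # The line passes through (mean_x, mean_y), which fixes the intercept.
    a = mean_y - b_yx * mean_x
    return a, b_yx


# Data of Exercise 25: n = 10, sum x = 15, sum y = 110, sum xy = 400, sum x^2 = 250.
a, b_yx = regression_from_sums(n=10, sum_x=15, sum_y=110, sum_xy=400, sum_x2=250)
y_hat = a + b_yx * 10  # estimated y for x = 10
print(round(b_yx, 3), round(a, 4), round(y_hat, 3))
```

The estimate obtained this way agrees with the printed answer up to rounding, since the printed value uses the coefficient rounded to three decimal places.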

9·5. TO FIND THE MEAN VALUES (x̄, ȳ) FROM THE TWO LINES OF
REGRESSION
Let us suppose that the two lines of regression are :
$a_1 x + b_1 y + c_1 = 0$ …(9·41)
and $a_2 x + b_2 y + c_2 = 0$ …(9·42)

We have already discussed that both the lines of regression pass through the point (x̄, ȳ). In other
words, (x̄, ȳ) is the point of intersection of the two lines of regression. Hence, solving (9·41) and (9·42)
simultaneously, we get
$$\frac{\bar{x}}{b_1 c_2 - b_2 c_1} = \frac{\bar{y}}{c_1 a_2 - c_2 a_1} = \frac{1}{a_1 b_2 - a_2 b_1} \;\Rightarrow\; \bar{x} = \frac{b_1 c_2 - b_2 c_1}{a_1 b_2 - a_2 b_1}, \quad \bar{y} = \frac{c_1 a_2 - c_2 a_1}{a_1 b_2 - a_2 b_1}$$ …(9·43)
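Illustration. The result (9·43) may be checked numerically. The sketch below is an assumption-laden illustration rather than part of the original text: the function name `means_from_lines` is hypothetical, and the two lines are taken from the answer to Exercise 30, written as 4x – y – 35 = 0 and 9x – y – 135 = 0, for which the means are given as 20 and 45.

```python
def means_from_lines(a1, b1, c1, a2, b2, c2):
    """Point of intersection of a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0.

    Hypothetical helper for illustration; it applies (9.43) directly.
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        # Parallel or coincident lines have no unique point of intersection.
        raise ValueError("the two lines do not intersect in a single point")
    x_bar = (b1 * c2 - b2 * c1) / det
    y_bar = (c1 * a2 - c2 * a1) / det
    return x_bar, y_bar


# Lines of Exercise 30: y = 4x - 35 and 9x = y + 135.
x_bar, y_bar = means_from_lines(4, -1, -35, 9, -1, -135)
print(x_bar, y_bar)
```

The zero-determinant check is included because two distinct regression lines always meet at exactly one point; a zero determinant would indicate that the given equations cannot both be regression lines of the same data unless they coincide.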

9·6. TO FIND THE REGRESSION COEFFICIENTS AND THE CORRELATION
COEFFICIENT FROM THE TWO LINES OF REGRESSION

Let (9·41) and (9·42) be the given lines of regression and let us suppose that (9·41) is the line of
regression of y on x and (9·42) is the line of regression of x on y. To obtain byx, the coefficient of regression
of y on x, write the regression equation of y on x in the form y = a + bx. Then b, the coefficient of x gives
the value of byx. Similarly to obtain bxy, write the equation of regression of x on y in the form x = A + By.
Then B, the coefficient of y gives bxy. Therefore, re-writing (9·41), we get the regression equation of y on x :
$$y = -\frac{a_1}{b_1}\,x - \frac{c_1}{b_1} \;\Rightarrow\; b_{yx} = -\frac{a_1}{b_1}$$ …(9·44)
Similarly re-writing (9·42), we get regression equation of x on y as :
$$x = -\frac{b_2}{a_2}\,y - \frac{c_2}{a_2} \;\Rightarrow\; b_{xy} = -\frac{b_2}{a_2}$$ …(9·45)
The correlation coefficient r between x and y can now be obtained by using the formula

$$r^2 = b_{yx} \cdot b_{xy} = \left(-\frac{a_1}{b_1}\right) \times \left(-\frac{b_2}{a_2}\right) = \frac{a_1 b_2}{a_2 b_1} \;\Rightarrow\; r = \pm\sqrt{\frac{a_1 b_2}{a_2 b_1}},$$ …(9·46)

the sign to be taken before the square root is same as that of the regression coefficients. If regression
coefficients are positive, we take positive sign and if they are negative, we take negative sign in (9·46).
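Illustration. The steps (9·44) to (9·46) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the original text: the function name `coefficients_from_lines` is hypothetical, the first line is assumed to be the line of y on x and the second the line of x on y, and the data are the lines from the answer to Exercise 30.

```python
import math


def coefficients_from_lines(a1, b1, a2, b2):
    """Regression coefficients and r from two lines of regression.

    Assumes a1*x + b1*y + c1 = 0 is the line of y on x and
    a2*x + b2*y + c2 = 0 is the line of x on y (the constants are not needed).
    Hypothetical helper for illustration.
    """
    b_yx = -a1 / b1          # (9.44)
    b_xy = -b2 / a2          # (9.45)
    r_squared = b_yx * b_xy  # (9.46)
    if r_squared < 0 or r_squared > 1 + 1e-12:
        # r^2 must lie between 0 and 1; otherwise the assumed roles
        # of the two lines cannot both be correct.
        raise ValueError("r^2 lies outside [0, 1] for this assignment of lines")
    # r carries the common sign of the two regression coefficients.
    r = math.copysign(math.sqrt(r_squared), b_yx)
    return b_yx, b_xy, r


# Lines of Exercise 30: 4x - y - 35 = 0 (y on x) and 9x - y - 135 = 0 (x on y).
b_yx, b_xy, r = coefficients_from_lines(4, -1, 9, -1)
print(b_yx, round(b_xy, 4), round(r, 2))
```

The check on r² reflects the fact that the product of the two regression coefficients can never exceed unity, so a value outside the interval from 0 to 1 signals that the lines have been assigned the wrong way round.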
Remark. Given the two lines of regression (9·41) and (9·42) how to determine which is the line of
regression of y on x and which is the line of regression of x on y ? Incidentally, the above discussion
enables us to answer this question. By supposing (9·41) and (9·42) to be equations of the lines of regression
