
Module-I Regression (3)

Uploaded by

Vrushtii gala
Copyright © All Rights Reserved

Regression Analysis

Correlation: So far we have considered only univariate distributions. We know how to find the averages and dispersion of a distribution; these measures describe the structure of a single distribution.
Sometimes it is necessary to know the relationship between two variables, for example family income and expenditure, the price of a product and its demand, or advertisement expenditure and sales volume. If two quantities vary in such a way that movements in one are accompanied by movements in the other, then these quantities are said to be correlated.
Correlation analysis is a statistical technique for studying the degree and direction of the relationship between two or more variables. A correlation coefficient is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another.
When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there is often a tendency to conclude that the change in one causes the change in the other. Correlation alone does not establish causation: both variables may, for example, be driven by a third factor.
Types of Correlation:
Correlation is described or classified in several different ways. Three of the most important are:
I. Positive and Negative
II. Simple, Partial and Multiple
III. Linear and nonlinear

I. Positive, Negative and Zero Correlation: Whether correlation is positive (direct) or negative (inverse) depends upon the direction of change of the variables.

Positive Correlation: If both variables vary in the same direction, the correlation is said to be positive. That is, if one variable increases, the other on average also increases, or if one variable decreases, the other on average also decreases. For example, the correlation between the heights and weights of a group of persons is a positive correlation.

Height (cm): X 158 160 163 166 168 171 174 176

Weight (kg) : Y 60 62 64 65 67 69 71 72

Negative Correlation: If the variables vary in opposite directions, the correlation is said to be negative. That is, if one variable increases the other decreases, or if one variable decreases the other increases. For example, the correlation between the price of a product and its demand is a negative correlation.

Price of Product (Rs. Per Unit) : X 6 5 4 3 2 1

Demand (In Units) : Y 75 120 175 250 215 400

Zero Correlation: Strictly speaking, this is not a type of correlation but the absence of one; it is nevertheless called zero or no correlation. When we find no relationship between the variables, the correlation is said to be zero. It means that a change in the value of one variable is not accompanied by any systematic change in the value of the other. For example, the correlation between the weight of a person and intelligence is zero or no correlation.

State in each case whether there is


(a) Positive Correlation
(b) Negative Correlation
(c) No Correlation

Sl No Particulars Solution

1 Price of commodity and its demand Negative

2 Yield of crop and amount of rainfall Positive

3 No of fruits eaten and hunger of a person Negative

4 No of units produced and fixed cost per unit Negative

5 No of girls in the class and marks of boys No Correlation

6 Ages of husbands and wives Positive

7 Temperature and sale of woolen garments Negative

8 Number of cows and milk produced Positive

9 Weight of person and intelligence No Correlation

10 Advertisement expenditure and sales volume Positive


Scatter Diagram: This is a graphic method of measuring correlation. It is a diagrammatic representation of bivariate data used to ascertain the relationship between two variables. Under this method each pair of X and Y values is plotted on graph paper as a dot, giving as many points as there are observations. Usually the independent variable is shown on the X-axis and the dependent variable on the Y-axis. Once the values are plotted, the pattern of the points reveals the type of correlation between X and Y, that is, whether movements in one series are associated with movements in the other.

Perfect Positive Correlation: The points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner.
Perfect Negative Correlation: The points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner.
High Degree of Positive Correlation: The plotted points fall in a narrow band that rises from the lower left-hand corner to the upper right-hand corner.
High Degree of Negative Correlation: The plotted points fall in a narrow band that declines from the upper left-hand corner to the lower right-hand corner.
Low Degree of Positive Correlation: The points are widely scattered over the diagram but still rise from the lower left-hand corner to the upper right-hand corner.
Low Degree of Negative Correlation: The points are widely scattered over the diagram but still decline from the upper left-hand corner to the lower right-hand corner.
Zero (No) Correlation: When the plotted points are scattered haphazardly over the graph, there is no (zero) correlation between the two variables.
Coefficient of Correlation: Karl Pearson's method of calculating the coefficient of correlation is based on the covariance of the two variables. This method is widely used in practice, and the coefficient of correlation is denoted by the symbol "r". If the two variables under study are X and Y, the following formulas suggested by Karl Pearson measure the degree of their (linear) relationship:

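$$\operatorname{Cov}(x,y) = \frac{\Sigma (x_i - \bar{x})(y_i - \bar{y})}{n}$$
$$r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} \qquad \text{(i)}$$
Expanding the covariance and the standard deviations, equation (i) can be written in either of the following forms:
$$r = \frac{\Sigma xy - \dfrac{\Sigma x\,\Sigma y}{n}}{\sqrt{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}}\;\sqrt{\Sigma y^2 - \dfrac{(\Sigma y)^2}{n}}} \qquad \text{(ii)}$$
$$r = \frac{\dfrac{\Sigma (x - \bar{x})(y - \bar{y})}{n}}{\sqrt{\dfrac{\Sigma (x - \bar{x})^2}{n}}\;\sqrt{\dfrac{\Sigma (y - \bar{y})^2}{n}}} \qquad \text{(iii)}$$

The formulas above are equivalent; use whichever suits the information given in the problem.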
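As a cross-check, the short Python sketch below (illustrative only; the function name `pearson_r` is hypothetical) computes r by the covariance form (i) and by the raw-sum form (ii); form (iii) is simply (i) with the covariance and standard deviations written out. It uses the population (divide-by-n) definitions used throughout these notes, and the data of Example 2 below.

```python
import math


def pearson_r(x, y):
    """Return (r_i, r_ii): Pearson's r by formula (i) and by formula (ii)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    # Formula (i): r = Cov(x, y) / (sigma_x * sigma_y), dividing by n.
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    sigma_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    r_i = cov / (sigma_x * sigma_y)

    # Formula (ii): the same quantity computed from raw sums only.
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    r_ii = (sxy - sx * sy / n) / (
        math.sqrt(sxx - sx ** 2 / n) * math.sqrt(syy - sy ** 2 / n)
    )
    return r_i, r_ii


# Data of Example 2 below.
x = [7, 5, 4, 11, 10, 12, 14, 9]
y = [14, 8, 8, 19, 16, 19, 20, 16]
print(pearson_r(x, y))  # both values are about 0.9635
```

Both forms agree up to floating-point rounding; form (ii) is convenient when only the totals are known, as in Example 4.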

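The coefficient of determination is
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
It gives the proportion of the variation in Y that is explained by X. The usual interpretation of r and r² is:

r | r² | Comment
r = 1 | r² = 1 | Variation in the dependent variable Y can be completely explained by the independent variable X (perfect positive correlation).
0.9 ≤ r < 1 | 0.81 ≤ r² < 1 | At least 81% of the variation in Y can be explained by X. Hence we say there exists high positive correlation.
0.75 ≤ r < 0.9 | 0.56 ≤ r² < 0.81 | About 56% to 81% of the variation in Y can be explained by X. Hence we say there exists definite positive correlation.
0.5 ≤ r < 0.75 | 0.25 ≤ r² < 0.56 | Unreliable positive correlation.
0 < r < 0.5 | 0 < r² < 0.25 | Poor positive correlation.
r = 0 | r² = 0 | No linear correlation.

The same bands apply to negative values of r, with "negative" in place of "positive" (for example, −0.9 < r ≤ −0.75 indicates definite negative correlation).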
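The bands in the table can be written as a small lookup. This Python sketch (the function name `interpret_r` is hypothetical) applies the same thresholds to |r| and attaches the sign of r, which is how Example 1 below reads r = −0.81.

```python
def interpret_r(r):
    """Return the verbal interpretation of r used in these notes."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # the bands are applied to |r|
    if size == 1:
        strength = "perfect"
    elif size >= 0.9:
        strength = "high"
    elif size >= 0.75:
        strength = "definite"
    elif size >= 0.5:
        strength = "unreliable"
    else:
        strength = "poor"
    return f"{strength} {direction} correlation"


for r in (-0.81, 0.9635, -0.4311):
    # r**2 is the share of the variation in Y explained by X.
    print(r, round(r ** 2, 4), interpret_r(r))
```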
Example 1: A set of data giving the number of police traffic patrols on duty and the number of fatalities in a region was recorded, and a correlation of r = −0.81 was found. Interpret the value of r.
Solution:
Since r lies between −0.9 and −0.75, there exists definite negative correlation: more patrols on duty are associated with fewer fatalities. Also r² = 0.6561, so about 66% of the variation in fatalities can be explained by the number of patrols.

Example 2: Compute Karl Pearson’s correlation coefficient and comment on your result.

x 7 5 4 11 10 12 14 9

y 14 8 8 19 16 19 20 16

Solution:
Here n = 8, Σx = 72, Σy = 120.
So, x̄ = 9, ȳ = 15.
We use the formula
$$r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y}$$
To calculate the required summations, we prepare the following table.

x   y   x − x̄   (x − x̄)²   y − ȳ   (y − ȳ)²   (x − x̄)(y − ȳ)

7 14 -2 4 -1 1 2

5 8 -4 16 -7 49 28

4 8 -5 25 -7 49 35

11 19 2 4 4 16 8

10 16 1 1 1 1 1

12 19 3 9 4 16 12

14 20 5 25 5 25 25

9 16 0 0 1 1 0

72 120 0 84 0 158 111

Now, Σ(x − x̄)² = 84, Σ(y − ȳ)² = 158, Σ(x − x̄)(y − ȳ) = 111.
$$\sigma_x = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n}} = \sqrt{\frac{84}{8}} = \sqrt{10.5} = 3.2404$$
$$\sigma_y = \sqrt{\frac{\Sigma (y - \bar{y})^2}{n}} = \sqrt{\frac{158}{8}} = \sqrt{19.75} = 4.4441$$
$$r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y} = \frac{111}{8 \times 3.2404 \times 4.4441} = 0.9635$$
Hence there exists high positive correlation.

Example 3: Find the correlation coefficient for the following data.

x 14 8 10 11 9 13 5

y 14 9 11 13 11 12 4

Solution:
We observe that n = 7, Σx = 70, Σy = 74.
We use the formula
$$r = \frac{\Sigma xy - \dfrac{\Sigma x\,\Sigma y}{n}}{\sqrt{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}}\;\sqrt{\Sigma y^2 - \dfrac{(\Sigma y)^2}{n}}}$$
To calculate the required summations, we prepare the following table.

x   y   x²   y²   xy

14 14 196 196 196

8 9 64 81 72

10 11 100 121 110

11 13 121 169 143

9 11 81 121 99

13 12 169 144 156


5 4 25 16 20

70 74 756 848 796

Here Σx² = 756, Σy² = 848, Σxy = 796.
Substituting these values in the formula,
$$r = \frac{796 - \dfrac{70 \times 74}{7}}{\sqrt{756 - \dfrac{(70)^2}{7}}\;\sqrt{848 - \dfrac{(74)^2}{7}}} = \frac{56}{\sqrt{56}\;\sqrt{65.7143}} = 0.9231$$
Hence there exists high positive correlation.
Example 4: A computer, while calculating the correlation coefficient between the variables X and Y, obtained the following results: N = 30, ΣX = 120, ΣX² = 600, ΣY = 90, ΣY² = 250, ΣXY = 335. It was, however, later discovered at the time of checking that it had copied down two pairs of observations as (X, Y): (8, 10), (12, 7), while the correct values were (X, Y): (8, 12), (10, 8). Obtain the correct value of the correlation coefficient between X and Y.

Solution:
Correct ΣX = 120 − 8 − 12 + 8 + 10 = 118
Correct ΣX² = 600 − 8² − 12² + 8² + 10² = 600 − 64 − 144 + 64 + 100 = 556
Correct ΣY = 90 − 10 − 7 + 12 + 8 = 93
Correct ΣY² = 250 − 10² − 7² + 12² + 8² = 250 − 100 − 49 + 144 + 64 = 309
Correct ΣXY = 335 − (8 × 10) − (12 × 7) + (8 × 12) + (10 × 8) = 335 − 80 − 84 + 96 + 80 = 347

Substituting these corrected values in the formula,
$$r = \frac{347 - \dfrac{118 \times 93}{30}}{\sqrt{556 - \dfrac{(118)^2}{30}}\;\sqrt{309 - \dfrac{(93)^2}{30}}} = \frac{-18.8}{\sqrt{91.8667}\;\sqrt{20.7}} = -0.4311$$
Hence there exists poor negative correlation.
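The correction above is mechanical, so it is easy to script. In this Python sketch (the function names are hypothetical), each wrongly copied pair is subtracted from every total and each correct pair is added back, and r is then computed with formula (ii).

```python
import math


def corrected_sums(sx, sy, sxx, syy, sxy, wrong_pairs, right_pairs):
    """Remove the wrongly copied (x, y) pairs from each total and add the correct ones."""
    for pairs, sign in ((wrong_pairs, -1), (right_pairs, +1)):
        for x, y in pairs:
            sx += sign * x
            sy += sign * y
            sxx += sign * x * x
            syy += sign * y * y
            sxy += sign * x * y
    return sx, sy, sxx, syy, sxy


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Formula (ii): Pearson's r from the five raw sums."""
    numerator = sxy - sx * sy / n
    denominator = math.sqrt(sxx - sx ** 2 / n) * math.sqrt(syy - sy ** 2 / n)
    return numerator / denominator


sums = corrected_sums(120, 90, 600, 250, 335,
                      wrong_pairs=[(8, 10), (12, 7)],
                      right_pairs=[(8, 12), (10, 8)])
print(sums)                              # (118, 93, 556, 309, 347)
print(round(r_from_sums(30, *sums), 4))  # -0.4311
```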
Regression analysis
Meaning: Regression is the study of the relationship between associated variables in which one (dependent) variable depends on another (independent) variable. It was developed by Sir Francis Galton, starting in 1877, and is best known from his study of the relationship between the heights of parents and their children.
Regression analysis is a statistical tool to study the nature and extent of functional
relationship between two or more variables and to estimate (or predict) the unknown values of
dependent variables from the known values of independent variables.
The variable that forms the basis for predicting another variable is known as the Independent
Variable and the variable that is predicted is known as the dependent variable. For example, if we
know that two variables price (X) and demand (Y) are closely related we can find out the most
probable value of X for a given value of Y or the most probable value of Y for a given value of
X. Similarly, if we know that the amount of tax and the rise in the price of a commodity are
closely related, we can find out the expected price for a certain amount of tax levy.
Uses of Regression Analysis:
1. It provides estimates of the values of the dependent variable from values of the independent variables.
2. It is used to obtain a measure of the error involved in using the regression line as a basis for estimation.
3. With the help of regression analysis, we can obtain a measure of the degree of association or correlation that exists between the two variables.
4. It is a highly valuable tool in economics and business research, since many problems of economic analysis are based on cause-and-effect relationships.

Sl No | Correlation | Regression
1 | It measures the degree and direction of the relationship between the variables. | It measures the nature and extent of the average relationship between two or more variables in terms of the original units of the data.
2 | It is a relative measure showing the association between the variables. | It is an absolute measure of a relationship.
3 | The correlation coefficient is independent of change of both origin and scale. | The regression coefficient is independent of change of origin but not of scale.
4 | The correlation coefficient is independent of the units of measurement. | The regression coefficient is not independent of the units of measurement.
5 | The coefficient expressing the relationship between the variables ranges from −1 to +1. | The relationship between the variables may be expressed in any of several forms, such as Y = a + bX or Y = a + bX + cX².
6 | It is not a forecasting device. | It is a forecasting device that can be used to predict the value of a dependent variable from a given value of an independent variable.
7 | There may be zero correlation, such as between the weight of the wife and the income of the husband. | There is nothing like zero regression.

Regression Lines and Regression Equation: The terms regression lines and regression equations are used synonymously; regression equations are the algebraic expressions of the regression lines. Consider two variables X and Y. If Y depends on X alone, the analysis is called simple regression. For two variables X and Y we have two regression lines: the regression line of X on Y and the regression line of Y on X. The regression line of Y on X gives the most probable value of Y for a given value of X, and the regression line of X on Y gives the most probable value of X for a given value of Y. However, when there is either perfect positive or perfect negative correlation between the two variables, the two regression lines coincide, i.e. we have only one line. If the variables are independent, r is zero and the lines of regression are at right angles, i.e. parallel to the X-axis and the Y-axis. Therefore, the simple linear regression model gives the following two regression lines:
1. Regression line of Y on X: This line gives the probable value of Y (dependent variable) for any given value of X (independent variable).
$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \quad \text{or} \quad Y = a + bX$$
2. Regression line of X on Y: This line gives the probable value of X (dependent variable) for any given value of Y (independent variable).
$$X - \bar{X} = b_{xy}(Y - \bar{Y}) \quad \text{or} \quad X = a + bY$$

In the above two regression lines or regression equations there are two regression parameters, "a" and "b". Here "a" is an unknown constant, and "b", which is also denoted as b_yx or b_xy, is another unknown constant popularly called the regression coefficient. These two constants (fixed numerical values) determine the position of the line completely; if the value of either or both of them is changed, another line is determined. The parameter "a" determines the level of the fitted line (i.e. the distance of the line directly above or below the origin, its intercept on the Y-axis). The parameter "b" determines the slope of the line (i.e. the change in Y for a unit change in X).
If the values of the constants "a" and "b" are obtained, the line is completely determined. But the question is how to obtain these values. The answer is provided by the method of least squares. With a little algebra and differential calculus, it can be shown that the following two normal equations, if solved simultaneously, yield the values of the parameters "a" and "b" for the line Y = a + bX:
$$\Sigma Y = na + b\,\Sigma X$$
$$\Sigma XY = a\,\Sigma X + b\,\Sigma X^2$$

1st Method - Least Square Method

The estimating line is
$$\hat{Y} = a + bX$$
The slope of the best-fitting regression line is
$$b = \frac{\Sigma XY - n\bar{X}\bar{Y}}{\Sigma X^2 - n\bar{X}^2}$$
where
● b = slope of the best-fitting estimating line
● X = values of the independent variable
● Y = values of the dependent variable
● X̄ = mean of the values of the independent variable
● Ȳ = mean of the values of the dependent variable
● n = number of data points

The Y-intercept of the best-fitting regression line is
$$a = \bar{Y} - b\bar{X}$$
where
● a = Y-intercept
● b = slope of the best-fitting estimating line
● X̄ = mean of the values of the independent variable
● Ȳ = mean of the values of the dependent variable
With these two equations, we can find the best-fitting regression line for any set of paired data points.
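A minimal Python sketch of the least-squares fit, assuming only the slope and intercept formulas above. It also solves the two normal equations ΣY = na + bΣX and ΣXY = aΣX + bΣX² directly by Cramer's rule, to show that both routes give the same line. The function names are hypothetical, and the data are simply the x, y values of Example 2 in the correlation section, used as input.

```python
def fit_line(x, y):
    """Closed form: b = (ΣXY - n·X̄·Ȳ) / (ΣX² - n·X̄²), a = Ȳ - b·X̄."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (sxy - n * x_bar * y_bar) / (sxx - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


def fit_line_normal_equations(x, y):
    """Solve ΣY = n·a + b·ΣX and ΣXY = a·ΣX + b·ΣX² by Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx ** 2
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


x = [7, 5, 4, 11, 10, 12, 14, 9]
y = [14, 8, 8, 19, 16, 19, 20, 16]
print(fit_line(x, y))                   # a is about 3.1071, b about 1.3214
print(fit_line_normal_equations(x, y))  # the same line
```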
Ex- Suppose the Director of the Chapel Hill Sanitation Department is interested in the relationship between the age of a garbage truck and its annual repair expense. Determine this relationship. If the city has a truck that is 4 years old, predict its annual repair expense. Also calculate the standard error of estimate.

Solution: To calculate the values of a and b, make the following table.


Now, to get the estimating equation that describes the relationship between the age of a truck and its annual repair expense, we substitute the values of a and b in the equation for the estimating line:

Using this estimating equation, we can predict the annual repair expense for a truck that is 4 years old.

Steps to calculate Standard Error of Estimate:
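A minimal sketch of the standard error of estimate, assuming the common textbook definition S_e = √(Σ(Y − Ŷ)²/(n − 2)), where Ŷ comes from the least-squares line; the n − 2 divisor reflects the two estimated constants a and b. The function name is hypothetical, and it is illustrated on the x, y data of Example 2 of the 2nd method.

```python
import math


def standard_error_of_estimate(x, y):
    """S_e = sqrt(Σ(Y - Ŷ)² / (n - 2)) about the least-squares line (assumed definition)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    # Residual sum of squares: squared vertical distances from the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))  # n - 2: two constants (a, b) were estimated


x = [11, 7, 9, 5, 8, 6, 10]
y = [16, 14, 12, 11, 15, 14, 17]
print(round(standard_error_of_estimate(x, y), 4))
```

A smaller S_e means the observed points lie closer to the estimating line, so predictions made from it are more reliable.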


Ex- Calculate the relationship between research and development (R & D) expenditure and profit. Predict the annual profit if the firm spends $8 million on R & D in 1996.

Solution:
To calculate the annual profit if the firm spends $8 million on R & D in 1996, substitute X = 8 in the estimating equation.
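2nd Method
Instead of solving the normal equations simultaneously, we can obtain the values of a, b (for the regression of y on x, y = a + bx) and a₁, b₁ (for the regression of x on y, x = a₁ + b₁y) as follows:
$$b = \frac{\Sigma xy - \dfrac{\Sigma x\,\Sigma y}{n}}{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}}, \qquad b_1 = \frac{\Sigma xy - \dfrac{\Sigma x\,\Sigma y}{n}}{\Sigma y^2 - \dfrac{(\Sigma y)^2}{n}}$$
$$a = \bar{y} - b\bar{x}, \qquad a_1 = \bar{x} - b_1\bar{y}$$
Thus, after calculating the summations Σx, Σy, Σx², Σy² and Σxy, the values of the constants b, b₁, a and a₁ can be obtained with the help of the above formulae, and then the regression equations can be formed.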

To illustrate this method consider the following example.

Example 2: Find the two regression equations and also estimate y when x = 13 and x when y =
10.

x 11 7 9 5 8 6 10

y 16 14 12 11 15 14 17

Solution:

x   y   x²   y²   xy

11 16 121 256 176

7 14 49 196 98

9 12 81 144 108

5 11 25 121 55

8 15 64 225 120

6 14 36 196 84

10 17 100 289 170

56 99 476 1427 811

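Values of b and a are calculated as follows.
$$b = \frac{\Sigma xy - \dfrac{\Sigma x\,\Sigma y}{n}}{\Sigma x^2 - \dfrac{(\Sigma x)^2}{n}} = \frac{811 - \dfrac{56 \times 99}{7}}{476 - \dfrac{(56)^2}{7}} = \frac{19}{28} = 0.6786$$
Now a is calculated as a = ȳ − b x̄.
We have
$$\bar{x} = \frac{\Sigma x}{n} = \frac{56}{7} = 8, \qquad \bar{y} = \frac{\Sigma y}{n} = \frac{99}{7} = 14.1429$$
So, a = 14.1429 − (0.6786 × 8) = 8.7141.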
Hence, the regression equation of y on x is

y = 8.7141 + 0.6786x

Now, to estimate y when x = 13, substitute in the above equation to get

y = 8.7141 + (0.6786 × 13)
y = 17.5359 is the estimated value of y when x = 13

Now, for the regression equation of x on y we require b₁ and a₁.
$$b_1 = \frac{\Sigma xy - \dfrac{\Sigma x\,\Sigma y}{n}}{\Sigma y^2 - \dfrac{(\Sigma y)^2}{n}} = \frac{811 - \dfrac{56 \times 99}{7}}{1427 - \dfrac{(99)^2}{7}} = \frac{19}{26.8571}$$
∴ b₁ = 0.7074
And a₁ is given by
a₁ = x̄ − b₁ȳ = 8 − (0.7074 × 14.1429) = −2.0047
Hence, the regression equation of x on y is

x = −2.0047 + 0.7074y

Now, to estimate x when y = 10, substitute in the above equation to get

x = −2.0047 + (0.7074 × 10)
x = 5.0693 is the estimated value of x when y = 10
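The 2nd method translates directly into code. This Python sketch (the function name is hypothetical) returns (a, b) for y on x and (a₁, b₁) for x on y from the five raw sums. Because it does not round intermediate values, its results can differ from the hand calculation in the last digit (for instance a₁ ≈ −2.0053 rather than −2.0047).

```python
def regression_lines(x, y):
    """Return ((a, b), (a1, b1)) for y = a + b*x and x = a1 + b1*y (2nd method)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = sxy - sx * sy / n          # common numerator of b and b1
    b = numerator / (sxx - sx ** 2 / n)    # regression coefficient of y on x
    b1 = numerator / (syy - sy ** 2 / n)   # regression coefficient of x on y
    x_bar, y_bar = sx / n, sy / n
    a = y_bar - b * x_bar
    a1 = x_bar - b1 * y_bar
    return (a, b), (a1, b1)


x = [11, 7, 9, 5, 8, 6, 10]
y = [16, 14, 12, 11, 15, 14, 17]
(a, b), (a1, b1) = regression_lines(x, y)
y_at_13 = a + b * 13    # estimate of y when x = 13
x_at_10 = a1 + b1 * 10  # estimate of x when y = 10
print(round(b, 4), round(a, 4), round(b1, 4), round(a1, 4))
print(round(y_at_13, 4), round(x_at_10, 4))
```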

3rd Method- Alternative Method

Sometimes the means and standard deviations of the two variables, and also the value of the coefficient of correlation, are known. Then we need not study the entire set of values again; we can calculate b and b₁ directly.

Let x̄, ȳ be the means, σx, σy the standard deviations and r the correlation coefficient. Then
$$b = \frac{r\sigma_y}{\sigma_x} \quad \text{and} \quad b_1 = \frac{r\sigma_x}{\sigma_y}$$
Also a = ȳ − b x̄ and a₁ = x̄ − b₁ȳ.
Substituting these values in the regression equation of y on x, i.e. y = a + bx, we have
y = ȳ − b x̄ + bx
∴ y = ȳ + b(x − x̄)
Substituting for b, we have
$$y = \bar{y} + \frac{r\sigma_y}{\sigma_x}(x - \bar{x})$$
Similarly, the regression equation of x on y is
$$x = \bar{x} + \frac{r\sigma_x}{\sigma_y}(y - \bar{y})$$
Example 3: A Chief Financial Officer calculates the correlation coefficient (r) for revenue (x) versus cash flow (y) as 0.65. Given the following data, determine the two regression equations, estimate the cash flow when revenue is 40, and estimate the revenue when cash flow is 35.

x y

Mean 43 37

S.D. 3.1 2.8

Solution:
(i) For the regression equation of y on x,
$$b = \frac{r\sigma_y}{\sigma_x} = \frac{0.65 \times 2.8}{3.1} = 0.5871$$
The regression equation is
∴ y = ȳ + b(x − x̄)
∴ y = 37 + 0.5871(x − 43)
∴ y = 37 + 0.5871x − 25.2453
∴ y = 0.5871x + 11.7547
To estimate y when x = 40,
∴ y = 0.5871 × 40 + 11.7547
y = 35.2387

(ii) For the regression equation of x on y,
$$b_1 = \frac{r\sigma_x}{\sigma_y} = \frac{0.65 \times 3.1}{2.8} = 0.7196$$
The regression equation is
∴ x = x̄ + b₁(y − ȳ)
∴ x = 43 + 0.7196(y − 37)
∴ x = 43 + 0.7196y − 26.6252
∴ x = 0.7196y + 16.3748
To estimate x when y = 35,
x = 0.7196 × 35 + 16.3748
x = 41.5608
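When only summary statistics are available, the 3rd method needs no raw data. A minimal Python sketch (the function name is hypothetical) using the figures of Example 3; since it keeps full precision, the intercepts come out as about 11.7548 and 16.3732, slightly different from the rounded hand values.

```python
def lines_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """3rd method: regression lines from means, standard deviations and r."""
    b = r * sd_y / sd_x    # regression coefficient of y on x
    b1 = r * sd_x / sd_y   # regression coefficient of x on y
    a = y_bar - b * x_bar
    a1 = x_bar - b1 * y_bar
    return (a, b), (a1, b1)


# Example 3: revenue x and cash flow y
(a, b), (a1, b1) = lines_from_summary(x_bar=43, y_bar=37, sd_x=3.1, sd_y=2.8, r=0.65)
print(f"y = {b:.4f}x + {a:.4f}")
print(f"x = {b1:.4f}y + {a1:.4f}")
print(round(a + b * 40, 4), round(a1 + b1 * 35, 4))  # cash flow at x = 40, revenue at y = 35
```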

SOME PROPERTIES OF REGRESSION EQUATIONS

We have seen that two regression equations, and correspondingly two regression lines, can be drawn: one where X is independent and Y is dependent (Y on X), and the other where Y is independent and X is dependent (X on Y). The regression coefficients are related to the slopes of these lines: b is the slope of the line of Y on X, and 1/b₁ is the slope of the line of X on Y when both lines are drawn on the same axes.

If there is perfect positive or negative correlation between the variables, the two regression lines coincide.
If there is a high degree of correlation, the angle between the two intersecting regression lines is small. The angle becomes larger as the correlation decreases, and the lines become perpendicular to each other if there is no correlation between the variables.
Thus a small angle between the lines indicates a high degree of correlation, and a large angle indicates a low degree of correlation.
The point (x̄, ȳ) satisfies both regression equations, as it lies on both lines, so it is the point of intersection of the two lines. This is helpful whenever the regression equations are known and the mean values of x and y are to be obtained: the two regression equations can be solved simultaneously, and the common solution gives x̄ and ȳ.
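We also know that the regression coefficients b and b₁ can be expressed as
$$b = \frac{r\sigma_y}{\sigma_x} \quad \text{and} \quad b_1 = \frac{r\sigma_x}{\sigma_y}$$
$$\therefore\; b \times b_1 = \frac{r\sigma_y}{\sigma_x} \times \frac{r\sigma_x}{\sigma_y} = r^2$$
$$\therefore\; r = \pm\sqrt{b \times b_1}$$
Note that r is positive if b and b₁ are positive, and r is negative if b and b₁ are negative (b, b₁ and r always have the same sign). Since r² ≤ 1, the product b × b₁ can never exceed 1.
Thus r, the correlation coefficient, is the geometric mean of the regression coefficients b and b₁, taken with their common sign. This property can be used to obtain r from the regression equations. As an illustration, consider the following example.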

Example 4: Consider the relationship between consumption expenditure (C) and income (Y) in an economy as modeled by two different researchers:
Researcher 1: 2C − Y = 15
Researcher 2: 3C − 4Y = −25
Calculate the arithmetic means of income (Ȳ) and consumption (C̄).
Compute the correlation coefficient between income and consumption based on these two regression equations.

Solution:
(i) To find C̄ and Ȳ.
The given regression lines are
2C − Y − 15 = 0 …(i)
3C − 4Y + 25 = 0 …(ii)
As per the properties, (C̄, Ȳ) lies on both regression lines. Thus,
2C̄ − Ȳ − 15 = 0 …(iii)
3C̄ − 4Ȳ + 25 = 0 …(iv)
Multiplying (iii) by 4 and subtracting (iv) from it gives 5C̄ − 85 = 0, so C̄ = 17.
Substituting C̄ = 17 in equation (iii) gives Ȳ = 19.

C̄ = 17, Ȳ = 19

(ii) To find r, the coefficient of correlation, we first find the regression coefficients b and b₁ from the equations. Let equation (i) be the regression equation of C on Y, and write it in the standard form C = a₁ + b₁Y.
Equation (i) is 2C − Y − 15 = 0
∴ 2C = Y + 15
∴ C = Y/2 + 15/2
Comparing this with the standard form of the regression line of C on Y,
b₁ = coefficient of Y in the equation = 1/2

Let equation (ii) be the regression equation of Y on C, and write it in the standard form Y = a + bC.
3C − 4Y + 25 = 0
∴ 4Y = 3C + 25
∴ Y = 3C/4 + 25/4
Comparing this with the standard form of the regression line of Y on C,
b = coefficient of C in the equation = 3/4

Now,
$$r = \pm\sqrt{b \times b_1} = \pm\sqrt{\frac{3}{4} \times \frac{1}{2}} = \pm\sqrt{\frac{3}{8}} = \pm 0.6124$$
As b and b₁ are positive, r is also positive.
So, r = 0.6124. (This assignment of the equations is valid because b × b₁ = 3/8 ≤ 1; the opposite assignment would give b = 2 and b₁ = 4/3, so b × b₁ = 8/3 > 1, which is impossible.)
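The reasoning of Examples 4 and 5 can be automated. In this Python sketch (the function name and the (p, q, s) encoding are hypothetical), each line is written as p·U + q·V = s. The means come from solving the two lines simultaneously by Cramer's rule; for r, the sketch tries both ways of assigning the lines ("U on V" and "V on U") and keeps the one whose product of regression coefficients lies between 0 and 1, since b × b₁ = r². The two products are reciprocals of each other, so at most one assignment can qualify. It assumes all coefficients are non-zero.

```python
import math
from fractions import Fraction


def means_and_r(eq1, eq2):
    """eq = (p, q, s) encodes the line p*U + q*V = s (integer, non-zero p and q).

    Returns (U_bar, V_bar, r, assignment)."""
    (p1, q1, s1), (p2, q2, s2) = eq1, eq2
    det = p1 * q2 - q1 * p2
    if det == 0:
        raise ValueError("the lines coincide or are parallel")
    # The means (U_bar, V_bar) lie on both lines: solve by Cramer's rule.
    u_bar = Fraction(s1 * q2 - q1 * s2, det)
    v_bar = Fraction(p1 * s2 - s1 * p2, det)

    # Each option lists (label, coefficient of U on V, coefficient of V on U).
    options = [
        ("eq1: U on V, eq2: V on U", Fraction(-q1, p1), Fraction(-p2, q2)),
        ("eq1: V on U, eq2: U on V", Fraction(-q2, p2), Fraction(-p1, q1)),
    ]
    for assignment, b_uv, b_vu in options:
        product = b_uv * b_vu  # equals r**2, so it must lie in (0, 1]
        if 0 < product <= 1:
            r = math.copysign(math.sqrt(float(product)), float(b_uv))
            return u_bar, v_bar, r, assignment
    raise ValueError("neither assignment gives 0 < b * b1 <= 1")


# Example 4: U = C, V = Y
c_bar, y_bar, r, how = means_and_r((2, -1, 15), (3, -4, -25))
print(c_bar, y_bar, round(r, 4), how)
# Example 5: U = Q, V = P
q_bar, p_bar, r5, how5 = means_and_r((5, -6, -90), (15, -8, 180))
print(q_bar, p_bar, round(r5, 4), how5)
```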

Example 5: Demand and Price of a Product

Consider the relationship between the quantity demanded (Q) and the price (P) of a product according to two models proposed by economists:
Model 1: 5Q − 6P = −90
Model 2: 15Q − 8P = 180
Find the mean values of Q and P, and compute the correlation coefficient between quantity demanded and price based on these two regression equations. If the standard deviation of P is 1, also find the standard deviation of Q.

Solution:
(i) To find the mean values of Q and P, write the regression equations as
5Q − 6P + 90 = 0 …(i)
15Q − 8P − 180 = 0 …(ii)
As per the properties, (Q̄, P̄) lies on both regression lines. Thus,
5Q̄ − 6P̄ + 90 = 0 …(iii)
15Q̄ − 8P̄ − 180 = 0 …(iv)
By solving them simultaneously we get

Q̄ = 36, P̄ = 45

(ii) Alternative method for r:

1. Assume both equations to be that of P on Q (naturally, only one of them actually is). From equation (i),
5Q − 6P + 90 = 0
∴ 6P = 5Q + 90
∴ P = 5Q/6 + 15
∴ b = 5/6
Now consider equation (ii):
15Q − 8P − 180 = 0
∴ 8P = 15Q − 180
∴ P = 15Q/8 − 180/8
∴ b = 15/8
2. As 5/6 < 15/8, choose the lesser one to be b and the corresponding equation to be that of P on Q. Hence equation (i) is that of P on Q, and b = 5/6. (Choosing the larger value would make b × b₁ exceed 1, which is impossible since b × b₁ = r².)
3. So equation (ii) is that of Q on P, and b₁ is the reciprocal of the slope obtained from equation (ii):
∴ b₁ = 1/(15/8) = 8/15

Now,
$$r = \sqrt{b \times b_1} = \sqrt{\frac{5}{6} \times \frac{8}{15}} = \sqrt{\frac{40}{90}} = \frac{2}{3} = 0.6667$$
r is positive because b and b₁ are positive.
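(iii) To find the standard deviation of Q:
We know that b = 5/6, r = 0.6667 = 2/3 and σP = 1.
Consider
$$b = \frac{r\sigma_P}{\sigma_Q}$$
Substituting the above values,
$$\frac{5}{6} = \frac{2}{3} \times \frac{1}{\sigma_Q} \quad\Rightarrow\quad \sigma_Q = \frac{2}{3} \times \frac{6}{5} = \frac{4}{5} = 0.8$$
So the standard deviation of Q is 0.8.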
Multiple Regression

We can use more than one independent variable to estimate the dependent variable and, in this way, attempt to increase the accuracy of the estimate. This process is called multiple regression and correlation analysis. It is based on the same assumptions and procedures we have encountered using simple regression. The general regression equation with two independent variables is
$$\hat{Y} = a + b_1X_1 + b_2X_2$$

Example: Given the following set of data, calculate (i) the multiple regression plane and (ii) predict Y when X₁ = 3 and X₂ = 2.7.

Y 25 30 11 22 27 19

𝑋1 3.5 6.7 1.5 0.3 4.6 2.0

𝑋2 5.0 4.2 8.5 1.4 3.6 1.3

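Solution:
Consider the normal equations for the regression of Y on X₁ and X₂:
$$\Sigma Y = na + b_1\Sigma X_1 + b_2\Sigma X_2$$
$$\Sigma X_1Y = a\Sigma X_1 + b_1\Sigma X_1^2 + b_2\Sigma X_1X_2$$
$$\Sigma X_2Y = a\Sigma X_2 + b_1\Sigma X_1X_2 + b_2\Sigma X_2^2$$
To solve the normal equations we require ΣY, ΣX₁, ΣX₂, ΣX₁Y, ΣX₂Y, ΣX₁X₂, ΣX₁² and ΣX₂². To calculate these summations, the following table is prepared.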

Y   X₁   X₂   X₁Y   X₂Y   X₁X₂   X₁²   X₂²   Y²

25 3.5 5.0 87.5 125 17.5 12.25 25 625


30 6.7 4.2 201 126 28.14 44.89 17.64 900

11 1.5 8.5 16.5 93.5 12.75 2.25 72.25 121

22 0.3 1.4 6.6 30.8 0.42 0.09 1.96 484

27 4.6 3.6 124.2 97.2 16.56 21.16 12.96 729

19 2.0 1.3 38 24.7 2.6 4 1.69 361

134 18.6 24 473.8 497.2 77.97 84.64 131.5 3220

Now, substituting these sums in the normal equations for the regression of Y on X₁ and X₂:

6a + 18.6b₁ + 24b₂ = 134
18.6a + 84.64b₁ + 77.97b₂ = 473.8
24a + 77.97b₁ + 131.5b₂ = 497.2
By solving these three equations we get
a = 20.39
b₁ = 2.3403
b₂ = −1.3283
Substituting these three values into the general two-variable regression equation,
Ŷ = a + b₁X₁ + b₂X₂ = 20.39 + 2.3403X₁ − 1.3283X₂
To predict Y when X₁ = 3 and X₂ = 2.7, substitute these values in the regression equation:

Ŷ = 20.39 + 2.3403(3) − 1.3283(2.7) = 23.8245
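Solving three normal equations by hand is tedious. A minimal Python sketch (the function names are hypothetical) builds the normal equations from the data and solves them with Cramer's rule, using only the standard library. With unrounded coefficients it gives a ≈ 20.3915, b₁ ≈ 2.3403, b₂ ≈ −1.3283 and a prediction of about 23.826; the slightly smaller 23.8245 above comes from rounding a to 20.39.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(y, x1, x2):
    """Least-squares plane Y-hat = a + b1*X1 + b2*X2 via the three normal equations."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    coeffs = [[n, s1, s2],
              [s1, s11, s12],
              [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    d = det3(coeffs)
    if d == 0:
        raise ValueError("the normal equations are singular")
    solution = []
    for col in range(3):  # Cramer's rule: replace one column by the right-hand side
        m = [row[:] for row in coeffs]
        for i in range(3):
            m[i][col] = rhs[i]
        solution.append(det3(m) / d)
    return tuple(solution)  # (a, b1, b2)


y = [25, 30, 11, 22, 27, 19]
x1 = [3.5, 6.7, 1.5, 0.3, 4.6, 2.0]
x2 = [5.0, 4.2, 8.5, 1.4, 3.6, 1.3]
a, b1, b2 = fit_plane(y, x1, x2)
y_hat = a + b1 * 3 + b2 * 2.7  # prediction at X1 = 3, X2 = 2.7
print(round(a, 4), round(b1, 4), round(b2, 4), round(y_hat, 4))
```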
