Regression Analysis

Regression analysis is a statistical technique for studying the dependence of one variable (called the
dependent variable) on one or more other variables (called independent or explanatory variables), with a
view to estimating or predicting the population mean of the dependent variable in terms of known or
fixed values of the independent variables. It is a branch of statistical theory that is widely used in
almost all scientific disciplines. With the help of regression analysis, we can find the average
probable change in one variable for a given change in another; that is, we can study how an
explanatory variable influences the dependent variable and how large that influence is.

Importance of Regression Analysis:


i. The regression equation provides a concise and meaningful summary of the relationship
between the dependent variable (y) and the independent variable (x).
ii. The relationship can be used for prediction.
iii. If the form of the relationship between x and y is known, then the parameters of interest
can be estimated from the relevant data.
iv. Where the variable of interest (y) depends on a number of factors, regression analysis
makes it possible to assess the contribution of each factor individually.

Uses of Regression Analysis:


1. Regression analysis helps in establishing a functional relationship between two or more
variables.
2. Since most of the problems of economic analysis are based on cause and effect
relationships, the regression analysis is a highly valuable tool in economic and business
research.
3. Regression analysis predicts the values of dependent variables from the values of
independent variables.
4. We can calculate the coefficient of correlation ($r$) and the coefficient of determination ($r^2$)
with the help of the regression coefficients.
5. In statistical analysis of demand curves, supply curves, production function, cost function,
consumption function etc., regression analysis is widely used.

Some situations where regression analysis is appropriate:


i. The management of a manufacturing industry wishes to investigate the relationship between
the number of unauthorized days absent from work per year and the distance employees
travel to work from their residence. A sample of employees of appropriate size is
drawn. A regression analysis may be employed to see whether any such relationship
exists between the number of days absent and the distance to work, and to predict
future absenteeism from the distance factor.

1 Md. Zahidul Alam


Lecturer of Statistics
BSFMSTU, Jamalpur

ii. An agronomist may be interested in studying the dependence of paddy yield on temperature,
rainfall, amount of fertilizer and soil fertility. Such a dependence analysis may enable
the forecasting of the average yield, given information about the explanatory variables.

Types of Regression analysis:


i. Simple and Multiple.
ii. Linear and Non-Linear.
iii. Total and Partial.

Simple and Multiple:


In a simple relationship only two variables are considered, for example the influence of
advertising expenditure on sales turnover.
In a multiple relationship more than two variables are involved. Of these, one is the dependent
variable and the remaining ones are independent variables. For example, the turnover (y) may
depend on advertising expenditure ($x_1$) and the income of the people ($x_2$).

Linear and Non-Linear:


The linear relationship is based on a straight-line trend, the equation of which has no power higher
than one. The relationship can be either simple or multiple. A linear relationship is normally
used because, besides its simplicity, it is convenient for prediction: a linear trend can
easily be projected into the future.
In the case of a non-linear relationship, curved trend lines are derived. Their equations contain
powers higher than one, the parabola being a common example.

Total and Partial:


In the case of a total relationship all the important variables are considered. Normally it takes
the form of a multiple relationship, because most economic and business phenomena are affected by
a multiplicity of causes.
In the case of a partial relationship one or more variables are considered, but not all, thus
excluding the influence of those not found relevant for a given purpose.

Assumptions of Regression Analysis:


i. The dependent variable y is assumed to be normally distributed. For any given value of x
there is a set of observations of y, and each such set of y observations forms a
sub-population. The variances of all sub-populations are the same and are denoted by $\sigma^2$.
ii. The independent variable(s) is / are non-random variable(s).
iii. The independent variables are uncorrelated with one another.
iv. There is no measurement error in the explanatory variable.
v. The error term is normally and independently distributed.

Fit Regression Line:


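Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be n pairs of values observed from a random sample. Assume
that the variables X and Y follow the regression model
$$y = a + bx + e \qquad \text{(i)}$$
Regression analysis deals with fitting a regression equation based on n pairs of sample
observations following the model (i).
Assume that the fitted regression equation is
$$\hat{y} = a + bx \qquad \text{(ii)}$$
The values of a and b are to be found from the n pairs of values $(x_i, y_i)$; i = 1, 2, …, n in such a
way that the sum of squares of deviations, given by
$$\varphi = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - bx_i)^2,$$
is a minimum. The value of $\varphi$ will be minimum if the values of a and b are found in such a way that
$$\frac{\partial \varphi}{\partial a} = 0 \quad \text{and} \quad \frac{\partial \varphi}{\partial b} = 0$$
These two equations give,
$$
\begin{aligned}
\frac{\partial \varphi}{\partial a} &= \frac{\partial}{\partial a} \sum (y_i - a - bx_i)^2 = 0 \\
\Rightarrow\ & 2 \sum (y_i - a - bx_i)(-1) = 0 \\
\Rightarrow\ & \sum (y_i - a - bx_i) = 0 \\
\Rightarrow\ & na = \sum y_i - b \sum x_i \\
\Rightarrow\ & a = \frac{\sum y_i}{n} - b\,\frac{\sum x_i}{n} \\
\therefore\ & a = \bar{y} - b\bar{x}
\end{aligned}
$$
And
$$
\begin{aligned}
\frac{\partial \varphi}{\partial b} &= \frac{\partial}{\partial b} \sum (y_i - a - bx_i)^2 = 0 \\
\Rightarrow\ & 2 \sum (y_i - a - bx_i)(-x_i) = 0 \\
\Rightarrow\ & \sum x_i (y_i - a - bx_i) = 0 \\
\Rightarrow\ & \sum x_i y_i = \sum x_i (\bar{y} - b\bar{x}) + b \sum x_i^2 \\
\Rightarrow\ & \sum x_i y_i - \frac{\sum x_i \sum y_i}{n} = b \left[ \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right] \\
\Rightarrow\ & b = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \\
\therefore\ & b = \frac{SP(xy)}{SS(x)} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}
\end{aligned}
$$
Here b is the sample regression coefficient of y on x. The regression equation
$$\hat{y} = a + bx$$
is known as the simple regression line of y on x.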
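
The closed-form results $a = \bar{y} - b\bar{x}$ and $b = SP(xy)/SS(x)$ can be sketched in a few lines of Python. This is only an illustration: the function name `fit_line` and the five data points are made up for the purpose and do not come from the notes. The last lines check numerically that the fitted line satisfies the two normal equations used in the derivation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x using the sum formulas for a and b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    ss_x = sum_x2 - sum_x ** 2 / n          # SS(x)
    if ss_x == 0:
        raise ValueError("all x values are equal, slope is undefined")
    sp_xy = sum_xy - sum_x * sum_y / n      # SP(xy)
    b = sp_xy / ss_x
    a = sum_y / n - b * sum_x / n           # a = y-bar - b * x-bar
    return a, b


# Illustrative (made-up) data, not taken from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)

# Normal equations: sum of residuals = 0 and sum of x * residual = 0.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
normal_eq_a = sum(residuals)
normal_eq_b = sum(x * e for x, e in zip(xs, residuals))
print(a, b, normal_eq_a, normal_eq_b)
```

Both normal-equation sums come out as zero up to floating-point rounding, which is exactly the condition $\partial\varphi/\partial a = \partial\varphi/\partial b = 0$.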

Regression Coefficients:
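The quantity b in the regression equation is called the "regression coefficient" or "slope
coefficient". Since there are two regression equations (y on x and x on y), there are two
regression coefficients.
Regression Coefficient of x on y:
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$$
Regression Coefficient of y on x:
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$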
Properties of the Regression Coefficients:


1. The coefficient of correlation is the geometric mean of the two regression coefficients.
Symbolically: $r = \pm\sqrt{b_{xy} \times b_{yx}}$, where r takes the sign of the regression coefficients.
2. If one of the regression coefficients is greater than unity, the other must be less than unity,
since the value of the coefficient of correlation cannot exceed unity.
3. Both regression coefficients have the same sign, i.e., they are either both positive or both
negative. In other words, it is not possible for one of the regression coefficients to have a
minus sign and the other a plus sign.
4. The coefficient of correlation has the same sign as the regression coefficients.
5. For positive coefficients, the average of the two regression coefficients is greater than or
equal to the coefficient of correlation. In symbols, $\frac{b_{xy} + b_{yx}}{2} \geq r$.
6. Regression coefficients are independent of change of origin but not of scale.
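
These properties can be checked numerically. The sketch below uses the advertising and sales figures from Exercise 1 of these notes; the helper name `regression_coefficients` is an assumption made for illustration. It computes $b_{yx}$, $b_{xy}$ and $r$ from deviations about the means and then checks properties 1, 2, 5 and 6.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp_xy / ss_x, sp_xy / ss_y, sp_xy / math.sqrt(ss_x * ss_y)


# Advertising expenditure (x) and sales (y) from Exercise 1 of the notes.
x = [10, 12, 15, 23, 20]
y = [14, 17, 23, 25, 21]
b_yx, b_xy, r = regression_coefficients(x, y)

# Property 1: r is the geometric mean, carrying the sign of the coefficients.
geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Property 6: shifting the origin leaves b_yx unchanged, rescaling x does not.
b_shifted, _, _ = regression_coefficients([v - 10 for v in x], [v + 5 for v in y])
b_rescaled, _, _ = regression_coefficients([2 * v for v in x], y)

print(b_yx, b_xy, r, geometric_mean, b_shifted, b_rescaled)
```

For these data $b_{yx}$ is about 0.712 and $b_{xy}$ is 1.05, so one coefficient exceeds unity while the other stays below it, as property 2 requires. Doubling the x values halves $b_{yx}$.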

Example:
Fit a linear regression equation of Y on X for the given data below:
X 45 42 44 43 41 45 43 40
Y 40 38 36 35 38 39 37 41

Solution:

x      y      $x^2$     xy
45     40     2025      1800
42     38     1764      1596
44     36     1936      1584
43     35     1849      1505
41     38     1681      1558
45     39     2025      1755
43     37     1849      1591
40     41     1600      1640
∑x = 343   ∑y = 304   ∑$x^2$ = 14729   ∑xy = 13029

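The fitted regression line is
$$\hat{y} = a + bx$$
where
$$a = \bar{y} - b\bar{x} \quad \text{and} \quad b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
Here n = 8, ∑x = 343, ∑y = 304, ∑$x^2$ = 14729, ∑xy = 13029, so that
$$\bar{y} = \frac{\sum y}{n} = \frac{304}{8} = 38 \quad \text{and} \quad \bar{x} = \frac{\sum x}{n} = \frac{343}{8} = 42.88$$
So,
$$b = \frac{13029 - \frac{343 \times 304}{8}}{14729 - \frac{(343)^2}{8}} = \frac{-5}{22.875} = -0.219$$
$$a = \bar{y} - b\bar{x} = 38 - (-0.219)(42.88) = 47.39$$
Therefore, the fitted regression line is
$$\hat{y} = 47.39 - 0.219\,x$$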
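
As a cross-check, the sums and coefficients of this example can be recomputed in Python. The variable names are chosen here for illustration. At full precision the intercept is about 47.37; the hand calculation gives 47.39 because it uses the rounded values b = −0.219 and x̄ = 42.88.

```python
# Data of the worked example (Y on X).
x = [45, 42, 44, 43, 41, 45, 43, 40]
y = [40, 38, 36, 35, 38, 39, 37, 41]
n = len(x)

# Column totals of the solution table.
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Slope and intercept from the sum formulas.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

print(sum_x, sum_y, sum_x2, sum_xy)
print(round(b, 3), round(a, 2))
```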

Example:
After investigation it has been found that the demand for automobiles in a city depends mainly, if
not entirely, upon the number of families residing in that city. Below are given figures for the sales
of automobiles in the five cities for the year 2014 and the number of families residing in those
cities:

City   No. of families in lakh (x)   Sale of automobiles in '000s (y)
A      70                            25.2
B      75                            28.6
C      80                            30.2
D      60                            22.3
E      90                            35.4
Fit a linear regression equation of y on x by the least square method and estimate the sales for the
year 2017 for city A which is estimated to have 100 lakh families assuming that the same
relationship holds true.

Solution:
City   x      y       $x^2$    xy
A      70     25.2    4900     1764
B      75     28.6    5625     2145
C      80     30.2    6400     2416
D      60     22.3    3600     1338
E      90     35.4    8100     3186
∑x = 375   ∑y = 141.7   ∑$x^2$ = 28625   ∑xy = 10849

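The fitted regression line is
$$\hat{y} = a + bx$$
where
$$a = \bar{y} - b\bar{x} \quad \text{and} \quad b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
Here n = 5, ∑x = 375, ∑y = 141.7, ∑$x^2$ = 28625, ∑xy = 10849, so that
$$\bar{y} = \frac{\sum y}{n} = \frac{141.7}{5} = 28.34 \quad \text{and} \quad \bar{x} = \frac{\sum x}{n} = \frac{375}{5} = 75$$
So,
$$b = \frac{10849 - \frac{375 \times 141.7}{5}}{28625 - \frac{(375)^2}{5}} = \frac{221.5}{500} = 0.443$$
$$a = \bar{y} - b\bar{x} = 28.34 - 0.443 \times 75 = -4.885$$
Therefore, the fitted regression line is
$$\hat{y} = -4.885 + 0.443\,x$$
Estimated sales for the year 2017 for city A:
$$\hat{y} = -4.885 + 0.443 \times 100 = 39.415$$
Hence, it is expected that about 39,415 automobiles would be sold in city A when it has 100
lakh families.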
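
The fit and the prediction for 100 lakh families can be reproduced with a short Python sketch. The helper `predict` is a name introduced here for illustration only.

```python
# Families in lakh (x) and automobile sales in thousands (y) for cities A to E.
x = [70, 75, 80, 60, 90]
y = [25.2, 28.6, 30.2, 22.3, 35.4]
n = len(x)

sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # SP(xy)
ss_x = sum(xi * xi for xi in x) - sum(x) ** 2 / n                    # SS(x)

b = sp_xy / ss_x
a = sum(y) / n - b * sum(x) / n


def predict(families_in_lakh):
    """Estimated sales (in thousands) from the fitted line y-hat = a + b*x."""
    return a + b * families_in_lakh


estimate = predict(100)
print(round(b, 3), round(a, 3), round(estimate, 3))
```

Note that x = 100 lies outside the observed range of 60 to 90 lakh families, so the estimate rests on the stated assumption that the same relationship continues to hold.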
Exercise 1:
The following data relate to advertising expenditure (in lakh BDT) and the corresponding sales
(in crore BDT):
Advertising Expenditure 10 12 15 23 20
Sales 14 17 23 25 21
Estimate (i) the sales corresponding to an advertising expenditure of BDT 30 lakh and (ii) the
advertising expenditure for a sales target of BDT 35 crore.
Solution: (i) $\hat{y} = 8.608 + 0.712 \times 30 = 29.968$, (ii) $\hat{x} = -5 + 1.05 \times 35 = 31.75$

Exercise 2:
From the following data find the regression equation of Y on X. Find Y when X = 15.
X 8 11 7 10 12 5 4 6
Y 11 30 25 44 38 25 20 27
Solution: $\hat{y} = 10.98 + 2.10\,x$, $\hat{y} = 42.48$

Exercise 3:
The following data represent the gestation age (x, days) and birth-weight (y, pounds) of
the new-born babies of 10 mothers.
X 265 250 270 255 260 258 255 265 248 250
Y 6.2 5.8 7.0 5.6 6.0 5.6 6.4 6.8 5.2 6.0
Fit a regression line of y on x and draw the fitted regression line. Estimate the birth-weight of a
baby whose gestation age is 270 days.
Solution: $\hat{y} = -9.087 + 0.0588\,x$, $\hat{y} = 6.78916$

Test Regarding Regression Coefficient:


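We know that the regression coefficient ($\beta$) measures the rate of change of y for a unit change in
the value of x. A significant rate of change of y for a unit change in x implies a significant
relationship of y on x. The value $\beta = 0$ indicates that the value of y does not change with a
change in x. Therefore, to verify the significance of the relation of y on x, we need to test the null
hypothesis
$$H_0: \beta = 0$$
$$H_1: \beta \neq 0$$
At the α % level of significance we will test the above hypothesis.
The test statistic for this hypothesis is
$$t = \frac{b - \beta}{\sqrt{s^2 / SS(x)}}$$
This t is distributed as Student's t with (n−2) degrees of freedom.
Under $H_0$, t is given by
$$t = \frac{b}{\sqrt{s^2 / SS(x)}}$$
where
$$s^2 = \frac{1}{n-2}\left[ SS(y) - b\,SP(xy) \right]$$
$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$SP(xy) = \sum xy - \frac{\sum x \sum y}{n}$$
$$b = \frac{SP(xy)}{SS(x)}$$
Reject $H_0$ if $|t| \geq t_{\alpha/2,\,n-2}$. Otherwise do not reject $H_0$.

To test the significance of the regression parameter α in the regression model
$$y = \alpha + \beta x + e$$
we need to test the null hypothesis
$$H_0: \alpha = 0$$
$$H_1: \alpha \neq 0$$
The test statistic for this hypothesis is
$$t = \frac{a}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS(x)}}}$$
This t is distributed as Student's t with (n−2) degrees of freedom.
Reject $H_0$ if $|t| \geq t_{\alpha/2,\,n-2}$, where the α in the subscript is the level of significance and
not the intercept parameter. Otherwise do not reject $H_0$.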

Example:
In a statistical investigation the grade point average (y) and study hour (x) of a group of students in
one semester are recorded as follows:
X 12 17 16 14 12 15 13 14 15 16 15 12 12 14 12
Y 3.0 3.2 3.5 4.0 3.5 3.8 4.0 4.0 4.0 3.8 2.0 2.0 2.5 2.5 2.0
Fit a regression line of grade point average on study hour and test the significance of the regression.
Estimate the grade point average of a student if he works for 16 hours.
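Solution:
The fitted regression line is
$$\hat{y} = a + bx$$
where
$$a = \bar{y} - b\bar{x} \quad \text{and} \quad b = \frac{SP(xy)}{SS(x)}$$
Here n = 15, and
$$SP(xy) = \sum xy - \frac{\sum x \sum y}{n} = 673.2 - \frac{209 \times 47.8}{15} = 7.187$$
$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n} = 2953 - \frac{(209)^2}{15} = 40.933$$
$$b = \frac{SP(xy)}{SS(x)} = \frac{7.187}{40.933} = 0.176$$
$$\bar{y} = \frac{\sum y}{n} = \frac{47.8}{15} = 3.19 \quad \text{and} \quad \bar{x} = \frac{\sum x}{n} = \frac{209}{15} = 13.93$$
$$a = \bar{y} - b\bar{x} = 3.19 - 0.176 \times 13.93 = 0.738$$
Therefore, the fitted regression line is
$$\hat{y} = 0.738 + 0.176\,x$$
We have,
$$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n} = 161.12 - \frac{(47.8)^2}{15} = 8.797$$
$$s^2 = \frac{1}{n-2}\left[ SS(y) - b\,SP(xy) \right] = \frac{1}{13}\left[ 8.797 - 0.176 \times 7.187 \right] = 0.579$$
We need to test the null hypothesis
$$H_0: \beta = 0$$
$$H_1: \beta \neq 0$$
at the 5% level of significance.
The test statistic under $H_0$ is
$$t = \frac{b}{\sqrt{s^2 / SS(x)}} = \frac{0.176}{\sqrt{0.579 / 40.933}} = \frac{0.176}{0.119} = 1.48$$
Comment: $|t| = 1.48 < t_{0.025,\,13} = 2.16$, so we cannot reject the null hypothesis at the 5% level of
significance. The data do not provide sufficient evidence against the null hypothesis, and the
regression is not significant: the grade point average is not shown to be influenced by study hours.
The estimated grade point average of a student who works for x = 16 hours is
$$\hat{y} = 0.738 + 0.176 \times 16 = 3.55$$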
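
The whole test can be traced in a short Python sketch. It takes the critical value 2.16 from the worked solution rather than computing it, and the variable names are chosen here for illustration. The intercept statistic is not part of the worked solution; it is computed from the usual form $t = a / \big(s\sqrt{1/n + \bar{x}^2/SS(x)}\big)$, which is assumed here.

```python
import math

# Study hours (x) and grade point average (y) of the 15 students.
x = [12, 17, 16, 14, 12, 15, 13, 14, 15, 16, 15, 12, 12, 14, 12]
y = [3.0, 3.2, 3.5, 4.0, 3.5, 3.8, 4.0, 4.0, 4.0, 3.8, 2.0, 2.0, 2.5, 2.5, 2.0]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b = sp_xy / ss_x
a = mean_y - b * mean_x
s2 = (ss_y - b * sp_xy) / (n - 2)            # residual variance estimate

# t statistics for H0: beta = 0 and H0: alpha = 0, each with n - 2 d.f.
t_slope = b / math.sqrt(s2 / ss_x)
t_intercept = a / math.sqrt(s2 * (1 / n + mean_x ** 2 / ss_x))

T_CRITICAL = 2.16                            # t(0.025, 13), as quoted in the notes
reject_slope = abs(t_slope) >= T_CRITICAL
reject_intercept = abs(t_intercept) >= T_CRITICAL

print(round(b, 3), round(a, 3), round(s2, 3))
print(round(t_slope, 2), reject_slope, round(t_intercept, 2), reject_intercept)
```

The slope statistic rounds to 1.48, matching the hand calculation, so the null hypothesis is not rejected at the 5% level.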

Exercise:
During a rainy season it is observed that rainfall and temperature are related. To study the
relationship, data on rainfall (x, mm) and temperature (y, °C) are recorded as follows:
X 5.5 6.8 7.0 4.2 2.3 8.0 9.0 4.5 5.6 7.2 5.0
Y 32.5 31.0 30.0 33.5 34.2 32.0 30.0 34.0 34.0 32.0 34.0
Fit a regression line of temperature (y) on amount of rainfall (x) and test the significance of the
regression. Draw the fitted regression line of y on x. Estimate the temperature if rainfall is 5 mm.

Standard Error of the Estimate:


The value of Y estimated by the regression equation may not be exactly equal to the observed
value, so the estimate may be in error. This is because the variation in Y may not be
due entirely to the variation in X.
A measure of this prediction error is known as the standard error of the estimate. In
other words, the standard error of estimate measures the variation of the observations around
the computed regression line, that is, the dispersion of the observed values about the predicted
values. It is a measure of the reliability of the regression prediction. Therefore,
when a prediction is made, the standard error of the estimate may be used to construct a
confidence interval around the predicted value.
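The standard error (S.E.) of estimate of the regression equation of y on x is
$$S_{yx} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}; \quad \hat{Y} = \text{estimated value}, \ Y = \text{actual/observed value}$$
The standard error (S.E.) of estimate of the regression equation of x on y is
$$S_{xy} = \sqrt{\frac{\sum (X - \hat{X})^2}{n - 2}}$$
The standard error of estimate can very easily be calculated with the help of the following formula:
$$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}; \quad S_{xy} = \sqrt{\frac{\sum X^2 - a'\sum X - b'\sum XY}{n - 2}}$$
where a and b are the coefficients of the regression line of y on x, and a′ and b′ are those of the
regression line of x on y.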
The standard error of estimate measures the accuracy of the estimated figures. The smaller the
standard error of estimate, the closer the points lie to the regression line and the better the
estimates based on the equation of this line. If the standard error of estimate is zero, there is no
variation about the line and the correlation is perfect. Thus, with the help of the standard error of
estimate we can ascertain how good and representative the regression line is as a
description of the average relationship between two series.
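
A small Python sketch of the standard error of estimate of y on x, using the automobile data from the earlier example. Two assumptions are made here and should be read as such: the divisor is taken as n − 2, to agree with the $s^2$ used in the test of the regression coefficient (some textbooks divide by n instead), and the shortcut uses the identity $\sum (Y - \hat{Y})^2 = \sum Y^2 - a\sum Y - b\sum XY$, which holds for a least-squares line.

```python
import math

# Families in lakh (x) and automobile sales in thousands (y), cities A to E.
x = [70, 75, 80, 60, 90]
y = [25.2, 28.6, 30.2, 22.3, 35.4]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
b = (sum_xy - sum_x * sum_y / n) / (sum(xi * xi for xi in x) - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

DIVISOR = n - 2   # assumption: degrees-of-freedom divisor, not n

# Direct form: squared deviations of the observed from the estimated values.
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_direct = math.sqrt(residual_ss / DIVISOR)

# Shortcut form: sum(Y^2) - a*sum(Y) - b*sum(XY).
shortcut_ss = sum(yi * yi for yi in y) - a * sum_y - b * sum_xy
se_shortcut = math.sqrt(shortcut_ss / DIVISOR)

print(round(se_direct, 3), round(se_shortcut, 3))
```

Both forms give about 0.727 (thousand automobiles), which is small relative to sales of 22 to 35 thousand, so the points lie close to the fitted line.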

Coefficient of Determination:
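The ratio of the unexplained variation to the total variation represents the proportion of variation in
Y that is not explained by regression on X. Subtraction of this proportion from 1.0 gives the
proportion of variation in Y that is explained by regression on X. The statistic used to express this
proportion is called the coefficient of determination and is denoted by $r^2$. It may be written as
follows:
$$r^2 = 1 - \frac{\text{Unexplained variation}}{\text{Total variation}} = \frac{\text{Explained variation}}{\text{Total variation}}$$
$$r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$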

The value of $r^2$ is the proportion of the variation in the dependent variable Y explained by
regression on the independent variable X.
For example, $r^2 = 0.88$ implies that 88% of the total variation of Y (dependent variable) is explained
by the regression line (or by the variation in X).
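
The decomposition into explained and unexplained variation can be checked on the automobile data from the earlier example. The following Python sketch uses variable names chosen for illustration; it computes $r^2$ both as one minus the unexplained share and as the explained share.

```python
# Families in lakh (x) and automobile sales in thousands (y), cities A to E.
x = [70, 75, 80, 60, 90]
y = [25.2, 28.6, 30.2, 22.3, 35.4]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares line of y on x.
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
fitted = [a + b * xi for xi in x]

total = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
explained = sum((fi - mean_y) ** 2 for fi in fitted)

r2_from_unexplained = 1 - unexplained / total
r2_from_explained = explained / total
print(round(r2_from_unexplained, 3), round(r2_from_explained, 3))
```

Both routes give about 0.984, i.e. roughly 98% of the variation in sales across the five cities is explained by the number of families.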

Comparison of Correlation and Regression:


The main points of difference between correlation and regression are as follows.

| Points of difference | Correlation | Regression |
|---|---|---|
| 1. Definition | Correlation indicates whether there is any relation between the variables. | Regression measures the probable movement of one variable in terms of the other. |
| 2. Coefficients | For correlation, $r_{xy} = r_{yx}$. | But for regression, $b_{xy} \neq b_{yx}$. |
| 3. Limits of coefficients | The limit of the correlation coefficient is $-1 \leq r \leq +1$. | The limit of the regression coefficient is $-\infty < b < +\infty$. |
| 4. Indication | Correlation indicates only a linear relationship between two variables. | Regression indicates any type of relationship. |
| 5. Measurement | Correlation measures the degree of association between the variables. | Regression measures the form of the relationship between one dependent and one or more independent variables. |
| 6. Mathematical treatment | Correlation is not very useful for further mathematical treatment. | Regression is widely used for further mathematical treatment. |
