[6]Regression-Analysis
Regression Analysis
Regression analysis is a statistical technique for studying the dependence of one variable (the
dependent variable) on one or more other variables (the independent or explanatory variables), with a
view to estimating or predicting the population mean of the dependent variable from known or fixed
values of the independent variables. It is a branch of statistical theory that is widely used in
almost all scientific disciplines. With its help we can find the average probable change in one
variable for a given change in another; that is, we can study how an explanatory variable influences
the dependent variable and how large that influence is.
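For the simple linear regression of y on x, the fitted line is $\hat{y} = a + bx$. The least-squares estimates of a and b satisfy the normal equations
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
From the first equation,
$$a = \bar{y} - b\bar{x}$$
$$\therefore\ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$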
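As a minimal sketch of the least-squares calculation for the line ŷ = a + bx, assuming the usual sums-of-products form b = (∑xy − ∑x∑y/n) / (∑x² − (∑x)²/n) and a = ȳ − b·x̄, the steps can be written in Python. The function name `fit_line` and the toy data are illustrative only and do not come from the notes.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sp_xy = sum_xy - sum_x * sum_y / n   # corrected sum of products
    ss_x = sum_x2 - sum_x ** 2 / n       # corrected sum of squares of x
    if ss_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sp_xy / ss_x
    a = sum_y / n - b * sum_x / n        # a = y_bar - b * x_bar
    return a, b


# Hypothetical toy data lying exactly on y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the toy points lie exactly on a line, the function should recover an intercept of 1 and a slope of 2, which is a quick sanity check before applying it to real data.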
Regression Coefficients:
The quantity b in a regression equation is called the “regression coefficient” or “slope
coefficient”. Since there are two regression equations (y on x and x on y), there are two regression
coefficients.
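Regression Coefficient of x on y:
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$$
Regression Coefficient of y on x:
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
Here r is the correlation coefficient between x and y, and $\sigma_x$, $\sigma_y$ are their standard deviations.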
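The two coefficients can be sketched in Python as follows. This is an illustration under the assumption that both are computed from the corrected sums of squares and products; the function name `regression_coefficients` and the small data set are hypothetical. It also checks the standard identity that the product of the two regression coefficients equals r².

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) for paired samples xs and ys."""
    n = len(xs)
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    b_yx = sp_xy / ss_x                    # regression coefficient of y on x
    b_xy = sp_xy / ss_y                    # regression coefficient of x on y
    r = sp_xy / math.sqrt(ss_x * ss_y)     # correlation coefficient
    return b_yx, b_xy, r


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_yx, b_xy, r = regression_coefficients(xs, ys)
print(b_yx, b_xy, r ** 2, b_yx * b_xy)
```

For this data the slope of y on x is 0.6 and the slope of x on y is 1.0, so their product and r² should both be about 0.6.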
Example:
Fit a linear regression equation of Y on X for the given data below:
X 45 42 44 43 41 45 43 40
Y 40 38 36 35 38 39 37 41
Solution:
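x     y     x²      xy
45    40    2025    1800
42    38    1764    1596
44    36    1936    1584
43    35    1849    1505
41    38    1681    1558
45    39    2025    1755
43    37    1849    1591
40    41    1600    1640
∑x = 343, ∑y = 304, ∑x² = 14729, ∑xy = 13029
The fitted line is $\hat{y} = a + bx$, where $a = \bar{y} - b\bar{x}$ and
$$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{13029 - \frac{(343)(304)}{8}}{14729 - \frac{(343)^2}{8}} = \frac{-5}{22.875} = -0.219$$
Note that the slope is negative, because $\sum xy - \frac{(\sum x)(\sum y)}{n} = 13029 - 13034 = -5$. With $\bar{x} = 42.875$ and $\bar{y} = 38$,
$$a = 38 - \left(\frac{-5}{22.875}\right)(42.875) \approx 47.37$$
so the fitted regression equation of Y on X is $\hat{y} = 47.37 - 0.219x$.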
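The arithmetic of this example can be checked with a short Python sketch. It uses only the eight (X, Y) pairs given in the example; the variable names are illustrative. Since ∑xy − ∑x∑y/n = 13029 − 13034 is negative, the slope comes out negative.

```python
# Data from the example
x = [45, 42, 44, 43, 41, 45, 43, 40]
y = [40, 38, 36, 35, 38, 39, 37, 41]
n = len(x)

# Column totals of the working table
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_xy = sum(u * v for u, v in zip(x, y))

# Slope and intercept of the regression of Y on X
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

print(sum_x, sum_y, sum_x2, sum_xy)
print(round(b, 3), round(a, 2))
```

Run as written, this should reproduce the totals 343, 304, 14729 and 13029, with a slope of about −0.219 and an intercept of about 47.37.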
Example:
After investigation it has been found that the demand for automobiles in a city depends mainly, if
not entirely, upon the number of families residing in that city. Below are given figures for the sales
of automobiles in the five cities for the year 2014 and the number of families residing in those
cities:
Solution:
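(x: number of families; y: sales of automobiles)
City    x     y       x²      xy
A       70    25.2    4900    1764
B       75    28.6    5625    2145
C       80    30.2    6400    2416
D       60    22.3    3600    1338
E       90    35.4    8100    3186
∑x = 375, ∑y = 141.7, ∑x² = 28625, ∑xy = 10849
The fitted line is $\hat{y} = a + bx$, where $a = \bar{y} - b\bar{x}$ and
$$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{10849 - \frac{(375)(141.7)}{5}}{28625 - \frac{(375)^2}{5}} = \frac{221.5}{500} = 0.443$$
With $\bar{x} = 75$ and $\bar{y} = 28.34$,
$$a = 28.34 - (0.443)(75) = -4.885$$
so the fitted regression equation is $\hat{y} = -4.885 + 0.443x$.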
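The same slope can be obtained from deviations about the means, b = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)², which is algebraically equivalent to the sums-of-products formula. The sketch below applies that form to the five-city table; the names `families`, `sales` and `predict` are illustrative, not from the notes.

```python
# x and y columns of the five-city table
families = [70, 75, 80, 60, 90]
sales = [25.2, 28.6, 30.2, 22.3, 35.4]

n = len(families)
x_bar = sum(families) / n
y_bar = sum(sales) / n

# Sum of products and sum of squares of deviations about the means
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(families, sales))
ss_x = sum((x - x_bar) ** 2 for x in families)

b = sp_xy / ss_x
a = y_bar - b * x_bar


def predict(x):
    """Estimated sales for a city with x families, from the fitted line."""
    return a + b * x


print(round(b, 3), round(a, 3))
```

The deviation form gives a slope of about 0.443 and an intercept of about −4.885. A useful check is that the fitted line passes through the point of means, so `predict(75)` should return the mean of the sales column, 28.34.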
Exercise 2:
From the following data, find the regression equation of Y on X, and estimate Y when X = 15.
X 8 11 7 10 12 5 4 6
Y 11 30 25 44 38 25 20 27
Solution: $\hat{y} = 10.98 + 2.10x$; at $x = 15$, $\hat{y} = 42.48$
Exercise 3:
The following data give the gestational age (x, days) and birth weight (y, pounds) of the newborn
babies of 10 mothers.
X 265 250 270 255 260 258 255 265 248 250
Y 6.2 5.8 7.0 5.6 6.0 5.6 6.4 6.8 5.2 6.0
Fit a regression line of y on x and draw the fitted regression line. Estimate the birth weight of a
baby whose gestational age is 270 days.
Solution: $\hat{y} = -9.087 + 0.0588x$; at $x = 270$, $\hat{y} = 6.78916$
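Test of Significance of Regression:
To test the null hypothesis $H_0: \beta = 0$ (no linear regression of y on x) against $H_1: \beta \neq 0$, the test statistic is
$$t = \frac{b}{\sqrt{s^2 / SS(x)}}$$
which follows a t distribution with $n - 2$ degrees of freedom under $H_0$.
Where, $s^2 = \frac{1}{n-2}\left[SS(y) - b \cdot SP(xy)\right]$
SS(x) = $\sum x^2 - \frac{(\sum x)^2}{n}$
SS(y) = $\sum y^2 - \frac{(\sum y)^2}{n}$
SP(xy) = $\sum xy - \frac{(\sum x)(\sum y)}{n}$
b = $\frac{SP(xy)}{SS(x)}$
Reject $H_0$ if $|t| \geq t_{\alpha/2,\, n-2}$. Otherwise do not reject $H_0$.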
Example:
In a statistical investigation, the grade point average (y) and study hours (x) of a group of students
in one semester were recorded as follows:
X 12 17 16 14 12 15 13 14 15 16 15 12 12 14 12
Y 3.0 3.2 3.5 4.0 3.5 3.8 4.0 4.0 4.0 3.8 2.0 2.0 2.5 2.5 2.0
Fit a regression line of grade point average on study hours and test the significance of the regression.
Estimate the grade point average of a student who studies for 16 hours.
Solution:
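The fitted regression line is
$\hat{y} = a + bx$
Where
$a = \bar{y} - b\bar{x}$ and $b = \frac{SP(xy)}{SS(x)}$
SP(xy) = $\sum xy - \frac{(\sum x)(\sum y)}{n} = 673.2 - \frac{(209)(47.8)}{15} = 7.187$
SS(x) = $\sum x^2 - \frac{(\sum x)^2}{n} = 2953 - \frac{(209)^2}{15} = 40.933$
b = $\frac{SP(xy)}{SS(x)} = \frac{7.187}{40.933} = 0.176$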
Comment: $|t| < t_{0.025,\,13} = 2.16$. So we do not reject the null hypothesis at the α = 5% level of
significance. That is, the data do not provide sufficient evidence that the regression is significant,
so we cannot conclude that the grade point average is influenced by study hours.
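The test can be reproduced with a short Python sketch. It assumes the usual t statistic for a slope, t = b / √(s² / SS(x)) with s² = [SS(y) − b·SP(xy)] / (n − 2); that form is an assumption here, since the formula is not fully legible in these notes. The critical value 2.16 is the table value quoted in the comment, hard-coded rather than computed.

```python
import math

# Study hours (x) and grade point average (y) from the example
hours = [12, 17, 16, 14, 12, 15, 13, 14, 15, 16, 15, 12, 12, 14, 12]
gpa = [3.0, 3.2, 3.5, 4.0, 3.5, 3.8, 4.0, 4.0, 4.0, 3.8, 2.0, 2.0, 2.5, 2.5, 2.0]
n = len(hours)

# Corrected sums of squares and products
sp_xy = sum(x * y for x, y in zip(hours, gpa)) - sum(hours) * sum(gpa) / n
ss_x = sum(x * x for x in hours) - sum(hours) ** 2 / n
ss_y = sum(y * y for y in gpa) - sum(gpa) ** 2 / n

b = sp_xy / ss_x
s2 = (ss_y - b * sp_xy) / (n - 2)   # residual mean square (assumed form)
t = b / math.sqrt(s2 / ss_x)        # t statistic with n - 2 degrees of freedom

T_CRITICAL = 2.16                   # tabulated t value for 13 degrees of freedom
reject_h0 = abs(t) >= T_CRITICAL

print(round(sp_xy, 3), round(ss_x, 3), round(b, 3), round(t, 2), reject_h0)
```

Under these assumptions the statistic comes out at roughly 1.48, below the critical value of 2.16, which agrees with the decision not to reject the null hypothesis.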
The estimated grade point average of a student who studies for x = 16 hours is
$\hat{y} = 0.738 + 0.176 \times 16 = 3.55$
Exercise:
During a rainy season, it is observed that rainfall and temperature are related. To study the
relationship, data on rainfall (x, mm) and temperature (y) are recorded as follows:
X 5.5 6.8 7.0 4.2 2.3 8.0 9.0 4.5 5.6 7.2 5.0
Y 32.5 31.0 30.0 33.5 34.2 32.0 30.0 34.0 34.0 32.0 34.0
Fit a regression line of temperature (y) on amount of rainfall (x) and test the significance of the
regression. Draw the fitted regression line of y on x. Estimate the temperature if the rainfall is 5 mm.
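Standard Error of Estimate:
$$S_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$
The standard error of estimate can very easily be calculated with the help of the following formula:
$$S_{yx} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}; \qquad S_{xy} = \sqrt{\frac{\sum x^2 - a'\sum x - b'\sum xy}{n - 2}}$$
where a and b are the intercept and slope of the regression of y on x, and a' and b' are those of the regression of x on y.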
The standard error of estimate measures the accuracy of the estimated values. The smaller the
standard error of estimate, the closer the points lie to the regression line and the better the
estimates based on the equation of that line. If the standard error of estimate is zero, there is no
variation about the line and the correlation is perfect. Thus the standard error of estimate lets us
judge how well the regression line represents the average relationship between the two series.
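As a hedged illustration, the sketch below computes the standard error of estimate of y on x for the five-city data used earlier, once from the residuals about the fitted line and once from the column totals. It assumes the divisor n − 2; some textbooks divide by n instead, and the constant `DOF` can be changed accordingly. The variable names are illustrative.

```python
import math

# Five-city data from the earlier example
x = [70, 75, 80, 60, 90]
y = [25.2, 28.6, 30.2, 22.3, 35.4]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(u * v for u, v in zip(x, y))
sum_x2 = sum(u * u for u in x)
sum_y2 = sum(v * v for v in y)

# Least-squares line of y on x
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

DOF = n - 2   # assumption: divide by n - 2; some textbooks divide by n

# From the residuals about the fitted line
residual_ss = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
se_definition = math.sqrt(residual_ss / DOF)

# Shortcut: the same residual sum of squares from column totals
se_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / DOF)

print(round(se_definition, 4), round(se_shortcut, 4))
```

Both routes should give about 0.727 for this data, since ∑(y − ŷ)² and ∑y² − a∑y − b∑xy are the same quantity when a and b are the least-squares estimates.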
Coefficient of Determination:
The ratio of the unexplained variation to the total variation represents the proportion of variation in
Y that is not explained by regression on X. Subtraction of this proportion from 1.0 gives the
proportion of variation in Y that is explained by regression on X. The statistic used to express this
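proportion is called the coefficient of determination and is denoted by $r^2$. It may be written as
follows:
$$r^2 = 1 - \frac{\text{unexplained variation}}{\text{total variation}} = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$
$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$$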
The value of $r^2$ is the proportion of the variation in the dependent variable Y explained by
regression on the independent variable X.
For example, $r^2 = 0.88$ implies that 88% of the total variation in Y (the dependent variable) is
explained by the regression line (that is, by the variation in X).
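To make the two forms of the coefficient of determination concrete, the following sketch computes both for the eight (X, Y) pairs of the first worked example. The variable names are illustrative. The two forms agree because, for a least-squares line, total variation equals explained plus unexplained variation.

```python
# Data from the first worked example
x = [45, 42, 44, 43, 41, 45, 43, 40]
y = [40, 38, 36, 35, 38, 39, 37, 41]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares line of y on x
b = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y)) / sum((u - x_bar) ** 2 for u in x)
a = y_bar - b * x_bar
fitted = [a + b * u for u in x]

# Decomposition of the variation in y
total = sum((v - y_bar) ** 2 for v in y)
unexplained = sum((v - f) ** 2 for v, f in zip(y, fitted))
explained = sum((f - y_bar) ** 2 for f in fitted)

r2_from_unexplained = 1 - unexplained / total
r2_from_explained = explained / total

print(round(r2_from_unexplained, 3), round(r2_from_explained, 3))
```

For this data both forms give about 0.039, meaning that only about 3.9% of the variation in Y is explained by the regression on X, in contrast to a value such as 0.88.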