
CHAPTER THREE

Further Development And Analysis Of


The Classical Linear Regression Model

Multiple Linear Regression Analysis


(BMgt.3021)
AAU 11/16/2024
3.1 Introduction
❑ Simple linear regression allows us to make predictions
about one variable based on the information that is
known about another variable.
❑ A multiple (multivariable) linear regression model
extends this to several explanatory variables.
❑ The two-variable model studied extensively in the
previous chapter is often inadequate in practice.
❑ In our consumption–income example, for instance,
it was assumed implicitly that only income X
affects consumption Y.
❑ But economic theory is seldom so simple: besides
income, a number of other variables are also likely
to affect consumption expenditure.
❑ An obvious example is the wealth of the consumer.
❑ Similarly, the number of cars sold might plausibly
depend on the
1. price of cars
2. price of public transport
3. price of petrol
4. extent of the public's concern about global
warming
❑ Therefore, we need to extend our simple two-
variable regression model to cover models
involving more than two variables.

❑ Adding more variables leads us to the discussion of


multiple regression models, that is, models in which
the dependent variable, or regressand, Y depends on
two or more explanatory variables, or regressors.
❑ In this chapter we shall extend the simple linear
regression model to relationships with two
explanatory variables and consequently to
relationships with any number of explanatory
variables.
3.2 The Multiple Linear Regression - MLR
The relationship between a dependent variable and two or more
independent variables is linear in the parameters:

Yi = β0 + β1X1i + β2X2i + ··· + βKXKi + εi

where β0 is the population Y-intercept, β1, ..., βK are the population
slopes, Yi is the dependent (response) variable, X1, ..., XK are the
independent (explanatory) variables, and εi is the random error term
(its sample counterpart is the residual).
What changes as we move from simple to
multiple regression?
1. Potentially more explanatory power with
more variables;
2. The ability to control for other variables;
(and the interaction of various explanatory
variables: correlations and multicollinearity);
3. Harder to visualize: the fitted relationship is a
plane (or hyperplane) in three or more dimensions
rather than a line.
4. The R2 is no longer simply the square of the
correlation coefficient between Y and X.
Slope (βj):
Ceteris paribus, Y changes by βj units, on average,
for every 1-unit change in Xj.
Y-Intercept (β0):
The average value of Y when all the Xj's are zero
(may not be meaningful all the time).
A MLR model is linear in the parameters, and
may not be linear in the regressors.
Thus, the definition of MLR includes
polynomial regression.
Ex: Yi = β0 + β1X1i + β2X2i + β3X1i² + β4X1iX2i + εi
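❑ As a concrete illustration (not from the slides), the short Python sketch below simulates hypothetical data and fits a model that is linear in the parameters but includes a squared and an interaction term, as in the example above; only numpy is assumed, and all names and numbers are made up.

```python
import numpy as np

# Simulated (hypothetical) data: Y depends on X1, X2, X1^2 and X1*X2.
rng = np.random.default_rng(42)
n = 200
X1 = rng.uniform(1, 10, n)
X2 = rng.uniform(1, 10, n)
Y = 5 + 2.0 * X1 + 1.5 * X2 + 0.3 * X1**2 - 0.2 * X1 * X2 + rng.normal(0, 1, n)

# The model is linear in the parameters, so OLS still applies:
# the regressors are X1, X2, X1^2 and X1*X2, plus a constant column.
X = np.column_stack([np.ones(n), X1, X2, X1**2, X1 * X2])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)  # should be roughly [5, 2.0, 1.5, 0.3, -0.2]
```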
3.3 Relaxing the CLRM Basic Assumptions
 In order to specify our multiple linear
regression model and proceed with our analysis
of this model, some
assumptions are compulsory.

 But these assumptions are the same as in


the single explanatory variable model
developed earlier, except for the assumption of
no perfect multicollinearity.

❑ Further Explanations of the Assumptions:
1) The error term has zero mean E(ui) = 0
❑ This means that for each value of X, the random
variable u may assume various values, some
greater than zero and some smaller than zero.
❑ But if we consider all the positive and
negative values of u for any given value of X,
their average value is equal to zero.
❑ The positive and negative values of u cancel
each other out.
❑ Mathematically: E(Ui) = 0,
or E(εi|Xji) = 0 for all i = 1, 2, …, n and j = 1, …, K.
2). The error term has a constant variance
(Homoscedasticity)
❑ Var(ui) = E(ui²) = σ² for all i, or var(εi|Xji) = σ².
❑ For all values of X, the u's show the same
dispersion around their mean.
❑ In Fig. 2.c this assumption is shown by the fact that
the values u can assume lie within the same limits,
irrespective of the value of X: for one value of X, u can
assume any value within the range AB; for another, u can
assume any value within the range CD, which is equal to
AB; and so on.
❑ Mathematically: Var(Ui) = E[Ui − E(Ui)]² = E(Ui²) = σ²,
because E(Ui) = 0.
❑ This constant-variance assumption is called the
homoscedasticity assumption, and the constant
variance itself is called the homoscedastic variance.
3). No autocorrelation
❑ The error terms ui are statistically independent of
one another (no error autocorrelation).
❑ This means the value which the random term
assumes in one period does not depend on the
value which it assumed in any other period.
❑ Cov(ui, uj) = 0 for i ≠ j, or cov(εi, εs|Xji, Xjs) = 0.

Cov(ui, uj) = E{[ui − E(ui)][uj − E(uj)]} = E(ui uj) = 0
4). No relationship between the error and the
corresponding X variate
❑ All explanatory variables are uncorrelated with
the error term.
❑ This means there is no correlation between the
random variable and the explanatory variables.
❑ If two variables are unrelated, their covariance is
zero.
❑ Hence Cov(ui, Xi) = 0, or cov(εi, Xji) = 0.
❑ The errors are orthogonal to the X's.


5). The error term (ui) is normally distributed with
mean zero and variance σ² for all i

❑ This means the values of u (for each X) have a
bell-shaped symmetrical distribution about their
zero mean and constant variance σ².

❑ Normally distributed errors.

❑ That is, Ui ~ N(0, σ²) or εi ~ N(0, σ²).
6). Xj is non-stochastic and must take different
values.

❑ The independent variables (or predictors) are
treated as fixed and non-random.

❑ This means that their values are determined
outside the model and are not influenced by
random variation or probabilistic processes.
7). n > K+1
❑ Number of observations (n) > number of
parameters estimated (K+1).
❑ Number of parameters is K+1 in this case ( β0,
β1, …, βK )
❑ This ensures that the regression model can be
properly estimated, leading to reliable and
interpretable results.

8). No perfect multicollinearity
❑ This is the Additional Assumption under the
MLR
❑ The explanatory variables are not perfectly
linearly correlated.
❑ That is, no exact linear relation exists between any
subset of the explanatory variables.
❑ In the presence of a perfect (deterministic) linear
relationship between or among any set of the Xj's,
the impact of a single variable (βj) cannot be
identified.
Model Specification
❑ Before any equation can be estimated, the
model must be completely specified.
❑ Broadly speaking, specifying an econometric
equation consists of the following:

➢ Choosing the “correct” explanatory variables

➢ Choosing the “correct” functional form

Specification error can arise in a number of

ways:

❑ Omission of a relevant explanatory variable

❑ Inclusion of an irrelevant explanatory variable

❑ Adopting the wrong functional form

3.4 A Model with K Explanatory Variables
Population model:
Yi = β0 + β1X1i + β2X2i + ··· + βKXKi + εi

Sample (estimated) equations, one per observation:
Y1 = β̂0 + β̂1X11 + β̂2X21 + ··· + β̂KXK1 + e1
Y2 = β̂0 + β̂1X12 + β̂2X22 + ··· + β̂KXK2 + e2
Y3 = β̂0 + β̂1X13 + β̂2X23 + ··· + β̂KXK3 + e3
⋮
Yn = β̂0 + β̂1X1n + β̂2X2n + ··· + β̂KXKn + en

• β0 is the coefficient attached to the constant term (which we
called α before).

Y = Xβ̂ + e . . . written in matrix form, where

Y = [Y1, Y2, …, Yn]′ is the n×1 vector of observations on the
dependent variable,
X is the n×(K+1) design matrix whose i-th row is
[1, X1i, X2i, X3i, …, XKi],
β̂ = [β̂0, β̂1, …, β̂K]′ is the (K+1)×1 vector of estimated
coefficients, and
e = [e1, e2, …, en]′ is the n×1 vector of residuals, so that

e = Y − Xβ̂
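❑ A minimal sketch of this matrix notation (with simulated, hypothetical data): the OLS estimator is β̂ = (X′X)⁻¹X′Y and the residual vector is e = Y − Xβ̂. Only numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 3                                   # n observations, K regressors (plus a constant)
X_vars = rng.normal(size=(n, K))
beta_true = np.array([1.0, 0.5, -0.3, 2.0])    # [beta0, beta1, ..., betaK]
X = np.column_stack([np.ones(n), X_vars])      # n x (K+1) design matrix
Y = X @ beta_true + rng.normal(0, 0.5, n)

# OLS: beta_hat = (X'X)^{-1} X'Y ; residuals: e = Y - X beta_hat
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat
print(beta_hat)
print(e.sum())   # residuals sum to (approximately) zero when a constant is included
```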
3.5 A Model with Two Explanatory Variables

❑ Read further on the case of k-explanatory

variables.

❑ In order to understand the nature of the multiple
regression model easily, let us focus our analysis
on the case of two explanatory variables.
Estimation of the parameters
❑ The model Y = β0 + β1X1 + β2X2 + Ui is a multiple regression
with two explanatory variables.

❑ This is the population regression equation.

❑ Since the population regression equation is
unknown to any investigator, it has to be estimated
from sample data.
❑ Let us suppose that sample data have been used
to estimate the population regression equation.
❑ Given sample observations on Y, X1 and X2, we estimate it
using the method of least squares (OLS).
❑ The estimated sample regression equation is written as:
Ŷ = β̂0 + β̂1X1 + β̂2X2

❑ In deviation form we write
yi = Yi − Ȳ,
x1i = X1i − X̄1 and
x2i = X2i − X̄2
❑ The residual sum of squares is then
RSS = Σei² = Σ(Yi − Ŷi)² = Σ(yi − β̂1x1i − β̂2x2i)²
Estimation of the parameters of the two-explanatory-
variables model
❑ The model is: Y = β0 + β1X1 + β2X2 + Ui …………… (1)

❑ The expected value of the above model is called the
population regression equation.
❑ I.e. E(Y) = β0 + β1X1 + β2X2, since E(Ui) = 0 …………. (2)

where the βj are the population parameters.

β0 is referred to as the intercept, and
β1 and β2 are also sometimes known as the regression slopes.
Note that
β2, for example, measures the effect on E(Y) of a unit change in X2 when X1 is
held constant.
❑ Since the population regression equation is
unknown to any investigator, it has to be
estimated from sample data.
❑ Let us suppose that the sample data has been
used to estimate the population regression
equation.
❑ Assume that equation (2) has been estimated by the
sample regression equation, which we write as:
❑ Ŷ = β̂0 + β̂1X1 + β̂2X2 ………………..….(3)
Where
the β̂j are estimates of the βj, and
Ŷ is known as the predicted value of Y.
❑ Now it is time to state how (1) is estimated.
❑ Given sample observations on Y, X1 and X2, we
estimate (1) using the method of least squares
(OLS).
❑ Yi = β̂0 + β̂1X1i + β̂2X2i + ei ……………………. (4)
❑ Equation (4) is the sample relation between Y, X1 and X2.
❑ ei = Yi − Ŷi = Yi − β̂0 − β̂1X1i − β̂2X2i … ……………. (5)
❑ To obtain expressions for the least squares
estimators, form the sum of squared deviations
of the observed Yi's from the regression line.
❑ Then we partially differentiate Σei² with respect to
β̂0, β̂1 and β̂2 and set the partial derivatives equal to zero.
❑ Σei² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)² . . . . . . . . . . . . . . (6)

❑ ∂(Σei²)/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 …. . . . . . . . (7)

❑ ∂(Σei²)/∂β̂1 = −2ΣX1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 …. . . . . . (8)

❑ ∂(Σei²)/∂β̂2 = −2ΣX2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 . .…….(9)
❑ From equations (7) to (9), the multiple regression
model produces three normal equations:
❑ ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i . . . . . . . . . . . . (10)

❑ ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i . . . . . . . . . . . . (11)

❑ ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² . . . . . . . . . . . . (12)
❑ From (10) we obtain β̂0:

❑ β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 . . . . . . . . . . . . . . . . . . .. (13)

❑ Substituting (13) into (11), we get:

ΣX1iYi = (Ȳ − β̂1X̄1 − β̂2X̄2)ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i

⇒ ΣX1iYi − nȲX̄1 = β̂1(ΣX1i² − nX̄1²) + β̂2(ΣX1iX2i − nX̄1X̄2) . . . . (14)

❑ We know that
Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − nX̄Ȳ = Σxiyi
Σ(Xi − X̄)² = ΣXi² − nX̄² = Σxi²
❑ Using these identities, the normal equation (11), i.e.
equation (14), can be written in deviation form as follows:
❑ Σx1y = β̂1Σx1² + β̂2Σx1x2 … . . . . . . .. (15)

❑ Using the same procedure, if we substitute (13)
into (12), we get
❑ Σx2y = β̂1Σx1x2 + β̂2Σx2² … . . . . . . . . . . . . . (16)

❑ Recall that in the simple regression model Yi = α + βXi + Ui,
the OLS slope is β̂ = Σxiyi / Σxi²; equations (15) and (16) are
the two-regressor counterpart of that result.
❑ Σx1y = β̂1Σx1² + β̂2Σx1x2 . . . . . . . . . . . . . . (17)

❑ Σx2y = β̂1Σx1x2 + β̂2Σx2² . . . . . . . . . . . . . . . (18)

❑ β̂1 and β̂2 can easily be solved for using matrices.

❑ We can rewrite the above two equations (17
and 18) in matrix form as follows:

    [ Σx1²    Σx1x2 ] [ β̂1 ]   [ Σx1y ]
    [ Σx1x2   Σx2²  ] [ β̂2 ] = [ Σx2y ]   . . . . . (19)
❑ If we use Cramer's rule to solve the above
matrix equation, we obtain

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²) . . . . . . . . (20)

❑ β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²) . . . . . . . . . (21)

❑ The x's and y's in the above formulas are in
deviation form.

❑ We know that: Σx2y = ΣX2Y − nX̄2Ȳ and
Σx2² = ΣX2² − nX̄2²
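❑ The sketch below (with simulated, hypothetical data) computes the deviation-form sums and applies equations (20), (21) and (13) directly; only numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1 = rng.normal(10, 2, n)
X2 = rng.normal(5, 1, n)
Y = 4 + 0.8 * X1 - 1.2 * X2 + rng.normal(0, 1, n)

# Deviation-form sums
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()
Sx1y, Sx2y = (x1 * y).sum(), (x2 * y).sum()
Sx1x1, Sx2x2, Sx1x2 = (x1 * x1).sum(), (x2 * x2).sum(), (x1 * x2).sum()

# Equations (20) and (21)
denom = Sx1x1 * Sx2x2 - Sx1x2**2
b1 = (Sx1y * Sx2x2 - Sx2y * Sx1x2) / denom
b2 = (Sx2y * Sx1x1 - Sx1y * Sx1x2) / denom
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()   # equation (13)
print(b0, b1, b2)   # roughly 4, 0.8, -1.2
```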
❑ We can also express β̂1 and β̂2 in terms of the
covariances and variances of Y, X1 and X2:

❑ β̂1 = [Cov(X1, Y)·Var(X2) − Cov(X1, X2)·Cov(X2, Y)] /
      [Var(X1)·Var(X2) − [Cov(X1, X2)]²] . . . . (22)

❑ β̂2 = [Cov(X2, Y)·Var(X1) − Cov(X1, X2)·Cov(X1, Y)] /
      [Var(X1)·Var(X2) − [Cov(X1, X2)]²] . . . . (23)
❑ An unbiased estimator of the variance of the
errors, σ², is given by:

❑ σ̂² = Σei² / (n − k) . . . . (24)

❑ The variances of the estimated regression
coefficients β̂1 and β̂2 are estimated,
respectively, as:

❑ var(β̂1) = σ̂²·Σx2² / (Σx1²·Σx2² − (Σx1x2)²) . . . . . . . . . . . (25)

❑ var(β̂2) = σ̂²·Σx1² / (Σx1²·Σx2² − (Σx1x2)²) .. . . . . . . . . . . (26)
❑ Standard errors of the coefficients

❑ SE(β̂1) = √var(β̂1) -----------------------------(27)

❑ SE(β̂2) = √var(β̂2) -----------------------------(28)
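❑ A small helper (an illustrative sketch, not from the slides) that implements equations (24)–(28) for the two-regressor case; it assumes numpy arrays for Y, X1 and X2 and previously estimated slopes b1 and b2.

```python
import numpy as np

def two_regressor_se(Y, X1, X2, b1, b2):
    """Variances and standard errors of b1, b2 per equations (24)-(28)."""
    n, k = len(Y), 3                              # k = number of parameters
    x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()
    rss = ((y - b1 * x1 - b2 * x2) ** 2).sum()    # residual sum of squares
    sigma2 = rss / (n - k)                        # eq. (24)
    denom = (x1**2).sum() * (x2**2).sum() - ((x1 * x2).sum()) ** 2
    var_b1 = sigma2 * (x2**2).sum() / denom       # eq. (25)
    var_b2 = sigma2 * (x1**2).sum() / denom       # eq. (26)
    return np.sqrt(var_b1), np.sqrt(var_b2)       # eqs. (27)-(28)
```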
The coefficient of determination (R2): Two
explanatory variables case
❑ In the simple regression model, we introduced
R2 as a measure of the proportion of variation in
the dependent variable that is explained by
variation in the explanatory variable.

❑ In the multiple regression model the same measure
is relevant, and the same formulas are valid.

is relevant, and the same formulas are valid.

❑ But now we talk of the proportion of variation in


the dependent variable explained by all
explanatory variables included in the model.
❑ The coefficient of determination is:

❑ R² = ESS/TSS = 1 − RSS/TSS = 1 − Σei²/Σyi² . . . . (29)

ESS = Explained sum of squares

TSS = Total sum of squares

RSS = Residual sum of squares = Σei²
❑ In the present model of two explanatory variables:

Σei² = Σ(yi − β̂1x1i − β̂2x2i)²
     = Σei(yi − β̂1x1i − β̂2x2i)
     = Σeiyi − β̂1Σx1iei − β̂2Σeix2i
     = Σeiyi                      (since Σeix1i = Σeix2i = 0)
     = Σyi(yi − β̂1x1i − β̂2x2i)

i.e. Σei² = Σy² − β̂1Σx1iyi − β̂2Σx2iyi . . . . . (30)

⇒ Σy² = (β̂1Σx1iyi + β̂2Σx2iyi) + Σei²
   i.e. Total sum of squares (total variation)
   = Explained sum of squares (explained variation)
   + Residual sum of squares (unexplained variation)

⇒ R² = ESS/TSS = (β̂1Σx1iyi + β̂2Σx2iyi) / Σy² . . . . . (31)
❑ As in simple regression, R2 is also viewed as a
measure of the prediction ability of the model over
the sample period, or as a measure of how well the
estimated regression fits the data.
❑ The value of R² is also equal to the squared sample
correlation coefficient between Ŷt and Yt.
❑ Since the sample correlation coefficient measures
the linear association between two variables, if R²
is high, there is a close association between the
values of Yt and the values predicted by the model, Ŷt.
❑ In this case, the model is said to "fit" the data well.
❑ If R² is low, there is little association between the
values of Yt and the values predicted by the model, Ŷt,
and the model does not fit the data well.
Adjusted Coefficient of Determination (R̄²)
❑ One difficulty with R² is that it can be made large by
adding more and more variables, even if the variables
added have no economic justification.
❑ Algebraically, as variables are added the sum of
squared errors (RSS) goes down (it can remain
unchanged, but this is rare) and thus R² goes up.
❑ If the model contains n − 1 variables, then R² = 1.
❑ Manipulating the model just to obtain a high R² is
not wise.
❑ An alternative measure of goodness of fit, called the
adjusted R² and often symbolized as R̄², is usually reported
by regression programs.
❑ It is computed as:

R̄² = 1 − (Σei²/(n − k)) / (Σy²/(n − 1)) = 1 − (1 − R²)·(n − 1)/(n − k) . . . . . (32)

❑ This measure does not always go up when a
variable is added, because of the degrees-of-
freedom correction n − k.
❑ As the number of variables k increases, RSS
goes down, but so does n − k.
• The net effect on R̄² depends on whether the fall in RSS
outweighs the fall in n − k.

• While solving one problem, this corrected measure of goodness
of fit unfortunately introduces another one.

• It loses its interpretation; R̄² is no longer the percentage of variation
explained.

• This modified R² is sometimes used and misused as a device for
selecting the appropriate set of explanatory variables.
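❑ Equation (32) in code form (a trivial sketch; the example values are hypothetical):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Equation (32): adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Example: R^2 = 0.25 with n = 25 observations and k = 3 parameters
print(adjusted_r2(0.25, 25, 3))   # about 0.18
```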
3.5 Hypothesis Testing in Multiple Linear
Regression Model
❑ In multiple regression models we will undertake
two tests of significance.
❑ One is significance of individual parameters of
the model.
❑ This test of significance is the same as the tests
discussed in simple regression model.
❑ The second test is the overall significance of the
model.
3.5.1 Tests of individual significance

❑ We can use the t-test to test a hypothesis about

any individual partial regression coefficient.

❑ To illustrate consider the following example.

❑ Let Y = β̂0 + β̂1X1 + β̂2X2 + ei

A. H0: β1 = 0
   H1: β1 ≠ 0

B. H0: β2 = 0
   H1: β2 ≠ 0
❑ The null hypothesis in (A) states that, holding
X2 constant, X1 has no (linear) influence on Y.

❑ Similarly, hypothesis (B) states that holding X1

constant, X2 has no influence on the dependent

variable Yi.

❑ To test these null hypotheses, we will use the
following tests:
The student's t-test:
❑ We compute the t-ratio for each β̂i:

t* = (β̂i − βi) / SE(β̂i) ~ t(n−k)

Where

❑ n is the number of observations,

❑ k is the number of parameters, and

❑ t* is the computed t-value.
❑ If we have 3 parameters, the degrees of freedom
will be n − 3.
❑ So t* = (β̂2 − β2)/SE(β̂2) with n − 3 degrees of freedom.

❑ Under the null hypothesis β2 = 0, the t* becomes:

t* = β̂2 / SE(β̂2)

❑ If |t*| < t (tabulated), we accept the null hypothesis,

❑ i.e. we conclude that β̂2 is not significant.

❑ Hence the regressor does not appear to
contribute to the explanation of the variations
in Y.
❑ If |t*| > t (tabulated), we reject the null hypothesis
and accept the alternative one:
❑ β̂2 is statistically significant.
❑ Thus, the greater the value of t*, the stronger
the evidence that β̂2 is statistically significant.
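❑ A sketch of this individual t-test in code; the critical value is taken from scipy rather than a printed table, and the illustrative numbers are those of the worked example in Section 3.6 below (they are not new data).

```python
from scipy import stats

def t_test_coefficient(beta_hat, se, n, k, alpha=0.05):
    """Two-tailed t-test of H0: beta_j = 0 with n - k degrees of freedom."""
    t_star = beta_hat / se                        # computed t-ratio
    t_crit = stats.t.ppf(1 - alpha / 2, n - k)    # tabulated critical value
    return t_star, t_crit, abs(t_star) > t_crit   # True -> reject H0

print(t_test_coefficient(0.278, 0.370, n=25, k=3))  # roughly (0.75, 2.07, False)
```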
3.5.2 Test of Overall Significance
❑ Throughout the previous section we were
concerned with testing the significance of the
estimated partial regression coefficients
individually.

❑ I.e. under the separate hypothesis that each of


the true population partial regression coefficient
was zero.

❑ In this section we extend this idea to a joint test
of the relevance of all the included explanatory
variables.
❑ Now consider the following:
Y = β0 + β1X1 + β2X2 + ……… + βkXk + Ui

H0: β1 = β2 = β3 = ………… = βk = 0

H1: at least one of the βk is non-zero

❑ This null hypothesis is a joint hypothesis that
β1, β2, …, βk are jointly or simultaneously
equal to zero.
❑ A test of such a hypothesis is called a test of the
overall significance of the observed or estimated
regression line, that is, of whether Y is linearly
related to X1, X2, …, Xk.
❑ To conduct a test of overall significance, the F-
distribution with k − 1 and n − k degrees of freedom
for the numerator and denominator, respectively, is
used.

❑ Computed F-value:

F = (ESS/(k − 1)) / (RSS/(n − k))   or   F = (R²/(k − 1)) / ((1 − R²)/(n − k)) … . …… (33)
❑ If the null hypothesis is not true, then ESS becomes
large relative to RSS, implying that the constraints
placed on the model by the null hypothesis have a
large effect on the ability of the model to fit the
data, and the value of F tends to be large.

❑ Thus, we reject the null hypothesis if the computed


value of F becomes too large.

❑ This value is compared with the critical value of F
which leaves a probability of α in the upper tail of the
F-distribution with k − 1 and n − k degrees of freedom.
❑ If the computed value of F is greater than the
critical value F(k−1, n−k), then H0 is
rejected: the parameters of the model are jointly
significant, or the dependent variable Y is
linearly related to the independent variables
included in the model.
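❑ A sketch of the overall F-test based on equation (33); scipy supplies the critical value, and the illustrative numbers again anticipate the worked example below.

```python
from scipy import stats

def overall_f_test(r2, n, k, alpha=0.05):
    """Equation (33): F = (R^2/(k-1)) / ((1-R^2)/(n-k)), compared with F(k-1, n-k)."""
    f_star = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    f_crit = stats.f.ppf(1 - alpha, k - 1, n - k)
    return f_star, f_crit, f_star > f_crit        # True -> reject H0 (jointly significant)

print(overall_f_test(0.25, 25, 3))   # roughly (3.67, 3.44, True)
```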
3.6 Application of Multiple Linear Regression

❑ Ex 1: Consider the data summarized below (from
Table 2.1) and fit the linear function:

Y = α̂ + β̂1X1 + β̂2X2 + ei
❑ On the basis of the information given below, answer the
following questions.
ΣX1² = 3200    ΣX1X2 = 4300    ΣX2 = 400
ΣX2² = 7300    ΣX1Y = 8400     ΣX2Y = 13500
ΣY = 800       ΣX1 = 250       n = 25
ΣYi² = 28,000

A. Find the OLS estimates of the slope coefficients.

B. Compute the variances and standard errors of β̂1 and β̂2.

C. Test the significance of the slope parameter β1 at the 5% level of significance.

D. Compute R² and R̄² and interpret the result.

E. Test the overall significance of the model.


Solution:
A).
❑ Since the above model is a two-explanatory-
variable model, we can estimate β̂1 and β̂2
using the formulas in equations (20) and (21):
❑ I.e.
β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)

β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
❑ Since the x's and y's in the above formulas are in
deviation form, we have to find the corresponding
deviation forms of the given values.

❑ We know that (with X̄1 = 10, X̄2 = 16, Ȳ = 32):
Σx1y = ΣX1Y − nX̄1Ȳ = 8400 − 25(10)(32) = 400
Σx2y = ΣX2Y − nX̄2Ȳ = 13500 − 25(16)(32) = 700
Σx2² = ΣX2² − nX̄2² = 7300 − 25(16)² = 900
Σx1x2 = ΣX1X2 − nX̄1X̄2 = 4300 − 25(10)(16) = 300
Σx1² = ΣX1² − nX̄1² = 3200 − 25(10)² = 700
❑ Now we can compute the parameters.

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
   = [(400)(900) − (700)(300)] / [(700)(900) − (300)²] = 0.278

β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
   = [(700)(700) − (400)(300)] / [(700)(900) − (300)²] = 0.685
❑ The intercept parameter can be computed using
the following formula (equation 13):
α̂ = Ȳ − β̂1X̄1 − β̂2X̄2
  = 32 − (0.278)(10) − (0.685)(16)
  = 18.26

B). var(β̂1) = σ̂²·Σx2² / (Σx1²·Σx2² − (Σx1x2)²)   (eq. 25)

σ̂² = Σei² / (n − k)
❑ Where k is the number of parameters (eq. 24);
in our case k = 3, so σ̂² = Σei² / (n − 3).

❑ Σei² = Σy² − β̂1Σx1y − β̂2Σx2y   (eq. 30),
where Σy² = ΣY² − nȲ² = 28,000 − 25(32)² = 2,400
Σei² = 2400 − 0.278(400) − 0.685(700)
     = 1809.3
❑ σ̂² = Σei² / (n − 3) = 1809.3 / 22 = 82.24

❑ ⇒ var(β̂1) = (82.24)(900) / [(700)(900) − (300)²] = 0.137

❑ SE(β̂1) = √var(β̂1) = √0.137 = 0.370   (eq. 27)

❑ var(β̂2) = σ̂²·Σx1² / (Σx1²·Σx2² − (Σx1x2)²)   (eq. 26)
         = (82.24)(700) / [(700)(900) − (300)²] = 0.1067

❑ SE(β̂2) = √var(β̂2) = √0.1067 = 0.327   (eq. 28)
C.
❑ β̂1 can be tested using the student's t-test:
H0: β1 = 0
H1: β1 ≠ 0
❑ This is done by comparing the computed value
of t with the critical value of t, which is obtained
from the table at the α/2 level of significance and n − k
degrees of freedom.

❑ Hence: t* = β̂1 / SE(β̂1) = 0.278 / 0.370 = 0.751
❑ The critical value of t from the t-table at α/2 = 0.05/2 = 0.025
level of significance and 22 degrees of freedom is 2.074.
❑ tc = 2.074
❑ t* = 0.751
❑ ⇒ |t*| < tc
Where:
➢ tc is the critical value of t and

➢ t* is the calculated value of t.

❑ The decision rule: if |t*| < tc, accept the null hypothesis
that β1 is equal to zero and reject the alternative
hypothesis that β1 is different from zero.
❑ The decision is: accept H0.
❑ The conclusion is that β̂1 is statistically insignificant,
or the sample we used to estimate β̂1 is drawn
from a population of Y and X1 in which there is
no relationship between Y and X1 (i.e. β1 = 0).
D.
❑ R² can be easily calculated using the following
equation:
R² = ESS/TSS = 1 − RSS/TSS

❑ We know that RSS = Σei²   (eq. 29 and 30)

❑ And TSS = Σy², and
ESS = Σŷ² = β̂1Σx1y + β̂2Σx2y
❑ And

Σy² = β̂1Σx1iyi + β̂2Σx2iyi + Σei²
(TSS = ESS + RSS)

❑ TSS = 0.278(400) + 0.685(700) + 1809.3 = 2,400
❑ For the two-explanatory-variable model:
R² = 1 − RSS/TSS = 1 − 1809.3/2400 = 0.246 ≈ 0.25

❑ About 25% of the total variation in Y is explained by
the regression line Ŷ = 18.26 + 0.278X1 + 0.685X2,
or by the explanatory variables (X1 and X2).
❑ Adjusted R² = 1 − (Σei²/(n − k)) / (Σy²/(n − 1)) = 1 − (1 − R²)(n − 1)/(n − k)

             = 1 − (1 − 0.246)(24)/22

             = 0.178
E.
❑ Let's first set up the joint hypothesis as
H0: β1 = β2 = 0   against
H1: at least one of the slope parameters is
different from zero.
❑ The joint hypothesis is tested using the F-
test given below:

F*(k−1, n−k) = (ESS/(k − 1)) / (RSS/(n − k))
            = (R²/(k − 1)) / ((1 − R²)/(n − k))

❑ From (D), R² ≈ 0.25 and k = 3:

F* = (0.25/2) / ((1 − 0.25)/22)

F*(2, 22) = 3.67
❑ This is the computed value of F.
❑ Let’s compare this with the critical value F
at 5% level of significance and with k-1 and
n-k degrees of freedom in the numerator
and denominator respectively.
❑ F (2,22) at 5% level of significance = 3.44.
❑ F* = 3.67
❑ Fc(2,22) = 3.44
❑ Where F* is the calculated value of F and Fc is the
critical value of F.
❑ Since F* > Fc, the decision is to reject H0 and
accept H1.
❑ We can say that the model is significant,
❑ i.e. the dependent variable is, at least,
linearly related to one of the explanatory
variables.
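❑ The short script below (an optional check, not part of the original slides) reproduces the whole example from the given summary statistics. Because the slides round R² to 0.25 before computing F, they report F* = 3.67, while the unrounded figures give about 3.6; the conclusion is the same either way.

```python
import math

# Summary statistics given in the example (n = 25)
n = 25
SX1, SX2, SY = 250, 400, 800
SX1X1, SX2X2, SYY = 3200, 7300, 28000
SX1X2, SX1Y, SX2Y = 4300, 8400, 13500

X1bar, X2bar, Ybar = SX1 / n, SX2 / n, SY / n

# Deviation-form sums
sx1y = SX1Y - n * X1bar * Ybar           # 400
sx2y = SX2Y - n * X2bar * Ybar           # 700
sx1x1 = SX1X1 - n * X1bar**2             # 700
sx2x2 = SX2X2 - n * X2bar**2             # 900
sx1x2 = SX1X2 - n * X1bar * X2bar        # 300
syy = SYY - n * Ybar**2                  # 2400

denom = sx1x1 * sx2x2 - sx1x2**2
b1 = (sx1y * sx2x2 - sx2y * sx1x2) / denom            # ~0.278
b2 = (sx2y * sx1x1 - sx1y * sx1x2) / denom            # ~0.685
b0 = Ybar - b1 * X1bar - b2 * X2bar                   # ~18.26

rss = syy - b1 * sx1y - b2 * sx2y                     # ~1809
sigma2 = rss / (n - 3)                                # ~82.2
se_b1 = math.sqrt(sigma2 * sx2x2 / denom)             # ~0.37
se_b2 = math.sqrt(sigma2 * sx1x1 / denom)             # ~0.33
t_b1 = b1 / se_b1                                     # ~0.75

r2 = 1 - rss / syy                                    # ~0.25
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 3)             # ~0.18
f_star = (r2 / 2) / ((1 - r2) / (n - 3))              # ~3.6

print(b0, b1, b2, se_b1, se_b2, t_b1, r2, adj_r2, f_star)
```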
3.7 Dummy Variables
❑ In general, the explanatory variables in any
regression analysis are assumed to be
quantitative in nature.

❑ Ex: the variables like temperature, distance, age


etc. are quantitative in the sense that they are
recorded on a well-defined scale.

❑ In many applications, the variables may not be


defined on a well-defined scale, and they are
qualitative in nature.
❑ Ex: the variables like sex (male or female),
colour (black, white), nationality, employment
status (employed, unemployed) are defined on a
nominal scale.
❑ Such variables do not have any natural scale of
measurement.
❑ Such variables usually indicate the presence or
absence of a “quality” or an attribute like
employed or unemployed, graduate or non-
graduate, smokers or non-smokers, yes or no,
acceptance or rejection, so they are defined on
a nominal scale.
❑ Such variables can be quantified by artificially
constructing variables that take the values
1 and 0, where "1" usually indicates the
presence of the attribute and "0" usually
indicates its absence.
Ex:
❑ “1” indicates that the person is male and “0”
indicates that the person is female.
❑ Similarly, “1” may indicate that the person is
employed and then “0” indicates that the person
is unemployed.
❑ Such variables classify the data into mutually
exclusive categories.
❑ These variables are called indicator variables
or dummy variables.
❑ In econometrics, particularly in regression
analysis, a dummy variable is one that takes
only the value 0 or 1 to indicate the absence or
presence of some categorical effect that may be
expected to shift the outcome.
❑ Usually, the indicator variables take on the
values 0 and 1 to identify the mutually exclusive
classes of the explanatory variables.

How do we incorporate binary information into
regression models?
❑ In the simplest case, we just add it as an independent
variable in the equation.
❑ Ex: Consider the following simple model of hourly
wage determination:
Wage = β0 + β1female + β2educ + u
❑ In the above model (equation), only two observed factors
affect wage:
➢ gender and
➢ education.
❑ Female = 1 when the person is female, and
female = 0 when the person is male
❑ The parameter β1 has the following
interpretation:
✓ β1 is the difference in hourly wage between
females and males, given the same amount of
education (and the same error term u).
✓ The coefficient β1 determines whether there is
discrimination against women:

➢ If β1 < 0, then for the same level of other


factors, women earn less than men on average.
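❑ A minimal simulated sketch of this wage equation; the data, coefficients and variable names are hypothetical and serve only to show how the dummy enters the design matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
female = rng.integers(0, 2, n)            # dummy: 1 = female, 0 = male
educ = rng.uniform(8, 20, n)              # years of education
wage = 2.0 - 1.5 * female + 0.6 * educ + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), female, educ])
b0, b1, b2 = np.linalg.lstsq(X, wage, rcond=None)[0]
print(b1)   # estimated female/male wage gap at a given education level (~ -1.5 here)
```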
3.8 Violation of CLRM Assumptions
❑ Recall the 5 assumptions of the classical linear
regression model (CLRM).
❑ One additional assumption was made in the
multiple linear regression model: no
perfect multicollinearity among the explanatory
variables.
❑ The assumptions were required to show that:
a) the estimation technique, ordinary least squares
(OLS), had a number of desirable properties,
b) and also that hypothesis tests regarding the
coefficient estimates could validly be conducted.
1. Heteroscedasticity
❑ Heteroscedasticity occurs when the error
term has a non-constant variance.
❑ If the errors do not have a constant variance,
we say that they are heteroscedastic.
2. Autocorrelation
❑ Autocorrelation occurs when the errors are
correlated with one another.
❑ This is essentially the same as saying there is a
pattern in the errors.
3. Perfect Multicollinearity
❑ Perfect Multicollinearity occurs when two or
more independent variables have an
exact linear relationship between them.
❑ This problem occurs when the explanatory
variables are very highly correlated with each
other.
❑ One of the assumptions of CLRM is that there
is no exact linear relationship between the
independent variables.
❑ If two explanatory variables are significantly
related, then the OLS computer program will
find it difficult to distinguish the effects of one
variable from the effects of the other.
❑ The more highly correlated two (or more)
independent variables are, the more difficult it
becomes to accurately estimate the coefficients
of the true model.
❑ If two variables move identically, then there is
no hope of distinguishing between their
impacts.
❑ But if the variables are only roughly correlated,
then we still might be able to estimate the two
effects accurately enough for most purposes.

4. Non-normality
❑ If the error terms are not normally distributed,
inferences about the regression coefficients
(using t-tests) and the overall equation (using
the F-test) will become unreliable.

❑ However, as long as the sample sizes are large


(namely the sample size minus the number of
estimated coefficients is greater than or equal
to 30) and the error terms are not extremely
different from a normal distribution, such tests
are likely to be robust.
❑ Whether the error terms are normally
distributed can be assessed by using methods
like the normal probability plot.
❑ As a more formal check for non-normal errors,
one can estimate the values of skewness and
kurtosis.
❑ These values can be obtained from the
descriptive statistics.
❑ A normal distribution is not skewed and is
defined to have a coefficient of kurtosis of 3.
❑ Since the kurtosis of the normal distribution is 3,
its excess kurtosis (kurtosis − 3) is zero.
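❑ As an illustration, the sketch below computes skewness and excess kurtosis for a set of residuals (here simulated) and also runs the Jarque–Bera test, a common formal normality test built from these two quantities (the test itself is not covered in the slides).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
resid = rng.normal(0, 1, 500)              # stand-in for regression residuals

skew = stats.skew(resid)                   # ~0 for normal errors
excess_kurt = stats.kurtosis(resid)        # kurtosis - 3; ~0 for normal errors
jb_stat, p_value = stats.jarque_bera(resid)  # formal test based on skewness and kurtosis
print(skew, excess_kurt, p_value)          # large p-value: no evidence of non-normality
```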
Consequences of Violation of CLRM Assumptions
❑ If an assumption is violated but this fact is
ignored and the researcher proceeds regardless,
the model could encounter any combination of
the following problems:
a) the coefficient estimates are wrong;
b) the associated standard errors are wrong;
c) the distributions that we assumed for the test
statistics will be inappropriate;
d) OLS will no longer be the best linear unbiased
estimator (BLUE).
THANK YOU FOR YOUR ATTENTION, AND
MAY YOU BE AN EFFECTIVE BUSINESS
MANAGER IN ALL YOUR JOURNEY!

Questions?
11/16/2024 AAU
