CHAPTER TWO
THE CLASSICAL LINEAR REGRESSION MODEL
Prepared by: Etebark M.
Terminology and Notation
02/01/24 Prepared by: Etebark M.
Dependent variable:
the variable that is influenced by the independent
variable(s).
For example, in a Multiple Linear Regression Model
(MLRM), output is influenced by independent variables
such as fertilizer cost, labor cost, pesticide cost, etc.
Independent variable:
a variable whose values do not depend on the other
variables in the model, but which influences the dependent variable.
Examples include fertilizer cost, pesticide cost, etc.
Regression Analysis
Economic theories are mainly concerned with the relationships
among various economic variables.
Example: demand theory, supply theory, consumption
theory, etc.
These relationships, when phrased in mathematical terms, can
predict the effect of one variable on another.
The functional relationship among these variables defines the
dependence of one variable on the other variable(s) in a
specific form.
The specific functional form may be
linear, quadratic, logarithmic, exponential, hyperbolic, or
any other form.
The objective of linear regression analysis is to estimate
and/or predict the mean or average value of the dependent
variable on the basis of the known or fixed values of the
explanatory variables.
That is, to estimate the population regression function (PRF)
from the sample regression function (SRF) as accurately as
possible.
2.1 The Concept of Regression Analysis
Regression is the most important tool of econometricians.
Regression analysis:
It is concerned with the study of the dependence of
one variable on one or more other variables,
with a view to estimating and/or predicting the
(population) mean or average value of the dependent
variable in terms of the known or fixed (in repeated sampling)
values of the explanatory variables.
Simple, or two-variable, regression analysis: if we are studying
the dependence of a variable on only a single explanatory
variable.
E.g. consumption expenditure on real income
Multiple regression analysis: if we are studying the dependence
of one variable on more than one explanatory variable.
Note: in two-variable regression there is only one explanatory variable,
whereas in multiple regression there is more than one explanatory variable.
Simple Linear Regression:
Represented by a single-equation regression model
Y = f(X)
The dependent variable is expressed as a function of
only a single explanatory variable.
The causal relationship between the variables flows in one
direction only.
Example:
Multiple Linear Regression:
Dependent variable explained by more than one explanatory
variable.
Example: Y = f(X, Z, K, O)
• Regression equation of Y on X.
E.g. Consumption = f(Income, Wage rate)
Variation in consumption (C) = systematic variation + random variation.
Note: a frequent objective in research is the specification of a
functional relationship between two variables.
E.g. Y = f (x)
Y: explained variable, dependent variable, or predicted
variable.
X: explanatory variable, independent variable, control
variable, or regressor.
2.2. Stochastic and Non-stochastic Relationships
Econometricians hold that relationships between economic variables (X and Y) are
generally inexact (stochastic).
A. Non-Stochastic Model:
A relationship between X and Y, characterized as Y = f(X) is
said to be deterministic or non-stochastic if for each value of the
independent variable (X) there is one and only one
corresponding value of dependent variable (Y).
Example:
Without the error/or disturbance term (u), the relationship is said
to be exact/deterministic, otherwise stochastic or inexact.
B. Stochastic /Inexact Relationship
A relationship between X and Y is said to be stochastic if for a
particular value of X there is a whole probabilistic distribution of
values of Y.
In such a case, for any given value of X, the dependent variable
Y assumes some specific value only with some probability.
Stochastic model: a model in which the dependent variable is
determined not only by the explanatory variable(s) included in the model
but also by other variables which are not included in the model.
E.g
The existence of the disturbance term is justified on the following grounds:
Omission of other variables
Measurement error and data collection difficulties
Randomness in human behavior (humans are not machines
that will do as instructed)
Imperfect specification of the model
Poor proxy variables
Note: In regression analysis we are concerned with a stochastic
or statistical relationship, not with a deterministic (non-stochastic,
mathematical) relationship.
Example: assume a supply function.
The supply of a certain commodity depends on its price (other
determinants taken to be constant). If the function is linear,
the relationship can be put as:
Q = f(P) = α + βP
For a particular value of P, there is only one corresponding value
of Q.
This is a deterministic (non-stochastic) relationship, since for each
price there is always only one corresponding quantity supplied.
It implies that all the variation in Q is due solely to changes in P, and that there
are no other factors affecting the dependent variable.
If this were true, all the price-quantity pairs, if plotted
on a two-dimensional plane, would fall on a straight line.
However, if we gather observations on the quantity actually
supplied in the market at various prices and we plot them on a
diagram we see that they do not fall on a straight line.
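The contrast between an exact and an inexact supply relationship can be sketched in a few lines of Python. The parameter values, the normal disturbance and its spread are all hypothetical, chosen only to illustrate the idea.

```python
import random

# Hypothetical supply parameters, chosen only for illustration.
ALPHA, BETA = 10.0, 2.0


def supply_exact(price):
    """Deterministic supply: exactly one quantity for each price."""
    return ALPHA + BETA * price


def supply_observed(price, rng, sigma=3.0):
    """Stochastic supply: the exact part plus a random disturbance u."""
    u = rng.gauss(0.0, sigma)
    return supply_exact(price) + u


rng = random.Random(0)  # fixed seed so the run is repeatable
price = 5.0

exact_draws = [supply_exact(price) for _ in range(5)]
observed_draws = [supply_observed(price, rng) for _ in range(5)]

print("exact:   ", exact_draws)
print("observed:", [round(q, 2) for q in observed_draws])
```

At the same price the exact function returns the same quantity every time, while the observed quantities scatter around it, which is why plotted market data do not fall on a straight line.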
Note: The deviation of the observations from the line may be
attributed to several factors:
Omission of variables from the function
Random behavior of human beings
Imperfect specification of the mathematical form of
the model
Error of aggregation
Error of measurement
To take into account the above sources of error, we introduce into
econometric functions a random variable, usually
denoted by the letter ‘u’ or ‘ε’,
which is called the error term, random disturbance, or
stochastic term of the function.
It is so called because u is supposed to ‘disturb’ the exact linear
relationship which is assumed to exist between X and Y.
By introducing this random variable into the function, the model is
rendered stochastic, of the form:
Yi = α + βXi + ui
Thus a stochastic model is a model in which the dependent
variable is not only determined by the explanatory variable(s)
included in the model but also by others which are not included
in the model.
In order to take all these sources of error into account, we
introduce the stochastic/random disturbance term into our
econometric models and hence the complete simple econometric
model becomes:
2.3. Simple Linear Regression model.
Economic theories are mainly concerned with the relationships
among various economic variables.
A stochastic relationship with one explanatory variable is
called a simple linear regression model.
A simple linear regression model:
It is a relationship between two variables related in a
linear form.
The true relationship which connects the variables
involved is split into two parts:
i. a part represented by a line and
ii. a part represented by the random term ‘u’.
The scatter of observations represents the true relationship
between Y and X.
The line represents the exact part of the relationship and the
deviation of the observation from the line represents the random
component of the relationship.
These points diverge from the regression line by u1, u2, …, un.
The first component is the part of Y explained by changes in X, and
the second is the part of Y not explained by X; that is to say, the
part of the change in Y that is due to the random influence of ui.
Definition of the simple linear regression model
“Explains variable y in terms of variable x”:
y = β0 + β1x + u
2.3.1 Assumptions of the Classical Linear
Stochastic Regression Model.
The objective of regression analysis is not only to estimate the
unknown parameters (the coefficients, β’s) but also to draw inferences about their true values.
Yi = f(Xi) + ui = β0 + β1Xi + ui
The classical econometricians made important assumptions in their analysis of
regression.
A. Some assumptions relate to Y and X.
B. Some assumptions relate to the relationships among the X’s.
C. Some assumptions relate to u.
The most important of these assumptions are discussed as
follows.
Assumption 1:
A. The model is linear in parameters
The model should be linear in the parameters regardless of
whether the explanatory and the dependent variables are linear or
not.
This is because if the model is non-linear in the parameters, it is difficult to
estimate them: their values are unknown, and all we are given
is data on the dependent and independent variables.
Check yourself whether the following models satisfy the above
assumption or not.
Linearity in variables means that the equation plots as a
straight line.
Linearity in parameters means that the parameters are raised
to their first degree.
Note: Linear regression means linear in parameter but it
may not be linear in the explanatory variable.
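As an illustration of a model that is linear in the parameters but not in the explanatory variable, the sketch below fits Y = β0 + β1X² by treating Z = X² as the regressor. The data are hypothetical and generated without error. The helper `ols_simple` is an assumed name that applies the standard two-variable least squares formulas.

```python
def ols_simple(x, y):
    """Intercept and slope from the standard two-variable least squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical exact data from Y = 3 + 2 * X**2 (non-linear in X).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 + 2.0 * xi ** 2 for xi in x]

# Transform the variable, not the parameters: Z = X**2.
z = [xi ** 2 for xi in x]
b0, b1 = ols_simple(z, y)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Because β0 and β1 still enter in the first degree, the usual linear formulas recover them once X has been transformed.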
Assumption 2:
B. Ui is a Random Real Variable
This means that the value which u may assume in any one
period depends on chance;
it may be positive,
negative or
zero.
Every value has a certain probability of being assumed by u in
any particular instance.
Assumption 3:
C. Zero Mean Value of the Error term
That is, given the value of X, the mean or expected value
of the disturbance term is zero.
Technically, the conditional mean value of u is zero.
Mathematically, E(ui | Xi) = 0.
Assumption 4:
D. The variance of the random variable (U) is constant in each
period (the assumption of homoscedasticity)
Equal variance of the error term: given the value of X, the
variance of the error term (u) is the same for all observations.
Mathematically, Var(ui | Xi) = σ².
The variation of each ui is the same at all values of the explanatory
variable.
The dispersion of the disturbance is the same.
This is called the homoscedasticity assumption, and
the constant variance itself is called the homoscedastic variance.
Assumption 5:
E. The Random Variable (U) has a Normal Distribution
This means the values of u (for each X) have a bell-shaped,
symmetrical distribution around their zero mean, with constant
variance: ui ~ N(0, σ²).
Normality Test
Assumption 6:
F. The random terms of different observations (Ui ,Uj)
are independent.
(The assumption of no autocorrelation)
This means the value which the random term assumed in one
period does not depend on the value which it assumed in any
other period.
Assumption 7:
G. The random variable (U) is independent of the explanatory variables.
There is no correlation between the random variable and the
explanatory variable.
If two variables are unrelated their covariance is zero.
Assumption 8:
H. The explanatory variables are measured without error
Y = f(X) + Ui
U absorbs the influence of omitted variables and possibly errors
of measurement in the y’s.
i.e., we will assume that the regressors are error free, while y
values may or may not include errors of measurement.
Additionally:
The regression model is correctly specified.
There is no perfect multicollinearity (this applies in the case of the
multiple linear regression model).
The number of observations (n) must be greater than the
number of parameters (k) to be estimated (in multiple linear
regression).
Assumptions about the dependent variable:
We have two assumptions about the dependent variable:
o The dependent variable Yi is normally distributed, and
o Successive values of the dependent variable are independent.
Example 1:
y = a + bx is a linear relationship between x and y.
y = a + bx² is a non-linear relationship between x and y.
Example:
Variable Y    Variable X
2             1
4             2
6             3
8             4
a. Find the function.
b. Identify the slope and intercept of the function.
c. Interpret the slope and intercept of the function.
First find the intercept and slope of the function.
Write the mathematical relationship between x and y.
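One way to work through this example is sketched below: take the slope from two of the points, back out the intercept, and check that the resulting line reproduces every observation. This is only an illustration of the arithmetic, not a general estimation method.

```python
# Data from the example: four (X, Y) pairs.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

# Slope from the first two points; intercept from the first point.
slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]

# An exact relationship must reproduce every observation.
fits_all = all(yi == intercept + slope * xi for xi, yi in zip(x, y))

print(f"Y = {intercept} + {slope} X, exact fit: {fits_all}")
```

Here every point lies on the same line, so the relationship is exact: Y rises by two units for each unit increase in X, and Y is zero when X is zero.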
Example 2: you are given data on the saving and income of five
households as follows:
i    Y = saving    X = income
1    200           500
2    100           300
3    600           1000
4    700           800
5    400           450
a. Write the function.
b. What do you observe?
c. Is the slope the same?
d. Can we solve the above equations using mathematics?
The slope varies, but we need to establish a linear relationship between x and y.
The relationship is not exact,
so mathematics alone fails to do so.
But econometrics can do it. How?
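The claim that the slope varies can be checked directly. The sketch below computes the slope between each pair of consecutive households in the table; taking the households in table order is an arbitrary choice made only for illustration.

```python
# Saving and income data from Example 2.
income = [500, 300, 1000, 800, 450]
saving = [200, 100, 600, 700, 400]

# Slope between each pair of consecutive households.
slopes = []
for i in range(len(income) - 1):
    dx = income[i + 1] - income[i]
    dy = saving[i + 1] - saving[i]
    slopes.append(dy / dx)

print([round(s, 3) for s in slopes])
print("one common slope:", len(set(slopes)) == 1)
```

The slopes differ from pair to pair (one is even negative), so no single exact line passes through all five observations.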
2.2 Multivariate Case of CLRM
In simple regression we study the relationship between a
dependent variable and a single explanatory (independent)
variable; that is, we assume that the dependent variable is influenced by
only one explanatory variable.
However, many economic variables are influenced by
several factors or variables.
For instance:
In studies of investment decisions we study the relationship
between the quantity invested (or the decision whether to invest or not) and
the interest rate, share price, exchange rate, etc.
The demand for a commodity depends on the price of the
same commodity, the prices of competing or
complementary goods, the income of the consumer, etc.
Hence the two-variable model is often inadequate in practical
work.
Therefore, we need to discuss multiple regression models.
Multiple linear regression is concerned with the
relationship between a dependent variable (Y) and two or more
explanatory variables (X1, X2, …, Xk).
Why do we need multiple regression?
1. One motivation for multiple regression is the omitted
variable bias in simple regression analysis.
This bias is the primary drawback of simple regression; multiple
regression allows us to explicitly control for many other factors
which simultaneously affect the dependent variable.
Example: wages vs. education
Imagine we want to measure the (causal) effect of an
additional year of education on a person’s wage.
If we estimate the model wage = β0 + β1educ + u and
interpret β1 as the ceteris paribus effect of educ on wage,
we have to assume that educ and u are uncorrelated.
Consider a different model now: wage = β0 + β1educ +
β2exper + u, where exper is a person’s working experience
(in years).
Since the equation contains experience explicitly, we will
be able to measure the effect of education on wage, holding
experience fixed.
Multiple regression analysis is also useful for generalizing
functional relationships between variables.
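The effect of leaving experience out can be illustrated with a small constructed data set. Everything below is hypothetical: the five observations, the "true" coefficients (0.5 on educ, 0.2 on exper) and the absence of a disturbance are chosen so the arithmetic is easy to follow. The two-regressor estimates use the standard deviation-form least squares formulas.

```python
def dev(values):
    """Deviations of each value from the sample mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def cross(a, b):
    """Sum of cross products of two deviation series."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical data: educ and exper are negatively correlated.
educ = [8, 10, 12, 14, 16]
exper = [20, 15, 12, 6, 2]
wage = [1.0 + 0.5 * e + 0.2 * x for e, x in zip(educ, exper)]

y, x1, x2 = dev(wage), dev(educ), dev(exper)

# Simple regression of wage on educ only (exper omitted).
b1_simple = cross(x1, y) / cross(x1, x1)

# Multiple regression of wage on educ and exper.
det = cross(x1, x1) * cross(x2, x2) - cross(x1, x2) ** 2
b1_multiple = (cross(x1, y) * cross(x2, x2) - cross(x2, y) * cross(x1, x2)) / det
b2_multiple = (cross(x2, y) * cross(x1, x1) - cross(x1, y) * cross(x1, x2)) / det

print(f"educ coefficient, exper omitted:  {b1_simple:.3f}")
print(f"educ coefficient, exper included: {b1_multiple:.3f}")
print(f"exper coefficient:                {b2_multiple:.3f}")
```

With experience omitted, the education coefficient also picks up the effect of experience, which moves with education in these data; including experience separates the two effects.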
Simple Regression vs. Multiple Regression
Most of the properties of the simple regression
model extend directly to multiple regression.
We derived many of the formulas for the simple
regression model; however, with multiple
variables, the formulas become unwieldy when there are
more than two explanatory variables.
As far as the interpretation of the model is
concerned, there is an important new fact: the
coefficient βj captures the effect of the jth explanatory
variable, holding all the remaining explanatory
variables fixed.
NOTE
Multiple regression analysis is an extension of
simple regression analysis to cover cases in which the
dependent variable is hypothesized to depend on
more than one explanatory variable.
Much of the analysis will be a straightforward
extension of the simple regression model.
In multiple linear regression, we have one dependent
variable Y and k explanatory variables.
The relationship between the dependent variable and the two or more
independent variables is linear in parameters, but
may not be linear in variables.
What changes as we move from simple to
multiple regression?
Potentially more explanatory power with more
variables;
The ability to control for other variables; (and the
interaction of various explanatory variables:
correlations and multicollinearity);
It is harder to visualize: we can no longer simply draw a line through the data in three or
more dimensions.
R² is no longer simply the square of the
correlation coefficient between Y and X.
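For comparison, the two-variable case can be checked numerically: there R² (computed as 1 − RSS/TSS) coincides with the square of the correlation coefficient. The sketch below uses the saving and income figures from Example 2 and the standard least squares formulas; the variable names are illustrative.

```python
from math import sqrt

# Saving (Y) and income (X) from Example 2.
income = [500, 300, 1000, 800, 450]
saving = [200, 100, 600, 700, 400]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(saving) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, saving))
sxx = sum((x - x_bar) ** 2 for x in income)
syy = sum((y - y_bar) ** 2 for y in saving)

# Least squares line and its residual sum of squares.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(income, saving))

r_squared = 1 - rss / syy      # coefficient of determination
r = sxy / sqrt(sxx * syy)      # correlation between Y and X

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

With two or more regressors there is no single correlation coefficient between Y and "X", so this shortcut is no longer available.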
2.2.1 Assumptions of the Multiple Linear
Regression
In order to specify our multiple linear regression
model and proceed with the analysis of this
model, some assumptions are required.
These assumptions are the same as in the single
explanatory variable model developed earlier, with
the addition of the assumption of no perfect multicollinearity.
These assumptions are:
Model With Two Explanatory Variables
In order to understand the nature of multiple
regression model easily, we start our analysis with
the case of two explanatory variables, then extend
this to the case of k-explanatory variables .
Estimation of parameters of two-explanatory
variables model
Since the population regression equation is unknown to any
investigator, it has to be estimated from sample data.
Let us suppose that the sample data has been used to estimate the
population regression equation.
We leave the method of estimation unspecified for the present
and merely assume that the equation has been estimated by sample
regression equation, which we write as:
Given sample observations on Y, X1, and X2, we
estimate the model using the method of ordinary least squares (OLS).
2.3.2. Methods of Estimation
Specifying the model and stating its underlying assumptions are
the first stage of any econometric application.
The next step is the estimation of the numerical values of the
parameters of economic relationships.
The parameters of the linear regression model can be estimated
by various methods.
Three of the most commonly used methods are:
Method of moments (MM)
Ordinary least squares method (OLS)
Maximum likelihood method (MLM)
Here we will deal only with the OLS method of estimation.
2.3.2.2. The Ordinary Least Squares
(OLS) Method
The model Yi = α + βXi + εi is called the true relationship
between Y and X because Y and X represent their respective
population values, and α and β are called the true parameters
since they would be computed from the population values of Y and X.
But it is difficult to obtain the population values of Y and X
for technical or economic reasons.
So we are forced to take sample values of Y and X.
The parameters estimated from the sample values of Y and X are
called the estimators of the true parameters α and β and are
symbolized as α̂ and β̂.
Estimation of α and β by the ordinary least squares (OLS) method, or
classical least squares (CLS), involves finding values for the
estimates α̂ and β̂ which minimize the sum of
the squared residuals (Σei²).
That is, the residuals should be small.
When assessing the fit of a line, the vertical distances of
the points from the line are the only distances that matter.
The OLS method calculates the best-fitting line for a dataset by
minimizing the sum of the squares of the vertical deviations from
each data point to the line (the Residual Sum of Squares, RSS):
Minimize RSS = Σei² = Σ(Yi − α̂ − β̂Xi)²
To find the minimum, we will use differential calculus.
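The minimization can also be checked numerically. The sketch below computes the estimates with the standard closed-form least squares formulas (the result of this calculus exercise) on the saving and income data from Example 2, and then confirms that nudging either estimate raises the RSS. The size of the nudges is arbitrary.

```python
def rss(alpha, beta, x, y):
    """Residual sum of squares for the line alpha + beta * x."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))


# Saving (Y) and income (X) from Example 2.
income = [500, 300, 1000, 800, 450]
saving = [200, 100, 600, 700, 400]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(saving) / n

# Standard closed-form OLS estimates in deviation form.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, saving))
sxx = sum((x - x_bar) ** 2 for x in income)
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

best = rss(alpha_hat, beta_hat, income, saving)

# RSS at nearby (alpha, beta) pairs; each should exceed the minimum.
nearby = [
    rss(alpha_hat + da, beta_hat + db, income, saving)
    for da in (-5.0, 0.0, 5.0)
    for db in (-0.05, 0.0, 0.05)
    if (da, db) != (0.0, 0.0)
]

print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.4f}")
print("every nearby line has a larger RSS:", all(v > best for v in nearby))
```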
Why the sum of the squared residuals?
Why not just minimize the sum of the residuals?
To prevent negative residuals from cancelling positive
ones.
If we used Σei, all the residuals ei would receive equal
importance no matter how closely or widely scattered the
individual observations are around the SRF.
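The cancellation problem is easy to see numerically. In the sketch below, three lines with quite different (arbitrarily chosen) slopes are all drawn through the point of sample means of the Example 2 data. Each has residuals that sum to zero, yet their sums of squared residuals differ widely.

```python
# Saving (Y) and income (X) from Example 2.
income = [500, 300, 1000, 800, 450]
saving = [200, 100, 600, 700, 400]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(saving) / n

results = []
for slope in (0.0, 0.5, 1.0):          # arbitrary candidate slopes
    intercept = y_bar - slope * x_bar  # forces the line through the means
    residuals = [y - intercept - slope * x for x, y in zip(income, saving)]
    sum_e = sum(residuals)
    sum_e2 = sum(e * e for e in residuals)
    results.append((slope, sum_e, sum_e2))
    print(f"slope {slope}: sum of e = {sum_e:.1f}, sum of e^2 = {sum_e2:.1f}")
```

The sum of residuals cannot tell these lines apart, while the sum of squared residuals can, which is why the squared criterion is used.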
Rearranging we will get:
Divide both sides by “n”:
Rewriting:
Rearranging the above equation we obtain:
Substituting the values of α we get:
Rewritten in somewhat different way as follows;
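The steps from the minimization problem to the final formulas can be written out as follows. This is the standard textbook derivation. The notation is an assumption made here: hats denote estimates, bars denote sample means, and lowercase $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$ denote deviations from means.

```latex
\begin{align*}
\text{RSS} &= \sum e_i^2 = \sum \bigl(Y_i - \hat{\alpha} - \hat{\beta} X_i\bigr)^2 \\[4pt]
\frac{\partial\,\text{RSS}}{\partial \hat{\alpha}} &= -2 \sum \bigl(Y_i - \hat{\alpha} - \hat{\beta} X_i\bigr) = 0 \\
\frac{\partial\,\text{RSS}}{\partial \hat{\beta}} &= -2 \sum X_i \bigl(Y_i - \hat{\alpha} - \hat{\beta} X_i\bigr) = 0 \\[4pt]
\text{Rearranging:}\quad \sum Y_i &= n\hat{\alpha} + \hat{\beta} \sum X_i \\
\sum X_i Y_i &= \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^2 \\[4pt]
\text{Dividing the first equation by } n:\quad \bar{Y} &= \hat{\alpha} + \hat{\beta} \bar{X}
\;\;\Longrightarrow\;\; \hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X} \\[4pt]
\text{Substituting } \hat{\alpha}:\quad \hat{\beta} &= \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2} \\[4pt]
\text{In deviation form:}\quad \hat{\beta} &= \frac{\sum x_i y_i}{\sum x_i^2}
\end{align*}
```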
2.3. Statistical Properties of Least Square
Estimators
There are various econometric methods with which
we may obtain the estimates of the parameters of
economic relationships.
We would like an estimate to be as close as possible to the
value of the true population parameter, i.e. to vary
within only a small range around the true
parameter.
How are we to choose among the different
econometric methods, the one that gives ‘good’
estimates?
We need some criteria for judging the ‘goodness’ of
an estimate.
‘Closeness’ of the estimate to the population
parameter is measured by the mean and variance
or standard deviation of the sampling distribution
of the estimates of the different econometric
methods.
We assume the usual process of repeated
sampling, i.e. we assume that we draw a very large
number of samples, each of size ‘n’; we compute
the estimates β̂ from each sample for each
econometric method, and we form their
distribution.
We next compare the mean (expected value) and
the variances of these distributions and we choose
among the alternative estimates the one whose
distribution is concentrated as close as possible
around the population parameter.
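The repeated-sampling idea can be simulated. The sketch below is a minimal Monte Carlo experiment under assumed conditions: hypothetical true parameters, fixed X values, and normal disturbances with constant variance. It draws many samples, computes the OLS slope in each, and compares the mean and variance of the estimates with the true slope and with the standard theoretical variance σ²/Σxi².

```python
import random
from statistics import mean, pvariance

# Hypothetical population parameters, chosen only for illustration.
TRUE_ALPHA, TRUE_BETA, SIGMA = 1.0, 2.0, 1.0
X = [float(i) for i in range(1, 21)]  # X fixed in repeated sampling


def ols_slope(x, y):
    """OLS slope from the standard deviation-form formula."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(42)  # fixed seed so the run is repeatable
estimates = []
for _ in range(2000):
    y = [TRUE_ALPHA + TRUE_BETA * xi + rng.gauss(0.0, SIGMA) for xi in X]
    estimates.append(ols_slope(X, y))

x_bar = mean(X)
theoretical_var = SIGMA ** 2 / sum((xi - x_bar) ** 2 for xi in X)

print(f"mean of estimates:     {mean(estimates):.4f} (true slope {TRUE_BETA})")
print(f"variance of estimates: {pvariance(estimates):.6f}")
print(f"theoretical variance:  {theoretical_var:.6f}")
```

The sampling distribution of the slope estimates is centred on the true parameter and its spread is small, which is the sense in which an estimator is judged 'good'.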
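According to the Gauss-Markov theorem, given the classical assumptions, the OLS estimators are
BLUE (Best Linear Unbiased Estimators). That is, they are:
Linear: linear functions of the observed values of Y
Unbiased: their expected values equal the true parameters
Best: they have the minimum variance among all linear unbiased estimators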