BST32202: LINEAR REGRESSION
6. Simple Linear Regression
Lindizgani K. Ndovie, 2025
OUTLINE
● One way and two way ANOVA:
  ○ analysis of variance equation,
  ○ F-statistic,
  ○ multiple comparison procedures.
● Simple and multiple linear regression:
  ○ assumptions,
  ○ least squares estimation:
    ■ derivation of parameter estimates,
    ■ properties of least squares estimates,
  ○ correlation and regression,
  ○ inference of parameters (test of significance of coefficients),
  ○ inference of predicted values using the model;
  ○ model adequacy:
    ■ normality assumption,
    ■ constant variance assumption,
    ■ collinearity,
  ○ influential observations:
    ■ leverage,
    ■ outliers;
  ○ polynomial regression;
  ○ transformation techniques.
● Relationship between ANOVA and the linear model:
  ○ one way ANOVA model,
  ○ two way ANOVA model.
● Use of SPSS, STATA, R to perform ANOVA and linear regression.
INTRODUCTION
• We now turn to the analysis of relationships between variables.
• Here we will concentrate entirely on linear relationships.
• For example, we might be interested in the relationship between the quantity of money
demanded and the volume of transactions that people make as represented by the
level of money income.
• Or we might be interested in the relationship between family expenditures on food and
family income and family size.
• Regression analysis is used to analyse and predict the relationship between the
response or dependent variable (money holdings and family expenditure on food in
the above examples) and one or more independent, explanatory, or predictor
variables.
• In the demand for money example the single independent variable was the level of
income; in the family food expenditure example, there were two independent
variables, family income and family size.
• We must distinguish two types of relationships between variables.
INTRODUCTION
• A deterministic relationship exists if the value of Y is uniquely determined when the
value of X is specified—the relationship between the two variables is exact.
• For example, we might have Y = α + βX, where β is some gradient constant such as 10 and α is the y-intercept.
• Figure 1 presents an example of a deterministic straight-line relationship between X and Y along which all observed combinations of the two variables lie.
INTRODUCTION
• On the other hand, there may be a relationship between two variables that involves
some random component or random error.
• This relationship is called a probabilistic or statistical relationship.
• In this case we might have Y = α + βX + ϵ, which can be viewed as a probabilistic model containing two components: a deterministic component α + βX plus a random error ϵ.
• An example of a probabilistic relationship is given in Figure 2.
THE SIMPLE LINEAR REGRESSION MODEL
• When the statistical relationship is linear, the regression model for the observation $Y_i$ takes the form
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$
• where
  • the functional or deterministic relationship between the variables is given by $\beta_0 + \beta_1 X_i$,
  • $\epsilon_i$ is the random scatter component,
  • $Y_i$ is the dependent variable for the ith observation,
  • $X_i$ is the independent variable for the ith observation, assumed to be non-random,
  • $\beta_0$ and $\beta_1$ are parameters.
• The $\epsilon_i$ are the deviations of the $Y_i$ from their predicted levels based on $X_i$, $\beta_0$ and $\beta_1$.
• The error term is assumed to have the following properties:
  • The $\epsilon_i$ are normally distributed.
  • The expected value of the error term, denoted by $E[\epsilon_i]$, equals zero.
  • The variance of the $\epsilon_i$ is a constant, $\sigma^2$.
  • The $\epsilon_i$ are statistically independent; the covariance between $\epsilon_i$ and $\epsilon_j$ is zero.
• In other words, $\epsilon_i \sim N(0, \sigma^2)$.
• This normality assumption for the $\epsilon_i$ is quite appropriate in many cases.
THE SIMPLE LINEAR REGRESSION MODEL
• Since the error term $\epsilon_i$ is a random variable, so is the dependent variable $Y_i$.
• The expected value of $Y_i$ equals
$$E[Y_i] = E[\beta_0 + \beta_1 X_i + \epsilon_i] = E[\beta_0] + E[\beta_1 X_i] + E[\epsilon_i] = \beta_0 + \beta_1 E[X_i] + 0 = \beta_0 + \beta_1 X_i$$
• where $E[X_i] = X_i$ because these $X_i$ are a series of pre-determined non-random numbers.
• This equation, the underlying deterministic relationship, is called the regression function.
• It is the line of means that relates the mean of Y to the value of the independent variable X.
• The parameter $\beta_1$ is the slope of this line and $\beta_0$ is its intercept.
• The variance of $Y_i$ given $X_i$ equals
$$\mathrm{Var}(Y_i \mid X_i) = \mathrm{Var}(\beta_0 + \beta_1 X_i + \epsilon_i) = \mathrm{Var}(\beta_0 + \beta_1 X_i) + \mathrm{Var}(\epsilon_i) = 0 + \mathrm{Var}(\epsilon_i) = \sigma^2$$
• where the regression function $\beta_0 + \beta_1 X_i$ is deterministic and therefore does not vary.
• Thus the $Y_i$ have the same variability around their means at all $X_i$.
• Finally, since the $\epsilon_i$ are assumed to be independent for the various observations, so are the $Y_i$ conditional upon the $X_i$.
• Hence it follows that $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$.
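• As an illustration, this probabilistic model can be simulated in R (one of the packages listed in the outline); the parameter values and X grid below are hypothetical choices for demonstration, not part of the notes.

```r
# Simulate n observations from the simple linear regression model
#   Y_i = beta0 + beta1 * X_i + eps_i,  eps_i ~ N(0, sigma^2)
# beta0, beta1, sigma and the X values are illustrative only.
set.seed(123)
n     <- 50
beta0 <- 2                              # intercept
beta1 <- 0.5                            # slope
sigma <- 1                              # standard deviation of the error term
X   <- seq(1, 10, length.out = n)       # fixed, non-random X values
eps <- rnorm(n, mean = 0, sd = sigma)   # random scatter component
Y   <- beta0 + beta1 * X + eps          # so Y_i ~ N(beta0 + beta1*X_i, sigma^2)
plot(X, Y, main = "Simulated data from the simple linear regression model")
abline(a = beta0, b = beta1, lty = 2)   # the true regression function (line of means)
```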
METHOD OF LEAST SQUARES
• Point estimates of 𝛽" and 𝛽# can be obtained using a number of alternative estimators.
• The most common estimation method is the method of least squares.
• This method involves choosing the estimated regression line so that the sum of the squared deviations of $Y_i$ from the value predicted by the line is minimized.
• Let us denote the deviations of $Y_i$ from the fitted regression line by $e_i$ and our least-squares estimates of $\beta_0$ and $\beta_1$ by $b_0$ and $b_1$ respectively.
• Remember $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$.
• Then we have the following sum of squares
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2$$
• where S is the sum of squared deviations of the $Y_i$ from the values predicted by the line.
• The least-squares estimation procedure involves choosing $b_0$ and $b_1$, the intercept and slope of the line, so as to minimize S.
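• A minimal R sketch of this objective (a hypothetical helper, not part of the notes) makes explicit the quantity that least squares minimizes:

```r
# Sum of squared deviations S(b0, b1) for a candidate intercept and slope.
# X and Y are assumed to be numeric vectors of the same length.
S <- function(b0, b1, X, Y) {
  e <- Y - b0 - b1 * X   # deviations of Y from the candidate line
  sum(e^2)               # S = sum of squared deviations
}
# Least squares chooses (b0, b1) to make S(b0, b1, X, Y) as small as possible.
```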
ESTIMATION OF THE REGRESSION PARAMETERS
• This minimizes the sum of the squared lengths of the vertical lines in Figure 3 below.
• Expanding the previous equation, we have
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2 = \sum_{i=1}^{n} Y_i^2 + n b_0^2 + b_1^2 \sum_{i=1}^{n} X_i^2 - 2 b_0 \sum_{i=1}^{n} Y_i - 2 b_1 \sum_{i=1}^{n} X_i Y_i + 2 b_0 b_1 \sum_{i=1}^{n} X_i$$
(Figure 3: scatter of the observations with the vertical residuals $e_1, \dots, e_7$ from the fitted line.)
ESTIMATION OF THE REGRESSION PARAMETERS
• To find the least-squares minimizing values of $b_0$ and $b_1$ we differentiate S with respect to each of these parameters and set the resulting derivatives equal to zero.
• This yields
$$\frac{\partial S}{\partial b_0} = 2 n b_0 - 2 \sum_{i=1}^{n} Y_i + 2 b_1 \sum_{i=1}^{n} X_i = 0$$
$$\frac{\partial S}{\partial b_1} = 2 b_1 \sum_{i=1}^{n} X_i^2 - 2 \sum_{i=1}^{n} X_i Y_i + 2 b_0 \sum_{i=1}^{n} X_i = 0$$
• which simplify to
$$\sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i$$
$$\sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2$$
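• Because the two normal equations are linear in $b_0$ and $b_1$, they can also be solved numerically; the sketch below (illustrative names, assuming numeric vectors X and Y) sets them up as a 2-by-2 system and solves it with R's solve():

```r
# Solve the normal equations
#   sum(Y)   = n*b0      + b1*sum(X)
#   sum(X*Y) = b0*sum(X) + b1*sum(X^2)
# simultaneously for b0 and b1.
solve_normal_equations <- function(X, Y) {
  n   <- length(X)
  A   <- matrix(c(n,      sum(X),
                  sum(X), sum(X^2)), nrow = 2, byrow = TRUE)
  rhs <- c(sum(Y), sum(X * Y))
  b   <- solve(A, rhs)              # b[1] = b0, b[2] = b1
  setNames(b, c("b0", "b1"))
}
```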
ESTIMATION OF THE REGRESSION PARAMETERS
• These two equations can now be solved simultaneously for $b_0$ and $b_1$.
• Dividing the first equation by n, rearranging to put $b_0$ on the left side and noting that $\sum X_i = n\bar{X}$ and $\sum Y_i = n\bar{Y}$, we obtain
$$b_0 = \bar{Y} - b_1 \bar{X}$$
• Substituting this into the second equation, we obtain
$$\sum_{i=1}^{n} X_i Y_i = \bar{Y} \sum_{i=1}^{n} X_i - b_1 \bar{X} \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2$$
• which can be rearranged to yield
$$\sum_{i=1}^{n} X_i Y_i - \bar{Y} \sum_{i=1}^{n} X_i = b_1 \left( \sum_{i=1}^{n} X_i^2 - \bar{X} \sum_{i=1}^{n} X_i \right)$$
$$\sum_{i=1}^{n} X_i Y_i - n \bar{Y}\bar{X} = b_1 \left( \sum_{i=1}^{n} X_i^2 - n \bar{X}^2 \right)$$
ESTIMATION OF THE REGRESSION PARAMETERS
$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{Y}\bar{X}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$$
• By expansion it can be shown that
$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n \bar{Y}\bar{X}$$
• and
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n \bar{X}^2$$
• so that by substitution into the expression for $b_1$ above, we obtain
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{S_{xy}}{S_{xx}}$$
• where $x = X_i - \bar{X}$ and $y = Y_i - \bar{Y}$ are the deviations of the variables from their respective means and the summation is over $i = 1, \dots, n$.
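• A minimal R sketch of these formulas (the helper name ls_estimates is hypothetical; X and Y are assumed to be numeric vectors of equal length) computes $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{Y} - b_1\bar{X}$ directly:

```r
# Least-squares estimates in deviation form.
ls_estimates <- function(X, Y) {
  Xbar <- mean(X)
  Ybar <- mean(Y)
  Sxy  <- sum((X - Xbar) * (Y - Ybar))   # sum of cross-deviations
  Sxx  <- sum((X - Xbar)^2)              # sum of squared deviations of X
  b1   <- Sxy / Sxx                      # slope estimate
  b0   <- Ybar - b1 * Xbar               # intercept estimate
  c(b0 = b0, b1 = b1)
}
```

• This deviation-form route gives the same estimates as solving the two normal equations simultaneously.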
PROPERTIES OF LEAST SQUARES ESTIMATES
• The least-squares estimators $b_0$ and $b_1$ are unbiased and linearly dependent on the n sample values $Y_i$.
• It can be shown that least-squares estimators are more efficient (that is, have lower variance) than all other possible unbiased estimators of $\beta_0$ and $\beta_1$ that are linearly dependent on the $Y_i$.
• It can also be shown that these desirable properties do not depend upon the assumption that the $\epsilon_i$ are normally distributed.
• Estimators of $\beta_0$ and $\beta_1$ can also be developed using the method of maximum likelihood (under the assumption that the $\epsilon_i$ are normally distributed).
• These estimators turn out to be identical with the least-squares estimators.
• The regression function $E[Y] = \beta_0 + \beta_1 X$ is estimated as $\hat{Y} = b_0 + b_1 X$, where $\hat{Y}$ is referred to as the predicted value of Y.
THE PROPERTIES OF THE RESIDUALS
• To make inferences (i.e., construct confidence intervals and do statistical tests) in
regression analysis we need to estimate the magnitude of the random variation in Y .
• We measure the scatter of the observations around the regression line by comparing the observed values $Y_i$ with the predicted values associated with the corresponding $X_i$.
• The difference between the observed and predicted values for the ith observation is the residual for that observation, calculated as $e_i = Y_i - b_0 - b_1 X_i$.
• Note that $e_i$ is the estimated residual, while $\epsilon_i$ is the true residual or error term, which measures the deviation of $Y_i$ from its true mean $E[Y_i]$.
• The least-squares residuals have the following properties.
• They sum to zero: $\sum e_i = 0$.
• The sum of the squared residuals $\sum_{i=1}^{n} e_i^2$ is a minimum; this follows because the method of least squares minimizes S.
• The sum of the weighted residuals is zero when each residual is weighted by the corresponding level of the independent variable: $\sum X_i e_i = 0$.
• The sum of the weighted residuals is zero when each residual is weighted by the corresponding fitted (predicted) value: $\sum \hat{Y}_i e_i = 0$.
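• These properties can be checked numerically; the R sketch below uses a small hypothetical data set (illustrative values only) and R's built-in lm() for the fit:

```r
# Numerical check of the residual properties.
X <- c(1, 2, 3, 4, 5)          # hypothetical independent variable
Y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical dependent variable
fit  <- lm(Y ~ X)              # least-squares fit
e    <- resid(fit)             # residuals e_i
Yhat <- fitted(fit)            # fitted values Yhat_i
sum(e)        # ~ 0 (up to floating-point rounding)
sum(X * e)    # ~ 0
sum(Yhat * e) # ~ 0
```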
EXAMPLE 1
• Given the following data on length and width of wood planks produced by a timber company, calculate the values of b0 and b1 and give the equation of the line of best fit.
• It is given that length depends on the width of the plank.
Length 3.4 3.4 6.3 3.2 7.5 8.1 9.1 11.5 12.1 14.2 18.5
Width 8.7 9.2 11.2 11.5 11.6 11.8 12.3 13.9 14.7 17.1 18.1
• We know
$$b_0 = \bar{Y} - b_1 \bar{X}$$
$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{Y}\bar{X}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$$
EXAMPLE 1
Length (Y)   Width (X)   X_i Y_i    X_i^2
3.4          8.7         29.58      75.69
3.4          9.2         31.28      84.64
6.3          11.2        70.56      125.44
3.2          11.5        36.80      132.25
7.5          11.6        87.00      134.56
8.1          11.8        95.58      139.24
9.1          12.3        111.93     151.29
11.5         13.9        159.85     193.21
12.1         14.7        177.87     216.09
14.2         17.1        242.82     292.41
18.5         18.1        334.85     327.61
Totals:      97.30 / 140.10 / 1378.12 / 1872.43
• Therefore, we have
• $\sum_{i=1}^{n} Y_i = 97.30$
• $\sum_{i=1}^{n} X_i = 140.10$
• $\sum_{i=1}^{n} X_i Y_i = 1378.12$
• $\sum_{i=1}^{n} X_i^2 = 1872.43$
• $\bar{Y} = \frac{97.30}{11} = 8.85$
• $\bar{X} = \frac{140.10}{11} = 12.74$
• n = 11
EXAMPLE 1
Substituting in the formulas, we will have the following
$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{Y}\bar{X}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2} = \frac{1378.12 - 11(12.74)(8.85)}{1872.43 - 11(12.74)^2} = \frac{137.881}{87.0464} = 1.58$$
$$b_0 = \bar{Y} - b_1 \bar{X} = 8.85 - 1.58(12.74) = -11.28$$
Therefore, the line that best fits the data is
$$\hat{Y}_i = -11.28 + 1.58 X_i$$
We can use this model to predict the length of a plank given its width.
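For comparison, a short R sketch reproduces Example 1 from the raw data; the estimates differ slightly from the hand calculation because the means were rounded there.

```r
# Example 1: length (Y) regressed on width (X) of the planks.
length_Y <- c(3.4, 3.4, 6.3, 3.2, 7.5, 8.1, 9.1, 11.5, 12.1, 14.2, 18.5)
width_X  <- c(8.7, 9.2, 11.2, 11.5, 11.6, 11.8, 12.3, 13.9, 14.7, 17.1, 18.1)
n  <- length(width_X)
b1 <- (sum(width_X * length_Y) - n * mean(length_Y) * mean(width_X)) /
      (sum(width_X^2) - n * mean(width_X)^2)        # slope estimate
b0 <- mean(length_Y) - b1 * mean(width_X)           # intercept estimate
c(b0 = b0, b1 = b1)           # roughly -11.2 and 1.58 at full precision
coef(lm(length_Y ~ width_X))  # the same estimates from R's built-in lm()
```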
TUTORIAL
• Exercise 1: Given the following data on rental companies, find the regression line and
sketch it
Company Cars (in ten thousands) Revenue (in billions)
A 63.0 7.0
B 29.0 3.9
C 20.8 2.1
D 19.1 2.8
E 13.4 1.4
F 8.5 1.5
TUTORIAL
• Exercise 2: Given the following data obtained in the study of the number of absences
and the final grade of the seven students in the statistics class, find the regression line
and sketch it.
Student Number of absences Final Grade (in %)
1 6 82
2 2 86
3 15 43
4 9 74
5 12 58
6 5 90
7 8 78
TUTORIAL
• Exercise 3: The following is data on living area x and selling price y of 5 homes:
Residence x (square feet) y (in thousands)
1 14 178
2 15 230
3 17 240
4 19 275
5 16 200
• Verify that the regression line is y = − 53.86 + 17.189x
• Sketch the line.
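• One way to check the stated line for Exercise 3 is with R's built-in lm(); this is only a verification sketch, not a substitute for working the exercise by hand.

```r
# Verify the regression line for Exercise 3.
x <- c(14, 15, 17, 19, 16)        # living area x, as given in the table
y <- c(178, 230, 240, 275, 200)   # selling price y (in thousands)
coef(lm(y ~ x))                   # intercept ~ -53.86, slope ~ 17.189
```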