Simple Linear Regression
Part I
Background Review
Regression Analysis
A statistical model is a mathematical description of the data structure / data-generating mechanism
Parametric model
Easier to fit, interpret, and draw inferences from
More powerful (statistically)
Model complexity is fixed
Nonparametric model
No distributional assumption
More flexible
Model complexity may grow
Semiparametric model
Regression Analysis
Example: exam scores
Parametric: approximate the class distribution by a normal
distribution with certain parameters (mean and variance);
hence we can say mean ± one standard deviation covers ~ 68% of the scores (see the quick check below)
Nonparametric: use the histogram
So far, we are dealing with only one variable
What if we have more variables available?
Want to exploit other information for a better picture
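As a quick sanity check of the 68% figure (not from the slides; a one-line R sketch using the standard normal CDF):

# P(mu - sigma < X < mu + sigma) for a normal random variable
pnorm(1) - pnorm(-1)  # 0.6827, i.e. roughly 68%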
Regression Analysis
Regression studies the relationship between
Response/outcome/dependent variables; and
Predictor/explanatory/independent variables
In this course, we only deal with the parametric approach
Goal: estimate the parameters
After building the model: interpret, infer, and predict
As we will see, the regression framework covers many of
the techniques you learnt in CB2200
Types of Variables
Qualitative / Categorical
• Nominal: no orderings in categories (e.g. marital status, eye color)
• Binary: only two categories (e.g. Yes/No, Male/Female)
• Ordinal: categories are naturally ordered (e.g. Likert/rating scale, letter grades)
Quantitative / Numerical
• Discrete (e.g. number of children, defects per hour)
• Continuous (e.g. weight, voltage)
Let’s look at the simplest case
To study the relationship between two numerical
variables, such as
Exam score vs. Time spent on doing revision
Apartment price vs. Gross floor area
Electricity consumption vs. Air temperature
We can use the Pearson correlation coefficient, also known
as the linear correlation coefficient
Linear Correlation Analysis
[Scatter plot illustrating the relationship between two numerical variables]
Linear Correlation Analysis Cont’d
(Sample) linear correlation coefficient,
$r = \dfrac{S_{XY}}{S_X S_Y} = \dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$
Dimensionless
“Sign” indicates the direction (positive / negative) of a linear relationship
“Magnitude” measures the strength of a linear relationship
$\bar{X}$, $\bar{Y}$ are the sample means
$S_X^2$, $S_Y^2$ are the sample variances ($S_{XY}$ is the sample covariance)
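A minimal R sketch (with hypothetical x and y vectors, not the course data) verifying the formula against the built-in cor():

# hypothetical data to illustrate the formula
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
# r from the definition: covariance over the product of the SDs
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
r_manual
cor(x, y)  # the built-in version agrees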
Linear Correlation Analysis Cont’d
t-test for correlation coefficient
$H_0: \rho = 0$ (no linear correlation)
$H_1: \rho \neq 0$ (linear correlation exists)
t-statistic: t $= \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
p-value $= 2P(t_{n-2} > |\mathrm{t}|)$
Important!! Note the slight abuse of notations:
• Upright t denotes the value of the statistic
• $t_{\nu}$ denotes the t distribution itself with degree of freedom (d.f.) $\nu$
• $t_{\alpha,\nu}$ denotes its upper tail $\alpha$ quantile
Reject $H_0$ if |t| > $t_{\alpha/2,\,n-2}$ or p-value < $\alpha$
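A sketch of this t-test computed by hand (reusing the hypothetical x and y above) and checked against cor.test():

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
n <- length(x)
r <- cor(x, y)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)       # t-statistic
p_val <- 2 * (1 - pt(abs(t_stat), df = n - 2))  # two-tail p-value
c(t_stat, p_val)
cor.test(x, y)  # reports the same t and p-value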
Example
Is residential apartment price related to its gross floor
area and age of the building?
Source: HKEA Transaction Records, http://www.hkea.com.hk/private/TransServ
Data: Transactions of residential apartments in Tseung Kwan O during 1 – 8 April 2014
Example Cont’d
The data file consists of a number of transaction records, some of which contain missing values; we are going to make use of the following three variables
Price = Price in million HK$
GrossFA = Gross floor area in ft²
Age = Age of building in years
Example
Notes:
• R will not process code after #; use # for comments
• library(…) loads packages
• attach(…) makes the variables in the data set directly accessible; o/w, use Example$Price, etc.
• cbind(…) combines by columns; rbind(…) combines by rows

# set working directory
setwd("C:/Users/chiwchu/Google Drive/Academic/CityU/MS3252/Lecture")
# load the data
library(readxl)
Example = read_excel("Example.xlsx")
attach(Example)
# scatter plots of Price vs GrossFA and Price vs Age
plot(GrossFA, Price)
plot(Age, Price)
# compute correlations and t-tests
cor(cbind(Price, GrossFA, Age))
cor.test(Price, GrossFA)
Example Cont’d
[Scatter plots of Price vs GrossFA and Price vs Age]
Example Cont’d
[R output of cor(…) and cor.test(…):
• the diagonal entry is the correlation coefficient between Price and itself (= 1)
• the off-diagonal entry is the correlation coefficient between Price and GrossFA
• cor.test reports the calculated value of the t-test statistic and the p-value of the t-test for the correlation coefficient]
Conditional Distribution
Probability/density -> Distribution
Conditional probability/density -> Conditional distribution
e.g. Let $Y$ denote the random variable of whether it will rain tomorrow (1 = yes, 0 = no)
If the probability of raining tomorrow is 0.4, then $Y$ has a Bernoulli(0.4) distribution, $Y \sim \text{Bernoulli}(0.4)$
But what if we know whether a typhoon is coming?
Let $X$ denote the random variable of whether a typhoon is coming (1 = yes, 0 = no)
$X$ can be random itself, but we can think of it as fixed
Conditional Distribution
Given the information of $X$, the probability of raining tomorrow, and hence the distribution of $Y$, may change!
Say the conditional probability is $P(Y = 1 \mid X = 1) = 0.8$; then the conditional distribution of $Y \mid X = 1$ is Bernoulli(0.8)
Similarly, the conditional distribution of $Y \mid X = 0$ could be, say, Bernoulli(0.2)
The conditional distribution of $Y$, particularly the conditional mean, varies across different values of $X$
Regression is about the study of conditional distributions!
Part II
Formulation and Estimation
Overview of Regression Analysis
Input
Response / outcome / dependent variable,
The variable we wish to explain or predict
Predictor / covariate / explanatory / independent variable,
The variable used to explain the response variable
Output
A (linear) function that allows us to
Model association: Explain the variation of the response caused
by the predictor(s)
Provide prediction: Estimate the value of the response based on
value(s) of the predictor(s)
Simple Linear Regression - Formulation
A simple linear regression model consists of two components
Regression line: A straight line that describes the dependence of the average value (conditional mean) of the $Y$-variable on one $X$-variable
Random error: The unexpected deviation of the observed value from the expected value
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ = response, $X_i$ = predictor, $\beta_0$ = population intercept, $\beta_1$ = population slope coefficient, $\varepsilon_i$ = random error, and $\beta_0 + \beta_1 X_i$ = regression line
Simple Linear Regression - Formulation
(Linear) regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Assumptions
Linearity of regression equation
$\beta_0 + \beta_1 X$ is a linear function of $X$
Error normality (can be dropped if sample size is large, why?)
$\varepsilon_i$ has a normal distribution for all $i$
Zero mean of errors (not really an assumption with the intercept): $E(\varepsilon_i) = 0$
Constant variances of errors: $Var(\varepsilon_i) = \sigma^2$
Error independence
$\varepsilon_i$ are independent for all $i$
Simple Linear Regression - Formulation
Equivalently, the linear regression model can be written as
$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$ (mean function)
$Var(Y_i \mid X_i) = \sigma^2$ (variance function)
$Y_i$ are independent and normally distributed
In other words, $Y_i \mid X_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma^2)$ are independent
$N(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$
We also call it a mean regression model
Simple Linear Regression - Formulation
Framework: we have one response $Y$ and $p$ predictors
$p = 1$ here because we only have one $X$
We obtain a random sample of size $n$, containing the values of $X_i$ and $Y_i$ for each individual/subject/observation $i$, $i = 1, \ldots, n$
Our goal is to model/infer about the conditional mean of $Y$ given $X$
As the conditional mean is characterized by $\beta_0$ and $\beta_1$, that means we need to estimate $\beta_0$ and $\beta_1$ from the data
Simple Linear Regression - Estimation
Goal: estimate $\beta_0$ and $\beta_1$
Let’s denote these estimates by $b_0$ and $b_1$
Our notation for parameters: Greek letters represent the population/true versions; Roman letters represent the sample/estimated analogues
Two methods (turn out to be equivalent for linear regression):
Least Squares Estimator (LSE)/Ordinary Least Squares (OLS)
Maximum Likelihood Estimator (MLE)
Simple Linear Regression - Estimation
[Scatter plot: at each $X_i$, the observed value $Y_i$ deviates from the fitted value $\hat{Y}_i = b_0 + b_1 X_i$ by the residual $e_i$; we are assuming normality of Y for every level of X]
$b_0$ represents the sample intercept
$b_1$ represents the sample slope coefficient
$e_i$ represents the sample residual error
Simple Linear Regression - Estimation
$b_0$ and $b_1$ are estimated using the least squares method, which minimizes the sum of squares errors (SSE)
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2$
Simple Linear Regression - Estimation
The solution for $b_0$ and $b_1$ can be obtained by differentiating SSE with respect to $b_0$ and $b_1$
That is to solve
$\dfrac{\partial SSE}{\partial b_0} = -2\sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i) = 0$
and
$\dfrac{\partial SSE}{\partial b_1} = -2\sum_{i=1}^{n}X_i(Y_i - b_0 - b_1 X_i) = 0$
simultaneously
Simple Linear Regression - Estimation
The solutions are
$b_1 = \dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \dfrac{S_{XY}}{S_X^2}$
and
$b_0 = \bar{Y} - b_1\bar{X}$
Also, the estimate for the error variance is given by
$S_e^2 = \dfrac{SSE}{n-2} = MSE$
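A sketch computing the closed-form solutions directly (hypothetical x and y) and checking them against lm():

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
e <- y - (b0 + b1 * x)            # residuals
s2 <- sum(e^2) / (length(x) - 2)  # error variance estimate, SSE/(n-2) = MSE
c(b0, b1, s2)
coef(lm(y ~ x))  # lm() returns the same b0 and b1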
Simple Linear Regression - Estimation
Maximum Likelihood Estimation is to find the parameters that maximize the likelihood/probability of observing the sample
Recall that $Y_i \mid X_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma^2)$ independently
The density function of $Y_i$ is $f(y_i) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\dfrac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$
Assume $\sigma^2$ is known and equals 1 for simplicity…
The joint likelihood/probability of observing these $y_i$ given these $x_i$ will be
$L(\beta_0, \beta_1) = \prod_{i=1}^{n} f(y_i) = (2\pi)^{-n/2}\exp\left(-\tfrac{1}{2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2\right)$
Maximizing this likelihood function is equivalent to minimizing $\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$, which is exactly the SSE, so MLE = LSE!
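A numerical illustration of MLE = LSE (hypothetical x and y; $\sigma^2$ fixed at 1 as on the slide): optim() minimizes the negative log-likelihood and lands on the least squares coefficients:

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
negloglik <- function(beta) {
  mu <- beta[1] + beta[2] * x
  -sum(dnorm(y, mean = mu, sd = 1, log = TRUE))  # sigma^2 = 1 for simplicity
}
optim(c(0, 0), negloglik)$par  # numerically matches the LSE
coef(lm(y ~ x))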
Example Cont’d
• lm(…) fits a linear model
• summary(…) reports a summary of variables / model results

m1 = lm(Price ~ GrossFA)
summary(m1)

[R output of summary(m1): the Estimate column gives $b_0$ and $b_1$; the residual standard error gives $S_e = \sqrt{MSE}$]
Example – The Model & Interpretation of Coefficients Cont’d
The estimated simple linear regression equation
$\widehat{Price} = b_0 + b_1 \cdot GrossFA$
where $Y$ = Price in million HK$
$X$ = Gross floor area in ft²
The estimated slope coefficient, $b_1$
Measures the estimated change in the average value of $Y$ as a result of a one-unit increase in $X$
$b_1$ says that the price of an apartment increases by $b_1$ million HK$, on average, for each square foot increase in gross floor area
Example – The Model & Interpretation of Coefficients Cont’d
The estimated simple linear regression equation
$\widehat{Price} = b_0 + b_1 \cdot GrossFA$
where $Y$ = Price in million HK$
$X$ = Gross floor area in ft²
The estimated intercept coefficient, $b_0$
Denotes the estimated average value of $Y$ when $X$ is zero
$b_0$ says that the price of an apartment is $b_0$ million HK$, on average, when the gross floor area is zero (any problem?)
Interpret with caution when the $X$-value is out of the observed range!!
Example Cont’d
Regress Price against Age
[R output of summary(lm(Price ~ Age))]
Example Cont’d
The relationship between apartment price and age of the building is
$\widehat{Price} = b_0 + b_1 \cdot Age$ (with $b_1 < 0$)
where $Y$ = Price in million HK$
$X$ = Age of building in years
If the building gets 1 year older, the average apartment price decreases by $|b_1|$ million HK$
Confidence Interval (CI)
Confidence interval estimate for slope coefficient $\beta_1$
$b_1 \pm t_{\alpha/2,\,n-2}\, S_{b_1}$
R program
confint(m1, level = .95)
$100(1-\alpha)$% CI for $\beta_1$
Upon repeated sampling, such CIs will cover the true parameter with approximately 95% chance
We are 95% confident that the population parameter is contained in the CI
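A sketch building the CI by hand from the summary() table (hypothetical x and y) and comparing with confint():

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
m <- lm(y ~ x)
b1 <- coef(summary(m))[2, "Estimate"]
se1 <- coef(summary(m))[2, "Std. Error"]
b1 + c(-1, 1) * qt(.975, df = length(x) - 2) * se1  # 95% CI for beta1 by hand
confint(m, level = .95)                             # built-in version agrees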
Special Case I: One Sample
In other words, no $X$ are considered (imagine $\beta_1 = 0$)
The linear regression model assumes that
$Y_i \sim N(\beta_0, \sigma^2)$ are independent
This is equivalent to fitting a normal distribution to the entire sample!!
Intuitively, what are the best estimates for $\beta_0$ and $\sigma^2$?
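A sketch of Special Case I: an intercept-only model reproduces the sample mean (hypothetical y):

y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
m0 <- lm(y ~ 1)  # no predictor: the null model
coef(m0)         # b0 equals the sample mean
mean(y)
sigma(m0)        # residual SE; here just the sample SD of y
sd(y)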
Special Case II: Two Groups
Now, the $X_i$ are either 0 or 1, indicating which group the observation belongs to
The linear regression model assumes that
$Y_i \mid X_i = 0 \sim N(\beta_0, \sigma^2)$ are independent
$Y_i \mid X_i = 1 \sim N(\beta_0 + \beta_1, \sigma^2)$ are independent
This is equivalent to fitting two normal distributions (with a common variance) to the two groups respectively!!
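A sketch of Special Case II with a hypothetical 0/1 group indicator g: the fitted coefficients recover the two group means:

y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
g <- c(0, 0, 1, 1, 1)             # hypothetical group indicator
m2 <- lm(y ~ g)
coef(m2)            # b0 = mean of group 0; b0 + b1 = mean of group 1
tapply(y, g, mean)  # group means for comparison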
Part III
Goodness of Fit,
Parameter Inference,
and Model Significance
Goodness of Fit and Model Significance
We want to compare the fitted model with $X$ against the null model without $X$
Fitted/Full model = the model you considered
Null model = Special Case I = a horizontal line at $\bar{Y}$
(Saturated model = data = the model with perfect fit)
Saturated model <----- Fitted model -----> Null model
One simple way to evaluate the goodness of fit is to look at the variance/variation breakdown (although not always a good idea)
Analysis of Variance (ANOVA)
Total variation of the $Y$-variable is made up of two parts: $SST = SSR + SSE$
Sum of Squares Total, $SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$
Total variation of $Y_i$ around their mean $\bar{Y}$
Sum of Squares Regression, $SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$
Variation explained by the regression equation
Sum of Squares Errors, $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$
Variation not explained by the regression equation
Question: what are the values of SSR and SSE for the null and saturated models?
Analysis of Variance (ANOVA) Cont’d
[Scatter plot of Price vs GrossFA illustrating the decomposition:
SST = the total variation of the $Y$-variable we wish to explain
SSR = the variation of the $Y$-variable being explained by the regression equation with the predictor
SSE = SST − SSR, the variation left unexplained]
Analysis of Variance (ANOVA) Cont’d
Coefficient of determination,
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
Measures the proportion of variation of $Y$ explained by the regression equation with the predictor
Measures the “goodness of fit” of the regression model
Remark!! $R^2 = r^2$ in simple linear regression, i.e. when there is one $X$-variable
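A sketch recovering R² from the ANOVA decomposition (hypothetical x and y) and checking the R² = r² remark:

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
m <- lm(y ~ x)
a <- anova(m)
ssr <- a["x", "Sum Sq"]; sse <- a["Residuals", "Sum Sq"]
ssr / (ssr + sse)     # R^2 = SSR/SST
summary(m)$r.squared  # built-in value
cor(x, y)^2           # equals r^2 in simple linear regression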
Example
Which independent variable, GrossFA or Age, provides a better explanation of the variation in apartment price?
[R output of anova(…) for the two models, showing SSR, SSE, and MSE]
For SST, use either of
• sum(anova(m1)[,2])
• var(Price)*(length(Price)-1)
Inferences about the Parameters – X-Variable Significance
t-test for a slope coefficient
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (linear relationship exists)
t-statistic: t $= \dfrac{b_1}{S_{b_1}}$
where $S_{b_1}$ = standard error of the slope
p-value $= 2P(t_{n-2} > |\mathrm{t}|)$
Under $H_0$, the statistic has a $t$ distribution with $n-2$ d.f.
Reject $H_0$ if |t| > $t_{\alpha/2,\,n-2}$ or p-value < $\alpha$
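A sketch reading the slope t-test off the summary() table and reproducing it by hand (hypothetical x and y):

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
m <- lm(y ~ x)
coef(summary(m))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
coef(summary(m))[2, "Estimate"] / coef(summary(m))[2, "Std. Error"]  # = t value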
Inferences about the Parameters – X-Variable Significance Cont’d
$S_{b_1}$ measures the variation in the slope of regression lines from different possible samples (one color denotes one sample)
[Two panels: small $S_{b_1}$ (left) vs large $S_{b_1}$ (right); $S_{b_1}$ grows with the variation of the errors around the regression line]
Inferences about the Parameters – X-Variable Significance Cont’d
Recall t $= \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$; we can show that $\dfrac{b_1}{S_{b_1}} = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$!!
The t-test for $\beta_1$ is equivalent to the t-test for the linear correlation coefficient
Example
Is GrossFA significantly affecting the apartment price?
[R output of summary(m1): $b_1$, $S_{b_1}$, t, and p-value]
d.f. = $n - 2$ = 78
Example Cont’d
Is GrossFA significantly affecting the apartment price?
In R, use
• qt(.975,78) to obtain C.V.
• 2*(1-pt(10.81,78)) to obtain p-value
t = 10.81
In exam,
• use t-table to obtain C.V.
• p-value is not computable by hand, but a range can be found at best
At $\alpha$ = .05
d.f. = $n - 2$ = 78
C.V. = $t_{.025,\,78} \approx 1.99$
|t| = 10.81 > 1.99: reject $H_0$, GrossFA significantly affects apartment price
p-value < .001 < $\alpha$ = .05, reject $H_0$
Example Cont’d
Is Age having a significant negative relationship with price?
$H_0: \beta_1 \geq 0$ vs $H_1: \beta_1 < 0$
[R output of summary(lm(Price ~ Age)): t and the two-tail p-value]
At $\alpha$ = .05
d.f. = $n - 2$ = 78
C.V. = $-t_{.05,\,78} \approx -1.66$
t < C.V.: reject $H_0$, Age has a significant negative impact on apartment price
p-value < $\alpha$, reject $H_0$
Note: For a one-tail test, the p-value is the one-tail probability, i.e. half of the reported two-tail p-value when t lies in the direction of $H_1$
Inferences about the Parameters – Overall Model Significance
F-test for the overall model
$H_0: \beta_1 = \cdots = \beta_p = 0$ (the model is insignificant)
$H_1:$ at least one $\beta_j \neq 0$ (the model is significant)
F-statistic: F $= \dfrac{MSR}{MSE}$
where $MSR = SSR/p$ = Mean Squares Regression
$MSE = SSE/(n-p-1)$ = Mean Squares Errors
$p$ = no. of predictors (excluding intercept)
$n$ = no. of observations
(Again a slight abuse of notations: upright F is the statistic value; $F_{\nu_1,\nu_2}$ denotes an F distribution with d.f. $\nu_1$ and $\nu_2$)
p-value $= P(F_{p,\,n-p-1} > \mathrm{F})$
Reject $H_0$ if F > $F_{\alpha;\,p,\,n-p-1}$ or p-value < $\alpha$
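A sketch reading the F-test from the ANOVA table and checking F = t² (hypothetical x and y):

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical data
m <- lm(y ~ x)
a <- anova(m)  # columns include Sum Sq, Mean Sq, F value, Pr(>F)
a["x", "Mean Sq"] / a["Residuals", "Mean Sq"]  # F = MSR/MSE
coef(summary(m))[2, "t value"]^2               # t^2 equals F here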
Example
Is the model significant?
[R output of anova(m1): SSR, SSE, MSR, MSE = $S_e^2$, the F value, d.f., and the p-value]
Example Cont’d
Is the model significant?
In R, use
• qf(.95,1,78) to obtain C.V.
• 1-pf(116.90,1,78) to obtain p-value
F = 116.90
In exam,
• use F-table to obtain C.V.
• p-value is not computable by hand
At $\alpha$ = .05
d.f. = $(p,\ n-p-1)$ = (1, 78)
C.V. = $F_{.05;\,1,\,78} \approx 3.96$
F = 116.90 > 3.96: reject $H_0$, the model is significant
p-value < $\alpha$ = .05, reject $H_0$
Inferences about the Parameters – Overall Model Significance
• The F statistic is a monotone transform of $R^2$: F $= \dfrac{R^2/p}{(1-R^2)/(n-p-1)}$
• Essentially it is testing whether the $R^2$ is significantly bigger than 0
• For simple linear regression (only one predictor)
• F $=$ t$^2$, i.e. the F-test is equivalent to the two-tail t-test!!
Part IV
Prediction and Diagnostics
Prediction of New Observations – Point Prediction
Convert the given $X$-value into the same measurement scale as the observed $X$-values
As the estimated slope coefficient is scale dependent
Ideally, only use the regression equation to predict the $Y$-value when the given $X$-value is inside the observed data range
As we are not sure whether the linear relationship extends beyond the range of observed $X$-values
Example Cont’d
What is the estimated price for an apartment with gross floor area $X_h$ ft²?
Prediction given by the simple linear regression equation
$\hat{Y}_h = b_0 + b_1 X_h$
where $Y$ = Price in million HK$
$X$ = Gross floor area in ft²
The expected price for an apartment with gross floor area $X_h$ ft² is $\hat{Y}_h$
What is the estimated mean price for apartments with gross floor area $X_h$ ft²? – same estimate, but any differences?
Prediction of New Observations Cont’d
The predictions given by regression models fitted to different possible samples will vary
[Plot: several fitted lines yield different $\hat{Y}_i$ at the same $X_i$; which prediction should we trust?]
Prediction of New Observations – Interval Prediction Cont’d
Confidence interval estimate for the mean of the $Y$-variable given an $X$-value
$\hat{Y}_h \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{\dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$
where $\hat{Y}_h = b_0 + b_1 X_h$, and $X_h$ is the given $X$-value
R program
predict(m1,level=.95,interval="confidence")
Note that $\sum_{i=1}^{n}(X_i - \bar{X})^2 = (n-1)S_X^2$, where $S_X^2$ is the sample variance of $X$
Prediction of New Observations – Interval Prediction Cont’d
Prediction interval estimate for an individual $Y$-value given an $X$-value
$\hat{Y}_h \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$
where $\hat{Y}_h = b_0 + b_1 X_h$
R program
predict(m1,level=.95,interval="prediction")
It is still a type of confidence interval, although we are using the term prediction interval to differentiate them
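A sketch of interval prediction at a new X-value (500 ft² is a hypothetical value; m1 is the fitted Price ~ GrossFA model from the example):

new <- data.frame(GrossFA = 500)  # hypothetical gross floor area in ft^2
predict(m1, newdata = new, level = .95, interval = "confidence")  # CI for the mean price
predict(m1, newdata = new, level = .95, interval = "prediction")  # PI for an individual price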
Prediction of New Observations – Interval Prediction Cont’d
[Plot: the prediction interval for an individual $Y$-value is wider than the confidence interval for the mean of the $Y$-variable; both widen as $X_i$ moves away from $\bar{X}$]
Example Cont’d
Determine a 95% confidence interval for the mean apartment price for flats of $X_h$ ft² gross area
Also, construct a 95% prediction interval for the apartment price for a flat of $X_h$ ft² gross area
[R output of predict(…): the point estimate for the mean and the confidence interval for the mean]
Regression Assumptions
Linearity of regression equation
$\beta_0 + \beta_1 X$ is a linear function of $X$
Error normality
$\varepsilon_i$ has a normal distribution for all $i$
Constant variances of errors: $Var(\varepsilon_i) = \sigma^2$
Error independence
$\varepsilon_i$ are independent for all $i$
Residual Analysis
Check the regression assumptions by examining the residuals
Residuals (or errors), $e_i = Y_i - \hat{Y}_i$
Plot
Residuals against the predictor for checking linearity and constant variances
Residuals against index for checking error independence
Histogram of the residuals for examining error normality
(see the R sketch below)
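A sketch of the three diagnostic plots for the example model m1 (variable names from the example; the plotting choices are one possible layout):

res <- resid(m1)                      # residuals e_i = Y_i - Yhat_i
plot(GrossFA, res); abline(h = 0)     # vs predictor: linearity, constant variance
plot(res, type = "b"); abline(h = 0)  # vs index: independence
hist(res)                             # histogram: normality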
Residual Analysis Cont’d
[Residual plots of e against X:
• Residuals fall within a horizontal band centered around 0, displaying a random pattern: assumptions look fine
• Residuals have a systematic pattern: the X- and Y-variables do not have a linear relationship, but a curved one
• Error variance increases with the X-value: non-constant variance]
Residual Analysis Cont’d
[Residual plots of e against index (time):
• Residuals displaying a random pattern: independence looks fine
• Negative residuals are associated mainly with the early trials, and positive residuals with the later trials: the time the data were collected affects the residuals and Y-values, violating independence]
Residual Analysis Cont’d
35 35
30 30
25 25
20 20
%
%
15 15
10 10
5 5
0 0
-0.75 -0.5 -0.25 0 0.25 0.5 0.75 -0.75 -0.5 -0.25 0 0.25 0.5 0.75
e e
Residuals follow a
symmetrical and bell shape
Residuals are being right-
skewed
distribution
66
Summary
[Summary table: for the response, predictor, correlation, error, intercept, slope, expected value of $Y$, and a new observation of $Y$, the table lists the population version, the sample analogy, and the variance of the estimator (take the square root to get the standard error)]
Summary
ANOVA is the breakdown of variance / variations: $SST = SSR + SSE$
$R^2 = SSR/SST$ = a single number in $[0, 1]$ that quantifies the model-explained variation / measures the goodness of fit
t-statistic t $= b_1/S_{b_1}$ tests the significance of a single predictor, i.e. whether $\beta_1 = 0$
F-statistic F $= MSR/MSE$ tests the significance of the entire model, i.e. whether all $\beta_1 = \cdots = \beta_p = 0$, where $p$ is the number of predictors
In this chapter with a single $X$, F $=$ t$^2$ and $R^2 = r^2$
Point prediction and confidence interval prediction