DATA 503 – Applied Regression Analysis
Lecture 9: Linear Model, Inference, and Prediction Highlights
By Dr. Ellie Small
Overview

Topics:
• Initial Data Analysis
• Linear Model
• Identifiability and Orthogonality
• Compare Two Models
• Hypothesis Tests for Parameters
• Permutation Tests
• Confidence Intervals and Regions
• Bootstrap Confidence Intervals
• Predictions
Initial Data Analysis

We first check the data for errors (often data entry errors):
• summary(data) in R. Look for:
₋ Unreasonable ranges – investigate and correct impossible minimum/maximum values
₋ Coding of missing values – set them to NA
₋ Variables that should have been designated as factors (few distinct values) – factor(var) in R
• Check graphs for unusual behavior/effects:
₋ Histogram of a single variable – hist(var)
₋ Density of a single variable – plot(density(var))
₋ Scatterplot for 2 variables – plot(var1~var2)
₋ Grouped boxplot for 2 variables where var2 is a factor – plot(var1~var2)
A minimal sketch of these checks follows.
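In R, a minimal sketch assuming a hypothetical data frame dat with a numeric variable age (where 999 codes a missing value) and a coded variable group:

summary(dat)                          # scan for unreasonable ranges and odd codes
dat$age[dat$age == 999] <- NA         # recode the missing-value code to NA
dat$group <- factor(dat$group)        # few distinct values: designate as a factor
hist(dat$age)                         # histogram of a single variable
plot(density(dat$age, na.rm = TRUE))  # density of a single variable
plot(age ~ group, data = dat)         # grouped boxplot, since group is a factor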
Linear Model

$Y = \mathbf{x}'\boldsymbol{\beta} + \varepsilon$ (for one case), where $\mathbf{x} = (1, x_2, \ldots, x_p)' \in \mathbb{R}^p$ holds the predictors.

$\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ (for a data set of size $n$),

where the model matrix is

$X_{n \times p} = \begin{pmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_n' \end{pmatrix} = \begin{pmatrix} 1 & x_{12} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & \cdots & x_{np} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n & \mathbf{x}_{(2)} & \cdots & \mathbf{x}_{(p)} \end{pmatrix}$

$\mathbf{x}_i$ is the ith set of predictors (for case i), while $\mathbf{x}_{(j)}$ contains all values for the jth predictor (or variable).

The response vector, error vector, and parameter vector are $\mathbf{Y} = (Y_1, \ldots, Y_n)'$, $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)'$, and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)'$.

Assumptions: $E(\boldsymbol{\varepsilon}) = \mathbf{0}$, $\mathrm{Var}(\boldsymbol{\varepsilon}) = \sigma^2 I_n$.
Linear Model - 2

Estimation:
$\mathbf{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}} = \hat{\mathbf{Y}} + \hat{\boldsymbol{\varepsilon}}$, where $\hat{\mathbf{Y}} = X\hat{\boldsymbol{\beta}} = P_X\mathbf{Y}$ is the fitted value vector and $\hat{\boldsymbol{\varepsilon}}$ is the residual vector; $\mathrm{RSS} = \|\hat{\boldsymbol{\varepsilon}}\|^2$, $\hat{\sigma}^2 = \mathrm{RSS}/(n-p)$, and $df = n-p$.

$\hat{\boldsymbol{\beta}}$, the regression coefficient vector, is the least squares estimate: $X\hat{\boldsymbol{\beta}}$ minimizes the RSS over all linear combinations of the column vectors of X.

$SS_{tc}$ (the sum of squares, total corrected) is the RSS for the model without predictors. $R^2$ is the proportion of variance explained by the model, or the improvement of the model compared to the model without predictors: $R^2 = (SS_{tc} - \mathrm{RSS})/SS_{tc}$.

Normal equations: $X'X\hat{\boldsymbol{\beta}} = X'\mathbf{Y}$. If $\mathrm{rank}(X_{n \times p}) = p$, then $\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{Y}$. The normal equations can also be solved directly, as in the sketch below.
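A minimal R sketch of solving the normal equations by hand; the data here are hypothetical, made up purely for illustration:

set.seed(1)                                      # hypothetical example data
n <- 50; X <- cbind(1, rnorm(n), rnorm(n)); y <- X %*% c(2, 1, -1) + rnorm(n)
betahat <- solve(crossprod(X), crossprod(X, y))  # solves X'X b = X'y
yhat <- X %*% betahat                            # fitted value vector
RSS <- sum((y - yhat)^2)                         # residual sum of squares
sigma2hat <- RSS / (n - ncol(X))                 # RSS/(n - p)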
Linear Model - 3

In R:
lmod=lm(response~vars,data=data)
summary(lmod)            # gives all the summary information
fitted(lmod)             # fitted value vector
residuals(lmod)          # residual vector
deviance(lmod)           # RSS
coef(lmod)               # regression coefficient vector
df.residual(lmod)        # df = n - p
lmod$rank                # rank(X)
summary(lmod)$sigma      # sigma-hat
summary(lmod)$r.squared  # R^2
Identifiability and Orthogonality

Identifiability
Normal equations: $X'X\hat{\boldsymbol{\beta}} = X'\mathbf{Y}$.
If $\mathrm{rank}(X) \neq p$, then at least one of the variables (columns of X) is a linear combination of the others. This means that $X'X$ is not invertible, and the system of linear equations given by the normal equations has many solutions for $\hat{\boldsymbol{\beta}}$.
In summary(lmod), check whether any of the $\hat{\beta}_i$ are set (by R) to NA, or whether $\mathrm{rank}(X) \neq p$ (via lmod$rank); in either case one or more of the variables (columns of X) are a linear combination of the others. Check the relationships and remove the appropriate variable(s), as in the sketch below.

Orthogonality
If the columns of the model matrix (the variables) are orthogonal, then any model with a subset of those variables will have the same estimates for the parameters of those variables, i.e. their regression coefficients $\hat{\beta}_i$ are equal between the models. Note, however, that the estimate of the error variance will differ between the models, which will affect the CI of each $\hat{\beta}_i$.
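A minimal sketch of spotting and fixing unidentifiability; the data and variable names are hypothetical:

set.seed(2)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$x3 <- d$x1 + d$x2                 # x3 is a linear combination of x1 and x2
d$y <- 1 + d$x1 - d$x2 + rnorm(30)
lmod <- lm(y ~ x1 + x2 + x3, data = d)
coef(lmod)                          # the coefficient of x3 is NA
lmod$rank                           # 3, while p = 4: X is rank deficient
lmod <- lm(y ~ x1 + x2, data = d)   # remove the redundant variable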
Compare Two Models

Assume we have normality, i.e. $\boldsymbol{\varepsilon} \sim N_n(\mathbf{0}, \sigma^2 I)$.

Model $\Omega$: $\mathbf{Y} = X_{n \times p}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ vs. Model $\omega$: $\mathbf{Y} = X_{\omega}\boldsymbol{\beta}_{\omega} + \boldsymbol{\varepsilon}_{\omega}$, with $X_{\omega}$ of size $n \times q$ and $X_{\omega} \in M(X)$.

$H_0$: $\mathbf{Y} = X_{\omega}\boldsymbol{\beta}_{\omega} + \boldsymbol{\varepsilon}_{\omega}$   $H_1$: $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

If $X_{\omega}$ contains the first q columns of X, then this is equivalent to:
$H_0$: $\boldsymbol{\beta}_r = \mathbf{0}$   $H_1$: $\boldsymbol{\beta}_r \neq \mathbf{0}$, where $\boldsymbol{\beta}_r = (\beta_{q+1}, \ldots, \beta_p)'$.

If $H_0$ holds, i.e. there is no relationship between $\boldsymbol{\beta}_r$ and $\mathbf{Y}$, then the difference between $RSS_{\omega}$ and $RSS$ is random (note that $RSS_{\omega} \geq RSS$), and $F = \dfrac{(RSS_{\omega} - RSS)/(p-q)}{RSS/(n-p)}$ follows an $F_{p-q,\,n-p}$ distribution. If $F$ is much larger than expected, that is evidence against $H_0$ (note that $df = n-p$ and $df_{\omega} = n-q$).

We reject $H_0$ if the p-value $P(F_{p-q,\,n-p} > F) < \alpha$, and fail to reject otherwise.

In R: lmod=lm(response~vars,data=data); lmodo=lm(response~varso,data=data); anova(lmodo,lmod). The anova table displays both the F-statistic and the p-value; a worked sketch follows.
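A minimal worked sketch with hypothetical data, testing whether x2 and x3 add anything beyond x1:

set.seed(3)
d <- data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
d$y <- 1 + 2*d$x1 + rnorm(40)
lmod  <- lm(y ~ x1 + x2 + x3, data = d)  # model Omega
lmodo <- lm(y ~ x1, data = d)            # model omega, nested in Omega
anova(lmodo, lmod)                       # F-statistic and p-value for H0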
Compare Two Models - 2

Special case (also under normality):
Model $\Omega$: $\mathbf{Y} = X_{n \times p}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ vs. Model $\omega$: $\mathbf{Y} = \mathbf{1}\beta_{\omega} + \boldsymbol{\varepsilon}_{\omega}$, i.e. no predictors.

$H_0$: $\boldsymbol{\beta}_r = \mathbf{0}$   $H_1$: $\boldsymbol{\beta}_r \neq \mathbf{0}$, where $\boldsymbol{\beta}_r = (\beta_2, \ldots, \beta_p)'$.

For this case we have $RSS_{\omega} = SS_{tc}$. We define $SS_{reg} = SS_{tc} - \mathrm{RSS}$, and so we have $F = \dfrac{SS_{reg}/(p-1)}{RSS/(n-p)}$, which follows an $F_{p-1,\,n-p}$ distribution under $H_0$. If $F$ is much larger than expected, that is evidence against $H_0$.

We reject $H_0$ if the p-value $P(F_{p-1,\,n-p} > F) < \alpha$, and fail to reject otherwise.

In R: lmod=lm(response~vars,data=data)
R performs this special case automatically when you run a linear model; both the F-score and the p-value are displayed at the bottom of the summary output obtained via summary(lmod). They can also be extracted directly, as in the sketch below.
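Extracting the overall F-test from a fitted model (continuing the hypothetical lmod above):

fstat <- summary(lmod)$fstatistic     # named vector: value, numdf, dendf
1 - pf(fstat[1], fstat[2], fstat[3])  # the p-value shown in summary(lmod)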
Hypothesis Tests for Parameters

Under normality:
$H_0$: $\beta_i = c$ in $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$   $H_1$: $\beta_i \neq c$ in $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

We can perform a t-test for this case: $t = \dfrac{\hat{\beta}_i - c}{se(\hat{\beta}_i)}$, which follows a $t_{n-p}$ distribution under $H_0$. If $|t|$ is much larger than expected, that is evidence against $H_0$.

$se(\hat{\beta}_i)$ is found by taking the square root of the ith diagonal element of $\hat{\sigma}^2(X'X)^{-1}$. In R, it is found next to the appropriate regression coefficient in the summary of the linear model (summary(lmod)).

We reject $H_0$ if the p-value $P(t_{n-p} < -|t| \text{ or } t_{n-p} > |t|) < \alpha$, and fail to reject otherwise.

In R: lmod=lm(response~vars,data=data)
Calculate t using the formula above (t=(coef(summary(lmod))[i,1]-c)/coef(summary(lmod))[i,2]); then 2*(1-pt(abs(t),n-p)) gives the p-value, as in the sketch below.
For the special case where $c = 0$, the t-score and the p-value are displayed in the summary of the linear model (summary(lmod)) next to $se(\hat{\beta}_i)$.
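A minimal sketch testing $H_0$: $\beta_i = c$ for the hypothetical choices i = 2 and c = 1, continuing the lmod above:

i <- 2; c0 <- 1                          # hypothetical index and null value
ctab <- coef(summary(lmod))              # columns: Estimate, Std. Error, ...
t <- (ctab[i, 1] - c0) / ctab[i, 2]      # t-score for H0: beta_i = c0
2 * (1 - pt(abs(t), df.residual(lmod)))  # two-sided p-value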
Permutation Tests

1) Assuming normality does NOT hold, we want to test two models with $X_{\omega} \in M(X)$:
$H_0$: $\mathbf{Y} = X_{\omega}\boldsymbol{\beta}_{\omega} + \boldsymbol{\varepsilon}_{\omega}$   $H_1$: $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

We still calculate $F = \dfrac{(RSS_{\omega} - RSS)/(p-q)}{RSS/(n-p)}$ (in R: anova(lmodo,lmod)[2,5]), but $F$ doesn't follow an $F_{p-q,\,n-p}$ distribution under $H_0$. Instead we build a distribution to compare $F$ to.

If $\omega$ is the model with no predictors (intercept only), randomly permute the responses, run a linear model for each permutation (in R: update(lmod,sample(y)~.,data)), and calculate the $F$ for the permuted model (in R: summary(lmod)$fstat[1]). We do this many times. The p-value then equals the proportion of permuted $F$s that are larger than the original $F$; see the sketch after this slide.

Otherwise, we can permute the variables not in $\omega$ and calculate the $F$-score for the comparison of the two models. Do this many times so we have a distribution of $F$-scores. The p-value, once again, equals the proportion of permuted $F$s that are larger than the original $F$ (in R: mean(permuted Fs > original F)).

2) Assuming normality does NOT hold, we want to test whether one of the parameter values equals 0.
$H_0$: $\beta_i = 0$ in $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$   $H_1$: $\beta_i \neq 0$ in $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

For this case we first calculate the usual $t = \dfrac{\hat{\beta}_i}{se(\hat{\beta}_i)}$ (in R: coef(summary(lmod))[i,1]/coef(summary(lmod))[i,2]). Then we permute the values of $\mathbf{x}_{(i)}$ and calculate the $t$-score; we do this many times to get a distribution of those $t$-scores. The two-sided p-value equals the proportion of permuted $|t|$s that are at least as large as the original $|t|$ (in R: mean(abs(permuted ts) >= abs(original t))).
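A minimal sketch of permutation test 1) against the intercept-only model, continuing the hypothetical d and lmod above:

lmod <- lm(y ~ x1 + x2 + x3, data = d)
f0 <- summary(lmod)$fstatistic[1]                   # original F-score
fstats <- replicate(4000, {
  permmod <- update(lmod, sample(y) ~ ., data = d)  # permute the responses
  summary(permmod)$fstatistic[1]                    # F for the permuted model
})
mean(fstats > f0)                                   # permutation p-value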
Confidence Intervals and Regions

Confidence Intervals:
If normality holds, i.e. $\boldsymbol{\varepsilon} \sim N_n(\mathbf{0}, \sigma^2 I)$, and $\mathrm{rank}(X) = p$, then the confidence interval (CI) for any $\beta_i$ is:
$\hat{\beta}_i \pm t_{n-p,\,\alpha/2} \cdot se(\hat{\beta}_i)$
(in R: confint(lmod)[i,]).

Confidence Regions:
If normality holds, i.e. $\boldsymbol{\varepsilon} \sim N_n(\mathbf{0}, \sigma^2 I)$, and $\mathrm{rank}(X) = p$, the confidence region for $\beta_i$ and $\beta_j$ simultaneously is an ellipse.
In R (using the ellipse package): plot(ellipse(lmod,c(i,j)),type="l").
To add the center: points(coef(lmod)[i], coef(lmod)[j]).
To add the individual CIs: abline(v=confint(lmod)[i,]); abline(h=confint(lmod)[j,]). A combined sketch follows.
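A combined sketch for the hypothetical coefficient indices i = 2 and j = 3, continuing the lmod above; assumes the ellipse package is installed:

library(ellipse)
plot(ellipse(lmod, c(2, 3)), type = "l")        # joint confidence ellipse
points(coef(lmod)[2], coef(lmod)[3], pch = 19)  # center: the point estimates
abline(v = confint(lmod)[2, ], lty = 2)         # individual CI for beta_2
abline(h = confint(lmod)[3, ], lty = 2)         # individual CI for beta_3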
Bootstrap Confidence Intervals

If normality does NOT hold, we create bootstrap confidence intervals. First we estimate $\mathbf{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}}$ for the model $\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ the usual way. Then we create an error distribution for $\hat{\boldsymbol{\beta}}$ as follows:
1. Generate $\boldsymbol{\varepsilon}^*$ by sampling with replacement from $\hat{\boldsymbol{\varepsilon}}$ (in R: boote=sample(residuals(lmod),replace=TRUE)).
2. Form $\mathbf{Y}^* = \hat{\mathbf{Y}} + \boldsymbol{\varepsilon}^*$ (in R: bootY=fitted(lmod)+boote).
3. Calculate $\hat{\boldsymbol{\beta}}^*$ for $\mathbf{Y}^* = X\boldsymbol{\beta}^* + \boldsymbol{\varepsilon}^*$ (in R: bootlmod=update(lmod,bootY~.), then bootbeta=coef(bootlmod)).
We do this many times until we have a distribution of bootstrap betas. We can obtain variances, standard errors, and CIs from this distribution (CIs in R: quantile(bootbetas,c(alpha/2,1-alpha/2))). A full loop is sketched below.
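A minimal residual-bootstrap sketch, continuing the hypothetical lmod above:

nb <- 4000
bootbetas <- matrix(NA, nb, length(coef(lmod)))
for (b in seq_len(nb)) {
  boote <- sample(residuals(lmod), replace = TRUE)  # resample residuals
  bootY <- fitted(lmod) + boote                     # bootstrap response
  bootbetas[b, ] <- coef(update(lmod, bootY ~ .))   # refit, keep coefficients
}
quantile(bootbetas[, 2], c(0.025, 0.975))  # 95% bootstrap CI for coefficient 2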
Predictions

We found an estimated model $\mathbf{Y} = X\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\varepsilon}}$, which for one case with predictors $\mathbf{x}$ equals $\hat{Y} = \mathbf{x}'\hat{\boldsymbol{\beta}} + \hat{\varepsilon}$.

For a new set of predictors $\mathbf{x}_0 = (1, x_{02}, \ldots, x_{0p})'$, we can now estimate the response: $\hat{Y}_0 = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$.

In R: y0=crossprod(x0,coef(lmod)) or predict(lmod,new=data.frame(t(x0))), where in the latter case the vector x0 must have the correct variable names.

NOTE: Since $\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(X'X)^{-1}$, we have $\mathrm{Var}(\hat{Y}_0) = \sigma^2\mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0$.

• Prediction Interval (PI) for the prediction of a future observation: $\hat{Y}_0 \pm t_{n-p,\,\alpha/2} \cdot \hat{\sigma}\sqrt{1 + \mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0}$
(in R: predict(lmod,new=data.frame(t(x0)),interval="prediction"); bear in mind the vector x0 must have the correct variable names)

• Confidence Interval (CI) for the prediction of a future mean response: $\hat{Y}_0 \pm t_{n-p,\,\alpha/2} \cdot \hat{\sigma}\sqrt{\mathbf{x}_0'(X'X)^{-1}\mathbf{x}_0}$
(in R: predict(lmod,new=data.frame(t(x0)),interval="confidence"); bear in mind the vector x0 must have the correct variable names)

A short worked sketch follows.
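A worked prediction sketch with the hypothetical predictors x1, x2, x3 and the lmod from above:

x0 <- c(1, 0.5, -1, 2)                        # intercept first, then x1, x2, x3
crossprod(x0, coef(lmod))                     # point estimate x0' betahat
newd <- data.frame(x1 = 0.5, x2 = -1, x3 = 2) # names must match the model
predict(lmod, newdata = newd, interval = "prediction")  # PI for an observation
predict(lmod, newdata = newd, interval = "confidence")  # CI for the mean response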
