Inference in regression
Brian Caffo, Jeff Leek and Roger Peng
Johns Hopkins Bloomberg School of Public Health
Recall our model and fitted values
Consider the model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

where $\epsilon_i \sim N(0, \sigma^2)$.

- We assume that the true model is known.
- We assume that you've seen confidence intervals and hypothesis tests before.

$$\hat \beta_0 = \bar Y - \hat \beta_1 \bar X$$

$$\hat \beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)}$$
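As a quick check of these two formulas (not part of the original derivation; the seed, coefficients, and sample size are arbitrary choices for illustration), the sketch below simulates data from the model and compares the results with lm():

set.seed(1)
xSim <- runif(100)
ySim <- 2 + 3 * xSim + rnorm(100, sd = 0.5)
b1 <- cor(ySim, xSim) * sd(ySim) / sd(xSim)   # slope via Cor(Y, X) Sd(Y) / Sd(X)
b0 <- mean(ySim) - b1 * mean(xSim)            # intercept via Ybar - b1 * Xbar
c(b0, b1)
coef(lm(ySim ~ xSim))                         # should agree with the two lines above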
Review
Statistics like $\frac{\hat \theta - \theta}{\hat \sigma_{\hat \theta}}$ often have the following properties.

1. They are normally distributed and have a finite-sample Student's T distribution if the estimated variance is replaced with a sample estimate (under normality assumptions).
2. They can be used to test $H_0 : \theta = \theta_0$ versus $H_a : \theta >, <, \neq \theta_0$.
3. They can be used to create a confidence interval for $\theta$ via $\hat \theta \pm Q_{1-\alpha/2} \hat \sigma_{\hat \theta}$, where $Q_{1-\alpha/2}$ is the relevant quantile from either a normal or T distribution.

In the case of regression with iid sampling assumptions and normal errors, our inferences will follow very similarly to what you saw in your inference class.

We won't cover asymptotics for regression analysis, but suffice it to say that under assumptions on the way in which the $X$ values are collected, the iid sampling model, and the mean model, the normal results still hold, so the same intervals and tests can be used.
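For example, the familiar one-sample t interval for a mean has exactly this "estimate $\pm$ quantile $\times$ standard error" form. A minimal sketch with arbitrary simulated data:

set.seed(2)
z <- rnorm(30, mean = 10, sd = 3)
# manual interval: estimate +/- t quantile * standard error
mean(z) + c(-1, 1) * qt(.975, df = length(z) - 1) * sd(z) / sqrt(length(z))
t.test(z)$conf.int   # matches the manual calculation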
Standard errors (conditioned on X)
$$Var(\hat \beta_1) = Var\left( \frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2} \right)$$

$$= \frac{Var\left( \sum_{i=1}^n Y_i (X_i - \bar X) \right)}{\left( \sum_{i=1}^n (X_i - \bar X)^2 \right)^2}$$

$$= \frac{\sum_{i=1}^n \sigma^2 (X_i - \bar X)^2}{\left( \sum_{i=1}^n (X_i - \bar X)^2 \right)^2}$$

$$= \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar X)^2}$$
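A rough Monte Carlo check of this result (my own illustration with an arbitrary fixed design, not part of the derivation): simulate many outcomes at the same $X$ values, re-estimate the slope each time, and compare the empirical variance with $\sigma^2 / \sum_{i=1}^n (X_i - \bar X)^2$.

set.seed(3)
xFix <- runif(50)        # fixed design, conditioned on throughout
sigmaTrue <- 2
beta1Hats <- replicate(5000, {
    ySim <- 1 + 2 * xFix + rnorm(50, sd = sigmaTrue)
    cor(ySim, xFix) * sd(ySim) / sd(xFix)
})
var(beta1Hats)                              # empirical variance of the slope estimates
sigmaTrue^2 / sum((xFix - mean(xFix))^2)    # theoretical variance from the derivation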
Results
$$\sigma_{\hat \beta_1}^2 = Var(\hat \beta_1) = \sigma^2 / \sum_{i=1}^n (X_i - \bar X)^2$$

$$\sigma_{\hat \beta_0}^2 = Var(\hat \beta_0) = \left( \frac{1}{n} + \frac{\bar X^2}{\sum_{i=1}^n (X_i - \bar X)^2} \right) \sigma^2$$

In practice, $\sigma$ is replaced by its estimate.

It's probably not surprising that under iid Gaussian errors

$$\frac{\hat \beta_j - \beta_j}{\hat \sigma_{\hat \beta_j}}$$

follows a $t$ distribution with $n - 2$ degrees of freedom and a normal distribution for large $n$.

This can be used to create confidence intervals and perform hypothesis tests.
Example diamond data set
library(UsingR); data(diamond)
y <- diamond$price; x <- diamond$carat; n <- length(y)
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
e <- y - beta0 - beta1 * x                        # residuals
sigma <- sqrt(sum(e^2) / (n - 2))                 # residual standard deviation estimate, n - 2 df
ssx <- sum((x - mean(x))^2)                       # sum of squares of the centered x's
seBeta0 <- (1 / n + mean(x)^2 / ssx)^.5 * sigma   # standard error of the intercept
seBeta1 <- sigma / sqrt(ssx)                      # standard error of the slope
tBeta0 <- beta0 / seBeta0; tBeta1 <- beta1 / seBeta1          # t statistics for H0: beta = 0
pBeta0 <- 2 * pt(abs(tBeta0), df = n - 2, lower.tail = FALSE) # two-sided p-values
pBeta1 <- 2 * pt(abs(tBeta1), df = n - 2, lower.tail = FALSE)
coefTable <- rbind(c(beta0, seBeta0, tBeta0, pBeta0), c(beta1, seBeta1, tBeta1, pBeta1))
colnames(coefTable) <- c("Estimate", "Std. Error", "t value", "P(>|t|)")
rownames(coefTable) <- c("(Intercept)", "x")
Example continued
coefTable

            Estimate Std. Error t value   P(>|t|)
(Intercept)   -259.6      17.32  -14.99 2.523e-19
x             3721.0      81.79   45.50 6.751e-40

fit <- lm(y ~ x)
summary(fit)$coefficients

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)   -259.6      17.32  -14.99 2.523e-19
x             3721.0      81.79   45.50 6.751e-40
Getting a confidence interval
sumCoef <- summary(fit)$coefficients
sumCoef[1,1] + c(-1, 1) * qt(.975, df = fit$df) * sumCoef[1, 2]
[1] -294.5 -224.8
sumCoef[2,1] + c(-1, 1) * qt(.975, df = fit$df) * sumCoef[2, 2]
[1] 3556 3886
With 95% confidence, we estimate that a 0.1 carat increase in diamond size results in a 355.6 to 388.6 increase in price in (Singapore) dollars.
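The per-0.1-carat numbers are just the slope interval above divided by 10:

# rescale the per-carat slope interval to a per-0.1-carat interval
(sumCoef[2, 1] + c(-1, 1) * qt(.975, df = fit$df) * sumCoef[2, 2]) / 10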
Prediction of outcomes
- Consider predicting $Y$ at a value of $X$:
  - Predicting the price of a diamond given the carat.
  - Predicting the height of a child given the height of the parents.
- The obvious estimate for prediction at point $x_0$ is $\hat \beta_0 + \hat \beta_1 x_0$.
- A standard error is needed to create a prediction interval.
- There's a distinction between intervals for the regression line at point $x_0$ and the prediction of what a $y$ would be at point $x_0$.
- Standard error for the line at $x_0$: $\hat \sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar X)^2}{\sum_{i=1}^n (X_i - \bar X)^2}}$.
- Standard error for a predicted $y$ at $x_0$: $\hat \sigma \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar X)^2}{\sum_{i=1}^n (X_i - \bar X)^2}}$ (a numerical check of both follows below).
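As a sketch using the quantities already computed for the diamond data (beta0, beta1, sigma, ssx, n, fit), both standard errors can be evaluated at a single hypothetical point, here x0 = 0.2 carats, and checked against predict():

x0 <- 0.2                                   # hypothetical carat value for illustration
yhat0 <- beta0 + beta1 * x0                 # fitted value at x0
seLine <- sigma * sqrt(1 / n + (x0 - mean(x))^2 / ssx)       # se for the line at x0
sePred <- sigma * sqrt(1 + 1 / n + (x0 - mean(x))^2 / ssx)   # se for a new y at x0
yhat0 + c(-1, 1) * qt(.975, df = n - 2) * seLine   # interval for the regression line
yhat0 + c(-1, 1) * qt(.975, df = n - 2) * sePred   # interval for a new observation
predict(fit, data.frame(x = 0.2), interval = "confidence")
predict(fit, data.frame(x = 0.2), interval = "prediction")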
Plotting the prediction intervals
plot(x, y, frame = FALSE, xlab = "Carat", ylab = "Dollars", pch = 21, col = "black", bg = "lightblue", cex = 2)
abline(fit, lwd = 2)
xVals <- seq(min(x), max(x), by = .01)
yVals <- beta0 + beta1 * xVals
se1 <- sigma * sqrt(1 / n + (xVals - mean(x))^2/ssx)
se2 <- sigma * sqrt(1 + 1 / n + (xVals - mean(x))^2/ssx)
lines(xVals, yVals + 2 * se1)
lines(xVals, yVals - 2 * se1)
lines(xVals, yVals + 2 * se2)
lines(xVals, yVals - 2 * se2)
(Figure: the diamond scatterplot, Carat vs. Dollars, with the fitted line, the narrower confidence bands, and the wider prediction bands drawn by the code above.)
Discussion
- Both intervals have varying widths.
  - Least width at the mean of the Xs (see the check below).
- We are quite confident in the regression line, so that interval is very narrow.
  - If we knew $\beta_0$ and $\beta_1$, this interval would have zero width.
- The prediction interval must incorporate the variability in the data around the line.
  - Even if we knew $\beta_0$ and $\beta_1$, this interval would still have width.
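A quick check of the width claim, reusing xVals, se1, and se2 from the plotting code: both standard errors are smallest at the grid point closest to mean(x).

xVals[which.min(se1)]   # where the confidence band is narrowest
xVals[which.min(se2)]   # where the prediction band is narrowest
mean(x)                 # both should be close to the mean of the observed x's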
In R
newdata <- data.frame(x = xVals)
p1 <- predict(fit, newdata, interval = ("confidence"))
p2 <- predict(fit, newdata, interval = ("prediction"))
plot(x, y, frame = FALSE, xlab = "Carat", ylab = "Dollars", pch = 21, col = "black", bg = "lightblue", cex = 2)
abline(fit, lwd = 2)
lines(xVals, p1[,2]); lines(xVals, p1[,3])
lines(xVals, p2[,2]); lines(xVals, p2[,3])
(Figure: the same Carat vs. Dollars plot, with the confidence and prediction bands produced by predict().)