Inference in the Normal Regression Model
Dr. Frank Wood
Remember
• Last class we derived the sampling variance of the estimator of the slope:
  σ²{b1} = σ² / Σ(Xi − X̄)²
• And we made the point that an estimate of σ²{b1} can be arrived at by substituting the MSE for the unknown error variance:
  s²{b1} = MSE / Σ(Xi − X̄)² = [SSE/(n − 2)] / Σ(Xi − X̄)²
Sampling Distribution of (b1 − β1)/s{b1}
• We determined that b1 is normally distributed, so (b1 − β1)/σ{b1} is a standard normal variable
• We don’t know σ{b1}, so it must be estimated from the data. We have already denoted its estimate s{b1}
• Using this estimate, it can be shown that
  (b1 − β1)/s{b1} ∼ t(n − 2), where s{b1} = √(s²{b1})
Where does this come from?
• We need to rely upon the following theorem
  – For the normal error regression model,
    SSE/σ² = Σ(Yi − Ŷi)²/σ² ∼ χ²(n − 2)
    and is independent of b0 and b1
• Intuitively this follows from the standard result for the sum of squared standard normal random variables
  – Here the two linear constraints imposed by the regression parameter estimation each reduce the number of degrees of freedom by one
Another useful fact: the t distribution
• Let z and χ²(ν) be independent random variables (standard normal N(0,1) and chi-square with ν degrees of freedom, respectively). We then define a t random variable as follows:
  t(ν) = z / √(χ²(ν)/ν)
This version of the t distribution has one
parameter, the degrees of freedom ν
Distribution of the studentized statistic
• To derive the distribution of this statistic using the provided theorems, first we do the following rewrite, in which the numerator is a standard normal N(0,1) variable:
  (b1 − β1)/s{b1} = [(b1 − β1)/σ{b1}] / [s{b1}/σ{b1}]
  with
  s{b1}/σ{b1} = √(s²{b1}/σ²{b1})
Studentized statistic cont.
• And note the following:
  s²{b1}/σ²{b1} = [MSE/Σ(Xi − X̄)²] / [σ²/Σ(Xi − X̄)²] = MSE/σ² = SSE/[σ²(n − 2)]
  where we know (by the given theorem) that the last term is a scaled χ² variable independent of b1 and b0:
  SSE/[σ²(n − 2)] ∼ χ²(n − 2)/(n − 2)
Studentized statistic final
• But by the given definition of the t distribution we have our result:
  (b1 − β1)/s{b1} ∼ z / √(χ²(n − 2)/(n − 2))
  because, putting everything together, we can see that
  (b1 − β1)/s{b1} ∼ t(n − 2)
Confidence Intervals and Hypothesis Tests
• Now that we know the sampling distribution of b1 (t with n − 2 degrees of freedom) we can construct confidence intervals and hypothesis tests easily
Confidence Interval for β1
• Since the “studentized” statistic follows a t distribution we can make the following probability statement:
  P(t(α/2; n − 2) ≤ (b1 − β1)/s{b1} ≤ t(1 − α/2; n − 2)) = 1 − α
[Figure: t distribution with ν = 10 — PDF, CDF, and inverse CDF (ICDF)]
Interval arising from picking α
• Note that by symmetry
t(α/2; n − 2) = −t(1 − α/2; n − 2)
• Rearranging terms and using this fact we
have
P (b1 − t(1 − α/2; n − 2)s{b1 } ≤ β1 ≤ b1 + t(1 − α/2; n − 2)s{b1 }) = 1 − α
• And now we can use a table to look up and
produce confidence intervals
Using tables for Computing Intervals
• The tables in the book (Table B.2 in the appendix) give t(1-α/2; ν) where
  – P{t(ν) ≤ t(1-α/2; ν)} = 1-α/2
• This provides the inverse CDF of the t distribution
• It can be arrived at computationally as well
  – Matlab: tinv(1-α/2, ν)
1-α confidence limits for β1
• The 1-α confidence limits for β1 are
  b1 ± t(1 − α/2; n − 2)s{b1}
• Note that this quantity can be used to calculate confidence intervals given n and α.
  – Fixing α can guide the choice of sample size if a particular confidence interval width is desired
  – Given a sample size, vice versa
• Also useful for hypothesis testing
Show demo.m
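A minimal Matlab sketch of this computation (in the spirit of demo.m, which is not reproduced here; it assumes the data are already loaded as column vectors X and Y, and uses tinv from the Statistics Toolbox):

  % 1-alpha confidence limits for beta_1
  n     = length(X);
  Xd    = X - mean(X);                           % deviations of X from its mean
  b1    = sum(Xd .* (Y - mean(Y))) / sum(Xd.^2); % least squares slope
  b0    = mean(Y) - b1*mean(X);                  % least squares intercept
  SSE   = sum((Y - (b0 + b1*X)).^2);             % error sum of squares
  MSE   = SSE / (n - 2);                         % estimate of sigma^2
  s_b1  = sqrt(MSE / sum(Xd.^2));                % estimated std. dev. of b1
  alpha = 0.05;
  tval  = tinv(1 - alpha/2, n - 2);              % inverse CDF of t(n-2)
  CI_b1 = [b1 - tval*s_b1, b1 + tval*s_b1]       % 1-alpha confidence limits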
Tests Concerning β1
• Example 1
– Two-sided test
• H0 : β1 = 0
• Ha : β1 ≠ 0
• Test statistic
  t* = (b1 − 0)/s{b1}
Tests Concerning β1
• We have an estimate of the sampling
distribution of b1 from the data.
• If the null hypothesis holds then the b1
estimate coming from the data should be
within the 95% confidence interval of the
sampling distribution centered at 0 (in this
case)
  t* = (b1 − 0)/s{b1}
Decision rules
if |t*| ≤ t(1 − α/2; n − 2), conclude H0
if |t*| > t(1 − α/2; n − 2), conclude Ha
• Absolute values make the test two-sided
Intuition
[Figure: t density with the 1-α confidence interval and the test statistic marked; x-axis (β̂ − β)/σ̂. The p-value is the value of α that moves the green line to the blue line.]
Calculating the p-value
• The p-value, or attained significance level, is
the smallest level of significance α for which
the observed data indicate that the null
hypothesis should be rejected.
• This can be looked up using the CDF of the
test statistic.
• In Matlab
– Two-sided p-value
• 2*(1-tcdf(|t*|,ν))
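A small continuation of the earlier Matlab sketch (reusing b1, s_b1, n, and alpha from above):

  tstar  = (b1 - 0) / s_b1;                        % test statistic under H0: beta1 = 0
  pval   = 2*(1 - tcdf(abs(tstar), n - 2));        % two-sided p-value
  reject = abs(tstar) > tinv(1 - alpha/2, n - 2)   % equivalent decision rule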
Inferences Concerning β0
• Largely, inference procedures regarding β0 can be performed in the same way as those for β1
• Remember the point estimator b0 for β0:
  b0 = Ȳ − b1X̄
Sampling distribution of b0
• The sampling distribution of b0 refers to the
different values of b0 that would be obtained
with repeated sampling when the levels of the
predictor variable X are held constant from
sample to sample.
• For the normal regression model the
sampling distribution of b0 is normal
Sampling distribution of b0
• When the error variance is known:
  E(b0) = β0
  σ²{b0} = σ²(1/n + X̄²/Σ(Xi − X̄)²)
• When the error variance is unknown:
  s²{b0} = MSE(1/n + X̄²/Σ(Xi − X̄)²)
Confidence interval for β0
• The 1-α confidence limits for β0 are obtained in the same manner as those for β1:
b0 ± t(1 − α/2; n − 2)s{b0 }
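Continuing the Matlab sketch (b0, MSE, Xd, n, and tval as computed earlier):

  s_b0  = sqrt(MSE * (1/n + mean(X)^2 / sum(Xd.^2)));  % estimated std. dev. of b0
  CI_b0 = [b0 - tval*s_b0, b0 + tval*s_b0]             % 1-alpha limits for beta_0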
Considerations on Inferences on β0 and β1
• Effects of departures from normality
– The estimators of β0 and β1 have the property of
asymptotic normality – their distributions
approach normality as the sample size increases
(under general conditions)
• Spacing of the X levels
– The variances of b0 and b1 (for a given n and σ)
depend strongly on the spacing of X
Sampling distribution of point estimator of mean response
• Let Xh be the level of X for which we would
like an estimate of the mean response
– Xh may be one of the observed X’s or any other value of the predictor within the scope of the model
• The mean response when X=Xh is denoted by
E{Yh}
• The point estimator of E{Yh} is
Ŷh = b0 + b1 Xh
We are interested in the sampling distribution
of this quantity
Sampling Distribution of Ŷh
• We have
Ŷh = b0 + b1 Xh
• Since this quantity is itself a linear combination of the Yi’s, its sampling distribution is itself normal.
• The mean of the sampling distribution is
E{Ŷh } = E{b0 } + E{b1 }Xh = β0 + β1 Xh
Biased or unbiased?
Sampling Distribution of Ŷh
• To derive the sampling distribution variance
of the mean response we first show that b1
and (1/n) ∑ Yi are uncorrelated and, hence,
for the normal error regression model
independent
• We start with the definitions
  Ȳ = (1/n) Σ Yi
  b1 = Σ ki Yi,  ki = (Xi − X̄)/Σ(Xi − X̄)²
Sampling Distribution of Ŷh
• We want to show that the mean of Y (Ȳ) and the estimate b1 are uncorrelated:
  Cov(Ȳ, b1) = σ²{Ȳ, b1} = 0
• To do this we need the following result (A.32)
  σ²{Σi ai Yi, Σi ci Yi} = Σi ai ci σ²{Yi}
when the Yi are independent
Sampling Distribution of Ŷh
• Using this fact we have
  σ²{(1/n) Σ Yi, Σ ki Yi} = Σ (1/n) ki σ²{Yi}   (from the appendix result)
  = (σ²/n) Σ ki
  = 0   since Σ ki = 0
So the mean of Y and b1 are uncorrelated
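This hinges on Σ ki = 0, which holds because Σ (Xi − X̄) = 0; a quick numerical check in Matlab with hypothetical X values:

  X = [1; 3; 4; 7; 10];                        % hypothetical predictor values
  k = (X - mean(X)) ./ sum((X - mean(X)).^2);  % the ki coefficients
  sum(k)                                       % zero up to floating point error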
Sampling Distribution of Ŷh
• This means that we can write down the variance
  σ²{Ŷh} = σ²{Ȳ + b1(Xh − X̄)}
  (using the alternative and equivalent form of the regression function)
• But we know that the mean of Y and b1 are uncorrelated, so
  σ²{Ŷh} = σ²{Ȳ} + σ²{b1}(Xh − X̄)²
Sampling Distribution of Ŷh
• We know (from last lecture)
  σ²{b1} = σ²/Σ(Xi − X̄)²
  s²{b1} = MSE/Σ(Xi − X̄)²
• And we can find
  σ²{Ȳ} = (1/n²) σ²{Σ Yi} = nσ²/n² = σ²/n
Sampling Distribution of Ŷh
• So, plugging in, we get
  σ²{Ŷh} = σ²/n + [σ²/Σ(Xi − X̄)²](Xh − X̄)²
• Or
  σ²{Ŷh} = σ²(1/n + (Xh − X̄)²/Σ(Xi − X̄)²)
Sampling Distribution of Ŷh
• Since we often won’t know σ², we can, as usual, plug in s² = SSE/(n − 2), our estimate of it, to get our estimate of this sampling distribution variance:
  s²{Ŷh} = s²(1/n + (Xh − X̄)²/Σ(Xi − X̄)²)
No surprise…
• The studentized point estimator for the output is distributed as a t distribution with n − 2 degrees of freedom:
  (Ŷh − E{Yh})/s{Ŷh} ∼ t(n − 2)
• This means that we can construct confidence
intervals in the same manner as before.
Confidence Intervals for E{Yh}
• The 1-α confidence intervals for E{Yh} are
Ŷh ± t(1 − α/2; n − 2)s{Ŷh }
• From this hypothesis tests can be constructed
as usual.
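In the running Matlab sketch, the interval at a hypothetical level Xh = 5 would be:

  Xh    = 5;                                   % hypothetical level of X
  Yh    = b0 + b1*Xh;                          % point estimate of E{Yh}
  s_Yh  = sqrt(MSE * (1/n + (Xh - mean(X))^2 / sum(Xd.^2)));
  CI_Yh = [Yh - tval*s_Yh, Yh + tval*s_Yh]     % 1-alpha limits for E{Yh}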
Comments
• The variance of the estimator for E{Yh} is
smallest near the mean of X. Designing
studies such that the mean of X is near Xh will
improve inference precision
• When Xh is zero the variance of the estimator for E{Yh} reduces to the variance of the estimator b0 for β0
Prediction interval for single new observation
• Essentially follows the sampling distribution
arguments for E{Yh}
• If all regression parameters are known then
the 1-α prediction interval for a new
observation Yh is
E{Yh } ± z(1 − α/2)σ
Prediction interval for single new observation
• If the regression parameters are unknown the 1-α
prediction interval for a new observation Yh is given
by the following theorem
Ŷh ± t(1 − α/2; n − 2)s{pred}
• This is very nearly the same as prediction for a known value of X, but it includes a correction for the additional variability arising from the fact that the new input location was not used in the original estimates of b1, b0, and s²
Prediction interval for single new observation
• The value of s²{pred} is given by
  s²{pred} = MSE(1 + 1/n + (Xh − X̄)²/Σ(Xi − X̄)²)
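Completing the Matlab sketch (same hypothetical Xh as before):

  s_pred = sqrt(MSE * (1 + 1/n + (Xh - mean(X))^2 / sum(Xd.^2)));
  PI     = [Yh - tval*s_pred, Yh + tval*s_pred]   % wider than the CI for E{Yh}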