Lect 6
1 The model
yi = β0 + β1 xi1 + β2 xi2 + · · · + βk xik + ǫi , i = 1, 2, · · · , n, (1)
or, in matrix form,
y = Xβ + ǫ.
The assumptions for ǫi and yi are analogous to those for simple linear regression, namely
1. E(ǫ) = 0 or E(y) = Xβ.
2. cov(ǫ) = σ²I or cov(y) = σ²I.
The least squares estimator of β is
β̂ = (X′X)−1X′y.
Proof: Exercise.
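As a numerical check of the least squares formula above, the following minimal sketch (assuming Python with NumPy and simulated data; all names are illustrative) forms β̂ = (X′X)−1X′y and compares it with NumPy's own least squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                        # n observations, k regressors plus an intercept

# Simulated data from the model y = X beta + eps with iid errors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Least squares estimate: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares routine
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
print(np.allclose(beta_hat, beta_hat_lstsq))   # expect True
```

Solving the normal equations with np.linalg.solve, rather than forming (X′X)−1 explicitly, is the numerically preferable route.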
Consider a general linear unbiased estimator Ay of β. Unbiasedness requires
AX = I,
since the relationship AXβ = β must hold for any possible value of β.
The covariance matrix for Ay is
cov(Ay) = A cov(y) A′ = σ²AA′.
The variances of the components of Ay (the candidate estimators of the βj's) are on the diagonal of σ²AA′, and therefore we need to choose A (subject to AX = I) so that the diagonal elements of AA′ are minimized. The minimizing choice turns out to be A = (X′X)−1X′, for which
Ay = (X′X)−1X′y = β̂.
Corollary 2.1 If E(y) = Xβ and cov(y) = σ²I, the best linear unbiased estimator of a′β is a′β̂, where β̂ = (X′X)−1X′y.
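The Gauss–Markov argument above can be illustrated by simulation: any matrix A with AX = I gives a linear unbiased estimator Ay, and the empirical variances of its components should be no smaller than those of β̂. The sketch below (an illustrative setup, assuming NumPy) uses a weighted estimator (X′WX)−1X′Wy with an arbitrary positive definite W as the competitor.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 0.5, -1.0])
W = np.diag(rng.uniform(0.2, 5.0, size=n))      # arbitrary positive definite weight matrix

A_ols = np.linalg.solve(X.T @ X, X.T)           # A = (X'X)^{-1} X', so Ay = beta_hat
A_alt = np.linalg.solve(X.T @ W @ X, X.T @ W)   # another A with AX = I (hence unbiased)

ols_draws, alt_draws = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(size=n)           # cov(eps) = I, i.e. sigma^2 = 1
    ols_draws.append(A_ols @ y)
    alt_draws.append(A_alt @ y)

# Empirical variances of each estimated coefficient: OLS should be (weakly) smaller
print(np.var(ols_draws, axis=0))
print(np.var(alt_draws, axis=0))
```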
A fourth property of β̂ is the following: the predicted value ŷ = β̂0 + β̂1 x1 + · · · + β̂k xk = β̂′x is invariant to simple linear changes of scale on the x's, where x = (1, x1, x2, · · · , xk)′.
Hence, writing z = Dx for the rescaled vector (with D diagonal and nonsingular), so that β̂z = D−1β̂,
β̂′z z = (D−1β̂)′Dx = β̂′x.
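This invariance is easy to verify numerically; the brief sketch below (assuming NumPy, with hypothetical scale factors) rescales the non-intercept columns of X and checks that the fitted values do not change.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, -1.0, 0.5]) + rng.normal(size=n)

def fitted(M, y):
    """Fitted values from least squares regression of y on the columns of M."""
    b = np.linalg.solve(M.T @ M, M.T @ y)
    return M @ b

# Rescale the columns: z_j = d_j * x_j, i.e. Z = X D
D = np.diag([1.0, 10.0, 0.01, 3.5])
Z = X @ D

print(np.allclose(fitted(X, y), fitted(Z, y)))   # expect True: y_hat is unchanged
```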
2.3 An estimator for σ²
By assumption 1, E(yi) = x′i β, and by assumption 2, σ² = E[yi − E(yi)]². Hence
σ² = E(yi − x′i β)².
This suggests the estimator
s² = (1/(n − k − 1)) Σ_{i=1}^{n} (yi − x′i β̂)²
   = (y − Xβ̂)′(y − Xβ̂)/(n − k − 1)
   = SSE/(n − k − 1).
With the denominator n − k − 1, s² is an unbiased estimator of σ²:
E(s²) = σ².
Proof: Exercise.
The estimated covariance matrix of β̂ is
ĉov(β̂) = s²(X′X)−1.
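In code, SSE, s², and ĉov(β̂) follow directly from the formulas above; a minimal sketch (assuming NumPy and simulated data) is given below, with the square roots of the diagonal of ĉov(β̂) giving the standard errors of the β̂j's.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, -2.0, 0.7]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

SSE = resid @ resid                       # (y - X beta_hat)'(y - X beta_hat)
s2 = SSE / (n - k - 1)                    # unbiased estimator of sigma^2
cov_hat = s2 * np.linalg.inv(X.T @ X)     # estimated covariance matrix of beta_hat
se = np.sqrt(np.diag(cov_hat))            # standard errors of the beta_hat_j

print(s2, se)
```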
4.1 Assumptions
Normality assumption: y is Nn(Xβ, σ²I).
Under this assumption, the maximum likelihood estimators of β and σ² are
β̂ = (X′X)−1X′y,
σ̂² = (1/n)(y − Xβ̂)′(y − Xβ̂).
Proof: Exercise.
The maximum likelihood estimator β̂ is the same as the least squares estimator β̂. The estimator σ̂² is biased since the denominator is n rather than n − k − 1. We often use the unbiased estimator s² to estimate σ².
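The bias of σ̂² can be seen in a small simulation; the sketch below (hypothetical sample size and σ², assuming NumPy) averages σ̂² = SSE/n and s² = SSE/(n − k − 1) over many replicates, the former falling below the true σ² and the latter landing close to it.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2, reps = 25, 3, 4.0, 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0, 0.5])

mle, unbiased = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    SSE = resid @ resid
    mle.append(SSE / n)                   # sigma_hat^2 (MLE, biased low)
    unbiased.append(SSE / (n - k - 1))    # s^2 (unbiased)

# E(sigma_hat^2) = sigma^2 (n - k - 1)/n, so the first average sits below sigma2 = 4.0
print(np.mean(mle), np.mean(unbiased))
```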
(iii) Since β̂ = (X′X)−1X′y and
Theorem 4.3 If y is Nn(Xβ, σ²I), then β̂ and σ̂² are jointly sufficient for β and σ².
Proof: Using the Neyman factorization theorem; for details, see Rencher and Schaalje (2008, pp. 159-160).
Since β̂ and σ̂² are jointly sufficient for β and σ², no other estimators can improve on the information they extract from the sample to estimate β and σ². Thus, it is not surprising that β̂ and s² are minimum variance unbiased estimators.
Theorem 4.4 If y is Nn(Xβ, σ²I), then β̂ and s² have minimum variance among all unbiased estimators.
5 R² in fixed-x regression
The proportion of the total sum of squares due to regression is measured by
R² = SSR/SST = 1 − SSE/SST,
where SST = Σ_{i=1}^{n} (yi − ȳ)², SSR = Σ_{i=1}^{n} (ŷi − ȳ)² = β̂′X′y − nȳ², and SSE = Σ_{i=1}^{n} (yi − ŷi)² = (y − Xβ̂)′(y − Xβ̂). Some properties of R² follow.
1. The range of R² is 0 ≤ R² ≤ 1. If all the β̂j's were zero, except for β̂0, R² would be zero. (This event has probability zero for continuous data.) If all the y-values fell on the fitted surface, that is, if yi = ŷi, i = 1, 2, · · · , n, then R² would be 1.
2. R = ryŷ ; that is, the multiple correlation is equal to the simple correlation
between the observed yi ’s and the fitted ŷi ’s.
3. Adding a variable x to the model increases (cannot decrease) the value of R².
4. If β1 = β2 = · · · = βk = 0, then
E(R²) = k/(n − 1).
Note that the β̂j's will not be zero even when the βj's are zero.
5. R² cannot be partitioned into k components, each of which is uniquely attributable to an xj, unless the x's are mutually orthogonal, that is, unless Σ_{i=1}^{n} (xij − x̄j)(xim − x̄m) = 0 for j ≠ m.
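The definitions above, and property 4 in particular, can be checked numerically. The sketch below (assuming NumPy; the sizes are illustrative) computes R² = 1 − SSE/SST and, with all βj = 0 except β0, compares the average simulated R² with k/(n − 1).

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, reps = 30, 4, 3000

def r_squared(X, y):
    """R^2 = 1 - SSE/SST for the least squares fit of y on X (X includes an intercept column)."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    SSE = resid @ resid
    SST = np.sum((y - y.mean()) ** 2)
    return 1.0 - SSE / SST

# Property 4: with beta_1 = ... = beta_k = 0, E(R^2) = k/(n - 1)
r2_values = []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = 1.0 + rng.normal(size=n)           # only the intercept beta_0 is nonzero
    r2_values.append(r_squared(X, y))

print(np.mean(r2_values), k / (n - 1))     # both should be close to 0.138
```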
Adjusted R² (AdjR²)
Hence, if θ is the angle between the centered vectors y − ȳj and ŷ − ȳj (where j is the n × 1 vector of 1's),
cos θ = √[(ŷ − ȳj)′(ŷ − ȳj) / (y − ȳj)′(y − ȳj)] = √(SSR/SST) = R.
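A short sketch tying these pieces together (assuming NumPy; the adjusted R² line uses the usual definition R²adj = 1 − (n − 1)(1 − R²)/(n − k − 1), stated here only for illustration since it is not derived above) also verifies cos θ = R numerically.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, -0.4, 0.2]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

R2 = SSR / SST
R2_adj = 1.0 - (n - 1) * (1.0 - R2) / (n - k - 1)   # usual adjusted R^2 (assumed definition)

# cos(theta) between the centered vectors y - ybar*j and y_hat - ybar*j equals R
u, v = y - y.mean(), y_hat - y.mean()
cos_theta = (u @ v) / np.sqrt((u @ u) * (v @ v))
print(R2, R2_adj, np.isclose(cos_theta, np.sqrt(R2)))   # last value: True
```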
Suppose now that E(y) = Xβ and cov(y) = σ²V, where V is a known positive definite matrix. In this case, the best linear unbiased estimator of β, its covariance matrix, and an unbiased estimator of σ² are
β̂ = (X′V−1X)−1X′V−1y,
cov(β̂) = σ²(X′V−1X)−1,
s² = (y − Xβ̂)′V−1(y − Xβ̂)/(n − k − 1).
Proof: (i) Since V is positive definite, there exists an n × n nonsingular matrix P such that V = PP′. Multiplying y = Xβ + ǫ by P−1, we obtain
P−1y = P−1Xβ + P−1ǫ,
for which cov(P−1ǫ) = σ²P−1V(P−1)′ = σ²I. Applying the ordinary least squares estimator to this transformed model gives
β̂ = [X′(P−1)′P−1X]−1X′(P−1)′P−1y
  = (X′V−1X)−1X′V−1y.
Note that since X is full rank, X′V−1X is positive definite. The estimator β̂ = (X′V−1X)−1X′V−1y is usually called the generalized least squares estimator.
(ii) and (iii) are left as exercises.
If y is Nn(Xβ, σ²V), the maximum likelihood estimators of β and σ² are
β̂ = (X′V−1X)−1X′V−1y,
σ̂² = (1/n)(y − Xβ̂)′V−1(y − Xβ̂).
Proof: Exercise.
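A compact sketch of the generalized least squares computation (assuming NumPy; the AR(1)-type V is purely illustrative) computes β̂ both directly from (X′V−1X)−1X′V−1y and through the transformed model P−1y = P−1Xβ + P−1ǫ used in the proof, and also evaluates s².

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -0.5])

# Illustrative covariance structure: V with AR(1)-type correlation
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Generate y with cov(y) = V (sigma^2 = 1), using the factorization V = P P'
P = np.linalg.cholesky(V)
y = X @ beta + P @ rng.normal(size=n)

# GLS estimator: (X'V^{-1}X)^{-1} X'V^{-1} y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# The same estimate via the transformed model P^{-1}y = P^{-1}X beta + P^{-1}eps
Pinv = np.linalg.inv(P)
beta_trans, *_ = np.linalg.lstsq(Pinv @ X, Pinv @ y, rcond=None)

# Unbiased estimator of sigma^2 under cov(y) = sigma^2 V
s2 = (y - X @ beta_gls) @ Vinv @ (y - X @ beta_gls) / (n - k - 1)
print(np.allclose(beta_gls, beta_trans), s2)
```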