Models, Testing, and Correction of Heteroskedasticity
James L. Powell
Department of Economics
University of California, Berkeley
In the generalized regression model, y = Xβ + ε with E(ε|X) = 0 and V(ε|X) = Σ, where the matrix Σ is not proportional to an identity matrix. The special case of a heteroskedastic linear model assumes Σ is a diagonal matrix, i.e.,

Σ = diag[σᵢ²]

for some variances {σᵢ², i = 1, …, N} which can vary across observations (usually as some functions of xᵢ).
For example, in the grouped-data regression model, where only the group average values ȳᵢ ≡ (1/Mᵢ) Σⱼ yᵢⱼ and x̄ᵢ ≡ (1/Mᵢ) Σⱼ xᵢⱼ are observed, the diagonal elements of the Σ matrix are of the form σᵢ² = σ²/Mᵢ.
When the diagonal elements of Σ are known (as in the grouped-data regression model), we can transform
the data to satisfy the conditions of the Classical Regression Model; the Classical Least Squares Estimator
applied to the transformed data yields the Generalized Least Squares Estimator, which in this case reduces
to Weighted Least Squares (WLS):

β̂_WLS = argminᵦ Σᵢ wᵢ(yᵢ − xᵢ′β)² = (X′WX)⁻¹X′Wy,  W ≡ diag[wᵢ],

where wᵢ ≡ 1/σᵢ². That is, each term in the sum of squares is weighted by the inverse of its error variance.
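The WLS recipe can be sketched numerically. The following is a minimal NumPy illustration of the grouped-data case, where wᵢ = Mᵢ/σ² is known; the group sizes and coefficients are made-up values for the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Grouped-data example: sigma_i^2 = sigma^2 / M_i, so w_i = M_i / sigma^2.
N = 500
M = rng.integers(1, 50, size=N)               # hypothetical group sizes
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([1.0, 2.0])
sigma2 = 4.0
u = rng.normal(scale=np.sqrt(sigma2 / M))     # heteroskedastic errors
y = X @ beta_true + u

# WLS = classical LS on data scaled by sqrt(w_i); any positive
# rescaling of the weights gives the same estimator.
w = M / sigma2
ys = np.sqrt(w) * y
Xs = np.sqrt(w)[:, None] * X
beta_wls, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
```

Because the weights enter only through the transformed data, multiplying all wᵢ by a constant leaves β̂_WLS unchanged, which is why σ² need not be known for the estimator itself.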
If the covariance matrix Σ involves unknown parameters (aside from a constant of proportionality), then this estimator isn't feasible; to construct a Feasible WLS estimator for β, substituting an estimator Σ̂ for the unknown Σ, we need a model for the variance terms Var(yᵢ) = σᵢ².
For the multiplicative heteroskedasticity model, the error terms are assumed to have the representation

uᵢ ≡ cᵢεᵢ

for εᵢ ∼ i.i.d., E(εᵢ) = 0, V(εᵢ) = σ². (If the errors are normally distributed given xᵢ, then this representation is always available.) Also, it is almost always assumed that the heteroskedasticity function cᵢ² has an underlying linear (or "single index") form,

cᵢ² = h(zᵢ′θ),

so that σᵢ² ≡ V(uᵢ|xᵢ) = σ²·h(zᵢ′θ). The variables zᵢ are some observable functions of the regressors xᵢ (excluding a constant term), and the function h(·) is normalized so that h(0) = 1, with a derivative h′(·) assumed to be nonzero at zero, h′(0) ≠ 0; homoskedasticity thus corresponds to θ = 0.
Here are some examples of models which fit into this framework.
(1) Random Coefficients Model: The observable variables xᵢ and yᵢ are assumed to satisfy

yᵢ = αᵢ + xᵢ′βᵢ,

where αᵢ and βᵢ are jointly i.i.d. and independent of xᵢ, with E[αᵢ] = α, E[βᵢ] = β, and Var(αᵢ) = σ². Writing

yᵢ ≡ α + xᵢ′β + uᵢ,

with

uᵢ ≡ (αᵢ − α) + xᵢ′(βᵢ − β),

the composite error satisfies

E(uᵢ|xᵢ) = 0,
Var(uᵢ|xᵢ) = Var(αᵢ) + 2xᵢ′Cov(αᵢ, βᵢ) + xᵢ′Var(βᵢ)xᵢ
≡ σ²(1 + zᵢ′θ),

where zᵢ has the levels and cross-products of the components of the regression vector xᵢ. When αᵢ and βᵢ are jointly normal, it is straightforward to write the error term uᵢ as a multiple of an i.i.d. error term εᵢ ∼ N(0, σ²), as for the multiplicative heteroskedasticity model.
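The variance expansion above can be checked by simulation. Here is a small NumPy sketch for a scalar regressor, conditioning on a fixed x; the variances and covariance of (αᵢ, βᵢ) are hypothetical values chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar-regressor random coefficients model: y_i = alpha_i + x * beta_i.
var_a, var_b, cov_ab = 1.0, 0.5, 0.2
x = 2.0                                 # condition on one regressor value

# Draw (alpha_i - alpha, beta_i - beta) jointly and form the composite error.
mean = [0.0, 0.0]
cov = [[var_a, cov_ab], [cov_ab, var_b]]
a, b = rng.multivariate_normal(mean, cov, size=200_000).T
u = a + x * b                           # u_i = (alpha_i - alpha) + x(beta_i - beta)

# The conditional variance is linear in the levels and squares of x:
theory = var_a + 2 * x * cov_ab + x**2 * var_b
empirical = u.var()
```

With 200,000 draws, `empirical` should match `theory` to a couple of decimal places, confirming that zᵢ collects the levels and cross-products of xᵢ.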
(2) Exponential Heteroskedasticity: Here it is simply assumed that cᵢ = exp{xᵢ′θ/2}, so that

σᵢ² = σ² exp{xᵢ′θ},

with zᵢ ≡ xᵢ.
(3) Another example takes

yᵢ = α + xᵢ′β + εᵢ,
σᵢ² = σ²(1 + xᵢ′θ)²,

so that the conditional standard deviation is linear in xᵢ, with zᵢ ≡ xᵢ and h(v) = (1 + v)².
Tests for Heteroskedasticity

Near the null hypothesis of homoskedasticity, the squared errors satisfy the artificial regression equation

εᵢ² ≡ σ² + zᵢ′δ + rᵢ,

where E(rᵢ|zᵢ) = 0, V(rᵢ|zᵢ) ≡ τ, and the true δ = 0 under H₀: θ = 0. (A Taylor's series expansion would suggest that δ ≅ σ²h′(0)·θ if θ ≅ 0.) If εᵢ² were observed, we could test δ = 0 in the usual way; since it isn't, we can use the squared values of the least squares residuals eᵢ² ≡ (yᵢ − xᵢ′β̂)² in their place, since these are consistent estimators of the true squared errors. The resulting test, termed the "Studentized LM Test" by Koenker (1981), is a modification of the Score or Lagrange Multiplier (LM) test for heteroskedasticity proposed by Breusch and Pagan (1979). The steps to carry out this test of H₀: θ = 0 are:
(1) Compute the least squares residuals eᵢ = yᵢ − xᵢ′β̂, where β̂ = (X′X)⁻¹X′y;

(2) Regress the squared residuals eᵢ² on 1 and zᵢ, and obtain the R² from this "squared residual regression."
(3) Form the test statistic

T ≡ N·R²;

under H₀, T →d χ²(p), where p = dim(δ) = dim(zᵢ), so we would reject H₀ if T exceeds the upper critical value of the chi-squared distribution with p degrees of freedom.

(3′) Alternatively, the usual F statistic for δ = 0 in the squared residual regression,

F = (N − K)·(R²/p)/(1 − R²) ∼A F(p, N − K),

could be used; it should have F ≅ T/p under H₀ for large N, since 1 − R² ≅ 1 and (N − K)/N ≅ 1. Critical values from the F tables rather than the chi-squared tables would be appropriate here, though the two versions are asymptotically equivalent.
(3″) When the error terms εᵢ are assumed to be normally distributed, Breusch and Pagan (1979) showed that the Score test statistic for the null hypothesis that θ = 0 is of the form

S ≡ RSS/(2σ̂⁴),

where

RSS ≡ Σᵢ₌₁ᴺ [(zᵢ − z̄)′δ̂]²

is the "regression (or explained) sum of squares" from the squared residual regression and

σ̂² ≡ (1/N) Σᵢ₌₁ᴺ eᵢ²

is the ML estimator of σ² under the null of homoskedasticity. For the normal distribution, τ ≡ Var(εᵢ²) = 2[Var(εᵢ)]² = 2σ⁴; more generally, though, no such relation exists between the second moment and τ = E(εᵢ⁴) − σ⁴. It is straightforward to show that the Studentized LM test statistic can be written in the form

T ≡ RSS/τ̂

for

τ̂ ≡ (1/N) Σᵢ₌₁ᴺ eᵢ⁴ − σ̂⁴,

which is the same form as the Score test but with a more general estimator for Var(εᵢ²).
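Steps (1)-(3) can be sketched in a few lines of NumPy; the simulated data-generating process and sample size below are illustrative. The sketch also verifies numerically that N·R² coincides with RSS/τ̂.

```python
import numpy as np

def studentized_lm_test(y, X, Z):
    """Koenker's studentized LM test: T = N * R^2 from the regression
    of squared LS residuals on a constant and Z."""
    N = len(y)
    # Step (1): least squares residuals.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - X @ beta) ** 2
    # Step (2): squared-residual regression and its R^2.
    W = np.column_stack([np.ones(N), Z])
    fit = W @ np.linalg.lstsq(W, e2, rcond=None)[0]
    R2 = 1.0 - np.sum((e2 - fit) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    # Step (3): T = N * R^2, referred to chi-squared(dim Z) under H0.
    T = N * R2
    # Equivalent form T = RSS / tau_hat:
    RSS = np.sum((fit - e2.mean()) ** 2)
    tau = np.mean(e2 ** 2) - np.mean(e2) ** 2   # (1/N) sum e^4 - sigma_hat^4
    return T, RSS / tau

rng = np.random.default_rng(2)
N = 400
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
# Exponential-heteroskedasticity DGP: error s.d. = exp(0.5 x).
y = X @ np.array([1.0, 2.0]) + np.exp(0.5 * x) * rng.normal(size=N)
T, T_alt = studentized_lm_test(y, X, x[:, None])
```

With this strongly heteroskedastic design, T should comfortably exceed the 5% χ²(1) critical value of 3.84, and the two computations of the statistic agree up to rounding.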
Feasible WLS
If the null hypothesis H₀: θ = 0 is rejected, a "correction for heteroskedasticity" (either by Feasible WLS or a heteroskedasticity-consistent covariance matrix estimator for LS) is needed. To do Feasible WLS, we can use the fact that E(εᵢ²|xᵢ) ≡ σᵢ² = σ²·h(zᵢ′θ), which is a nonlinear regression model for the squared error terms εᵢ². This proceeds in two steps:

(i) Replace εᵢ² by eᵢ² ≅ εᵢ², and then estimate θ (and σ²) by nonlinear LS (which, in many cases, can be reduced to linear LS).
(ii) Do Feasible WLS using Ω̂ = diag[h(zᵢ′θ̂)]; that is, replace yᵢ and xᵢ with

yᵢ* ≡ yᵢ/√h(zᵢ′θ̂),  xᵢ* ≡ xᵢ/√h(zᵢ′θ̂),

and do LS using yᵢ*, xᵢ*. If σᵢ² = σ²h(zᵢ′θ) is a correct specification of the heteroskedasticity, the usual standard error formulae for LS using the transformed data will be (asymptotically) correct.
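Steps (i) and (ii) can be sketched as follows for the linear variance model σᵢ² = σ²(1 + zᵢ′θ), where the nonlinear LS of step (i) reduces to a linear regression of eᵢ² on (1, zᵢ); the data-generating values are hypothetical, and the fitted variances are floored at a small positive number, a practical safeguard not part of the formal derivation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate sigma_i^2 = sigma^2 (1 + z_i' theta) with z_i = x_i (scalar).
N = 1000
z = rng.uniform(0.0, 1.0, size=N)
X = np.column_stack([np.ones(N), z])
beta_true = np.array([1.0, 2.0])
sig2 = 1.0 + 3.0 * z                    # sigma^2 = 1, theta = 3
y = X @ beta_true + rng.normal(scale=np.sqrt(sig2))

# Step (i): preliminary LS, then regress e^2 on (1, z) for (sigma^2, delta).
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ b_ls) ** 2
g = np.linalg.lstsq(np.column_stack([np.ones(N), z]), e2, rcond=None)[0]
h_hat = np.clip(g[0] + g[1] * z, 1e-3, None)   # fitted variances, kept positive

# Step (ii): WLS with weights 1 / h_hat, via the transformed data.
w = 1.0 / np.sqrt(h_hat)
b_fwls = np.linalg.lstsq(w[:, None] * X, w * y, rcond=None)[0]
```

The same two-step structure applies for other h(·); only the step-(i) regression changes.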
Some examples of the first step for particular models are as follows:

(1) For the random coefficients model,

σᵢ² = σ²(1 + zᵢ′θ) ≡ σ² + zᵢ′δ,

so the squared residuals satisfy

eᵢ² ≅ σ² + zᵢ′δ + vᵢ,

and the preliminary estimators σ̃² and δ̃ can be obtained by a linear LS regression of eᵢ² on a constant and zᵢ.
(2) For the exponential heteroskedasticity model, writing

εᵢ² = exp{xᵢ′θ}·uᵢ,

we can take logarithms of both sides to obtain

log(εᵢ²) = xᵢ′θ + log(uᵢ),

and, making the additional assumption that uᵢ is independent of xᵢ (not just zero mean and constant variance), we get that E[log(εᵢ²)] = α + xᵢ′θ, where α ≡ E[log(uᵢ)], i.e.,

log(εᵢ²) = α + xᵢ′θ + vᵢ

with E(vᵢ|xᵢ) ≡ 0, so we would regress log(eᵢ²) on a constant and xᵢ to get the preliminary estimator θ̃.
(3) For the model with variance proportional to the square of the conditional mean,

σᵢ² = γ·(α + xᵢ′β)²,

where α and β are already estimated by the intercept term α̂ and slope coefficients β̂ of the classical LS estimator b, no preliminary regression is needed; since γ is just a scaling factor common to all observations, we can take h(zᵢ′θ̂) = (α̂ + xᵢ′β̂)².
Heteroskedasticity-Consistent Covariance Matrix Estimation

If the assumed form of heteroskedasticity is misspecified, the coefficient estimator remains consistent, but the usual formulae for its asymptotic covariance matrix no longer apply to LS or to the second step of Feasible GLS. In a general heteroskedastic setting, we can calculate the covariance matrix of the WLS estimator β̂_GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y to be

V = (X′Ω⁻¹X)⁻¹Γ(X′Ω⁻¹X)⁻¹,

where

Ω ≡ diag[h(zᵢ′θ)]

is the assumed form of the error covariance matrix (up to scale) and

Γ ≡ X′Ω⁻¹ΣΩ⁻¹X = Σᵢ σᵢ²·[h(zᵢ′θ)]⁻²·xᵢxᵢ′.

It follows that

β̂_FWLS = (X′Ω̂⁻¹X)⁻¹X′Ω̂⁻¹y ∼A N(β, V)

in general if the original linear model for yᵢ is correctly specified (where Ω̂ = Ω(θ̂)), but the usual estimator of the covariance matrix V (assuming a correct specification of the form of heteroskedasticity) will be inconsistent. A consistent estimator of V (properly normalized) would use

Ω̂ = diag[h(zᵢ′θ̃)]  and  Γ̂ ≡ Σᵢ ẽᵢ²·[h(zᵢ′θ̃)]⁻²·xᵢxᵢ′

in place of Ω and Γ in the expression for V above, where ẽᵢ = yᵢ − xᵢ′β̂_FWLS are the residuals from the Feasible WLS fit. The resulting estimator V̂ is known as the Eicker-White covariance matrix estimator (after Eicker (1967) and White (1980)), and is usually applied in the special case of no heteroskedasticity correction; that is, with Ω̂ = I, the heteroskedasticity-consistent covariance matrix estimator for least squares is

V̂_LS = (X′X)⁻¹[Σᵢ eᵢ²·xᵢxᵢ′](X′X)⁻¹.
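A minimal NumPy sketch of the Eicker-White estimator for least squares follows (the HC0 form, with no degrees-of-freedom correction; the simulated design is illustrative).

```python
import numpy as np

def white_cov(y, X):
    """LS coefficients and the Eicker-White (HC0) covariance estimator:
    (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * e[:, None] ** 2).T @ X    # sum_i e_i^2 x_i x_i'
    return beta, XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(4)
N = 2000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([0.5, 1.0]) + np.abs(x) * rng.normal(size=N)  # s.d. grows with |x|
beta, V = white_cov(y, X)
se = np.sqrt(np.diag(V))                  # heteroskedasticity-consistent SEs
```

Note that no model for h(·) is needed here: the squared residuals themselves stand in for the unknown σᵢ².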
As a side note, White proposed this estimator in the context of a test for consistency of the traditional estimator σ̂²(X′X)⁻¹ of the covariance matrix of the classical LS estimator, and showed how such a test was equivalent to the test of the null hypothesis of homoskedasticity against the alternative of a random coefficients model.
Goldfeld-Quandt Test
When the error terms can be assumed to be normally distributed, and the regressor matrix X can be
taken to be fixed (as in the Classical or Neoclassical Normal Regression Model), Goldfeld and Quandt
(1965) proposed an exact test of the null hypothesis of homoskedasticity. Their test presumes that the possible form of heteroskedasticity permits division of the sample into "high" and "low" heteroskedasticity groups (without preliminary estimates of heteroskedasticity parameters). Separate least squares fits are obtained for the two groups, and the ratio of the residual variances will have an F distribution under the null hypothesis. A one-sided test would reject if the ratio of the "high" to "low" heteroskedasticity residual variances exceeds the upper α critical value from an F table, while a two-sided test would reject if either the ratio or its inverse exceeded the upper α/2 cutoff point (assuming an equal number of observations in each subsample). Goldfeld and Quandt suggest that the power of the test can be improved by dropping 10% to 20% of the observations with "intermediate" magnitudes of the conditional variances under the alternative.
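The Goldfeld-Quandt procedure can be sketched as follows; the sorting variable, drop fraction, and simulated design are illustrative choices.

```python
import numpy as np

def goldfeld_quandt(y, X, sort_by, drop_frac=0.2):
    """Goldfeld-Quandt F statistic: ratio of 'high' to 'low' group residual
    variances after sorting by a variable suspected to drive sigma_i^2 and
    dropping a middle fraction of the observations."""
    order = np.argsort(sort_by)
    y, X = y[order], X[order]
    N, K = X.shape
    n1 = (N - int(drop_frac * N)) // 2    # observations per subsample
    s2 = []
    for ys, Xs in [(y[N - n1:], X[N - n1:]),   # "high" group
                   (y[:n1], X[:n1])]:          # "low" group
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        e = ys - Xs @ b
        s2.append(e @ e / (n1 - K))            # unbiased residual variance
    return s2[0] / s2[1]    # distributed F(n1 - K, n1 - K) under H0

rng = np.random.default_rng(5)
N = 300
x = rng.uniform(1.0, 5.0, size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 1.0]) + x * rng.normal(size=N)  # s.d. proportional to x
F = goldfeld_quandt(y, X, sort_by=x)
```

Since the error standard deviation here is proportional to x, the statistic should land well above 1, and a one-sided test against F(118, 118) critical values would reject.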
References
[1] Breusch, T. and A. Pagan (1979), “A Simple Test for Heteroscedasticity and Random Coefficient Variation,” Econometrica, 47, 1287-1294.
[2] Eicker, F. (1967), “Limit Theorems for Regression with Unequal and Dependent Errors,” Proceedings
of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of
California Press.
[3] Goldberger, A.S. (1990), A Course In Econometrics. Cambridge: Harvard University Press.
[4] Goldfeld, S. and R. Quandt (1965), “Some Tests for Heteroskedasticity,” Journal of the American Statistical Association, 60, 539-547.
[5] Koenker, R. (1981), “A Note on Studentizing a Test for Heteroskedasticity,” Journal of Econometrics, 17, 107-112.
[6] White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica, 48, 817-838.