Kelejian 2017

Book chapter on panel data analysis

Chapter 15

Panel Data Models


15.1 INTRODUCTORY COMMENTS¹
All of the previous chapters have been in a single panel framework. However, in
recent years panel data sets have become available, and many, if not most, recent
studies are based on them. Panel data come in various forms. The most typical
form is a time series on the units being studied: instead of a single dependent
N × 1 vector y, a time series panel model has dependent vectors $y_t$,
t = 1, ..., T, where $y_t$ is an N × 1 vector of observations on the units being
explained at time t. In most of these cases the number of units, N, is
large relative to the number of time periods, T. Many, but not all, time series
panels are balanced in that at each point in time, the number of units observed
is the same.
Another type of panel relates to cross-sectional groups of units where each
group contains, in many cases, a different number of units. One example would
be a model explaining the grades of individual high school students in terms of
various neighborhood and family characteristics, as well as the characteristics
of the schools they attend. In this case the data set might relate to G > 1 schools
and the number of students in the jth school might be $N_j$, j = 1, ..., G. Another
example would be the case in which farming productivity of individual farmers
is modeled in terms of various farming inputs (fertilizer, etc.) and spillover ef-
fects that reflect how a farmer may learn techniques from other farmers in the
same village, but perhaps not from farmers in other villages. Again, the number
of farmers may vary over the villages.

15.2 SOME IMPORTANT PRELIMINARIES


Let
\[
Q_0 = \left(I_T - \frac{J_T}{T}\right) \otimes I_N, \tag{15.2.1}
\]

1. There is a vast literature dealing with, or related to spatial panel data models. Some important
references are Anselin et al. (2008), Arellano (2003), Baltagi (2005), Baltagi et al. (2003), Blundell
and Bond (1998), Elhorst (2010, 2014), Kapoor et al. (2007), Lee and Yu (2010a,b, 2012a), Mutl
and Pfaffermayr (2011), Pesaran and Tosetti (2011), and Piras (2013, 2014). See also Chapter 13 in
the new edition of Baltagi’s book and Chapter 30 in Pesaran (2015).

Spatial Econometrics. http://dx.doi.org/10.1016/B978-0-12-813387-3.00015-9

Copyright © 2017 Elsevier Inc. All rights reserved.
\[
J_T = e_T e_T',
\]
where ⊗ denotes the Kronecker product and $e_T$ is the T × 1 vector of ones, and let
\[
Q_1 = \frac{J_T}{T} \otimes I_N. \tag{15.2.2}
\]

Let $F_t$ be any N × 1 vector, and let $F = (F_1', ..., F_T')'$. Let
$\bar F = T^{-1}(F_1 + ... + F_T)$, i.e., if t relates to time, then $\bar F$ is the
time average of the T vectors $F_1, ..., F_T$. Then, the matrix $Q_0$ is such that
\[
Q_0 F = [(F_1 - \bar F)', (F_2 - \bar F)', \ldots, (F_T - \bar F)']'. \tag{15.2.3}
\]

That is, premultiplying a vector such as F by the matrix $Q_0$ produces a vector
of time deviations.
Now consider the multiplication of F by the matrix $Q_1$. In this case
\[
Q_1 F = (e_T \otimes \bar F), \tag{15.2.4}
\]
which is a vector identical to F except that each $F_t$ is replaced by its time
average.
Let G be any N × s matrix whose elements do not vary over time. Then, it
is left to the reader to show that
\[
\begin{aligned}
Q_0 Q_0 &= Q_0, \\
Q_1 Q_1 &= Q_1, \\
Q_0 + Q_1 &= I_{NT}, \\
Q_0 Q_1 &= 0, \\
Q_0 (e_T \otimes G) &= 0.
\end{aligned} \tag{15.2.5}
\]
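
The properties in (15.2.3) to (15.2.5) are easy to confirm numerically. The following is a minimal sketch, not part of the original text: the sizes N, T, s, the random seed, and the helper name `panel_projectors` are illustrative assumptions, and the data are stacked with time as the outer index, as in $F = (F_1', ..., F_T')'$.

```python
import numpy as np

def panel_projectors(N, T):
    """Build Q0 and Q1 of (15.2.1)-(15.2.2) for N units and T periods."""
    J_T = np.ones((T, T))                         # J_T = e_T e_T'
    Q0 = np.kron(np.eye(T) - J_T / T, np.eye(N))  # time-demeaning operator
    Q1 = np.kron(J_T / T, np.eye(N))              # time-averaging operator
    return Q0, Q1

N, T, s = 3, 4, 2                    # illustrative sizes, not from the text
rng = np.random.default_rng(0)
Q0, Q1 = panel_projectors(N, T)

F_rows = rng.normal(size=(T, N))     # row t holds F_t'
F = F_rows.reshape(-1)               # stacked vector (F_1', ..., F_T')'
F_bar = F_rows.mean(axis=0)          # time average of F_1, ..., F_T
G = rng.normal(size=(N, s))          # time-invariant N x s matrix
e_T = np.ones((T, 1))

checks = {
    "(15.2.3) Q0 F = time deviations": np.allclose(Q0 @ F, (F_rows - F_bar).reshape(-1)),
    "(15.2.4) Q1 F = e_T kron F_bar": np.allclose(Q1 @ F, np.tile(F_bar, T)),
    "Q0 Q0 = Q0": np.allclose(Q0 @ Q0, Q0),
    "Q1 Q1 = Q1": np.allclose(Q1 @ Q1, Q1),
    "Q0 + Q1 = I_NT": np.allclose(Q0 + Q1, np.eye(N * T)),
    "Q0 Q1 = 0": np.allclose(Q0 @ Q1, 0.0),
    "Q0 (e_T kron G) = 0": np.allclose(Q0 @ np.kron(e_T, G), 0.0),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

A numerical check of this kind is not a proof, but it guards against stacking mistakes: if the data were stacked with the unit as the outer index instead, the Kronecker factors would have to be swapped.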

15.3 THE RANDOM EFFECTS MODEL


There are a number of random effects panel data models which differ in at least
two important respects. The first is their degree of generality which relates to the
number and types of regressors in the model. The second relates to the degree
of generality of the error term.
In the first case, the simplest model is the one in which all of the regressors
are exogenous. In addition to exogenous variables, the more complex models
in this case may also contain spatial lags in the dependent variable, as well
as additional endogenous variables. Some of these variables may also be time
lagged, making the model a dynamic panel data model. The simplest model
in the second case would contain error terms which are i.i.d. $(0, \sigma^2)$. A more
general model in this case might contain nonparametrically specified error terms
which allow for heteroskedasticity as well as spatial correlation.
In this section we start with the simplest random effects model which only
has exogenous regressors, and a structurally specified error term. Generaliza-
tions will be straightforward.
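Consider the panel data model
\[
\begin{aligned}
y_t &= X_t\beta + u_t, \\
u_t &= \rho_2 W u_t + \varepsilon_t, \quad |\rho_2| < 1, \\
\varepsilon_t &= \mu + v_t, \quad t = 1, \ldots, T,
\end{aligned} \tag{15.3.1}
\]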

where $y_t$ is the N × 1 vector of observations on the dependent variable at time
t, $X_t$ is an N × k matrix of observations on k exogenous regressors at time t,
W is an N × N weighting matrix, $u_t$ is the corresponding N × 1 error term, and
$\varepsilon_t$ is the innovation vector, which is the sum of an N × 1 random vector, μ, which
is not time dependent, and an N × 1 random vector, $v_t$, which is time dependent.
The vector μ is often thought of as describing the random differences in the
intercepts between units.
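Let
\[
\begin{aligned}
y &= (y_1', \ldots, y_T')', \\
X &= (X_1', \ldots, X_T')', \\
u &= (u_1', \ldots, u_T')', \\
\varepsilon &= (\varepsilon_1', \ldots, \varepsilon_T')', \\
v &= (v_1', \ldots, v_T')'.
\end{aligned} \tag{15.3.2}
\]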

Then the time stacked form of the model in (15.3.1) is
\[
\begin{aligned}
y &= X\beta + u, \\
u &= (I_T \otimes \rho_2 W)u + \varepsilon, \\
\varepsilon &= (e_T \otimes I_N)\mu + v.
\end{aligned} \tag{15.3.3}
\]

In the random effects model it is typically assumed that the elements of μ
are i.i.d. with mean and variance of 0 and $\sigma_\mu^2$, respectively, the elements of v are
i.i.d. with mean and variance of 0 and $\sigma_v^2$, respectively, and the vectors μ and v
are independent. Also, in order for the error vector u to have a solution in terms
of the innovation vector ε, assume that $(I_N - aW)$ is nonsingular for all |a| < 1.
An extensive formal set of assumptions is given in Kapoor et al. (2007).
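Given these assumptions,
\[
\begin{aligned}
u &= [I_{NT} - (I_T \otimes \rho_2 W)]^{-1}\varepsilon \\
  &= [I_T \otimes (I_N - \rho_2 W)^{-1}]\varepsilon,
\end{aligned} \tag{15.3.4}
\]
since $I_{NT} = I_T \otimes I_N$. It follows from (15.3.4) that $E(u) = 0$ and
$E(uu') = \Omega_u$ where
\[
\Omega_u = [I_T \otimes (I_N - \rho_2 W)^{-1}]\,\Omega_\varepsilon\,[I_T \otimes (I_N - \rho_2 W')^{-1}] \tag{15.3.5}
\]
with $\Omega_\varepsilon$ being the VC matrix of ε. Since μ and v are independent, it then follows
from the third line in (15.3.3) and (15.2.5) that
\[
\begin{aligned}
\Omega_\varepsilon &= \sigma_\mu^2 (e_T \otimes I_N)(e_T \otimes I_N)' + \sigma_v^2 I_{NT} \\
 &= \sigma_\mu^2 (J_T \otimes I_N) + \sigma_v^2 I_{NT} \\
 &= T\sigma_\mu^2 Q_1 + \sigma_v^2 I_{NT}.
\end{aligned} \tag{15.3.6}
\]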

Since $Q_0 + Q_1 = I_{NT}$, we leave it to the reader to show that
\[
\Omega_\varepsilon = \sigma_v^2 Q_0 + \sigma_1^2 Q_1, \qquad
\sigma_1^2 = \sigma_v^2 + T\sigma_\mu^2. \tag{15.3.7}
\]
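
The step left to the reader here is short. As a sketch, writing $\Omega_\varepsilon$ for the VC matrix of ε and substituting $I_{NT} = Q_0 + Q_1$ into the last line of (15.3.6) gives

\[
\begin{aligned}
\Omega_\varepsilon &= T\sigma_\mu^2 Q_1 + \sigma_v^2 (Q_0 + Q_1) \\
 &= \sigma_v^2 Q_0 + (\sigma_v^2 + T\sigma_\mu^2) Q_1 \\
 &= \sigma_v^2 Q_0 + \sigma_1^2 Q_1 .
\end{aligned}
\]

Because $Q_0$ and $Q_1$ are idempotent and $Q_0 Q_1 = 0$ by (15.2.5), any power of $\Omega_\varepsilon$ is obtained by raising the two scalar weights to that power.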

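Recalling (15.2.5), it is also left to the reader to show that
\[
\begin{aligned}
\Omega_\varepsilon^{-1} &= \sigma_v^{-2} Q_0 + \sigma_1^{-2} Q_1, \\
\Omega_\varepsilon^{-1/2} &= \sigma_v^{-1} Q_0 + \sigma_1^{-1} Q_1, \\
\sigma_v \Omega_\varepsilon^{-1/2} &= I_{NT} - \theta Q_1, \quad \theta = 1 - \sigma_v/\sigma_1,
\end{aligned} \tag{15.3.8}
\]
where $\Omega_\varepsilon^{-1/2}\,\Omega_\varepsilon\,\Omega_\varepsilon^{-1/2} = I_{NT}$. The third line in (15.3.8) follows immediately by
multiplying the second line across by $\sigma_v$, and then setting $Q_0 = I_{NT} - Q_1$.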
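
The identities in (15.3.4) and (15.3.6) to (15.3.8) can be checked numerically. The sketch below is illustrative only: the sizes, the parameter values, and the ring-shaped row-normalized weighting matrix W are assumptions chosen for the example, not values from the text.

```python
import numpy as np

N, T = 3, 4                          # illustrative sizes (assumed)
rho2 = 0.4                           # illustrative spatial parameter
sigma_v2, sigma_mu2 = 0.5, 2.0       # illustrative variance components
sigma_12 = sigma_v2 + T * sigma_mu2  # sigma_1^2 as defined in (15.3.7)

I_N, I_T, I_NT = np.eye(N), np.eye(T), np.eye(N * T)
J_T = np.ones((T, T))
Q1 = np.kron(J_T / T, I_N)
Q0 = I_NT - Q1

# Row-normalized weights on a ring: an assumed W with two neighbours per unit.
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] += 0.5
    W[i, (i + 1) % N] += 0.5

# (15.3.4): the inverse of the stacked system is block diagonal.
A_inv_full = np.linalg.inv(I_NT - np.kron(I_T, rho2 * W))
A_inv_kron = np.kron(I_T, np.linalg.inv(I_N - rho2 * W))

# (15.3.6)-(15.3.8): VC matrix of eps, its inverse, and its inverse square root.
Omega = sigma_mu2 * np.kron(J_T, I_N) + sigma_v2 * I_NT
Omega_inv = Q0 / sigma_v2 + Q1 / sigma_12
Omega_inv_half = Q0 / np.sqrt(sigma_v2) + Q1 / np.sqrt(sigma_12)
theta = 1.0 - np.sqrt(sigma_v2 / sigma_12)

checks = {
    "(15.3.4) Kronecker inverse": np.allclose(A_inv_full, A_inv_kron),
    "(15.3.7) spectral form": np.allclose(Omega, sigma_v2 * Q0 + sigma_12 * Q1),
    "(15.3.8) inverse": np.allclose(Omega_inv @ Omega, I_NT),
    "(15.3.8) inverse square root": np.allclose(Omega_inv_half @ Omega @ Omega_inv_half, I_NT),
    "(15.3.8) theta form": np.allclose(np.sqrt(sigma_v2) * Omega_inv_half, I_NT - theta * Q1),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```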
Suppose $\rho_2$, $\sigma_v^2$, $\sigma_\mu^2$ were known. Given this, $\Omega_\varepsilon^{-1/2}$ would also be known
via (15.3.7). In this case one would estimate (15.3.3) by first transforming the
model to eliminate the spatial correlation induced by the second line in (15.3.3),
and then transforming the resulting model by premultiplying it across by $\Omega_\varepsilon^{-1/2}$.
Specifically, let
\[
\begin{aligned}
y(\rho_2) &= y - (I_T \otimes \rho_2 W)y, \\
X(\rho_2) &= X - (I_T \otimes \rho_2 W)X, \\
u(\rho_2) &= u - (I_T \otimes \rho_2 W)u
\end{aligned} \tag{15.3.9}
\]

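where, by (15.3.3), $u(\rho_2) = \varepsilon$. Applying this transformation to the first line in
(15.3.3) would yield the model
\[
y(\rho_2) = X(\rho_2)\beta + \varepsilon. \tag{15.3.10}
\]
Given the VC matrix $\Omega_\varepsilon$ of ε in (15.3.7) and the results in (15.3.8), one would
then premultiply (15.3.10) across by $\Omega_\varepsilon^{-1/2}$ to obtain
\[
\begin{aligned}
\Omega_\varepsilon^{-1/2} y(\rho_2) &= \Omega_\varepsilon^{-1/2} X(\rho_2)\beta + \Omega_\varepsilon^{-1/2}\varepsilon \\
 &= \Omega_\varepsilon^{-1/2} X(\rho_2)\beta + \psi
\end{aligned} \tag{15.3.11}
\]
where $\psi = \Omega_\varepsilon^{-1/2}\varepsilon$. It follows from (15.3.7) and (15.3.8) that
\[
\begin{aligned}
E(\psi) &= 0, \\
E(\psi\psi') &= \Omega_\varepsilon^{-1/2} E(\varepsilon\varepsilon')\,\Omega_\varepsilon^{-1/2} \\
 &= \Omega_\varepsilon^{-1/2}\,\Omega_\varepsilon\,\Omega_\varepsilon^{-1/2} \\
 &= I_{NT}.
\end{aligned} \tag{15.3.12}
\]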

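Thus, if $\rho_2$, $\sigma_v^2$, $\sigma_\mu^2$ were known, the estimation of β in the model in (15.3.3)
may now be evident. Specifically, let
\[
\begin{aligned}
y^* &= \Omega_\varepsilon^{-1/2} y(\rho_2), \\
X^* &= \Omega_\varepsilon^{-1/2} X(\rho_2),
\end{aligned} \tag{15.3.13}
\]
so that the model in (15.3.11) can be expressed as
\[
y^* = X^*\beta + \psi. \tag{15.3.14}
\]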

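Given that $\rho_2$, $\sigma_v^2$, and $\sigma_\mu^2$ are known, the estimator of β would just be the OLS
estimator based on (15.3.14) which can be expressed as a GLS estimator based
on (15.3.10), namely
\[
\begin{aligned}
\hat\beta_{GLS} &= (X^{*\prime} X^*)^{-1} X^{*\prime} y^* \\
 &= [X(\rho_2)'\,\Omega_\varepsilon^{-1} X(\rho_2)]^{-1} X(\rho_2)'\,\Omega_\varepsilon^{-1} y(\rho_2).
\end{aligned} \tag{15.3.15}
\]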

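Under standard conditions given in Kapoor et al. (2007), $\hat\beta_{GLS}$ is consistent and
asymptotically normal with the anticipated distribution. In particular,
\[
\begin{aligned}
(NT)^{1/2}(\hat\beta_{GLS} - \beta) &\overset{D}{\to} N(0, VC), \\
VC &= \lim_{N\to\infty} (NT)[X(\rho_2)'\,\Omega_\varepsilon^{-1} X(\rho_2)]^{-1} \\
 &= \lim_{N\to\infty} (NT)(X^{*\prime} X^*)^{-1}.
\end{aligned} \tag{15.3.16}
\]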

Finite sample inferences would be based on the approximation
\[
\begin{aligned}
\hat\beta_{GLS} &\simeq N(\beta, [X(\rho_2)'\,\Omega_\varepsilon^{-1} X(\rho_2)]^{-1}) \\
 &= N(\beta, [X^{*\prime} X^*]^{-1}).
\end{aligned} \tag{15.3.17}
\]

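Sometimes for computational ease, researchers premultiply (15.3.10) across
by $\sigma_v \Omega_\varepsilon^{-1/2} = I_{NT} - \theta Q_1$, and then estimate the resulting model by OLS. Let
\[
\begin{aligned}
y_{\rho_2,\theta} &= (I_{NT} - \theta Q_1)\,y(\rho_2), \\
X_{\rho_2,\theta} &= (I_{NT} - \theta Q_1)\,X(\rho_2).
\end{aligned} \tag{15.3.18}
\]
In this case the estimator is
\[
\hat\beta_{GLS,1} = (X_{\rho_2,\theta}' X_{\rho_2,\theta})^{-1} X_{\rho_2,\theta}'\, y_{\rho_2,\theta}. \tag{15.3.19}
\]
We leave it as an exercise to show that $\hat\beta_{GLS,1} = \hat\beta_{GLS}$, so that inferences based
on $\hat\beta_{GLS,1}$ are the same as those based on $\hat\beta_{GLS}$.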
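
The equality of the two estimators can be illustrated on simulated data. The following is a sketch under stated assumptions: the sizes, parameter values, seed, and ring-shaped W are invented for the example, and the parameters are treated as known, so this is the infeasible GLS estimator of (15.3.15) compared with the OLS estimator of (15.3.19).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 6, 4, 2                        # illustrative sizes (assumed)
rho2, sigma_v, sigma_mu = 0.4, 1.0, 0.8  # parameters treated as known
beta = np.array([1.0, -2.0])

# Row-normalized ring weights: an assumed W with two neighbours per unit.
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] += 0.5
    W[i, (i + 1) % N] += 0.5

I_N, I_NT = np.eye(N), np.eye(N * T)
Q1 = np.kron(np.ones((T, T)) / T, I_N)
Q0 = I_NT - Q1
A = np.kron(np.eye(T), I_N - rho2 * W)   # I_NT - (I_T kron rho2 W)

# Simulate (15.3.3): eps = (e_T kron I_N) mu + v and u = A^{-1} eps.
X = rng.normal(size=(N * T, k))
eps = np.tile(sigma_mu * rng.normal(size=N), T) + sigma_v * rng.normal(size=N * T)
y = X @ beta + np.linalg.solve(A, eps)

# Spatial transformation (15.3.9).
y_rho, X_rho = A @ y, A @ X

# GLS as in (15.3.15), using the inverse VC matrix from (15.3.8).
sigma_1 = np.sqrt(sigma_v**2 + T * sigma_mu**2)
Omega_inv = Q0 / sigma_v**2 + Q1 / sigma_1**2
beta_gls = np.linalg.solve(X_rho.T @ Omega_inv @ X_rho, X_rho.T @ Omega_inv @ y_rho)

# OLS after the theta-transformation, (15.3.18)-(15.3.19).
theta = 1.0 - sigma_v / sigma_1
P = I_NT - theta * Q1
beta_gls1 = np.linalg.lstsq(P @ X_rho, P @ y_rho, rcond=None)[0]

print(beta_gls, beta_gls1)
```

The two results agree because $(I_{NT} - \theta Q_1)'(I_{NT} - \theta Q_1) = \sigma_v^2 \Omega_\varepsilon^{-1}$, and a scalar factor cancels in the least squares formula.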
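The estimators $\hat\beta_{GLS}$ and $\hat\beta_{GLS,1}$ are not feasible because $\rho_2$, $\sigma_v^2$, and $\sigma_\mu^2$ are
not known. Let $\hat\rho_2$, $\hat\sigma_v^2$, and $\hat\sigma_\mu^2$ be any consistent estimators of $\rho_2$, $\sigma_v^2$, $\sigma_\mu^2$, and
let
\[
\begin{aligned}
\hat\sigma_1^2 &= \hat\sigma_v^2 + T\hat\sigma_\mu^2, \\
\hat\Omega_\varepsilon &= \hat\sigma_v^2 Q_0 + \hat\sigma_1^2 Q_1.
\end{aligned} \tag{15.3.20}
\]
Then the feasible GLS estimator of β is $\hat\beta_{FGLS}$ where
\[
\hat\beta_{FGLS} = [X(\hat\rho_2)'\,\hat\Omega_\varepsilon^{-1} X(\hat\rho_2)]^{-1} X(\hat\rho_2)'\,\hat\Omega_\varepsilon^{-1} y(\hat\rho_2) \tag{15.3.21}
\]
with
\[
\begin{aligned}
X(\hat\rho_2) &= X - (I_T \otimes \hat\rho_2 W)X, \\
y(\hat\rho_2) &= y - (I_T \otimes \hat\rho_2 W)y.
\end{aligned}
\]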

Then, under reasonable conditions, Kapoor et al. (2007) show that $\hat\beta_{FGLS}$ is
consistent and asymptotically normal with the same distribution as that of $\hat\beta_{GLS}$.
In particular,
\[
\begin{aligned}
(NT)^{1/2}(\hat\beta_{FGLS} - \beta) &\overset{D}{\to} N(0, VC), \\
VC &= \lim_{N\to\infty} (NT)[X(\rho_2)'\,\Omega_\varepsilon^{-1} X(\rho_2)]^{-1}.
\end{aligned} \tag{15.3.22}
\]
Small sample inferences can be based on the approximation
\[
\hat\beta_{FGLS} \simeq N(\beta, [X(\hat\rho_2)'\,\hat\Omega_\varepsilon^{-1} X(\hat\rho_2)]^{-1}). \tag{15.3.23}
\]

The Estimation of $\rho_2$, $\sigma_v^2$, and $\sigma_\mu^2$


Clearly, the empirical implementation of the above results requires consistent
estimators of $\rho_2$, $\sigma_v^2$, and $\sigma_\mu^2$. Kapoor et al. (2007) suggest consistent estimators
based on moment conditions. These moment conditions are a generalization of
those given in Section 2.2.4 in reference to the GMM procedure for the estimation
of $\rho_2$.
In reference to u and ε in (15.3.3), let
\[
\begin{aligned}
\bar u &= (I_T \otimes W)u, \\
\bar{\bar u} &= (I_T \otimes W)\bar u, \\
\varepsilon &= u - \rho_2 \bar u, \\
\bar\varepsilon &= \bar u - \rho_2 \bar{\bar u}.
\end{aligned} \tag{15.3.24}
\]

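Also, based on (15.3.3), let $\hat\beta = (X'X)^{-1}X'y$ and $\hat u = y - X\hat\beta$, and correspondingly
let
\[
\begin{aligned}
\hat{\bar u} &= (I_T \otimes W)\hat u, \\
\hat{\bar{\bar u}} &= (I_T \otimes W)\hat{\bar u}, \\
\hat\varepsilon &= \hat u - \rho_2 \hat{\bar u}, \\
\hat{\bar\varepsilon} &= \hat{\bar u} - \rho_2 \hat{\bar{\bar u}}.
\end{aligned} \tag{15.3.25}
\]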

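Then results given in Kapoor et al. (2007) imply, for T ≥ 2,
\[
\begin{bmatrix}
\frac{1}{N(T-1)}\,\varepsilon' Q_0 \varepsilon \\
\frac{1}{N(T-1)}\,\bar\varepsilon' Q_0 \bar\varepsilon \\
\frac{1}{N(T-1)}\,\bar\varepsilon' Q_0 \varepsilon \\
\frac{1}{N}\,\varepsilon' Q_1 \varepsilon \\
\frac{1}{N}\,\bar\varepsilon' Q_1 \bar\varepsilon \\
\frac{1}{N}\,\bar\varepsilon' Q_1 \varepsilon
\end{bmatrix}
=
\begin{bmatrix}
\sigma_v^2 \\
\sigma_v^2 \frac{1}{N} Tr(W'W) \\
0 \\
\sigma_1^2 \\
\sigma_1^2 \frac{1}{N} Tr(W'W) \\
0
\end{bmatrix}
+
\begin{bmatrix}
\delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6
\end{bmatrix}, \tag{15.3.26}
\]
\[
E(\delta_i) = 0, \quad i = 1, \ldots, 6.
\]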

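Setting $\varepsilon = u - \rho_2 \bar u$ and $\bar\varepsilon = \bar u - \rho_2 \bar{\bar u}$, and replacing $u$, $\bar u$, and $\bar{\bar u}$ with their
respective expressions in (15.3.25), namely $\hat u$, $\hat{\bar u}$, and $\hat{\bar{\bar u}}$, the feasible form of the
expressions in (15.3.26) is
\[
\begin{bmatrix}
\frac{1}{N(T-1)}\,(\hat u - \rho_2 \hat{\bar u})' Q_0 (\hat u - \rho_2 \hat{\bar u}) \\
\frac{1}{N(T-1)}\,(\hat{\bar u} - \rho_2 \hat{\bar{\bar u}})' Q_0 (\hat{\bar u} - \rho_2 \hat{\bar{\bar u}}) \\
\frac{1}{N(T-1)}\,(\hat{\bar u} - \rho_2 \hat{\bar{\bar u}})' Q_0 (\hat u - \rho_2 \hat{\bar u}) \\
\frac{1}{N}\,(\hat u - \rho_2 \hat{\bar u})' Q_1 (\hat u - \rho_2 \hat{\bar u}) \\
\frac{1}{N}\,(\hat{\bar u} - \rho_2 \hat{\bar{\bar u}})' Q_1 (\hat{\bar u} - \rho_2 \hat{\bar{\bar u}}) \\
\frac{1}{N}\,(\hat{\bar u} - \rho_2 \hat{\bar{\bar u}})' Q_1 (\hat u - \rho_2 \hat{\bar u})
\end{bmatrix}
=
\begin{bmatrix}
\sigma_v^2 \\
\sigma_v^2 \frac{1}{N} Tr(W'W) \\
0 \\
\sigma_1^2 \\
\sigma_1^2 \frac{1}{N} Tr(W'W) \\
0
\end{bmatrix}
+
\begin{bmatrix}
\hat\delta_1 \\ \hat\delta_2 \\ \hat\delta_3 \\ \hat\delta_4 \\ \hat\delta_5 \\ \hat\delta_6
\end{bmatrix} \tag{15.3.27}
\]
where $\hat\delta_j$, j = 1, ..., 6, are residuals.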

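If the quadratic forms in (15.3.27) are multiplied out, e.g.,
$(\hat u - \rho_2 \hat{\bar u})' Q_0 (\hat u - \rho_2 \hat{\bar u}) = \hat u' Q_0 \hat u + \rho_2^2\, \hat{\bar u}' Q_0 \hat{\bar u} - 2\rho_2\, \hat u' Q_0 \hat{\bar u}$,
the six equations in (15.3.27) can be
expressed in a form that is more conducive to estimation. Specifically, let
\[
\gamma = \begin{bmatrix} \rho_2 \\ \rho_2^2 \\ \sigma_v^2 \\ \sigma_1^2 \end{bmatrix}, \quad
\hat\delta = \begin{bmatrix} \hat\delta_1 \\ \hat\delta_2 \\ \hat\delta_3 \\ \hat\delta_4 \\ \hat\delta_5 \\ \hat\delta_6 \end{bmatrix}, \quad
S = \begin{bmatrix}
\frac{1}{N(T-1)}\,\hat u' Q_0 \hat u \\
\frac{1}{N(T-1)}\,\hat{\bar u}' Q_0 \hat{\bar u} \\
\frac{1}{N(T-1)}\,\hat u' Q_0 \hat{\bar u} \\
\frac{1}{N}\,\hat u' Q_1 \hat u \\
\frac{1}{N}\,\hat{\bar u}' Q_1 \hat{\bar u} \\
\frac{1}{N}\,\hat u' Q_1 \hat{\bar u}
\end{bmatrix}, \tag{15.3.28}
\]
\[
M = \begin{bmatrix}
\frac{2}{N(T-1)}\,\hat u' Q_0 \hat{\bar u} & \frac{-1}{N(T-1)}\,\hat{\bar u}' Q_0 \hat{\bar u} & 1 & 0 \\
\frac{2}{N(T-1)}\,\hat{\bar{\bar u}}' Q_0 \hat{\bar u} & \frac{-1}{N(T-1)}\,\hat{\bar{\bar u}}' Q_0 \hat{\bar{\bar u}} & \frac{1}{N} Tr(W'W) & 0 \\
\frac{1}{N(T-1)}\,[\hat u' Q_0 \hat{\bar{\bar u}} + \hat{\bar u}' Q_0 \hat{\bar u}] & \frac{-1}{N(T-1)}\,\hat{\bar u}' Q_0 \hat{\bar{\bar u}} & 0 & 0 \\
\frac{2}{N}\,\hat u' Q_1 \hat{\bar u} & \frac{-1}{N}\,\hat{\bar u}' Q_1 \hat{\bar u} & 0 & 1 \\
\frac{2}{N}\,\hat{\bar{\bar u}}' Q_1 \hat{\bar u} & \frac{-1}{N}\,\hat{\bar{\bar u}}' Q_1 \hat{\bar{\bar u}} & 0 & \frac{1}{N} Tr(W'W) \\
\frac{1}{N}\,[\hat u' Q_1 \hat{\bar{\bar u}} + \hat{\bar u}' Q_1 \hat{\bar u}] & \frac{-1}{N}\,\hat{\bar u}' Q_1 \hat{\bar{\bar u}} & 0 & 0
\end{bmatrix}.
\]
Then, the six equations in (15.3.27) can be expressed as
\[
S = M\gamma + \hat\delta. \tag{15.3.29}
\]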

Kapoor et al. (2007) show that consistent estimators of $\sigma_v^2$ and $\rho_2$ can be
obtained by nonlinear least squares based only on the first three equations
in (15.3.29), namely by finding
\[
\arg\min_{\sigma_v^2,\,\rho_2}\, [\hat\delta_1^2 + \hat\delta_2^2 + \hat\delta_3^2]. \tag{15.3.30}
\]
Let $\hat\rho_2$ and $\hat\sigma_v^2$ be the resulting estimators of $\rho_2$ and $\sigma_v^2$. Then a consistent
estimator of $\sigma_1^2$ can be obtained from the fourth equation in (15.3.29):
\[
\hat\sigma_1^2 = \frac{1}{N}\,(\hat u - \hat\rho_2 \hat{\bar u})' Q_1 (\hat u - \hat\rho_2 \hat{\bar u}). \tag{15.3.31}
\]
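
The unweighted estimator in (15.3.30) and (15.3.31) can be sketched in a few lines. This is an illustration under several assumptions that are not in the text: the function name `gm_initial` is hypothetical, the minimization uses a grid over $\rho_2$ with $\sigma_v^2$ concentrated out (the first three equations are linear in $\sigma_v^2$ for a given $\rho_2$) rather than a numerical optimizer, the weighting matrix is a ring, and the simulated true disturbances stand in for the OLS residuals. Residuals are held as a T × N array, so that $Q_0$ and $Q_1$ act as time demeaning and time averaging without forming NT × NT matrices.

```python
import numpy as np

def gm_initial(U_hat, W, grid=np.linspace(-0.99, 0.99, 397)):
    """Sketch of (15.3.30)-(15.3.31). U_hat is T x N; row t holds the residuals u_t'."""
    T, N = U_hat.shape
    Ub = U_hat @ W.T                 # row t holds (W u_t)', i.e. u-bar
    Ubb = Ub @ W.T                   # u-double-bar
    # a' Q0 b and a' Q1 b, computed from time deviations and time means.
    q0 = lambda A, B: np.sum((A - A.mean(axis=0)) * B)
    q1 = lambda A, B: T * np.sum(A.mean(axis=0) * B.mean(axis=0))

    d0 = N * (T - 1)
    trWW = np.trace(W.T @ W) / N
    # First three rows of S and M in (15.3.28), columns (rho, rho^2, sigma_v^2).
    S3 = np.array([q0(U_hat, U_hat), q0(Ub, Ub), q0(U_hat, Ub)]) / d0
    M3 = np.array([
        [2 * q0(U_hat, Ub), -q0(Ub, Ub), d0],
        [2 * q0(Ubb, Ub), -q0(Ubb, Ubb), d0 * trWW],
        [q0(U_hat, Ubb) + q0(Ub, Ub), -q0(Ub, Ubb), 0.0],
    ]) / d0

    c = M3[:, 2]
    best = None
    for rho in grid:
        r0 = S3 - M3[:, 0] * rho - M3[:, 1] * rho**2
        s2v = (c @ r0) / (c @ c)     # least squares value of sigma_v^2 given rho
        ssr = np.sum((r0 - c * s2v) ** 2)
        if best is None or ssr < best[0]:
            best = (ssr, rho, s2v)
    _, rho_hat, s2v_hat = best

    E = U_hat - rho_hat * Ub         # implied innovations
    s21_hat = q1(E, E) / N           # fourth moment equation, (15.3.31)
    return rho_hat, s2v_hat, s21_hat

# Simulated disturbances from (15.3.1); all values below are illustrative.
rng = np.random.default_rng(42)
N, T, rho2, s_v, s_mu = 200, 5, 0.5, 1.0, 1.0
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] += 0.5
    W[i, (i + 1) % N] += 0.5
Eps = s_mu * rng.normal(size=N) + s_v * rng.normal(size=(T, N))  # row t: mu' + v_t'
U = np.linalg.solve(np.eye(N) - rho2 * W, Eps.T).T               # row t: u_t'

rho_hat, s2v_hat, s21_hat = gm_initial(U, W)
print(rho_hat, s2v_hat, s21_hat)
```

With these settings the true values are $\rho_2 = 0.5$, $\sigma_v^2 = 1$, and $\sigma_1^2 = 1 + 5 \cdot 1 = 6$, so the printed estimates should land in that neighborhood. In practice a grid search would normally be replaced, or refined, by a proper nonlinear least squares routine.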
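Kapoor et al. (2007) also suggested more efficient GLS-type estimators
of $\rho_2$, $\sigma_v^2$, and $\sigma_1^2$ which are based on all six equations in (15.3.29). Assuming
normality of the innovation vector ε in (15.3.3), $\varepsilon \sim N(0, \Omega_\varepsilon)$ where $\Omega_\varepsilon$ is given
in (15.3.7), they obtained the VC matrix of $\delta' = (\delta_1, ..., \delta_6)$ in (15.3.26), namely
\[
V_\delta = \begin{bmatrix} \frac{1}{T-1}\sigma_v^4 & 0 \\ 0 & \sigma_1^4 \end{bmatrix} \otimes D, \tag{15.3.32}
\]
\[
D = \begin{bmatrix}
2 & 2\,Tr\!\left(\frac{W'W}{N}\right) & 0 \\
2\,Tr\!\left(\frac{W'W}{N}\right) & 2\,Tr\!\left(\frac{W'WW'W}{N}\right) & Tr\!\left(\frac{W'W(W'+W)}{N}\right) \\
0 & Tr\!\left(\frac{W'W(W'+W)}{N}\right) & Tr\!\left(\frac{WW + W'W}{N}\right)
\end{bmatrix}.
\]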

Let $\hat V_\delta$ be identical to $V_\delta$ except that $\sigma_v^4$ and $\sigma_1^4$ are now replaced by their
corresponding consistent estimators obtained from (15.3.29) and (15.3.31). Then the
nonlinear GLS-type estimators are determined by finding
\[
\arg\min_{\rho_2,\,\sigma_v^2,\,\sigma_1^2}\, [\hat\delta'\, \hat V_\delta^{-1} \hat\delta]
= \arg\min_{\rho_2,\,\sigma_v^2,\,\sigma_1^2}\, (S - M\gamma)'\, \hat V_\delta^{-1} (S - M\gamma). \tag{15.3.33}
\]
Given consistent estimators of $\rho_2$, $\sigma_v^2$, and $\sigma_1^2$, inferences concerning β would
be based on (15.3.23).

Illustration 15.3.1: The effect of public capital on gross state product


In the present illustration we use the well-known Munnell (1990) data set on
public capital productivity in 48 US states observed over 17 years (from 1970
to 1986). Munnell (1990) specifies a Cobb–Douglas production function that
relates the gross state product (gsp) to the input of public capital including
highways and streets (pcap), private capital stock based on the estimates released by
the Bureau of Economic Analysis (pc), labor input measured by employment in
nonagricultural sectors (emp), and state unemployment rates (unemp) added to capture
business cycle effects. Baltagi and Pinnoi (1995) find that the OLS and the
between panel estimators show that the sign of the public capital variable is
positive and significant; whereas the within effect estimator as well as the GLS
estimator find that public capital is not significant.² In other words, after
controlling for state specific effects, public capital is no longer significant and plays
no role in production. The same data set was also used in Millo and Piras (2012)
to illustrate their R library dealing with spatial panel data models. The spatial
weighting matrix used by Millo and Piras (2012) in their illustration is a
simple row-standardized binary contiguity matrix. This weighting matrix is also the
one we use in this illustration.

2. To describe the within and between estimators, consider the model in (A). Using evident notation
let
\[
\text{(A)} \quad y_{it} = \alpha + x_{it}\beta + \mu_i + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T,
\]
where $x_{it}$ is a 1 × k row vector, etc. Let
\[
\bar y_i = T^{-1}\sum_{t=1}^{T} y_{it}, \quad \bar x_{i.} = T^{-1}\sum_{t=1}^{T} x_{it}, \quad
\check y_{it} = (y_{it} - \bar y_i), \quad \check x_{it} = (x_{it} - \bar x_{i.}).
\]
Then the model in (A) implies
\[
\text{(B)} \quad \check y_{it} = \check x_{it}\beta + \check\varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T,
\]
since α and $\mu_i$ cancel, and where $\check\varepsilon_{it}$ is the transformed error term.
The within estimator of β is the OLS estimator based on (B). The between estimator of β is the
OLS estimator of β in the regression of $\bar y_i$ on the constant and $\bar x_{i.}$, i = 1, ..., N.
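
The within and between estimators described in footnote 2 can be sketched as follows. This is an illustration on simulated data, not the Munnell data: the function name `within_between`, the sizes, the coefficient values, and the noise level are all assumptions made for the example.

```python
import numpy as np

def within_between(y, X):
    """y is N x T, X is N x T x k. Returns (within, between) estimates of beta."""
    N, T, k = X.shape
    y_bar, X_bar = y.mean(axis=1), X.mean(axis=1)            # unit means over time
    y_chk = (y - y_bar[:, None]).reshape(-1)                 # deviations from unit means
    X_chk = (X - X_bar[:, None, :]).reshape(-1, k)
    beta_w = np.linalg.lstsq(X_chk, y_chk, rcond=None)[0]    # OLS on model (B)
    Z = np.column_stack([np.ones(N), X_bar])                 # constant and unit means
    beta_b = np.linalg.lstsq(Z, y_bar, rcond=None)[0][1:]    # drop the intercept
    return beta_w, beta_b

rng = np.random.default_rng(7)
N, T, k = 50, 6, 2                     # illustrative sizes
beta = np.array([1.0, -0.5])           # illustrative coefficients
X = rng.normal(size=(N, T, k))
mu = rng.normal(size=N)                # unit effects
y = 2.0 + X @ beta + mu[:, None] + 0.1 * rng.normal(size=(N, T))

beta_w, beta_b = within_between(y, X)
# Adding further unit-specific constants leaves the within estimator unchanged.
beta_w_shifted, _ = within_between(y + 5.0 * mu[:, None], X)
print(beta_w, beta_b)
```

The last comparison shows the point made in the footnote: the within transformation removes α and $\mu_i$, so the within estimator does not depend on the unit effects, whereas the between estimator does.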
The estimates obtained using (15.3.1) and following the procedure described
in the previous section are reported below:
\[
\begin{aligned}
\widehat{\ln(gsp)} = {} & 2.227\,(0.135) + 0.054\,(0.022)\ln(pcap) + 0.257\,(0.021)\ln(pc) \\
 & + 0.728\,(0.025)\ln(emp) - 0.004\,(0.001)\,unemp.
\end{aligned}
\]
The estimated value for $\rho_2$ is 0.548, and $\sigma_v^2 = 0.001$, $\sigma_1^2 = 0.088$, and θ =
0.887. All the coefficients in the regression equation are significant and have
the expected sign. This means that when the error term accounts for spatial
correlation as specified in (15.3.1), the variable reflecting public sector capital
has a positive and significant effect. The value of the spatial coefficient $\rho_2$ is
positive (and its magnitude is quite large!). However, the procedure discussed
above does not determine the statistical significance of the estimator of $\rho_2$.
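
The reported numbers can be cross-checked with a few lines of arithmetic. The sketch below assumes that the figures in parentheses are standard errors and uses the conventional 1.96 threshold; both are assumptions of the example rather than statements in the text.

```python
import math

# Reported estimates with the figures in parentheses (assumed to be standard errors).
estimates = {
    "const":    (2.227, 0.135),
    "ln(pcap)": (0.054, 0.022),
    "ln(pc)":   (0.257, 0.021),
    "ln(emp)":  (0.728, 0.025),
    "unemp":    (-0.004, 0.001),
}
t_ratios = {name: b / se for name, (b, se) in estimates.items()}
all_significant = all(abs(t) > 1.96 for t in t_ratios.values())

T = 17                                         # years in the panel
sigma_v2, sigma_12 = 0.001, 0.088              # reported (rounded) variances
theta = 1.0 - math.sqrt(sigma_v2 / sigma_12)   # theta = 1 - sigma_v / sigma_1
sigma_mu2 = (sigma_12 - sigma_v2) / T          # from sigma_1^2 = sigma_v^2 + T sigma_mu^2

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}")
print(f"theta = {theta:.3f}, implied sigma_mu^2 = {sigma_mu2:.5f}")
```

The implied θ of about 0.893 is close to, but not identical with, the reported 0.887; the gap is presumably due to the rounding of the reported variance components.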

15.4 A GENERALIZATION OF THE RANDOM EFFECTS MODEL


The model considered in Section 15.3 does not have a spatial lag in the depen-
dent variable, nor does it have additional endogenous variables. In this section
we expand that model to include these complications. Fortunately, the exten-
sions are straightforward.
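Consider the model
\[
\begin{aligned}
y &= X\beta_1 + \rho_1 (I_T \otimes W)y + Y\beta_2 + u \\
  &= Z\gamma + u, \\
Z &= (X, (I_T \otimes W)y, Y), \quad \gamma' = (\beta_1', \rho_1, \beta_2'), \\
u &= (I_T \otimes \rho_2 W)u + \varepsilon, \\
\varepsilon &= (e_T \otimes I_N)\mu + v
\end{aligned} \tag{15.4.1}
\]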

where y, X, u, ε, and v are defined in a manner corresponding to (15.3.2), and
$Y = (Y_1', ..., Y_T')'$ where $Y_t$, t = 1, ..., T, is an N × h matrix of additional
endogenous variables. Let the specifications of W, u, ε, and v be exactly what they
were in reference to (15.3.3). Also, let be an N T × r matrix of observable
exogenous variables the researcher knows to be in the system determining Y ,

Typically, the within estimator is considered in a fixed effects framework (Section 15.5 below); the
between estimator is typically in a random effects framework.

r ≥ h. The variables defining need not be the only exogenous variables in


that system.
Because the spatial lag of y and Y are endogenous, least squares estimation
of the model will not produce consistent estimators. Since we do not assume
that the equations determining Y are known, maximum likelihood and Bayesian
methods are not considered. Thus, we again turn to an IV procedure. Particu-
larly, we use the equations in (15.3.27) to estimate ρ2 , σv2 , and σ12 .

Estimation of $\rho_2$, $\sigma_v^2$, and $\sigma_1^2$: An Outline


Following Baltagi and Liu (2011) and Piras (2013), we first multiply the model
across by Q0 and estimate Q0 u using the instrument matrix H∗ where

H∗ = Q0 [(X, (IT ⊗ W )X, (IT ⊗ W 2 )X, , (IT ⊗ W ) , (IT ⊗ W 2 ) ]LI .


(15.4.2)

We then multiply the model across by Q1 and estimate Q1 u using the instrument
matrix H+ where

H+ = Q1 [(X, (IT ⊗ W )X, (IT ⊗ W 2 )X, , (IT ⊗ W ) , (IT ⊗ W 2 ) ]LI .


(15.4.3)

The estimates of $Q_0 u$ and $Q_1 u$ can then be used via (15.3.27)–(15.3.29) to
estimate $\rho_2$, $\sigma_v^2$, and $\sigma_1^2$. Given these estimates, the model is then transformed
to account for the spatial lag in u, and the VC matrix of ε. That transformed
model is then estimated using the instrument matrix $H_\#$ where
\[
H_\# = (H_*, H_+).
\]

Model Estimation: Details


Premultiplying the regression model in (15.4.1) across by $Q_0$ yields
\[
\begin{aligned}
Q_0 y &= Q_0 X\beta_1 + \rho_1 (I_T \otimes W) Q_0 y + Q_0 Y\beta_2 + Q_0 u \\
 &= Q_0 Z\gamma + Q_0 u.
\end{aligned} \tag{15.4.4}
\]

Under reasonable conditions, (15.4.4) can be consistently estimated by 2SLS
using the instrument matrix $H_*$ defined in (15.4.2). Specifically, let
$P_{H_*} = H_* (H_*' H_*)^{-1} H_*'$ and $\hat Z_* = P_{H_*} Q_0 Z$. Then, the 2SLS estimator of γ based on
(15.4.4) is
\[
\hat\gamma_* = (\hat Z_*' \hat Z_*)^{-1} \hat Z_*' Q_0 y \tag{15.4.5}
\]
and the estimate of $Q_0 u$ is
\[
\widehat{Q_0 u} = Q_0 y - Q_0 Z \hat\gamma_*. \tag{15.4.6}
\]
Note that although $Q_0 u$ can be estimated by this procedure, u cannot be
estimated if X contains variables which only vary cross-sectionally. Therefore
$Q_1 u$ cannot be estimated with the results based on the regression in (15.4.4).
For example, suppose $X = (X_1, X_2)$ and correspondingly let $\beta_1' = (\beta_{1,1}', \beta_{1,2}')$.
Suppose $X_1$ is an NT × $k_1$ matrix whose elements vary over both cross-sections and
time, and let $X_2 = (e_T \otimes M)$ where M is an N × $k_2$ matrix whose elements
do not vary over time. Recalling (15.2.5), $Q_0 X = (Q_0 X_1, 0)$ and so
$Q_0 X\beta_1 = Q_0 (X_1, X_2)\beta_1 = Q_0 X_1 \beta_{1,1}$. The parameter vector $\beta_{1,2}$ would not
appear in (15.4.4), and so u could not be estimated from the estimation results
corresponding to (15.4.4).
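
The 2SLS step in (15.4.5) and (15.4.6) has the same algebra as any 2SLS regression. The sketch below is generic and makes several assumptions: the function name `two_sls` is hypothetical, the instrument matrix is taken to have full column rank, and the simulated data are a simple one-regressor example rather than the spatial model. For (15.4.5) one would pass $Q_0 y$, $Q_0 Z$, and $H_*$ as the three arguments.

```python
import numpy as np

def two_sls(y, Z, H):
    """2SLS of y on Z with instruments H; returns the estimate and the residuals."""
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]   # P_H Z without forming P_H
    gamma = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)
    resid = y - Z @ gamma                              # residuals use Z, not Z_hat
    return gamma, resid

# Illustrative data: one endogenous regressor z and two instruments.
rng = np.random.default_rng(11)
n = 5000
H = rng.normal(size=(n, 2))
e = rng.normal(size=n)
u = 0.8 * e + 0.6 * rng.normal(size=n)   # error correlated with z through e
z = H @ np.array([1.0, 1.0]) + e
y = 2.0 * z + u
Z = z[:, None]

gamma_2sls, resid_2sls = two_sls(y, Z, H)
gamma_ols = np.linalg.lstsq(Z, y, rcond=None)[0]       # inconsistent here
gamma_exog, _ = two_sls(y, Z, Z)                       # instruments equal regressors
print(gamma_2sls, gamma_ols)
```

Two details carry over to the panel case. The residuals in (15.4.6) are formed with the original regressors $Q_0 Z$, not the projected ones, and when the instruments coincide with the regressors the 2SLS formula collapses to OLS.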
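In order to estimate $Q_1 u$ the model in (15.4.1) is multiplied across by $Q_1$ to
obtain
\[
\begin{aligned}
Q_1 y &= Q_1 X\beta_1 + \rho_1 (I_T \otimes W) Q_1 y + Q_1 Y\beta_2 + Q_1 u \\
 &= Q_1 Z\gamma + Q_1 u.
\end{aligned} \tag{15.4.7}
\]
Let $P_{H_+} = H_+ (H_+' H_+)^{-1} H_+'$ and $\tilde Z_+ = P_{H_+} Q_1 Z$. Then
\[
\tilde\gamma_+ = (\tilde Z_+' \tilde Z_+)^{-1} \tilde Z_+' Q_1 y \tag{15.4.8}
\]
and $Q_1 u$ is then estimated as
\[
\widehat{Q_1 u} = Q_1 y - Q_1 Z \tilde\gamma_+. \tag{15.4.9}
\]
Given $\widehat{Q_0 u}$ in (15.4.6) and $\widehat{Q_1 u}$ in (15.4.9), the equations in (15.3.27)–(15.3.29)
can be used to estimate $\rho_2$, $\sigma_v^2$, and $\sigma_1^2$ as, say, $\check\rho_2$, $\check\sigma_v^2$, and $\check\sigma_1^2$, respectively.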
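Finally, consider the estimation of (15.4.1). Let
\[
\begin{aligned}
y(\check\rho_2) &= y - (I_T \otimes \check\rho_2 W)y, \\
Z(\check\rho_2) &= Z - (I_T \otimes \check\rho_2 W)Z,
\end{aligned}
\]
and
\[
\begin{aligned}
y_{\check\rho_2,\check\theta} &= (I_{NT} - \check\theta Q_1)\, y(\check\rho_2), \\
Z_{\check\rho_2,\check\theta} &= (I_{NT} - \check\theta Q_1)\, Z(\check\rho_2)
\end{aligned} \tag{15.4.10}
\]
where $\check\theta = 1 - \check\sigma_v/\check\sigma_1$. Note that in these transformations one is first transforming
to account for spatial correlation, and then transforming again to account for the
covariance matrix of ε.
Let $P_\# = H_\# (H_\#' H_\#)^{-1} H_\#'$ and
\[
\check Z_{\check\rho_2,\check\theta} = P_\# Z_{\check\rho_2,\check\theta}.
\]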

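Then, the proposed estimator of γ is
\[
\check\gamma = (\check Z_{\check\rho_2,\check\theta}' \check Z_{\check\rho_2,\check\theta})^{-1} \check Z_{\check\rho_2,\check\theta}'\, y_{\check\rho_2,\check\theta}. \tag{15.4.11}
\]
Small sample inferences would be based on the finite sample approximation
\[
\check\gamma \simeq N[\gamma, (\check Z_{\check\rho_2,\check\theta}' \check Z_{\check\rho_2,\check\theta})^{-1}]. \tag{15.4.12}
\]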

Illustration 15.4.1: A model of crime in North Carolina


The example considered in this illustration is based on a well known panel data
set initially used by Cornwell and Trumbull (1994). The authors specify an eco-
nomic model for crime relating the crime rate (lcrmrte) to a number of proxy
variables meant to control for the return to legal opportunities. In addition, they
also include a set of deterrent variables such as probability of arrest, probability
of conviction conditional on arrest, and probability of imprisonment conditional
on conviction. The panel reports information on counties in North Carolina and
covers a fairly large time period ranging from 1981 to 1987.
The dependent variable in the model is crime per capita (lcrmrte). Some of
the main explanatory variables are deterrents to crime. Specifically, the prob-
ability of arrest (lprbarr) is proxied by the ratio of arrest to offenses; the ratio
of convictions to arrest is a proxy for the probability of conviction (lprbconv),
and, finally, the proportion of total convictions resulting in prison sentences is a
proxy for the probability of imprisonment (lprbpris). The model also includes a
measure of sanction severity (lavgsen) measured by the average prison sentence
as well as the number of police per capita (lpolpc). All the other variables are ei-
ther observable county characteristics, or controls for the relative return to legal
activities.
The relative return to legal activities is captured by the average weekly
wage in the county in various sectors, such as construction (lwcon); trans-
portation, utilities, and communications (lwtuc); wholesale and retail trade
(lwtrd); finance, insurance, and real estate (lwfir); services (lwser); manufac-
turing (lwmfg); and federal (lwfed), state (lwsta), and local government (lwloc).
A dummy variable (urban) controls for differences in participation in the legal
sector that may occur between urban and rural environment. A similar role is
played by a density variable (ldensity) which measures the ratio between county
population and county land area.
The model also includes the proportion of the male population between the
ages of 15–24 (lpctymle), and the proportion of the minority population
(lpctmin). Finally, time dummies are included, and two other dummy variables
(central and western regional) are added in order to consider regional or
cultural factors that may affect the crime rate. All variables, except for the dummy
variables, are on a logarithmic scale.
Cornwell and Trumbull (1994) were concerned about the endogeneity of
police per capita (lpolpc) and the probability of arrest. For this reason they
suggested an instrumental variable procedure based on two additional instruments:
the logarithm of the offense mix (lomix) and the logarithm of per capita tax
ratio (lpctaxr). Offense mix was defined as the ratio of crimes involving
face-to-face contacts (such as, for example, robbery) to crimes that do not involve
face-to-face contacts. The assumption implicitly made by using this instrument
is that the chance to capture a criminal when the criminal can be identified is
higher. The inclusion of the second instrument is based on the argument that
counties with residents who had greater preferences for law enforcement were
also willing to pay higher taxes to fund a larger police force.
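Results based on (15.4.11) are reported below:³
\[
\begin{aligned}
\widehat{\text{lcrmrte}} = {} & -0.624\,(1.145) + 0.372\,(0.048)\,\text{lpolpc} - 0.333\,(0.060)\,\text{lprbarr} \\
 & - 0.276\,(0.032)\,\text{lprbconv} - 0.161\,(0.034)\,\text{lprbpris} \\
 & - 0.013\,(0.025)\,\text{lavgsen} + 0.452\,(0.049)\,\text{ldensity} \\
 & - 0.008\,(0.037)\,\text{lwcon} + 0.040\,(0.017)\,\text{lwtuc} \\
 & - 0.009\,(0.039)\,\text{lwtrd} - 0.006\,(0.027)\,\text{lwfir} \\
 & + 0.005\,(0.019)\,\text{lwser} - 0.194\,(0.075)\,\text{lwmfg} \\
 & - 0.081\,(0.141)\,\text{lwfed} - 0.027\,(0.098)\,\text{lwsta} \\
 & + 0.137\,(0.111)\,\text{lwloc} - 0.068\,(0.117)\,\text{lpctymle} \\
 & + 0.187\,(0.036)\,\text{lpctmin} - 0.220\,(0.091)\,\text{west} \\
 & + 0.137\,(0.111)\,\text{central} - 0.068\,(0.117)\,\text{urban}.
\end{aligned}
\]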

The estimated value of $\rho_1$ is 0.269 and it is strongly statistically significant (with
standard error 0.070). The estimated value for $\rho_2$ is negative (−0.254) and the
estimates of the two variance components $\sigma_v^2$ and $\sigma_1^2$ are 0.018 and 0.264,
respectively. A closer look at the results shows that the probability of arrest, the
probability of conviction, the probability of imprisonment, and the measure of
sanction severity all have the expected negative sign. However, only three of
them are statistically significant (i.e., the probability of arrest, the probability
of conviction, and the probability of imprisonment). The variable measuring
the police per capita is positive and statistically significant. This is to say that
in counties where the number of police per capita is higher, the crime rate is
generally higher. Both regional dummies and the density variable are strongly
significant. On the other hand, only a few of the variables that are meant to
capture the return to legal opportunities are significant (e.g., the average weekly
wage in the manufacturing sector). One thing is worth noticing here. As it was for
cross-sectional models, the presence of $\rho_1$ complicates the interpretation of the
other coefficients. In a model without spatial lags, and without additional
endogenous variables, the coefficients would be interpreted as elasticities. On the
other hand, in the absence of additional endogenous variables, for models such
as that in (15.4.1), the interpretation of the coefficients is a bit different.⁴ Some
of the software that implements the estimation of spillover effects in models
involving spatial lags, but no additional endogenous variables, is available in R or
Matlab, among other packages.

3. The coefficients of the time dummies are not reported.

15.5 THE FIXED EFFECTS MODEL


Essentially, the fixed effects model differs from the random effects model in
that it conditionalizes on intercept differences between units. That is, instead
of assuming stochastic conditions for the elements of a vector such as μ, as
in (15.3.1), the fixed effects model simply assumes the elements of μ are fixed
constants, i.e., each unit has its own fixed intercept. Without further assump-
tions, the fixed effect vector μ is a parameter vector of the model.
Essentially, the random effects model is a special case of the fixed effects
model. In the fixed effects model it does not matter how the intercepts were
generated as long as they are uniformly bounded in absolute value. That is,
these fixed effects were generated and we just conditionalize on their generated
values. The random effects model is based on specific assumptions about how
the intercepts were generated, such as those in (15.3.1). These assumptions can
be tested using a modified version of the Hausman specification test; see, e.g.,
Mutl and Pfaffermayr (2011).
Consider the fixed effects model
\[
\begin{aligned}
y_t &= X_t\beta_1 + \rho_1 W y_t + Y_t\beta_2 + \mu + u_t, \\
u_t &= \rho_2 W u_t + v_t, \quad t = 1, \ldots, T,
\end{aligned} \tag{15.5.1}
\]

where, at time t, $y_t$ is an N × 1 vector of observations on the dependent
variable, $X_t$ is an N × k matrix of observations on k exogenous variables whose
values vary over both cross-sectional units and time, $Y_t$ is an N × h matrix of
additional endogenous variables, $\beta_1$ is a k × 1 parameter vector, $\beta_2$ is an h × 1
parameter vector, W is an N × N exogenous weighting matrix, $\rho_1$ and $\rho_2$ are

4. Spillover effects in models which have additional endogenous variables are more complex. The
reason for this is that the system involving these variables also involves the dependent variable of
the model being considered, as well as exogenous variables. Therefore spillover effects relate not to
the single equation being considered, but to the complete system. At present, there are no results in
the spatial literature relating to this.
scalar parameters, μ is an N × 1 vector of fixed effects, $u_t$ is the disturbance
term, and vt is an N × 1 vector of stochastic innovations. Also, let be an
N T × r matrix of observable exogenous variables the researcher knows to be
in the system determining Y = (Y1 , ..., YT ) , r ≥ h. The variables defining
need not be the only exogenous variables in that system.
At this point, assume $|\rho_1| < 1$, $|\rho_2| < 1$, and $(I_N - aW)$ is nonsingular for
all |a| < 1. Let $v = (v_1', ..., v_T')'$. Then, the assumption on the innovation vector
is that the elements of v, say $v_{it}$, are i.i.d. over both i = 1, ..., N and t = 1, ..., T
with mean and variance of 0 and $\sigma_v^2$, respectively, and finite fourth moment.
Again, this assumption does not account for triangular arrays, and is given only
to simplify the presentation. For a formal counterpart which does account for
triangular arrays, see the central limit theorem in Section A.15 of Appendix A,
as well as in Kapoor et al. (2007). Finally, at this point the development below
is based on the assumption that N is “large” relative to T , i.e., the large sample
results are based on N → ∞ and T being finite.

A Note on Identification
Before continuing we note that in a fixed effects model, the coefficients of re-
gressors whose values do not vary over time are not identified. This is the case
whether those variables are exogenous or endogenous. For example, suppose
(15.5.1) were extended to

yt = Sβ0 + Xt β1 + ρ1 Wyt + Yt β2 + μ + ut , t = 1, ..., T (15.5.2)

where S is an N × ks regressor matrix whose values do not vary over time. Since
Sβ0 is a time invariant N × 1 vector the model in (15.5.2) reduces to

\[ y_t = X_t\beta_1 + \rho_1 W y_t + Y_t\beta_2 + \mu_1 + u_t, \quad t = 1, ..., T \tag{15.5.3} \]

where $\mu_1 = \mu + S\beta_0$. In a sense, $\mu_1$ could be viewed as a “new” fixed effects vector! Clearly, at best $\mu_1$, but not μ or $\beta_0$, can be estimated, although not consistently.
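The identification point can be checked numerically. The sketch below assumes that $Q_0$ in (15.2.5) is the usual within projector $(I_T - e_T e_T'/T) \otimes I_N$ for data stacked period by period; the dimensions and the random data are hypothetical.

```python
import numpy as np

def within_projector(N, T):
    """Q0 = (I_T - e_T e_T'/T) kron I_N for data stacked period by period."""
    return np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))

rng = np.random.default_rng(0)
N, T, ks = 5, 3, 2
Q0 = within_projector(N, T)

S = rng.normal(size=(N, ks))               # time-invariant regressors, one row per unit
S_stacked = np.kron(np.ones((T, 1)), S)    # S repeated in every period: (e_T kron S)
X = rng.normal(size=(N * T, 2))            # regressors that do vary over time

max_abs_wiped = np.abs(Q0 @ S_stacked).max()   # numerically zero: S is eliminated
rank_kept = np.linalg.matrix_rank(Q0 @ X)      # the time-varying columns survive
print(max_abs_wiped, rank_kept)
```

Because the transformed time-invariant block is identically zero, no estimator based on the transformed model can recover $\beta_0$.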

Issues Relating to the Fixed Effects Vector


The time pooled version of the model in (15.5.1) is

y = Xβ1 + ρ1 (IT ⊗ W )y + Yβ2 + (eT ⊗ IN )μ + u, (15.5.4)


u = ρ2 (IT ⊗ W )u + v

where $e_T$ is a T × 1 vector of unit elements and

\[
\begin{aligned}
y &= (y_1', ..., y_T')', \\
X &= (X_1', ..., X_T')', \\
Y &= (Y_1', ..., Y_T')', \\
u &= (u_1', ..., u_T')', \\
v &= (v_1', ..., v_T')'.
\end{aligned}
\]

Since μ is an N × 1 parameter vector, the number of regression parameters in (15.5.4) is (k + h + 1 + N) → ∞ as N → ∞. The elements of μ cannot
be consistently estimated since T is assumed to be finite. To see this note that
even if, in (15.5.4), β1 = 0, ρ1 = ρ2 = 0, and β2 = 0, the vector μ cannot be
consistently estimated. For example, in this case the model in (15.5.4) reduces
to
y = (eT ⊗ IN )μ + v. (15.5.5)
The model in (15.5.5) is a linear regression model with an exogenous regressor
matrix, (eT ⊗ IN ), and an error vector whose elements are i.i.d. (0, σv2 ). If,
in addition, v is normally distributed, the maximum likelihood estimator of μ
would be an efficient estimator, and would just be the OLS estimator, say μ̂,
namely

\[
\begin{aligned}
\hat\mu &= [(e_T' \otimes I_N)(e_T \otimes I_N)]^{-1}(e_T' \otimes I_N)\,y \\
&= \frac{1}{T}\sum_{t=1}^{T} y_t = \bar y.
\end{aligned}
\tag{15.5.6}
\]

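The properties of $\hat\mu$ are easily determined by substituting (15.5.5) into the first line of (15.5.6):

\[
\begin{aligned}
\hat\mu &= [(e_T' \otimes I_N)(e_T \otimes I_N)]^{-1}(e_T' \otimes I_N)\,[(e_T \otimes I_N)\mu + v] \\
&= \mu + [(e_T' \otimes I_N)(e_T \otimes I_N)]^{-1}(e_T' \otimes I_N)\,v.
\end{aligned}
\tag{15.5.7}
\]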

Clearly,

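\[
\begin{aligned}
E(\hat\mu) &= \mu, \\
VC_{\hat\mu} &= \sigma_v^2\,[(e_T' \otimes I_N)(e_T \otimes I_N)]^{-1} = \frac{\sigma_v^2}{T}\, I_N.
\end{aligned}
\tag{15.5.8}
\]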
It should be evident that issues relating to the consistency of $\hat\mu$ involve T. For example, if N is assumed to be given, and T → ∞, then by (15.5.8) $VC_{\hat\mu} \to 0$, and since $\hat\mu$ is unbiased, Chebyshev’s inequality in Section A.3 of Appendix A implies that $\hat\mu$ is consistent, $\hat\mu \overset{P}{\to} \mu$. Now let $\hat\mu_i$, i = 1, ..., N be the ith element of $\hat\mu$ and consider the case in which N → ∞ and T → ∞. In this case (15.5.8) implies $E(\hat\mu_i) = \mu_i$ and $\mathrm{var}(\hat\mu_i) \to 0$, and so again Chebyshev’s inequality implies $\hat\mu_i \overset{P}{\to} \mu_i$, i = 1, ..., N.5
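The algebra behind (15.5.6) and (15.5.8) can be verified on a small example. This is only an illustrative sketch; the values of N, T, and $\sigma_v^2$ are hypothetical.

```python
import numpy as np

N, T, sigma2 = 4, 6, 2.0
D = np.kron(np.ones((T, 1)), np.eye(N))      # (e_T kron I_N), an NT x N matrix

DtD_inv = np.linalg.inv(D.T @ D)             # equals I_N / T
vc_mu_hat = sigma2 * DtD_inv                 # VC matrix in (15.5.8)

rng = np.random.default_rng(1)
mu = rng.normal(size=N)
y = D @ mu + rng.normal(scale=np.sqrt(sigma2), size=N * T)   # model (15.5.5)

mu_hat = DtD_inv @ D.T @ y                   # OLS estimator, first line of (15.5.6)
y_bar = y.reshape(T, N).mean(axis=0)         # time average of y_t for each unit
```

The variance of each element of `mu_hat` is $\sigma_v^2/T$ whatever the value of N, which is why adding cross-sectional units alone does not help.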
In most spatial models the sample configuration considered is N → ∞ with
T being fixed. This is one of the basic assumptions we have used in this chapter.
Because in this case a consistent estimator of the fixed effects vector μ does not
exist, the model is typically transformed to eliminate μ. This permits a simpler
focus on the parameters in (15.5.4) which can be consistently estimated, as we
show below.

Eliminating Fixed Effects and Obtaining Instruments


Premultiplying (15.5.4) across by Q0 and noting that (15.2.5) implies Q0 (eT ⊗
IN ) = 0 yields

Q0 y = Q0 Xβ1 + ρ1 Q0 (IT ⊗ W )y + Q0 Yβ2 + Q0 u (15.5.9)


= Q0 Xβ1 + ρ1 (IT ⊗ W )Q0 y + Q0 Yβ2 + Q0 u
= Q0 Zγ + Q0 u

and

Q0 u = ρ2 (IT ⊗ W )Q0 u + Q0 v (15.5.10)

where $Q_0 Z = [Q_0 X, (I_T \otimes W)Q_0 y, Q_0 Y]$ and $\gamma = (\beta_1', \rho_1, \beta_2')'$. Assuming that $[I_{NT} - a(I_T \otimes W)]$ is nonsingular for all |a| < 1, the second line in (15.5.9) implies

Q0 y = [IN T − ρ1 (IT ⊗ W )]−1 [Q0 Xβ1 + Q0 Yβ2 + Q0 u], (15.5.11)

and so the expected value of Q0 y is

E[Q0 y] = [IN T − ρ1 (IT ⊗ W )]−1 [Q0 Xβ1 + Q0 E(Y )β2 ]. (15.5.12)
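The two properties of $Q_0$ used in (15.5.9) and (15.5.10) can be confirmed numerically. The sketch assumes the standard within projector for $Q_0$ and uses a hypothetical ring-shaped weighting matrix purely for illustration.

```python
import numpy as np

N, T = 4, 3
Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))

# Hypothetical row-normalised weights: each unit's neighbours are the two adjacent units on a ring.
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = 0.5
    W[i, (i + 1) % N] = 0.5
IW = np.kron(np.eye(T), W)                   # (I_T kron W)

D = np.kron(np.ones((T, 1)), np.eye(N))      # (e_T kron I_N)
fe_removed = np.abs(Q0 @ D).max()            # Q0 (e_T kron I_N) = 0
commutes = np.allclose(Q0 @ IW, IW @ Q0)     # Q0 (I_T kron W) = (I_T kron W) Q0
projector = np.allclose(Q0 @ Q0, Q0) and np.allclose(Q0, Q0.T)
```

The commuting property is what allows the spatial lag of the transformed variable to be written as the transformed spatial lag.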

The suggested matrix of instruments for estimating γ in (15.5.9) based on (15.5.12) is

\[ H_* = Q_0\,[X, (I_T \otimes W)X, (I_T \otimes W^2)X, \Phi, (I_T \otimes W)\Phi, (I_T \otimes W^2)\Phi]_{LI} \tag{15.5.13} \]

5. In this case N → ∞ and we are not saying that $\hat\mu \overset{P}{\to} \mu$ as N → ∞ because this “limit” makes no sense. The reason for this is that μ is an N × 1 vector and so, in the limit, μ cannot even be defined – there is no upper limit to ∞.
where, again, $\Phi$ is the NT × r matrix of observable exogenous variables that are in the system determining Y, and where higher powers than 2 could be used.6
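One way to assemble an instrument matrix of the form (15.5.13) is sketched below. The function names, the greedy rule for dropping linearly dependent columns (the "LI" subscript), and the test data are assumptions made for illustration; `Phi` stands for the matrix of outside exogenous variables.

```python
import numpy as np

def li_columns(A):
    """Scan left to right and keep each column that raises the rank."""
    keep = []
    for j in range(A.shape[1]):
        if np.linalg.matrix_rank(A[:, keep + [j]]) == len(keep) + 1:
            keep.append(j)
    return A[:, keep]

def instrument_matrix(X, Phi, W, T, powers=2):
    """Q0 [X, (I_T kron W)X, ..., Phi, (I_T kron W)Phi, ...] with dependent columns dropped."""
    N = W.shape[0]
    Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))
    IW = np.kron(np.eye(T), W)
    blocks = []
    for B in (X, Phi):
        for _ in range(powers + 1):          # B, (I kron W)B, (I kron W^2)B, ...
            blocks.append(B)
            B = IW @ B
    return li_columns(Q0 @ np.hstack(blocks))

rng = np.random.default_rng(2)
N, T = 10, 3
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5
X = rng.normal(size=(N * T, 2))
Phi = np.column_stack([X[:, 0], rng.normal(size=N * T)])   # first column duplicates X on purpose
H_star = instrument_matrix(X, Phi, W, T)
```

In this example 12 candidate columns are generated, and the three that merely repeat the duplicated column and its spatial lags are discarded.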

Estimation
The estimation procedure takes place in three steps. In the first step a consistent
but inefficient estimator, say γ̂ , of γ in (15.5.9) is determined. Then, γ̂ is used
to estimate the error vector in (15.5.9), namely Q0 u. In the second step the
estimator of Q0 u is used to estimate the parameters ρ2 and σv2 . In the third step
the model in (15.5.9) is transformed to account for the spatial correlation, and
then a more efficient estimator of γ is obtained. An expression is then given
which enables finite sample inferences.

Step 1
Let $P_{H_*} = H_*(H_*'H_*)^{-1}H_*'$ and $\hat Z_* = P_{H_*}Q_0 Z$. Then, the 2SLS estimator of γ in (15.5.9), based on the instruments in (15.5.13) is

\[ \hat\gamma = (\hat Z_*'\hat Z_*)^{-1}\hat Z_*'Q_0 y. \tag{15.5.14} \]

Under standard conditions, $\hat\gamma$ can easily be shown to be consistent, $\hat\gamma \overset{P}{\to} \gamma$. Given $\hat\gamma$, the evident estimator of $Q_0 u$ in (15.5.9) is

\[ \widehat{Q_0 u} = Q_0 y - Q_0 Z\hat\gamma. \tag{15.5.15} \]

For future reference note that

\[
\begin{aligned}
Q_0\,\widehat{Q_0 u} &= Q_0 Q_0 y - Q_0 Q_0 Z\hat\gamma \\
&= Q_0 y - Q_0 Z\hat\gamma \\
&= \widehat{Q_0 u}.
\end{aligned}
\tag{15.5.16}
\]
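Step 1 is an ordinary 2SLS fit applied to the within-transformed data. A minimal sketch follows; the function name `tsls` and the simulated inputs are hypothetical, and in the panel setting `y`, `Z`, and `H` would be $Q_0 y$, $Q_0 Z$, and $H_*$.

```python
import numpy as np

def tsls(y, Z, H):
    """2SLS: project Z on the instruments H, then regress y on the projection."""
    Z_hat = H @ np.linalg.solve(H.T @ H, H.T @ Z)          # P_H Z
    gamma_hat = np.linalg.solve(Z_hat.T @ Z_hat, Z_hat.T @ y)
    resid = y - Z @ gamma_hat                              # residuals use Z, not Z_hat
    return gamma_hat, resid

rng = np.random.default_rng(4)
n = 50
H = rng.normal(size=(n, 4))
Z = H @ rng.normal(size=(4, 2)) + 0.5 * rng.normal(size=(n, 2))
gamma = np.array([1.0, -2.0])
y = Z @ gamma                        # noise-free, so 2SLS must reproduce gamma
gamma_hat, resid = tsls(y, Z, H)
```

The residual vector returned here plays the role of $\widehat{Q_0 u}$ in (15.5.15).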

Step 2
Given $\widehat{Q_0 u}$, the parameters $\rho_2$ and $\sigma_v^2$ can be consistently estimated using the first three equations in Kapoor et al. (2007). For example, noting from (15.2.5) that $Q_0' = Q_0$ and $Q_0^2 = Q_0$, the empirical form of the first three equations in their paper can be expressed in terms of $\widehat{Q_0 u}$ in (15.5.15) as

\[
\begin{aligned}
\frac{1}{N(T-1)}\,(Q_0\hat u - \rho_2 Q_0\bar{\hat u})'(Q_0\hat u - \rho_2 Q_0\bar{\hat u}) &= \sigma_v^2 + \hat\delta_1, \\
\frac{1}{N(T-1)}\,(Q_0\bar{\hat u} - \rho_2 Q_0\bar{\bar{\hat u}})'(Q_0\bar{\hat u} - \rho_2 Q_0\bar{\bar{\hat u}}) &= \sigma_v^2\,\frac{1}{N}\,Tr(W'W) + \hat\delta_2, \\
\frac{1}{N(T-1)}\,(Q_0\bar{\hat u} - \rho_2 Q_0\bar{\bar{\hat u}})'(Q_0\hat u - \rho_2 Q_0\bar{\hat u}) &= 0 + \hat\delta_3
\end{aligned}
\tag{15.5.17}
\]

where $\hat\delta_i$, i = 1, 2, 3 are error terms.7 The estimators of $\rho_2$ and $\sigma_v^2$, say $\check\rho_2$ and $\check\sigma_v^2$, are then obtained by nonlinear least squares, namely by finding

\[ \arg\min_{\rho_2,\,\sigma_v^2}\,(\hat\delta_1^2 + \hat\delta_2^2 + \hat\delta_3^2). \tag{15.5.18} \]

6. Again, because the model has additional endogenous variables, maximum likelihood or Bayesian methods cannot be implemented unless the entire system generating the endogenous variables is known!

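Let

\[ \hat u_0 = \widehat{Q_0 u}, \qquad \bar u_0 = (I_T \otimes W)\hat u_0, \qquad \bar{\bar u}_0 = (I_T \otimes W)\bar u_0. \]

Note that in light of (15.5.16) the three equations in (15.5.17) can also be expressed as

\[
\begin{aligned}
\frac{1}{N(T-1)}\,(\hat u_0 - \rho_2\bar u_0)'(\hat u_0 - \rho_2\bar u_0) &= \sigma_v^2 + \hat\delta_1, \\
\frac{1}{N(T-1)}\,(\bar u_0 - \rho_2\bar{\bar u}_0)'(\bar u_0 - \rho_2\bar{\bar u}_0) &= \sigma_v^2\,\frac{1}{N}\,Tr(W'W) + \hat\delta_2, \\
\frac{1}{N(T-1)}\,(\bar u_0 - \rho_2\bar{\bar u}_0)'(\hat u_0 - \rho_2\bar u_0) &= 0 + \hat\delta_3,
\end{aligned}
\tag{15.5.19}
\]

since $Q_0(I_T \otimes \rho_2 W) = (I_T \otimes \rho_2 W)Q_0$.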
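The minimization in (15.5.18) can be sketched as follows. This is not the authors' implementation: it swaps a general nonlinear least squares routine for a grid search over $\rho_2$, using the fact that the objective is quadratic in $\sigma_v^2$ for a given $\rho_2$. The function name, the grid, and the simulated data (true disturbances in place of 2SLS residuals) are assumptions.

```python
import numpy as np

def gm_rho_sigma2(U0, W, grid=np.linspace(-0.95, 0.95, 381)):
    """U0: T x N array of within-transformed residuals (row t holds period t).
    Returns (rho2, sigma2_v) minimising the sum of squared errors in (15.5.19)."""
    T, N = U0.shape
    U1 = U0 @ W.T                     # (I_T kron W) u0, period by period
    U2 = U1 @ W.T                     # (I_T kron W) applied twice
    c = np.sum(W * W) / N             # Tr(W'W) / N
    denom = N * (T - 1)
    best = None
    for rho in grid:
        e = U0 - rho * U1
        e_bar = U1 - rho * U2
        m1 = np.sum(e * e) / denom
        m2 = np.sum(e_bar * e_bar) / denom
        m3 = np.sum(e_bar * e) / denom
        s2 = (m1 + c * m2) / (1.0 + c * c)        # best sigma2 for this rho
        loss = (m1 - s2) ** 2 + (m2 - c * s2) ** 2 + m3 ** 2
        if best is None or loss < best[0]:
            best = (loss, rho, s2)
    return best[1], best[2]

# Simulated check with a hypothetical ring-shaped W, rho2 = 0.4 and sigma_v^2 = 1.
rng = np.random.default_rng(42)
N, T, rho_true = 1000, 5, 0.4
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5
V = rng.normal(size=(T, N))
U = np.linalg.solve(np.eye(N) - rho_true * W, V.T).T   # u_t = (I_N - rho2 W)^{-1} v_t
U0 = U - U.mean(axis=0)                                # Q0 u: remove each unit's time mean
rho_hat, s2_hat = gm_rho_sigma2(U0, W)
```

Working with T × N arrays avoids forming the NT × NT Kronecker products explicitly.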

Step 3
Finally, one needs to transform the variables in (15.5.9) in order to account for
spatial correlation. Specifically, let

\[
\begin{aligned}
y_{\check\rho_2} &= Q_0 y - (I_T \otimes \check\rho_2 W)Q_0 y, \\
Z_{\check\rho_2} &= Q_0 Z - (I_T \otimes \check\rho_2 W)Q_0 Z.
\end{aligned}
\tag{15.5.20}
\]

Applying this transformation to (15.5.9) yields the approximation model

\[ y_{\check\rho_2} \doteq Z_{\check\rho_2}\gamma + Q_0 v \tag{15.5.21} \]

where the approximation would be perfect if $\check\rho_2 = \rho_2$.

7. For example, since $Q_0$ is symmetric and idempotent, any quadratic form such as $M'Q_0 M$ can be expressed as

\[ M'Q_0 M = M'Q_0 Q_0 M = (Q_0 M)'(Q_0 M). \]

Using this, the first equation in Kapoor et al. (2007) can be written as

\[ \frac{1}{N(T-1)}\,\varepsilon'Q_0\varepsilon = \frac{1}{N(T-1)}\,(Q_0\varepsilon)'(Q_0\varepsilon) = \sigma_v^2 + \delta_1 \]

where $\varepsilon = u - \rho(I_T \otimes W)u$ is the innovation vector, and $E(\delta_1) = 0$. Thus, to estimate $\varepsilon'Q_0\varepsilon$ one only has to estimate $Q_0\varepsilon$.


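Let

\[ \check Z_{\check\rho_2} = P_{H_*}Z_{\check\rho_2} \tag{15.5.22} \]

where $P_{H_*} = H_*(H_*'H_*)^{-1}H_*'$ and $H_*$ is defined in (15.5.13). Then, the suggested estimator of γ in (15.5.21) is

\[ \check\gamma = (\check Z_{\check\rho_2}'\check Z_{\check\rho_2})^{-1}\check Z_{\check\rho_2}'\,y_{\check\rho_2}. \tag{15.5.23} \]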

Straightforward calculations will demonstrate the consistency of $\check\gamma$, as well as its asymptotic normality. Small sample inferences can be based on the finite sample approximation

\[ \check\gamma \simeq N[\gamma,\ \hat\sigma_v^2(\check Z_{\check\rho_2}'\check Z_{\check\rho_2})^{-1}] \tag{15.5.24} \]

where σ̂v2 is a consistent estimator of σv2 . One such estimator is σ̌v2 which is
determined by the GMM approach described by (15.5.18). Another one is based
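on (15.5.21). Let

\[ \widehat{Q_0 v} = y_{\check\rho_2} - Z_{\check\rho_2}\check\gamma. \tag{15.5.25} \]

Then, under reasonable conditions, another consistent estimator of $\sigma_v^2$ is $\hat\sigma_v^2$ where

\[ \hat\sigma_v^2 = \frac{1}{N(T-1)}\,(\widehat{Q_0 v})'(\widehat{Q_0 v}). \tag{15.5.26} \]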
Illustration 15.5.1: A fixed effects version of the model of crime
We consider again the model of crime in North Carolina and the three-step pro-
cedure described above for the fixed effects model. Furthermore, for that model
we consider the two additional instruments (offense mix and per capita tax ratio)
to control for the endogeneity of police per capita and the probability of arrest.
Results from the estimation are reported below:

$\widehat{lcrmrte}$ = 0.427 (0.106) lpolpc − 0.250 (0.113) lprbarr
    − 0.246 (0.065) lprbconv − 0.142 (0.048) lprbpris
    − 0.007 (0.027) lavgsen + 0.417 (0.289) ldensity
    − 0.042 (0.039) lwcon + 0.038 (0.018) lwtuc
    − 0.017 (0.040) lwtrd − 0.009 (0.028) lwfir
    + 0.017 (0.020) lwser − 0.307 (0.124) lwmfg
    − 0.332 (0.185) lwfed + 0.056 (0.117) lwsta
    + 0.173 (0.120) lwloc + 0.316 (0.387) lpctymle.

The estimate of ρ1, 0.370, is statistically significant at the 5% level (standard error 0.181). The estimated value of ρ2 is negative (−0.254), and the estimated value
of the variance component σv2 is 0.018. These two values are extremely similar
to those obtained in Illustration 15.4.1 in the previous section. As for the de-
terrent variables (i.e., the probability of arrest, the probability of conviction, the
probability of imprisonment and the measure of sanction severity) all have the
expected negative sign and are (in absolute value) lower than those obtained in
the random effect model of Illustration 15.4.1. The variable measuring the level
of police per capita is also statistically significant and has the expected positive
sign.
Clearly, the dummy variables west, central, and urban, as well as all the
other variables that do not vary over time, are wiped out when we apply the
fixed effect transformation, and, therefore, their coefficients are not identified.

A Dynamic Version of the Fixed Effects Model


Another specification of the fixed effects panel data model is a variation of a
dynamic random effects model given by Baltagi et al. (2014b).8 The model we
discuss here is a dynamic version of the model in (15.5.1) but it does not have
additional endogenous variables.
Using evident notation the model is

yt = Xt β + ρ1 Wyt + θyt−1 + μ + ut , (15.5.27)


ut = ρ2 W ut + vt , t = 1, ..., T

where $y_t$ is an N × 1 vector, μ is an N × 1 fixed effects vector, etc. The elements of the NT × 1 vector $v = (v_1', ..., v_T')'$ are assumed to be i.i.d. $(0, \sigma_v^2)$.
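Following Baltagi et al. (2014b), the fixed effects vector μ can be eliminated by time differencing (15.5.27) to obtain

\[
\begin{aligned}
\Delta y_t &= \Delta X_t\beta + \rho_1 W\Delta y_t + \theta\Delta y_{t-1} + \Delta u_t, \\
\Delta u_t &= \rho_2 W\Delta u_t + \Delta v_t, \quad t = 1, ..., T
\end{aligned}
\tag{15.5.28}
\]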

8. See also Arellano and Bond (1991).


where $\Delta y_t = y_t - y_{t-1}$, etc.9 The error vector $\Delta u_t$ in (15.5.28) can be expressed as

\[ \Delta u_t = [I_N - \rho_2 W]^{-1}\Delta v_t. \tag{15.5.29} \]

Since $E(\Delta v_t\, y_{t-s}') = 0$ for all s ≥ 2 implies $E(\Delta u_t\, y_{t-s}') = 0$ for all s ≥ 2,
in their framework Baltagi et al. (2014b) suggest the use of time lagged de-
pendent as well as exogenous variables as instruments to estimate their model.
Many steps in their procedure would carry over to the estimation of (15.5.28).
The overall procedure is interesting, but a little bit tedious. It also depends cru-
cially on the assumption that the elements of vt are i.i.d. (0, σv2 ) over both
i = 1, ..., N and t = 1, ..., T . We do not describe the details of the procedure
because in Section 15.6 we present a general panel data fixed effects model
which contains both (15.5.1) and (15.5.27) as special cases.
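The effect of time differencing can be illustrated with a few lines of code. The sketch below uses hypothetical data; it shows that the fixed effects drop out, and that differenced i.i.d. innovations are correlated one period apart but not two, which is why only dependent variables lagged at least two periods are candidate instruments in this framework.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5, 4
mu = rng.normal(size=N)                   # fixed effects, constant over time
A = rng.normal(size=(T + 1, N))           # time-varying part for periods 0, ..., T
Y = A + mu                                # row t is y_t' = a_t' + mu'

dY = np.diff(Y, axis=0)                   # rows are (y_t - y_{t-1})', t = 1, ..., T
dA = np.diff(A, axis=0)                   # the same differences without mu

Dmat = np.diff(np.eye(T + 1), axis=0)     # T x (T+1) first-difference operator
cov_pattern = Dmat @ Dmat.T               # VC of differenced i.i.d. innovations, up to sigma_v^2
```

The matrix `cov_pattern` has 2 on the diagonal, −1 on the first off-diagonals, and zeros elsewhere.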

15.6 A GENERALIZATION OF THE FIXED EFFECTS MODEL


The fixed effects model in (15.5.1) is generalized in this section. This gener-
alization involves a nonparametric specification of the error term, as well as
the contemporaneous presence of additional endogenous regressors, and a dy-
namic term in the model. Ironically, formal large sample results are available
and easily obtained for this generalized model! The reason for this is that the
nonparametric specification of the error term precludes the use of lagged depen-
dent variables as instruments. Again, the large sample theory described in this
section relates to the case in which N → ∞ and T is fixed.
The nonparametric specification of the error term is especially important and, as we strongly suggested in Chapter 9, it is indicative of a “new wave” of spatial research. For the convenience of readers who may not have read Chapter 9, we outline the arguments given in that chapter. Further details are given in Chapter 9.
As we mentioned in Chapter 9, in time series econometrics, researchers used
to specify heteroskedasticity in terms of a known function of exogenous regres-
sors and a parameter vector, say θ ,

σi2 = f (xi , θ), i = 1, ..., N, (15.6.1)

in order to reduce the number of parameters.10 At the time, the general understanding was that without an assumption such as (15.6.1) the variance–covariance matrix of the estimators could not be consistently estimated. The assumed reason for this was that it would involve N unknown variances, which would be viewed as unknown parameters, where N is the size of the sample. Given an assumption such as (15.6.1), those kinds of models were estimated by ML, or by an iterative method in which one would first estimate the model ignoring the heteroskedasticity, then use the squared residuals to estimate θ, and then account for the heteroskedasticity by dividing the model across by $[f(x_i, \hat\theta)]^{1/2}$. This was the approach followed years ago before the influential paper of White (1980). White (1980) pointed out that estimation procedures based on structural specifications of the variances, such as (15.6.1), introduce biases when the assumed structure is incorrect. Among other things, his suggestion was to estimate the model ignoring the heteroskedasticity and then estimate the resulting VC matrix of the estimators by a robust procedure (a heteroskedasticity-consistent estimator, the forerunner of HAC). His procedure essentially assumed a nonparametric specification of the error term variances. Nowadays it is rare to find a study in which models involving heteroskedasticity are estimated by assuming a structure such as (15.6.1).

9. Baltagi et al. (2014b) eliminate their random effects vector because its elements are correlated with the time lagged dependent variable. So, although their model is quite different than ours in that they have random effects while our model here has fixed effects, the approach taken for estimation is quite similar.

10. Note that in a panel context, the expression in (15.6.1) can be further complicated if, for example, time lagged variables are considered.
In both applied and theoretical spatial modeling it is extremely common for researchers to assume a particular structure for their error terms. Two common structures are the spatial autoregressive (SAR) and the spatial moving average (SMA) models. We recommend again that spatial researchers take a lesson from the heteroskedasticity literature and stop modeling their error terms structurally – the error terms are the unknown parts of the model!
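The lesson from the heteroskedasticity literature can be made concrete. The sketch below computes OLS together with the basic form of White's (1980) robust VC matrix (often labeled HC0), which uses squared residuals in place of a model for $\sigma_i^2$. The function name and the simulated data are hypothetical.

```python
import numpy as np

def ols_white(X, y):
    """OLS coefficients and White's heteroskedasticity-robust VC matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    meat = (X * e[:, None] ** 2).T @ X        # sum_i e_i^2 x_i x_i'
    return b, XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sd = np.exp(0.5 * X[:, 1])                    # variance depends on the regressor
y = X @ np.array([1.0, 2.0]) + sd * rng.normal(size=n)
b, vc = ols_white(X, y)
```

No functional form such as (15.6.1) enters the calculation; the form used to generate `sd` is never passed to the estimator.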
In this section we specify the error terms nonparametrically in such a way
that it allows for very general patterns of heteroskedasticity, as well as spatial
and time correlation. The resulting VC matrix of the estimators is estimated by
an HAC procedure.
Consider the model

yt = Xt β1 + Pt β2 + ρ1 Wyt + αyt−1 + Yt β3 + μ + ut , (15.6.2)


t = 1, ..., T

where, at time t, $y_t$ is the N × 1 vector of observations on the dependent variable, $X_t$ is an N × $k_x$ matrix of observations on exogenous variables, $P_t$ is an
N × T matrix of observations on T time dummy variables, W is an observed
N × N exogenous weighting matrix, Yt is an N × kY matrix of observations on
endogenous regressors, μ is an N × 1 vector of fixed effects, and ut is the cor-
responding N × 1 disturbance vector. The parameter vectors are β1 , β2 , and β3
which are respectively kx × 1, kP × 1, and kY × 1. The parameters ρ1 , and α are
scalars. For ease of presentation, assume the available data are from t = 0, ..., T
so that yt , yt−1 , Xt , Yt , and Pt are observed for all t = 1, ..., T .
Stacking the variables of the model over time, let

\[
\begin{aligned}
y &= (y_1', ..., y_T')', \\
X &= (X_1', ..., X_T')', \\
P &= (P_1', ..., P_T')', \\
Y &= (Y_1', ..., Y_T')', \\
y_- &= (y_0', ..., y_{T-1}')', \\
u &= (u_1', ..., u_T')'.
\end{aligned}
\]

Given these definitions, the stacked form of the model is

y = Xβ1 + Pβ2 + ρ1 (IT ⊗ W )y + αy− + Yβ3 + (eT ⊗ μ) + u (15.6.3)


= Zγ + (eT ⊗ μ) + u

where $e_T$ is a T × 1 vector of unit elements, and

\[ Z = (X, P, (I_T \otimes W)y, y_-, Y), \qquad \gamma = (\beta_1', \beta_2', \rho_1, \alpha, \beta_3')'. \]

Since consistent estimators of the elements of the fixed effects vector μ are not
possible, the fixed effects vector will be eliminated from the model. As we saw
in Section 15.5, there are at least two “typical” ways of doing this. One is to
take time differences; the other is to premultiply the model across by Q0 . In
this section we eliminate the fixed effects by premultiplying the model across
by Q0 . Specifically,

\[
\begin{aligned}
Q_0 y &= Q_0 Z\gamma + Q_0 u, \\
y^* &= Z^*\gamma + u^*,
\end{aligned}
\tag{15.6.4}
\]

since $Q_0(e_T \otimes \mu) = 0$, and where $y^* = Q_0 y$, $Z^* = Q_0 Z$, and $u^* = Q_0 u$.


Instead of assuming a structural form of spatial correlation and heteroskedasticity for the error term, we assume

\[ u_t = \sum_{j=1}^{t} R_{tj}\varepsilon_j, \quad t = 1, ..., T \tag{15.6.5} \]

where $R_{tj}$ is an N × N unknown exogenous matrix, j = 1, ..., T, and $\varepsilon_j$, j = 1, ..., T are N × 1 random innovation vectors. The expression in (15.6.5) implies that the error vector in time period t, namely $u_t$, can be expressed as a weighted sum of random innovation N × 1 vectors, $\varepsilon_j$, relating to periods j = 1, ..., t. In matrix terms (15.6.5) can be expressed as

\[ u = R\varepsilon, \tag{15.6.6} \]

\[
\begin{bmatrix} u_1 \\ \vdots \\ u_T \end{bmatrix}
=
\begin{bmatrix}
R_{11} & 0 & \cdots & 0 \\
R_{21} & R_{22} & \ddots & \vdots \\
\vdots & \vdots & \ddots & 0 \\
R_{T1} & R_{T2} & \cdots & R_{TT}
\end{bmatrix}
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{bmatrix}
\]

where R is the NT × NT matrix whose (i, j)th block is the N × N matrix $R_{ij}$, i, j = 1, ..., T, with $R_{ij} = 0$ for j > i.
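The block structure of R can be illustrated with a small numerical sketch. The blocks below are filled with arbitrary random numbers purely for illustration; in the model they are unknown.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 3, 4
R = np.zeros((N * T, N * T))
for t in range(T):
    for j in range(t + 1):                   # only blocks with j <= t are non-zero
        R[t * N:(t + 1) * N, j * N:(j + 1) * N] = rng.normal(size=(N, N))

eps = rng.normal(size=N * T)                 # stacked innovations (eps_1', ..., eps_T')'
u = R @ eps                                  # (15.6.6)
u1_direct = R[:N, :N] @ eps[:N]              # u_1 = R_11 eps_1 depends on eps_1 only
VC_u = R @ R.T                               # VC matrix of u when the innovations are i.i.d. (0, 1)
```

The off-diagonal blocks of `VC_u` are generally non-zero, so the specification allows correlation across time as well as across units.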
Let $\Phi$ be an NT × h, h ≥ $k_Y$, matrix of observations on exogenous variables that are not in (15.6.3), but are in the system determining Y. As described in earlier models, the variables in $\Phi$ may only be a subset of the variables in that system.
The parameters in the model in (15.6.3) cannot be estimated by maximum
likelihood or by Bayesian techniques unless all the equations determining Y
are known. Therefore, as for earlier models the estimation procedure will be
instrumental variables.
Let $H_* = [Q_0 X, Q_0 P, Q_0\Phi]$. Then the IV matrix is H, where

\[
\begin{aligned}
H &= \{H_*, (I_T \otimes W)Q_0 H_*, ..., (I_T \otimes W^r)Q_0 H_*\}_{LI} \\
&= \{H_*, (I_T \otimes W)Q_0 X, (I_T \otimes W)Q_0 P, (I_T \otimes W)Q_0\Phi, ..., (I_T \otimes W^r)Q_0\Phi\}_{LI}
\end{aligned}
\tag{15.6.7}
\]

where typically r = 2 and $Q_0 Q_0 = Q_0$. For future reference, note that $H'Q_0 u = H'u = H'R\varepsilon$, since $Q_0^2 = Q_0$, and so $Q_0 H = H$.
The assumptions underlying the large sample theory for the parameter esti-
mators of this model are given below. These assumptions are intuitive, and are
sufficient for the large sample results presented below.
Assumption 15.1. The elements of ε are i.i.d. with mean and variance of 0
and 1, respectively, and have finite fourth moment.11
Assumption 15.2. The unknown exogenous matrix R in (15.6.6) is nonsingular.
Assumption 15.3. The elements of H in (15.6.7) are uniformly bounded in
absolute value.
Assumption 15.4. (IN − aW ) is nonsingular for all |a| < 1.
Assumption 15.5. The large sample theory relates to N → ∞, while T remains fixed and finite. Assume the following limits

\[
\begin{aligned}
&(A)\quad \lim_{N\to\infty}(NT)^{-1}H'H = \Sigma_{HH}, \\
&(B)\quad \lim_{N\to\infty}(NT)^{-1}H'RR'H = \Sigma_{HRRH}, \\
&(C)\quad \operatorname*{plim}_{N\to\infty}(NT)^{-1}H'Z^* = \operatorname*{plim}_{N\to\infty}(NT)^{-1}H'Z = \Sigma_{HZ}, \\
&(D)\quad \operatorname*{plim}_{N\to\infty}(NT)^{-1}Z^{*\prime}Z^* = \operatorname*{plim}_{N\to\infty}(NT)^{-1}Z'Q_0 Z = \Sigma_{ZQ_0Z}
\end{aligned}
\]

where $\Sigma_{HH}$, $\Sigma_{HRRH}$, and $\Sigma_{ZQ_0Z}$ are nonsingular finite matrices, and $\Sigma_{HZ}$ has full column rank.

11. Again, this does not account for triangular arrays; see Section A.15 in the appendix on large sample theory, or Kapoor et al. (2007) for specifications that do account for triangular arrays.

Implications of the Assumptions


Assumption 15.1 implies that the VC matrix of the disturbance term u specified
in (15.6.6) is
E(uu ) = RR  . (15.6.8)
The implication of (15.6.8) is that the disturbance term may be both spatially and
time correlated, as well as heteroskedastic. Furthermore since R is unknown,
very general patterns of spatial and time correlation, as well as heteroskedas-
ticity, are consistent with (15.6.8). Another implication of (15.6.8) is that time
lagged dependent variables cannot be included in the set of instruments be-
cause general patterns of time correlation are consistent with (15.6.8) and hence
all time lagged endogenous variables must be viewed as endogenous.
Effectively, (C) of Assumption 15.5 is an identification condition. The re-
maining assumptions are somewhat standard and imply, among other things, that
the model in (15.6.4) is complete in that it can be solved for y in terms of the
remaining variables of the model (Assumption 15.4). They also rule out peculiar
sequences of the exogenous variables (Assumption 15.5). For example, let xi,t
be the first regressor in Xt . Then (A) of Assumption 15.5 rules out sequences
such as

\[ x_{1,t} = 1,\ x_{2,t} = 5,\ x_{3,t} = 7,\ x_{4,t} = 1,\ x_{5,t} = 5,\ x_{6,t} = 7,\ x_{7,t} = 1,\ ... . \]

Estimation and Large Sample Properties


The model in (15.6.4) can be estimated by 2SLS, and the asymptotic VC matrix can be estimated by an HAC procedure. Let $\hat Z^* = P_H Z^*$ where $P_H = H(H'H)^{-1}H'$. Then, the 2SLS estimator of γ in (15.6.4) is

\[ \hat\gamma = (\hat Z^{*\prime}\hat Z^*)^{-1}\hat Z^{*\prime}y^*. \tag{15.6.9} \]

Given (15.6.4) it follows that

\[
\begin{aligned}
(NT)^{1/2}(\hat\gamma - \gamma) &= (NT)(\hat Z^{*\prime}\hat Z^*)^{-1}(NT)^{-1/2}\hat Z^{*\prime}u^* \\
&= (NT)[Z^{*\prime}H(H'H)^{-1}H'Z^*]^{-1}Z^{*\prime}H(H'H)^{-1}(NT)^{-1/2}H'u \\
&= [(NT)^{-1}Z'H(H'H)^{-1}H'Z]^{-1}Z'H(H'H)^{-1}(NT)^{-1/2}H'R\varepsilon
\end{aligned}
\tag{15.6.10}
\]

again, since $Q_0^2 = Q_0$. Note that there are no parameters of the error term that have to be estimated in order to obtain $\hat\gamma$.
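The large sample distribution of $\hat\gamma$ is

\[
\begin{aligned}
(NT)^{1/2}(\hat\gamma - \gamma) &\overset{D}{\to} N(0, VC_{\hat\gamma}), \\
VC_{\hat\gamma} &= S\,\Sigma_{HRRH}\,S', \\
S &= [\Sigma_{HZ}'\Sigma_{HH}^{-1}\Sigma_{HZ}]^{-1}\Sigma_{HZ}'\Sigma_{HH}^{-1}, \\
\Sigma_{HRRH} &= \lim_{N\to\infty}(NT)^{-1}H'RR'H.
\end{aligned}
\tag{15.6.11}
\]

The proof for the result in (15.6.11) is straightforward.

Finite sample inferences can be based on

\[ \hat\gamma \simeq N[\gamma,\ (NT)^{-1}\widehat{VC}_{\hat\gamma}] \tag{15.6.12} \]

where

\[
\begin{aligned}
\widehat{VC}_{\hat\gamma} &= \hat S\,\hat\Sigma_{HRRH}\,\hat S', \\
\hat S &= [(NT)^{-1}Z'H(H'H)^{-1}H'Z]^{-1}Z'H(H'H)^{-1},
\end{aligned}
\]

and where $\hat\Sigma_{HRRH}$ is the HAC estimator of $\Sigma_{HRRH}$. In constructing this HAC estimator, $RR'$ should be viewed as the VC matrix of the error term u; see Chapter 9.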
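The sandwich form in (15.6.12) can be sketched as follows. The kernel-weighted "middle" matrix is a generic spatial HAC construction with a Parzen kernel; how the distance between two stacked observations is measured, especially across periods, is left to the caller and is an assumption here rather than the construction of Chapter 9. All function names are hypothetical.

```python
import numpy as np

def parzen(x):
    """Parzen kernel: non-zero only for |x| <= 1."""
    x = np.abs(x)
    return np.where(x <= 0.5, 1 - 6 * x**2 + 6 * x**3,
                    np.where(x <= 1.0, 2 * (1 - x)**3, 0.0))

def hac_middle(H, u_hat, dist, bandwidth):
    """n^{-1} sum_i sum_j h_i h_j' u_i u_j K(d_ij / bandwidth), with n the number of rows of H."""
    Hu = H * u_hat[:, None]
    return Hu.T @ parzen(dist / bandwidth) @ Hu / H.shape[0]

def tsls_hac_vc(Z, H, middle):
    """n^{-1} S_hat * middle * S_hat', the estimated VC matrix of the 2SLS estimator."""
    n = Z.shape[0]
    A = Z.T @ H @ np.linalg.inv(H.T @ H)          # Z'H(H'H)^{-1}
    S_hat = np.linalg.solve(A @ H.T @ Z / n, A)
    return S_hat @ middle @ S_hat.T / n

rng = np.random.default_rng(8)
n = 6
H = rng.normal(size=(n, 3))
Z = H @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))
u_hat = rng.normal(size=n)
far_apart = 10.0 * (1 - np.eye(n))                # every pair lies beyond the bandwidth
middle = hac_middle(H, u_hat, far_apart, bandwidth=1.0)
```

When no pair of distinct observations falls within the bandwidth, the middle matrix reduces to the heteroskedasticity-only form; when the middle matrix is $\sigma^2 H'H/n$, the sandwich collapses to the textbook 2SLS formula.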

Illustration 15.6.1: A dynamic model of cigarette consumption


In this illustration we use a dynamic demand model for cigarettes. The data set is a panel covering the period 1964–1992 for 46 US states and was first used (over a shorter period) in Baltagi and Levin (1986). The authors estimated a dynamic demand for cigarettes to address various policy issues. One of their main findings is that cigarette sales are negatively affected by the average price of cigarettes, with a price elasticity of −0.2. They also found that the income effect is not relevant. A nice feature of their model is that cigarette sales in each state are related to the lowest cigarette price in neighboring states. This price variable, which in a sense is similar to a spatially lagged dependent variable, was meant to capture cross-state shopping by cigarette consumers as well as a “bootlegging” effect. This bootlegging effect was found to be positive and statistically significant in their model. Baltagi and Levin (1992) improved the results of their previous analysis by considering an extended time frame. They also considered different ways of modeling the bootlegging effect. In fact, they studied the sensitivity of their results by replacing their minimum price variable with a maximum neighboring price variable.
In this example we formulate a slightly modified version of the model considered by Baltagi and Levin (1992) in which we replace their minimum price with an average price variable based on the six nearest neighboring states; we also consider the spatial lag of cigarette consumption.
More specifically, the model that we estimate in this example is

\[ \ln C_{it} = \beta_1\ln C_{i,t-1} + \beta_2\ln p_{it} + \beta_3\ln I_{it} + \beta_4\sum_{j=1}^{46} w_{ij}\ln p_{jt} + \lambda\sum_{j=1}^{46} w_{ij}\ln C_{jt} + \mu_i + \delta_t + u_{it} \]

where i = 1, ..., 46 denotes states and t = 1, ..., 29 denotes time periods. $C_{it}$ is cigarette sales per capita in constant dollars to persons of smoking age in state
i at time t; pit is the real price of cigarettes; Iit is real per capita disposable in-
come; wij is an element of the spatial weighting matrix; μi is the fixed effect for
state i, and δt is the fixed time effect for period t . The error term uit is assumed
to have the nonparametric specification described in (15.6.6).
In order to estimate the model, we use the 2SLS procedure described in
Section 15.6. The matrix of instruments is specified as

H = Q0 [X, X− , (IT ⊗ W )X, (IT ⊗ W 2 )X, (IT ⊗ W )X− , (IT ⊗ W 2 )X− , ]

where X is the N T × 2 matrix of observations on the price variable ln pit , and


the income variable ln Iit , X− is the time lag of the variables in X, and  is a
matrix of the time dummy variables.
Results from the estimation using the matrix H are reported below (except
for the time dummies):

ln C = 0.643(0.037) ln C− − 0.437(0.045) ln p + 0.166(0.032) ln I


+ 0.177(0.097)W ln p + 0.217(0.056)W ln C.

A glance at the results shows that the coefficients of the (time) lagged consumption variable, and of price and income have the expected signs and are also statistically significant. In fact, one would expect that consumption habits are persistent, the price effect on demand is negative, and the income effect is positive. However, it should be stressed once more that these coefficients cannot be interpreted as elasticities because of the presence of the spatially lagged dependent variable whose coefficient is positive and significant.12 The average price of the six nearest neighboring states is not statistically significant at the usual 5% level.

12. Obtaining the elasticity for this dynamic panel data model would be even more complicated than usual. For an example of a dynamic panel, see Parent and LeSage (2010).
A final point relates to statistical inference. Standard errors are produced
using the spatial HAC estimator with a Parzen kernel. In doing this we specify
a variable bandwidth based on the distance to the six nearest neighbors.
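A six-nearest-neighbor weighting matrix of the kind used in this illustration could be built as sketched below. The coordinates are random stand-ins for state locations, Euclidean distance is assumed, and reading the "variable bandwidth" as each unit's distance to its sixth nearest neighbor is an interpretation, not something the text states.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-normalised weights (w_ij = 1/k for the k nearest neighbours of i)
    and, for each unit, the distance to its k-th nearest neighbour."""
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # a unit is never its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0 / k
    return W, np.sort(d, axis=1)[:, k - 1]

rng = np.random.default_rng(7)
coords = rng.uniform(size=(46, 2))                # hypothetical locations for 46 units
W, bandwidth = knn_weights(coords, k=6)
ln_p = rng.normal(size=46)                        # hypothetical log prices in one period
W_ln_p = W @ ln_p                                 # average log price of the six nearest neighbours
```

With row normalization, the spatially lagged price is exactly the neighbor average described in the text.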

15.7 TESTS OF PANEL MODELS: THE J -TEST


In this section we focus on testing the null panel data fixed effects model against
a set of alternatives using the J -test, which was described in Chapter 12 in a
single panel framework. The results in this section should demonstrate the ease
of extending the results in Chapter 12 to a panel framework.

The Null Model


The assumed null model is the same as the general model specified in Sec-
tion 15.6. The time stacked form of that model in (15.6.3) and its error term
specification in (15.6.6) are repeated here for the convenience of the reader:

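\[
\begin{aligned}
y &= X\beta_1 + P\beta_2 + \rho_1(I_T \otimes W)y + \alpha y_- + Y\beta_3 + (e_T \otimes \mu) + u \\
&= Z\gamma + (e_T \otimes \mu) + u, \\
Z &= (X, P, (I_T \otimes W)y, y_-, Y), \quad \gamma = (\beta_1', \beta_2', \rho_1, \alpha, \beta_3')', \\
u &= R\varepsilon
\end{aligned}
\tag{15.7.1}
\]

where y is the NT × 1 vector of observations on the dependent variable, X is the NT × $k_x$ matrix of observations on exogenous variables which vary over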
both time and cross-sections, P is an N T × T matrix of observations on T time
dummy variables, W is an observed N × N exogenous weighting matrix, Y is
an N T × kY matrix of observations on endogenous regressors, μ is an N × 1
vector of fixed effects, and u is an N T × 1 disturbance vector. The parameter
vectors are β1 , β2 , and β3 which are respectively kx × 1, kP × 1, and kY × 1. The
parameters ρ1 and α are scalar. For ease of presentation, assume the available
data are from t = 0, ..., T so that yt , yt−1 , Xt , Yt , and Pt are observed for all
t = 1, ..., T . The assumptions for the error term and for R are the same as in
Section 15.6 and so
E(uu ) = RR  . (15.7.2)
As in Section 15.6, the elements of the fixed effects vector μ cannot be consistently estimated, and so μ is eliminated from the model. There are at least two “typical” ways of doing this. One is to take time differences as was done in Section 15.5, and the other is to premultiply the model across by $Q_0$, as was done in Section 15.6. In this section we use the $Q_0$ method, which turns out to be convenient.
Premultiplying the fixed effects model in (15.7.1) by Q0 yields

\[
\begin{aligned}
Q_0 y &= Q_0 Z\gamma + Q_0 u, \\
y^* &= Z^*\gamma + u^*,
\end{aligned}
\tag{15.7.3}
\]

since $Q_0(e_T \otimes \mu) = 0$, and where $y^* = Q_0 y$, $Z^* = Q_0 Z$, and $u^* = Q_0 u$. The model in (15.7.3) can be viewed as the final form of the null model.

The Alternative Models


Suppose there are G alternative models, where G ≥ 1 is finite. Also, suppose the researcher specifies these alternatives in the same general form as (15.7.1):

\[
\begin{aligned}
y &= X_J\beta_{J,1} + P\beta_{J,2} + \rho_{J,1}(I_T \otimes W_J)y + \alpha_J y_- + Y_J\beta_{J,3} + (e_T \otimes \mu_J) + u_J \\
&= Z_J\gamma_J + (e_T \otimes \mu_J) + u_J, \quad Z_J = (X_J, P, (I_T \otimes W_J)y, y_-, Y_J), \\
u_J &= R_J\varepsilon_J, \quad J = 1, ..., G
\end{aligned}
\tag{15.7.4}
\]

where, using evident notation, XJ and YJ are respectively the N T × kJ,x and
N T × kJ,Y matrices of observations on the exogenous and endogenous variables
in the J th alternative model, WJ is the corresponding weighting matrix, etc. The
unit specific vector, μJ , can be either a random or a fixed effects vector. Note
that some alternative models may only differ from the null in terms of their
weighting matrix, others may only differ in their set of regressors, while others
may differ in both!
As in Chapter 12, the J -test is based on augmenting the null model with
predictions of the dependent variable based on the alternative models, and then
testing for the significance of those augmenting variables. The dependent vector
in the final form of the null model is y ∗ = Q0 y. Therefore, the J -test in this
panel data framework is based on testing for the significance of predictions of
Q0 y based on the alternative models.
Premultiplying (15.7.4) by Q0 yields

y ∗ = ZJ∗ γJ + u∗J , (15.7.5)


ZJ∗ = Q0 ZJ ,
u∗J = Q0 uJ .

The Augmented Model


Let $\hat\gamma_J$ be the researcher’s estimator of $\gamma_J$ based on his/her assumptions of the Jth alternative model, J = 1, ..., G. As described in Chapter 12, there are at least two ways of predicting the dependent vector based on the Jth model. One
is just the estimated right-hand side of that model based on γ̂J . The other is
based on the reduced form. Under reasonable conditions, Kelejian and Piras
(2016b) show that, in a panel data framework, if there is only one alternative,
G = 1, the asymptotic power of the test is the same for these two types of pre-
dictors. They also give Monte Carlo results which suggest that in finite samples
the power is roughly the same for these two types of predictors even if G > 1.
Because the predictor based on the estimated right-hand side is computationally
simpler in that it does not involve inverting a matrix, we suggest its use.
Let $\hat y_J^* = Z_J^*\hat\gamma_J = Q_0 Z_J\hat\gamma_J$ be the predicted value of the dependent vector based on the Jth model, J = 1, ..., G. Let

\[ \hat Y_{1,G}^* = (\hat y_1^*, ..., \hat y_G^*), \qquad \delta' = (\delta_1, ..., \delta_G) \tag{15.7.6} \]

where δ is a parameter vector. Then the augmented model is

\[
\begin{aligned}
y^* &= Z^*\gamma + \hat Y_{1,G}^*\delta + u^* \\
&= M^*F + u^*
\end{aligned}
\tag{15.7.7}
\]

where $M^* = (Z^*, \hat Y_{1,G}^*)$ and $F = (\gamma', \delta')'$. Recalling (15.7.1) and (15.7.3), $u^* = Q_0 u = Q_0 R\varepsilon$.
Let H be an instrument matrix (whose elements are specified below), $P_H = H(H'H)^{-1}H'$, and $\hat M^* = P_H M^*$. Then, the 2SLS estimator of F based on (15.7.7) is $\hat F$ where

\[ \hat F = (\hat M^{*\prime}\hat M^*)^{-1}\hat M^{*\prime}y^*. \tag{15.7.8} \]

The Instruments
In a manner similar to Chapter 6, let $\Phi_J$ be the $NT \times h_J$ matrix of observations
on the exogenous variables the researcher knows to be in the system determining
$Y_J$, and assume that $h_J \geq k_{J,Y}$. Also, let $X_{J,-}$ and $\Phi_{J,-}$ be identical to
$X_J$ and $\Phi_J$ except that each element is now lagged by one time period. Let
$\Psi_J = (X_J, \Phi_J, X_{J,-}, \Phi_{J,-})$ and let

$$\Lambda_J = (\Psi_J, (I_T \otimes W_J)\Psi_J, ..., (I_T \otimes W_J^r)\Psi_J)_{LI}, \quad J = 1, ..., G, \tag{15.7.9}$$

where $r$ would typically be taken as $r = 2$.

Similarly, let $\Phi$ be the $NT \times h$ matrix of observations on the exogenous
variables the researcher knows to be in the system determining $Y$ in the null
model, with $h \geq k_Y$. Also, let $X_-$ and $\Phi_-$ be identical to $X$ and $\Phi$ except that
each element is now lagged by one time period. Let $\Psi = (X, \Phi, X_-, \Phi_-)$ and

$$\Lambda = (\Psi, P, (I_T \otimes W)\Psi, ..., (I_T \otimes W^r)\Psi), \tag{15.7.10}$$

where typically $r = 2$. Then, the instrument matrix for estimating the augmented
model is

$$H = Q_0 (\Lambda, \Lambda_1, ..., \Lambda_G)_{LI}. \tag{15.7.11}$$
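The construction of the instrument matrix can be sketched as below, under several assumptions that are not in the text. The data are taken to be stacked period by period, so that I_T ⊗ W applies the weights within each period. `Psi` is a hypothetical name for the matrix of exogenous variables and their time lags, and the same `Psi` is used for the null and the single alternative model purely for brevity. Q0 is taken to be the within transformation (I_T − J_T/T) ⊗ I_N, which is consistent with the identities listed in the suggested problems. The subscript LI is implemented by a simple greedy rank check.

```python
import numpy as np

def spatial_lag_block(Psi, W, T, r=2):
    """Return (Psi, (I_T kron W) Psi, ..., (I_T kron W^r) Psi)."""
    N = W.shape[0]
    blocks, W_power = [Psi], np.eye(N)
    for _ in range(r):
        W_power = W_power @ W
        blocks.append(np.kron(np.eye(T), W_power) @ Psi)
    return np.hstack(blocks)

def li_columns(A):
    """Keep, left to right, every column that raises the rank."""
    keep = []
    for j in range(A.shape[1]):
        if np.linalg.matrix_rank(A[:, keep + [j]]) > len(keep):
            keep.append(j)
    return A[:, keep]

# Hypothetical toy panel: N units, T periods, two exogenous columns.
rng = np.random.default_rng(1)
N, T = 6, 4
Psi = rng.normal(size=(N * T, 2))

# Null-model weights: a row-normalized ring; alternative: random weights.
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5
W1 = rng.random((N, N))
np.fill_diagonal(W1, 0.0)
W1 /= W1.sum(axis=1, keepdims=True)

# Assumed within transformation: Q0 = (I_T - J_T / T) kron I_N.
Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))

stacked = np.hstack([spatial_lag_block(Psi, W, T),
                     spatial_lag_block(Psi, W1, T)])
H = Q0 @ li_columns(stacked)
```

Because the two blocks share the columns of `Psi`, the rank check drops the duplicated columns before the transformation is applied. For large N and T one would avoid forming the Kronecker products explicitly and instead apply W to each period's block of rows.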

Assumptions
The assumptions relating to the augmented model are quite similar to those
in Section 15.6. Specifically, for the augmented model in (15.7.7) we
maintain Assumptions 15.1, 15.2, 15.3, and 15.4. Assumption 15.5 is replaced
by the following assumption.¹³
Assumption 15.6. The large sample theory relates to $N \to \infty$, while $T$ remains
fixed and finite, and

(A) $\lim_{N \to \infty} (NT)^{-1} H'H = \Omega_{HH}$,
(B) $\lim_{N \to \infty} (NT)^{-1} H'RR'H = \Omega_{HRRH}$,
(C) $\operatorname{plim}_{N \to \infty} (NT)^{-1} H'M^* = \Omega_{HM^*}$,
(D) $\operatorname{plim}_{N \to \infty} (NT)^{-1} M^{*\prime}M^* = \Omega_{M^*M^*}$,

where $\Omega_{HH}$, $\Omega_{HRRH}$, and $\Omega_{M^*M^*}$ are nonsingular finite matrices, and $\Omega_{HM^*}$
has full column rank.
Given Assumptions 15.1–15.4 and 15.6, it is straightforward to show that

$$(NT)^{1/2} (\hat{F} - F) \xrightarrow{D} N(0, VC_{\hat{F}}), \tag{15.7.12}$$

$$VC_{\hat{F}} = S\, \Omega_{HRRH}\, S',$$

$$S = \operatorname{plim}_{N \to \infty} \left( NT (\hat{M}^{*\prime} \hat{M}^*)^{-1}\, \Omega_{HM^*}'\, \Omega_{HH}^{-1} \right),$$

$$\operatorname{plim}_{N \to \infty} NT (\hat{M}^{*\prime} \hat{M}^*)^{-1} = (\Omega_{HM^*}'\, \Omega_{HH}^{-1}\, \Omega_{HM^*})^{-1}.$$

Small sample inferences can be based on the large sample approximation¹⁴

$$\hat{F} \simeq N(F, \hat{S}\, \hat{\Omega}_{HRRH}\, \hat{S}'), \tag{15.7.13}$$

$$\hat{S} = (\hat{M}^{*\prime} \hat{M}^*)^{-1} M^{*\prime} H,$$

and where $\hat{\Omega}_{HRRH}$ is the HAC estimator of $\Omega_{HRRH}$.

13. The assumptions below are “high” level assumptions which should be more than adequate to
convince the reader of the large sample result given below. More technical readers should see the
list of assumptions given in Kelejian and Piras (2016a and 2016b).
14. See Fingleton and Palombi (2015) for an alternative approach based on bootstrap methods.


Illustration 15.7.1: A J-test application of the dynamic model of cigarette consumption
Kelejian and Piras (2016b) developed a J-test for dynamic panel models with
fixed effects and nonparametric error terms. Their paper includes an empirical
application based on the same dynamic model of cigarette consumption that was
used in Illustration 15.6.1.
The model under the null was specified as

$$\ln C_{it} = \alpha_1 \ln C_{it-1} + \alpha_2 \ln p_{it} + \alpha_3 \ln I_{it} + \alpha_4 \ln \bar{p}_{it} + \mu_i + \delta_t + u_{it},$$

where the variable descriptions can be found in Illustration 15.6.1, and $\bar{p}$ is the
minimum price used by Baltagi and Levin (1992).
The model under $H_1$ was identical to the one in Illustration 15.6.1. Following
the J-test procedure described in this section, Kelejian and Piras (2016b) found
that, at the 5% level, the J-test rejected the null model: the chi-squared
statistic, 19.063, exceeds the critical value $\chi^2_1 = 3.841$. They concluded that
cross-state purchases are better captured by the model under the alternative,
which includes the spatial lag of the dependent variable.
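The rejection decision reported above can be checked with a few lines of arithmetic. The sketch below uses only the two numbers quoted in the illustration. The helper `chi2_1df_sf` is a hypothetical name, and it relies on the standard identity that, for a chi-squared variable with one degree of freedom, P(X > x) = erfc(sqrt(x/2)).

```python
import math

def chi2_1df_sf(x):
    """Upper-tail probability P(X > x) for a chi-squared(1) variable."""
    return math.erfc(math.sqrt(x / 2.0))

stat = 19.063            # chi-squared statistic reported in the illustration
critical_5pct = 3.841    # 5% critical value quoted in the text

reject_null = stat > critical_5pct      # decision rule at the 5% level
tail_at_critical = chi2_1df_sf(critical_5pct)  # should be close to 0.05
p_value = chi2_1df_sf(stat)             # far below 0.05

print(reject_null, round(tail_at_critical, 3))
```

The tail probability at 3.841 comes out at approximately 0.05, which confirms that the quoted critical value corresponds to one degree of freedom, as expected when there is a single alternative model.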

SUGGESTED PROBLEMS
1. Demonstrate the results given in (15.2.5), namely

$$Q_0 Q_0 = Q_0, \tag{15.2.5}$$
$$Q_1 Q_1 = Q_1,$$
$$Q_0 + Q_1 = I_{NT},$$
$$Q_0 Q_1 = 0,$$
$$Q_0 (e_T \otimes G) = 0.$$

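2. Demonstrate the results given in (15.3.6) and (15.3.7), namely

$$\Omega_\varepsilon = \sigma_\mu^2 (e_T \otimes I_N)(e_T \otimes I_N)' + \sigma_v^2 I_{NT} \tag{15.3.6}$$
$$= \sigma_\mu^2 (J_T \otimes I_N) + \sigma_v^2 I_{NT}$$
$$= T \sigma_\mu^2 Q_1 + \sigma_v^2 I_{NT}$$
$$= \sigma_v^2 Q_0 + \sigma_1^2 Q_1,$$
$$\sigma_1^2 = \sigma_v^2 + T \sigma_\mu^2.$$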

3. Let $a$ and $b$ be nonzero constants. Demonstrate that the inverse of

$$a Q_0 + b Q_1$$

is

$$a^{-1} Q_0 + b^{-1} Q_1.$$
4. In reference to model (15.4.1),
(a) What would be required in order for the model in (15.4.1) to be estimated
by maximum likelihood?
(b) Suppose r < h in (15.4.1). Can the model still be estimated? Explain
why, or why not.
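Before attempting the algebra in Problems 1 and 3, it can help to confirm the identities numerically for a small case. The sketch below is a sanity check, not a proof. It assumes Q1 = (J_T/T) ⊗ I_N and Q0 = I_NT − Q1, which is what the relations T Q1 = J_T ⊗ I_N and Q0 + Q1 = I_NT in Problems 1 and 2 imply. The dimensions, the matrix `G`, and the constants `a` and `b` are arbitrary choices.

```python
import numpy as np

T, N = 3, 4
e_T = np.ones((T, 1))
J_T = e_T @ e_T.T                      # T x T matrix of ones

# Assumed definitions, inferred from T*Q1 = J_T kron I_N and Q0 + Q1 = I.
Q1 = np.kron(J_T / T, np.eye(N))
Q0 = np.eye(N * T) - Q1

rng = np.random.default_rng(2)
G = rng.normal(size=(N, N))            # arbitrary N x N matrix

# Problem 1: the properties listed in (15.2.5).
checks = {
    "Q0 Q0 = Q0": np.allclose(Q0 @ Q0, Q0),
    "Q1 Q1 = Q1": np.allclose(Q1 @ Q1, Q1),
    "Q0 Q1 = 0": np.allclose(Q0 @ Q1, 0.0),
    "Q0 (e_T kron G) = 0": np.allclose(Q0 @ np.kron(e_T, G), 0.0),
}

# Problem 3: inverse of a*Q0 + b*Q1 for arbitrary nonzero a and b.
a, b = 2.0, 5.0
inverse_ok = np.allclose(np.linalg.inv(a * Q0 + b * Q1), Q0 / a + Q1 / b)

print(checks, inverse_ok)
```

The inverse result is what makes the covariance matrix in Problem 2 easy to work with: once it is written as a weighted sum of Q0 and Q1, its inverse follows by inverting the two scalar weights.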
