Panel Data and Difference-in-Differences Estimation

BH Baltagi, Syracuse University, Syracuse, NY, USA


© 2014 Elsevier Inc. All rights reserved.
Encyclopedia of Health Economics, Volume 2, p. 425. doi:10.1016/B978-0-12-375678-7.00720-3

Introduction

Panel data refer to data sets consisting of multiple observations on each sampling unit. Such data can be generated by pooling time series observations across a variety of cross-sectional units, including countries, hospitals, firms, or randomly sampled individuals, like nurses, doctors, and patients. This encompasses longitudinal data analysis, in which the primary focus is on individual histories. Two well-known examples of US panel data sets are the Panel Study of Income Dynamics and the National Longitudinal Surveys of Labor Market Experience. European panels include the German Socioeconomic Panel, the British Household Panel Survey (BHPS), and the European Community Household Panel (ECHP). Panel data methods in health economics have been used to estimate the labor supply of physicians and nurses; study the relationship between health and wages and between health and economic growth; examine the productivity and cost efficiency of hospitals; and estimate the effect of pollutants on mortality. They have also been used to study the relationship between obesity and fast food prices, to determine whether beer taxes reduce motor vehicle fatality rates, and to determine whether cigarette taxes reduce teenage smoking, to mention a few applications. For example, Askildsen et al. (2003) estimate nurses' labor supply for Norway. The panel data used include detailed information on 19 638 nurses observed over the period 1993–98. The policy question tackled is whether increasing wages would entice nurses to supply more hours of work. Contoyannis and Rice (2001) estimate the impact of health on wage rates using the first six waves of the BHPS. Abrevaya (2006) utilizes the federal Natality Data Sets (released by the National Center for Health Statistics) from 1990 to 1998 to estimate the causal effect of smoking on birth outcomes. Identification of the smoking effect is achieved in this panel from women who change their smoking behavior from one pregnancy to another. Abrevaya constructs a matched panel data set that identifies mothers with multiple births. With the most stringent matching criterion, this data set contains 296 218 birth observations with 141 929 distinct mothers. Baltagi and Geishecker (2006) estimate a rational addiction model for alcohol consumption in Russia. Their panel data set includes eight rounds of the Russian Longitudinal Monitoring Survey spanning the period 1994–2003. These are four examples of micropanel data applications in health economics, and, as is clear from these data sets, they follow a large number of individuals over a short period of time.

In contrast, examples of macropanels in health economics include Ruhm (1996), who uses panel data on 48 states (excluding Alaska, Hawaii, and the District of Columbia) over the period 1982–88 to study the impact of beer taxes and a variety of alcohol-control policies on motor vehicle fatality rates; Greene (2010), who uses the World Health Organization's panel data set, which follows 191 countries over the period 1993–97, to distinguish between cross-country heterogeneity and inefficiency in health-care delivery; Becker, Grossman, and Murphy (1994), who estimate a rational addiction model for cigarette consumption across 50 states (and the District of Columbia) over the period 1955–85; and Baltagi and Moscone (2010), who use a panel of 20 Organization for Economic Co-operation and Development countries observed over the period 1971–2004 to estimate the long-run economic relationship between health-care expenditure and income. Macropanels follow aggregates like countries, states, or regions and usually involve a longer period of time than micropanels. Asymptotic analysis for micropanels has to rely on large $N$, as $T$ is fixed and usually small, whereas for macropanels it can rely on large $N$ and $T$. Also, with the longer time series of macropanels, one has to deal with issues of nonstationarity, like unit roots, structural breaks, and cointegration (see Chapter 12 of Baltagi, 2008). Additionally, with macropanels one has to deal with cross-country dependence. This is usually not an issue in micropanels, where the households are randomly sampled and hence not likely to be correlated.

Some of the benefits of using panel data include a much larger data set. This means that there will be more variability and less collinearity among the variables than is typical of cross-sectional or time series data. With additional, more informative data, one can get more reliable estimates and test more sophisticated behavioral models with less restrictive assumptions. Another advantage of panel data is their ability to control for individual heterogeneity. Not controlling for these unobserved individual-specific effects leads to bias in the resulting estimates. For example, consider the Abrevaya (2006) application, where one is estimating the causal effect of smoking on birth weight. One would expect that mothers who smoke during pregnancy are more likely to adopt other unhealthy behavior, such as drinking, poor nutritional intake, etc. These variables are unobserved and hence omitted from the regression. If these omitted variables are positively correlated with the mother's decision to smoke, then ordinary least squares (OLS) will overestimate the effect of smoking on birth weight. Similarly, in the Contoyannis and Rice (2001) study, where one is estimating the effect of health status on earnings, one would expect the health status of the individual to be correlated with unobservable attributes of that individual, which, in turn, affect productivity and wages. If this correlation is positive, one would expect an overestimation of the effect of health status on wages. Cross-sectional studies attempt to control for this unobserved ability by collecting hard-to-get data on twins. However, using individual panel data, one can, for example, difference the data over time and wipe out the unobserved time-invariant individual ability.

Another advantage of panels over cross-sectional data concerns the fact that individuals 'anchor' their scales at different levels, rendering interpersonal comparisons of responses meaningless. When you ask people about their health status on a scale of
1–10, Sam's 5 may be different from Monica's 5, but in a cross-sectional regression you assume they are the same. Panel data help if the metric used by individuals is time-invariant. Fixed effects (FE) estimation makes inference based on intra- rather than interpersonal comparisons of satisfaction. This avoids not only the potential bias caused by anchoring but also bias caused by other unobserved individual-specific factors.

Limitations of panel data sets include problems in the design, data collection, and data management of panel surveys. These include the problems of coverage (incomplete account of the population of interest), nonresponse (due to lack of cooperation of the respondent or because of interviewer error), recall (respondent not remembering correctly), frequency of interviewing, interview spacing, reference period, the use of bounding to prevent the shifting of events from outside the recall period into the recall period, and time-in-sample bias. Another limitation of panel data sets is the distortion due to measurement errors. Measurement errors may arise because of faulty responses due to unclear questions, memory errors, deliberate distortion of responses (e.g., prestige bias), inappropriate informants, misrecording of responses, and interviewer effects. Although these problems can occur in cross-sectional studies, they are aggravated in panel data studies. Panel data sets may also exhibit bias due to sample selection problems. For the initial wave of the panel, respondents may refuse to participate or the interviewer may not find anybody at home. This may cause some bias in the inference drawn from this sample. Although this nonresponse can also occur in cross-sectional data sets, it is more serious with panels because subsequent waves of the panel are still subject to nonresponse. Respondents may die, move, or find that the cost of responding is high.

The Model

Most panel data applications use a simple regression with error component disturbances:

$$y_{it} = \alpha + X_{it}'\beta + \mu_i + \nu_{it}, \qquad i = 1, \ldots, N; \; t = 1, \ldots, T \qquad [1]$$

with $i$ denoting individuals, hospitals, countries, etc. and $t$ denoting time. The $i$ subscript, therefore, denotes the cross-sectional dimension, whereas $t$ denotes the time series dimension. The panel data are assumed to be balanced, in that none of the observations are missing, whether randomly or nonrandomly due to attrition or sample selection. $\alpha$ is a scalar, $\beta$ is $K \times 1$, and $X_{it}$ is the $it$th observation on $K$ explanatory variables. $\mu_i$ denotes the unobservable individual-specific effect and $\nu_{it}$ denotes the remainder disturbance, which is assumed to be independent and identically distributed, $\mathrm{IID}(0, \sigma^2_\nu)$. For example, in the Contoyannis and Rice (2001) study of the impact of health on wage rates using the first six waves of the BHPS, $y_{it}$ is the log of the average hourly wage, whereas $X_{it}$ contains variables like age, age², experience, experience², union membership, marital status, number of children, race, education, occupation, region indicators, etc. The variable of interest is a self-assessed health variable, which is obtained from the response to the following question: "Please think back over the last 12 months about how your health has been. Compared to people of your own age, would you say that your health has on the whole been excellent/good/fair/poor/very poor?" Contoyannis and Rice constructed three dummy variables: sahex = 1 if an individual has excellent health, sahgd = 1 if an individual has good health, and sahfp = 1 if an individual has fair health or worse. They also included a General Health Questionnaire: Likert Scale score, which was originally developed as a screening instrument for psychiatric illness but is often used as an indicator of subjective well-being. Contoyannis and Rice constructed a composite measure derived from the results of this questionnaire which is increasing in ill health (hlghq1).

Fixed Effects

Note that $\mu_i$ is time-invariant and accounts for any individual-specific effect that is not included in the regression. If the $\mu_i$'s are assumed to be fixed parameters to be estimated and the $X_{it}$'s are assumed independent of the $\nu_{it}$ for all $i$ and $t$, the FE model is obtained. Estimation in this case amounts to including $(N-1)$ individual dummies to estimate these time-invariant individual effects. This leads to an enormous loss in degrees of freedom and aggravates the problem of multicollinearity among the regressors. Furthermore, this may not be computationally feasible for large micropanels. By the Frisch–Waugh–Lovell theorem (Baltagi, 2008), one can get this FE estimator by running least squares of $\tilde{y}_{it} = y_{it} - \bar{y}_{i.}$ on the $\tilde{X}_{it}$'s similarly defined, where the dot indicates summation over that index and the bar denotes averaging. This transformation eliminates the $\mu_i$'s and is known as the within transformation, and the corresponding estimator of $\beta$ is called the within estimator or the FE estimator. Note that the FE estimator cannot estimate the effect of any time-invariant variable, such as race or education. These variables are wiped out by the within transformation. This is a major disadvantage if the effect of these variables on earnings is of interest. If $T$ is fixed and $N \to \infty$, as is typical in short labor panels, then only the FE estimator of $\beta$ is consistent; the FE estimators of the individual effects $(\alpha + \mu_i)$ are not consistent because the number of these parameters increases as $N$ increases. This is known as the incidental parameter problem. When the true model is FE, OLS suffers from omitted variables bias and inference using OLS is misleading. For the sample of 859 males in the Contoyannis and Rice (2001) study, the OLS estimate for excellent health is 0.065 and significant, whereas the OLS estimate for good health is 0.019 and insignificant (both are contrasted against a baseline of fair, poor, and very poor health). The FE estimates are 0.013 for excellent health and 0.010 for good health, and both are insignificant. The OLS estimate for the General Health Questionnaire: Likert Scale score (hlghq1) is −0.002 and insignificant, whereas the FE estimate is −0.003 and significant. More dramatically, for the Ruhm (1996) study, OLS gets a positive (0.012) and significant effect of real beer taxes on motor vehicle fatality rates, whereas FE obtains a negative (−0.324) and significant effect.

Janke et al. (2009) examine the relationship between population mortality and common sources of airborne pollution in England. The data cover 312 local authorities over the period 1998–2005. They find that higher levels of PM10
(particulate matter less than 10 μm in diameter) and ozone (O3) have a positive and significant effect on mortality rates. The OLS estimate for (PM10/10), controlling for three other measures of pollutants (carbon monoxide, nitrogen dioxide, and O3), the smoking rate, the employment rate, etc., is 2.33, whereas that for FE is 2.74. The OLS estimate for (O3/10) in the same regression is −0.55, whereas that for FE is 0.80. Only the FE estimates for these pollutants are significant at the 5% level.

One could test the joint significance of the individual effects, i.e., $H_0: \mu_1 = \mu_2 = \cdots = \mu_{N-1} = 0$, by performing an F-test. This is a simple Chow test, with the restricted residual sum of squares (RRSS) being that of OLS on the pooled model and the unrestricted residual sum of squares (URSS) being that of the regression which includes the $(N-1)$ individual dummies. By the Frisch–Waugh–Lovell theorem (Baltagi, 2008), URSS can be obtained from the within regression residual sum of squares. In this case

$$F_0 = \frac{(\mathrm{RRSS} - \mathrm{URSS})/(N-1)}{\mathrm{URSS}/(NT - N - K)} \overset{H_0}{\sim} F_{N-1,\, N(T-1)-K} \qquad [2]$$

For the Contoyannis and Rice (2001) application, this F-statistic is 12.50 and is distributed under the null hypothesis as F(858, 3406). This is significant and rejects $H_0$. One can infer that the OLS estimates are biased and inconsistent and yield misleading inference.

Difference-in-Differences

Note that the FE transformation $(\tilde{y}_{it} = y_{it} - \bar{y}_{i.})$ is not the only transformation that will wipe out the individual effects. In fact, first differencing (FD), $\Delta y_{it} = y_{it} - y_{i,t-1}$, will also do the trick. This is a crucial tool used in the difference-in-differences (DID) estimator. Before the approval of any drug, patients must be randomly assigned to receive the drug or a placebo, and the drug is approved or disapproved depending on the difference in the health outcome between these two groups. In this case, the US Food and Drug Administration (FDA) is concerned with the drug's safety and its effectiveness. However, one runs into problems in setting up this kind of experiment. How can one hold other factors constant? Even twins, who have been used in economic studies, are not identical and may have different life experiences. With panel data, observations on the same subjects before and after a health policy change allow us to estimate the effectiveness of this policy on the treated and control groups without the contamination of individual effects. In simple regression form, assuming the assignment to the control and treatment groups is random, one regresses the change in the health outcome before and after the health policy is enacted on a dummy variable which takes the value 1 if the individual is in the affected (treatment) group and 0 if the individual is in the unaffected (control) group. This regression computes the average change in the health outcome for the treatment group before and after the policy change and subtracts from it the average change in the health outcome for the control group. One can include additional regressors which measure the individual characteristics before the policy change, such as the gender, race, education, and age of the individual. This is known as the DID estimator in econometrics. Alternatively, one can regress the health outcome $y$ on $d_g$, $d_t$, and their interaction $d_t \times d_g$. Here $d_g$ is a dummy variable that takes the value 1 if the subject is in the treatment group, and 0 otherwise; $d_t$ is a dummy variable which takes the value 1 for the posttreatment period, and 0 otherwise. In this case, $d_t \times d_g$ takes the value 1 only for observations in the treatment group and in the posttreatment period. The OLS estimate of the coefficient of $d_t \times d_g$ yields the DID estimator. Another advantage of running this regression is that one can robustify the standard errors with standard software.

In economics, one typically cannot conduct controlled experiments of the kind used in medicine. Card (1990) used a natural experiment to see whether immigration reduces wages. Taking advantage of the 'Mariel boatlift,' in which a large number of Cuban immigrants entered Miami, Card (1990) compared the change in wages of low-skilled workers in Miami with the change in wages of similar workers in other comparable US cities over the same period. Card concluded that the influx of Cuban immigrants had a negligible effect on wages of less-skilled workers. Gruber and Poterba (1994) use the DID estimator to show that a change in the tax law did increase the purchase of health insurance among the self-employed. They compared the fraction of the self-employed who had health insurance before the tax change (1985–86) with that after the tax change (1988–89). The control group was the fraction of employed (not self-employed) workers with health insurance in those years.

Donald and Lang (2007) warn that the standard asymptotics for the DID estimator cannot be applied when the number of groups is small, as in the case where one compares two states in 2 years, or self-employed workers and employees over a small number of years. They reconsider the Gruber and Poterba (1994) paper on health insurance and self-employment and Card's (1990) study of the Mariel boatlift. They show that computing the t-statistic in a way that accounts for a possible group error component dramatically reduces the precision of these results. In fact, for Card's (1990) Mariel boatlift study, their findings suggest that the data cannot exclude large effects of the migration on Blacks in Miami.

Bertrand et al. (2004) argued that many DID studies in economics rely on long time series. They warn that in this case, conventional standard errors that ignore serial correlation understate the true standard error of the estimated treatment effects, leading to overstated t-statistics and significance levels. They show that the block bootstrap (which takes into account the autocorrelation of the data) works well when the number of states is large enough. Readers are advised to refer to Hansen (2007) for inference in panel models with serial correlation and FE, and to Stock and Watson (2008) for a heteroskedasticity-robust variance matrix estimator for the FE estimator. Hausman and Kuersteiner (2008) warn that neither the DID nor the FE estimator is efficient if the stochastic disturbances are serially correlated. The optimal estimator in this case is generalized least squares (GLS), but this is rarely used in applications of DID studies. Hausman and Kuersteiner (2008) use a higher-order Edgeworth expansion to construct a size-corrected t-statistic (based on feasible GLS) for the significance of treatment variables in DID regressions. They find that the size-corrected t-statistic based on feasible GLS yields accurate size and is significantly more powerful than robust OLS when serial correlation in the level data is high.

Conley and Taber (2011) consider the case where there are only a small number $N_1$ of treatment groups, say states, that
change a law or policy within a fixed time span $T$. Let $N_0$ denote the number of control groups (states) that do not change their policy. Conley and Taber argue that the standard large-sample approximations used for inference can be misleading, especially in the case of non-Gaussian or serially correlated errors. They suggest an alternative approach to inference under the assumption that $N_1$ is finite, using asymptotic approximations that let $N_0$ grow large, with $T$ fixed. Point estimators of the treatment effect parameter(s) are not consistent, since $N_1$ and $T$ are fixed. However, they use information from the $N_0$ control groups to consistently estimate the distribution of these point estimators up to the true values of the parameter.

DID estimation has its benefits and limitations. It is simple to compute, and it controls for heterogeneity of the individuals or the groups considered before and after the policy change. However, it does not account for the possible endogeneity of the interventions themselves (Besley and Case, 2000). Abadie (2005) discusses how well the comparison groups used in nonexperimental studies approximate appropriate control groups. Athey and Imbens (2006) critique the linearity assumptions used in DID estimation and provide a general changes-in-changes (CIC) estimator that does not require such assumptions.

The DID estimator requires that, in the absence of the treatment, the average outcomes for the treated and control groups would have followed parallel paths over time. This assumption may be too restrictive. Abadie (2005) considers the case in which differences in observed characteristics create nonparallel outcome dynamics between treated and controls. He proposes a family of semiparametric DID estimators which can be used to estimate the average effect of the treatment for the treated. Abadie et al. (2010) advocate the use of data-driven procedures to construct suitable comparison groups. Data-driven procedures reduce discretion in the choice of the comparison control units, forcing researchers to demonstrate the affinities between the affected and unaffected units using observed quantifiable characteristics. The idea behind the synthetic control approach is that a combination of units often provides a better comparison for the unit exposed to the intervention than any single unit alone. They apply the synthetic control method to study the effects of California's Proposition 99, a large-scale tobacco control program implemented in California in 1988. They demonstrate that following the passage of Proposition 99, tobacco consumption fell markedly in California relative to a comparable synthetic control region. They estimated that by the year 2000, annual per capita cigarette sales in California were approximately 26 packs lower than what they would have been in the absence of Proposition 99.

Athey and Imbens (2006) generalize the DID methodology to what they call the CIC methodology. Their approach allows the effects of both time and the treatment to differ systematically across individuals, as when new medical technology differentially benefits sicker patients. They propose an estimator for the entire counterfactual distribution of effects of the treatment on the treatment group, as well as the distribution of effects of the treatment on the control group, where the two distributions may differ from each other in arbitrary ways. They provide conditions under which the proposed model is identified nonparametrically and extend the model to allow for discrete outcomes. They also provide extensions to settings with multiple groups and multiple time periods. They revisit the Meyer et al. (1995) study on the effects of disability insurance on injury durations. They show that the CIC approach leads to results that differ from the standard DID results in terms of magnitude and significance. They attribute this to the restrictive assumptions required for the standard DID methods.

Laporte and Windmeijer (2005) show that the FE and FD estimators lead to very different estimates of treatment effects when these are not constant over time and treatment is a state that only changes occasionally. They suggest allowing for flexible time-varying treatment effects when estimating panel data models with binary indicator variables. They illustrate this by looking at the effect of divorce on mental well-being using the BHPS. They show that divorce has an adverse effect on mental well-being that starts before the actual divorce, peaks in the year of the divorce, and diminishes rapidly thereafter. A model that implies a constant instantaneous effect of divorce leads to very different FD and FE estimates, whereas a model that allows for flexibility in these effects leads to similar results. In general, the FE estimator is more efficient than the FD estimator when the remainder disturbances $\nu_{it}$ are $\mathrm{IID}(0, \sigma^2_\nu)$. The FD estimator is more efficient than the FE estimator when the remainder disturbance $\nu_{it}$ is a random walk (Wooldridge, 2002). These estimators are affected differently by measurement error and by nonstationarity (Baltagi, 2008).

This analysis can be refined further by using better control and treatment groups. If a policy is enacted by state $s$ to reduce teenage smoking or motor vehicle fatalities due to alcohol consumption, or to provide health-care services for the elderly, then, for the two-period case, $d_t$ takes the value 1 for the postpolicy period, and 0 otherwise; $d_s$ takes the value 1 if the state has implemented this policy, and 0 otherwise; and $d_g$ takes the value 1 for the group affected by this policy, like the elderly, and 0 otherwise. In this case, one regresses the health-care outcome on $d_t$, $d_s$, $d_g$, $d_t \times d_g$, $d_t \times d_s$, $d_s \times d_g$, and $d_t \times d_s \times d_g$. The OLS estimate of the coefficient of $d_t \times d_s \times d_g$ yields the difference-in-difference-in-differences estimator of this policy. This estimator computes the average change in the health outcome for the elderly in the treatment state before and after the policy is implemented and subtracts from it the average change in the health outcome for the elderly in the control state; from this difference it then subtracts the corresponding difference for the nonelderly (their average change in the treatment state minus their average change in the control state).

Carpenter (2004) studied the effect of zero-tolerance (ZT) driving laws on alcohol-related behaviors of 18–20-year-olds, controlling for macroeconomic conditions, other alcohol policies, state FE, survey year and month effects, and linear state-specific time trends. ZT laws make it illegal for drivers under the age of 21 years to have measurable amounts of alcohol in their blood, resulting in immediate license suspension and fines. Carpenter uses the Behavioral Risk Factor Surveillance System, which includes information on alcohol consumption and drunk driving behavior for young adults over the age of 18 years for the years 1984–2001. He estimates the effects of ZT laws using the DID approach. The control group is
composed of individuals aged 22–24 years who are otherwise similar to treated individuals (18–20-year-olds) but who should have been unaffected by the ZT policies. Let $d_{ZT}$ be a dummy variable that takes the value 1 if the state has a ZT law in that year, and 0 otherwise; and let $d_{1820}$ be a dummy variable that takes the value 1 if the subject is in the treatment group (aged 18–20 years), and 0 otherwise. Alcohol consumption is regressed on $d_{ZT}$, $d_{1820}$, $d_{ZT} \times d_{1820}$, and the other control variables mentioned above. The OLS estimate of the coefficient of $d_{ZT} \times d_{1820}$ yields the DID estimator of the ZT laws. Carpenter's results indicate that the laws reduced heavy episodic drinking (five or more drinks at one sitting) among underage males by 13%. For a recent review of DID health economics applications, as well as a summary table of these applications, see Jones (2012).

Random Effects

There are too many parameters in the FE model, and the loss of degrees of freedom can be avoided if the $\mu_i$ can be assumed random. In this case, $\mu_i \sim \mathrm{IID}(0, \sigma^2_\mu)$, $\nu_{it} \sim \mathrm{IID}(0, \sigma^2_\nu)$, and the $\mu_i$ are independent of the $\nu_{it}$. In addition, the $X_{it}$ are independent of the $\mu_i$ and $\nu_{it}$ for all $i$ and $t$. This random-effects (RE) model can be estimated by GLS, which can be obtained using a least squares regression of $y^*_{it} = y_{it} - \theta\bar{y}_{i.}$ on $X^*_{it}$ similarly defined, where $\theta = 1 - \sigma_\nu/\sigma_1$ and $\sigma^2_1 = T\sigma^2_\mu + \sigma^2_\nu$. The best quadratic unbiased estimators of the variance components depend on the true disturbances, and these are minimum variance unbiased under normality of the disturbances. One can obtain feasible estimates of these variance components by replacing the true disturbances by OLS or FE residuals (see Chapter 2 of Baltagi (2008) for details).

Under the assumption of normality of the disturbances, Breusch and Pagan (1980) derived a Lagrange multiplier (LM) test of $H_0: \sigma^2_\mu = 0$. The resulting LM statistic requires only OLS residuals and is easy to compute. Under $H_0$, this LM statistic is asymptotically distributed as $\chi^2_1$ (see Chapter 4 of Baltagi (2008) for details). For the Contoyannis and Rice (2001) application, this LM statistic is 3355.26 and is significant. This means that heterogeneity across individuals is significant, and ignoring it, as OLS does, will lead to misleading inference. The RE estimates are 0.028 for excellent health, 0.013 for good health, and −0.002 for the General Health Questionnaire: Likert Scale score (hlghq1), with only the good health estimate being statistically insignificant.

Hausman Test

A specification test based on the difference between the FE and RE estimators is known as the Hausman test. The null hypothesis is that the individual effects are not correlated with the $X_{it}$'s. The basic idea behind this test is that the FE estimator $\tilde{\beta}_{FE}$ is consistent whether or not the effects are correlated with the $X_{it}$'s. This is true because the within transformation wipes out the $\mu_i$'s from the model. However, if the null hypothesis is true, the FE estimator is not efficient under the RE specification because it relies only on the within variation in the data. The RE estimator $\hat{\beta}_{RE}$, by contrast, is efficient under the null hypothesis but is biased and inconsistent when the effects are correlated with the $X_{it}$'s. The difference between these two estimators, $\hat{q} = \tilde{\beta}_{FE} - \hat{\beta}_{RE}$, tends to zero in probability under the null hypothesis and is nonzero under the alternative. The variance of this difference is equal to the difference in variances, $\mathrm{var}(\hat{q}) = \mathrm{var}(\tilde{\beta}_{FE}) - \mathrm{var}(\hat{\beta}_{RE})$, because $\mathrm{cov}(\hat{q}, \hat{\beta}_{RE}) = 0$ under the null hypothesis. Hausman's test statistic is based on $m = \hat{q}'[\mathrm{var}(\hat{q})]^{-1}\hat{q}$ and is asymptotically distributed as $\chi^2_K$ under the null hypothesis. For the Contoyannis and Rice (2001) application, Hausman's test statistic is 322.39 and is distributed as $\chi^2_{29}$. However, the estimated $\mathrm{var}(\hat{q})$ is not positive definite. Using an alternative computation of this Hausman (1978) test based on an artificial regression, the null hypothesis is rejected, and one can infer that the RE estimator is inconsistent and should not be used for inference.

Powell (2009) uses four waves of the 1997 National Longitudinal Survey of Youth and external data to examine the relationship between adolescent body mass index (BMI), fast food prices, and fast food restaurant availability. The OLS estimate of the fast food price elasticity of BMI is −0.095, whereas the RE estimate is −0.084. The latter is closer to the FE estimate of −0.078, but the RE estimator is rejected by the Hausman test. The number of fast food restaurants per capita was not found to be significant.

Hausman and Taylor Estimator

The RE estimator is rejected in these applications because it assumes no correlation between the explanatory variables and the individual effects. The FE estimator, however, allows all the explanatory variables to be correlated with the individual effects. Instead of this 'all or nothing' correlation between the $X$ and the $\mu_i$, Hausman and Taylor (1981) consider a model where only some of the explanatory variables are related to the $\mu_i$. In particular, they consider the following model:

$$y_{it} = X_{it}'\beta + Z_i'\gamma + \mu_i + \nu_{it} \qquad [3]$$

where the $Z_i$ are cross-sectional, time-invariant variables. Hausman and Taylor (1981), hereafter HT, split $X$ and $Z$ into two sets of variables: $X = [X_1; X_2]$ and $Z = [Z_1; Z_2]$, where $X_1$ is $n \times k_1$, $X_2$ is $n \times k_2$, $Z_1$ is $n \times g_1$, $Z_2$ is $n \times g_2$, and $n = NT$. $X_1$ and $Z_1$ are assumed exogenous in that they are not correlated with $\mu_i$ and $\nu_{it}$, whereas $X_2$ and $Z_2$ are endogenous because they are correlated with $\mu_i$, but not with $\nu_{it}$. The within transformation sweeps out the $\mu_i$ and removes the bias, but in the process it also sweeps out the $Z_i$'s, and hence the within estimator will not give an estimate of $\gamma$. To get around that, HT suggest obtaining the FE residuals and averaging them over time:

$$\hat{d}_i = \bar{y}_{i.} - \bar{X}_{i.}'\tilde{\beta}_{FE} \qquad [4]$$

Then, one can run 2SLS of $\hat{d}_i$ on $Z_i$ with the set of instruments $A = [X_1, Z_1]$ to get a consistent estimate of $\gamma$, called $\hat{\gamma}_{2SLS}$. For this to be feasible, the order condition for identification has to hold ($k_1 \geq g_2$). This means that there have to be at least as many time-varying exogenous variables ($X_1$) as there are time-invariant endogenous variables ($Z_2$). The intuition here is that every $X_{it}$ can be written as the sum of $\tilde{X}_{it} = (X_{it} - \bar{X}_{i.})$ and $\bar{X}_{i.}$. It is the latter term that carries any correlation with $\mu_i$, since the within transformation sweeps it out of the former. If $X_2$ is correlated with $\mu_i$, it must be through $\bar{X}_2$, which makes $\tilde{X}_2$ the ideal instrument. HT use $X_1$ twice
are correlated with the X0 its. The difference between these because it is exogenous, once as X ~ 1 and another time as X 1 : Z1
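
The within/between split behind the HT intuition, $X_{it} = \tilde{X}_{it} + \bar{X}_{i.}$, can be checked with a short simulation. The sketch below is a hypothetical illustration in Python with NumPy, not part of Hausman and Taylor (1981): it assumes a balanced $N \times T$ panel stored as an array, generates an endogenous regressor $X_{2,it} = \mu_i + e_{it}$ from standard normal draws, and uses an illustrative helper named `within_between`. It shows that $\mu_i$ loads entirely on the time mean $\bar{X}_2$, while the within part $\tilde{X}_2$ has zero sample covariance with $\mu_i$.

```python
import numpy as np


def within_between(x):
    """Split an N x T panel array into (within part, time mean).

    x_mean[i, t] is the mean of x[i, :] over t (the X-bar term), and
    x_within = x - x_mean (the X-tilde term), so x = x_within + x_mean.
    """
    x_mean = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape)
    return x - x_mean, x_mean


# Hypothetical DGP (illustration only): X2_it = mu_i + e_it, so X2 is correlated with mu_i.
rng = np.random.default_rng(0)
N, T = 500, 5
mu = rng.normal(size=N)                      # individual effects mu_i
x2 = mu[:, None] + rng.normal(size=(N, T))   # endogenous time-varying regressor

x2_within, x2_mean = within_between(x2)

# Sample covariance of the within part with mu_i (mu_i repeated over t).
mu_panel = np.broadcast_to(mu[:, None], (N, T))
cov_within_mu = np.mean(x2_within * mu_panel) - x2_within.mean() * mu_panel.mean()

# Correlation of the individual time means with mu_i (one value per individual).
corr_mean_mu = np.corrcoef(x2_mean[:, 0], mu)[0, 1]

print(f"cov(X2_within, mu) = {cov_within_mu:.2e}")  # zero up to rounding error
print(f"corr(X2_mean, mu)  = {corr_mean_mu:.3f}")   # large: mu enters through the mean
```

The zero covariance is exact rather than approximate: every row of $\tilde{X}_2$ sums to zero over $t$, so its cross-product with any time-invariant variable vanishes. This is the property that lets HT use $\tilde{X}_2$ as an instrument, while any correlation with $\mu_i$ is confined to $\bar{X}_2$.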

With consistent estimates of the disturbances obtained from $\tilde{\beta}_{FE}$ and $\hat{\gamma}_{2SLS}$, one can obtain consistent estimates of the variance components and hence of $\theta$. This, in turn, allows us to compute $y^*_{it} = y_{it} - \theta\bar{y}_{i.}$, with $X^*_{it}$ defined similarly, and $Z^*_i = (1 - \theta)Z_i$. HT suggest an efficient estimator that can be obtained by running 2SLS of $y^*_{it}$ on $X^*_{it}$ and $Z^*_i$ using $A_{HT} = [\tilde{X}, \bar{X}_1, Z_1]$ as instruments.

1. If $k_1 < g_2$, then the equation is underidentified. In this case, $\hat{\beta}_{HT} = \tilde{\beta}_{FE}$ and $\gamma$ cannot be estimated.
2. If $k_1 = g_2$, then the equation is just-identified. In this case, $\hat{\beta}_{HT} = \tilde{\beta}_{FE}$ and $\hat{\gamma}_{HT} = \hat{\gamma}_{2SLS}$.
3. If $k_1 > g_2$, then the equation is over-identified and the HT estimator is more efficient than the FE estimator.

A test for over-identification is obtained by computing

$$\hat{m}_2 = \hat{q}_2'\left[\operatorname{var}(\tilde{\beta}_{FE}) - \operatorname{var}(\hat{\beta}_{HT})\right]^{-}\hat{q}_2 \qquad [5]$$

with $\hat{q}_2 = \tilde{\beta}_{FE} - \hat{\beta}_{HT}$, where $[\cdot]^{-}$ denotes a generalized inverse, and $\hat{\sigma}^2_\nu\hat{m}_2 \xrightarrow{H_0} \chi^2_{k_1 - g_2}$.

Contoyannis and Rice (2001) applied the HT estimator, choosing race to be exogenous (the only time-invariant $Z_1$) and education to be endogenous (the only time-invariant $Z_2$). They also chose the time-varying health variables (sahex, sahgd, and hlghq1) to be endogenous, as well as prof, manag, skllnm, and skllm. The HT estimates are 0.013 for excellent health, 0.010 for good health, and −0.003 for the General Health Questionnaire: Likert Scale score (hlghq1), with only the latter estimate being statistically significant.

Dynamic Panel Data Models

Many economic relationships are dynamic in nature, and one of the advantages of panel data is that they allow the researcher to better understand the dynamics of adjustment. For example, a key feature of the rational addiction theory studied by Becker, Grossman, and Murphy (1994) is that consumption of cigarettes is addictive and will depend on future as well as past consumption. Consumers are rational if they are forward-looking in the sense that they anticipate the expected future consequences of their current actions. They recognize the addictive nature of their choices, but they may elect to make them because the gains from the activity exceed the costs through future addiction. The more they smoke, the higher the current utility derived. However, the individual recognizes that he or she is building up a stock of this addictive good that is harmful. The individual rationally trades off these factors to determine the appropriate level of smoking. Finding future consumption statistically significant is a rejection of the myopic model of consumption behavior. In the latter model of addictive behavior, only past consumption stimulates current consumption, because individuals ignore the future in making their consumption decisions.

More formally, dynamic relationships are characterized by the presence of a lagged dependent variable among the regressors, i.e.,

$$y_{it} = \delta y_{i,t-1} + x_{it}'\beta + \mu_i + \nu_{it}, \qquad i = 1, \dots, N; \; t = 1, \dots, T \qquad [6]$$

where $\delta$ is a scalar, $x_{it}'$ is $1 \times K$, $\beta$ is $K \times 1$, and $\mu_i \sim \mathrm{IID}(0, \sigma^2_\mu)$ and $\nu_{it} \sim \mathrm{IID}(0, \sigma^2_\nu)$ are independent of each other and among themselves. This dynamic panel data regression model is characterized by two sources of persistence over time: autocorrelation due to the presence of a lagged dependent variable among the regressors, and individual effects characterizing the heterogeneity among the individuals. As $y_{it}$ is a function of $\mu_i$, it immediately follows that $y_{i,t-1}$ is also a function of $\mu_i$. Therefore, $y_{i,t-1}$ is correlated with the error term. This renders the OLS estimator biased and inconsistent even if the $\nu_{it}$ are not serially correlated. For the FE estimator, the Within transformation wipes out the $\mu_i$, but $(y_{i,t-1} - \bar{y}_{i.,-1})$, where $\bar{y}_{i.,-1} = \sum_{t=2}^{T} y_{i,t-1}/(T-1)$, will still be correlated with $(\nu_{it} - \bar{\nu}_{i.})$ even if the $\nu_{it}$ are not serially correlated. This is because $y_{i,t-1}$ is correlated with $\bar{\nu}_{i.}$ by construction: the latter average contains $\nu_{i,t-1}$, which is obviously correlated with $y_{i,t-1}$. In fact, the Within estimator has a bias of order $O(1/T)$, and its consistency depends on $T$ being large (Nickell, 1981). Therefore, for the typical micropanel where $N$ is large and $T$ is fixed, the Within estimator is biased and inconsistent. It is worth emphasizing that only if $T \to \infty$ will the Within estimator of $\delta$ and $\beta$ be consistent for the dynamic error component model. For macropanels, some researchers may still favor the Within estimator, arguing that its bias may not be large. Judson and Owen (1999) performed Monte Carlo experiments for $N = 20$ or 100 and $T = 5$, 10, 20, and 30 and found that the bias in the Within estimator can be sizable, even when $T = 30$. This bias increases with $\delta$ and decreases with $T$. But even for $T = 30$, this bias could be as much as 20% of the true value of the coefficient of interest.

Arellano and Bond (1991) suggested first-differencing the model to get rid of the $\mu_i$ and then using a Generalized Method of Moments (GMM) procedure that utilizes the orthogonality conditions that exist between lagged values of $y_{it}$ and the disturbances $\nu_{it}$. This is illustrated with the simple autoregressive model with no regressors. With a three-wave panel, i.e., $T = 3$, the differenced equation becomes:

$$y_{i3} - y_{i2} = \delta(y_{i2} - y_{i1}) + (\nu_{i3} - \nu_{i2})$$

In this case, $y_{i1}$ is a valid instrument because it is highly correlated with $(y_{i2} - y_{i1})$ and not correlated with $(\nu_{i3} - \nu_{i2})$ as long as the $\nu_{it}$ are not serially correlated. But note what happens if a fourth wave is obtained:

$$y_{i4} - y_{i3} = \delta(y_{i3} - y_{i2}) + (\nu_{i4} - \nu_{i3})$$

In this case, $y_{i2}$ as well as $y_{i1}$ are valid instruments for $(y_{i3} - y_{i2})$ because neither $y_{i2}$ nor $y_{i1}$ is correlated with $(\nu_{i4} - \nu_{i3})$. One can continue in this fashion, adding an extra valid instrument with each forward period, so that for period $T$, the set of valid instruments becomes $(y_{i1}, y_{i2}, \dots, y_{i,T-2})$. The optimal Arellano and Bond (1991) GMM estimator of $\delta$ utilizes all these moment conditions, weighting them by a sandwich heteroskedasticity- and autocorrelation-consistent estimator of the variance–covariance matrix of the disturbances. Arellano and Bond (1991) propose testing for serial correlation in the disturbances of the first-differenced equation. This test is important because the consistency of the GMM estimator relies on the assumption of no serial correlation in the $\nu_{it}$'s. Additionally, Arellano and Bond (1991) suggest a Sargan test of the over-identifying restrictions. One needs to find no evidence of serial correlation in the $\nu_{it}$'s and not reject the over-identifying restrictions. Failing these diagnostics renders this procedure inconsistent.
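
For the three-wave case above, the moment condition $E[y_{i1}(\nu_{i3} - \nu_{i2})] = 0$ is the only one available, so the Arellano–Bond estimator reduces to a just-identified instrumental-variables ratio. The hypothetical Python/NumPy sketch below illustrates this under stated assumptions and is not the full two-step GMM estimator: it simulates a pure AR(1) panel with $\delta = 0.5$, standard normal $\mu_i$ and $\nu_{it}$, and $y_{i1}$ drawn from the stationary distribution; it lists the instruments $y_{i1}, \dots, y_{i,t-2}$ available for each differenced period; and it compares the IV ratio with the Within estimator. All function names are illustrative.

```python
import numpy as np


def ab_instruments(T):
    """Period indices (1-based) of valid level instruments for each
    first-differenced equation t = 3..T: y_i1, ..., y_i,t-2 are uncorrelated
    with (nu_it - nu_i,t-1) when nu_it is not serially correlated."""
    return {t: list(range(1, t - 1)) for t in range(3, T + 1)}


def n_moment_conditions(T):
    """Total number of moment conditions for a pure AR(1) panel: (T-2)(T-1)/2."""
    return sum(len(z) for z in ab_instruments(T).values())


def simulate_ar1_panel(N, T, delta, rng):
    """Hypothetical DGP: y_it = delta*y_i,t-1 + mu_i + nu_it with a stationary start."""
    mu = rng.normal(size=N)
    y = np.empty((N, T))
    # y_i1 drawn from the stationary distribution of the AR(1) process.
    y[:, 0] = mu / (1 - delta) + rng.normal(scale=np.sqrt(1 / (1 - delta**2)), size=N)
    for t in range(1, T):
        y[:, t] = delta * y[:, t - 1] + mu + rng.normal(size=N)
    return y


def within_estimate(y):
    """Within (FE) estimate of delta: demean y_it and y_i,t-1 over t = 2..T."""
    dep, lag = y[:, 1:], y[:, :-1]
    dep = dep - dep.mean(axis=1, keepdims=True)
    lag = lag - lag.mean(axis=1, keepdims=True)
    return np.sum(dep * lag) / np.sum(lag * lag)


def iv_three_wave(y):
    """Just-identified IV for (y3 - y2) = delta*(y2 - y1) + error, instrument y1."""
    y1, y2, y3 = y[:, 0], y[:, 1], y[:, 2]
    return np.sum(y1 * (y3 - y2)) / np.sum(y1 * (y2 - y1))


rng = np.random.default_rng(42)
delta = 0.5
y = simulate_ar1_panel(N=50_000, T=3, delta=delta, rng=rng)

print(ab_instruments(5))       # {3: [1], 4: [1, 2], 5: [1, 2, 3]}
print(n_moment_conditions(5))  # 6
print(f"Within: {within_estimate(y):.3f}")                  # far below delta when T = 3
print(f"IV (y_i1 as instrument): {iv_three_wave(y):.3f}")  # close to delta
```

Under this stationary design the Within estimate converges to $(\delta - 1)/2 = -0.25$ when $T = 3$, an extreme case of the small-$T$ bias, while the IV ratio is consistent as $N$ grows with $T$ fixed. With more waves the instrument lists grow as printed, giving $(T-2)(T-1)/2$ moment conditions in total, so the number of moments rises quickly with $T$.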

Using Monte Carlo experiments, Bowsher (2002) finds that the use of too many moment conditions causes the Sargan test for overidentifying restrictions to be undersized and to have extremely low power. The Sargan test never rejects when $T$ is too large for a given $N$. Zero rejection rates under the null and the alternative were observed for the following $(N, T)$ pairs: (125, 16), (85, 13), and (40, 10). This is attributed to poor estimates of the weighting matrix in GMM. Using Monte Carlo experiments, Ziliak (1997) found that there was a bias/efficiency trade-off for the Arellano and Bond (1991) GMM estimator as the number of moment conditions increases, and that one is better off with suboptimal instruments. Ziliak attributes the bias in GMM to the correlation between the sample moments used in estimation and the estimated weight matrix.

Blundell and Bond (1998) attributed the bias and the poor precision of the first-difference GMM estimator to the problem of weak instruments. They show that an additional mild stationarity restriction on the initial conditions process allows the use of a system GMM estimator, which captures additional nonlinear moment conditions that are ignored by the Arellano and Bond (1991) estimator. These additional nonlinear moment conditions are described in Ahn and Schmidt (1995) and can be linearized by adding a set of equations in levels on top of the set of equations in first differences of Arellano and Bond, hence a system of equations (see Baltagi, 2008, Chapter 8, for details). In this case, one uses lagged differences of $y_{it}$ as instruments for the equations in levels, in addition to lagged levels of $y_{it}$ as instruments for the equations in first differences. The system GMM estimator is shown to have dramatic efficiency gains over the basic first-difference Arellano and Bond GMM estimator as $\delta \to 1$, i.e., as the process tends to a unit root and nonstationarity.

Baltagi et al. (2000) estimate a dynamic demand model for cigarettes based on panel data from 46 American states over the period 1963–92. The estimated equation is:

$$\ln C_{it} = \alpha + \beta_1 \ln C_{i,t-1} + \beta_2 \ln P_{it} + \beta_3 \ln Y_{it} + \beta_4 \ln Pn_{it} + \mu_i + \lambda_t + \nu_{it} \qquad [7]$$

where the subscript $i$ denotes the $i$-th state ($i = 1, \dots, 46$), and the subscript $t$ denotes the $t$-th year ($t = 1, \dots, 30$). $C_{it}$ is real per capita sales of cigarettes by persons of smoking age (14 years and older), measured in packs of cigarettes per head. $P_{it}$ is the average retail price of a pack of cigarettes measured in real terms. $Y_{it}$ is real per capita disposable income. $Pn_{it}$ denotes the minimum real price of cigarettes in any neighboring state; this last variable is a proxy for the casual smuggling effect across state borders. $\mu_i$ denotes the state-specific effects, and $\lambda_t$ denotes the year-specific effects. OLS, which ignores the state and time effects, yields a low short-run price elasticity of −0.09. However, the coefficient of lagged consumption is 0.97, which implies a high long-run price elasticity of −2.98. The FE estimator with both state and time effects yields a higher short-run price elasticity of −0.30, but a lower long-run price elasticity of −1.79. Both state and time dummies were jointly significant, with an observed F-statistic of 7.39 and a p-value of .0001. This is a dynamic equation, and the OLS and FE estimators do not take into account the endogeneity of the lagged dependent variable. The Arellano and Bond (1991) GMM estimator yields a lagged consumption coefficient estimate of 0.70 and an own price elasticity of −0.40, both highly significant (Baltagi, 2008). The two-step Sargan test for over-identification does not reject the null, but this could be due to the low power of this test for $N = 46$ and $T = 28$. The test for first-order serial correlation rejects the null of no first-order serial correlation, but it does not reject the null that there is no second-order serial correlation. This is what one expects in a first-differenced equation when the original untransformed disturbances are assumed not to be serially correlated. The Blundell and Bond (1998) system GMM estimator yields a lagged consumption coefficient estimate of 0.70 and an own price elasticity of −0.42, both highly significant, but with higher standard errors than the corresponding Arellano and Bond estimates. Sargan's test for over-identification does not reject the null, and the tests for first- and second-order serial correlation yield the expected diagnostics for system GMM.

Scott and Coote (2010) applied the dynamic panel data system GMM estimator to estimate the effect of regional primary-care organizations on primary-care performance. They utilize a panel of 119 Divisions of General Practice in Australia observed quarterly over the period 2000–05. Using four different measures of primary-care performance, a high level of persistence was found. The results show that Divisions were more likely to influence general practice infrastructure than clinical performance in diabetes, asthma, and cervical screening. Other applications of dynamic panel data GMM estimation methods include Baltagi and Griffin (2001) to a rational addiction model of cigarettes and Suhrcke and Urban (2010) to the impact of cardiovascular disease mortality on economic growth.

Ng et al. (2012) study the relative importance of diet, physical activity, and the health behaviors of smoking and drinking on weight for a set of Chinese males, using panel data from the China Health and Nutrition Survey. The authors use a dynamic panel system GMM approach that explicitly includes time- and spatially varying community-level urban city and price measures as instruments, to obtain estimates of the effects of diet, physical activity, drinking, and smoking on weight. Results show that approximately 5.4% of weight gain is due to declines in physical activity and 2.8–3.1% is due to dietary changes over time.

Limited Dependent Variable Panel Data Models

In some health studies, the dependent variable is binary. For example, individual $i$ may be in good health at time $t$, i.e., $y_{it} = 1$ with probability $p_{it}$, or in bad health, $y_{it} = 0$ with probability $1 - p_{it}$. Good health occurs when a latent unobserved index of health $y^*_{it}$ is positive:

$$y_{it} = \begin{cases} 1 & \text{if } y^*_{it} > 0 \\ 0 & \text{if } y^*_{it} \le 0 \end{cases} \qquad [8]$$

with $y^*_{it} = x_{it}'\beta + \mu_i + \nu_{it}$, so that

$$\Pr[y_{it} = 1] = \Pr[y^*_{it} > 0] = \Pr[\nu_{it} > -x_{it}'\beta - \mu_i] = F(x_{it}'\beta + \mu_i) \qquad [9]$$
where the last equality holds as long as the density function describing $F$ is symmetric around zero. This is true for the logistic and normal density functions, which are the most used in practice. This is a nonlinear panel data model because $F$ is a cumulative distribution function, and one cannot get rid of the individual effects with a within transformation as in the linear panel data case. Hsiao (2003) showed that, unlike the linear FE panel data case, where the inconsistency of the $\mu_i$'s did not transmit into inconsistency of the $\beta$'s, in the nonlinear panel data case the inconsistency of the $\mu_i$'s renders the maximum likelihood estimates of the $\beta$'s inconsistent. The usual solution to this incidental parameters problem is to assume a logistic function and to condition on $\sum_{t=1}^{T} y_{it}$, which is a minimal sufficient statistic for $\mu_i$. Maximizing the conditional logistic likelihood function

$$L_c = \prod_{i=1}^{N} \Pr\left(y_{i1}, \dots, y_{iT} \,\middle|\, \sum_{t=1}^{T} y_{it}\right) \qquad [10]$$

yields the conditional logit estimates of $\beta$. By definition of a sufficient statistic, the distribution of the data given this sufficient statistic does not depend on $\mu_i$. In contrast to the FE logit model, the conditional likelihood approach does not yield computational simplifications for the FE probit model. But the probit specification has been popular for the RE model. In this case, $u_{it} = \mu_i + \nu_{it}$, where $\mu_i \sim \mathrm{IIN}(0, \sigma^2_\mu)$ and $\nu_{it} \sim \mathrm{IIN}(0, \sigma^2_\nu)$ are independent of each other and of the $x_{it}$. Because $E(u_{it}u_{is}) = \sigma^2_\mu$ for $t \ne s$, the joint likelihood of $(y_{i1}, \dots, y_{iT})$ can no longer be written as the product of the marginal likelihoods of the $y_{it}$. This complicates the derivation of maximum likelihood, which will now involve $T$-dimensional integrals. Fortunately, Butler and Moffitt (1982) showed that for the probit case, the maximum likelihood computations involve only one integral, which can be evaluated using the Gauss–Hermite quadrature procedure. For an early application of the RE probit model, see Sickles and Taubman (1986), who estimated a two-equation structural model of the health and retirement decisions of the elderly using five biennial panels of males drawn from the Retirement History Survey. Both the health and retirement variables were limited dependent variables, and MLE using the Butler and Moffitt (1982) Gaussian quadrature procedure was implemented. Sickles and Taubman found that retirement decisions were strongly affected by health status and that workers not yet eligible for social security were less likely to retire.

Contoyannis et al. (2004) utilize seven waves (1991–97) of the BHPS to analyze the dynamics of individual health and to decompose the persistence in health outcomes in the BHPS data into components due to state dependence, serial correlation, and unobserved heterogeneity. The indicator of health is defined by a binary response to the question: "Does your health in any way limit your daily activities compared to most people of your age?" A sample of 6106 individuals, resulting in 42 742 panel observations, is used to estimate static and dynamic panel probit models by maximum simulated likelihood methods. The dynamic models show strong positive state dependence.

Hernández-Quevedo et al. (2008) use eight waves of the ECHP over the period 1994–2001 to estimate a dynamic nonlinear panel data model of health limitations for individuals within the Member States of the European Union. The RE probit specification conditions on previous health status and parameterizes the unobserved individual effect as a function of initial period observations on time-varying regressors and health (Wooldridge, 2005). Results reveal high state dependence of health limitations, which remains after controlling for measures of socioeconomic status. There is also heterogeneity in the socioeconomic gradient across countries. The importance of regarding health as a dynamic concept has implications for policy development: the results imply that medical interventions or health improvement policies that create health gains will have multiplier effects in the long run.

Further Readings

The panel data econometrics literature has exhibited phenomenal growth, and one cannot do justice to the many theoretical contributions to date. Space limitations prevented the inclusion of many worthy topics, including attrition, sample selection, and semiparametric, nonparametric, and Bayesian methods using panel data; unbalanced panels; problems associated with heteroskedasticity and with serial as well as spatial correlation in panels; measurement error; and duration and quantile panel data models, to mention a few. More extensive treatment of these and other topics is given in the textbooks on the subject by Baltagi (2008), Wooldridge (2002), and Hsiao (2003). Also see the survey by Imbens and Wooldridge (2009) for an extensive discussion of alternative econometric methods for program evaluation besides the DID method, and Angrist and Pischke (2009) for a textbook discussion of DID. For recent applications of panel data methods to health economics, see the special issue of Empirical Economics edited by Baltagi et al. (2012) and Jones (2012).

References

Abadie, A. (2005). Semiparametric difference-in-differences estimators. Review of Economic Studies 72, 1–19.
Abadie, A., Diamond, A. and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association 105, 493–505.
Abrevaya, J. (2006). Estimating the effect of smoking on birth outcomes using a matched panel data approach. Journal of Applied Econometrics 21, 489–519.
Ahn, S. C. and Schmidt, P. (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68, 5–27.
Angrist, J. D. and Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton: Princeton University Press.
Arellano, M. and Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58, 277–297.
Askildsen, J. E., Baltagi, B. H. and Holmås, T. H. (2003). Wage policy in the health care sector: A panel data analysis of nurses' labour supply. Health Economics 12, 705–719.
Athey, S. and Imbens, G. W. (2006). Identification and inference in nonlinear difference-in-differences models. Econometrica 74, 431–497.
Baltagi, B. H. (2008). The econometrics of panel data. Chichester: Wiley.
Baltagi, B. H. and Geishecker, I. (2006). Rational alcohol addiction: Evidence from the Russian longitudinal monitoring survey. Health Economics 15, 893–914.
Baltagi, B. H. and Griffin, J. M. (2001). The econometrics of rational addiction: The case of cigarettes. Journal of Business and Economic Statistics 19, 449–454.
Baltagi, B. H., Griffin, J. M. and Xiong, W. (2000). To pool or not to pool: Homogeneous versus heterogeneous estimators applied to cigarette demand. Review of Economics and Statistics 82, 117–126.
Baltagi, B. H., Jones, A. M., Moscone, F. and Mullahy, J. (2012). Special issue on health econometrics: Editors' introduction. Empirical Economics 42, 365–368.
Baltagi, B. H. and Moscone, F. (2010). Health care expenditure and income in the OECD reconsidered: Evidence from panel data. Economic Modelling 27, 804–811.
Bertrand, M., Duflo, E. and Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? Quarterly Journal of Economics 119, 249–275.
Besley, T. and Case, A. (2000). Unnatural experiments? Estimating the incidence of endogenous policies. Economic Journal 110, F672–F694.
Blundell, R. and Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 115–143.
Bowsher, C. G. (2002). On testing overidentifying restrictions in dynamic panel data models. Economics Letters 77, 211–220.
Breusch, T. S. and Pagan, A. R. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies 47, 239–253.
Butler, J. S. and Moffitt, R. (1982). A computationally efficient quadrature procedure for the one factor multinomial probit model. Econometrica 50, 761–764.
Card, D. (1990). The impact of the Mariel boat lift on the Miami labor market. Industrial and Labor Relations Review 43, 245–253.
Carpenter, C. (2004). How do zero tolerance drunk driving laws work? Journal of Health Economics 23, 61–83.
Conley, T. G. and Taber, C. R. (2011). Inference with "Difference in Differences" with a small number of policy changes. Review of Economics and Statistics 93, 113–125.
Contoyannis, P., Jones, A. M. and Rice, N. (2004). The dynamics of health in the British Household Panel Survey. Journal of Applied Econometrics 19, 473–503.
Contoyannis, P. and Rice, N. (2001). The impact of health on wages: Evidence from the British Household Panel Survey. Empirical Economics 26, 599–622.
Donald, S. G. and Lang, K. (2007). Inference with difference in differences and other panel data. Review of Economics and Statistics 89, 221–233.
Greene, W. (2010). Distinguishing between heterogeneity and inefficiency: Stochastic frontier analysis of the World Health Organization's panel data on national health care systems. Health Economics 13, 959–980.
Gruber, J. and Poterba, J. (1994). Tax incentives and the decision to purchase health insurance: Evidence from the self-employed. Quarterly Journal of Economics 109, 701–734.
Hansen, C. B. (2007). Generalized least squares inference in panel and multilevel models with serial correlation and fixed effects. Journal of Econometrics 140, 670–694.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica 46, 1251–1271.
Hausman, J. A. and Kuersteiner, G. (2008). Difference in difference meets generalized least squares: Higher order properties of hypotheses tests. Journal of Econometrics 144, 371–391.
Hausman, J. A. and Taylor, W. E. (1981). Panel data and unobservable individual effects. Econometrica 49, 1377–1398.
Hernández-Quevedo, C., Jones, A. M. and Rice, N. (2008). Persistence in health limitations: A European comparative analysis. Journal of Health Economics 27, 1472–1488.
Hsiao, C. (2003). Analysis of panel data. Cambridge: Cambridge University Press.
Imbens, G. W. and Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47, 5–86.
Janke, K., Propper, C. and Henderson, J. (2009). Do current levels of air pollution kill? The impact of air pollution on population mortality in England. Health Economics 18, 1031–1055.
Jones, A. M. (2012). Panel data methods and applications to health economics. In Mills, T. C. and Patterson, K. (eds.) Handbook of econometrics volume II: Applied econometrics, pp. 557–631. Basingstoke: Palgrave Macmillan.
Judson, R. A. and Owen, A. L. (1999). Estimating dynamic panel data models: A guide for macroeconomists. Economics Letters 65, 9–15.
Laporte, A. and Windmeijer, F. (2005). Estimation of panel data models with binary indicators when treatment effects are not constant over time. Economics Letters 88, 389–396.
Meyer, B., Viscusi, K. and Durbin, D. (1995). Workers' compensation and injury duration: Evidence from a natural experiment. American Economic Review 85, 322–340.
Ng, S. W., Norton, E. C., Guilkey, D. K. and Popkin, B. M. (2012). Estimation of a dynamic model of weight. Empirical Economics 42, 413–433.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica 49, 1417–1426.
Powell, L. M. (2009). Fast food costs and adolescent body mass index: Evidence from panel data. Journal of Health Economics 28, 963–970.
Ruhm, C. J. (1996). Alcohol policies and highway vehicle fatalities. Journal of Health Economics 15, 435–454.
Sickles, R. C. and Taubman, P. (1986). A multivariate error components analysis of the health and retirement study of the elderly. Econometrica 54, 1339–1356.
Stock, J. H. and Watson, M. W. (2008). Heteroskedasticity-robust standard errors for fixed effects panel data regression. Econometrica 76, 155–174.
Suhrcke, M. and Urban, D. (2010). Are cardiovascular diseases bad for economic growth? Health Economics 19, 1478–1496.
Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.
Wooldridge, J. M. (2005). Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity. Journal of Applied Econometrics 20, 39–54.
Ziliak, J. P. (1997). Efficient estimation with panel data when instruments are predetermined: An empirical comparison of moment-condition estimators. Journal of Business and Economic Statistics 15, 419–431.
