
American Economic Review 2020, 110(9): 2964–2996

https://doi.org/10.1257/aer.20181169

Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects†
By Clément de Chaisemartin and Xavier D’Haultfœuille*

Linear regressions with period and group fixed effects are widely used to estimate treatment effects. We show that they estimate weighted sums of the average treatment effects (ATE) in each group and period, with weights that may be negative. Due to the negative weights, the linear regression coefficient may for instance be negative while all the ATEs are positive. We propose another estimator that solves this issue. In the two applications we revisit, it is significantly different from the linear regression estimator. (JEL C21, C23, D72, J31, J51, L82)

A popular method to estimate the effect of a treatment on an outcome is to compare over time groups experiencing different evolutions of their exposure to treatment. In practice, this idea is implemented by estimating regressions that control for group and time fixed effects. Hereafter, we refer to those as two-way fixed effects (FE) regressions. We conducted a survey, and found that 19 percent of all empirical articles published by the American Economic Review (AER) between 2010 and 2012 have used a two-way FE regression to estimate the effect of a treatment on an outcome. When the treatment effect is constant across groups and over time, such regressions estimate that effect under the standard “common trends” assumption. However, it is often implausible that the treatment effect is constant. For instance, the minimum wage’s effect on employment may vary across US counties, and may change over time. This paper examines the properties of two-way FE regressions when the constant effect assumption is violated.
* de Chaisemartin: University of California at Santa Barbara (email: [email protected]); D’Haultfœuille: CREST-ENSAE (email: [email protected]). Thomas Lemieux was the coeditor for this article. We are very grateful to Olivier Deschêsnes, Guido Imbens, Peter Kuhn, Kyle Meng, Jesse Shapiro, Dick Startz, Doug Steigerwald, Clémence Tricaud, Gonzalo Vazquez-Bare, members of the UCSB econometrics research group, and seminar participants at Bergen, CIREQ Econometrics conference, CREST, Goteborg, Gothenburg, Groningen, ITAM, Pompeu Fabra, Stanford, SMU, Tinbergen Institute, UCL, UCLA, UC Davis, UCSB, USC, and Warwick for their helpful comments. Xavier D’Haultfœuille gratefully acknowledges financial support from the research grants Otelo (ANR-17-CE26-0015-041) and the Labex Ecodec: Investissements d’Avenir (ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047).
† Go to https://doi.org/10.1257/aer.20181169 to visit the article page for additional materials and author disclosure statements.

We start by assuming that all observations in the same $(g,t)$ cell have the same treatment and that the treatment is binary, as is for instance the case when the treatment is a county-level law. We consider the regression of $Y_{i,g,t}$, the outcome of unit $i$ in group $g$ at period $t$, on group fixed effects, period fixed effects, and $D_{g,t}$, the treatment in group $g$ at period $t$. Let $\hat{\beta}_{fe}$ denote the coefficient of $D_{g,t}$, and let $\beta_{fe}$ denote its expectation. Under the common trends assumption, we show that $\beta_{fe}$ is equal to a weighted sum of the treatment effect in each treated $(g,t)$ cell:

$$\beta_{fe} = E\left[\sum_{(g,t):D_{g,t}=1} W_{g,t}\,\Delta_{g,t}\right], \tag{1}$$

where $\Delta_{g,t}$ is the average treatment effect (ATE) in group $g$ and period $t$ and the weights $W_{g,t}$ sum to 1 but may be negative. Negative weights arise because $\hat{\beta}_{fe}$ is a weighted sum of several difference-in-differences (DID), which compare the evolution of the outcome between consecutive time periods across pairs of groups. However, the “control group” in some of those comparisons may be treated at both periods. Then, its treatment effect at the second period gets differenced out by the DID, hence the negative weights.
The negative weights are an issue when the ATEs are heterogeneous across groups or periods. Then, one could have that $\beta_{fe}$ is negative while all the ATEs are positive. For instance, $1.5 \times 1 - 0.5 \times 4 = -0.5$, a weighted sum of $1$ and $4$ with weights summing to 1, is strictly negative. Using the dataset of Gentzkow, Shapiro, and Sinkinson (2011), we find that 40 percent of the weights attached to $\beta_{fe}$ are negative, so $\beta_{fe}$ is not robust to heterogeneous effects.1
Researchers may want to know how serious that issue is in the application they consider. We show that conditional on all treatments, the absolute value of the expectation of $\hat{\beta}_{fe}$ divided by the standard deviation of the weights is equal to the minimal value of the standard deviation of the ATEs across the treated $(g,t)$ cells under which the average treatment on the treated (ATT) may actually have the opposite sign to that coefficient. One can estimate that ratio to assess the robustness of the two-way FE coefficient. If that ratio is close to 0, that coefficient and the ATT can be of opposite signs even under a small and plausible amount of treatment effect heterogeneity. In that case, treatment effect heterogeneity would be a serious concern for the validity of that coefficient. On the contrary, if that ratio is very large, that coefficient and the ATT can only be of opposite signs under a very large and implausible amount of treatment effect heterogeneity.
Finally, we propose a new estimator, $\mathrm{DID}_M$, that is valid even if the treatment effect is heterogeneous over time or across groups. It estimates the average treatment effect across all the $(g,t)$ cells whose treatment changes from $t-1$ to $t$. It relies on common trends assumptions on both potential outcomes. Those conditions are partly testable, and we propose a test that amounts to looking at pretrends. This test differs from the standard event study pretrends test (see Autor 2003), which has been shown to be invalid when treatment effects are heterogeneous (see Abraham and Sun 2018). We show that our estimator is asymptotically normal. We compute it in the datasets of Gentzkow, Shapiro, and Sinkinson (2011) and Vella and Verbeek (1998), and in both cases we find that it is significantly different from $\hat{\beta}_{fe}$.2 Our estimator can be used in applications where, for each pair of consecutive dates, there are groups whose treatment does not change. We estimate that this condition is satisfied for around 80 percent of the papers using two-way fixed effects regressions found in our survey of the AER.

1 Gentzkow, Shapiro, and Sinkinson (2011) does not estimate $\beta_{fe}$, but $\beta_{fd}$, the treatment coefficient in the first-difference regression defined below. Forty-six percent of the weights attached to $\beta_{fd}$ are strictly negative.
2 In both cases, our estimator is also significantly different from $\hat{\beta}_{fd}$.
Overall, our paper has implications for applied researchers estimating two-way fixed effects regressions. First, we recommend that they compute the weights attached to their regression and the ratio of $|\hat{\beta}_{fe}|$ divided by the standard deviation of the weights. To do so, they can use the twowayfeweights Stata package that is available from the SSC repository. If many weights are negative, and if the ratio is not very large, we recommend that they compute our new estimator, using the fuzzydid and did_multiplegt Stata packages, also available from the SSC repository (see de Chaisemartin, D’Haultfœuille, and Guyonvarch 2019, for explanations on how to use the former package).
We extend our results in several important directions. First, another commonly used regression is the first-difference regression of $Y_{g,t} - Y_{g,t-1}$, the change in the mean outcome in group $g$, on period fixed effects and on $D_{g,t} - D_{g,t-1}$, the change in the treatment. We let $\beta_{fd}$ denote the expectation of the coefficient of $D_{g,t} - D_{g,t-1}$. We show that under common trends, $\beta_{fd}$ also identifies a weighted sum of treatment effects, with potentially some negative weights. Second, in our online Appendix we show that our results extend to fuzzy designs, where the treatment varies within $(g,t)$ cells, and to two-way fixed effects regressions with a nonbinary treatment and with covariates.
Our paper is related to the DID literature. Our main result generalizes Theorem 1 in de Chaisemartin and D’Haultfœuille (2018). When the data have two groups and two periods, the Wald-DID estimand considered therein is equal to $\beta_{fe}$ and $\beta_{fd}$. Our results on $\beta_{fe}$ and $\beta_{fd}$ are thus extensions of that theorem to the case with multiple periods and groups.3 Moreover, our $\mathrm{DID}_M$ estimator is related to the Wald-TC estimator with many groups and periods proposed in de Chaisemartin and D’Haultfœuille (2018), and to the multiperiod DID estimator proposed by Imai and Kim (2018). In Section III, we explain the differences between those three estimators.
More recently, Borusyak and Jaravel (2017), Abraham and Sun (2018), Athey and Imbens (2018), Callaway and Sant’Anna (2018), and Goodman-Bacon (2018) study the special case of staggered adoption designs, where the treatment of a group is weakly increasing over time. Those papers derive some important results specific to that design that we do not consider here. Still, some of the results in those papers are related to ours, and we describe precisely those connections later in the paper. The most important dimension on which our paper differs from those is that our results apply to any two-way fixed effects regressions, not only to those with staggered adoption. In our survey of the AER papers estimating two-way fixed effects regressions, less than 10 percent have a staggered adoption design. This suggests that while staggered adoptions are an important research design, they may account for a relatively small minority of the applications where two-way fixed effects regressions have been used.

3
In fact, a preliminary version of our main result appeared in a working paper version of de Chaisemartin and
D’Haultfœuille (2018): see Theorems S1 and S2 in de Chaisemartin and D’Haultfœuille (2015).

The paper is organized as follows. Section I introduces the setup. Section II presents our decomposition results. Section III introduces our alternative estimator. Section IV briefly describes some of the extensions covered in our online Appendix. Section V presents our survey of the articles published in the AER, and our two empirical applications. The data and codes are given in de Chaisemartin and D’Haultfœuille (2020b).

I. Setup

One considers observations that can be divided into $G$ groups and $T$ periods. For every $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, let $N_{g,t}$ denote the number of observations in group $g$ at period $t$, and let $N = \sum_{g,t} N_{g,t}$ be the total number of observations. The data may be an individual-level panel or repeated cross-section dataset where groups are, say, individuals’ county of birth. The data could also be a cross section where cohort of birth plays the role of time. For instance, Duflo (2001) compares the schooling of different cohorts in Indonesia, some of which were exposed to a school construction program. It is also possible that for all $(g,t)$, $N_{g,t} = 1$, e.g., a group is one individual or firm. All of the above are special cases of the data structure we consider.
One is interested in measuring the effect of a treatment on some outcome. Throughout the paper we assume that treatment is binary, but our results apply to any ordered treatment, as we show in online Appendix Section 3.2. Then, for every $(i,g,t) \in \{1,\dots,N_{g,t}\} \times \{1,\dots,G\} \times \{1,\dots,T\}$, let $D_{i,g,t}$ and $(Y_{i,g,t}(0), Y_{i,g,t}(1))$ respectively denote the treatment status and the potential outcomes without and with treatment of observation $i$ in group $g$ at period $t$.
The outcome of observation $i$ in group $g$ and period $t$ is $Y_{i,g,t} = Y_{i,g,t}(D_{i,g,t})$. For all $(g,t)$, let

$$D_{g,t} = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} D_{i,g,t}, \qquad Y_{g,t}(0) = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t}(0),$$

$$Y_{g,t}(1) = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t}(1), \qquad \text{and} \qquad Y_{g,t} = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t}.$$

Here, $D_{g,t}$ denotes the average treatment in group $g$ at period $t$, while $Y_{g,t}(0)$, $Y_{g,t}(1)$, and $Y_{g,t}$ respectively denote the average potential outcomes without and with treatment and the average observed outcome in group $g$ at period $t$.
Throughout the paper, we maintain the following assumptions.

ASSUMPTION 1 (Balanced Panel of Groups): For all $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, $N_{g,t} > 0$.

Assumption 1 requires that no group appears or disappears over time. This assumption is often satisfied. Without it, our results still hold but the notation becomes more complicated as the denominators of some of the fractions below may then be equal to zero.

ASSUMPTION 2 (Sharp Design): For all $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$ and $i \in \{1,\dots,N_{g,t}\}$, $D_{i,g,t} = D_{g,t}$.

Assumption 2 requires that units’ treatments do not vary within each $(g,t)$ cell, a situation we refer to as a sharp design. This is for instance satisfied when the treatment is a group-level variable, such as a county or a state law. This is also mechanically satisfied when $N_{g,t} = 1$. In our survey in Section V, we find that almost 80 percent of the papers using two-way fixed effects regressions and published in the AER between 2010 and 2012 consider sharp designs. We focus on sharp designs because of their prevalence, but in online Appendix Section 2, we show that all the results in Sections II and III can be extended to fuzzy designs.

ASSUMPTION 3 (Independent Groups): The vectors $(Y_{g,t}(0), Y_{g,t}(1), D_{g,t})_{1 \le t \le T}$ are mutually independent.

We consider $D_{g,t}$, $Y_{g,t}(0)$, $Y_{g,t}(1)$ as random variables. For instance, aggregate random shocks may affect the average potential outcomes of group $g$ at period $t$. The treatment status of group $g$ at period $t$ may also be random. The expectations below are taken with respect to the distribution of those random variables. Assumption 3 allows for the possibility that the treatments and potential outcomes of a group may be correlated over time, but it requires that the potential outcomes and treatments of different groups be independent.

ASSUMPTION 4 (Strong Exogeneity): For all $(g,t) \in \{1,\dots,G\} \times \{2,\dots,T\}$, $E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D_{g,1}, \dots, D_{g,T}) = E(Y_{g,t}(0) - Y_{g,t-1}(0))$.

Assumption 4 requires that the shocks affecting a group’s $Y_{g,t}(0)$ be mean independent of that group’s treatment sequence. This rules out the possibility that a group gets treated because it experiences negative shocks, the so-called Ashenfelter’s dip (see Ashenfelter 1978). Assumption 4 is related to the strong exogeneity condition in panel data models, which, as is well known, is necessary to obtain the consistency of the fixed effects estimator (see, e.g., Wooldridge 2002).
We now define the FE regression described in the introduction.4

REGRESSION 1 (Fixed Effects Regression): Let $\hat{\beta}_{fe}$ denote the coefficient of $D_{g,t}$ in an OLS regression of $Y_{i,g,t}$ on group fixed effects, period fixed effects, and $D_{g,t}$. Let $\beta_{fe} = E[\hat{\beta}_{fe}]$.5

For all $g$ and $t$, let $N_{g,.} = \sum_{t=1}^{T} N_{g,t}$ and $N_{.,t} = \sum_{g=1}^{G} N_{g,t}$ respectively denote the total number of observations in group $g$ and in period $t$. For any variable $X_{g,t}$ defined in each $(g,t)$ cell, let $X_{g,.} = \sum_{t=1}^{T} (N_{g,t}/N_{g,.}) X_{g,t}$ denote the average value of $X_{g,t}$ in group $g$, let $X_{.,t} = \sum_{g=1}^{G} (N_{g,t}/N_{.,t}) X_{g,t}$ denote the average value of $X_{g,t}$ in period $t$, and let $X_{.,.} = \sum_{g,t} (N_{g,t}/N) X_{g,t}$ denote the average value of $X_{g,t}$. For instance, $D_{3,.}$ and $D_{.,2}$ respectively denote the average treatment in group 3 across time and in period 2 across groups, whereas $Y_{.,.}$ denotes the average value of the outcome across groups and time. Finally, for any variable $X_{g,t}$, we let $\mathbf{X}$ denote the vector $(X_{g,t})_{(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}}$ collecting the values of that variable in each $(g,t)$ cell. For instance, $\mathbf{D}$ is the vector $(D_{g,t})_{(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}}$ collecting the treatments of all the $(g,t)$ cells.

4 Throughout the paper, we assume that $D_{g,t}$ in Regression 1 and $D_{g,t} - D_{g,t-1}$ in Regression 2 are not collinear with the other independent variables in those regressions, so $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are well defined.
5 As the independent variables in Regression 1 are constant within each $(g,t)$ cell, Regression 1 is equivalent to a $(g,t)$-level regression of $Y_{g,t}$ on group and period fixed effects and $D_{g,t}$, weighted by $N_{g,t}$.
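As a hedged illustration of this notation, the following Python sketch computes $X_{g,.}$, $X_{.,t}$, and $X_{.,.}$ from $G \times T$ arrays of cell values and cell sizes. The function name `cell_averages` and the small design are assumptions made for the example; they do not come from the paper.

```python
import numpy as np

def cell_averages(X, N):
    """Return (X_g., X_.t, X_..): N-weighted averages by group, by period, and overall."""
    X = np.asarray(X, dtype=float)
    N = np.asarray(N, dtype=float)
    X_g = (N * X).sum(axis=1) / N.sum(axis=1)   # one value per group g
    X_t = (N * X).sum(axis=0) / N.sum(axis=0)   # one value per period t
    X_all = (N * X).sum() / N.sum()             # single overall average
    return X_g, X_t, X_all

# Hypothetical design: 2 groups (rows), 3 periods (columns), unequal group sizes.
D = [[0, 0, 1],
     [0, 1, 1]]
N = [[10, 10, 10],
     [30, 30, 30]]
D_g, D_t, D_all = cell_averages(D, N)
print(D_g, D_t, D_all)
```

Because the averages are weighted by $N_{g,t}$, the larger group dominates the period averages in this design.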

II. Two-Way Fixed Effects Regressions

A. A Decomposition Result

We study the FE regression under the following common trends assumption.

ASSUMPTION 5 (Common Trends): For $t \ge 2$, $E(Y_{g,t}(0) - Y_{g,t-1}(0))$ does not vary across $g$.

Assumption 5 requires that the expectation of the outcome without treatment follow the same evolution over time in every group. When $t$ represents birth cohorts, Assumption 5 requires that the outcome difference between consecutive cohorts be the same across groups.
Let $N_1 = \sum_{i,g,t} D_{i,g,t}$ denote the number of treated units, let

$$\Delta^{TR} = \frac{1}{N_1} \sum_{(i,g,t):D_{g,t}=1} \left[Y_{i,g,t}(1) - Y_{i,g,t}(0)\right]$$

denote the average treatment effect across all treated units, and let $\delta^{TR} = E[\Delta^{TR}]$ denote the expectation of that parameter, hereafter referred to as the ATT. For any $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, let

$$\Delta_{g,t} = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} \left[Y_{i,g,t}(1) - Y_{i,g,t}(0)\right]$$

denote the ATE in cell $(g,t)$. Note that $\delta^{TR}$ is equal to the expectation of a weighted average of the treated cells’ $\Delta_{g,t}$,

$$\delta^{TR} = E\left[\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,\Delta_{g,t}\right]. \tag{2}$$

Under the common trends assumption, we show that $\beta_{fe}$ is also equal to the expectation of a weighted sum of the $\Delta_{g,t}$ terms, with potentially some negative weights. Let $\varepsilon_{g,t}$ denote the residual of observations in cell $(g,t)$ in the regression of $D_{g,t}$ on group and period fixed effects,6

$$D_{g,t} = \alpha + \gamma_g + \lambda_t + \varepsilon_{g,t}.$$

6 $\varepsilon_{g,t}$ arises from a unit-level regression, where the dependent and independent variables only vary at the $(g,t)$ level. Therefore, all the units in the same $(g,t)$ cell have the same value of $\varepsilon_{g,t}$.

One can show that if the regressors in Regression 1 are not collinear, the average value of $\varepsilon_{g,t}$ across all treated $(g,t)$ cells differs from 0: $\sum_{(g,t):D_{g,t}=1} (N_{g,t}/N_1)\,\varepsilon_{g,t} \ne 0$. Then we let $w_{g,t}$ denote $\varepsilon_{g,t}$ divided by that average:

$$w_{g,t} = \frac{\varepsilon_{g,t}}{\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,\varepsilon_{g,t}}.$$
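The construction of the weights can be sketched in Python as follows. This is a minimal illustration, not the authors' twowayfeweights implementation: the function name `twfe_weights` is hypothetical, the input is assumed to be $G \times T$ arrays of $D_{g,t}$ and $N_{g,t}$ in a sharp design, and the residual is obtained from an $N_{g,t}$-weighted least squares fit of $D_{g,t}$ on an intercept and group and period dummies.

```python
import numpy as np

def twfe_weights(D, N):
    """Residuals eps_{g,t} of D on group and period dummies, and weights w_{g,t}.

    Hypothetical helper: D and N are G x T arrays. Untreated cells get NaN weights.
    """
    D = np.asarray(D, dtype=float)
    N = np.asarray(N, dtype=float)
    G, T = D.shape
    g_idx, t_idx = np.indices((G, T))
    g_idx, t_idx = g_idx.ravel(), t_idx.ravel()
    # Design matrix: intercept, group dummies and period dummies (first level dropped).
    X = np.column_stack(
        [np.ones(G * T)]
        + [(g_idx == g).astype(float) for g in range(1, G)]
        + [(t_idx == t).astype(float) for t in range(1, T)]
    )
    # Weighting rows by sqrt(N) reproduces the unit-level regression at the cell level.
    sw = np.sqrt(N.ravel())
    coef, *_ = np.linalg.lstsq(X * sw[:, None], D.ravel() * sw, rcond=None)
    eps = (D.ravel() - X @ coef).reshape(G, T)
    treated = D == 1
    N1 = N[treated].sum()
    avg_eps_treated = (N[treated] / N1 * eps[treated]).sum()
    w = np.where(treated, eps / avg_eps_treated, np.nan)
    return eps, w

# Hypothetical design: 2 groups, 3 periods, one observation per cell.
eps, w = twfe_weights([[0, 0, 1], [0, 1, 1]], np.ones((2, 3)))
print(np.round(w, 3))
```

By construction, the $N_{g,t}/N_1$-weighted average of the returned weights over the treated cells equals 1.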

THEOREM 1: Suppose that Assumptions 1–5 hold. Then,7

$$\beta_{fe} = E\left[\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,w_{g,t}\,\Delta_{g,t}\right].$$

This result implies that in general, $\beta_{fe} \ne \delta^{TR}$, so $\hat{\beta}_{fe}$ is a biased estimator of the ATT. To illustrate this, we consider a simple example of a staggered adoption design with two groups and three periods, and where the treatments are nonstochastic: group 1 is untreated at periods 1 and 2 and treated at period 3, while group 2 is untreated at period 1 and treated both at periods 2 and 3.8 We also assume that $N_{g,t}/N_{g,t-1}$ does not vary across $g$: all groups experience the same growth of their number of observations from $t-1$ to $t$, a requirement that is for instance satisfied when the data is a balanced panel. Then, one can show that

$$\varepsilon_{g,t} = D_{g,t} - D_{g,.} - D_{.,t} + D_{.,.},$$

thus implying that

$$\varepsilon_{1,3} = 1 - 1/3 - 1 + 1/2 = 1/6,$$

$$\varepsilon_{2,2} = 1 - 2/3 - 1/2 + 1/2 = 1/3,$$

$$\varepsilon_{2,3} = 1 - 2/3 - 1 + 1/2 = -1/6.$$

The residual is negative in group 2 and period 3, because the regression predicts a treatment probability larger than one in that cell, a classic extrapolation problem with linear regressions. Then, under the common trends assumption, it follows from Theorem 1 and the fact that the treatments are nonstochastic that

$$\beta_{fe} = 1/2\,E[\Delta_{1,3}] + E[\Delta_{2,2}] - 1/2\,E[\Delta_{2,3}].$$

Here, $\beta_{fe}$ is equal to a weighted sum of the ATEs in group 1 at period 3, group 2 at period 2, and group 2 at period 3, the three treated $(g,t)$ cells. However, the weight assigned to each ATE differs from $1/3$, the proportion that each cell accounts for in the population of treated observations. Therefore, $\beta_{fe}$ is not equal to $\delta^{TR}$. Perhaps more worryingly, not all the weights are positive: the weight assigned to the ATE in group 2, period 3 is strictly negative. Consequently, $\beta_{fe}$ may be a very misleading measure of the treatment effect. Assume for instance that $E[\Delta_{1,3}] = E[\Delta_{2,2}] = 1$ and $E[\Delta_{2,3}] = 4$. At the period when they start receiving the treatment, both groups experience a modest positive ATE. But this effect builds over time and in period 3, one period after it has started receiving the treatment, group 2 now experiences a large ATE. Then,

$$\beta_{fe} = 1/2 \times 1 + 1 - 1/2 \times 4 = -1/2.$$

Therefore, $\beta_{fe}$ is strictly negative, while $E[\Delta_{1,3}]$, $E[\Delta_{2,2}]$, and $E[\Delta_{2,3}]$ are all positive. More generally, the negative weights are an issue if the $E[\Delta_{g,t}]$ terms are heterogeneous, across groups or over time.9 If $E[\Delta_{1,3}] = E[\Delta_{2,2}] = E[\Delta_{2,3}] = 1$, then $\beta_{fe} = 1 = \delta^{TR}$.

7 In the proof, we show the following, stronger result: $E[\hat{\beta}_{fe} \mid \mathbf{D}] = \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,w_{g,t}\,E[\Delta_{g,t} \mid \mathbf{D}]$.
8 A similar example appears in Borusyak and Jaravel (2017).
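The arithmetic of this two-group, three-period example can be checked exactly with rational numbers. The sketch below assumes equal cell sizes, so that the residual is the double-demeaned treatment, and uses the expected ATEs of the example; the variable names are ours.

```python
from fractions import Fraction as F

# Treatment paths from the example: rows are groups 1-2, columns are periods 1-3.
D = [[0, 0, 1],
     [0, 1, 1]]
G, T = len(D), len(D[0])

# With equal cell sizes, eps_{g,t} = D_{g,t} - D_{g,.} - D_{.,t} + D_{.,.}.
D_g = [F(sum(D[g]), T) for g in range(G)]
D_t = [F(sum(D[g][t] for g in range(G)), G) for t in range(T)]
D_all = F(sum(sum(row) for row in D), G * T)
eps = {(g + 1, t + 1): D[g][t] - D_g[g] - D_t[t] + D_all
       for g in range(G) for t in range(T)}

treated = [cell for cell in eps if D[cell[0] - 1][cell[1] - 1] == 1]
total = sum(eps[c] for c in treated)
# Weight on each treated cell's ATE, i.e. (N_gt / N_1) * w_gt with equal N_gt.
weight = {c: eps[c] / total for c in treated}

# Expected ATEs assumed in the example: 1, 1, and 4.
ate = {(1, 3): F(1), (2, 2): F(1), (2, 3): F(4)}
beta_fe = sum(weight[c] * ate[c] for c in treated)
att = sum(ate[c] for c in treated) / len(treated)
print(weight, beta_fe, att)
```

The weights sum to one, yet the coefficient and the ATT computed this way have opposite signs.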
Here is some intuition as to why one weight is negative in this example. It follows from equation (A4) in the proof of Theorem 1 (see also Theorem 1 in Goodman-Bacon 2018) that in this simple example, $\beta_{fe} = (\mathrm{DID}_1 + \mathrm{DID}_2)/2$, with

$$\mathrm{DID}_1 = E(Y_{2,2}) - E(Y_{2,1}) - \left(E(Y_{1,2}) - E(Y_{1,1})\right),$$

$$\mathrm{DID}_2 = E(Y_{1,3}) - E(Y_{1,2}) - \left(E(Y_{2,3}) - E(Y_{2,2})\right).$$

The first DID compares the evolution of the mean outcome from period 1 to 2 in group 2 and in group 1. The second one compares the evolution of the mean outcome from period 2 to 3 in group 1 and in group 2. The control group in the second DID, group 2, is treated both in the pre- and in the post-period. Therefore, under the common trends assumption, it follows from Lemma 1 in Appendix A (a similar result appears in Lemma 1 of de Chaisemartin 2011 and in equation (13) of Goodman-Bacon 2018) that $\mathrm{DID}_1 = E[\Delta_{2,2}]$, but

$$\mathrm{DID}_2 = E[\Delta_{1,3}] - \left(E[\Delta_{2,3}] - E[\Delta_{2,2}]\right).$$

Note that $\mathrm{DID}_2$ is equal to the ATE in group 1, period 3, minus the change in group 2’s ATE between periods 2 and 3. Intuitively, the mean outcome of groups 1 and 2 may follow different trends from period 2 to 3 either because group 1 becomes treated, or because group 2’s ATE changes. The intuition that negative weights arise because $\hat{\beta}_{fe}$ uses treated observations as controls also appears in Borusyak and Jaravel (2017).
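This decomposition can be illustrated numerically. The sketch below builds noiseless cell means that satisfy common trends, with hypothetical group and period effects chosen by us and the ATEs of the example (1, 1, and 4), then compares the two DIDs with the coefficient from a least squares fit on group dummies, period dummies, and the treatment.

```python
import numpy as np

# Rows are groups 1-2, columns are periods 1-3.
D = np.array([[0, 0, 1],
              [0, 1, 1]], dtype=float)
delta = np.array([[0, 0, 1],
                  [0, 1, 4]], dtype=float)   # ATEs of the treated cells
group_fe = np.array([1.0, 3.0])              # hypothetical group effects
period_fe = np.array([0.0, 0.5, 2.0])        # hypothetical common period effects
Y = group_fe[:, None] + period_fe[None, :] + D * delta

# DID_1: periods 1 to 2, group 2 versus group 1.
did1 = (Y[1, 1] - Y[1, 0]) - (Y[0, 1] - Y[0, 0])
# DID_2: periods 2 to 3, group 1 versus group 2 (already treated at both dates).
did2 = (Y[0, 2] - Y[0, 1]) - (Y[1, 2] - Y[1, 1])

# Two-way fixed effects regression on the six cell means.
g_idx, t_idx = np.indices(D.shape)
X = np.column_stack([
    np.ones(D.size),
    g_idx.ravel() == 1,
    t_idx.ravel() == 1,
    t_idx.ravel() == 2,
    D.ravel(),
]).astype(float)
coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)
beta_fe = coef[-1]
print(did1, did2, beta_fe)
```

In this sketch the second DID is negative because group 2's ATE rises by 3 between periods 2 and 3, which more than offsets group 1's ATE of 1.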
We now generalize the previous illustration by characterizing the $(g,t)$ cells whose ATEs are weighted negatively by $\beta_{fe}$.

PROPOSITION 1: Suppose that Assumption 1 holds and for all $t \ge 2$, $N_{g,t}/N_{g,t-1}$ does not vary across $g$. Then, for all $(g,t,t')$ such that $D_{g,t} = D_{g,t'} = 1$, $D_{.,t} > D_{.,t'}$ implies $w_{g,t} < w_{g,t'}$. Similarly, for all $(g,g',t)$ such that $D_{g,t} = D_{g',t} = 1$, $D_{g,.} > D_{g',.}$ implies $w_{g,t} < w_{g',t}$.

9 On the other hand, $\beta_{fe}$ does not rule out heterogeneous treatment effects within $(g,t)$ cells, as it is identified by variations across $(g,t)$ cells, and does not leverage any within-cell variation.
Proposition 1 shows that $\beta_{fe}$ is more likely to assign a negative weight to periods where a large fraction of groups are treated, and to groups treated for many periods. Then, negative weights are a concern when treatment effects differ between periods with many versus few treated groups, or between groups treated for many versus few periods.
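The ordering in Proposition 1 can be checked on a small design. The sketch below uses a hypothetical staggered design with three groups, four periods, and equal cell sizes (our assumption), for which the residual has the double-demeaned closed form used in the example above.

```python
from fractions import Fraction as F

# Hypothetical staggered design: groups 1, 2, 3 adopt at periods 2, 3, 4.
D = [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
G, T = len(D), len(D[0])
D_g = [F(sum(D[g]), T) for g in range(G)]                       # D_{g,.}
D_t = [F(sum(D[g][t] for g in range(G)), G) for t in range(T)]  # D_{.,t}
D_all = F(sum(map(sum, D)), G * T)                              # D_{.,.}

# Residuals of the treated cells, keyed by zero-based (group, period).
treated = [(g, t) for g in range(G) for t in range(T) if D[g][t] == 1]
eps = {(g, t): 1 - D_g[g] - D_t[t] + D_all for g, t in treated}
avg = sum(eps.values()) / len(treated)
w = {cell: e / avg for cell, e in eps.items()}

for g in range(G):
    print("group", g + 1, [str(w[(g, t)]) for t in range(T) if (g, t) in w])
```

Within each group the weights fall as more groups become treated, and in the last period the earliest adopter carries the only negative weight.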
Proposition 1 has interesting implications in staggered adoption designs, a spe-
cial case of sharp designs defined as follows.

ASSUMPTION 6 (Staggered Adoption Designs): For all $g$, $D_{g,t} \ge D_{g,t-1}$ for all $t \ge 2$.

Assumption 6 is satisfied in applications where groups adopt a treatment at heterogeneous dates (see, e.g., Athey and Stern 2002). In that design, Borusyak and Jaravel (2017) shows that $\beta_{fe}$ is more likely to assign a negative weight to treatment effects at the last periods of the panel. This result is a special case of Proposition 1: in staggered adoption designs, $D_{.,t}$ is increasing in $t$, so Proposition 1 implies that $w_{g,t}$ is decreasing in $t$.10 Proposition 1 also implies that in that design, groups that adopt the treatment earlier are more likely to receive some negative weights.
Finally, in staggered adoption designs, Athey and Imbens (2018) derives a decomposition of $\beta_{fe}$ that resembles, but differs from, that in Theorem 1. They derive their decomposition under the assumption that the dates at which each group starts receiving the treatment are randomly assigned, while we derive ours under a common trends assumption.

B. Robustness to Heterogeneous Treatment Effects

Theorem 1 shows that in sharp designs with many groups and periods, $\hat{\beta}_{fe}$ may be a misleading measure of the treatment effect under the standard common trends assumption, if the treatment effect is heterogeneous across groups and time periods. In the corollary below, we propose two robustness measures that can be used to assess how serious that concern is.
Those robustness measures are defined conditional on $\mathbf{D}$, the vector stacking together the treatments of all the $(g,t)$ cells. Specifically, for all $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, let $\tilde{\Delta}_{g,t} = E(\Delta_{g,t} \mid \mathbf{D})$ denote the ATE in cell $(g,t)$ conditional on $\mathbf{D}$,11 let $\tilde{\Delta}^{TR} = E(\Delta^{TR} \mid \mathbf{D})$ denote the ATT conditional on $\mathbf{D}$, and let $\tilde{\beta}_{fe} = E(\hat{\beta}_{fe} \mid \mathbf{D})$. The first measure we consider is the minimal value of the standard deviation of the $\tilde{\Delta}_{g,t}$ terms under which one could have that $\tilde{\beta}_{fe}$ is of a different sign than $\tilde{\Delta}^{TR}$. Therefore, this summary measure applies to $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$, rather than $\beta_{fe}$ and $\delta^{TR}$, the unconditional expectations of $\hat{\beta}_{fe}$ and $\Delta^{TR}$ on which we have focused so far. However, one can show that when $G$, the number of groups, goes to infinity, $\tilde{\beta}_{fe} - \beta_{fe}$ and $\tilde{\Delta}^{TR} - \delta^{TR}$ both converge to 0. So if the number of groups is large, $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$ should not differ much from $\beta_{fe}$ and $\delta^{TR}$, and our robustness measure “almost” applies to $\beta_{fe}$ and $\delta^{TR}$.

10 Borusyak and Jaravel (2017) assumes that the treatment effect of cell $(g,t)$ only depends on the number of periods since group $g$ has started receiving the treatment, whereas Proposition 1 does not rely on that assumption.
11 $\tilde{\Delta}_{g,t}$ may differ from $E(\Delta_{g,t})$. To see this, let us consider a simple example where $T = 2$. Then, under Assumption 3, one has $\tilde{\Delta}_{g,t} = E(\Delta_{g,t} \mid D_{g,1}, D_{g,2})$. One may for instance have $E(\Delta_{g,1} \mid D_{g,1} = 0, D_{g,2} = 0) < E(\Delta_{g,1} \mid D_{g,1} = 1, D_{g,2} = 1)$, if a group is more likely to be treated if its treatment effect is initially high.

Let

$$\sigma(\tilde{\Delta}) = \left(\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1} \left(\tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR}\right)^2\right)^{1/2},$$

$$\sigma(w) = \left(\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1} \left(w_{g,t} - 1\right)^2\right)^{1/2},$$

where $\sigma(\tilde{\Delta})$ is the standard deviation of the conditional ATEs, and $\sigma(w)$ is the standard deviation of the $w$-weights,12 across the treated $(g,t)$ cells. Let $n = \#\{(g,t) : D_{g,t} = 1\}$ denote the number of treated cells. For every $i \in \{1,\dots,n\}$, let $w_{(i)}$ denote the $i$th largest of the weights of the treated cells: $w_{(1)} \ge w_{(2)} \ge \dots \ge w_{(n)}$, and let $N_{(i)}$ and $\tilde{\Delta}_{(i)}$ be the number of observations and the conditional ATE of the corresponding cell. Then, for any $k \in \{1,\dots,n\}$, let $P_k = \sum_{i \ge k} N_{(i)}/N_1$, $S_k = \sum_{i \ge k} (N_{(i)}/N_1)\,w_{(i)}$, and $T_k = \sum_{i \ge k} (N_{(i)}/N_1)\,w_{(i)}^2$.

COROLLARY 1: Suppose that Assumptions 1–5 hold.

(i) If ​σ(w) > 0​, the minimal value of σ ​ (​Δ̃ ​)​ compatible with ​​​β̃ ​​fe​​​ and ​​​Δ̃ ​​​ TR​ = 0​ is
|​​β̃ ​fe
​ ​​|
​​​σ _​​ fe​​  = ​ ____ ​​  .
σ​(w)​
(ii ) If ​​​β̃ ​f​e​​  ≠ 0​ and at least one of the ​​wg,t ​ ​​​ weights is strictly negative, the minimal
value of σ ​ (​Δ̃ ​)​ compatible with ​​​β̃ ​​fe​​​ and with ​​​Δ̃ ​​g,t​​​ of a different sign than β ​​​ ̃ ​fe
​ ​​​
for all ​(g, t)​is
|​​β̃ ​​fe​​|
​​​σ ‗​​ fe​​  = ​ ________________
      ​​  ,
​​[​Ts​​​  + ​S​  2s​  ​/(​ 1 − ​Ps​)​​ ​]​​​  ​
1/2

where s​ = min{i ∈ {1, …, n} : ​w(​i)​​  < − ​S(​i)​​/(1 − ​P(​i)​​)}.​

Note that $\underline{\sigma}_{fe}$ and $\underline{\underline{\sigma}}_{fe}$ can be estimated simply by replacing $\tilde{\beta}_{fe}$ by $\hat{\beta}_{fe}$. An estimator of $\underline{\sigma}_{fe}$ can be used to assess the robustness of $\hat{\beta}_{fe}$ to treatment effect heterogeneity across groups and periods. If $\underline{\sigma}_{fe}$ is close to 0, $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$ can be of opposite signs even under a small and plausible amount of treatment effect heterogeneity. In that case, treatment effect heterogeneity would be a serious concern for the validity of $\hat{\beta}_{fe}$. On the contrary, if $\underline{\sigma}_{fe}$ is very large, $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$ can only be of opposite signs under a very large and implausible amount of treatment effect heterogeneity. Then, treatment effect heterogeneity is less of a concern.
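
The two bounds in Corollary 1 can be computed directly from the weights. The sketch below is only an illustration: the function name `sigma_bounds` and its argument names are hypothetical (they are not part of the twowayfeweights package), and it assumes the weights of the treated cells and their numbers of observations are already available. It treats the threshold $-S_{(i)}/(1 - P_{(i)})$ as never satisfied when $P_{(i)} = 1$, which is an assumption made here to avoid a division by zero at $i = 1$.

```python
import math


def sigma_bounds(beta_fe, weights, n_obs):
    """Illustrative computation of the two bounds of Corollary 1.

    beta_fe : value of the two-way FE coefficient (hypothetical input)
    weights : weights w_{g,t} of the treated cells
    n_obs   : numbers of observations N_{g,t} of the same cells
    Returns (sigma_1, sigma_2); an entry is None when the bound is undefined.
    """
    n1 = sum(n_obs)
    shares = [n / n1 for n in n_obs]  # N_{g,t} / N_1

    # Bound (i): |beta_fe| / sigma(w), defined when sigma(w) > 0.
    sd_w = math.sqrt(sum(p * (w - 1.0) ** 2 for p, w in zip(shares, weights)))
    sigma_1 = abs(beta_fe) / sd_w if sd_w > 0 else None

    # Bound (ii): needs beta_fe != 0 and at least one strictly negative weight.
    sigma_2 = None
    if beta_fe != 0 and min(weights) < 0:
        cells = sorted(zip(weights, shares), key=lambda c: -c[0])  # w_(1) >= ... >= w_(n)
        tails = []
        P = S = T = 0.0
        for w, p in reversed(cells):  # accumulate the sums over i >= k
            P += p
            S += p * w
            T += p * w * w
            tails.append((w, P, S, T))
        tails.reverse()
        for w, P, S, T in tails:  # s is the first index meeting the condition
            if 1.0 - P > 1e-12 and w < -S / (1.0 - P):
                sigma_2 = abs(beta_fe) / math.sqrt(T + S * S / (1.0 - P))
                break
    return sigma_1, sigma_2


# Two equally sized treated cells with weights 3 and -1 (they average to 1).
print(sigma_bounds(1.0, [3.0, -1.0], [1, 1]))  # (0.5, 1.0)
```

In this two-cell example the bounds can be checked by hand: with $\tilde{\beta}_{fe} = 1$, the ATEs $(0.5, -0.5)$ average to zero with a standard deviation of 0.5, and the ATEs $(0, -2)$ are all weakly negative with a standard deviation of 1.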

12 One can show that $\sum_{(g,t): D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$.

Similarly, if $\underline{\underline{\sigma}}_{fe}$ is close to 0, one may have, say, $\tilde{\beta}_{fe} > 0$, while $\tilde{\Delta}_{g,t} \leq 0$ for all $(g,t)$, even if the dispersion of the $\tilde{\Delta}_{g,t}$ terms is relatively small. Notice that $\underline{\underline{\sigma}}_{fe}$ is only defined if at least one of the weights is strictly negative: if all the weights are positive, then one cannot have that $\tilde{\beta}_{fe}$ is of a different sign than all the $\tilde{\Delta}_{g,t}$ terms.
When some of the weights $w_{g,t}$ are negative, $\hat{\beta}_{fe}$ may still be robust to heterogeneous treatment effects across groups and periods, provided the assumption below is satisfied.

ASSUMPTION 7 ($w$ Uncorrelated with $\tilde{\Delta}$):

$$E\left[ \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1} \left( w_{g,t} - 1 \right) \left( \tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR} \right) \right] = 0.$$

COROLLARY 2: If Assumptions 1–5 and 7 hold, then $\beta_{fe} = \delta^{TR}$.
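
A short derivation may help to see why Corollary 2 follows. It is a sketch under two assumptions taken from earlier parts of the paper rather than restated here: that, conditional on $D$, $\tilde{\beta}_{fe} = \sum_{(g,t): D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} \tilde{\Delta}_{g,t}$, and that $\tilde{\Delta}^{TR}$ is the average of the $\tilde{\Delta}_{g,t}$ weighted by $N_{g,t}/N_1$. The shorthand $p_{g,t} = N_{g,t}/N_1$ is introduced only for this sketch, and all sums run over the treated cells.

```latex
\begin{align*}
\tilde{\beta}_{fe} - \tilde{\Delta}^{TR}
  &= \sum p_{g,t}\, w_{g,t}\, \tilde{\Delta}_{g,t} - \sum p_{g,t}\, \tilde{\Delta}_{g,t} \\
  &= \sum p_{g,t} \left( w_{g,t} - 1 \right) \tilde{\Delta}_{g,t} \\
  &= \sum p_{g,t} \left( w_{g,t} - 1 \right) \left( \tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR} \right)
     + \tilde{\Delta}^{TR} \sum p_{g,t} \left( w_{g,t} - 1 \right) \\
  &= \sum p_{g,t} \left( w_{g,t} - 1 \right) \left( \tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR} \right).
\end{align*}
```

The last step uses $\sum p_{g,t} w_{g,t} = 1$ (footnote 12) and $\sum p_{g,t} = 1$. Taking expectations, the right-hand side is the quantity that Assumption 7 sets to zero.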

Assumption 7 requires that the weights attached to the fixed effects estimator be uncorrelated with the conditional ATEs in the treated $(g,t)$ cells. This is often implausible. For instance, groups treated the most are also those with the lowest value of $w_{g,t}$, as shown in Proposition 1. But those groups could also be those with the largest treatment effect. This would then induce a negative correlation between $w$ and $\tilde{\Delta}$. The plausibility of Assumption 7 can be assessed by checking whether $w$ is correlated with a predictor of the treatment effect in each $(g,t)$ cell. In the two applications we revisit in Section V, the hypothesis of no correlation is rejected.

C. Extension to the First-Difference Regression

Instead of Regression 1, many articles have estimated the first-difference regression defined below.

REGRESSION 2 (First-Difference Regression): Let $\hat{\beta}_{fd}$ denote the coefficient of $D_{g,t} - D_{g,t-1}$ in an OLS regression of $Y_{g,t} - Y_{g,t-1}$ on period fixed effects and $D_{g,t} - D_{g,t-1}$, among observations for which $t \geq 2$. Let $\beta_{fd} = E[\hat{\beta}_{fd}]$.

When $T = 2$ and $N_{g,2}/N_{g,1}$ does not vary across $g$, meaning that all groups experience the same growth of their number of units from period 1 to 2, one can show that $\hat{\beta}_{fe} = \hat{\beta}_{fd}$. But $\hat{\beta}_{fe}$ differs from $\hat{\beta}_{fd}$ if $T > 2$ or $N_{g,2}/N_{g,1}$ varies across $g$.
We start by showing that a result similar to Theorem 1 also applies to $\hat{\beta}_{fd}$. For any $(g,t) \in \{1, \ldots, G\} \times \{2, \ldots, T\}$, let $\varepsilon_{fd,g,t}$ denote the residual of observations in group $g$ and at period $t$ in the regression of $D_{g,t} - D_{g,t-1}$ on period fixed effects, among observations for which $t \geq 2$. For any $g \in \{1, \ldots, G\}$, let $\varepsilon_{fd,g,1} = \varepsilon_{fd,g,T+1} = 0$. One can show that if the regressors in Regression 2 are not perfectly collinear,

$$\sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1} \left( \varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\, \varepsilon_{fd,g,t+1} \right) \neq 0.$$

Then we define

$$w_{fd,g,t} = \frac{\varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\, \varepsilon_{fd,g,t+1}}{\sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1} \left( \varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\, \varepsilon_{fd,g,t+1} \right)}.$$

THEOREM 2: Suppose that Assumptions 1–5 hold. Then,

$$\beta_{fd} = E\left[ \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{fd,g,t}\, \Delta_{g,t} \right].$$

Theorem 2 shows that under Assumption 5, $\beta_{fd}$ is equal to a weighted sum of the ATEs in each treated $(g,t)$ cell with potentially some strictly negative weights, just as $\beta_{fe}$. We now characterize the $(g,t)$ cells whose ATEs are weighted negatively by $\beta_{fd}$. To do so, we focus on staggered adoption designs, as outside of this case it is more difficult to characterize those cells. Our characterization relies on the fact that for every $t \in \{2, \ldots, T\}$, $\varepsilon_{fd,g,t} = D_{g,t} - D_{g,t-1} - (D_{.,t} - D_{.,t-1})$. Here, $\varepsilon_{fd,g,t}$ is the difference between the change of the treatment in group $g$ between $t-1$ and $t$, and the average change of the treatment across all groups.

PROPOSITION 2: Suppose that Assumptions 1–2 and 6 hold and for all $g$, $N_{g,t}$ does not depend on $t$. Then, for all $(g,t)$ such that $D_{g,t} = 1$, $w_{fd,g,t} < 0$ if and only if $D_{g,t-1} = 1$ and $D_{.,t} - D_{.,t-1} > D_{.,t+1} - D_{.,t}$ (with the convention that $D_{.,T+1} = D_{.,T}$).

Proposition 2 shows that for all $t \in \{2, \ldots, T-1\}$ such that the increase in the proportion of treated units is larger from $t-1$ to $t$ than from $t$ to $t+1$, the period-$t$ ATE of groups already treated in $t-1$ receives a negative weight. Moreover, if, at period $T$, at least one group becomes treated, the ATE of groups already treated in $T-1$ also receives a negative weight. Therefore, the treatment effect arising at the date when a group starts receiving the treatment does not receive a negative weight; only long-run treatment effects do. Negative weights are thus a concern when instantaneous and long-run treatment effects may differ. Proposition 2 also shows that the prevalence of negative weights depends on how the number of groups that start receiving the treatment at date $t$ evolves with $t$. Assume for instance that this number decreases with $t$: many groups start receiving the treatment at date 1, slightly fewer start at date 2, etc., a case hereafter referred to as the “more early adopters” case. Then, if $N_{g,t}$ is constant across $(g,t)$, $D_{.,t} - D_{.,t-1}$ is decreasing in $t$, and all the long-run treatment effects receive negative weights, except maybe those of period $T$ if $D_{.,T} = D_{.,T-1}$. Conversely, assume that the number of groups that start receiving the treatment at date $t$ increases with $t$: few groups start receiving the treatment at date 1, slightly more start at date 2, etc., a case hereafter referred to as the “more late adopters” case. Then, if $N_{g,t}$ is constant across $(g,t)$, $D_{.,t} - D_{.,t-1}$ is increasing in $t$, and only the period-$T$ long-run treatment effects receive negative weights. Overall, negative weights are much more prevalent in the “more early adopters” than in the “more late adopters” case.
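
The characterization in Proposition 2 can be checked on a toy staggered design. The sketch below is only an illustration under stated assumptions: one observation per cell ($N_{g,t} = 1$), so that the residual from the regression on period fixed effects is the treatment change minus its cross-group average; the function name `fd_weights` and the data layout are hypothetical.

```python
def fd_weights(D):
    """Weights w_{fd,g,t} of the treated cells when N_{g,t} = 1 for all cells.

    D maps each group to its list of 0/1 treatments for periods 1..T.
    Returns a dict {(group, period): weight}, with periods numbered from 1.
    """
    groups = list(D)
    G = len(groups)
    T = len(D[groups[0]])

    # eps[g][t] for t = 1..T+1, with eps[g][1] = eps[g][T+1] = 0 by convention.
    eps = {g: [0.0] * (T + 2) for g in groups}
    for t in range(2, T + 1):
        mean_change = sum(D[g][t - 1] - D[g][t - 2] for g in groups) / G
        for g in groups:
            eps[g][t] = (D[g][t - 1] - D[g][t - 2]) - mean_change

    treated = [(g, t) for g in groups for t in range(1, T + 1) if D[g][t - 1] == 1]
    n1 = len(treated)  # N_1 when every cell holds one observation
    numerators = {(g, t): eps[g][t] - eps[g][t + 1] for (g, t) in treated}
    denominator = sum(numerators.values()) / n1
    return {cell: value / denominator for cell, value in numerators.items()}


# Group A adopts at period 2, group B at period 3, group C never does.
design = {"A": [0, 1, 1], "B": [0, 0, 1], "C": [0, 0, 0]}
for cell, weight in fd_weights(design).items():
    print(cell, round(weight, 4))
```

In this design the only negative weight falls on cell (A, 3): group A was already treated at period 2, and the share of treated groups grows by 1/3 from period 2 to 3 but, by convention, by 0 afterwards, which is the condition stated in Proposition 2.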

We now come back to general sharp designs where the treatment may not follow a staggered adoption. Let $\tilde{\beta}_{fd} = E(\hat{\beta}_{fd} \mid D)$ denote the expectation of $\hat{\beta}_{fd}$ conditional on the vector of treatment assignments $D$. Just as for $\tilde{\beta}_{fe}$, one can show that the minimal value of $\sigma(\tilde{\Delta})$ compatible with $\tilde{\beta}_{fd}$ and $\tilde{\Delta}^{TR} = 0$ is $\underline{\sigma}_{fd} = |\tilde{\beta}_{fd}|/\sigma(w_{fd})$, where

$$\sigma(w_{fd}) = \left( \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1} \left( w_{fd,g,t} - 1 \right)^2 \right)^{1/2}$$

is the standard deviation of the $w_{fd}$-weights. One can also show that $\underline{\underline{\sigma}}_{fd}$, the minimal value of $\sigma(\tilde{\Delta})$ compatible with $\tilde{\beta}_{fd}$ and $\tilde{\Delta}_{g,t}$ of a different sign than $\tilde{\beta}_{fd}$ for all $(g,t)$, has the same expression as $\underline{\underline{\sigma}}_{fe}$, except that one needs to replace the weights $w_{g,t}$ by the weights $w_{fd,g,t}$ in its definition. Estimators of $\underline{\sigma}_{fe}$ and $\underline{\sigma}_{fd}$ (or $\underline{\underline{\sigma}}_{fe}$ and $\underline{\underline{\sigma}}_{fd}$) can then be used to determine which of $\hat{\beta}_{fe}$ or $\hat{\beta}_{fd}$ is more robust to heterogeneous treatment effects.
Finally, and similarly to the result shown in Corollary 2 for $\beta_{fe}$, $\beta_{fd}$ is equal to $\delta^{TR}$ under common trends and the following assumption.

ASSUMPTION 8 ($w_{fd}$ Uncorrelated with $\tilde{\Delta}$):

$$E\left[ \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1} \left( w_{fd,g,t} - 1 \right) \left( \Delta_{g,t} - \Delta^{TR} \right) \right] = 0.$$

Note that under the common trends assumption, one can jointly test Assumption 8 and Assumption 7, the assumption that the weights attached to $\beta_{fe}$ are uncorrelated with the $\Delta_{g,t}$ terms: if $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are significantly different, at least one of these two assumptions must fail. In the two applications we revisit in Section V, $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are significantly different.

III. An Alternative Estimator

In this section, we show that it is possible to estimate a well-defined causal effect even if treatment effects are heterogeneous across groups or over time. Let

$$\delta^{S} = E\left[ \frac{1}{N_S} \sum_{(i,g,t):\, t \geq 2,\, D_{g,t} \neq D_{g,t-1}} \left[ Y_{i,g,t}(1) - Y_{i,g,t}(0) \right] \right],$$

with $N_S = \sum_{(g,t):\, t \geq 2,\, D_{g,t} \neq D_{g,t-1}} N_{g,t}$. The term $\delta^{S}$ is the ATE of all switching cells. In staggered adoption designs, $\delta^{S}$ is the average of the treatment effect at the time when a group starts receiving the treatment, across all groups that become treated at some point.
We now show that $\delta^{S}$ can be unbiasedly estimated by a weighted average of DID estimators. This result holds under the following supplementary assumptions.

ASSUMPTION 9 (Strong Exogeneity for $Y(1)$): For all $(g,t) \in \{1, \ldots, G\} \times \{2, \ldots, T\}$, $E(Y_{g,t}(1) - Y_{g,t-1}(1) \mid D_{g,1}, \ldots, D_{g,T}) = E(Y_{g,t}(1) - Y_{g,t-1}(1))$.

Assumption 9 is the equivalent of Assumption 4, for the potential outcome with treatment. It requires that the shocks affecting a group’s $Y_{g,t}(1)$ be mean independent of that group’s treatment sequence.

ASSUMPTION 10 (Common Trends for $Y(1)$): For $t \geq 2$, $E(Y_{g,t}(1) - Y_{g,t-1}(1))$ does not vary across $g$.

Again, Assumption 10 is the equivalent of Assumption 5, for the potential outcome with treatment. It requires that between each pair of consecutive periods, the expectation of the outcome with treatment follow the same evolution over time in every group. Assumptions 9 and 10 ensure that one can reconstruct the potential outcome that groups leaving the treatment between $t-1$ and $t$ would have experienced if they had remained treated. In staggered adoption designs, Assumptions 9 and 10 are not necessary for identification, because no group leaves the treatment. Together, Assumptions 5 and 10 imply that the ATE follows the same evolution over time in every group: $E(\Delta_{g,t}) = \eta_t + \theta_g$.13 This still allows for heterogeneous treatment effects across groups and over time.14

ASSUMPTION 11 (Existence of “Stable” Groups): For all $t \in \{2, \ldots, T\}$:

(i) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-1} = 0$, $D_{g,t} = 1$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-1} = D_{g',t} = 0$.

(ii) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-1} = 1$, $D_{g,t} = 0$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-1} = D_{g',t} = 1$.

The first point of the stable groups assumption requires that between each pair of
consecutive time periods, if there is a “joiner” (i.e., a group switching from being
untreated to treated), then there should be another group that is untreated at both
dates. The second point requires that between each pair of consecutive time periods,
if there is a “leaver” (i.e., a group switching from being treated to untreated), then
there should be another group that is treated at both dates.
Notice that under Assumption 11, groups’ treatments are not indepen-
dent, so Assumption 3 cannot hold. Accordingly, we replace Assumption 3 by
Assumption 12. Assumption 12 requires that conditional on its own treatments, a
group’s outcomes be mean independent of the other groups’ treatments. It is weaker
than Assumption 3. Assumption 11 is necessary to show that our estimator is unbi-
ased, but it is not necessary to show that it is consistent. Accordingly, in Section 5 of
the online Appendix, we show that our estimator is consistent under Assumption 3.
For every $g \in \{1, \ldots, G\}$, let $D_g = (D_{g,1}, \ldots, D_{g,T})$.

ASSUMPTION 12 (Mean Independence between a Group’s Outcome and Other Groups’ Treatments): For all $g$ and $t$, $E(Y_{g,t}(0) \mid D) = E(Y_{g,t}(0) \mid D_g)$ and $E(Y_{g,t}(1) \mid D) = E(Y_{g,t}(1) \mid D_g)$.

13 It should be possible to weaken Assumptions 9–10, in particular to account for dynamic effects where $\Delta_{g,t}$ may depend on $(D_{g,1}, \ldots, D_{g,t-1})$. This introduces complications that are beyond the scope of this paper, but that we address in de Chaisemartin and D’Haultfœuille (2020a).
14 Imposing Assumptions 9 and 10 does not change the decompositions obtained in Theorems 1 and 2; $Y_{g,t}(1)$ is observed for all the treated $(g,t)$ cells entering these decompositions, so those assumptions do not bring identifying information for those cells.

We can now define our estimator. For all $t \in \{2, \ldots, T\}$ and for all $(d, d') \in \{0, 1\}^2$, let

$$N_{d,d',t} = \sum_{g:\, D_{g,t}=d,\, D_{g,t-1}=d'} N_{g,t} \qquad (3)$$

denote the number of observations with treatment $d'$ at period $t-1$ and $d$ at period $t$. Let

$$DID_{+,t} = \sum_{g:\, D_{g,t}=1,\, D_{g,t-1}=0} \frac{N_{g,t}}{N_{1,0,t}} \left( Y_{g,t} - Y_{g,t-1} \right) - \sum_{g:\, D_{g,t}=D_{g,t-1}=0} \frac{N_{g,t}}{N_{0,0,t}} \left( Y_{g,t} - Y_{g,t-1} \right),$$

$$DID_{-,t} = \sum_{g:\, D_{g,t}=D_{g,t-1}=1} \frac{N_{g,t}}{N_{1,1,t}} \left( Y_{g,t} - Y_{g,t-1} \right) - \sum_{g:\, D_{g,t}=0,\, D_{g,t-1}=1} \frac{N_{g,t}}{N_{0,1,t}} \left( Y_{g,t} - Y_{g,t-1} \right).$$

Note that $DID_{+,t}$ is not defined when there is no group such that $D_{g,t} = 1$, $D_{g,t-1} = 0$, or no group such that $D_{g,t} = 0$, $D_{g,t-1} = 0$. In such instances, we let $DID_{+,t} = 0$. Similarly, let $DID_{-,t} = 0$ when there is no group such that $D_{g,t} = 1$, $D_{g,t-1} = 1$ or no group such that $D_{g,t} = 0$, $D_{g,t-1} = 1$. Finally, let

$$DID_M = \sum_{t=2}^{T} \left( \frac{N_{1,0,t}}{N_S}\, DID_{+,t} + \frac{N_{0,1,t}}{N_S}\, DID_{-,t} \right).$$

THEOREM 3: If Assumptions 1, 2, 4, 5, and 9–12 hold, $E[DID_M] = \delta^{S}$.
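
The definition of $DID_M$ translates almost line by line into code. The sketch below is an illustration on cell-level data, not the implementation used by the fuzzydid or did_multiplegt packages; the function name `did_m` and the dictionary-of-lists layout are hypothetical, and list index 0 stands for period 1.

```python
def did_m(Y, D, N):
    """Illustrative DID_M on cell-level data.

    Y, D, N map each group to lists (one entry per period) of mean outcomes,
    0/1 treatments, and numbers of observations.
    """
    groups = list(D)
    T = len(D[groups[0]])

    def mean_change(gs, t):
        # N_{g,t}-weighted average of Y_{g,t} - Y_{g,t-1} over the groups in gs
        total = sum(N[g][t] for g in gs)
        return sum(N[g][t] * (Y[g][t] - Y[g][t - 1]) for g in gs) / total

    weighted_sum, n_s = 0.0, 0.0
    for t in range(1, T):  # list index t corresponds to periods 2..T
        joiners = [g for g in groups if D[g][t] == 1 and D[g][t - 1] == 0]
        leavers = [g for g in groups if D[g][t] == 0 and D[g][t - 1] == 1]
        stay0 = [g for g in groups if D[g][t] == 0 and D[g][t - 1] == 0]
        stay1 = [g for g in groups if D[g][t] == 1 and D[g][t - 1] == 1]
        n_10 = sum(N[g][t] for g in joiners)
        n_01 = sum(N[g][t] for g in leavers)
        # Convention from the text: a DID term is set to 0 when undefined.
        did_plus = (mean_change(joiners, t) - mean_change(stay0, t)) if joiners and stay0 else 0.0
        did_minus = (mean_change(stay1, t) - mean_change(leavers, t)) if leavers and stay1 else 0.0
        weighted_sum += n_10 * did_plus + n_01 * did_minus
        n_s += n_10 + n_01
    if n_s == 0:
        raise ValueError("no switching cell: DID_M is not defined")
    return weighted_sum / n_s


# Toy staggered design with common trends and heterogeneous effects:
# A joins at period 2 (effect 2), B joins at period 3 (effect 4), C never joins.
D = {"A": [0, 1, 1], "B": [0, 0, 1], "C": [0, 0, 0]}
Y = {"A": [1, 4, 14], "B": [2, 3, 9], "C": [3, 4, 6]}
N = {g: [1, 1, 1] for g in D}
print(did_m(Y, D, N))  # 3.0
```

In this noiseless example the result equals the average effect at the time of switching, (2 + 4)/2 = 3, even though group A’s period-3 effect (10 in the data above) differs from its period-2 effect.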

In online Appendix Section 5, we also show that when $G$ goes to infinity, $DID_M$ is a consistent and asymptotically normal estimator of $\delta^{S}$. The $DID_M$ estimator is computed by the fuzzydid and did_multiplegt Stata packages.
Here is the intuition underlying Theorem 3. The estimator $DID_{+,t}$ compares the evolution of the mean outcome between $t-1$ and $t$ in two sets of groups: the joiners, and those remaining untreated. Under Assumptions 4 and 5, $DID_{+,t}$ estimates the joiners’ treatment effect. Similarly, $DID_{-,t}$ compares the evolution of the outcome between $t-1$ and $t$ in two sets of groups: those remaining treated, and the leavers. Under Assumptions 9 and 10, it estimates the leavers’ treatment effect. Finally, $DID_M$ is a weighted average of those DID estimators. Note that in staggered designs, there are no groups whose treatment decreases over time, so $DID_M$ is only a weighted average of the $DID_{+,t}$ estimators. Note also that one can separately estimate the joiners’ and the leavers’ treatment effect, by computing separately weighted averages of the $DID_{+,t}$ and $DID_{-,t}$ estimators. The former estimator only relies on Assumptions 4 and 5, while the latter only relies on Assumptions 9 and 10.
Note that $DID_M$ is related to two other estimators. First, it is related to the Wald-TC estimator in point 2 of Theorem S1 in the online Appendix of de Chaisemartin and D’Haultfœuille (2018), but the weighting of $DID_{+,t}$ and $DID_{-,t}$ therein differs. As a result, $DID_M$ estimates $\Delta^{S}$ under weaker assumptions. Second, $DID_M$ is related to the multiperiod DID estimator in Imai and Kim (2018). However, the multiperiod DID estimator is a weighted average of the $DID_{+,t}$, so it does not estimate the leavers’ treatment effect, and applies to a smaller population. Besides, Imai and Kim (2018) do not establish the properties of their estimator. Finally, they do not generalize it to nonbinary treatments, something we do in online Appendix Section 4.
There may be a bias-variance trade-off between $DID_M$ and the two-way fixed effects regression estimators. For instance, assume that Regression 1 is correctly specified:

$$Y_{g,t}(0) = \alpha_g + \lambda_t + \varepsilon_{g,t},$$

$$Y_{g,t}(1) - Y_{g,t}(0) = \delta,$$

$$E(\varepsilon_{g,t} \mid D) = 0.$$

Then, if the errors $\varepsilon_{g,t}$ are homoskedastic and uncorrelated, it follows from the Gauss-Markov theorem that $\hat{\beta}_{fe}$ is the unbiased linear estimator of $\delta$, the constant treatment effect parameter, with the lowest variance. As $DID_M$ is also an unbiased linear estimator of $\delta$, the variance of $\hat{\beta}_{fe}$ must be lower than that of $DID_M$. With heteroskedastic or correlated errors, one can construct examples where the variance of $\hat{\beta}_{fe}$ is higher than that of $DID_M$, but this still suggests that $DID_M$ may often have a larger variance than $\hat{\beta}_{fe}$, as we find in our applications in Section V.
Note that $DID_M$ uses groups whose treatment is stable to infer the trends that would have affected switchers if their treatment had not changed. This strategy could fail if switchers experience different trends than groups whose treatment is stable. To assess whether this is a serious concern, we propose to use the following placebo estimator, which essentially compares the outcome’s evolution from $t-2$ to $t-1$, in groups that switch and do not switch treatment between $t-1$ and $t$. This placebo estimator is defined under a modified version of Assumption 11.

ASSUMPTION 13 (Existence of “Stable” Groups for the Placebo Test): For all $t \in \{3, \ldots, T\}$:

(i) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-2} = D_{g,t-1} = 0$ and $D_{g,t} = 1$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-2} = D_{g',t-1} = D_{g',t} = 0$.

(ii) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-2} = D_{g,t-1} = 1$, $D_{g,t} = 0$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-2} = D_{g',t-1} = D_{g',t} = 1$.

For all $t \in \{2, \ldots, T\}$ and for all $(d, d', d'') \in \{0, 1\}^3$, let

$$N_{d,d',d'',t} = \sum_{g:\, D_{g,t}=d,\, D_{g,t-1}=d',\, D_{g,t-2}=d''} N_{g,t}$$

denote the number of observations with treatment status $d''$ at period $t-2$, $d'$ at period $t-1$, and $d$ at period $t$. Let

$$N_S^{pl} = \sum_{(g,t):\, t \geq 3,\, D_{g,t} \neq D_{g,t-1} = D_{g,t-2}} N_{g,t},$$

$$DID^{pl}_{+,t} = \sum_{g:\, D_{g,t}=1,\, D_{g,t-1}=D_{g,t-2}=0} \frac{N_{g,t}}{N_{1,0,0,t}} \left( Y_{g,t-1} - Y_{g,t-2} \right) - \sum_{g:\, D_{g,t}=D_{g,t-1}=D_{g,t-2}=0} \frac{N_{g,t}}{N_{0,0,0,t}} \left( Y_{g,t-1} - Y_{g,t-2} \right),$$

$$DID^{pl}_{-,t} = \sum_{g:\, D_{g,t}=D_{g,t-1}=D_{g,t-2}=1} \frac{N_{g,t}}{N_{1,1,1,t}} \left( Y_{g,t-1} - Y_{g,t-2} \right) - \sum_{g:\, D_{g,t}=0,\, D_{g,t-1}=D_{g,t-2}=1} \frac{N_{g,t}}{N_{0,1,1,t}} \left( Y_{g,t-1} - Y_{g,t-2} \right).$$

When there is no group such that $D_{g,t} = 1$, $D_{g,t-1} = D_{g,t-2} = 0$ or no group such that $D_{g,t} = D_{g,t-1} = D_{g,t-2} = 0$, we let $DID^{pl}_{+,t} = 0$, and we adopt the same convention for $DID^{pl}_{-,t}$. Let

$$DID^{pl}_M = \sum_{t=3}^{T} \left( \frac{N_{1,0,0,t}}{N_S^{pl}}\, DID^{pl}_{+,t} + \frac{N_{0,1,1,t}}{N_S^{pl}}\, DID^{pl}_{-,t} \right).$$

THEOREM 4: If Assumptions 1, 2, 4, 5, 9, 10, 12, and 13 hold, then $E[DID^{pl}_M] = 0$.

The $DID^{pl}_{+,t}$ estimator compares the evolution of the mean outcome from $t-2$ to $t-1$ in two sets of groups: those untreated at $t-2$ and $t-1$ but treated at $t$, and those untreated at $t-2$, $t-1$, and $t$. If Assumptions 4 and 5 hold, then $E[DID^{pl}_{+,t}] = 0$. Similarly, if Assumptions 9 and 10 hold, $E[DID^{pl}_{-,t}] = 0$. Then, $E[DID^{pl}_M] = 0$ is a testable implication of Assumptions 4, 5, 9, and 10, so finding $DID^{pl}_M$ significantly different from 0 would imply that those assumptions are violated: groups that switch treatment experience different trends before that switch than the groups used to reconstruct their counterfactual trends when they switch.15
Note that $DID^{pl}_M$ compares the trends of switching and stable groups one period before the switch. One can define other placebo estimators comparing those trends, say, two or three periods before the switch. The $DID^{pl}_M$ estimator and all those other placebo estimators are computed by the did_multiplegt Stata package.
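
The placebo estimator uses the same building blocks as $DID_M$, shifted one period back. The sketch below is an illustration on cell-level data under the same hypothetical layout as before (dictionaries of per-period lists, index 0 for period 1); the function name `did_m_placebo` is an assumption, and this is not the did_multiplegt implementation.

```python
def did_m_placebo(Y, D, N):
    """Illustrative placebo estimator: pre-switch trends of switchers vs. stable groups."""
    groups = list(D)
    T = len(D[groups[0]])

    def pick(t, d, d1, d2):
        # Groups with treatment d at t, d1 at t-1, and d2 at t-2
        return [g for g in groups if (D[g][t], D[g][t - 1], D[g][t - 2]) == (d, d1, d2)]

    def mean_pre_change(gs, t):
        # N_{g,t}-weighted average of Y_{g,t-1} - Y_{g,t-2} over the groups in gs
        total = sum(N[g][t] for g in gs)
        return sum(N[g][t] * (Y[g][t - 1] - Y[g][t - 2]) for g in gs) / total

    weighted_sum, n_s = 0.0, 0.0
    for t in range(2, T):  # list index t corresponds to periods 3..T
        joiners, stay0 = pick(t, 1, 0, 0), pick(t, 0, 0, 0)
        leavers, stay1 = pick(t, 0, 1, 1), pick(t, 1, 1, 1)
        n_100 = sum(N[g][t] for g in joiners)
        n_011 = sum(N[g][t] for g in leavers)
        plus = (mean_pre_change(joiners, t) - mean_pre_change(stay0, t)) if joiners and stay0 else 0.0
        minus = (mean_pre_change(stay1, t) - mean_pre_change(leavers, t)) if leavers and stay1 else 0.0
        weighted_sum += n_100 * plus + n_011 * minus
        n_s += n_100 + n_011
    if n_s == 0:
        raise ValueError("no cell enters the placebo comparison")
    return weighted_sum / n_s


D = {"A": [0, 1, 1], "B": [0, 0, 1], "C": [0, 0, 0]}
N = {g: [1, 1, 1] for g in D}
# Common trends before B's switch: the placebo is 0.
print(did_m_placebo({"A": [1, 4, 14], "B": [2, 3, 9], "C": [3, 4, 6]}, D, N))
# B already trends differently from C before switching: the placebo is 0.5.
print(did_m_placebo({"A": [1, 4, 14], "B": [2, 3.5, 9], "C": [3, 4, 6]}, D, N))
```

Only group B enters the comparison here, because group A switched at period 2 and no pre-switch pair of periods is available for it.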

15
See also Callaway and Sant’Anna (2018), which proposes another placebo test in staggered adoption designs.

IV. Extensions

In this section, we briefly review some of the extensions in our online Appendix. First, we show that the decomposition of $\beta_{fe}$ in Theorem 1 can be extended to fuzzy designs where the treatment varies within $(g,t)$ cells and to applications with a nonbinary treatment.16 In fuzzy designs or with a nonbinary treatment, the weights in Theorem 1 remain essentially unchanged.
We also consider two-way fixed effects regressions with covariates. Specifically, we study the coefficient of $D_{g,t}$ in a regression of $Y_{i,g,t}$ on group and period fixed effects, $D_{g,t}$, and a vector of covariates $X_{g,t}$. We show that a result very similar to Theorem 1 applies to that coefficient, up to two differences. First, including covariates allows for different trends across groups, provided those differential trends are fully accounted for by a linear model in $X_{g,t} - X_{g,t-1}$, the change in a group’s covariates. Specifically, instead of Assumptions 4 and 5, one needs to assume that

$$E(Y_{g,t}(0) \mid D_g, X_g) - E(Y_{g,t-1}(0) \mid D_g, X_g) = (X_{g,t} - X_{g,t-1})'\gamma + \lambda_t,$$

for some vector $\gamma$ and constant $\lambda_t$, and where $X_g = (X_{g,1}, \ldots, X_{g,T})$. Importantly, when the covariates are group-specific linear trends, the equation above is equivalent to

$$E(Y_{g,t}(0) \mid D_g, X_g) - E(Y_{g,t-1}(0) \mid D_g, X_g) = \gamma_g + \lambda_t,$$

meaning that from $t-1$ to $t$, the evolution of $Y(0)$ in group $g$ should deviate from its group-specific linear trend $\gamma_g$ by an amount $\lambda_t$ common to all groups. Second, the residual $\varepsilon_{g,t}$ in the weights in Theorem 1 has to be replaced by $\varepsilon^{X}_{g,t}$, the residual of observations in cell $(g,t)$ in the regression of $D_{g,t}$ on group and period fixed effects and $X_{g,t}$. Some of the corresponding weights may still be negative, as in Theorem 1. Overall, two-way fixed effects regressions with covariates may rely on a more plausible common trends assumption than those without covariates, but they still require that the treatment effect be homogeneous, across time and between groups.
Third, we show that under the common trends assumption and the assumption that the ATE of a $(g,t)$ cell does not change over time, $\beta_{fe}$ and $\beta_{fd}$ identify weighted sums of the ATEs of the $(g,t)$ cells whose treatment changes between $t-1$ and $t$. In sharp designs, the weights attached to $\beta_{fd}$ are all positive, while for $\beta_{fe}$, the same only holds in staggered adoption designs.
Fourth, we show that our $DID_M$ estimator can easily be extended to nonbinary, discrete treatments. Then, we define it as a weighted average of DID terms comparing the evolution of the outcome in groups whose treatment went from $d$ to $d'$ between $t-1$ and $t$ and in groups with a treatment of $d$ at both dates, across all possible values of $d$, $d'$, and $t$.
Finally, our twowayfeweights, fuzzydid, and did_multiplegt Stata packages can handle all of those extensions.

16 The decomposition of $\beta_{fd}$ in Theorem 2 can also be extended to all of those cases.

Table 1—Papers Using Two-Way Fixed Effects Regressions Published in the AER

                                                          2010    2011    2012    Total
Papers using two-way fixed effects regressions               5      14      14       33
Percent of published papers                                5.2    12.2    11.2      9.8
Percent of empirical papers, excluding lab experiments    12.8    23.0    19.2     19.1

Note: This table reports the number of papers using two-way fixed effects regressions published in the AER from 2010 to 2012.

Table 2—Descriptive Statistics on Two-Way Fixed Effects Papers

                                                                                      Number of papers
Panel A. Estimation method
  Fixed effects OLS regression                                                               13
  First-difference OLS regression                                                             6
  Fixed effects or first-difference OLS regression, with several treatment variables         6
  Fixed effects or first-difference 2SLS regression                                           3
  Other regression                                                                            5

Panel B. Research design
  Sharp design                                                                               26
  Fuzzy design                                                                                7

Panel C. Are there stable groups?
  Yes                                                                                        12
  Presumably yes                                                                             14
  Presumably no                                                                               5
  No                                                                                          2

Note: This table reports the estimation method and the research design used in the 33 papers using two-way fixed effects regressions published in the AER from 2010 to 2012, and whether those papers have stable groups.

V. Applicability, and Applications

A. Applicability

We conducted a review of all papers published in the American Economic Review (AER) between 2010 and 2012 to assess the importance of two-way fixed effects regressions in economics. Over these three years, the AER published 337 papers. Out of these 337 papers, 33, or 9.8 percent, estimate the FE or FD regression, or other regressions closely resembling them. When one removes theory papers and lab experiments from the denominator, the proportion of papers using these regressions rises to 19.1 percent.
Table 2 shows descriptive statistics about the 33 2010–2012 AER papers estimating two-way fixed effects regressions. Panel A shows that 13 use the FE regression; 6 use the FD regression; 6 use the FE or FD regression with several treatment variables; 3 use the FE or FD 2SLS regression discussed in online Appendix Section 3.4; 5 use other regressions that we deemed sufficiently close to the FE or FD regression to include them in our count.17 Panel B shows that more than three-fourths of those papers consider sharp designs, while less than one-fourth consider fuzzy designs. Finally, panel C assesses whether, in those applications, there are groups whose exposure to the treatment remains stable between each pair of consecutive time periods, the condition that has to be met to be able to compute the $DID_M$ estimator. For about one-half of the papers, reading the paper was not enough to assess this with certainty. We then assessed whether they presumably have stable groups. Overall, 12 papers have stable groups, 14 presumably have stable groups, 5 presumably do not have stable groups, and 2 do not have stable groups.
In online Appendix Section 6, we review each of the 33 papers. We explain where two-way fixed effects regressions are used in the paper, and we detail our assessment of whether the design is a sharp or a fuzzy design, and of whether the stable groups assumption holds.

17 For instance, two papers use regressions with three-way fixed effects instead of two-way fixed effects.

B. Application to Gentzkow, Shapiro, and Sinkinson (2011)

Gentzkow, Shapiro, and Sinkinson (2011) studies the effect of newspapers on voters’ turnout in US presidential elections between 1868 and 1928. They regress the first difference of the turnout rate in county $g$ between election years $t-1$ and $t$ on state-year fixed effects and on the first difference of the number of newspapers available in that county. This corresponds to Regression 2, with state-year fixed effects as controls. As reproduced in Table 3, Gentzkow, Shapiro, and Sinkinson (2011) finds that $\hat{\beta}_{fd} = 0.0026$ (standard error $= 9 \times 10^{-4}$). According to this regression, one more newspaper increased voters’ turnout by 0.26 percentage points. On the other hand, $\hat{\beta}_{fe} = -0.0011$ (standard error $= 0.0011$). Here, $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are significantly different ($t$-statistic = 2.86).
We use the twowayfeweights Stata package, downloadable with its help file from the SSC repository, to estimate the weights attached to $\hat{\beta}_{fe}$. We find that 60 percent are strictly positive, 40 percent are strictly negative. The negative weights sum to −0.53. We find $\hat{\underline{\sigma}}_{fe} = 3 \times 10^{-4}$, meaning that $\beta_{fe}$ and the ATT may be of opposite signs if the standard deviation of the ATEs across all the treated $(g,t)$ cells is equal to 0.0003.18 Further, $\hat{\underline{\underline{\sigma}}}_{fe} = 7 \times 10^{-4}$, meaning that $\beta_{fe}$ may be of a different sign than the ATEs of all the treated $(g,t)$ cells if the standard deviation of those ATEs is equal to 0.0007. We also estimate the weights attached to $\hat{\beta}_{fd}$. Here, 54 percent are strictly positive, and 46 percent are strictly negative. The negative weights sum to −1.43. We find $\hat{\underline{\sigma}}_{fd} = 4 \times 10^{-4}$, and $\hat{\underline{\underline{\sigma}}}_{fd} = 6 \times 10^{-4}$.
Therefore, $\beta_{fe}$ and $\beta_{fd}$ can only receive a causal interpretation if the weights attached to them are uncorrelated with the intensity of the treatment effect in each county × election-year cell (Assumptions 7 and 8, respectively). This is not warranted. First, as $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ significantly differ, Assumptions 7 and 8 cannot jointly hold. Moreover, the weights attached to $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are correlated with variables that are likely to be themselves associated with the intensity of the treatment effect in each cell. For instance, the correlation between the weights attached to $\hat{\beta}_{fd}$ and $t$, the year variable, is equal to −0.06 ($t$-statistic = −3.28). The effect of newspapers may be different in the last than in the first years of the panel. For instance, new means of communication, like the radio, appear at the end of the period under consideration, and may diminish the effect of newspapers.19 This would lead to a violation of Assumption 8.

18 The number of newspapers is not binary, so strictly speaking, in this application the parameter of interest is the average causal response parameter introduced in online Appendix Section 3.2, rather than the ATT.

Table 3—Estimates of the Effect of One Additional Newspaper on Turnout

                                     Estimate    Standard error    Observations
$\hat{\beta}_{fd}$                     0.0026        0.0009           15,627
$\hat{\beta}_{fe}$                    −0.0011        0.0011           16,872
$DID_M$                                0.0043        0.0014           16,872
$DID^{pl}_M$                          −0.0009        0.0016           13,221
$DID_M$, on placebo subsample          0.0045        0.0019           13,221

Notes: This table reports estimates of the effect of one additional newspaper on turnout, as well as a placebo estimate of the common trends assumption underlying $DID_M$. Estimators are computed using the data of Gentzkow, Shapiro, and Sinkinson (2011), with state-year fixed effects as controls. Standard errors are clustered by county. To compute the $DID_M$ estimators, the number of newspapers is grouped into 4 categories: 0, 1, 2, and more than 3.
The stable groups assumption holds: between each pair of consecutive elections, there are counties where the number of newspapers does not change. We use the fuzzydid Stata package, downloadable with its help file from the SSC repository, to estimate a modified version of our $\text{DID}_M$ estimator that accounts for the fact that the number of newspapers is not binary (see online Appendix Section 3.2, where we define this modified estimator). We include state-year fixed effects as controls in our estimation. We find that $\text{DID}_M = 0.0043$, with a standard error of 0.0014. Therefore, $\text{DID}_M$ is 66 percent larger than $\hat{\beta}_{fd}$, and the two estimators are significantly different at the 10 percent level (t-statistic = 1.77); $\text{DID}_M$ is also of a different sign than $\hat{\beta}_{fe}$.
Our $\text{DID}_M$ estimator only relies on a common trends assumption. To assess its plausibility, we compute $\text{DID}_M^{pl}$, the placebo estimator introduced in Section III.²⁰ As shown in Table 3, our placebo estimator is small and not significantly different from 0, meaning that counties where the number of newspapers increased or decreased between $t-1$ and $t$ did not experience significantly different trends in turnout from $t-2$ to $t-1$ than counties where that number was stable. Our placebo estimator is estimated on a subset of the data: for each pair of consecutive time periods $t-1$ and $t$, we only keep counties where the number of newspapers did not change between $t-2$ and $t-1$. Still, almost 80 percent of the county × election-year observations are used in the computation of the placebo estimator. Moreover, when reestimated on this subsample, the $\text{DID}_M$ estimator is very close to the $\text{DID}_M$ estimator in the full sample.

C. The Effect of Union Membership on Wages

A number of articles have estimated the effect of union membership on wages using panel data and controlling for workers’ fixed effects. For instance, Jakubson

19 In fact, Gentzkow, Shapiro, and Sinkinson (2011) analyzes the 1868 to 1928 period separately from later periods, because the growth of the radio may have changed newspapers’ effects.
20 Again, we need to slightly modify $\text{DID}_M^{pl}$ to account for the fact that the number of newspapers is not binary.

Table 4—Estimates of the Union Premium

Estimator                  Estimate   Standard error   Observations
$\hat{\beta}_{fe}$           0.107        0.030            4,360
$\hat{\beta}_{fd}$           0.060        0.032            3,815
$\text{DID}_M$               0.041        0.034            3,815
$\text{DID}_M^{pl}$          0.094        0.038            3,101
$\text{DID}_M^{pl,2}$       −0.041        0.030            2,458
$\text{DID}_M^{pl,3}$       −0.004        0.033            1,881

Notes: This table reports estimates of the union premium, as well as placebo estimators of the common trends assumption. Estimators are computed using the data of Vella and Verbeek (1998). Standard errors are clustered at the worker level.

(1991) has found an 8.3 percent union membership premium using that strategy, in a sample of American males from the PSID followed from 1976 to 1980. Vella and Verbeek (1998) estimates a similar regression and finds similar results, in a sample of young American males from the NLSY followed from 1980 to 1987.²¹
We use the data in Vella and Verbeek (1998) to compute various estimators of the union wage premium. As union status is often measured with error (see, e.g., Freeman 1984; Card 1996), we discard changes in union status happening twice in three consecutive years. Specifically, for individuals with $D_{i,t-1} = 0$, $D_{i,t} = 1$, and $D_{i,t+1} = 0$, we replace $D_{i,t}$ by 0. Similarly, for individuals with $D_{i,t-1} = 1$, $D_{i,t} = 0$, and $D_{i,t+1} = 1$, we replace $D_{i,t}$ by 1. Doing so, we discard half of the union status changes in the initial data.²²
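The recoding rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' replication code: the function names `smooth_isolated_switches` and `n_switches` are hypothetical, and the sketch assumes that the three-year patterns are checked on the original sequence rather than on the partially recoded one, a detail the text does not spell out.

```python
from typing import List


def smooth_isolated_switches(d: List[int]) -> List[int]:
    """Recode one-year blips (0,1,0 -> 0,0,0 and 1,0,1 -> 1,1,1) in a 0/1 sequence.

    Assumption: patterns are detected on the original sequence `d`,
    not on the sequence as it is being recoded.
    """
    out = list(d)
    for t in range(1, len(d) - 1):
        if d[t - 1] == 0 and d[t] == 1 and d[t + 1] == 0:
            out[t] = 0  # isolated year of membership is treated as measurement error
        elif d[t - 1] == 1 and d[t] == 0 and d[t + 1] == 1:
            out[t] = 1  # isolated year of non-membership is treated as measurement error
    return out


def n_switches(d: List[int]) -> int:
    """Number of status changes between consecutive years."""
    return sum(1 for a, b in zip(d, d[1:]) if a != b)


# Hypothetical worker observed over five years, with a one-year blip in year 2.
raw = [0, 1, 0, 0, 1]
clean = smooth_isolated_switches(raw)
```

In this toy sequence the blip in the second year is removed, while the switch in the last year is kept because no later year is available to contradict it.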
We start by estimating a two-way fixed effects regression of wages on union membership with worker and year fixed effects. Table 4 shows that $\hat{\beta}_{fe} = 0.107$ (standard error = 0.030), a result close to that of the worker fixed effects regressions in Jakubson (1991) and Vella and Verbeek (1998).
Then, we estimate the weights attached to $\hat{\beta}_{fe}$. Here, 820 are strictly positive, 196 are strictly negative, but the negative weights only sum to −0.01. Still, $\hat{\underline{\sigma}}_{fe} = 0.097$, meaning that $\beta_{fe}$ and the ATT may be of opposite signs if the standard deviation of the treatment effect across the unionized worker × year observations is equal to 0.097, a substantial but still possible amount of heterogeneity. The weights are negatively correlated with workers’ years of schooling (correlation = −0.12, t-statistic = −1.88). The union premium may be lower for more educated workers (see Freeman and Medoff 1984), as they may be less substitutable than less educated ones. Then, $\hat{\beta}_{fe}$ may overestimate $\delta^{TR}$, the average union premium across all unionized worker × year observations. We also find that $\hat{\beta}_{fd} = 0.060$ (standard error = 0.032) and that $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ significantly differ (t-statistic = 1.91),²³ thus casting further doubt on Assumptions 7 and 8.

21 The fixed effects regression is not the main specification in Vella and Verbeek (1998). The authors favor instead a dynamic selection model.
22 Keeping the original data does not change much the results presented below, except that the placebo estimator $\text{DID}_M^{pl,2}$ becomes significant.
23 The standard error of $\hat{\beta}_{fe} - \hat{\beta}_{fd}$ is computed with a worker-level clustered bootstrap.

The stable groups assumption holds: between each pair of consecutive years, there are workers whose union membership status does not change. We therefore compute our $\text{DID}_M$ estimator. Table 4 shows that it is equal to 0.041 (standard error = 0.034). In this case $\text{DID}_M$ is significantly different from $\hat{\beta}_{fe}$ (t-statistic = 2.60) and $\hat{\beta}_{fd}$ (t-statistic = 2.36).²⁴ As discussed in Section III, we can also estimate separately the union premium for workers joining and leaving a union, something that was previously done by Freeman (1984). The joiners’ effect estimate is equal to 0.059 (standard error = 0.053), the leavers’ effect is equal to 0.021 (standard error = 0.044), and the two estimates do not significantly differ (t-statistic = 0.55).
The estimator $\text{DID}_M$ relies on a common trends assumption. To assess its plausibility, we compute $\text{DID}_M^{pl}$, the placebo estimator introduced in Section III; $\text{DID}_M^{pl}$ compares the wage growth of workers changing and not changing their union status one period before that change. We also compute $\text{DID}_M^{pl,2}$ and $\text{DID}_M^{pl,3}$, two other placebo estimators performing the same comparison two and three periods before the change. As shown in Table 4, $\text{DID}_M^{pl}$ is large, positive, and significant (t-statistic = 2.49). On the other hand, $\text{DID}_M^{pl,2}$ and $\text{DID}_M^{pl,3}$ are smaller and insignificant. Workers that become unionized start experiencing a differential positive pretrend one year before becoming unionized. This differential pretrend mostly comes from union joiners: for them, the placebo estimator is equal to 0.119 (standard error = 0.051), while for union leavers the placebo is smaller (0.061) and insignificant (standard error = 0.057). Therefore, the placebos suggest that even the already small and insignificant $\text{DID}_M$ estimator may overestimate the union premium, due to a positive pretrend. In fact, the estimate of leavers’ effect, for which there is no evidence of a pretrend, is very close to 0. Overall, our results indicate that there may not be a significant union wage premium.

VI. Conclusion

Almost 20 percent of empirical articles published in the AER between 2010 and 2012 use regressions with group and period fixed effects to estimate treatment effects. In this paper, we show that under a common trends assumption, those regressions estimate weighted sums of the treatment effect in each group and period. The weights may be negative: in one application, we find that more than 40 percent of the weights are negative. The negative weights are an issue when the treatment effect is heterogeneous, between groups or over time. Then, the treatment’s coefficient in those regressions may be negative while the treatment effect is positive in every group and time period. We therefore propose a new estimator to address this problem. This estimator estimates the treatment effect in the groups that switch treatment, at the time when they switch. It does not rely on any treatment effect homogeneity condition. It is computed by the fuzzydid and did_multiplegt Stata packages. In the two applications we revisit, this estimator is significantly and economically different from the two-way fixed effects estimators.

24 The standard errors of $\hat{\beta}_{fe} - \text{DID}_M$ and $\hat{\beta}_{fd} - \text{DID}_M$ are computed with a worker-level clustered bootstrap.

Appendix A. Proofs

One Useful Lemma

Our results rely on the following lemma.

LEMMA 1: If Assumptions 1–5 hold, for all $(g, g', t, t') \in \{1, \dots, G\}^2 \times \{1, \dots, T\}^2$,

$$\begin{aligned}
& E(Y_{g,t} \mid D) - E(Y_{g,t'} \mid D) - \left(E(Y_{g',t} \mid D) - E(Y_{g',t'} \mid D)\right) \\
&\quad = D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \left(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\right).
\end{aligned}$$

PROOF OF LEMMA 1:
For all $(g, t) \in \{1, \dots, G\} \times \{1, \dots, T\}$,

$$\begin{aligned}
E(Y_{g,t} \mid D) &= E\left(\frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t} \,\Big|\, D\right) \\
&= E\left(\frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} \left(Y_{i,g,t}(0) + D_{i,g,t}\left(Y_{i,g,t}(1) - Y_{i,g,t}(0)\right)\right) \,\Big|\, D\right) \\
&= E(Y_{g,t}(0) \mid D) + D_{g,t} E(\Delta_{g,t} \mid D) \\
&= E(Y_{g,t}(0) \mid D_g) + D_{g,t} E(\Delta_{g,t} \mid D),
\end{aligned}$$

where the third equality follows from Assumption 2, and the fourth from Assumption 3. Therefore,

$$\begin{aligned}
& E(Y_{g,t} \mid D) - E(Y_{g,t'} \mid D) - \left(E(Y_{g',t} \mid D) - E(Y_{g',t'} \mid D)\right) \\
&\quad = E(Y_{g,t}(0) - Y_{g,t'}(0) \mid D_g) - E(Y_{g',t}(0) - Y_{g',t'}(0) \mid D_{g'}) \\
&\qquad + D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \left(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\right) \\
&\quad = E(Y_{g,t}(0) - Y_{g,t'}(0)) - E(Y_{g',t}(0) - Y_{g',t'}(0)) \\
&\qquad + D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \left(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\right) \\
&\quad = D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \left(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\right),
\end{aligned}$$

where the second equality follows from Assumption 4, and the third from Assumption 5. ∎

PROOF OF THEOREM 1:
It follows from the Frisch-Waugh theorem and the definition of $\varepsilon_{g,t}$ that

$$E(\hat{\beta}_{fe} \mid D) = \frac{\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, E(Y_{g,t} \mid D)}{\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, D_{g,t}}. \tag{A1}$$

Now, by definition of $\varepsilon_{g,t}$ again,

$$\sum_{t=1}^{T} N_{g,t}\, \varepsilon_{g,t} = 0 \quad \text{for all } g \in \{1, \dots, G\}, \tag{A2}$$

$$\sum_{g=1}^{G} N_{g,t}\, \varepsilon_{g,t} = 0 \quad \text{for all } t \in \{1, \dots, T\}. \tag{A3}$$

Then,

$$\begin{aligned}
\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, E(Y_{g,t} \mid D)
&= \sum_{g,t} N_{g,t}\, \varepsilon_{g,t} \left(E(Y_{g,t} \mid D) - E(Y_{g,1} \mid D) - E(Y_{1,t} \mid D) + E(Y_{1,1} \mid D)\right) && \text{(A4)} \\
&= \sum_{g,t} N_{g,t}\, \varepsilon_{g,t} \left(D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,1} E(\Delta_{g,1} \mid D) - D_{1,t} E(\Delta_{1,t} \mid D) + D_{1,1} E(\Delta_{1,1} \mid D)\right) \\
&= \sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, D_{g,t}\, E(\Delta_{g,t} \mid D) \\
&= \sum_{(g,t): D_{g,t}=1} N_{g,t}\, \varepsilon_{g,t}\, E(\Delta_{g,t} \mid D). && \text{(A5)}
\end{aligned}$$

The first and third equalities follow from equations (A2) and (A3). The second equality follows from Lemma 1. The fourth equality follows from Assumption 2. Finally, Assumption 2 implies that

$$\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, D_{g,t} = \sum_{(g,t): D_{g,t}=1} N_{g,t}\, \varepsilon_{g,t}. \tag{A6}$$

Combining (A1), (A5), (A6) yields

$$E(\hat{\beta}_{fe} \mid D) = \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, E(\Delta_{g,t} \mid D). \tag{A7}$$

Then, the result follows from the law of iterated expectations. ∎
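Equations (A1), (A5), and (A6) suggest a direct way to compute the weights in (A7): take the residual $\varepsilon_{g,t}$ of a regression of $D_{g,t}$ on group and period fixed effects, weighted by $N_{g,t}$, and normalize it by $\sum_{(g,t): D_{g,t}=1} (N_{g,t}/N_1)\, \varepsilon_{g,t}$. The following Python sketch does this for a sharp design with cell-level data. It is a minimal illustration under stated assumptions, not the authors' Stata implementation: the function name `twfe_weights` and the toy design are hypothetical, and all cells are assumed to be non-empty.

```python
import numpy as np


def twfe_weights(D, N):
    """Residuals eps[g, t] and weights w[g, t] (NaN for untreated cells).

    D: (G, T) array of 0/1 cell-level treatments. N: (G, T) array of cell sizes (> 0).
    """
    D = np.asarray(D, dtype=float)
    N = np.asarray(N, dtype=float)
    G, T = D.shape
    g_idx = np.repeat(np.arange(G), T)   # group of each (g, t) cell, row-major order
    t_idx = np.tile(np.arange(T), G)     # period of each (g, t) cell
    # Design matrix: one dummy per group, plus one dummy per period except the first.
    X = np.zeros((G * T, G + T - 1))
    X[np.arange(G * T), g_idx] = 1.0
    late = t_idx > 0
    X[np.flatnonzero(late), G + t_idx[late] - 1] = 1.0
    # Weighted least squares of D on the fixed effects, with weights N[g, t].
    sw = np.sqrt(N.ravel())
    coef, *_ = np.linalg.lstsq(X * sw[:, None], D.ravel() * sw, rcond=None)
    eps = (D.ravel() - X @ coef).reshape(G, T)
    treated = D == 1
    N1 = N[treated].sum()
    denom = (N[treated] / N1 * eps[treated]).sum()
    w = np.where(treated, eps / denom, np.nan)
    return eps, w


# Hypothetical staggered design: group 0 is treated in period 3 only,
# group 1 is treated in periods 2 and 3, and all cells have the same size.
D_example = np.array([[0, 0, 1], [0, 1, 1]])
N_example = np.ones((2, 3))
eps_example, w_example = twfe_weights(D_example, N_example)
```

In this toy design the three treated cells receive weights 1.5, 3, and −1.5, so the cell of the early-treated group in the last period is weighted negatively even though the weights average to 1 across treated cells.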

PROOF OF PROPOSITION 1:
If for all $t \geq 2$, $N_{g,t}/N_{g,t-1}$ does not depend on $t$, then it follows from the first order conditions attached to Regression 1 and a few lines of algebra that $\varepsilon_{g,t} = D_{g,t} - D_{g,.} - D_{.,t} + D_{.,.}$. Therefore, $w_{g,t}$ is proportional to $D_{g,t} - D_{g,.} - D_{.,t} + D_{.,.}$. Then, for all $(g, t, t')$ such that $D_{g,t} = D_{g,t'} = 1$, $D_{.,t} > D_{.,t'}$ implies $w_{g,t} < w_{g,t'}$. Similarly, for all $(g, g', t)$ such that $D_{g,t} = D_{g',t} = 1$, $D_{g,.} > D_{g',.}$ implies $w_{g,t} < w_{g',t}$. ∎

PROOF OF COROLLARY 1:

Proof of the First Point.—If the assumptions of the corollary hold and $\tilde{\Delta}^{TR} = 0$, then

$$\begin{cases}
\tilde{\beta}_{fe} = \sum_{(g,t): D_{g,t}=1} \dfrac{N_{g,t}}{N_1}\, w_{g,t}\, \tilde{\Delta}_{g,t}, \\[2ex]
0 = \sum_{(g,t): D_{g,t}=1} \dfrac{N_{g,t}}{N_1}\, \tilde{\Delta}_{g,t},
\end{cases}$$

where the first equality follows from (A7). These two conditions and the Cauchy-Schwarz inequality imply

$$|\tilde{\beta}_{fe}| = \left| \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1} (w_{g,t} - 1)\left(\tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR}\right) \right| \leq \sigma(W)\, \sigma(\tilde{\Delta}).$$

Hence, $\sigma(\tilde{\Delta}) \geq \underline{\sigma}_{fe}$.
Now, we prove that we can rationalize this lower bound. Let us define

$$\tilde{\Delta}^{TR}_{g,t} = \frac{\tilde{\beta}_{fe}\, (w_{g,t} - 1)}{\sigma^2(W)}.$$

Then,

$$\tilde{\Delta}^{TR} = \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, \frac{\tilde{\beta}_{fe}\, (w_{g,t} - 1)}{\sigma^2(W)} = \frac{\tilde{\beta}_{fe}}{\sigma^2(W)} \left( \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t} - 1 \right) = 0,$$

as it follows from the definition of $w_{g,t}$ that $\sum_{(g,t): D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$.
Similarly,

$$\begin{aligned}
\sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, \frac{\tilde{\beta}_{fe}\, (w_{g,t} - 1)}{\sigma^2(W)}
&= \frac{\tilde{\beta}_{fe}}{\sigma^2(W)} \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, (w_{g,t} - 1) \\
&= \frac{\tilde{\beta}_{fe}}{\sigma^2(W)} \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, (w_{g,t} - 1)^2 \\
&= \tilde{\beta}_{fe},
\end{aligned}$$

where the second equality follows again from the fact that $\sum_{(g,t): D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$.

Proof of the Second Point.—We first suppose that $\tilde{\beta}_{fe} > 0$. We seek to solve

$$\min_{\tilde{\Delta}_{(1)}, \dots, \tilde{\Delta}_{(n)}} \sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \left(\tilde{\Delta}_{(i)} - \tilde{\Delta}^{TR}\right)^2,$$

subject to

$$\tilde{\beta}_{fe} = \sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}_{(i)}, \qquad \tilde{\Delta}_{(i)} \leq 0 \quad \text{for all } i \in \{1, \dots, n\}.$$

This is a quadratic programming problem, with a matrix that is symmetric positive but not definite. Hence, by Frank and Wolfe (1956) and the fact that the linear term in the quadratic problem is 0, the solution exists if and only if the set of constraints is not empty. If $w_{(n)} \geq 0$, the set of constraints is empty because $\sum_{i=1}^{n} (N_{(i)}/N_1)\, w_{(i)}\, \tilde{\Delta}_{(i)} \leq 0 < \tilde{\beta}_{fe}$. On the other hand, if $w_{(n)} < 0$, this set is non-empty since it includes $(0, \dots, 0, \tilde{\beta}_{fe}/(P_{(n)} w_{(n)}))$.
We now derive the corresponding bound. For that purpose, remark that

$$\sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \left( \tilde{\Delta}_{(i)} - \sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \tilde{\Delta}_{(i)} \right)^2 = \sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \tilde{\Delta}_{(i)}^2 - \left( \sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \tilde{\Delta}_{(i)} \right)^2.$$

The Karush-Kuhn-Tucker necessary conditions for optimality are that for all $i$:

$$\begin{aligned}
\tilde{\Delta}_{(i)} &= \tilde{\Delta}^{TR} + \lambda w_{(i)} - \gamma_{(i)}, \\
\sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}_{(i)} &= \tilde{\beta}_{fe}, \\
\gamma_{(i)} &\geq 0, \\
\gamma_{(i)}\, \tilde{\Delta}_{(i)} &= 0,
\end{aligned}$$

where $\tilde{\Delta}^{TR} = \sum_{i=1}^{n} (N_{(i)}/N_1)\, \tilde{\Delta}_{(i)}$, $2\lambda$ is the Lagrange multiplier of the constraint $\sum_{i=1}^{n} (N_{(i)}/N_1)\, w_{(i)}\, \tilde{\Delta}_{(i)} = \tilde{\beta}_{fe}$ and $2 (N_{(i)}/N_1)\, \gamma_{(i)}$ is the Lagrange multiplier of the constraint $\tilde{\Delta}_{(i)} \leq 0$.
These constraints imply that $\tilde{\Delta}_{(i)} = 0$ if and only if $\tilde{\Delta}^{TR} + \lambda w_{(i)} \geq 0$. Therefore, if $\tilde{\Delta}^{TR} + \lambda w_{(i)} < 0$, $\tilde{\Delta}_{(i)} \neq 0$ so $\gamma_{(i)} = 0$, and $\tilde{\Delta}_{(i)} = \tilde{\Delta}^{TR} + \lambda w_{(i)}$. Therefore,

$$\tilde{\Delta}_{(i)} = \min\left(\tilde{\Delta}^{TR} + \lambda w_{(i)},\, 0\right). \tag{A8}$$

This equation implies that $\tilde{\Delta}_{(i)} \leq \tilde{\Delta}^{TR} + \lambda w_{(i)}$, which in turn implies that $\tilde{\Delta}^{TR} \leq \tilde{\Delta}^{TR} + \lambda$, so $\lambda \geq 0$.
As a result, $\tilde{\Delta}^{TR} + \lambda w_{(i)}$ is decreasing in $i$, and because $x \mapsto \min(x, 0)$ is increasing, $\tilde{\Delta}_{(i)}$ is also decreasing in $i$. Then $\tilde{\Delta}_{(n)} < 0$: otherwise one would have $\tilde{\Delta}_{(i)} = 0$ for all $i$, which would imply $\tilde{\beta}_{fe} = 0$, a contradiction.
Let $s = \min\{i \in \{1, \dots, n\} : \tilde{\Delta}_{(i)} < 0\}$. Using again (A8), we get

$$\tilde{\Delta}^{TR} = \sum_{i \geq s} \frac{N_{(i)}}{N_1}\, \tilde{\Delta}_{(i)} = P_s\, \tilde{\Delta}^{TR} + \lambda S_s.$$
Therefore,

$$\tilde{\Delta}^{TR} = \frac{\lambda S_s}{1 - P_s}. \tag{A9}$$

Hence, plugging $\tilde{\Delta}^{TR}$ in (A8), we obtain that for all $i \geq s$,

$$\tilde{\Delta}_{(i)} = \lambda \left\{ \frac{S_s}{1 - P_s} + w_{(i)} \right\}.$$

Finally, using again (A8), we obtain

$$\tilde{\beta}_{fe} = \sum_{i \geq s} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}_{(i)} = \lambda \left\{ \frac{S_s^2}{1 - P_s} + T_s \right\}.$$

Thus,

$$\lambda = \frac{\tilde{\beta}_{fe}}{T_s + S_s^2 / (1 - P_s)}.$$

Then, using what precedes,

$$\begin{aligned}
\underline{\underline{\sigma}}_{fe}^2 &= \sum_{i \geq s} \frac{N_{(i)}}{N_1} \left(\lambda w_{(i)}\right)^2 + \sum_{i < s} \frac{N_{(i)}}{N_1} \left(\tilde{\Delta}^{TR}\right)^2 \\
&= \lambda^2 T_s + (1 - P_s) \left( \frac{\lambda S_s}{1 - P_s} \right)^2 \\
&= \lambda^2 \left[ T_s + \frac{S_s^2}{1 - P_s} \right] \\
&= \frac{\tilde{\beta}_{fe}^2}{T_s + S_s^2 / (1 - P_s)}.
\end{aligned}$$

The result follows, once noted that equations (A8) and (A9) imply that $s = \min\{i \in \{1, \dots, n\} : w_{(i)} < -S_{(i)}/(1 - P_{(i)})\}$.
Finally, consider the case $\tilde{\beta}_{fe} < 0$. By letting $\tilde{\Delta}'_{(i)} = -\tilde{\Delta}_{(i)}$ and $\tilde{\beta}'_{fe} = -\tilde{\beta}_{fe}$, we have

$$\underline{\underline{\sigma}}_{fe}^2 = \min_{\tilde{\Delta}'_{(1)} \leq 0, \dots, \tilde{\Delta}'_{(n)} \leq 0} \; \sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \tilde{\Delta}'^{\,2}_{(i)} - \left( \sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \tilde{\Delta}'_{(i)} \right)^2$$

subject to

$$\sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}'_{(i)} = \tilde{\beta}'_{fe}.$$

This is the same program as before, with $\tilde{\beta}'_{fe}$ instead of $\tilde{\beta}_{fe}$. Therefore, by the same reasoning as before, we obtain

$$\underline{\underline{\sigma}}_{fe}^2 = \frac{\left(\tilde{\beta}'_{fe}\right)^2}{T_s + S_s^2 / (1 - P_s)} = \frac{\tilde{\beta}_{fe}^2}{T_s + S_s^2 / (1 - P_s)}. \; ∎$$
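The two bounds derived in the proof of Corollary 1 can be computed from the weights of the treated cells alone. The Python sketch below is a hedged illustration rather than the authors' implementation. It assumes that the treated cells are ordered by decreasing weight, and that $P_s$, $S_s$, and $T_s$ are the tail sums over $i \geq s$ of $N_{(i)}/N_1$, $(N_{(i)}/N_1)\, w_{(i)}$, and $(N_{(i)}/N_1)\, w_{(i)}^2$, which is how they enter the proof above. The function name `sigma_bounds` and the input values are hypothetical, and the weights are assumed not to be all equal to 1.

```python
import numpy as np


def sigma_bounds(beta_fe, w, n):
    """Return (first bound, second bound) for treated cells with weights w and sizes n.

    The second bound is None when no weight is negative, since the
    constraint set of the minimization problem is then empty.
    """
    w = np.asarray(w, dtype=float)
    p = np.asarray(n, dtype=float)
    p = p / p.sum()                                  # N_(i) / N_1
    # First bound: |beta_fe| / sigma(W), with sigma(W) taken around the mean weight 1.
    sd_w = np.sqrt(np.sum(p * (w - 1.0) ** 2))
    first = abs(beta_fe) / sd_w
    if w.min() >= 0:
        return first, None
    # Order cells by decreasing weight (assumed ordering of the w_(i)).
    order = np.argsort(-w)
    w_s, p_s = w[order], p[order]
    # Tail sums over i >= s.
    P = np.cumsum(p_s[::-1])[::-1]
    S = np.cumsum((p_s * w_s)[::-1])[::-1]
    T = np.cumsum((p_s * w_s ** 2)[::-1])[::-1]
    # s is the first index with w_(s) < -S_s / (1 - P_s); s = 1 is excluded as P_1 = 1.
    for i in range(1, len(w_s)):
        if w_s[i] < -S[i] / (1.0 - P[i]):
            second = abs(beta_fe) / np.sqrt(T[i] + S[i] ** 2 / (1.0 - P[i]))
            return first, second
    return first, None


# Hypothetical inputs: three equally sized treated cells and a coefficient of 1.
first_bound, second_bound = sigma_bounds(1.0, [1.5, 3.0, -1.5], [1, 1, 1])
```

With these inputs the second bound is attained by setting the effect to −2 in the negatively weighted cell and to 0 elsewhere, which gives a standard deviation of about 0.943, while the first bound is about 0.535.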

PROOF OF COROLLARY 2:
We have

$$\begin{aligned}
\beta_{fe} &= E\left( \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, \tilde{\Delta}_{g,t} \right) \\
&= E\left( \left( \sum_{(g,t): D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t} \right) \tilde{\Delta}^{TR} \right) \\
&= E\left(\tilde{\Delta}^{TR}\right) \\
&= \delta^{TR}.
\end{aligned}$$

The first equality follows from the law of iterated expectations and (A7). The second equality follows from Assumption 7. By the definition of $w_{g,t}$, $\sum_{(g,t): D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$, hence the third equality. The fourth equality follows from the law of iterated expectations. ∎

PROOF OF THEOREM 2:
It follows from the Frisch-Waugh theorem and the definition of $\varepsilon_{fd,g,t}$ that

$$E(\hat{\beta}_{fd} \mid D) = \frac{\sum_{(g,t): t \geq 2} N_{g,t}\, \varepsilon_{fd,g,t} \left(E(Y_{g,t} \mid D) - E(Y_{g,t-1} \mid D)\right)}{\sum_{(g,t): t \geq 2} N_{g,t}\, \varepsilon_{fd,g,t} \left(D_{g,t} - D_{g,t-1}\right)}. \tag{A10}$$

Now, by definition of $\varepsilon_{fd,g,t}$ again,

$$\sum_{g=1}^{G} N_{g,t}\, \varepsilon_{fd,g,t} = 0 \quad \text{for all } t \in \{2, \dots, T\}. \tag{A11}$$

Then,

$$\begin{aligned}
& \sum_{(g,t): t \geq 2} N_{g,t}\, \varepsilon_{fd,g,t} \left(E(Y_{g,t} \mid D) - E(Y_{g,t-1} \mid D)\right) && \text{(A12)} \\
&\quad = \sum_{(g,t): t \geq 2} N_{g,t}\, \varepsilon_{fd,g,t} \left(E(Y_{g,t} \mid D) - E(Y_{g,t-1} \mid D) - E(Y_{1,t} \mid D) + E(Y_{1,t-1} \mid D)\right) \\
&\quad = \sum_{(g,t): t \geq 2} N_{g,t}\, \varepsilon_{fd,g,t} \left(D_{g,t} \tilde{\Delta}_{g,t} - D_{g,t-1} \tilde{\Delta}_{g,t-1} - D_{1,t} \tilde{\Delta}_{1,t} + D_{1,t-1} \tilde{\Delta}_{1,t-1}\right) \\
&\quad = \sum_{(g,t): t \geq 2} N_{g,t}\, \varepsilon_{fd,g,t} \left(D_{g,t} \tilde{\Delta}_{g,t} - D_{g,t-1} \tilde{\Delta}_{g,t-1}\right) \\
&\quad = \sum_{g,t} \left(N_{g,t}\, \varepsilon_{fd,g,t} - N_{g,t+1}\, \varepsilon_{fd,g,t+1}\right) D_{g,t}\, \tilde{\Delta}_{g,t} \\
&\quad = \sum_{(g,t): D_{g,t}=1} N_{g,t} \left( \varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\, \varepsilon_{fd,g,t+1} \right) \tilde{\Delta}_{g,t}.
\end{aligned}$$

The first and third equalities follow from (A11): terms that depend on $t$ only can be added or subtracted freely, which is why the group-1 first difference $E(Y_{1,t} \mid D) - E(Y_{1,t-1} \mid D)$ can be subtracted in the first equality. The second equality follows from Lemma 1. The fourth equality follows from a summation by parts, and from the fact that $\varepsilon_{fd,g,1} = \varepsilon_{fd,g,T+1} = 0$. The fifth equality follows from Assumption 2.
A similar reasoning yields

$$\sum_{(g,t): t \geq 2} N_{g,t}\, \varepsilon_{fd,g,t} \left(D_{g,t} - D_{g,t-1}\right) = \sum_{(g,t): D_{g,t}=1} N_{g,t} \left( \varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\, \varepsilon_{fd,g,t+1} \right). \tag{A13}$$

Combining (A10), (A12), (A13), and the law of iterated expectations yields the result. ∎
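Equations (A11) to (A13) imply that the weight attached to a treated cell by the first-difference regression is proportional to $\varepsilon_{fd,g,t} - (N_{g,t+1}/N_{g,t})\, \varepsilon_{fd,g,t+1}$. The Python sketch below computes these weights for a sharp design. It is an illustration under assumptions: $\varepsilon_{fd,g,t}$ is taken to be the first difference of the treatment minus its $N_{g,t}$-weighted average across groups in period $t$, as (A11) suggests, and the weights are normalized so that they average to 1 across treated cells. The function name `fd_weights` and the toy design are hypothetical.

```python
import numpy as np


def fd_weights(D, N):
    """Weights attached to treated cells by the first-difference regression (NaN elsewhere)."""
    D = np.asarray(D, dtype=float)
    N = np.asarray(N, dtype=float)
    G, T = D.shape
    dD = D[:, 1:] - D[:, :-1]                 # first differences, periods 2..T
    n = N[:, 1:]
    # eps has T + 1 columns: periods 1..T+1, with eps = 0 in periods 1 and T+1.
    eps = np.zeros((G, T + 1))
    eps[:, 1:T] = dD - (n * dD).sum(axis=0) / n.sum(axis=0)
    # Ratio N[g, t+1] / N[g, t]; its value in period T is irrelevant since eps[T+1] = 0.
    ratio = np.ones((G, T))
    ratio[:, : T - 1] = N[:, 1:] / N[:, :-1]
    raw = eps[:, :T] - ratio * eps[:, 1:]
    treated = D == 1
    N1 = N[treated].sum()
    denom = (N[treated] / N1 * raw[treated]).sum()
    return np.where(treated, raw / denom, np.nan)


# Hypothetical staggered design with equally sized cells.
D_fd = np.array([[0, 0, 1], [0, 1, 1]])
N_fd = np.ones((2, 3))
w_fd = fd_weights(D_fd, N_fd)
```

In this toy design the early-treated group again receives a negative weight in the last period, in line with the sign condition of Proposition 2 for $t = T$.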

PROOF OF PROPOSITION 2:
It follows from the first order conditions attached to Regression 2 and a few lines of algebra that $\varepsilon_{fd,g,t} = D_{g,t} - D_{g,t-1} - D_{.,t} + D_{.,t-1}$. Therefore, under Assumption 6 and if $N_{g,t}$ does not vary across $t$, one has that for all $(g, t)$ such that $D_{g,t} = 1$, $1 \leq t \leq T - 1$, $w_{fd,g,t}$ is proportional to $1 - D_{g,t-1} - (2 D_{.,t} - D_{.,t-1} - D_{.,t+1})$. Now, $D_{.,t} - D_{.,t-1} \leq 1$, and under Assumption 6 $D_{.,t} - D_{.,t+1} \leq 0$, so $1 - D_{g,t-1} - (2 D_{.,t} - D_{.,t-1} - D_{.,t+1})$ can only be strictly negative if $D_{g,t-1} = 1$. Then, for all $(g, t)$ such that $D_{g,t} = 1$, $1 \leq t \leq T - 1$, $w_{fd,g,t}$ is strictly negative if and only if $D_{g,t-1} = 1$ and $2 D_{.,t} - D_{.,t-1} - D_{.,t+1} > 0$.
Similarly, when $t = T$, under the same assumptions as above, one has that for all $g$ such that $D_{g,T} = 1$, $w_{fd,g,T}$ is proportional to $1 - D_{g,T-1} - (D_{.,T} - D_{.,T-1})$. Now, $D_{.,T} - D_{.,T-1} \leq 1$, so $1 - D_{g,T-1} - (D_{.,T} - D_{.,T-1})$ can only be strictly negative if $D_{g,T-1} = 1$. Then, $w_{fd,g,T}$ is strictly negative if and only if $D_{g,T-1} = 1$ and $D_{.,T} - D_{.,T-1} > 0$.
Finally, when $t = 1$, one has that for all $g$ such that $D_{g,1} = 1$, $D_{g,2} = 1$ under Assumption 6, so $w_{fd,g,1}$ is proportional to $D_{.,2} - D_{.,1}$, which is greater than 0 under Assumption 6. ∎

PROOF OF THEOREM 3:
First, by definition of $\text{DID}_M$,

$$E(\text{DID}_M) = \sum_{t=2}^{T} E\left( \frac{N_{1,0,t}}{N_S}\, E(\text{DID}_{+,t} \mid D) + \frac{N_{0,1,t}}{N_S}\, E(\text{DID}_{-,t} \mid D) \right). \tag{A14}$$

Let $t$ be greater than 2, and let us focus for now on the case where there is at least one $g_1$ such that $D_{g_1,t-1} = 0$ and $D_{g_1,t} = 1$. Then Assumption 11 ensures that there is at least another group $g_2$ such that $D_{g_2,t-1} = D_{g_2,t} = 0$. For every $g$ such that $D_{g,t-1} = 0$ and $D_{g,t} = 1$, we have

$$E(Y_{g,t} - Y_{g,t-1} \mid D) = E(\Delta_{g,t} \mid D) + E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D). \tag{A15}$$

Under Assumptions 12, 4, and 5, for all $t \geq 2$, there exists a real number $\psi_{0,t}$ such that for all $g$,

$$E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D) = E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D_g) = E(Y_{g,t}(0) - Y_{g,t-1}(0)) = \psi_{0,t}. \tag{A16}$$

Then,

$$\begin{aligned}
N_{1,0,t}\, E(\text{DID}_{+,t} \mid D)
&= \sum_{g: D_{g,t}=1, D_{g,t-1}=0} N_{g,t}\, E(\Delta_{g,t} \mid D) \\
&\quad + \sum_{g: D_{g,t}=1, D_{g,t-1}=0} N_{g,t}\, E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D) \\
&\quad - \frac{N_{1,0,t}}{N_{0,0,t}} \sum_{g: D_{g,t}=D_{g,t-1}=0} N_{g,t}\, E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D) \\
&= \sum_{g: D_{g,t}=1, D_{g,t-1}=0} N_{g,t}\, E(\Delta_{g,t} \mid D) + \psi_{0,t} \left( \sum_{g: D_{g,t}=1, D_{g,t-1}=0} N_{g,t} - \frac{N_{1,0,t}}{N_{0,0,t}} \sum_{g: D_{g,t}=D_{g,t-1}=0} N_{g,t} \right) \\
&= \sum_{g: D_{g,t}=1, D_{g,t-1}=0} N_{g,t}\, E(\Delta_{g,t} \mid D). && \text{(A17)}
\end{aligned}$$

The first equality follows by (A15), the second by (A16), and the third after some algebra. If there is no $g$ such that $D_{g,t-1} = 0$ and $D_{g,t} = 1$, (A17) still holds, as $\text{DID}_{+,t} = 0$ in this case.
A similar reasoning yields

$$N_{0,1,t}\, E(\text{DID}_{-,t} \mid D) = \sum_{g: D_{g,t}=0, D_{g,t-1}=1} N_{g,t}\, E(\Delta_{g,t} \mid D). \tag{A18}$$

Plugging (A17) and (A18) into (A14) yields

$$E(\text{DID}_M) = \sum_{t=2}^{T} E\left( E\left( \frac{1}{N_S} \left( \sum_{g: D_{g,t}=1, D_{g,t-1}=0} N_{g,t}\, \Delta_{g,t} + \sum_{g: D_{g,t}=0, D_{g,t-1}=1} N_{g,t}\, \Delta_{g,t} \right) \,\Big|\, D \right) \right) = \delta^{S}. \; ∎$$
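The structure of (A14), (A17), and (A18) can be illustrated with a short Python sketch that computes a $\text{DID}_M$-type estimate from cell-level data in a sharp design. This is a simplified reading of the estimator, not the fuzzydid or did_multiplegt implementation. It assumes that $\text{DID}_{+,t}$ compares the $N_{g,t}$-weighted outcome evolution of groups switching from 0 to 1 with that of groups remaining at 0, and that $\text{DID}_{-,t}$ compares groups remaining at 1 with groups switching from 1 to 0. The function name `did_m` and the toy data are hypothetical.

```python
import numpy as np


def did_m(Y, D, N):
    """DID_M-type estimate from (G, T) arrays of cell means Y, treatments D, cell sizes N."""
    Y, D, N = (np.asarray(a, dtype=float) for a in (Y, D, N))
    total, n_switchers = 0.0, 0.0
    for t in range(1, Y.shape[1]):
        dY, n = Y[:, t] - Y[:, t - 1], N[:, t]
        prev, cur = D[:, t - 1], D[:, t]
        # (switchers, stable groups with the same period t-1 treatment, sign of the comparison)
        comparisons = [
            ((prev == 0) & (cur == 1), (prev == 0) & (cur == 0), 1.0),   # DID_{+,t}
            ((prev == 1) & (cur == 0), (prev == 1) & (cur == 1), -1.0),  # DID_{-,t}
        ]
        for switch, stable, sign in comparisons:
            if not switch.any():
                continue
            if not stable.any():
                raise ValueError("no stable group: the stable groups assumption fails")
            did = (np.average(dY[switch], weights=n[switch])
                   - np.average(dY[stable], weights=n[stable]))
            total += n[switch].sum() * sign * did
            n_switchers += n[switch].sum()
    if n_switchers == 0:
        raise ValueError("no switching cell in the data")
    return total / n_switchers


# Hypothetical data: common trends hold exactly and effects are heterogeneous.
alpha = np.array([1.0, 2.0, 3.0])[:, None]      # group effects
gamma = np.array([0.0, 0.5, 1.5])               # period effects
D_toy = np.array([[0, 0, 1], [0, 1, 1], [0, 0, 0]])
effects = np.array([[0, 0, 3.0], [0, 1.0, 4.0], [0, 0, 0]])
Y_toy = alpha + gamma + D_toy * effects
estimate = did_m(Y_toy, D_toy, np.ones((3, 3)))
```

Here the two switching cells have effects 1 and 3, and the sketch returns their average, 2, even though the effect in the already treated cell (4) differs: only the effects of switchers at the time they switch enter the estimate.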

PROOF OF THEOREM 4:
First, as with $\text{DID}_M$, we have

$$E\left(\text{DID}_M^{pl}\right) = \sum_{t=3}^{T} E\left( \frac{N_{1,0,0,t}}{N_S^{pl}}\, E\left(\text{DID}_{+,t}^{pl} \mid D\right) + \frac{N_{0,1,1,t}}{N_S^{pl}}\, E\left(\text{DID}_{-,t}^{pl} \mid D\right) \right). \tag{A19}$$

Let $t$ be greater than 3, and let us for now focus on the case where there exists at least one $g_1$ such that $D_{g_1,t-2} = D_{g_1,t-1} = 0$ and $D_{g_1,t} = 1$. Then Assumption 13 ensures that there is at least another group $g_2$ such that $D_{g_2,t-2} = D_{g_2,t-1} = D_{g_2,t} = 0$. Then,

$$\begin{aligned}
N_{1,0,0,t}\, E\left(\text{DID}_{+,t}^{pl} \mid D\right)
&= \sum_{g: D_{g,t}=1, D_{g,t-1}=D_{g,t-2}=0} N_{g,t}\, E(Y_{g,t-1}(0) - Y_{g,t-2}(0) \mid D) \\
&\quad - \frac{N_{1,0,0,t}}{N_{0,0,0,t}} \sum_{g: D_{g,t}=D_{g,t-1}=D_{g,t-2}=0} N_{g,t}\, E(Y_{g,t-1}(0) - Y_{g,t-2}(0) \mid D) \\
&= \psi_{0,t-1} \left( \sum_{g: D_{g,t}=1, D_{g,t-1}=D_{g,t-2}=0} N_{g,t} - \frac{N_{1,0,0,t}}{N_{0,0,0,t}} \sum_{g: D_{g,t}=D_{g,t-1}=D_{g,t-2}=0} N_{g,t} \right) \\
&= 0. && \text{(A20)}
\end{aligned}$$

The second equality follows by (A16), and the third follows after some algebra. If there exists no $g$ such that $D_{g,t-2} = D_{g,t-1} = 0$ and $D_{g,t} = 1$, (A20) still holds, as $\text{DID}_{+,t}^{pl} = 0$ in this case.
A similar reasoning yields

$$N_{0,1,1,t}\, E\left(\text{DID}_{-,t}^{pl} \mid D\right) = 0. \tag{A21}$$

The result follows after plugging (A20) and (A21) into (A19). ∎
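The placebo comparison in (A19) and (A20) can be sketched in the same way. The Python code below is an illustration under assumptions, not the authors' implementation: it compares the outcome evolution from $t-2$ to $t-1$ of groups that switch between $t-1$ and $t$ with that of groups whose treatment is stable over the three periods, keeping only groups whose treatment did not change between $t-2$ and $t-1$, and it weights cells by $N_{g,t}$. The function name `did_m_placebo` and the toy data are hypothetical.

```python
import numpy as np


def did_m_placebo(Y, D, N):
    """Placebo estimate: pre-switch outcome evolution of switchers versus stable groups."""
    Y, D, N = (np.asarray(a, dtype=float) for a in (Y, D, N))
    total, n_switchers = 0.0, 0.0
    for t in range(2, Y.shape[1]):
        dY_lag, n = Y[:, t - 1] - Y[:, t - 2], N[:, t]
        d2, d1, d0 = D[:, t - 2], D[:, t - 1], D[:, t]
        same_before = d2 == d1   # treatment unchanged between t-2 and t-1
        comparisons = [
            (same_before & (d1 == 0) & (d0 == 1), same_before & (d1 == 0) & (d0 == 0), 1.0),
            (same_before & (d1 == 1) & (d0 == 0), same_before & (d1 == 1) & (d0 == 1), -1.0),
        ]
        for switch, stable, sign in comparisons:
            if not switch.any():
                continue
            if not stable.any():
                raise ValueError("no stable group available for the placebo comparison")
            did = (np.average(dY_lag[switch], weights=n[switch])
                   - np.average(dY_lag[stable], weights=n[stable]))
            total += n[switch].sum() * sign * did
            n_switchers += n[switch].sum()
    if n_switchers == 0:
        raise ValueError("no switching cell usable for the placebo")
    return total / n_switchers


# Hypothetical data with exact common trends, then with a pretrend of 0.3 for the joiner.
alpha_pl = np.array([1.0, 2.0, 3.0])[:, None]
gamma_pl = np.array([0.0, 0.5, 1.5])
D_pl = np.array([[0, 0, 1], [0, 0, 0], [0, 1, 1]])
Y_pl = alpha_pl + gamma_pl + 2.0 * D_pl
placebo_no_pretrend = did_m_placebo(Y_pl, D_pl, np.ones((3, 3)))
Y_pretrend = Y_pl.copy()
Y_pretrend[0, 1] += 0.3
placebo_pretrend = did_m_placebo(Y_pretrend, D_pl, np.ones((3, 3)))
```

Under exact common trends the placebo is zero, as in (A20), while a differential evolution of the future joiner before it switches shows up one for one in the placebo.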

REFERENCES

Abraham, Sarah, and Liyang Sun. 2018. “Estimating Dynamic Treatment Effects in Event Studies
with Heterogeneous Treatment Effects.” Unpublished.
Ashenfelter, Orley. 1978. “Estimating the Effect of Training Programs on Earnings.” Review of Economics and Statistics 60 (1): 47–57.
Athey, Susan, and Guido W. Imbens. 2018. “Design-Based Analysis in Difference-in-Differences Settings with Staggered Adoption.” NBER Working Paper 24963.
Athey, Susan, and Scott Stern. 2002. “The Impact of Information Technology on Emergency Health
Care Outcomes.” RAND Journal of Economics 33 (3): 399–432.
Autor, David H. 2003. “Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the
Growth of Employment Outsourcing.” Journal of Labor Economics 21 (1): 1–42.
Borusyak, Kirill, and Xavier Jaravel. 2017. “Revisiting Event Study Designs.” Unpublished.
Callaway, Brantly, and Pedro H. C. Sant’Anna. 2018. “Difference-in-Differences with Multiple Time
Periods and an Application on the Minimum Wage and Employment.” arXiv e-print 1803.09015.
Card, David. 1996. “The Effect of Unions on the Structure of Wages: A Longitudinal Analysis.”
Econometrica 64 (4): 957–79.
de Chaisemartin, Clément. 2011. “Fuzzy Differences in Differences.” Center for Research in Economics and Statistics Working Paper 2011-10.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2015. “Fuzzy Differences-in-Differences.”
arXiv e-print 1510.01757v2.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2018. “Fuzzy Differences-in-Differences.”
Review of Economic Studies 85 (2): 999–1028.

de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2020a. “Difference-in-Differences Estimators of Intertemporal Treatment Effects.” arXiv e-print 2007.04267.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2020b. “Replication Data for: Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Association [publisher], Inter-university Consortium for Political and Social Research [distributor]. https://doi.org/10.3886/E118363V1.
de Chaisemartin, Clément, Xavier D’Haultfœuille, and Yannick Guyonvarch. 2019. “Fuzzy Differences-in-Differences with Stata.” Stata Journal 19 (2): 435–58.
Duflo, Esther. 2001. “Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment.” American Economic Review 91 (4): 795–813.
Frank, Marguerite, and Philip Wolfe. 1956. “An Algorithm for Quadratic Programming.” Naval
Research Logistics Quarterly 3 (1–2): 95–110.
Freeman, Richard B. 1984. “Longitudinal Analyses of the Effects of Trade Unions.” Journal of Labor
Economics 2 (1): 1–26.
Freeman, Richard B., and James L. Medoff. 1984. “What Do Unions Do?” ILR Review 38 (2): 244–63.
Gentzkow, Matthew, Jesse M. Shapiro, and Michael Sinkinson. 2011. “The Effect of Newspaper Entry
and Exit on Electoral Politics.” American Economic Review 101 (7): 2980–3018.
Goodman-Bacon, Andrew. 2018. “Difference-in-Differences with Variation in Treatment Timing.”
Unpublished.
Imai, Kosuke, and In Song Kim. 2018. “On the Use of Two-Way Fixed Effects Regression Models for
Causal Inference with Panel Data.” Unpublished.
Jakubson, George. 1991. “Estimation and Testing of the Union Wage Effect Using Panel Data.” Review
of Economic Studies 58 (5): 971–91.
Vella, Francis, and Marno Verbeek. 1998. “Whose Wages Do Unions Raise? A Dynamic Model of
Unionism and Wage Rate Determination for Young Men.” Journal of Applied Econometrics 13 (2):
163–83.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge,
MA: MIT Press.
This article has been cited by:

1. Patrick Premand, Dominic Rohner. 2024. Cash and Conflict: Large-Scale Experimental Evidence
from Niger. American Economic Review: Insights 6:1, 137-153.
2. Daniel Avdic, Petter Lundborg, Johan Vikström. 2024. Does Health-Care Consolidation Harm
Patients? Evidence from Maternity Ward Closures. American Economic Journal: Economic Policy 16:1,
160-189.
3. Traviss Cassidy, Mark Dincecco, Ugo Antonio Troiano. 2024. The Introduction of the Income Tax,
Fiscal Capacity, and Migration: Evidence from US States. American Economic Journal: Economic Policy
16:1, 359-393.
4. Christophe Bellégo, Joeffrey Drouard. 2024. Fighting Crime in Lawless Areas: Evidence from Slums
in Rio de Janeiro. American Economic Journal: Economic Policy 16:1, 124-159.
5. Joshua Rauh, Ryan Shyu. 2024. Behavioral Responses to State Income Taxation of High Earners:
Evidence from California. American Economic Journal: Economic Policy 16:1, 34-86.
6. Andreas Bjerre-Nielsen, Mikkel Høst Gandil. 2024. Attendance Boundary Policies and the Limits
to Combating School Segregation. American Economic Journal: Economic Policy 16:1, 190-227.
7. Elliott Ash, W. Bentley MacLeod. 2024. Mandatory Retirement for Judges Improved the Performance
of US State Supreme Courts. American Economic Journal: Economic Policy 16:1, 518-548.
8. Oren Reshef. 2023. Smaller Slices of a Growing Pie: The Effects of Entry in Platform Markets.
American Economic Journal: Microeconomics 15:4, 183-207.
9. Robert C. Allen, Mattia C. Bertazzini, Leander Heldring. 2023. The Economic Origins of
Government. American Economic Review 113:10, 2507-2545.
10. Emily C. Lawler, Katherine G. Yewell. 2023. The Effect of Hospital Postpartum Care Regulations
on Breastfeeding and Maternal Time Allocation. American Economic Journal: Applied Economics 15:4,
477-513.
11. Giorgio Gulino, Federico Masera. 2023. Contagious Dishonesty: Corruption Scandals and
Supermarket Theft. American Economic Journal: Applied Economics 15:4, 218-251.
12. Fangwen Lu, Weizeng Sun, Jianfeng Wu. 2023. Special Economic Zones and Human Capital
Investment: 30 Years of Evidence from China. American Economic Journal: Economic Policy 15:3, 35-64.
13. Marcus Dillender. 2023. Evidence and Lessons on the Health Impacts of Public Health Funding from
the Fight against HIV/AIDS. American Economic Review 113:7, 1825-1887.
14. Casper Worm Hansen, Asger Mose Wingender. 2023. National and Global Impacts of Genetically
Modified Crops. American Economic Review: Insights 5:2, 224-240.
15. Elena Esposito, Tiziano Rotesi, Alessandro Saia, Mathias Thoenig. 2023. Reconciliation Narratives:
The Birth of a Nation after the US Civil War. American Economic Review 113:6, 1461-1504.
16. D. Mark Anderson, Daniel I. Rees. 2023. The Public Health Effects of Legalizing Marijuana. Journal
of Economic Literature 61:1, 86-143.
17. Aljoscha Janssen, Xuan Zhang. 2023. Retail Pharmacies and Drug Diversion during the Opioid
Epidemic. American Economic Review 113:1, 1-33.
18. Luca Braghieri, Ro’ee Levy, Alexey Makarin. 2022. Social Media and Mental Health. American
Economic Review 112:11, 3660-3693.
19. Jonathan Roth. 2022. Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.
American Economic Review: Insights 4:3, 305-322.
20. Litterio Mirenda, Sauro Mocetti, Lucia Rizzica. 2022. The Economic Effects of Mafia: Firm Level
Evidence. American Economic Review 112:8, 2748-2773.
21. Jevan Cherniwchan, Nouri Najjar. 2022. Do Environmental Regulations Affect the Decision to
Export? American Economic Journal: Economic Policy 14:2, 125-160.
22. Enrico Cantoni, Vincent Pons. 2022. Does Context Outweigh Individual Characteristics in Driving
Voting Behavior? Evidence from Relocations within the United States. American Economic Review
112:4, 1226-1272.
23. Michael Greenstone, Guojun He, Ruixue Jia, Tong Liu. 2022. Can Technology Solve the Principal-
Agent Problem? Evidence from China’s War on Air Pollution. American Economic Review: Insights 4:1,
54-70.
24. Martha J. Bailey, Shuqiao Sun, Brenden Timpe. 2021. Prep School for Poor Kids: The Long-Run
Impacts of Head Start on Human Capital and Economic Self-Sufficiency. American Economic Review
111:12, 3963-4001.
25. Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens, Stefan Wager. 2021.
Synthetic Difference-in-Differences. American Economic Review 111:12, 4088-4118.