
American Economic Review 2020, 110(9): 2964–2996

https://doi.org/10.1257/aer.20181169

Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects†
By Clément de Chaisemartin and Xavier D’Haultfœuille*

Linear regressions with period and group fixed effects are widely
used to estimate treatment effects. We show that they estimate
weighted sums of the average treatment effects (ATE) in each group
and period, with weights that may be negative. Due to the negative
weights, the linear regression coefficient may for instance be negative
while all the ATEs are positive. We propose another estimator
that solves this issue. In the two applications we revisit, it is significantly
different from the linear regression estimator. (JEL C21, C23,
D72, J31, J51, L82)

A popular method to estimate the effect of a treatment on an outcome is to compare
over time groups experiencing different evolutions of their exposure to treatment.
In practice, this idea is implemented by estimating regressions that control for
group and time fixed effects. Hereafter, we refer to those as two-way fixed effects
(FE) regressions. We conducted a survey, and found that 19 percent of all empirical
articles published by the American Economic Review (AER) between 2010 and
2012 have used a two-way FE regression to estimate the effect of a treatment on an
outcome. When the treatment effect is constant across groups and over time, such
regressions estimate that effect under the standard “common trends” assumption.
However, it is often implausible that the treatment effect is constant. For instance,
the minimum wage’s effect on employment may vary across US counties, and may
change over time. This paper examines the properties of two-way FE regressions
when the constant effect assumption is violated.
* de Chaisemartin: University of California at Santa Barbara (email: [email protected]);
D’Haultfœuille: CREST-ENSAE (email: [email protected]). Thomas Lemieux was the coeditor for
this article. We are very grateful to Olivier Deschêsnes, Guido Imbens, Peter Kuhn, Kyle Meng, Jesse Shapiro,
Dick Startz, Doug Steigerwald, Clémence Tricaud, Gonzalo Vazquez-Bare, members of the UCSB econometrics
research group, and seminar participants at Bergen, CIREQ Econometrics conference, CREST, Goteborg,
Gothenburg, Groningen, ITAM, Pompeu Fabra, Stanford, SMU, Tinbergen Institute, UCL, UCLA, UC Davis,
UCSB, USC, and Warwick for their helpful comments. Xavier D’Haultfœuille gratefully acknowledges financial
support from the research grants Otelo (ANR-17-CE26-0015-041) and the Labex Ecodec: Investissements d’Avenir
(ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047).
† Go to https://doi.org/10.1257/aer.20181169 to visit the article page for additional materials and author
disclosure statements.

VOL. 110 NO. 9 DE CHAISEMARTIN AND D’HAULTFŒUILLE: TWO-WAY FIXED EFFECTS 2965

We start by assuming that all observations in the same $(g,t)$ cell have the same
treatment and that the treatment is binary, as is for instance the case when the treatment
is a county-level law. We consider the regression of $Y_{i,g,t}$, the outcome of unit $i$
in group $g$ at period $t$, on group fixed effects, period fixed effects, and $D_{g,t}$, the
treatment in group $g$ at period $t$. Let $\hat{\beta}_{fe}$ denote the coefficient of $D_{g,t}$,
and let $\beta_{fe}$ denote its expectation. Under the common trends assumption, we show that
$\beta_{fe}$ is equal to a weighted sum of the treatment effect in each treated $(g,t)$ cell:

$$\beta_{fe} = E\left[\sum_{(g,t):D_{g,t}=1} W_{g,t}\,\Delta_{g,t}\right], \tag{1}$$

where $\Delta_{g,t}$ is the average treatment effect (ATE) in group $g$ and period $t$ and the
weights $W_{g,t}$ sum to 1 but may be negative. Negative weights arise because $\hat{\beta}_{fe}$ is
a weighted sum of several difference-in-differences (DID), which compare the
evolution of the outcome between consecutive time periods across pairs of groups.
However, the “control group” in some of those comparisons may be treated at both
periods. Then, its treatment effect at the second period gets differenced out by the
DID, hence the negative weights.
The negative weights are an issue when the ATEs are heterogeneous across
groups or periods. Then, one could have that $\beta_{fe}$ is negative while all the ATEs are
positive. For instance, $1.5 \times 1 - 0.5 \times 4$, a weighted sum of 1 and 4, is strictly
negative. Using the dataset of Gentzkow, Shapiro, and Sinkinson (2011), we find
that 40 percent of the weights attached to $\beta_{fe}$ are negative, so $\beta_{fe}$ is not robust to
heterogeneous effects.¹
Researchers may want to know how serious that issue is in the application they
consider. We show that conditional on all treatments, the absolute value of the expectation
of $\hat{\beta}_{fe}$ divided by the standard deviation of the weights is equal to the minimal
value of the standard deviation of the ATEs across the treated $(g,t)$ cells under which
the average treatment on the treated (ATT) may actually have the opposite sign than
that coefficient. One can estimate that ratio to assess the robustness of the two-way
FE coefficient. If that ratio is close to 0, that coefficient and the ATT can be of opposite
signs even under a small and plausible amount of treatment effect heterogeneity.
In that case, treatment effect heterogeneity would be a serious concern for the validity
of that coefficient. On the contrary, if that ratio is very large, that coefficient and
the ATT can only be of opposite signs under a very large and implausible amount of
treatment effect heterogeneity.
Finally, we propose a new estimator, $DID_M$, that is valid even if the treatment
effect is heterogeneous over time or across groups. It estimates the average treatment
effect across all the $(g,t)$ cells whose treatment changes from $t-1$ to $t$. It
relies on common trends assumptions on both potential outcomes. Those conditions
are partly testable, and we propose a test that amounts to looking at pretrends. This
test differs from the standard event study pretrends test (see Autor 2003), which has
been shown to be invalid when treatment effects are heterogeneous (see Abraham
and Sun 2018). We show that our estimator is asymptotically normal. We compute it
in the datasets of Gentzkow, Shapiro, and Sinkinson (2011) and Vella and Verbeek
(1998), and in both cases we find that it is significantly different from $\hat{\beta}_{fe}$.² Our
estimator can be used in applications where, for each pair of consecutive dates, there are

¹ Gentzkow, Shapiro, and Sinkinson (2011) does not estimate $\beta_{fe}$, but $\beta_{fd}$, the treatment coefficient in the
first-difference regression defined below. Forty-six percent of the weights attached to $\beta_{fd}$ are strictly negative.
² In both cases, our estimator is also significantly different from $\hat{\beta}_{fd}$.
2966 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 2020

groups whose treatment does not change. We estimate that this condition is satisfied
for around 80 percent of the papers using two-way fixed effects regressions found
in our survey of the AER.
Overall, our paper has implications for applied researchers estimating two-way
fixed effects regressions. First, we recommend that they compute the weights
attached to their regression and the ratio of $|\hat{\beta}_{fe}|$ divided by the standard deviation
of the weights. To do so, they can use the twowayfeweights Stata package that is
available from the SSC repository. If many weights are negative, and if the ratio is
not very large, we recommend that they compute our new estimator, using the fuzzydid
and did_multiplegt Stata packages, also available from the SSC repository (see
de Chaisemartin, D’Haultfœuille, and Guyonvarch 2019, for explanations on how to
use the former package).
We extend our results in several important directions. First, another commonly
used regression is the first-difference regression of $Y_{g,t} - Y_{g,t-1}$, the change in
the mean outcome in group $g$, on period fixed effects and on $D_{g,t} - D_{g,t-1}$, the
change in the treatment. We let $\beta_{fd}$ denote the expectation of the coefficient
of $D_{g,t} - D_{g,t-1}$. We show that under common trends, $\beta_{fd}$ also identifies a weighted
sum of treatment effects, with potentially some negative weights. Second, in our
online Appendix we show that our results extend to fuzzy designs, where the treatment
varies within $(g,t)$ cells, and to two-way fixed effects regressions with a
nonbinary treatment and with covariates.
Our paper is related to the DID literature. Our main result generalizes Theorem 1
in de Chaisemartin and D’Haultfœuille (2018). When the data have two groups
and two periods, the Wald-DID estimand considered therein is equal to $\beta_{fe}$ and $\beta_{fd}$.
Our results on $\beta_{fe}$ and $\beta_{fd}$ are thus extensions of that theorem to the case with multiple
periods and groups.³ Moreover, our $DID_M$ estimator is related to the Wald-TC
estimator with many groups and periods proposed in de Chaisemartin and
D’Haultfœuille (2018), and to the multiperiod DID estimator proposed by Imai
and Kim (2018). In Section III, we explain the differences between those three
estimators.
More recently, Borusyak and Jaravel (2017), Abraham and Sun (2018), Athey
and Imbens (2018), Callaway and Sant’Anna (2018), and Goodman-Bacon (2018)
study the special case of staggered adoption designs, where the treatment of a group
is weakly increasing over time. Those papers derive some important results specific
to that design that we do not consider here. Still, some of the results in those papers
are related to ours, and we describe precisely those connections later in the paper.
The most important dimension on which our paper differs from those is that our
results apply to any two-way fixed effects regressions, not only to those with staggered
adoption. In our survey of the AER papers estimating two-way fixed effects
regressions, less than 10 percent have a staggered adoption design. This suggests
that while staggered adoptions are an important research design, they may account
for a relatively small minority of the applications where two-way fixed effects
regressions have been used.

³ In fact, a preliminary version of our main result appeared in a working paper version of de Chaisemartin and
D’Haultfœuille (2018): see Theorems S1 and S2 in de Chaisemartin and D’Haultfœuille (2015).

The paper is organized as follows. Section I introduces the setup. Section II
presents our decomposition results. Section III introduces our alternative estimator.
Section IV briefly describes some of the extensions covered in our online
Appendix. Section V presents our survey of the articles published in the AER, and
our two empirical applications. The data and codes are given in de Chaisemartin and
D’Haultfœuille (2020b).

I. Setup

One considers observations that can be divided into $G$ groups and $T$ periods. For
every $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, let $N_{g,t}$ denote the number of observations
in group $g$ at period $t$, and let $N = \sum_{g,t} N_{g,t}$ be the total number of observations.
The data may be an individual-level panel or repeated cross-section dataset where
groups are, say, individuals’ county of birth. The data could also be a cross section
where cohort of birth plays the role of time. For instance, Duflo (2001) compares
the schooling of different cohorts in Indonesia, some of which were exposed
to a school construction program. It is also possible that for all $(g,t)$, $N_{g,t} = 1$,
e.g., a group is one individual or firm. All of the above are special cases of the data
structure we consider.
One is interested in measuring the effect of a treatment on some outcome.
Throughout the paper we assume that treatment is binary, but our results
apply to any ordered treatment, as we show in online Appendix Section 3.2.
Then, for every $(i,g,t) \in \{1,\dots,N_{g,t}\} \times \{1,\dots,G\} \times \{1,\dots,T\}$, let $D_{i,g,t}$
and $(Y_{i,g,t}(0), Y_{i,g,t}(1))$ respectively denote the treatment status and the potential
outcomes without and with treatment of observation $i$ in group $g$ at period $t$.
The outcome of observation $i$ in group $g$ and period $t$ is $Y_{i,g,t} = Y_{i,g,t}(D_{i,g,t})$. For
all $(g,t)$, let

$$D_{g,t} = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} D_{i,g,t}, \qquad Y_{g,t}(0) = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t}(0),$$

$$Y_{g,t}(1) = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t}(1), \quad \text{and} \quad Y_{g,t} = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t}.$$

Here, $D_{g,t}$ denotes the average treatment in group $g$ at period $t$, while $Y_{g,t}(0)$,
$Y_{g,t}(1)$, and $Y_{g,t}$ respectively denote the average potential outcomes without and with
treatment and the average observed outcome in group $g$ at period $t$.
Throughout the paper, we maintain the following assumptions.

ASSUMPTION 1 (Balanced Panel of Groups): For all $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, $N_{g,t} > 0$.

Assumption 1 requires that no group appears or disappears over time. This
assumption is often satisfied. Without it, our results still hold but the notation
becomes more complicated as the denominators of some of the fractions below may
then be equal to zero.

ASSUMPTION 2 (Sharp Design): For all $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$ and
$i \in \{1,\dots,N_{g,t}\}$, $D_{i,g,t} = D_{g,t}$.

Assumption 2 requires that units’ treatments do not vary within each $(g,t)$ cell,
a situation we refer to as a sharp design. This is for instance satisfied when the
treatment is a group-level variable, for instance a county or a state law. This is also
mechanically satisfied when $N_{g,t} = 1$. In our survey in Section V, we find that
almost 80 percent of the papers using two-way fixed effects regressions and published
in the AER between 2010 and 2012 consider sharp designs. We focus on sharp
designs because of their prevalence, but in online Appendix Section 2, we show that
all the results in Sections II and III can be extended to fuzzy designs.

ASSUMPTION 3 (Independent Groups): The vectors $(Y_{g,t}(0), Y_{g,t}(1), D_{g,t})_{1 \le t \le T}$ are
mutually independent.

We consider $D_{g,t}$, $Y_{g,t}(0)$, $Y_{g,t}(1)$ as random variables. For instance, aggregate random
shocks may affect the average potential outcomes of group $g$ at period $t$. The
treatment status of group $g$ at period $t$ may also be random. The expectations below
are taken with respect to the distribution of those random variables. Assumption 3
allows for the possibility that the treatments and potential outcomes of a group may
be correlated over time, but it requires that the potential outcomes and treatments of
different groups be independent.

ASSUMPTION 4 (Strong Exogeneity): For all $(g,t) \in \{1,\dots,G\} \times \{2,\dots,T\}$,

$$E\bigl(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D_{g,1},\dots,D_{g,T}\bigr) = E\bigl(Y_{g,t}(0) - Y_{g,t-1}(0)\bigr).$$

Assumption 4 requires that the shocks affecting a group’s $Y_{g,t}(0)$ be mean independent
of that group’s treatment sequence. This rules out the possibility that a group
gets treated because it experiences negative shocks, the so-called Ashenfelter’s dip
(see Ashenfelter 1978). Assumption 4 is related to the strong exogeneity condition
in panel data models, which, as is well known, is necessary to obtain the consistency
of the fixed effects estimator (see, e.g., Wooldridge 2002).
We now define the FE regression described in the introduction.⁴

REGRESSION 1 (Fixed Effects Regression): Let $\hat{\beta}_{fe}$ denote the coefficient of $D_{g,t}$
in an OLS regression of $Y_{i,g,t}$ on group fixed effects, period fixed effects, and $D_{g,t}$.
Let $\beta_{fe} = E[\hat{\beta}_{fe}]$.⁵

For all $g$ and $t$, let $N_{g,.} = \sum_{t=1}^{T} N_{g,t}$ and $N_{.,t} = \sum_{g=1}^{G} N_{g,t}$ respectively denote
the total number of observations in group $g$ and in period $t$. For any variable
$X_{g,t}$ defined in each $(g,t)$ cell, let $X_{g,.} = \sum_{t=1}^{T} (N_{g,t}/N_{g,.}) X_{g,t}$ denote the average
value of $X_{g,t}$ in group $g$, let $X_{.,t} = \sum_{g=1}^{G} (N_{g,t}/N_{.,t}) X_{g,t}$ denote the average value of $X_{g,t}$
in period $t$, and let $X_{.,.} = \sum_{g,t} (N_{g,t}/N) X_{g,t}$ denote the average value of $X_{g,t}$. For
instance, $D_{3,.}$ and $D_{.,2}$ respectively denote the average treatment in group 3 across
time and in period 2 across groups, whereas $Y_{.,.}$ denotes the average value of the
outcome across groups and time. Finally, for any variable $X_{g,t}$, we let $\boldsymbol{X}$ denote the
vector $(X_{g,t})_{(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}}$ collecting the values of that variable in each $(g,t)$
cell. For instance, $\boldsymbol{D}$ is the vector $(D_{g,t})_{(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}}$ collecting the treatments
of all the $(g,t)$ cells.

⁴ Throughout the paper, we assume that $D_{g,t}$ in Regression 1 and $D_{g,t} - D_{g,t-1}$ in Regression 2 are
not collinear with the other independent variables in those regressions, so $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are well defined.
⁵ As the independent variables in Regression 1 are constant within each $(g,t)$ cell, Regression 1 is equivalent to
a $(g,t)$-level regression of $Y_{g,t}$ on group and period fixed effects and $D_{g,t}$, weighted by $N_{g,t}$.

II. Two-Way Fixed Effects Regressions

A. A Decomposition Result

We study the FE regression under the following common trends assumption.

ASSUMPTION 5 (Common Trends): For $t \ge 2$, $E(Y_{g,t}(0) - Y_{g,t-1}(0))$ does not
vary across $g$.

Assumption 5 requires that the expectation of the outcome without treatment
follow the same evolution over time in every group. When $t$ represents birth cohorts,
Assumption 5 requires that the outcome difference between consecutive cohorts be
the same across groups.
Let $N_1 = \sum_{i,g,t} D_{i,g,t}$ denote the number of treated units, let

$$\Delta^{TR} = \frac{1}{N_1} \sum_{(i,g,t):D_{g,t}=1} \bigl[Y_{i,g,t}(1) - Y_{i,g,t}(0)\bigr]$$

denote the average treatment effect across all treated units, and let $\delta^{TR} = E[\Delta^{TR}]$
denote the expectation of that parameter, hereafter referred to as the ATT. For any
$(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, let

$$\Delta_{g,t} = \frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} \bigl[Y_{i,g,t}(1) - Y_{i,g,t}(0)\bigr]$$

denote the ATE in cell $(g,t)$. Note that $\delta^{TR}$ is equal to the expectation of a weighted
average of the treated cells’ $\Delta_{g,t}$,

$$\delta^{TR} = E\left[\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,\Delta_{g,t}\right]. \tag{2}$$

Under the common trends assumption, we show that $\beta_{fe}$ is also equal to the expectation
of a weighted sum of the $\Delta_{g,t}$ terms, with potentially some negative weights.
Let $\varepsilon_{g,t}$ denote the residual of observations in cell $(g,t)$ in the regression of $D_{g,t}$ on
group and period fixed effects,⁶

$$D_{g,t} = \alpha + \gamma_g + \lambda_t + \varepsilon_{g,t}.$$

⁶ $\varepsilon_{g,t}$ arises from a unit-level regression, where the dependent and independent variables only vary at the $(g,t)$
level. Therefore, all the units in the same $(g,t)$ cell have the same value of $\varepsilon_{g,t}$.

One can show that if the regressors in Regression 1 are not collinear, the average
value of $\varepsilon_{g,t}$ across all treated $(g,t)$ cells differs from 0:
$\sum_{(g,t):D_{g,t}=1} (N_{g,t}/N_1)\,\varepsilon_{g,t} \ne 0$. Then we let $w_{g,t}$ denote $\varepsilon_{g,t}$
divided by that average:

$$w_{g,t} = \frac{\varepsilon_{g,t}}{\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,\varepsilon_{g,t}}.$$

THEOREM 1: Suppose that Assumptions 1–5 hold. Then,⁷

$$\beta_{fe} = E\left[\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\,\Delta_{g,t}\right].$$

This result implies that in general, $\beta_{fe} \ne \delta^{TR}$, so $\hat{\beta}_{fe}$ is a biased estimator of the ATT.
To illustrate this, we consider a simple example of a staggered adoption design with
two groups and three periods, and where the treatments are nonstochastic: group 1
is untreated at periods 1 and 2 and treated at period 3, while group 2 is untreated at
period 1 and treated both at periods 2 and 3.⁸ We also assume that $N_{g,t}/N_{g,t-1}$ does
not vary across $g$: all groups experience the same growth of their number of observations
from $t-1$ to $t$, a requirement that is for instance satisfied when the data is
a balanced panel. Then, one can show that

εg,t = Dg,t  − Dg,.  − D.,t  + D.,. ,

thus implying that

ε1,3 = 1 − 1 / 3 − 1 + 1 / 2 = 1 / 6,

ε2,2 = 1 − 2 / 3 − 1 / 2 + 1 / 2 = 1 / 3,

ε2,3 = 1 − 2 / 3 − 1 + 1 / 2 = − 1 / 6.

The residual is negative in group 2 and period 3, because the regression predicts
a treatment probability larger than one in that cell, a classic extrapolation problem
with linear regressions. Then, under the common trends assumption, it follows from
Theorem 1 and the fact that the treatments are nonstochastic that

$$\beta_{fe} = \tfrac{1}{2} E[\Delta_{1,3}] + E[\Delta_{2,2}] - \tfrac{1}{2} E[\Delta_{2,3}].$$
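
The weights in this two-group, three-period example can be reproduced numerically. The sketch below is only an illustration, not the authors' code: it assumes one observation per cell (so $N_{g,t}/N_1 = 1/3$ for each treated cell), obtains $\varepsilon_{g,t}$ by regressing $D_{g,t}$ on an intercept, a group dummy, and period dummies, and then forms $w_{g,t}$ as defined before Theorem 1. All variable names are hypothetical.

```python
import numpy as np

# Treatment in the example: rows are groups 1-2, columns are periods 1-3.
D = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
G, T = D.shape

# Cell indices in the same row-major order as D.ravel().
g = np.repeat(np.arange(G), T)
t = np.tile(np.arange(T), G)

# Regressors: intercept, group-2 dummy, period-2 and period-3 dummies.
X = np.column_stack([
    np.ones(G * T),
    (g == 1).astype(float),
    (t == 1).astype(float),
    (t == 2).astype(float),
])

# Residual of D on group and period fixed effects (assumes one observation per cell).
d = D.ravel()
coef, *_ = np.linalg.lstsq(X, d, rcond=None)
eps = (d - X @ coef).reshape(G, T)

# Closed form quoted in the text for this balanced case.
eps_closed = D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()

treated = D == 1
share = 1.0 / treated.sum()                  # N_{g,t} / N_1 with one observation per cell
w = eps / (share * eps[treated]).sum()       # w_{g,t}
theorem1_weights = share * w[treated]        # order: cells (1,3), (2,2), (2,3)
print(theorem1_weights)
```

Under these assumptions the three entries of `theorem1_weights` are the coefficients on $E[\Delta_{1,3}]$, $E[\Delta_{2,2}]$, and $E[\Delta_{2,3}]$ in the expression for $\beta_{fe}$.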

Here, $\beta_{fe}$ is equal to a weighted sum of the ATEs in group 1 at period 3, group 2 at
period 2, and group 2 at period 3, the three treated $(g,t)$ cells. However, the weight
assigned to each ATE differs from 1/3, the proportion that each cell accounts for
⁷ In the proof, we show the following, stronger result:
$$E[\hat{\beta}_{fe} \mid \boldsymbol{D}] = \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, E[\Delta_{g,t} \mid \boldsymbol{D}].$$
⁸ A similar example appears in Borusyak and Jaravel (2017).

in the population of treated observations. Therefore, $\beta_{fe}$ is not equal to $\delta^{TR}$. Perhaps
more worryingly, not all the weights are positive: the weight assigned to the ATE in
group 2, period 3 is strictly negative. Consequently, $\beta_{fe}$ may be a very misleading
measure of the treatment effect. Assume for instance that $E[\Delta_{1,3}] = E[\Delta_{2,2}] = 1$
and $E[\Delta_{2,3}] = 4$. At the period when they start receiving the treatment, both groups
experience a modest positive ATE. But this effect builds over time and in period 3,
one period after it has started receiving the treatment, group 2 now experiences a
large ATE. Then,

$$\beta_{fe} = \tfrac{1}{2} \times 1 + 1 - \tfrac{1}{2} \times 4 = -\tfrac{1}{2}.$$

Therefore, $\beta_{fe}$ is strictly negative, while $E[\Delta_{1,3}]$, $E[\Delta_{2,2}]$, and $E[\Delta_{2,3}]$ are all positive.
More generally, the negative weights are an issue if the $E[\Delta_{g,t}]$ terms are heterogeneous,
across groups or over time.⁹ If $E[\Delta_{1,3}] = E[\Delta_{2,2}] = E[\Delta_{2,3}] = 1$,
then $\beta_{fe} = 1 = \delta^{TR}$.
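
The sign reversal can also be checked by running the two-way fixed effects regression directly. The following is a minimal sketch under stated assumptions rather than a reproduction of any analysis in the paper: outcomes are built without noise as a group effect plus a period effect plus the cell's treatment effect, with one observation per cell, and the group and period effects (`alpha`, `lam`) are arbitrary hypothetical values chosen only so that common trends holds.

```python
import numpy as np

# Treatment and cell-level ATEs from the example (ATEs of 1, 1, and 4 in the treated cells).
D = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
delta = np.array([[0.0, 0.0, 1.0],
                  [0.0, 1.0, 4.0]])

# Hypothetical group and period effects; additive effects imply common trends.
alpha = np.array([[2.0], [5.0]])
lam = np.array([[0.0, 1.0, 3.0]])
Y = alpha + lam + delta * D          # noise-free outcomes, one observation per cell

G, T = D.shape
g = np.repeat(np.arange(G), T)
t = np.tile(np.arange(T), G)

# OLS of Y on an intercept, group and period dummies, and the treatment.
X = np.column_stack([
    np.ones(G * T),
    (g == 1).astype(float),
    (t == 1).astype(float),
    (t == 2).astype(float),
    D.ravel(),
])
coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)
beta_fe = coef[-1]                   # two-way FE coefficient on D

att = delta[D == 1].mean()           # average effect over the three treated cells
print(beta_fe, att)
```

In this constructed dataset the coefficient on the treatment is negative even though every treated cell has a positive effect and the average effect on the treated is 2. Changing `alpha` and `lam` should leave `beta_fe` unchanged, since the fixed effects absorb them.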
Here is some intuition as to why one weight is negative in this example. It
follows from equation (A4) in the proof of Theorem 1 (see also Theorem 1 in
Goodman-Bacon 2018) that in this simple example, $\beta_{fe} = (DID_1 + DID_2)/2$, with

$$DID_1 = E(Y_{2,2}) - E(Y_{2,1}) - \bigl(E(Y_{1,2}) - E(Y_{1,1})\bigr),$$

$$DID_2 = E(Y_{1,3}) - E(Y_{1,2}) - \bigl(E(Y_{2,3}) - E(Y_{2,2})\bigr).$$

The first DID compares the evolution of the mean outcome from period 1 to 2 in
group 2 and in group 1. The second one compares the evolution of the mean outcome
from period 2 to 3 in group 1 and in group 2. The control group in the second
DID, group 2, is treated both in the pre- and in the post-period. Therefore, under
the common trends assumption, it follows from Lemma 1 in Appendix A (a similar
result appears in Lemma 1 of de Chaisemartin 2011 and in equation (13) of
Goodman-Bacon 2018) that $DID_1 = E[\Delta_{2,2}]$, but

$$DID_2 = E[\Delta_{1,3}] - \bigl(E[\Delta_{2,3}] - E[\Delta_{2,2}]\bigr).$$

Note that $DID_2$ is equal to the ATE in group 1, period 3, minus the change in
group 2’s ATE between periods 2 and 3. Intuitively, the mean outcome of groups 1
and 2 may follow different trends from period 2 to 3 either because group 1 becomes
treated, or because group 2’s ATE changes. The intuition that negative weights arise
because $\hat{\beta}_{fe}$ uses treated observations as controls also appears in Borusyak and
Jaravel (2017).
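
As a consistency check, substituting the expressions for $DID_1$ and $DID_2$ given in the text into $\beta_{fe} = (DID_1 + DID_2)/2$ recovers the weights obtained from Theorem 1 in this example. The short derivation below uses only quantities already defined above.

```latex
\begin{align*}
\beta_{fe} &= \tfrac{1}{2}\bigl(DID_1 + DID_2\bigr) \\
           &= \tfrac{1}{2}\Bigl(E[\Delta_{2,2}] + E[\Delta_{1,3}]
              - \bigl(E[\Delta_{2,3}] - E[\Delta_{2,2}]\bigr)\Bigr) \\
           &= \tfrac{1}{2}E[\Delta_{1,3}] + E[\Delta_{2,2}] - \tfrac{1}{2}E[\Delta_{2,3}].
\end{align*}
```

The negative coefficient on $E[\Delta_{2,3}]$ comes entirely from $DID_2$, where group 2 serves as the control while being treated in both periods.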
We now generalize the previous illustration by characterizing the $(g,t)$ cells
whose ATEs are weighted negatively by $\beta_{fe}$.

PROPOSITION 1: Suppose that Assumption 1 holds and for all $t \ge 2$,
$N_{g,t}/N_{g,t-1}$ does not vary across $g$. Then, for all $(g,t,t')$ such that $D_{g,t} = D_{g,t'} = 1$,
$D_{.,t} > D_{.,t'}$ implies $w_{g,t} < w_{g,t'}$. Similarly, for all $(g,g',t)$ such
that $D_{g,t} = D_{g',t} = 1$, $D_{g,.} > D_{g',.}$ implies $w_{g,t} < w_{g',t}$.

⁹ On the other hand, $\beta_{fe}$ does not rule out heterogeneous treatment effects within $(g,t)$ cells, as it is identified by
variations across $(g,t)$ cells, and does not leverage any within-cell variation.

Proposition 1 shows that $\beta_{fe}$ is more likely to assign a negative weight to periods
where a large fraction of groups are treated, and to groups treated for many periods.
Then, negative weights are a concern when treatment effects differ between periods
with many versus few treated groups, or between groups treated for many versus
few periods.
Proposition 1 has interesting implications in staggered adoption designs, a special
case of sharp designs defined as follows.

ASSUMPTION 6 (Staggered Adoption Designs): For all $g$, $D_{g,t} \ge D_{g,t-1}$ for all
$t \ge 2$.

Assumption 6 is satisfied in applications where groups adopt a treatment at heterogeneous
dates (see, e.g., Athey and Stern 2002). In that design, Borusyak and
Jaravel (2017) shows that $\beta_{fe}$ is more likely to assign a negative weight to treatment
effects at the last periods of the panel. This result is a special case of Proposition 1:
in staggered adoption designs, $D_{.,t}$ is increasing in $t$, so Proposition 1 implies that $w_{g,t}$
is decreasing in $t$.¹⁰ Proposition 1 also implies that in that design, groups that adopt
the treatment earlier are more likely to receive some negative weights.
Finally, in staggered adoption designs, Athey and Imbens (2018) derives a
decomposition of $\beta_{fe}$ that resembles, but differs from, that in Theorem 1. They
derive their decomposition under the assumption that the dates at which each group
starts receiving the treatment are randomly assigned, while we derive ours under a
common trends assumption.

B. Robustness to Heterogeneous Treatment Effects

Theorem 1 shows that in sharp designs with many groups and periods, $\hat{\beta}_{fe}$ may
be a misleading measure of the treatment effect under the standard common trends
assumption, if the treatment effect is heterogeneous across groups and time periods.
In the corollary below, we propose two robustness measures that can be used to
assess how serious that concern is.
Those robustness measures are defined conditional on $\boldsymbol{D}$, the vector
stacking together the treatments of all the $(g,t)$ cells. Specifically, for
all $(g,t) \in \{1,\dots,G\} \times \{1,\dots,T\}$, let $\tilde{\Delta}_{g,t} = E(\Delta_{g,t} \mid \boldsymbol{D})$ denote the ATE in
cell $(g,t)$ conditional on $\boldsymbol{D}$,¹¹ let $\tilde{\Delta}^{TR} = E(\Delta^{TR} \mid \boldsymbol{D})$ denote the ATT conditional on
$\boldsymbol{D}$, and let $\tilde{\beta}_{fe} = E(\hat{\beta}_{fe} \mid \boldsymbol{D})$. The first measure we consider is the minimal value of
the standard deviation of the $\tilde{\Delta}_{g,t}$ terms under which one could have that $\tilde{\beta}_{fe}$ is of a
different sign than $\tilde{\Delta}^{TR}$. Therefore, this summary measure applies to $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$,

¹⁰ Borusyak and Jaravel (2017) assumes that the treatment effect of cell $(g,t)$ only depends on the number of
periods since group $g$ has started receiving the treatment, whereas Proposition 1 does not rely on that assumption.
¹¹ $\tilde{\Delta}_{g,t}$ may differ from $E(\Delta_{g,t})$. To see this, let us consider a simple example where
$T = 2$. Then, under Assumption 3, one has $\tilde{\Delta}_{g,t} = E(\Delta_{g,t} \mid D_{g,1}, D_{g,2})$. One may for instance have
$E(\Delta_{g,1} \mid D_{g,1} = 0, D_{g,2} = 0) < E(\Delta_{g,1} \mid D_{g,1} = 1, D_{g,2} = 1)$, if a group is more likely to be treated if its
treatment effect is initially high.

rather than $\beta_{fe}$ and $\delta^{TR}$, the unconditional expectations of $\hat{\beta}_{fe}$ and $\Delta^{TR}$ on which we
have focused so far. However, one can show that when $G$, the number of groups, goes
to infinity, $\tilde{\beta}_{fe} - \beta_{fe}$ and $\tilde{\Delta}^{TR} - \delta^{TR}$ both converge to 0. So if the number of groups is
large, $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$ should not differ much from $\beta_{fe}$ and $\delta^{TR}$, and our robustness measure
“almost” applies to $\beta_{fe}$ and $\delta^{TR}$.
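Let

$$\sigma(\tilde{\boldsymbol{\Delta}}) = \left(\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1} \bigl(\tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR}\bigr)^2\right)^{1/2},$$

$$\sigma(\boldsymbol{w}) = \left(\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1} \bigl(w_{g,t} - 1\bigr)^2\right)^{1/2},$$

where $\sigma(\tilde{\boldsymbol{\Delta}})$ is the standard deviation of the conditional ATEs, and $\sigma(\boldsymbol{w})$ is the standard
deviation of the $w$-weights,¹² across the treated $(g,t)$ cells. Let $n = \#\{(g,t) : D_{g,t} = 1\}$
denote the number of treated cells. For every $i \in \{1,\dots,n\}$, let $w_{(i)}$ denote the
$i$th largest of the weights of the treated cells: $w_{(1)} \ge w_{(2)} \ge \dots \ge w_{(n)}$, and let $N_{(i)}$
and $\tilde{\Delta}_{(i)}$ be the number of observations and the conditional ATE of the corresponding
cell. Then, for any $k \in \{1,\dots,n\}$, let $P_k = \sum_{i \ge k} N_{(i)}/N_1$, $S_k = \sum_{i \ge k} (N_{(i)}/N_1)\, w_{(i)}$,
and $T_k = \sum_{i \ge k} (N_{(i)}/N_1)\, w_{(i)}^2$.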

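COROLLARY 1: Suppose that Assumptions 1–5 hold.

(i) If $\sigma(\boldsymbol{w}) > 0$, the minimal value of $\sigma(\tilde{\boldsymbol{\Delta}})$ compatible with $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR} = 0$ is

$$\underline{\sigma}_{fe} = \frac{|\tilde{\beta}_{fe}|}{\sigma(\boldsymbol{w})}.$$

(ii) If $\tilde{\beta}_{fe} \ne 0$ and at least one of the $w_{g,t}$ weights is strictly negative, the minimal
value of $\sigma(\tilde{\boldsymbol{\Delta}})$ compatible with $\tilde{\beta}_{fe}$ and with $\tilde{\Delta}_{g,t}$ of a different sign than $\tilde{\beta}_{fe}$
for all $(g,t)$ is

$$\underline{\underline{\sigma}}_{fe} = \frac{|\tilde{\beta}_{fe}|}{\bigl[T_s + S_s^2/(1 - P_s)\bigr]^{1/2}},$$

where $s = \min\{i \in \{1,\dots,n\} : w_{(i)} < -S_{(i)}/(1 - P_{(i)})\}$.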

Note that $\underline{\sigma}_{fe}$ and $\underline{\underline{\sigma}}_{fe}$ can be estimated simply by replacing $\tilde{\beta}_{fe}$ by $\hat{\beta}_{fe}$. An
estimator of $\underline{\sigma}_{fe}$ can be used to assess the robustness of $\hat{\beta}_{fe}$ to treatment effect
heterogeneity across groups and periods. If $\underline{\sigma}_{fe}$ is close to 0, $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$ can be
of opposite signs even under a small and plausible amount of treatment effect heterogeneity.
In that case, treatment effect heterogeneity would be a serious concern
for the validity of $\hat{\beta}_{fe}$. On the contrary, if $\underline{\sigma}_{fe}$ is very large, $\tilde{\beta}_{fe}$ and $\tilde{\Delta}^{TR}$ can only be
of opposite signs under a very large and implausible amount of treatment effect
heterogeneity. Then, treatment effect heterogeneity is less of a concern.
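
To make the first robustness measure concrete, the sketch below computes $|\beta_{fe}|/\sigma(\boldsymbol{w})$ from a list of treated-cell weights and cell sizes. It is an illustrative helper, not the twowayfeweights implementation: the function name `sigma_fe_lower` and its arguments are hypothetical, and it covers only part (i) of Corollary 1. It is applied to the two-group, three-period example, where the $w_{g,t}$ of the treated cells are 3/2, 3, and −3/2 and, with nonstochastic treatments and the ATEs of 1, 1, and 4 assumed earlier, the coefficient equals −1/2.

```python
import numpy as np

def sigma_fe_lower(beta_fe, w, n):
    """Return |beta_fe| / sigma(w), computed over the treated cells.

    w: weights w_{g,t} of the treated cells.
    n: number of observations N_{g,t} in each of those cells.
    """
    w = np.asarray(w, dtype=float)
    n = np.asarray(n, dtype=float)
    share = n / n.sum()                                 # N_{g,t} / N_1
    # The weights average to 1 across treated cells, hence the deviation from 1.
    sigma_w = np.sqrt(np.sum(share * (w - 1.0) ** 2))
    if sigma_w == 0:
        raise ValueError("sigma(w) = 0: the measure is only defined when sigma(w) > 0")
    return abs(beta_fe) / sigma_w

# Treated cells (1,3), (2,2), (2,3) of the example, one observation per cell (an assumption).
ratio = sigma_fe_lower(beta_fe=-0.5, w=[1.5, 3.0, -1.5], n=[1, 1, 1])
print(ratio)
```

Read through Corollary 1, the returned value is the smallest standard deviation of the conditional ATEs under which the coefficient could coexist with a conditional ATT of zero; in applied work the conditional expectation would be replaced by the estimated coefficient, as the text notes.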

¹² One can show that $\sum_{(g,t):D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$.

Similarly, if $\underline{\underline{\sigma}}_{fe}$ is close to 0, one may have, say, $\tilde{\beta}_{fe} > 0$, while $\tilde{\Delta}_{g,t} \le 0$ for
all $(g,t)$, even if the dispersion of the $\tilde{\Delta}_{g,t}$ terms is relatively small. Notice that $\underline{\underline{\sigma}}_{fe}$
is only defined if at least one of the weights is strictly negative: if all the weights are
positive, then one cannot have that $\tilde{\beta}_{fe}$ is of a different sign than all the $\tilde{\Delta}_{g,t}$ terms.
When some of the weights $w_{g,t}$ are negative, $\hat{\beta}_{fe}$ may still be robust to heterogeneous
treatment effects across groups and periods, provided the assumption below
is satisfied.

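ASSUMPTION 7 ($\boldsymbol{w}$ Uncorrelated with $\tilde{\boldsymbol{\Delta}}$):

$$E\left[\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1} \bigl(w_{g,t} - 1\bigr)\bigl(\tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR}\bigr)\right] = 0.$$

COROLLARY 2: If Assumptions 1–5 and 7 hold, then $\beta_{fe} = \delta^{TR}$.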

Assumption 7 requires that the weights attached to the fixed effects estimator
be uncorrelated with the conditional ATEs in the treated $(g,t)$ cells. This is
often implausible. For instance, groups treated the most are also those with the
lowest value of $w_{g,t}$, as shown in Proposition 1. But those groups could also be
those with the largest treatment effect. This would then induce a negative correlation
between $\boldsymbol{w}$ and $\tilde{\boldsymbol{\Delta}}$. The plausibility of Assumption 7 can be assessed
by looking at whether $\boldsymbol{w}$ is correlated with a predictor of the treatment effect
in each $(g,t)$ cell. In the two applications we revisit in Section V, this test is
rejected.

C. Extension to the First-Difference Regression

Instead of Regression 1, many articles have estimated the first-difference regression
defined below.

REGRESSION 2 (First-Difference Regression): Let $\hat{\beta}_{fd}$ denote the coefficient
of $D_{g,t} - D_{g,t-1}$ in an OLS regression of $Y_{g,t} - Y_{g,t-1}$ on period fixed effects
and $D_{g,t} - D_{g,t-1}$, among observations for which $t \ge 2$. Let $\beta_{fd} = E[\hat{\beta}_{fd}]$.

When T = 2and Ng,2/Ng,1does not vary across g , meaning that all groups expe-
rience the same growth of their number of units from period 1 to 2, one can show
that βˆ fe = βˆ fd
. But, βˆ fe
differs from βˆ fd
if T > 2or Ng,2
/Ng,1
varies across g .
We start by showing that a result similar to Theorem 1 also applies to $\hat{\beta}_{fd}$. For any $(g,t) \in \{1, \ldots, G\} \times \{2, \ldots, T\}$, let $\varepsilon_{fd,g,t}$ denote the residual of observations in group $g$ and at period $t$ in the regression of $D_{g,t} - D_{g,t-1}$ on period fixed effects, among observations for which $t \geq 2$. For any $g \in \{1, \ldots, G\}$, let $\varepsilon_{fd,g,1} = \varepsilon_{fd,g,T+1} = 0$. One can show that if the regressors in Regression 2 are not perfectly collinear,

$$\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\left(\varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\,\varepsilon_{fd,g,t+1}\right) \neq 0.$$

Then we define

$$w_{fd,g,t} = \frac{\varepsilon_{fd,g,t} - \dfrac{N_{g,t+1}}{N_{g,t}}\,\varepsilon_{fd,g,t+1}}{\displaystyle\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\left(\varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\,\varepsilon_{fd,g,t+1}\right)}.$$

THEOREM 2: Suppose that Assumptions 1–5 hold. Then,

$$\beta_{fd} = E\left[\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{fd,g,t}\, \Delta_{g,t}\right].$$

Theorem 2 shows that under Assumption 5, $\beta_{fd}$ is equal to a weighted sum of the ATEs in each treated $(g,t)$ cell with potentially some strictly negative weights, just as $\beta_{fe}$. We now characterize the $(g,t)$ cells whose ATEs are weighted negatively by $\beta_{fd}$. To do so, we focus on staggered adoption designs, as outside of this case it is more difficult to characterize those cells. Our characterization relies on the fact that for every $t \in \{2, \ldots, T\}$, $\varepsilon_{fd,g,t} = D_{g,t} - D_{g,t-1} - (D_{.,t} - D_{.,t-1})$. Here, $\varepsilon_{fd,g,t}$ is the difference between the change of the treatment in group $g$ between $t-1$ and $t$, and the average change of the treatment across all groups.

PROPOSITION 2: Suppose that Assumptions 1–2 and 6 hold and for all $g$, $N_{g,t}$ does not depend on $t$. Then, for all $(g,t)$ such that $D_{g,t} = 1$, $w_{fd,g,t} < 0$ if and only if $D_{g,t-1} = 1$ and $D_{.,t} - D_{.,t-1} > D_{.,t+1} - D_{.,t}$ (with the convention that $D_{.,T+1} = D_{.,T}$).

Proposition 2 shows that for all $t \in \{2, \ldots, T-1\}$ such that the increase in the proportion of treated units is larger from $t-1$ to $t$ than from $t$ to $t+1$, the period-$t$ ATE of groups already treated in $t-1$ receives a negative weight. Moreover, if, at period $T$, at least one group becomes treated, the ATE of groups already treated in $T-1$ also receives a negative weight. Therefore, the treatment effect arising at the date when a group starts receiving the treatment does not receive a negative weight; only long-run treatment effects do. Negative weights are thus a concern when instantaneous and long-run treatment effects may differ. Proposition 2 also shows that the prevalence of negative weights depends on how the number of groups that start receiving the treatment at date $t$ evolves with $t$. Assume for instance that this number decreases with $t$: many groups start receiving the treatment at date 1, slightly fewer start at date 2, etc., a case hereafter referred to as the “more early adopters” case. Then, if $N_{g,t}$ is constant across $(g,t)$, $D_{.,t} - D_{.,t-1}$ is decreasing in $t$, and all the long-run treatment effects receive negative weights, except maybe those of period $T$ if $D_{.,T} = D_{.,T-1}$. Conversely, assume that the number of groups that start receiving the treatment at date $t$ increases with $t$: few groups start receiving the treatment at date 1, slightly more start at date 2, etc., a case hereafter referred to as the “more late adopters” case. Then, if $N_{g,t}$ is constant across $(g,t)$, $D_{.,t} - D_{.,t-1}$ is increasing in $t$, and only the period-$T$ long-run treatment effects receive negative weights. Overall, negative weights are much more prevalent in the “more early adopters” than in the “more late adopters” case.
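The sign pattern described in Proposition 2 can be illustrated on a small staggered design. The sketch below is an illustration under stated assumptions rather than the authors' implementation: every cell is assumed to contain the same number of units, so the residual $\varepsilon_{fd,g,t}$ is simply the group's treatment change minus the average change, and the function names `fd_weights` and `predicted_negative` and the toy matrix `D` are hypothetical.

```python
import numpy as np

def fd_weights(D):
    """w_fd weights for a G x T binary design (0 is returned for untreated cells).

    Assumes every (g, t) cell has the same number of units.
    """
    G, T = D.shape
    dD = np.diff(D, axis=1)                      # D_{g,t} - D_{g,t-1}, t = 2..T
    eps = dD - dD.mean(axis=0, keepdims=True)    # residual on period fixed effects
    eps_pad = np.zeros((G, T + 1))               # eps_{fd,g,1} = eps_{fd,g,T+1} = 0
    eps_pad[:, 1:T] = eps
    num = eps_pad[:, :T] - eps_pad[:, 1:]        # eps_{fd,g,t} - eps_{fd,g,t+1}
    treated = D == 1
    denom = num[treated].sum() / treated.sum()   # treated-cell average of num
    return np.where(treated, num / denom, 0.0)

def predicted_negative(D):
    """Cells that Proposition 2 predicts to carry a strictly negative weight."""
    G, T = D.shape
    dbar = D.mean(axis=0)
    # change[k] is the change in the treated share into period k + 2,
    # with the convention that the share stays constant after period T.
    change = np.diff(np.append(dbar, dbar[-1]))
    pred = np.zeros((G, T), dtype=bool)
    for j in range(1, T):
        already = (D[:, j] == 1) & (D[:, j - 1] == 1)
        pred[:, j] = already & (change[j - 1] > change[j])
    return pred

# Hypothetical "more early adopters" design: two groups adopt at period 2,
# one at period 3, one never.
D = np.array([[0, 1, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0]], dtype=float)
w = fd_weights(D)
negative = w < -1e-12
print(np.round(w, 3))
print(negative)
```

In this toy design the only negative weights fall on the period-3 cells of the two groups that adopted at period 2, which are long-run effects, in line with the proposition.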

We now come back to general sharp designs where the treatment may not follow a staggered adoption. Let $\tilde{\beta}_{fd} = E(\hat{\beta}_{fd} \mid D)$ denote the expectation of $\hat{\beta}_{fd}$ conditional on the vector of treatment assignments $D$. Just as for $\tilde{\beta}_{fe}$, one can show that the minimal value of $\sigma(\tilde{\Delta})$ compatible with $\tilde{\beta}_{fd}$ and $\tilde{\Delta}^{TR} = 0$ is $\underline{\sigma}_{fd} = |\tilde{\beta}_{fd}| / \sigma(w_{fd})$, where

$$\sigma(w_{fd}) = \left(\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,(w_{fd,g,t} - 1)^2\right)^{1/2}$$

is the standard deviation of the $w_{fd}$-weights. One can also show that $\underline{\underline{\sigma}}_{fd}$, the minimal value of $\sigma(\tilde{\Delta})$ compatible with $\tilde{\beta}_{fd}$ and $\tilde{\Delta}_{g,t}$ of a different sign than $\tilde{\beta}_{fd}$ for all $(g,t)$, has the same expression as $\underline{\underline{\sigma}}_{fe}$, except that one needs to replace the weights $w_{g,t}$ by the weights $w_{fd,g,t}$ in its definition. Estimators of $\underline{\sigma}_{fe}$ and $\underline{\sigma}_{fd}$ (or $\underline{\underline{\sigma}}_{fe}$ and $\underline{\underline{\sigma}}_{fd}$) can then be used to determine which of $\hat{\beta}_{fe}$ or $\hat{\beta}_{fd}$ is more robust to heterogeneous treatment effects.
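As a worked illustration of the first robustness measure, the ratio $|\tilde{\beta}_{fd}|/\sigma(w_{fd})$ can be computed directly from a vector of weights and cell sizes. This is a minimal sketch with made-up numbers; the function name `sigma_lower_bound` and the inputs are hypothetical, and the weights are assumed to average to one over the treated cells once weighted by $N_{g,t}/N_1$.

```python
import numpy as np

def sigma_lower_bound(beta, w, n):
    """|beta| / sigma(w), where w and n list the weights and sizes of treated cells."""
    w = np.asarray(w, dtype=float)
    n = np.asarray(n, dtype=float)
    share = n / n.sum()                              # N_{g,t} / N_1
    sd_w = np.sqrt(np.sum(share * (w - 1.0) ** 2))   # standard deviation of the weights
    return abs(beta) / sd_w

# Two equally sized treated cells with weights 2 and 0 (made-up values).
bound = sigma_lower_bound(beta=0.5, w=[2.0, 0.0], n=[1, 1])
print(bound)
```

The more dispersed the weights, the smaller the bound, so that less treatment effect heterogeneity is needed for the coefficient and the average effect on the treated to have opposite signs.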
Finally, and similarly to the result shown in Corollary 2 for $\beta_{fe}$, $\beta_{fd}$ is equal to $\delta^{TR}$ under common trends and the following assumption.

ASSUMPTION 8 ($w_{fd}$ Uncorrelated with $\tilde{\Delta}$):

$$E\left[\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\,(w_{fd,g,t} - 1)\,(\tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR})\right] = 0.$$

Note that under the common trends assumption, one can jointly test Assumption 8 and Assumption 7, the assumption that the weights attached to $\beta_{fe}$ are uncorrelated with the $\Delta_{g,t}$ terms: if $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are significantly different, at least one of these two assumptions must fail. In the two applications we revisit in Section V, $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are significantly different.

III. An Alternative Estimator

In this section, we show that it is possible to estimate a well-defined causal effect even if treatment effects are heterogeneous across groups or over time. Let

$$\delta^{S} = E\left[\frac{1}{N_S} \sum_{(i,g,t):\, t \geq 2,\, D_{g,t} \neq D_{g,t-1}} \left[Y_{i,g,t}(1) - Y_{i,g,t}(0)\right]\right],$$

with $N_S = \sum_{(g,t):\, t \geq 2,\, D_{g,t} \neq D_{g,t-1}} N_{g,t}$. The term $\delta^{S}$ is the ATE of all switching cells. In staggered adoption designs, $\delta^{S}$ is the average of the treatment effect at the time when a group starts receiving the treatment, across all groups that become treated at some point.
We now show that $\delta^{S}$ can be unbiasedly estimated by a weighted average of DID estimators. This result holds under the following supplementary assumptions.

ASSUMPTION 9 (Strong Exogeneity for $Y(1)$): For all $(g,t) \in \{1, \ldots, G\} \times \{2, \ldots, T\}$, $E(Y_{g,t}(1) - Y_{g,t-1}(1) \mid D_{g,1}, \ldots, D_{g,T}) = E(Y_{g,t}(1) - Y_{g,t-1}(1))$.

Assumption 9 is the equivalent of Assumption 4, for the potential outcome with treatment. It requires that the shocks affecting a group's $Y_{g,t}(1)$ be mean independent of that group's treatment sequence.

ASSUMPTION 10 (Common Trends for $Y(1)$): For $t \geq 2$, $E(Y_{g,t}(1) - Y_{g,t-1}(1))$ does not vary across $g$.

Again, Assumption 10 is the equivalent of Assumption 5, for the potential outcome with treatment. It requires that between each pair of consecutive periods, the expectation of the outcome with treatment follow the same evolution over time in every group. Assumptions 9 and 10 ensure that one can reconstruct the potential outcome that groups leaving the treatment between $t-1$ and $t$ would have experienced if they had remained treated. In staggered adoption designs, Assumptions 9 and 10 are not necessary for identification, because no group leaves the treatment. Together, Assumptions 5 and 10 imply that the ATE follows the same evolution over time in every group: $E(\Delta_{g,t}) = \eta_t + \theta_g$.13 This still allows for heterogeneous treatment effects across groups and over time.14

ASSUMPTION 11 (Existence of “Stable” Groups): For all $t \in \{2, \ldots, T\}$:

(i) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-1} = 0$, $D_{g,t} = 1$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-1} = D_{g',t} = 0$.

(ii) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-1} = 1$, $D_{g,t} = 0$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-1} = D_{g',t} = 1$.

The first point of the stable groups assumption requires that between each pair of
consecutive time periods, if there is a “joiner” (i.e., a group switching from being
untreated to treated), then there should be another group that is untreated at both
dates. The second point requires that between each pair of consecutive time periods,
if there is a “leaver” (i.e., a group switching from being treated to untreated), then
there should be another group that is treated at both dates.
Notice that under Assumption 11, groups’ treatments are not indepen-
dent, so Assumption 3 cannot hold. Accordingly, we replace Assumption 3 by
Assumption 12. Assumption 12 requires that conditional on its own treatments, a
group’s outcomes be mean independent of the other groups’ treatments. It is weaker
than Assumption 3. Assumption 11 is necessary to show that our estimator is unbi-
ased, but it is not necessary to show that it is consistent. Accordingly, in Section 5 of
the online Appendix, we show that our estimator is consistent under Assumption 3.
For every $g \in \{1, \ldots, G\}$, let $D_g = (D_{1,g}, \ldots, D_{T,g})$.

ASSUMPTION 12 (Mean Independence between a Group's Outcome and Other Groups' Treatments): For all $g$ and $t$, $E(Y_{g,t}(0) \mid D) = E(Y_{g,t}(0) \mid D_g)$ and $E(Y_{g,t}(1) \mid D) = E(Y_{g,t}(1) \mid D_g)$.

13 It should be possible to weaken Assumptions 9–10, in particular to account for dynamic effects where $\Delta_{g,t}$ may depend on $(D_{g,1}, \ldots, D_{g,t-1})$. This introduces complications that are beyond the scope of this paper, but that we address in de Chaisemartin and D'Haultfœuille (2020a).

14 Imposing Assumptions 9 and 10 does not change the decompositions obtained in Theorems 1 and 2; $Y_{g,t}(1)$ is observed for all the treated $(g,t)$ cells entering these decompositions, so those assumptions do not bring identifying information for those cells.

We can now define our estimator. For all $t \in \{2, \ldots, T\}$ and for all $(d, d') \in \{0, 1\}^2$, let

$$N_{d,d',t} = \sum_{g:\, D_{g,t}=d,\, D_{g,t-1}=d'} N_{g,t} \qquad (3)$$

denote the number of observations with treatment $d'$ at period $t-1$ and $d$ at period $t$. Let

$$\mathrm{DID}_{+,t} = \sum_{g:\, D_{g,t}=1,\, D_{g,t-1}=0} \frac{N_{g,t}}{N_{1,0,t}}\,(Y_{g,t} - Y_{g,t-1}) \;-\; \sum_{g:\, D_{g,t}=D_{g,t-1}=0} \frac{N_{g,t}}{N_{0,0,t}}\,(Y_{g,t} - Y_{g,t-1}),$$

$$\mathrm{DID}_{-,t} = \sum_{g:\, D_{g,t}=D_{g,t-1}=1} \frac{N_{g,t}}{N_{1,1,t}}\,(Y_{g,t} - Y_{g,t-1}) \;-\; \sum_{g:\, D_{g,t}=0,\, D_{g,t-1}=1} \frac{N_{g,t}}{N_{0,1,t}}\,(Y_{g,t} - Y_{g,t-1}).$$

Note that $\mathrm{DID}_{+,t}$ is not defined when there is no group such that $D_{g,t} = 1$, $D_{g,t-1} = 0$, or no group such that $D_{g,t} = 0$, $D_{g,t-1} = 0$. In such instances, we let $\mathrm{DID}_{+,t} = 0$. Similarly, let $\mathrm{DID}_{-,t} = 0$ when there is no group such that $D_{g,t} = 1$, $D_{g,t-1} = 1$ or no group such that $D_{g,t} = 0$, $D_{g,t-1} = 1$. Finally, let

$$\mathrm{DID}_M = \sum_{t=2}^{T} \left(\frac{N_{1,0,t}}{N_S}\,\mathrm{DID}_{+,t} + \frac{N_{0,1,t}}{N_S}\,\mathrm{DID}_{-,t}\right).$$

THEOREM 3: If Assumptions 1, 2, 4, 5, and 9–12 hold, $E[\mathrm{DID}_M] = \delta^{S}$.

In online Appendix Section 5, we also show that when $G$ goes to infinity, $\mathrm{DID}_M$ is a consistent and asymptotically normal estimator of $\delta^{S}$. The $\mathrm{DID}_M$ estimator is computed by the fuzzydid and did_multiplegt Stata packages.
Here is the intuition underlying Theorem 3. The estimator $\mathrm{DID}_{+,t}$ compares the evolution of the mean outcome between $t-1$ and $t$ in two sets of groups: the joiners, and those remaining untreated. Under Assumptions 4 and 5, $\mathrm{DID}_{+,t}$ estimates the joiners' treatment effect. Similarly, $\mathrm{DID}_{-,t}$ compares the evolution of the outcome between $t-1$ and $t$ in two sets of groups: those remaining treated, and the leavers. Under Assumptions 9 and 10, it estimates the leavers' treatment effect. Finally, $\mathrm{DID}_M$ is a weighted average of those DID estimators. Note that in staggered designs, there are no groups whose treatment decreases over time, so $\mathrm{DID}_M$ is only a weighted average of the $\mathrm{DID}_{+,t}$ estimators. Note also that one can separately estimate the joiners' and the leavers' treatment effect, by computing separately weighted averages of the $\mathrm{DID}_{+,t}$ and $\mathrm{DID}_{-,t}$ estimators. The former estimator only relies on Assumptions 4 and 5, while the latter only relies on Assumptions 9 and 10.
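The construction of $\mathrm{DID}_M$ can be sketched in a few lines of Python for a binary treatment. This is an illustrative sketch, not the code of the fuzzydid or did_multiplegt packages: it takes group-level mean outcomes as input, and the function name `did_m` and the noise-free toy data (with $Y_{g,t}(0) = \alpha_g + \lambda_t$ and $\Delta_{g,t} = \theta_g + \eta_t$) are hypothetical.

```python
import numpy as np

def did_m(Y, D, N):
    """DID_M from G x T arrays of mean outcomes Y, binary treatments D, cell sizes N.

    Assumes at least one cell switches treatment between consecutive periods.
    """
    G, T = D.shape
    dY = Y[:, 1:] - Y[:, :-1]                    # Y_{g,t} - Y_{g,t-1}
    total, n_switch = 0.0, 0.0
    for j in range(1, T):
        now, prev, n = D[:, j], D[:, j - 1], N[:, j]

        def wavg(mask):
            return np.average(dY[mask, j - 1], weights=n[mask])

        joiners = (now == 1) & (prev == 0)
        stay0 = (now == 0) & (prev == 0)
        stay1 = (now == 1) & (prev == 1)
        leavers = (now == 0) & (prev == 1)
        # Each DID term is set to 0 when one of its two sets of groups is empty.
        if joiners.any() and stay0.any():
            total += n[joiners].sum() * (wavg(joiners) - wavg(stay0))
        if leavers.any() and stay1.any():
            total += n[leavers].sum() * (wavg(stay1) - wavg(leavers))
        n_switch += n[joiners].sum() + n[leavers].sum()
    return total / n_switch

# Hypothetical noise-free panel satisfying common trends for Y(0) and Y(1).
D = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]], dtype=float)
N = np.array([[10] * 3, [20] * 3, [5] * 3, [8] * 3], dtype=float)
alpha = np.array([1.0, 2.0, 3.0, 4.0])
lam = np.array([0.0, 0.5, 1.5])
theta = np.array([1.0, 2.0, 0.0, -1.0])
eta = np.array([0.2, 0.3, 0.1])
Delta = theta[:, None] + eta[None, :]            # cell-level treatment effects
Y = alpha[:, None] + lam[None, :] + D * Delta
estimate = did_m(Y, D, N)
print(estimate)
```

Because the toy data contain no noise, the estimate coincides with the size-weighted average of the treatment effects in the three switching cells, even though those effects differ across groups and periods.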
Note that $\mathrm{DID}_M$ is related to two other estimators. First, it is related to the Wald-TC estimator in point 2 of Theorem S1 in the online Appendix of de Chaisemartin and D'Haultfœuille (2018), but the weighting of $\mathrm{DID}_{+,t}$ and $\mathrm{DID}_{-,t}$ therein differs. As a result, $\mathrm{DID}_M$ estimates $\Delta^{S}$ under weaker assumptions. Second, $\mathrm{DID}_M$ is related to the multiperiod DID estimator in Imai and Kim (2018). However, the multiperiod DID estimator is a weighted average of the $\mathrm{DID}_{+,t}$, so it does not estimate the leavers' treatment effect, and applies to a smaller population. Besides, Imai and Kim (2018) do not establish the properties of their estimator. Finally, they do not generalize it to nonbinary treatments, something we do in online Appendix Section 4.
There may be a bias-variance trade-off between $\mathrm{DID}_M$ and the two-way fixed effects regression estimators. For instance, assume that Regression 1 is correctly specified:

$$Y_{g,t}(0) = \alpha_g + \lambda_t + \varepsilon_{g,t},$$

$$Y_{g,t}(1) - Y_{g,t}(0) = \delta,$$

$$E(\varepsilon_{g,t} \mid D) = 0.$$

Then, if the errors $\varepsilon_{g,t}$ are homoskedastic and uncorrelated, it follows from the Gauss-Markov theorem that $\hat{\beta}_{fe}$ is the unbiased linear estimator of $\delta$, the constant treatment effect parameter, with the lowest variance. As $\mathrm{DID}_M$ is also an unbiased linear estimator of $\delta$, the variance of $\hat{\beta}_{fe}$ must be lower than that of $\mathrm{DID}_M$. With heteroskedastic or correlated errors, one can construct examples where the variance of $\hat{\beta}_{fe}$ is higher than that of $\mathrm{DID}_M$, but this still suggests that $\mathrm{DID}_M$ may often have a larger variance than $\hat{\beta}_{fe}$, as we find in our applications in Section V.
Note that $\mathrm{DID}_M$ uses groups whose treatment is stable to infer the trends that would have affected switchers if their treatment had not changed. This strategy could fail if switchers experience different trends than groups whose treatment is stable. To assess whether this is a serious concern, we propose to use the following placebo estimator, which essentially compares the outcome's evolution from $t-2$ to $t-1$ in groups that switch and do not switch treatment between $t-1$ and $t$. This placebo estimator is defined under a modified version of Assumption 11.

ASSUMPTION 13 (Existence of “Stable” Groups for the Placebo Test): For all $t \in \{3, \ldots, T\}$:

(i) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-2} = D_{g,t-1} = 0$ and $D_{g,t} = 1$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-2} = D_{g',t-1} = D_{g',t} = 0$.

(ii) If there is at least one $g \in \{1, \ldots, G\}$ such that $D_{g,t-2} = D_{g,t-1} = 1$, $D_{g,t} = 0$, then there exists at least one $g' \neq g$, $g' \in \{1, \ldots, G\}$ such that $D_{g',t-2} = D_{g',t-1} = D_{g',t} = 1$.

For all $t \in \{3, \ldots, T\}$ and for all $(d, d', d'') \in \{0, 1\}^3$, let

$$N_{d,d',d'',t} = \sum_{g:\, D_{g,t}=d,\, D_{g,t-1}=d',\, D_{g,t-2}=d''} N_{g,t}$$

denote the number of observations with treatment status $d''$ at period $t-2$, $d'$ at period $t-1$, and $d$ at period $t$. Let

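$$N_S^{pl} = \sum_{(g,t):\, t \geq 3,\, D_{g,t} \neq D_{g,t-1} = D_{g,t-2}} N_{g,t},$$

$$\mathrm{DID}^{pl}_{+,t} = \sum_{g:\, D_{g,t}=1,\, D_{g,t-1}=D_{g,t-2}=0} \frac{N_{g,t}}{N_{1,0,0,t}}\,(Y_{g,t-1} - Y_{g,t-2}) \;-\; \sum_{g:\, D_{g,t}=D_{g,t-1}=D_{g,t-2}=0} \frac{N_{g,t}}{N_{0,0,0,t}}\,(Y_{g,t-1} - Y_{g,t-2}),$$

$$\mathrm{DID}^{pl}_{-,t} = \sum_{g:\, D_{g,t}=D_{g,t-1}=D_{g,t-2}=1} \frac{N_{g,t}}{N_{1,1,1,t}}\,(Y_{g,t-1} - Y_{g,t-2}) \;-\; \sum_{g:\, D_{g,t}=0,\, D_{g,t-1}=D_{g,t-2}=1} \frac{N_{g,t}}{N_{0,1,1,t}}\,(Y_{g,t-1} - Y_{g,t-2}).$$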

When there is no group such that $D_{g,t} = 1$, $D_{g,t-1} = D_{g,t-2} = 0$ or no group such that $D_{g,t} = D_{g,t-1} = D_{g,t-2} = 0$, we let $\mathrm{DID}^{pl}_{+,t} = 0$, and we adopt the same convention for $\mathrm{DID}^{pl}_{-,t}$. Let

$$\mathrm{DID}_M^{pl} = \sum_{t=3}^{T} \left(\frac{N_{1,0,0,t}}{N_S^{pl}}\,\mathrm{DID}^{pl}_{+,t} + \frac{N_{0,1,1,t}}{N_S^{pl}}\,\mathrm{DID}^{pl}_{-,t}\right).$$

THEOREM 4: If Assumptions 1, 2, 4, 5, 9, 10, 12, and 13 hold, then $E[\mathrm{DID}_M^{pl}] = 0$.

The $\mathrm{DID}^{pl}_{+,t}$ estimator compares the evolution of the mean outcome from $t-2$ to $t-1$ in two sets of groups: those untreated at $t-2$ and $t-1$ but treated at $t$, and those untreated at $t-2$, $t-1$, and $t$. If Assumptions 4 and 5 hold, then $E[\mathrm{DID}^{pl}_{+,t}] = 0$. Similarly, if Assumptions 9 and 10 hold, $E[\mathrm{DID}^{pl}_{-,t}] = 0$. Then, $E[\mathrm{DID}_M^{pl}] = 0$ is a testable implication of Assumptions 4, 5, 9, and 10, so finding $\mathrm{DID}_M^{pl}$ significantly different from 0 would imply that those assumptions are violated: groups that switch treatment experience different trends before that switch than the groups used to reconstruct their counterfactual trends when they switch.15
Note that $\mathrm{DID}_M^{pl}$ compares the trends of switching and stable groups one period before the switch. One can define other placebo estimators comparing those trends, say, two or three periods before the switch. The $\mathrm{DID}_M^{pl}$ estimator and all those other placebo estimators are computed by the did_multiplegt Stata package.
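The one-period placebo can be sketched along the same lines as the main estimator. The code below is a hypothetical illustration for a binary treatment, not the did_multiplegt implementation; the function name `did_m_placebo` and the toy panel are assumptions made for the example.

```python
import numpy as np

def did_m_placebo(Y, D, N):
    """One-period placebo from G x T arrays of mean outcomes, treatments, cell sizes.

    Assumes at least one cell switches treatment after two stable periods.
    """
    G, T = D.shape
    total, n_switch = 0.0, 0.0
    for j in range(2, T):
        d2, d1, d0 = D[:, j - 2], D[:, j - 1], D[:, j]
        n = N[:, j]
        lag_dy = Y[:, j - 1] - Y[:, j - 2]       # outcome change before the switch

        def wavg(mask):
            return np.average(lag_dy[mask], weights=n[mask])

        stable0 = (d2 == 0) & (d1 == 0)
        stable1 = (d2 == 1) & (d1 == 1)
        joiners, stay0 = stable0 & (d0 == 1), stable0 & (d0 == 0)
        leavers, stay1 = stable1 & (d0 == 0), stable1 & (d0 == 1)
        if joiners.any() and stay0.any():
            total += n[joiners].sum() * (wavg(joiners) - wavg(stay0))
        if leavers.any() and stay1.any():
            total += n[leavers].sum() * (wavg(stay1) - wavg(leavers))
        n_switch += n[joiners].sum() + n[leavers].sum()
    return total / n_switch

# Hypothetical noise-free panel with common trends and one unit per cell.
D = np.array([[0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0],
              [1, 1, 1, 0], [1, 1, 1, 1]], dtype=float)
N = np.ones_like(D)
alpha = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lam = np.array([0.0, 0.5, 1.5, 1.7])
theta = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
eta = np.array([0.2, 0.3, 0.1, 0.4])
Y = alpha[:, None] + lam[None, :] + D * (theta[:, None] + eta[None, :])
placebo_common_trends = did_m_placebo(Y, D, N)

# Add a differential pretrend to the first group, one period before it joins.
Y_pre = Y.copy()
Y_pre[0, 1] += 0.5
placebo_pretrend = did_m_placebo(Y_pre, D, N)
print(placebo_common_trends, placebo_pretrend)
```

Under common trends the placebo is zero up to floating-point error, while the artificial pretrend of the first group moves it away from zero.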

15 See also Callaway and Sant'Anna (2018), which proposes another placebo test in staggered adoption designs.

IV. Extensions

In this section, we briefly review some of the extensions in our online Appendix. First, we show that the decomposition of $\beta_{fe}$ in Theorem 1 can be extended to fuzzy designs where the treatment varies within $(g,t)$ cells and to applications with a nonbinary treatment.16 In fuzzy designs or with a nonbinary treatment, the weights in Theorem 1 remain essentially unchanged.
Second, we consider two-way fixed effects regressions with covariates. Specifically, we study the coefficient of $D_{g,t}$ in a regression of $Y_{i,g,t}$ on group and period fixed effects, $D_{g,t}$, and a vector of covariates $X_{g,t}$. We show that a result very similar to Theorem 1 applies to that coefficient, up to two differences. First, including covariates allows for different trends across groups, provided those differential trends are fully accounted for by a linear model in $X_{g,t} - X_{g,t-1}$, the change in a group's covariates. Specifically, instead of Assumptions 4 and 5, one needs to assume that

$$E(Y_{g,t}(0) \mid D_g, X_g) - E(Y_{g,t-1}(0) \mid D_g, X_g) = (X_{g,t} - X_{g,t-1})'\gamma + \lambda_t,$$

for some vector $\gamma$ and constant $\lambda_t$, and where $X_g = (X_{g,1}, \ldots, X_{g,T})$. Importantly, when the covariates are group-specific linear trends, the equation above is equivalent to

$$E(Y_{g,t}(0) \mid D_g, X_g) - E(Y_{g,t-1}(0) \mid D_g, X_g) = \gamma_g + \lambda_t,$$

meaning that from $t-1$ to $t$, the evolution of $Y(0)$ in group $g$ should deviate from its group-specific linear trend $\gamma_g$ by an amount $\lambda_t$ common to all groups. Second, the residual $\varepsilon_{g,t}$ in the weights in Theorem 1 has to be replaced by $\varepsilon^{X}_{g,t}$, the residual of observations in cell $(g,t)$ in the regression of $D_{g,t}$ on group and period fixed effects and $X_{g,t}$. Some of the corresponding weights may still be negative, as in Theorem 1. Overall, two-way fixed effects regressions with covariates may rely on a more plausible common trends assumption than those without covariates, but they still require that the treatment effect be homogeneous, across time and between groups.
Third, we show that under the common trends assumption and the assumption that the ATE of a $(g,t)$ cell does not change over time, $\beta_{fe}$ and $\beta_{fd}$ identify weighted sums of the ATEs of the $(g,t)$ cells whose treatment changes between $t-1$ and $t$. In sharp designs, the weights attached to $\beta_{fd}$ are all positive, while for $\beta_{fe}$, the same only holds in staggered adoption designs.
Fourth, we show that our $\mathrm{DID}_M$ estimator can easily be extended to nonbinary, discrete treatments. Then, we define it as a weighted average of DID terms comparing the evolution of the outcome in groups whose treatment went from $d$ to $d'$ between $t-1$ and $t$ and in groups with a treatment of $d$ at both dates, across all possible values of $d$, $d'$, and $t$.
Finally, our twowayfeweights, fuzzydid, and did_multiplegt Stata packages can handle all of those extensions.

16 The decomposition of $\beta_{fd}$ in Theorem 2 can also be extended to all of those cases.

Table 1. Papers Using Two-Way Fixed Effects Regressions Published in the AER

                                                         2010   2011   2012   Total
Papers using two-way fixed effects regressions              5     14     14      33
Percent of published papers                               5.2   12.2   11.2     9.8
Percent of empirical papers, excluding lab experiments   12.8   23.0   19.2    19.1

Note: This table reports the number of papers using two-way fixed effects regressions published in the AER from 2010 to 2012.

Table 2. Descriptive Statistics on Two-Way Fixed Effects Papers

                                                                                    Number of papers
Panel A. Estimation method
Fixed effects OLS regression                                                                      13
First-difference OLS regression                                                                    6
Fixed effects or first-difference OLS regression, with several treatment variables                 6
Fixed effects or first-difference 2SLS regression                                                  3
Other regression                                                                                   5

Panel B. Research design
Sharp design                                                                                      26
Fuzzy design                                                                                       7

Panel C. Are there stable groups?
Yes                                                                                               12
Presumably yes                                                                                    14
Presumably no                                                                                      5
No                                                                                                 2

Note: This table reports the estimation method and the research design used in the 33 papers using two-way fixed effects regressions published in the AER from 2010 to 2012, and whether those papers have stable groups.

V. Applicability, and Applications

A. Applicability

We conducted a review of all papers published in the American Economic Review (AER) between 2010 and 2012 to assess the importance of two-way fixed effects regressions in economics. Over these three years, the AER published 337 papers. Out of these 337 papers, 33, or 9.8 percent, estimate the FE or FD regression, or other regressions closely resembling those regressions. When one removes theory papers and lab experiments from the denominator, the proportion of papers using these regressions rises to 19.1 percent.
Table 2 shows descriptive statistics about the 33 2010–2012 AER papers estimating two-way fixed effects regressions. Panel A shows that 13 use the FE regression; 6 use the FD regression; 6 use the FE or FD regression with several treatment variables; 3 use the FE or FD 2SLS regression discussed in online Appendix Section 3.4; 5 use other regressions that we deemed sufficiently close to the FE or FD regression to include them in our count.17 Panel B shows that more than three-fourths of those papers consider sharp designs, while less than one-fourth consider fuzzy designs. Finally, panel C assesses whether, in those applications, there are groups whose exposure to the treatment remains stable between each pair of consecutive time periods, the condition that has to be met to be able to compute the $\mathrm{DID}_M$ estimator. For about one-half of the papers, reading the paper was not enough to assess this with certainty. We then assessed whether they presumably have stable groups. Overall, 12 papers have stable groups, 14 presumably have stable groups, 5 presumably do not have stable groups, and 2 do not have stable groups.

17 For instance, two papers use regressions with three-way fixed effects instead of two-way fixed effects.
In online Appendix Section 6, we review each of the 33 papers. We explain where
two-way fixed effects regressions are used in the paper, and we detail our assess-
ment of whether the design is a sharp or a fuzzy design, and of whether the stable
groups assumption holds.

B. Application to Gentzkow, Shapiro, and Sinkinson (2011)

Gentzkow, Shapiro, and Sinkinson (2011) studies the effect of newspapers on voters' turnout in US presidential elections between 1868 and 1928. The authors regress the first difference of the turnout rate in county $g$ between election years $t-1$ and $t$ on state-year fixed effects and on the first difference of the number of newspapers available in that county. This corresponds to Regression 2, with state-year fixed effects as controls. As reproduced in Table 3, Gentzkow, Shapiro, and Sinkinson (2011) finds that $\hat{\beta}_{fd} = 0.0026$ (standard error $= 9 \times 10^{-4}$). According to this regression, one more newspaper increased voters' turnout by 0.26 percentage points. On the other hand, $\hat{\beta}_{fe} = -0.0011$ (standard error $= 0.0011$). Here, $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are significantly different (t-statistic = 2.86).
We use the twowayfeweights Stata package, downloadable with its help file from the SSC repository, to estimate the weights attached to $\hat{\beta}_{fe}$. We find that 60 percent are strictly positive and 40 percent are strictly negative. The negative weights sum to −0.53. We find $\hat{\underline{\sigma}}_{fe} = 3 \times 10^{-4}$, meaning that $\beta_{fe}$ and the ATT may be of opposite signs if the standard deviation of the ATEs across all the treated $(g,t)$ cells is equal to 0.0003.18 Further, $\hat{\underline{\underline{\sigma}}}_{fe} = 7 \times 10^{-4}$, meaning that $\beta_{fe}$ may be of a different sign than the ATEs of all the treated $(g,t)$ cells if the standard deviation of those ATEs is equal to 0.0007. We also estimate the weights attached to $\hat{\beta}_{fd}$. Here, 54 percent are strictly positive, and 46 percent are strictly negative. The negative weights sum to −1.43. We find $\hat{\underline{\sigma}}_{fd} = 4 \times 10^{-4}$, and $\hat{\underline{\underline{\sigma}}}_{fd} = 6 \times 10^{-4}$.
Therefore, $\beta_{fe}$ and $\beta_{fd}$ can only receive a causal interpretation if the weights attached to them are uncorrelated with the intensity of the treatment effect in each county × election-year cell (Assumptions 7 and 8, respectively). This is not warranted. First, as $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ significantly differ, Assumptions 7 and 8 cannot jointly hold. Moreover, the weights attached to $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ are correlated with variables that are likely to be themselves associated with the intensity of the treatment effect in each cell. For instance, the correlation between the weights attached to $\hat{\beta}_{fd}$ and $t$, the year variable, is equal to −0.06 (t-statistic = −3.28). The effect of newspapers may be different in the last than in the first years of the panel. For instance, new means of communication, like the radio, appear at the end of the period under consideration, and may diminish the effect of newspapers.19 This would lead to a violation of Assumption 8.

18 The number of newspapers is not binary, so strictly speaking, in this application the parameter of interest is the average causal response parameter introduced in online Appendix Section 3.2, rather than the ATT.

Table 3. Estimates of the Effect of One Additional Newspaper on Turnout

                                          Estimate   Standard error   Observations
$\hat{\beta}_{fd}$                          0.0026           0.0009         15,627
$\hat{\beta}_{fe}$                         −0.0011           0.0011         16,872
$\mathrm{DID}_M$                            0.0043           0.0014         16,872
$\mathrm{DID}_M^{pl}$                      −0.0009           0.0016         13,221
$\mathrm{DID}_M$, on placebo subsample      0.0045           0.0019         13,221

Notes: This table reports estimates of the effect of one additional newspaper on turnout, as well as a placebo estimate of the common trends assumption underlying $\mathrm{DID}_M$. Estimators are computed using the data of Gentzkow, Shapiro, and Sinkinson (2011), with state-year fixed effects as controls. Standard errors are clustered by county. To compute the $\mathrm{DID}_M$ estimators, the number of newspapers is grouped into 4 categories: 0, 1, 2, and more than 3.
The stable groups assumption holds: between each pair of consecutive elections, there are counties where the number of newspapers does not change. We use the fuzzydid Stata package, downloadable with its help file from the SSC repository, to estimate a modified version of our $\mathrm{DID}_M$ estimator that accounts for the fact that the number of newspapers is not binary (see online Appendix Section 3.2, where we define this modified estimator). We include state-year fixed effects as controls in our estimation. We find that $\mathrm{DID}_M = 0.0043$, with a standard error of 0.0014. Therefore, $\mathrm{DID}_M$ is 66 percent larger than $\hat{\beta}_{fd}$, and the two estimators are significantly different at the 10 percent level (t-statistic = 1.77); $\mathrm{DID}_M$ is also of a different sign than $\hat{\beta}_{fe}$.
Our $\mathrm{DID}_M$ estimator only relies on a common trends assumption. To assess its plausibility, we compute $\mathrm{DID}_M^{pl}$, the placebo estimator introduced in Section III.20 As shown in Table 3, our placebo estimator is small and not significantly different from 0, meaning that counties where the number of newspapers increased or decreased between $t-1$ and $t$ did not experience significantly different trends in turnout from $t-2$ to $t-1$ than counties where that number was stable. Our placebo estimator is estimated on a subset of the data: for each pair of consecutive time periods $t-1$ and $t$, we only keep counties where the number of newspapers did not change between $t-2$ and $t-1$. Still, almost 80 percent of the county × election-year observations are used in the computation of the placebo estimator. Moreover, when reestimated on this subsample, the $\mathrm{DID}_M$ estimator is very close to the $\mathrm{DID}_M$ estimator in the full sample.

C. The Effect of Union Membership on Wages

A number of articles have estimated the effect of union membership on wages using panel data and controlling for workers' fixed effects. For instance, Jakubson (1991) has found an 8.3 percent union membership premium using that strategy, in a sample of American males from the PSID followed from 1976 to 1980. Vella and Verbeek (1998) estimates a similar regression and finds similar results, in a sample of young American males from the NLSY followed from 1980 to 1987.21

19 In fact, Gentzkow, Shapiro, and Sinkinson (2011) analyzes the 1868 to 1928 period separately from later periods, because the growth of the radio may have changed newspapers' effects.

20 Again, we need to slightly modify $\mathrm{DID}_M^{pl}$ to account for the fact that the number of newspapers is not binary.

Table 4. Estimates of the Union Premium

                          Estimate   Standard error   Observations
$\hat{\beta}_{fe}$           0.107            0.030          4,360
$\hat{\beta}_{fd}$           0.060            0.032          3,815
$\mathrm{DID}_M$             0.041            0.034          3,815
$\mathrm{DID}_M^{pl}$        0.094            0.038          3,101
$\mathrm{DID}_M^{pl,2}$     −0.041            0.030          2,458
$\mathrm{DID}_M^{pl,3}$     −0.004            0.033          1,881

Notes: This table reports estimates of the union premium, as well as placebo estimators of the common trends assumption. Estimators are computed using the data of Vella and Verbeek (1998). Standard errors are clustered at the worker level.
We use the data in Vella and Verbeek (1998) to compute various estimators of the union wage premium. As union status is often measured with error (see, e.g., Freeman 1984; Card 1996), we discard changes in union status happening twice in three consecutive years. Specifically, for individuals with $D_{i,t-1} = 0$, $D_{i,t} = 1$, and $D_{i,t+1} = 0$, we replace $D_{i,t}$ by 0. Similarly, for individuals with $D_{i,t-1} = 1$, $D_{i,t} = 0$, and $D_{i,t+1} = 1$, we replace $D_{i,t}$ by 1. Doing so, we discard half of the union status changes in the initial data.22
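The recoding rule can be sketched as a short function applied to one worker's sequence of union statuses. This is a hypothetical illustration: the function name `smooth_status` is made up, and the sketch assumes the rule is evaluated on the original statuses of the neighboring years rather than on already-recoded values, a detail the text does not specify.

```python
def smooth_status(d):
    """Undo one-year reversals in a 0/1 status sequence.

    A status that differs from both of its neighbors, which agree with each
    other, is replaced by the neighbors' value. Neighbors are read from the
    original sequence (an assumption of this sketch).
    """
    d = list(d)
    out = d.copy()
    for t in range(1, len(d) - 1):
        if d[t - 1] == d[t + 1] != d[t]:
            out[t] = d[t - 1]
    return out

print(smooth_status([0, 1, 0, 0]))
print(smooth_status([1, 0, 1, 1]))
print(smooth_status([0, 1, 1, 0]))
```

A spell lasting two or more years, as in the third call, is left untouched, since only changes reversed in the following year are treated as likely measurement error.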
We start by estimating a two-way fixed effects regression of wages on union membership with worker and year fixed effects. Table 4 shows that $\hat{\beta}_{fe} = 0.107$ (standard error = 0.030), a result close to that of the worker fixed effects regressions in Jakubson (1991) and Vella and Verbeek (1998).
Then, we estimate the weights attached to $\hat{\beta}_{fe}$. Here, 820 are strictly positive, 196 are strictly negative, but the negative weights only sum to −0.01. Still, $\hat{\underline{\sigma}}_{fe} = 0.097$, meaning that $\beta_{fe}$ and the ATT may be of opposite signs if the standard deviation of the treatment effect across the unionized worker × year observations is equal to 0.097, a substantial but still possible amount of heterogeneity. The weights are negatively correlated with workers' years of schooling (correlation = −0.12, t-statistic = −1.88). The union premium may be lower for more educated workers (see Freeman and Medoff 1984), as they may be less substitutable than less educated ones. Then, $\hat{\beta}_{fe}$ may overestimate $\delta^{TR}$, the average union premium across all unionized worker × year observations. We also find that $\hat{\beta}_{fd} = 0.060$ (standard error = 0.032) and that $\hat{\beta}_{fe}$ and $\hat{\beta}_{fd}$ significantly differ (t-statistic = 1.91),23 thus casting further doubt on Assumptions 7 and 8.

21 The fixed effects regression is not the main specification in Vella and Verbeek (1998). The authors favor instead a dynamic selection model.

22 Keeping the original data does not change much the results presented below, except that the placebo estimator $\mathrm{DID}_M^{pl,2}$ becomes significant.

23 The standard error of $\hat{\beta}_{fe} - \hat{\beta}_{fd}$ is computed with a worker-level clustered bootstrap.

The stable groups assumption holds: between each pair of consecutive years, there
are workers whose union membership status does not change. We therefore compute
our DIDMestimator. Table 4 shows that it is equal to 0.041(standard error = 0.034).
In this case D IDMis significantly different from β ˆ fe
(t-statistic = 2.60) and βˆ fd

(t-statistic = 2.36). As discussed in Section III, we can also estimate separately
24

the union premium for workers joining and leaving a union, something that was pre-
viously done by Freeman (1984). The joiners’ effect estimate is equal to 0.059(stan-
dard error = 0.053), the leavers’ effect is equal to 0 .021(standard error = 0.044),
and the two estimates do not significantly differ (t-statistic = 0.55).
The estimator $DID_M$ relies on a common trends assumption. To assess its plausibility,
we compute $DID_M^{pl}$, the placebo estimator introduced in Section III; $DID_M^{pl}$
compares the wage growth of workers changing and not changing their union
status one period before that change. We also compute $DID_M^{pl,2}$ and $DID_M^{pl,3}$, two
other placebo estimators performing the same comparison two and three periods
before the change. As shown in Table 4, $DID_M^{pl}$ is large, positive, and significant
(t-statistic = 2.49). On the other hand, $DID_M^{pl,2}$ and $DID_M^{pl,3}$ are smaller and insignificant.
Workers that become unionized start experiencing a differential positive
pretrend one year before becoming unionized. This differential pretrend mostly
comes from union joiners: for them, the placebo estimator is equal to 0.119 (standard
error = 0.051), while for union leavers the placebo is smaller (0.061) and
insignificant (standard error = 0.057). Therefore, the placebos suggest that even
the already small and insignificant $DID_M$ estimator may overestimate the union premium,
due to a positive pretrend. In fact, the estimate of leavers’ effect, for which
there is no evidence of a pretrend, is very close to 0. Overall, our results indicate that
there may not be a significant union wage premium.

VI. Conclusion

Almost 20 percent of empirical articles published in the AER between 2010
and 2012 use regressions with group and period fixed effects to estimate treatment
effects. In this paper, we show that under a common trends assumption, those
regressions estimate weighted sums of the treatment effect in each group and period.
The weights may be negative: in one application, we find that more than 40 percent
of the weights are negative. The negative weights are an issue when the treatment
effect is heterogeneous, between groups or over time. Then, one could have that the
treatment’s coefficient in those regressions is negative while the treatment effect is
positive in every group and time period. We therefore propose a new estimator to
address this problem. This estimator estimates the treatment effect in the groups that
switch treatment, at the time when they switch. It does not rely on any treatment
effect homogeneity condition. It is computed by the fuzzydid and did_multiplegt
Stata packages. In the two applications we revisit, this estimator is significantly and
economically different from the two-way fixed effects estimators.

²⁴ The standard errors of $\hat{\beta}_{fe} - DID_M$ and $\hat{\beta}_{fd} - DID_M$ are computed with a worker-level clustered bootstrap.

Appendix A. Proofs

One Useful Lemma

Our results rely on the following lemma.

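LEMMA 1: If Assumptions 1–5 hold, for all $(g, g', t, t') \in \{1, \ldots, G\}^2 \times \{1, \ldots, T\}^2$,

$$
\begin{aligned}
& E(Y_{g,t} \mid D) - E(Y_{g,t'} \mid D) - \big(E(Y_{g',t} \mid D) - E(Y_{g',t'} \mid D)\big) \\
&\quad = D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \big(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\big).
\end{aligned}
$$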

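PROOF OF LEMMA 1:
For all $(g, t) \in \{1, \ldots, G\} \times \{1, \ldots, T\}$,

$$
\begin{aligned}
E(Y_{g,t} \mid D) &= E\Big(\frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} Y_{i,g,t} \,\Big|\, D\Big) \\
&= E\Big(\frac{1}{N_{g,t}} \sum_{i=1}^{N_{g,t}} \big(Y_{i,g,t}(0) + D_{i,g,t}(Y_{i,g,t}(1) - Y_{i,g,t}(0))\big) \,\Big|\, D\Big) \\
&= E(Y_{g,t}(0) \mid D) + D_{g,t} E(\Delta_{g,t} \mid D) \\
&= E(Y_{g,t}(0) \mid D_g) + D_{g,t} E(\Delta_{g,t} \mid D),
\end{aligned}
$$

where the third equality follows from Assumption 2, and the fourth from
Assumption 3. Therefore,

$$
\begin{aligned}
& E(Y_{g,t} \mid D) - E(Y_{g,t'} \mid D) - \big(E(Y_{g',t} \mid D) - E(Y_{g',t'} \mid D)\big) \\
&\quad = E(Y_{g,t}(0) - Y_{g,t'}(0) \mid D_g) - E(Y_{g',t}(0) - Y_{g',t'}(0) \mid D_{g'}) \\
&\qquad + D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \big(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\big) \\
&\quad = E(Y_{g,t}(0) - Y_{g,t'}(0)) - E(Y_{g',t}(0) - Y_{g',t'}(0)) \\
&\qquad + D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \big(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\big) \\
&\quad = D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,t'} E(\Delta_{g,t'} \mid D) - \big(D_{g',t} E(\Delta_{g',t} \mid D) - D_{g',t'} E(\Delta_{g',t'} \mid D)\big),
\end{aligned}
$$

where the second equality follows from Assumption 4, and the third from
Assumption 5. ∎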

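PROOF OF THEOREM 1:
It follows from the Frisch-Waugh theorem and the definition of $\varepsilon_{g,t}$ that

$$
E(\hat{\beta}_{fe} \mid D) = \frac{\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, E(Y_{g,t} \mid D)}{\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, D_{g,t}}. \tag{A1}
$$

Now, by definition of $\varepsilon_{g,t}$ again,

$$
\sum_{t=1}^{T} N_{g,t}\, \varepsilon_{g,t} = 0 \quad \text{for all } g \in \{1, \ldots, G\}, \tag{A2}
$$

$$
\sum_{g=1}^{G} N_{g,t}\, \varepsilon_{g,t} = 0 \quad \text{for all } t \in \{1, \ldots, T\}. \tag{A3}
$$

Then,

$$
\begin{aligned}
\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, E(Y_{g,t} \mid D)
&= \sum_{g,t} N_{g,t}\, \varepsilon_{g,t} \big(E(Y_{g,t} \mid D) - E(Y_{g,1} \mid D) - E(Y_{1,t} \mid D) + E(Y_{1,1} \mid D)\big) && \text{(A4)} \\
&= \sum_{g,t} N_{g,t}\, \varepsilon_{g,t} \big(D_{g,t} E(\Delta_{g,t} \mid D) - D_{g,1} E(\Delta_{g,1} \mid D) \\
&\qquad\qquad\qquad - D_{1,t} E(\Delta_{1,t} \mid D) + D_{1,1} E(\Delta_{1,1} \mid D)\big) \\
&= \sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, D_{g,t}\, E(\Delta_{g,t} \mid D) \\
&= \sum_{(g,t):D_{g,t}=1} N_{g,t}\, \varepsilon_{g,t}\, E(\Delta_{g,t} \mid D). && \text{(A5)}
\end{aligned}
$$

The first and third equalities follow from equations (A2) and (A3). The second
equality follows from Lemma 1. The fourth equality follows from Assumption 2.
Finally, Assumption 2 implies that

$$
\sum_{g,t} N_{g,t}\, \varepsilon_{g,t}\, D_{g,t} = \sum_{(g,t):D_{g,t}=1} N_{g,t}\, \varepsilon_{g,t}. \tag{A6}
$$

Combining (A1), (A5), (A6) yields

$$
E(\hat{\beta}_{fe} \mid D) = \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, E(\Delta_{g,t} \mid D). \tag{A7}
$$

Then, the result follows from the law of iterated expectations. ∎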
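The decomposition in (A7) can be checked numerically on a small example. The sketch below is illustrative only: it assumes a balanced design with one observation per cell (so $N_{g,t}/N_1 = 1/N_1$), no noise, and hypothetical values for the group effects, period effects, and cell-level treatment effects. It computes the two-way fixed effects coefficient by least squares and compares it with the weighted sum of treatment effects.

```python
import numpy as np

# Staggered design: G = 2 groups, T = 3 periods, one unit per cell (assumption).
D = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
# Hypothetical treatment effects: all positive in treated cells.
delta = np.array([[0.0, 1.0, 4.0],
                  [0.0, 0.0, 1.0]])
alpha = np.array([[2.0], [5.0]])      # hypothetical group effects
gamma = np.array([[0.0, 1.0, 3.0]])   # hypothetical period effects (common trend)
Y = alpha + gamma + D * delta

G, T = D.shape
g_idx, t_idx = np.indices((G, T))
# Regressors: treatment, group dummies, period dummies (first period dropped).
X = np.column_stack([
    D.ravel(),
    np.eye(G)[g_idx.ravel()],
    np.eye(T)[t_idx.ravel()][:, 1:],
])
beta_fe = np.linalg.lstsq(X, Y.ravel(), rcond=None)[0][0]

# Residual of D on group and period fixed effects (balanced panel formula).
eps = D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()
treated = D == 1
# Weights normalized so that their average over treated cells equals 1.
w = eps / eps[treated].mean()
weighted_sum = (w[treated] * delta[treated]).mean()

print(beta_fe, weighted_sum, w[treated])
```

In this design one of the three treated cells receives a negative weight, so the regression coefficient is negative even though every treatment effect is positive.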

PROOF OF PROPOSITION 1:
If for all $t \ge 2$, $N_{g,t}/N_{g,t-1}$ does not depend on $t$, then it follows from the
first order conditions attached to Regression 1 and a few lines of algebra
that $\varepsilon_{g,t} = D_{g,t} - D_{g,.} - D_{.,t} + D_{.,.}$. Therefore, $w_{g,t}$ is proportional
to $D_{g,t} - D_{g,.} - D_{.,t} + D_{.,.}$. Then, for all $(g, t, t')$ such that $D_{g,t} = D_{g,t'} = 1$,
$D_{.,t} > D_{.,t'}$ implies $w_{g,t} < w_{g,t'}$. Similarly, for all $(g, g', t)$ such that
$D_{g,t} = D_{g',t} = 1$, $D_{g,.} > D_{g',.}$ implies $w_{g,t} < w_{g',t}$. ∎

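PROOF OF COROLLARY 1:

Proof of the First Point. If the assumptions of the corollary hold and
$\tilde{\Delta}^{TR} = 0$, then

$$
\begin{cases}
\tilde{\beta}_{fe} = \displaystyle\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, \tilde{\Delta}_{g,t}, \\[2ex]
0 = \displaystyle\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, \tilde{\Delta}_{g,t},
\end{cases}
$$

where the first equality follows from (A7). These two conditions and the
Cauchy-Schwarz inequality imply

$$
|\tilde{\beta}_{fe}| = \Big| \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, (w_{g,t} - 1)(\tilde{\Delta}_{g,t} - \tilde{\Delta}^{TR}) \Big| \le \sigma(W)\, \sigma(\tilde{\Delta}).
$$

Hence, $\sigma(\tilde{\Delta}) \ge \underline{\sigma}_{fe}$.
Now, we prove that we can rationalize this lower bound. Let us define

$$
\tilde{\Delta}^{TR}_{g,t} = \frac{\tilde{\beta}_{fe}\,(w_{g,t} - 1)}{\sigma^2(W)}.
$$

Then,

$$
\tilde{\Delta}^{TR} = \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, \frac{\tilde{\beta}_{fe}\,(w_{g,t} - 1)}{\sigma^2(W)}
= \frac{\tilde{\beta}_{fe}}{\sigma^2(W)} \Big( \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t} - 1 \Big) = 0,
$$

as it follows from the definition of $w_{g,t}$ that $\sum_{(g,t):D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$.
Similarly,

$$
\begin{aligned}
\sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, \frac{\tilde{\beta}_{fe}\,(w_{g,t} - 1)}{\sigma^2(W)}
&= \frac{\tilde{\beta}_{fe}}{\sigma^2(W)} \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\,(w_{g,t} - 1) \\
&= \frac{\tilde{\beta}_{fe}}{\sigma^2(W)} \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, (w_{g,t} - 1)^2 \\
&= \tilde{\beta}_{fe},
\end{aligned}
$$

where the second equality follows again from the fact that $\sum_{(g,t):D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$.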
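The rationalization argument in the first point can be verified numerically. The sketch below uses hypothetical weights and a hypothetical coefficient, with equal cell shares $N_{g,t}/N_1$, and takes $\sigma(W)$ to be the share-weighted standard deviation of the weights around their mean of 1, as read off from the Cauchy-Schwarz step. It builds the treatment effects $\tilde{\beta}_{fe}(w_{g,t}-1)/\sigma^2(W)$ and checks that they average to zero, reproduce the coefficient, and have standard deviation $|\tilde{\beta}_{fe}|/\sigma(W)$.

```python
import numpy as np

# Hypothetical weights over three treated cells with equal shares N_gt / N_1.
w = np.array([3.0, -1.5, 1.5])
shares = np.full(3, 1.0 / 3.0)
beta_fe = 0.5  # hypothetical value of the fixed effects coefficient

# Share-weighted standard deviation of the weights (their mean is 1).
sigma_w = np.sqrt(shares @ (w - 1.0) ** 2)
sigma_lower = abs(beta_fe) / sigma_w

# Treatment effects that attain the bound.
delta = beta_fe * (w - 1.0) / sigma_w ** 2

mean_delta = shares @ delta                       # average treatment effect
weighted = shares @ (w * delta)                   # implied regression coefficient
sd_delta = np.sqrt(shares @ (delta - mean_delta) ** 2)

print(mean_delta, weighted, sd_delta, sigma_lower)
```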

Proof of the Second Point. We first suppose that $\tilde{\beta}_{fe} > 0$. We seek to solve

$$
\min_{\tilde{\Delta}_{(1)}, \ldots, \tilde{\Delta}_{(n)}} \; \sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, (\tilde{\Delta}_{(i)} - \tilde{\Delta}^{TR})^2,
$$

subject to

$$
\tilde{\beta}_{fe} = \sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}_{(i)}, \qquad \tilde{\Delta}_{(i)} \le 0 \;\text{ for all } i \in \{1, \ldots, n\}.
$$

This is a quadratic programming problem, with a matrix that is symmetric positive
semidefinite but not definite. Hence, by Frank and Wolfe (1956) and the fact that the
linear term in the quadratic problem is 0, the solution exists if and only if the
set of constraints is not empty. If $w_{(n)} \ge 0$, the set of constraints is empty
because $\sum_{i=1}^{n} (N_{(i)}/N_1)\, w_{(i)}\, \tilde{\Delta}_{(i)} \le 0 < \tilde{\beta}_{fe}$. On the other hand, if $w_{(n)} < 0$, this
set is non-empty since it includes $(0, \ldots, 0, \tilde{\beta}_{fe}/(P_{(n)} w_{(n)}))$.
We now derive the corresponding bound. For that purpose, remark that

$$
\sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \Big( \tilde{\Delta}_{(i)} - \sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, \tilde{\Delta}_{(i)} \Big)^2
= \sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, \tilde{\Delta}_{(i)}^2 - \Big( \sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, \tilde{\Delta}_{(i)} \Big)^2.
$$

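The Karush-Kuhn-Tucker necessary conditions for optimality are that for all $i$:

$$
\begin{aligned}
\tilde{\Delta}_{(i)} &= \tilde{\Delta}^{TR} + \lambda w_{(i)} - \gamma_{(i)}, \\
\sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}_{(i)} &= \tilde{\beta}_{fe}, \\
\gamma_{(i)} &\ge 0, \\
\gamma_{(i)}\, \tilde{\Delta}_{(i)} &= 0,
\end{aligned}
$$

where $\tilde{\Delta}^{TR} = \sum_{i=1}^{n} (N_{(i)}/N_1)\, \tilde{\Delta}_{(i)}$, $2\lambda$ is the Lagrange multiplier of the constraint
$\sum_{i=1}^{n} (N_{(i)}/N_1)\, w_{(i)}\, \tilde{\Delta}_{(i)} = \tilde{\beta}_{fe}$, and $2 (N_{(i)}/N_1)\, \gamma_{(i)}$ is the Lagrange multiplier
of the constraint $\tilde{\Delta}_{(i)} \le 0$.
These constraints imply that $\tilde{\Delta}_{(i)} = 0$ if and only if $\tilde{\Delta}^{TR} + \lambda w_{(i)} \ge 0$. Therefore,
if $\tilde{\Delta}^{TR} + \lambda w_{(i)} < 0$, $\tilde{\Delta}_{(i)} \ne 0$ so $\gamma_{(i)} = 0$, and $\tilde{\Delta}_{(i)} = \tilde{\Delta}^{TR} + \lambda w_{(i)}$. Therefore,

$$
\tilde{\Delta}_{(i)} = \min(\tilde{\Delta}^{TR} + \lambda w_{(i)},\, 0). \tag{A8}
$$

This equation implies that $\tilde{\Delta}_{(i)} \le \tilde{\Delta}^{TR} + \lambda w_{(i)}$, which in turn implies
that $\tilde{\Delta}^{TR} \le \tilde{\Delta}^{TR} + \lambda$, so $\lambda \ge 0$.
As a result, $\tilde{\Delta}^{TR} + \lambda w_{(i)}$ is decreasing in $i$, and because $x \mapsto \min(x, 0)$
is increasing, $\tilde{\Delta}_{(i)}$ is also decreasing in $i$. Then $\tilde{\Delta}_{(n)} < 0$: otherwise one
would have $\tilde{\Delta}_{(i)} = 0$ for all $i$, which would imply $\tilde{\beta}_{fe} = 0$, a contradiction.
Let $s = \min\{i \in \{1, \ldots, n\} : \tilde{\Delta}_{(i)} < 0\}$. Using again (A8), we get

$$
\tilde{\Delta}^{TR} = \sum_{i \ge s} \frac{N_{(i)}}{N_1}\, \tilde{\Delta}_{(i)} = P_s\, \tilde{\Delta}^{TR} + \lambda S_s.
$$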

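Therefore,

$$
\tilde{\Delta}^{TR} = \frac{\lambda S_s}{1 - P_s}. \tag{A9}
$$

Hence, plugging $\tilde{\Delta}^{TR}$ in (A8), we obtain that for all $i \ge s$,

$$
\tilde{\Delta}_{(i)} = \lambda \Big\{ \frac{S_s}{1 - P_s} + w_{(i)} \Big\}.
$$

Finally, using again (A8), we obtain

$$
\tilde{\beta}_{fe} = \sum_{i \ge s} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}_{(i)} = \lambda \Big\{ \frac{S_s^2}{1 - P_s} + T_s \Big\}.
$$

Thus,

$$
\lambda = \frac{\tilde{\beta}_{fe}}{T_s + S_s^2 / (1 - P_s)}.
$$

Then, using what precedes,

$$
\begin{aligned}
\underline{\underline{\sigma}}^2_{fe}
&= \sum_{i \ge s} \frac{N_{(i)}}{N_1}\, (\lambda w_{(i)})^2 + \sum_{i < s} \frac{N_{(i)}}{N_1}\, (\tilde{\Delta}^{TR})^2 \\
&= \lambda^2 T_s + (1 - P_s) \Big( \frac{\lambda S_s}{1 - P_s} \Big)^2 \\
&= \lambda^2 \Big[ T_s + \frac{S_s^2}{1 - P_s} \Big] \\
&= \frac{\tilde{\beta}_{fe}^2}{T_s + S_s^2 / (1 - P_s)}.
\end{aligned}
$$

The result follows, once noted that equations (A8) and (A9) imply that
$s = \min\{i \in \{1, \ldots, n\} : w_{(i)} < -S_{(i)}/(1 - P_{(i)})\}$.
Finally, consider the case $\tilde{\beta}_{fe} < 0$. By letting $\tilde{\Delta}'_{(i)} = -\tilde{\Delta}_{(i)}$ and $\tilde{\beta}'_{fe} = -\tilde{\beta}_{fe}$,
we have

$$
\underline{\underline{\sigma}}^2_{fe} = \min_{\tilde{\Delta}'_{(1)} \le 0, \ldots, \tilde{\Delta}'_{(n)} \le 0} \; \sum_{i=1}^{n} \frac{N_{(i)}}{N_1} \Big( \tilde{\Delta}'_{(i)} - \sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, \tilde{\Delta}'_{(i)} \Big)^2
$$

subject to

$$
\sum_{i=1}^{n} \frac{N_{(i)}}{N_1}\, w_{(i)}\, \tilde{\Delta}'_{(i)} = \tilde{\beta}'_{fe}.
$$

This is the same program as before, with $\tilde{\beta}'_{fe}$ instead of $\tilde{\beta}_{fe}$. Therefore, by the same
reasoning as before, we obtain

$$
\underline{\underline{\sigma}}^2_{fe} = \frac{(\tilde{\beta}'_{fe})^2}{T_s + S_s^2 / (1 - P_s)} = \frac{\tilde{\beta}_{fe}^2}{T_s + S_s^2 / (1 - P_s)}. \; ∎
$$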
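The closed form obtained in the second point can be turned into a short computation. The sketch below is an illustration under stated assumptions: the tail sums $P_k$, $S_k$, and $T_k$ are taken to be $\sum_{i \ge k} N_{(i)}/N_1$, $\sum_{i \ge k} (N_{(i)}/N_1) w_{(i)}$, and $\sum_{i \ge k} (N_{(i)}/N_1) w_{(i)}^2$, as read off from the derivation above, weights are sorted in decreasing order, and the function name, weights, and coefficient are hypothetical. It returns the bound together with the treatment effects implied by (A8) and (A9).

```python
import numpy as np


def sign_reversal_bound(w, shares, beta_fe):
    """Return (bound, effects): the minimal standard deviation of treatment
    effects under which all effects have the sign opposite to beta_fe, and
    the effects attaining it. Hypothetical helper; tail-sum definitions are
    assumptions read off from the derivation."""
    order = np.argsort(-w)                 # w_(1) >= ... >= w_(n)
    w_s, p_s = w[order], shares[order]
    if w_s[-1] >= 0:
        raise ValueError("no negative weight: the constraint set is empty")

    def tail(x):
        # tail(x)[k] = sum of x[i] over i >= k
        return np.cumsum(x[::-1])[::-1]

    P, S, T = tail(p_s), tail(p_s * w_s), tail(p_s * w_s ** 2)
    # First index whose weight lies below the threshold -S/(1-P).
    s = next(k for k in range(len(w_s))
             if 1 - P[k] > 1e-12 and w_s[k] < -S[k] / (1 - P[k]))
    denom = T[s] + S[s] ** 2 / (1 - P[s])
    lam = beta_fe / denom
    effects_sorted = np.where(np.arange(len(w_s)) >= s,
                              lam * (S[s] / (1 - P[s]) + w_s), 0.0)
    effects = np.empty_like(effects_sorted)
    effects[order] = effects_sorted        # back to the original cell order
    return abs(beta_fe) / np.sqrt(denom), effects


w = np.array([3.0, -1.5, 1.5])             # hypothetical weights
shares = np.full(3, 1.0 / 3.0)             # equal cell shares N_(i) / N_1
beta_fe = 0.5                              # hypothetical coefficient
sigma, delta = sign_reversal_bound(w, shares, beta_fe)
print(sigma, delta)
```

In this example only the cell with the negative weight receives a nonzero effect, and the resulting bound exceeds $|\tilde{\beta}_{fe}|/\sigma(W)$ computed from the same weights, as expected since the sign constraint is more demanding.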

PROOF OF COROLLARY 2:
We have

$$
\begin{aligned}
\beta_{fe} &= E\Big( \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t}\, \tilde{\Delta}_{g,t} \Big) \\
&= E\Big( \Big( \sum_{(g,t):D_{g,t}=1} \frac{N_{g,t}}{N_1}\, w_{g,t} \Big) \tilde{\Delta}^{TR} \Big) \\
&= E(\tilde{\Delta}^{TR}) \\
&= \delta^{TR}.
\end{aligned}
$$

The first equality follows from the law of iterated expectations and (A7).
The second equality follows from Assumption 7. By the definition of $w_{g,t}$,
$\sum_{(g,t):D_{g,t}=1} (N_{g,t}/N_1)\, w_{g,t} = 1$, hence the third equality. The fourth equality follows
from the law of iterated expectations. ∎

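PROOF OF THEOREM 2:
It follows from the Frisch-Waugh theorem and the definition of $\varepsilon_{fd,g,t}$ that

$$
E(\hat{\beta}_{fd} \mid D) = \frac{\sum_{(g,t):t \ge 2} N_{g,t}\, \varepsilon_{fd,g,t} \big(E(Y_{g,t} \mid D) - E(Y_{g,t-1} \mid D)\big)}{\sum_{(g,t):t \ge 2} N_{g,t}\, \varepsilon_{fd,g,t}\, (D_{g,t} - D_{g,t-1})}. \tag{A10}
$$

Now, by definition of $\varepsilon_{fd,g,t}$ again,

$$
\sum_{g=1}^{G} N_{g,t}\, \varepsilon_{fd,g,t} = 0 \quad \text{for all } t \in \{2, \ldots, T\}. \tag{A11}
$$

Then,

$$
\begin{aligned}
& \sum_{(g,t):t \ge 2} N_{g,t}\, \varepsilon_{fd,g,t} \big(E(Y_{g,t} \mid D) - E(Y_{g,t-1} \mid D)\big) && \text{(A12)} \\
&\quad = \sum_{(g,t):t \ge 2} N_{g,t}\, \varepsilon_{fd,g,t} \big(E(Y_{g,t} \mid D) - E(Y_{g,t-1} \mid D) - E(Y_{1,t} \mid D) + E(Y_{1,t-1} \mid D)\big) \\
&\quad = \sum_{(g,t):t \ge 2} N_{g,t}\, \varepsilon_{fd,g,t} \big(D_{g,t} \tilde{\Delta}_{g,t} - D_{g,t-1} \tilde{\Delta}_{g,t-1} - D_{1,t} \tilde{\Delta}_{1,t} + D_{1,t-1} \tilde{\Delta}_{1,t-1}\big) \\
&\quad = \sum_{(g,t):t \ge 2} N_{g,t}\, \varepsilon_{fd,g,t} \big(D_{g,t} \tilde{\Delta}_{g,t} - D_{g,t-1} \tilde{\Delta}_{g,t-1}\big) \\
&\quad = \sum_{g,t} \big(N_{g,t}\, \varepsilon_{fd,g,t} - N_{g,t+1}\, \varepsilon_{fd,g,t+1}\big) D_{g,t}\, \tilde{\Delta}_{g,t} \\
&\quad = \sum_{(g,t):D_{g,t}=1} N_{g,t} \Big( \varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\, \varepsilon_{fd,g,t+1} \Big) \tilde{\Delta}_{g,t}.
\end{aligned}
$$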

The first and third equalities follow from (A11). The second equality follows from
Lemma 1. The fourth equality follows from a summation by parts, and from the
fact that $\varepsilon_{fd,g,1} = \varepsilon_{fd,g,T+1} = 0$. The fifth equality follows from Assumption 2.
A similar reasoning yields

$$
\sum_{(g,t):t \ge 2} N_{g,t}\, \varepsilon_{fd,g,t}\, (D_{g,t} - D_{g,t-1}) = \sum_{(g,t):D_{g,t}=1} N_{g,t} \Big( \varepsilon_{fd,g,t} - \frac{N_{g,t+1}}{N_{g,t}}\, \varepsilon_{fd,g,t+1} \Big). \tag{A13}
$$

Combining (A10), (A12), (A13), and the law of iterated expectations yields the
result. ∎
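The first-difference decomposition can be checked in the same way as the fixed effects one. The sketch below assumes a balanced panel with one observation per cell and no noise, with hypothetical effects and fixed effects; it computes the first-difference coefficient from the residualized changes in treatment, builds the weights from (A12) and (A13) with the convention $\varepsilon_{fd,g,1} = \varepsilon_{fd,g,T+1} = 0$, and compares the two.

```python
import numpy as np

# Same hypothetical staggered design as a 2 x 3 balanced panel.
D = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
delta = np.array([[0.0, 1.0, 4.0],
                  [0.0, 0.0, 1.0]])     # hypothetical treatment effects
alpha = np.array([[2.0], [5.0]])        # hypothetical group effects
gamma = np.array([[0.0, 1.0, 3.0]])     # hypothetical period effects
Y = alpha + gamma + D * delta
G, T = D.shape

dD = np.diff(D, axis=1)                 # columns correspond to t = 2, ..., T
dY = np.diff(Y, axis=1)
# Residual of the change in treatment on period fixed effects.
eps_fd = dD - dD.mean(axis=0, keepdims=True)
beta_fd = (eps_fd * dY).sum() / (eps_fd * dD).sum()

# Pad with eps_{fd,g,1} = eps_{fd,g,T+1} = 0, columns t = 1, ..., T + 1.
eps_pad = np.hstack([np.zeros((G, 1)), eps_fd, np.zeros((G, 1))])
raw = eps_pad[:, :T] - eps_pad[:, 1:]   # eps_{fd,g,t} - eps_{fd,g,t+1}
treated = D == 1
w_fd = raw / raw[treated].mean()        # weights average to 1 over treated cells
weighted_sum = (w_fd[treated] * delta[treated]).mean()

print(beta_fd, weighted_sum, w_fd[treated])
```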

PROOF OF PROPOSITION 2:
It follows from the first order conditions attached to Regression 2 and a few lines
of algebra that $\varepsilon_{fd,g,t} = D_{g,t} - D_{g,t-1} - D_{.,t} + D_{.,t-1}$. Therefore, under Assumption 6
and if $N_{g,t}$ does not vary across $t$, one has that for all $(g, t)$ such that $D_{g,t} = 1$,
$1 \le t \le T - 1$, $w_{fd,g,t}$ is proportional to $1 - D_{g,t-1} - (2 D_{.,t} - D_{.,t-1} - D_{.,t+1})$.
Now, $D_{.,t} - D_{.,t-1} \le 1$, and under Assumption 6 $D_{.,t} - D_{.,t+1} \le 0$, so
$1 - D_{g,t-1} - (2 D_{.,t} - D_{.,t-1} - D_{.,t+1})$ can only be strictly negative if $D_{g,t-1} = 1$.
Then, for all $(g, t)$ such that $D_{g,t} = 1$, $1 \le t \le T - 1$, $w_{fd,g,t}$ is strictly negative if
and only if $D_{g,t-1} = 1$ and $2 D_{.,t} - D_{.,t-1} - D_{.,t+1} > 0$.
Similarly, when $t = T$, under the same assumptions as above, one has that for
all $g$ such that $D_{g,T} = 1$, $w_{fd,g,T}$ is proportional to $1 - D_{g,T-1} - (D_{.,T} - D_{.,T-1})$.
Now, $D_{.,T} - D_{.,T-1} \le 1$, so $1 - D_{g,T-1} - (D_{.,T} - D_{.,T-1})$ can only be strictly negative
if $D_{g,T-1} = 1$. Then, $w_{fd,g,T}$ is strictly negative if and only if $D_{g,T-1} = 1$
and $D_{.,T} - D_{.,T-1} > 0$.
Finally, when $t = 1$, one has that for all $g$ such that $D_{g,1} = 1$, $D_{g,2} = 1$ under
Assumption 6, so $w_{fd,g,1}$ is proportional to $D_{.,2} - D_{.,1}$, which is greater than 0 under
Assumption 6. ∎

PROOF OF THEOREM 3:
First, by definition of $DID_M$,

$$
E(DID_M) = \sum_{t=2}^{T} E\Big( \frac{N_{1,0,t}}{N_S}\, E(DID_{+,t} \mid D) + \frac{N_{0,1,t}}{N_S}\, E(DID_{-,t} \mid D) \Big). \tag{A14}
$$

Let $t$ be greater than 2, and let us focus for now on the case where there is at least
one $g_1$ such that $D_{g_1,t-1} = 0$ and $D_{g_1,t} = 1$. Then Assumption 11 ensures that
there is at least another group $g_2$ such that $D_{g_2,t-1} = D_{g_2,t} = 0$. For every $g$ such
that $D_{g,t-1} = 0$ and $D_{g,t} = 1$, we have

$$
E(Y_{g,t} - Y_{g,t-1} \mid D) = E(\Delta_{g,t} \mid D) + E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D). \tag{A15}
$$

Under Assumptions 12, 4, and 5, for all $t \ge 2$, there exists a real number $\psi_{0,t}$
such that for all $g$,

$$
E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D) = E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D_g) = E(Y_{g,t}(0) - Y_{g,t-1}(0)) = \psi_{0,t}. \tag{A16}
$$

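Then,

$$
\begin{aligned}
N_{1,0,t}\, E(DID_{+,t} \mid D)
&= \sum_{g:D_{g,t}=1,\,D_{g,t-1}=0} N_{g,t}\, E(\Delta_{g,t} \mid D) && \text{(A17)} \\
&\quad + \sum_{g:D_{g,t}=1,\,D_{g,t-1}=0} N_{g,t}\, E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D) \\
&\quad - \frac{N_{1,0,t}}{N_{0,0,t}} \sum_{g:D_{g,t}=D_{g,t-1}=0} N_{g,t}\, E(Y_{g,t}(0) - Y_{g,t-1}(0) \mid D) \\
&= \sum_{g:D_{g,t}=1,\,D_{g,t-1}=0} N_{g,t}\, E(\Delta_{g,t} \mid D) \\
&\quad + \psi_{0,t} \Big( \sum_{g:D_{g,t}=1,\,D_{g,t-1}=0} N_{g,t} - \frac{N_{1,0,t}}{N_{0,0,t}} \sum_{g:D_{g,t}=D_{g,t-1}=0} N_{g,t} \Big) \\
&= \sum_{g:D_{g,t}=1,\,D_{g,t-1}=0} N_{g,t}\, E(\Delta_{g,t} \mid D).
\end{aligned}
$$

The first equality follows by (A15), the second by (A16), and the third after some
algebra. If there is no $g$ such that $D_{g,t-1} = 0$ and $D_{g,t} = 1$, (A17) still holds,
as $DID_{+,t} = 0$ in this case.
A similar reasoning yields

$$
N_{0,1,t}\, E(DID_{-,t} \mid D) = \sum_{g:D_{g,t}=0,\,D_{g,t-1}=1} N_{g,t}\, E(\Delta_{g,t} \mid D). \tag{A18}
$$

Plugging (A17) and (A18) into (A14) yields

$$
\begin{aligned}
E(DID_M) &= \sum_{t=2}^{T} E\Big( E\Big( \frac{1}{N_S} \Big( \sum_{g:D_{g,t}=1,\,D_{g,t-1}=0} N_{g,t}\, \Delta_{g,t} + \sum_{g:D_{g,t}=0,\,D_{g,t-1}=1} N_{g,t}\, \Delta_{g,t} \Big) \,\Big|\, D \Big) \Big) \\
&= \delta^{S}. \; ∎
\end{aligned}
$$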
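The structure of $DID_M$ used in this proof can be illustrated with a small computation. The sketch below is a simplified version under stated assumptions, not the implementation in the Stata packages mentioned in the conclusion: it assumes one observation per cell, a binary treatment, and no noise, and the function name `did_m` and all numerical values are hypothetical. For each pair of consecutive periods, it compares the outcome change of switchers with that of groups whose treatment is stable, then averages over switching cells.

```python
import numpy as np


def did_m(Y, D):
    """Simplified DID_M for a G x T panel with one unit per cell (assumption)."""
    G, T = D.shape
    num, n_switch = 0.0, 0
    for t in range(1, T):
        dY = Y[:, t] - Y[:, t - 1]
        joiners = (D[:, t - 1] == 0) & (D[:, t] == 1)
        leavers = (D[:, t - 1] == 1) & (D[:, t] == 0)
        stay0 = (D[:, t - 1] == 0) & (D[:, t] == 0)
        stay1 = (D[:, t - 1] == 1) & (D[:, t] == 1)
        if joiners.any() and stay0.any():
            # Switchers into treatment versus groups staying untreated.
            num += joiners.sum() * (dY[joiners].mean() - dY[stay0].mean())
        if leavers.any() and stay1.any():
            # Groups staying treated versus switchers out of treatment.
            num += leavers.sum() * (dY[stay1].mean() - dY[leavers].mean())
        n_switch += joiners.sum() + leavers.sum()
    if n_switch == 0:
        raise ValueError("no switching cell in the data")
    return num / n_switch


# Staggered design with a never-treated group, so stable groups always exist.
D = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
delta = np.array([[0.0, 1.0, 4.0],
                  [0.0, 0.0, 2.0],
                  [0.0, 0.0, 0.0]])          # hypothetical treatment effects
alpha = np.array([[2.0], [5.0], [1.0]])      # hypothetical group effects
gamma = np.array([[0.0, 1.0, 3.0]])          # hypothetical period effects
Y = alpha + gamma + D * delta

# Average effect in switching cells, at the time of the switch.
switch = np.zeros_like(D, dtype=bool)
switch[:, 1:] = D[:, 1:] != D[:, :-1]
delta_S = delta[switch].mean()

estimate = did_m(Y, D)
print(estimate, delta_S)
```

With common trends and no noise, the estimate coincides with the average effect among switchers, even though the effect in the first group grows over time.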

PROOF OF THEOREM 4:
First, as with $DID_M$, we have

$$
E(DID_M^{pl}) = \sum_{t=3}^{T} E\Big( \frac{N_{1,0,0,t}}{N_S^{pl}}\, E(DID^{pl}_{+,t} \mid D) + \frac{N_{0,1,1,t}}{N_S^{pl}}\, E(DID^{pl}_{-,t} \mid D) \Big). \tag{A19}
$$

Let $t$ be greater than 3, and let us for now focus on the case where there exists at
least one $g_1$ such that $D_{g_1,t-2} = D_{g_1,t-1} = 0$ and $D_{g_1,t} = 1$. Then Assumption 13
ensures that there is at least another group $g_2$ such that $D_{g_2,t-2} = D_{g_2,t-1} = D_{g_2,t} = 0$.
Then,

$$
\begin{aligned}
N_{1,0,0,t}\, E(DID^{pl}_{+,t} \mid D)
&= \sum_{g:D_{g,t}=1,\,D_{g,t-1}=D_{g,t-2}=0} N_{g,t}\, E(Y_{g,t-1}(0) - Y_{g,t-2}(0) \mid D) && \text{(A20)} \\
&\quad - \frac{N_{1,0,0,t}}{N_{0,0,0,t}} \sum_{g:D_{g,t}=D_{g,t-1}=D_{g,t-2}=0} N_{g,t}\, E(Y_{g,t-1}(0) - Y_{g,t-2}(0) \mid D) \\
&= \psi_{0,t-1} \Big( \sum_{g:D_{g,t}=1,\,D_{g,t-1}=D_{g,t-2}=0} N_{g,t} - \frac{N_{1,0,0,t}}{N_{0,0,0,t}} \sum_{g:D_{g,t}=D_{g,t-1}=D_{g,t-2}=0} N_{g,t} \Big) \\
&= 0.
\end{aligned}
$$

The second equality follows by (A16), and the third follows after some algebra. If
there exists no $g$ such that $D_{g,t-2} = D_{g,t-1} = 0$ and $D_{g,t} = 1$, (A20) still holds,
as $DID^{pl}_{+,t} = 0$ in this case.
A similar reasoning yields

$$
N_{0,1,1,t}\, E(DID^{pl}_{-,t} \mid D) = 0. \tag{A21}
$$

The result follows after plugging (A20) and (A21) into (A19). ∎
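The placebo logic can be illustrated in the same simplified setting. The sketch below is again hypothetical: it assumes one observation per cell and a binary treatment, restricts attention to groups whose treatment was stable between $t-2$ and $t-1$, and normalizes by the number of switching cells that have a comparison group (an assumption about $N_S^{pl}$). It shows that the placebo is zero under common trends and picks up a pretrend added by hand.

```python
import numpy as np


def did_m_placebo(Y, D):
    """Simplified placebo: compares outcome changes from t-2 to t-1 between
    groups that switch at t and groups that do not (one unit per cell)."""
    G, T = D.shape
    num, n_switch = 0.0, 0
    for t in range(2, T):
        pre = Y[:, t - 1] - Y[:, t - 2]
        stable_before = D[:, t - 2] == D[:, t - 1]
        joiners = stable_before & (D[:, t - 1] == 0) & (D[:, t] == 1)
        leavers = stable_before & (D[:, t - 1] == 1) & (D[:, t] == 0)
        stay0 = stable_before & (D[:, t - 1] == 0) & (D[:, t] == 0)
        stay1 = stable_before & (D[:, t - 1] == 1) & (D[:, t] == 1)
        if joiners.any() and stay0.any():
            num += joiners.sum() * (pre[joiners].mean() - pre[stay0].mean())
            n_switch += joiners.sum()
        if leavers.any() and stay1.any():
            num += leavers.sum() * (pre[stay1].mean() - pre[leavers].mean())
            n_switch += leavers.sum()
    if n_switch == 0:
        raise ValueError("no switching cell with a comparison group")
    return num / n_switch


D = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])
alpha = np.array([[2.0], [5.0], [1.0]])          # hypothetical group effects
gamma = np.array([[0.0, 1.0, 3.0, 4.0]])         # hypothetical period effects
Y = alpha + gamma + D * 1.0                      # hypothetical effect of 1

placebo_common_trends = did_m_placebo(Y, D)

# Add a hypothetical pretrend: group 2's outcome rises one period before it switches.
Y_pre = Y.copy()
Y_pre[1, 2] += 0.5
placebo_pretrend = did_m_placebo(Y_pre, D)

print(placebo_common_trends, placebo_pretrend)
```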

REFERENCES

Abraham, Sarah, and Liyang Sun. 2018. “Estimating Dynamic Treatment Effects in Event Studies
with Heterogeneous Treatment Effects.” Unpublished.
Ashenfelter, Orley. 1978. “Estimating the Effect of Training Programs on Earnings.” Review of Eco-
nomics and Statistics 60 (1): 47–57.
Athey, Susan, and Guido W. Imbens. 2018. “Design-Based Analysis in Difference-in-Differences Set-
tings with Staggered Adoption.” NBER Working Paper 24963.
Athey, Susan, and Scott Stern. 2002. “The Impact of Information Technology on Emergency Health
Care Outcomes.” RAND Journal of Economics 33 (3): 399–432.
Autor, David H. 2003. “Outsourcing at Will: The Contribution of Unjust Dismissal Doctrine to the
Growth of Employment Outsourcing.” Journal of Labor Economics 21 (1): 1–42.
Borusyak, Kirill, and Xavier Jaravel. 2017. “Revisiting Event Study Designs.” Unpublished.
Callaway, Brantly, and Pedro H. C. Sant’Anna. 2018. “Difference-in-Differences with Multiple Time
Periods and an Application on the Minimum Wage and Employment.” arXiv e-print 1803.09015.
Card, David. 1996. “The Effect of Unions on the Structure of Wages: A Longitudinal Analysis.”
Econometrica 64 (4): 957–79.
de Chaisemartin, Clément. 2011. “Fuzzy Differences in Differences.” Center for Research in Econom-
ics and Statistics Working Paper 2011-10.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2015. “Fuzzy Differences-in-Differences.”
arXiv e-print 1510.01757v2.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2018. “Fuzzy Differences-in-Differences.”
Review of Economic Studies 85 (2): 999–1028.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2020a. “Difference-in-Differences Estimators
of Intertemporal Treatment Effects.” arXiv:2007.04267.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2020b. “Replication Data for: Two-Way Fixed
Effects Estimators with Heterogeneous Treatment Effects.” American Economic Association
[publisher], Inter-university Consortium for Political and Social Research [distributor].
https://doi.org/10.3886/E118363V1.
de Chaisemartin, Clément, Xavier D’Haultfœuille, and Yannick Guyonvarch. 2019. “Fuzzy
Differences-in-Differences with Stata.” Stata Journal 19 (2): 435–58.
Duflo, Esther. 2001. “Schooling and Labor Market Consequences of School Construction in Indone-
sia: Evidence from an Unusual Policy Experiment.” American Economic Review 91 (4): 795–813.
Frank, Marguerite, and Philip Wolfe. 1956. “An Algorithm for Quadratic Programming.” Naval
Research Logistics Quarterly 3 (1–2): 95–110.
Freeman, Richard B. 1984. “Longitudinal Analyses of the Effects of Trade Unions.” Journal of Labor
Economics 2 (1): 1–26.
Freeman, Richard B., and James L. Medoff. 1984. “What Do Unions Do?” ILR Review 38 (2): 244–63.
Gentzkow, Matthew, Jesse M. Shapiro, and Michael Sinkinson. 2011. “The Effect of Newspaper Entry
and Exit on Electoral Politics.” American Economic Review 101 (7): 2980–3018.
Goodman-Bacon, Andrew. 2018. “Difference-in-Differences with Variation in Treatment Timing.”
Unpublished.
Imai, Kosuke, and In Song Kim. 2018. “On the Use of Two-Way Fixed Effects Regression Models for
Causal Inference with Panel Data.” Unpublished.
Jakubson, George. 1991. “Estimation and Testing of the Union Wage Effect Using Panel Data.” Review
of Economic Studies 58 (5): 971–91.
Vella, Francis, and Marno Verbeek. 1998. “Whose Wages Do Unions Raise? A Dynamic Model of
Unionism and Wage Rate Determination for Young Men.” Journal of Applied Econometrics 13 (2):
163–83.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge,
MA: MIT Press.
