
A difference-in-differences estimator by covariate balancing propensity score

Junjie Li
Department of Economics, Hitotsubashi University
and
Yukitoshi Matsushita
Department of Economics, Hitotsubashi University

arXiv:2508.02097v1 [econ.EM] 4 Aug 2025

August 5, 2025

Abstract
This article develops a covariate balancing approach for the estimation of treatment
effects on the treated (ATT) in a difference-in-differences (DID) research design when
panel data are available. We show that the proposed covariate balancing propensity score
(CBPS) DID estimator possesses several desirable properties: (i) local efficiency, (ii) double
robustness in terms of consistency, (iii) double robustness in terms of inference, and (iv)
faster convergence to the ATT compared to the augmented inverse probability weighting
(AIPW) DID estimators when both working models are locally misspecified. These latter
two characteristics set the CBPS DID estimator apart from the AIPW DID estimator
theoretically. Simulation studies and an empirical study demonstrate the desirable finite
sample performance of the proposed estimator.

Keywords: double robustness, local misspecification, panel data, treatment effects on the treated
(ATT),

1 Introduction

Difference-in-differences (DID) is a widely employed research design in evaluating the causal

effects of policy interventions using observational data. In its canonical form, the DID approach

necessitates two groups and two periods, stipulating that no entity is exposed to the treatment

in the initial period, while a subset remains untreated in the subsequent period. A crucial

underpinning of the DID design is the so-called (unconditional) parallel trends assumption

(PTA), which posits that, in the absence of the treatment, the average outcomes for both the

treatment and comparison groups would have evolved along parallel paths over time. While the

PTA is inherently untestable, its validity has been questioned, particularly in scenarios where pre-

treatment characteristics, differing between the treatment and comparison groups, are correlated

with the outcome’s evolution. In such instances, the canonical DID setup becomes implausible,

prompting researchers to incorporate pre-treatment covariates into the DID framework. This

modification ensures the satisfaction of the PTA conditionally on these covariates (conditional

PTA).

Under this conditional PTA, three predominant estimation procedures have emerged: the

outcome regression (OR), the inverse probability weighting (IPW), and the augmented IPW

(AIPW), the latter offering double robustness in terms of consistency; see Sant’Anna and Zhao

(2020) for a comprehensive review. However, these methods confront the challenge of potential

misspecification of the outcome regression and/or propensity score models, leading to incorrect

inferences. While doubly robust estimators offer improved consistency by requiring only one

of the two working models to be correctly specified, the possibility that both models are misspecified remains unavoidable in practice. For estimation of the average treatment effect (ATE),

Kang and Schafer (2007) highlight this limitation, demonstrating that the advantages of AIPW

estimators can substantially diminish when both the outcome regression and propensity score

models are slightly misspecified. To address this issue, Imai and Ratkovic (2014) proposed

the Covariate Balancing Propensity Score (CBPS) methodology, illustrating that the CBPS

estimator can significantly ameliorate the finite sample performance of doubly robust estimators.

Further theoretical expositions of the CBPS ATE estimator were provided by Fan et al. (2022).

Nevertheless, their investigation focuses on ATE estimation.

In this paper, we apply the CBPS methodology to the ATT estimator within the framework

of DID design. In particular, we rigorously investigate the robustness and efficiency properties

of the CBPS DID estimator. Our contributions to the DID literature are manifold: Firstly,

this article briefly reviews a range of existing ATT estimators within the DID framework

and then proposes a CBPS-based DID estimator when panel data are available. Secondly, we

demonstrate that our proposed estimator possesses the qualities of double robustness and

local efficiency. A notable distinction of our estimator is its double robustness not only in

terms of consistency but also in terms of inference. This characteristic guarantees that the

asymptotic linear representation of the CBPS DID estimator remains invariant even when one

of the working models is misspecified. As a consequence, it allows us to estimate the asymptotic

variance based on the influence function. This feature sets our estimator apart from existing

doubly robust AIPW DID estimators because the asymptotic linear representation of the AIPW

DID estimator is not invariant when one of the working models is misspecified. The third

contribution of our work lies in establishing that our estimator can achieve a faster convergence

rate relative to the AIPW DID estimator, under the scenario where both the propensity score and

outcome regression models are locally misspecified. This situation has been seldom addressed in

existing DID literature. Lastly, the fourth contribution is practicality: our estimator is

straightforward to implement. This simplicity in application enhances its utility in empirical

research.

Organization of this paper: The subsequent section of this paper delineates the settings

and assumptions used throughout the paper and briefly overviews some existing DID estimators.

In Section 3, we introduce the CBPS method and propose a CBPS-based DID estimator, and

derive its theoretical properties. We evaluate the finite sample performance of our proposed

CBPS DID estimator with Monte Carlo simulation in Section 4, and provide an empirical

example in Section 5. Conclusions are summarized in Section 6. All mathematical proofs

supporting our arguments and findings are comprehensively compiled in the Appendix.

2 Difference-in-differences

2.1 Setup

We introduce the notation and framework that will be employed throughout this article. Our

analysis is based on a two-period, two-group structure (treatment and comparison groups). Let

Yit represent the outcome of interest for unit i at time t. We suppose that researchers have access

to outcome data both in a pre-treatment period t = 0 and in a post-treatment period t = 1.

Define Dit as an indicator variable, where Dit = 1 if unit i receives treatment at time t, and

Dit = 0 otherwise. We note that Di0 = 0 for all units i, which simplifies the treatment indicator

to Di = Di1 . The observed outcome Yit can also be rewritten as Yit = Di Yit (1) + (1 − Di )Yit (0),

where Yit (0) denotes the potential outcome of unit i at time t if the unit does not receive treatment and Yit (1) the potential outcome if it does, but Yit (0) and

Yit (1) cannot be observed simultaneously for any unit. In the remainder of this paper, we assume

the availability of panel data on {Yi0 , Yi1 , Di , Xi }ni=1 , where Xi ∈ Rd is a vector of pre-treatment

covariates and the first element of Xi is a constant.

The parameter of interest, the average treatment effect on the treated (ATT), is defined as:

τ = E [Yi1 (1) − Yi1 (0)|Di = 1] . (2.1)

Since Yi1 (1) = Yi1 given Di = 1, ATT can be rewritten as:

τ = E [Yi1 |Di = 1] − E [Yi1 (0)|Di = 1] . (2.2)

According to the representation (2.2) above, we can show that the first term, $E[Y_{i1} \mid D_i = 1] = E[D_i Y_{i1}]/E[D_i]$, can be estimated directly from the observed data. The main challenge in identifying the

ATT lies in computing the second term (E [Yi1 (0)|Di = 1]) from the observed data since Yi1 (0)

is missing for treated subjects with Di = 1. In order to identify the ATT (or E [Yi1 (0)|Di = 1]),

the following assumptions are necessary.

Assumption 1. Assume that the data {Yi0 , Yi1 , Di , Xi }ni=1 are independent and identically

distributed (iid).

Assumption 2. E[Yi1 (0) − Yi0 (0)|Xi , Di = 1] = E[Yi1 (0) − Yi0 (0)|Xi , Di = 0] almost surely.

Assumption 3. For some ε > 0, Pr(Di = 1) > ε and Pr(Di = 1|Xi ) ≤ 1 − ε almost surely.

Assumption 2, which we term the conditional PTA, posits that the average conditional

outcomes for the treatment and comparison groups would have followed parallel paths in the

absence of the treatment. Assumption 3 is an overlap condition, which states that at least a

small fraction of the population is exposed to the treatment, and for every specific value of

covariates Xi , at least a small portion is not treated. These three assumptions are commonly

utilized in semiparametric DID methods; see, e.g., (Heckman, Ichimura, and Todd 1997; Abadie

2005; Sant’Anna and Zhao 2020). Next, we briefly provide an overview of the existing approaches

to identify and estimate the ATT.

2.2 Existing DID estimators

There are two approaches to identify the ATT: the OR approach (Heckman, Ichimura, and

Todd (1997)) and the IPW approach (Abadie (2005)).

(i) The OR approach: under Assumptions 1-3, the ATT is identified as:

$$\tau = E[\Delta Y_i \mid D_i = 1] - E\big[E[\Delta Y_i \mid X_i, D_i = 0] \mid D_i = 1\big] = E[\Delta Y_i \mid D_i = 1] - E[m_\Delta(X_i) \mid D_i = 1] := \tau^{OR}, \qquad (2.3)$$

where ∆Yi = Yi1 − Yi0 and m∆ (Xi ) = E [∆Yi |Xi , Di = 0]. Based on the identification result

(2.3), the OR approach requires researchers to model the conditional expectation of outcome

evolution E [∆Yi |Xi , Di = 0] at the first step. Researchers typically adopt a linear parametric

model Xi′ γ (outcome regression model) to specify the true conditional expectation of outcome

evolution E [∆Yi |Xi , Di = 0]. Consequently, the OR DID estimator is represented as follows:
$$\hat{\tau}^{OR} = \frac{1}{n_{treat}} \sum_{i \in treat} \Delta Y_i - \frac{1}{n_{treat}} \sum_{i \in treat} X_i' \hat{\gamma}^{OLS}, \qquad (2.4)$$

where $\hat{\gamma}^{OLS}$ is the OLS estimator from regressing $\Delta Y_i$ on $X_i$ in the comparison group ($D_i = 0$)

and ntreat denotes the treatment group size.
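As a concrete illustration, a minimal NumPy sketch of the OR DID estimator (2.4) could look as follows. The array names `d_y`, `d`, and `x` (the outcome change ∆Y, the treatment indicator D, and the covariate matrix X with a leading constant) are illustrative conventions for the sketch, not notation from the paper.

```python
import numpy as np

def or_did(d_y, d, x):
    """OR DID estimator (2.4): fit the outcome-evolution regression on the
    comparison group, then contrast treated means of dY and of X'gamma_hat."""
    d_y, d, x = np.asarray(d_y, float), np.asarray(d, int), np.asarray(x, float)
    # OLS of dY on X using the comparison group (D = 0)
    x0, y0 = x[d == 0], d_y[d == 0]
    gamma_ols, *_ = np.linalg.lstsq(x0, y0, rcond=None)
    treated = d == 1
    return d_y[treated].mean() - (x[treated] @ gamma_ols).mean()
```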

(ii) The IPW approach: under Assumptions 1-3, the ATT is alternatively identified as:
$$\tau = E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] := \tau^{IPW}, \qquad (2.5)$$

where π(Xi ) = Pr(Di = 1|Xi ) is the propensity score. Based on the identification result (2.5), the

IPW approach requires researchers to estimate the propensity score at the first step. Researchers typically use a parametric model (e.g. $\pi(X_i'\beta) = \exp(X_i'\beta)/\{1 + \exp(X_i'\beta)\}$) to specify the propensity score

π(Xi ) and estimate parameters by the maximum likelihood method. Hence, the IPW DID

estimator is expressed as:
$$\hat{\tau}^{IPW} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{ML})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{ML}))} \Delta Y_i, \qquad (2.6)$$
where $\hat{\beta}^{ML}$ is the maximum likelihood estimator and $\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i$.
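A corresponding sketch of the IPW DID estimator (2.6), with the logistic propensity score fitted by maximum likelihood through `scipy.optimize.minimize`; the helper name `fit_logit_ml` is hypothetical, and the same array conventions as in the previous sketch are assumed.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logit_ml(d, x):
    """Maximum likelihood estimate of beta in the logistic model expit(X'beta)."""
    def negloglik(beta):
        xb = x @ beta
        # negative log-likelihood of the logit model, numerically stable form
        return np.sum(np.logaddexp(0.0, xb) - d * xb)
    res = minimize(negloglik, x0=np.zeros(x.shape[1]), method="BFGS")
    return res.x

def ipw_did(d_y, d, x):
    """IPW DID estimator (2.6)."""
    beta_ml = fit_logit_ml(d, x)
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_ml)))   # fitted propensity scores
    d_bar = d.mean()
    return np.mean((d - ps) / (d_bar * (1.0 - ps)) * d_y)
```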

(iii) AIPW approach: Consistency of the OR and IPW estimators depends on the correct

specification of the outcome regression model and the propensity score model, respectively. To

achieve consistency in scenarios where one of the two working models is misspecified, Sant’Anna and Zhao (2020) proposed the following AIPW DID estimator:
$$\hat{\tau}^{AIPW} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{ML})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{ML}))} (\Delta Y_i - X_i'\hat{\gamma}^{AIPW}), \qquad (2.7)$$
with $\hat{\gamma}^{AIPW} = \hat{\gamma}^{OLS}$, which follows from the alternative identification of the ATT:
$$\tau = E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - m_\Delta(X_i))\right] := \tau^{AIPW}. \qquad (2.8)$$
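Combining the two first-step fits gives a sketch of the AIPW DID estimator (2.7); it reuses the hypothetical `fit_logit_ml` helper from the previous sketch and the same array conventions.

```python
import numpy as np

def aipw_did(d_y, d, x):
    """AIPW DID estimator (2.7): IPW weighting of the outcome-regression residual."""
    beta_ml = fit_logit_ml(d, x)                          # propensity score by ML
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_ml)))
    x0, y0 = x[d == 0], d_y[d == 0]
    gamma_ols, *_ = np.linalg.lstsq(x0, y0, rcond=None)   # outcome evolution by OLS
    d_bar = d.mean()
    resid = d_y - x @ gamma_ols
    return np.mean((d - ps) / (d_bar * (1.0 - ps)) * resid)
```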

Observing the form of (2.7), it is apparent that the AIPW procedure combines both OR and

IPW methodologies. This synthesis allows the AIPW estimator to mitigate some of the inherent

weaknesses associated with the OR and IPW approaches individually. Indeed, Sant’Anna and

Zhao (2020) show that the AIPW DID estimator is both locally efficient and doubly robust

in terms of consistency. In the cross-sectional setting, however, Kang and Schafer (2007)

demonstrated that the performance of the AIPW ATE estimator can be poor in scenarios where

both of the working models are slightly misspecified. To address this issue, Imai and Ratkovic

(2014) introduced the CBPS method and demonstrated that the CBPS ATE estimator can

significantly enhance performance over other existing ATE estimators, including the AIPW

ATE estimator, particularly when both working models are misspecified. In the subsequent

subsection, we extend the application of the CBPS method from estimating ATE to ATT

within the framework of DID research. Our objective is to propose a CBPS DID estimator

and to rigorously investigate its theoretical properties, focusing on its robustness and efficiency.

3 CBPS DID estimator

3.1 CBPS methodology

The CBPS method introduced by Imai and Ratkovic (2014) offers a distinct approach to

propensity score estimation. In contrast to the IPW method, the CBPS method imposes exact

finite sample balance of pre-treatment covariates across the treatment and comparison groups

rather than focusing on the predictive accuracy of treatment assignment. The CBPS DID

estimator is defined as
$$\hat{\tau}^{CBPS} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{CBPS}))} \Delta Y_i, \qquad (3.1)$$
where $\hat{\beta}^{CBPS}$ satisfies
$$\frac{1}{n} \sum_{i=1}^{n} \left( D_i - \frac{(1 - D_i)\pi(X_i'\hat{\beta}^{CBPS})}{1 - \pi(X_i'\hat{\beta}^{CBPS})} \right) X_i = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{1 - \pi(X_i'\hat{\beta}^{CBPS})} X_i = 0. \qquad (3.2)$$
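Because (3.2) is a just-identified system of d covariate-balance moments in the d-dimensional β, a minimal sketch of the CBPS DID estimator can solve it directly with a root finder; the use of `scipy.optimize.fsolve` and the function names below are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import fsolve

def cbps_did(d_y, d, x):
    """CBPS DID estimator (3.1): beta solves the exact balancing conditions (3.2)."""
    def balance_moments(beta):
        ps = 1.0 / (1.0 + np.exp(-(x @ beta)))
        w = (d - ps) / (1.0 - ps)          # weights whose X-weighted mean must be zero
        return x.T @ w / len(d)
    beta0 = np.zeros(x.shape[1])           # the ML estimate could also be used as a start
    beta_cbps = fsolve(balance_moments, beta0)
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_cbps)))
    d_bar = d.mean()
    return np.mean((d - ps) / (d_bar * (1.0 - ps)) * d_y)
```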

Recall that the IPW DID estimator (2.6) employs the maximum likelihood method to estimate the propensity score, where $\hat{\beta}^{ML}$ satisfies $\frac{1}{n}\sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{ML})}{1 - \pi(X_i'\hat{\beta}^{ML})} \frac{\dot{\pi}(X_i'\hat{\beta}^{ML})}{\pi(X_i'\hat{\beta}^{ML})} X_i = 0$, with $\dot{\pi}(v) = \partial\pi(v)/\partial v$. Hence the only difference between the IPW DID estimator (2.6) and the CBPS DID

estimator (3.1) is the method of propensity score estimation. On the other hand, by (3.2), the

CBPS DID estimator can also be expressed as
$$\hat{\tau}^{CBPS} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{CBPS}))} (\Delta Y_i - X_i'\gamma^{CBPS}), \qquad (3.3)$$

where γ CBP S is any value. Hence the only difference between the AIPW DID estimator (2.7) and the CBPS DID estimator (3.1) is the value of γ in (2.7) and (3.3). It is noteworthy that γ CBP S in (3.3) can take any value and is not estimated in the actual CBPS estimation defined in (3.1).

In the following subsections, we will conduct a comprehensive theoretical analysis of the CBPS

DID estimator to elucidate its advantages. It is this arbitrariness of γ CBP S in (3.3) that plays a

key role in showing robustness to misspecification of parametric working models compared to

the AIPW DID estimator.

3.2 Local efficiency

In this subsection, we start with the scenario when both of the working models are correctly

specified. We show that, in such a case, the CBPS DID estimator attains the semiparametric efficiency bound for the ATT under the DID framework. This property is the so-called local efficiency.

Sant’Anna and Zhao (2020) derived the semiparametric efficiency bound for ATT under a DID

framework.

Theorem 1. Under Assumptions 1-3 and Assumptions A (stated in Appendix A), if $\pi(X_i'\beta) = \pi(X_i)$ and $X_i'\gamma = m_\Delta(X_i)$ holds,
$$\sqrt{n}(\hat{\tau}^{CBPS} - \tau) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^e + o_p(1) \xrightarrow{d} N\big(0, E[\eta_i^{e2}]\big), \qquad (3.4)$$
where
$$\eta_i^e = \frac{D_i - \pi(X_i)}{E[D_i]\{1 - \pi(X_i)\}} (\Delta Y_i - m_\Delta(X_i)) - \frac{D_i}{E[D_i]} \tau \qquad (3.5)$$
is the efficient influence function for the ATT and $E[\eta_i^{e2}]$ is the semiparametric efficiency bound.

Theorem 1 shows asymptotic normality of the CBPS DID estimator when both of the

working models are correctly specified. The asymptotic variance of the CBPS DID estimator is

equal to the semiparametric efficiency bound derived by Sant’Anna and Zhao (2020). It should

be noted that the AIPW DID estimator also achieves the semiparametric efficiency bound when

both of the working models are correctly specified.

3.3 Double robustness

In the previous subsection, we showed that the CBPS DID estimator is locally efficient. In

this subsection, we shift our focus from efficiency to robustness, under the scenario that one of

the two working models is misspecified. Firstly, we show that the CBPS DID estimator remains

consistent for the ATT even if either the propensity score model or the outcome regression

model (but not both) is misspecified. This property is referred to as double robustness in terms

of consistency.

Theorem 2. Under Assumptions 1-3 and Assumptions A, the CBPS DID estimator is doubly robust in terms of consistency, that is, $\hat{\tau}^{CBPS} \xrightarrow{p} \tau$ if at least one of the following two conditions holds:

1. The outcome regression model is correctly specified, that is, there exists some value γ0

such that Xi′ γ0 = m∆ (Xi ) a.s.

2. The propensity score model is correctly specified, that is, there exists some value β0 such

that π(Xi′ β0 ) = π(Xi ) a.s.

Consequently, the CBPS DID estimator offers more flexibility and is less demanding in terms

of a researcher’s ability to correctly specify nuisance parametric models, compared to the OR

and the IPW approach. It should be noted that the AIPW DID estimator is also doubly robust

in terms of consistency.

While double robustness in terms of consistency is a valuable property, it may not suffice

for inference. The next theorem shows that the asymptotic linear representation of the CBPS

DID estimator remains invariant even when one of the working models is misspecified so that

the asymptotic variance can be estimated based on the influence function. This is referred to as

double robustness in terms of inference. In contrast, the asymptotic linear representation of

the AIPW DID estimator is not invariant when one of the working models is misspecified. A

detailed proof is provided in the Appendix.

Theorem 3. Let $\beta_0^{AIPW}$, $\gamma_0^{AIPW}$, $\beta_0^{CBPS}$ and $\gamma_0^{CBPS}$ denote the probability limits of $\hat{\beta}^{ML}$, $\hat{\gamma}^{AIPW}$, $\hat{\beta}^{CBPS}$ and
$$\hat{\gamma}^{CBPS} = \left( \sum_{i=1}^{n} \frac{(1 - D_i)\dot{\pi}(X_i'\hat{\beta}^{CBPS})}{(1 - \pi(X_i'\hat{\beta}^{CBPS}))^2} X_i X_i' \right)^{-1} \sum_{i=1}^{n} \frac{(1 - D_i)\dot{\pi}(X_i'\hat{\beta}^{CBPS}) \Delta Y_i}{(1 - \pi(X_i'\hat{\beta}^{CBPS}))^2} X_i,$$
respectively. Under Assumptions 1-3 and Assumptions A, if either $\pi(X_i'\beta_0^a) = \pi(X_i)$ a.s. or $X_i'\gamma_0^a = m_\Delta(X_i)$ a.s. for $a = CBPS, AIPW$, the CBPS DID and AIPW DID estimators satisfy:
$$\sqrt{n}(\hat{\tau}^{CBPS} - \tau) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{CBPS} + o_p(1), \qquad (3.6)$$
$$\sqrt{n}(\hat{\tau}^{AIPW} - \tau) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{AIPW} + O_p(1), \qquad (3.7)$$
where $\eta_i^a = \frac{D_i - \pi(X_i'\beta_0^a)}{E[D_i]\{1 - \pi(X_i'\beta_0^a)\}}(\Delta Y_i - X_i'\gamma_0^a) - \frac{D_i}{E[D_i]}\tau$. Note that $\eta_i^a$ is equal to the efficient influence

function (3.5) under the assumption that both working models are correctly specified.

Theorem 3 reveals that inference based on τ̂ AIP W and its influence function may be mislead-

ing when one of the working models is misspecified. In contrast, inference based on τ̂ CBP S and

its influence function will remain accurate even when one of the working models is misspecified.

This double robustness in terms of inference significantly enhances the appeal of the CBPS DID

estimator. We note that although γ̂ CBP S does not appear in estimating τ̂ CBP S (see (3.1)), it does

appear in estimating the asymptotic variance. Specifically, the asymptotic variance of the CBPS DID estimator should be estimated by
$$\frac{1}{n} \sum_{i=1}^{n} \left( \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{\bar{D}\{1 - \pi(X_i'\hat{\beta}^{CBPS})\}} (\Delta Y_i - X_i'\hat{\gamma}^{CBPS}) - \frac{D_i}{\bar{D}} \hat{\tau}^{CBPS} \right)^2.$$
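A sketch of this plug-in inference step, assuming `beta_cbps` and `tau_cbps` come from a CBPS fit as in the earlier sketch and that the propensity score is logistic (so that π̇ = π(1 − π)); it first computes γ̂ CBP S from the expression in Theorem 3 and then the variance estimate displayed above.

```python
import numpy as np

def cbps_inference(d_y, d, x, beta_cbps, tau_cbps):
    """Influence-function-based standard error and 95% CI for the CBPS DID estimator."""
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_cbps)))
    pi_dot = ps * (1.0 - ps)                       # derivative of the logistic link
    w = (1 - d) * pi_dot / (1.0 - ps) ** 2         # weights in the gamma_hat^CBPS expression
    a = (x * w[:, None]).T @ x                     # sum_i w_i X_i X_i'
    b = (x * (w * d_y)[:, None]).sum(axis=0)       # sum_i w_i dY_i X_i
    gamma_cbps = np.linalg.solve(a, b)
    d_bar = d.mean()
    eta = (d - ps) / (d_bar * (1.0 - ps)) * (d_y - x @ gamma_cbps) - d / d_bar * tau_cbps
    var_hat = np.mean(eta ** 2)                    # plug-in asymptotic variance estimate
    se = np.sqrt(var_hat / len(d))
    return se, (tau_cbps - 1.96 * se, tau_cbps + 1.96 * se)
```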

3.4 Convergence rate under local misspecification

Although the CBPS DID estimator is shown to have desirable properties in scenarios where

at least one of the two working models is correctly specified, situations might arise where both

of the working models are misspecified. To address this issue, Fan et al. (2022) conduct a

theoretical investigation of the AIPW ATE and CBPS ATE estimators under the scenario that

both propensity score and outcome models are locally misspecified and find that the CBPS ATE

estimator may converge in probability to the ATE at a faster rate than the AIPW estimator.

In this subsection, we examine whether the CBPS DID estimator inherits such a desirable

property in the DID design. We consider the same locally misspecified models as Fan et al. (2022):

Assumption 4.

π(Xi ) = π(Xi′ β ∗ ) exp(ξu(Xi ; β ∗ )), (3.8)

m∆ (Xi ) = Xi′ γ ∗ + δr(Xi ), (3.9)

where u(Xi ; β ∗ ) and r(Xi ) are functions determining the directions of misspecification with

|u(Xi ; β ∗ )| ≤ C, |r(Xi )| ≤ C a.s. for some positive constant C, ξ ∈ R and δ ∈ R represent the

magnitudes of misspecifications with ξ = o(1) and δ = o(1).

Theorem 4. Under Assumptions 1-4 and Assumptions A, suppose at least one entry of
$$E\left[\left(\frac{u(X_i;\beta^*)}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}(X_i'\beta^*) X_i' E\left[\frac{\dot{\pi}(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} X_i X_i'\right]^{-1} E\left[\frac{\pi(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} u(X_i;\beta^*) X_i\right]}{1 - \pi(X_i'\beta^*)}\right) X_i\right]$$
is nonzero. Then, as $n \to \infty$, the CBPS DID and AIPW DID estimators satisfy:
$$\hat{\tau}^{CBPS} - \tau = \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + O_p(\xi^2\delta + \delta n^{-1/2} + \xi n^{-1/2}), \qquad (3.10)$$
and
$$\hat{\tau}^{AIPW} - \tau = \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + O_p(\xi\delta + \delta n^{-1/2} + \xi n^{-1/2}), \qquad (3.11)$$
where $\eta_i^e$ is the efficient influence function introduced in Theorem 1.

If $\sqrt{n}\,\xi\delta \to \infty$, then the CBPS DID estimator converges in probability to the ATT at a faster rate than the AIPW DID estimator since $\hat{\tau}^{CBPS} - \tau = O_p(n^{-1/2} + \xi^2\delta)$, whereas $\hat{\tau}^{AIPW} - \tau = O_p(\xi\delta)$. On the other hand, if $\sqrt{n}\,\xi\delta \to 0$, the two estimators have the same limiting distribution $N(0, E[\eta_i^{e2}])$, but $\sqrt{n}(\hat{\tau}^{CBPS} - \tau)$ converges to the limit distribution at a faster rate than $\sqrt{n}(\hat{\tau}^{AIPW} - \tau)$. Theorem 4 implies that the CBPS DID estimator demonstrates

greater robustness to slight model misspecification compared to the AIPW DID estimator.

These faster rates of convergence are attributed to the arbitrariness of γ CBP S in (3.3), which

effectively eliminates the product ξδ in the asymptotic linear representation of the CBPS DID

estimator. A detailed proof of this can be found in the Appendix.

4 Simulation

In this section, we conduct a series of simulation studies to examine the finite sample properties

of the CBPS DID estimator. The simulation designs here closely follow those in (Sant’Anna

and Zhao 2020; Fan et al. 2022). Throughout these simulations, we utilize a logistic working

model for the propensity score and a linear regression model for outcome evolution. For the

OR, IPW, and AIPW approaches, we estimate the propensity scores using maximum likelihood

estimation and estimate outcome evolution using ordinary least squares.

We set the sample size n equal to 1000, and conduct 1000 Monte Carlo simulations for

each design. The performance of various DID estimators is compared in terms of average bias,

median bias, root mean square error (RMSE), empirical 95% coverage probability, the average

length of a 95% confidence interval, and the average of their plug-in estimator for the asymptotic

variance. The confidence intervals are constructed using the normal approximation, and the

asymptotic variances are estimated by their sample analogues. Additionally, we present the

semiparametric efficiency bound for each design calculated by Sant’Anna and Zhao (2020).

This helps to assess the potential loss of efficiency or accuracy of a semiparametric DID estimator.

4.1 Data generating process

We conduct Monte Carlo simulations across five distinct scenarios as follows:

1. Both propensity score and outcome regression models are correctly specified.

2. Only the outcome regression model is correctly specified.

3. Only the propensity score model is correctly specified.

4. Both the propensity score and the outcome regression models are misspecified.

5. Both the propensity score and the outcome regression models are locally misspecified.

For a generic $W_i = (W_{1i}, W_{2i}, W_{3i}, W_{4i})'$, let
$$f_{or}(W_i) = 210 + 27.4W_{1i} + 13.7(W_{2i} + W_{3i} + W_{4i}),$$
$$f_{ps}(W_i) = 0.75(-W_{1i} + 0.5W_{2i} - 0.25W_{3i} - 0.1W_{4i}).$$

Let $X_i = (X_{1i}, X_{2i}, X_{3i}, X_{4i})'$ be generated from $N(0, I_4)$, where $I_4$ is the $4 \times 4$ identity matrix. For $j = 1, 2, 3, 4$, let $Z_{ji} = (\tilde{Z}_{ji} - E[\tilde{Z}_{ji}])/\sqrt{Var(\tilde{Z}_{ji})}$, where $\tilde{Z}_{1i} = \exp(0.5X_{1i})$, $\tilde{Z}_{2i} = 10 + X_{2i}/(1 + \exp(X_{1i}))$, $\tilde{Z}_{3i} = (0.6 + X_{1i}X_{3i}/25)^3$ and $\tilde{Z}_{4i} = (20 + X_{2i} + X_{4i})^2$. We consider the

following data generating processes (DGPs):

DGP1 (PS and OR are correctly specified):
$$Y_{i1}(d) = 2f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i0},$$
$$\pi(Z_i) = \exp(f_{ps}(Z_i))/(1 + \exp(f_{ps}(Z_i))), \quad D_i = 1\{\pi(Z_i) \ge U_i\},$$

DGP2 (PS is misspecified but OR is correctly specified):
$$Y_{i1}(d) = 2f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i0},$$
$$\pi(X_i) = \exp(f_{ps}(X_i))/(1 + \exp(f_{ps}(X_i))), \quad D_i = 1\{\pi(X_i) \ge U_i\},$$

DGP3 (PS is correctly specified but OR is misspecified):
$$Y_{i1}(d) = 2f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i0},$$
$$\pi(Z_i) = \exp(f_{ps}(Z_i))/(1 + \exp(f_{ps}(Z_i))), \quad D_i = 1\{\pi(Z_i) \ge U_i\},$$

DGP4 (PS and OR are misspecified):
$$Y_{i1}(d) = 2f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i0},$$
$$\pi(X_i) = \exp(f_{ps}(X_i))/(1 + \exp(f_{ps}(X_i))), \quad D_i = 1\{\pi(X_i) \ge U_i\},$$

DGP5 (PS and OR are locally misspecified):
$$Y_{i1}(d) = 2f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i1}(d) + 2\delta r(Z_i), \quad Y_{i0}(0) = f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i0} + \delta r(Z_i),$$
$$\pi(Z_i) = \exp(f_{ps}(Z_i))/(1 + \exp(f_{ps}(Z_i))) \cdot \exp(\xi u(Z_i)), \quad D_i = 1\{\pi(Z_i) \ge U_i\},$$

where d = 0, 1 is an indicator of potential outcome, Di = 0, 1 denotes whether unit i receives

treatment or not, εi0 and εi1 (d) are independent standard normal random variables, Ui is an

independent standard uniform random variable. For a generic Wi , v(Wi , Di ) represents an

independent normal random variable with a mean of Di · for (Wi ) and a variance of one. This

v serves as a proxy for time-invariant unobserved heterogeneity. $\xi = \delta = n^{-0.5}$ denote the magnitudes of misspecification, and $u(Z_i) = -Z_{1i}^2 + Z_{2i}^2$ and $r(Z_i) = 2Z_{1i}^2 + 4Z_{2i}^2 + 3Z_{3i}^2 + Z_{4i}^2$

determine the directions of misspecification. It is important to note that in all the DGPs

mentioned above, the true ATT is zero, and the available data are $\{Y_{i0}, Y_{i1}, D_i, Z_i\}_{i=1}^n$, where $Z_i = (1, Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i})'$ includes a constant among the covariates; the realized outcomes $Y_{i0}$ and $Y_{i1}$ are generated according to $Y_{i0} = Y_{i0}(0)$ and $Y_{i1} = D_i Y_{i1}(1) + (1 - D_i)Y_{i1}(0)$, respectively.
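For illustration, a sketch of how DGP1 can be simulated; it follows the formulas above, except that the Z's are standardized by their sample moments rather than the population moments, an assumption made only for the sketch.

```python
import numpy as np

def f_or(w):   # outcome index: 210 + 27.4*W1 + 13.7*(W2 + W3 + W4)
    return 210 + 27.4 * w[:, 0] + 13.7 * (w[:, 1] + w[:, 2] + w[:, 3])

def f_ps(w):   # propensity index: 0.75*(-W1 + 0.5*W2 - 0.25*W3 - 0.1*W4)
    return 0.75 * (-w[:, 0] + 0.5 * w[:, 1] - 0.25 * w[:, 2] - 0.1 * w[:, 3])

def simulate_dgp1(n=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 4))
    z_tilde = np.column_stack([
        np.exp(0.5 * x[:, 0]),
        10 + x[:, 1] / (1 + np.exp(x[:, 0])),
        (0.6 + x[:, 0] * x[:, 2] / 25) ** 3,
        (20 + x[:, 1] + x[:, 3]) ** 2,
    ])
    z = (z_tilde - z_tilde.mean(axis=0)) / z_tilde.std(axis=0)   # standardized covariates
    ps = 1 / (1 + np.exp(-f_ps(z)))
    d = (ps >= rng.uniform(size=n)).astype(int)
    v = rng.normal(loc=d * f_or(z), scale=1.0)                   # unobserved heterogeneity
    y0 = f_or(z) + v + rng.standard_normal(n)                    # Y_{i0}(0)
    y1 = 2 * f_or(z) + v + rng.standard_normal(n)                # observed Y_{i1}; true ATT is zero
    z_design = np.column_stack([np.ones(n), z])                  # add the constant
    return y1 - y0, d, z_design
```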

4.2 Results

The results are summarized in the tables below: τ̂ IP W represents the IPW DID estimator

(2.6), τ̂ OR denotes the OR DID estimator (2.4), τ̂ AIP W is the AIPW DID estimator (2.7), and

τ̂ CBP S refers to the CBPS DID estimator (3.1). The abbreviations used are as follows: “Av.Bias”,

“Med.Bias”, “RMSE”, “Asy.V”, “Cover” and “CIL” stand for the average simulated bias, median

simulated bias, simulated root mean-squared error, average of the plug-in estimators for the

asymptotic variance, 95% coverage probability, and the average length of the 95% confidence

interval.
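A small sketch of how these summaries can be computed from the Monte Carlo output, assuming `estimates` and `asy_vars` collect the per-replication point estimates and plug-in asymptotic-variance estimates and that the true ATT is zero:

```python
import numpy as np

def mc_summary(estimates, asy_vars, n, tau_true=0.0):
    """Summaries reported in the tables: Av.Bias, Med.Bias, RMSE, Asy.V, Cover, CIL."""
    est, av = np.asarray(estimates), np.asarray(asy_vars)
    se = np.sqrt(av / n)                              # plug-in standard errors
    lo, hi = est - 1.96 * se, est + 1.96 * se
    return {
        "Av.Bias": float(np.mean(est - tau_true)),
        "Med.Bias": float(np.median(est - tau_true)),
        "RMSE": float(np.sqrt(np.mean((est - tau_true) ** 2))),
        "Asy.V": float(np.mean(av)),
        "Cover": float(np.mean((lo <= tau_true) & (tau_true <= hi))),
        "CIL": float(np.mean(hi - lo)),
    }
```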

Table 1 suggests that when both working models are correctly specified, all semiparametric

DID estimators show minimal Monte Carlo bias. However, τ̂ OR , τ̂ AIP W , and τ̂ CBP S outperform

Table 1: DGP1, both working models are correctly specified

Semiparametric efficiency bound: 11.1


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 0.091 0.217 2.805 8403.740 0.946 10.526


τ̂ OR 0.002 0.002 0.101 10.244 0.954 0.396
τ̂ AIP W 0.002 0.003 0.105 11.245 0.946 0.414
τ̂ CBP S 0.002 0.003 0.105 10.945 0.943 0.409

τ̂ IP W in terms of bias, root mean square error, asymptotic variance, and the length of the

confidence interval. This implies that the IPW DID estimator is substantially less efficient

compared to the latter three. Although τ̂ OR tends to be slightly more efficient than τ̂ AIP W and

τ̂ CBP S , the performance of these three estimators is quite similar.

Table 2: DGP2, correct outcome regression model with a misspecified propensity score model

Semiparametric efficiency bound: 11.6


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 2.068 2.099 3.297 7472.489 0.833 10.072


τ̂ OR 0.003 0.002 0.100 10.139 0.947 0.394
τ̂ AIP W 0.003 0.004 0.102 10.459 0.944 0.400
τ̂ CBP S 0.003 0.004 0.103 10.713 0.947 0.405

Table 2 illustrates that when the propensity score model is misspecified, the CBPS DID

estimator, τ̂ CBP S , is competitive with τ̂ OR and τ̂ AIP W . As anticipated, τ̂ IP W is biased in this

scenario. Conversely, Table 3 indicates that when the outcome regression model is misspecified,

τ̂ CBP S outperforms the other three estimators in terms of root mean square error, asymptotic

variance, and coverage probability. In this scenario, τ̂ OR displays a non-negligible bias, which

aligns with expectations.

Table 3: DGP3, misspecified outcome regression model with a correct propensity score model

Semiparametric efficiency bound: 11.1


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 0.121 0.313 3.173 10454.468 0.941 11.910


τ̂ OR -1.352 -1.330 1.822 1506.947 0.826 4.804
τ̂ AIP W -0.001 0.010 1.223 1834.486 0.966 5.234
τ̂ CBP S -0.022 0.002 1.011 977.372 0.947 3.870

Table 4: DGP4, both models are misspecified

Semiparametric efficiency bound: 11.6


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W -1.046 -1.013 2.609 6092.271 0.954 9.243


τ̂ OR -5.224 -5.195 5.372 1472.513 0.006 4.751
τ̂ AIP W -3.242 -3.220 3.494 2674.697 0.378 6.240
τ̂ CBP S -2.547 -2.528 2.727 974.912 0.265 3.865

In Table 4, when both working models are misspecified, it is unsurprising that all semipara-

metric DID estimators exhibit bias, and generally, the inference procedures are misleading. In

this scenario, the CBPS DID estimator demonstrates smaller biases, lower root mean square

error (RMSE), reduced asymptotic variance, and shorter confidence interval lengths compared

to the OR and AIPW DID estimators. However, in DGP4, the IPW DID estimator appears to

perform the best.

Table 5: DGP5, both models are locally misspecified

Semiparametric efficiency bound: 11.1


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 7.612 7.477 8.036 6938.706 0.118 10.101


τ̂ OR 0.215 0.209 0.244 12.587 0.533 0.439
τ̂ AIP W 0.118 0.113 0.166 12.009 0.777 0.428
τ̂ CBP S 0.086 0.083 0.146 13.032 0.866 0.446

Table 5 indicates that in scenarios when both working models are locally misspecified, τ̂ CBP S

shows the smallest bias and root mean square error (RMSE), along with the best coverage

probability. However, in DGP5, τ̂ IP W demonstrates a non-negligible bias compared to the other

three DID estimators. This finding corroborates the finding that IPW-based estimators are

sensitive to even slight misspecifications of the propensity score model, see, e.g. Kang and

Schafer (2007).

In summary, the findings presented in Table 1 indicate that the estimated variance of τ̂ CBP S

is very close to the semiparametric efficiency bound when both working models are correctly

specified. This supports our Theorem 1 regarding local semiparametric efficiency. In Tables

2 and 3, when one of the working models is misspecified, our proposed CBPS DID estimator

shows little bias, justifying the double robustness in terms of consistency stated in Theorem 2.

Furthermore, in DGP2 and DGP3, the CBPS DID estimator achieves a coverage probability

closer to 95% compared to the AIPW DID estimator, validating the superiority of double

robustness for inference demonstrated in Theorem 3. Lastly, Table 5 reveals that the CBPS

DID estimator is more robust to mild model misspecification than the AIPW DID estimator,

confirming the results of Theorem 4.

5 An empirical application

In this section, we apply our CBPS DID estimator to a real data sample. LaLonde (1986)

conducted a highly influential study evaluating the performance of different treatment effect

estimators based on observational data. The study focused on whether a new statistical method-

ology could replicate an experimental benchmark: the treatment effect of a National Supported

Work (NSW) labor training program on post-treatment earnings. Unfortunately, the results

were not satisfactory due to the potential presence of selection bias in the observational data.

Later, Dehejia and Wahba (1999) demonstrated that propensity score matching (PSM) based

estimators could closely replicate the experimental results. However, Smith and Todd (2005)

found that cross-sectional PSM estimators are highly sensitive to both model misspecification

and sample selection. They suggested that DID matching estimators were more appropriate.

Following the findings of Smith and Todd (2005), we use different samples and specifications

to evaluate the existing DID estimators. Specifically, we utilize data from the Current Population

Survey (CPS) to create a comparison group and use the control group from LaLonde’s original

experimental sample and the Dehejia and Wahba (DW) sample as our pseudo treatment group.

We consider two datasets: (1) LaLonde’s control group (425) + CPS (15,992), and (2) DW

control group (260) + CPS (15,992). Since no one received training under this setup, the true ATT is zero, so a consistent estimator should be close to zero. Therefore, we use deviations from zero to evaluate the

performance of the DID estimators.

The pre-treatment covariates in the data include age, real earnings in 1974, years of education,

and dummy variables for high school dropout status, marital status, race (black), and ethnicity

(Hispanic). The outcome of interest is real earnings in 1978, and pre-treatment real earnings in

1975 are also available. As part of our analysis, similar to the Monte Carlo simulations, we compare the performance of the CBPS DID estimator, τ̂ CBP S , with the IPW DID estimator, τ̂ IP W , the

OR DID estimator, τ̂ OR , and the AIPW DID estimator, τ̂ AIP W . We assume that the outcome

models are linear in parameters and the propensity score model follows a logistic specification.

To assess the sensitivity to model misspecification, we also consider three different specifications:

(i) linear covariates (Lin); (ii) addition of some quadratic covariates such as age squared and

education squared (Qua); (iii) addition of some interaction terms selected by Sant’Anna and Zhao (S&Z). The results are summarized in Table 6, with standard errors reported in parentheses.

Table 6: Deviation of different DID estimators for the effect of training on real earnings in 1978,
with CPS comparison group

Lalonde Sample DW Sample


True ATT=0 True ATT=0

τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S

Lin. -1310 -1469 -972 -1030 -397 -560 69 2


(424) (348) (415) (398) (593) (413) (566) (519)

Qua. -878 -1248 -746 -744 -407 -277 151 211


(551) (354) (525) (477) (1018) (421) (884) (663)

S&Z. -778 -1426 -717 -730 -229 -676 313 239


(523) (352) (507) (474) (937) (426) (825) (648)

The results in Table 6 reveal several interesting findings. First, τ̂ OR displays the largest

bias across different datasets and covariate specifications. For the Lalonde sample, every DID estimator shows more severe bias than under the DW sample. Second,

Abadie’s IPW DID estimator τ̂ IP W tends to have the largest standard error in all situations

although its bias is relatively small especially under the Qua and S&Z specifications. Third,

τ̂ AIP W and τ̂ CBP S perform better than the other two in terms of bias. The CBPS DID estimator

τ̂ CBP S is very close to the true ATT when adopting the linear specification under the DW

sample. Finally, when we compare τ̂ CBP S with τ̂ AIP W , we find that the CBPS DID estimator

tends to have smaller standard errors in all situations. Taken together, the results suggest that

the proposed DID estimator is a compelling alternative to existing DID estimators. Additionally,

we use the Panel Study of Income Dynamics (PSID) to create a comparison group, with the

results provided in the Appendix.

6 Conclusion

In this paper, we introduced an ATT estimator based on the CBPS method within a DID

framework. This framework is applicable when the parallel trends assumption holds after

conditioning on a set of pre-treatment covariates and when panel data are available. We

conducted a thorough theoretical investigation of the CBPS DID estimator. Specifically,

we found that while the CBPS DID estimator’s expression is similar to that of the IPW

estimator, its theoretical properties align more closely with those of the AIPW estimator. We

demonstrated that the CBPS DID estimator is locally semiparametrically efficient and exhibits

double robustness, similar to the AIPW DID estimator. Moreover, the asymptotic linear

representation of the CBPS DID estimator remains invariant even when one of the working

models is misspecified, a feature we refer to as double robustness for inference. Additionally,

our estimator has a faster convergence rate than the AIPW DID estimator when both working

models are locally misspecified. These superior properties set the CBPS DID estimator apart

from the AIPW DID estimator. Our simulation results and empirical studies confirm these

theoretical properties, showcasing the advantages of the proposed CBPS DID estimator.

In this work, we have primarily concentrated on the theoretical development of the CBPS

DID estimator, especially in comparison to the AIPW DID estimator. An intriguing extension

involves adapting the CBPS DID estimator to high-dimensional settings. This is a nontrivial task,

as traditional regression methods tend to break down in high-dimensional settings, and CBPS

methodology is no exception. To overcome this challenge, recent research has incorporated

machine learning techniques into the first-step estimation of the propensity score and the

outcome evolution. This approach, known as double machine learning methodology, has been

explored in studies including (Chernozhukov et al. 2017; Chernozhukov et al. 2018; Chang 2020).

Our ongoing research aims to develop and investigate a high-dimensional CBPS DID estimator.

References

Abadie, Alberto (2005). “Semiparametric difference-in-differences estimators”. The review of

economic studies 72.1, pp. 1–19.

Chang, Neng-Chieh (2020). “Double/debiased machine learning for difference-in-differences

models”. The Econometrics Journal 23.2, pp. 177–191.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, and

Whitney Newey (2017). “Double/debiased/neyman machine learning of treatment effects”.

American Economic Review 107.5, pp. 261–265.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen,

Whitney Newey, and James Robins (2018). “Double/debiased machine learning for treatment and structural parameters”. The Econometrics Journal 21.1, pp. C1–C68.

Dehejia, Rajeev H and Sadek Wahba (1999). “Causal effects in nonexperimental studies: Reeval-

uating the evaluation of training programs”. Journal of the American statistical Association

94.448, pp. 1053–1062.

Fan, Jianqing, Kosuke Imai, Inbeom Lee, Han Liu, Yang Ning, and Xiaolin Yang (2022).

“Optimal covariate balancing conditions in propensity score estimation”. Journal of Business

& Economic Statistics 41.1, pp. 97–110.

Heckman, James J, Hidehiko Ichimura, and Petra E Todd (1997). “Matching as an econometric

evaluation estimator: Evidence from evaluating a job training programme”. The review of

economic studies 64.4, pp. 605–654.

Imai, Kosuke and Marc Ratkovic (2014). “Covariate balancing propensity score”. Journal of the

Royal Statistical Society Series B: Statistical Methodology 76.1, pp. 243–263.

Kang, Joseph DY and Joseph L Schafer (2007). “Demystifying double robustness: A comparison

of alternative strategies for estimating a population mean from incomplete data”. Statistical Science 22.4, pp. 523–539.

LaLonde, Robert J (1986). “Evaluating the econometric evaluations of training programs with

experimental data”. The American economic review, pp. 604–620.

Ning, Yang, Peng Sida, and Kosuke Imai (2020). “Robust estimation of causal effects via a

high-dimensional covariate balancing propensity score”. Biometrika 107.3, pp. 533–554.

Sant’Anna, Pedro HC and Jun Zhao (2020). “Doubly robust difference-in-differences estimators”.

Journal of econometrics 219.1, pp. 101–122.

Smith, Jeffrey A and Petra E Todd (2005). “Does matching overcome LaLonde’s critique of

nonexperimental estimators?” Journal of econometrics 125.1-2, pp. 305–353.

A Mathematical appendix

A1. Additional assumptions for first step estimations

To simplify the notation, let $g(x)$ be a generic notation for $\pi(x)$ and $m_\Delta(x)$, and let a parametric function $g(x'\theta)$ serve as a generic representation for $\pi(x'\beta)$ and $x'\gamma$. Additionally, for a generic $W$, let $\|W\| = \sqrt{\mathrm{trace}(W'W)}$ denote the Euclidean norm of $W$.

Let
$$\lambda_i(\zeta) = \frac{D_i - \pi(X_i'\beta)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - X_i'\gamma), \qquad \dot{\lambda}_i(\zeta) = \frac{\partial \lambda_i(\zeta)}{\partial \zeta},$$
where $\zeta = (\beta', \gamma')'$.

Assumption A.

(i) Assume there is a known parametric function $g(x'\theta) = g(x)$ and $\theta \in \Theta \subset \mathbb{R}^K$, where $\Theta$ is a compact parameter set.

(ii) Assume $g(X_i'\theta)$ is a.s. continuous in $\theta \in \Theta$ and is a.s. continuously differentiable; in addition, assume $\dot{\pi}(X_i'\beta)$ is bounded away from zero and infinity.

(iii) There exists a unique pseudo-true parameter $\theta_0 \in \mathrm{int}(\Theta)$.

(iv) Assume that $\hat{\theta}$ strongly converges to $\theta_0$ and can be asymptotically expressed as:
$$\sqrt{n}(\hat{\theta} - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi_g(D_i, \Delta Y_i, X_i, \theta_0) + o_p(1),$$
where $E[\psi_g(D_i, \Delta Y_i, X_i, \theta_0)] = 0$ and $E[\psi_g(D_i, \Delta Y_i, X_i, \theta_0)\psi_g(D_i, \Delta Y_i, X_i, \theta_0)']$ is finite and positive definite.

(v) Assume that $E[\|\lambda_i(\zeta_0)\|^2] < \infty$ and $E[\sup_{\zeta \in \Gamma_0} |\dot{\lambda}_i(\zeta)|] < \infty$, where $\Gamma_0$ is a small neighborhood of $\zeta_0$.

Assumption A (i)-(iv) represent standard conditions necessary for consistency and $\sqrt{n}$-asymptotically linear representations of the first-step estimators, and (v) imposes some weak integrability conditions. See Sant’Anna and Zhao (2020) for more details.
A2. ATT in OR approach

$$\begin{aligned}
\tau &= E[Y_{i1}(1) - Y_{i1}(0) \mid D_i = 1] \\
&= E[Y_{i1}(1) - Y_{i0}(0) \mid D_i = 1] - E\big[E[Y_{i1}(0) - Y_{i0}(0) \mid X_i, D_i = 1] \mid D_i = 1\big] \\
&= E[Y_{i1}(1) - Y_{i0}(0) \mid D_i = 1] - E\big[E[Y_{i1}(0) - Y_{i0}(0) \mid X_i, D_i = 0] \mid D_i = 1\big] \quad \text{(Assumption 2)} \\
&= E[\Delta Y_i \mid D_i = 1] - E\big[E[\Delta Y_i \mid X_i, D_i = 0] \mid D_i = 1\big] \\
&= E[\Delta Y_i \mid D_i = 1] - E[m_\Delta(X_i) \mid D_i = 1] = \tau^{OR}.
\end{aligned}$$
A3. ATT in IPW approach

$$\begin{aligned}
\tau^{IPW} &= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] \\
&= \frac{E[D_i \Delta Y_i]}{E[D_i]} - \frac{E\left[\frac{(1 - D_i)\pi(X_i)}{1 - \pi(X_i)} \Delta Y_i\right]}{E[D_i]} \\
&= \frac{E[D_i \Delta Y_i]}{\Pr(D_i = 1)} - \frac{E\left[\frac{\pi(X_i)}{1 - \pi(X_i)} E[(1 - D_i)\Delta Y_i \mid X_i]\right]}{\Pr(D_i = 1)} \\
&= E[\Delta Y_i \mid D_i = 1] - \frac{E\left[\frac{\pi(X_i)}{1 - \pi(X_i)} E[1 - D_i \mid X_i] E[Y_{i1}(0) - Y_{i0}(0) \mid X_i]\right]}{\Pr(D_i = 1)} \quad \text{(Assumption 2)} \\
&= E[\Delta Y_i \mid D_i = 1] - \frac{E[\pi(X_i) E[\Delta Y_i \mid X_i, D_i = 0]]}{\Pr(D_i = 1)} \\
&= E[\Delta Y_i \mid D_i = 1] - E\big[E[\Delta Y_i \mid X_i, D_i = 0] \mid D_i = 1\big] = \tau
\end{aligned}$$
A4. ATT in AIPW approach

$$\begin{aligned}
\tau^{AIPW} &= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - m_\Delta(X_i))\right] \\
&= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] - E\left[\frac{E[D_i \mid X_i] - \pi(X_i)}{E[D_i](1 - \pi(X_i))} m_\Delta(X_i)\right] \\
&= \tau^{IPW} - 0 = \tau
\end{aligned}$$
A5. Proof of Theorem 1

Suppose that both working models are correctly specified, that is $\pi(X_i'\beta) = \pi(X_i)$ and $X_i'\gamma = m_\Delta(X_i)$. Then, by using the expression (3.3), a Taylor expansion yields:

$$\begin{aligned}
\sqrt{n}(\hat{\tau}^{CBPS} - \tau) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left\{ \frac{D_i - \pi(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))} (\Delta Y_i - X_i'\gamma^{CBPS}) - \frac{D_i}{E[D_i]} \tau \right\} \\
&\quad - \sqrt{n}(\bar{D} - E[D_i]) E\left[\frac{(D_i - \pi(X_i'\beta))(\Delta Y_i - X_i'\gamma^{CBPS})}{E^2[D_i](1 - \pi(X_i'\beta))} - \frac{D_i}{E^2[D_i]} \tau\right] \\
&\quad - \sqrt{n}(\hat{\beta}^{CBPS} - \beta)' E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma^{CBPS})\dot{\pi}(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))^2} X_i\right] + o_p(1) \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left\{ \frac{D_i - \pi(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))} (\Delta Y_i - X_i'\gamma^{CBPS}) - \frac{D_i}{E[D_i]} \tau \right\} + o_p(1) \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^e + o_p(1),
\end{aligned}$$

where the second equality follows from

$$E\left[\frac{(D_i - \pi(X_i'\beta))(\Delta Y_i - X_i'\gamma^{CBPS})}{E^2[D_i](1 - \pi(X_i'\beta))} - \frac{D_i}{E^2[D_i]} \tau\right] = E\left[\frac{(D_i - \pi(X_i))(\Delta Y_i - m_\Delta(X_i))}{E^2[D_i](1 - \pi(X_i))} - \frac{D_i}{E^2[D_i]} \tau\right] = \frac{\tau^{AIPW} - \tau}{E[D_i]} = 0,$$

$$E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma^{CBPS})\dot{\pi}(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))^2} X_i\right] = E\left[\frac{E[(1 - D_i)(\Delta Y_i - m_\Delta(X_i)) \mid X_i]\,\dot{\pi}(X_i'\beta)}{E[D_i](1 - \pi(X_i))^2} X_i\right] = 0$$

by setting $\gamma^{CBPS} = \gamma$. (Note that $\gamma^{CBPS}$ can take any value.) Thus the conclusion follows by the CLT.
A6. Proof of Theorem 2

We separate the proof into two cases.

Case 1: If the outcome model is correctly specified, that is $X_i'\gamma_0 = m_\Delta(X_i)$ but $\pi(X_i'\beta_0) \neq \pi(X_i)$, by the weak law of large numbers and the continuous mapping theorem, as $n \to \infty$,

$$\begin{aligned}
\hat{\tau}^{CBPS} &\xrightarrow{p} E\left[\frac{D_i - \pi(X_i'\beta_0)}{E[D_i](1 - \pi(X_i'\beta_0))} (\Delta Y_i - m_\Delta(X_i))\right] \\
&= E\left[\frac{D_i \Delta Y_i}{E[D_i]} - \frac{D_i m_\Delta(X_i)}{E[D_i]}\right] - E\left[\frac{(1 - D_i)(\Delta Y_i - m_\Delta(X_i))\pi(X_i'\beta_0)}{E[D_i](1 - \pi(X_i'\beta_0))}\right] \\
&= E\left[\frac{D_i \Delta Y_i}{E[D_i]} - \frac{D_i m_\Delta(X_i)}{E[D_i]}\right] - E\left[\frac{E[(1 - D_i)(\Delta Y_i - m_\Delta(X_i)) \mid X_i]\,\pi(X_i'\beta_0)}{E[D_i](1 - \pi(X_i'\beta_0))}\right] \\
&= \tau^{OR} - 0 = \tau,
\end{aligned}$$

where the first equality follows from $\frac{D_i - \pi(X_i'\beta_0)}{1 - \pi(X_i'\beta_0)} = D_i - \frac{(1 - D_i)\pi(X_i'\beta_0)}{1 - \pi(X_i'\beta_0)}$ and the third equality follows from $E[(1 - D_i)(\Delta Y_i - m_\Delta(X_i)) \mid X_i] = 0$.

Case 2: If only the propensity score model is correctly specified, that is, $\pi(X_i'\beta_0) = \pi(X_i)$ but $X_i'\gamma_0 \neq m_\Delta(X_i)$, by the weak law of large numbers and the continuous mapping theorem, as $n \to \infty$,

$$\begin{aligned}
\hat{\tau}^{CBPS} &\xrightarrow{p} E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - X_i'\gamma^{CBPS})\right] \\
&= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] - E\left[\frac{E[D_i \mid X_i] - \pi(X_i)}{E[D_i](1 - \pi(X_i))} X_i'\gamma^{CBPS}\right] \\
&= \tau^{IPW} - 0 = \tau,
\end{aligned}$$

where the second equality follows from $E[D_i \mid X_i] = \pi(X_i)$.
A7. Proof of Theorem 3

By Taylor expansions, we have

$$\begin{aligned}
\sqrt{n}(\hat{\tau}^{CBPS} - \tau) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{CBPS} - \sqrt{n}(\bar{D} - E[D_i]) E\left[\frac{(D_i - \pi(X_i'\beta_0^{CBPS}))(\Delta Y_i - X_i'\gamma^{CBPS})}{E^2[D_i](1 - \pi(X_i'\beta_0^{CBPS}))} - \frac{D_i}{E^2[D_i]} \tau\right] \\
&\quad - \sqrt{n}(\hat{\beta}^{CBPS} - \beta_0^{CBPS})' E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma^{CBPS})\dot{\pi}(X_i'\beta_0^{CBPS})}{E[D_i](1 - \pi(X_i'\beta_0^{CBPS}))^2} X_i\right] + o_p(1), \qquad (A.1)
\end{aligned}$$

whereas

$$\begin{aligned}
\sqrt{n}(\hat{\tau}^{AIPW} - \tau) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{AIPW} - \sqrt{n}(\bar{D} - E[D_i]) E\left[\frac{(D_i - \pi(X_i'\beta_0^{AIPW}))(\Delta Y_i - X_i'\gamma_0^{AIPW})}{E^2[D_i](1 - \pi(X_i'\beta_0^{AIPW}))} - \frac{D_i}{E^2[D_i]} \tau\right] \\
&\quad - \sqrt{n}(\hat{\beta}^{AIPW} - \beta_0^{AIPW})' E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma_0^{AIPW})\dot{\pi}(X_i'\beta_0^{AIPW})}{E[D_i](1 - \pi(X_i'\beta_0^{AIPW}))^2} X_i\right] \\
&\quad - \sqrt{n}(\hat{\gamma}^{AIPW} - \gamma_0^{AIPW})' E\left[\frac{D_i - \pi(X_i'\beta_0^{AIPW})}{E[D_i](1 - \pi(X_i'\beta_0^{AIPW}))} X_i\right] + o_p(1). \qquad (A.2)
\end{aligned}$$

As in the proof of Theorem 2, we separate the proof into two cases.

Case 1: If the outcome regression model is correct but the propensity score model is incorrect, the third terms of both the AIPW and CBPS expansions are zero, but the fourth term of the AIPW expansion is nonzero and of order $O_p(1)$ because $\pi(X_i) \neq \pi(X_i'\beta_0^{AIPW})$.

Case 2: On the other hand, if the propensity score model is correct but the outcome regression model is incorrect, the fourth term of the AIPW expansion is zero, but the third term of the AIPW expansion is nonzero and of order $O_p(1)$. However, the arbitrariness of $\gamma^{CBPS}$ in the CBPS expansion offers a significant advantage. Specifically, the third term of the CBPS expansion is zero even when the outcome regression model is incorrect, by setting
$$\gamma^{CBPS} = E\left[\frac{(1 - D_i)\dot{\pi}(X_i'\beta_0^{CBPS})}{(1 - \pi(X_i'\beta_0^{CBPS}))^2} X_i X_i'\right]^{-1} E\left[\frac{(1 - D_i)\Delta Y_i\,\dot{\pi}(X_i'\beta_0^{CBPS})}{(1 - \pi(X_i'\beta_0^{CBPS}))^2} X_i\right] := \gamma_0^{CBPS}.$$
A8. Proof of Theorem 4

We provide a sketch of the proof because the detail is very similar to that of (3.10) and (3.11) of Fan et al. (2021). Letting $\beta_0^{CBPS}$ denote the probability limit of $\hat{\beta}^{CBPS}$, we decompose

$$\begin{aligned}
\hat{\tau}^{CBPS} - \tau &= \frac{1}{\bar{D}} \frac{1}{n} \sum_{i=1}^{n} \left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \left(\Delta Y_i - X_i'\gamma^{CBPS}\right) - D_i \tau\right] \\
&\quad + \frac{1}{\bar{D}} \frac{1}{n} \sum_{i=1}^{n} \left[\frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{1 - \pi(X_i'\hat{\beta}^{CBPS})} - \frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})}\right] \left(\Delta Y_i - X_i'\gamma^{CBPS}\right) \\
&:= A_1 + A_2.
\end{aligned}$$

First, we write $\gamma^{CBPS} = \gamma^* + \delta A$ for any value $A$ since $\gamma^{CBPS}$ is arbitrary. Then we have
$$\Delta Y_i - X_i'\gamma^{CBPS} = \Delta Y_i - m_\Delta(X_i) + \{m_\Delta(X_i) - X_i'\gamma^{CBPS}\} = \Delta Y_i - m_\Delta(X_i) + \delta(r(X_i) - X_i'A). \qquad (A.3)$$

Also, by the same argument as the proof of (C.1) in Fan et al. (2021), we have
$$\beta_0^{CBPS} - \beta^* = \xi E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right] + O(\xi^2), \qquad (A.4)$$
where $u_i^* = u(X_i; \beta^*)$, $\pi_i^* = \pi(X_i'\beta^*)$ and $\dot{\pi}_i^* = \dot{\pi}(X_i'\beta^*)$. Hence by using a similar argument as the proof of Theorem 1 with (A.3) and (A.4), we have
$$A_2 = O_p(\delta n^{-1/2})$$
and
$$\begin{aligned}
A_1 &= \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + \frac{1}{E[D_i]} E\left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \left(m_\Delta(X_i) - X_i'\gamma^{CBPS}\right)\right] + O_p(\xi n^{-1/2} + \delta n^{-1/2}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + \frac{1}{E[D_i]} E\left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \delta\left(r(X_i) - X_i'A\right)\right] + O_p(\xi n^{-1/2} + \delta n^{-1/2}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + O_p(\xi^2\delta + \xi n^{-1/2} + \delta n^{-1/2}).
\end{aligned}$$

To see the last equality, the second term is calculated as

$$\begin{aligned}
&\frac{1}{E[D_i]} E\left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \delta\left(r(X_i) - X_i'A\right)\right] \\
&= \frac{1}{E[D_i]} E\left[\left\{\frac{D_i - \pi(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} - \frac{(1 - D_i)\dot{\pi}_i^* X_i'(\beta_0^{CBPS} - \beta^*)}{\{1 - \pi(X_i'\beta^*)\}^2}\right\} \delta\left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta) \\
&= \frac{1}{E[D_i]} E\left[\left\{\frac{\pi(X_i'\beta^*)(1 + \xi u_i^*) - \pi(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} - \frac{\{1 - \pi(X_i'\beta^*)(1 + \xi u_i^*)\}\dot{\pi}_i^* X_i'(\beta_0^{CBPS} - \beta^*)}{\{1 - \pi(X_i'\beta^*)\}^2}\right\} \delta\left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta) \\
&= \frac{1}{E[D_i]} E\left[\left\{\frac{\xi u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\xi \dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right\} \delta\left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta) \\
&= \frac{\xi\delta}{E[D_i]} E\left[\left\{\frac{u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right\} \left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta).
\end{aligned}$$

Hence assuming that at least one entry of
$$E\left[\left(\frac{u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right) X_i\right]$$
is nonzero, there exists $A$ such that
$$E\left[\left(\frac{u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right) \left(r(X_i) - X_i'A\right)\right] = 0.$$

This completes the proof of (3.10). The proof of (3.11) follows from the same argument, except that $\hat{\gamma}^{AIPW}$ is not arbitrary, unlike $\gamma^{CBPS}$.
A9. Additional Application Results

Table 7: Deviation of different DID estimators for the effect of training on real earnings in 1978,
with PSID comparison group

Lalonde Sample DW Sample


True ATT=0 True ATT=0

τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S

Lin. -1038 -1605 1190 661 2238 -664 4062 1932


(821) (699) (1169) (505) (1196) (898) (2298) (552)

Qua. 812 -1236 1357 930 1785 -308 4673 2331


(825) (704) (1177) (569) (1268) (902) (2480) (681)

S&Z. 811 -947 1039 915 1750 209 3934 2215


(826) (642) (1052) (564) (1310) (812) (2118) (715)

