
A difference-in-differences estimator by covariate balancing propensity score

Junjie Li
Department of Economics, Hitotsubashi University
and
Yukitoshi Matsushita
Department of Economics, Hitotsubashi University

arXiv:2508.02097v1 [econ.EM] 4 Aug 2025

August 5, 2025

Abstract
This article develops a covariate balancing approach for the estimation of treatment
effects on the treated (ATT) in a difference-in-differences (DID) research design when
panel data are available. We show that the proposed covariate balancing propensity score
(CBPS) DID estimator possesses several desirable properties: (i) local efficiency, (ii) double
robustness in terms of consistency, (iii) double robustness in terms of inference, and (iv)
faster convergence to the ATT compared to the augmented inverse probability weighting
(AIPW) DID estimators when both working models are locally misspecified. These latter
two characteristics set the CBPS DID estimator apart from the AIPW DID estimator
theoretically. Simulation studies and an empirical study demonstrate the desirable finite
sample performance of the proposed estimator.

Keywords: double robustness, local misspecification, panel data, treatment effects on the treated
(ATT),

1 Introduction

Difference-in-differences (DID) is a widely employed research design in evaluating the causal

effects of policy interventions using observational data. In its canonical form, the DID approach

necessitates two groups and two periods, stipulating that no entity is exposed to the treatment

in the initial period, while a subset remains untreated in the subsequent period. A crucial

underpinning of the DID design is the so-called (unconditional) parallel trends assumption

(PTA), which posits that, in the absence of the treatment, the average outcomes for both the

treatment and comparison groups would have evolved along parallel paths over time. While the

PTA is inherently untestable, its validity has been questioned, particularly in scenarios where pre-

treatment characteristics, differing between the treatment and comparison groups, are correlated

with the outcome’s evolution. In such instances, the canonical DID setup becomes implausible,

prompting researchers to incorporate pre-treatment covariates into the DID framework. This

modification ensures the satisfaction of the PTA conditionally on these covariates (conditional

PTA).

Under this conditional PTA, three predominant estimation procedures have emerged: the

outcome regression (OR), the inverse probability weighting (IPW), and the augmented IPW

(AIPW), the latter offering double robustness in terms of consistency; see Sant’Anna and Zhao

(2020) for a comprehensive review. However, these methods confront the challenge of potential

misspecification of the outcome regression and/or propensity score models, leading to incorrect

inferences. While doubly robust estimators offer improved consistency by requiring only one

of the two working models to be correctly specified, the possibility that both models are misspecified remains unavoidable in practice. For estimation of the average treatment effect (ATE),

Kang and Schafer (2007) highlight this limitation, demonstrating that the advantages of AIPW

estimators can substantially diminish when both the outcome regression and propensity score

models are slightly misspecified. To address this issue, Imai and Ratkovic (2014) proposed

the Covariate Balancing Propensity Score (CBPS) methodology, illustrating that the CBPS

estimator can significantly ameliorate the finite sample performance of doubly robust estimators.

Further theoretical expositions of the CBPS ATE estimator were provided by Fan et al. (2022).

Nevertheless, their investigation focuses on ATE estimation.

In this paper, we apply the CBPS methodology to the ATT estimator within the framework

of DID design. In particular, we rigorously investigate the robustness and efficiency properties

of the CBPS DID estimator. Our contributions to the DID literature are manifold: Firstly,

this article briefly reviews a range of existing ATT estimators within the DID framework

and then proposes a CBPS-based DID estimator when panel data are available. Secondly, we

demonstrate that our proposed estimator possesses the qualities of double robustness and

local efficiency. A notable distinction of our estimator is its double robustness not only in

terms of consistency but also in terms of inference. This characteristic guarantees that the

asymptotic linear representation of the CBPS DID estimator remains invariant even when one

of the working models is misspecified. As a consequence, it allows us to estimate the asymptotic

variance based on the influence function. This feature sets our estimator apart from existing

doubly robust AIPW DID estimators because the asymptotic linear representation of the AIPW

DID estimator is not invariant when one of the working models is misspecified. The third

contribution of our work lies in establishing that our estimator can achieve a faster convergence

rate relative to the AIPW DID estimator, under the scenario where both the propensity score and

outcome regression models are locally misspecified. This situation has been seldom addressed in

existing DID literature. Lastly, the fourth contribution is practicality: our estimator is

straightforward to implement. This simplicity in application enhances its utility in empirical

research.

Organization of this paper: The subsequent section of this paper delineates the settings

and assumptions used throughout the paper and briefly overviews some existing DID estimators.

In Section 3, we introduce the CBPS method and propose a CBPS-based DID estimator, and

derive its theoretical properties. We evaluate the finite sample performance of our proposed

CBPS DID estimator with Monte Carlo simulation in Section 4, and provide an empirical

example in Section 5. Conclusions are summarized in Section 6. All mathematical proofs

supporting our arguments and findings are comprehensively compiled in the Appendix.

2 Difference-in-differences

2.1 Setup

We introduce the notation and framework that will be employed throughout this article. Our

analysis is based on a two-period, two-group structure (treatment and comparison groups). Let

Yit represent the outcome of interest for unit i at time t. We suppose that researchers have access

to outcome data both in a pre-treatment period t = 0 and in a post-treatment period t = 1.

Define Dit as an indicator variable, where Dit = 1 if unit i receives treatment at time t, and

Dit = 0 otherwise. We note that Di0 = 0 for all units i, which simplifies the treatment indicator

to Di = Di1 . The observed outcome Yit can also be rewritten as Yit = Di Yit (1) + (1 − Di )Yit (0),

where Yit (0) denotes the potential outcome of unit i at time t if the unit does not receive treatment and Yit (1) the potential outcome if it does, but Yit (0) and

Yit (1) cannot be observed simultaneously for any unit. In the remainder of this paper, we assume

the availability of panel data on {Yi0 , Yi1 , Di , Xi }ni=1 , where Xi ∈ Rd is a vector of pre-treatment

covariates and the first element of Xi is a constant.

The parameter of interest, the average treatment effect on the treated (ATT), is defined as:

τ = E [Yi1 (1) − Yi1 (0)|Di = 1] . (2.1)

Since Yi1 (1) = Yi1 given Di = 1, ATT can be rewritten as:

τ = E [Yi1 |Di = 1] − E [Yi1 (0)|Di = 1] . (2.2)

According to the representation (2.2) above, we can show that the first term, $E[Y_{i1} \mid D_i = 1] = E[D_i Y_{i1}]/E[D_i]$, can be estimated directly from the observed data. The main challenge in identifying the

ATT lies in computing the second term (E [Yi1 (0)|Di = 1]) from the observed data since Yi1 (0)

is missing for treated subjects with Di = 1. In order to identify the ATT (or E [Yi1 (0)|Di = 1]),

the following assumptions are necessary.

Assumption 1. Assume that the data {Yi0 , Yi1 , Di , Xi }ni=1 are independent and identically

distributed (iid).

Assumption 2. E[Yi1 (0) − Yi0 (0)|Xi , Di = 1] = E[Yi1 (0) − Yi0 (0)|Xi , Di = 0] almost surely.

Assumption 3. For some ε > 0, Pr(Di = 1) > ε and Pr(Di = 1|Xi ) ≤ 1 − ε almost surely.

Assumption 2, which we term the conditional PTA, posits that the average conditional

outcomes for the treatment and comparison groups would have followed parallel paths in the

absence of the treatment. Assumption 3 is an overlap condition, which states that at least a

small fraction of the population is exposed to the treatment, and for every specific value of

covariates Xi , at least a small portion is not treated. These three assumptions are commonly

utilized in semiparametric DID methods; see, e.g., (Heckman, Ichimura, and Todd 1997; Abadie

2005; Sant’Anna and Zhao 2020). Next, we briefly provide an overview of the existing approaches

to identify and estimate the ATT.

2.2 Existing DID estimators

There are two approaches to identify the ATT: the OR approach (Heckman, Ichimura, and

Todd (1997)) and the IPW approach (Abadie (2005)).

(i) The OR approach: under Assumptions 1-3, the ATT is identified as:

$$\tau = E[\Delta Y_i \mid D_i = 1] - E\big[E[\Delta Y_i \mid X_i, D_i = 0] \mid D_i = 1\big] = E[\Delta Y_i \mid D_i = 1] - E[m_\Delta(X_i) \mid D_i = 1] := \tau^{OR}, \qquad (2.3)$$

where ∆Yi = Yi1 − Yi0 and m∆ (Xi ) = E [∆Yi |Xi , Di = 0]. Based on the identification result

(2.3), the OR approach requires researchers to model the conditional expectation of outcome

evolution E [∆Yi |Xi , Di = 0] at the first step. Researchers typically adopt a linear parametric

model Xi′ γ (outcome regression model) to specify the true conditional expectation of outcome

evolution E [∆Yi |Xi , Di = 0]. Consequently, the OR DID estimator is represented as follows:
$$\hat{\tau}^{OR} = \frac{1}{n_{treat}} \sum_{i \in treat} \Delta Y_i - \frac{1}{n_{treat}} \sum_{i \in treat} X_i' \hat{\gamma}^{OLS}, \qquad (2.4)$$

where $\hat{\gamma}^{OLS}$ is the OLS estimator from regressing $\Delta Y_i$ on $X_i$ in the comparison group ($D_i = 0$)

and ntreat denotes the treatment group size.
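As a concrete illustration, a minimal NumPy sketch of the OR DID estimator (2.4) could look as follows. The array names `d_y`, `d`, and `x` (the outcome change ∆Y, the treatment indicator D, and the covariate matrix X with a leading constant) are illustrative conventions for the sketch, not notation from the paper.

```python
import numpy as np

def or_did(d_y, d, x):
    """OR DID estimator (2.4): fit the outcome-evolution regression on the
    comparison group, then contrast treated means of dY and of X'gamma_hat."""
    d_y, d, x = np.asarray(d_y, float), np.asarray(d, int), np.asarray(x, float)
    # OLS of dY on X using the comparison group (D = 0)
    x0, y0 = x[d == 0], d_y[d == 0]
    gamma_ols, *_ = np.linalg.lstsq(x0, y0, rcond=None)
    treated = d == 1
    return d_y[treated].mean() - (x[treated] @ gamma_ols).mean()
```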

(ii) The IPW approach: under Assumptions 1-3, the ATT is alternatively identified as:
$$\tau = E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] := \tau^{IPW}, \qquad (2.5)$$

where π(Xi ) = Pr(Di = 1|Xi ) is the propensity score. Based on the identification result (2.5), the

IPW approach requires researchers to estimate the propensity score at the first step. Researchers typically use a parametric model (e.g. $\pi(X_i'\beta) = \exp(X_i'\beta)/\{1 + \exp(X_i'\beta)\}$) to specify the propensity score

π(Xi ) and estimate parameters by the maximum likelihood method. Hence, the IPW DID

estimator is expressed as:
$$\hat{\tau}^{IPW} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{ML})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{ML}))} \Delta Y_i, \qquad (2.6)$$
where $\hat{\beta}^{ML}$ is the maximum likelihood estimator and $\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i$.
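A corresponding sketch of the IPW DID estimator (2.6), with the logistic propensity score fitted by maximum likelihood through `scipy.optimize.minimize`; the helper name `fit_logit_ml` is hypothetical, and the same array conventions as in the previous sketch are assumed.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logit_ml(d, x):
    """Maximum likelihood estimate of beta in the logistic model expit(X'beta)."""
    def negloglik(beta):
        xb = x @ beta
        # negative log-likelihood of the logit model, numerically stable form
        return np.sum(np.logaddexp(0.0, xb) - d * xb)
    res = minimize(negloglik, x0=np.zeros(x.shape[1]), method="BFGS")
    return res.x

def ipw_did(d_y, d, x):
    """IPW DID estimator (2.6)."""
    beta_ml = fit_logit_ml(d, x)
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_ml)))   # fitted propensity scores
    d_bar = d.mean()
    return np.mean((d - ps) / (d_bar * (1.0 - ps)) * d_y)
```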

(iii) AIPW approach: Consistency of the OR and IPW estimators depends on the correct

specification of the outcome regression model and the propensity score model, respectively. To

achieve consistency in scenarios where one of the two working models is misspecified, Sant’Anna and Zhao (2020) proposed the following AIPW DID estimator:
$$\hat{\tau}^{AIPW} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{ML})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{ML}))} (\Delta Y_i - X_i'\hat{\gamma}^{AIPW}), \qquad (2.7)$$
with $\hat{\gamma}^{AIPW} = \hat{\gamma}^{OLS}$, which follows from the alternative identification of the ATT:
$$\tau = E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - m_\Delta(X_i))\right] := \tau^{AIPW}. \qquad (2.8)$$
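Combining the two first-step fits gives a sketch of the AIPW DID estimator (2.7); it reuses the hypothetical `fit_logit_ml` helper from the previous sketch and the same array conventions.

```python
import numpy as np

def aipw_did(d_y, d, x):
    """AIPW DID estimator (2.7): IPW weighting of the outcome-regression residual."""
    beta_ml = fit_logit_ml(d, x)                          # propensity score by ML
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_ml)))
    x0, y0 = x[d == 0], d_y[d == 0]
    gamma_ols, *_ = np.linalg.lstsq(x0, y0, rcond=None)   # outcome evolution by OLS
    d_bar = d.mean()
    resid = d_y - x @ gamma_ols
    return np.mean((d - ps) / (d_bar * (1.0 - ps)) * resid)
```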

Observing the form of (2.7), it is apparent that the AIPW procedure combines both OR and

IPW methodologies. This synthesis allows the AIPW estimator to mitigate some of the inherent

weaknesses associated with the OR and IPW approaches individually. Indeed, Sant’Anna and

Zhao (2020) show that the AIPW DID estimator is both locally efficient and doubly robust

in terms of consistency. In the cross-sectional setting, however, Kang and Schafer (2007)

demonstrated that the performance of the AIPW ATE estimator can be poor in scenarios where

both of the working models are slightly misspecified. To address this issue, Imai and Ratkovic

(2014) introduced the CBPS method and demonstrated that the CBPS ATE estimator can

significantly enhance performance over other existing ATE estimators, including the AIPW

ATE estimator, particularly when both working models are misspecified. In the subsequent

subsection, we extend the application of the CBPS method from estimating ATE to ATT

within the framework of DID research. Our objective is to propose a CBPS DID estimator

and to rigorously investigate its theoretical properties, focusing on its robustness and efficiency.

3 CBPS DID estimator

3.1 CBPS methodology

The CBPS method introduced by Imai and Ratkovic (2014) offers a distinct approach to

propensity score estimation. In contrast to the IPW method, the CBPS method imposes exact

finite sample balance of pre-treatment covariates across the treatment and comparison groups

rather than focusing on the predictive accuracy of treatment assignment. The CBPS DID

estimator is defined as
$$\hat{\tau}^{CBPS} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{CBPS}))} \Delta Y_i, \qquad (3.1)$$
where $\hat{\beta}^{CBPS}$ satisfies
$$\frac{1}{n} \sum_{i=1}^{n} \left( D_i - \frac{(1 - D_i)\pi(X_i'\hat{\beta}^{CBPS})}{1 - \pi(X_i'\hat{\beta}^{CBPS})} \right) X_i = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{1 - \pi(X_i'\hat{\beta}^{CBPS})} X_i = 0. \qquad (3.2)$$
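Because (3.2) is a just-identified system of d covariate-balance moments in the d-dimensional β, a minimal sketch of the CBPS DID estimator can solve it directly with a root finder; the use of `scipy.optimize.fsolve` and the function names below are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import fsolve

def cbps_did(d_y, d, x):
    """CBPS DID estimator (3.1): beta solves the exact balancing conditions (3.2)."""
    def balance_moments(beta):
        ps = 1.0 / (1.0 + np.exp(-(x @ beta)))
        w = (d - ps) / (1.0 - ps)          # weights whose X-weighted mean must be zero
        return x.T @ w / len(d)
    beta0 = np.zeros(x.shape[1])           # the ML estimate could also be used as a start
    beta_cbps = fsolve(balance_moments, beta0)
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_cbps)))
    d_bar = d.mean()
    return np.mean((d - ps) / (d_bar * (1.0 - ps)) * d_y)
```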

Recall that the IPW DID estimator (2.6) employs the maximum likelihood method to estimate the propensity score, where $\hat{\beta}^{ML}$ satisfies $\frac{1}{n}\sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{ML})}{1 - \pi(X_i'\hat{\beta}^{ML})} \frac{\dot{\pi}(X_i'\hat{\beta}^{ML})}{\pi(X_i'\hat{\beta}^{ML})} X_i = 0$, with $\dot{\pi}(v) = \partial\pi(v)/\partial v$. Hence the only difference between the IPW DID estimator (2.6) and the CBPS DID

estimator (3.1) is the method of propensity score estimation. On the other hand, by (3.2), the

CBPS DID estimator can also be expressed as
$$\hat{\tau}^{CBPS} = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{\bar{D}(1 - \pi(X_i'\hat{\beta}^{CBPS}))} (\Delta Y_i - X_i'\gamma^{CBPS}), \qquad (3.3)$$

where γ CBP S is any value. Hence the only difference between the AIPW DID estimator (2.7) and the CBPS DID estimator (3.1) is the value of γ in (2.7) and (3.3). It is noteworthy that γ CBP S in (3.3) can take any value and is not estimated in the actual CBPS estimation defined in (3.1).

In the following subsections, we will conduct a comprehensive theoretical analysis of the CBPS

DID estimator to elucidate its advantages. It is this arbitrariness of γ CBP S in (3.3) that plays a

key role in showing robustness to misspecification of parametric working models compared to

the AIPW DID estimator.

3.2 Local efficiency

In this subsection, we start with the scenario when both of the working models are correctly

specified. We show that, in such a case, the CBPS DID estimator attains the semiparametric efficiency bound for the ATT under the DID framework. This property is the so-called local efficiency.

Sant’Anna and Zhao (2020) derived the semiparametric efficiency bound for ATT under a DID

framework.

Theorem 1. Under Assumptions 1-3 and Assumptions A (stated in Appendix A), if $\pi(X_i'\beta) = \pi(X_i)$ and $X_i'\gamma = m_\Delta(X_i)$ holds,
$$\sqrt{n}(\hat{\tau}^{CBPS} - \tau) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^e + o_p(1) \xrightarrow{d} N\big(0, E[\eta_i^{e2}]\big), \qquad (3.4)$$
where
$$\eta_i^e = \frac{D_i - \pi(X_i)}{E[D_i]\{1 - \pi(X_i)\}} (\Delta Y_i - m_\Delta(X_i)) - \frac{D_i}{E[D_i]} \tau \qquad (3.5)$$
is the efficient influence function for the ATT and $E[\eta_i^{e2}]$ is the semiparametric efficiency bound.

Theorem 1 shows asymptotic normality of the CBPS DID estimator when both of the

working models are correctly specified. The asymptotic variance of the CBPS DID estimator is

equal to the semiparametric efficiency bound derived by Sant’Anna and Zhao (2020). It should

be noted that the AIPW DID estimator also achieves the semiparametric efficiency bound when

both of the working models are correctly specified.

3.3 Double robustness

In the previous subsection, we showed that the CBPS DID estimator is locally efficient. In

this subsection, we shift our focus from efficiency to robustness, under the scenario that one of

the two working models is misspecified. Firstly, we show that the CBPS DID estimator remains

consistent for the ATT even if either the propensity score model or the outcome regression

model (but not both) is misspecified. This property is referred to as double robustness in terms

of consistency.

Theorem 2. Under Assumptions 1-3 and Assumptions A, the CBPS DID estimator is doubly robust in terms of consistency, that is, $\hat{\tau}^{CBPS} \xrightarrow{p} \tau$ if at least one of the following two conditions holds:

1. The outcome regression model is correctly specified, that is, there exists some value γ0

such that Xi′ γ0 = m∆ (Xi ) a.s.

2. The propensity score model is correctly specified, that is, there exists some value β0 such

that π(Xi′ β0 ) = π(Xi ) a.s.

Consequently, the CBPS DID estimator offers more flexibility and is less demanding in terms

of a researcher’s ability to correctly specify nuisance parametric models, compared to the OR

and the IPW approach. It should be noted that the AIPW DID estimator is also doubly robust

in terms of consistency.

While double robustness in terms of consistency is a valuable property, it may not suffice

for inference. The next theorem shows that the asymptotic linear representation of the CBPS

DID estimator remains invariant even when one of the working models is misspecified so that

the asymptotic variance can be estimated based on the influence function. This is referred to as

double robustness in terms of inference. In contrast, the asymptotic linear representation of

the AIPW DID estimator is not invariant when one of the working models is misspecified. A

detailed proof is provided in the Appendix.

Theorem 3. Let $\beta_0^{AIPW}$, $\gamma_0^{AIPW}$, $\beta_0^{CBPS}$ and $\gamma_0^{CBPS}$ denote the probability limits of $\hat{\beta}^{ML}$, $\hat{\gamma}^{AIPW}$, $\hat{\beta}^{CBPS}$ and
$$\hat{\gamma}^{CBPS} = \left( \sum_{i=1}^{n} \frac{(1 - D_i)\dot{\pi}(X_i'\hat{\beta}^{CBPS})}{(1 - \pi(X_i'\hat{\beta}^{CBPS}))^2} X_i X_i' \right)^{-1} \sum_{i=1}^{n} \frac{(1 - D_i)\dot{\pi}(X_i'\hat{\beta}^{CBPS}) \Delta Y_i}{(1 - \pi(X_i'\hat{\beta}^{CBPS}))^2} X_i,$$
respectively. Under Assumptions 1-3 and Assumptions A, if either $\pi(X_i'\beta_0^a) = \pi(X_i)$ a.s. or $X_i'\gamma_0^a = m_\Delta(X_i)$ a.s. for $a = CBPS, AIPW$, the CBPS DID and AIPW DID estimators satisfy:
$$\sqrt{n}(\hat{\tau}^{CBPS} - \tau) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{CBPS} + o_p(1), \qquad (3.6)$$
$$\sqrt{n}(\hat{\tau}^{AIPW} - \tau) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{AIPW} + O_p(1), \qquad (3.7)$$
where $\eta_i^a = \frac{D_i - \pi(X_i'\beta_0^a)}{E[D_i]\{1 - \pi(X_i'\beta_0^a)\}}(\Delta Y_i - X_i'\gamma_0^a) - \frac{D_i}{E[D_i]}\tau$. Note that $\eta_i^a$ is equal to the efficient influence

function (3.5) under the assumption that both working models are correctly specified.

Theorem 3 reveals that inference based on τ̂ AIP W and its influence function may be mislead-

ing when one of the working models is misspecified. In contrast, inference based on τ̂ CBP S and

its influence function will remain accurate even when one of the working models is misspecified.

This double robustness in terms of inference significantly enhances the appeal of the CBPS DID

estimator. We note that although γ̂ CBP S does not appear in estimating τ̂ CBP S (see (3.1)), it does

appear in estimating the asymptotic variance. Specifically, the asymptotic variance of the CBPS DID estimator should be estimated by
$$\frac{1}{n} \sum_{i=1}^{n} \left( \frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{\bar{D}\{1 - \pi(X_i'\hat{\beta}^{CBPS})\}} (\Delta Y_i - X_i'\hat{\gamma}^{CBPS}) - \frac{D_i}{\bar{D}} \hat{\tau}^{CBPS} \right)^2.$$
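A sketch of this plug-in inference step, assuming `beta_cbps` and `tau_cbps` come from a CBPS fit as in the earlier sketch and that the propensity score is logistic (so that π̇ = π(1 − π)); it first computes γ̂ CBP S from the expression in Theorem 3 and then the variance estimate displayed above.

```python
import numpy as np

def cbps_inference(d_y, d, x, beta_cbps, tau_cbps):
    """Influence-function-based standard error and 95% CI for the CBPS DID estimator."""
    ps = 1.0 / (1.0 + np.exp(-(x @ beta_cbps)))
    pi_dot = ps * (1.0 - ps)                       # derivative of the logistic link
    w = (1 - d) * pi_dot / (1.0 - ps) ** 2         # weights in the gamma_hat^CBPS expression
    a = (x * w[:, None]).T @ x                     # sum_i w_i X_i X_i'
    b = (x * (w * d_y)[:, None]).sum(axis=0)       # sum_i w_i dY_i X_i
    gamma_cbps = np.linalg.solve(a, b)
    d_bar = d.mean()
    eta = (d - ps) / (d_bar * (1.0 - ps)) * (d_y - x @ gamma_cbps) - d / d_bar * tau_cbps
    var_hat = np.mean(eta ** 2)                    # plug-in asymptotic variance estimate
    se = np.sqrt(var_hat / len(d))
    return se, (tau_cbps - 1.96 * se, tau_cbps + 1.96 * se)
```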

3.4 Convergence rate under local misspecification

Although the CBPS DID estimator is shown to have desirable properties in scenarios where

at least one of the two working models is correctly specified, situations might arise where both

of the working models are misspecified. To address this issue, Fan et al. (2022) conduct a

theoretical investigation of the AIPW ATE and CBPS ATE estimators under the scenario that

both propensity score and outcome models are locally misspecified and find that the CBPS ATE

estimator may converge in probability to the ATE at a faster rate than the AIPW estimator.

In this subsection, we examine whether the CBPS DID estimator inherits such a desirable

property in the DID design. We consider the same locally misspecified models as Fan et al. (2022):

Assumption 4.

π(Xi ) = π(Xi′ β ∗ ) exp(ξu(Xi ; β ∗ )), (3.8)

m∆ (Xi ) = Xi′ γ ∗ + δr(Xi ), (3.9)

where u(Xi ; β ∗ ) and r(Xi ) are functions determining the directions of misspecification with

|u(Xi ; β ∗ )| ≤ C, |r(Xi )| ≤ C a.s. for some positive constant C, ξ ∈ R and δ ∈ R represent the

magnitudes of misspecifications with ξ = o(1) and δ = o(1).

Theorem 4. Under Assumptions 1-4 and Assumptions A, suppose at least one entry of
$$E\left[\left(\frac{u(X_i;\beta^*)}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}(X_i'\beta^*) X_i' E\left[\frac{\dot{\pi}(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} X_i X_i'\right]^{-1} E\left[\frac{\pi(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} u(X_i;\beta^*) X_i\right]}{1 - \pi(X_i'\beta^*)}\right) X_i\right]$$
is nonzero. Then, as $n \to \infty$, the CBPS DID and AIPW DID estimators satisfy:
$$\hat{\tau}^{CBPS} - \tau = \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + O_p(\xi^2\delta + \delta n^{-1/2} + \xi n^{-1/2}), \qquad (3.10)$$
and
$$\hat{\tau}^{AIPW} - \tau = \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + O_p(\xi\delta + \delta n^{-1/2} + \xi n^{-1/2}), \qquad (3.11)$$
where $\eta_i^e$ is the efficient influence function introduced in Theorem 1.

If $\sqrt{n}\,\xi\delta \to \infty$, then the CBPS DID estimator converges in probability to the ATT at a faster rate than the AIPW DID estimator since $\hat{\tau}^{CBPS} - \tau = O_p(n^{-1/2} + \xi^2\delta)$, whereas $\hat{\tau}^{AIPW} - \tau = O_p(\xi\delta)$. On the other hand, if $\sqrt{n}\,\xi\delta \to 0$, the two estimators have the same limiting distribution $N(0, E[\eta_i^{e2}])$, but $\sqrt{n}(\hat{\tau}^{CBPS} - \tau)$ converges to the limit distribution at a faster rate than $\sqrt{n}(\hat{\tau}^{AIPW} - \tau)$. Theorem 4 implies that the CBPS DID estimator demonstrates

greater robustness to slight model misspecification compared to the AIPW DID estimator.

These faster rates of convergence are attributed to the arbitrariness of γ CBP S in (3.3), which

effectively eliminates the product ξδ in the asymptotic linear representation of the CBPS DID

estimator. A detailed proof of this can be found in the Appendix.

4 Simulation

In this section, we conduct a series of simulation studies to examine the finite sample properties

of the CBPS DID estimator. The simulation designs here closely follow those in (Sant’Anna

and Zhao 2020; Fan et al. 2022). Throughout these simulations, we utilize a logistic working

model for the propensity score and a linear regression model for outcome evolution. For the

OR, IPW, and AIPW approaches, we estimate the propensity scores using maximum likelihood

estimation and estimate outcome evolution using ordinary least squares.

We set the sample size n equal to 1000, and conduct 1000 Monte Carlo simulations for

each design. The performance of various DID estimators is compared in terms of average bias,

median bias, root mean square error (RMSE), empirical 95% coverage probability, the average

length of a 95% confidence interval, and the average of their plug-in estimator for the asymptotic

variance. The confidence intervals are constructed using the normal approximation, and the

asymptotic variances are estimated by their sample analogues. Additionally, we present the

semiparametric efficiency bound for each design calculated by Sant’Anna and Zhao (2020).

This helps to assess the potential loss of efficiency or accuracy of a semiparametric DID estimator.

4.1 Data generating process

We conduct Monte Carlo simulations across five distinct scenarios as follows:

1. Both propensity score and outcome regression models are correctly specified.

2. Only the outcome regression model is correctly specified.

3. Only the propensity score model is correctly specified.

4. Both the propensity score and the outcome regression models are misspecified.

5. Both the propensity score and the outcome regression models are locally misspecified.

For a generic $W_i = (W_{1i}, W_{2i}, W_{3i}, W_{4i})'$, let
$$f_{or}(W_i) = 210 + 27.4W_{1i} + 13.7(W_{2i} + W_{3i} + W_{4i}),$$
$$f_{ps}(W_i) = 0.75(-W_{1i} + 0.5W_{2i} - 0.25W_{3i} - 0.1W_{4i}).$$

Let $X_i = (X_{1i}, X_{2i}, X_{3i}, X_{4i})'$ be generated from $N(0, I_4)$, where $I_4$ is the $4 \times 4$ identity matrix. For $j = 1, 2, 3, 4$, let $Z_{ji} = (\tilde{Z}_{ji} - E[\tilde{Z}_{ji}])/\sqrt{Var(\tilde{Z}_{ji})}$, where $\tilde{Z}_{1i} = \exp(0.5X_{1i})$, $\tilde{Z}_{2i} = 10 + X_{2i}/(1 + \exp(X_{1i}))$, $\tilde{Z}_{3i} = (0.6 + X_{1i}X_{3i}/25)^3$ and $\tilde{Z}_{4i} = (20 + X_{2i} + X_{4i})^2$. We consider the

following data generating processes (DGPs):

DGP1 (PS and OR are correctly specified):
$$Y_{i1}(d) = 2f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i0},$$
$$\pi(Z_i) = \exp(f_{ps}(Z_i))/(1 + \exp(f_{ps}(Z_i))), \quad D_i = 1\{\pi(Z_i) \ge U_i\},$$

DGP2 (PS is misspecified but OR is correctly specified):
$$Y_{i1}(d) = 2f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i0},$$
$$\pi(X_i) = \exp(f_{ps}(X_i))/(1 + \exp(f_{ps}(X_i))), \quad D_i = 1\{\pi(X_i) \ge U_i\},$$

DGP3 (PS is correctly specified but OR is misspecified):
$$Y_{i1}(d) = 2f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i0},$$
$$\pi(Z_i) = \exp(f_{ps}(Z_i))/(1 + \exp(f_{ps}(Z_i))), \quad D_i = 1\{\pi(Z_i) \ge U_i\},$$

DGP4 (PS and OR are misspecified):
$$Y_{i1}(d) = 2f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i1}(d), \quad Y_{i0}(0) = f_{or}(X_i) + v(X_i, D_i) + \varepsilon_{i0},$$
$$\pi(X_i) = \exp(f_{ps}(X_i))/(1 + \exp(f_{ps}(X_i))), \quad D_i = 1\{\pi(X_i) \ge U_i\},$$

DGP5 (PS and OR are locally misspecified):
$$Y_{i1}(d) = 2f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i1}(d) + 2\delta r(Z_i), \quad Y_{i0}(0) = f_{or}(Z_i) + v(Z_i, D_i) + \varepsilon_{i0} + \delta r(Z_i),$$
$$\pi(Z_i) = \exp(f_{ps}(Z_i))/(1 + \exp(f_{ps}(Z_i))) \cdot \exp(\xi u(Z_i)), \quad D_i = 1\{\pi(Z_i) \ge U_i\},$$

where d = 0, 1 is an indicator of potential outcome, Di = 0, 1 denotes whether unit i receives

treatment or not, εi0 and εi1 (d) are independent standard normal random variables, Ui is an

independent standard uniform random variable. For a generic Wi , v(Wi , Di ) represents an

independent normal random variable with a mean of Di · for (Wi ) and a variance of one. This

v serves as a proxy for time-invariant unobserved heterogeneity. $\xi = \delta = n^{-0.5}$ denote the magnitudes of misspecification, and $u(Z_i) = -Z_{1i}^2 + Z_{2i}^2$ and $r(Z_i) = 2Z_{1i}^2 + 4Z_{2i}^2 + 3Z_{3i}^2 + Z_{4i}^2$

determine the directions of misspecification. It is important to note that in all the DGPs

mentioned above, the true ATT is zero, and the available data are $\{Y_{i0}, Y_{i1}, D_i, Z_i\}_{i=1}^n$, where $Z_i = (1, Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i})'$ includes a constant among the covariates; the realized outcomes $Y_{i0}$ and $Y_{i1}$ are generated according to $Y_{i0} = Y_{i0}(0)$ and $Y_{i1} = D_i Y_{i1}(1) + (1 - D_i)Y_{i1}(0)$, respectively.
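For illustration, a sketch of how DGP1 can be simulated; it follows the formulas above, except that the Z's are standardized by their sample moments rather than the population moments, an assumption made only for the sketch.

```python
import numpy as np

def f_or(w):   # outcome index: 210 + 27.4*W1 + 13.7*(W2 + W3 + W4)
    return 210 + 27.4 * w[:, 0] + 13.7 * (w[:, 1] + w[:, 2] + w[:, 3])

def f_ps(w):   # propensity index: 0.75*(-W1 + 0.5*W2 - 0.25*W3 - 0.1*W4)
    return 0.75 * (-w[:, 0] + 0.5 * w[:, 1] - 0.25 * w[:, 2] - 0.1 * w[:, 3])

def simulate_dgp1(n=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 4))
    z_tilde = np.column_stack([
        np.exp(0.5 * x[:, 0]),
        10 + x[:, 1] / (1 + np.exp(x[:, 0])),
        (0.6 + x[:, 0] * x[:, 2] / 25) ** 3,
        (20 + x[:, 1] + x[:, 3]) ** 2,
    ])
    z = (z_tilde - z_tilde.mean(axis=0)) / z_tilde.std(axis=0)   # standardized covariates
    ps = 1 / (1 + np.exp(-f_ps(z)))
    d = (ps >= rng.uniform(size=n)).astype(int)
    v = rng.normal(loc=d * f_or(z), scale=1.0)                   # unobserved heterogeneity
    y0 = f_or(z) + v + rng.standard_normal(n)                    # Y_{i0}(0)
    y1 = 2 * f_or(z) + v + rng.standard_normal(n)                # observed Y_{i1}; true ATT is zero
    z_design = np.column_stack([np.ones(n), z])                  # add the constant
    return y1 - y0, d, z_design
```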

4.2 Results

The results are summarized in the tables below: τ̂ IP W represents the IPW DID estimator

(2.6), τ̂ OR denotes the OR DID estimator (2.4), τ̂ AIP W is the AIPW DID estimator (2.7), and

τ̂ CBP S refers to the CBPS DID estimator (3.1). The abbreviations used are as follows: “Av.Bias”,

“Med.Bias”, “RMSE”, “Asy.V”, “Cover” and “CIL” stand for the average simulated bias, median

simulated bias, simulated root mean-squared error, average of the plug-in estimators for the

asymptotic variance, 95% coverage probability, and the average length of the 95% confidence

interval.
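A small sketch of how these summaries can be computed from the Monte Carlo output, assuming `estimates` and `asy_vars` collect the per-replication point estimates and plug-in asymptotic-variance estimates and that the true ATT is zero:

```python
import numpy as np

def mc_summary(estimates, asy_vars, n, tau_true=0.0):
    """Summaries reported in the tables: Av.Bias, Med.Bias, RMSE, Asy.V, Cover, CIL."""
    est, av = np.asarray(estimates), np.asarray(asy_vars)
    se = np.sqrt(av / n)                              # plug-in standard errors
    lo, hi = est - 1.96 * se, est + 1.96 * se
    return {
        "Av.Bias": float(np.mean(est - tau_true)),
        "Med.Bias": float(np.median(est - tau_true)),
        "RMSE": float(np.sqrt(np.mean((est - tau_true) ** 2))),
        "Asy.V": float(np.mean(av)),
        "Cover": float(np.mean((lo <= tau_true) & (tau_true <= hi))),
        "CIL": float(np.mean(hi - lo)),
    }
```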

Table 1 suggests that when both working models are correctly specified, all semiparametric

DID estimators show minimal Monte Carlo bias. However, τ̂ OR , τ̂ AIP W , and τ̂ CBP S outperform

Table 1: DGP1, both working models are correctly specified

Semiparametric efficiency bound: 11.1


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 0.091 0.217 2.805 8403.740 0.946 10.526


τ̂ OR 0.002 0.002 0.101 10.244 0.954 0.396
τ̂ AIP W 0.002 0.003 0.105 11.245 0.946 0.414
τ̂ CBP S 0.002 0.003 0.105 10.945 0.943 0.409

τ̂ IP W in terms of bias, root mean square error, asymptotic variance, and the length of the

confidence interval. This implies that the IPW DID estimator is substantially less efficient

compared to the latter three. Although τ̂ OR tends to be slightly more efficient than τ̂ AIP W and

τ̂ CBP S , the performance of these three estimators is quite similar.

Table 2: DGP2, correct outcome regression model with a misspecified propensity score model

Semiparametric efficiency bound: 11.6


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 2.068 2.099 3.297 7472.489 0.833 10.072


τ̂ OR 0.003 0.002 0.100 10.139 0.947 0.394
τ̂ AIP W 0.003 0.004 0.102 10.459 0.944 0.400
τ̂ CBP S 0.003 0.004 0.103 10.713 0.947 0.405

Table 2 illustrates that when the propensity score model is misspecified, the CBPS DID

estimator, τ̂ CBP S , is competitive with τ̂ OR and τ̂ AIP W . As anticipated, τ̂ IP W is biased in this

scenario. Conversely, Table 3 indicates that when the outcome regression model is misspecified,

τ̂ CBP S outperforms the other three estimators in terms of root mean square error, asymptotic

variance, and coverage probability. In this scenario, τ̂ OR displays a non-negligible bias, which

aligns with expectations.

Table 3: DGP3, misspecified outcome regression model with a correct propensity score model

Semiparametric efficiency bound: 11.1


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 0.121 0.313 3.173 10454.468 0.941 11.910


τ̂ OR -1.352 -1.330 1.822 1506.947 0.826 4.804
τ̂ AIP W -0.001 0.010 1.223 1834.486 0.966 5.234
τ̂ CBP S -0.022 0.002 1.011 977.372 0.947 3.870

Table 4: DGP4, both models are misspecified

Semiparametric efficiency bound: 11.6


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W -1.046 -1.013 2.609 6092.271 0.954 9.243


τ̂ OR -5.224 -5.195 5.372 1472.513 0.006 4.751
τ̂ AIP W -3.242 -3.220 3.494 2674.697 0.378 6.240
τ̂ CBP S -2.547 -2.528 2.727 974.912 0.265 3.865

In Table 4, when both working models are misspecified, it is unsurprising that all semipara-

metric DID estimators exhibit bias, and generally, the inference procedures are misleading. In

this scenario, the CBPS DID estimator demonstrates smaller biases, lower root mean square

error (RMSE), reduced asymptotic variance, and shorter confidence interval lengths compared

to the OR and AIPW DID estimators. However, in DGP4, the IPW DID estimator appears to

perform the best.

Table 5: DGP5, both models are locally misspecified

Semiparametric efficiency bound: 11.1


Av.Bias Med.Bias RMSE Asy.V Cover CIL

τ̂ IP W 7.612 7.477 8.036 6938.706 0.118 10.101


τ̂ OR 0.215 0.209 0.244 12.587 0.533 0.439
τ̂ AIP W 0.118 0.113 0.166 12.009 0.777 0.428
τ̂ CBP S 0.086 0.083 0.146 13.032 0.866 0.446

Table 5 indicates that in scenarios when both working models are locally misspecified, τ̂ CBP S

shows the smallest bias and root mean square error (RMSE), along with the best coverage

probability. However, in DGP5, τ̂ IP W demonstrates a non-negligible bias compared to the other

three DID estimators. This finding corroborates the finding that IPW-based estimators are

sensitive to even slight misspecifications of the propensity score model, see, e.g. Kang and

Schafer (2007).

In summary, the findings presented in Table 1 indicate that the estimated variance of τ̂ CBP S

is very close to the semiparametric efficiency bound when both working models are correctly

specified. This supports our Theorem 1 regarding local semiparametric efficiency. In Tables

2 and 3, when one of the working models is misspecified, our proposed CBPS DID estimator

shows little bias, justifying the double robustness in terms of consistency stated in Theorem 2.

Furthermore, in DGP2 and DGP3, the CBPS DID estimator achieves a coverage probability

closer to 95% compared to the AIPW DID estimator, validating the superiority of double

robustness for inference demonstrated in Theorem 3. Lastly, Table 5 reveals that the CBPS

DID estimator is more robust to mild model misspecification than the AIPW DID estimator,

confirming the results of Theorem 4.

5 An empirical application

In this section, we apply our CBPS DID estimator to a real data sample. LaLonde (1986)

conducted a highly influential study evaluating the performance of different treatment effect

estimators based on observational data. The study focused on whether a new statistical method-

ology could replicate an experimental benchmark: the treatment effect of a National Supported

Work (NSW) labor training program on post-treatment earnings. Unfortunately, the results

were not satisfactory due to the potential presence of selection bias in the observational data.

Later, Dehejia and Wahba (1999) demonstrated that propensity score matching (PSM) based

estimators could closely replicate the experimental results. However, Smith and Todd (2005)

found that cross-sectional PSM estimators are highly sensitive to both model misspecification

and sample selection. They suggested that DID matching estimators were more appropriate.

Following the findings of Smith and Todd (2005), we use different samples and specifications

to evaluate the existing DID estimators. Specifically, we utilize data from the Current Population

Survey (CPS) to create a comparison group and use the control group from LaLonde’s original

experimental sample and the Dehejia and Wahba (DW) sample as our pseudo treatment group.

We consider two datasets: (1) LaLonde’s control group (425) + CPS (15,992), and (2) DW

control group (260) + CPS (15,992). Since no one received training under this setup, the true ATT is zero, so a consistent estimator should be close to zero. Therefore, we use deviations from zero to evaluate the

performance of the DID estimators.

The pre-treatment covariates in the data include age, real earnings in 1974, years of education,

and dummy variables for high school dropout status, marital status, race (black), and ethnicity

(Hispanic). The outcome of interest is real earnings in 1978, and pre-treatment real earnings in

1975 are also available. As part of our analysis, similar to the Monte Carlo simulations, we compare the performance of the CBPS DID estimator, τ̂ CBP S , with the IPW DID estimator, τ̂ IP W , the

OR DID estimator, τ̂ OR , and the AIPW DID estimator, τ̂ AIP W . We assume that the outcome

models are linear in parameters and the propensity score model follows a logistic specification.

To assess the sensitivity to model misspecification, we also consider three different specifications:

(i) linear covariates (Lin); (ii) addition of some quadratic covariates such as age squared and

education squared (Qua); (iii) addition of some interaction terms selected by Sant’Anna and Zhao (S&Z). The results are summarized in Table 6, with standard errors reported in parentheses.

Table 6: Deviation of different DID estimators for the effect of training on real earnings in 1978,
with CPS comparison group

Lalonde Sample DW Sample


True ATT=0 True ATT=0

τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S

Lin. -1310 -1469 -972 -1030 -397 -560 69 2


(424) (348) (415) (398) (593) (413) (566) (519)

Qua. -878 -1248 -746 -744 -407 -277 151 211


(551) (354) (525) (477) (1018) (421) (884) (663)

S&Z. -778 -1426 -717 -730 -229 -676 313 239


(523) (352) (507) (474) (937) (426) (825) (648)

The results in Table 6 reveal several interesting findings. First, τ̂ OR displays the largest

bias across different datasets and covariate specifications. For the Lalonde sample, every DID estimator shows more severe bias than under the DW sample. Second,

Abadie’s IPW DID estimator τ̂ IP W tends to have the largest standard error in all situations

although its bias is relatively small especially under the Qua and S&Z specifications. Third,

τ̂ AIP W and τ̂ CBP S perform better than the other two in terms of bias. The CBPS DID estimator

τ̂ CBP S is very close to the true ATT when adopting the linear specification under the DW

sample. Finally, when we compare τ̂ CBP S with τ̂ AIP W , we find that the CBPS DID estimator

tends to have smaller standard errors in all situations. Taken together, the results suggest that

the proposed DID estimator is a compelling alternative to existing DID estimators. Additionally,

we use the Panel Study of Income Dynamics (PSID) to create a comparison group, with the

results provided in the Appendix.

6 Conclusion

In this paper, we introduced an ATT estimator based on the CBPS method within a DID

framework. This framework is applicable when the parallel trends assumption holds after

conditioning on a set of pre-treatment covariates and when panel data are available. We

conducted a thorough theoretical investigation of the CBPS DID estimator. Specifically,

we found that while the CBPS DID estimator’s expression is similar to that of the IPW

estimator, its theoretical properties align more closely with those of the AIPW estimator. We

demonstrated that the CBPS DID estimator is locally semiparametrically efficient and exhibits

double robustness, similar to the AIPW DID estimator. Moreover, the asymptotic linear

representation of the CBPS DID estimator remains invariant even when one of the working

models is misspecified, a feature we refer to as double robustness for inference. Additionally,

our estimator has a faster convergence rate than the AIPW DID estimator when both working

models are locally misspecified. These superior properties set the CBPS DID estimator apart

from the AIPW DID estimator. Our simulation results and empirical studies confirm these

theoretical properties, showcasing the advantages of the proposed CBPS DID estimator.

In this work, we have primarily concentrated on the theoretical development of the CBPS

DID estimator, especially in comparison to the AIPW DID estimator. An intriguing extension

involves adapting the CBPS DID estimator to high-dimensional settings. This is a nontrivial task,

as traditional regression methods tend to break down in high-dimensional settings, and CBPS

methodology is no exception. To overcome this challenge, recent research has incorporated

machine learning techniques into the first-step estimation of the propensity score and the

outcome evolution. This approach, known as double machine learning methodology, has been

explored in studies including (Chernozhukov et al. 2017; Chernozhukov et al. 2018; Chang 2020).

Our ongoing research aims to develop and investigate a high-dimensional CBPS DID estimator.

References

Abadie, Alberto (2005). “Semiparametric difference-in-differences estimators”. The review of

economic studies 72.1, pp. 1–19.

Chang, Neng-Chieh (2020). “Double/debiased machine learning for difference-in-differences

models”. The Econometrics Journal 23.2, pp. 177–191.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, and

Whitney Newey (2017). “Double/debiased/neyman machine learning of treatment effects”.

American Economic Review 107.5, pp. 261–265.

Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen,

Whitney Newey, and James Robins (2018). “Double/debiased machine learning for treatment and structural parameters”. The Econometrics Journal 21.1, pp. C1–C68.

Dehejia, Rajeev H and Sadek Wahba (1999). “Causal effects in nonexperimental studies: Reeval-

uating the evaluation of training programs”. Journal of the American statistical Association

94.448, pp. 1053–1062.

Fan, Jianqing, Kosuke Imai, Inbeom Lee, Han Liu, Yang Ning, and Xiaolin Yang (2022).

“Optimal covariate balancing conditions in propensity score estimation”. Journal of Business

& Economic Statistics 41.1, pp. 97–110.

Heckman, James J, Hidehiko Ichimura, and Petra E Todd (1997). “Matching as an econometric

evaluation estimator: Evidence from evaluating a job training programme”. The review of

economic studies 64.4, pp. 605–654.

Imai, Kosuke and Marc Ratkovic (2014). “Covariate balancing propensity score”. Journal of the

Royal Statistical Society Series B: Statistical Methodology 76.1, pp. 243–263.

Kang, Joseph DY and Joseph L Schafer (2007). “Demystifying double robustness: A comparison

of alternative strategies for estimating a population mean from incomplete data”. Statistical Science 22.4, pp. 523–539.

LaLonde, Robert J (1986). “Evaluating the econometric evaluations of training programs with

experimental data”. The American economic review, pp. 604–620.

Ning, Yang, Peng Sida, and Kosuke Imai (2020). “Robust estimation of causal effects via a

high-dimensional covariate balancing propensity score”. Biometrika 107.3, pp. 533–554.

Sant’Anna, Pedro HC and Jun Zhao (2020). “Doubly robust difference-in-differences estimators”.

Journal of econometrics 219.1, pp. 101–122.

Smith, Jeffrey A and Petra E Todd (2005). “Does matching overcome LaLonde’s critique of

nonexperimental estimators?” Journal of econometrics 125.1-2, pp. 305–353.

A Mathematical appendix

A1. Additional assumptions for first step estimations

To simplify the notation, let $g(x)$ be a generic notation for $\pi(x)$ and $m_\Delta(x)$, and let a parametric function $g(x'\theta)$ serve as a generic representation for $\pi(x'\beta)$ and $x'\gamma$. Additionally, for a generic $W$, let $\|W\| = \sqrt{\mathrm{trace}(W'W)}$ denote the Euclidean norm of $W$.

Let
$$\lambda_i(\zeta) = \frac{D_i - \pi(X_i'\beta)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - X_i'\gamma), \qquad \dot{\lambda}_i(\zeta) = \frac{\partial \lambda_i(\zeta)}{\partial \zeta},$$
where $\zeta = (\beta', \gamma')'$.

Assumption A.

(i) Assume there is a known parametric function $g(x'\theta) = g(x)$ and $\theta \in \Theta \subset \mathbb{R}^K$, where $\Theta$ is a compact parameter set.

(ii) Assume $g(X_i'\theta)$ is a.s. continuous in $\theta \in \Theta$ and is a.s. continuously differentiable; in addition, assume $\dot{\pi}(X_i'\beta)$ is bounded away from zero and infinity.

(iii) There exists a unique pseudo-true parameter $\theta_0 \in \mathrm{int}(\Theta)$.

(iv) Assume that $\hat{\theta}$ strongly converges to $\theta_0$ and can be asymptotically expressed as:
$$\sqrt{n}(\hat{\theta} - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi_g(D_i, \Delta Y_i, X_i, \theta_0) + o_p(1),$$
where $E[\psi_g(D_i, \Delta Y_i, X_i, \theta_0)] = 0$ and $E[\psi_g(D_i, \Delta Y_i, X_i, \theta_0)\psi_g(D_i, \Delta Y_i, X_i, \theta_0)']$ is finite and positive definite.

(v) Assume that $E[\|\lambda_i(\zeta_0)\|^2] < \infty$ and $E[\sup_{\zeta \in \Gamma_0} |\dot{\lambda}_i(\zeta)|] < \infty$, where $\Gamma_0$ is a small neighborhood of $\zeta_0$.

Assumption A (i)-(iv) represent standard conditions necessary for consistency and $\sqrt{n}$-asymptotically linear representations of the first-step estimators, and (v) imposes some weak integrability conditions. See Sant’Anna and Zhao (2020) for more details.
A2. ATT in OR approach

$$\begin{aligned}
\tau &= E[Y_{i1}(1) - Y_{i1}(0) \mid D_i = 1] \\
&= E[Y_{i1}(1) - Y_{i0}(0) \mid D_i = 1] - E\big[E[Y_{i1}(0) - Y_{i0}(0) \mid X_i, D_i = 1] \mid D_i = 1\big] \\
&= E[Y_{i1}(1) - Y_{i0}(0) \mid D_i = 1] - E\big[E[Y_{i1}(0) - Y_{i0}(0) \mid X_i, D_i = 0] \mid D_i = 1\big] \quad \text{(Assumption 2)} \\
&= E[\Delta Y_i \mid D_i = 1] - E\big[E[\Delta Y_i \mid X_i, D_i = 0] \mid D_i = 1\big] \\
&= E[\Delta Y_i \mid D_i = 1] - E[m_\Delta(X_i) \mid D_i = 1] = \tau^{OR}.
\end{aligned}$$
A3. ATT in IPW approach

$$\begin{aligned}
\tau^{IPW} &= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] \\
&= \frac{E[D_i \Delta Y_i]}{E[D_i]} - \frac{E\left[\frac{(1 - D_i)\pi(X_i)}{1 - \pi(X_i)} \Delta Y_i\right]}{E[D_i]} \\
&= \frac{E[D_i \Delta Y_i]}{\Pr(D_i = 1)} - \frac{E\left[\frac{\pi(X_i)}{1 - \pi(X_i)} E[(1 - D_i)\Delta Y_i \mid X_i]\right]}{\Pr(D_i = 1)} \\
&= E[\Delta Y_i \mid D_i = 1] - \frac{E\left[\frac{\pi(X_i)}{1 - \pi(X_i)} E[1 - D_i \mid X_i] E[Y_{i1}(0) - Y_{i0}(0) \mid X_i]\right]}{\Pr(D_i = 1)} \quad \text{(Assumption 2)} \\
&= E[\Delta Y_i \mid D_i = 1] - \frac{E[\pi(X_i) E[\Delta Y_i \mid X_i, D_i = 0]]}{\Pr(D_i = 1)} \\
&= E[\Delta Y_i \mid D_i = 1] - E\big[E[\Delta Y_i \mid X_i, D_i = 0] \mid D_i = 1\big] = \tau
\end{aligned}$$
A4. ATT in AIPW approach

$$\begin{aligned}
\tau^{AIPW} &= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - m_\Delta(X_i))\right] \\
&= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] - E\left[\frac{E[D_i \mid X_i] - \pi(X_i)}{E[D_i](1 - \pi(X_i))} m_\Delta(X_i)\right] \\
&= \tau^{IPW} - 0 = \tau
\end{aligned}$$
A5. Proof of Theorem 1

Suppose that both working models are correctly specified, that is $\pi(X_i'\beta) = \pi(X_i)$ and $X_i'\gamma = m_\Delta(X_i)$. Then, by using the expression (3.3), a Taylor expansion yields:

$$\begin{aligned}
\sqrt{n}(\hat{\tau}^{CBPS} - \tau) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left\{ \frac{D_i - \pi(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))} (\Delta Y_i - X_i'\gamma^{CBPS}) - \frac{D_i}{E[D_i]} \tau \right\} \\
&\quad - \sqrt{n}(\bar{D} - E[D_i]) E\left[\frac{(D_i - \pi(X_i'\beta))(\Delta Y_i - X_i'\gamma^{CBPS})}{E^2[D_i](1 - \pi(X_i'\beta))} - \frac{D_i}{E^2[D_i]} \tau\right] \\
&\quad - \sqrt{n}(\hat{\beta}^{CBPS} - \beta)' E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma^{CBPS})\dot{\pi}(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))^2} X_i\right] + o_p(1) \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left\{ \frac{D_i - \pi(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))} (\Delta Y_i - X_i'\gamma^{CBPS}) - \frac{D_i}{E[D_i]} \tau \right\} + o_p(1) \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^e + o_p(1),
\end{aligned}$$

where the second equality follows from

$$E\left[\frac{(D_i - \pi(X_i'\beta))(\Delta Y_i - X_i'\gamma^{CBPS})}{E^2[D_i](1 - \pi(X_i'\beta))} - \frac{D_i}{E^2[D_i]} \tau\right] = E\left[\frac{(D_i - \pi(X_i))(\Delta Y_i - m_\Delta(X_i))}{E^2[D_i](1 - \pi(X_i))} - \frac{D_i}{E^2[D_i]} \tau\right] = \frac{\tau^{AIPW} - \tau}{E[D_i]} = 0,$$

$$E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma^{CBPS})\dot{\pi}(X_i'\beta)}{E[D_i](1 - \pi(X_i'\beta))^2} X_i\right] = E\left[\frac{E[(1 - D_i)(\Delta Y_i - m_\Delta(X_i)) \mid X_i]\,\dot{\pi}(X_i'\beta)}{E[D_i](1 - \pi(X_i))^2} X_i\right] = 0$$

by setting $\gamma^{CBPS} = \gamma$. (Note that $\gamma^{CBPS}$ can take any value.) Thus the conclusion follows by the CLT.
A6. Proof of Theorem 2

We separate the proof into two cases.

Case 1: If the outcome model is correctly specified, that is $X_i'\gamma_0 = m_\Delta(X_i)$ but $\pi(X_i'\beta_0) \neq \pi(X_i)$, by the weak law of large numbers and the continuous mapping theorem, as $n \to \infty$,

$$\begin{aligned}
\hat{\tau}^{CBPS} &\xrightarrow{p} E\left[\frac{D_i - \pi(X_i'\beta_0)}{E[D_i](1 - \pi(X_i'\beta_0))} (\Delta Y_i - m_\Delta(X_i))\right] \\
&= E\left[\frac{D_i \Delta Y_i}{E[D_i]} - \frac{D_i m_\Delta(X_i)}{E[D_i]}\right] - E\left[\frac{(1 - D_i)(\Delta Y_i - m_\Delta(X_i))\pi(X_i'\beta_0)}{E[D_i](1 - \pi(X_i'\beta_0))}\right] \\
&= E\left[\frac{D_i \Delta Y_i}{E[D_i]} - \frac{D_i m_\Delta(X_i)}{E[D_i]}\right] - E\left[\frac{E[(1 - D_i)(\Delta Y_i - m_\Delta(X_i)) \mid X_i]\,\pi(X_i'\beta_0)}{E[D_i](1 - \pi(X_i'\beta_0))}\right] \\
&= \tau^{OR} - 0 = \tau,
\end{aligned}$$

where the first equality follows from $\frac{D_i - \pi(X_i'\beta_0)}{1 - \pi(X_i'\beta_0)} = D_i - \frac{(1 - D_i)\pi(X_i'\beta_0)}{1 - \pi(X_i'\beta_0)}$ and the third equality follows from $E[(1 - D_i)(\Delta Y_i - m_\Delta(X_i)) \mid X_i] = 0$.

Case 2: If only the propensity score model is correctly specified, that is, $\pi(X_i'\beta_0) = \pi(X_i)$ but $X_i'\gamma_0 \neq m_\Delta(X_i)$, by the weak law of large numbers and the continuous mapping theorem, as $n \to \infty$,

$$\begin{aligned}
\hat{\tau}^{CBPS} &\xrightarrow{p} E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} (\Delta Y_i - X_i'\gamma^{CBPS})\right] \\
&= E\left[\frac{D_i - \pi(X_i)}{E[D_i](1 - \pi(X_i))} \Delta Y_i\right] - E\left[\frac{E[D_i \mid X_i] - \pi(X_i)}{E[D_i](1 - \pi(X_i))} X_i'\gamma^{CBPS}\right] \\
&= \tau^{IPW} - 0 = \tau,
\end{aligned}$$

where the second equality follows from $E[D_i \mid X_i] = \pi(X_i)$.
A7. Proof of Theorem 3

By Taylor expansions, we have

$$\begin{aligned}
\sqrt{n}(\hat{\tau}^{CBPS} - \tau) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{CBPS} - \sqrt{n}(\bar{D} - E[D_i]) E\left[\frac{(D_i - \pi(X_i'\beta_0^{CBPS}))(\Delta Y_i - X_i'\gamma^{CBPS})}{E^2[D_i](1 - \pi(X_i'\beta_0^{CBPS}))} - \frac{D_i}{E^2[D_i]} \tau\right] \\
&\quad - \sqrt{n}(\hat{\beta}^{CBPS} - \beta_0^{CBPS})' E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma^{CBPS})\dot{\pi}(X_i'\beta_0^{CBPS})}{E[D_i](1 - \pi(X_i'\beta_0^{CBPS}))^2} X_i\right] + o_p(1), \qquad (A.1)
\end{aligned}$$

whereas

$$\begin{aligned}
\sqrt{n}(\hat{\tau}^{AIPW} - \tau) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \eta_i^{AIPW} - \sqrt{n}(\bar{D} - E[D_i]) E\left[\frac{(D_i - \pi(X_i'\beta_0^{AIPW}))(\Delta Y_i - X_i'\gamma_0^{AIPW})}{E^2[D_i](1 - \pi(X_i'\beta_0^{AIPW}))} - \frac{D_i}{E^2[D_i]} \tau\right] \\
&\quad - \sqrt{n}(\hat{\beta}^{AIPW} - \beta_0^{AIPW})' E\left[\frac{(1 - D_i)(\Delta Y_i - X_i'\gamma_0^{AIPW})\dot{\pi}(X_i'\beta_0^{AIPW})}{E[D_i](1 - \pi(X_i'\beta_0^{AIPW}))^2} X_i\right] \\
&\quad - \sqrt{n}(\hat{\gamma}^{AIPW} - \gamma_0^{AIPW})' E\left[\frac{D_i - \pi(X_i'\beta_0^{AIPW})}{E[D_i](1 - \pi(X_i'\beta_0^{AIPW}))} X_i\right] + o_p(1). \qquad (A.2)
\end{aligned}$$

As in the proof of Theorem 2, we separate the proof into two cases.

Case 1: If the outcome regression model is correct but the propensity score model is incorrect, the third terms of both the AIPW and CBPS expansions are zero, but the fourth term of the AIPW expansion is nonzero and of order $O_p(1)$ because $\pi(X_i) \neq \pi(X_i'\beta_0^{AIPW})$.

Case 2: On the other hand, if the propensity score model is correct but the outcome regression model is incorrect, the fourth term of the AIPW expansion is zero, but the third term of the AIPW expansion is nonzero and of order $O_p(1)$. However, the arbitrariness of $\gamma^{CBPS}$ in the CBPS expansion offers a significant advantage. Specifically, the third term of the CBPS expansion is zero even when the outcome regression model is incorrect, by setting
$$\gamma^{CBPS} = E\left[\frac{(1 - D_i)\dot{\pi}(X_i'\beta_0^{CBPS})}{(1 - \pi(X_i'\beta_0^{CBPS}))^2} X_i X_i'\right]^{-1} E\left[\frac{(1 - D_i)\Delta Y_i\,\dot{\pi}(X_i'\beta_0^{CBPS})}{(1 - \pi(X_i'\beta_0^{CBPS}))^2} X_i\right] := \gamma_0^{CBPS}.$$
A8. Proof of Theorem 4

We provide a sketch of the proof because the detail is very similar to that of (3.10) and (3.11) of Fan et al. (2021). Letting $\beta_0^{CBPS}$ denote the probability limit of $\hat{\beta}^{CBPS}$, we decompose

$$\begin{aligned}
\hat{\tau}^{CBPS} - \tau &= \frac{1}{\bar{D}} \frac{1}{n} \sum_{i=1}^{n} \left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \left(\Delta Y_i - X_i'\gamma^{CBPS}\right) - D_i \tau\right] \\
&\quad + \frac{1}{\bar{D}} \frac{1}{n} \sum_{i=1}^{n} \left[\frac{D_i - \pi(X_i'\hat{\beta}^{CBPS})}{1 - \pi(X_i'\hat{\beta}^{CBPS})} - \frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})}\right] \left(\Delta Y_i - X_i'\gamma^{CBPS}\right) \\
&:= A_1 + A_2.
\end{aligned}$$

First, we write $\gamma^{CBPS} = \gamma^* + \delta A$ for any value $A$ since $\gamma^{CBPS}$ is arbitrary. Then we have
$$\Delta Y_i - X_i'\gamma^{CBPS} = \Delta Y_i - m_\Delta(X_i) + \{m_\Delta(X_i) - X_i'\gamma^{CBPS}\} = \Delta Y_i - m_\Delta(X_i) + \delta(r(X_i) - X_i'A). \qquad (A.3)$$

Also, by the same argument as the proof of (C.1) in Fan et al. (2021), we have
$$\beta_0^{CBPS} - \beta^* = \xi E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right] + O(\xi^2), \qquad (A.4)$$
where $u_i^* = u(X_i; \beta^*)$, $\pi_i^* = \pi(X_i'\beta^*)$ and $\dot{\pi}_i^* = \dot{\pi}(X_i'\beta^*)$. Hence by using a similar argument as the proof of Theorem 1 with (A.3) and (A.4), we have
$$A_2 = O_p(\delta n^{-1/2})$$
and
$$\begin{aligned}
A_1 &= \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + \frac{1}{E[D_i]} E\left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \left(m_\Delta(X_i) - X_i'\gamma^{CBPS}\right)\right] + O_p(\xi n^{-1/2} + \delta n^{-1/2}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + \frac{1}{E[D_i]} E\left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \delta\left(r(X_i) - X_i'A\right)\right] + O_p(\xi n^{-1/2} + \delta n^{-1/2}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \eta_i^e + O_p(\xi^2\delta + \xi n^{-1/2} + \delta n^{-1/2}).
\end{aligned}$$

To see the last equality, the second term is calculated as

$$\begin{aligned}
&\frac{1}{E[D_i]} E\left[\frac{D_i - \pi(X_i'\beta_0^{CBPS})}{1 - \pi(X_i'\beta_0^{CBPS})} \delta\left(r(X_i) - X_i'A\right)\right] \\
&= \frac{1}{E[D_i]} E\left[\left\{\frac{D_i - \pi(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} - \frac{(1 - D_i)\dot{\pi}_i^* X_i'(\beta_0^{CBPS} - \beta^*)}{\{1 - \pi(X_i'\beta^*)\}^2}\right\} \delta\left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta) \\
&= \frac{1}{E[D_i]} E\left[\left\{\frac{\pi(X_i'\beta^*)(1 + \xi u_i^*) - \pi(X_i'\beta^*)}{1 - \pi(X_i'\beta^*)} - \frac{\{1 - \pi(X_i'\beta^*)(1 + \xi u_i^*)\}\dot{\pi}_i^* X_i'(\beta_0^{CBPS} - \beta^*)}{\{1 - \pi(X_i'\beta^*)\}^2}\right\} \delta\left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta) \\
&= \frac{1}{E[D_i]} E\left[\left\{\frac{\xi u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\xi \dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right\} \delta\left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta) \\
&= \frac{\xi\delta}{E[D_i]} E\left[\left\{\frac{u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right\} \left(r(X_i) - X_i'A\right)\right] + O(\xi^2\delta).
\end{aligned}$$

Hence assuming that at least one entry of
$$E\left[\left(\frac{u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right) X_i\right]$$
is nonzero, there exists $A$ such that
$$E\left[\left(\frac{u_i^*}{1 - \pi(X_i'\beta^*)} - \frac{\dot{\pi}_i^* X_i' E\left[\frac{\dot{\pi}_i^*}{1 - \pi_i^*} X_i X_i'\right]^{-1} E\left[\frac{\pi_i^*}{1 - \pi_i^*} u_i^* X_i\right]}{1 - \pi(X_i'\beta^*)}\right) \left(r(X_i) - X_i'A\right)\right] = 0.$$

This completes the proof of (3.10). The proof of (3.11) follows from the same argument, except that $\hat{\gamma}^{AIPW}$ is not arbitrary, unlike $\gamma^{CBPS}$.
A9. Additional Application Results

Table 7: Deviation of different DID estimators for the effect of training on real earnings in 1978,
with PSID comparison group

Lalonde Sample DW Sample


True ATT=0 True ATT=0

τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S τ̂ IP W τ̂ OR τ̂ AIP W τ̂ CBP S

Lin. -1038 -1605 1190 661 2238 -664 4062 1932


(821) (699) (1169) (505) (1196) (898) (2298) (552)

Qua. 812 -1236 1357 930 1785 -308 4673 2331


(825) (704) (1177) (569) (1268) (902) (2480) (681)

S&Z. 811 -947 1039 915 1750 209 3934 2215


(826) (642) (1052) (564) (1310) (812) (2118) (715)

