Ashesh Rambachan, Harvard Economics Ph.D. Dissertation (Revised)
Citation
Rambachan, Asheshananda. 2022. Essays on Identification and Causality. Doctoral dissertation,
Harvard University Graduate School of Arts and Sciences.
Permanent link
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37371934
Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available
under the terms and conditions applicable to Other Posted Material, as set forth at
http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Essays on Identification and Causality
A dissertation presented
by
Asheshananda Rambachan
to
The Department of Economics
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
in the subject of
Economics
Harvard University
Cambridge, Massachusetts
April 2022
© 2022 Asheshananda Rambachan
Abstract
This dissertation consists of three essays on identification and causal analysis, with a particular focus on understanding what researchers can learn from data under weak assumptions.
The first chapter characterizes the behavioral and econometric assumptions under which
researchers can identify whether expert decision makers, such as doctors, judges, and managers,
make systematic prediction mistakes in observational empirical settings like medical testing,
pretrial release, and hiring. Under these assumptions, I provide a statistical test for whether the
decision maker makes systematic prediction mistakes and methods for conducting inference on
the ways in which the decision maker’s predictions are systematically biased. As an empirical
illustration, I analyze the pretrial release decisions of judges in New York City.
The second chapter, which is coauthored with Neil Shephard, develops the direct potential
outcome system as a foundational framework for analyzing dynamic causal effects in observational
time series settings. We provide novel conditions under which popular time series estimands,
such as the impulse response function, local projections, and local projection with an instrumental
variable, have a nonparametric causal meaning.
The third chapter, which is coauthored with Iavor Bojinov and Neil Shephard, proposes a rich
class of finite population dynamic causal effects in panel experiments. We provide a nonparametric
estimator that is unbiased for these dynamic causal effects over the randomization distribution
of assignments, derive its finite population limiting distribution, and develop two methods for
conducting inference. We further show that population linear fixed effect estimators do not recover
causally interpretable estimands if there are dynamic causal effects and serial correlation in the
assignments.
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Introduction 1
2 When Do Common Time Series Estimands Have Nonparametric Causal Meaning? 55
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.2 The Direct Potential Outcome System and Dynamic Causal Effects . . . . . . . . . . 59
2.2.1 The Direct Potential Outcome System . . . . . . . . . . . . . . . . . . . . . . . 59
2.2.2 Dynamic Causal Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.2.3 Links to macroeconometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.3 Estimands Based on Assignments and Outcomes . . . . . . . . . . . . . . . . . . . . 67
2.3.1 Impulse Response Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.3.2 Local Projection Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.3.3 Generalized Impulse Response Function . . . . . . . . . . . . . . . . . . . . . 71
2.3.4 Generalized Local Projection and Local Filtered Projection Estimands . . . . 73
2.4 The Instrumented Potential Outcome System . . . . . . . . . . . . . . . . . . . . . . . 73
2.4.1 The Instrumented System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.5 Estimands Based on Assignments, Instruments and Outcomes . . . . . . . . . . . . . 75
2.5.1 Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.5.2 IV Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.5.3 Generalized Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.5.4 Generalized IV and Filtered IV Estimands . . . . . . . . . . . . . . . . . . . . 80
2.6 Estimands Based on Instruments and Outcomes . . . . . . . . . . . . . . . . . . . . . 81
2.6.1 Ratio Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.6.2 Local Projection IV Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.6.3 Generalized Ratio Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.6.4 Generalized Local Projection IV and Local Filtered Projection IV Estimands 85
2.7 Estimands Based Only on Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.7.1 Linear simultaneous equation approach . . . . . . . . . . . . . . . . . . . . . . 86
2.7.2 Causal meaning of the GIRF of Yk,t on Yj,t+h . . . . . . . . . . . . . . . . . . . 88
2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4 Estimation in a linear potential outcome panel . . . . . . . . . . . . . . . . . . . . . . 104
3.4.1 Interpreting the unit fixed effects estimator . . . . . . . . . . . . . . . . . . . . 105
3.4.2 Interpreting the two-way fixed effects estimator . . . . . . . . . . . . . . . . . 106
3.5 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.5.1 Simulation design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.5.2 Normal approximations and size control . . . . . . . . . . . . . . . . . . . . . 109
3.5.3 Rejection rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.6 Empirical application in experimental economics . . . . . . . . . . . . . . . . . . . . 111
3.6.1 Inference on total lag-p weighted average dynamic causal effects . . . . . . . 112
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
References 115
Appendix B Appendix to Chapter 2 207
B.1 Proofs of Results for Assignments and Outputs . . . . . . . . . . . . . . . . . . . . . 207
B.1.1 Proof of Theorem 2.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
B.1.2 Proof of Theorem 2.3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
B.1.3 Proof of Theorem 2.3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
B.2 Proofs of Results for Assignments, Instruments and Outputs . . . . . . . . . . . . . 209
B.2.1 Proof of Theorem 2.5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
B.2.2 Proof of Theorem 2.5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
B.3 Proofs of Results for Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
B.3.1 Proof of Theorem 2.7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
List of Tables
1.1 Estimated lower bound on the fraction of judges whose release decisions are
inconsistent with expected utility maximization at accurate beliefs about failure to
appear risk given defendant characteristics. . . . . . . . . . . . . . . . . . . . . . . . . 43
1.2 Location of the maximum studentized violation of revealed preference inequali-
ties among judges whose release decisions are inconsistent with expected utility
maximization at accurate beliefs about failure to appear risk given defendant
characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.1 Top line results for the causal interpretation of common estimands based on assign-
ments and outcomes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.2 Top line results for the causal interpretation of common estimands based on assign-
ments, instruments and outcomes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.3 Top line results for the causal interpretation of common estimands based on instru-
ments and outcomes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.1 Null rejection rate for the test of the null hypothesis H0 : τ̄(1, 0; 0) = 0 based upon
the normal asymptotic approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.2 Stage games from twice-played prisoners’ dilemma in the experiment conducted by
Andreoni and Samuelson (2006). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.3 Summary statistics for the experiment in Andreoni and Samuelson (2006). . . . . . 112
3.4 Estimates of the total lag-p weighted average dynamic causal effect for p = 0, 1, 2, 3. 113
A.1 Estimated lower bound on the fraction of judges whose release decisions are
inconsistent with expected utility maximization behavior at accurate beliefs about
any pretrial misconduct risk given defendant characteristics. . . . . . . . . . . . . . . 140
A.2 Summary statistics comparing the main estimation sample and cases heard by the
top 25 judges, broken out by defendant race. . . . . . . . . . . . . . . . . . . . . . . . 185
A.3 Summary statistics for released and detained defendants in the main estimation
sample and for cases heard by the top 25 judges. . . . . . . . . . . . . . . . . . . . . . 186
A.4 Summary statistics of misconduct rates among released defendants in the main
estimation sample and cases heard by the top 25 judges. . . . . . . . . . . . . . . . . 187
A.5 Summary statistics in the universe of all cases subject to a pretrial release decision
and main estimation sample in the NYC pretrial release data, broken out by
defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
A.6 Summary statistics for released and detained defendants in the universe of all cases
subject to a pretrial release decision and the main estimation sample in the NYC
pretrial release data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.7 Balance check estimates for the quasi-random assignment of judges for all defen-
dants and by defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
A.8 Balance check estimates for the quasi-random assignment of judges by defendant
race and age. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
A.9 Balance check estimates for the quasi-random assignment of judges by defendant
race and felony charge. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
A.10 Location of the maximum studentized violation of revealed preference inequali-
ties among judges whose release decisions are inconsistent with expected utility
maximization behavior at accurate beliefs about any pretrial misconduct risk. . . . 202
A.11 Estimated lower bound on the fraction of judges whose “release on recognizance”
decisions are inconsistent with expected utility maximization behavior at accurate
beliefs about behavior under bail conditions and failure to appear risk given
defendant characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
C.1 Null rejection rate for the test of the null hypothesis H0 : τ̄·t(1, 0; 0) = 0 based upon
the normal asymptotic approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
C.2 Null rejection rate for the test of the null hypothesis H0 : τ̄i·(1, 0; 0) = 0 based upon
the normal asymptotic approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
List of Figures
1.1 Observed failure to appear rate among released defendants and constructed bound
on the failure to appear rate among detained defendants by race-and-age cells for
one judge in New York City. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.2 Estimated bounds on implied prediction mistakes between lowest and highest
predicted failure to appear risk deciles made by judges within each race-by-age cell. 46
1.3 Ratio of total expected social welfare under algorithmic decision rule relative to
release decisions of judges that make detectable prediction mistakes about failure
to appear risk. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.4 Ratio of total expected social welfare under algorithmic decision rule that corrects
prediction mistakes relative to release decisions of judges that make detectable
prediction mistakes about failure to appear risk. . . . . . . . . . . . . . . . . . . . . . 53
3.1 Simulated randomization distribution for τ̄̂(1, 0; 0) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 108
3.2 Rejection probabilities for a test of the null hypothesis H0 : τ̄(1, 0; 0) = 0 and
H0 : τ̄:(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
A.1 Observed failure to appear rate among released defendants and constructed bound
on the failure to appear rate among detained defendants by race-and-felony charge
cells for one judge in New York City. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
A.2 Estimated bounds on implied prediction mistakes between top and bottom pre-
dicted failure to appear risk deciles made by judges within each race-by-felony
charge cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
A.3 Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that make detectable prediction mistakes about failure
to appear risk by defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
A.4 Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that make detectable prediction mistakes about failure to appear risk.136
A.5 Ratio of total expected social welfare under algorithmic decision rule relative to
release decisions of judges that do not make detectable prediction mistakes about
failure to appear risk. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.6 Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes by
defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.7 Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that do not make detectable prediction mistakes. . . . . . . . . . . . 139
A.8 Histogram of number of cases heard by each judge in the top 25 judges. . . . . . . . 183
A.9 Receiver-operating characteristic (ROC) curves for ensemble prediction functions . 184
A.10 Ratio of total expected social welfare under algorithmic decision rules relative to
observed release decisions of judges that make detectable prediction mistakes over
race-by-felony charge cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
A.11 Ratio of total expected social welfare under full automation decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes over
race-by-felony charge cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
A.12 Fraction of judges whose release decisions are inconsistent with expected utility
maximization behavior at accurate beliefs about failure to appear risk using direct
imputation bounds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
A.13 95% confidence intervals for the implied prediction mistake of failure to appear risk
between the highest and lowest predicted failure to appear risk deciles using direct
imputation bounds with κ = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
A.14 95% confidence intervals for the implied prediction mistakes of any pretrial miscon-
duct risk between the highest and lowest predicted any pretrial misconduct risk
deciles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
C.1 Quantile-quantile plots for the simulated randomization distribution for τ̄̂(1, 0; 0)
under different choices of the parameter ϕ and treatment probability p(w). . . . . 219
C.2 Simulated randomization distribution for τ̄̂:(1, 0; 1) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 220
C.3 Quantile-quantile plots for the simulated randomization distribution for τ̄̂:(1, 0; 1)
under different choices of the parameter ϕ and treatment probability p(w). . . . . 221
C.4 Simulated randomization distribution for τ̄̂·t(1, 0; 0) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 223
C.5 Quantile-quantile plots for the simulated randomization distribution for τ̄̂·t(1, 0; 0)
under different choices of the parameter ϕ and treatment probability p(w). . . . . 224
C.6 Rejection probabilities for a test of the null hypothesis H0 : τ̄·t(1, 0; 0) = 0 and
H0 : τ̄·t:(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
C.7 Rejection probabilities for a test of the null hypothesis H0 : τ̄·t(1, 0; 0) = 0 and
H0 : τ̄·t:(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
C.8 Simulated randomization distribution for τ̄̂·t:(1, 0; 1) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 227
C.9 Quantile-quantile plots for the simulated randomization distribution for τ̄̂·t:(1, 0; 1)
under different choices of the parameter ϕ and treatment probability p(w). . . . . 228
C.10 Simulated randomization distribution for τ̄̂i·(1, 0; 0) under different choices of
the parameter ϕ and treatment probability p(w). The rows index the parame-
ter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈
{0.25, 0.5, 0.75}. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
C.11 Quantile-quantile plots for the simulated randomization distribution for τ̄̂i·(1, 0; 0)
under different choices of the parameter ϕ and treatment probability p(w). . . . . 230
C.12 Rejection probabilities for a test of the null hypothesis H0 : τ̄i·:(1, 0; 0) = 0 and
H0 : τ̄i·:(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
C.13 Rejection probabilities for a test of the null hypothesis H0 : τ̄i·:(1, 0; 0) = 0 and
H0 : τ̄i·:(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
C.14 Simulated randomization distribution for τ̄̂i·:(1, 0; 1) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 233
C.15 Quantile-quantile plots for the simulated randomization distribution for τ̄̂i·:(1, 0; 1)
under different choices of the parameter ϕ and treatment probability p(w). . . . . 234
C.16 Estimates of the weighted average i, t-th lag-0 dynamic causal effect (Definition 3.2.5)
of W = 1{λ ≥ 0.6} on cooperation in period one for two units in the experiment of
Andreoni and Samuelson (2006). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
C.17 Estimates of the time-t lag-p weighted average dynamic causal effect, τ̄·t:(1, 0; p), of
W = 1{λ ≥ 0.6} on cooperation in period one based on the experiment of Andreoni
and Samuelson (2006) for each time period t ∈ [T] and p = 0, 1, 2, 3. . . . . . . . . . 237
Acknowledgments
I have so many people to thank for filling my life as a Ph.D. student with happiness and joy.
I am deeply grateful to my advisors for their guidance and support. Sendhil Mullainathan took me under his wing early on, and he has been a constant
source of inspiration and fun ever since. A call or text from Sendhil could always put a smile
on my face and reignite my passion for research. Isaiah Andrews is my role model in every
sense. He constantly made time for me, taught me something new in every conversation, and
showed me that brilliance does not conflict with caring, generosity, and kindness. Neil Shephard
showed me the ropes of conducting research – from conceiving an idea to writing a draft, and
from presentation to publication. My collaboration with Neil was the best apprenticeship I
could have imagined. Elie Tamer provided expert advice and helped me see how I fit into the
greater econometrics community. Gary Chamberlain patiently answered my many questions and
provided thoughtful feedback on my half-baked ideas. He gave me the confidence that I belonged.
Beyond my advisors, I am grateful to have had Jens Ludwig, Iavor Bojinov, Alexandra
Chouldechova, Ed Glaeser, Pepe Montiel Olea, and Joshua Schwartzstein first as mentors and
now as friends. I received invaluable advice from Jim Stock and Larry Katz. My undergraduate
advisors at Princeton, Mark Watson and Hank Farber, inspired me to do a Ph.D. in Economics. I
was also fortunate to work with Jon Kleinberg, Nano Barahona, Matthew Gentzkow, and Jesse
Shapiro. Their intellectual curiosity was infectious.
Jonathan Roth and Amanda Coston were close collaborators, and even better friends. Andrea
Giorgio Saponaro, Sagar Saxena, and Roman Sigalov filled my Ph.D. with raucous laughter. I am
so grateful we were in the same cohort and to have become their friends.
My family was my bedrock of love and support throughout my Ph.D. My parents, Anant and
it, and supported me every step of the way. I owe my parents everything. The highlight of my
Ph.D. was living within a mile of my brother Akshar and sister-in-law Apoorva for two years. Our
weekly visits to Cambridge Common and Mario Kart battles were an oasis of fun amidst classes
and research. My sister Ishanaa and brother-in-law James were never more than a Facetime call
away. Many adventure-filled weekend visits to D.C. or San Francisco often meant they were much
closer. During my Ph.D., our family grew in size with the birth of my two nieces, Aadyaa and
Amiyaa. They are two rays of sunshine that brighten my life every day.
Finally, I was blessed to meet the love of my life, Jessica, during my Ph.D. Her love, support,
and humor gave me the strength to persevere, and she means more to me than I can possibly put
into words.
To my Nani, Sohnmatee Balkisoon; and the loving memories of my
Nana, Dibnarine Balkissoon; Ajee, Chanrawtee Rambachan; and Aja,
Deoraj Rambachan.
Introduction
This dissertation consists of three essays on identification and causal analysis, with a particular focus on understanding what researchers can learn from data under weak assumptions.
Decision makers, such as doctors, judges, and managers, make consequential choices based on
predictions of unknown outcomes. In the first chapter, I consider the following questions: Do these
decision makers make systematic prediction mistakes based on the available information? If so, in
what ways are their predictions systematically biased? Uncovering systematic prediction mistakes
is difficult as the preferences and information sets of decision makers are unknown to researchers.
In this paper, I characterize behavioral and econometric assumptions under which systematic
prediction mistakes can be identified in observational empirical settings such as hiring, medical
testing, and pretrial release. I derive a statistical test for whether the decision maker makes
systematic prediction mistakes under these assumptions and show how supervised machine
learning based methods can be used to apply this test. I provide methods for conducting inference
on the ways in which the decision maker’s predictions are systematically biased. As an illustration,
I apply this econometric framework to analyze the pretrial release decisions of judges in New
York City, and I estimate that at least 20% of judges make systematic prediction mistakes about
failure to appear risk.
In the second chapter, which is joint work with Neil Shephard, we introduce the nonparametric,
direct potential outcome system as a foundational framework for analyzing dynamic causal effects
in observational time series settings. The framework imposes neither parametric restrictions on
the potential outcome process nor restrictions on the extent to which past assignments may
causally affect outcomes. Using this framework, we provide conditions under which
common predictive time series estimands, such as the impulse response function, generalized
impulse response function, local projection, and local projection instrumental variables, have a
nonparametric causal meaning.
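As a toy illustration of the kind of result at stake (my own sketch, not code from the dissertation), consider a simulated linear outcome process in which the assignment Wt is an i.i.d. randomized shock. A local projection, i.e., a regression of Yt+h on Wt, then recovers the lag-h dynamic causal effect, which in this design is 0.5^h:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000

# Hypothetical linear process: an i.i.d. randomized shock w_t feeds into y_t
# with geometrically decaying dynamic effects (true lag-h effect = 0.5**h).
w = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 1.0 * w[t] + 0.1 * rng.standard_normal()

def local_projection(y, w, h):
    """OLS slope from regressing y_{t+h} on w_t."""
    yy, ww = y[h:], w[: len(w) - h] if h else w
    return np.mean((yy - yy.mean()) * (ww - ww.mean())) / np.var(ww)

for h in range(3):
    print(h, round(local_projection(y, w, h), 3))  # close to 0.5**h
```

The key ingredient is that w is randomized (independent of past outcomes); the dissertation's contribution is to characterize nonparametrically when such regression estimands retain a causal meaning outside simple linear designs like this one.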
The third chapter, which is joint work with Iavor Bojinov and Neil Shephard, analyzes dynamic
causal effects in panel experiments. In panel experiments, we randomly assign units to different
interventions, measure their outcomes, and repeat the procedure over several periods. Using
the potential outcomes framework, we define finite population dynamic causal effects that capture
the relative effectiveness of alternative treatment paths. For a rich class of dynamic causal effects,
we provide a nonparametric estimator that is unbiased over the randomization distribution and
derive its finite population limiting distribution as either the sample size or the duration of
the experiment increases. We develop two methods for inference: a conservative test for weak
null hypotheses and an exact randomization test for sharp null hypotheses. We further analyze
the finite population probability limit of linear fixed effects estimators. These commonly-used
estimators do not recover a causally interpretable estimand if there are dynamic causal effects and
serial correlation in the assignments, highlighting the value of our proposed estimator.
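The unbiasedness claim can be illustrated in miniature (my own sketch with made-up numbers; the chapter's estimator covers full treatment paths and lagged effects). With a single period and a known treatment probability, the inverse-probability-weighted contrast of observed outcomes is unbiased over the randomization distribution for the finite population average effect:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p_treat = 5_000, 0.5

# Hypothetical finite population of potential outcomes for one period:
# y1[i] if unit i is treated, y0[i] if not.
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + rng.normal(0.5, 0.2, n)        # unit-level effects centered at 0.5
true_effect = (y1 - y0).mean()

# One randomized assignment; only one potential outcome per unit is observed.
w = rng.random(n) < p_treat
y_obs = np.where(w, y1, y0)

# Inverse-probability-weighted (Horvitz-Thompson style) estimator, unbiased
# over the randomization distribution of w.
tau_hat = np.mean(y_obs * w / p_treat - y_obs * (1 - w) / (1 - p_treat))
print(round(true_effect, 2), round(tau_hat, 2))
```

The unbiasedness is purely design-based: it relies only on the known assignment probabilities, not on any model for the potential outcomes, which is the sense in which the chapter's estimator is nonparametric.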
Chapter 1

Identifying Prediction Mistakes in Observational Data
1.1 Introduction
Decision makers, such as doctors, judges, and managers, are commonly tasked with making
consequential choices based on predictions of unknown outcomes. For example, when deciding
whether to detain a defendant awaiting trial, a judge predicts what the defendant will do if released
based on information such as the defendant’s current criminal charge and prior arrest record. Are
these decision makers making systematic prediction mistakes based on this available information?
If so, in what ways are their predictions systematically biased? These foundational questions
in behavioral economics and psychology (e.g., Meehl, 1954; Tversky and Kahneman, 1974) have
renewed policy relevance and empirical life as machine learning based models increasingly replace
or inform decision makers in criminal justice, health care, labor markets, and consumer finance.2
1 I am especially grateful to Isaiah Andrews, Sendhil Mullainathan, Neil Shephard, Elie Tamer and Jens Ludwig
for their invaluable feedback, support, and advice. I thank Alex Albright, Nano Barahona, Laura Blattner, Iavor
Bojinov, Raj Chetty, Bo Cowgill, Will Dobbie, Xavier Gabaix, Matthew Gentzkow, Ed Glaeser, Yannai Gonczarowski,
Larry Katz, Daniel Martin, Ross Mattheis, Robert Minton, Ljubica Ristovska, Jonathan Roth, Suproteem Sarkar, Joshua
Schwartzstein, Jesse Shapiro, Chris Walker, and participants at the Brookings Institution’s Artificial Intelligence
Conference for many useful comments and suggestions. I also thank Hye Chang, Nicole Gillespie, Hays Golden, and
Ellen Louise Dunn for assistance at the University of Chicago Crime Lab. All empirical results based on New York City
pretrial data were originally reported in a University of Chicago Crime Lab technical report (Rambachan and Ludwig,
2021). I acknowledge financial support from the NSF Graduate Research Fellowship (Grant DGE1745303).
2 Risk assessment tools are used in criminal justice systems throughout the United States (Stevenson, 2018; Albright,
2019; Dobbie and Yang, 2019; Stevenson and Doleac, 2019; Yang and Dobbie, 2020). Clinical risk assessments aid
In assessing whether such machine learning based models can improve decision-making,
empirical researchers evaluate decision makers’ implicit predictions through comparisons of their
choices against those made by predictive models.3 Uncovering systematic prediction mistakes from
decisions is challenging, however, as both the decision maker’s preferences and information set
are unknown to us. For example, we do not know how judges assess the cost of pretrial detention.
Judges may uncover useful information through their courtroom interactions with defendants, but
we do not observe these interactions. Hence, the decision maker’s choices may diverge from the
model not because she is making systematic prediction mistakes, but rather because she has preferences
that differ from the model’s objective function or observes information that is unavailable to the
model. While this empirical literature recognizes these challenges (e.g., Kleinberg et al., 2018a;
Mullainathan and Obermeyer, 2021), it lacks a unifying econometric framework for analyzing the
decision maker’s choices under the weakest possible assumptions about their preferences and
information sets.
This paper develops such an econometric framework for analyzing whether a decision maker
makes systematic prediction mistakes and for characterizing how her predictions are systematically
biased. This clarifies what can (and cannot) be identified about systematic prediction mistakes
from data and empirically relevant assumptions about behavior, and maps those assumptions
into statistical inferences about systematic prediction mistakes. I consider empirical settings, such
as pretrial release, medical treatment or diagnosis, and hiring, in which a decision maker must
make decisions for many individuals based on a prediction of some unknown outcome using
each individual’s characteristics. These characteristics are observable to both the decision maker
and the researcher. The available data on the decision maker’s choices and associated outcomes
suffer from a missing data problem (Heckman, 1974; Rubin, 1976; Heckman, 1979; Manski, 1989):
doctors in diagnostic and treatment decisions (Obermeyer and Emanuel, 2016; Beaulieu-Jones et al., 2019; Abaluck
et al., 2020; Chen et al., 2020). For applications in consumer finance, see Einav et al. (2013), Fuster et al. (2018), Gillis
(2019), Dobbie et al. (2020), and Blattner and Nelson (2021) in economics, and see Khandani et al. (2010), Hardt et al.
(2016), Liu et al. (2018), and Coston et al. (2021) in computer science. For discussions of workforce analytics and resume
screening software, see Autor and Scarborough (2008), Jacob and Lefgren (2008), Rockoff et al. (2011), Feldman et al.
(2015), Hoffman et al. (2018), Erel et al. (2019), Li et al. (2020), Raghavan et al. (2020), and Frankel (2021).
3 See, for example, Kleinberg et al. (2015), Berk et al. (2016), Chalfin et al. (2016), Chouldechova et al. (2018), Cowgill
(2018), Hoffman et al. (2018), Kleinberg et al. (2018a), Erel et al. (2019), Ribers and Ullrich (2019), Li et al. (2020), Jung
et al. (2020a), and Mullainathan and Obermeyer (2021). Comparing a decision maker’s choices against a predictive
model has a long tradition in psychology (e.g., Dawes, 1971, 1979; Dawes et al., 1989; Camerer and Johnson, 1997; Grove
et al., 2000; Kuncel et al., 2013). See Camerer (2019) for a recent review of this literature.
the researcher only observes the outcome conditional on the decision maker’s choices (e.g., the
researcher only observes a defendant’s behavior upon release if a judge released them).4
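For intuition, here is a small sketch (my own, with made-up numbers) of the partial identification problem this missing data creates: because outcomes for detained defendants are never observed, the overall failure-to-appear (FTA) rate is only bounded, and the width of the worst-case bounds equals the detention rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for one judge: a release decision for each defendant and,
# for released defendants only, an observed FTA outcome.
n = 10_000
released = rng.random(n) < 0.7          # judge releases ~70% of defendants
latent_fta = rng.random(n) < 0.2        # latent outcome, observed only if released

p_release = released.mean()
p_fta_given_release = latent_fta[released].mean()

# Worst-case bounds on the overall FTA rate: the unobserved outcome for
# detained defendants is bracketed by 0 and 1.
lower = p_release * p_fta_given_release     # if detained defendants never FTA
upper = lower + (1 - p_release)             # if detained defendants always FTA

print(f"Identified interval for P(FTA): [{lower:.3f}, {upper:.3f}]")
```

Any behavioral model consistent with the data must therefore be evaluated against this entire interval rather than a point estimate, which is why the paper's identification analysis works with partially identified conditional outcome distributions.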
This paper then makes four main contributions. First, I characterize behavioral and econometric
assumptions under which systematic prediction mistakes can be identified in these empirical
settings. Second, under these assumptions, I derive a statistical test for whether the decision
maker makes systematic prediction mistakes and show how machine learning based models can
be used to apply this test. Third, I provide methods for conducting inference on the ways in which
the decision maker’s predictions are systematically biased. These contributions provide, to my
knowledge, the first microfounded econometric framework for studying systematic prediction
mistakes in these empirical settings, enabling researchers to answer a wider array of behavioral
questions under weaker assumptions than existing empirical research.5 Finally, I apply this
econometric framework to analyze the pretrial release decisions of judges in New York City as an
empirical illustration.
I explore the restrictions imposed on the decision maker’s choices by expected utility maximization, which models the decision maker as maximizing some (unknown to the researcher)
utility function at beliefs about the outcome given the characteristics as well as some private
information.6,7,8 Due to the missing data problem, the true conditional distribution of the outcome
given the characteristics is partially identified.
4 A large literature explores whether forecasters, households, or individuals have rational expectations in settings where both outcomes and subjective expectations or forecasts are observed. See, for example, Manski (2004), Elliott et al. (2005), Elliott et al. (2008), Gennaioli et al. (2016), Bordalo et al. (2020), D’Haultfoeuille et al. (2020), and Farmer et al. (2021). I focus on settings in which we only observe an individual’s discrete choices and partial information about the outcome.
5 Appendix A.2 provides a step-by-step user’s guide for empirical researchers interested in applying these methods.
6 Mourifie et al. (2019) and Henry et al. (2020) analyze the testable implications of Roy-style and extended Roy-style selection. See Heckman and Vytlacil (2006) for an econometric review of Roy selection models. The expected utility maximization model can be interpreted as a generalized Roy model, and I discuss these connections in Section 1.2.
7 A literature in decision theory explores conditions under which a decision maker’s random choice rule, which summarizes their choice probabilities in each possible menu of actions, has a random utility model representation. See, for example, Gul and Pesendorfer (2006), Gul et al. (2014), Lu (2016), and Natenzon (2019). I consider empirical settings in which we only observe choice probabilities from a single menu.
8 Kubler et al. (2014), Echenique and Saito (2015), Chambers et al. (2016), Polisson et al. (2020), and Echenique et al. (2021) use revealed preference analysis to characterize expected utility maximization behavior in consumer demand settings, in which a consumer’s state-contingent consumption choices across several budget sets are observed. See the review in Echenique (2020).
The expected utility maximization model therefore only restricts the decision maker’s beliefs given the characteristics to lie in this identified set,
what I call “accurate beliefs.” If there exists no utility function in a researcher-specified class that rationalizes observed choices under this model, I therefore say the decision maker is making systematic prediction mistakes. More precisely, I characterize the identified set of utility functions at which the decision maker’s choices are consistent with expected utility maximization at accurate beliefs. If this identified set contains a candidate utility function, then some distribution of private information can be constructed such that the decision maker cannot do better than their observed choices in an expected utility sense.9 If the identified set of utility functions is empty, then the decision maker is making systematic prediction mistakes, as there is no combination of utility function and private information at which their observed choices are consistent with expected utility maximization at accurate beliefs.
I then prove that, without further assumptions, systematic prediction mistakes are untestable. If
either all characteristics of individuals directly affect the decision maker’s utility function or the
missing data can take any value, then the identified set of utility functions is non-empty. Any
variation in the decision maker’s conditional choice probabilities can be rationalized by a utility
function and private information that sufficiently vary across all the characteristics. However,
placing an exclusion restriction on which characteristics may directly affect the utility function
and constructing informative bounds on the missing data restore the testability of expected utility
maximization behavior.10 Under such an exclusion restriction, variation in the decision maker’s
choices across characteristics that do not directly affect the utility function must only arise due
to variation in beliefs. The decision maker’s beliefs given the characteristics and her private
information must further be Bayes-plausible with respect to some distribution of the outcome
given the characteristics that lies in the identified set. Together this implies testable restrictions on the decision maker’s choices across characteristics that do not directly affect utility.
9 By searching for any distribution of private information that rationalizes the decision maker’s choices, my analysis follows in the spirit of the robust information design literature (e.g., Kamenica and Gentzkow, 2011; Bergemann and Morris, 2013, 2016, 2019; Kamenica, 2019). Syrgkanis et al. (2018) and Bergemann et al. (2019) use results from this literature to study multiplayer games, whereas I analyze the choices of a single decision maker. Gualdani and Sinha (2020) also analyze single-agent settings under weak assumptions on the information environment.
10 The exclusion restriction on which characteristics may directly affect utility complements recent results on the validity of “marginal outcome tests” for discrimination (Bohren et al., 2020; Canay et al., 2020; Gelbach, 2021; Hull, 2021). In the special case of a binary decision and binary outcome, the expected utility maximization model is a generalization of the extended Roy model analyzed in Canay et al. (2020) and Hull (2021). I formalize these connections in Section 1.3.
Behavioral
assumptions about the decision maker’s utility function and econometric assumptions to address
the missing data problem are therefore sufficient to identify systematic prediction mistakes.
These results clarify what conclusions can be logically drawn about systematic prediction
mistakes given the researcher’s assumptions on the decision maker’s utility function and the
missing data problem. These testable restrictions arise from the joint null hypothesis that the
decision maker maximizes expected utility at accurate beliefs and that their utility function
satisfies the conjectured exclusion restriction.11 If these restrictions are satisfied, we cannot
logically reject that the decision maker’s choices maximize expected utility at some utility function
in the researcher-specified class and beliefs about the outcomes given the characteristics that lie
in the identified set. Stronger conclusions would require stronger assumptions on the decision
which many procedures are available (e.g., see the reviews by Canay and Shaikh, 2017; Molinari,
2020). The number of moment inequalities grows with the dimensionality of the observable
deal with this practical challenge, I discuss how supervised machine learning methods may be
used to reduce the dimension of this testing problem. Researchers may construct a prediction
function for the outcome on held out data and partition the characteristics into percentiles of
predicted risk based on this estimated prediction function. Testing implied revealed preference
inequalities across percentiles of predicted risk is a valid test of the joint null hypothesis that the
decision maker’s choices maximize expected utility at a utility function satisfying the conjectured
exclusion restriction and accurate beliefs. This provides, to my knowledge, the first microfounded
procedure for using supervised machine learning based prediction functions to identify systematic
prediction mistakes.
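As a schematic illustration of this dimension-reduction step (written for this summary, not taken from the paper’s procedure), the sketch below fits a crude prediction function on a held-out split of synthetic data, partitions a separate sample into deciles of predicted risk, and computes the observed outcome rate among chosen individuals within each decile; these per-decile quantities are the inputs to the revealed preference inequalities. The data-generating process and the binned-frequency “learner” are hypothetical stand-ins for any supervised method.

```python
import random

random.seed(1)

def draw(n):
    """Hypothetical observational data: characteristic x, a choice c,
    and an outcome observed only when c = 1 (y = c * y_star)."""
    rows = []
    for _ in range(n):
        x = random.random()
        y_star = 1 if random.random() < 0.2 + 0.6 * x else 0
        c = 1 if random.random() < 0.9 - 0.5 * x else 0
        rows.append((x, c, c * y_star))
    return rows

train, test = draw(5000), draw(5000)

# Step 1: fit a prediction function for the outcome on held-out data,
# using only observations whose outcome is observed (c = 1). Here the
# "learner" is just an empirical frequency within 20 bins of x.
B = 20

def bin_of(x):
    return min(int(x * B), B - 1)

obs = {b: [] for b in range(B)}
for x, c, y in train:
    if c == 1:
        obs[bin_of(x)].append(y)
predict = {b: (sum(v) / len(v) if v else 0.0) for b, v in obs.items()}

# Step 2: partition the analysis sample into deciles of predicted risk.
ranked = sorted(test, key=lambda r: predict[bin_of(r[0])])
deciles = [ranked[i * len(ranked) // 10:(i + 1) * len(ranked) // 10]
           for i in range(10)]

# Step 3: observed outcome rate among chosen individuals per decile --
# the ingredients of the revealed preference inequalities across deciles.
def chosen_rate(cells):
    ys = [y for _, c, y in cells if c == 1]
    return sum(ys) / len(ys) if ys else float("nan")

rates = [round(chosen_rate(d), 3) for d in deciles]
print(rates)
```

Testing the implied inequalities across such risk deciles, rather than across the full characteristic space, keeps the number of moment inequalities manageable.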
With this framework in place, I further establish that the data are informative about how the decision maker’s predictions are systematically biased.
11 This finding echoes a classic insight in finance that testing whether variation in asset prices reflects violations of rational expectations requires assumptions about admissible variation in underlying stochastic discount factors. See Campbell (2003) and Cochrane (2011) for reviews and Augenblick and Lazarus (2020) for a recent contribution.
I extend the behavioral model to
allow the decision maker to have possibly inaccurate beliefs about the unknown outcome and
sharply characterize the identified set of utility functions at which the decision maker’s choices
are consistent with “inaccurate” expected utility maximization.12 This takes no stand on the
behavioral foundations for the decision maker’s inaccurate beliefs, and so it encompasses various models of inattention and biased belief formation (e.g., Sims, 2003; Gabaix, 2014; Caplin and Dean, 2015; Bordalo et al., 2016; Handel and Schwartzstein,
2018). I derive bounds on an interpretable parameter that summarizes the extent to which the
decision maker’s beliefs overreact or underreact to the characteristics of individuals. For a fixed
pair of characteristic values, these bounds summarize whether the decision maker’s beliefs about the outcome vary more (“overreact”) or less (“underreact”) than the true conditional distribution of the outcome varies across these values. These bounds again arise because any variation in the decision
maker’s choices across characteristics that do not directly affect utility must only arise due to
variation in beliefs. Comparing observed variation in the decision maker’s choice probabilities
against possible variation in the true probability of the outcome is therefore informative about the ways in which the decision maker’s beliefs are systematically biased.

As an empirical illustration, I analyze the pretrial release system in New York City, in which
judges decide whether to release defendants awaiting trial based on a prediction of whether
they will fail to appear in court.13 For each judge, I observe the conditional probability that she
releases a defendant given a rich set of characteristics (e.g., race, age, current charge, prior criminal
record, etc.) as well as the conditional probability that a released defendant fails to appear in
court. The conditional failure to appear rate among detained defendants is unobserved due to
the missing data problem. If all defendant characteristics may directly affect the judge’s utility
function or the conditional failure to appear rate among detained defendants may take any value,
then my theoretical results establish that the judge’s release decisions are always consistent with expected utility maximization behavior at accurate beliefs.
12 The decision maker’s beliefs about the outcome conditional on the characteristics are no longer required to lie in the identified set for the conditional distribution of the outcome given the characteristics.
13 Several empirical papers also study the New York City pretrial release system. Leslie and Pope (2017) estimates the effects of pretrial detention on criminal case outcomes. Arnold et al. (2020b) and Arnold et al. (2020a) estimate whether judges and pretrial risk assessments respectively discriminate against black defendants. Kleinberg et al. (2018a) studies whether a statistical risk assessment could improve pretrial outcomes in New York City. I discuss the differences between my analysis and this prior research in Section 1.5.
We cannot logically rule out that the
judge’s release decisions reflect either a utility function that varies richly based on defendant characteristics or rich private information about defendants.

However, empirical researchers often assume that while judges may engage in taste-based
discrimination based on a defendant’s race, other defendant characteristics such as prior pretrial misconduct history only affect judges’ beliefs about failure to appear risk. Judges in New York City
are quasi-randomly assigned to defendants, which implies bounds on the conditional failure to
appear rate among detained defendants. Given such exclusion restrictions and quasi-experimental
bounds on the missing data, expected utility maximization behavior is falsified by “misrankings”
in the judge’s release decisions. Holding fixed defendant characteristics that may directly affect
utility (e.g., among defendants of the same race), do all defendants released by the judge have a
lower observed failure to appear rate than the researcher’s upper bound on the failure to appear
rate of all defendants detained by the judge? If not, there is no combination of a utility function satisfying the conjectured exclusion restriction and private information such that the judge’s
choices maximize expected utility at accurate beliefs about failure to appear risk given defendant
characteristics.
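The misranking logic can be made concrete with a toy numerical check. The cell values below are hypothetical, and the upper bounds stand in for quasi-experimental bounds on the failure to appear rate among detained defendants; this is an illustrative sketch of the falsification logic, not the paper’s test statistic (which handles sampling uncertainty via moment inequalities).

```python
# One utility-relevant group w (e.g., defendants of the same race); the keys
# are cells of the excluded characteristics x. All numbers are hypothetical:
# the observed FTA rate among released defendants in each cell, and an upper
# bound on the FTA rate among the defendants detained in each cell.
cells = {
    "x1": {"fta_released": 0.10, "fta_detained_ub": 0.30},
    "x2": {"fta_released": 0.25, "fta_detained_ub": 0.20},
    "x3": {"fta_released": 0.35, "fta_detained_ub": 0.55},
}

def misrankings(cells):
    """A misranking occurs when some released defendants have a higher
    observed FTA rate than the upper bound on the FTA rate of defendants
    the judge chose to detain; any such pair falsifies expected utility
    maximization at accurate beliefs under the exclusion restriction."""
    return [(a, b)
            for a, ca in cells.items()
            for b, cb in cells.items()
            if ca["fta_released"] > cb["fta_detained_ub"]]

print(misrankings(cells))  # [('x2', 'x2'), ('x3', 'x1'), ('x3', 'x2')]
```

Note that the within-cell pair ('x2', 'x2') also counts as a violation, since the judge both releases and detains defendants within that cell and should release the lower-risk ones.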
By testing for such misrankings in the pretrial release decisions of individual judges, I estimate,
as a lower bound, that at least 20% of judges in New York City from 2008-2013 make systematic
prediction mistakes about failure to appear risk based on defendant characteristics. Under a
range of exclusion restrictions and quasi-experimental bounds on the failure to appear rate among
detained defendants, there exists no utility function nor distribution of private information such
that the release decisions of these judges would maximize expected utility at accurate beliefs
about failure to appear risk. I further find that these systematic prediction mistakes arise because
judges’ beliefs underreact to variation in failure to appear risk based on defendant characteristics
between predictably low risk and predictably high risk defendants. Rejections of expected utility
maximization behavior at accurate beliefs are therefore driven by release decisions on defendants at the tails of the predicted risk distribution.

Finally, to highlight policy lessons from this behavioral analysis, I explore the implications of
replacing decision makers with algorithmic decision rules in the New York City pretrial release
setting. Since supervised machine learning methods are tailored to deliver accurate predictions
(Mullainathan and Spiess, 2017; Athey, 2017), such algorithmic decision rules may improve
outcomes by correcting systematic prediction mistakes. I show that expected social welfare under
a candidate decision rule is partially identified, and inference on counterfactual expected social
welfare can be reduced to testing moment inequalities with nuisance parameters that enter the
moments linearly (e.g., see Gafarov, 2019; Andrews et al., 2019; Cho and Russell, 2020; Cox and
Shi, 2020). Using these results, I then estimate the effects of replacing judges who were found to
make systematic prediction mistakes with an algorithmic decision rule. Automating decisions
only where systematic prediction mistakes occur at the tails of the predicted risk distribution
weakly dominates the status quo, and can lead to up to 20% improvements in worst-case expected
social welfare, which is measured as a weighted average of the failure to appear rate among
released defendants and the pretrial detention rate. Automating decisions whenever the human
decision maker makes systematic prediction mistakes can therefore be a free lunch.14 Fully
replacing judges with the algorithmic decision rule, however, has ambiguous effects that depend
on the parametrization of social welfare. In fact, for some parametrizations of social welfare, I
find that fully automating decisions can lead to up to 25% reductions in worst-case expected
social welfare relative to the judges’ observed decisions. More broadly, designing algorithmic
decision rules requires carefully assessing their predictive accuracy and their effects on disparities
across groups (e.g., Barocas et al., 2019; Mitchell et al., 2019; Chouldechova and Roth, 2020). These
findings highlight that it is also essential to analyze whether the existing decision makers make systematic prediction mistakes and, if so, on which decisions.

This paper relates to a large empirical
literature that evaluates decision makers’ implicit predictions either by comparing their choices against those made by machine learning based models or by estimating structural models of decision making in particular empirical settings. While the challenges of unknown preferences
and information sets are recognized, researchers typically resort to strong assumptions. Kleinberg
et al. (2018a) and Mullainathan and Obermeyer (2021) restrict preferences to be constant across
both decisions and decision makers. Lakkaraju and Rudin (2017), Chouldechova et al. (2018), Coston et al. (2020), and Jung et al. (2020a) assume that observed choices were as-good-as randomly assigned given the characteristics, eliminating the problem of unknown information sets.
14 This finding relates to a computer science literature on “human-in-the-loop” analyses of algorithmic decision support systems (e.g., Tan et al., 2018; Green and Chen, 2019a,b; De-Arteaga et al., 2020; Hilgard et al., 2021). Recent methods estimate whether a decision should be automated by an algorithm or instead be deferred to an existing decision maker (Madras et al., 2018; Raghu et al., 2019; Wilder et al., 2020; De-Arteaga et al., 2021). I show that understanding whether to automate or defer requires assessing whether the decision maker makes systematic prediction mistakes.
Recent
work introduces parametric models for the decision maker’s private information, such as Abaluck
et al. (2016), Arnold et al. (2020b), Jung et al. (2020b), and Chan et al. (2021). See also Currie
and Macleod (2017), Ribers and Ullrich (2020), and Marquardt (2021). I develop an econometric
framework for studying systematic prediction mistakes that only requires exclusion restrictions on
which characteristics affect the decision maker’s preferences but no further restrictions. I model
the decision maker’s information environment fully nonparametrically. This enables researchers
to both identify and characterize systematic prediction mistakes in many empirical settings under weaker assumptions than existing research.
This paper also relates to a decision theory literature that characterizes models of decision making using “state-dependent stochastic choice (SDSC) data” (Caplin and Martin, 2015; Caplin and Dean, 2015; Caplin, 2016; Caplin et al., 2020;
Caplin and Martin, 2021). While useful in analyzing lab-based experiments, such characterization
results have had limited applicability so far due to the difficulty of collecting such SDSC data
(Gabaix, 2019; Rehbeck, 2020). I focus on common empirical settings in which the data suffer
from a missing data problem, and show that these settings can approximate ideal SDSC data under the researcher’s assumptions on the missing data. Quasi-experimental variation in observational settings such as pretrial release, medical treatment or diagnosis, and
hiring can therefore solve the challenge of “economic data engineering” recently laid out by
Caplin (2021). Martin and Marx (2021) study the identification of taste-based discrimination
by a decision maker in a binary choice experiment, providing bounds on the decision maker’s
group-dependent threshold rule. The setting I consider nests theirs by allowing for several key
features of observational data such as missing data, multi-valued outcomes, and multiple choices.
Lu (2019) shows that a decision maker’s state-dependent utilities and beliefs can be identified from ideal SDSC data.
1.2 An Empirical Model of Expected Utility Maximization
A decision maker makes choices for many individuals based on a prediction of an unknown outcome using each individual’s characteristics. Under what conditions do the decision maker’s choices maximize expected utility at some (unknown to us) utility function and accurate beliefs about the outcome given the characteristics?

A decision maker makes choices for many individuals based on predictions of an unknown outcome. The decision maker selects a binary choice c ∈ {0, 1} for each individual. Each individual is described by characteristics (w, x) ∈ W × X and potential outcomes y⃗ := (y₀, y₁), where the potential outcome y_c ∈ Y is the outcome that would occur if the decision maker were to select choice c ∈ {0, 1}. The characteristics and potential outcomes have finite support, and I denote by (W, X, C, Y⃗) ~ P the joint distribution of the characteristics, the decision maker’s choices, and the potential outcomes across all individuals. I assume P(W = w, X = x) ≥ δ for all (w, x) ∈ W × X and some δ > 0.
The researcher observes the characteristics of each individual as well as the decision maker’s
choice. There is, however, a missing data problem: the researcher only observes the potential
outcome associated with the choice selected by the decision maker (Rubin, 1974; Holland, 1986).
Defining the observable outcome as Y := CY₁ + (1 − C)Y₀, the researcher therefore observes the joint distribution (W, X, C, Y) ~ P. I assume the researcher knows this population distribution with certainty to focus on the identification challenges in this setting. The researcher observes the conditional choice probabilities π_c(w, x) := P(C = c | W = w, X = x) and the conditional outcome probabilities P_Y(y | c, w, x) := P(Y = y | C = c, W = w, X = x).
15 The characteristics (w, x) ∈ W × X will play different roles in the expected utility maximization model, and so I introduce separate notation now.
The counterfactual potential outcome probabilities P_Y₀(y₀ | 1, w, x), P_Y₁(y₁ | 0, w, x) are not observed
due to the missing data problem.16 As a consequence, the choice-dependent potential outcome probabilities are only partially identified by the observable data.
Notation: For a finite set A, let ∆(A) denote the set of all probability distributions on A. For c ∈ {0, 1}, let P_Y⃗(· | c, w, x) ∈ ∆(Y²) denote the vector of conditional potential outcome probabilities given C = c and characteristics (w, x) ∈ W × X. For any pair c, c̃ ∈ {0, 1}, I write P_Yc(y_c | c̃, w, x) := P(Y_c = y_c | C = c̃, W = w, X = x) and P_Yc(· | c̃, w, x) ∈ ∆(Y) as the distribution of the potential outcome Y_c given C = c̃ and characteristics (w, x). Analogously, P_Y⃗(y⃗ | w, x) := P(Y⃗ = y⃗ | W = w, X = x) and P_Yc(· | w, x) ∈ ∆(Y) denote the distributions of potential outcomes given the characteristics (w, x) ∈ W × X.
Throughout the paper, I model the researcher’s assumptions about the missing data problem as known bounds on the choice-dependent potential outcome probabilities.

Assumption 1.2.1 (Bounds on Missing Data). For each c ∈ {0, 1} and (w, x) ∈ W × X, there exists a known set B_c,w,x ⊆ ∆(Y²) satisfying P_Y⃗(· | c, w, x) ∈ B_c,w,x. Let B_w,x denote the collection {B_0,w,x, B_1,w,x} and B denote the collection of B_w,x at all (w, x) ∈ W × X.
In some cases, researchers may wish to analyze the decision maker’s choices without placing any further assumptions on the missing data, which corresponds to setting B_c,w,x equal to the set of all choice-dependent potential outcome probabilities that are consistent with the joint distribution of the observable data (W, X, C, Y) ~ P. In other cases, researchers may use quasi-experimental variation to construct tighter bounds on the missing data.
Under Assumption 1.2.1, various features of the joint distribution (W, X, C, Y⃗) ~ P are partially identified. The sharp identified set for the distribution of the potential outcome vector given the characteristics (w, x) ∈ W × X, denoted by H_P(P_Y⃗(· | w, x); B_w,x), equals the set of P̃_Y⃗(· | w, x) ∈ ∆(Y²) satisfying

P̃_Y⃗(y⃗ | w, x) = P̃_Y⃗(y⃗ | 0, w, x) π₀(w, x) + P̃_Y⃗(y⃗ | 1, w, x) π₁(w, x)

for some P̃_Y⃗(· | 0, w, x) ∈ B_0,w,x and P̃_Y⃗(· | 1, w, x) ∈ B_1,w,x. If the bounds B_0,w,x and B_1,w,x are singletons for all (w, x) ∈ W × X, then the distribution of potential outcomes given the characteristics is point identified.

The main text of the paper assumes that the decision maker faces only two choices and that the characteristics have finite support. My results directly extend to settings with multiple choices, and I provide an extension to the case with continuous characteristics in Supplement A.6.
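For intuition, in the binary-outcome screening case the mixture formula above reduces to simple interval arithmetic. The sketch below is my own illustration with made-up numbers: it computes the identified interval for P(Y* = 1 | w, x) from the observed choice probability, the observed rate among selected individuals, and researcher-supplied bounds on the unobserved rate among the unselected.

```python
def outcome_interval(pi1, rate_selected, lb=0.0, ub=1.0):
    """Identified interval for P(Y* = 1 | w, x): the mixture
    pi1 * P(Y* = 1 | C = 1, w, x) + (1 - pi1) * P(Y* = 1 | C = 0, w, x),
    where the second conditional rate is unobserved and only known to
    lie in the researcher-supplied bounds [lb, ub]."""
    point = pi1 * rate_selected
    return (point + (1 - pi1) * lb, point + (1 - pi1) * ub)

# Worst-case bounds: no assumption on the missing data, so [lb, ub] = [0, 1].
print(outcome_interval(0.8, 0.25))
# Tighter quasi-experimental bounds on the unselected rate shrink the interval.
print(outcome_interval(0.8, 0.25, 0.1, 0.5))
```

With a choice probability of 0.8 and an observed rate of 0.25, the worst-case interval is [0.2, 0.4]; tightening the bounds on the missing rate to [0.1, 0.5] shrinks it to [0.22, 0.3].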
In this section, I illustrate how several motivating empirical applications map into this setting.
Example 1.2.1 (Medical Treatment). A doctor decides what medical treatment to give a patient based on a
prediction of how that medical treatment will affect the patient’s health outcomes (Chandra and Staiger, 2007;
Manski, 2017; Currie and Macleod, 2017; Abaluck et al., 2020; Currie and Macleod, 2020). For example,
a doctor decides whether to give reperfusion therapy C ∈ {0, 1} to an admitted patient that has suffered from a heart attack (Chandra and Staiger, 2020). The potential outcomes Y₀, Y₁ ∈ {0, 1} denote whether the patient would have died within 30 days of admission had the doctor not given or given reperfusion therapy, respectively. The characteristics (W, X) summarize rich information about the patient that is available at the time of admission, such as demographic information, collected vital signs, and the patient’s prior medical history. The researcher observes the doctor’s conditional treatment probability π₁(w, x), the 30-day mortality rate among patients that received reperfusion therapy P_Y₁(1 | 1, w, x), and the 30-day mortality rate among patients that did not receive reperfusion therapy P_Y₀(1 | 0, w, x). The counterfactual 30-day mortality rates P_Y₁(1 | 0, w, x) and P_Y₀(1 | 1, w, x) are unobserved. ▲
In a large class of empirical applications, the decision maker’s choice does not have a direct
causal effect on the outcome of interest, but still generates a missing data problem through selection. In these settings, the potential outcome given C = 0 satisfies Y₀ ≡ 0, and Y₁ := Y* is a latent outcome that is revealed whenever the decision maker selects choice C = 1. Hence, the observable outcome is Y := C · Y*. These screening decisions are a leading class of prediction policy problems (Kleinberg et al., 2015).
Example 1.2.2 (Pretrial Release). A judge decides whether to detain or release defendants C ∈ {0, 1} awaiting trial (Arnold et al., 2018; Kleinberg et al., 2018a; Arnold et al., 2020b). The latent outcome Y* ∈ {0, 1} is whether a defendant would commit pretrial misconduct if released. The characteristics (W, X) summarize information about the defendant that is available at the pretrial release hearing, such as demographic information, the current charges filed against the defendant, and the defendant’s prior criminal record. The researcher observes the characteristics of each defendant, whether the judge released them, and whether the defendant committed pretrial misconduct only if the judge released them. The judge’s conditional release rate π₁(w, x) and conditional pretrial misconduct rate among released defendants P_Y*(1 | 1, w, x) are observed. The conditional pretrial misconduct rate among detained defendants P_Y*(1 | 0, w, x) is unobserved. ▲
Example 1.2.3 (Medical Testing and Diagnosis). A doctor decides whether to conduct a costly medical
test or make a particular diagnosis (Abaluck et al., 2016; Ribers and Ullrich, 2019; Chan et al., 2021).
For example, shortly after an emergency room visit, a doctor decides whether to conduct a stress test on patients C ∈ {0, 1} to determine whether they had a heart attack (Mullainathan and Obermeyer, 2021). The latent outcome Y* ∈ {0, 1} is whether the patient had a heart attack. The characteristics (W, X) summarize information that is available about the patient, such as their demographics, reported symptoms, and prior medical history. The researcher observes the characteristics of each patient, whether the doctor conducted a stress test, and whether the patient had a heart attack only if the doctor conducted a stress test. The doctor’s conditional stress testing rate π₁(w, x) and the conditional heart attack rate among stress-tested patients P_Y*(1 | 1, w, x) are observed. The conditional heart attack rate among untested patients P_Y*(1 | 0, w, x) is unobserved. ▲
Example 1.2.4 (Hiring). A hiring manager decides whether to hire job applicants C ∈ {0, 1} (Autor and Scarborough, 2008; Chalfin et al., 2016; Hoffman et al., 2018; Frankel, 2021).17
17 The setting also applies to job interview decisions (Cowgill, 2018; Li et al., 2020), where the choice C ∈ {0, 1} is whether to interview an applicant and the outcome Y* ∈ {0, 1} is whether the applicant is ultimately hired by the firm.
The latent outcome Y* ∈ Y is some measure of on-the-job productivity, which may be length of tenure since turnover is costly. The characteristics (W, X) are various information about the applicant such as demographics, education level,
and prior work history. The researcher observes the characteristics of each applicant, whether the manager
hired the applicant, and their length of tenure only if hired. The manager’s conditional hiring rate π₁(w, x) and the conditional distribution of tenure lengths among hired applicants P_Y*(y* | 1, w, x) are observed. The distribution of tenure lengths among rejected applicants P_Y*(y* | 0, w, x) is unobserved. ▲
Other examples of a screening decision include loan approvals (e.g., Fuster et al., 2018; Dobbie
et al., 2020; Blattner and Nelson, 2021; Coston et al., 2021), child welfare screenings (Chouldechova
et al., 2018), and disability insurance screenings (e.g., Benitez-Silva et al., 2004; Low and Pistaferri,
2015, 2019).
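As a minimal illustration of this screening structure (synthetic data, my own sketch rather than anything from the paper), the simulation below shows how Y := C · Y* mechanically produces the missing data problem: the latent outcome is recorded only for selected individuals, and selection on the latent outcome makes the observed rate differ from the population rate.

```python
import random

random.seed(0)

# Screening decision: the latent outcome y_star is revealed only when the
# decision maker selects c = 1, so the researcher records y = c * y_star.
sample = []
for _ in range(10_000):
    y_star = 1 if random.random() < 0.3 else 0          # latent outcome
    # The decision maker screens out high-risk individuals more often.
    c = 1 if random.random() < (0.9 if y_star == 0 else 0.5) else 0
    sample.append((c, c * y_star))

selected = [y for c, y in sample if c == 1]
rejected = [y for c, y in sample if c == 0]
rate_selected = sum(selected) / len(selected)
print(round(rate_selected, 3))   # below the unconditional rate of 0.3
print(sum(rejected))             # 0: y_star is never observed when c = 0
```

The observed outcome rate among selected individuals understates the population rate precisely because the decision maker screens on the latent outcome, while the outcome among rejected individuals is entirely missing.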
I examine the restrictions imposed on the decision maker’s choices by expected utility maximization. I define the two main ingredients of the expected utility maximization model. A utility
function summarizes the decision maker’s payoffs over choices, outcomes, and characteristics.
Private information is some random variable V ∈ V that summarizes all additional information that is available to the decision maker but unobserved by the researcher.

Definition 1.2.1. A utility function U : {0, 1} × Y² × W → R specifies the payoff associated with each choice-outcome pair, where U(c, y⃗; w) is the payoff associated with choice c and potential outcome vector y⃗ at characteristics w ∈ W. Let U denote the feasible set of utility functions specified by the researcher.
Under the model, the decision maker observes the characteristics (W, X) as well as some private information V. In these empirical settings, researchers often worry that the decision maker observes additional private information that is not recorded in
the observable data. Doctors may learn useful information about the patient’s current health in
an exam. Judges may learn useful information about defendants from courtroom interactions.
But these interactions are often not recorded. Since it is unobservable to the researcher, I explore
restrictions on the decision maker’s behavior without placing distributional assumptions on their
private information.
Based on this information set, the decision maker forms beliefs about the unknown outcome
and selects a choice to maximize expected utility. The expected utility maximization model is
summarized by a joint distribution over the characteristics, private information, choices, and potential outcomes, which I denote by Q.

Definition 1.2.3. The decision maker’s choices are consistent with expected utility maximization if there exists a joint distribution Q over (W, X, V, C, Y⃗) satisfying:

i. Expected Utility Maximization: For all c ∈ {0, 1}, c′ ≠ c, and (w, x, v) ∈ W × X × V such that Q(c | w, x, v) > 0,

E_Q[U(c, Y⃗; W) | W = w, X = x, V = v] ≥ E_Q[U(c′, Y⃗; W) | W = w, X = x, V = v].

ii. Information Set: C ⊥ Y⃗ | W, X, V under Q.

iii. Data Consistency: For all (w, x) ∈ W × X, there exist P̃_Y⃗(· | 0, w, x) ∈ B_0,w,x and P̃_Y⃗(· | 1, w, x) ∈ B_1,w,x that are consistent with the joint distribution of (W, X, C, Y) under Q.

Definition 1.2.4. The identified set of utility functions, denoted by H_P(U; B) ⊆ U, is the set of utility functions U ∈ U at which the decision maker’s choices are consistent with expected utility maximization.
In words, the decision maker’s choices are consistent with expected utility maximization if
three conditions are satisfied. First, if the decision maker selects a choice c with positive probability
given W “ w, X “ x, V “ v under the model Q, then it must have been optimal to do so ("Expected
Utility Maximization"). The decision maker may flexibly randomize across choices whenever they
are indifferent. Second, the decision maker’s choices must be independent of the outcome given
the characteristics and private information under the model Q ("Information Set"), formalizing
the sense in which the decision maker’s information set consists of only (W, X, V).18 Finally, the joint distribution of characteristics, choices, and outcomes under the model Q must be consistent with the observed data and the researcher’s bounds on the missing data (“Data Consistency”).
18 The “Information Set” condition is related to sensitivity analyses in causal inference that assume there is some
unobserved confounder such that the decision is only unconfounded conditional on both the observable characteristics
and the unobserved confounder (e.g., Rosenbaum, 2002; Imbens, 2003; Kallus and Zhou, 2018; Yadlowsky et al., 2020).
See Supplement A.7.1 for further discussion of this connection.
The key restriction is that only the characteristics $W \in \mathcal{W}$ directly enter the decision maker's utility function. The decision maker's utility function satisfies an exclusion restriction: the remaining characteristics $X \in \mathcal{X}$ may only affect their beliefs. In medical treatment, the utility function specifies the doctor's payoffs from treating a patient given their potential health outcomes. Researchers commonly assume that a doctor's payoffs are constant across patients, and patient characteristics only affect beliefs about potential health outcomes under the treatment (e.g., Chandra and Staiger, 2007, 2020).19 In pretrial release, the utility function specifies a judge's relative payoffs from detaining a defendant that would not commit pretrial misconduct and releasing a defendant that would commit pretrial misconduct. These payoffs may vary based only on some defendant characteristics $W$. For example, the judge may engage in taste-based discrimination against black defendants (Becker, 1957; Arnold et al., 2018, 2020b), be more lenient towards younger defendants (Stevenson and Doleac, 2019), or be harsher towards certain defendants.
Since this is a substantive economic assumption, I discuss three ways to specify such exclusion restrictions on the decision maker's utility function. First, as mentioned, exclusion restrictions on the decision maker's utility function are common in existing empirical research. The researcher may therefore appeal to established modelling choices to guide this assumption. Second, institutional knowledge of the empirical setting may suggest what observable characteristics ought not to directly enter the decision maker's utility function. Third, the researcher may conduct a sensitivity analysis, reporting how their conclusions vary with the assumed exclusion restriction and how flexible the decision maker's utility function must be across observable characteristics to rationalize choices.
If Definition 1.2.3 is satisfied, then the decision maker's implied beliefs about the outcome given the observable characteristics under the expected utility maximization model, denoted by $Q_{\vec{Y}}(\cdot \mid w, x) \in \Delta(\mathcal{Y}^2)$, lie in the identified set for the distribution of the outcome given the
19 Similarly, in medical testing and diagnosis decisions, researchers assume that a doctor’s preferences are constant
across patients, and patient characteristics only affect beliefs about the probability of an underlying medical condition
(e.g., Abaluck et al., 2016; Chan et al., 2021; Mullainathan and Obermeyer, 2021).
observable characteristics. This is an immediate consequence of Data Consistency in Definition
1.2.3.
Lemma 1.2.1. If the decision maker's choices are consistent with expected utility maximization, then $Q_{\vec{Y}}(\cdot \mid w, x) \in H_P(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$.
Therefore, if the decision maker's choices are consistent with expected utility maximization, then their implied beliefs $Q_{\vec{Y}}(\cdot \mid w, x)$ must be "accurate" in this sense. Conversely, if the decision maker's choices are inconsistent with expected utility maximization, then there is no configuration of utility function and private information such that their choices would maximize expected utility given any implied beliefs in the identified set for the distribution of the outcome conditional on the characteristics. In this case, their implied beliefs are systematically mistaken.
Definition 1.2.5. The decision maker is making detectable prediction mistakes based on the observable characteristics if their choices are inconsistent with expected utility maximization, meaning $H_P(\mathcal{U}; \mathcal{B}) = \emptyset$.
Whether the decision maker is making detectable prediction mistakes under Definitions 1.2.3-1.2.5 is tied to both the researcher-specified bounds on the missing data $\mathcal{B}_{0,w,x}, \mathcal{B}_{1,w,x}$ (Assumption 1.2.1) and the feasible set of utility functions $\mathcal{U}$ (Definition 1.2.1). Less informative bounds on the missing data imply that expected utility maximization places fewer restrictions on behavior, as there are more candidate values of the missing choice-dependent potential outcome probabilities that may rationalize choices. Observed behavior that is consistent with expected utility maximization at bounds $\mathcal{B}_{0,w,x}, \mathcal{B}_{1,w,x}$ may, in fact, be inconsistent with expected utility maximization at alternative, tighter bounds $\tilde{\mathcal{B}}_{0,w,x}, \tilde{\mathcal{B}}_{1,w,x}$.20 Analogously, a larger feasible set of utility functions $\mathcal{U}$ implies that expected utility maximization places fewer restrictions on behavior, as the researcher is entertaining a larger set of utility functions that may rationalize choices. Definition 1.2.5 must therefore be interpreted as a prediction mistake that can be detected given the researcher's assumptions on both the missing data and the feasible set of utility functions.
20 Consider an extreme case in which $P_{\vec{Y}}(\cdot \mid w, x)$ is partially identified under bounds $\mathcal{B}_{0,w,x}, \mathcal{B}_{1,w,x}$ but point identified under alternative bounds $\tilde{\mathcal{B}}_{0,w,x}, \tilde{\mathcal{B}}_{1,w,x}$. Under Definitions 1.2.3-1.2.5, a detectable prediction mistake at bounds $\tilde{\mathcal{B}}_{0,w,x}, \tilde{\mathcal{B}}_{1,w,x}$ means that the decision maker's implied beliefs $Q_{\vec{Y}}(\cdot \mid w, x)$ do not equal the point identified quantity $P_{\vec{Y}}(\cdot \mid w, x)$, whereas a detectable prediction mistake at bounds $\mathcal{B}_{0,w,x}, \mathcal{B}_{1,w,x}$ means that the decision maker's implied beliefs $Q_{\vec{Y}}(\cdot \mid w, x)$ do not lie in the identified set $H_P(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$.
Remark 1.2.1. The expected utility maximization model relates to recent developments on Roy-style selection (Mourifie et al., 2019; Henry et al., 2020) and marginal outcome tests for taste-based discrimination (Bohren et al., 2020; Canay et al., 2020; Gelbach, 2021; Hull, 2021). Defining the expected benefit functions
$$\Lambda_0(w, x, v) = E_Q\left[U(0, \vec{Y}; W) \mid W = w, X = x, V = v\right], \quad \Lambda_1(w, x, v) = E_Q\left[U(1, \vec{Y}; W) \mid W = w, X = x, V = v\right],$$
the expected utility maximization model is a generalized Roy model that imposes that the observable characteristics $W \in \mathcal{W}$ enter into the utility function and affect beliefs, whereas the observable characteristics $X \in \mathcal{X}$ and private information $V \in \mathcal{V}$ only affect beliefs. The expected utility maximization model also takes no stand on how the decision maker resolves indifferences, and so it is an incomplete econometric model. ■
The decision maker's choices are consistent with expected utility maximization if and only if there exists a utility function $U \in \mathcal{U}$ and values of the missing data that satisfy a series of revealed preference inequalities.
Theorem 1.2.1. The decision maker's choices are consistent with expected utility maximization if and only if there exists a utility function $U \in \mathcal{U}$ and $\tilde{P}_{\vec{Y}}(\cdot \mid 0, w, x) \in \mathcal{B}_{0,w,x}$, $\tilde{P}_{\vec{Y}}(\cdot \mid 1, w, x) \in \mathcal{B}_{1,w,x}$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ satisfying
$$E_Q\left[U(c, \vec{Y}; W) \mid C = c, W = w, X = x\right] \geq E_Q\left[U(c', \vec{Y}; W) \mid C = c, W = w, X = x\right] \quad (1.1)$$
for all $c \in \{0, 1\}$, $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_c(w, x) > 0$ and $c' \neq c$, where the joint distribution $(W, X, C, \vec{Y}) \sim Q$ is given by $Q(w, x, c, \vec{y}) = \tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x) P(c, w, x)$.
Corollary 1.2.1. The identified set of utility functions $H_P(\mathcal{U}; \mathcal{B})$ is the set of all utility functions $U \in \mathcal{U}$ such that there exists $\tilde{P}_{\vec{Y}}(\cdot \mid 0, w, x) \in \mathcal{B}_{0,w,x}$, $\tilde{P}_{\vec{Y}}(\cdot \mid 1, w, x) \in \mathcal{B}_{1,w,x}$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ satisfying (1.1).
Theorem 1.2.1 provides a necessary and sufficient characterization of expected utility maximization behavior that only involves the data and the bounds on the choice-dependent potential outcome probabilities. Importantly, the characterization no longer depends on the decision maker's private information, which the researcher does not observe. The researcher can therefore construct bounds on the quantities of interest and statistically test whether these inequalities are satisfied. In the next section, I use this characterization to derive the testable implications of expected utility maximization in screening decisions.
The key insight is that checking whether observed choices are consistent with expected utility maximization is an information design problem (Bergemann and Morris, 2019; Kamenica, 2019). Observed choices are consistent with expected utility maximization at utility function $U(c, \vec{y}; w)$ if and only if, at each $(w, x) \in \mathcal{W} \times \mathcal{X}$, an information designer could induce a decision maker with accurate beliefs about the potential outcomes given the characteristics (i.e., some $Q_{\vec{Y}}(\cdot \mid w, x) \in H(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$) to take the observed choices by providing additional information to them via some information structure. The information structure is the decision maker's private information under the expected utility maximization model. Due to the missing data problem, I must check whether the information designer could induce the observed choices at any accurate beliefs in the identified set $H(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$. This is possible if and only if the revealed preference inequalities (1.1) are satisfied. In this sense, Theorem 1.2.1 builds on the "no-improving action switches" inequalities, which were originally derived by Caplin and Martin (2015) to analyze choice behavior over state-dependent lotteries. The potential outcome vector and characteristics of each individual can be interpreted as a payoff-relevant state that is partially observed by the decision maker. The decision maker's treatment choice can therefore be interpreted as a choice between state-dependent lotteries over her utility payoffs $U(c, \vec{y}; w)$. By incorporating the missing data problem, Theorem 1.2.1 extends this logic to observational settings that suffer from missing data problems.
The proof of sufficiency for Theorem 1.2.1 shows that if the revealed preference inequalities (1.1) hold, then there exists a construction of private information under the expected utility maximization model that satisfies Data Consistency (Definition 1.2.3).21 I construct a likelihood function for the private information $Q_V(\cdot \mid \vec{y}, w, x)$ such that whenever the decision maker finds it optimal to select choice $C = 0$ or $C = 1$, the decision maker's posterior beliefs equal the choice-dependent potential outcome probabilities $\tilde{P}_{\vec{Y}}(\cdot \mid 0, w, x)$ and $\tilde{P}_{\vec{Y}}(\cdot \mid 1, w, x)$,
21 While its proof is constructive, Theorem 1.2.1 can also be established through Bergemann and Morris (2016)'s equivalence result between the set of Bayesian Nash Equilibria and the set of Bayes Correlated Equilibria in incomplete information games, where the potential outcome vector $\vec{Y}$ and the characteristics $(W, X)$ are the state, the initial information structure is the null information structure, the private information $V$ is the augmenting signal structure, and Data Consistency (Definition 1.2.3) is applied to the equilibrium conditions.
respectively. By this construction, these choice-dependent potential outcome probabilities are Bayes-plausible posterior beliefs with respect to some conditional distribution for the potential outcomes given the characteristics in the identified set $H_P(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$. The revealed preference inequalities (1.1) imply that this construction satisfies Expected Utility Maximization, and additional work establishes that it also satisfies Data Consistency (Definition 1.2.3). In this sense, the choice-dependent potential outcome probabilities summarize the decision maker's posterior beliefs under any distribution of private information. The researcher's assumptions about the missing data therefore restrict the possible informativeness of the decision maker's private information.22
In this section, I apply Theorem 1.2.1 to characterize the testable implications of expected utility maximization in screening decisions with a binary outcome, such as pretrial release and medical testing, under various assumptions on the decision maker's utility function and the missing data problem. Testing these restrictions is equivalent to testing many moment inequalities, and I discuss how supervised machine learning based methods may be used to reduce the dimensionality of this testing problem.
In a screening decision, the potential outcome under the decision maker's choice $C = 0$ satisfies $Y_0 \equiv 0$, and $Y_1 := Y^*$ is a latent outcome that is revealed whenever the decision maker selects choice $C = 1$. For exposition, I further assume that the latent outcome is binary, $\mathcal{Y} = \{0, 1\}$, as in the motivating applications of pretrial release and medical testing. Appendix A.3.1 extends these results to non-binary outcomes.
Focusing on a screening decision with a binary outcome simplifies the setting in Section 1.2. The bounds on the choice-dependent latent outcome probabilities given choice $C = 0$ are an interval for the conditional probability of $Y^* = 1$ given choice $C = 0$, with $\mathcal{B}_{0,w,x} = [\underline{P}_{Y^*}(1 \mid 0, w, x), \overline{P}_{Y^*}(1 \mid 0, w, x)]$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$. The bounds given choice $C = 1$ are the point identified conditional probability of $Y^* = 1$ given choice $C = 1$, with $\mathcal{B}_{1,w,x} = \{P_{Y^*}(1 \mid 1, w, x)\}$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$. Finally, it is without loss of generality to normalize two entries of the utility function $U(c, y^*; w)$, and so I normalize $U(0, 1; w) = 0$, $U(1, 0; w) = 0$ for all $w \in \mathcal{W}$.23
22 In Supplement A.7.1, I formally show that the researcher's bounds on the missing data (Assumption 1.2.1) restrict the average informativeness of the decision maker's private information in a screening decision.
I derive conditions under which the decision maker's choices are consistent with expected utility maximization at strict preferences, meaning the decision maker is assumed to strictly prefer a unique choice at each latent outcome. This rules out trivial cases such as complete indifference.
Definition 1.3.1 (Strict Preferences). The utility functions $U \in \mathcal{U}$ satisfy strict preferences if $U(0, 0; w) < 0$ and $U(1, 1; w) < 0$ for all $w \in \mathcal{W}$.
In the pretrial release example, focusing on strict preference utility functions means that the researcher is willing to assume it is always costly for the judge to either detain a defendant ($C = 0$) that would not commit pretrial misconduct ($Y^* = 0$) or release a defendant ($C = 1$) that would commit pretrial misconduct ($Y^* = 1$).
By applying Theorem 1.2.1, I characterize the conditions under which the decision maker's choices in a screening decision with a binary outcome are consistent with expected utility maximization at some strict preference utility function and private information. For each $w \in \mathcal{W}$, define $\mathcal{X}_1(w) = \{x : \pi_1(w, x) > 0\}$ and $\mathcal{X}_0(w) = \{x : \pi_0(w, x) > 0\}$.
Theorem 1.3.1. Consider a screening decision with a binary outcome. Assume $P_{Y^*}(1 \mid 1, w, x) < 1$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_1(w, x) > 0$. The decision maker's choices are consistent with expected utility maximization at some strict preference utility function if and only if, for all $w \in \mathcal{W}$,
$$\max_{x \in \mathcal{X}_1(w)} P_{Y^*}(1 \mid 1, w, x) \leq \min_{x \in \mathcal{X}_0(w)} \overline{P}_{Y^*}(1 \mid 0, w, x).$$
Otherwise, $H_P(\mathcal{U}; \mathcal{B}) = \emptyset$, and the decision maker is making detectable prediction mistakes based on the observable characteristics.
Corollary 1.3.1. The identified set of strict preference utility functions $H_P(\mathcal{U}; \mathcal{B})$ equals the set of all utility functions satisfying, for all $w \in \mathcal{W}$, $U(0, 0; w) < 0$, $U(1, 1; w) < 0$ and
$$\max_{x \in \mathcal{X}_1(w)} P_{Y^*}(1 \mid 1, w, x) \;\leq\; \frac{U(0, 0; w)}{U(0, 0; w) + U(1, 1; w)} \;\leq\; \min_{x \in \mathcal{X}_0(w)} \overline{P}_{Y^*}(1 \mid 0, w, x).$$
In a screening decision with a binary outcome, expected utility maximization at strict preferences requires the decision maker to make choices according to an incomplete threshold rule based on their posterior beliefs given the characteristics and private information. The threshold only depends on the characteristics $W$, and it is incomplete since it takes no stand on how possible indifferences are resolved. Theorem 1.2.1 establishes that the choice-dependent latent outcome probabilities summarize all possible posterior beliefs under the expected utility maximization model. Applying an incomplete threshold rule to posterior beliefs under the expected utility maximization model is therefore equivalent to applying an incomplete threshold rule to the choice-dependent latent outcome probabilities. Theorem 1.3.1 formalizes this argument, and checks whether there exists some value of the unobservable choice-dependent latent outcome probabilities that is consistent with the researcher's bounds (Assumption 1.2.1) and would reproduce the decision maker's observed choices under such a threshold rule.
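To make the misranking condition in Theorem 1.3.1 concrete, the check can be written in a few lines once the relevant conditional probabilities have been estimated. The following is a minimal sketch, not the dissertation's own code: the function name and its dictionary inputs are hypothetical, standing in for point-identified misconduct rates among released groups and researcher-specified worst-case rates among detained groups, all at a fixed value of $w$.

```python
def detectable_mistake(p_y1_given_release, p_y1_upper_given_detain):
    """Check the misranking condition of Theorem 1.3.1 at one value of w.

    p_y1_given_release: dict mapping an excluded characteristic x to the
        point-identified P(Y*=1 | C=1, w, x) among released groups
        (only x with pi_1(w, x) > 0 are included).
    p_y1_upper_given_detain: dict mapping x to the researcher's upper bound
        on P(Y*=1 | C=0, w, x) among detained groups
        (only x with pi_0(w, x) > 0 are included).

    Returns True if the revealed preference inequality fails, i.e. some
    released group has a strictly higher misconduct rate than the worst-case
    rate of some detained group, so no strict preference threshold rule
    rationalizes the observed choices.
    """
    worst_released = max(p_y1_given_release.values())
    best_detained_upper = min(p_y1_upper_given_detain.values())
    return worst_released > best_detained_upper
```

A return value of True corresponds to a violated inequality; in practice this population comparison must be replaced by a moment inequality test that accounts for sampling uncertainty.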
If the conditions in Theorem 1.3.1 are violated, there exists no strict preference utility function nor private information such that the decision maker's choices are consistent with expected utility maximization at accurate beliefs. By examining cases in which these conditions are satisfied, I next characterize conditions under which we cannot identify whether the decision maker makes systematic prediction mistakes.
First, Theorem 1.3.1 highlights the necessity of placing an exclusion restriction on which observable characteristics directly affect the decision maker's utility function. If all observable characteristics directly affect the decision maker's utility function (i.e., $\mathcal{X} = \emptyset$), then the decision maker's choices are consistent with expected utility maximization under weak conditions on the researcher's bounds on the missing data.
Corollary 1.3.2. Under the same conditions as Theorem 1.3.1, suppose $\mathcal{X} = \emptyset$ and all observable characteristics therefore directly affect the decision maker's utility function. If $P_{Y^*}(1 \mid 1, w) \leq \overline{P}_{Y^*}(1 \mid 0, w)$ for all $w \in \mathcal{W}$, then the decision maker's choices are consistent with expected utility maximization at some strict preference utility function.
This negative result arises because a characteristic-dependent threshold can always be constructed that rationalizes the decision maker's observed choices if the researcher's assumptions allow the worst-case latent outcome probability given choice $C = 0$ to lie weakly above the observed latent outcome probability given choice $C = 1$ for all characteristics. In this case, the decision maker's observed choices are consistent with expected utility maximization at a strict preference utility function that varies richly across the characteristics $w \in \mathcal{W}$. If the researcher suspects that the decision maker observes useful private information, then imposing an exclusion restriction that some observable characteristics do not directly enter into the utility function is necessary for identifying prediction mistakes.
Unfortunately, imposing such an exclusion restriction alone is not sufficient to restore the testability of expected utility maximization. The researcher must still address the missing data problem.
Corollary 1.3.3. Under the same conditions as Theorem 1.3.1, if $\overline{P}_{Y^*}(1 \mid 0, w, x) = 1$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$, then the decision maker's choices are consistent with expected utility maximization at some strict preference utility function.
In other words, absent informative bounds on the missing data, the decision maker's choices may always be rationalized by the extreme case in which the decision maker's private information is perfectly predictive of the unknown outcome (i.e., $\overline{P}_{Y^*}(1 \mid 0, w, x) = 1$).
Corollaries 1.3.2-1.3.3 highlight that testing expected utility maximization, and therefore detectable prediction mistakes, requires both behavioral assumptions on which characteristics may directly affect the decision maker's utility function and econometric assumptions that generate informative bounds on the missing data. Under such assumptions, Theorem 1.3.1 provides interpretable conditions to test for detectable prediction mistakes. At any fixed $w \in \mathcal{W}$, does there exist some $x \in \mathcal{X}$ such that the largest possible latent outcome probability given $C = 0$ falls strictly below the observed latent outcome probability given $C = 1$ at some other $x' \in \mathcal{X}$? If so, then the decision maker cannot be maximizing expected utility, as the decision maker could do strictly better by raising their probability of selecting choice $C = 0$ at $x'$ and lowering their probability of selecting choice $C = 0$ at $x$. Theorem 1.3.1 shows that these "misranking" arguments are necessary and sufficient to test the joint null hypothesis that the decision maker's choices are consistent with expected utility maximization at accurate beliefs.
Example 1.3.1 (Pretrial Release). Theorem 1.3.1 requires that, holding fixed the defendant characteristics that directly affect the judge's utility function, all detained defendants must have a higher worst-case probability of committing pretrial misconduct than any released defendant. Suppose the researcher assumes that the judge may engage in taste-based discrimination based on the defendant race $W$, but the judge's utility function is unaffected by remaining defendant characteristics $X$ (Arnold et al., 2018, 2020b). To test for detectable prediction mistakes, the researcher must check whether, among defendants of the same race, there exists some group of released defendants with a higher pretrial misconduct rate than the worst-case pretrial misconduct rate of some group of detained defendants. In this case, the judge must be misranking the probability of pretrial misconduct, and their choices are inconsistent with expected utility maximization. ■
Remark 1.3.1 (Connection to Marginal Outcome Tests and Inframarginality). In a screening decision with a binary outcome, my analysis complements recent results in Canay et al. (2020), Gelbach (2021), and Hull (2021), which use an extended Roy model to explore the validity of marginal outcome tests for taste-based discrimination. This literature exploits the continuity of private information to derive testable implications of extended Roy selection in terms of underlying marginal treatment effect functions, which requires that the researcher identify the conditional expectation of the outcome at each possible "marginal" decision. In contrast, Theorem 1.3.1 involves only point identified and partially identified choice-dependent latent outcome probabilities. Common sources of quasi-experimental variation lead to informative bounds on these quantities without additional assumptions such as monotonicity or functional form restrictions. ■
Remark 1.3.2 (Approximate Expected Utility Maximization). This identification analysis assumes that the decision maker fully maximizes expected utility. The decision maker, however, may face cognitive constraints that prevent them from doing so completely. Appendix A.3.2 considers a generalization in which the decision maker "approximately" maximizes expected utility, meaning that the decision maker's choices must only be within $\epsilon_w \geq 0$ of optimal. The decision maker is boundedly rational in this sense, and selects a choice to "satisfice" expected utility (Simon, 1955, 1956). Expected utility maximization (Definition 1.2.3) is the special case with $\epsilon_w = 0$, and any behavior is rationalizable as approximately maximizing expected utility for $\epsilon_w$ sufficiently large. The smallest rationalizing parameter $\epsilon_w \geq 0$ summarizes how far from optimal the decision maker's observed choices are, and I show that researchers can conduct inference on this object. ■
Suppose there is a randomly assigned instrument that generates variation in the decision maker's choice probabilities in a screening decision. Such instruments commonly arise, for example, through the random assignment of decision makers to individuals. Judges are randomly assigned to defendants in pretrial release (Kling, 2006; Dobbie et al., 2018; Arnold et al., 2018; Kleinberg et al., 2018a; Arnold et al., 2020b), and doctors may be randomly assigned to patients in medical testing.24
Assumption 1.3.1 (Random Instrument). Let $Z \in \mathcal{Z}$ be a finite support instrument, and the joint distribution $(W, X, Z, C, Y^*) \sim P$ satisfy $Y^* \perp\!\!\!\perp Z \mid W, X$ with $P(w, x, z) > 0$ for all $(w, x, z) \in \mathcal{W} \times \mathcal{X} \times \mathcal{Z}$.
The goal is to construct bounds on the unobservable choice-dependent latent outcome probabilities using variation in the instrument. In the case where decision makers are randomly assigned to cases, this corresponds to constructing bounds that exploit variation in choice probabilities across decision makers. Under Assumption 1.3.1, the unobservable choice-dependent latent outcome probabilities are partially identified as follows.
Proposition 1.3.1. Suppose Assumption 1.3.1 holds and consider a screening decision with a binary outcome. Then, for any $(w, x, z) \in \mathcal{W} \times \mathcal{X} \times \mathcal{Z}$ with $\pi_0(w, x, z) > 0$, $H_P(P_{Y^*}(\cdot \mid 0, w, x, z)) = [\underline{P}_{Y^*}(1 \mid 0, w, x, z), \overline{P}_{Y^*}(1 \mid 0, w, x, z)]$, where
$$\underline{P}_{Y^*}(1 \mid 0, w, x, z) = \max\left\{ \frac{\underline{P}_{Y^*}(1 \mid w, x) - P_{C,Y^*}(1, 1 \mid w, x, z)}{\pi_0(w, x, z)}, \, 0 \right\},$$
$$\overline{P}_{Y^*}(1 \mid 0, w, x, z) = \min\left\{ \frac{\overline{P}_{Y^*}(1 \mid w, x) - P_{C,Y^*}(1, 1 \mid w, x, z)}{\pi_0(w, x, z)}, \, 1 \right\},$$
and $\underline{P}_{Y^*}(1 \mid w, x) = \max_{\tilde{z} \in \mathcal{Z}}\{P_{C,Y^*}(1, 1 \mid w, x, \tilde{z})\}$, $\overline{P}_{Y^*}(1 \mid w, x) = \min_{\tilde{z} \in \mathcal{Z}}\{\pi_0(w, x, \tilde{z}) + P_{C,Y^*}(1, 1 \mid w, x, \tilde{z})\}$.
24 There are other examples of instruments in empirical research. For example, Mullainathan and Obermeyer (2021) argue that doctors are less likely to conduct stress tests for a heart attack on Fridays and Saturdays due to weekend staffing constraints, even though patients that arrive on these days are no less risky. The introduction of or changes to recommended guidelines may also affect decision makers' choices (Albright, 2019; Abaluck et al., 2020).
The bounds in Proposition 1.3.1 follow from worst-case bounds on $P_{Y^*}(1 \mid w, x)$ (e.g., Manski, 1989, 1994) and point identification of $P_{C,Y^*}(1, 1 \mid w, x, z)$, $\pi_0(w, x, z)$.25,26 For a fixed value $z \in \mathcal{Z}$, the researcher may therefore apply the identification results derived in Section 1.3.1 by defining the bounds on the missing data at each instrument value. These results extend to instruments that are as-good-as randomly assigned conditional on some additional characteristics, which will be used in the empirical application to pretrial release decisions in New York City. Appendix A.4.2 extends these instrument-based bounds further.
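The bound construction in Proposition 1.3.1 maps directly to code once the point-identified quantities have been estimated. The sketch below is illustrative rather than the dissertation's own implementation; its function and variable names are my own, it works at a single fixed $(w, x)$ cell with arrays indexed by instrument values, and it ignores sampling uncertainty.

```python
import numpy as np

def instrument_bounds(p_release_misconduct, pi0):
    """Worst-case bounds on P(Y*=1 | C=0, w, x, z), following Proposition 1.3.1,
    at a fixed (w, x) cell using variation in a randomly assigned instrument z.

    p_release_misconduct: array over z of P(C=1, Y*=1 | w, x, z), the
        point-identified joint probability of release and misconduct.
    pi0: array over z of pi_0(w, x, z) = P(C=0 | w, x, z), assumed positive.
    Returns (lower, upper) arrays over z.
    """
    p_release_misconduct = np.asarray(p_release_misconduct, dtype=float)
    pi0 = np.asarray(pi0, dtype=float)
    # Worst-case bounds on the marginal P(Y*=1 | w, x), which cannot vary with z:
    p_marg_lo = p_release_misconduct.max()          # max_z P(C=1, Y*=1 | w, x, z)
    p_marg_hi = (pi0 + p_release_misconduct).min()  # min_z {pi_0 + P(C=1, Y*=1)}
    # Translate into bounds on the unobserved detained misconduct rate at each z:
    lower = np.clip((p_marg_lo - p_release_misconduct) / pi0, 0.0, None)
    upper = np.minimum((p_marg_hi - p_release_misconduct) / pi0, 1.0)
    return lower, upper
```

The two marginal bounds are shared across instrument values because random assignment pins down a single marginal distribution of the latent outcome given $(w, x)$; only the division by the detention probability is instrument-specific.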
Under the expected utility maximization model, Assumption 1.3.1 requires that the decision
maker’s beliefs about the latent outcome given the observable characteristics do not depend on
the instrument.
Proposition 1.3.2. Suppose Assumption 1.3.1 holds. If the decision maker's choices are consistent with expected utility maximization at some utility function $U$ and joint distribution $(W, X, Z, V, C, Y^*) \sim Q$, then $Y^* \perp\!\!\!\perp Z \mid W, X$ under $Q$.
This is a consequence of Data Consistency in Definition 1.2.3. Requiring that the decision maker’s
beliefs about the outcome given the characteristics be accurate imposes that the instrument cannot
affect their beliefs about the outcome given the observable characteristics if it is randomly assigned.
Aside from this restriction, Assumption 1.3.1 places no further behavioral restrictions on the
expected utility maximization model. Consider the pretrial release setting in which the instrument arises through the random assignment of judges, meaning $Z \in \mathcal{Z}$ refers to a judge identifier.
25 An active literature focuses on the concern that the monotonicity assumption in Imbens and Angrist (1994) may
be violated in settings where decision makers are randomly assigned to individuals. de Chaisemartin (2017) and
Frandsen et al. (2019) develop weaker notions of monotonicity for these settings. Proposition 1.3.1 imposes no form of
monotonicity.
26 Lakkaraju et al. (2017) and Kleinberg et al. (2018a) use the random assignment of decision makers to evaluate a statistical decision rule $\tilde{C}$ by imputing its true positive rate $P(Y^* = 1 \mid \tilde{C} = 1)$. In contrast, Proposition 1.3.1 constructs bounds on a decision maker's conditional choice-dependent latent outcome probabilities, $P_{Y^*}(1 \mid 0, w, x)$.
Proposition 1.3.2 implies that if all judges make choices as-if they are maximizing expected utility at accurate beliefs given defendant characteristics, then all judges must have the same beliefs about the probability of pretrial misconduct given defendant characteristics. Judges may still differ in their utility functions and in their private information.
Remark 1.3.3 (Other empirical strategies for constructing bounds). Supplement A.7 discusses two
additional empirical strategies for constructing bounds on the unobservable choice-dependent latent outcome
probabilities. First, researchers may use the observable choice-dependent latent outcome probabilities to
directly bound the unobservable choice-dependent latent outcome probabilities. Second, the researcher may
use a "proxy outcome," which does not suffer the missing data problem and is correlated with the latent outcome. For example, researchers often use future health outcomes as a proxy for whether patients had a particular underlying condition at the time of the medical testing or diagnostic decision (Chan et al., 2021). ■
Testing whether the decision maker's choices in a screening decision with a binary outcome are consistent with expected utility maximization at strict preferences (Theorem 1.3.1) reduces to testing a system of moment inequalities.
Proposition 1.3.3. Consider a screening decision with a binary outcome. Suppose Assumption 1.3.1 holds, and $0 < \pi_c(w, x, z) < 1$ for all $(w, x, z) \in \mathcal{W} \times \mathcal{X} \times \mathcal{Z}$. The decision maker's choices at $z \in \mathcal{Z}$ are consistent with expected utility maximization at some strict preference utility function if and only if, for all $w \in \mathcal{W}$, pairs $x, \tilde{x} \in \mathcal{X}$ and $\tilde{z} \in \mathcal{Z}$, the observed latent outcome probability $P_{Y^*}(1 \mid 1, w, x, z)$ lies weakly below the upper bound $\overline{P}_{Y^*}(1 \mid 0, w, \tilde{x}, z)$ implied by instrument value $\tilde{z}$ in Proposition 1.3.1.
The number of moment inequalities is equal to $d_w \cdot d_x^2 \cdot (d_z - 1)$, and grows with the number of support points of the characteristics and instrument. In empirical applications, this will be quite large since the characteristics of individuals are both finite-valued and extremely rich. A key challenge in directly testing this sharp characterization, therefore, is that the number of moment inequalities may be large relative to the sample size.
To deal with this challenge, the number of moment inequalities may be reduced by testing implied revealed preference inequalities over any partition of the excluded characteristics. For any partitioning function $D_w(\cdot)$, the researcher may group the excluded characteristics $x \in \mathcal{X}$ into level sets $\{x : D_w(x) = d\}$. By iterated expectations, if the decision maker's choices are consistent with expected utility maximization at some strict preference utility function, then their choices must satisfy implied revealed preference inequalities defined over the aggregated probabilities $P_{Y^*}(1 \mid c, w, d) := P(Y^* = 1 \mid C = c, W = w, D_w(X) = d)$.
Corollary 1.3.4. Consider a screening decision with a binary outcome, and assume $P_{Y^*}(1 \mid 1, w, x) < 1$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_1(w, x) > 0$. If the decision maker's choices are consistent with expected utility maximization at some strict preference utility function, then the misranking conditions of Theorem 1.3.1 must also hold over the level sets of any partition $D_w(\cdot)$ of the excluded characteristics.
In practice, this can drastically reduce the number of moment inequalities that must be tested. Researchers may test the implied inequalities in Corollary 1.3.4 using procedures that are valid in low-dimensional settings, which is a mature literature in econometrics (Canay and Shaikh, 2017; Ho and Rosen, 2017; Molinari, 2020).
A natural choice is to construct the partitioning functions $D_w(\cdot)$ using supervised machine learning methods that predict the outcome on a set of held-out decisions. Suppose the researcher estimates the prediction function $\hat{f} : \mathcal{W} \times \mathcal{X} \to [0, 1]$ on a set of held-out decisions. Given the estimated prediction function, the researcher may define $D_w(x)$ by binning the characteristics $X$ into percentiles of predicted risk within each value $w \in \mathcal{W}$. The resulting implied revealed preference inequalities search for misrankings in the decision maker's choices across percentiles of predicted risk. Alternatively, there may already exist a benchmark risk score. In pretrial release systems, the widely-used Public Safety Assessment summarizes observable defendant characteristics into an integer-valued risk score (e.g., Stevenson, 2018; Albright, 2019). In medical testing or treatment decisions, commonly used risk assessments summarize observable patient characteristics into an integer-valued risk score (e.g., Obermeyer and Emanuel, 2016; Lakkaraju and Rudin, 2017). The researcher may therefore define the partition $D_w(x)$ to be level sets associated with the benchmark risk score.
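As a rough illustration of this partition-based approach, the sketch below bins individuals into quantiles of a hypothetical out-of-sample predicted risk and checks the implied misranking inequality within one value of $w$. All names and inputs are my own for illustration, and the within-cell averaging of the detained upper bounds is a simplification of the iterated-expectations aggregation behind Corollary 1.3.4.

```python
import numpy as np

def partition_misranking_test(risk, released, outcome, detained_upper, n_bins=10):
    """Sketch of a partition-based misranking check within one value of the
    preference-relevant characteristics w. Inputs are per-individual arrays.

    risk: out-of-sample predicted risk f_hat(w, x) for each individual.
    released: boolean array, True where the choice was C = 1.
    outcome: binary Y*, meaningful only where released is True.
    detained_upper: per-individual upper bound on P(Y*=1 | C=0, w, x)
        (e.g. from Proposition 1.3.1), used only where released is False.
    Returns True if an implied revealed preference inequality is violated.
    Assumes at least one cell contains released individuals and at least one
    contains detained individuals.
    """
    # Partition D_w(x): quantile bins of predicted risk.
    edges = np.quantile(risk, np.linspace(0, 1, n_bins + 1)[1:-1])
    cell = np.digitize(risk, edges)

    released_rates, detained_bounds = [], []
    for d in range(n_bins):
        in_cell = cell == d
        if (in_cell & released).any():
            released_rates.append(outcome[in_cell & released].mean())
        if (in_cell & ~released).any():
            detained_bounds.append(detained_upper[in_cell & ~released].mean())
    # Misranking: some released cell is observably riskier than the
    # worst-case misconduct rate of some detained cell.
    return max(released_rates) > min(detained_bounds)
```

In an application, the cell-level comparison would be embedded in a formal moment inequality test rather than a point comparison of sample means.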
Corollary 1.3.4 clarifies how supervised machine learning methods can be used to formally
test for prediction mistakes in the large class of screening decisions. In existing empirical work,
researchers directly compare the recommended choices of an estimated decision rule against the
choices of a human decision maker. See, for example, Meehl (1954), Dawes et al. (1989), Grove
et al. (2000) in psychology, Kleinberg et al. (2018a), Ribers and Ullrich (2019), Mullainathan and
Obermeyer (2021) in economics, and Chouldechova et al. (2018), Jung et al. (2020a) in computer
science. Such comparisons rely on strong assumptions that restrict preferences to be constant
across decisions and decision makers (Kleinberg et al., 2018a; Mullainathan and Obermeyer, 2021)
or that observed choices were as-good-as randomly assigned given the characteristics (Lakkaraju
and Rudin, 2017; Chouldechova et al., 2018; Jung et al., 2020a). In contrast, I show that estimated
prediction models should not be used for such direct comparisons, but instead be used to construct
low-dimensional partitions of the characteristics that are assumed to not directly affect preferences.
Given such a partition, checking whether the implied revealed preference inequalities in Corollary 1.3.4 are satisfied provides, to the best of my knowledge, the first microfounded procedure for using out-of-sample prediction to test whether the decision maker's choices are consistent with expected utility maximization at accurate beliefs.
decision maker’s choices are consistent with expected utility maximization in a treatment decision
reduces to testing a system of moment inequalities with nuisance parameters that enter linearly.
So far, I have shown that researchers can test whether a decision maker’s choices are consistent
with expected utility maximization at accurate beliefs about the outcome, and therefore whether
the decision maker is making detectable prediction mistakes. By modifying the expected utility
maximization model and the revealed preference inequalities, researchers can further investigate
the ways in which the decision maker’s predictions are systematically biased.
1.4.1 Expected Utility Maximization at Inaccurate Beliefs
The definition of expected utility maximization (Definition 1.2.3) implied that the decision maker
acted as-if their implied beliefs about the outcome given the characteristics were accurate (Lemma
1.2.1). As a result, the revealed preference inequalities may be violated if the decision maker
acted as-if they maximized expected utility based on inaccurate beliefs, meaning that their implied
beliefs do not lie in the identified set for the distribution of the outcome given the characteristics
$\mathcal{H}_P(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$. This is a common behavioral hypothesis in empirical applications. Empirical
researchers conjecture that judges may systematically mis-predict failure to appear risk based
on defendant characteristics, and the same concern arises in analyses of medical diagnosis and
treatment decisions.27
To investigate whether the decision maker's choices maximize expected utility at inaccurate
beliefs, I modify Definition 1.2.3 as follows.
Definition 1.4.1. The decision maker's choices are consistent with expected utility maximization at
inaccurate beliefs if there exists some utility function $U \in \mathcal{U}$ and joint distribution $(W, X, V, C, \vec{Y}) \sim Q$ satisfying:
iii. Data Consistency with Inaccurate Beliefs: For all $(w, x) \in \mathcal{W} \times \mathcal{X}$, there exist $\tilde{P}_{\vec{Y}}(\cdot \mid 0, w, x) \in \mathcal{B}_{0,w,x}$ and $\tilde{P}_{\vec{Y}}(\cdot \mid 1, w, x) \in \mathcal{B}_{1,w,x}$ such that for all $\vec{y} \in \mathcal{Y}^2$ and $c \in \{0, 1\}$,
where $\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x) = \tilde{P}_{\vec{Y}}(\vec{y} \mid 0, w, x)\pi_0(w, x) + \tilde{P}_{\vec{Y}}(\vec{y} \mid 1, w, x)\pi_1(w, x)$.
Definition 1.4.1 requires that the decision maker's choices are consistent with
the joint distribution of the observable data $(W, X, C, Y) \sim P$ if the decision maker's model-implied
beliefs given the characteristics, $Q_{\vec{Y}}(\cdot \mid w, x)$, are replaced with some marginal distribution of the
outcome given the characteristics that lies in the identified set, $\tilde{P}_{\vec{Y}}(\cdot \mid w, x) \in \mathcal{H}_P(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$.
This imposes that the decision maker is acting as-if they correctly specified the likelihood of
the private information, while possibly holding misspecified prior beliefs about the outcome given
the characteristics.
27 Kleinberg et al. (2018a) write, “a primary source of error is that all quintiles of judges misuse the signal available
in defendant characteristics available in our data” (pg. 282-283). In the medical treatment setting, Currie and Macleod
(2017) write, “we are concerned with doctors, who for a variety of possible reasons, do not make the best use of the
publicly available information at their disposal to make good decisions” (pg. 5).
Notice that Definition 1.4.1 places
no restrictions on the decision maker's implied prior beliefs $Q_{\vec{Y}}(\cdot \mid w, x)$, so behavior that is
consistent with expected utility maximization at inaccurate beliefs could arise from several
behavioral hypotheses. The decision maker’s implied prior beliefs may, for instance, be inaccurate
due to inattention to the characteristics (e.g., Sims, 2003; Gabaix, 2014; Caplin and Dean, 2015) or
use of representativeness heuristics (e.g., Gennaioli and Shleifer, 2010; Bordalo et al., 2016, 2021).
The next result characterizes whether the decision maker’s observed choices are consistent
with expected utility maximization at inaccurate beliefs in terms of modified revealed preference
inequalities.
Theorem 1.4.1. Assume $\tilde{P}_{\vec{Y}}(\cdot \mid w, x) > 0$ for all $\tilde{P}_{\vec{Y}}(\cdot \mid w, x) \in \mathcal{H}_P(P_{\vec{Y}}(\cdot \mid w, x); \mathcal{B}_{w,x})$ and all
$(w, x) \in \mathcal{W} \times \mathcal{X}$. The decision maker's choices are consistent with expected utility maximization at
inaccurate beliefs if and only if there exist a utility function $U \in \mathcal{U}$, distributions $\tilde{P}_{\vec{Y}}(\cdot \mid 0, w, x) \in \mathcal{B}_{0,w,x}$ and
$\tilde{P}_{\vec{Y}}(\cdot \mid 1, w, x) \in \mathcal{B}_{1,w,x}$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$, and non-negative weights $\omega(\vec{y}; w, x)$ satisfying
the modified revealed preference inequalities, where $E_{\tilde{P}}[\cdot]$ denotes the expectation under the joint distribution $(W, X, C, \vec{Y}) \sim \tilde{P}$ obtained by combining the selected distributions $\tilde{P}_{\vec{Y}}(\cdot \mid c, w, x)$ with the observed distribution of $(W, X, C)$.
These modified inequalities ask whether the revealed preference inequalities are satisfied at a
reweighted utility function, where the weights $\omega(\vec{y}; w, x)$ are the likelihood ratio of the decision
maker's implied prior beliefs relative to some conditional distribution of the potential outcomes
given the characteristics in the identified set. Since the decision maker's prediction mistakes only
arise from misspecification of the prior beliefs $Q_{\vec{Y}}(\cdot \mid w, x)$, her posterior beliefs under the model are
proportional to the likelihood ratio between her prior beliefs and the underlying potential outcome
distribution. These likelihood ratio weights can be interpreted as a modification of the decision
maker's utility function since expected utility is linear in beliefs and utility. This suggests that
expected utility maximization at inaccurate beliefs is equivalent to expected utility maximization at
accurate beliefs under preferences summarized by this reweighted utility function. Theorem
1.4.1 shows that this intuition completely characterizes expected utility maximization at inaccurate
beliefs.
Theorem 1.4.1 implies that researchers can bound the extent to which the decision maker's prior
beliefs deviate from accurate beliefs in a screening decision with a binary outcome.28 This enables researchers to report interpretable
bounds on the extent to which the decision maker's predictions are systematically biased.
As a first step, Theorem 1.4.1 implies bounds on the decision maker's reweighted utility threshold.
Theorem 1.4.2. Consider a binary screening decision, and assume $0 < \pi_1(w, x) < 1$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$. Suppose the decision maker's choices are consistent with expected utility maximization at inaccurate
beliefs and some strict preference utility function $U(c, y^*; w)$ at $\tilde{P}_{Y^*}(\cdot \mid w, x) \in \mathcal{H}_P(P_{Y^*}(\cdot \mid w, x); \mathcal{B}_{w,x})$
satisfying $0 < \tilde{P}_{Y^*}(1 \mid w, x) < 1$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$. Then, there exist non-negative weights $\omega(y^*; w, x)$ such that
$$P_{Y^*}(1 \mid 1, w, x) \le \frac{\omega(0; w, x)U(0, 0; w)}{\omega(0; w, x)U(0, 0; w) + \omega(1; w, x)U(1, 1; w)} \le P_{Y^*}(1 \mid 0, w, x), \tag{1.2}$$
where $\omega(y^*; w, x) = Q_{Y^*}(y^* \mid w, x) / \tilde{P}_{Y^*}(y^* \mid w, x)$ and $Q_{Y^*}(y^* \mid w, x)$, $\tilde{P}_{Y^*}(y^* \mid w, x)$ are defined in
Definition 1.4.1.
That is, in a screening decision with a binary outcome, expected utility maximization at
inaccurate beliefs again implies threshold bounds on
choice-dependent latent outcome probabilities, where the threshold now depends on the decision
maker's reweighted utility function. This result may be exploited to derive an identified set on the
extent to which the decision maker overreacts or underreacts to variation in the characteristics.
28 These results generalize to multi-valued outcomes over the class of binary-valued utility functions. That is, for some
known $\tilde{\mathcal{Y}} \subseteq \mathcal{Y}$, define $\tilde{Y}^* = 1\{Y^* \in \tilde{\mathcal{Y}}\}$. The class of two-valued utility functions takes the form $u(c, y^*; w) := u(c, \tilde{y}; w)$
and satisfies strict preferences. Over this class of utility functions, the decision maker faces a screening decision with a
binary outcome.
Define
$$\delta(w, x) := \frac{Q_{Y^*}(1 \mid w, x)/Q_{Y^*}(0 \mid w, x)}{\tilde{P}_{Y^*}(1 \mid w, x)/\tilde{P}_{Y^*}(0 \mid w, x)}$$
to be the relative odds ratio of the unknown outcome under the decision maker's implied beliefs
relative to the true conditional distribution, and
$$\tau(w, x) := \frac{\omega(0; w, x)U(0, 0; w)}{\omega(0; w, x)U(0, 0; w) + \omega(1; w, x)U(1, 1; w)}$$
to be the decision maker's reweighted utility threshold. If the reweighted utility threshold were
known, then the decision maker's implied prediction mistake could be computed as
$$\frac{\delta(w, x)}{\delta(w, x')} = \frac{(1 - \tau(w, x))/\tau(w, x)}{(1 - \tau(w, x'))/\tau(w, x')} \tag{1.3}$$
for any $w \in \mathcal{W}$, $x, x' \in \mathcal{X}$.
The ratio $\delta(w, x)/\delta(w, x')$ summarizes the extent to which the decision maker's implied beliefs about
the outcome overreact or underreact to variation in the characteristics relative to the true conditional
distribution. In particular, $\delta(w, x)/\delta(w, x')$ may be rewritten as the ratio of
$$\frac{Q_{Y^*}(1 \mid w, x)/Q_{Y^*}(0 \mid w, x)}{Q_{Y^*}(1 \mid w, x')/Q_{Y^*}(0 \mid w, x')} \quad \text{to} \quad \frac{\tilde{P}_{Y^*}(1 \mid w, x)/\tilde{P}_{Y^*}(0 \mid w, x)}{\tilde{P}_{Y^*}(1 \mid w, x')/\tilde{P}_{Y^*}(0 \mid w, x')}.$$
The first term summarizes how the odds ratio of $Y^* = 1$ relative to $Y^* = 0$
varies across $(w, x)$ and $(w, x')$ under the decision maker's implied beliefs, and the second term
summarizes how the true odds ratio varies across the same characteristics. If $\delta(w, x)/\delta(w, x')$ is less
than one, then the decision maker's implied beliefs about the relative probability of $Y^* = 1$ versus
$Y^* = 0$ react less to variation across the characteristics $(w, x)$ and $(w, x')$ than the true distribution.
In this sense, their implied beliefs are underreacting across these characteristics. Analogously, if
$\delta(w, x)/\delta(w, x')$ is strictly greater than one, then the decision maker's implied beliefs about the relative
probability of $Y^* = 1$ versus $Y^* = 0$ are overreacting across the characteristics $(w, x)$ and $(w, x')$.
Since Theorem 1.4.2 provides an identified set for the reweighted utility thresholds, an identified
set for the implied prediction mistake $\delta(w, x)/\delta(w, x')$ can in turn be constructed by computing the
ratio (1.3) for each pair $\tau(w, x)$, $\tau(w, x')$ that satisfies (1.2).
The implied prediction mistake $\delta(w, x)/\delta(w, x')$ summarizes how relative changes in the decision
maker's beliefs across the characteristics $(w, x)$, $(w, x')$ compare to relative changes in the underlying
distribution of outcomes. A particular value of the implied prediction mistake could therefore be
consistent with multiple values of the decision maker's perceptions of risk $Q_{Y^*}(1 \mid w, x)$, $Q_{Y^*}(1 \mid w, x')$.
As an example, suppose that $\tilde{P}_{Y^*}(1 \mid w, x) = 4/5$, $\tilde{P}_{Y^*}(1 \mid w, x') = 1/5$ and the decision
maker's perceptions of baseline risk are $Q_{Y^*}(1 \mid w, x) = 2/3$, $Q_{Y^*}(1 \mid w, x') = 1/3$. The true odds
ratios of $Y^* = 1$ relative to $Y^* = 0$ at the characteristics are $\tilde{P}_{Y^*}(1 \mid w, x)/\tilde{P}_{Y^*}(0 \mid w, x) = 4$ and
$\tilde{P}_{Y^*}(1 \mid w, x')/\tilde{P}_{Y^*}(0 \mid w, x') = 1/4$. The decision maker's perceived odds ratios are
$Q_{Y^*}(1 \mid w, x)/Q_{Y^*}(0 \mid w, x) = 2$ and $Q_{Y^*}(1 \mid w, x')/Q_{Y^*}(0 \mid w, x') = 1/2$. In this case, the decision
maker's implied prediction mistake $\delta(w, x)/\delta(w, x')$ equals $1/4$. If instead the decision maker's
perceptions of baseline risk were $Q_{Y^*}(1 \mid w, x) = 3/4$, $Q_{Y^*}(1 \mid w, x') = 3/7$, then the decision maker's
perceived odds ratios would equal $3$ and $3/4$ at characteristics $(w, x)$, $(w, x')$ respectively. Even though the
decision maker's perceptions of baseline risk are different, the implied prediction mistake again
equals $1/4$. In both cases, the decision maker's perceptions across characteristics $(w, x)$, $(w, x')$
underreact relative to the true variation in risk. The true odds ratio at $(w, x)$ is 16 times larger
than the true odds ratio at $(w, x')$, but the decision maker perceives it to be only 4 times larger.
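As a numerical check on this example, the implied prediction mistake can be computed directly from the perceived and true baseline risks. The sketch below is purely illustrative; the function and variable names are mine, not the dissertation's:

```python
def odds(p):
    """Odds of Y* = 1 relative to Y* = 0 given probability p."""
    return p / (1.0 - p)

def implied_prediction_mistake(q_x, q_xp, p_x, p_xp):
    """delta(w,x)/delta(w,x'): how the perceived odds vary across the
    characteristics relative to how the true odds vary."""
    delta_x = odds(q_x) / odds(p_x)     # delta(w, x)
    delta_xp = odds(q_xp) / odds(p_xp)  # delta(w, x')
    return delta_x / delta_xp

# True baseline risks from the example: odds of 4 and 1/4.
p_x, p_xp = 4 / 5, 1 / 5

# Two different perceptions of baseline risk yield the same implied
# prediction mistake (underreaction by a factor of four).
m1 = implied_prediction_mistake(2 / 3, 1 / 3, p_x, p_xp)  # perceived odds 2 and 1/2
m2 = implied_prediction_mistake(3 / 4, 3 / 7, p_x, p_xp)  # perceived odds 3 and 3/4
print(m1, m2)  # both approximately 1/4
```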
Finally, these bounds on the implied prediction mistake are obtained only by assuming that
the decision maker's utility function satisfies the researcher's conjectured exclusion restriction
and strict preferences. Under such an exclusion restriction, variation in the decision maker's
choices across excluded characteristics must only arise due to variation in beliefs. Examining
how choice-dependent latent outcome probabilities vary across characteristics relative to the
decision maker’s choices is therefore informative about the decision maker’s systematic prediction
mistakes. In this sense, preference exclusion restrictions are sufficient to partially identify the
extent to which variation in the decision maker's implied beliefs is biased.29
After applying the dimension reduction strategy in Section 1.3.3, the implied prediction
mistake across values $D_w(X) = d$, $D_w(X) = d'$, denoted by $\delta(w, d)/\delta(w, d')$, measures how the
decision maker's implied beliefs of their own ex-post mistakes vary relative to the true probability
of ex-post mistakes across values $D_w(X) = d$, $D_w(X) = d'$. See Appendix A.3.3 for details.
As an empirical illustration, I apply this econometric framework to analyze the pretrial release
decisions of judges in New York City, which is a leading example of a high-stakes screening
29 This result relates to Proposition 1 in Martin and Marx (2021), which shows that utilities and prior beliefs are
not separately identified in state-dependent stochastic choice environments (see also Bohren et al. (2020)). Their result
arises because the authors focus on settings in which there are no additional characteristics of decisions beyond those
which affect both utility and beliefs.
decision.30 I find that at least 20% of judges in New York City make detectable prediction mistakes
in their pretrial release decisions. Under various exclusion restrictions on their utility functions,
their pretrial release decisions are inconsistent with expected utility maximization at accurate
beliefs about failure to appear risk given defendant characteristics. These systematic prediction
mistakes arise because judges fail to distinguish between predictably low risk and predictably
high risk defendants. Rejections of expected utility maximization at accurate beliefs are driven
primarily by release decisions on defendants at the tails of the predicted risk distribution.
I focus on the pretrial release system in New York City, which has been previously studied in
Leslie and Pope (2017), Kleinberg et al. (2018a) and Arnold et al. (2020b). The New York City
pretrial system is an ideal setting to apply this econometric framework in three important ways.
First, as discussed in Kleinberg et al. (2018a), the New York City pretrial system narrowly asks
judges to only consider failure to appear risk, not new criminal activity or public safety risk,
in deciding whether to release a defendant. The latent outcome $Y^*$ is therefore whether the
defendant would fail to appear in court if released. Section 1.5.4 reports the robustness of my
empirical findings to other choices of outcome $Y^*$. Second, the pretrial release system in New
York City is one of the largest in the country, and consequently I observe many judges making a
large number of pretrial release decisions. Finally, judges in New York City are quasi-randomly
assigned to cases within court-by-time cells, which implies bounds on the conditional failure to
appear rate among detained defendants.
I observe the universe of all arrests made in New York City between November 1, 2008 and
November 1, 2013. This contains information on 1,460,462 cases, of which 758,027 cases were
subject to a pretrial release decision.31 I apply additional sample restrictions to construct the main
30 Because the data are sensitive and only available through an official data sharing agreement with the New York
court system, I conducted this empirical analysis in conjunction with the University of Chicago Crime Lab (Rambachan
and Ludwig, 2021).
31 To construct the set of cases that were subject to a pretrial release decision, I apply the sample restrictions used in
Kleinberg et al. (2018a). This removes (i) desk appearance tickets, (ii) cases that were disposed at arraignment,
estimation sample, which consists of 569,256 cases heard by 265 unique judges.32,33 My empirical
analysis tests whether each of the top 25 judges that heard the most cases make detectable
prediction mistakes about failure to appear risk in their pretrial release decisions. These top 25
judges heard 243,118 cases in the main estimation sample. Each judge heard at least 5,000 cases in
this sample.
For each case, I observe demographic information about the defendant such as their race,
gender, and age, information about the current charge filed against the defendant, the defendant’s
criminal record, and the defendant’s record of pretrial misconduct. I observe a unique identifier
for the judge assigned to each defendant, and whether the assigned judge released or detained
the defendant.34 If the defendant was released, I observe whether the defendant either failed to
appear in court or was re-arrested for a new crime.
Supplement Table A.2 provides descriptive statistics about the main estimation sample and
the cases heard by the top 25 judges, broken out by the race of the defendant. Overall, 72.0% of
defendants are released in the main estimation sample, whereas 73.6% of defendants assigned
to the top 25 judges were released. Defendants in the main estimation sample are similar on
demographic information and current charge information to defendants assigned to the top 25
judges. However, defendants assigned to the top 25 judges have less extensive prior criminal
records. Supplement Table A.3 reports the same descriptive statistics broken out by whether the
defendant was released or detained, revealing that judges respond to defendant characteristics in
their release decisions. Among defendants assigned to the top 25 judges, released and detained
and (iii) cases that were adjourned in contemplation of dismissal as well as duplicate cases.
32 The following cases are excluded: (i) cases involving non-white and non-black defendants; (ii) cases assigned to
judges with fewer than 100 cases; and (iii) cases heard in a court-by-time cell in which there were fewer than 100 cases
or only one unique judge, where a court-by-time cell is defined at the assigned courtroom by shift by day of week by
month by year level.
33 Supplement Tables A.5-A.6 compare the main estimation sample to the universe of 758,027 cases that were subject
to a pretrial release decision, broken out by the race of the defendant and by whether the defendant was released.
Cases in the main estimation sample have more severe charges and a lower release rate than the universe of cases
subject to a pretrial release decision.
34 Judges in New York City decide whether to release defendants without conditions (“release on recognizance”),
require that defendants post a chosen amount of bail, or deny bail altogether. Following prior empirical work on the
pretrial release setting (e.g., Arnold et al., 2018; Kleinberg et al., 2018a; Arnold et al., 2020b), I collapse these choices into
the binary decision of whether to release (either release on recognizance or set a bail amount that the defendant pays) or
detain (either set a bail amount that the defendant does not pay or deny bail altogether). I report the robustness of my
findings to this choice in Section 1.5.4.
defendants differ demographically: 50.8% of released defendants are white and 19.7% are female,
whereas only 40.7% of detained defendants are white and only 10.6% are female. Released and
detained defendants also differ on their current charge and criminal record. For example, only
28.8% of defendants released by the top 25 judges face a felony charge, yet 58.6% of detained
defendants do.
I test whether the observed release decisions of judges in New York City maximize expected utility
at accurate beliefs about failure to appear risk given defendant characteristics as well as some
private information under various exclusion restrictions. I test whether the revealed preference
inequalities are satisfied assuming that either (i) no defendant characteristics, (ii) the defendant’s
race, (iii) the defendant’s race and age, or (iv) the defendant’s race and whether the defendant
was charged with a felony offense directly affect the judges’ utility function. I discretize age into
young and older defendants, where older defendants are those older than 25 years.
As a first step, I construct a partition of the excluded characteristics $X \in \mathcal{X}$ to reduce the number
of moment inequalities as described in Section 1.3.3. I predict failure to appear $Y^* \in \{0, 1\}$ among
defendants released by all other judges within each value of the payoff-relevant characteristics
$W \in \mathcal{W}$, which are defined as either race-by-age cells or race-by-felony charge cells. The prediction
function is an ensemble that averages the predictions of an elastic net model and a random
forest.35 Over defendants released by the top 25 judges, the ensemble model achieves an area
under the receiver operating characteristic (ROC) curve, or AUC, of 0.693 when the payoff-relevant
characteristics are defined as race-by-age cells and an AUC of 0.694 when the payoff relevant
characteristics are defined as race-by-felony charge cells (see Supplement Figure A.9). Both
ensemble models achieve similar performance on released black and white defendants.
35 I use three-fold cross-validation to tune the penalties for the elastic net model. The random forest is constructed
using the R package ranger at the default hyperparameter values (Wright and Ziegler, 2017).
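The reported AUC admits a simple pairwise interpretation: it is the probability that a randomly chosen released defendant who failed to appear receives a higher predicted risk than a randomly chosen released defendant who did not. The following sketch illustrates that calculation and the ensemble averaging with made-up scores, not the estimation sample (the actual models were fit with an elastic net and the R package ranger):

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    positive-negative pairs ranked correctly (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks from the two models, averaged into an
# ensemble score as in the text.
elastic_net = [0.2, 0.6, 0.4, 0.8, 0.1]
random_forest = [0.3, 0.7, 0.3, 0.9, 0.2]
ensemble = [(a + b) / 2 for a, b in zip(elastic_net, random_forest)]

fta = [0, 1, 0, 1, 1]  # observed failure to appear among released defendants
print(auc(fta, ensemble))  # 4 of 6 positive-negative pairs correctly ordered: 2/3
```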
Constructing Bounds through the Quasi-Random Assignment of Judges
Judges in New York City are quasi-randomly assigned to cases within court-by-time cells defined
at the assigned courtroom by shift by day of week by month by year level.36 To verify quasi-
random assignment, I conduct balance checks that regress a measure of judge leniency on a
rich set of defendant characteristics as well as court-by-time fixed effects that control for the
level at which judges are as-if randomly assigned to cases. I measure judge leniency using the
leave-one-out release rate among all other defendants assigned to a particular judge (Dobbie et al.,
2018; Arnold et al., 2018, 2020b). I conduct these balance checks pooling together all defendants
and separately within each payoff-relevant characteristic cell (defined by race-by-age cells or
race-by-felony charge cells), reporting the coefficient estimates in Supplement Tables A.7-A.9. In
each subsample, the estimated coefficients are economically small in magnitude. A joint F-test
fails to reject the null hypothesis of quasi-random assignment for the pooled main estimation
sample and for all subsamples, except for young black defendants.
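The leave-one-out leniency measure used in these balance checks can be sketched as follows; the case records here are hypothetical, and the actual construction additionally conditions on court-by-time cells:

```python
from collections import defaultdict

def leave_one_out_leniency(cases):
    """For each case (judge_id, released), return the release rate among
    all *other* cases heard by the same judge, or None for a judge who
    heard only one case."""
    totals = defaultdict(lambda: [0, 0])  # judge -> [n_cases, n_released]
    for judge, released in cases:
        totals[judge][0] += 1
        totals[judge][1] += released
    leniency = []
    for judge, released in cases:
        n, released_total = totals[judge]
        leniency.append((released_total - released) / (n - 1) if n > 1 else None)
    return leniency

# Hypothetical records: judge A released 2 of 3 cases, judge B 2 of 2.
cases = [("A", 1), ("A", 0), ("A", 1), ("B", 1), ("B", 1)]
print(leave_one_out_leniency(cases))  # [0.5, 1.0, 0.5, 1.0, 1.0]
```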
I use the quasi-random assignment of judges to construct bounds on the unobservable failure to
appear rate among defendants detained by each judge in the top 25. I group judges into quintiles
of leniency based on the constructed leniency measure, and define the instrument $Z \in \mathcal{Z}$ to be the
leniency quintile of the assigned judge. Applying the results in Appendix A.4.1, the bound on
the failure to appear rate among defendants with $W = w$, $D_w(X) = d$ for a particular judge is constructed using the judge leniency instrument, where
$T \in \mathcal{T}$ denotes the court-by-time cells and the expectation averages over all cases assigned to this
judge. I estimate the linear regressions
$$1\{C = 1, Y^* = 1\} = \sum_{w,d,z} \beta^{c,y^*}_{w,d,z} 1\{W = w, D_w(X) = d, Z = z\} + \phi_t + \epsilon, \tag{1.4}$$
$$1\{C = 0\} = \sum_{w,d,z} \beta^{c}_{w,d,z} 1\{W = w, D_w(X) = d, Z = z\} + \phi_t + \nu, \tag{1.5}$$
36 There are two relevant features of the pretrial release system in New York City that suggest judges are as-if
randomly assigned to cases (Leslie and Pope, 2017; Kleinberg et al., 2018a; Arnold et al., 2020b). First, bail judges are
assigned to shifts in each of the five county courthouses in New York City based on a rotation calendar system. Second,
there is limited scope for public defenders or prosecutors to influence which judge will decide any particular case.
over all cases in the main estimation sample, where $\phi_t$ are court-by-time fixed effects.37 I estimate
the relevant quantities by taking the estimated coefficients $\hat{\beta}^{c}_{w,d,\tilde{z}}$, $\hat{\beta}^{c,y^*}_{w,d,\tilde{z}}$ and adding them to the average
of the respective fixed effects associated with cases heard by the judge within each $W = w$,
$D_w(X) = d$ cell.
Figure 1.1: Observed failure to appear rate among released defendants and constructed bound on
the failure to appear rate among detained defendants by race-and-age cells for one judge in New
York City.
Notes: This figure plots the observed failure to appear rate among released defendants (orange, circles) and the bounds
on the failure to appear rate among detained defendants based on the judge leniency instrument (blue, triangles)
at each decile of predicted failure to appear risk and race-by-age cell for the judge that heard the most cases in the
main estimation sample. The judge leniency instrument $Z \in \mathcal{Z}$ is defined as the assigned judge's quintile of the
constructed, leave-one-out leniency measure. Judges in New York City are quasi-randomly assigned to defendants
within court-by-time cells. The bounds on the failure to appear rate among detained defendants (blue, triangles) are
constructed using the most lenient quintile of judges, and by applying the instrument bounds for a quasi-random
instrument (see Appendix A.4.1). Section 1.5.3 describes the estimation details for these bounds. Source: Rambachan
and Ludwig (2021).
Figure 1.1 plots the observed failure to appear rate among defendants released by the judge
that heard the most cases in the sample period, as well as the bounds on the failure to appear
rate among detained defendants associated with the most lenient quintile of judges. The bounds
are plotted at each decile of predicted risk for each race-by-age cell. Testing whether this judge’s
pretrial release decisions are consistent with expected utility maximization at accurate beliefs
37 In estimating these fixed effects regressions, I follow the empirical specification in Arnold et al. (2020b), who
estimate analogous linear regressions to construct estimates of race-specific release rates and pretrial misconduct rates
among released defendants.
about failure to appear risk involves checking whether, holding fixed characteristics that directly
affect preferences, all released defendants have a lower observed probability of failing to appear in
court (orange, circles) than the estimated upper bound on the failure to appear rate of all detained
defendants (blue, triangles).
Appendix Figure A.1 plots the analogous quantities for each race-by-felony cell. Supplement
A.9 reports findings using an alternative empirical strategy that constructs bounds on the
conditional failure to appear rate among detained defendants from the observed failure to appear
rate among released defendants.
By constructing the observed failure to appear rate among released defendants and bounds on
the failure to appear rate among detained defendants as in Figure 1.1 for each judge in the top 25,
I test whether their release decisions satisfy the implied revealed preference inequalities across
deciles of predicted failure to appear risk (Corollary 1.3.4). I test the moment inequalities that
compare the observed failure to appear rate among released defendants in the top half of the
predicted failure to appear risk distribution against the bounds on the failure to appear rate
among detained defendants in the bottom half of the predicted failure to appear risk distribution.
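The comparison just described can be expressed as a simple feasibility check within one payoff-relevant cell. The decile rates below are invented for illustration; the actual test accounts for sampling uncertainty through a formal moment inequality procedure:

```python
def revealed_preference_violations(released_fta, detained_fta_upper):
    """Within one cell W = w, compare the observed failure to appear rate
    among released defendants in the top half of predicted-risk groups
    against the upper bound on the failure to appear rate among detained
    defendants in the bottom half. A released group that is observably
    riskier than a detained, lower-predicted-risk group violates the
    implied revealed preference inequalities. Returns violating pairs."""
    groups = len(released_fta)
    half = groups // 2
    violations = []
    for d in range(half, groups):  # top half of predicted risk
        for dp in range(half):     # bottom half of predicted risk
            if released_fta[d] > detained_fta_upper[dp]:
                violations.append((d, dp))
    return violations

# Hypothetical rates over four predicted-risk groups (coarser than the
# ten deciles used in the text).
released = [0.05, 0.10, 0.15, 0.40]
detained_upper = [0.20, 0.25, 0.50, 0.60]
print(revealed_preference_violations(released, detained_upper))  # [(3, 0), (3, 1)]
```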
The number of true rejections of these implied revealed preference inequalities provides a lower
bound on the number of judges whose choices are inconsistent with the joint null hypothesis
that they are maximizing expected utility at accurate beliefs about failure to appear risk and their
utility function satisfies the conjectured exclusion restriction.
I construct the variance-covariance matrix of the observed failure to appear rates among
released defendants and the bounds on the failure to appear rate among detained defendants
within each cell defined by the payoff-relevant characteristics $W$, predicted risk decile $D_w(X)$, and
leniency instrument $Z$. I use the conditional least-favorable hybrid test
developed in Andrews et al. (2019) since it is computationally fast given estimates of the moments
and their variance-covariance matrix.
Table 1.1 summarizes the results from testing the implied revealed preference inequalities
Table 1.1: Estimated lower bound on the fraction of judges whose release decisions are inconsistent
with expected utility maximization at accurate beliefs about failure to appear risk given defendant
characteristics.
for each judge in the top 25 under various exclusion restrictions. After correcting for multiple
hypothesis testing (controlling the family-wise error rate at the 5% level), the implied revealed
preference inequalities are rejected for at least 20% of judges. This provides a lower bound on
the number of true rejections of the implied revealed preference inequalities among the top 25
judges.38 For example, when both race and age are allowed to directly affect judges' preferences,
violations of the implied revealed preference inequalities mean that the observed release decisions
could not have been generated by any possible discrimination based on the defendant's race and
age.
Kleinberg et al. (2018a) analyze whether judges in New York City make prediction mistakes
in their pretrial release decisions by directly comparing the choices of all judges against those
that would be made by an estimated, machine learning based decision rule (i.e., what the authors
call a “reranking policy”). By comparing the choices of all judges against the model, Kleinberg
et al. (2018a) is limited to making statements about average decision making across judges under
two additional assumptions: first, that judges’ preferences do not vary based on observable
38 The number of rejections returned by a procedure that controls the family-wise error rate at the $\alpha$-level provides
a valid $1 - \alpha$ lower bound on the number of true rejections. Given $j = 1, \ldots, m$ null hypotheses, let $k$ be the
number of false null hypotheses and let $\hat{k}$ be the number of rejections by a procedure that controls the family-wise
error rate at the $\alpha$-level. Observe that $P(\hat{k} \le k) = 1 - P(\hat{k} > k)$. Since $\{\hat{k} > k\} \subseteq \{\text{at least one false rejection}\}$,
$P(\hat{k} > k) \le P(\text{at least one false rejection})$, which implies $P(\hat{k} \le k) \ge 1 - P(\text{at least one false rejection}) \ge 1 - \alpha$.
characteristics, and second, that preferences do not vary across judges. In contrast, I conduct my
analysis judge-by-judge, allow each judge’s utility function to flexibly vary based on defendant
characteristics, and allow for unrestricted heterogeneity in utility functions across judges.
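The counting argument in footnote 38 applies to any procedure that controls the family-wise error rate. As a generic illustration (the analysis above uses the conditional least-favorable hybrid test of Andrews et al. (2019) judge-by-judge, not the procedure below), a Holm step-down correction and the implied lower bound on true rejections can be sketched as:

```python
def holm_rejections(pvalues, alpha=0.05):
    """Holm's step-down procedure, which controls the family-wise error
    rate at level alpha. By the argument in footnote 38, the number of
    rejections is a valid 1 - alpha lower confidence bound on the number
    of false null hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    rejected = set()
    for step, j in enumerate(order):
        if pvalues[j] <= alpha / (m - step):
            rejected.add(j)
        else:
            break  # step-down: stop at the first non-rejection
    return rejected

# Hypothetical per-judge p-values from testing the revealed preference
# inequalities; the count of rejections lower-bounds the number of
# judges making detectable prediction mistakes.
print(holm_rejections([0.001, 0.2, 0.01, 0.04, 0.5]))
```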
Robustness to Outcome Definition: My empirical analysis defined the latent outcome $Y^*$ to be
whether a defendant would fail to appear in court if they were released. If this is incorrectly
specified and a judge’s preferences depend on some other definition of pretrial misconduct, then
rejections of expected utility maximization may be driven by mis-specification of the outcome. For
example, even though the New York City pretrial release system explicitly asks judges to only
consider failure to appear risk, judges may additionally base their release decisions on whether a
defendant would be arrested for a new crime (Arnold et al., 2018; Kleinberg et al., 2018a; Arnold
et al., 2020b). I reconduct my empirical analysis defining the outcome to be whether the defendant
would either fail to appear in court or be re-arrested for a new crime (“any pre-trial misconduct”).
Appendix Table A.1 shows that, for this alternative definition of pretrial misconduct, the pretrial
release decisions of at least 64% of judges in New York City are inconsistent with expected utility
maximization at preferences that satisfy these conjectured exclusion restrictions and accurate
beliefs about this alternative outcome.
Robustness to Pretrial Release Definition: My empirical analysis collapsed the pretrial release
decision into a binary choice of either to release or detain a defendant. In practice, judges in New
York City decide whether to release a defendant “on recognizance,” meaning the defendant is
released automatically without bail conditions, or set monetary bail conditions. Consequently,
judges could be making two distinct prediction mistakes: first, judges may be systematically
mis-predicting failure to appear risk; and second, judges may be systematically mis-predicting the
probability that a defendant would pay a specified bail amount.
In Supplement A.9.4, I redefine the judge's choice to be whether to release the defendant on
recognizance and the outcome as both whether the defendant will pay the specified bail amount
and whether the defendant would fail to appear in court if released. Since the modified outcome
takes on multiple values, I use Theorem 1.2.1 to show that expected utility maximization at
accurate beliefs about bail payment ability and failure to appear risk can again be characterized as a
system of moment inequalities (Proposition A.9.1). I find that at least 32% of judges in New York
City make decisions that are inconsistent with expected utility maximization at accurate beliefs
about the ability of defendants to post a specified bail amount and failure to appear risk given
defendant characteristics.
Given that a large fraction of judges make pretrial release decisions that are inconsistent with
expected utility maximization at accurate beliefs about failure to appear risk, I next apply the
identification results in Section 1.4.2 to bound the extent to which these judges' implied beliefs
about failure to appear risk are systematically biased across defendant
characteristics. For each judge whose choices violate the implied revealed preference inequalities,
I construct a 95% confidence interval for their implied prediction mistakes $\delta(w, d)/\delta(w, d')$ between
the top decile $d$ and bottom decile $d'$ of the predicted failure to appear risk distribution. To do so,
I first construct a 95% joint confidence set for the reweighted utility thresholds $\tau(w, d)$, $\tau(w, d')$ at
the bottom and top deciles of the predicted failure to appear risk distribution using test inversion
based on Theorem 1.4.2. I then construct a 95% confidence interval for the implied prediction
mistake by calculating $\frac{(1 - \tau(w, d))/\tau(w, d)}{(1 - \tau(w, d'))/\tau(w, d')}$ for each pair $\tau(w, d)$, $\tau(w, d')$ of values that lie in the joint
confidence set.
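The final mapping from the joint confidence set for the thresholds to an interval for the implied prediction mistake is a direct application of this ratio. A sketch with hypothetical threshold pairs:

```python
def implied_mistake_interval(tau_pairs):
    """Map each pair (tau(w,d), tau(w,d')) in a joint confidence set for
    the reweighted utility thresholds through the ratio
    ((1 - tau)/tau) / ((1 - tau')/tau') and return the resulting range
    of implied prediction mistakes delta(w,d)/delta(w,d')."""
    def ratio(t, tp):
        return ((1.0 - t) / t) / ((1.0 - tp) / tp)
    values = [ratio(t, tp) for t, tp in tau_pairs]
    return min(values), max(values)

# Hypothetical pairs from a 95% joint confidence set; an interval that
# lies entirely below one would indicate underreaction.
print(implied_mistake_interval([(0.5, 0.5), (0.4, 0.6), (0.6, 0.4)]))
```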
Figure 1.2 plots the constructed confidence intervals for the implied prediction mistakes δ(w, d)/δ(w, d′) for each judge over the race-and-age cells. Whenever informative, the confidence intervals highlighted in orange lie everywhere below one, indicating that these judges are acting as-if their implied beliefs about failure to appear risk underreact to predictable variation in failure to appear risk. That is, these judges are acting as-if they perceive the change in failure to appear risk between defendants in the top decile and bottom decile of predicted risk to be less than the true change in failure to appear risk across these defendants. This could be consistent with judges "over-regularizing" how their implicit predictions of failure to appear risk respond to variation in the characteristics across these extreme defendants, and may therefore be suggestive of some form of inattention (Gabaix, 2014; Caplin and Dean, 2015; Gabaix, 2019). Developing formal tests for these
Figure 1.2: Estimated bounds on implied prediction mistakes between lowest and highest predicted
failure to appear risk deciles made by judges within each race-by-age cell.
Notes: This figure plots the 95% confidence interval on the implied prediction mistake δ(w, d)/δ(w, d′) between the top decile d and bottom decile d′ of the predicted failure to appear risk distribution for each judge in the top 25 whose pretrial release decisions violated the implied revealed preference inequalities (Table 1.1) and each race-by-age cell. The implied prediction mistake δ(w, d)/δ(w, d′) measures the degree to which judges' beliefs underreact or overreact to variation in failure to appear risk. When informative, the confidence intervals highlighted in orange show that judges under-react to predictable variation in failure to appear risk from the highest to the lowest decile of predicted failure to appear risk (i.e., the estimated bounds lie below one). These confidence intervals are constructed by first constructing a 95% joint confidence set for a judge's reweighted utility thresholds τ(w, d), τ(w, d′) using test inversion based on the moment inequalities in Theorem 1.4.2, and then computing the implied prediction mistake δ(w, d)/δ(w, d′) associated with each pair τ(w, d), τ(w, d′) in the joint confidence set (Corollary 1.4.1). See Section 1.4.2 for theoretical details on the implied prediction mistake and Section 1.5.5 for the estimation details. Source: Rambachan and Ludwig (2021).
behavioral mechanisms in empirical settings like pretrial release is beyond the scope of this paper.
Analogous estimates for race-and-felony charge cells are summarized in Figure A.2. Supplement A.9 shows that judges' implied beliefs still underreact to variation in the latent outcome under alternative bounds on the missing data and under an alternative definition of the latent outcome as any pretrial misconduct.
As a final step to investigate why the release decisions of judges in New York City are inconsistent
with expected utility maximization at accurate beliefs, I also report the cells of defendants on
which the maximum studentized violation of the revealed preference inequalities in Corollary
1.3.4 occurs. This shows which defendants are associated with the largest violations of the revealed
preference inequalities.
Table 1.2: Location of the maximum studentized violation of revealed preference inequalities
among judges whose release decisions are inconsistent with expected utility maximization at
accurate beliefs about failure to appear risk given defendant characteristics.
                     Race-by-Age Cells   Race-by-Felony Cells
White Defendants
  Middle Deciles            0%                   0%
  Tail Deciles             25%                 7.14%
Black Defendants
  Middle Deciles            0%                   0%
  Tail Deciles             75%                92.85%
Notes: This table summarizes the location of the maximum studentized violation of revealed preference inequalities in
Corollary 1.3.4 among judges whose release decisions are inconsistent with expected utility maximization at accurate
beliefs and utility functions that depend on both the defendant’s race and age as well as the defendant’s race and
whether the defendant was charged with a felony. Bounds on the failure to appear rate among detained defendants are
constructed using the judge leniency instrument (see Section 1.5.3). Among judges whose release decisions violate the revealed preference inequalities at the 5% level, I report the fraction of judges for whom the maximal studentized
violation occurs among white and black defendants on tail deciles (deciles 1-2, 9-10) and middle deciles (3-8) of
predicted failure to appear risk. Source: Rambachan and Ludwig (2021).
Among judges whose choices are inconsistent with expected utility maximization at accurate
beliefs, Table 1.2 reports the fraction of judges for whom the maximal studentized violation occurs
over the tails (deciles 1-2, 9-10) or the middle of the predicted failure to appear risk distribution
(deciles 3-8) for black and white defendants respectively. Consistent with the previous findings,
I find all maximal violations of the revealed preference inequalities occur over defendants that
lie in the tails of the predicted risk distribution. Furthermore, the majority occur over decisions
involving black defendants as well. These empirical findings together highlight that detectable
prediction mistakes primarily occur on defendants at the tails of the predicted risk distribution.
Finally, I illustrate that the preceding behavioral analysis has important policy implications about
the design of algorithmic decision systems by analyzing policy counterfactuals that replace judges
with algorithmic decision rules in the New York City pretrial release setting. As a technical step,
I show that average social welfare under a candidate decision rule is partially identified due to
the missing data problem, and provide simple methods for inference on this quantity.
Focusing on binary screening decisions, consider a policymaker whose payoffs are summarized by the social welfare function with U*(0,0) < 0 and U*(1,1) < 0. The policymaker evaluates a candidate decision rule p*(w,x) ∈ [0,1], which denotes the probability that C = 1 is chosen given W = w, X = x. Due to the missing data problem, expected social welfare under the candidate decision rule is partially identified. I characterize the identified set of expected social welfare under the candidate decision rule, and show that testing the null hypothesis that expected social welfare is equal to some value is equivalent to testing a series of moment inequalities with nuisance parameters that enter linearly. These results extend to analyzing expected social welfare under the decision maker's observed choices.
For a candidate decision rule p*(w,x), expected social welfare at (w,x) ∈ W × X is

  ℓ(w,x; p*, U*) P_{Y*}(1 | w,x) + β(w,x; p*, U*),

where ℓ(w,x; p*, U*) := U*(1,1) p*(w,x) − U*(0,0)(1 − p*(w,x)) and β(w,x; p*, U*) := U*(0,0)(1 − p*(w,x)). Total expected social welfare is therefore

  θ(p*, U*) = β(p*, U*) + Σ_{(w,x) ∈ W × X} P(w,x) ℓ(w,x; p*, U*) P_{Y*}(1 | w,x),   (1.7)

where β(p*, U*) := Σ_{(w,x) ∈ W × X} P(w,x) β(w,x; p*, U*). This definition of the social welfare function is strictly utilitarian. It does not incorporate additional fairness considerations that have received much attention in an influential literature in computer science and are particularly important in the criminal justice system (e.g., see Barocas and Selbst, 2016; Mitchell et al., 2019; Barocas et al., 2019).39
Since P_{Y*}(1 | w,x) is partially identified due to the missing data problem, total expected
39 The identification and inference results could be extended to incorporate a social welfare function that varies
across groups defined by the characteristics as in Rambachan et al. (2021), or a penalty that depends on the average characteristics of the individuals that receive C = 1 as in Kleinberg et al. (2018b).
social welfare is also partially identified, and its sharp identified set is an interval.
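Because the unknown conditional probability P_{Y*}(1 | w,x) enters θ(p*, U*) linearly cell by cell, the endpoints of this interval can be computed by setting each cell's probability to its lower or upper bound according to the sign of its weight. A minimal sketch of my own, assuming the bounds take a simple cell-by-cell form and with all numerical inputs hypothetical:

```python
import numpy as np

def welfare_identified_set(P_wx, p_star, u00, u11, py_lo, py_hi):
    """Endpoints of the identified set of total expected social welfare
    theta(p*, U*) when P_{Y*}(1|w,x) is only known to lie in
    [py_lo, py_hi]. All arrays are indexed by cell (w, x)."""
    ell = u11 * p_star - u00 * (1 - p_star)   # slope on P_{Y*}(1|w,x)
    beta = u00 * (1 - p_star)                 # intercept term
    base = np.sum(P_wx * beta)
    w = P_wx * ell
    # For each cell, the extreme is attained at a bound of P_{Y*}(1|w,x).
    lo = base + np.sum(np.where(w >= 0, w * py_lo, w * py_hi))
    hi = base + np.sum(np.where(w >= 0, w * py_hi, w * py_lo))
    return lo, hi

# Two cells; cost of an unnecessary detention u00 = -0.5, u11 normalized to -1.
P_wx = np.array([0.5, 0.5])
p_star = np.array([0.8, 0.2])
lo, hi = welfare_identified_set(P_wx, p_star, -0.5, -1.0,
                                np.array([0.05, 0.30]),
                                np.array([0.10, 0.60]))
```

When the lower and upper bounds coincide, the interval collapses to the point-identified welfare, as expected.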
Proposition 1.6.1. Consider a binary screening decision, a policymaker with social welfare function U*(0,0) < 0, U*(1,1) < 0 and a candidate decision rule p*(w,x). The sharp identified set of total expected social welfare θ(p*, U*) is an interval, whose endpoints are the solutions to two linear programs.
The sharp identified set of total expected social welfare under a candidate decision rule is characterized by the solution to two linear programs. Provided the candidate decision rule and joint distribution of the characteristics (W, X) are known, testing the null hypothesis that total expected social welfare is equal to some candidate value is equivalent to testing a system of moment inequalities in which nuisance parameters enter linearly.
Proposition 1.6.2. Consider a binary screening decision, a policymaker with social welfare function U*(0,0) < 0, U*(1,1) < 0 and a known candidate decision rule p*(w,x). Conditional on the characteristics (W, X), testing the null hypothesis H0 : θ(p*, U*) = θ0 is equivalent to testing whether

  ∃ δ ∈ R^{d_w d_x − 1} s.t. Ã_{(·,1)} (θ0 − ℓ⊺(p*, U*) P^{c=1,y*=1} − β(p*, U*)) + Ã_{(·,−1)} δ ≤ ( −P̲^{c=0,y*=1} ; P̄^{c=0,y*=1} ),

where ℓ(p*, U*) is the d_w d_x-dimensional vector with elements P(w,x) ℓ(w,x; p*, U*), P^{c=1,y*=1} is the d_w d_x-dimensional vector of moments P_{C,Y*}(1,1 | w,x), P̲^{c=0,y*=1} and P̄^{c=0,y*=1} are the d_w d_x-dimensional vectors of lower and upper bounds on P_{C,Y*}(0,1 | w,x) respectively, and Ã is a known matrix.40
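In practice, checking whether some nuisance vector δ satisfies this linear system at a candidate θ0 is a linear-programming feasibility problem. A sketch using a generic LP solver; the matrix Ã and bound vectors here are small placeholder values, not objects estimated in the paper:

```python
import numpy as np
from scipy.optimize import linprog

def feasible_delta(A_tilde, theta_term, rhs):
    """Check whether there exists delta with
        A_tilde[:, 0] * theta_term + A_tilde[:, 1:] @ delta <= rhs,
    mirroring the existence statement in the moment inequality test."""
    n_delta = A_tilde.shape[1] - 1
    res = linprog(c=np.zeros(n_delta),           # trivial objective
                  A_ub=A_tilde[:, 1:],
                  b_ub=rhs - A_tilde[:, 0] * theta_term,
                  bounds=[(None, None)] * n_delta)
    return res.status == 0  # status 0: a feasible optimum was found

# Placeholder system: constraints delta <= 1 and -delta <= 1 are feasible.
A_tilde = np.array([[1.0, 1.0],
                    [1.0, -1.0]])
feasible = feasible_delta(A_tilde, theta_term=0.0, rhs=np.array([1.0, 1.0]))
```

Inverting this check over a grid of θ0 values yields the confidence interval described next.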
A confidence interval for total expected social welfare can then be constructed through test
40 For a matrix B, B_{(·,1)} refers to its first column and B_{(·,−1)} refers to all columns except its first column.
inversion. Testing procedures for moment inequalities with nuisance parameters are available
for high-dimensional settings in Belloni et al. (2018). Andrews et al. (2019) and Cox and Shi
(2020) develop inference procedures that exploit the additional linear structure and are valid in
low-dimensional settings. Using this testing reduction requires that the candidate decision rule be
some known function, which can be achieved by constructing the decision rule on held-out data.
I use these results to compare total expected social welfare under the observed decisions of judges
in New York City against total expected social welfare under counterfactual algorithmic decision
rules. I vary the relative cost of detaining an individual that would not fail to appear in court, U*(0,0) (i.e., an "unnecessary detention"), normalizing U*(1,1) = −1. For a particular choice of the social welfare function, I construct an algorithmic decision rule that decides whether to release individuals based on a prediction of the probability of pretrial misconduct at each possible cell of payoff relevant characteristics W and each decile of predicted failure to appear risk D_w(X). The
decision rule is a threshold rule, whose cutoff depends on the particular parametrization of the
social welfare function. Appendix A.3.4 discusses the construction of this decision rule in more
detail.
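While Appendix A.3.4 gives the actual construction, the essential logic of such a cutoff rule can be sketched. Under the normalization U*(1,1) = −1 and writing c = |U*(0,0)| (my simplification, for illustration only), releasing is optimal whenever predicted risk falls below c/(1 + c):

```python
def release_threshold(c):
    """Risk cutoff implied by U*(1,1) = -1 and U*(0,0) = -c: releasing
    (expected welfare -pi) beats detaining (expected welfare -c * (1 - pi))
    if and only if pi <= c / (1 + c)."""
    return c / (1.0 + c)

def algorithmic_rule(predicted_risk, c):
    """Threshold decision rule: 1 = release, 0 = detain."""
    return int(predicted_risk <= release_threshold(c))

# A costlier unnecessary detention raises the cutoff, releasing more people.
assert release_threshold(1.0) == 0.5
assert release_threshold(3.0) == 0.75
assert algorithmic_rule(0.3, 1.0) == 1 and algorithmic_rule(0.7, 1.0) == 0
```

The cutoff is increasing in c, so rules parametrized by a higher cost of unnecessary detention release a larger share of defendants.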
I construct 95% confidence intervals for total expected social welfare under the algorithmic
decision rule and the judge's observed release decisions. I report the ratio of worst-case total
expected social welfare under the algorithmic decision rule against worst-case total expected social
welfare under the judge’s observed release decisions, which summarizes the worst-case gain from
replacing the judge’s decisions with the algorithmic decision rule. I conduct this exercise for
each judge over the race-by-age cells, reporting the median, minimum and maximum gain across
judges. Supplement A.9 reports the same results over the race-by-felony charge cells.
I compare the algorithmic decision rule against the release decisions of judges who were found
to make detectable prediction mistakes. Figure 1.3 plots the improvement in worst-case total
expected social welfare under the algorithmic decision rule that fully replaces these judges over
all decisions against the observed release decisions of these judges. For most values of the social
welfare function, worst-case total expected social welfare under the algorithmic decision rule is
strictly larger than worst-case total expected social welfare under these judges’ decisions. Recall
these judges primarily made detectable prediction mistakes over defendants in the tails of the
predicted failure to appear risk distribution. Over the remaining defendants, however, their
choices were consistent with expected utility maximization at accurate beliefs about failure to
appear risk. Consequently, the difference in expected social welfare under the algorithmic decision rule and the judges' decisions is driven by three forces: first, the algorithmic decision rule
corrects detectable prediction mistakes over the tails of the predicted risk distribution; second, the
algorithmic decision rule corrects possible misalignment between the policymaker’s social welfare
function and judges' utility function over the remaining defendants; and third, the judges may observe predictive private information over the remaining defendants that is unavailable to the algorithmic decision rule.
For social welfare costs of unnecessary detentions ranging over U*(0,0) ∈ [0.3, 0.8], the algorithmic decision rule either leads to no improvement or strictly lowers worst-case expected total social welfare relative to these judges' decisions. Figure A.3 plots the improvement in worst-case total expected social welfare by the race of the defendant, highlighting that these costs are particularly large over white defendants. At these values, judges' preferences may be sufficiently aligned with the policymaker's, and judges may observe sufficiently predictive private information over the remaining defendants, that it is costly to fully automate their decisions. Figure A.4 compares the
release rates of the algorithmic decision rule against the observed release rates of these judges. The
release rate of the algorithmic decision rule is most similar to the observed release rate of these judges precisely over the values of the social welfare function where the judges' decisions dominate the algorithmic rule.
The behavioral analysis suggests that it would be most valuable to automate these judges' decisions over defendants that lie in the tails of the predicted failure to appear risk distribution, where they make detectable prediction mistakes. I compare these judges' observed release decisions against an algorithmic decision rule that only automates decisions over defendants in the tails of the predicted failure to appear risk distribution and otherwise defers to the judges' observed decisions. This mirrors a common way in which statistical risk assessments are used in pretrial
Figure 1.3: Ratio of total expected social welfare under algorithmic decision rule relative to release
decisions of judges that make detectable prediction mistakes about failure to appear risk.
Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule that fully automates decision-making against the observed release decisions of judges who were found to make detectable prediction mistakes. Worst-case total expected social welfare under each decision rule is computed by constructing 95% confidence intervals for total expected social welfare under the decision rule, and reporting the smallest value that lies in the confidence interval. These decision rules are constructed and evaluated over race-by-age cells and deciles of predicted failure to appear risk. The x-axis plots the relative social welfare cost of detaining a defendant that would not fail to appear in court, U*(0,0) (i.e., an unnecessary detention). The solid line plots the median change across judges, and the dashed lines report the minimum and maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).
release systems throughout the United States (Stevenson, 2018; Albright, 2019; Dobbie and Yang,
2019). This algorithmic decision rule only corrects the detectable prediction mistakes, and its
welfare effects are plotted in Figure 1.4. The algorithmic decision rule that only corrects prediction
mistakes weakly dominates the observed release decisions of judges, no matter the value of the
social welfare function. For some parametrizations, the algorithmic decision rule leads to 20%
improvements in worst-case social welfare relative to the observed release decisions of these
judges. Removing judicial discretion over cases where detectable prediction mistakes are made
but otherwise deferring to them over all other cases may therefore be a “free lunch.” This provides
a behavioral mechanism for recent machine learning methods that attempt to estimate whether a
decision should be made by an algorithm or deferred to a decision maker (e.g., Madras et al., 2018;
Raghu et al., 2019; Wilder et al., 2020). These results suggest that deciding whether to automate a
Figure 1.4: Ratio of total expected social welfare under algorithmic decision rule that corrects
prediction mistakes relative to release decisions of judges that make detectable prediction mistakes
about failure to appear risk.
Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule that only replaces the judge on cases in the tails of the predicted failure to appear risk distribution (deciles 1-2 and deciles 9-10) against the judge's observed release decisions. Worst-case total expected social welfare under each decision rule is computed by constructing 95% confidence intervals for total expected social welfare under the decision rule, and reporting the smallest value that lies in the confidence interval. These decision rules are constructed and evaluated over race-by-age cells and deciles of predicted failure to appear risk. The x-axis plots the relative social welfare cost of detaining a defendant that would not fail to appear in court, U*(0,0) (i.e., an unnecessary detention). The solid line plots the median change across judges, and the dashed lines report the minimum and maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).
decision or defer to a decision maker requires understanding whether the decision maker makes systematic prediction mistakes.
Figure A.5 reports the welfare effects of automating the release decisions of judges whose choices
were found to be consistent with expected utility maximization at accurate beliefs. Automating
these judges' release decisions may strictly lower worst-case expected social welfare for a range of social welfare costs of unnecessary detentions. These judges are making pretrial release decisions as-if their preferences are sufficiently aligned with the policymaker's over these parametrizations of the social welfare function that their private information leads to better decisions than the
algorithmic decision rule. Figure A.6 plots the results by defendant race and Figure A.7 compares
the release rates of the algorithmic decision rule against the observed release rates of these judges.
Understanding the welfare effects of automating a decision maker whose decisions are consistent with expected utility maximization requires assessing the value of their private information against the degree to which they are misaligned, which is beyond the scope of this paper.41
1.7 Conclusion
This paper develops an econometric framework for testing whether a decision maker makes prediction mistakes in high-stakes empirical settings such as hiring, medical diagnosis and treatment, pretrial release, and many others. I characterized expected utility maximization, where
the decision maker maximizes some utility function at accurate beliefs about the outcome given
the observable characteristics of each decision as well as some private information. I developed
tractable statistical tests for whether the decision maker makes systematic prediction mistakes
and methods for conducting inference on the ways in which their predictions are systematically
biased. Analyzing the pretrial release system in New York City, I found that a substantial fraction
of judges make systematic prediction mistakes about failure to appear risk given defendant
characteristics. Finally, I showed how this behavioral analysis may inform the design of algorithmic decision systems by comparing counterfactual social welfare under alternative algorithmic release rules.
This paper highlights that empirical settings, such as pretrial release, medical diagnosis, and
hiring, can serve as rich laboratories for behavioral analysis. I provided a first step by exploring
the empirical content of expected utility maximization, a canonical model of decision making
under uncertainty, in these settings. An exciting direction is to explore the testable implications of
alternative behavioral models such as rational inattention (e.g., Sims, 2003; Gabaix, 2014; Caplin
and Dean, 2015) as well as forms of salience (e.g., Gennaioli and Shleifer, 2010; Bordalo et al.,
2016, 2021). Exploiting the full potential of these empirical settings is an important, policy-
relevant agenda at the intersection of economic theory, applications of machine learning, and
microeconometrics.
41 See Frankel (2021) for a principal-agent analysis of delegating decisions to a misaligned decision maker who possesses private information.
Chapter 2
2.1 Introduction
In this paper, we introduce the nonparametric, direct potential outcome system as a foundational
framework for analyzing dynamic causal effects of assignments on outcomes in observational time
series settings. We consider settings in which there is a single unit (e.g., macroeconomy or market)
observed over time. At each time period t ≥ 1, the unit receives a vector of assignments W_t, and an associated vector of outcomes Y_t is generated. The outcomes are causally related to the
assignments through a potential outcome process, which is a stochastic process that describes what
would be observed along counterfactual assignment paths. A dynamic causal effect is generically
defined as the comparison of the potential outcome process along different assignment paths at a given point in time.
1 This chapter is joint work with Neil Shephard. We thank Iavor Bojinov, Gary Chamberlain, Fabrizia Mealli, James
M. Robins, Frank Schorfheide and James H. Stock for conversations that have helped develop our thinking on
causality. We are also grateful to many others, including conference participants, for their comments on earlier versions
of this paper. Rambachan gratefully acknowledges financial support from the NSF Graduate Research Fellowship
under Grant DGE1745303.
The direct potential outcome system places neither restrictions on the extent to which past assignments may causally affect outcomes, nor common functional form restrictions on the relationship between outcomes and assignments.
effects in time series settings, such as the structural vector moving average model and (both
linear and non-linear) structural vector autoregressions, can be cast as special cases of the direct
potential outcome system by introducing these additional restrictions on the potential outcome
process or the full system. In this sense, the direct potential outcome system provides a flexible,
nonparametric foundation upon which to analyze dynamic causal effects in time series settings.
We then analyze conditions under which predictive time series estimands, such as the impulse response function (Sims, 1980), generalized impulse response function (Koop et al., 1996), local projection (Jordá, 2005) and local projection instrumental variables (Jordá et al., 2015), have a causal interpretation. That is, under what conditions do these common time series estimands have a nonparametric causal interpretation as measuring how movements in the outcomes Y_{t+h} for some h ≥ 0 are caused by movements in the assignments W_t? We answer this question across four data environments, which place alternative assumptions on what output is observed by the researcher.
First, we analyze a benchmark case in which the researcher directly observes both the outcomes Y_{t+h} and the assignments W_t generated by the potential outcome system through time. We show that impulse response functions, local projections, and generalized impulse response functions of the outcome Y_{t+h} on the assignment W_t identify a dynamic average treatment effect, a weighted average of marginal average treatment effects, and a filtered average treatment effect respectively if the assignments W_t are randomly assigned. Random assignment requires that the assignment must be independent of the potential outcome process (which is familiar from cross-sectional causal inference) and the assignments must be independent over time. These results provide a causal foundation for a common empirical strategy in macroeconomics that first constructs measures of underlying economic shocks, and then uses these constructed measures to estimate dynamic causal effects using reduced form methods. Nakamura and Steinsson (2018b) refers to this empirical strategy as "direct causal inference." Our first set of results therefore provides conditions that these constructed shocks must satisfy in order for the resulting reduced form
estimates to be causally interpretable.
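These conditions are easy to verify in a toy simulation of my own construction: with i.i.d. (hence randomly assigned) treatments and a linear, non-anticipating outcome process, the local projection slope of Y_{t+h} on W_t recovers the horizon-h dynamic effect:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
true_effects = [2.0, 1.0, 0.5]            # dynamic effects at horizons 0, 1, 2

# Randomly assigned: independent of potential outcomes and over time.
W = rng.binomial(1, 0.5, size=T).astype(float)
Y = rng.normal(size=T)                     # exogenous shocks
for h, b in enumerate(true_effects):
    Y[h:] += b * W[:T - h]                 # non-anticipating linear response

def local_projection(Y, W, h):
    """OLS slope of the regression of Y_{t+h} on W_t (with intercept)."""
    return np.polyfit(W[:T - h], Y[h:], 1)[0]

estimates = [local_projection(Y, W, h) for h in range(3)]
```

If the assignments were instead serially correlated, the horizon-h slope would also pick up effects of neighboring assignments, which is exactly why independence over time is part of the random assignment condition.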
Second, we specialize the direct potential outcome system to incorporate instrumental variables Z_t that causally affect the assignment W_t but not the outcome Y_{t+h}. Provided the researcher directly observes the instrument, the assignment, and the outcome, it is natural to consider the causal interpretation of the time-series analogue of common instrumental variable estimands. We focus attention on dynamic instrumental variable estimands that take the ratio of an impulse response function of the outcome Y_{t+h} on the instrument Z_t (a "reduced form") relative to an impulse response function of the assignment W_{t+h} on the instrument Z_t (a "first stage"). We show that such dynamic instrumental variable estimands identify an appropriately defined dynamic "local average treatment effect" in the spirit of Imbens and Angrist (1994), provided the instrument is randomly assigned and satisfies a monotonicity condition that is familiar from cross-sectional causal inference. The dynamic local average treatment effects that we characterize measure the average dynamic treatment effect of the assignment on the h-period ahead outcome conditional on the event that the instrument causally affects the treatment.
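A small simulation (again my own, with homogeneous linear effects so that the local average treatment effect coincides with the average effect) illustrates the ratio estimand:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000
effect_h0, effect_h1 = 1.5, 0.75          # dynamic effects of W_t at horizons 0, 1

Z = rng.binomial(1, 0.5, size=T).astype(float)   # randomly assigned instrument
W = 0.6 * Z + rng.normal(size=T)                 # first stage: Z shifts W
Y = effect_h0 * W + rng.normal(size=T)
Y[1:] += effect_h1 * W[:-1]                      # one-period-ahead causal effect

def slope(y, x):
    """OLS slope of y on x (with intercept)."""
    return np.polyfit(x, y, 1)[0]

def dynamic_iv(h):
    """Ratio of the reduced-form IRF (Y_{t+h} on Z_t)
    to the first-stage IRF (W_t on Z_t)."""
    return slope(Y[h:], Z[:T - h]) / slope(W, Z)

est_h0, est_h1 = dynamic_iv(0), dynamic_iv(1)
```

The reduced-form slopes scale with the first-stage strength (0.6 here), and taking the ratio removes that scaling, recovering the dynamic effects of the assignment itself.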
We further analyze the case in which the researcher only observes the instrument Z_t and outcome Y_{t+h} but not the assignment W_t itself. This is an important case: empirical researchers in macroeconomics increasingly use "external instruments" to identify the dynamic causal effects of unobservable economic shocks on macroeconomic outcomes (e.g., see Jordá et al., 2015; Gertler and Karadi, 2015; Nakamura and Steinsson, 2018a; Ramey and Zubairy, 2018; Stock and Watson, 2018; Plagborg-Møller and Wolf, 2020; Jordá et al., 2020). In this research, it is common for empirical researchers to analyze estimands that involve two distinct elements of the outcome vector Y_{j,t+h}, Y_{k,t} and the instrument. For example, given an external instrument Z_t for the unobserved monetary policy shock W_t (e.g., Kuttner, 2001; Cochrane and Piazessi, 2002; Gertler and Karadi, 2015), it is common to measure the dynamic causal effect of the monetary policy shock on unemployment by (1) estimating a "reduced form" impulse response function of unemployment Y_{j,t+h} on the external instrument, (2) estimating a "first stage" impulse response function of the Federal Funds rate Y_{k,t} on the external instrument, and (3) reporting the ratio of these impulse responses (e.g., Jordá et al., 2015).
We show that dynamic IV estimands constructed in this manner are causally interpretable, and nonparametrically identify a relative, dynamic local average treatment effect that measures the causal response of the h-step ahead outcome Y_{j,t+h} to a change in the treatment W_{k,t} that raises the contemporaneous outcome Y_{k,t} by one unit among compliers (i.e., in the monetary policy example, the dynamic causal effect of unemployment to a monetary policy shock that raises the Federal Funds rate by one unit). This result therefore provides a motivation for the recent interest in external instruments: they can be used to identify causally interpretable estimands without resorting to functional form restrictions on the outcomes and without even directly observing the treatment itself.
Finally, we conclude by briefly discussing the most challenging data environment in which the researcher only observes the outcomes Y_{t+h}, but not the assignments W_t nor any external instruments Z_t. This case is considered by much of the foundational and influential research on time series causality (e.g., Sims, 1972, 1980). We consider this setting in order to place the direct potential outcome system in this broader context, and illustrate that researchers can recover familiar model-based approaches by imposing additional restrictions on the system.
Taking a step back, quantifying dynamic causal effects is one of the great themes of the broader
time series literature. Researchers use a variety of methods such as Granger causality (Wiener,
1956; Granger, 1969; White and Lu, 2010), highly structured models such as DSGE models (Herbst
and Schorfheide, 2015), state space modelling (Harvey and Durbin, 1986; Harvey, 1996; Brodersen
et al., 2015; Menchetti and Bojinov, 2021) as well as intervention analysis (Box and Tiao, 1975)
and regression discontinuity (Kuersteiner et al., 2018). The nonparametric potential outcome
framework we develop is distinct. References to some of the more closely related work will be
given in the next section. This paper is not focused on estimators and the associated distribution theory.
Roadmap: Section 2.2 defines the direct potential outcome system and introduces the main class
of dynamic causal effects that we focus on throughout the paper. Section 2.3 looks at the causal
meaning of common statistical estimands based on seeing the realized assignments and outcomes.
The instrumented potential outcome system is defined in Section 2.4, which relates assignments
and instruments to outcomes. Section 2.5 studies the causal interpretation of estimands based
on seeing the realized assignments, instruments and outcomes. Section 2.6 looks at the causal
meaning of estimands where only the instruments and outcomes are observed. Section 2.7 looks at the causal meaning of common statistical estimands where only the outcomes are observed. We then conclude.
Notation: For a time series {A_t}_{t≥1} with A_t ∈ A for all t ≥ 1, let A_{1:t} := (A_1, . . . , A_t) and similarly a_{1:t} := (a_1, . . . , a_t) for a deterministic sequence.
2.2 The Direct Potential Outcome System and Dynamic Causal Effects
We now introduce the direct potential outcome system, which extends the design-based approach developed in Bojinov and Shephard (2019) to stochastic processes. We define a large class of causal estimands that summarize the dynamic causal effects of varying the assignment on future outcomes. As an illustration, we show that the direct potential outcome system nests most leading econometric models used to study dynamic causal effects in time series settings. Our framework also connects to work on dynamic treatment effects in small-T, large-N panels. The panel work of Robins (1986) and
Abbring and Heckman (2007), amongst others, led to an enormous literature on dynamic causal
effects in panel data (Murphy et al., 2001; Murphy, 2003; Heckman and Navarro, 2007; Lechner,
2011; Heckman et al., 2016; Boruvka et al., 2018; Lu et al., 2017; Blackwell and Glynn, 2018; Hernan
and Robins, 2019; Bojinov et al., 2021; Mastakouri et al., 2021). Beyond Bojinov and Shephard
(2019), our work is most closely related to Angrist and Kuersteiner (2011) and Angrist et al. (2018).
There is a single unit. At each time period t ≥ 1, the unit receives a d_w-dimensional assignment W_t from the assignment process {W_t}_{t≥1}. Associated with this assignment process, we observe a d_y-dimensional outcome process {Y_t}_{t≥1}. The outcomes are causally related to the assignments through the potential outcome process, which describes what outcome would be observed at time t along a particular path of assignments.
Assumption 2.2.1 (Assignment and Potential Outcome). The assignment process {W_t}_{t≥1} satisfies W_t ∈ W := W_1 × · · · × W_{d_w} ⊆ R^{d_w}. The potential outcome process is, for any deterministic sequence {w_s}_{s≥1} with w_s ∈ W for all s ≥ 1, {Y_t({w_s}_{s≥1})}_{t≥1}, where the time-t potential outcome satisfies Y_t({w_s}_{s≥1}) ∈ Y ⊆ R^{d_y}.
The simplest case is when the assignment is scalar and binary, W = {0, 1}, in which case W_t = 1 denotes receiving the treatment at time t.
The potential outcome Y_t({w_s}_{s≥1}) may depend on future assignments {w_s}_{s≥t+1}. Our next assumption rules out this dependence, restricting the potential outcome to only depend on past and contemporaneous assignments.
Assumption 2.2.2 (Non-anticipating Potential Outcomes). For each t ≥ 1, and all deterministic sequences {w_s}_{s≥1}, Y_t({w_s}_{s≥1}) = Y_t(w_{1:t}).
Assumption 2.2.2 is a stochastic process analogue of non-interference (Cox, 1958b; Rubin, 1980), extending White and Kennedy (2009) and Bojinov and Shephard (2019). It still allows for rich dynamic causal effects. Under Assumption 2.2.2, we may drop references to the future assignments in the potential outcome process, and write Y_t(w_{1:t}). The set {Y_t(w_{1:t}) : w_{1:t} ∈ W^t} collects all the potential outcomes at time t.
Together, the assignments and potential outcome generate the output of the system.
Assumption 2.2.3 (Output). The output is $\{W_t, Y_t\}_{t\ge1} = \{W_t, Y_t(W_{1:t})\}_{t\ge1}$. The process $\{Y_t\}_{t\ge1}$ is called the outcome process.
The outcome process is the potential outcome process evaluated at the assignment process.
Finally, we assume that the assignment process is sequentially probabilistic, meaning that any assignment vector may be realized with positive probability at time $t$ given the history of the observable stochastic processes up to time $t - 1$. Let $\{\mathcal{F}_t\}_{t\ge1}$ denote the natural filtration generated by the output.

Assumption 2.2.4 (Sequentially probabilistic assignment process). The assignment process satisfies $0 < P(W_t = w \mid \mathcal{F}_{t-1}) < 1$ with probability one for all $w \in \mathcal{W}$, where the probabilities are determined by a probability measure $P$.
This is the time series analogue of the “overlap” condition in cross-sectional causal studies. We make this assumption throughout the paper in order to focus attention on the causal interpretation of common time series estimands in the presence of rich dynamic causal effects. Understanding how violations of Assumption 2.2.4 affect the causal interpretation and estimation of common estimands is an important direction for future work.

Definition 2.2.1 (Direct Potential Outcome System). Any $\{W_t, \{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}^t\}\}_{t\ge1}$ satisfying Assumptions 2.2.1–2.2.4 is a direct potential outcome system.
We refer to Definition 2.2.1 as a “direct” potential outcome system in order to emphasize that it focuses on nonparametrically modelling the direct causal effects of the assignment process $\{W_t\}$ on the outcomes $\{Y_t\}$. We do not, however, explicitly allow for the assignment $W_t$ to have a causal effect on future assignments $W_s$ for $s > t$. That is, we do not introduce a potential assignment $W_t(w_{1:t-1})$, which would model the assignment $W_t$ that would be realized along the assignment path $w_{1:t-1} \in \mathcal{W}^{t-1}$ and would open an indirect causal mechanism that allows the assignment $W_t$ to indirectly affect future outcomes through its effect on future assignments.² The assignment process $\{W_t\}$ in the direct potential outcome system can nonetheless still have rich dependence.

This framework matches a classical perspective in macroeconometrics that studies the dynamic causal effects of “shocks” on outcomes, which are thought to be underlying “random causes” that drive economic fluctuations and are causally unrelated to one another (Frisch, 1933; Slutzky, 1937; Sims, 1980). The empirical goal is therefore to trace out the dynamic causal effects of these primitive, economic shocks $\{W_t\}$ on macroeconomic outcomes $\{Y_t\}$. We refer the readers to Ramey (2016), Stock and Watson (2016), and Stock and Watson (2018) for recent discussions of this perspective in empirical macroeconomics.

²Such indirect causal mechanisms are often studied in a large biostatistics literature on longitudinal causal effects and dynamic treatment regimes – e.g., see Chapter 19 of Hernan and Robins (2019).
Remark 2.2.1 (Background processes). We could have further introduced a background process $\{X_t\}_{t\ge1}$ that is causally unaffected by the assignment process. Such a process would play the same role as pre-treatment covariates in cross-sectional causal studies.
Any comparison of the potential outcome process at a particular point in time along different possible realizations of the assignment process defines a dynamic causal effect. The dynamic causal effect at time $t$ for assignment path $w_{1:t} \in \mathcal{W}^t$ and counterfactual path $w'_{1:t} \in \mathcal{W}^t$ is $Y_t(w_{1:t}) - Y_t(w'_{1:t})$. Of course, this is an enormous class of dynamic causal effects, as there are exponentially many possible paths $w_{1:t} \in \mathcal{W}^t$. We therefore introduce causal estimands that average over these dynamic causal effects along various underlying assignment paths.
To do so, let us introduce some shorthand. For $t \ge 1$, $h \ge 0$, and any fixed $w \in \mathcal{W}$, write the time-$(t+h)$ potential outcome under an intervention that sets the full time-$t$ assignment to $w$ as $Y_{t+h}(w) := Y_{t+h}(W_{1:t-1}, w, W_{t+1:t+h})$.

Definition 2.2.2 (Dynamic causal effects). For $t \ge 1$, $h \ge 0$, and any fixed $w, w' \in \mathcal{W}$, the time-$t$, $h$-period ahead impulse causal effect, filtered treatment effect, and average treatment effect are, respectively:

$$Y_{t+h}(w) - Y_{t+h}(w'), \quad E[Y_{t+h}(w) - Y_{t+h}(w') \mid \mathcal{F}_{t-1}], \quad E[Y_{t+h}(w) - Y_{t+h}(w')].$$
The impulse causal effect measures the ceteris paribus causal effect of intervening to switch the time-$t$ assignment from $w'$ to $w$ on the $h$-period ahead outcomes, holding all else fixed along the assignment process. The impulse causal effect is a random object: the potential outcome process itself is stochastic, as are the past assignments $W_{1:t-1}$ and the future assignments $W_{t+1:t+h}$. The filtered treatment effect averages the impulse causal effect conditional on the natural filtration $\mathcal{F}_{t-1}$. We use the term “filtered”³ following the stochastic process literature, where filtering refers to the sequential estimation of time-varying unobserved variables, e.g., the Kalman filter (Kalman, 1960; Durbin and Koopman, 2012), the particle filter (Gordon et al., 1993; Pitt and Shephard, 1999; Chopin and Papaspiliopoulos, 2020), and discrete hidden Markov models (Baum and Petrie, 1966; Hamilton, 1989). This labelling fits that tradition, as the filtered treatment effect is a time-varying conditional expectation. Finally, the average treatment effect further averages the filtered treatment effect over the filtration, yielding the unconditional expectation of the impulse causal effect $Y_{t+h}(w) - Y_{t+h}(w')$.
Remark 2.2.2. If new outcome variables were added to an existing causal study, the impulse causal effect and the average treatment effect for the existing variables would not change, but the filtered treatment effect might, as the new outcome variables would enlarge the filtration and so possibly change the conditional expectation.
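To make these definitions concrete, the following sketch simulates a toy direct potential outcome system (a hypothetical linear specification with invented coefficients, not anything estimated in this chapter) and computes an impulse causal effect by switching only the time-$t$ assignment while holding the rest of the path fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
theta0, theta1 = 1.0, 0.5      # hypothetical contemporaneous and lag-1 causal coefficients

def potential_outcome(w_path, eps, t):
    """Y_t(w_{1:t}) for a toy linear system: Y_t = theta0*w_t + theta1*w_{t-1} + eps_t."""
    lagged = w_path[t - 1] if t >= 1 else 0.0
    return theta0 * w_path[t] + theta1 * lagged + eps[t]

eps = rng.standard_normal(T)                    # stochastic component of the outcomes
w = rng.integers(0, 2, size=T).astype(float)    # sequentially probabilistic binary assignment

# Impulse causal effect at time t0, horizon h = 1: switch the time-t0 assignment
# from 0 to 1, holding all other assignments along the path fixed.
t0, h = 250, 1
w_hi, w_lo = w.copy(), w.copy()
w_hi[t0], w_lo[t0] = 1.0, 0.0
impulse_effect = potential_outcome(w_hi, eps, t0 + h) - potential_outcome(w_lo, eps, t0 + h)
print(impulse_effect)   # equals theta1 = 0.5 here, since this toy system is linear
```

In a nonlinear system this difference would be a genuinely random object; averaging it conditionally on $\mathcal{F}_{t-1}$ or unconditionally gives the filtered and average treatment effects.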
We further define analogous versions of the dynamic causal effects for a particular scalar assignment. For $k = 1, \ldots, d_w$ and fixed $w_k, w'_k \in \mathcal{W}_k$, the corresponding time-$t$, $h$-period ahead impulse causal effect, filtered treatment effect, and average treatment effect are, respectively:

$$Y_{t+h}(w_k) - Y_{t+h}(w'_k), \quad E[Y_{t+h}(w_k) - Y_{t+h}(w'_k) \mid \mathcal{F}_{t-1}], \quad E[Y_{t+h}(w_k) - Y_{t+h}(w'_k)],$$

where $Y_{t+h}(w_k) := Y_{t+h}(W_{1:t-1}, W_{1:k-1,t}, w_k, W_{k+1:d_w,t}, W_{t+1:t+h})$.
The dynamic causal effects in Definition 2.2.2 summarize the causal effect of discrete interventions to switch the time-$t$ assignments on the outcomes. We finally introduce derivatives that summarize marginal causal effects of incrementally varying the time-$t$ assignment (see, for example, Angrist and Imbens (1995) and Angrist et al. (2000) for analogous definitions in cross-sectional settings). The quantities

$$Y'_{t+h}(w_k) := \frac{\partial Y_{t+h}(w_k)}{\partial w_k}, \quad E[Y'_{t+h}(w_k) \mid \mathcal{F}_{t-1}], \quad E[Y'_{t+h}(w_k)]$$

are called the time-$t$, $h$-period ahead marginal impulse causal effect, the marginal filtered treatment effect, and the marginal average treatment effect, respectively.

³We note that Lee and Salanie (2020) also use the phrase “filtered treatment effect” in analyzing a cross-sectional setting.
2.2.3 Links to macroeconometrics
Before continuing, we highlight how the direct potential outcome system naturally links to several recent developments and debates in macroeconometrics and encompasses many familiar causal models as special cases.

First, the direct potential outcome system provides a unifying framework to analyze what assumptions must be placed on the assignment process to endow causal meaning to common estimands without imposing functional form restrictions. Leading causal models in macroeconometrics, such as the structural vector moving average, assume linearity. However, this nullifies state-dependence and asymmetry in dynamic causal effects. Researchers recognize the restrictiveness of linearity, yet attempt to weaken it on a case-by-case basis: for example, on the possible nonlinear effects of oil prices (Kilian and Vigfusson, 2011b,a; Hamilton, 2011); on the nonlinear and state dependent effects of monetary policy (Tenreyro and Thwaites, 2016; Jordá et al., 2020; Aruoba et al., 2021; Mavroeidis, 2021); and on state-dependent fiscal multipliers (Auerbach and Gorodnichenko, 2012b,a; Ramey and Zubairy, 2018; Cloyne et al., 2020). Similarly, the direct potential outcome system does not rely on “invertibility” or “recoverability” assumptions about the assignment and potential outcome processes (Chahrour and Jurado, 2021). Understanding what can be identified about dynamic causal effects without relying on these assumptions is an active area (Stock and Watson, 2018; Plagborg-Møller, 2019; Plagborg-Møller and Wolf, 2020).
Second, a growing empirical literature in macroeconomics seeks to estimate dynamic causal effects in settings where researchers directly observe both the assignments and the outcomes. Researchers construct measures of the underlying economic shocks of interest $W_t$, and then use these constructed shocks in reduced-form methods such as local projections (Jordá, 2005) or autoregressive distributed lag models (Baek and Lee, 2021). This line of work has recently been called “direct causal inference” by Nakamura and Steinsson (2018b) in order to contrast it with the dominant model-based approach to causal inference in macroeconomics in the tradition of Sims (1980). We refer the reader to Nakamura and Steinsson (2018b), Goncalves et al. (2021), and Baek and Lee (2021) for recent discussions of this growing empirical literature. The direct potential outcome system provides a causal foundation for such reduced-form methods in time series, elucidating the assumptions that the constructed shock must satisfy in order for the reduced-form estimands to carry a causal interpretation.

Many leading causal models in macroeconomics can be cast as special cases of the direct potential outcome system that place additional restrictions on the potential outcome process.
Example 2.2.1 (Structural vector moving average (SVMA) model). The SVMA model is the leading workhorse model for studying dynamic causal effects in macroeconometrics (e.g., Kilian and Lutkepohl, 2017; Stock and Watson, 2018). Any infinite-order SVMA model can be expressed as a direct potential outcome system by assuming that the potential outcome process satisfies the functional form restriction

$$Y_t(w_{1:t}) := \sum_{l=0}^{t-1} \Theta_l w_{t-l} + Y_t^{*},$$

where $\{W_t\}_{t\ge1}$ is the assignment process, $\{\Theta_l\}_{0 \le l < t}$ is a sequence of lag-coefficient matrices, and $\{Y_t^{*}\}_{t\ge1}$ is a stochastic process that is causally unaffected by the assignment process. In this sense, the SVMA model imposes that the potential outcome process is linear in the assignment process. This mapping requires no assumptions on the dimensionality of the assignment process $d_w$, the dimensionality of the potential outcome process $d_y$, nor the lag-coefficient matrices. As discussed in Plagborg-Møller and Wolf (2020), such an infinite-order SVMA model is consistent with all discrete-time Dynamic Stochastic General Equilibrium models as well as all stable, linear structural vector autoregression (SVAR) models. We further discuss the SVMA model below.
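Since the SVMA potential outcomes are linear in the assignments, the $h$-period ahead impulse causal effect is the lag coefficient $\Theta_h$ regardless of the rest of the path. A small sketch with invented scalar coefficients $\Theta_l = 0.8^l$ illustrates this:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
theta = 0.8 ** np.arange(T)         # hypothetical scalar lag coefficients Theta_l
y_star = rng.standard_normal(T)     # process causally unaffected by the assignments
w = rng.standard_normal(T)          # assignment ("shock") process

def svma_outcome(w_path, t):
    """Y_t(w_{1:t}) = sum_l Theta_l * w_{t-l} + Y*_t (0-based time index)."""
    lags = w_path[: t + 1][::-1]    # w_t, w_{t-1}, ..., w_0
    return theta[: t + 1] @ lags + y_star[t]

t0, h = 50, 3
w_hi, w_lo = w.copy(), w.copy()
w_hi[t0], w_lo[t0] = 1.0, 0.0
effect = svma_outcome(w_hi, t0 + h) - svma_outcome(w_lo, t0 + h)
print(effect, theta[h])   # both equal 0.8**3: the impulse causal effect is deterministic
```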
Example 2.2.2 (Nonlinear structural vector autoregressions (SVAR)). Recent advances in nonlinear SVARs can also be cast as special cases of the direct potential outcome system. As an illustration, consider the motivating example in Goncalves et al. (2021), which is a nonlinear SVAR of the form

$$Y_{1,t}(w_{1:t}) = w_{1,t}, \qquad Y_{2,t}(w_{1:t}) = b + \beta Y_{1,t}(w_{1:t}) + \rho Y_{2,t-1}(w_{1:t-1}) + c f(Y_{1,t}(w_{1:t})) + w_{2,t},$$

where $f$ is a nonlinear function. Given a stochastic initial condition $Y_{2,0} := \epsilon_{2,0}$ that is causally unaffected by the assignment process, iterating this system of equations forward arrives at a potential outcome process $Y_{1,t}(w_{1:t}) = w_{1,t}$ and $Y_{2,t}(w_{1:t}) = g_{2,t}(w_{1:t}, \epsilon_{2,0}; \theta)$, where $g_{2,t}$ is a known function and $\theta := (b, c, \beta, \rho)$ collects the parameters. This is a direct potential outcome system in which (1) $Y_{1,t}(w_{1:t})$ is non-random and only depends on the contemporaneous assignment, and (2) the randomness in $Y_{2,t}(w_{1:t})$ is driven by the initial condition. Other recent examples of nonlinear SVARs include Aruoba et al. (2021) and Mavroeidis (2021).
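A minimal forward-iteration sketch of such a system (with invented parameter values and $f(x) = \max(x, 0)$ as a hypothetical nonlinearity) shows how iterating the structural equations delivers the potential outcome process, and how the nonlinearity produces asymmetric impulse causal effects:

```python
import numpy as np

def f(x):
    """Hypothetical nonlinearity: censoring at zero."""
    return np.maximum(x, 0.0)

b, beta, rho, c = 0.1, 0.5, 0.9, 1.0    # invented values for theta = (b, c, beta, rho)

def y2_potential(w1, w2, y2_init):
    """Forward-iterate to obtain Y_{2,T}(w_{1:T}) given the initial condition."""
    y2 = y2_init
    for t in range(len(w1)):
        y1 = w1[t]                      # Y_{1,t}(w_{1:t}) = w_{1,t}
        y2 = b + beta * y1 + rho * y2 + c * f(y1) + w2[t]
    return y2

T = 5
w2 = np.zeros(T)
base = np.zeros(T)
up, down = base.copy(), base.copy()
up[0], down[0] = 1.0, -1.0
# Positive and negative interventions on w_{1,1} propagate asymmetrically through f:
effect_up = y2_potential(up, w2, 0.0) - y2_potential(base, w2, 0.0)
effect_down = y2_potential(base, w2, 0.0) - y2_potential(down, w2, 0.0)
print(effect_up, effect_down)   # 0.9**4 * 1.5 versus 0.9**4 * 0.5: state-dependent effects
```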
Example 2.2.3 (Potential outcome model in Angrist and Kuersteiner (2011) and Angrist et al. (2018)). Angrist and Kuersteiner (2011) and Angrist et al. (2018) introduce a potential outcome model for time series settings that is a special case of the direct potential outcome system. Using our notation, they assume

$$Y_{1,t}(w_{1:t}) = f_{1,t}(Y_{1:t-1}(w_{1:t-1}), w_{1,t}; \epsilon_0), \qquad Y_{2,t}(w_{1:t}) = f_{2,t}(Y_{1,t}(w_{1:t}), w_{2,t}, w_{1:t-1}; \epsilon_0),$$

where $f_{1,t}, f_{2,t}$ are deterministic functions and $\epsilon_0$ is a random initial condition. These structural equations impose that $w_{1:t}$ only impacts $Y_{1,t}$ through $w_{1,t}$ directly and through $Y_{1:t-1}$ indirectly. Further, $w_{2,1:t}$ only impacts $Y_{2,t}$ contemporaneously. Related thinking includes White and Kennedy (2009) and White and Lu (2010). Through forward iteration of the system starting at $t = 1$, this can also be expressed as a direct potential outcome system. In this system of structural equations, the authors defined the collection of time-$(t+h)$ potential outcomes as $\{Y_{t+h}(w^{obs}_{1:t-1}, w, W_{t+1:t+h}) : w \in \mathcal{W}\}$ and focused on $E[Y_{t+h}(w^{obs}_{1:t-1}, w, W_{t+1:t+h}) - Y_{t+h}(w^{obs}_{1:t-1}, w', W_{t+1:t+h})]$, which they called the “average policy effect.”
Example 2.2.4 (Expectations). Macroeconomists often consider how assignments are influenced by the distribution of future outcomes and how outcomes in turn vary with assignments. For example, consumers and firms are modelled as forward-looking, so expectations about future outcomes influence behavior today. Consider a simple optimization-based version (e.g., Lucas, 1972; Sargent, 1981) in which the time-$t$ assignment is the time-$t$ component of a plan solving

$$\max_{w_{t:T} \in \mathcal{W}^{T-t+1}} E[U(Y_{t:T}(w^{obs}_{1:t-1}, w_{t:T}), w_{t:T}) \mid \mathcal{F}_{t-1}],$$

where $U$ is a utility function of future outcomes and assignments, while $\mathcal{F}_{t-1}$ is written out in long hand as $y^{obs}_{1:t-1}, w^{obs}_{1:t-1}$. For each possible $w_{t:T} \in \mathcal{W}^{T-t+1}$, the expectation is over the law of $Y_{t:T}(w^{obs}_{1:t-1}, w_{t:T}) \mid y^{obs}_{1:t-1}, w^{obs}_{1:t-1}$. This decision rule delivers the output $\{W_t, Y_t(W_{1:t})\}_{t\ge1}$. This looks like a direct potential outcome system since Assumption 2.2.2 holds. The assignment $W_t$ could be a deterministic function of past data if the optimal choice is unique, which would violate Assumption 2.2.4; however, incorporating noise in the decision rule restores it.
In this section, we establish nonparametric conditions under which common statistical estimands based on assignments and outcomes have causal meaning in the direct potential outcome system $\{W_t, \{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}^t\}\}_{t\ge1}$, where researchers observe the realized assignments and realized outcomes. In particular, we ask whether the following statistical estimands have causal meaning: the impulse response function, the local projection, the generalized impulse response function, and the local filtered projection. Table 2.1 defines these estimands and summarizes our main results on their causal interpretation under important restrictions on the assignment process and other technical conditions.
Table 2.1: Top line results for the causal interpretation of common estimands based on assignments and outcomes.

Impulse Response Function: $E[Y_{t+h} \mid W_{k,t} = w_k] - E[Y_{t+h} \mid W_{k,t} = w'_k]$ identifies $E[Y_{t+h}(w_k) - Y_{t+h}(w'_k)]$.

Local Projection: $\dfrac{\mathrm{Cov}(Y_{t+h}, W_{k,t})}{\mathrm{Var}(W_{k,t})}$ identifies $\dfrac{\int_{\mathcal{W}_k} E[Y'_{t+h}(w_k)] E[G_t(w_k)] \, dw_k}{\int_{\mathcal{W}_k} E[G_t(w_k)] \, dw_k}$.

Generalized Impulse Response Function: $E[Y_{t+h} \mid W_{k,t} = w_k, \mathcal{F}_{t-1}] - E[Y_{t+h} \mid W_{k,t} = w'_k, \mathcal{F}_{t-1}]$ identifies $E[Y_{t+h}(w_k) - Y_{t+h}(w'_k) \mid \mathcal{F}_{t-1}]$.

Local Filtered Projection: $\dfrac{E[\{Y_{t+h} - \hat{Y}_{t+h|t-1}\}\{W_{k,t} - \hat{W}_{k,t|t-1}\}]}{E[\{W_{k,t} - \hat{W}_{k,t|t-1}\}^2]}$ identifies $\dfrac{\int_{\mathcal{W}_k} E[E[Y'_{t+h}(w_k) \mid \mathcal{F}_{t-1}] E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}]] \, dw_k}{\int_{\mathcal{W}_k} E[G_{t|t-1}(w_k)] \, dw_k}$.

Notes: This table summarizes the main results for the causal interpretation of common estimands based on assignments and outcomes. Here $h \ge 0$, $w_k, w'_k \in \mathcal{W}_k$, $G_t(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t}])$ and $G_{t|t-1}(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t} \mid \mathcal{F}_{t-1}])$, while $\hat{Y}_{t+h|t-1} := E[Y_{t+h} \mid \mathcal{F}_{t-1}]$ and $\hat{W}_{k,t|t-1} := E[W_{k,t} \mid \mathcal{F}_{t-1}]$. Note that $E[G_t(w_k)] \ge 0$ and $E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}] \ge 0$.
In this section, there is no loss of generality in assuming the outcome $Y_{t+h}$ is univariate, since the results apply element by element to a multivariate outcome.
2.3.1 Impulse Response Function
We begin by determining the conditions under which the unconditional impulse response function (Sims, 1980) is the $h$-period ahead average treatment effect. For $h \ge 0$ and deterministic $w_k, w'_k \in \mathcal{W}_k$, define $IRF_{k,t,h}(w_k, w'_k) := E[Y_{t+h} \mid W_{k,t} = w_k] - E[Y_{t+h} \mid W_{k,t} = w'_k]$. Theorem 2.3.1 shows that $IRF_{k,t,h}(w_k, w'_k)$ can be decomposed into the average treatment effect and a selection bias term.

Theorem 2.3.1. Assume a direct potential outcome system, consider some $k = 1, \ldots, d_w$, $t \ge 1$, $h \ge 0$, and fix deterministic $w_k, w'_k \in \mathcal{W}_k$ with $E[|Y_{t+h}(w_k) - Y_{t+h}(w'_k)|] < \infty$. Then

$$IRF_{k,t,h}(w_k, w'_k) = E[Y_{t+h}(w_k) - Y_{t+h}(w'_k)] + \Delta_{k,t,h}(w_k, w'_k),$$

where

$$\Delta_{k,t,h}(w_k, w'_k) := \frac{\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\})}{E[1\{W_{k,t} = w_k\}]} - \frac{\mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\})}{E[1\{W_{k,t} = w'_k\}]}.$$
The impulse response function is therefore equal to the average treatment effect if and only if the selection bias term $\Delta_{k,t,h}(w_k, w'_k) = 0$. A sufficient condition for this to hold is that the two covariance terms are zero.

Notice that these covariance terms depend on how the assignment $W_{k,t}$ covaries with the potential outcome $Y_{t+h}(w_k)$. Since $Y_{t+h}(w_k) := Y_{t+h}(W_{1:t-1}, W_{1:k-1,t}, w_k, W_{k+1:d_w,t}, W_{t+1:t+h})$ by definition, the selection bias therefore depends on how the assignment $W_{k,t}$ relates to the past assignments, the other contemporaneous assignments, the future assignments, and the potential outcomes. If

$$\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\}) = 0, \quad \mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\}) = 0, \qquad (2.3)$$

then $\Delta_{k,t,h}(w_k, w'_k) = 0$. Moreover, (2.3) is satisfied if

$$W_{k,t} \perp\!\!\!\perp Y_{t+h}(w_k) \quad \text{and} \quad W_{k,t} \perp\!\!\!\perp Y_{t+h}(w'_k). \qquad (2.4)$$

Equation (2.6) says the selection bias is zero if the assignment $W_{k,t}$ is randomized in the sense that it is jointly independent of the other assignments and the potential outcomes.
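The role of the selection bias term can be seen in a small Monte Carlo (a stylized sketch with invented parameters, collapsing the time dimension to its essentials): when the assignment is independent of the potential outcomes, the difference in means recovers the average treatment effect; when it covaries with them, it does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

u = rng.standard_normal(n)         # unobserved component shared by the potential outcomes
y1, y0 = 1.0 + u, 0.0 + u          # potential outcomes Y(1), Y(0); true ATE = 1

# Randomized assignment: both covariance terms vanish, so Delta = 0.
w_rand = rng.integers(0, 2, size=n)
irf_rand = y1[w_rand == 1].mean() - y0[w_rand == 0].mean()

# Selected assignment: W covaries with the potential outcomes, so Delta != 0.
w_sel = (u + 0.5 * rng.standard_normal(n) > 0).astype(int)
irf_sel = y1[w_sel == 1].mean() - y0[w_sel == 0].mean()

print(irf_rand, irf_sel)   # roughly 1.0 versus roughly 2.4: selection inflates the IRF
```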
Recent reviews on dynamic causal effects in macroeconometrics by Ramey (2016) and Stock and Watson (2018) argue intuitively that the impulse response functions of observed outcomes to “shocks” in parametric structural models, such as the SVMA, are analogous to average treatment effects.⁴ However, these statements rely on either intuitive descriptions of the statistical properties of shocks⁵ or on a specific parametric model for the potential outcome process to link the impulse response function to an average dynamic causal effect. Theorem 2.3.2 clarifies that if the assignment $W_{k,t}$ is randomly assigned in the sense of (2.6), then the impulse response function nonparametrically identifies an average treatment effect in the direct potential outcome system. In this sense, Theorem 2.3.2 provides a nonparametric causal foundation for this intuition.

Furthermore, Theorems 2.3.1–2.3.2 clarify a recent empirical literature that seeks to directly construct measures of the shocks of interest and measure dynamic causal effects through reduced-form estimates of impulse response functions, so-called “direct causal inference” (e.g., see
⁴Stock and Watson (2018) write on pg. 922: “The macroeconometric jargon for this random treatment is a ’structural shock:’ a primitive, unanticipated economic force, or driving impulse, that is unforecastable and uncorrelated with other shocks. The macroeconomist’s shock is the microeconomists’ random treatment, and impulse response functions are the causal effects of those treatments on variables of interest over time, that is, dynamic causal effects.”
⁵Ramey (2016) writes on pg. 75, “the shocks should have the following characteristics: (1) they should be exogenous
with respect to the other current and lagged endogenous variables in the model; (2) they should be uncorrelated with
other exogenous shocks; otherwise, we cannot identify the unique causal effects of one exogenous shock relative to
another; and (3) they should represent either unanticipated movements in exogenous variables or news about future
movements in exogenous variables.”
Nakamura and Steinsson, 2018b; Baek and Lee, 2021). In order for researchers to causally interpret such reduced-form estimates as nonparametrically identifying an average treatment effect, the constructed shocks must be randomly assigned in the sense of (2.6).
2.3.2 Local Projection

Under the conditions of Theorem 2.3.1, impulse response functions are causal, but nonparametric estimation of the conditional expectations that define them may be demanding. When the assignment is observed by the researcher, it is therefore common to estimate impulse response functions using “local projections” (Jordá, 2005), which directly regress the $h$-step ahead outcome on a constant and the time-$t$ assignment. The associated local projection estimand is

$$LP_{k,t,h} := \frac{\mathrm{Cov}(Y_{t+h}, W_{k,t})}{\mathrm{Var}(W_{k,t})}. \qquad (2.7)$$
Theorem 2.3.3 establishes that $LP_{k,t,h}$ identifies a weighted average of marginal causal effects of $W_{k,t}$ on $Y_{t+h}$.

Theorem 2.3.3. Under the same conditions as Theorem 2.3.1, further assume that:

i. Differentiability: $Y_{t+h}(w_k)$ is continuously differentiable in $w_k$ on $\mathcal{W}_k = [\underline{w}_k, \overline{w}_k] \subset \mathbb{R}$.

ii. Moments: the relevant expectations exist.

iii. Independence: $W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}$.

Then, if it exists,

$$LP_{k,t,h} = \frac{\int_{\mathcal{W}_k} E[Y'_{t+h}(w_k)] E[G_t(w_k)] \, dw_k}{\int_{\mathcal{W}_k} E[G_t(w_k)] \, dw_k},$$

where $G_t(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t}])$, noting $E[G_t(w_k)] \ge 0$.
The local projection estimand $LP_{k,t,h}$ is therefore a weighted average of marginal average treatment effects of $W_{k,t}$ on $Y_{t+h}$, where the weights $E[G_t(w_k)]$ are non-negative and sum to one. Thus, if the assignment $W_{k,t}$ is a shock in the sense stated in Theorem 2.3.2, the local projection estimand retains a causal interpretation.
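As a sketch (hypothetical linear data-generating process with a randomized shock), the local projection estimand is simply the slope of an OLS regression of $Y_{t+h}$ on $W_t$; with linear potential outcomes, the weighted average of marginal effects collapses to the single coefficient $\theta_h$:

```python
import numpy as np

rng = np.random.default_rng(3)
T, h = 100_000, 2
theta = np.array([1.0, 0.6, 0.36])   # invented dynamic causal coefficients theta_0..theta_2

w = rng.standard_normal(T)           # randomized assignment ("shock") process
y = np.convolve(w, theta)[:T] + rng.standard_normal(T)   # Y_t = sum_l theta_l w_{t-l} + noise

# Local projection: slope of Y_{t+h} on W_t, i.e. Cov/Var.
yh, wt = y[h:], w[:-h]
lp = np.cov(yh, wt)[0, 1] / np.var(wt, ddof=1)
print(lp)   # close to theta_h = 0.36
```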
2.3.3 Generalized Impulse Response Function
In nonlinear time series models, it is common to focus on the conditional version of the impulse response function, the $h$-period ahead generalized impulse response function (Gallant et al., 1993; Koop et al., 1996):

$$GIRF_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) := E[Y_{t+h} \mid W_{k,t} = w_k, \mathcal{F}_{t-1}] - E[Y_{t+h} \mid W_{k,t} = w'_k, \mathcal{F}_{t-1}]. \qquad (2.8)$$

Mirroring our analysis of the impulse response function, we next show that $GIRF_{k,t,h}$ can be decomposed into the filtered treatment effect and a selection bias term.

Theorem 2.3.4. Assume a direct potential outcome system, some $k = 1, \ldots, d_w$, $t \ge 1$, and $h \ge 0$, and that $E[|Y_{t+h}(w_k) - Y_{t+h}(w'_k)| \mid \mathcal{F}_{t-1}] < \infty$. Then, for any deterministic $w_k, w'_k \in \mathcal{W}_k$,

$$GIRF_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) = E[Y_{t+h}(w_k) - Y_{t+h}(w'_k) \mid \mathcal{F}_{t-1}] + \Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}),$$

where

$$\Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) := \frac{\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1})}{E[1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1}]} - \frac{\mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\} \mid \mathcal{F}_{t-1})}{E[1\{W_{k,t} = w'_k\} \mid \mathcal{F}_{t-1}]}.$$
A sufficient condition for the selection bias term $\Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1})$ to equal zero is that the two conditional covariances are zero. Repeating the unconditional case, Theorem 2.3.5 provides sufficient conditions such that the selection bias term is equal to zero:

$$\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1}) = 0, \quad \mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\} \mid \mathcal{F}_{t-1}) = 0. \qquad (2.9)$$
Therefore, under (2.9), the selection bias $\Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) = 0$ and the generalized impulse response function identifies the filtered impulse causal effect. Notice how much weaker (2.12) is than (2.6), as it allows the assignment to depend flexibly on the past realised data, in the spirit of selection-on-observables conditions from cross-sectional causal inference. That is, it imposes that, conditional on the history up to time $t - 1$, the assignment $W_{k,t}$ must be as good as randomly assigned. However, recall that the notation $Y_{t+h}(w_k)$ buries dependence on (i) the other contemporaneous assignments $W_{1:k-1,t}, W_{k+1:d_w,t}$; (ii) the future assignments $W_{t+1:t+h}$; and (iii) the potential outcomes at time $t + h$. Therefore, (2.12) in Theorem 2.3.5 provides further sufficient conditions under which (2.11) is satisfied, highlighting that it is sufficient to impose that the assignment $W_{k,t}$ is jointly independent of all other contemporaneous and future assignments as well as the potential outcomes, conditional on $\mathcal{F}_{t-1}$.
Remark 2.3.1. How do the conditions in Theorem 2.3.2 relate to the conditions in Theorem 2.3.5? Applying the law of iterated expectations to (2.9) yields $E[\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1})] = 0$, which does not imply the unconditional restriction (2.3). Hence, the conditional and unconditional cases are non-nested. If we instead work probabilistically, then the condition

$$W_{k,t} \perp\!\!\!\perp \left( W_{1:t-1}, W_{1:k-1,t}, W_{k+1:d_w,t}, W_{t+1:t+h}, \{Y_{1:t+h}(w_{1:t+h}) : w_{1:t+h} \in \mathcal{W}^{t+h}\} \right),$$

which strengthens (2.6) to additionally require independence of the full potential outcome process, implies the condition (2.12). This second point is important practically. The generalized impulse response function tells us the filtered treatment effect provided that $[W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}] \mid \mathcal{F}_{t-1}$. A temporally averaged generalized impulse response function therefore tells us the average treatment effect without the need to employ the harsher condition $W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}$, as it sidesteps the use of the impulse response function.
2.3.4 Generalized Local Projection and Local Filtered Projection Estimands
Under the same conditions as Theorem 2.3.3, but replacing condition (iii) with Equation (2.11), the generalized local projection estimand satisfies

$$\frac{\mathrm{Cov}(Y_{t+h}, W_{k,t} \mid \mathcal{F}_{t-1})}{\mathrm{Var}(W_{k,t} \mid \mathcal{F}_{t-1})} = \frac{\int_{\mathcal{W}_k} E[Y'_{t+h}(w_k) \mid \mathcal{F}_{t-1}] E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}] \, dw_k}{\int_{\mathcal{W}_k} E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}] \, dw_k},$$

where $G_{t|t-1}(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t} \mid \mathcal{F}_{t-1}])$, noting $E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}] \ge 0$. The generalized local projection estimand is therefore a weighted average of marginal filtered treatment effects of $W_{k,t}$ on $Y_{t+h}$, where the weights now depend on the natural filtration but are still non-negative and sum to one.
Of more practical importance is the local projection of $Y_{t+h} - \hat{Y}_{t+h|t-1}$ on $W_{k,t} - \hat{W}_{k,t|t-1}$, where $\hat{Y}_{t+h|t-1} := E[Y_{t+h} \mid \mathcal{F}_{t-1}]$ and $\hat{W}_{k,t|t-1} := E[W_{k,t} \mid \mathcal{F}_{t-1}]$. We call the associated estimand the local filtered projection estimand. Under the same conditions as needed for the generalized local projection, plus the existence of the relevant unconditional moments, the local filtered projection estimand equals the expression given in Table 2.1. This is a long-run weighted average of the marginal filtered causal effects, and the weights are non-negative.
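The local filtered projection can be sketched by residualizing the outcome and the assignment against their conditional means given $\mathcal{F}_{t-1}$ (known and linear in a single observed state variable in this invented example) before regressing:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
tau, gamma = 0.5, 1.0                     # invented causal effect and confounding loadings

x = rng.standard_normal(n)                # an F_{t-1}-measurable state variable
w = 0.8 * x + rng.standard_normal(n)      # assignment is predictable from the past...
y = tau * w + gamma * x + rng.standard_normal(n)   # ...which also moves the outcome

# The plain local projection is contaminated by the predictable component:
lp_raw = np.cov(y, w)[0, 1] / np.var(w, ddof=1)

# Local filtered projection: subtract E[W_t | F_{t-1}] and E[Y_{t+h} | F_{t-1}].
w_res = w - 0.8 * x
y_res = y - (tau * 0.8 + gamma) * x
lp_filtered = np.mean(y_res * w_res) / np.mean(w_res ** 2)
print(lp_raw, lp_filtered)   # roughly 0.99 versus roughly 0.5 = tau
```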
We now use a special case of the direct potential outcome system to incorporate instrumental variables for the assignment process. This is useful as a rapidly growing literature in macroeconomics exploits instruments to identify dynamic causal effects (e.g., see Jordá et al., 2015; Gertler and Karadi, 2015; Ramey and Zubairy, 2018; Stock and Watson, 2018; Plagborg-Møller and Wolf, 2020; Jordá et al., 2020, among many others). Section 2.5 details the case where the researcher observes the assignments, the instruments, and the outcomes. Section 2.6 considers the case where only the instruments and the outcomes are observed.
The instrumented potential outcome system then imposes two further assumptions on the potential outcome system: (i) $\{V_t\}_{t\ge1}$ splits into an “instrument” $\{Z_t\}_{t\ge1}$ and a “potential assignment” $\{W_t(z_t) : z_t \in \mathcal{W}_Z\}_{t\ge1}$ that is only causally affected by the contemporaneous instrument, meaning $V_t = (Z_t, W_t(Z_t))$; and (ii) the potential outcome process is only affected by the assignments $W_{1:t}$.

Formally, (i) requires that

$$W_{k,t}(\{z_s\}_{s\ge1}) = W_{k,t}(z'_{1:t-1}, z_t, \{z'_s\}_{s\ge t+1})$$

almost surely, for all $t \ge 1$ and all deterministic $\{z_t\}_{t\ge1}$ and $\{z'_t\}_{t\ge1}$. Write the potential assignments as $\{W_t(z_t) = (W_{1:k-1,t}, W_{k,t}(z_t), W_{k+1:d_W,t}) : z_t \in \mathcal{W}_Z\}$, while the assignment is $W_t = W_t(Z_t)$. Condition (ii) requires that

$$Y_t((w_1, z_1), \ldots, (w_t, z_t)) = Y_t((w_1, z'_1), \ldots, (w_t, z'_t))$$

almost surely for all $w_{1:t} \in \mathcal{W}_W^t$ and $z_{1:t}, z'_{1:t} \in \mathcal{W}_Z^t$. Write the potential outcomes as $\{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}_W^t\}$, while $Z_t$ and $\{Z_t\}_{t\ge1}$ are called the “contemporaneous instrument” and the instrument process, respectively.

Any $\{Z_t, \{W_t(z_t) : z_t \in \mathcal{W}_Z\}, \{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}_W^t\}\}_{t\ge1}$ satisfying (i)–(iii) is an instrumented potential outcome system.
The simplest case is when both the assignment and the instrument are scalar and binary, $\mathcal{W}_W = \{0, 1\}$, $\mathcal{W}_Z = \{0, 1\}$. In this case, the instrument $Z_t = 1$ corresponds to “intention to treat” and $Z_t = 0$ to “intention to control.” There is treatment and control as intended when $W_t(1) = 1$ and $W_t(0) = 0$.

Assumption (i) imposes that $Z_t$ is only an instrument for the time-$t$, $k$-th assignment. This matches empirical practice in macroeconomics, where a constructed instrument is often “targeted” towards a single economic shock of interest; for example, empirical researchers construct proxies for a monetary policy shock (e.g., Gertler and Karadi, 2015; Nakamura and Steinsson, 2018a; Jordá et al., 2020) or a fiscal policy shock (Ramey and Zubairy, 2018). Assumption (ii) is the familiar outcome exclusion restriction on the instrument from cross-sectional causal inference.

To use this structure, we also need a type of “relevance” condition on the instrument. Such a condition appears in each of the results below.
We now study the conditions under which leading statistical estimands based on assignments, instruments, and outcomes have causal meaning in the context of an instrumented potential outcome system $\{Z_t, \{W_t(z_t) : z_t \in \mathcal{W}_Z\}, \{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}_W^t\}\}_{t\ge1}$. We consider the case in which the researcher observes the instruments, the assignments, and the outcomes $\{z_t^{obs}, w_t^{obs}, y_t^{obs}\}_{t\ge1}$. Since the assignments themselves are assumed to be directly observable, we focus on dynamic IV estimands that take the ratio of an impulse response function of the outcome on the instrument relative to the impulse response function of the assignment on the instrument. We show that such dynamic IV estimands identify local average impulse causal effects in the sense of Imbens and Angrist (1994), Angrist et al. (1996), and Angrist et al. (2000). Our results in this section are most closely related to Jordá et al. (2020), who used a potential outcome model, analogous to that introduced in Angrist et al. (2018), to understand the causal content of local projection IV estimators.

In particular, we ask whether the following statistical estimands have causal meaning: the Wald estimand, the IV estimand, the generalized Wald estimand, and the filtered IV estimand. Table 2.2 defines these estimands and summarizes our main results on their causal interpretation under important restrictions on the assignment process and other technical conditions. The rest of this section formalizes these results.
Table 2.2: Top line results for the causal interpretation of common estimands based on assignments, instruments and outcomes.

Wald: $\dfrac{E[Y_{t+h} \mid Z_t = z] - E[Y_{t+h} \mid Z_t = z']}{E[W_{k,t} \mid Z_t = z] - E[W_{k,t} \mid Z_t = z']}$ identifies $\dfrac{\int_{\mathcal{W}} E[Y'_{t+h}(w_k) \mid H_t(w_k) = 1] E[H_t(w_k)] \, dw_k}{\int_{\mathcal{W}} E[H_t(w_k)] \, dw_k}$.

IV: $\dfrac{\mathrm{Cov}(Y_{t+h}, Z_t)}{\mathrm{Cov}(W_t, Z_t)}$ identifies $\dfrac{\int_{\mathcal{W}_Z} E[Y'_{t+h}(z_t)] E[G_t(z_t)] \, dz_t}{\int_{\mathcal{W}_Z} E[W'_t(z_t)] E[G_t(z_t)] \, dz_t}$.

Generalized Wald: $\dfrac{E[Y_{t+h} \mid Z_t = z, \mathcal{F}_{t-1}] - E[Y_{t+h} \mid Z_t = z', \mathcal{F}_{t-1}]}{E[W_{k,t} \mid Z_t = z, \mathcal{F}_{t-1}] - E[W_{k,t} \mid Z_t = z', \mathcal{F}_{t-1}]}$ identifies $\dfrac{\int_{\mathcal{W}} E[Y'_{t+h}(w_k) \mid H_t(w_k) = 1, \mathcal{F}_{t-1}] E[H_t(w_k) \mid \mathcal{F}_{t-1}] \, dw_k}{\int_{\mathcal{W}} E[H_t(w_k) \mid \mathcal{F}_{t-1}] \, dw_k}$.

Filtered IV: $\dfrac{E[(Y_{t+h} - \hat{Y}_{t+h|t-1})(Z_t - \hat{Z}_{t|t-1})]}{E[(W_{k,t} - \hat{W}_{k,t|t-1})(Z_t - \hat{Z}_{t|t-1})]}$ identifies $\dfrac{\int_{\mathcal{W}_Z} E[E[Y'_{t+h}(z_t) \mid \mathcal{F}_{t-1}] E[G_t(z_t) \mid \mathcal{F}_{t-1}]] \, dz_t}{\int_{\mathcal{W}_Z} E[E[W'_t(z_t) \mid \mathcal{F}_{t-1}] E[G_t(z_t) \mid \mathcal{F}_{t-1}]] \, dz_t}$.

Notes: This table summarizes the main results for the causal interpretation of common estimands based on assignments, instruments and outcomes. Here $h \ge 0$, $z, z' \in \mathcal{W}_Z$, $Y_{t+h}(z_t) := Y_{t+h}(W_{1:t-1}, W_{t,1:k-1}, W_k(z_t), W_{t,k+1:d_W}, W_{t+1:t+h})$, $Y'_{t+h}(z_t) := \partial Y_{t+h}(z_t)/\partial z_t$, $H_t(w_k) = 1\{W_{k,t}(z') \le w_k \le W_{k,t}(z)\}$, $G_t(z_t) = 1\{z_t \le Z_t\}(Z_t - E[Z_t])$ and $G_{t|t-1}(z_t) = 1\{z_t \le Z_t\}(Z_t - E[Z_t \mid \mathcal{F}_{t-1}])$, while $\hat{Y}_{t+h|t-1} = E[Y_{t+h} \mid \mathcal{F}_{t-1}]$, $\hat{Z}_{t|t-1} = E[Z_t \mid \mathcal{F}_{t-1}]$ and $\hat{W}_{k,t|t-1} = E[W_{k,t} \mid \mathcal{F}_{t-1}]$. Note that $E[G_t(z_t)] \ge 0$ and $E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}] \ge 0$.
2.5.1 Wald Estimand

The Wald estimand is

$$\frac{E[Y_{t+h} \mid Z_t = z] - E[Y_{t+h} \mid Z_t = z']}{E[W_{k,t} \mid Z_t = z] - E[W_{k,t} \mid Z_t = z']}.$$

The numerator is the impulse response of the outcome $Y_{t+h}$ on the instrument $Z_t$, which can be thought of as the “reduced form.” The denominator is the impulse response function of the assignment $W_{k,t}$ on the instrument $Z_t$, which can be thought of as the “first stage.” Our next result establishes that the Wald estimand identifies a weighted average of marginal causal effects for “compliers” provided that (i) the potential outcome process is continuously differentiable in the assignment; (ii) the instrument is independent of the potential assignment and outcome processes; (iii) a relevance condition holds; and (iv) a monotonicity condition, as introduced in Imbens and Angrist (1994), holds.
Theorem 2.5.1. Assume an instrumented potential outcome system, fix $z, z' \in \mathcal{W}_Z$, and assume that

i. Differentiability: $Y_{t+h}(w_k)$ is continuously differentiable in $w_k$ on $\mathcal{W}_k = [\underline{w}_k, \overline{w}_k] \subset \mathbb{R}$.

ii. Independence: The instrument satisfies $Z_t \perp\!\!\!\perp \{W_{k,t}(z) : z \in \mathcal{W}_Z\}$ and $Z_t \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}$.

iii. Relevance: $E[W_{k,t} \mid Z_t = z] \ne E[W_{k,t} \mid Z_t = z']$.

iv. Monotonicity: $W_{k,t}(z) \ge W_{k,t}(z')$ almost surely.

Then, if it exists, the Wald estimand equals the expression given in Table 2.2.

Provided the instrument is randomly assigned, relevant, and satisfies a monotonicity condition, the Wald estimand therefore equals a weighted average of the marginal causal effects for “compliers” (i.e., realizations of the potential assignment function for which moving the instrument from $z'$ to $z$ changes the assignment). The marginal causal effect is the derivative of the $h$-step ahead potential outcome process with respect to the $k$-th assignment, holding all else constant. The weights are proportional to the probability of the potential assignment function being a “complier” at each $w_k$.
Since $Y_{t+h}(w_k) := Y_{t+h}(W_{1:t-1}, W_{1:k-1,t}, w_k, W_{k+1:d_W,t}, W_{t+1:t+h})$, Assumption (ii) implicitly restricts the dependence between the instrument $Z_t$ and:

1. past assignments $W_{1:t-1}$ and the other contemporaneous assignments $W_{1:k-1,t}, W_{k+1:d_W,t}$,

2. future and past potential assignments $\{W_{k,1:t-1}(z_{1:t-1}), W_{k,t+1:t+h}(z_{t+1:t+h}) : z_{1:t-1} \in \mathcal{Z}^t, z_{t+1:t+h} \in \mathcal{Z}^h\}$,

3. future and past instruments $Z_{1:t-1}$ and $Z_{t+1:t+h}$, and

4. the potential outcome process itself.

We could extend Theorem 2.3.2 to the instrumented potential outcome system, and show that the impulse responses on the instrument are causal under joint independence restrictions on these quantities.
Remark 2.5.1 (Binary Assignment, Binary Instrument Case). Consider the simplest case with $W_{k,t} \in \{0, 1\}$, $Z_t \in \{0, 1\}$ and $z = 1$, $z' = 0$. Although the math is different due to the discreteness of the assignment and instrument, under the same conditions as Theorem 2.5.3, we can show that the Wald estimand in this case equals

$$E[Y_{t+h}(1) - Y_{t+h}(0) \mid W_{k,t}(1) = 1, W_{k,t}(0) = 0],$$

which is the time-series generalization of the binary assignment, binary instrument local average treatment effect.
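A simulation sketch of this binary case (one-sided noncompliance, invented parameters): the instrument is randomized, the assignment is confounded with the outcome, and the Wald ratio still recovers the effect for compliers.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
tau = 2.0                                   # constant impulse causal effect

z = rng.integers(0, 2, size=n)              # binary instrument: "intention to treat"
u = rng.standard_normal(n)                  # confounder linking assignment and outcome
w1 = (u + rng.standard_normal(n) > -0.5).astype(int)   # potential assignment W(1)
w0 = np.zeros(n, dtype=int)                 # W(0) = 0: monotonicity holds, W(1) >= W(0)
w = np.where(z == 1, w1, w0)                # realized assignment

y = tau * w + u + rng.standard_normal(n)    # outcome: confounded through u

wald = (y[z == 1].mean() - y[z == 0].mean()) / (w[z == 1].mean() - w[z == 0].mean())
print(wald)   # close to tau = 2.0, the local average treatment effect for compliers
```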
2.5.2 IV Estimand
Rather than directly estimating the Wald estimand, it is natural to estimate a two-stage least squares regression of the outcome $Y_{t+h}$ on the assignment $W_{k,t}$ using the instrument $Z_t$. The associated IV estimand is

$$IV_{k,t,h} := \frac{\mathrm{Cov}(Y_{t+h}, Z_t)}{\mathrm{Cov}(W_t, Z_t)}.$$

This has a causal interpretation by applying Theorem 2.3.3 for the local projection estimand twice: to the numerator for the local projection of $Y_{t+h}$ on $Z_t$, and to the denominator for the local projection of $W_{k,t}$ on $Z_t$. Define $Y_{t+h}(z_t) := Y_{t+h}(W_{1:t-1}, W_{t,1:k-1}, W_k(z_t), W_{t,k+1:d_W}, W_{t+1:t+h})$ and $Y'_{t+h}(z_t) := \partial Y_{t+h}(z_t)/\partial z_t$.
Theorem 2.5.2. Assume an instrumented potential outcome system. Further assume that

i. Differentiability: $Y_{t+h}(z)$ and $W_t(z)$ are continuously differentiable in the closed interval $z \in \mathcal{W}^Z = [\underline{z}, \overline{z}] \subset \mathbb{R}$.
ii. Independence: $Z_t \perp\!\!\!\perp \{W_t(z) : z \in \mathcal{W}^Z\}$ and $Z_t \perp\!\!\!\perp \{Y_{t+h}(z) : z \in \mathcal{W}^Z\}$.

Then
$$IV_{k,t,h} = \frac{\int_{\mathcal{W}^Z} E[Y_{t+h}'(z_t)]\, E[G_t(z_t)]\, dz_t}{\int_{\mathcal{W}^Z} E[W_t'(z_t)]\, E[G_t(z_t)]\, dz_t},$$
where $G_t(z_t) = \mathbb{1}\{z_t \le Z_t\}(Z_t - E[Z_t])$.
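In a linear, constant-effect special case, the weighted average of marginal effects above collapses to a single coefficient, and the IV estimand recovers it while a confounded OLS regression does not. A sketch with made-up coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)                        # continuous instrument
u = rng.normal(size=n)                        # unobserved confounder
w = 0.8 * z + u                               # first stage: assignment responds to instrument
beta = 1.5                                    # constant marginal causal effect dY/dw (illustrative)
y = beta * w + 0.5 * u + rng.normal(size=n)   # outcome, confounded through u

ols = np.cov(y, w)[0, 1] / np.var(w)          # biased by the confounder u
iv = np.cov(y, z)[0, 1] / np.cov(w, z)[0, 1]  # the IV estimand Cov(Y,Z)/Cov(W,Z)
print(ols, iv)                                # iv is close to 1.5; ols is not
```

Here every marginal effect equals $\beta$, so the weights integrate out and the covariance ratio returns $\beta$ regardless of the confounding.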
The generalized Wald estimand is a ratio of a reduced-form generalized impulse response function to a first-stage generalized impulse response function.
Theorem 2.5.3. Assume an instrumented potential outcome system, fix $z, z' \in \mathcal{W}^Z$, and assume $\mathcal{W}_k := [\underline{w}_k, \overline{w}_k] \subset \mathbb{R}$.

ii. Independence: The instrument satisfies $[Z_t \perp\!\!\!\perp \{W_{k,t}(z) : z \in \mathcal{W}^Z\}] \mid \mathcal{F}_{t-1}$ and $[Z_t \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}] \mid \mathcal{F}_{t-1}$.
The generalized Wald estimand analogously equals a weighted average of the marginal filtered causal effects for “compliers,” where the weights are proportional to the conditional probability of the potential assignment function being a “complier.”
We next provide a sufficient condition, in terms of the primitives of the instrumented potential outcome system, for the instrument to be randomly assigned.
Estimating the generalized Wald estimand is not easy, particularly if $Z_t$ is not discrete. Here we derive an alternative filtered estimand based on $\hat{Y}_{t+h|t-1} = E[Y_{t+h} \mid \mathcal{F}_{t-1}]$, $\hat{W}_{k,t|t-1} = E[W_{k,t} \mid \mathcal{F}_{t-1}]$ and $\hat{Z}_{t|t-1} = E[Z_t \mid \mathcal{F}_{t-1}]$.
No new technical issues arise in dealing with this setup, but Assumption (ii) in Theorem 2.5.2 now becomes the conditional statement
$$[Z_t \perp\!\!\!\perp \{W_t(z) : z \in \mathcal{W}^Z\}] \mid \mathcal{F}_{t-1} \quad \text{and} \quad [Z_t \perp\!\!\!\perp \{Y_{t+h}(z) : z \in \mathcal{W}^Z\}] \mid \mathcal{F}_{t-1}, \qquad (2.14)$$
under which the generalized IV estimand equals
$$\frac{\int_{\mathcal{W}^Z} E[Y_{t+h}'(z_t) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}]\, dz_t}{\int_{\mathcal{W}^Z} E[W_t'(z_t) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}]\, dz_t},$$
where $G_{t|t-1}(z_t) = \mathbb{1}\{z_t \le Z_t\}(Z_t - E[Z_t \mid \mathcal{F}_{t-1}])$, noting $E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}] \ge 0$.
Consider the filtered IV estimand
$$\frac{\mathrm{Cov}(Y_{t+h} - \hat{Y}_{t+h|t-1},\, Z_t - \hat{Z}_{t|t-1})}{\mathrm{Cov}(W_{k,t} - \hat{W}_{k,t|t-1},\, Z_t - \hat{Z}_{t|t-1})},$$
which can be estimated by instrumental variables applied to $Y_{t+h} - \hat{Y}_{t+h|t-1}$ on $W_{k,t} - \hat{W}_{k,t|t-1}$ with instruments $Z_t - \hat{Z}_{t|t-1}$. Under the conditions of Theorem 2.5.2, but using (2.14) instead of
Assumption (ii), the filtered IV estimand becomes
$$\frac{\int_{\mathcal{W}^Z} E\big[E[Y_{t+h}'(z_t) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}]\big]\, dz_t}{\int_{\mathcal{W}^Z} E\big[E[W_t'(z_t) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}]\big]\, dz_t}.$$
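The filtering step can be sketched in a minimal linear example in which $E[\,\cdot \mid \mathcal{F}_{t-1}]$ is a linear function of $z_{t-1}$, so conditioning reduces to a linear projection. All coefficients below are illustrative, and the outcome is made to load on $z_{t-1}$ so that the unfiltered IV estimand is distorted while the filtered one is not:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100_000
beta = 2.0                                   # true contemporaneous causal effect (illustrative)
eps = rng.normal(size=T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.9 * z[t - 1] + eps[t]           # persistent (predictable) instrument
u = rng.normal(size=T)
w = z + u                                    # first stage
e = 0.7 * u + rng.normal(size=T)             # confounding through u
zlag = np.roll(z, 1); zlag[0] = 0.0
y = beta * w + 2.0 * zlag + e                # outcome also loads on F_{t-1} via z_{t-1}

def residual(a, x):
    """Residual from the linear projection of a on x (stand-in for a - E[a | F_{t-1}])."""
    b = np.cov(a, x)[0, 1] / np.var(x)
    return a - b * x

yt, wt, zt = (residual(v, zlag) for v in (y, w, z))
iv_raw = np.cov(y, z)[0, 1] / np.cov(w, z)[0, 1]          # distorted: z predictable from z_{t-1}
iv_filtered = np.cov(yt, zt)[0, 1] / np.cov(wt, zt)[0, 1]  # close to beta
print(iv_raw, iv_filtered)
```

Filtering out the predictable component of $(Y_{t+h}, W_{k,t}, Z_t)$ restores the conditional independence that the unconditional estimand lacks here.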
In this section, we study the nonparametric conditions under which common statistical estimands
based on only instruments and outcomes have causal meaning. We focus on an instrumented potential outcome system
$$\{Z_t,\ \{W_t(z_t) : z_t \in \mathcal{W}^Z\},\ \{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}_W^t\}\}_{t \ge 1},$$
in which the researcher only observes the instruments and the outcomes $\{z_t^{obs}, y_t^{obs}\}_{t \ge 1}$. We will sometimes refer to $\{\mathcal{F}_t^{Z,Y}\}_{t \ge 1}$ as the natural filtration generated by the realized $\{z_t^{obs}, y_t^{obs}\}_{t \ge 1}$.
In this context, it is common for empirical researchers to analyze estimands involving two
elements of the outcome vector $Y_{j,t+h}$, $Y_{k,t}$ and the instrument $Z_t$ (therefore, we return to using an explicit subscript on the outcome variable). Consider, for example, an empirical researcher who constructs an instrument $Z_t$ for the monetary policy shock (e.g., an instrument of the form used in Kuttner (2001); Cochrane and Piazzesi (2002); Gertler and Karadi (2015) or Romer and Romer (2004)). In this case, the empirical researcher may measure the dynamic causal effect of
the monetary policy shock Wk,t on unemployment Yj,t`h by estimating the first-stage impulse
response function of the federal funds rate Yk,t on the instrument Zt . See, for example, Jordá
et al. (2015); Ramey and Zubairy (2018); Jordá et al. (2020) for recent empirical applications of this
empirical strategy.
In particular, we ask if the following estimands have causal meaning: Ratio Wald, Local Projec-
tion IV, generalized Ratio Wald, and the local filtered projection IV. We show that such dynamic
IV estimands identify “relative” local average impulse causal effects, a nonparametric generalization of results derived for linear models with external instruments (Stock and Watson, 2018; Plagborg-Møller and Wolf, 2020; Jordá et al., 2020). Table 2.3 defines these estimands and summarizes our main results on their causal interpretation under important restrictions on the assignment process and other technical conditions. The rest of this
section spells out the details.
Table 2.3: Top line results for the causal interpretation of common estimands based on instruments and outcomes.

Ratio Wald:
$$\frac{E[Y_{j,t+h} \mid Z_t = z] - E[Y_{j,t+h} \mid Z_t = z']}{E[Y_{k,t} \mid Z_t = z] - E[Y_{k,t} \mid Z_t = z']} = \frac{\int_{\mathcal{W}_k} E[Y_{j,t+h}'(w_k) \mid H_t(w_k) = 1]\, E[H_t(w_k)]\, dw_k}{\int_{\mathcal{W}_k} E[Y_{k,t}'(w_k) \mid H_t(w_k) = 1]\, E[H_t(w_k)]\, dw_k}$$

Local Projection IV:
$$\frac{\mathrm{Cov}(Y_{j,t+h}, Z_t)}{\mathrm{Cov}(Y_{k,t}, Z_t)} = \frac{\int_{\mathcal{W}^Z} E[Y_{j,t+h}'(z_k)]\, E[G_t(z_k)]\, dz_k}{\int_{\mathcal{W}^Z} E[Y_{k,t}'(z_k)]\, E[G_t(z_k)]\, dz_k}$$

Generalized Ratio Wald:
$$\frac{E[Y_{j,t+h} \mid Z_t = z, \mathcal{F}^{Z,Y}_{t-1}] - E[Y_{j,t+h} \mid Z_t = z', \mathcal{F}^{Z,Y}_{t-1}]}{E[Y_{k,t} \mid Z_t = z, \mathcal{F}^{Z,Y}_{t-1}] - E[Y_{k,t} \mid Z_t = z', \mathcal{F}^{Z,Y}_{t-1}]} = \frac{\int_{\mathcal{W}_k} E[Y_{j,t+h}'(w_k) \mid H_t(w_k) = 1, \mathcal{F}^{Z,Y}_{t-1}]\, E[H_t(w_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\, dw_k}{\int_{\mathcal{W}_k} E[Y_{k,t}'(w_k) \mid H_t(w_k) = 1, \mathcal{F}^{Z,Y}_{t-1}]\, E[H_t(w_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\, dw_k}$$

Local Filtered Projection IV:
$$\frac{\mathrm{Cov}(Y_{j,t+h} - \hat{Y}_{j,t+h|t-1},\, Z_t - \hat{Z}_{t|t-1})}{\mathrm{Cov}(Y_{k,t} - \hat{Y}_{k,t|t-1},\, Z_t - \hat{Z}_{t|t-1})} = \frac{\int_{\mathcal{W}^Z} E\big[E[Y_{j,t+h}'(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\, E[G_{t|t-1}(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\big]\, dz_k}{\int_{\mathcal{W}^Z} E\big[E[Y_{k,t}'(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\, E[G_{t|t-1}(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\big]\, dz_k}$$

Notes: This table summarizes the main results for the causal interpretation of common estimands based on instruments and outcomes. Here $H_t(w_k) = \mathbb{1}\{W_{k,t}(z') \le w_k \le W_{k,t}(z)\}$, $G_t(z_t) = \mathbb{1}\{z_t \le Z_t\}(Z_t - E[Z_t])$ and $G_{t|t-1}(z_t) = \mathbb{1}\{z_t \le Z_t\}(Z_t - E[Z_t \mid \mathcal{F}^{Z,Y}_{t-1}])$, while $\hat{Y}_{k,t+h|t-1} = E[Y_{k,t+h} \mid \mathcal{F}^{Z,Y}_{t-1}]$ and $\hat{Z}_{t|t-1} = E[Z_t \mid \mathcal{F}^{Z,Y}_{t-1}]$. Note that $E[G_t(z_t)] \ge 0$ and $E[G_{t|t-1}(z_t) \mid \mathcal{F}^{Z,Y}_{t-1}] \ge 0$.
Hence we just need to collect the conditions for the validity of their causal representations, and apply Theorem 2.5.1 twice.

Corollary 2.6.1. Consider an instrumented potential outcome system. Further assume that

i. Differentiability: $Y_{k,t}(w_k)$, $Y_{j,t+h}(w_k)$ are continuously differentiable in the closed interval $\mathcal{W}_k := [\underline{w}_k, \overline{w}_k] \subset \mathbb{R}$.
iii. Relevance: $\int_{\mathcal{W}_k} E[Y_{k,t}'(w_k) \mid H_t(w_k) = 1]\, E[H_t(w_k)]\, dw_k \ne 0$.
iv. Monotonicity: $W_{k,t}(z') \le W_{k,t}(z)$ with probability one.
In words, the ratio Wald estimand above identifies a relative local average impulse causal
effect under the instrumented potential outcome system. The numerator is a weighted average
of the marginal causal effects of $W_{k,t}$ on the h-step ahead outcome $Y_{j,t+h}$, and the denominator is a weighted average of the marginal causal effects of $W_{k,t}$ on the contemporaneous outcome $Y_{k,t}$, with the same complier weights in both. Therefore, the ratio in
Corollary 2.6.1 measures the causal response of the h-step ahead outcome Yj,t`h to a change in the
treatment Wk,t that increases the contemporaneous outcome Yk,t by one unit on impact (among
compliers).
This is a nonparametric generalization of the well-known result that in linear SVMA models
(without invertibility) the IV based estimands identify relative impulse response functions (Stock
and Watson, 2018; Plagborg-Møller and Wolf, 2020). Corollary 2.6.1 makes no functional form
assumptions nor standard time series assumptions such as invertibility or recoverability. In this
sense, Corollary 2.6.1 highlights the attractiveness of using external instruments to measure
dynamic causal effects in observational time series data. Provided there exists an external
instrument for the treatment Wk,t that is randomly assigned, relevant and satisfies a monotonicity
condition, then the researcher can identify causally interpretable estimands without further restrictions. The local projection IV estimand is likewise a ratio of two IV estimands, so we again collect the conditions for the validity of their causal representations, and apply Theorem 2.5.2 twice.
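The “relative” interpretation can be seen in a stylized simulation: with a common shock instrumented by $Z_t$, the covariance ratio recovers the effect on the h-step-ahead outcome per unit of impact effect on the contemporaneous outcome. All coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
z = rng.normal(size=n)                   # external instrument for the shock
shock = 0.6 * z + rng.normal(size=n)     # unobserved structural shock w_{k,t}
yk = 1.0 * shock + rng.normal(size=n)    # contemporaneous outcome (e.g., the policy rate)
yj = -2.0 * shock + rng.normal(size=n)   # h-step-ahead outcome (e.g., unemployment)

# Ratio of covariances with the instrument: effect on yj per unit impact on yk.
ratio = np.cov(yj, z)[0, 1] / np.cov(yk, z)[0, 1]
print(ratio)
```

The ratio is close to $-2.0$ here: the shock itself is never observed, yet the instrument delivers the relative impulse response, mirroring the SVMA result cited above.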
Corollary 2.6.2. Consider an instrumented potential outcome system. Further assume that

i. Differentiability: $Y_{k,t}(z)$, $Y_{j,t+h}(z)$, $W_t(z)$ are continuously differentiable in the closed interval $z \in \mathcal{W}^Z = [\underline{z}, \overline{z}] \subset \mathbb{R}$.
iii. Relevance: $\int_{\mathcal{W}^Z} E[Y_{k,t}'(z_t)]\, E[G_t(z_t)]\, dz_t \ne 0$.
Researchers may also be interested in analyzing the generalized ratio Wald estimand:
$$\frac{E[Y_{j,t+h} \mid Z_t = z, \mathcal{F}^{Z,Y}_{t-1}] - E[Y_{j,t+h} \mid Z_t = z', \mathcal{F}^{Z,Y}_{t-1}]}{E[Y_{k,t} \mid Z_t = z, \mathcal{F}^{Z,Y}_{t-1}] - E[Y_{k,t} \mid Z_t = z', \mathcal{F}^{Z,Y}_{t-1}]},$$
which is the ratio of generalized impulse response functions at different lags and for different
outcome variables. Since this is the ratio of two generalized Wald estimands, we immediately obtain the following corollary by applying Theorem 2.5.3 twice.

Corollary 2.6.3. Consider an instrumented potential outcome system. Further assume that

i. Differentiability: $Y_{k,t}(w_k)$, $Y_{j,t+h}(w_k)$ are continuously differentiable in the closed interval $\mathcal{W}_k := [\underline{w}_k, \overline{w}_k] \subset \mathbb{R}$.
ii. Independence: $[Z_t \perp\!\!\!\perp \{W_{k,t}(z) : z \in \mathcal{W}^Z\}] \mid \mathcal{F}^{Z,Y}_{t-1}$ and $[Z_t \perp\!\!\!\perp \{Y_{k,t}(w_k), Y_{j,t+h}(w_k) : w_k \in \mathcal{W}_k\}] \mid \mathcal{F}^{Z,Y}_{t-1}$.
Z,Y
iv. Monotonicity: $W_{k,t}(z') \le W_{k,t}(z)$ conditional on $\mathcal{F}^{Z,Y}_{t-1}$ with probability one.
where $H_t(w_k) = \mathbb{1}\{W_{k,t}(z') \le w_k \le W_{k,t}(z)\}$.
The interpretation of Corollary 2.6.3 is analogous to the interpretation of the ratio Wald estimand in Corollary 2.6.1.
In practice, researchers typically estimate generalized impulse response functions using a two-stage least-squares type estimator. This is also sometimes called “local projections with an external instrument” (Jordá et al., 2015). We first analyze the generalized local projection IV estimand
$$\frac{\mathrm{Cov}(Y_{j,t+h}, Z_t \mid \mathcal{F}^{Z,Y}_{t-1})}{\mathrm{Cov}(Y_{k,t}, Z_t \mid \mathcal{F}^{Z,Y}_{t-1})}, \qquad (2.15)$$
which again is a ratio, this time of the generalized IV estimands at different lag lengths, where $G_{t|t-1}(z_k) = \mathbb{1}\{z_k \le Z_t\}(Z_t - E[Z_t \mid \mathcal{F}^{Z,Y}_{t-1}])$.
The causal properties of the local filtered projection IV estimand are inherited from those of the generalized local projection IV. In particular, it equals
$$\frac{\int_{\mathcal{W}^Z} E\big[E[Y_{j,t+h}'(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\, E[G_{t|t-1}(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\big]\, dz_k}{\int_{\mathcal{W}^Z} E\big[E[Y_{k,t}'(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\, E[G_{t|t-1}(z_k) \mid \mathcal{F}^{Z,Y}_{t-1}]\big]\, dz_k}.$$
Much of the empirical literature on dynamic causal effects in macroeconomics works in the tradition of Sims (1980). See, for example, Ramey (2016) and Kilian and Lütkepohl (2017) for recent reviews. In that literature, researchers introduce parametric models to study the dynamic causal effects of unobservable “structural shocks,” which themselves must be inferred from the outcomes. Here we link this literature to our setup, mostly to place our work in context and to illustrate that these parametric models are nested in the direct potential outcome system framework. Assume there is a direct potential outcome system
The causal inference approach of using only time series data on outcomes is in the storied tradition
of linear simultaneous equations models developed at the Cowles Foundation (e.g., Christ, 1994;
Hausman, 1983). The most essential causal challenges arise even without any dynamic causal effects, so suppose the potential outcomes satisfy
$$A_0 Y_t(w_{1:t}) = \alpha + w_t,$$
where $A_0$ is a non-stochastic, square matrix. Notice that in this model the potential outcome process is deterministic and linear combinations of the potential outcomes equal the possible assignments plus an intercept. If $A_0$ is invertible, then
$$Y_t(w_{1:t}) = A_0^{-1}(\alpha + w_t),$$
which implies that the contemporaneous average treatment effect is $E[Y_t(W_{1:t-1}, w) - Y_t(W_{1:t-1}, w')] = A_0^{-1}(w - w')$.
Furthermore, under this model, if we saw $(W_t, Y_t) = \{W_t, Y_t(W_{1:t})\}$, then, if the second moments exist, $A_0$ could be recovered directly from the joint second moments of the assignments and outcomes, which would make statistical inference rather straightforward. But the point of this simultaneous equations literature is to carry out inference without directly observing the assignments.
If, in addition to A0 being invertible, we assume that VarpWt q ă 8, then
$$\mathrm{Var}(Y_t) = A_0^{-1}\, \mathrm{Var}(W_t)\, \big(A_0^{-1}\big)^{\mathsf{T}},$$
Crucially, knowing $\mathrm{Var}(Y_t)$ is not enough to untangle $A_0$ and $\mathrm{Var}(W_t)$, and so knowledge of the second moments of the observables alone is not enough to learn the contemporaneous average treatment effect. In the linear simultaneous equations literature, this is resolved by a priori imposing more economic structure on the potential outcome process, such as placing restrictions on $A_0$ and $\mathrm{Var}(W_t)$. A central a priori constraint is the one highlighted by Sims (1980). He imposed that (a) $A_0$ is triangular, and (b) $\mathrm{Var}(W_t)$ is diagonal. For simplicity of exposition, look at the two-dimensional case and write
$$A_0 = \begin{pmatrix} 1 & 0 \\ -a_{21} & 1 \end{pmatrix}, \qquad A_0^{-1} = \begin{pmatrix} 1 & 0 \\ a_{21} & 1 \end{pmatrix}, \qquad \mathrm{Var}(W_t) = \begin{pmatrix} \sigma_{11}^2 & 0 \\ 0 & \sigma_{22}^2 \end{pmatrix};$$
then the elements within $A_0$ and $\mathrm{Var}(W_t)$ can be individually determined from $\mathrm{Var}(Y_t)$ if $\mathrm{Var}(Y_t)$ is of full rank. The same holds in higher dimensions. Hence, with additional restrictions on the potential outcome process, the contemporaneous causal effect can be determined from the data on
the outcomes, without observing the assignments (and without access to instruments). There are alternative a priori constraints to this triangular structure which also deliver identification here.
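Under the triangularity-plus-diagonality restriction, this untangling is exactly a Cholesky factorization of $\mathrm{Var}(Y_t)$, which can be sketched numerically (the particular numbers are illustrative):

```python
import numpy as np

# A lower-triangular A0 with unit diagonal and a diagonal Var(W_t), as in Sims (1980).
a21 = 0.5
A0 = np.array([[1.0, 0.0], [-a21, 1.0]])
var_w = np.diag([2.0, 3.0])

A0inv = np.linalg.inv(A0)
var_y = A0inv @ var_w @ A0inv.T   # all that the second moments of the outcomes reveal

# Recover the structure from Var(Y_t) alone via its Cholesky factor.
L = np.linalg.cholesky(var_y)     # L @ L.T = var_y, L lower triangular
sigma = np.diag(L)                # implied shock standard deviations
A0inv_hat = L / sigma             # rescale columns to a unit-diagonal triangular factor
var_w_hat = np.diag(sigma ** 2)

print(A0inv_hat, var_w_hat)       # match A0inv and var_w up to float roundoff
```

The full-rank condition in the text corresponds to the Cholesky factorization being well defined (positive definite $\mathrm{Var}(Y_t)$).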
The linear “structural vector autoregressive” (SVAR) version of the linear simultaneous equations model has the same fundamental structure. Focusing on the one-lag model with no intercept for simplicity, the SVAR approach assumes that the potential outcome process satisfies
$$A_0 Y_t(w_{1:t}) = A_1 Y_{t-1}(w_{1:t-1}) + w_t.$$
Kilian and Lütkepohl (2017) provide a book-length review of this model structure and its various extensions. Write $\Phi_1 = A_0^{-1} A_1$. So
$$Y_t(w_{1:t}) = A_0^{-1} w_t + \Phi_1 Y_{t-1}(w_{1:t-1}),$$
which in turn implies that the potential outcome process also has an SVMA model representation, and the h-period ahead marginal average treatment effect is $E[\partial Y_{t+h}(w_{1:t+h})/\partial w_t'] = \Phi_1^h A_0^{-1}$. The time series parameter $\Phi_1$ can be determined from the dynamics of the observable outcomes if this process is stationary. But again $A_0$ and $\mathrm{Var}(W_t)$ cannot be separately identified from the second moments of the outcomes alone.
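A small simulation sketch of this point: $\Phi_1$ is recoverable from the outcomes alone by least squares of $Y_t$ on $Y_{t-1}$, while $A_0$ (and hence the structural impulse response $\Phi_1^h A_0^{-1}$) is not pinned down by the reduced form. The model coefficients are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
A0 = np.array([[1.0, 0.0], [-0.5, 1.0]])
A1 = np.array([[0.4, 0.1], [0.2, 0.3]])
A0inv = np.linalg.inv(A0)
Phi1 = A0inv @ A1                 # reduced-form autoregressive matrix (stable here)

T = 100_000
y = np.zeros((T, 2))
w = rng.normal(size=(T, 2)) * np.array([1.0, 2.0])   # independent structural shocks
for t in range(1, T):
    y[t] = A0inv @ w[t] + Phi1 @ y[t - 1]

# Phi1 is recoverable from outcomes alone by least squares of y_t on y_{t-1}...
Phi1_hat = np.linalg.lstsq(y[:-1], y[1:], rcond=None)[0].T
# ...while the h-step marginal effect Phi1^h @ A0inv additionally needs A0inv,
# which is NOT identified from the reduced form without extra restrictions.
h = 3
irf_h = np.linalg.matrix_power(Phi1, h) @ A0inv
print(Phi1_hat, irf_h)
```

The estimate of $\Phi_1$ is accurate, but nothing in the fitted reduced form distinguishes $A_0^{-1}\mathrm{Var}(W_t)(A_0^{-1})^{\mathsf T}$ from observationally equivalent factorizations.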
A broader analysis focuses on the h-step ahead generalized impulse response function of the
j-th outcome on the k-th outcome without placing functional form restrictions on the potential
outcomes. To do so, we will further assume that the potential outcome process is a deterministic
function of the assignments and that the assignments are independent across time.
Theorem 2.7.1. Consider a direct potential outcome system, and further assume that the potential outcome process is a deterministic function of the assignments and that the assignments are independent across time. Then
$$E[Y_{j,t+h} \mid Y_{k,t} = y_k, \mathcal{F}^{Y}_{t-1}] - E[Y_{j,t+h} \mid Y_{k,t} = y'_k, \mathcal{F}^{Y}_{t-1}] \qquad (2.16)$$
$$= E[\psi_{j,t+h}(W_{1:t}) \mid Y_{k,t} = y_k, \mathcal{F}^{Y}_{t-1}] - E[\psi_{j,t+h}(W_{1:t}) \mid Y_{k,t} = y'_k, \mathcal{F}^{Y}_{t-1}], \qquad (2.17)$$
Theorem 2.7.1 illustrates that without functional form restrictions on the potential outcome
process, the generalized impulse response function of the j-th outcome on the k-th outcome has a
causal interpretation in terms of the shifting the entire conditional distribution of the treatments
$W_{1:t}$. While this is a non-standard object, it can be interpreted as the causal effect of a stochastic intervention on the assignment path $W_{1:t}$, which has been an object of recent interest in a growing literature – see, for example, Munoz and van der Laan (2012), Papadogeorgou et al. (2019), Papadogeorgou et al. (2021), and Wu et al. (2021). Nonetheless, this is a complex causal effect, as it measures the effect of shifting the entire conditional distribution of the assignment path rather than a single assignment.
2.8 Conclusion
In this paper, we developed the nonparametric, direct potential outcome system to study causal
inference in observational time series settings. We place no functional form restrictions on the
potential outcome process, no restrictions on the extent to which past assignments causally affect
the outcomes, nor common time series assumptions such as “invertibility” or “recoverability.” The
direct potential outcome system therefore nests most leading econometric models used in time
series settings as a special case. We then studied conditions on the assignments under which
common time series estimands, such as the impulse response functions, generalized impulse
response function, and local projections, have a causal interpretation in terms of underlying
dynamic causal effects. We further showed that provided the researcher observes an instrument for the assignment, estimands such as local projection instrumental variables also have causal interpretations in terms of local average dynamic causal effects. Taken together, the potential outcome system provides a
flexible, nonparametric foundation for making causal statements from observational time series data.
Chapter 3

Panel Experiments and Dynamic Causal Effects: A Finite Population Perspective1
3.1 Introduction
Panel experiments, in which we randomly assign units to different interventions, measure their responses, and repeat the procedure over several periods, form the basis of causal inference in
many areas of biostatistics (e.g., Murphy et al. (2001)), epidemiology (e.g., Robins (1986)), and
psychology (e.g., Lillie et al. (2011)). In experimental economics, many authors recognize the
benefits of panel-based experiments, for instance Bellemare et al. (2014, 2016) highlighted the
potentially large gains in power and Czibor et al. (2019) emphasized that panel-based experiments
may help uncover heterogeneity across units. Despite these benefits, panel experiments are used
infrequently in part due to the lack of a formal statistical framework and concerns about how the
impact of past treatments on subsequent outcomes may induce biases in conventional estimators
(Charness et al., 2012). In practice, authors typically assume away this complication by requiring
that the outcomes only depend on contemporaneous treatment, what is often called the “no
1 This chapter is joint work with Iavor Bojinov and Neil Shephard. This chapter is based on a paper published in
Quantitative Economics: Bojinov, I., Rambachan, A. and Shephard, N. “Panel Experiments and Dynamic Causal Effects:
A Finite Population Perspective.” 2021. Quantitative Economics, 12(4):1171-1196. We thank Isaiah Andrews, Robert
Minton, Karthik Rajkumar and Jonathan Roth for helpful discussions. We especially thank James Andreoni and Larry
Samuelson for kindly sharing their data. Finally, we are grateful to Gary Chamberlain for early conversations about
this project. Any remaining errors are our own. Rambachan gratefully acknowledges financial support from the NSF
Graduate Research Fellowship under Grant DGE1745303.
carryover assumption” (e.g., Abadie et al. (2017), Athey and Imbens (2018), Athey et al. (2018),
Imai and Kim (2019), Arkhangelsky and Imbens (2019), Imai and Kim (2020), de Chaisemartin
and D’Haultfoeuille (2020)). Even when researchers allow for carryover effects, they commonly
focus on incorporating the uncertainty due to sampling units from some super-population as
opposed to the design-based uncertainty, which arises due to the random assignment.2
In this paper, we tackle these challenges by defining a variety of new panel-based dynamic
causal estimands without evoking restrictions on the extent to which treatments can impact
subsequent outcomes. Our approach builds on the potential outcomes formulation of causal
to the outcomes model (Neyman, 1923; Kempthorne, 1955; Cox, 1958a; Rubin, 1974). Our main
estimands are various averages of lag-p dynamic causal effects, which capture how changes in
the assignments affect outcomes after p periods. We provide nonparametric estimators that are
unbiased over the randomization distribution induced by the random design. By exploiting the
underlying martingale property of our unbiased estimators, we derive their finite population asymptotic distribution as either the number of sample periods, experimental units, or both increases. This is a new technique for proving finite population central limit theorems, which may be of independent interest.
We develop two methods for conducting nonparametric inference on these dynamic causal
effects. The first uses the limiting distribution to perform conservative tests on weak null
hypotheses of no average dynamic causal effects. The second provides exact randomization tests
for sharp null hypotheses of no dynamic causal effects. We then highlight the usefulness of our
framework by deriving the finite population probability limit of commonly used linear estimation
strategies, such as the unit fixed effects estimator and the two-way fixed effects estimator. Such
estimators are biased for a contemporaneous causal effect whenever there exist carryover effects
and serial correlation in the assignment mechanism, underscoring the value of our proposed
nonparametric estimator.
Finally, we illustrate our theoretical results in a simulation study and apply our framework
to reanalyze a panel-based experiment. The simulation study illustrates our finite population
2 See Abadie et al. (2020) for a discussion of the difference between sampling-based and design-based uncertainty in
the cross-sectional setting.
central limit theorems under a variety of assumptions about the underlying potential outcomes
and assignment mechanism. We confirm that conservative tests based on the limiting distribution
of our nonparametric estimator control size well and have good rejection rates against a variety of alternatives. We then reanalyze the panel experiment of Andreoni and Samuelson (2006), which studies cooperative behavior in game theory and is a natural application of our
methods. Participants in the experiment played a twice-repeated prisoners’ dilemma many times, and the payoff structure of the game was randomly varied across plays. The sequential nature of the
experiment raises the possibility that past assignments may impact future actions as participants
learn about the structure of the game over time. For example, the random variation in the payoff
structure may induce participants to explore possible strategies. This motivates us to analyze the
experiment using our methods that are robust to possible dynamic causal effects. We confirm the
authors’ original hypothesis that the payoff structure of the twice-repeated prisoners’ dilemma has a causal effect on cooperative behavior. We also find evidence of dynamic causal effects in this experiment — the payoff structure of previously played
games may affect cooperative behavior in the current game, which may be indicative of such
learning.
Our design-based framework provides a unified generalization of the finite population lit-
erature in cross-sectional causal inference (as reviewed in Imbens and Rubin (2015)) and time
series experiments (Bojinov and Shephard, 2019) to panel experiments. Three crucial contributions
differentiate our work from the existing literature. First, we focus on a much richer class of
dynamic causal estimands, which answer a broader set of causal questions by summarizing
heterogeneity across both units and time periods. Second, we derive two new finite population
central limit theorems as the size of the population grows, and as both the duration and population
size increase. Third, we compute the bias present in standard linear estimators in the presence of
dynamic causal effects and serial correlation in the treatment assignment probabilities.
Our framework is also importantly distinct from foundational work by Robins (1986) and co-authors, which uses treatment paths for causal panel data analysis but relies on super-population sampling arguments; our design-based approach avoids such arguments entirely. Our estimands and inference procedures are conditioned on the potential outcomes, and all uncertainty arises solely from the randomness in assignments. Avoiding super-population arguments is often attractive in panel data applications. For example, a company
only operates in a finite number of markets (e.g., states or cities within the United States) and can
only conduct advertising or promotional experiments across markets. Such panel experiments
are increasingly common in industry (e.g. Bojinov et al., 2020a,b).3 In econometrics, Abadie et al.
(2017) highlight the appeal of this design-based perspective in panel data applications. However,
the panel-based potential outcome model developed in that work contains no dynamics as the
authors primarily focus on cross-sectional data with an underlying cluster structure. Similarly,
Athey and Imbens (2018), Athey et al. (2018) and Arkhangelsky and Imbens (2019) also introduce
a potential outcome model for panel data, but assume away carryover effects. Heckman et al.
(2016), Hull (2018), and Han (2019) consider a potential outcome model similar to ours but rely on super-population arguments for inference. A further literature in econometrics focuses on estimating dynamic causal effects in panel data under rich
models that allow heterogeneity across units, but does not introduce potential outcomes to define
counterfactuals and also relies on super-population arguments for inference (e.g., see Arellano
and Bonhomme (2016), Arellano et al. (2017) and the review in Arellano and Bonhomme (2012)).
As notation, we write index sets as $[N] := \{1, \ldots, N\}$ and $[T] := \{1, \ldots, T\}$. Finally, for a variable $A_{i,t}$ observed over $i \in [N]$ and $t \in [T]$, define its average over $t$ as $\bar{A}_{i\cdot} := \frac{1}{T}\sum_{t=1}^{T} A_{i,t}$, its average over $i$ as $\bar{A}_{\cdot t} := \frac{1}{N}\sum_{i=1}^{N} A_{i,t}$, and its average over both $i$ and $t$ as $\bar{A} := \frac{1}{NT}\sum_{t=1}^{T}\sum_{i=1}^{N} A_{i,t}$.
Consider a panel in which N units (e.g., individuals or firms) are observed over T time periods.
For each unit i P rNs and period t P rTs, we allocate an assignment Wi,t P W . The assignment is a
random variable and we assume |W | ă 8. For a binary assignment W “ t0, 1u, we refer to “1” as
3 Of course, in other applications, super-population arguments may be entirely natural. For example, in the mental
healthcare digital experiments of Boruvka et al. (2018), it is compelling to use sampling-based arguments as the
experimental units are drawn from a larger group of patients for whom we wish to make inference on as, if successful,
the technology will be broadly rolled out.
treatment and “0” as control.
The assignment path for unit i is the sequence of assignments allocated to unit i, denoted
$W_{i,1:T} := (W_{i,1}, \ldots, W_{i,T})' \in \mathcal{W}^T$. The cross-sectional assignment at time $t$ describes all assignments allocated at period $t$, denoted $W_{1:N,t} := (W_{1,t}, \ldots, W_{N,t})' \in \mathcal{W}^N$. The assignment panel is the $N \times T$ matrix $W_{1:N,1:T} \in \mathcal{W}^{N \times T}$ that summarizes the assignments given to all units over the sample period, where $W_{1:N,1:T} := (W_{1:N,1}, \ldots, W_{1:N,T}) = (W_{1,1:T}', \ldots, W_{N,1:T}')'$.
A potential outcome describes what would be observed for a particular unit at a fixed point in time along a hypothetical assignment path.

Definition 3.2.1. The potential outcome for unit $i$ at time $t$ along assignment path $w_{i,1:T} \in \mathcal{W}^T$ is written as $Y_{i,t}(w_{i,1:T})$.
In principle, the potential outcome can depend upon the entire assignment path, allowing for arbitrary spillovers across time periods. Definition 3.2.1 imposes that there are no treatment spillovers across units; that is, the units are non-interfering.

We now define the potential outcome panel model by restricting the potential outcomes for a unit to be non-anticipating.

Assumption 3.2.1. The potential outcomes are non-anticipating if, for all $i \in [N]$, $t \in [T]$, and any $w_{i,1:T}, \tilde{w}_{i,1:T} \in \mathcal{W}^T$ with $w_{i,1:t} = \tilde{w}_{i,1:t}$, $Y_{i,t}(w_{i,1:T}) = Y_{i,t}(\tilde{w}_{i,1:T})$.

The potential outcome panel model allows for arbitrary carryover effects and arbitrary heterogeneity across units and time periods.5 Under Assumption 3.2.1, the potential outcome for unit $i$ at time $t$ only depends on the assignment path for unit $i$ up to time $t$, allowing us to write the potential outcomes as $Y_{i,t}(w_{i,1:t})$. As notation, let $\mathcal{Y}_{i,t} = \{Y_{i,t}(w_{i,1:t}) : w_{i,1:t} \in \mathcal{W}^t\}$ denote
4 The idea of defining potential outcomes as a function of assignment paths first appears in Robins (1986) and has
been further developed in subsequent work such as Robins (1994), Robins et al. (1999), Murphy et al. (2001), Boruvka
et al. (2018) and Blackwell and Glynn (2018).
5 Allowing for rich heterogeneity in panel data models is often useful in many economic applications. For example,
there is extensive heterogeneity across units in income processes (Browning et al., 2010) and the dynamic response of
consumption to earnings (Arellano et al., 2017). Time-varying heterogeneity is also an important feature. For example,
it is a classic point of emphasis in studying human capital formation – see Ben-Porath (1967), Griliches (1977) and more
recently, Cunha et al. (2006) and Cunha et al. (2010).
the collection of potential outcomes for unit $i$ at time $t$, and let $\mathcal{Y}_{1:N,1:T} = \{\mathcal{Y}_{i,t} : i \in [N], t \in [T]\}$ denote the collection of potential outcomes for all units across all time periods. Along an assignment panel $w_{1:N,1:t} \in \mathcal{W}^{N \times t}$ up to time $t$, let $Y_{1:N,1:t}(w_{1:N,1:t})$ denote the associated $N \times t$ matrix of potential outcomes.
To connect the observed outcomes with the potential outcomes, we assume every unit complies with the assignment.6 For all $i \in [N]$, $t \in [T]$, the observed outcomes for unit $i$ are $y^{obs}_{i,1:T} = Y_{i,1:T}(w^{obs}_{i,1:T})$, where $w^{obs}_{i,1:T}$ is the observed assignment path for unit $i$.
A panel of units, assignments and outcomes in which the units are non-interfering and
compliant with the assignments and the outcomes obey Assumption 3.2.1 is a potential outcome
panel. For N “ 1, the potential outcome panel reduces to the potential outcome time series
model in Bojinov and Shephard (2019). For T “ 1, the potential outcome panel reduces to the
cross-sectional potential outcome model (e.g., Holland (1986) and Imbens and Rubin (2015)).
We focus on randomized experiments in which the assignment mechanism for each period only depends on past assignments and observed outcomes, but not on future potential outcomes nor on counterfactual past outcomes.

Definition 3.2.2. The assignments are sequentially randomized if, for all $t \in [T]$ and any $w_{1:N,1:t-1} \in \mathcal{W}^{N \times (t-1)}$,
$$\Pr(W_{1:N,t} \mid W_{1:N,1:t-1} = w_{1:N,1:t-1}, Y_{1:N,1:T}) = \Pr(W_{1:N,t} \mid W_{1:N,1:t-1} = w_{1:N,1:t-1}, Y_{1:N,1:t-1}(w_{1:N,1:t-1})).$$
Sequentially randomized assignment mechanisms are well studied in biostatistics (Robins, 1986; Murphy, 2003). This is the panel data analogue of an “unconfounded” or “ignorable” assignment mechanism in the literature on cross-sectional causal inference (as reviewed in Chapter 3 of Imbens and Rubin (2015)).7 Since future potential outcomes and counterfactual past potential
6 In some applications, this assumption may be unrealistic. For example, in a panel-based clinical trial, we may
worry that patients do not properly adhere to the assignment. In such cases, our analysis can be re-interpreted as
focusing on dynamic intention-to-treat (ITT) effects.
7 If the researcher further observes characteristics Xi,t that are causally unaffected by the assignments, then the
definition of a sequentially randomized assignment mechanism can be modified to additionally condition on past and
contemporaneous values of the characteristics X1:N,1:t .
outcomes are unobservable, any feasible assignment mechanism must be sequentially randomized.
An important special case imposes further conditional independence structure across assignments. Let $W_{-i,t} := (W_{1,t}, \ldots, W_{i-1,t}, W_{i+1,t}, \ldots, W_{N,t})$ and let $\mathcal{F}_{1:N,t,T}$ be the associated filtration.

Definition 3.2.3. The assignments are individualistic for unit $i$ if, for all $t \in [T]$ and any $w_{1:N,1:t-1} \in \mathcal{W}^{N \times (t-1)}$,
$$\Pr(W_{i,t} \mid W_{-i,t}, \mathcal{F}_{1:N,t-1,T}) = \Pr(W_{i,t} \mid W_{i,1:t-1} = w_{i,1:t-1}, Y_{i,1:t-1}(w_{i,1:t-1})).$$
An individualistic assignment mechanism further imposes that conditional on its own past assign-
ments and outcomes, the assignment for unit i at time t is independent of the past assignments
and outcomes of all other units as well as all other contemporaneous assignments. For example, the Bernoulli assignment mechanism, where $\Pr(W_{i,t} \mid W_{-i,t}, \mathcal{F}_{1:N,t-1,T}) = \Pr(W_{i,t})$ for all $i \in [N]$ and $t \in [T]$, is individualistic.
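The two mechanisms can be sketched in a small simulation. The assignment rule and the outcome model below are hypothetical illustrations (not the chapter's estimators); what matters is that each unit's treatment probability depends only on that unit's own past:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 50, 20

# Bernoulli mechanism: each (i, t) assignment is an independent fair coin flip.
w_bernoulli = rng.binomial(1, 0.5, size=(N, T))

# A sequentially randomized, individualistic mechanism: the probability for
# unit i at time t depends only on unit i's own past assignments and outcomes.
w = np.zeros((N, T), dtype=int)
y = np.zeros((N, T))
for t in range(T):
    if t == 0:
        p = np.full(N, 0.5)
    else:
        # e.g., lean toward treatment when the unit's last outcome was low
        p = np.clip(0.5 - 0.2 * np.tanh(y[:, t - 1]), 0.1, 0.9)
    w[:, t] = rng.binomial(1, p)
    # outcomes with a carryover effect from the previous period's assignment
    prev = w[:, t - 1] if t > 0 else 0
    y[:, t] = 1.0 * w[:, t] + 0.5 * prev + rng.normal(size=N)

print(w.shape, y.shape)
```

Because the probabilities are known at each step, such adaptive designs remain within the sequentially randomized, individualistic class discussed above.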
Example 3.2.1. Consider a food delivery firm that is testing the effectiveness of a new pricing policy across
ten major U.S. cities (Kastelman and Ramesh, 2018; Sneider and Tang, 2018). Each city is an experimental
unit, and the intervention administers the appropriate pricing policy for a duration of one hour. The outcome is the total revenue generated during each hour $t \in [T]$ of the experiment in each city $i \in [N]$. The firm wishes to learn the best policy for each city and the best overall policy across all cities. To do so, it may conduct a panel experiment with an individualistic treatment assignment in which the probability a particular pricing policy is administered in a given city over the next hour depends on prior observed outcomes.
Remark 3.2.1. Many adaptive experimental strategies (such as the one described in Example 3.2.1), in
which a series of units are sequentially exposed to random treatments whose probabilities vary depending on
the past observed data, satisfy our individualistic sequentially randomized assignment assumptions (e.g.,
Robbins, 1952; Lai and Robbins, 1985). Such experiments are widely used by technology companies to
quickly discern user preferences in recommendation algorithms (Li et al., 2010, 2016) and by academics
interested in improving their power against a particular hypothesis (van der Laan, 2008). There has been a
growing interest in drawing causal inferences based on the collected data in such adaptive experimental
designs (Hadad et al., 2021; Zhang et al., 2020). Since the assignment probabilities are known to the
researcher, our results can be viewed as providing finite population techniques for drawing causal conclusions from adaptive experiments. In the special case of our framework where $N = 1$, $t \in [T]$ indexes individuals arriving over time, and there are no carryover effects, our results in the subsequent section are the finite population analogues of results in this adaptive experimentation literature.8
Our finite population central limit theorems require that the assignment mechanism be individualistic. Otherwise, the past assignments and outcomes of other units may affect the contemporaneous assignment of a given unit, which introduces complex dependence structure across units. A similar difficulty arises in the growing literature on relaxing the no-interference assumption, which allows one unit's potential outcomes to depend on another unit's assignments (e.g., see Sävje et al., 2019). To derive the asymptotic distribution of causal estimators in such settings, researchers typically require the assignment mechanism to be independent (Chin, 2018) or at least to have only limited
A dynamic causal effect compares the potential outcomes for unit $i$ at time $t$ along different assignment paths, which we denote by $\tau_{i,t}(w_{i,1:t}, \tilde{w}_{i,1:t}) := Y_{i,t}(w_{i,1:t}) - Y_{i,t}(\tilde{w}_{i,1:t})$ for assignment paths $w_{i,1:t}, \tilde{w}_{i,1:t} \in \mathcal{W}^t$. We use these dynamic causal effects to build up causal estimands of interest.
Since the number of potential outcomes grows exponentially with the time period $t$, there is a considerable number of possible causal estimands. To make progress, we restrict our attention to a tractable class of estimands, defined next.
8 The setup with $N = 1$ was developed in Bojinov and Shephard (2019), but this connection to adaptive experiments is new.
Definition 3.2.4. For $0 \le p < t$ and $w, \tilde{w} \in \mathcal{W}^{p+1}$, the $(i,t)$-th lag-$p$ dynamic causal effect is
$$
\tau_{i,t}(w, \tilde{w}; p) :=
\begin{cases}
\tau_{i,t}\big(\{w^{\mathrm{obs}}_{i,1:t-p-1}, w\}, \{w^{\mathrm{obs}}_{i,1:t-p-1}, \tilde{w}\}\big) & \text{if } p < t - 1, \\
\tau_{i,t}(w, \tilde{w}) & \text{otherwise.}
\end{cases}
$$
The $(i,t)$-th lag-$p$ dynamic causal effect measures the difference between the outcomes from following assignment path $w$ from period $t-p$ to $t$ compared to the alternative path $\tilde{w}$, fixing the assignments for unit $i$ to follow the observed path up to time $t-p-1$. Generally, when $N \gg T$, this dependence on the observed assignment path is of little consequence; it becomes more important when $T$ is of a similar order as $N$.9
By further restricting the paths $w$ and $\tilde{w}$ to share common features, we obtain the weighted average $(i,t)$-th lag-$(p,q)$ dynamic causal effect (Definition 3.2.5):
$$
\tau^{\dagger}_{i,t}(w, \tilde{w}; p, q) := \sum_{v \in \mathcal{W}^{p-q+1}} a_v \, \tau_{i,t}\big((w, v), (\tilde{w}, v); p\big),
$$
where $w, \tilde{w} \in \mathcal{W}^q$ and $\{a_v\}$ are non-stochastic weights chosen by the researcher that satisfy $\sum_{v \in \mathcal{W}^{p-q+1}} a_v = 1$. The weighted average $(i,t)$-th lag-$(p,q)$ dynamic causal effect summarizes the ceteris paribus, average causal effect of switching the assignment path between period $t-p$ and period $t-p+q$ from $w$ to $\tilde{w}$ on outcomes at time $t$.10 In this sense, the weighted average lag-$(p,q)$ causal effect is an analogue of estimands of interest in existing econometric research.11 Whenever $q = 1$, we drop the $q$ from the notation, simply writing $\tau^{\dagger}_{i,t}(w, \tilde{w}; p) := \tau^{\dagger}_{i,t}(w, \tilde{w}; p, 1)$.
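To make the path indexing concrete, the following minimal sketch computes the uniform-weight version of this estimand by enumerating all continuations $v$. The potential outcome function `po` is a hypothetical stand-in for the panel, and uniform weights $a_v$ are assumed; this is illustrative code, not part of the dissertation.

```python
from itertools import product

def weighted_lag_pq_effect(po, w_obs, t, w, w_tilde, p, q, W=(0, 1)):
    """Uniform-weight lag-(p, q) effect: average the lag-p effect of switching
    the first q assignments in the window t-p..t from w to w_tilde over all
    continuations v in W^(p-q+1), holding the observed path before t-p fixed.

    `po(path)` returns the potential outcome Y_t for a full length-t path."""
    prefix = tuple(w_obs[: t - p - 1])           # observed path up to t-p-1
    conts = list(product(W, repeat=p - q + 1))   # all continuations v
    a_v = 1.0 / len(conts)                       # uniform weights summing to 1
    total = 0.0
    for v in conts:
        total += a_v * (po(prefix + tuple(w) + v) - po(prefix + tuple(w_tilde) + v))
    return total
```

For instance, for a linear outcome with contemporaneous coefficient 1 and lag-1 coefficient 0.5, switching period $t-1$'s assignment yields 0.5 regardless of the continuation, so the weighted average equals 0.5.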
The main estimands of interest in this paper are averages of the dynamic causal effects that
9 In a time series experiment with $N = 1$, Bojinov and Shephard (2019) introduced defining causal effects that depend on the observed assignment path because most potential outcomes are unobserved since there is only one experimental unit in their setting. In our more general panel experiments setting, an analogous problem arises when $T$ is of a similar order as $N$.
10 For a binary assignment, setting $N = q = 1$ gives us a special case that was studied in Bojinov and Shephard (2019).
11 For time series experiments, Rambachan and Shephard (2020) show that a particular version of the weighted average lag-$(p,1)$ causal effect is equivalent to the generalized impulse response function (Koop et al., 1996).
summarize how different assignments impact the experimental units.
1. The time-$t$ lag-$p$ average dynamic causal effect is $\bar{\tau}_{\cdot t}(w, \tilde{w}; p) := \frac{1}{N} \sum_{i=1}^{N} \tau_{i,t}(w, \tilde{w}; p)$.

2. The unit-$i$ lag-$p$ average dynamic causal effect is $\bar{\tau}_{i\cdot}(w, \tilde{w}; p) := \frac{1}{T-p} \sum_{t=p+1}^{T} \tau_{i,t}(w, \tilde{w}; p)$.

3. The total lag-$p$ average dynamic causal effect is $\bar{\tau}(w, \tilde{w}; p) := \frac{1}{N(T-p)} \sum_{t=p+1}^{T} \sum_{i=1}^{N} \tau_{i,t}(w, \tilde{w}; p)$.

These estimands extend to the weighted average $(i,t)$-th lag-$(p,q)$ dynamic causal effects by analogously defining $\bar{\tau}^{\dagger}_{\cdot t}(w, \tilde{w}; p, q)$, $\bar{\tau}^{\dagger}_{i\cdot}(w, \tilde{w}; p, q)$, and $\bar{\tau}^{\dagger}(w, \tilde{w}; p, q)$.
We can augment any of the above averages to incorporate non-stochastic weights. For example, we could define weights $\{c_{i,t}\}_{i=1}^{N}$ and consider the weighted time-$t$ lag-$p$ average dynamic causal effect $\frac{1}{N} \sum_{i=1}^{N} c_{i,t} \tau_{i,t}(w, \tilde{w}; p)$. These weights, for instance, could be used to adjust for different assignment path probabilities up to time $t-p-1$, which are non-stochastic since we condition on the observed assignment path.
In this section, we develop a nonparametric Horvitz and Thompson (1952) type estimator of the $(i,t)$-th lag-$p$ dynamic causal effects and derive its properties. If the assignment mechanism is individualistic (Definition 3.2.3) and probabilistic (defined below), our proposed estimator is unbiased for the $(i,t)$-th lag-$p$ dynamic causal effects and their related averages over the assignment mechanism. An appropriately scaled and centered version of our estimator for the average lag-$p$ dynamic causal effects becomes approximately normally distributed as either the number of units or time periods grows large. These limiting results are finite population central limit theorems in the spirit of Li and Ding (2017).
For each $i$, $t$, and any $w = (w_1, \ldots, w_{p+1}) \in \mathcal{W}^{p+1}$, the adapted propensity score summarizes the conditional probability of a given assignment path and is given by $p_{i,t-p}(w) := \Pr\big(W_{i,t-p:t} = w \mid W_{i,1:t-p-1}, Y_{i,1:t}(W_{i,1:t-p-1}, w)\big)$. Even though the assignment mechanism is known, we only observe the outcomes along the realized assignment path, $Y_{i,1:t}(w^{\mathrm{obs}}_{i,1:t})$, and so it is not possible to compute $p_{i,t-p}(w)$ for all assignment paths. However, we can compute the adapted propensity score along the observed assignment path, $p_{i,t-p}(w^{\mathrm{obs}}_{i,t-p:t})$ (see Appendix C.2 for further discussion).
Assumption 3.3.1 (Probabilistic Assignment). Consider a potential outcome panel. There exist $C_L, C_U \in (0,1)$ such that $C_L < p_{i,t-p}(w) < C_U$ for all $i \in [N]$, $t \in [T]$ and $w \in \mathcal{W}^{p+1}$.
All expectations, denoted by $\mathbb{E}[\cdot]$, are computed with respect to the probabilistic assignment mechanism. We write $\mathcal{F}_{i,t-p-1}$ for the filtration generated by $W_{i,1:t-p-1}$ and $\mathcal{F}_{1:N,t-p-1}$ for the filtration generated by $W_{1:N,1:t-p-1}$. Since we condition on all of the potential outcomes, conditioning on these filtrations also conditions on the associated observed outcomes. Our proposed estimator of the $(i,t)$-th lag-$p$ dynamic causal effect is
$$
\hat{\tau}_{i,t}(w, \tilde{w}; p) := \frac{Y_{i,t}(w^{\mathrm{obs}}_{i,1:t-p-1}, w)\, \mathbb{1}(w^{\mathrm{obs}}_{i,t-p:t} = w)}{p_{i,t-p}(w)} - \frac{Y_{i,t}(w^{\mathrm{obs}}_{i,1:t-p-1}, \tilde{w})\, \mathbb{1}(w^{\mathrm{obs}}_{i,t-p:t} = \tilde{w})}{p_{i,t-p}(\tilde{w})},
$$
where $\mathbb{1}\{A\}$ is an indicator function for an event $A$. Under individualistic assignments (Definition 3.2.3), the estimator simplifies to
$$
\hat{\tau}_{i,t}(w, \tilde{w}; p) = y^{\mathrm{obs}}_{i,t}\, \frac{\mathbb{1}(w^{\mathrm{obs}}_{i,t-p:t} = w) - \mathbb{1}(w^{\mathrm{obs}}_{i,t-p:t} = \tilde{w})}{p_{i,t-p}(w^{\mathrm{obs}}_{i,t-p:t})}.
$$
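As a sketch (illustrative code, not from the dissertation), the simplified individualistic form of the estimator can be written down directly; `prop_obs` denotes the adapted propensity score evaluated at the realized window, which the researcher can always compute.

```python
def ht_lag_p_estimate(y_obs, w_obs_window, prop_obs, w, w_tilde):
    """Horvitz-Thompson-type estimate of the (i,t)-th lag-p dynamic causal
    effect under an individualistic assignment: the observed outcome is
    inverse-weighted by the adapted propensity score of the realized window
    (periods t-p..t) and signed by which of the two paths was realized.
    Returns 0 when neither path of interest was realized."""
    sign = int(w_obs_window == w) - int(w_obs_window == w_tilde)  # +1, -1, or 0
    return y_obs * sign / prop_obs
```

Averaging over the assignment mechanism recovers the causal effect: with $p = 0$, $Y(1) = 3$, $Y(0) = 1$ and $W \sim \mathrm{Bernoulli}(0.6)$, the mean of the estimate over repeated draws is $0.6 \cdot (3/0.6) + 0.4 \cdot (-1/0.4) = 2$, the true contrast.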
Theorem 3.3.1. Consider a potential outcome panel with an assignment mechanism that is individualistic (Definition 3.2.3) and probabilistic (Assumption 3.3.1). Then $\mathbb{E}[\hat{\tau}_{i,t}(w, \tilde{w}; p) \mid \mathcal{F}_{i,t-p-1}] = \tau_{i,t}(w, \tilde{w}; p)$ and
$$
\mathrm{Var}\big(\hat{\tau}_{i,t}(w, \tilde{w}; p) \mid \mathcal{F}_{i,t-p-1}\big) = \gamma^2_{i,t}(w, \tilde{w}; p) - \tau_{i,t}(w, \tilde{w}; p)^2 =: \sigma^2_{i,t}, \tag{3.1}
$$
where
$$
\gamma^2_{i,t}(w, \tilde{w}; p) = \frac{Y_{i,t}(w^{\mathrm{obs}}_{i,1:t-p-1}, w)^2}{p_{i,t-p}(w)} + \frac{Y_{i,t}(w^{\mathrm{obs}}_{i,1:t-p-1}, \tilde{w})^2}{p_{i,t-p}(\tilde{w})}.
$$
Further, for distinct $w, \tilde{w}, \bar{w}, \hat{w} \in \mathcal{W}^{p+1}$,
$$
\mathrm{Cov}\big(\hat{\tau}_{i,t}(w, \tilde{w}; p), \hat{\tau}_{i,t}(\bar{w}, \hat{w}; p) \mid \mathcal{F}_{i,t-p-1}\big) = -\tau_{i,t}(w, \tilde{w}; p)\, \tau_{i,t}(\bar{w}, \hat{w}; p).
$$
Finally, $\hat{\tau}_{i,t}(w, \tilde{w}; p)$ and $\hat{\tau}_{j,t}(w, \tilde{w}; p)$ are independent for $i \neq j$ conditional on $\mathcal{F}_{1:N,t-p-1}$.
Theorem 3.3.1 states that for every $i$, $t$, the error in estimating $\tau_{i,t}(w, \tilde{w}; p)$ is a martingale difference sequence through time and conditionally independent across units. The variance of $\hat{\tau}_{i,t}(w, \tilde{w}; p)$ depends upon the potential outcomes under both the treatment and counterfactual paths and is generally not estimable. However, the variance is bounded from above by $\gamma^2_{i,t}(w, \tilde{w}; p)$, which we can estimate by
$$
\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) = (y^{\mathrm{obs}}_{i,t})^2\, \frac{\mathbb{1}(w^{\mathrm{obs}}_{i,t-p:t} = w) + \mathbb{1}(w^{\mathrm{obs}}_{i,t-p:t} = \tilde{w})}{p_{i,t-p}(w^{\mathrm{obs}}_{i,t-p:t})^2}.
$$
The following proposition establishes that $\hat{\gamma}^2_{i,t}(w, \tilde{w}; p)$ is an unbiased estimator of $\gamma^2_{i,t}(w, \tilde{w}; p)$ and that its error in estimating $\gamma^2_{i,t}(w, \tilde{w}; p)$ is also a martingale difference sequence through time and conditionally independent across units. Additionally, $\hat{\gamma}^2_{i,t}(w, \tilde{w}; p)$ and $\hat{\gamma}^2_{j,t}(w, \tilde{w}; p)$ are independent for $i \neq j$ conditional on $\mathcal{F}_{1:N,t-p-1}$.
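A matching sketch of the variance bound estimator, using the same observed quantities as the point estimator (again hypothetical illustrative code):

```python
def variance_bound_estimate(y_obs, w_obs_window, prop_obs, w, w_tilde):
    """Unbiased estimate of the bound gamma^2_{i,t}(w, w_tilde; p): the squared
    observed outcome, weighted by the inverse squared adapted propensity score,
    kept only when one of the two comparison paths was realized."""
    hit = int(w_obs_window == w) + int(w_obs_window == w_tilde)  # 1 or 0
    return (y_obs ** 2) * hit / prop_obs ** 2
```

In the $p = 0$ binary example with $Y(1) = 3$, $Y(0) = 1$ and $W \sim \mathrm{Bernoulli}(0.6)$, the expectation over the assignment mechanism is $0.6 \cdot 9/0.36 + 0.4 \cdot 1/0.16 = 9/0.6 + 1/0.4 = 17.5$, which equals $\gamma^2$ as the proposition asserts.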
The variance bound $\gamma^2_{i,t}(w, \tilde{w}; p)$ is different from the typical Neyman variance bound, derived under the assumption of a completely randomized experiment (Imbens and Rubin, 2015, Chapter 5). In a completely randomized experiment, there is a negative correlation between any two units' assignments since the total number of units assigned to each treatment is fixed. In our setting, all units' assignments are conditionally independent under individualistic assignments, so no such correlation term arises.
Remark 3.3.1. Since the weighted average $(i,t)$-th lag-$(p,q)$ dynamic causal effects (Definition 3.2.5) are linear combinations of the $(i,t)$-th lag-$p$ dynamic causal effects, we can directly apply Theorem 3.3.1 to construct an analogous plug-in estimator. This estimator is unbiased over the randomization distribution, and its variance can be bounded from above. For uniform weights, the rest of the generalizations follow immediately by noticing that we can replace the lag-$p$ quantities above with their weighted-average counterparts.
3.3.3 Estimation of lag-p average causal effects

The martingale difference properties of the nonparametric estimator mean that the averaged plug-in estimators
$$
\hat{\bar{\tau}}_{\cdot t}(w, \tilde{w}; p) := \frac{1}{N} \sum_{i=1}^{N} \hat{\tau}_{i,t}(w, \tilde{w}; p), \qquad
\hat{\bar{\tau}}_{i\cdot}(w, \tilde{w}; p) := \frac{1}{T-p} \sum_{t=p+1}^{T} \hat{\tau}_{i,t}(w, \tilde{w}; p),
$$
$$
\hat{\bar{\tau}}(w, \tilde{w}; p) := \frac{1}{N(T-p)} \sum_{i=1}^{N} \sum_{t=p+1}^{T} \hat{\tau}_{i,t}(w, \tilde{w}; p)
$$
are also unbiased for the average causal estimands $\bar{\tau}_{\cdot t}(w, \tilde{w}; p)$, $\bar{\tau}_{i\cdot}(w, \tilde{w}; p)$, and $\bar{\tau}(w, \tilde{w}; p)$, respectively. We next derive the limiting distribution of appropriately scaled and centered versions of these estimators.
Theorem 3.3.2. Consider a potential outcome panel with an individualistic (Definition 3.2.3) and probabilistic assignment mechanism (Assumption 3.3.1). Further assume that the potential outcomes are bounded.12 Then, appropriately scaled and centered, $\hat{\bar{\tau}}_{\cdot t}(w, \tilde{w}; p)$ is asymptotically normal as $N \to \infty$. Likewise, for bounded potential outcomes with an individualistic and probabilistic assignment mechanism, the scaled and centered estimators $\hat{\bar{\tau}}_{i\cdot}(w, \tilde{w}; p)$ and $\hat{\bar{\tau}}(w, \tilde{w}; p)$ are asymptotically normal as $T \to \infty$ and as $NT \to \infty$, respectively.

12 Assuming the potential outcomes are bounded is a common simplifying assumption made in deriving finite population central limit theorems. As discussed in Li and Ding (2017), this assumption can often be replaced by a finite-population analogue of the Lindeberg condition in analyses of cross-sectional, randomized experiments.
Following the same logic as earlier, we can establish unbiased and consistent estimators of the variance bounds for the average causal effects.

Proposition 3.3.2. Under the setup of Theorem 3.3.2, for any $w, \tilde{w} \in \mathcal{W}^{p+1}$,
$$
\mathbb{E}\left[\frac{1}{N} \sum_{i=1}^{N} \hat{\gamma}^2_{i,t}(w, \tilde{w}; p) \,\Big|\, \mathcal{F}_{1:N,t-p-1}\right] = \frac{1}{N} \sum_{i=1}^{N} \gamma^2_{i,t}(w, \tilde{w}; p),
$$
$$
\mathbb{E}\left[\frac{1}{T-p} \sum_{t=p+1}^{T} \hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{T-p} \sum_{t=p+1}^{T} \gamma^2_{i,t}(w, \tilde{w}; p) \,\Big|\, \mathcal{F}_{i,0}\right] = 0,
$$
$$
\mathbb{E}\left[\frac{1}{N(T-p)} \sum_{i=1}^{N} \sum_{t=p+1}^{T} \hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{N(T-p)} \sum_{i=1}^{N} \sum_{t=p+1}^{T} \gamma^2_{i,t}(w, \tilde{w}; p) \,\Big|\, \mathcal{F}_{1:N,0}\right] = 0.
$$
Moreover,
$$
\frac{1}{N} \sum_{i=1}^{N} \hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{N} \sum_{i=1}^{N} \gamma^2_{i,t}(w, \tilde{w}; p) \xrightarrow{p} 0 \quad \text{as } N \to \infty,
$$
$$
\frac{1}{T-p} \sum_{t=p+1}^{T} \hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{T-p} \sum_{t=p+1}^{T} \gamma^2_{i,t}(w, \tilde{w}; p) \xrightarrow{p} 0 \quad \text{as } T \to \infty,
$$
$$
\frac{1}{N(T-p)} \sum_{i=1}^{N} \sum_{t=p+1}^{T} \hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{N(T-p)} \sum_{i=1}^{N} \sum_{t=p+1}^{T} \gamma^2_{i,t}(w, \tilde{w}; p) \xrightarrow{p} 0 \quad \text{as } NT \to \infty.
$$
Proposition 3.3.2 also highlights an important trade-off in the choice of the lag $p$: increasing the lag $p$ reduces the dependence on the observed treatment path at the cost of increased variance. Striking the correct balance depends on the context and the question of interest.
Theorem 3.3.2 and Proposition 3.3.2 naturally extend to the weighted average $(i,t)$-th lag-$(p,q)$ dynamic causal effects from Definition 3.2.5 by using the estimator developed in Remark 3.3.1.
3.3.4 Confidence intervals and testing for lag-p average causal effects

Combining the variance bound estimators in Proposition 3.3.2 with the central limit theorems in Theorem 3.3.2, we can carry out conservative inference for $\bar{\tau}_{\cdot t}(w, \tilde{w}; p)$, $\bar{\tau}_{i\cdot}(w, \tilde{w}; p)$ and $\bar{\tau}(w, \tilde{w}; p)$. Such techniques can be used to construct conservative confidence intervals or tests of weak null hypotheses that the average dynamic causal effects are zero.
Alternatively, we may construct exact tests of sharp null hypotheses. An example of such a sharp null hypothesis is $H_0: \tau_{i,t}(w, \tilde{w}; p) = 0$ for all $w, \tilde{w}$, $i \in [N]$ and a specific $t$. Since all potential outcomes are known under such sharp null hypotheses, we can simulate the assignment path $W_{i,t-p:t} \mid W^{\mathrm{obs}}_{i,1:t-p-1}, y^{\mathrm{obs}}_{i,1:t-p-1}$ for each unit $i$ and compute $\hat{\tau}_{i,t}(w, \tilde{w}; p)$ at each draw. Therefore, we may simulate the exact distribution of any test statistic under the sharp null hypothesis and compute an exact p-value for the observed test statistic.
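The procedure just described can be sketched for the simplest configuration — a Bernoulli assignment mechanism without carryover ($p = 0$) and the total lag-0 estimate as the test statistic. The function below is an illustrative stand-in, not the implementation used in the chapter.

```python
import numpy as np

def randomization_p_value(y, w, p_treat, n_draws=10_000, seed=0):
    """Exact randomization test of the sharp null of no dynamic causal effects
    for a Bernoulli(p_treat) assignment panel with no carryover (p = 0).
    Under the sharp null the observed outcome panel y (N x T) is invariant to
    the assignments, so we redraw the assignment panel, recompute the total
    lag-0 Horvitz-Thompson estimate at each draw, and report the two-sided
    p-value of the observed statistic."""
    rng = np.random.default_rng(seed)

    def ht_total(w_panel):
        # total lag-0 estimate: inverse-probability weighted, signed outcomes
        probs = np.where(w_panel == 1, p_treat, 1.0 - p_treat)
        signs = np.where(w_panel == 1, 1.0, -1.0)
        return np.mean(y * signs / probs)

    obs = ht_total(w)
    draws = np.array([ht_total(rng.binomial(1, p_treat, size=w.shape))
                      for _ in range(n_draws)])
    return np.mean(np.abs(draws) >= abs(obs))
```

A large contemporaneous effect drives the p-value toward zero, while outcomes generated independently of the assignments yield an approximately uniform p-value.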
These randomization tests only require us to be able to simulate from the randomization distribution of the assignment paths. Therefore, such randomization tests may also be conducted when the assignment mechanism can be simulated but the conditions underlying our asymptotic results do not hold.
This section explores the properties of commonly used linear estimators, such as the canonical unit fixed effects estimator and the two-way fixed effects estimator, under the potential outcome panel model. We establish that if there are dynamic causal effects and serial correlation in the treatment assignment mechanism, both the unit fixed effects estimator and the two-way fixed effects estimator are asymptotically biased for a weighted average of contemporaneous causal effects. In Appendix C.2, we consider analyzing the panel experiment as a repeated cross-section.

Throughout this section, we further assume that the potential outcomes themselves are a linear function of the assignment path.

Definition 3.4.1. A linear potential outcome panel is a potential outcome panel where
$$
Y_{i,t}(w_{i,1:t}) = \beta_{i,t,0}\, w_{i,t} + \ldots + \beta_{i,t,t-1}\, w_{i,1} + \epsilon_{i,t} \quad \forall t \in [T] \text{ and } i \in [N],
$$
and the non-stochastic coefficients $\beta_{i,t,0:t-1}$ and non-stochastic error $\epsilon_{i,t}$ do not depend upon the treatments.
We adapt notation used in Wooldridge (2005) for analyzing panel fixed effects models. For a generic random variable $A_{i,t}$, we compactly write the within-period transformed variable as $\dot{A}_{i,t} = A_{i,t} - \bar{A}_{\cdot t}$ and the within-unit transformed variable as $\check{A}_{i,t} = A_{i,t} - \bar{A}_{i\cdot}$. The within-unit and within-period transformed variable is $\ddot{A}_{i,t} = (A_{i,t} - \bar{A}) - (\bar{A}_{\cdot t} - \bar{A}) - (\bar{A}_{i\cdot} - \bar{A})$.
3.4.1 Interpreting the unit fixed effects estimator

Our next result characterizes the finite population probability limit of the unit fixed effects estimator,
$$
\hat{\beta}_{UFE} = \sum_{i=1}^{N} \sum_{t=1}^{T} \check{Y}_{i,t} \check{W}_{i,t} \Big/ \sum_{i=1}^{N} \sum_{t=1}^{T} \check{W}^2_{i,t},
$$
under the linear potential outcome panel model. Define $\mathrm{Cov}(\check{W}_{i,t}, \check{W}_{i,s} \mid \mathcal{F}_{1:N,0,T}) =: \check{\sigma}_{W,i,t,s}$ and $\check{\mu}_{i,t} := \mathbb{E}[\check{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}]$.

Proposition 3.4.1. Assume a linear potential outcome panel and that the assignment mechanism is individualistic (Definition 3.2.3) with $\mathrm{Var}(\check{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}) =: \check{\sigma}^2_{W,i,t} < \infty$ for each $i \in [N]$, $t \in [T]$. Further assume that the relevant cross-sectional averages converge as $N \to \infty$, with limits denoted $\check{\kappa}_{W,\beta,t,s}$, $\check{\sigma}^2_{W,t}$, and $\check{\delta}_t$. Then, as $N \to \infty$,
$$
\hat{\beta}_{UFE} \xrightarrow{p} \frac{\sum_{t=1}^{T} \check{\kappa}_{W,\beta,t,t}}{\sum_{t=1}^{T} \check{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T} \sum_{s=1}^{t-1} \check{\kappa}_{W,\beta,t,s}}{\sum_{t=1}^{T} \check{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T} \check{\delta}_t}{\sum_{t=1}^{T} \check{\sigma}^2_{W,t}}.
$$
Proposition 3.4.1 decomposes the finite population probability limit of the unit fixed effects estimator into three terms. The first term is a weighted average of contemporaneous dynamic causal coefficients, describing how the contemporaneous causal coefficients covary with the within-unit transformed assignments over the assignment mechanism. The second term captures how past causal coefficients covary with the within-unit transformed treatments and arises due to the presence of dynamic causal effects. The last term is an additional error that arises from covariation between the "control-only" counterfactual outcomes and the demeaned treatment assignment. A sufficient condition for the last term to equal zero is $Y_{i,t}(0) = \alpha_i$ for all $i \in [N]$, $t \in [T]$. Therefore, the last term is zero whenever unit fixed effects correctly summarize the variation in the "control-only" counterfactual outcomes across units and time.
Proposition 3.4.1 is related to, yet crucially different from, results in Imai and Kim (2019), which show that the unit fixed effects estimator recovers a weighted average of unit-specific contemporaneous causal effects if there are no carryover effects. In contrast, we establish that the unit fixed effects estimator does not recover a weighted average of unit-specific contemporaneous causal effects in the presence of carryover effects and persistence in the treatment path assignment mechanism.
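This distinction can be checked numerically. The following sketch (illustrative code under assumed parameter values, not from the dissertation) simulates a linear potential outcome panel with a lag-1 carryover coefficient and a persistent Markov-Bernoulli assignment path; the unit fixed effects estimate then drifts away from the contemporaneous coefficient $\beta_0$, while it stays close to $\beta_0$ when assignments are serially independent.

```python
import numpy as np

def unit_fe_estimate(y, w):
    """Unit fixed effects slope: within-unit demean both panels, then pooled OLS."""
    y_til = y - y.mean(axis=1, keepdims=True)
    w_til = w - w.mean(axis=1, keepdims=True)
    return np.sum(y_til * w_til) / np.sum(w_til * w_til)

def simulate_fe_panel(n, t, beta0, beta1, stay, seed=0):
    """Linear potential outcome panel Y_t = beta0*W_t + beta1*W_{t-1} + eps_t,
    with a persistent assignment: W_t repeats W_{t-1} with probability `stay`,
    and otherwise is a fresh Bernoulli(0.5) draw (stay = 0 gives i.i.d. paths)."""
    rng = np.random.default_rng(seed)
    w = np.zeros((n, t))
    w[:, 0] = rng.binomial(1, 0.5, n)
    for s in range(1, t):
        fresh = rng.binomial(1, 0.5, n)
        keep = rng.random(n) < stay
        w[:, s] = np.where(keep, w[:, s - 1], fresh)
    w_lag = np.column_stack([np.zeros(n), w[:, :-1]])  # lagged assignment
    y = beta0 * w + beta1 * w_lag + rng.standard_normal((n, t))
    return y, w
```

With `stay = 0.7` the demeaned assignments are serially correlated, so the carryover coefficient leaks into the unit fixed effects estimate, matching the second term of Proposition 3.4.1; with `stay = 0` that term is negligible.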
Example 3.4.1. Consider a linear potential outcome panel model with $Y_{i,t}(w_{i,1:t}) = \beta_0 w_{i,t} + \beta_1 w_{i,t-1} + \epsilon_{i,t}$ for all $t > 1$ and $Y_{i,1}(w_{i,1}) = \beta_0 w_{i,1} + \epsilon_{i,1}$ for $t = 1$. Assume $\mathrm{Var}(\check{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}) = \check{\sigma}^2_{W,t}$ for all $t$. Proposition 3.4.1 then implies
$$
\hat{\beta}_{UFE} \xrightarrow{p} \beta_0 + \beta_1 \frac{\sum_{t=2}^{T} \check{\sigma}_{W,t,t-1}}{\sum_{t=1}^{T} \check{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T} \check{\delta}_t}{\sum_{t=1}^{T} \check{\sigma}^2_{W,t}}.
$$
The unit fixed effects estimator converges in probability to the contemporaneous dynamic causal coefficient $\beta_0$ plus a bias that depends on two terms. The first component of the bias depends on the lag-1 dynamic causal coefficient $\beta_1$ and the serial covariance of the demeaned assignments; the second depends on the "control-only" counterfactual outcomes through the terms $\check{\delta}_t$.
Proposition 3.4.2. Assume a linear potential outcome panel and that the assignment mechanism is individualistic with $\mathrm{Var}(\ddot{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}) =: \ddot{\sigma}^2_{W,i,t} < \infty$ for each $i \in [N]$, $t \in [T]$. Further assume that the relevant cross-sectional averages converge as $N \to \infty$, with limits denoted $\ddot{\kappa}_{W,\beta,t,s}$, $\ddot{\sigma}^2_{W,t}$, and $\ddot{\delta}_t$. Then, as $N \to \infty$,
$$
\hat{\beta}_{TWFE} \xrightarrow{p} \frac{\sum_{t=1}^{T} \ddot{\kappa}_{W,\beta,t,t}}{\sum_{t=1}^{T} \ddot{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T} \sum_{s=1}^{t-1} \ddot{\kappa}_{W,\beta,t,s}}{\sum_{t=1}^{T} \ddot{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T} \ddot{\delta}_t}{\sum_{t=1}^{T} \ddot{\sigma}^2_{W,t}}.
$$
Similar to Proposition 3.4.1, the two-way fixed effects estimand can be decomposed into three components under the linear potential outcome panel model, where the interpretation of each component parallels that for the unit fixed effects estimator. A simple sufficient condition for the last term to equal zero is for the counterfactual outcome to be additively separable into a time-specific and a unit-specific effect, $Y_{i,t}(0) = \alpha_i + \lambda_t$ for all $i \in [N]$, $t \in [T]$. Therefore, the last term is zero whenever unit and time fixed effects correctly summarize the variation in the "control-only" counterfactual outcomes across units and time.
An active literature in econometrics analyzes the two-way fixed effects estimator under various identifying assumptions. For example, de Chaisemartin and D'Haultfoeuille (2020) rule out carryover effects and decompose the two-way fixed effects estimand under a "common-trends" assumption that restricts how the potential outcomes under control evolve over time across groups. Sun and Abraham (2020) decompose the two-way fixed effects estimand in staggered designs (meaning units receive the treatment at some period and forever after) under a common-trends assumption. Borusyak and Jaravel (2017), Athey and Imbens (2018) and Goodman-Bacon (2020) also provide decompositions of the two-way fixed effects estimand in staggered designs. Our decomposition applies to randomized experiments with carryover effects, whereas these existing decompositions are useful in observational settings where common-trends assumptions are plausible.
We conduct a simulation study to investigate the finite sample properties of the asymptotic results presented in Section 3.3. These simulations show that the finite population central limit theorems (Theorem 3.3.2) hold for a moderate number of treatment periods and experimental units. The proposed conservative tests for the weak null of no average dynamic causal effects have correct size.

We generate the potential outcomes for the panel experiment using an autoregressive model,
$$
Y_{i,t}(w_{i,1:t}) = \phi_{i,t,1} Y_{i,t-1}(w_{i,1:t-1}) + \ldots + \phi_{i,t,t-1} Y_{i,1}(w_{i,1}) + \beta_{i,t,0} w_{i,t} + \ldots + \beta_{i,t,t-1} w_{i,1} + \epsilon_{i,t} \quad \forall t > 1, \tag{3.2}
$$
and $Y_{i,1}(w_{i,1}) = \beta_{i,1,0} w_{i,1} + \epsilon_{i,1}$, with $\phi_{i,t,1} = \phi$, $\phi_{i,t,s} = 0$ for $s > 1$, $\beta_{i,t,0} = \beta$ and $\beta_{i,t,s} = 0$ for $s > 0$. We vary the choice of $\phi$, which governs the persistence of the process, and $\beta$, which governs the size of the contemporaneous causal effects. We also vary the probability of treatment $p_{i,t-p}(w) = p(w)$ as well as the distribution of the errors $\epsilon_{i,t}$, which we sample from either a standard normal or a Cauchy distribution.
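A minimal sketch of this data generating process (illustrative code consistent with the special case just described, not the authors' simulation code):

```python
import numpy as np

def simulate_panel(n, t, phi, beta, p_treat, errors="normal", seed=0):
    """Potential outcomes from equation (3.2) with phi_{i,t,1} = phi,
    beta_{i,t,0} = beta and all other coefficients zero, so that
    Y_t = phi*Y_{t-1} + beta*W_t + eps_t, with Bernoulli(p_treat) assignments."""
    rng = np.random.default_rng(seed)
    eps = (rng.standard_normal((n, t)) if errors == "normal"
           else rng.standard_cauchy((n, t)))
    w = rng.binomial(1, p_treat, size=(n, t))
    y = np.zeros((n, t))
    y[:, 0] = beta * w[:, 0] + eps[:, 0]
    for s in range(1, t):
        y[:, s] = phi * y[:, s - 1] + beta * w[:, s] + eps[:, s]
    return y, w
```

In this design, the simple inverse-probability-weighted contrast averaged over all units and periods recovers the contemporaneous coefficient $\beta$, since each period's assignment is independent of lagged outcomes.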
We evaluate our estimators over the randomization distribution, meaning that we first generate the potential outcomes $Y_{1:N,1:T}$ and then simulate over different assignment panels $W_{1:N,1:T}$, holding the potential outcomes fixed. In the main text, we focus on evaluating the properties of our estimator for the total average dynamic causal effect $\hat{\bar{\tau}}(1, 0; 0)$. Appendix C.3 explores the properties of our estimators for the time-$t$ average $\hat{\bar{\tau}}_{\cdot t}(1, 0; 0)$ and the unit-$i$ average $\hat{\bar{\tau}}_{i\cdot}(1, 0; 0)$, as well as our estimators of the lag-1 weighted average dynamic causal effects $\hat{\bar{\tau}}^{\dagger}_{\cdot t}(1, 0; 1)$, $\hat{\bar{\tau}}^{\dagger}_{i\cdot}(1, 0; 1)$ and $\hat{\bar{\tau}}^{\dagger}(1, 0; 1)$.
Figure 3.1: Simulated randomization distribution for $\hat{\bar{\tau}}(1, 0; 0)$ under different choices of the parameter $\phi$ and treatment probability $p(w)$. (a) $\epsilon_{i,t} \sim N(0,1)$, $N = 100$, $T = 10$; (b) $\epsilon_{i,t} \sim$ Cauchy, $N = 500$, $T = 100$.

Notes: This figure plots the simulated randomization distribution for $\hat{\bar{\tau}}(1, 0; 0)$ under different choices of the parameter $\phi$ (defined in Equation (3.2)) and treatment probability $p(w)$. The rows index the parameter $\phi \in \{0.25, 0.5, 0.75\}$. The columns index the treatment probability $p(w) \in \{0.25, 0.5, 0.75\}$. Panel (a) plots the simulated randomization distribution with normally distributed errors $\epsilon_{i,t} \sim N(0,1)$ and $N = 100$, $T = 10$. Panel (b) plots the simulated randomization distribution with Cauchy distributed errors and $N = 500$, $T = 100$. Results are computed over 5,000 simulations.
3.5.2 Normal approximations and size control

Figure 3.1 plots the randomization distribution for the estimator of the total average dynamic causal effect $\hat{\bar{\tau}}(1, 0; 0)$. We present results for the cases $N = 100$, $T = 10$ and $N = 500$, $T = 100$ (the results are similar when the roles of $N$ and $T$ are reversed). When the errors $\epsilon_{i,t}$ are normally distributed, the randomization distribution quickly converges to a normal distribution. When the errors are Cauchy distributed, the total number of units and time periods must be quite large for the randomization distribution to become approximately normal. There is little difference in the results across the values of $\phi$ and $p(w)$. Appendix C.3 provides quantile-quantile plots of the simulated randomization distributions to further illustrate the quality of the normal approximations. Testing based on the normal asymptotic approximation controls size effectively, as Table 3.1 shows.
Table 3.1: Null rejection rate for the test of the null hypothesis $H_0: \bar{\tau}(1, 0; 0) = 0$ based upon the normal asymptotic approximation.

(a) $\epsilon_{i,t} \sim N(0,1)$, $N = 100$, $T = 10$

                  p(w) = 0.25   p(w) = 0.5   p(w) = 0.75
    phi = 0.25        0.050        0.047        0.048
    phi = 0.5         0.052        0.052        0.050
    phi = 0.75        0.050        0.049        0.048

(b) $\epsilon_{i,t} \sim$ Cauchy, $N = 500$, $T = 100$

                  p(w) = 0.25   p(w) = 0.5   p(w) = 0.75
    phi = 0.25        0.028        0.029        0.032
    phi = 0.5         0.046        0.039        0.044
    phi = 0.75        0.055        0.044        0.054

Notes: This table reports the null rejection rate for the test of the null hypothesis $H_0: \bar{\tau}(1, 0; 0) = 0$ based upon the normal asymptotic approximation to the randomization distribution of $\hat{\bar{\tau}}(1, 0; 0)$. Panel (a) reports the null rejection probabilities in simulations with $\epsilon_{i,t} \sim N(0,1)$ and $N = 100$, $T = 10$. Panel (b) reports the null rejection probabilities in simulations with $\epsilon_{i,t} \sim$ Cauchy and $N = 500$, $T = 100$. Results are computed over 5,000 simulations.
Focusing on simulations with normally distributed errors, we next investigate the rejection rate of statistical tests based on the normal asymptotic approximations. To do so, we generate potential outcomes $Y_{1:N,1:T}$ under different values of $\beta$, which governs the magnitude of the contemporaneous causal effect. As we vary $\beta \in \{-1, -0.9, \ldots, 0.9, 1\}$, we also vary the parameter $\phi \in \{0.25, 0.5, 0.75\}$ and the probability of treatment $p(w) \in \{0.25, 0.5, 0.75\}$ to investigate how the rejection rate varies across a range of parameter values. We report the fraction of tests that reject the null at the 5% level.
Figure 3.2 plots rejection rate curves against the weak null hypotheses $H_0: \bar{\tau}(1, 0; 0) = 0$ and $H_0: \bar{\tau}^{\dagger}(1, 0; 1) = 0$ as the parameter $\beta$ varies for different choices of the parameter $\phi$ and treatment probability $p(w)$. The rejection rate against $H_0: \bar{\tau}(1, 0; 0) = 0$ quickly converges to one as $\beta$ moves away from zero across a range of simulations, indicating that the conservative variance bound still leads to informative tests. When $\phi = 0.25$, the rejection rate against $H_0: \bar{\tau}^{\dagger}(1, 0; 1) = 0$ is relatively low, as lower values of $\phi$ imply less persistence in the causal effects across periods. When $\phi = 0.75$, there is substantial persistence in the causal effects across periods and we observe that the rejection rate against $H_0: \bar{\tau}^{\dagger}(1, 0; 1) = 0$ rises accordingly.
Figure 3.2: Rejection probabilities for tests of the null hypotheses $H_0: \bar{\tau}(1, 0; 0) = 0$ and $H_0: \bar{\tau}^{\dagger}(1, 0; 1) = 0$ as the parameter $\beta$ varies under different choices of the parameter $\phi$ and treatment probability $p(w)$.

Notes: This figure plots rejection probabilities for tests of the null hypotheses $H_0: \bar{\tau}(1, 0; 0) = 0$ and $H_0: \bar{\tau}^{\dagger}(1, 0; 1) = 0$ as the parameter $\beta$ varies under different choices of the parameter $\phi$ and treatment probability $p(w)$. The rejection rate curve against $H_0: \bar{\tau}(1, 0; 0) = 0$ is plotted in blue and the rejection rate curve against $H_0: \bar{\tau}^{\dagger}(1, 0; 1) = 0$ in orange. The rows index the parameter $\phi \in \{0.25, 0.5, 0.75\}$. The columns index the treatment probability $p(w) \in \{0.25, 0.5, 0.75\}$. The simulations are conducted with normally distributed errors $\epsilon_{i,t} \sim N(0,1)$ and $N = 100$, $T = 10$. Results are averaged over 5,000 simulations.
Appendix C.3 analyzes the rejection rate curves against the weak null hypotheses on the time-$t$ average dynamic causal effects with $N = 100$ units and the unit-$i$ average dynamic causal effects with $T = 100$ time periods. The conservative tests can have low power against these unit-specific or time period-specific weak null hypotheses in small experiments with few units or few time periods. Unless researchers are analyzing a panel experiment with a large cross-sectional or time dimension, we recommend that researchers focus on analyzing total lag-$p$ dynamic causal effects, which enables them to improve power by pooling information across both units and time periods.
We apply our methods to reanalyze a panel experiment from Andreoni and Samuelson (2006) that tests a game-theoretic model of "rational cooperation" and studies how variation in the payoff structure of a twice-played prisoners' dilemma affects cooperation.

The payoffs of the game were determined by two parameters $x_1, x_2 \ge 0$ such that $x_1 + x_2 = 10$. In each period, both players simultaneously selected either C (cooperate) or D (defect) and subsequently received the payoffs associated with these choices. Table 3.2 summarizes the payoff structure. Let $\lambda = \frac{x_2}{x_1 + x_2} \in [0, 1]$ govern the relative payoffs between the two periods of the prisoners' dilemma; when $\lambda = 0$, all payoffs occurred in period one and when $\lambda = 1$, all payoffs occurred in period two. The authors develop a model of rational cooperation that predicts that when $\lambda$ is large, players will cooperate more often in period one than when $\lambda$ is small.
Table 3.2: Stage games from the twice-played prisoners' dilemma in the experiment conducted by Andreoni and Samuelson (2006).

    Period one:                        Period two:
             C            D                     C            D
    C   (3x1, 3x1)   (0, 4x1)          C   (3x2, 3x2)   (0, 4x2)
    D   (4x1, 0)     (x1, x1)          D   (4x2, 0)     (x2, x2)

Notes: This table summarizes the payoffs in the stage games from the twice-played prisoners' dilemma in the experiment conducted by Andreoni and Samuelson (2006). The parameters satisfy $x_1, x_2 \ge 0$, $x_1 + x_2 = 10$ and $\lambda = \frac{x_2}{x_1 + x_2}$. The choice C denotes "cooperate" and the choice D "defect."
In each session of the experiment, 22 subjects were recruited to play 20 rounds of the twice-played prisoners' dilemma in Table 3.2. In each round, participants were randomly matched into pairs, and each pair was then randomly assigned $\lambda \in \{0, 0.1, \ldots, 0.9, 1\}$ with equal probability. The authors conducted the experiment over five sessions for a total sample of 110 participants and 2,200 observed choices.

This panel experiment is a natural application of our methods. The sequential nature of the experiment raises the possibility that past assignments may impact future actions as participants
Table 3.3: Summary statistics for the experiment in Andreoni and Samuelson (2006).

                                      Count of 0   Count of 1    Mean
    Observed treatment, $W_{i,t}$         1136         1064      0.484
    Observed outcome, $Y_{i,t}$            521         1679      0.763

Notes: This table reports summary statistics for the experiment in Andreoni and Samuelson (2006). The treatment $W_{i,t}$ equals one when the assigned value of $\lambda$ is larger than 0.6. The outcome $Y_{i,t}$ equals one whenever the participant cooperates in period one of the twice-played prisoners' dilemma. There are 110 participants and we observe 2,200 choices in total.
learn about the structure of the game over time. For example, random variation in the payoff structure may induce players to explore the strategy space. Additionally, the authors originally analyzed the experiment using regression models with unit-level fixed effects, which may be biased in the presence of dynamic causal effects even if the potential outcomes are linear, as shown in Section 3.4.

In our analysis, the outcome of interest $Y$ is an indicator that equals one whenever the participant cooperated in period one of the stage game, $N = 110$, and $T = 20$. The assignment $W \in \mathcal{W} = \{0, 1\}$ is binary and equals one whenever the assigned value of $\lambda$ is greater than 0.6, meaning that the payoffs are more concentrated in period two than period one of the stage game. We binarize the assignment in this manner to keep its cardinality (and therefore the number of possible assignment paths) manageable, while continuing to test the authors' core prediction on cooperative behavior. For a given pair of subjects, the assignment mechanism is Bernoulli with probability $p = 5/11$ for treatment and $p = 6/11$ for control.13 Table 3.3 summarizes the observed treatments and outcomes.
We analyze the total lag-$p$ weighted average causal effect $\bar{\tau}^{\dagger}(1, 0; p)$ for $p = 0, 1, 2, 3$, which pools information across all units and time periods to investigate dynamic causal effects.14 Based on the
13 One potential complication that may arise from the subjects playing against each other in the stage game is possible spillovers or interference across units. The impact of such spillovers is, however, unlikely to be substantial as the matches are anonymous, and no players play each other more than once. We ignore this concern in our analysis.
14 Appendix C.4 investigates unit-specific and period-specific weighted average lag-$p$ dynamic causal effects. Since there are only $N = 110$ units and $T = 20$ periods in the experiment, these estimates are noisier than our estimates of the total lag-$p$ weighted average dynamic causal effects.
conservative test in Section 3.3.4, the weak null hypothesis $\bar{\tau}^{\dagger}(1, 0; 0) = 0$ can be soundly rejected, indicating that the treatment has a positive contemporaneous effect on cooperation in period one of the stage game and confirming the hypothesis of Andreoni and Samuelson (2006). Table 3.4 summarizes these estimates of the total lag-$p$ weighted average causal effects. Interestingly, the point estimates are positive at $p = 1, 2, 3$, suggesting there may be dynamic causal effects on cooperative behavior across rounds of the twice-played prisoners' dilemma. For example, the treatment may induce participants to learn about the value of cooperation, thereby producing persistent effects.
Table 3.4: Estimates of the total lag-$p$ weighted average dynamic causal effect for $p = 0, 1, 2, 3$.

                                                      p = 0    p = 1    p = 2    p = 3
    Point estimate, $\hat{\bar{\tau}}^{\dagger}(1, 0; p)$   0.285    0.058    0.134    0.089
    Conservative p-value                              0.000    0.226    0.013    0.126
    Randomization p-value                             0.000    0.263    0.012    0.114

Notes: This table reports estimates of the total lag-$p$ weighted average dynamic causal effect for $p = 0, 1, 2, 3$. The conservative p-value is associated with testing the weak null hypothesis of no average dynamic causal effects, $H_0: \bar{\tau}^{\dagger}(1, 0; p) = 0$, using the conservative estimator of the asymptotic variance of the nonparametric estimator (Theorem 3.3.2). The randomization p-value is associated with the randomization test of the sharp null of no dynamic causal effects, $H_0: \tau_{i,t}(w, \tilde{w}; p) = 0$ for all $i \in [N]$, $t \in [T]$. The randomization p-values are constructed based on 10,000 draws.
We further investigate these results using randomization tests based on the sharp null of no dynamic causal effects. We construct the randomization distribution for the nonparametric estimator of the total lag-$p$ weighted average dynamic causal effect $\hat{\bar{\tau}}^{\dagger}(1, 0; p)$ for $p = 0, 1, 2, 3$ under the sharp null hypothesis of no lag-$p$ dynamic causal effects for all units and time periods, $H_0: \tau_{i,t}(w, \tilde{w}; p) = 0$ for all $i \in [N]$, $t \in [T]$.15 Table 3.4 summarizes randomization p-values for the total lag-$p$ weighted average causal effects. The p-value for the randomization test at $p = 0$ is below 0.001, rejecting the sharp null of no contemporaneous dynamic causal effects for all units and again confirming the hypothesis of Andreoni and Samuelson (2006).

15 When simulating the randomization distribution, we redraw assignment paths in a manner that respects the realized pairs of subjects in the experiment, meaning that subjects that are paired in the same round receive the same assignment.
3.7 Conclusion

This paper developed a potential outcome model for studying dynamic causal effects in a panel experiment. We defined new panel-based dynamic causal estimands, such as the lag-$p$ dynamic causal effect, and introduced an associated nonparametric estimator. Our proposed estimator is unbiased for lag-$p$ dynamic causal effects over the randomization distribution, and we derived its finite population asymptotic distribution. We developed tools for inference on these dynamic causal effects: a conservative test for weak nulls and an exact randomization test for sharp nulls. We showed that the linear unit fixed effects estimator and the two-way fixed effects estimator are asymptotically biased for the contemporaneous causal effects in the presence of dynamic causal effects and persistence in the assignment mechanism. Finally, we illustrated our results through a simulation study and an empirical application to the panel experiment of Andreoni and Samuelson (2006).
References
Abadie, A., Athey, S. C., Imbens, G. W. and Wooldridge, J. (2017). When Should You Adjust
Standard Errors for Clustering? Tech. rep., NBER Working Paper No. 24003.
Abaluck, J., Agha, L., Chan, D. C., Singer, D. and Zhu, D. (2020). Fixing Misallocation with
Guidelines: Awareness vs. Adherence. Tech. rep., NBER Working Paper No. 27467.
Abbring, J. H. and Heckman, J. J. (2007). Econometric evaluation of social programs, part III:
Using the marginal treatment effect to organize alternative econometric estimators to evaluate
social programs, and to forecast their effects in new environments. In J. J. Heckman and E. E.
Leamer (eds.), Handbook of Econometrics, vol. 6, Amsterdam, The Netherlands: North Holland,
pp. 5145–5303.
Albright, A. (2019). If You Give a Judge a Risk Score: Evidence from Kentucky Bail Decisions. Tech. rep.
Allen, R. and Rehbeck, J. (2020). Satisficing, Aggregation, and Quasilinear Utility. Tech. rep.
Andreoni, J. and Samuelson, L. (2006). Building rational cooperation. Journal of Economic Theory,
127, 117–154.
Andrews, I., Roth, J. and Pakes, A. (2019). Inference for Linear Conditional Moment Inequalities. Tech.
rep., NBER Working Paper No. 26374.
Angrist, J., Graddy, K. and Imbens, G. (2000). The interpretation of instrumental variables
estimators in simultaneous equation models with an application to the demand for fish. The
Review of Economic Studies, 67 (3), 499–527.
Angrist, J. D. and Imbens, G. W. (1995). Two-stage least squares estimation of average causal
effects in models with variable treatment intensity. Journal of the American Statistical Association,
90 (430), 431–442.
—, — and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal
of the American Statistical Association, 91, 444–455.
— and Kuersteiner, G. M. (2011). Causal effects of monetary shocks: Semiparametric conditional
independence tests with a multinomial propensity score. Review of Economics and Statistics, 93,
725–747.
Arellano, M., Blundell, R. and Bonhomme, S. (2017). Earnings and consumption dynamics: A
nonlinear panel data framework. Econometrica, 85, 693–734.
— and Bonhomme, S. (2012). Nonlinear panel data analysis. Annual Review of Economics, 3, 395–424.
— and — (2016). Nonlinear panel data estimation via quantile regressions. The Econometrics Journal,
19 (3), C61–C94.
Arkhangelsky, D. and Imbens, G. (2019). Double-Robust Identification for Causal Panel Data Models.
Tech. rep., arXiv preprint arXiv:1909.09412.
Arnold, D., Dobbie, W. and Hull, P. (2020a). Measuring Racial Discrimination in Algorithms. Tech.
rep., NBER Working Paper No. 28222.
—, — and — (2020b). Measuring Racial Discrimination in Bail Decisions. Tech. rep., NBER Working
Paper No. 26999.
—, — and Yang, C. (2018). Racial bias in bail decisions. The Quarterly Journal of Economics, 133 (4),
1885–1932.
Aronow, P. M. and Samii, C. (2017). Estimating average causal effects under general interference,
with application to a social network experiment. The Annals of Applied Statistics, 11 (4), 1912–1947.
Aruoba, S. B., Mlikota, M., Schorfheide, F. and Villalvazo, S. (2021). SVARs with Occasionally-
Binding Constraints. Tech. rep.
Athey, S. (2017). Beyond prediction: Using big data for policy problems. Science, 355 (6324),
483–485.
—, Bayati, M., Doudchenko, N., Imbens, G. and Khosravi, K. (2018). Matrix Completion Methods
for Causal Panel Data Models. Tech. rep., arXiv preprint arXiv:1710.10251.
Auerbach, A. J. and Gorodnichenko, Y. (2012b). Measuring the output responses to fiscal policy.
American Economic Journal: Economic Policy, 4, 1–27.
Autor, D. H. and Scarborough, D. (2008). Does job testing harm minority workers? evidence
from retail establishments. The Quarterly Journal of Economics, 123 (1), 219–277.
Baek, C. and Lee, B. (2021). A Guide to Autoregressive Distributed Lag Models for Impulse Response
Estimations. Tech. rep.
Barocas, S., Hardt, M. and Narayanan, A. (2019). Fairness and Machine Learning. fairmlbook.org,
http://www.fairmlbook.org.
— and Selbst, A. (2016). Big data’s disparate impact. California Law Review, 104, 671–732.
Baum, L. E. and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state
Markov chains. The Annals of Mathematical Statistics, 37, 1554–1563.
Beaulieu-Jones, B., Finlayson, S. G., Chivers, C., Chen, I., McDermott, M., Kandola, J., Dalca,
A. V., Beam, A., Fiterau, M. and Naumann, T. (2019). Trends and Focus of Machine Learning
Applications for Health Research. JAMA Network Open, 2 (10), e1914051–e1914051.
Bellemare, C., Bissonnette, L. and Kroger, S. (2014). Statistical Power of Within and Between-
Subjects Designs in Economic Experiments. Tech. rep., IZA Working Paper No. 8583.
—, — and — (2016). Simulating power of economic experiments: the powerbbk package. Journal
of the Economic Science Association, 2, 157–168.
Belloni, A., Bugni, F. and Chernozhukov, V. (2018). Subvector Inference in Partially Identified
Models with Many Moment Inequalities. Tech. rep., arXiv preprint, arXiv:1806.11466.
Ben-Porath, Y. (1967). The Production of Human Capital and the Life Cycle of Earnings. Journal
of Political Economy, 75, 352–365.
Benitez-Silva, H., Buchinsky, M. and Rust, J. (2004). How Large are the Classification Errors in the
Social Security Disability Award Process? Tech. rep., NBER Working Paper Series No. 10219.
Bergemann, D., Brooks, B. and Morris, S. (2019). Counterfactuals with Latent Information. Tech.
rep.
— and Morris, S. (2013). Robust predictions in games with incomplete information. Econometrica,
81 (4), 1251–1308.
— and — (2016). Bayes correlated equilibrium and the comparison of information structures in
games. Theoretical Economics, 11, 487–522.
— and — (2019). Information design: A unified perspective. Journal of Economic Literature, 57 (1),
44–95.
Berk, R. A., Sorenson, S. B. and Barnes, G. (2016). Forecasting domestic violence: A machine
learning approach to help inform arraignment decisions. Journal of Empirical Legal Studies, 13 (1),
94–115.
Blackwell, M. and Glynn, A. (2018). How to make causal inferences with time-series and
cross-sectional data. American Political Science Review, 112, 1067–1082.
Blattner, L. and Nelson, S. T. (2021). How Costly is Noise? Tech. rep., arXiv preprint,
arXiv:arXiv:2105.07554.
Bohren, J. A., Haggag, K., Imas, A. and Pope, D. G. (2020). Inaccurate Statistical Discrimination: An
Identification Problem. Tech. rep., NBER Working Paper Series No. 25935.
Bojinov, I., Rambachan, A. and Shephard, N. (2021). Panel experiments and dynamic causal
effects: A finite population perspective. Quantitative Economics, 12 (4), 1171–1196.
—, Saint-Jacques, G. and Tingley, M. (2020a). Avoid the pitfalls of A/B testing. Harvard Business
Review, 98 (2), 48–53.
— and Shephard, N. (2019). Time series experiments and causal estimands: exact randomization
tests and trading. Journal of the American Statistical Association, 114 (528), 1665–1682.
—, Simchi-Levi, D. and Zhao, J. (2020b). Design and Analysis of Switchback Experiments. Tech. rep.,
arXiv preprint arXiv:2009.00148.
Bordalo, P., Coffman, K., Gennaioli, N. and Shleifer, A. (2016). Stereotypes. The Quarterly
Journal of Economics, 131 (4), 1753–1794.
Bordalo, P., Gennaioli, N. and Shleifer, A. (2021). Salience. Tech. rep., NBER Working Paper Series No. 29274.
Boruvka, A., Almirall, D., Witkiwitz, K. and Murphy, S. A. (2018). Assessing time-varying
causal effect moderation in mobile health. Journal of the American Statistical Association, 113,
1112–1121.
Borusyak, K. and Jaravel, X. (2017). Revisiting Event Study Designs, with an Application to the
Estimation of the Marginal Propensity to Consume. Tech. rep.
Box, G. E. P. and Tiao, G. C. (1975). Intervention analysis with applications to economic and
environmental problems. Journal of the American Statistical Association, 70, 70–79.
Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N. and Scott, S. L. (2015). Inferring causal
impact using bayesian structural time-series models. The Annals of Applied Statistics, 9, 247–274.
Browning, M., Ejrnaes, M. and Alvarez, J. (2010). Modelling income processes with lots of
heterogeneity. Review of Economic Studies, 77, 1353–1381.
Bugni, F. A., Canay, I. A. and Shi, X. (2015). Specification tests for partially identified models
defined by moment inequalities. Journal of Econometrics, 185, 259–282.
Campbell, J. Y. (2003). Chapter 13 consumption-based asset pricing. In Financial Markets and Asset
Pricing, Handbook of the Economics of Finance, vol. 1, Elsevier, pp. 803–887.
Canay, I., Mogstad, M. and Mountjoy, J. (2020). On the Use of Outcome Tests for Detecting Bias in
Decision Making. Tech. rep., NBER Working Paper No. 27802.
Canay, I. A. and Shaikh, A. M. (2017). Practical and Theoretical Advances in Inference for Partially
Identified Models, Cambridge University Press, vol. 2, p. 271–306.
Caplin, A. (2016). Measuring and modeling attention. Annual Review of Economics, 8, 379–403.
— (2021). Economic Data Engineering. Tech. rep., NBER Working Paper Series No. 29378.
—, Csaba, D., Leahy, J. and Nov, O. (2020). Rational inattention, competitive supply, and
psychometrics. The Quarterly Journal of Economics, 135 (3), 1681–1724.
— and Dean, M. (2015). Revealed preference, rational inattention, and costly information acquisi-
tion. American Economic Review, 105 (7), 2183–2203.
— and Martin, D. (2015). A testable theory of imperfect perception. Economic Journal, 125, 184–202.
Chahrour, R. and Jurado, K. (2021). Recoverability and Expectations-Driven Fluctuations. Tech. rep.
Chalfin, A., Danieli, O., Hillis, A., Jelveh, Z., Luca, M., Ludwig, J. and Mullainathan, S.
(2016). Productivity and Selection of Human Capital with Machine Learning. American Economic
Review, 106 (5), 124–127.
Chambers, C. P., Liu, C. and Martinez, S.-K. (2016). A test for risk-averse expected utility. Journal
of Economic Theory, 163, 775–785.
Chan, D. C., Gentzkow, M. and Yu, C. (2021). Selection with Variation in Diagnostic Skill: Evidence
from Radiologists. Tech. rep., NBER Working Paper No. 26467.
Chandra, A. and Staiger, D. O. (2007). Productivity spillovers in health care: Evidence from the
treatment of heart attacks. Journal of Political Economy, 115 (1), 103–140.
— and — (2020). Identifying sources of inefficiency in healthcare. The Quarterly Journal of Economics,
135 (2), 785–843.
Charness, G., Gneezy, U. and Kuhn, M. A. (2012). Experimental methods: Between-subject and
within-subject design. Journal of Economic Behavior & Organization, 81 (1), 1–8.
Chen, I. Y., Joshi, S., Ghassemi, M. and Ranganath, R. (2020). Probabilistic Machine Learning for
Healthcare. Tech. rep., arXiv preprint, arXiv:2009.11087.
Chernozhukov, V., Lee, S. and Rosen, A. M. (2013). Intersection bounds: Estimation and inference.
Econometrica, 81 (2), 667–737.
Chin, A. (2018). Central limit theorems via stein’s method for randomized experiments under
interference. arXiv preprint arXiv:1804.03105.
Chouldechova, A., Benavides-Prado, D., Fialko, O. and Vaithianathan, R. (2018). A case
study of algorithm-assisted decision making in child maltreatment hotline screening decisions.
Proceedings of the 1st Conference on Fairness, Accountability and Transparency, pp. 134–148.
— and Roth, A. (2020). A snapshot of the frontiers of fairness in machine learning. Communications
of the ACM, 63 (5), 82–89.
Cloyne, J. S., Jordá, Ó. and Taylor, A. M. (2020). Decomposing the Fiscal Multiplier. Tech. rep.
Cochrane, J. H. (2011). Presidential address: Discount rates. The Journal of Finance, 66 (4), 1047–
1108.
— and Piazzesi, M. (2002). The Fed and interest rates: A high-frequency identification. American
Economic Review, 92, 90–95.
Coston, A., Mishler, A., Kennedy, E. H. and Chouldechova, A. (2020). Counterfactual risk
assessments, evaluation and fairness. FAT* ’20: Proceedings of the 2020 Conference on Fairness,
Accountability, and Transparency, pp. 582–593.
—, Rambachan, A. and Chouldechova, A. (2021). Characterizing fairness over the set of good
models under selective labels.
Cowgill, B. (2018). Bias and Productivity in Humans and Machines: Theory and Evidence. Tech. rep.
Cox, D. R. (1958b). The regression analysis of binary sequences (with discussion). Journal of the Royal
Statistical Society: Series B, 20, 215–42.
Cox, G. and Shi, X. (2020). Simple Adaptive Size-Exact Testing for Full-Vector and Subvector Inference
in Moment Inequality Models. Tech. rep.
Cunha, F., Heckman, J. J., Lochner, L. and Masterov, D. V. (2006). Chapter 12 Interpreting the
Evidence on Life Cycle Skill Formation. In E. Hanushek and F. Welch (eds.), Handbook of the
Economics of Education, vol. 1, Elsevier, pp. 697–812.
Currie, J. and Macleod, W. B. (2017). Diagnosing expertise: Human capital, decision making,
and performance among physicians. Journal of Labor Economics, 35 (1), 1–43.
— and — (2020). Understanding doctor decision making: The case of depression treatment.
Econometrica, 88 (3), 847–878.
Czibor, E., Jimenez-Gomez, D. and List, J. A. (2019). The dozen things experimental economists
should do (more of). Southern Economic Journal, 86 (2), 371–432.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist,
34 (7), 571–582.
—, Faust, D. and Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 249 (4899),
1668–1674.
de Chaisemartin, C. (2017). Tolerating defiance? local average treatment effects without mono-
tonicity. Quantitative Economics, 8 (2), 367–396.
— and D’Haultfoeuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment
effects. American Economic Review, 110 (9), 2964–96.
Dobbie, W., Goldin, J. and Yang, C. (2018). The effects of pretrial detention on conviction, future
crime, and employment: Evidence from randomly assigned judges. American Economic Review,
108 (2), 201–240.
—, Liberman, A., Paravisini, D. and Pathania, V. (2020). Measuring Bias in Consumer Lending.
Tech. rep.
— and Yang, C. (2019). Proposals for Improving the U.S. Pretrial System. Tech. rep., The Hamilton
Project.
Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods. Oxford: Oxford
University Press, 2nd edn.
Echenique, F. (2020). New developments in revealed preference theory: Decisions under risk,
uncertainty, and intertemporal choice. Annual Review of Economics, 12, 299–316.
Echenique, F., Saito, K. and Imai, T. (2021). Approximate Expected Utility Rationalization. Tech. rep., arXiv preprint,
arXiv:2102.06331.
Einav, L., Jenkins, M. and Levin, J. (2013). The impact of credit scoring on consumer lending.
RAND Journal of Economics, 44 (2), 249–274.
Elliott, G., Komunjer, I. and Timmerman, A. (2008). Biases in macroeconomic forecasts: Irra-
tionality or asymmetric loss? Journal of the European Economic Association, 6 (1), 122–157.
—, Timmerman, A. and Komunjer, I. (2005). Estimation and testing of forecast rationality under
flexible loss. The Review of Economic Studies, 72 (4), 1107–1125.
Erel, I., Stern, L. H., Tan, C. and Weisbach, M. S. (2019). Selecting Directors Using Machine
Learning. Tech. rep., NBER Working Paper Series No. 24435.
Fang, Z., Santos, A., Shaikh, A. M. and Torgovitsky, A. (2020). Inference for Large-Scale Linear
Systems with Known Coefficients. Tech. rep.
Farmer, L., Nakamura, E. and Steinsson, J. (2021). Learning About the Long Run. Tech. rep.
Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C. and Venkatasubramanian, S. (2015).
Certifying and removing disparate impact. Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 259–268.
Frandsen, B. R., Lefgren, L. J. and Leslie, E. C. (2019). Judging Judge Fixed Effects. Tech. rep.,
NBER Working Paper Series No. 25528.
Frisch, R. (1933). Propagation Problems and Impulse Problems in Dynamic Economics. London, United
Kingdom: Allen and Unwin.
Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T. and Walther, A. (2018). Predictably Unequal?
The Effects of Machine Learning on Credit Markets. Tech. rep.
Gabaix, X. (2014). A sparsity-based model of bounded rationality. The Quarterly Journal of Economics,
129 (4), 1661–1710.
Gafarov, B. (2019). Inference in high-dimensional set-identified affine models. Tech. rep., arXiv preprint,
arXiv:1904.00111.
Gallant, A. R., Rossi, P. E. and Tauchen, G. (1993). Nonlinear dynamic structures. Econometrica,
61, 871–907.
Gelbach, J. (2021). Testing Economic Models of Discrimination in Criminal Justice. Tech. rep.
Gennaioli, N., Ma, Y. and Shleifer, A. (2016). Expectations and investment. NBER Macroeconomics
Annual, 30, 379–431.
— and Shleifer, A. (2010). What comes to mind. The Quarterly Journal of Economics, 125 (4),
1399–1433.
Gertler, M. L. and Karadi, P. (2015). Monetary policy surprises, credit costs, and economic
activity. American Economic Journal: Macroeconomics, 7, 44–76.
Gillis, T. (2019). False Dreams of Algorithmic Fairness: The Case of Credit Pricing. Tech. rep.
Goncalves, S., Herrera, A. M., Kilian, L. and Pesavento, E. (2021). Impulse response analysis for
structural dynamic models with nonlinear regressors. Tech. rep.
Gordon, N. J., Salmond, D. J. and Smith, A. F. M. (1993). A novel approach to nonlinear and
non-Gaussian Bayesian state estimation. IEE Proceedings F, 140, 107–113.
Gourieroux, C. and Jasiak, J. (2005). Nonlinear innovations and impulse responses with applica-
tion to VaR sensitivity. Annales d’Economie et de Statistique, 78, 1–31.
Green, B. and Chen, Y. (2019b). The principles and limits of algorithm-in-the-loop decision making. Proceedings
of the ACM on Human-Computer Interaction, 3 (CSCW).
Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems. Economet-
rica, 45, 1–22.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E. and Nelson, C. (2000). Clinical versus
mechanical prediction: A meta-analysis. Psychological Assessment, 12 (1), 19–30.
Gualdani, C. and Sinha, S. (2020). Identification and Inference in Discrete Choice Models with Imperfect
Information. Tech. rep., arXiv preprint, arXiv:1911.04529.
Gul, F., Natenzon, P. and Pesendorfer, W. (2014). Random choice as behavioral optimization.
Econometrica, 82, 1873–1912.
Hadad, V., Hirshberg, D. A., Zhan, R., Wager, S. and Athey, S. (2021). Confidence intervals
for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences,
118 (15).
Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. San Diego, California,
USA: Academic Press.
Hamilton, J. (1989). A new approach to the economic analysis of nonstationary time series and
the business cycle. Econometrica, 57, 357–384.
Hamilton, J. D. (2011). Nonlinearities and the macroeconomic effects of oil prices. Macroeconomic
Dynamics, 15 (S3), 364–378.
Han, S. (2019). Identification in nonparametric models for dynamic treatment effects. Journal of
Econometrics, forthcoming.
Handel, B. and Schwartzstein, J. (2018). Frictions or mental gaps: What’s behind the information
we (don’t) use and when do we care? Journal of Economic Perspectives, 32 (1), 155–178.
Hardt, M., Price, E. and Srebro, N. (2016). Equality of opportunity in supervised learning.
NIPS’16 Proceedings of the 30th International Conference on Neural Information Processing Systems,
pp. 3323–3331.
Harvey, A. C. (1996). Intervention analysis with control groups. International Statistical Review, 64,
313–328.
— and Durbin, J. (1986). The effects of seat belt legislation on British road casualties: A case study
in structural time series modelling. Journal of the Royal Statistical Society: Series A, 149, 187–227.
Heckman, J. J. (1974). Shadow prices, market wages, and labor supply. Econometrica, 42 (4),
679–694.
— and Navarro, S. (2007). Dynamic discrete choice and dynamic treatment effects. Journal of
Econometrics, 136, 341–396.
— and Vytlacil, E. J. (2006). Econometric evaluation of social programs, part i: Causal models,
structural models and econometric policy evaluation. In Handbook of Econometrics, vol. 6, pp.
4779–4874.
Henry, M., Meango, R. and Mourifie, I. (2020). Revealing Gender-Specific Costs of STEM in an
Extended Roy Model of Major Choice. Tech. rep.
Herbst, E. and Schorfheide, F. (2015). Bayesian Estimation of DSGE Models. Princeton, New Jersey,
USA: Princeton University Press.
Hernan, M. A. and Robins, J. M. (2019). Causal Inference. Boca Raton, Florida, USA: Chapman &
Hall, forthcoming.
Hilgard, S., Rosenfeld, N., Banaji, M. R., Cao, J. and Parkes, D. (2021). Learning representations
by humans, for humans. In M. Meila and T. Zhang (eds.), Proceedings of the 38th International
Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 139, pp. 4227–4238.
Ho, K. and Rosen, A. M. (2017). Partial Identification in Applied Research: Benefits and Challenges,
Cambridge University Press, vol. 2, p. 307–359.
Hoffman, M., Kahn, L. B. and Li, D. (2018). Discretion in hiring. The Quarterly Journal of Economics,
133 (2), 765–800.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association,
81, 945–960.
Hull, P. (2018). Estimating Treatment Effects in Mover Designs. Tech. rep., arXiv preprint,
arXiv:1804.06721.
— (2021). What Marginal Outcome Tests Can Tell Us About Racially Biased Decision-Making. Tech. rep.,
NBER Working Paper Series No. 28503.
Imai, K. and Kim, I. (2019). When should we use unit fixed effects regression models for causal
inference with longitudinal data? American Journal of Political Science, 63, 467–490.
— and — (2020). On the use of two-way fixed effects regression models for causal inference with
panel data. Political Analysis, forthcoming.
Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment
effects. Econometrica, 62, 467–475.
Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction.
Cambridge, United Kingdom: Cambridge University Press.
Jacob, B. A. and Lefgren, L. (2008). Can principals identify effective teachers? evidence on
subjective performance evaluation in education. Journal of Labor Economics, 26 (1), 101–136.
Jordá, O. (2005). Estimation and inference of impulse responses by local projections. American
Economic Review, 95, 161–182.
Jordá, Ó., Schularick, M. and Taylor, A. M. (2015). Betting the house. Journal of International
Economics, 96, S2–S18.
Jung, J., Concannon, C., Shroff, R., Goel, S. and Goldstein, D. G. (2020a). Simple rules to
guide expert classifications. Journal of the Royal Statistical Society Series A, 183 (3), 771–800.
—, Shroff, R., Feller, A. and Goel, S. (2020b). Bayesian sensitivity analysis for offline policy
evaluation. AIES ’20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, p. 64–70.
Kallus, N., Mao, X. and Zhou, A. (2018). Interval Estimation of Individual-Level Causal Effects Under
Unobserved Confounding. Tech. rep., arXiv preprint arXiv:1810.02894.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic
Engineering, Transactions ASME, Series D, 82, 35–45.
Kamenica, E. (2019). Bayesian persuasion and information design. Annual Review of Economics, 11,
249–272.
— and Gentzkow, M. (2011). Bayesian persuasion. American Economic Review, 101, 2590–2615.
Kastelman, D. and Ramesh, R. (2018). Switchback tests and randomized experimentation under
network effects at doordash. URL: https://medium.com/@DoorDash/switchback-tests-and-randomized-
experimentation-under-network-effects-at-doordash-f1d938ab7c2a.
Khandani, A. E., Kim, A. J. and Lo, A. W. (2010). Consumer credit-risk models via machine-
learning algorithms. Journal of Banking & Finance, 34 (11), 2767–2787.
Kilian, L. and Lutkepohl, H. (2017). Structural Vector Autoregressive Analysis. Cambridge, United
Kingdom: Cambridge University Press.
Kilian, L. and Vigfusson, R. J. (2011a). Are the responses of the U.S. economy asymmetric in
energy price increases and decreases? Quantitative Economics, 2, 419–453.
Kitagawa, T. (2020). The Identification Region of the Potential Outcome Distributions under Instrument
Independence. Tech. rep., Cemmap Working Paper CWP23/20.
Kitamura, Y. and Stoye, J. (2018). Nonparametric analysis of random utility models. Econometrica,
86 (6), 1883–1909.
Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J. and Mullainathan, S. (2018a). Human
decisions and machine predictions. Quarterly Journal of Economics, 133 (1), 237–293.
—, Ludwig, J., Mullainathan, S. and Obermeyer, Z. (2015). Prediction policy problems. American
Economic Review: Papers and Proceedings, 105 (5), 491–495.
—, —, — and Rambachan, A. (2018b). Algorithmic fairness. AEA Papers and Proceedings, 108,
22–27.
Kling, J. R. (2006). Incarceration length, employment, and earnings. American Economic Review,
96 (3), 863–876.
Koop, G., Pesaran, M. H. and Potter, S. M. (1996). Impulse response analysis in nonlinear
multivariate models. Journal of Econometrics, 74, 119–147.
Kubler, F., Selden, L. and Wei, X. (2014). Asset demand based tests of expected utility maximiza-
tion. American Economic Review, 104 (11), 3459–3480.
Kuncel, N. R., Klieger, D. M., Connelly, B. S. and Ones, D. S. (2013). Mechanical versus clinical
data combination in selection and admissions decisions: A meta-analysis. Journal of Applied
Psychology, 98 (6), 1060–1072.
Kuttner, K. (2001). Monetary policy surprises and interest rates: evidence from the Fed funds
futures market. Journal of Monetary Economics, 47, 523–544.
Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in
Applied Mathematics, 6 (1), 4–22.
Lakkaraju, H., Kleinberg, J., Leskovec, J., Ludwig, J. and Mullainathan, S. (2017). The selective
labels problem: Evaluating algorithmic predictions in the presence of unobservables. KDD ’17
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 275–284.
— and Rudin, C. (2017). Learning cost-effective and interpretable treatment regimes. Proceedings
of the 20th International Conference on Artificial Intelligence and Statistics, 54, 166–175.
Lechner, M. (2011). The relation of different concepts of causality used in time series and
microeconomics. Econometric Reviews, 30, 109–127.
Lee, S. and Salanie, B. (2020). Filtered and Unfiltered Treatment Effects with Targeting Instruments.
Tech. rep., Department of Economics, Columbia University.
Leslie, E. and Pope, N. G. (2017). The unintended impact of pretrial detention on case outcomes:
Evidence from new york city arraignments. The Journal of Law and Economics, 60 (3), 529–557.
Li, D., Raymond, L. and Bergman, P. (2020). Hiring as Exploration. Tech. rep., NBER Working
Paper Series No. 27736.
Li, L., Chu, W., Langford, J. and Schapire, R. E. (2010). A contextual-bandit approach to
personalized news article recommendation. In Proceedings of the 19th international conference on
World wide web, pp. 661–670.
Li, S., Karatzoglou, A. and Gentile, C. (2016). Collaborative filtering bandits. In Proceedings of
the 39th International ACM SIGIR conference on Research and Development in Information Retrieval,
pp. 539–548.
Li, X. and Ding, P. (2017). General Forms of Finite Population Central Limit Theorems with
Applications to Causal Inference. Journal of the American Statistical Association, 112 (520).
Lillie, E. O., Patay, B., Diamant, J., Issell, B., Topol, E. J. and Schork, N. J. (2011). The n-of-1
clinical trial: the ultimate strategy for individualizing medicine? Personalized Medicine, 8 (2),
161–173.
Liu, L. T., Dean, S., Rolf, E., Simchowitz, M. and Hardt, M. (2018). Delayed impact of fair
machine learning. Proceedings of the 35th International Conference on Machine Learning.
Low, H. and Pistaferri, L. (2015). Disability insurance and the dynamics of the incentive insurance
trade-off. American Economic Review, 105 (10), 2986–3029.
— and — (2019). Disability Insurance: Error Rates and Gender Differences. Tech. rep., NBER Working
Paper No. 26513.
Lu, J. (2016). Random choice and private information. Econometrica, 84 (6), 1983–2027.
— (2019). Bayesian identification: A theory for state-dependent utilities. American Economic Review,
109 (9), 3192–3228.
Lu, X., Su, L. and White, H. (2017). Granger causality and structural causality in cross-section and
panel data. Econometric Theory, 33, 263–291.
Lucas, R. E. (1972). Expectations and the neutrality of money. Journal of Economic Theory, 4,
103–124.
Madras, D., Pitassi, T. and Zemel, R. (2018). Predict Responsibly: Improving Fairness and Accuracy
by Learning to Defer. Tech. rep., arXiv preprint, arXiv:1711.06664.
Manski, C. F. (1989). Anatomy of the selection problem. Journal of Human Resources, 24 (3), 343–360.
— (1994). The selection problem. In C. Sims (ed.), Advances in Econometrics: Sixth World Congress,
vol. 1, Cambridge University Press, pp. 143–170.
— (2017). Improving Clinical Guidelines and Decisions Under Uncertainty. Tech. rep., NBER Working
Paper No. 23915.
Marquardt, K. (2021). Mis(sed) Diagnosis: Physician Decision Making and ADHD. Tech. rep.
Martin, D. and Marx, P. (2021). A Robust Test of Prejudice for Discrimination Experiments. Tech. rep.
Mastakouri, A., Scholkopf, B. and Janzing, D. (2021). Necessary and sufficient conditions for
causal feature selection in time series with latent common causes. Proceedings of Machine Learning
Research, 139, 7502–7511.
Meehl, P. E. (1954). Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the
Evidence. University of Minnesota Press.
Menchetti, F. and Bojinov, I. (2021). Estimating the Effectiveness of Permanent Price Reductions for
Competing Products Using Multivariate Bayesian Structural Time Series Models. Tech. rep.
Mitchell, S., Potash, E., Barocas, S., D’Amour, A. and Lum, K. (2019). Prediction-Based Decisions
and Fairness: A Catalogue of Choices, Assumptions, and Definitions. Tech. rep., arXiv Working Paper,
arXiv:1811.07867.
Mourifie, I., Henry, M. and Meango, R. (2019). Sharp bounds and testability of a Roy model of STEM
major choices. Tech. rep., arXiv preprint arXiv:1709.09284.
Mullainathan, S. and Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic
Perspectives, 31 (2), 87–106.
Munoz, I. D. and van der Laan, M. (2012). Population intervention causal effects based on
stochastic interventions. Biometrics, 68, 541–549.
Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society:
Series B, 65, 331–366.
—, van der Laan, M. J., Robins, J. M. and the Conduct Problems Prevention Research Group (2001).
Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96, 1410–1423.
Natenzon, P. (2019). Random choice and learning. Journal of Political Economy, 127 (1), 419–457.
Obermeyer, Z. and Emanuel, E. J. (2016). Predicting the future - big data, machine learning, and
clinical medicine. The New England Journal of Medicine, 375 (13), 1216–9.
Papadogeorgou, G., Imai, K., Lyall, J. and Li, F. (2021). Causal Inference with Spatio-temporal
Data: Estimating the Effects of Airstrikes on Insurgent Violence in Iraq. Tech. rep., arXiv preprint,
arXiv:2003.13555.
—, Mealli, F. and Zigler, C. M. (2019). Causal inference with interfering units for cluster and
population level treatment allocation programs. Biometrics, 75, 778–787.
Pitt, M. K. and Shephard, N. (1999). Filtering via simulation: auxiliary particle filter. Journal of
the American Statistical Association, 94, 590–599.
Polisson, M., Quah, J. K. H. and Renou, L. (2020). Revealed preferences over risk and uncertainty.
American Economic Review, 110 (6), 1782–1820.
Raghavan, M., Barocas, S., Kleinberg, J. and Levy, K. (2020). Mitigating bias in algorithmic hiring:
Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability,
and Transparency, p. 469–481.
Raghu, M., Blumer, K., Corrado, G., Kleinberg, J., Obermeyer, Z. and Mullainathan, S.
(2019). The Algorithmic Automation Problem: Prediction, Triage, and Human Effort. Tech. rep., arXiv
preprint, arXiv:1903.12220.
Rambachan, A., Kleinberg, J., Ludwig, J. and Mullainathan, S. (2021). An Economic Approach to
Regulating Algorithms. Tech. rep., NBER Working Paper Series No. 27111.
— and Ludwig, J. (2021). Empirical Analysis of Prediction Mistakes in New York City Pretrial Data.
Tech. rep., University of Chicago Crime Lab Technical Report.
— and Shephard, N. (2020). Econometric analysis of potential outcomes time series: instruments, shocks,
linearity and the causal response function. Tech. rep., arXiv preprint arXiv:1903.01637.
Ramey, V. A. (2016). Macroeconomic shocks and their propagation. In J. B. Taylor and H. Uhlig
(eds.), Handbook of Macroeconomics, vol. 2A, Amsterdam, The Netherlands: North Holland, pp.
71–162.
— and Zubairy, S. (2018). Government spending multipliers in good times and in bad: Evidence
from US historical data. Journal of Political Economy, 126, 850–901.
Rehbeck, J. (2020). Revealed Bayesian Expected Utility with Limited Data. Tech. rep.
Ribers, M. A. and Ullrich, H. (2019). Battling Antibiotic Resistance: Can Machine Learning Improve
Prescribing? Tech. rep., arXiv preprint arXiv:1906.03044.
— and — (2020). Machine Predictions and Human Decisions with Variation in Payoffs and Skills. Tech.
rep.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American
Mathematical Society, 58 (5), 527–535.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained
exposure periods: application to control of the healthy worker survivor effect. Mathematical
Modelling, 7, 1393–1512.
— (1994). Correcting for non-compliance in randomization trials using structural nested mean
models. Communications in Statistics — Theory and Methods, 23, 2379–2412.
—, Greenland, S. and Hu, F.-C. (1999). Estimation of the causal effect of a time-varying exposure
on the marginal mean of a repeated binary outcome. Journal of the American Statistical Association,
94, 687–700.
Rockoff, J. E., Jacob, B. A., Kane, T. J. and Staiger, D. O. (2011). Can you recognize an effective
teacher when you recruit one? Education Finance and Policy, 6 (1), 43–74.
Romer, C. D. and Romer, D. H. (2004). A new measure of monetary shocks: derivation and
implications. American Economic Review, 94, 1055–1084.
Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization test
comment. Journal of the American Statistical Association, 75, 591–593.
Russell, T. M. (2019). Sharp bounds on functionals of the joint distribution in the analysis of
treatment effects. Journal of Business & Economic Statistics.
Sargent, T. J. (1981). Interpreting economic time series. Journal of Political Economy, 89, 213–248.
Sävje, F., Aronow, P. M. and Hudgens, M. G. (2019). Average treatment effects in the presence of
unknown interference. Tech. rep., arXiv preprint arXiv:1711.06399.
Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69 (1),
99–118.
— (1956). Rational choice and the structure of the environment. Psychological Review, 63 (2),
129–138.
Sims, C. A. (1972). Money, income and causality. American Economic Review, 62, 540–552.
Slutzky, E. (1937). The summation of random causes as the source of cyclic processes. Econometrica,
5, 105–146.
Sneider, C. and Tang, Y. (2018). Experiment rigor for switchback experiment analysis. URL:
https://doordash.engineering/2019/02/20/experiment-rigor-for-switchback-experiment-analysis/.
Stevenson, M. (2018). Assessing risk assessment in action. Minnesota Law Review, 103.
— and Doleac, J. (2019). Algorithmic Risk Assessment in the Hands of Humans. Tech. rep.
Stock, J. H. and Watson, M. W. (2016). Dynamic factor models, factor-augmented vector autore-
gressions, and structural vector autoregressions in macroeconomics. In J. B. Taylor and H. Uhlig
(eds.), Handbook of Macroeconomics, vol. 2A, pp. 415–525.
— and — (2018). Identification and estimation of dynamic causal effects in macroeconomics using
external instruments. Economic Journal, 128, 917–948.
Sun, L. and Abraham, S. (2020). Estimating Dynamic Treatment Effects in Event Studies with Heteroge-
neous Treatment Effects. Tech. rep., arXiv preprint, arXiv:1804.05785.
Syrgkanis, V., Tamer, E. and Ziani, J. (2018). Inference on Auctions with Weak Assumptions on
Information. Tech. rep., arXiv preprint, arXiv:1710.03830.
Tamer, E. (2003). Incomplete simultaneous discrete response model with multiple equilibria. The
Review of Economic Studies, 70 (1).
Tan, S., Adebayo, J., Inkpen, K. and Kamar, E. (2018). Investigating Human + Machine Complemen-
tarity for Recidivism Predictions. Tech. rep., arXiv preprint, arXiv:1808.09123.
Tenreyro, S. and Thwaites, G. (2016). Pushing on a string: US monetary policy is less powerful
in recessions. American Economic Journal: Macroeconomics, 8 (4), 43–74.
Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science,
185 (4157), 1124–1131.
van der Laan, M. J. (2008). The construction and analysis of adaptive group sequential designs.
White, H. and Kennedy, P. (2009). Retrospective estimation of causal effects through time. In
J. L. Castle and N. Shephard (eds.), The Methodology and Practice of Econometrics: A Festschrift in
Honour of David F. Hendry, Oxford University Press, pp. 59–87.
— and Lu, X. (2010). Granger causality and dynamic structural systems. Journal of Financial
Econometrics, 8, 193–243.
Wiener, N. (1956). The theory of prediction. In E. F. Beckenbeck (ed.), Modern Mathematics, New
York, USA: McGraw-Hill, pp. 165–190.
Wilder, B., Horvitz, E. and Kamar, E. (2020). Learning to complement humans. In Proceedings
of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, International
Joint Conferences on Artificial Intelligence Organization, pp. 1526–1533.
Wright, M. N. and Ziegler, A. (2017). ranger: A fast implementation of random forests for high
dimensional data in C++ and R. Journal of Statistical Software, Articles, 77 (1), 1–17.
Wu, X., Weinberger, K. R., Wellenius, G. A., Dominici, F. and Braun, D. (2021). Assessing
the causal effects of a stochastic intervention in time series data: Are heat alerts effective
in preventing deaths and hospitalizations?, unpublished paper: Department of Biostatistics,
Harvard T.H. Chan School of Public Health.
Yadlowsky, S., Namkoong, H., Basu, S., Duchi, J. and Tian, L. (2020). Bounds on the condi-
tional and average treatment effect with unobserved confounding factors. Tech. rep., arXiv preprint
arXiv:1808.09521.
Yang, C. and Dobbie, W. (2020). Equal protection under algorithms: A new statistical and legal
framework. Michigan Law Review, 119 (2), 291–396.
Zhang, K. W., Janson, L. and Murphy, S. A. (2020). Inference for Batched Bandits. Tech. rep., arXiv
preprint arXiv:2002.03217.
Appendix A
Appendix to Chapter 1
Figure A.1: Observed failure to appear rate among released defendants and constructed bound
on the failure to appear rate among detained defendants by race-and-felony charge cells for one
judge in New York City.
Notes: This figure plots the observed failure to appear rate among released defendants (orange, circles) and the
bounds based on the judge leniency for the failure to appear rate among detained defendants (blue, triangles) at each
decile of predicted failure to appear risk and race-by-felony charge cell for the judge that heard the most cases in
the main estimation sample. The judge leniency instrument Z ∈ 𝒵 is defined as the assigned judge’s quintile of the
constructed, leave-one-out leniency measure. Judges in New York City are quasi-randomly assigned to defendants
within court-by-time cells. The bounds on the failure to appear rate among detained defendants (blue, triangles) are
constructed using the most lenient quintile of judges, and by applying the instrument bounds for a quasi-random
instrument (see Appendix A.4.1). Section 1.5.3 describes the estimation details for these bounds. Source: Rambachan
and Ludwig (2021).
Figure A.2: Estimated bounds on implied prediction mistakes between top and bottom predicted
failure to appear risk deciles made by judges within each race-by-felony charge cell.
Notes: This figure plots the 95% confidence interval on the implied prediction mistake δpw, dq{δpw, d1 q between the top
decile d and bottom decile d1 of the predicted failure to appear risk distribution for each judge in the top 25 whose
pretrial release decisions violated the implied revealed preference inequalities (Table 1.1) and each race-by-felony
charge cell. The implied prediction mistake δpw, dq{δpw, d1 q measures the degree to which judges’ beliefs underreact
or overreact to variation in failure to appear risk. The confidence intervals highlighted in orange show that judges
under-react to predictable variation in failure to appear risk from the highest to the lowest decile of predicted failure to
appear risk (i.e., the estimated bounds lie below one). These confidence intervals are constructed by first constructing a
95% joint confidence interval for a judge’s reweighed utility threshold τpw, dq, τpw, d1 q using test inversion based on the
moment inequalities in Theorem 1.4.2, and then constructing the implied prediction mistake δpw, dq{δpw, d1 q associated
with each pair τpw, dq, τpw, d1 q in the joint confidence set (Corollary 1.4.1). See Section 1.4.2 for theoretical details on
the implied prediction mistake and Section 1.5.5 for the estimation details. Source: Rambachan and Ludwig (2021).
Figure A.3: Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that make detectable prediction mistakes about failure to appear risk
by defendant race.
Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decisions against the judge’s observed release decisions among judges who were found to make
detectable prediction mistakes, broken out by defendant race. Worst case total expected social welfare under each
decision rule is computed by first constructing a 95% confidence interval for total expected social welfare under the
decision rule, and reporting the smallest value that lies in the confidence interval. These decision rules are constructed
and evaluated over race-by-age cells and deciles of predicted failure to appear risk. The x-axis plots the relative social
welfare cost of detaining a defendant that would not fail to appear in court U ˚ p0, 0q (i.e., an unnecessary detention).
The solid line plots the median change across judges that make mistakes, and the dashed lines report the minimum
and maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).
Figure A.4: Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that make detectable prediction mistakes about failure to appear risk.
Notes: This figure reports the overall release rate of the algorithmic decision rule that fully automates decisions against
the judge’s observed release rates among judges who were found to make detectable prediction mistakes. These
decision rules are constructed and evaluated over race-by-age cells and deciles of predicted risk. The x-axis plots the
relative social welfare cost of detaining a defendant that would not fail to appear in court U ˚ p0, 0q (i.e., an unnecessary
detention). The solid line plots the median release rate across judges that make detectable prediction mistakes, and
the dashed lines report the minimum and maximum release rates across judges. See Section 1.6.2 for further details.
Source: Rambachan and Ludwig (2021).
Figure A.5: Ratio of total expected social welfare under algorithmic decision rule relative to release
decisions of judges that do not make detectable prediction mistakes about failure to appear risk.
Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decision-making against the judge’s observed release decisions among judges whose choices were
consistent with expected utility maximization at accurate beliefs about failure to appear risk. Worst case total expected
social welfare under each decision rule is computed by constructing 95% confidence intervals for total expected social
welfare under the decision rule, and reporting the smallest value that lies in the confidence interval. These decision rules
are constructed and evaluated over race-by-age cells and deciles of predicted failure to appear risk. The x-axis plots the
relative social welfare cost of detaining a defendant that would not fail to appear in court U ˚ p0, 0q (i.e., an unnecessary
detention). The solid line plots the median change across judges, and the dashed lines report the minimum and
maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).
Figure A.6: Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes by defendant race.
Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decision-making against the judge’s observed release decisions among judges whose choices were
consistent with expected utility maximization behavior at accurate beliefs, broken out by defendant race. Worst case
total expected social welfare under each decision rule is computed by first constructing a 95% confidence interval for
total expected social welfare under the decision rule, and reporting the smallest value that lies in the confidence interval.
These decision rules are constructed and evaluated over race-by-age cells and deciles of predicted failure to appear
risk. The x-axis plots the relative social welfare cost of detaining a defendant that would not fail to appear in court
U ˚ p0, 0q (i.e., an unnecessary detention). The solid line plots the median change across judges, and the dashed lines
report the minimum and maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and
Ludwig (2021).
Figure A.7: Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that do not make detectable prediction mistakes.
Notes: This figure reports the overall release rate of the algorithmic decision rule that fully automates decisions
against the judge’s observed release rates among judges whose choices were consistent with expected utility
maximization behavior at accurate beliefs. These decision rules are constructed and evaluated over race-by-age cells
and deciles of predicted failure to appear risk. The x-axis plots the relative social welfare cost of detaining a defendant
that would not fail to appear in court U ˚ p0, 0q (i.e., an unnecessary detention). The solid line plots the median release
rate across judges that do not make systematic prediction mistakes, and the dashed lines report the minimum and
maximum release rates across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).
Table A.1: Estimated lower bound on the fraction of judges whose release decisions are inconsistent
with expected utility maximization behavior at accurate beliefs about any pretrial misconduct risk
given defendant characteristics.
A.2 User’s Guide to Identifying Prediction Mistakes in Screening Decisions
This section provides a step-by-step guide on how the identification results in Sections 1.3-1.4 for screening decisions may be applied in practice.
Suppose we observe a decision maker making many decisions, and for each decision we
observe the characteristics of the decision (W, X), the decision maker’s choice C ∈ {0, 1}, and the
outcome Y := C · Y*, where Y* ∈ {0, 1} is the latent outcome of interest. We observe the dataset
{(W_i, X_i, C_i, Y_i)}_{i=1}^{n}, which is an i.i.d. sample from the joint distribution (W, X, C, Y) ~ P. As
discussed in Section 1.2.2, pretrial release decisions, medical testing and diagnostic decisions, and
hiring decisions are all examples of a screening decision. The basic algorithm for testing whether
the decision maker makes systematic prediction mistakes about the outcome Y* based on the observed data proceeds as follows.
Step 1: Specify which characteristics W directly affect the utility function U(c, y*; w). As
discussed in Section 1.2.3 and Section 1.3.1, this is the key restriction on behavior imposed by
the expected utility maximization model. Researchers may motivate this choice in two ways.
First, in a particular setting, there may be common assumptions about the specification of a
decision maker’s utility function used in empirical research. The researcher may directly appeal
to these established modelling choices to guide the choice of this exclusion restriction. Second,
the exclusion restriction may be chosen to summarize various social or legal restrictions on what
characteristics ought not to directly affect the utility function. I recommend that researchers report
a sensitivity analysis that examines how their conclusions change under alternative assumptions
about which characteristics W directly affect the utility function. This explores the extent to which
conclusions about systematic prediction mistakes are robust to alternative assumptions on the exclusion restriction.
Step 2: Construct partitions D_w(X) of the excluded characteristics X. For each value of the payoff-relevant characteristics w ∈ 𝒲, construct a partition D_w : 𝒳 → {1, . . . , N_d} as discussed in Section 1.3.3. The researcher may approach constructing such partitions
in two ways. First, there may be data on held-out decisions (e.g., decisions made by other decision
makers), and so the partition may be constructed using supervised machine learning based
prediction methods to predict the outcome on the held out decisions. In this case, estimate a
prediction function f̂ : 𝒲 × 𝒳 → [0, 1] on a set of held-out decisions and define D_w(x) by binning
the characteristics X into percentiles of predicted risk within each value w ∈ 𝒲. In the empirical
application, for example, the held-out decisions used to predict failure to appear in court may be
those of all judges in New York City except for the top 25 judges (see Section 1.5.3). Second, there
may be an existing integer-valued risk score that can be used to construct the partitions, and the
researcher may simply choose the partition D_w(x) to be its level sets.
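The binning described in Step 2 can be sketched in code. This is a minimal illustration for a single cell w, assuming the researcher has already fit a prediction function on held-out decisions and stored its predicted risks; the function name and decile convention are illustrative rather than taken from any replication code.

```python
from bisect import bisect_right

def risk_decile_partition(predicted_risk, n_bins=10):
    """Coarsen predicted-risk scores into within-cell bins {1, ..., n_bins}.

    Within a single cell w, the excluded characteristics X enter only
    through the predicted risk f_hat(w, x); D_w(x) is then the bin (here,
    the decile) of that predicted risk among decisions in the cell.
    """
    sorted_risk = sorted(predicted_risk)
    n = len(sorted_risk)
    # Empirical decile cutpoints computed within the cell.
    cuts = [sorted_risk[int(n * k / n_bins)] for k in range(1, n_bins)]
    return [bisect_right(cuts, r) + 1 for r in predicted_risk]
```

With an existing integer-valued risk score, no binning step is needed: D_w(x) is simply taken to be the score’s level sets.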
Step 3: Estimate the observable choice-dependent outcome probabilities. Given the partitions D_w(X), estimate the choice-dependent outcome probabilities that are identified from the data, such as P_{Y*}(1 | 1, w, d), within each cell.
Step 4: Construct bounds on the unobservable choice-dependent outcome probabilities.
First, as discussed in Section 1.3.2, there may be a randomly assigned instrument Z ∈ 𝒵 (i.e., one
satisfying (Y*, W, X) ⊥⊥ Z) that generates random variation in the decision maker’s choices. In this
case, we can construct an upper bound on P_{Y*}(1 | 0, w, d, z) at each value z̃ ∈ 𝒵 of the form

    [π_0(w, d, z̃) + P_{C,Y*}(1, 1 | w, d, z̃)] / π_0(w, d, z).

This upper bound is directly estimated by plugging in sample analogues of each probability.
If the instrument is quasi-randomly assigned (i.e., satisfies (Y*, W, X) ⊥⊥ Z | T), then apply the
identification results in Appendix A.4.1. In the empirical application, I used the quasi-random
assignment of judges to cases to construct bounds on the unobservable failure to appear rate
among detained defendants (see Section 1.5.3).
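As a sketch, the plug-in estimate of this instrument-based upper bound within one (w, d) cell might look as follows. The tuple layout (z_i, c_i, y_i) and the function name are assumptions made for illustration; z̃ denotes the most lenient instrument value.

```python
def instrument_upper_bound(data, z_lenient, z):
    """Plug-in estimate of the upper bound on P(Y* = 1 | C = 0, w, d, z),

        [pi_0(w, d, z~) + P_{C,Y*}(1, 1 | w, d, z~)] / pi_0(w, d, z),

    within a single (w, d) cell.  `data` is a list of (z_i, c_i, y_i)
    tuples, where y_i = c_i * y_star_i mirrors the missing-data structure:
    the outcome is recorded only for released (c_i = 1) decisions.
    """
    def cell_moments(z_value):
        rows = [(c, y) for (zz, c, y) in data if zz == z_value]
        n = len(rows)
        pi_0 = sum(1 for c, _ in rows if c == 0) / n             # P(C = 0 | z)
        p_11 = sum(1 for c, y in rows if c == 1 and y == 1) / n  # P(C = 1, Y* = 1 | z)
        return pi_0, p_11

    pi_0_lenient, p_11_lenient = cell_moments(z_lenient)
    pi_0_z, _ = cell_moments(z)
    return (pi_0_lenient + p_11_lenient) / pi_0_z
```

Intuitively, under the most lenient instrument value nearly everyone is released, so the detained remainder can contribute at most π_0(w, d, z̃) additional failures to appear in the worst case.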
Second, researchers may introduce additional assumptions that directly bound the unobservable choice-dependent outcome probabilities.
I refer to this as “direct imputation.” In direct imputation, the researcher specifies κ_{w,d} ≥ 0 for
each cell w ∈ 𝒲, d ∈ {1, . . . , N_d} and assumes that P_{Y*}(1 | 0, w, d) ≤ (1 + κ_{w,d}) P_{Y*}(1 | 1, w, d). See
Supplement A.7 for further details. I recommend that researchers report a sensitivity analysis on
their conclusions based on the choices of κ_{w,d} ≥ 0. I illustrate such a sensitivity analysis for direct
imputation in Supplement A.9 for the New York City pretrial release setting.
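Direct imputation is simple enough to sketch directly. The helper below and the κ grid are illustrative; a real sensitivity analysis would report conclusions at each value on the grid.

```python
def direct_imputation_bound(p_released, kappa):
    """Upper bound on the unobservable probability under direct imputation:

        P(Y* = 1 | C = 0, w, d) <= (1 + kappa) * P(Y* = 1 | C = 1, w, d),

    capped at 1.  `p_released` is the observable P(Y* = 1 | C = 1, w, d).
    """
    return min((1.0 + kappa) * p_released, 1.0)

# Sensitivity analysis: trace out the bound as kappa_{w,d} grows.
sensitivity = {k: direct_imputation_bound(0.3, k) for k in (0.0, 0.5, 1.0, 3.0)}
```

As κ_{w,d} grows, the bound eventually hits one and becomes uninformative, which is why conclusions should be reported across a range of κ_{w,d} values rather than at a single choice.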
Finally, researchers may also observe an additional proxy outcome that does not suffer from
the missing data problem, and can therefore be used to construct bounds on the unobservable
choice-dependent outcome probabilities provided the researcher introduces bounds on the joint
distribution of the proxy outcome and the latent outcome. See Supplement A.7 for further details.
Step 5: Test whether the decision maker makes systematic prediction mistakes. Testing whether
the decision maker makes systematic prediction mistakes, given a choice of directly payoff-relevant
characteristics, a partition of the excluded characteristics, and bounds on the unobservable choice-dependent
outcome probabilities, amounts to testing whether the moment inequalities in
Corollary 1.3.4 are satisfied. Testing whether these moment inequalities are satisfied tests the null
hypothesis that the decision maker’s choices are consistent with expected utility maximization
behavior at preferences that satisfy the researcher’s conjectured exclusion restriction. Researchers
may pick their preferred moment inequality testing procedure from the econometrics literature
(Canay and Shaikh, 2017; Molinari, 2020). In the empirical application in Section 1.5, I use the
conditional least-favorable hybrid test developed in Andrews et al. (2019) since it is computationally
fast given estimates of the moments and the variance-covariance matrix and has desirable power
properties.
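Ignoring sampling error, the logic of this test reduces to a feasibility check: does any single threshold rationalize all cells simultaneously? The sketch below is only this point-estimate analogue, with hypothetical inputs; the actual test applies a formal moment-inequality procedure, such as the conditional least-favorable hybrid test of Andrews et al. (2019), to the estimated moments.

```python
def violates_threshold_inequalities(released_risk, detained_risk_ub):
    """Point-estimate check of the implied revealed-preference inequalities
    within one cell w.  Across partition cells d, choices are consistent
    with some utility threshold tau(w) only if

        max_d P(Y*=1 | C=1, w, d) <= min_d [upper bound on P(Y*=1 | C=0, w, d)],

    i.e. released decisions are never riskier than the most favourable
    value the bounds allow for detained decisions.  Returns True when no
    threshold can rationalize the cells.
    """
    return max(released_risk) > min(detained_risk_ub)
```

For example, observing a released-risk of 0.6 in one decile while the bounds cap detained risk at 0.5 in another decile rules out every candidate threshold.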
Step 6: Conduct inference on how biased the decision maker’s predictions are. To conduct
inference on how biased the decision maker’s predictions are between cells (w, d) and (w, d′) for
w ∈ 𝒲 and d, d′ ∈ {1, . . . , N_d}, first construct a joint confidence set for the decision maker’s reweighted
utility thresholds τ(w, d), τ(w, d′) at cells (w, d), (w, d′) based on the moment inequalities in (A.2).
This can be done through test-inversion: for a grid of possible values for the reweighted thresholds,
test the null hypothesis that the moment inequalities in (A.2) are satisfied at each point in the
grid and collect together all points at which we fail to reject the null hypothesis. Second, for each
value in the joint confidence set, construct the ratio in (A.3). This provides a confidence set for the
decision maker’s implied prediction mistake between cells pw, dq and pw, d1 q. If this confidence set
for the implied prediction mistake lies everywhere below one, then the decision maker’s beliefs
about the latent outcome given the characteristics are underreacting to predictable variation in
the latent outcome. Analogously, if this confidence set for the implied prediction mistake lies
everywhere above one, then the decision maker’s beliefs are overreacting. See Section 1.4.2 for
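The two-stage construction in Step 6 can be sketched as follows. Here `accept` is a stand-in for the moment-inequality test at a candidate pair of thresholds (True meaning fail to reject), and the mistake is summarized by the odds ratio ((1 − τ)/τ) / ((1 − τ′)/τ′), mirroring the left-hand side of the ratio in (A.3); the grid and function names are illustrative.

```python
def implied_mistake_confidence_interval(tau_grid, accept):
    """Test inversion for the implied prediction mistake between two cells.

    Stage 1: collect every pair (tau, tau_prime) on the grid at which the
    moment-inequality null is not rejected (the joint confidence set).
    Stage 2: map each accepted pair into the implied-mistake odds ratio
    and report its range.  An interval everywhere below one indicates
    under-reaction to risk; everywhere above one, over-reaction.
    """
    ratios = [((1 - t) / t) / ((1 - t2) / t2)
              for t in tau_grid for t2 in tau_grid
              if accept(t, t2)]
    if not ratios:
        return None  # every grid point rejected
    return min(ratios), max(ratios)
```

In practice `accept` would call the same moment-inequality test as Step 5, evaluated at the candidate thresholds rather than at the null of no mistakes.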
A.3 Additional Results for the Expected Utility Maximization Model
In this section of the appendix, I provide additional results for the expected utility maximization model.
In Section 1.3, I analyzed the testable implications of expected utility maximization behavior at
accurate beliefs in screening decisions with a binary outcome. I now show that these identification
results extend to treatment decisions with a multi-valued outcome for particular classes of utility
functions 𝒰. These extensions also apply to screening decisions with a multi-valued outcome.
Linear Utility
First, I analyze the conditions under which the decision maker’s choices are consistent with
expected utility maximization at a linear utility function of the form Upc, ⃗y; wq “ βpwqy ´ λpwqc,
where Y P R and βpwq ą 0, λpwq ą 0 for all w P W . This is an extended Roy model in which the
1 Henry et al. (2020) studies an extended Roy model under the assumption that utility function satisfies Up0, ⃗y; wq “
y0 , Up1, ⃗y; wq “ Y1 ´ λpY1 q for some function λp¨q. The authors derive testable restrictions on behavior under this
extended Roy model provided the researcher observes a stochastically monotone instrumental variable.
As notation, let μ_{Y₁−Y₀}(c, w, x) := E[Y₁ − Y₀ | C = c, W = w, X = x]. Define 𝒳⁰(w) := {x ∈
Theorem A.3.1. Consider a treatment decision. The decision maker’s choices are consistent with expected
utility maximization at some utility function U(c, y⃗; w) = β(w)Y − λ(w)C if and only if, for all w ∈ 𝒲,
where
Proof. This result follows immediately from applying the inequalities in Theorem 1.2.1. Over
(w, x) ∈ 𝒲 × 𝒳 such that π_1(w, x) > 0, λ(w)/β(w) ≤ μ_{Y₁−Y₀}(1, w, x) must be satisfied. Analogously, over
(w, x) ∈ 𝒲 × 𝒳 such that π_0(w, x) > 0, λ(w)/β(w) ≥ μ_{Y₁−Y₀}(0, w, x) must be satisfied. The result is then
immediate.
In a treatment decision with a binary outcome, Theorem A.3.1 immediately implies negative
results about the testability of expected utility maximization behavior that are analogous to those
stated in the main text for a screening decision with binary outcomes. I state these negative results
as a corollary. Define
and
Corollary A.3.1. Consider a treatment decision with a binary outcome. The decision maker’s choices
are consistent with expected utility maximization at some utility function U(c, y⃗; w) = β(w)y − λ(w)c if
either:
(i) All characteristics affect utility (i.e., X = ∅) and P_{Y⃗}(0, 1 | 0, w) − P_{Y⃗}(1, 0 | 0, w) ≤ P_{Y⃗}(0, 1 | 1, w) − P_{Y⃗}(1, 0 | 1, w); or
(ii) The researcher’s bounds on the choice-dependent potential outcome probabilities are uninformative,
meaning that for both c ∈ {0, 1}, ℬ_{c,w,x} is the set of all P̃_{Y⃗}(· | c, w, x) satisfying Σ_{y_{c̃} ∈ 𝒴} P̃_{Y⃗}(·, y_{c̃} | c, w, x) = P_{Y_c}(· | c, w, x).
Proof. Case (i) follows immediately from Theorem A.3.1. Case (ii) follows since under uninformative
bounds on the missing data, P_{Y⃗}(0, 1 | 1, w, x) − P_{Y⃗}(1, 0 | 1, w, x) = P_{Y₁}(1 | 1, w, x) and
I analyze the conditions under which the decision maker’s choices are consistent with expected
utility maximization at a utility function that is a simple function over 𝒴. That is, for some known
Ỹ ⊆ 𝒴, define Ỹ_c = 1{Y_c ∈ Ỹ} and Ỹ = C Ỹ₁ + (1 − C) Ỹ₀. Consider the class of utility functions
of the form u(c, y⃗; w) := u(c, ỹ; w). For this class of utility functions, the decision maker faces a
treatment decision with a binary outcome, and so the previous analysis applies if we further
The expected utility maximization model in the main text assumes that the decision maker
exactly maximizes expected utility given their preferences and beliefs about the outcome given
the characteristics and some private information. The decision maker, however, may suffer from
various cognitive limitations that prevent them from exact maximization. That is, the decision
maker may be boundedly rational and therefore make choices to “satisfice” rather than fully
optimize (Simon, 1955, 1956). I next weaken the definition of expected utility maximization to allow for approximately optimal choices.
Definition A.3.1 (ϵ_w-approximate expected utility maximization). The decision maker’s choices are
consistent with ϵ_w-approximate expected utility maximization if there exists a utility function U ∈ 𝒰,
i. Approximate Expected Utility Maximization: For all c ∈ {0, 1}, c′ ≠ c, (w, x, v) ∈ 𝒲 × 𝒳 × 𝒱 such
that Q(c | w, x, v) > 0,
    E_Q[U(c, Y⃗; W) | W = w, X = x, V = v] ≥ E_Q[U(c′, Y⃗; W) | W = w, X = x, V = v] − ϵ_w.
That is, the decision maker’s choices are consistent with ϵ_w-approximate expected utility maximization
if their choices are within ϵ_w ≥ 0 of being optimal. This is analogous to the proposed
et al. (2021). Allen and Rehbeck (2020) propose a measure of whether a decision maker’s choices
interest is focused on how well the decision maker’s choices are approximated by expected utility
maximization in empirical treatment decisions. Apesteguia and Ballester (2015) propose a “swaps-
index” to measure how much an observed preference relation violates utility maximization and
expected utility maximization, which summarizes the number of choices that must be swapped
in order to rationalize the data. Notice that the special case with ϵ_w = 0 nests the definition of exact expected utility maximization.
Theorem A.3.2. The decision maker’s choices are consistent with ϵ_w-approximate expected utility maximization if
and only if there exists a utility function U ∈ 𝒰, ϵ_w ≥ 0 for all w ∈ 𝒲, P̃_{Y⃗}(· | 0, w, x) ∈ ℬ_{0,w,x} and
for all c ∈ {0, 1}, (w, x) ∈ 𝒲 × 𝒳 with π_c(w, x) > 0 and c′ ≠ c, where the joint distribution
(W, X, C, Y⃗) ~ Q is given by Q(w, x, c, y⃗) = P̃_{Y⃗}(y⃗ | c, w, x) P(c, w, x).
Proof. The proof follows the same argument as the proof of Theorem 1.2.1.
In words, the decision maker’s choices are consistent with ϵw -approximate expected utility
maximization if and only if they approximately satisfy the revealed preference inequalities derived in Theorem 1.2.1.
The value of Theorem A.3.2 comes in applying the approximate revealed preference inequalities
to analyze particular decision problems. I next use this result to characterize whether the decision
maker’s choices are consistent with approximate expected utility maximization at strict preferences
Theorem A.3.3. Consider a screening decision with a binary outcome. Assume P_{Y*}(1 | 1, w, x) < 1 for
all (w, x) ∈ 𝒲 × 𝒳 with π_1(w, x) > 0. The decision maker’s choices are consistent with ϵ_w-approximate
expected utility maximization at some strict preference utility function if and only if, for all w ∈ 𝒲, there
Proof. The proof follows the same argument as the proof of Theorem 1.3.1, which I provide
for completeness. For all (w, x) ∈ 𝒲 × 𝒳 with π_1(w, x) > 0, Theorem A.3.2 requires that

    P_{Y*}(1 | 1, w, x) ≤ U(0, 0; w) / [U(0, 0; w) + U(1, 1; w)] + ϵ_w / [−U(0, 0; w) − U(1, 1; w)].

Analogously, for all (w, x) ∈ 𝒲 × 𝒳 with π_0(w, x) > 0, Theorem A.3.2 requires that

    U(0, 0; w) / [U(0, 0; w) + U(1, 1; w)] − ϵ_w / [−U(0, 0; w) − U(1, 1; w)] ≤ P_{Y*}(1 | 0, w, x).

The result is immediate after defining ϵ̃_w = ϵ_w / [−U(0, 0; w) − U(1, 1; w)] ≥ 0.
That is, the decision maker’s choices are consistent with ϵ_w-approximate expected utility maximization
if and only if the decision maker is acting as if she applies an approximate, incomplete
threshold rule in selecting her choices. This means that the observed choice-dependent outcome
probabilities and bounds on the unobservable choice-dependent outcome probabilities must satisfy
a relaxation of the inequalities given in Theorem 1.3.1. As shown in the proof, the relaxation
satisfies ϵ̃_w = ϵ_w / [−U(0, 0; w) − U(1, 1; w)]. By further normalizing the scale of the decision maker’s utility
function so that −U(0, 0; w) − U(1, 1; w) = 1, it follows that ϵ̃_w = ϵ_w. Notice that as ϵ_w grows large for each w ∈ 𝒲,
the decision maker’s choices are always rationalizable under ϵ_w-approximate expected utility
maximization. Therefore, searching for the minimal value ϵ̃_w ≥ 0 such that the inequalities are
satisfied provides a simple summary measure of how “far from optimal” the decision maker’s choices are,
where (A)_+ = max{A, 0}. Alternatively, the minimal value can also be characterized as the
Finally, the implied revealed preference inequalities over partitions of the excluded characteristics
x ∈ 𝒳 given in Corollary 1.3.4 can analogously be extended. In particular, for partitions
D_w : 𝒳 → {1, . . . , N_d}, if the decision maker’s choices are consistent with ϵ_w-approximate expected utility maximization,
Therefore, researchers can again characterize the implied relaxation ϵ̃_w ≥ 0 over the constructed
partitions D_w(X).
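Given the relaxed threshold inequalities above, the minimal relaxation within a cell w has a simple closed form, sketched below. The ((max − min)/2)_+ expression is the algebra implied by requiring some τ to satisfy both relaxed inequalities simultaneously; it is offered as an illustration, not a quotation of the omitted display.

```python
def minimal_relaxation(released_risk, detained_risk_ub):
    """Smallest epsilon-tilde_w making the relaxed threshold inequalities

        P(Y*=1 | C=1, w, d) <= tau + eps   and   tau - eps <= P(Y*=1 | C=0, w, d)

    hold for some tau across all partition cells d.  A feasible tau exists
    iff eps >= (max_d released - min_d detained_ub) / 2, so the minimum is
    ((max - min) / 2)_+ with (A)_+ = max(A, 0).  Zero means the exact
    inequalities already hold; larger values summarize how far from
    optimal the observed choices are.
    """
    gap = max(released_risk) - min(detained_risk_ub)
    return max(gap / 2.0, 0.0)
```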
A.3.3 Expected Utility Maximization with Inaccurate Beliefs after Dimension Reduction
In this section, I show that the identification result for expected utility maximization behavior with
inaccurate beliefs extends after coarsening the characteristics by a function that partitions the
observable characteristics X into level sets {x ∈ 𝒳 : D_w(x) = d}. The following proposition formalizes this.
Proposition A.3.1. Suppose the decision maker’s choices are consistent with expected utility maximization
behavior at inaccurate beliefs and some utility function U ∈ 𝒰. Then, for each w ∈ 𝒲, d ∈ {1, . . . , N_d},
c ∈ {0, 1} and c′ ≠ c,

    Σ_{y⃗ ∈ 𝒴^{N_c}} Q_{C,Y⃗}(c, y⃗ | w, D_w(x) = d) U(c, y⃗; w) ≥ Σ_{y⃗ ∈ 𝒴^{N_c}} Q_{C,Y⃗}(c′, y⃗ | w, D_w(x) = d) U(c′, y⃗; w),
where

    Q_{C,Y⃗}(c, y⃗ | w, D_w(x) = d) = ( Σ_{x : D_w(x) = d} P_C(c | y⃗, w, x) Q_{Y⃗}(y⃗ | w, x) P(x | w) ) / P(D_w(x) = d | w),
Provided that P_{C,Y⃗}(c, y⃗ | w, x) > 0 for all (c, y⃗) ∈ 𝒞 × 𝒴^{N_c} and (w, x) ∈ 𝒲 × 𝒳, Proposition A.3.1
can be recast as checking whether there exist non-negative weights ω(c, y⃗; w, d) ≥ 0 satisfying

    Σ_{y⃗ ∈ 𝒴^{N_c}} ω(c, y⃗; w, d) P_{C,Y⃗}(c, y⃗ | w, D_w(x) = d) U(c, y⃗; w)
        ≥ Σ_{y⃗ ∈ 𝒴^{N_c}} ω(c′, y⃗; w, d) P_{C,Y⃗}(c′, y⃗ | w, D_w(x) = d) U(c′, y⃗; w)

and E_P[ω(C, Y⃗; W, D_w(X)) | W = w, D_w(x) = d] = 1.
I next apply this result in a screening decision with a binary outcome. In this special case,
this result may be applied to derive bounds on the decision maker’s reweighted utility threshold
through

    P_{Y*}(1 | 1, w, d) ≤ ω(0, 0; w, d) U(0, 0; w) / [ω(0, 0; w, d) U(0, 0; w) + ω(1, 1; w, d) U(1, 1; w)] ≤ P_{Y*}(1 | 0, w, d),   (A.2)
Taking ratios across cells d, d1 P t1, . . . , Nd u, we arrive at
\[
\frac{(1 - \tau(w,d))/\tau(w,d)}{(1 - \tau(w,d'))/\tau(w,d')} \;=\; \frac{\dfrac{Q(C{=}1, Y^*{=}1 \mid M{=}1, w, d)\,/\,Q(C{=}0, Y^*{=}0 \mid M{=}1, w, d)}{Q(C{=}1, Y^*{=}1 \mid M{=}1, w, d')\,/\,Q(C{=}0, Y^*{=}0 \mid M{=}1, w, d')}}{\dfrac{P(C{=}1, Y^*{=}1 \mid M{=}1, w, d)\,/\,P(C{=}0, Y^*{=}0 \mid M{=}1, w, d)}{P(C{=}1, Y^*{=}1 \mid M{=}1, w, d')\,/\,P(C{=}0, Y^*{=}0 \mid M{=}1, w, d')}}. \tag{A.3}
\]
By examining values in the identified set of reweighted utility thresholds defined on the coarsened
characteristic space, bounds may be constructed on a parameter that summarizes the decision
maker’s beliefs about her own “ex-post mistakes.” That is, how does the decision maker’s belief
about the relative probability of choosing C “ 0 and outcome Y ˚ “ 0 occurring vs. choosing C “ 1
and outcome Y ˚ “ 1 occurring compare to the true probability? If these bounds lie everywhere
below one, then the decision maker’s beliefs are under-reacting to variation in risk across the cells
pw, dq and pw, d1 q. If these bounds lie everywhere above one, then the decision maker’s beliefs are
over-reacting.
In this section, I derive the algorithmic decision rule used in the empirical analysis in Section
1.6.1. I construct an algorithmic decision rule based on analyzing how the policymaker would
make choices herself in the binary screening decision. Rambachan et al. (2021) refer to this as the first-best decision rule.
Due to the missing data problem, the conditional probability of Y ˚ “ 1 given the characteristics
is partially identified and I assume the policymaker adopts a max-min evaluation criterion to
evaluate decision rules. Let p˚ pw, xq P r0, 1s denote the probability the policymaker selects C “ 1 given characteristics pw, xq. Her worst-case expected social welfare at p˚ pw, xq is
\[
\min_{\tilde P_{Y^*}(1 \mid w, x)} \; p^*(w, x)\, \tilde P_{Y^*}(1 \mid w, x)\, U^*(1,1) + \big(1 - p^*(w, x)\big)\big(1 - \tilde P_{Y^*}(1 \mid w, x)\big)\, U^*(0,0),
\]
where the minimum is taken over the identified set for $P_{Y^*}(1 \mid w, x)$.
Proposition A.3.2. Consider a binary screening decision and a policymaker with social welfare function
U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0, who chooses p˚ pw, xq P r0, 1s to maximize her worst-case expected utility.
Defining $\tau^*(U^*) := \frac{U^*(0,0)}{U^*(0,0) + U^*(1,1)}$, her max-min decision rule is
\[
p^*(w, x; U^*) = \begin{cases} 1 & \text{if } \overline{P}_{Y^*}(1 \mid w, x) \le \tau^*, \\ 0 & \text{if } \underline{P}_{Y^*}(1 \mid w, x) \ge \tau^*, \\ \tau^* & \text{if } \underline{P}_{Y^*}(1 \mid w, x) < \tau^* < \overline{P}_{Y^*}(1 \mid w, x), \end{cases}
\]
where $\underline{P}_{Y^*}(1 \mid w, x)$ and $\overline{P}_{Y^*}(1 \mid w, x)$ denote the lower and upper bounds on $P_{Y^*}(1 \mid w, x)$.
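The threshold rule is straightforward to implement. A minimal Python sketch of the max-min rule in Proposition A.3.2 follows; the function name and interface are illustrative assumptions.

```python
def maxmin_decision(p_lower, p_upper, u00, u11):
    """Max-min decision rule p*(w, x) as in Proposition A.3.2.

    p_lower, p_upper: bounds on P(Y* = 1 | w, x); u00, u11: the (negative)
    social welfare payoffs U*(0,0), U*(1,1).  Returns the chosen
    probability of selecting C = 1.
    """
    assert u00 < 0 and u11 < 0
    tau = u00 / (u00 + u11)   # threshold tau*(U*)
    if p_upper <= tau:
        return 1.0            # even the worst-case risk is below the threshold
    if p_lower >= tau:
        return 0.0            # even the best-case risk is above the threshold
    return tau                # identified set straddles tau*: randomize
```

With symmetric payoffs the threshold is one half, so the rule releases when the upper bound on risk is below 0.5 and detains when the lower bound exceeds it.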
Case 2: Suppose PpY ˚ “ 1 | W “ w, X “ xq ě τ ˚ . In this case,
noticing that p˚ pw, xq “ τ ˚ delivers constant expected payoffs for all PpY ˚ “ 1 | W “ w, X “ xq
p˚ pw, xqqU ˚ p0, 0q, which equals zero if p˚ pw, xq “ τ ˚ . Moreover, worst-case expected social
welfare at p˚ pw, xq “ τ ˚ is equal to the constant $\frac{U^*(0,0)\, U^*(1,1)}{U^*(0,0) + U^*(1,1)}$. I show that any other choice of
p˚ pw, xq delivers strictly lower worst-case expected social welfare in this case.
Consider any p˚ pw, xq ă τ ˚ . At this choice, expected social welfare is minimized at PpY ˚ “
xqqU ˚ p0, 0q, which is strictly positive since PpY ˚ “ 1 | W “ w, X “ xq ă τ ˚ . This implies that
xqqU ˚ p0, 0q, which is strictly negative since PpY ˚ “ 1 | W “ w, X “ xq ą τ ˚ . This implies that
The policymaker makes choices based on a threshold rule, where the threshold τ ˚ depends on the
relative costs to ex-post errors assigned by the social welfare function. If the upper bound on the probability of Y ˚ “ 1 is sufficiently low, then the policymaker
chooses C “ 1 with probability one. If the lower bound on the probability of Y ˚ “ 1 is sufficiently
high, then the policymaker chooses C “ 0 with probability one. Otherwise, if the identified set for
PpY ˚ “ 1 | W “ w, X “ xq contains the threshold τ ˚ , the policymaker randomizes her decision.
In my empirical analysis in Section 1.6.2, I evaluate the choices of judges against this first-
best decision rule applied to each cell of payoff relevant characteristics W and each decile of
predicted risk Dw pXq. The bounds on the probability of Y ˚ “ 1 conditional on the characteristics
are constructed using the quasi-random assignment of judges as discussed in Section 1.5.3, and the
threshold τ ˚ varies as the social welfare function U ˚ p0, 0q, U ˚ p1, 1q varies.
In this section of the appendix, I provide additional results that are useful for implementing the framework developed in the main text.
In this section, I modify Assumption 1.3.1 to only impose that the instrument be quasi-randomly
assigned conditional on some additional characteristics t P T with finite support. The joint
distribution pW, X, T, Z, C, Y ˚ q „ P satisfies
\[
(W, X, Y^*) \perp\!\!\!\perp Z \mid T. \tag{A.4}
\]
For example, in my application to pretrial release decisions in New York City, bail judges are quasi-randomly assigned to cases only conditional on such additional characteristics.
Under (A.4), researchers can derive bounds on the unobservable choice-dependent outcome probabilities. In particular,
\[
P_{Y^*}(1 \mid w, x, z) = \sum_{t \in \mathcal T} P_{Y^*}(1 \mid w, x, z, t)\, P(t \mid w, x, z) = \sum_{t \in \mathcal T} P_{Y^*}(1 \mid w, x, \tilde z, t)\, P(t \mid w, x, z),
\]
where the last equality follows by quasi-random assignment. Furthermore, for each value of t P T , the unobserved probability $P_{Y^*}(1 \mid w, x, \tilde z, t)$ lies between $P_{C,Y^*}(1, 1 \mid w, x, \tilde z, t)$ and $P_{C,Y^*}(1, 1 \mid w, x, \tilde z, t) + \pi_0(w, x, \tilde z, t)$.
Therefore, for a given z P Z , valid lower and upper bounds on PY˚ p1 | w, x, zq are given by averaging these cell-level bounds over Ppt | w, x, zq
for any z̃ P Z . Since PC,Y˚ p1, 1 | w, x, zq is observed, this naturally implies bounds on PC,Y˚ p0, 1 |
w, x, zq. This in turn gives a bound on PY˚ p1 | 0, w, x, zq since πc pw, x, zq is also observed (assuming
πc pw, x, zq ą 0).
In this section, I extend the econometric framework for analyzing screening decisions in Section
1.3 to treatment decisions. First, I discuss how the researcher may construct bounds on the conditional joint distribution pC, ⃗Yq | W, X, Z using an instrument. Second, I
discuss how testing the revealed preference inequalities in these settings reduces to testing many moment inequalities with linear nuisance parameters.
Constructing Bounds with an Instrument in Treatment Decisions
As in the main text, let Z P Z be a finite support instrument. Assume that the joint distribution
pW, X, Z, C, ⃗Yq „ P satisfies pW, X, ⃗Yq KK Z and PpW “ w, X “ x, Z “ zq ą 0 for all pw, x, zq P
W ˆ X ˆ Z . In this case, the conditional joint distribution pC, ⃗Yq | W, X, Z is partially identified
and the next result provides bounds on this quantity.
Proposition A.4.1. Consider any pw, x, zq P W ˆ X ˆ Z . If PrC,⃗Y p¨, ¨ | w, x, zq P H P pPC,⃗Y p¨, ¨ | w, x, zqq,
then it satisfies:

i. For all y P Y ,
\[
\sum_{y_0 \in \mathcal Y} \tilde P_{C,\vec Y}(1, y_0, y \mid w, x, z) = P_{C,Y_1}(1, y \mid w, x, z), \qquad \sum_{y_1 \in \mathcal Y} \tilde P_{C,\vec Y}(0, y, y_1 \mid w, x, z) = P_{C,Y_0}(0, y \mid w, x, z).
\]

ii. For all ⃗y “ py0 , y1 q P Y ˆ Y and z̃ P Z ,
\[
0 \le \tilde P_{C,\vec Y}(1, \vec y \mid w, x, z) + \tilde P_{C,\vec Y}(0, \vec y \mid w, x, z) \le P_{C,Y_1}(1, y_1 \mid w, x, \tilde z) + P_{C,Y_0}(0, y_0 \mid w, x, \tilde z).
\]
Proof. Consider a particular value pw, x, zq P W ˆ X ˆ Z . Notice that P⃗Y p ¨ | w, x, zq “ P⃗Y p ¨ | w, x, z̃q
by random assignment of the instrument. Furthermore, notice that P⃗Y p⃗y | w, x, zq “ PC,⃗Y p0, ⃗y | w, x, zq ` PC,⃗Y p1, ⃗y | w, x, zq.
Therefore, we observe that P⃗Y p⃗y | w, x, zq must be bounded below by 0 (trivially) and bounded
above by PC,Y0 p0, y0 | w, x, zq ` PC,Y1 p1, y1 | w, x, zq for each z P Z . The result is then immediate by random assignment of the instrument. l
These simple bounds are non-sharp, but they can be tightened by using Artstein’s Theorem.
Results in Russell (2019) and Kitagawa (2020) characterize the sharp identified set of potential
outcomes in treatment assignment problems and imply sharp bounds on P⃗Y p ¨ | w, x, zq. We can
replace the non-sharp bounds in (ii) in Proposition A.4.1 with these sharp bounds to obtain tight
bounds on the conditional joint distribution pC, ⃗Yq | W, X, Z. The number of inequalities in these
sharp bounds grows exponentially in the support of the potential outcomes and equals $2^{|\mathcal Y \times \mathcal Y|}$.
Testing for Prediction Mistakes Reduces to Testing Many Moment Inequalities with Linear
Nuisance Parameters
Suppose the researcher wishes to test whether the decision maker’s choices are consistent with
decision, where recall that U is the set of feasible utility functions specified by the researcher.
Denote this null hypothesis by H0 pU q. As a stepping stone, I provide a reduction to show how
the researcher may test whether the decision maker’s choices are consistent with expected utility
maximization behavior at a particular utility function U P U . Denote this particular null hypothesis
as H0 pUq. As discussed in Bugni et al. (2015), the researcher may construct a conservative test of
H0 pU q by constructing a confidence interval for the identified set of utility functions through test inversion.
With this in mind, consider a fixed utility function U P U and suppose the researcher wishes to test H0 pUq. The next result shows that this testing problem reduces to testing many moment
inequalities with linear nuisance parameters. I will prove this result for the non-sharp bounds
stated in Proposition A.4.1, but the result extends to using sharp bounds based on Artstein’s
Theorem.
Proposition A.4.2. Let Ny :“ |Y |. Assume there is a randomly assigned instrument. The decision
maker’s choices at z P Z are consistent with expected utility maximization behavior at utility function
U if and only if there exists a vector δ satisfying $\tilde A_z(U)\, \delta \le b$ and $B_z\, \delta = \mu(P)$,
where Ãz pUq is a matrix of known constants that depend on the specified utility function U, b is a vector of
known constants, and µpPq is an m :“ 2dw d x Ny ` dw d x Ny2 pNz ´ 1q dimensional vector that collects together the observable data and the bounds.2

2 For a matrix B, the notation Bp¨,1:mq denotes the submatrix containing the first m columns of B, and Bp¨,´1:mq the submatrix that excludes them.
Proof. Applying the non-sharp bounds in Proposition A.4.1, I begin by restating Lemma A.5.1 in this setting.
Lemma A.4.1. The decision maker’s choices at z P Z are consistent with expected utility maximization
behavior at utility function U if for all pw, xq P W ˆ X there exists PrC,⃗Y p ¨ | w, x, zq P ∆pC ˆ Y ˆ Y q
satisfying:

i. For all c P t0, 1u and c̃ ‰ c,
\[
\sum_{\vec y} \tilde P_{C,\vec Y}(c, \vec y \mid w, x, z)\, U(c, \vec y; w, z) \;\ge\; \sum_{\vec y} \tilde P_{C,\vec Y}(c, \vec y \mid w, x, z)\, U(\tilde c, \vec y; w, z).
\]

ii. For all y P Y , $\sum_{y_1} \tilde P_{C,\vec Y}(0, y, y_1 \mid w, x, z) = P_{C,Y_0}(0, y \mid w, x, z)$ and $\sum_{y_0} \tilde P_{C,\vec Y}(1, y_0, y \mid w, x, z) = P_{C,Y_1}(1, y \mid w, x, z)$.

iii. For all ⃗y “ py0 , y1 q P Y ˆ Y and z̃ P Z ,
\[
0 \le \tilde P_{C,\vec Y}(1, \vec y \mid w, x, z) + \tilde P_{C,\vec Y}(0, \vec y \mid w, x, z) \le P_{C,Y_1}(1, y_1 \mid w, x, \tilde z) + P_{C,Y_0}(0, y_0 \mid w, x, \tilde z).
\]
For each c P t0, 1u and c̃ ‰ c, define the 1 ˆ Ny2 dimensional row vector $A^c_{w,x,z}(U)$ as
\[
A^c_{w,x,z}(U) = \Big( U(\tilde c, \vec y_1; w, z) - U(c, \vec y_1; w, z), \; \dots, \; U(\tilde c, \vec y_{N_y^2}; w, z) - U(c, \vec y_{N_y^2}; w, z) \Big).
\]
For each pw, xq P W ˆ X , define the 2 ˆ 2Ny2 dimensional block diagonal matrix $A_{w,x,z}(U)$ as
\[
A_{w,x,z}(U) = \begin{pmatrix} A^0_{w,x,z}(U) & 0 \\ 0 & A^1_{w,x,z}(U) \end{pmatrix}.
\]
Define the 2dw d x ˆ 2dw d x Ny2 dimensional block diagonal matrix $A_z(U)$ as
\[
A_z(U) = \begin{pmatrix} A_{w_1, x_1, z}(U) & & & \\ & A_{w_1, x_2, z}(U) & & \\ & & \ddots & \\ & & & A_{w_{d_w}, x_{d_x}, z}(U) \end{pmatrix}.
\]
Letting PrC,⃗Y p ¨ | zq “ pPC,⃗Y p ¨ | w1 , x1 , zq, . . . , PC,⃗Y p ¨ | wdw , xdx , zqq, the revealed preference conditions in (i) of Lemma A.4.1 may be written as
\[
A_z(U)\, \tilde P_{C,\vec Y}(\cdot \mid z) \le 0.
\]
We may construct a 2dw d x Ny ˆ 2dw d x Ny2 dimensional matrix Bz,eq that forms the data consistency
conditions in (ii) of Lemma A.4.1. For each z̃ P Z , we may construct the dw d x Ny2 ˆ 2dw d x Ny2
matrices Bz,z̃ that form the upper bounds in (iii) of Lemma A.4.1. Stack these together to form
Bz . Finally, define the dw d x ˆ 2dw d x Ny2 matrix Dz,eq that imposes that PrC,⃗Y p ¨ | w, x, zq sums to one and the
2dw d x Ny2 ˆ 2dw d x Ny2 matrix Dz,` that imposes that each element of PrC,⃗Y p ¨ | w, x, zq is non-negative.
We next introduce non-negative slack parameters associated with each inequality constraint in
Lemma A.4.1, where µpPq is the vector that collects together the observable data and the bounds, and δ is a vector that stacks PrC,⃗Y p ¨ | zq together with these slack parameters.
Therefore, we observe that Lemma A.4.1 implies that the decision maker’s choices at z P Z are
consistent with expected utility maximization behavior at utility function U if and only if there
exists a vector δ satisfying
\[
\underbrace{\begin{pmatrix} A_z(U) & 0 \\ D_{z,eq} & 0 \\ -D_{z,eq} & 0 \\ D_{z,+} & 0 \\ 0 & D_{z,+} \end{pmatrix}}_{:= \tilde A_z(U)} \delta \;\le\; \underbrace{\begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}}_{:= b} \quad \text{and} \quad B_z\, \delta = \mu(P).
\]
The matrix Bz has full row rank, with nrowpBz q “ 2dw d x Ny ` dw d x Ny2 pNz ´ 1q and ncolpBz q “ 2dw d x Ny2 `
dw d x Ny2 pNz ´ 1q, and so nrowpBz q ď ncolpBz q. Therefore, we may define $H_z = \begin{pmatrix} B_z \\ \Gamma_z \end{pmatrix}$ to be the
full-rank, square matrix that pads Bz with linearly independent rows Γz . Then,
\[
\tilde A_z(U)\, \delta = \tilde A_z(U)\, H_z^{-1} H_z\, \delta = \tilde A_z(U)\, H_z^{-1} \begin{pmatrix} \mu(P) \\ \tilde \delta \end{pmatrix},
\]
where δ̃ :“ Γz δ is a 2dw d x Ny pNy ´ 1q dimensional vector. This completes the proof up to a slight change of notation. l
Proposition A.4.2 showed that testing whether the decision maker’s choices are consistent with
expected utility maximization behavior at a candidate utility function may be reduced to testing
many moment inequalities with linear nuisance parameters. This same testing problem may be
also reduced to testing whether there exists a non-negative solution to a large, linear system of
equations, which was recently studied in Kitamura and Stoye (2018) and Fang et al. (2020).
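Numerically, the reduction amounts to a linear feasibility problem: does there exist a vector δ with Ãz pUqδ ď b and Bz δ “ µpPq? The following is a hedged sketch using scipy at sample analogues; it is an illustration of the feasibility check, not the inference procedure used in the dissertation, which accounts for sampling uncertainty.

```python
import numpy as np
from scipy.optimize import linprog

def is_rationalizable(A_ub, b_ub, A_eq, b_eq):
    """Phase-one feasibility check: does there exist delta with
    A_ub @ delta <= b_ub and A_eq @ delta == b_eq?  A zero objective
    makes linprog report feasibility only."""
    n = A_ub.shape[1]
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=[(None, None)] * n)
    return res.status == 0  # status 0: converged to a feasible optimum

# Toy system: delta_1 + delta_2 = 1 with componentwise caps.
A_ub = np.eye(2)
A_eq = np.ones((1, 2))
feasible = is_rationalizable(A_ub, np.array([1.0, 1.0]), A_eq, np.array([1.0]))
infeasible = is_rationalizable(A_ub, np.array([0.2, 0.2]), A_eq, np.array([1.0]))
```

Infeasibility at a candidate utility function U rejects rationalizability at U in the population problem.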
For each w P W , define Dw : X Ñ t1, . . . , Nd u to be some function that partitions the support
of the characteristics x P X into the level sets tx : Dw pxq “ du. Through an application of iterated
expectations, if the decision maker’s choices are consistent with expected utility maximization
behavior at some utility function U, then their choices must satisfy implied revealed preference
inequalities.
Corollary A.4.1. Suppose the decision maker’s choices are consistent with expected utility maximization
behavior at some utility function U. Then, for all w P W and d P t1, . . . , Nd u,
\[
E_Q\big[ U(c, \vec Y; W) \mid C = c, W = w, D_w(X) = d \big] \;\ge\; E_Q\big[ U(c', \vec Y; W) \mid C = c, W = w, D_w(X) = d \big],
\]
for all c P t0, 1u, pw, dq P W ˆ t1, . . . , Nd u with πc pw, dq :“ PpC “ c | W “ w, Dw pXq “ dq ą 0 and
c1 ‰ c, where EQ r¨s is the expectation under Q constructed from some Pr⃗Y p ¨ | 0, w, xq P B0,w,x and Pr⃗Y p ¨ | 1, w, xq P B1,w,x .
Therefore, in treatment decisions, researchers may test a lower dimensional set of moment
inequalities with linear nuisance parameters that characterize the set of implied revealed preference
inequalities. This is useful as recent work develops computationally tractable and powerful inference
procedures for lower-dimensional moment inequalities with linear nuisance parameters, such as
Andrews et al. (2019), Cox and Shi (2020) and Rambachan and Roth (2020).
A.4.3 Expected Social Welfare Under the Decision Maker’s Observed Choices
Consider a policymaker with social welfare function U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0 as in Section 1.6.1.
Total expected social welfare under the decision maker’s observed choices is given by
\[
U^*(1,1)\, P(C{=}1, Y^*{=}1) + U^*(0,0)\, P(C{=}0, Y^*{=}0) = U^*(1,1)\, P(C{=}1, Y^*{=}1) + U^*(0,0)\big( P(C{=}0) - P(C{=}0, Y^*{=}1) \big).
\]
Since PC,Y˚ p0, 1 | w, xq is partially identified, total expected social welfare under the decision
maker’s observed choices is also partially identified and the sharp identified set is an interval.
Proposition A.4.3. Consider a screening decision with a binary outcome and a policymaker with social
welfare function U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0. The sharp identified set of total expected social welfare under
where
\[
\overline{P}(C{=}0, Y^*{=}1) = \max_{\{\tilde P_{C,Y^*}(0,1 \mid w,x) \,:\, (w,x) \in \mathcal W \times \mathcal X\}} \; \sum_{(w,x) \in \mathcal W \times \mathcal X} P(w, x)\, \tilde P_{C,Y^*}(0, 1 \mid w, x),
\]
with the maximum taken over PrC,Y˚ p0, 1 | w, xq within its bounds, and analogously for the lower endpoint.
The sharp identified set of the conditional probability of C “ 0, Y ˚ “ 1 given the characteristics
is an interval, and therefore the sharp identified set of total expected social welfare under the
decision maker’s observed choices can be characterized as the solution to two linear programs.
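Because welfare is linear in each partially identified moment PrC,Y˚ p0, 1 | w, xq, the two linear programs solve cell by cell: each moment is pushed to whichever bound the sign of its coefficient dictates. A minimal Python sketch of that observation (the function name and dictionary-keyed inputs are illustrative assumptions):

```python
def welfare_interval(u11, u00, p_c1y1, p_c0, p_c0y1_lo, p_c0y1_hi, p_wx):
    """Sharp interval for total expected social welfare
    U*(1,1) P(C=1, Y*=1) + U*(0,0) [P(C=0) - P(C=0, Y*=1)].

    Inputs are dicts keyed by cell (w, x): observed conditional moments,
    bounds on the partially identified P(C=0, Y*=1 | w, x), and cell
    probabilities.  Welfare is linear in each P(C=0, Y*=1 | w, x) with
    coefficient -U*(0,0) P(w, x) > 0, so the two linear programs reduce
    to plugging in the lower and upper bounds cell by cell.
    """
    base = sum(p_wx[k] * (u11 * p_c1y1[k] + u00 * p_c0[k]) for k in p_wx)
    slope = {k: -u00 * p_wx[k] for k in p_wx}  # positive since u00 < 0
    lo = base + sum(slope[k] * p_c0y1_lo[k] for k in p_wx)
    hi = base + sum(slope[k] * p_c0y1_hi[k] for k in p_wx)
    return lo, hi

lo, hi = welfare_interval(-1.0, -2.0, {'a': 0.1}, {'a': 0.5},
                          {'a': 0.0}, {'a': 0.2}, {'a': 1.0})
```

With many cells the same closed form applies, since the objective separates across cells.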
Provided the joint distribution of the characteristics pW, Xq is known, testing the null
hypothesis that total expected social welfare is equal to some candidate value is equivalent to
testing a system of moment inequalities with a large number of nuisance parameters that enter
the moments linearly. A confidence interval for total expected social welfare under the decision maker’s observed choices may then be constructed through test inversion.
Proposition A.4.4. Consider a binary screening decision and a policymaker with social welfare function
U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0. Conditional on the characteristics pW, Xq, testing the null hypothesis
where $\underline{P}_{C,Y^*}(0,1)$ and $\overline{P}_{C,Y^*}(0,1)$ are the dw d x -dimensional vectors of lower and upper bounds on PC,Y˚ p0, 1 | w, xq respectively, and ÃDM is a known matrix.
Proof. Let PrC,Y˚ p0, 1q denote the dw d x dimensional vector with entries equal to PrC,Y˚ p0, 1 | w, xq. From the definition of total expected social welfare,
\[
-U^*(0,0) \sum_{(w,x) \in \mathcal W \times \mathcal X} \tilde P_{C,Y^*}(0, 1 \mid w, x)\, P(W = w, X = x) = \ell^\top(U^*)\, \tilde P_{C,Y^*}(0,1),
\]
and for each pw, xq P W ˆ X the element-wise bounds on PrC,Y˚ p0, 1 | w, xq hold. Therefore, testing the null hypothesis is equivalent to testing the
hypothesis
D PrC,Y˚ p0, 1q satisfying ℓ⊺ pU ˚ q PrC,Y˚ p0, 1q “ θ0 ´ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ´ U ˚ p0, 0qPpC “ 0q and
\[
A\, \tilde P_{C,Y^*}(0,1) \le \begin{pmatrix} -\underline{P}_{C,Y^*}(0,1) \\ \overline{P}_{C,Y^*}(0,1) \end{pmatrix}.
\]
Next, we apply a change of basis argument. Define the full rank matrix Γ whose first row is equal to ℓ⊺ pU ˚ q; the remainder of the argument follows as in the proof of Proposition A.4.2. l

Proof of Theorem 1.2.1

I prove the following Lemma, and then show that it implies Theorem 1.2.1.
Lemma A.5.1. The decision maker’s choices are consistent with expected utility maximization behavior if
and only if there exists a utility function U P U , Pr⃗Y p ¨ | 0, w, xq P B0,w,x and Pr⃗Y p ¨ | 1, w, xq P B1,w,x
satisfying, for all c P t0, 1u with πc pw, xq ą 0 and c1 ‰ c,
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} \tilde P_{\vec Y}(\vec y \mid c, w, x)\, P_C(c \mid w, x)\, U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} \tilde P_{\vec Y}(\vec y \mid c, w, x)\, P_C(c \mid w, x)\, U(c', \vec y; w).
\]
Proof of Lemma A.5.1: Necessity Suppose that the decision maker’s choices are consistent
with expected utility maximization behavior at some utility function U and joint distribution
pW, X, V, C, ⃗Yq „ Q.
First, I show that if the decision maker’s choices are consistent with expected utility maximization behavior at some private information with support V , then her choices are also consistent with expected utility maximization behavior at some finite support private information. Partition the original signal space V
into the subsets Vt0u , Vt1u , Vt0,1u , which collect together the signals v P V at which the decision
maker selects only C “ 0, only C “ 1, or both choices with positive probability, respectively. Define the finite support private information $\tilde V \in \tilde{\mathcal V} := \{v_{\{0\}}, v_{\{1\}}, v_{\{0,1\}}\}$ as
\begin{align*}
\tilde Q(\tilde V = v_{\{0\}} \mid \vec Y = \vec y, W = w, X = x) &= Q(V \in \mathcal V_{\{0\}} \mid \vec Y = \vec y, W = w, X = x) \\
\tilde Q(\tilde V = v_{\{1\}} \mid \vec Y = \vec y, W = w, X = x) &= Q(V \in \mathcal V_{\{1\}} \mid \vec Y = \vec y, W = w, X = x) \\
\tilde Q(\tilde V = v_{\{0,1\}} \mid \vec Y = \vec y, W = w, X = x) &= Q(V \in \mathcal V_{\{0,1\}} \mid \vec Y = \vec y, W = w, X = x).
\end{align*}
Define $\tilde Q(C = 0 \mid \tilde V = v_{\{0\}}, W = w, X = x) = 1$, $\tilde Q(C = 1 \mid \tilde V = v_{\{1\}}, W = w, X = x) = 1$ and
\[
\tilde Q(C = 1 \mid \tilde V = v_{\{0,1\}}, W = w, X = x) = \frac{Q(C = 1, V \in \mathcal V_{\{0,1\}} \mid W = w, X = x)}{Q(V \in \mathcal V_{\{0,1\}} \mid W = w, X = x)}.
\]
Define the finite support expected utility representation for the decision maker by the utility
function U and the random vector $(W, X, \tilde V, C, \vec Y) \sim \tilde Q$, where $\tilde Q(w, x, \tilde v, c, \vec y) = Q(w, x, \vec y)\, \tilde Q(\tilde v \mid w, x, \vec y)\, \tilde Q(c \mid w, x, \tilde v)$. The information set and expected utility maximization conditions are satisfied
by construction. Data consistency is satisfied since it is satisfied at the original private information
V P V . To see this, notice that for all pw, x, ⃗yq P W ˆ X ˆ Y ˆ Y ,
\begin{align*}
P(C = 1, \vec Y = \vec y \mid W = w, X = x) &= Q(C = 1, V \in \mathcal V, \vec Y = \vec y \mid W = w, X = x) \\
&= Q(C = 1, V \in \mathcal V_{\{1\}}, \vec Y = \vec y \mid W = w, X = x) + Q(C = 1, V \in \mathcal V_{\{0,1\}}, \vec Y = \vec y \mid W = w, X = x) \\
&= \tilde Q(C = 1, \tilde V = v_{\{1\}}, \vec Y = \vec y \mid W = w, X = x) + \tilde Q(C = 1, \tilde V = v_{\{0,1\}}, \vec Y = \vec y \mid W = w, X = x) \\
&= \sum_{\tilde v \in \tilde{\mathcal V}} \tilde Q(C = 1, \tilde V = \tilde v, \vec Y = \vec y \mid W = w, X = x) = \tilde Q(C = 1, \vec Y = \vec y \mid W = w, X = x).
\end{align*}
Hence, for the necessity proof, it is without loss of generality to assume the private information V P V has
finite support.
I next show that if there exists an expected utility representation for the decision maker’s
choices, then the stated inequalities in Lemma A.5.1 are satisfied, by adapting the necessity
argument given for the “no-improving action switches” inequalities in Caplin and Martin (2015).
Suppose that the decision maker’s choices are consistent with expected utility maximization behavior. For each pw, x, vq and each choice c selected with positive probability, $\sum_{\vec y} Q_{\vec Y}(\vec y \mid w, x, v)\, U(c, \vec y; w) \ge \sum_{\vec y} Q_{\vec Y}(\vec y \mid w, x, v)\, U(c', \vec y; w)$ through the expected utility maximization condition. Multiply both sides by $Q_C(c \mid w, x, v)\, Q_V(v \mid w, x)$ to arrive at
\[
Q_C(c \mid w, x, v)\, Q_V(v \mid w, x) \Bigg( \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid w, x, v)\, U(c, \vec y; w) \Bigg) \;\ge\; Q_C(c \mid w, x, v)\, Q_V(v \mid w, x) \Bigg( \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid w, x, v)\, U(c', \vec y; w) \Bigg).
\]
Next, use the information set condition to write QC,⃗Y pc, ⃗y | w, x, vq “ Q⃗Y p⃗y | w, x, vqQC pc | w, x, vq and arrive at
\[
Q_V(v \mid w, x) \Bigg( \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x, v)\, U(c, \vec y; w) \Bigg) \;\ge\; Q_V(v \mid w, x) \Bigg( \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x, v)\, U(c', \vec y; w) \Bigg).
\]
Finally, we use QC,⃗Y pc, ⃗y, v | w, xq “ QC,⃗Y pc, ⃗y | w, x, vqQV pv | w, xq and then further sum over
v P V to arrive at
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} \Bigg( \sum_{v \in \mathcal V} Q_{V,C,\vec Y}(v, c, \vec y \mid w, x) \Bigg) U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} \Bigg( \sum_{v \in \mathcal V} Q_{V,C,\vec Y}(v, c, \vec y \mid w, x) \Bigg) U(c', \vec y; w),
\]
and hence
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x)\, U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x)\, U(c', \vec y; w).
\]
The inequalities in Lemma A.5.1 then follow from an application of data consistency.
Proof of Lemma A.5.1: Sufficiency To establish sufficiency, I show that if the conditions in
Lemma A.5.1 holds, then private information v P V can be constructed that recommends choices
c P t0, 1u and an expected utility maximizer would find it optimal to follow these recommendations
as in the sufficiency argument in Caplin and Martin (2015) for the “no-improving action switches”
inequalities.
Towards this, suppose that the conditions in Lemma A.5.1 are satisfied at some Pr⃗Y p ¨ | c, w, xq P
Bc,w,x for all c P t0, 1u, pw, xq P W ˆ X . As notation, let v P V :“ t1, 2, 3u index the nonempty subsets of t0, 1u: tt0u, t1u, t0, 1uu.
For each pw, xq P W ˆ X , let Cw,x collect the choices selected with positive probability, and partition Cw,x into subsets that have identical choice-dependent potential outcome probabilities. There are V̄w,x ď |Cw,x | such subsets. Each subset
of this partition of Cw,x is a subset in the power set 2t0,1u , and so I associate each subset in this
partition with its associated index v P V . Denote these associated indices by the set Vw,x . Denote
the choice-dependent potential outcome probability associated with the subset labelled v by
P⃗Y p ¨ | v, w, xq P ∆pY ˆ Y q. Finally, define $Q_{\vec Y}(\vec y \mid w, x) = \sum_{c \in \{0,1\}} \tilde P_{\vec Y}(\vec y \mid c, w, x)\, \pi_c(w, x)$.
Define the random variable V P V according to
\[
Q_V(v \mid w, x) = \sum_{c \,:\, P_{\vec Y}(\cdot \mid c, w, x) = P_{\vec Y}(\cdot \mid v, w, x)} \pi_c(w, x) \quad \text{if } v \in \mathcal V_{w,x},
\]
\[
Q_V(v \mid \vec y, w, x) = \begin{cases} \dfrac{P_{\vec Y}(\vec y \mid v, w, x)\, Q_V(v \mid w, x)}{Q_{\vec Y}(\vec y \mid w, x)} & \text{if } v \in \mathcal V_{w,x} \text{ and } Q_{\vec Y}(\vec y \mid w, x) > 0, \\ 0 & \text{otherwise}, \end{cases}
\]
and define the joint distribution
\[
Q(w, x, \vec y, v, c) = P_{W,X}(w, x)\, Q_{\vec Y}(\vec y \mid w, x)\, Q_V(v \mid \vec y, w, x)\, Q_C(c \mid v, w, x).
\]
We now check that this construction satisfies information set, expected utility maximization and
data consistency. First, information set is satisfied since QC,⃗Y pc, ⃗y | w, x, vq “ Q⃗Y p⃗y | w, x, vqQC pc |
w, x, vq by construction. Next, for any pw, xq P W ˆ X and c P Cw,x , define vc,w,x P Vw,x to be the
label satisfying P⃗Y p ¨ | c, w, xq “ P⃗Y p ¨ | vc,w,x , w, xq. For PC,⃗Y pc, ⃗y | w, xq ą 0, observe that
PC,⃗Y pc, ⃗y | w, xq “
Moreover, whenever PC,⃗Y pc, ⃗y | w, xq “ 0, Q⃗Y p⃗y | vc,w,x , w, xqQC pc | vc,w,x , w, xq “ 0. Therefore, data
consistency holds. Finally, by construction, for QC pC “ c | V “ vc,w,x , W “ w, X “ xq ą 0,
\begin{align*}
Q(\vec Y = \vec y \mid V = v_{c,w,x}, W = w, X = x) &= \frac{Q(V = v_{c,w,x} \mid \vec Y = \vec y, W = w, X = x)\, Q(\vec Y = \vec y \mid W = w, X = x)}{Q(V = v_{c,w,x} \mid W = w, X = x)} \\
&= \tilde P(\vec Y = \vec y \mid C = c, W = w, X = x).
\end{align*}
Therefore, expected utility maximization is satisfied since the inequalities in Lemma A.5.1 were assumed to hold. l

Lemma A.5.1 implies Theorem 1.2.1: Define the joint distribution Q as Qpw, x, c, ⃗yq “ Pr⃗Y p⃗y |
c, w, xqPpc, w, xq. Then, rewrite conditions (i)-(ii) in Lemma A.5.1 as: for all c P t0, 1u and c1 ‰ c,
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x)\, U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x)\, U(c', \vec y; w).
\]
Notice that all inequalities involving c P C with πc pw, xq “ 0 are trivially satisfied. Next, inequalities involving c P C with πc pw, xq ą 0 can be
equivalently rewritten as
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid c, w, x)\, U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid c, w, x)\, U(c', \vec y; w).
\]
Finally, observe that
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid c, w, x)\, U(c, \vec y; w) = E_Q\big[ U(c, \vec Y; w) \mid C = c, W = w, X = x \big],
\]
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid c, w, x)\, U(c', \vec y; w) = E_Q\big[ U(c', \vec Y; w) \mid C = c, W = w, X = x \big],
\]
which delivers the inequalities in Theorem 1.2.1. l
Lemma A.5.2. The decision maker’s choices are consistent with expected utility maximization behavior
at some strict preference utility function if and only if there exists strict preference utility functions U
satisfying
i. $P_{Y^*}(1 \mid 1, w, x) \le \frac{U(0,0; w)}{U(0,0; w) + U(1,1; w)}$ for all pw, xq P W ˆ X with π1 pw, xq ą 0,

ii. $\frac{U(0,0; w)}{U(0,0; w) + U(1,1; w)} \le P_{Y^*}(1 \mid 0, w, x)$ for all pw, xq P W ˆ X with π0 pw, xq ą 0.
Proof. This is an immediate consequence of applying Lemma A.5.1 to analyzing expected utility
maximization at strict preferences in a screening decision with a binary outcome. For all pw, xq P
W ˆ X with π1 pw, xq ą 0, Lemma A.5.1 requires $P_{Y^*}(1 \mid 1, w, x) \le \frac{U(0,0; w)}{U(0,0; w) + U(1,1; w)}$. For all
pw, xq P W ˆ X with π0 pw, xq ą 0, Lemma A.5.1 requires $\frac{U(0,0; w)}{U(0,0; w) + U(1,1; w)} \le P_{Y^*}(1 \mid 0, w, x)$.
Combining these conditions delivers the result. l
Proof of Theorem 1.3.1

By Lemma A.5.2, the decision maker’s choices are consistent with expected utility maximization
behavior if and only if there exists strict preference utility functions U satisfying
\[
\max_{x \in \mathcal X_1(w)} P_{Y^*}(1 \mid 1, w, x) \;\le\; \frac{U(0,0; w)}{U(0,0; w) + U(1,1; w)} \;\le\; \min_{x \in \mathcal X_0(w)} P_{Y^*}(1 \mid 0, w, x)
\]
for all w P W . These inequalities admit a solution only if the stated conditions in Theorem 1.3.1 are
satisfied. The characterization of the identified set of utility functions also follows from Lemma
A.5.2. l
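The existence check above is a one-line comparison once the conditional outcome probabilities are estimated. A sketch in Python for a fixed w, assuming estimates across cells x (function and variable names are hypothetical):

```python
def consistent_with_eu_max(p_y1_given_release, p_y1_given_detain):
    """Check the Theorem 1.3.1-style condition for a fixed w: a utility
    threshold rationalizing the choices exists iff
        max_x P(Y*=1 | C=1, w, x) <= min_x P(Y*=1 | C=0, w, x),
    where the first list ranges over cells x with C = 1 observed and the
    second over cells with C = 0 observed.  Returns the interval of
    rationalizing thresholds if nonempty, else None.
    """
    lo = max(p_y1_given_release)   # hardest-to-rationalize released cell
    hi = min(p_y1_given_detain)    # easiest-to-rationalize detained cell
    return (lo, hi) if lo <= hi else None
```

A returned interval corresponds to the identified set of threshold values $\frac{U(0,0;w)}{U(0,0;w)+U(1,1;w)}$; `None` indicates the choices cannot be rationalized by any strict preference utility function.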
Under Assumption 1.3.1, PY˚ p1 | w, x, zq “ PY˚ p1 | w, x, z̃q “ PY˚ p1 | w, xq for all pw, xq P W ˆ X
for all z P Z .
I drop the conditioning on pw, xq P W ˆ X to simplify notation, and so this argument applies
conditionally on each pw, xq P W ˆ X . The screening decision setting defines the model
correspondence G´1 pz, c, yq “ ty˚ : y “ y˚ 1tc “ 1uu.
The observable joint distribution pZ, C, Yq „ P characterizes a random set G´1 pZ, C, Yq via the generalized likelihood $T(A \mid Z = z) = P\big( (C, Y) : G^{-1}(z, C, Y) \cap A \ne \emptyset \big)$ for all $A \in 2^{\mathcal Y}$. Artstein’s Theorem implies that there exists
a random variable Y ˚ that rationalizes the observed data through the model correspondence G if
and only if there exists some Y ˚ „ Pr satisfying
\[
\tilde P(A) \le T(A \mid Z = z) \quad \text{for all } A \in 2^{\mathcal Y} \text{ and } z \in \mathcal Z.
\]
A sharp characterization of the identified set for the marginal distribution of Y ˚ is then given by
$\mathcal H_P(P_{Y^*}(\cdot)) = \bigcap_{z \in \mathcal Z} \tilde{\mathcal P}_{Y^*}(\cdot \mid z)$. For Y “ t0, 1u, these inequalities give, for each z P Z ,
\[
\tilde P(Y^*{=}1) \le P(C{=}1, Y{=}1 \mid Z{=}z) + P(C{=}0 \mid Z{=}z), \qquad \tilde P(Y^*{=}0) \le P(C{=}0 \mid Z{=}z) + P(C{=}1, Y{=}0 \mid Z{=}z).
\]
Since $\tilde P(Y^*{=}0) + \tilde P(Y^*{=}1) = 1$, these inequalities may be further rewritten as requiring, for each
z P Z ,
\[
P(C{=}1, Y{=}1 \mid Z{=}z) \le \tilde P(Y^*{=}1) \le P(C{=}1, Y{=}1 \mid Z{=}z) + P(C{=}0 \mid Z{=}z).
\]
This delivers sharp bounds on the marginal distribution of Y ˚ conditional on any z P Z since the
instrument is assumed to be independent of the outcome of interest. The sharpness of the bounds follows from Artstein’s Theorem. l
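For a binary outcome, the resulting intersection bounds are immediate to compute from the observed moments at each instrument value; a minimal sketch (names are illustrative):

```python
def intersection_bounds(p_c1y1_by_z, p_c0_by_z):
    """Sharp bounds on P(Y* = 1) for a binary outcome, obtained by
    intersecting the per-instrument-value bounds
        [P(C=1, Y=1 | z),  P(C=1, Y=1 | z) + P(C=0 | z)]
    across all z in the support of the instrument."""
    lower = max(p_c1y1_by_z)
    upper = min(p1 + p0 for p1, p0 in zip(p_c1y1_by_z, p_c0_by_z))
    return lower, upper

lo, hi = intersection_bounds([0.2, 0.3], [0.5, 0.3])
```

An empty intersection (lower exceeding upper) would indicate that the independence assumption on the instrument is refuted by the data.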
Proof of Theorem 1.4.1

To prove this result, I first establish the following lemma, and then show Theorem 1.4.1 follows as
a consequence.
Lemma A.5.3. Assume Pr⃗Y p ¨ | w, xq ą 0 for all Pr⃗Y p ¨ | w, xq P H P pP⃗Y p ¨ | w, xq; Bw,x q and all pw, xq P
W ˆ X . The decision maker’s choices are consistent with expected utility maximization behavior at
inaccurate beliefs if and only if there exists a utility function U P U , prior beliefs Q⃗Y p ¨ | w, xq P ∆pY ˆ Y q
for all pw, xq P W ˆ X , Pr⃗Y p ¨ | 0, w, xq P B0,w,x , Pr⃗Y p ¨ | 1, w, xq P B1,w,x for all pw, xq P W ˆ X satisfying, for all c P t0, 1u with πc pw, xq ą 0 and c1 ‰ c,
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid w, x)\, \tilde P_C(c \mid \vec y, w, x)\, U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid w, x)\, \tilde P_C(c \mid \vec y, w, x)\, U(c', \vec y; w),
\]
where $\tilde P_C(c \mid \vec y, w, x) := \tilde P_{\vec Y}(\vec y \mid c, w, x)\, \pi_c(w, x) \,/\, \tilde P_{\vec Y}(\vec y \mid w, x)$ with $\tilde P_{\vec Y}(\vec y \mid w, x) := \sum_{c \in \{0,1\}} \tilde P_{\vec Y}(\vec y \mid c, w, x)\, \pi_c(w, x)$.
Proof of Lemma A.5.3: Necessity To show necessity, we apply the same steps as the proof of
necessity for Lemma A.5.1. First, by an analogous argument as given in the proof of necessity for
Lemma A.5.1, it is without loss of generality to assume the private information V P V has finite
support. Second, following the same steps as the proof of necessity for Lemma A.5.1, I arrive at
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x)\, U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{C,\vec Y}(c, \vec y \mid w, x)\, U(c', \vec y; w).
\]
Then, we immediately observe that QC,⃗Y pc, ⃗y | w, xq “ QC pc | ⃗y, w, xqQ⃗Y p⃗y | w, xq “ PrC pc |
⃗y, w, xqQ⃗Y p⃗y | w, xq, where the last equality follows via data consistency with inaccurate beliefs.
Proof of Lemma A.5.3: Sufficiency To show sufficiency, suppose that the conditions in Lemma
A.5.3 are satisfied at some Pr⃗Y p ¨ | c, w, xq P Bc,w,x for c P t0, 1u, pw, xq P W ˆ X and some prior beliefs Q⃗Y p ¨ | w, xq. Construct the joint distribution $\tilde P(w, x, c, \vec y) = \tilde P_C(c \mid \vec y, w, x)\, Q_{\vec Y}(\vec y \mid w, x)\, P(w, x)$, where PrC p ¨ | ⃗y, w, xq is defined in the statement of the Lemma. Given the inequalities in Lemma A.5.3, I construct private information that satisfies the information
set, expected utility maximization behavior and data consistency (Definition 1.2.3) conditions
under the constructed joint distribution pW, X, C, ⃗Yq „ Pr. Let v P V :“ t1, 2, 3u index the nonempty subsets of t0, 1u. For each pw, xq P W ˆ X , let Cw,x collect the choices
selected with positive probability, and partition Cw,x into subsets that have identical constructed
choice-dependent outcome probabilities. There are V̄w,x ď |Cw,x | such subsets. Associate each
subset in this partition with its associated index v P V and denote the possible values as Vw,x .
Denote the choice-dependent outcome probability associated with the subset labelled v by P̃⃗Y p ¨ |
v, w, xq P ∆pY ˆ Y q.
Define the random variable V P V according to
\[
Q(V = v \mid w, x) = \sum_{c \,:\, \tilde P_{\vec Y}(\cdot \mid c, w, x) = \tilde P_{\vec Y}(\cdot \mid v, w, x)} \tilde P_C(c \mid w, x) \quad \text{if } v \in \mathcal V_{w,x},
\]
\[
Q(V = v \mid \vec y, w, x) = \begin{cases} \dfrac{\tilde P_{\vec Y}(\vec y \mid v, w, x)\, Q(V = v \mid w, x)}{Q_{\vec Y}(\vec y \mid w, x)} & \text{if } v \in \mathcal V_{w,x} \text{ and } Q_{\vec Y}(\vec y \mid w, x) > 0, \\ 0 & \text{otherwise}. \end{cases}
\]
We now check that this construction satisfies information set, expected utility maximization and
data consistency. First, information set is satisfied since QC,⃗Y pc, ⃗y | w, x, vq “ Q⃗Y p⃗y | w, x, vqQC pc |
w, x, vq by construction. Next, for any pw, xq P W ˆ X and c P Cw,x , define vc,w,x P Vw,x to be the
label satisfying Pr⃗Y p ¨ | c, w, xq “ Pr⃗Y p ¨ | vc,w,x , w, xq. For PrC,⃗Y pc, ⃗y | w, xq ą 0, observe that
PrC,⃗Y pc, ⃗y | w, xq “
Moreover, whenever PrC,⃗Y pc, ⃗y | w, xq “ 0, Q⃗Y p⃗y | vc,w,x , w, xqQC pc | vc,w,x , w, xq “ 0. Therefore, data
consistency (Definition 1.2.3) holds for the constructed joint distribution pW, X, C, ⃗Yq „ Pr, which in turn implies
data consistency at inaccurate beliefs (Definition 1.4.1). Finally, for QpC “ c | V “ vc,w,x , W “
w, X “ xq ą 0,
\begin{align*}
Q(\vec Y = \vec y \mid V = v_{c,w,x}, W = w, X = x) &= \frac{Q(V = v_{c,w,x} \mid \vec Y = \vec y, W = w, X = x)\, Q(\vec Y = \vec y \mid W = w, X = x)}{Q(V = v_{c,w,x} \mid W = w, X = x)} \\
&= \tilde P(\vec Y = \vec y \mid C = c, W = w, X = x).
\end{align*}
Therefore, expected utility maximization is satisfied: for each v P Vw,x ,
\[
\sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid v, w, x)\, U(c, \vec y; w) \;\ge\; \sum_{\vec y \in \mathcal Y \times \mathcal Y} Q_{\vec Y}(\vec y \mid v, w, x)\, U(c', \vec y; w),
\]
which follows from the fact that Q⃗Y p⃗y | w, xq PrC pc | ⃗y, w, xq “ QC,⃗Y pc, ⃗y | w, xq by data consistency
and the construction of Pr, and $Q(\vec Y = \vec y \mid V = v_{c,w,x}, W = w, X = x) = \tilde P(\vec Y = \vec y \mid C = c, W = w, X = x)$. l
Rewrite inequalities in Lemma A.5.3 in terms of weights: Define Pr as in the statement of the
Theorem. Rewrite the condition in Lemma A.5.3 as: for all c P t0, 1u and c̃ ‰ c,
Notice that if πc pw, xq “ 0, then PrC,⃗Y pc, ⃗y | w, xq “ 0. Therefore, the inequalities involving c P t0, 1u
with πc pw, xq “ 0 are trivially satisfied. Next, inequalities involving c P t0, 1u with πc pw, xq ą 0 may be rewritten in terms of the weights, which delivers the result. l
Proof of Theorem 1.4.2
Under the stated conditions, the necessity statement in Theorem 1.4.1 implies that for all pw, xq P
W ˆX
\[
P_{Y^*}(1 \mid 1, w, x) \;\le\; \frac{\omega(0; w, x)\, U(0,0; w, x)}{\omega(0; w, x)\, U(0,0; w, x) + \omega(1; w, x)\, U(1,1; w, x)} \;\le\; \tilde P_{Y^*}(1 \mid 0, w, x).
\]
The result then follows by applying the bounds on PrY˚ p1 | 0, w, xq in a screening decision with a
binary outcome. l
where ℓ⊺ pp˚ , U ˚ q is defined in the statement of the proposition and PC,Y˚ p1, 1q, PC,Y˚ p0, 1q are
the dw d x vectors whose elements are the moments PC,Y˚ p1, 1 | w, xq :“ PpC “ 1, Y ˚ “ 1 | W “ w, X “ xq and PC,Y˚ p0, 1 | w, xq :“ PpC “ 0, Y ˚ “ 1 | W “ w, X “ xq
respectively. Therefore, the null hypothesis H0 : θpp˚ , U ˚ q “ θ0 is equivalent to the null hypothesis
D PrC,Y˚ p0, 1q satisfying ℓ⊺ pp˚ , U ˚ q PrC,Y˚ p0, 1q “ θ0 ´ βpp˚ , U ˚ q ´ ℓ⊺ pp˚ , U ˚ qPC,Y˚ p1, 1q and
\[
A\, \tilde P_{C,Y^*}(0,1) \le \begin{pmatrix} -\underline{P}_{C,Y^*}(0,1) \\ \overline{P}_{C,Y^*}(0,1) \end{pmatrix}.
\]
Next, we apply a change of basis argument. Define the full rank matrix Γ whose first row is equal to ℓ⊺ pp˚ , U ˚ q; the remainder of the argument follows as in the proof of Proposition A.4.2. l
A.6 Expected Utility Maximization Behavior with Continuous Characteristics
In this section, I extend the setting described in Section 1.2 to allow for the characteristics X P X
to be continuously distributed. I focus this extension on the case of a screening decision for
simplicity. The outcome Y ˚ P Y , the choices C P C :“ t0, 1u and the characteristics W P W are still
finite. I now allow the characteristics X P X Ď Rdx to be continuously distributed. The random
vector pW, X, C, Y ˚ q „ P describes the characteristics, choices and latent outcome. I assume the joint distribution P admits a density ppw, x, c, y˚ q with pW,X pw, xq ą 0
for all pw, xq P W ˆ X .
The researcher observes the characteristics pW, Xq and the decision maker’s choice C in each
decision, but only observes the outcome Y ˚ if the decision maker selected C “ 1. Therefore, the researcher observes the joint distribution pW, X, C, Yq „ P, where Y :“ Y ˚ ¨ 1tC “ 1u. Whenever the
decision maker selects C “ 1, the researcher observes pW, X, C, Y ˚ q, and so B1,w is a singleton that
only contains the observable density ppx, y˚ | C “ 1, W “ wq. Over the choice c “ 0, the set B0,w
is a set of joint densities p̃px, y˚ | C “ 0, W “ wq such that (i) ppx, y˚ | C “ 0, W “ wq P B0,w , and (ii)
$\sum_{y^* \in \mathcal Y} \tilde p(x, y^* \mid C = 0, W = w) = p(x \mid C = 0, W = w)$ for all p̃px, y˚ | C “ 0, W “ wq P B0,w .
The expected utility maximization model requires minimal modification to account for the con-
tinuous characteristics. The definition of a utility function and private information is unchanged.
Definition 1.2.3 is extended to ask whether there exists a random vector pW, X, C, V, Y ˚ q „ Q that
admits a density qpw, x, v, c, y˚ q that is consistent with the observable data ("Data Consistency")
by replacing the probability mass functions with the appropriate probability density functions.
Analogously, the characterization of expected utility maximization behavior also extends directly.
Theorem A.6.1. The decision maker's choices are consistent with expected utility maximization behavior if
and only if there exists a utility function U ∈ 𝒰 and p̃(x, y* | C = 0, W = w) ∈ B_{0,w} for all w ∈ 𝒲 such
that

$$\mathbb{E}_Q[U(c, Y^*; W) \mid C = c, W = w, X = x] \;\ge\; \mathbb{E}_Q[U(c', Y^*; W) \mid C = c, W = w, X = x]$$

for all c ∈ {0, 1}, (w, x) ∈ 𝒲 × 𝒳 with π_c(w, x) > 0 and c′ ≠ c, where 𝔼_Q[·] is the expectation under the
joint distribution Q satisfying q(w, x, 1, y*) = p(w, x, 1, y*).
Proof. The proof of this result is analogous to the proof of Theorem 1.2.1. Towards this, I first extend
Lemma A.5.1 (which analyzes the case with multiple choices) to the case with continuous characteristics.
Lemma A.6.1. The decision maker's choices are consistent with expected utility maximization behavior if
and only if there exists a utility function U ∈ 𝒰 such that:

i. For all c ∈ 𝒞_y and w ∈ 𝒲,

$$\sum_{y^* \in \mathcal{Y}} p_{C,Y^*}(c, y^* \mid w, x)\, U(c, y^*; w) \;\ge\; \sum_{y^* \in \mathcal{Y}} p_{C,Y^*}(c, y^* \mid w, x)\, U(c', y^*; w)$$

for all x ∈ 𝒳 and c′ ≠ c.
ii. For all c ∈ 𝒞 \ 𝒞_y and w ∈ 𝒲, there exists p̃_{C,Y*}(· | w, c) ∈ B_{w,c} such that

$$\sum_{y^* \in \mathcal{Y}} \tilde{p}_{C,Y^*}(c, y^* \mid w, x)\, U(c, y^*; w) \;\ge\; \sum_{y^* \in \mathcal{Y}} \tilde{p}_{C,Y^*}(c, y^* \mid w, x)\, U(c', y^*; w)$$

for all x ∈ 𝒳 and c′ ≠ c, where p̃_{C,Y*}(c, y* | w, x) = p̃_{X,Y*}(x, y* | w, c) p_{W,C}(w, c) / p_{W,X}(w, x).
Proof of Necessity for Lemma A.6.1: The proof of necessity follows the same argument as the
proof of necessity of Lemma A.5.1 below by replacing the probability mass function Q with the
density q. □
Proof of Sufficiency for Lemma A.6.1: The proof of sufficiency follows the proof of sufficiency of
Lemma A.5.1 below by again replacing all probability mass functions with the appropriate
density functions. □
Theorem A.6.1 then follows directly from Lemma A.6.1 by considering the special case of a
screening decision with 𝒞 = {0, 1}.
Theorem A.6.1 can be applied to analyze the special case in which the latent outcome Y* is
binary. The bounds on the unobservable choice-dependent outcome probability B_{0,w} are simply
bounds on the density p(Y* = 1 | X = x, W = w, C = 0). From Theorem A.6.1, the decision maker's
choices are consistent with expected utility maximization behavior at some strict preference
utility function U if and only if for all w ∈ 𝒲
$$\sup_{x \in \mathcal{X}^1(w)} p(Y^* = 1 \mid C = 1, W = w, X = x) \;\le\; \frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} \tag{A.5}$$

$$\frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} \;\le\; \inf_{x \in \mathcal{X}^0(w)} p(Y^* = 1 \mid C = 0, W = w, X = x), \tag{A.6}$$
for some density satisfying the bounds B_{0,w}. This provides a sharp characterization of the identified set
of strict preference utility functions in terms of "intersection bounds." Therefore, researchers can test
whether a decision maker's choices are consistent with expected utility maximization behavior at
accurate beliefs using inference procedures developed in, for example, Chernozhukov et al. (2013).
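Because the testable implication in (A.5)-(A.6) is an intersection bound, checking whether some strict preference utility function rationalizes the data reduces to comparing a supremum against an infimum. The following sketch illustrates the point-estimate version of this check in Python; all numbers and the helper `rationalizable` are hypothetical, and formal inference would use the procedures in Chernozhukov et al. (2013).

```python
import numpy as np

# Hypothetical cell-level estimates for a single W = w cell (illustrative only):
# p_released[x]: observed P(Y* = 1 | C = 1, W = w, X = x) for released cells.
# p_detained_lb[x]: lower bounds on P(Y* = 1 | C = 0, W = w, X = x) for detained cells.
p_released = np.array([0.10, 0.18, 0.22])
p_detained_lb = np.array([0.25, 0.31, 0.28])

def rationalizable(p_released, p_detained_lb):
    """Check the intersection-bound condition (A.5)-(A.6): does some threshold
    t = U(0,0;w) / (U(0,0;w) + U(1,1;w)) satisfy
    sup_x p_released(x) <= t <= inf_x p_detained_lb(x)?"""
    lo = p_released.max()      # supremum over released cells, left side of (A.5)
    hi = p_detained_lb.min()   # infimum over detained-cell bounds, right side of (A.6)
    return bool(lo <= hi), (lo, hi)

ok, (lo, hi) = rationalizable(p_released, p_detained_lb)
print(ok, lo, hi)  # feasible thresholds form the interval [lo, hi] when ok is True
```

Any threshold in the nonempty interval [lo, hi] pins down a ratio of utilities consistent with expected utility maximization; an empty interval is the detectable-mistake case.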
Finally, Theorem A.6.1 can also be simplified through dimension reduction over the continuous
characteristics, using functions D_w : 𝒳 → {1, . . . , d_w} that partition the characteristic space into
level sets {x ∈ 𝒳 : D_w(x) = d}. In a binary screening decision, if the decision maker's choices are
consistent with expected utility maximization behavior at some strict preference utility function U
that satisfies an exclusion restriction on the characteristics X, then
$$\max_{d \in \{1, \ldots, d_w\}} P(Y = 1 \mid C = 1, W = w, D_w(X) = d) \;\le\; \frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} \tag{A.7}$$

$$\frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} \;\le\; \min_{d \in \{1, \ldots, d_w\}} P(Y^* = 1 \mid C = 0, W = w, D_w(X) = d) \tag{A.8}$$
A.7 Alternative Bounds on the Missing Data
In the main text, I discussed how researchers can construct bounds on the missing data using a
randomly assigned instrument. I now discuss alternative assumptions under which researchers
can construct bounds on the missing data. I define these alternative assumptions for the leading
case of a screening decision with a binary outcome.
A.7.1 Direct Imputation
The simplest empirical strategy for constructing bounds on the unobservable choice-dependent
outcome probabilities is to directly impute them from their observable counterparts.
Assumption A.7.1. For each (w, x) ∈ 𝒲 × 𝒳 with 0 < π_1(w, x) < 1, there exists κ_{w,x} ≥ 0 satisfying

$$P_{Y^*}(1 \mid 1, w, x) \;\le\; P_{Y^*}(1 \mid 0, w, x) \;\le\; (1 + \kappa_{w,x})\, P_{Y^*}(1 \mid 1, w, x).$$
The parameter κ_{w,x} ≥ 0 specifies how different the unobservable choice-dependent outcome
probability may be from its observable counterpart. For example, in the context of pretrial
release decisions, setting κ_{w,x} = 1 means that the researcher is willing to assume that the
conditional probability of pretrial misconduct among detained defendants is at most
two times the conditional probability of pretrial misconduct among released defendants. Such
bounding assumptions are used in, for example, Kleinberg et al. (2018a) and Jung et al. (2020a).
In practice, the researcher may wish to test whether the decision maker is making systematic
prediction mistakes under various choices of the parameter κ_{w,x}, and thereby conduct a sensi-
tivity analysis of how robust the behavioral conclusions are to various assumptions about the
missing data. I conduct such an exercise in Appendix A.9, where I report how the fraction of judges
for whom we can reject expected utility maximization behavior varies as the parameter κ_{w,x}
varies in the New York City pretrial release setting.
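To make the sensitivity exercise concrete, the sketch below (Python, with made-up released-defendant risks; `imputation_bounds` is an illustrative helper rather than code from my analysis) constructs the direct imputation bounds of Assumption A.7.1 and traces how they widen as κ grows.

```python
import numpy as np

def imputation_bounds(p_released, kappa):
    """Direct imputation bounds (Assumption A.7.1): the unobservable
    P(Y* = 1 | C = 0, w, x) is assumed to lie in
    [p_released, (1 + kappa) * p_released], truncated at 1."""
    lb = p_released
    ub = np.minimum((1.0 + kappa) * p_released, 1.0)
    return lb, ub

# Illustrative sensitivity analysis: the bounds widen monotonically in kappa.
p1 = np.array([0.12, 0.20])  # hypothetical observed risks among released defendants
for kappa in [0.0, 1.0, 2.0]:
    lb, ub = imputation_bounds(p1, kappa)
    print(kappa, lb, ub)
```

At κ = 0 the missing data are point-imputed by the observed risks; larger κ values deliver progressively more conservative conclusions about prediction mistakes.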
Finally, Assumption A.7.1 has a natural interpretation under the expected utility maximization
model. The parameter κ_{w,x} bounds the average informativeness of the decision maker's private
information V ∈ 𝒱.
Proposition A.7.1. Consider a screening decision with a binary outcome and suppose Assumption A.7.1
holds. If the decision maker's choices are consistent with expected utility maximization behavior at some
utility function U ∈ 𝒰, private information V ∈ 𝒱 and joint distribution (W, X, V, C, Y*) ∼ Q, then for
each (w, x) ∈ 𝒲 × 𝒳,

$$\frac{Q(C = 0 \mid Y^* = 1, W = w, X = x) \,/\, Q(C = 1 \mid Y^* = 1, W = w, X = x)}{Q(C = 0 \mid W = w, X = x) \,/\, Q(C = 1 \mid W = w, X = x)} \;=\; \frac{Q(Y^* = 1 \mid C = 0, W = w, X = x)}{Q(Y^* = 1 \mid C = 1, W = w, X = x)}.$$
Since the decision maker's choices are consistent with expected utility maximization behavior,
(W, X, V, C, Y*) ∼ Q satisfies the data consistency condition in Definition 1.2.3 at some P̃_{Y*}(· | 0, w, x)
satisfying the bounds in Assumption A.7.1 for each (w, x) ∈ 𝒲 × 𝒳. Therefore,
Q(Y* = 1 | C = 0, W = w, X = x) = P̃_{Y*}(1 | 0, w, x), and it immediately follows that

$$\frac{Q(Y^* = 1 \mid C = 0, W = w, X = x)}{Q(Y^* = 1 \mid C = 1, W = w, X = x)} \;=\; \frac{\tilde{P}_{Y^*}(1 \mid 0, w, x)}{P_{Y^*}(1 \mid 1, w, x)} \in [1, 1 + \kappa_{w,x}]$$

under Assumption A.7.1. This proves (a). To show (b), notice that the
The direct imputation bounds imply bounds on the relative odds ratio of the decision maker’s
choice probabilities conditional on the outcome and the characteristics relative to their choice
probabilities conditional on only the characteristics. This places a bound on the average infor-
mativeness of the decision maker’s private information under the expected utility maximization
model since
$$Q(C = 1 \mid Y^* = 1, W = w, X = x) = \mathbb{E}_Q[\,Q(C = 1 \mid V = v, W = w, X = x) \mid Y^* = 1, W = w, X = x\,]$$
$$Q(C = 1 \mid Y^* = 0, W = w, X = x) = \mathbb{E}_Q[\,Q(C = 1 \mid V = v, W = w, X = x) \mid Y^* = 0, W = w, X = x\,]$$
under the Information Set condition in Definition 1.2.3. In this sense, the direct imputation bounds
are related to classic approaches for modelling violations of unconfoundedness in causal inference,
which posit that there exist some unobserved characteristics V that govern selection and place
bounds on the magnitude of the relative odds ratio of the propensity score conditional on V and
the observable characteristics versus the propensity score conditional on just the observable
characteristics. See Imbens (2003), which develops a tractable parametric model for such a violation
of unconfoundedness in a treatment assignment problem. Kallus et al. (2018) and Yadlowsky et al.
(2020) derive bounds on the conditional average treatment effect and average treatment effect
under related models for violations of unconfoundedness, and provide methods for inference
on the derived bounds.
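The equality displayed in Proposition A.7.1 is an application of Bayes' rule within a (w, x) cell. As a quick numerical check (all probabilities below are made up), the relative odds ratio of the choice probabilities indeed equals the ratio of choice-conditional outcome probabilities:

```python
# q[c][y] = Q(C = c, Y* = y | W = w, X = x): a hypothetical joint pmf
# within one (w, x) cell, used only to verify the Bayes-rule identity.
q = {0: {0: 0.15, 1: 0.10}, 1: {0: 0.55, 1: 0.20}}

qC = {c: q[c][0] + q[c][1] for c in (0, 1)}        # Q(C = c)
qY1 = q[0][1] + q[1][1]                            # Q(Y* = 1)
qC_given_Y1 = {c: q[c][1] / qY1 for c in (0, 1)}   # Q(C = c | Y* = 1)
qY1_given_C = {c: q[c][1] / qC[c] for c in (0, 1)} # Q(Y* = 1 | C = c)

# Left side: relative odds ratio of selecting C = 0 conditional on Y* = 1.
lhs = (qC_given_Y1[0] / qC_given_Y1[1]) / (qC[0] / qC[1])
# Right side: ratio of choice-conditional outcome probabilities.
rhs = qY1_given_C[0] / qY1_given_C[1]
print(abs(lhs - rhs) < 1e-12)
```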
A.7.2 Proxy Outcomes
In some empirical applications, the researcher may observe an additional proxy outcome Ỹ ∈ 𝒴̃,
which does not suffer from the missing data problem but is correlated with the outcome Y* ∈ 𝒴.
By specifying bounds on the relationship between the proxy outcome Ỹ and the outcome Y*, the
researcher can construct bounds on the unobservable choice-dependent outcome probabilities.
Proxy outcomes are common in medical and consumer lending settings. For example, Mul-
lainathan and Obermeyer (2021) observe each patient’s longer term health outcomes regardless
of whether a stress test for a heart attack was conducted during a particular emergency room
visit. A patient’s longer term health outcomes are related to whether the patient actually had
a heart attack, no matter the testing decisions of doctors. Similarly, Chan et al. (2021) observe
whether each patient had a future pneumonia diagnosis within one week of an initial examination,
regardless of whether a doctor ordered an MRI at the initial examination. Future pneumonia
diagnoses may be a useful proxy for whether the doctor failed to correctly diagnose pneumonia
during the initial examination. In mortgage approvals, Blattner and Nelson (2021) observe each
loan applicant’s default performance on other credit products, regardless of whether each loan
applicant was approved for a mortgage. A loan applicant’s default performance on other credit
products is related to whether they would have defaulted on the mortgage.
(w, x, ỹ) ∈ 𝒲 × 𝒳 × 𝒴̃.
Over decisions in which the decision maker selected C = 1, the researcher observes the joint
distribution of the proxy outcome and the outcome, P(Ỹ = ỹ, Y* = y* | C = 1, W = w, X = x).
By placing assumptions on how the joint distribution of the proxy outcome and the outcome
conditional on C = 0 is bounded by the observable joint distribution of the proxy outcome and the
outcome conditional on C = 1, the researcher can construct bounds on the unobservable choice-
dependent outcome probabilities of the form given in Assumption 1.2.1. Let P_{Y*}(y* | ỹ, c, w, x) :=
P(Y* = y* | Ỹ = ỹ, C = c, W = w, X = x).
Assumption A.7.3 (Proxy Bounds). For each (w, x, ỹ) ∈ 𝒲 × 𝒳 × 𝒴̃ satisfying 0 < π_1(ỹ, w, x) < 1,
there exists κ_{ỹ,w,x} ≥ 0 satisfying P_{Y*}(1 | ỹ, 1, w, x) ≤ P_{Y*}(1 | ỹ, 0, w, x) ≤ (1 + κ_{ỹ,w,x}) P_{Y*}(1 | ỹ, 1, w, x).
Proposition A.7.2. Consider a binary screening decision in which Assumptions A.7.2-A.7.3 hold. For
each (w, x) ∈ 𝒲 × 𝒳,

$$\sum_{\tilde{y} \in \tilde{\mathcal{Y}}} P_{Y^*}(1 \mid \tilde{y}, 1, w, x)\, P_{\tilde{Y}}(\tilde{y} \mid 0, w, x) \;\le\; P_{Y^*}(1 \mid 0, w, x) \;\le\; \sum_{\tilde{y} \in \tilde{\mathcal{Y}}} (1 + \kappa_{\tilde{y},w,x})\, P_{Y^*}(1 \mid \tilde{y}, 1, w, x)\, P_{\tilde{Y}}(\tilde{y} \mid 0, w, x).$$
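The bounds in Proposition A.7.2 average the observed proxy-conditional risk over the proxy distribution among unselected decisions. A minimal Python sketch with hypothetical inputs for a binary proxy (the helper `proxy_bounds` is illustrative, not code from my analysis):

```python
import numpy as np

def proxy_bounds(p_y_given_proxy_c1, p_proxy_given_c0, kappa):
    """Proxy bounds in the spirit of Proposition A.7.2 (illustrative sketch):
    average the observed proxy-conditional risk P(Y* = 1 | ytilde, C = 1, w, x)
    over the proxy distribution P(ytilde | C = 0, w, x); the upper bound
    inflates each term by (1 + kappa_ytilde)."""
    lb = np.sum(p_y_given_proxy_c1 * p_proxy_given_c0)
    ub = np.sum((1.0 + kappa) * p_y_given_proxy_c1 * p_proxy_given_c0)
    return lb, min(ub, 1.0)

# Hypothetical inputs for a binary proxy Ytilde in {0, 1}:
p_y_given_proxy_c1 = np.array([0.05, 0.40])  # risk among selected, by proxy value
p_proxy_given_c0 = np.array([0.60, 0.40])    # proxy distribution among unselected
kappa = np.array([0.5, 0.5])                 # per-proxy sensitivity parameters
lb, ub = proxy_bounds(p_y_given_proxy_c1, p_proxy_given_c0, kappa)
print(lb, ub)
```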
The advantage of this approach is that it may be easier to express domain-specific knowledge
through the use of proxy outcomes. For example, in the mortgage approvals setting in Blattner and
Nelson (2021), the proxy bounds summarize the extent to which the mortgage default rate among
accepted applicants that also defaulted on other credit products differs from the counterfactual
mortgage default rate among rejected applicants that also defaulted on other credit products. In
the medical testing setting in Mullainathan and Obermeyer (2021), the proxy bounds summarize
the extent to which the heart attack rate among tested patients that went on to die within 30 days
of their emergency room visit differs from the heart attack rate among untested patients that went
on to die within 30 days of their emergency room visit.
A.8 Summary Figures and Tables for New York City Pretrial Release
Figure A.8: Histogram of the number of cases heard by each of the top 25 judges.
Notes: This figure plots a histogram of the number of cases heard per judge among the top 25 judges that are the focus of
my empirical analysis in the New York City pretrial release data. Every judge in the top 25 made at least 5,000 pretrial
release decisions over the sample period. See Section 1.5.2 for further details. Source: Rambachan and Ludwig (2021).
Figure A.9: Receiver-operating characteristic (ROC) curves for ensemble prediction functions
Notes: This figure plots the Receiver-Operating Characteristic (ROC) curves for the ensemble prediction that predicts
failure to appear among defendants that were released by the top 25 judges. It reports the ROC curve for the ensemble
prediction function constructed within race-by-age cells and race-by-felony charge cells separately. Age is binarized
into young and older defendants, where older defendants are defined as defendants older than 25 years. The ensemble
prediction function is constructed over cases heard by the remaining bail judges and evaluated out-of-sample on cases
heard by the top 25 judges. The ensemble prediction function averages the predictions of a random forest, which is
estimated using the R package ranger at the default hyperparameter values (Wright and Ziegler, 2017), and an
elastic net model, whose hyperparameters are tuned using three-fold cross-validation. The ROC curve plots the false
positive rate on the x-axis and the true positive rate on the y-axis. The out-of-sample area under the curve (AUC)
on all defendants equals 0.693 for the ensemble prediction function constructed over race-by-age cells and 0.694 for
the ensemble prediction function constructed over race-by-felony cells. See Section 1.5.3 for further details. Source:
Rambachan and Ludwig (2021).
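The ensemble itself is fit in R, as described in the notes above. Purely as an illustration of the two summary steps (averaging the two models' predictions and computing the area under the ROC curve), the following Python sketch evaluates the AUC in its rank (Mann-Whitney) form on made-up predictions and outcomes:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve in its Mann-Whitney rank form: the probability
    that a randomly drawn positive case outranks a randomly drawn negative one,
    counting ties as one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (pos.size * neg.size)

# Stand-ins for the two models' out-of-sample risk predictions (made-up numbers,
# not estimates from the pretrial release data):
y = np.array([1, 1, 0, 0, 1, 0])                        # observed FTA outcomes
pred_forest = np.array([0.8, 0.6, 0.3, 0.4, 0.7, 0.2])  # random forest stand-in
pred_elastic = np.array([0.7, 0.5, 0.4, 0.3, 0.6, 0.1]) # elastic net stand-in
ensemble = 0.5 * (pred_forest + pred_elastic)           # average the two predictions
print(auc(ensemble, y))
```

An AUC of 0.5 corresponds to uninformative predictions and 1.0 to perfect ranking of defendants by risk.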
All Defendants White Defendants Black Defendants
Estimation Top Estimation Top Estimation Top
Sample Judges Sample Judges Sample Judges
(1) (2) (3) (4) (5) (6)
Released before trial 0.720 0.736 0.757 0.777 0.687 0.699
Defendant Characteristics
White 0.475 0.481 1.000 1.000 0.000 0.000
Female 0.173 0.173 0.154 0.152 0.190 0.192
Age at Arrest 31.95 31.75 32.03 31.88 31.87 31.63
Arrest Charge
Number of Charges 1.152 1.167 1.187 1.217 1.119 1.121
Felony Charge 0.372 0.367 0.367 0.356 0.376 0.377
Any Drug Charge 0.253 0.224 0.253 0.217 0.253 0.230
Any DUI Charge 0.047 0.049 0.070 0.072 0.027 0.027
Any Violent Crime Charge 0.375 0.395 0.358 0.379 0.390 0.410
Property Charge 0.130 0.132 0.122 0.123 0.138 0.140
Defendant Priors
Any FTA 0.516 0.497 0.443 0.419 0.582 0.570
Number of FTAs 2.177 2.034 1.633 1.492 2.670 2.537
Any Misdemeanor Arrest 0.683 0.667 0.615 0.596 0.744 0.734
Any Misdemeanor Conviction 0.383 0.368 0.334 0.315 0.427 0.418
Any Felony Arrest 0.581 0.566 0.503 0.482 0.652 0.644
Any Felony Conviction 0.285 0.271 0.234 0.215 0.331 0.323
Any Violent Felony Arrest 0.398 0.387 0.306 0.292 0.481 0.476
Any Violent Felony Conviction 0.119 0.114 0.084 0.078 0.150 0.147
Total Cases 569,256 243,118 270,704 117,073 298,552 126,045
Table A.2: Summary statistics comparing the main estimation sample and cases heard by the top 25 judges, broken out by defendant
race.
Notes: This table provides summary statistics about defendant and case characteristics for the main estimation sample and cases heard by the top 25 judges in the
NYC pretrial release data for all defendants and separately by defendant race. See Section 1.5.2 for further discussion. Source: Rambachan and Ludwig (2021).
Released Detained
All Defendants Defendants Defendants
Estimation Top Estimation Top Estimation Top
Sample Judges Sample Judges Sample Judges
(1) (2) (3) (4) (5) (6)
Released before trial 0.720 0.736 1.000 1.000 0.000 0.000
Defendant Characteristics
White 0.475 0.481 0.499 0.508 0.412 0.407
Female 0.173 0.173 0.199 0.197 0.107 0.106
Age at Arrest 31.95 31.75 31.22 31.20 33.82 33.29
Arrest Charge
Number of Charges 1.152 1.167 1.148 1.162 1.161 1.182
Felony Charge 0.372 0.367 0.288 0.288 0.588 0.586
Any Drug Charge 0.253 0.224 0.229 0.204 0.314 0.279
Any DUI Charge 0.047 0.049 0.062 0.063 0.010 0.010
Any Violent Crime Charge 0.375 0.395 0.388 0.409 0.341 0.355
Property Charge 0.130 0.132 0.115 0.114 0.171 0.181
Defendant Priors
Any FTA 0.516 0.497 0.409 0.395 0.793 0.784
Number of FTAs 2.177 2.034 1.362 1.295 4.284 4.103
Any Misdemeanor Arrest 0.683 0.667 0.610 0.598 0.871 0.863
Any Misdemeanor Conviction 0.383 0.368 0.284 0.278 0.637 0.621
Any Felony Arrest 0.581 0.566 0.487 0.477 0.824 0.814
Any Felony Conviction 0.285 0.271 0.200 0.194 0.505 0.487
Any Violent Felony Arrest 0.398 0.387 0.315 0.309 0.614 0.608
Any Violent Felony Conviction 0.119 0.114 0.081 0.080 0.216 0.210
Total Cases 569,256 243,118 410,394 179,143 158,862 63,975
Table A.3: Summary statistics for released and detained defendants in the main estimation sample and for cases heard by the top 25
judges.
Notes: This table provides summary statistics about defendant and case characteristics for the main estimation sample and the cases heard by the top 25 judges in the
NYC pretrial release data for all defendants and by whether the defendant was released or detained. See Section 1.5.2 for further discussion. Source: Rambachan
and Ludwig (2021).
All Defendants White Defendants Black Defendants
Total Cases 410,394 179,143 205,174 91,026 205,220 88,117
Table A.4: Summary statistics of misconduct rates among released defendants in the main estimation sample and cases heard by the
top 25 judges.
Notes: This table summarizes the observed misconduct rates among released defendants for the main estimation sample and the cases heard by the top 25 judges in
the New York City pretrial release data for all defendants and separately by the race of the defendant. See Section 1.5.2 for further discussion. Source: Rambachan
and Ludwig (2021).
All Defendants White Defendants Black Defendants
Full Estimation Full Estimation Full Estimation
Sample Sample Sample Sample Sample Sample
(1) (2) (3) (4) (5) (6)
Released before trial 0.736 0.720 0.765 0.757 0.691 0.687
Defendant Characteristics
White 0.457 0.475 1.000 1.000 0.000 0.000
Female 0.169 0.173 0.153 0.154 0.184 0.190
Age at Arrest 32.06 31.95 32.06 32.03 31.88 31.87
Arrest Charge
Number of Charges 1.165 1.152 1.176 1.187 1.114 1.119
Felony Charge 0.335 0.372 0.332 0.367 0.346 0.376
Any Drug Charge 0.244 0.253 0.251 0.253 0.252 0.253
Any DUI Charge 0.053 0.047 0.074 0.070 0.028 0.027
Any Violent Crime Charge 0.365 0.375 0.348 0.358 0.380 0.390
Property Charge 0.135 0.130 0.127 0.122 0.145 0.138
Defendant Priors
Any FTA 0.499 0.516 0.442 0.443 0.586 0.582
Number of FTAs 2.099 2.177 1.635 1.633 2.707 2.670
Any Misdemeanor Arrest 0.668 0.683 0.616 0.615 0.747 0.744
Any Misdemeanor Conviction 0.371 0.383 0.335 0.334 0.430 0.427
Any Felony Arrest 0.565 0.581 0.502 0.503 0.654 0.652
Any Felony Conviction 0.273 0.285 0.232 0.234 0.334 0.331
Any Violent Felony Arrest 0.384 0.398 0.306 0.306 0.484 0.481
Any Violent Felony Conviction 0.114 0.119 0.084 0.084 0.152 0.150
Total Cases 758,027 569,256 347,006 270,704 370,793 298,552
Table A.5: Summary statistics in the universe of all cases subject to a pretrial release decision and main estimation sample in the NYC
pretrial release data, broken out by defendant race.
Notes: This table provides summary statistics about defendant and case characteristics for the sample of all cases subject to a pretrial release decision and the main
estimation sample in the NYC pretrial release data, broken out for all defendants and by defendant race. See Section 1.5.2 for further discussion. Source: Rambachan
and Ludwig (2021).
All Defendants Released Defendants Detained Defendants
Full Estimation Full Estimation Full Estimation
Sample Sample Sample Sample Sample Sample
(1) (2) (3) (4) (5) (6)
Released before trial 0.736 0.720 1.000 1.000 0.000 0.000
Defendant Characteristics
White 0.457 0.475 0.476 0.499 0.406 0.412
Female 0.169 0.173 0.192 0.199 0.105 0.107
Age at Arrest 32.06 31.95 31.41 31.22 33.90 33.82
Arrest Charge
Number of Charges 1.165 1.152 1.166 1.148 1.161 1.161
Felony Charge 0.335 0.372 0.258 0.288 0.549 0.588
Any Drug Charge 0.244 0.253 0.221 0.229 0.307 0.314
Any DUI Charge 0.053 0.047 0.068 0.062 0.011 0.010
Any Violent Crime Charge 0.365 0.375 0.377 0.388 0.333 0.341
Property Charge 0.135 0.130 0.119 0.115 0.179 0.171
Defendant Priors
Any FTA 0.499 0.516 0.394 0.409 0.792 0.793
Number of FTAs 2.099 2.177 1.310 1.362 4.304 4.284
Any Misdemeanor Arrest 0.668 0.683 0.596 0.610 0.870 0.871
Any Misdemeanor Conviction 0.371 0.383 0.275 0.284 0.639 0.637
Any Felony Arrest 0.565 0.581 0.472 0.487 0.823 0.824
Any Felony Conviction 0.273 0.285 0.190 0.200 0.503 0.505
Any Violent Felony Arrest 0.384 0.398 0.302 0.315 0.612 0.614
Any Violent Felony Conviction 0.114 0.119 0.077 0.081 0.216 0.216
Total Cases 758,027 569,256 558,167 410,394 199,860 158,862
Table A.6: Summary statistics for released and detained defendants in the universe of all cases subject to a pretrial release decision and
the main estimation sample in the NYC pretrial release data.
Notes: This table provides summary statistics about defendant and case characteristics for the sample of all cases subject to a pretrial release decision and the main
estimation sample in the NYC pretrial release data, broken out for all defendants and by whether the defendant was released or detained. See Section 1.5.2 for
further discussion. Source: Rambachan and Ludwig (2021).
Table A.7: Balance check estimates for the quasi-random assignment of judges for all defendants
and by defendant race.
White Black
All Defendants Defendants Defendants
(1) (2) (3)
Defendant Characteristics
Black ´0.00011
(0.00008)
Female 0.000003 0.00005 ´0.00003
(0.00013) (0.00017) (0.00017)
Age ´0.00001 ´0.00002 ´0.000002
(0.000003) (0.00001) (0.000004)
Arrest Charge
Number of Charges ´0.000003 ´0.000003 0.000003
(0.00001) (0.00001) (0.00003)
Felony Charge 0.00009 ´0.00012 0.00027
(0.00015) (0.00017) (0.00018)
Any Drug Charge ´0.00012 ´0.00010 ´0.00013
(0.00013) (0.00018) (0.00016)
Any Violent Crime Charge ´0.00004 ´0.00013 0.00004
(0.00010) (0.00015) (0.00014)
Any Property Charge ´0.00033 ´0.00029 ´0.00035
(0.00016) (0.00019) (0.00025)
Any DUI Charge 0.00044 0.00039 0.00028
(0.00024) (0.00027) (0.00039)
Defendant Priors
Prior FTA ´0.00004 ´0.00011 0.00003
(0.00010) (0.00016) (0.00013)
Prior Misdemeanor Arrest 0.00007 0.00003 0.00011
(0.00010) (0.00013) (0.00015)
Prior Felony Arrest 0.00006 0.00003 0.00009
(0.00014) (0.00021) (0.00019)
Prior Violent Felony Arrest ´0.00013 ´0.00008 ´0.00016
(0.00011) (0.00019) (0.00016)
Prior Misdemeanor Conviction 0.00016 0.00021 0.00011
(0.00013) (0.00017) (0.00016)
Prior Felony Conviction ´0.00019 0.00011 ´0.00040
(0.00012) (0.00018) (0.00015)
Prior Violent Felony Conviction ´0.00008 ´0.00024 0.00002
(0.00015) (0.00021) (0.00020)
Joint p-value 0.06953 0.15131 0.41840
Court × Time FE ✓ ✓ ✓
Cases 569,256 270,704 298,552
Notes: This table reports OLS estimates for regressions of constructed judge leniency on defendant and case characteris-
tics in the main estimation sample. These regressions are estimated over all defendants and separately by defendant
race. Standard errors (in parentheses) are clustered at the defendant-judge level. The joint p-value is based on the
F-statistic for whether all defendant and case characteristics are jointly significant. See Section 1.5.3 for further details.
Source: Rambachan and Ludwig (2021).
Table A.8: Balance check estimates for the quasi-random assignment of judges by defendant race
and age.
A.9 Additional Empirical Results for New York City Pretrial Release
I now present additional empirical results on the behavior of judges in the New York City pretrial
release system.
Section 1.6.2 of the main text compared the total expected social welfare under the observed
release decisions by judges in New York City against the total expected social welfare under
counterfactual algorithmic decisions, conducting this exercise over race-by-age cells and deciles
of predicted failure to appear risk. In this section of the Supplement, I report the results of the
same analysis over race-by-felony charge cells and deciles of predicted failure to appear risk.
Figure A.10a plots the improvement in worst-case total expected social welfare under the algo-
rithmic decision rule that fully replaces judges who were found to make detectable prediction
mistakes against the observed release decisions of these judges.
For most values of the social welfare function, the algorithmic decision rule dominates the
observed choices of these judges, but for social welfare costs of unnecessary detentions ranging
over U ˚ p0, 0q P r0.3, 0.7s, the algorithmic decision rule either leads to no improvement or strictly
lowers worst-case expected total social welfare relative to the judges’ observed decisions.
Figure A.10b therefore plots the improvement in worst-case total expected social welfare under
the algorithmic decision rule that only corrects detectable prediction mistakes at the tails of the
predicted failure to appear risk distribution against the observed release decisions of these judges.
As found in the main text, the algorithmic decision rule that only corrects detectable prediction
mistakes appears to weakly dominate the observed release decisions of judges, no matter the
relative social welfare cost of an unnecessary detention.
Automating Judges Who Do Not Make Prediction Mistakes
I next compare welfare effects of automating the release decisions of judges whose choices were
found to be consistent with expected utility maximization behavior at accurate beliefs about
failure to appear risk.

Figure A.10: Ratio of total expected social welfare under algorithmic decision rules relative to
observed release decisions of judges that make detectable prediction mistakes over race-by-felony
charge cells.
Notes: This figure reports the change in worst-case total expected social welfare under two algorithmic decision rules
against the judges' observed release decisions among judges who were found to make detectable prediction mistakes.
Worst-case total expected social welfare under each decision rule is computed by first constructing a 95% confidence
interval for total expected social welfare under the decision rule, and reporting the smallest value that lies in the
confidence interval. These decision rules are constructed and evaluated over race-by-felony cells and deciles of predicted
failure to appear risk. The x-axis plots the relative social welfare cost U*(0, 0) of detaining a defendant that would not
fail to appear in court (i.e., an unnecessary detention). The solid line plots the median change across judges that make
mistakes, and the dashed lines report the minimum and maximum change across judges. See Section 1.6.2 of the main
text and Supplement A.9.1 for further details. Source: Rambachan and Ludwig (2021).

Figure A.11 plots the improvement in worst-case total expected social
welfare under the algorithmic decision rule that fully replaces these judges against their observed
release decisions. As in the main text, I find that automating these judges' release decisions may
strictly lower worst-case expected total social welfare for a large range of social welfare costs of
unnecessary detentions.
Figure A.11: Ratio of total expected social welfare under full automation decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes over race-by-felony
charge cells.
Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decision-making against the judges' observed release decisions among judges whose choices were
consistent with expected utility maximization behavior at accurate beliefs about failure to appear risk. Worst-case
total expected social welfare under each decision rule is computed by first constructing a 95% confidence interval for
total expected social welfare under the decision rule, and reporting the smallest value that lies in the confidence interval.
These decision rules are constructed and evaluated over race-by-felony cells and deciles of predicted risk. The x-axis
plots the relative social welfare cost U*(0, 0) of detaining a defendant that would not fail to appear in court (i.e., an
unnecessary detention). The solid line plots the median change across these judges, and the dashed lines report the
minimum and maximum change. See Section 1.6.2 of the main text and Supplement A.9.1 for further details. Source:
Rambachan and Ludwig (2021).
A.9.2 Identifying Prediction Mistakes: Direct Imputation
Sections 1.4-1.5 of the main text tested whether the pretrial release decisions of judges in New
York City were consistent with expected utility maximization behavior at accurate beliefs about
failure to appear risk under various exclusion restrictions on their preferences by constructing
bounds on the failure to appear rate of detained defendants using the quasi-random assignment
of judges.
I now show how the same test may be conducted using direct imputation (Supplement A.7.1)
to construct bounds on the failure to appear rate of detained defendants. The key input in
constructing the direct imputation bounds is the parameter κ_{w,d} ≥ 0 for each value W = w,
D_w(X) = d, which bounds the failure to appear rate among detained defendants relative to the
observed failure to appear rate among released defendants. I assume that κ_{w,d} ≡ κ does not vary
across values W = w, D_w(X) = d and conduct my analysis under various choices of the common
parameter κ. This exercise amounts to a sensitivity analysis on the informativeness of the judges'
private information: how do conclusions about behavior change as we allow judges to have more
accurate private information?
What Fraction of Judges Make Prediction Mistakes? Using the direct imputation bounds, I
test whether the release decisions of each judge in the top 25 are consistent with expected utility
maximization behavior at strict preferences under preferences that (i) do not depend on any
observable characteristics, (ii) depend on only the defendant's race, (iii) depend on both the
defendant's race and age, or (iv) depend on the defendant's race and whether the defendant
was charged with a felony offense. I test the inequalities in Corollary 1.3.4 across deciles of
predicted risk within each possible W cell. As in the main text, I include inequalities that compare
the observed failure to appear rate among released defendants at predicted risk deciles six to ten
against the direct imputation bounds on the failure to appear rate among detained defendants
at predicted risk deciles one to five. I construct the variance-covariance matrix of the moments
using the empirical bootstrap conditional on the observable characteristics W and predicted risk
decile Dw pXq on the cases assigned to a particular judge. Figure A.12 reports the fraction of
judges in the top 25 for whom we can reject expected utility maximization behavior at strict
preferences under various assumptions on which observable characteristics W affect the utility
function. The adjusted rejection rate reports the fraction of rejections after correcting for multiple
hypothesis testing using the Holm-Bonferroni step-down procedure, which controls the family-wise
error rate at the 5% level.
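The Holm-Bonferroni step-down correction referenced here is simple to implement directly. A sketch in Python, with hypothetical per-judge p-values (the helper `holm_reject` is illustrative and is only the multiple-testing correction, not the conditional least-favorable hybrid test itself):

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down: compare the k-th smallest p-value to
    alpha / (m - k) (0-indexed) and reject until the first failure.
    Controls the family-wise error rate at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvalues[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # the step-down procedure stops at the first non-rejection
    return reject

# Hypothetical per-judge p-values for five judges (illustrative only):
pvals = [0.001, 0.004, 0.03, 0.02, 0.6]
rejections = holm_reject(pvals)
print(sum(rejections) / len(pvals))  # the adjusted rejection rate
```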
Figure A.12: Fraction of judges whose release decisions are inconsistent with expected utility
maximization behavior at accurate beliefs about failure to appear risk using direct imputation
bounds.
Notes: This figure summarizes the results for testing whether the release decisions of each judge in the top 25 are
consistent with expected utility maximization behavior at strict preference utility Upc, y˚ ; wq that (i) do not depend on
any observable characteristics, (ii) depend on the defendant’s race, (iii) depend on both the defendant’s race and age,
and (iv) depend on both the defendant’s race and whether the defendant was charged with a felony offense. Bounds on
the failure to appear rate among detained defendants are constructed using direct imputation (see Supplement A.7.1)
for κ “ t0, 1, . . . , 10u. I first construct the unadjusted rejection rate by testing whether the pretrial release decisions
of each judge in the top 25 are consistent with the moment inequalities in Corollary 1.3.4 at the 5% level using the
conditional least-favorable hybrid test. I construct the variance-covariance matrix of the moments using the empirical
bootstrap conditional on the payoff-relevant characteristics W and the predicted risk decile Dw pXq. The adjusted
rejection rate reports the fraction of rejections after correcting for multiple hypothesis testing using the Holm-Bonferroni
step-down procedure, which controls the family-wise error rate at the 5% level. This is the same procedure described in
Section 1.5.4. Source: Rambachan and Ludwig (2021).
What Types of Prediction Mistakes are Being Made? I next apply the identification results in Section 1.4 to analyze the types of prediction mistakes based on observable characteristics made by judges in the New York City pretrial release data, constructing bounds on the unobservable failure to appear rate among detained defendants using direct imputation. Figure A.13a reports 95% confidence intervals for the identified set of values δ(w, d)/δ(w, d′) between the highest decile d and lowest decile d′ of predicted risk within each race-by-age W cell using the direct imputation bounds with κ = 2. Figure A.13b plots the 95% confidence intervals for the identified set on the same object within each race-by-felony charge W cell. As found in Section 1.5.5, judges appear to underreact to predictable variation in failure to appear risk. Whenever these bounds are informative, they lie strictly below one.
Figure A.13: 95% confidence intervals for the implied prediction mistake of failure to appear risk between the highest and lowest predicted failure to appear risk deciles using direct imputation bounds with κ = 2.
Notes: This figure plots the 95% confidence interval for the identified set on the implied prediction mistake δ(w, d)/δ(w, d′) between the highest predicted failure to appear risk decile d and the lowest predicted failure to appear risk decile d′ within each race-by-age cell and race-by-felony charge cell. The bounds on the failure to appear rate among detained defendants are constructed using direct imputation with κ = 2 (Section A.7.1) and for each judge in the top 25 whose choices are inconsistent with expected utility maximization behavior at these bounds (Figure A.12). These confidence intervals are constructed by first constructing a 95% joint confidence interval for a judge's reweighted utility thresholds τ(w, d), τ(w, d′) using test inversion based on the moment inequalities in Theorem 1.4.2, and then computing the implied prediction mistake δ(w, d)/δ(w, d′) associated with each pair τ(w, d), τ(w, d′) in the joint confidence set (Corollary 1.4.1). See Section 1.4.2 for theoretical details on the implied prediction mistake. Source: Rambachan and Ludwig (2021).
A.9.3 Defining the Outcome to be Any Pretrial Misconduct
Section 1.5 of the main text considered an extension to my baseline empirical results on detectable
prediction mistakes in the New York City pretrial release system that defined the outcome of
interest to be whether a defendant would commit “any pretrial misconduct” (i.e., either fail to
appear in court or be re-arrested for a new crime). I now characterize the extent to which judges' predictions of any pretrial misconduct are systematically biased using the identification results in Section 1.4.
What Types of Prediction Mistakes are Being Made? Figure A.14a reports 95% confidence intervals for the identified set of the implied prediction mistake δ(w, d)/δ(w, d′) between the highest decile d and lowest decile d′ of predicted pretrial misconduct risk within each race-by-age W cell. Figure A.14b plots the 95% confidence intervals for the identified set on the same object within each race-by-felony charge W cell. As before, judges appear to underreact to predictable variation in pretrial misconduct risk between defendants at the tails of the pretrial misconduct risk distribution. Whenever these bounds are informative, they lie strictly below one.
Furthermore, among judges whose choices are inconsistent with expected utility maximization
behavior at accurate beliefs about pretrial misconduct risk, Table A.10 reports the location of
the studentized maximal violation of the revealed preference inequalities and shows the fraction
of judges for whom the maximal violation occurs over the tails of the predicted distribution
(deciles 1-2, 9-10) or the middle of the predicted risk distribution (deciles 3-8) for black and
white defendants respectively. I again find that maximal violations of the revealed preference
inequalities mainly occur over defendants that lie in the tails of the predicted risk distribution, and
furthermore the majority occur over black defendants at the tails of the predicted risk distribution.
Figure A.14: 95% confidence intervals for the implied prediction mistakes of any pretrial miscon-
duct risk between the highest and lowest predicted any pretrial misconduct risk deciles
Notes: This figure plots the 95% confidence interval for the identified set on δ(w, d)/δ(w, d′) between the highest predicted any pretrial misconduct risk decile d and the lowest predicted any pretrial misconduct risk decile d′ within each race-by-age cell and race-by-felony charge cell. The outcome Y* is whether the defendant would commit any pretrial misconduct upon release (i.e., either fail to appear in court or be re-arrested for a new crime). Bounds on the any pretrial misconduct rate among detained defendants are constructed using the judge leniency instrument (see Section 1.5.3). Confidence intervals are constructed for each judge whose choices are inconsistent with expected utility maximization at these bounds (Table A.1). These confidence intervals are constructed by first constructing a 95% joint confidence interval for a judge's reweighted utility thresholds τ(w, d), τ(w, d′) using test inversion based on the moment inequalities in Theorem 1.4.2, and then computing the implied prediction mistake δ(w, d)/δ(w, d′) associated with each pair τ(w, d), τ(w, d′) in the joint confidence set (Corollary 1.4.1). See Section 1.4.2 for theoretical details on the implied prediction mistake. Source: Rambachan and Ludwig (2021).
Table A.10: Location of the maximum studentized violation of revealed preference inequalities
among judges whose release decisions are inconsistent with expected utility maximization behavior
at accurate beliefs about any pretrial misconduct risk.
                      Race + Age    Race + Felony Charge
White Defendants
  Middle Deciles         0.00%             0.00%
  Tail Deciles           4.76%             4.16%
Black Defendants
  Middle Deciles         9.52%            16.66%
  Tail Deciles          85.71%            79.16%
Notes: This table summarizes the location of the maximum studentized violation of revealed preference inequalities
among judges whose release decisions are inconsistent with expected utility maximization behavior at accurate beliefs
about pretrial misconduct risk and preferences that depend on both the defendant’s race and age as well as the
defendant’s race and whether the defendant was charged with a felony. Bounds on the failure to appear rate among
detained defendants are constructed using the judge leniency instrument (see Section 1.5.3). Among judges whose release decisions violate the revealed preference inequalities at the 5% level, I report the fraction of judges for whom the
maximal studentized violation occurs among white and black defendants at the tails of the any pretrial misconduct
predicted risk distribution (deciles 1-2, 9-10) and at the middle of the any pretrial misconduct predicted risk distribution
(deciles 3-8). The outcome Y ˚ is whether the defendant would commit any pretrial misconduct upon release (i.e., either
fail to appear in court or be re-arrested for a new crime). Source: Rambachan and Ludwig (2021).
A.9.4 Alternative Pretrial Release Definition
In Section 1.5 of the main text, I tested whether the pretrial release decisions of judges in New York
City were consistent with expected utility maximization behavior at accurate beliefs about failure
to appear risk under various exclusion restrictions on their preferences. To do so, I collapsed the
pretrial release decision into a binary choice of simply whether to release or detain the defendant.
However, in practice, judges in New York City choose what bail conditions and monetary
amount to set for a defendant. Defendants may either be “released on recognizance” (i.e.,
automatically released without any bail conditions) or the judge may set some monetary bail, in
which case the defendant is only released if they can post the set bail amount. To account for this, I apply the identification results in Section 1.2 to analyze this
modified decision problem. Let C ∈ {0, 1} denote whether the judge chose "release on recognizance" (C = 1). Let W ∈ W denote the directly payoff-relevant defendant characteristics and X ∈ X denote the excluded defendant characteristics as before. The latent outcome is now defined as the pair (R*, Y*), where R* ∈ {0, 1} denotes whether the defendant would satisfy the monetary bail conditions set by the judge (i.e., would the defendant be able to pay the bail amount set by the judge?) and Y* ∈ {0, 1} is whether the defendant would fail to appear in court if released. Let R ∈ {0, 1} denote the observed release decision. The observed release decision satisfies R = C + (1 − C)R*, meaning that the defendant is released if the judge selects release on recognizance or if the judge sets monetary bail conditions and the defendant satisfies them.
I assume that the judge's utility function takes the same form as in Section 1.3 of the main text. The judge incurs a cost if a released defendant goes on to fail to appear in court or if a detained defendant would not have failed to appear in court. That is, I consider the set of utility functions satisfying U(c, r*, y*; w) = U(r, y*; w), U(0, 1; w) = 0, U(1, 0; w) = 0 and U(0, 0; w) < 0, U(1, 1; w) < 0.
I apply Theorem 1.2.1 to derive conditions under which the judge's choices are consistent with expected utility maximization behavior at accurate beliefs about both failure to appear risk and the ability of defendants to meet the bail conditions. As in the main text, for each w ∈ W, define
Proposition A.9.1. Assume P_{Y*}(1 | 1, w, x) < 1 for all (w, x) ∈ W × X with π₁(w, x) > 0 and
choices are consistent with expected utility maximization behavior at some strict preference utility function
Proof. The inequalities in Theorem 1.2.1 imply that the judge's choices are consistent with expected utility maximization behavior only if

P(Y* = 1 | C = 1, W = w, X = x) ≤ −U(0, 0; w) / (−U(0, 0; w) − U(1, 1; w)).
The judge's choices are inconsistent with expected utility maximization behavior at accurate beliefs about both failure to appear risk and the ability of defendants to satisfy the monetary bail conditions if and only if the maximal failure to appear rate among defendants that were released on recognizance is less than the minimal bound on the failure to appear rate among defendants that could not satisfy their monetary bail conditions. The same dimension reduction techniques from the main text may be applied to reduce the number of moment inequalities that must be tested.
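To make the bound above concrete, the implied threshold −U(0, 0; w)/(−U(0, 0; w) − U(1, 1; w)) can be computed directly for hypothetical disutilities. The numbers below are illustrative, not estimated preferences from the paper.

```python
def release_threshold(u00, u11):
    """Upper bound on the failure to appear rate compatible with expected
    utility maximization, where u00 = U(0,0;w) < 0 is the disutility of
    detaining a defendant who would appear and u11 = U(1,1;w) < 0 is the
    disutility of releasing a defendant who fails to appear."""
    assert u00 < 0 and u11 < 0
    return -u00 / (-u00 - u11)

# Symmetric disutilities imply a threshold of one half; weighting failures
# to appear three times as heavily lowers the tolerated risk.
print(release_threshold(-1.0, -1.0))  # 0.5
print(release_threshold(-1.0, -3.0))  # 0.25
```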
To apply this identification result, I test whether the implied revealed preference inequalities
in Proposition A.9.1 are satisfied over the deciles of predicted failure to appear risk that were
constructed in Section 1.5 of the main text. I use the quasi-random assignment of judges to cases
to construct bounds on the unobservable failure to appear rate among detained defendants that
could not satisfy their monetary bail conditions. The only modification is that I now estimate the
observed failure to appear rate among defendants that were released on recognizance.
The results are summarized in Table A.11 below. I find that at least 32% of judges make detectable prediction mistakes about failure to appear risk and the ability of defendants to satisfy their monetary bail conditions. A large fraction of judges are therefore making mistakes in the joint predictions underlying their choice between release on recognizance and monetary bail.
Table A.11: Estimated lower bound on the fraction of judges whose “release on recognizance”
decisions are inconsistent with expected utility maximization behavior at accurate beliefs about
behavior under bail conditions and failure to appear risk given defendant characteristics.
Utility Functions
                          No Characteristics    Race    Race + Age    Race + Felony Charge
Adjusted Rejection Rate          32%             32%        32%              52%
Notes: This table summarizes the results of the robustness exercise to assess whether the “release on recognizance” vs
monetary bail decisions of judges are consistent with expected utility maximization behavior at strict preference utility
functions that either (i) do not depend on any characteristics, (ii) depend on the defendant’s race, (iii) depend on both
the defendant’s race and age, and (iv) depend on both the defendant’s race and whether the defendant was charged
with a felony offense. The outcome is defined to be whether the defendant would be released under the chosen bail conditions (i.e., either the judge decides to release the defendant on recognizance or the defendant satisfies the bail conditions set by the judge) and whether the defendant would fail to appear if released. I first construct the unadjusted rejection rate by testing whether the
pretrial release decisions of each judge in the top 25 are consistent with the moment inequalities in Corollary 1.3.4 at
the 5% level using the conditional least-favorable hybrid test, following the same procedure described in Section 1.5.4. The adjusted rejection rate reports the fraction of rejections after multiple hypothesis correction using the Holm-Bonferroni step-down procedure. See Section A.9.4 for discussion. Source: Rambachan and Ludwig (2021).
Appendix B
Appendix to Chapter 2
To prove this result, we begin by rewriting E[Y_{j,t+h} 1{W_{k,t} = w_k}]. Notice that

E[Y_{j,t+h} 1{W_{k,t} = w_k}]

taking the difference, and (iii) applying the definition of Y_{j,t+h}(w_k). □
B.1.2 Proof of Theorem 2.3.3
The style of proof extends Angrist et al. (2000) in their analysis of the Wald estimand in a cross-sectional setting.
where (1) and (2) follow since W_{k,t} ⊥⊥ {Y_{t+h}(w_k) : w_k ∈ W_k}. Interchanging the order of the
so

Var(W_{k,t}) = E[(W_{k,t} − w̲_k)(W_{k,t} − E[W_{k,t}])] = ∫_{w̲_k}^{w̄_k} E[1{w̃_k ≤ W_{k,t}}(W_{k,t} − E[W_{k,t}])] dw̃_k.

The result then follows immediately. To see that the resulting weights are non-negative, observe
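The integral representation of Var(W_{k,t}) above can be verified numerically. The following sketch is my own check under an assumed uniform treatment on [0, 1], not code from the paper; it compares the integral of the weights to the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, size=500_000)   # stand-in for W_{k,t} with support [0, 1]
grid = np.linspace(0.0, 1.0, 401)         # grid of w-tilde values over the support

# E[ 1{w_tilde <= W} (W - E[W]) ] evaluated at each grid point
integrand = np.array([np.mean((w <= W) * (W - W.mean())) for w in grid])

# Trapezoidal integration of the weights over the support
dx = grid[1] - grid[0]
integral = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dx))

print(integral, W.var())  # both approximately 1/12
```

For a uniform treatment both sides equal 1/12, which the simulation recovers to Monte Carlo accuracy.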
B.1.3 Proof of Theorem 2.3.4
The proof is analogous to the proof of Theorem 2.3.1. We start by rewriting E[Y_{j,t+h} 1{W_{k,t} = w_k} | F_{t-1}]:

E[Y_{j,t+h} 1{W_{k,t} = w_k} | F_{t-1}]
  = E[Y_{j,t+h}(w^obs_{1:t-1}, w_k, W_{-k,t}, W_{t+1:t+h}) 1{W_{k,t} = w_k} | F_{t-1}]
  = E[Y_{j,t+h}(w^obs_{1:t-1}, w_k, W_{-k,t}, W_{t+1:t+h}) | F_{t-1}] E[1{W_{k,t} = w_k} | F_{t-1}]
    + Cov(Y_{j,t+h}(w^obs_{1:t-1}, w_k, W_{-k,t}, W_{t+1:t+h}), 1{W_{k,t} = w_k} | F_{t-1}),

the difference, and (iii) applying the definition of the potential outcome Y_{j,t+h}(w_k). □
= E[Y_{j,t+h}(w^obs_{1:t-1}, W_{k,t}(z), W_{-k,t}, W_{t+1:t+h})]

by (iii). Therefore,

E[Y_{j,t+h} | Z_t = z] − E[Y_{j,t+h} | Z_t = z′]
  = E[Y_{j,t+h}(w^obs_{1:t-1}, W_{k,t}(z), W_{-k,t}, W_{t+1:t+h}) − Y_{j,t+h}(w^obs_{1:t-1}, W_{k,t}(z′), W_{-k,t}, W_{t+1:t+h})].
where we used the definition Y_{j,t+h}(w_k) := Y_{j,t+h}(W_{1:t-1}, w_{k,t}, W_{-k,t}, W_{t+1:t+h}). Finally, assuming
We may apply the same argument to the denominator (again assuming that we can exchange the

E[W_{k,t} | Z_t = z] − E[W_{k,t} | Z_t = z′] = E[W_{k,t}(z) − W_{k,t}(z′)] = ∫_W E[1{W_{k,t}(z′) ≤ w_k ≤ W_{k,t}(z)}] dw_k.
The proof is the same as the Proof of Theorem 2.5.1, except we must now condition on Ft´1
throughout. l
E[Y_{j,t+h} | (Y_{k,t} = y_k), F^Y_{t-1}]
  = E[Y_{j,t+h}(W_{1:t}) | (Y_{k,t} = y_k), F^Y_{t-1}],   Assumption (i)
  = E[ E[Y_{j,t+h}(W_{1:t+h}) | (Y_{k,t} = y_k), W_{1:t}, F^Y_{t-1}] | (Y_{k,t} = y_k), F^Y_{t-1}],   Adam's law
  = E[ E[Y_{j,t+h}(W_{1:t+h}) | W_{1:t}] | (Y_{k,t} = y_k), F^Y_{t-1}],   Assumption (i)
  = E[ ψ_{j,t+h}(W_{1:t}) | (Y_{k,t} = y_k), F^Y_{t-1}],   Assumption (ii)

the last line holds as the future assignments are not informed by the historical ones. Applying
Appendix C
Appendix to Chapter 3
This supplement contains the following sections. Section C.1 contains the proofs of the main results in the paper; Section C.2 contains additional theoretical results discussed in the main text of the paper; Section C.3 contains additional results from the simulation study; and Section C.4 contains additional results for the empirical application to a panel experiment in experimental economics.
We begin the proof with a lemma that will be used later on.
Lemma C.1.1. Assume a potential outcome panel with an assignment mechanism that is individualistic (Definition 3.2.3) and probabilistic (Assumption 3.3.1). Define, for any w ∈ W^(p+1), the random function Z_{i,t-p:t}(w) := p_{i,t-p}(w)^{-1} 1{W_{i,t-p:t} = w}. Then, over the assignment mechanism, E(Z_{i,t-p:t}(w) | F_{i,t-p-1}) = 1 and Var(Z_{i,t-p:t}(w) | F_{i,t-p-1}) = p_{i,t-p}(w)^{-1}(1 − p_{i,t-p}(w)), and Cov(Z_{i,t-p:t}(w), Z_{i,t-p:t}(w̃) | F_{i,t-p-1}) = −1 for all w ≠ w̃. Moreover, Z_{i,t-p:t}(w) and Z_{j,t-p:t}(w) are, for i ≠ j, conditionally independent.
Proof. The expectation holds by construction, and the variance is that of a Bernoulli trial.
For any w, w̃ ∈ W^(p+1), let u_{i,t-p}(w, w̃; p) = τ̂_{i,t}(w, w̃; p) − τ_{i,t}(w, w̃; p) be the estimation error. Now

u_{i,t-p}(w, w̃; p) = Y_{i,t}(w^obs_{i,1:t-p-1}, w)(Z_{i,t-p:t}(w) − 1) − Y_{i,t}(w^obs_{i,1:t-p-1}, w̃)(Z_{i,t-p:t}(w̃) − 1).

Var(u_{i,t-p}(w, w̃; p) | F_{i,t-p-1})
  = Y_{i,t}(w^obs_{i,1:t-p-1}, w)² Var(Z_{i,t-p:t}(w) | F_{i,t-p-1})
    + Y_{i,t}(w^obs_{i,1:t-p-1}, w̃)² Var(Z_{i,t-p:t}(w̃) | F_{i,t-p-1})
    − 2 Y_{i,t}(w^obs_{i,1:t-p-1}, w) Y_{i,t}(w^obs_{i,1:t-p-1}, w̃) Cov(Z_{i,t-p:t}(w), Z_{i,t-p:t}(w̃) | F_{i,t-p-1})
  = Y_{i,t}(w^obs_{i,1:t-p-1}, w)² p_{i,t-p}(w)^{-1}(1 − p_{i,t-p}(w))
    + Y_{i,t}(w^obs_{i,1:t-p-1}, w̃)² p_{i,t-p}(w̃)^{-1}(1 − p_{i,t-p}(w̃))
    + 2 Y_{i,t}(w^obs_{i,1:t-p-1}, w) Y_{i,t}(w^obs_{i,1:t-p-1}, w̃),

where the final term uses Cov(Z_{i,t-p:t}(w), Z_{i,t-p:t}(w̃) | F_{i,t-p-1}) = −1 from Lemma C.1.1. Simplifying gives the result on the variance of the estimation error. Then,
Finally, conditional independence of the errors follows due to the individualistic assignment of treatments. □
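The moments in Lemma C.1.1 are easy to confirm by simulation. This sketch is my own check, using a hypothetical three-path assignment mechanism, that E[Z(w)] = 1, Var(Z(w)) = p(w)^(-1)(1 − p(w)), and Cov(Z(w), Z(w̃)) = −1.

```python
import numpy as np

rng = np.random.default_rng(1)
paths = np.array([0, 1, 2])            # stand-ins for assignment paths w
probs = np.array([0.2, 0.3, 0.5])      # propensity p(w) for each path
draws = rng.choice(paths, size=1_000_000, p=probs)

# Z(w) = 1{W = w} / p(w), one row per assignment path
Z = np.stack([(draws == w).astype(float) / p for w, p in zip(paths, probs)])

print(Z.mean(axis=1))                  # approximately [1, 1, 1]
print(Z.var(axis=1))                   # approximately (1 - p)/p = [4, 7/3, 1]
print(np.cov(Z[0], Z[1])[0, 1])        # approximately -1
```

The covariance of −1 reflects that the indicators for distinct paths are mutually exclusive, so E[Z(w)Z(w̃)] = 0 while E[Z(w)]E[Z(w̃)] = 1.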
Proof of Proposition 3.3.1
The proof of this result is analogous to the proof of Theorem 3.3.1. We state the analogue of
Lemma C.1.2. Assume a potential outcome panel with an assignment mechanism that is individualistic (Definition 3.2.3) and probabilistic (Assumption 3.3.1). Define, for any w ∈ W^(p+1), the random function V_{i,t-p:t}(w) := p_{i,t-p}(w)^{-2} 1{W_{i,t-p:t} = w}. Then, over the assignment mechanism, E(V_{i,t-p:t}(w) | F_{i,t-p-1}) = p_{i,t-p}(w)^{-1} and Var(V_{i,t-p:t}(w) | F_{i,t-p-1}) = p_{i,t-p}(w)^{-3}(1 − p_{i,t-p}(w)), and Cov(V_{i,t-p:t}(w), V_{i,t-p:t}(w̃) | F_{i,t-p-1}) = −p_{i,t-p}(w)^{-1} p_{i,t-p}(w̃)^{-1} for all w ≠ w̃. Moreover, V_{i,t-p:t}(w)
Now

v_{i,t-p}(w, w̃; p) = Y_{i,t}(w^obs_{i,1:t-p-1}, w)²(V_{i,t-p:t}(w) − p_{i,t-p}(w)^{-1}) + Y_{i,t}(w^obs_{i,1:t-p-1}, w̃)²(V_{i,t-p:t}(w̃) − p_{i,t-p}(w̃)^{-1}).

Therefore, the conditional expectation is zero by Lemma C.1.2. The conditional independence of
Only the third result requires a new proof. The first result is a reinterpretation of the classic cross-sectional result using a triangular array central limit theorem; the usual Lindeberg condition must hold due to the bounded potential outcomes and the treatments being probabilistic. The second result follows from results in Bojinov and Shephard (2019), who use a martingale difference
The third result, which holds for NT going to infinity, can be split into three parts. For NT to go to infinity we must have either: (i) T goes to infinity with N finite, (ii) N goes to infinity with T finite, or (iii) both N and T go to infinity. In case (i), we apply the martingale difference CLT, but now we have preaveraged the cross-sectional errors over the N terms for each time period. The preaverage is still a martingale difference, so the technology is the same. In case (ii) we preaverage the time aspect. Then we are back to a standard triangular array CLT. As we have both
Proof of Proposition 3.3.2
The unbiasedness statements follow directly from Proposition 3.3.1. The proofs of the consistency statements are analogous to the proof of Theorem 3.3.2. The first result follows from an application of the triangular array law of large numbers, which may be applied due to the bounded potential outcomes and the treatments being probabilistic. The second statement follows from an application of a martingale difference sequence law of large numbers (Theorem 2.13 in Hall and Heyde (1980)). The third statement can again be proved in three cases: (i) T goes to infinity with N finite, (ii) N goes to infinity with T finite, or (iii) both N and T go to infinity, as in the proof of
Similarly, write Ȳ_{i·} = Ȳ_{i·}(0) + β̄W_{i·}, where β̄W_{i·} = (1/T) Σ_{t=1}^T Σ_{s=1}^t β_{i,t,t-s} W_{i,s}. The transformed
Consider the numerator of the unit fixed effects estimator. Substituting in, we arrive at

(1/NT) Σ_{i=1}^N Σ_{t=1}^T Y̌_{i,t} W̌_{i,t}
  = (1/NT) Σ_{i=1}^N Σ_{t=1}^T β_{i,t,0} W_{i,t} W̌_{i,t} + (1/NT) Σ_{i=1}^N Σ_{t=1}^T Σ_{s=1}^{t-1} β_{i,t,t-s} W_{i,s} W̌_{i,t} + (1/NT) Σ_{i=1}^N Σ_{t=1}^T Y̌_{i,t}(0) W̌_{i,t}
  = (1/T) Σ_{t=1}^T ( (1/N) Σ_{i=1}^N β_{i,t,0} W_{i,t} W̌_{i,t} ) + (1/T) Σ_{t=1}^T Σ_{s=1}^{t-1} ( (1/N) Σ_{i=1}^N β_{i,t,t-s} W_{i,s} W̌_{i,t} ) + (1/T) Σ_{t=1}^T ( (1/N) Σ_{i=1}^N Y̌_{i,t}(0) W̌_{i,t} ).
Therefore, for fixed T as N → ∞,

(1/T) Σ_{t=1}^T ( (1/N) Σ_{i=1}^N β_{i,t,0} W_{i,t} W̌_{i,t} ) →p (1/T) Σ_{t=1}^T κ̌_{W,β,t,t},

(1/T) Σ_{t=1}^T Σ_{s=1}^{t-1} ( (1/N) Σ_{i=1}^N β_{i,t,t-s} W_{i,s} W̌_{i,t} ) →p (1/T) Σ_{t=1}^T Σ_{s=1}^{t-1} κ̌_{W,β,t,s},

(1/T) Σ_{t=1}^T ( (1/N) Σ_{i=1}^N Y̌_{i,t}(0) W̌_{i,t} ) →p (1/T) Σ_{t=1}^T δ̌_t.

Similarly, the denominator converges to (1/NT) Σ_{t=1}^T Σ_{i=1}^N W̌²_{i,t} →p (1/T) Σ_{t=1}^T σ̌²_{W,t}. The result then follows by Slutsky. □
Begin by writing

Y_{i,t} = Y_{i,t}(0) + Σ_{s=1}^t β_{i,t,t-s} W_{i,s}.

Then, Ȳ_{·t} = Ȳ_{·t}(0) + β̄W_{·t}, Ȳ_{i·} = Ȳ_{i·}(0) + β̄W_{i·} and Ȳ = Ȳ(0) + β̄W. Therefore,

Ÿ_{i,t} = Ÿ_{i,t}(0) + Σ_{s=1}^t β_{i,t,t-s} W_{i,s} − β̄W − (β̄W_{·t} − β̄W) − (β̄W_{i·} − β̄W).
Consider the numerator of the two-way fixed effects estimator. Substituting in,

(1/NT) Σ_{i=1}^N Σ_{t=1}^T Ÿ_{i,t} Ẅ_{i,t}
  = (1/NT) Σ_{i=1}^N Σ_{t=1}^T β_{i,t,0} W_{i,t} Ẅ_{i,t} + (1/NT) Σ_{i=1}^N Σ_{t=1}^T Σ_{s=1}^{t-1} β_{i,t,t-s} W_{i,s} Ẅ_{i,t} + (1/NT) Σ_{i=1}^N Σ_{t=1}^T Ÿ_{i,t}(0) Ẅ_{i,t}
  = (1/T) Σ_{t=1}^T ( (1/N) Σ_{i=1}^N β_{i,t,0} W_{i,t} Ẅ_{i,t} ) + (1/T) Σ_{t=1}^T Σ_{s=1}^{t-1} ( (1/N) Σ_{i=1}^N β_{i,t,t-s} W_{i,s} Ẅ_{i,t} ) + (1/T) Σ_{t=1}^T ( (1/N) Σ_{i=1}^N Ÿ_{i,t}(0) Ẅ_{i,t} ).
Therefore, as N → ∞,

(1/N) Σ_{i=1}^N β_{i,t,0} W_{i,t} Ẅ_{i,t} →p κ̈_{W,β,t,t},

(1/N) Σ_{i=1}^N Σ_{s=1}^{t-1} β_{i,t,t-s} W_{i,s} Ẅ_{i,t} →p Σ_{s=1}^{t-1} κ̈_{W,β,t,s},

(1/N) Σ_{i=1}^N Ÿ_{i,t}(0) Ẅ_{i,t} →p δ̈_t.
C.2 Additional theoretical results
The adapted propensity score can be decomposed using individualistic assignment (Definition 3.2.3).
Lemma C.2.1. For a potential outcome panel satisfying individualistic assignment (Definition 3.2.3) and
Proof. Use the prediction decomposition for assignments, given all outcomes,
Even though the assignment mechanism is known, we only observe the outcomes along the realized assignment path Y_{i,1:t}(w^obs_{i,1:t}), and so it is not possible to use Lemma C.2.1 to compute p_{i,t-p}(w) for all assignment paths. We can, however, compute the adapted propensity score along the observed assignment path, p_{i,t-p}(w^obs_{i,t-p:t}), since the associated outcomes are observed.
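The prediction decomposition underlying Lemma C.2.1 writes the probability of an assignment path as a product of one-period-ahead assignment probabilities. The sketch below is my own illustration with a hypothetical binary assignment whose probability depends only on the lagged assignment; the initialization at lag 0 is an assumed convention, not part of the paper.

```python
from itertools import product

def path_propensity(path, prob_one_given_lag):
    """Probability of a binary assignment path w_{1:t} via the prediction
    decomposition p(w_{1:t}) = prod_s P(W_s = w_s | W_{s-1}).
    prob_one_given_lag[lag] is P(W_s = 1 | W_{s-1} = lag); the first
    period uses lag 0 (a hypothetical initialization)."""
    p, lag = 1.0, 0
    for w in path:
        p1 = prob_one_given_lag[lag]
        p *= p1 if w == 1 else 1.0 - p1
        lag = w
    return p

probs = {0: 0.3, 1: 0.6}
print(path_propensity((1, 1, 0), probs))  # 0.3 * 0.6 * 0.4 = 0.072

# The decomposition defines a proper distribution over all 2^3 paths.
print(sum(path_propensity(w, probs) for w in product([0, 1], repeat=3)))
```

In the setting of the lemma the one-period-ahead probabilities would also condition on past outcomes, which is why only the propensity of the observed path is computable.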
(Ẇ_{1,1:t}, ..., Ẇ_{N,1:t})′. The least squares coefficient in the regression of Ẏ_{1:N,t} on Ẇ_{1:N,t} is

β̂_{1:N,t} = (Ẇ′_{1:N,t} Ẇ_{1:N,t})^{-1} Ẇ′_{1:N,t} Ẏ_{1:N,t}.

Proposition C.2.1 derives the finite population limiting distribution of
Proposition C.2.1. Assume a potential outcome panel and consider the "control"-only path: for 0 ∈ W, let w̃_{i,1:t} = 0. Let μ̇_{i,t} be the t × 1 vector whose u-th element is E[Ẇ_{i,t-(u-1)} | F_{1:N,0,T}] and Ω_{i,t} be the t × t matrix whose (u, v)-th element is Cov(Ẇ_{i,t-(u-1)}, Ẇ_{i,t-(v-1)} | F_{1:N,0,T}). Additionally assume that:
1. The potential outcome panel is linear (Definition 3.4.1) and homogeneous with β_{i,t} ≡ β_t =
2. W_{i,1:t} is an individualistic stochastic assignment path and, over the randomization distribution, Var(W_{i,t} | F_{1:N,0,T}) = σ²_{W,i,t} < ∞ for each i ∈ [N], t ∈ [T].
3. As N → ∞,
(a) Non-stochastically, N^{-1} Σ_{i=1}^N Ω_{i,t} → Γ_{2,t}, where Γ_{2,t} is positive definite.
(b) N^{-1/2} Σ_{i=1}^N (Ẇ_{i,1:t} − μ̇_{i,t}) Ẏ_{i,t}(0) | F_{1:N,0,T} →d N(0, Γ_{1,t}).
(c) Non-stochastically, N^{-1} Σ_{i=1}^N Ẏ_{i,t}(0) μ̇_{i,t} → δ̇_t.
Then

√N (β̂_{1:N,t} − β_t − Γ^{-1}_{2,t} δ̇_t) | F_{1:N,0,T} →d N(0, Γ^{-1}_{2,t} Γ_{1,t} Γ^{-1}_{2,t}).
Stacking everything across units, this becomes Ẏ_{1:N,t} = Ẇ_{1:N,t} β_t + Ẏ_{1:N,t}(0), and so the linear least squares estimator satisfies

β̂_t = (Ẇ′_{1:N,t} Ẇ_{1:N,t})^{-1} Ẇ′_{1:N,t} Ẏ_{1:N,t} = β_t + (Ẇ′_{1:N,t} Ẇ_{1:N,t})^{-1} Ẇ′_{1:N,t} Ẏ_{1:N,t}(0).

The important, unusual point here is that Ẏ_{1:N,t}(0) is non-stochastic and that Ẇ_{1:N,t} is random, exactly the opposite of the case often discussed in the statistical analysis of linear regression. Now

(1/N) Ẇ′_{1:N,t} Ẇ_{1:N,t} = (1/N) Σ_{i=1}^N Ẇ_{i,1:t} Ẇ′_{i,1:t},

and

(1/N) Σ_{i=1}^N Ẇ_{i,1:t} Ẏ_{i,t}(0) = (1/N) Σ_{i=1}^N (Ẇ_{i,1:t} − μ̇_{i,t}) Ẏ_{i,t}(0) + (1/N) Σ_{i=1}^N μ̇_{i,t} Ẏ_{i,t}(0).

Recalling that Ẏ_{i,t}(0) is non-stochastic and applying Assumptions 3(b) and 3(c), then Slutsky's theorem
C.3.1 Additional simulations for the estimator of the total average dynamic causal effects
Quantile-quantile plot for the normal approximation: Figure C.1 provides quantile-quantile plots of the simulated randomization distribution for the estimator τ̄ˆ p1, 0; 0q presented in Section 3.5 of the main text.
Simulation results for the estimator of the lag-1 total weighted average dynamic causal effect, τ̄ : p1, 0; 1q: We now present simulation results that analyze the properties of our estimator for the lag-1 total weighted average dynamic causal effect, τ̄ˆ : p1, 0; 1q. We choose the weights to place equal weight on the future treatment paths. Figure C.2 plots the simulated randomization distribution for τ̄ˆ : p1, 0; 1q and Figure C.3 plots the associated quantile-quantile plot. We observe
Figure C.1: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆ p1, 0; 0q
under different choices of the parameter ϕ and treatment probability ppwq.
(a) ϵi,t „ Np0, 1q, N “ 100, T “ 10 (b) ϵi,t „ Cauchy, N “ 500, T “ 100
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆ p1, 0; 0q under
different choices of the parameter ϕ and treatment probability ppwq. The quantile-quantile plots compare the quantiles
of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis).
The 45 degree line is plotted in solid orange. The rows index the parameter ϕ P t0.25, 0.5, 0.75u, and the columns index
the treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵi,t „ Np0, 1q and N “ 100, T “ 10. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy distributed errors ϵi,t „ Cauchy and N “ 500, T “ 100.
Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details.
that the normal approximation remains accurate for lagged dynamic causal effects.
Figure C.2: Simulated randomization distribution for τ̄ˆ : p1, 0; 1q under different choices of the
parameter ϕ and treatment probability ppwq.
(a) ϵi,t „ Np0, 1q, N “ 100, T “ 10 (b) ϵi,t „ Cauchy, N “ 500, T “ 100
Notes: This figure plots the simulated randomization distribution for τ̄ˆ : p1, 0; 1q under different choices of the parameter
ϕ and treatment probability ppwq. The rows index the parameter ϕ P t0.25, 0.5, 0.75u, and the columns index the
treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the simulated randomization distribution with normally
distributed errors ϵi,t „ Np0, 1q and N “ 100, T “ 10. Panel (b) plots the simulated randomization distribution with Cauchy distributed errors ϵi,t „ Cauchy and N “ 500, T “ 100. Results are computed over 5,000 simulations. See
Section 3.5 of the main text for further details.
Figure C.3: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆ : p1, 0; 1q
under different choices of the parameter ϕ and treatment probability ppwq.
(a) ϵi,t „ Np0, 1q, N “ 100, T “ 10 (b) ϵi,t „ Cauchy, N “ 500, T “ 100
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆ : p1, 0; 1q under
different choices of the parameter ϕ and treatment probability ppwq. The quantile-quantile plots compare the quantiles
of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis).
The 45 degree line is plotted in solid orange. The rows index the parameter ϕ P t0.25, 0.5, 0.75u, and the columns index
the treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵi,t „ Np0, 1q and N “ 100, T “ 10. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy distributed errors ϵi,t „ Cauchy and N “ 500, T “ 100. Results are
computed over 5,000 simulations. See Section 3.5 of the main text for further details.
C.3.2 Simulations for the estimator of the time-t average dynamic causal effects
We present simulation results for our estimator of the time-t average dynamic causal effect, τ̄ˆ¨t p1, 0; 0q, with N “ 100 units when the potential outcomes are generated with normally distributed errors.
Normal approximations and size control: Figure C.4 plots the randomization distribution for the estimator of the contemporaneous time-t average dynamic causal effect, τ̄ˆ¨t p1, 0; 0q, under the null hypothesis of β “ 0 for different combinations of the parameter ϕ P t0.25, 0.5, 0.75u and treatment probability ppwq P t0.25, 0.5, 0.75u. When the errors ϵi,t are normally distributed, the normal approximation is accurate when there are only N “ 100 units in the experiment. As expected, when the errors are Cauchy distributed, the number of units must be quite large for the randomization distribution to become approximately normal. There is little difference in the results across the values of ϕ and ppwq. Figure C.5 provides quantile-quantile plots of the simulated randomization distributions to further illustrate the quality of the normal approximations. Testing based on the normal asymptotic approximation controls size effectively, staying close to the nominal 5% level (the exact rejection rates for the null hypothesis H0 : τ̄¨t p1, 0; 0q “ 0 are reported in Table C.1).
Table C.1: Null rejection rate for the test of the null hypothesis H0 : τ̄¨t p1, 0; 0q “ 0 based upon the
normal asymptotic approximation.
ppwq ppwq
0.25 0.5 0.75 0.25 0.5 0.75
0.25 0.046 0.048 0.048 0.25 0.031 0.031 0.034
ϕ 0.5 0.049 0.049 0.050 ϕ 0.5 0.048 0.039 0.043
0.75 0.050 0.049 0.045 0.75 0.052 0.047 0.057
(a) ϵi,t „ Np0, 1q, N “ 100 (b) ϵi,t „ Cauchy, N “ 50, 000
Notes: This table summarizes the null rejection rate for the test of the null hypothesis H0 : τ̄¨t p1, 0; 0q “ 0 based
upon the normal asymptotic approximation to the randomization distribution of τ̄ˆ¨t p1, 0; 0q. Panel (a) reports the null
rejection probabilities in simulations with ϵi,t „ Np0, 1q and N “ 100. Panel (b) reports the null rejection probabilities
in simulations with ϵi,t „ Cauchy and N “ 50, 000. Results are computed over 5,000 simulations. See Section 3.5 of the
main text for further details on the simulation design.
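The size calculations reported in Table C.1 can be mimicked with a stylized Monte Carlo. The sketch below is my own simplified design, not the paper's exact simulation: draw N units under the null of no treatment effect, test with a normal critical value, and record the rejection rate.

```python
import numpy as np

rng = np.random.default_rng(2)
N, sims, rejections = 100, 2000, 0

for _ in range(sims):
    W = rng.binomial(1, 0.5, size=N)      # Bernoulli(1/2) treatment assignment
    Y = rng.normal(size=N)                # outcomes under the null: no effect
    treated, control = Y[W == 1], Y[W == 0]
    tau_hat = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / treated.size
                 + control.var(ddof=1) / control.size)
    rejections += abs(tau_hat / se) > 1.96

print(rejections / sims)  # close to the nominal 0.05
```

Heavier-tailed errors (e.g., Cauchy draws in place of the normal outcomes) slow the normal approximation considerably, which is the pattern panel (b) of Table C.1 illustrates.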
Rejection rates: Figure C.6 plots rejection rate curves against the null hypotheses as the parameter
β varies for different choices of the parameter ϕ and treatment probability ppwq in simulations
Figure C.4: Simulated randomization distribution for τ̄ˆ¨t p1, 0; 0q under different choices of the
parameter ϕ and treatment probability ppwq.
(a) ϵi,t „ Np0, 1q, N “ 100 (b) ϵi,t „ Cauchy, N “ 50, 000
Notes: This figure plots the simulated randomization distribution for τ̄ˆ¨t p1, 0; 0q under different choices of the parameter
ϕ and treatment probability ppwq. The rows index the parameter ϕ P t0.25, 0.5, 0.75u and the columns index the
treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the simulated randomization distribution with normally
distributed errors ϵi,t „ Np0, 1q and N “ 100. Panel (b) plots the simulated randomization distribution with Cauchy distributed errors ϵi,t „ Cauchy and N “ 50, 000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Figure C.5: Quantile-quantile plots for the simulated randomization distribution for τ̄̂_{·t}(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w).
(a) ϵ_{i,t} ∼ N(0, 1), N = 100 (b) ϵ_{i,t} ∼ Cauchy, N = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄̂_{·t}(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and N = 100. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy-distributed errors ϵ_{i,t} ∼ Cauchy and N = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
with N = 100 units. For p = 0, the rejection rate against H0 : τ̄_{·t}(1, 0; 0) = 0 quickly converges to one as β moves away from zero across a range of simulations. This is encouraging, as it indicates that the conservative variance bound still leads to informative tests. However, when p = 1, the persistence ϕ of the causal effects has an important effect on the power of our tests. In particular, when ϕ = 0.25, the rejection rate against H0 : τ̄‡_{·t}(1, 0; 1) = 0 is quite low for all values of β; lower values of ϕ imply less persistence in the causal effects across periods. When ϕ = 0.75, there is substantial persistence across periods, and we observe that the rejection rate curves improve for p = 1. Additionally, Figure C.7 shows the same power plots for N = 1000 units. We again observe that power is relatively low for low values of ϕ, but when ϕ = 0.75, the rejection rate curves for p = 0, 1 appear similar. This suggests that detecting dynamic causal effects requires larger sample sizes.
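To make the power exercise concrete, the following sketch reproduces the shape of such rejection-rate curves under a stylized stand-in for the simulation design. The outcome equation, the Horvitz-Thompson contrast, and the conservative studentization below are illustrative assumptions, not the paper's exact design (see Section 3.5 of the main text for that):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_rate(beta, phi=0.5, p_w=0.5, N=100, n_sims=2000):
    """Monte Carlo rejection rate of a normal-approximation test of a zero
    average effect at effect size beta (stylized stand-in for the design)."""
    rejections = 0
    for _ in range(n_sims):
        w_prev = rng.binomial(1, p_w, size=N)  # lagged treatment
        w = rng.binomial(1, p_w, size=N)       # current treatment
        eps = rng.standard_normal(N)
        # assumed outcome: contemporaneous effect beta, lagged effect phi*beta
        y = beta * w + phi * beta * w_prev + eps
        # Horvitz-Thompson contrast for the lag-0, time-t average effect
        ht = y * (w / p_w - (1 - w) / (1 - p_w))
        tau_hat = ht.mean()
        # conservative variance estimate from the cross-section of HT terms
        se = ht.std(ddof=1) / np.sqrt(N)
        if abs(tau_hat / se) > 1.96:
            rejections += 1
    return rejections / n_sims

size = rejection_rate(beta=0.0)   # near the nominal 5% level at beta = 0
power = rejection_rate(beta=1.0)  # rises toward one as beta moves from zero
```

Sweeping `beta` over a grid and plotting `rejection_rate(beta)` traces out a curve of the kind shown in Figure C.6.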
Figure C.6: Rejection probabilities for a test of the null hypothesis H0 : τ̄_{·t}(1, 0; 0) = 0 and H0 : τ̄‡_{·t}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).
Notes: This figure plots the rejection probabilities for a test of the null hypothesis H0 : τ̄_{·t}(1, 0; 0) = 0 and H0 : τ̄‡_{·t}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄_{·t}(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄‡_{·t}(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and N = 100. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Simulation results for the estimator of the lag-1, time-t weighted average dynamic causal effect, τ̄‡_{·t}(1, 0; 1): We now present simulation results that analyze the properties of our estimator for the lag-1, time-t weighted average dynamic causal effect, τ̄̂‡_{·t}(1, 0; 1). We choose the weights to
Figure C.7: Rejection probabilities for a test of the null hypothesis H0 : τ̄_{·t}(1, 0; 0) = 0 and H0 : τ̄‡_{·t}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).
Notes: This figure plots the rejection probabilities for a test of the null hypothesis H0 : τ̄_{·t}(1, 0; 0) = 0 and H0 : τ̄‡_{·t}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄_{·t}(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄‡_{·t}(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and N = 1000. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
place equal weight on the future treatment paths. Figure C.8 plots the simulated randomization distribution for τ̄̂‡_{·t}(1, 0; 1) and Figure C.9 plots the associated quantile-quantile plots. We observe that the normal approximation remains accurate for lagged dynamic causal effects.
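The normal approximation being checked in these figures can be illustrated with a small randomization-inference sketch: the potential outcomes are held fixed and only the treatment assignment is redrawn, and the studentized estimator is compared against standard normal quantiles. The outcome values, the constant unit-level effect, and the Horvitz-Thompson contrast below are illustrative assumptions rather than the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fix one realization of the potential outcomes (N = 100, normal errors);
# under randomization inference only the treatment assignment is random.
N, p_w, tau_true = 100, 0.5, 1.0
eps = rng.standard_normal(N)
y0 = eps                 # hypothetical untreated potential outcomes
y1 = eps + tau_true      # hypothetical treated potential outcomes

t_stats = []
for _ in range(4000):
    w = rng.binomial(1, p_w, size=N)
    y = np.where(w == 1, y1, y0)
    ht = y * (w / p_w - (1 - w) / (1 - p_w))
    se = ht.std(ddof=1) / np.sqrt(N)
    t_stats.append((ht.mean() - tau_true) / se)  # center at the true effect

t_stats = np.sort(t_stats)
# empirical quantiles should line up with standard normal quantiles,
# which is what the quantile-quantile plots visualize
q25, q50, q75 = np.quantile(t_stats, [0.25, 0.5, 0.75])
iqr_ratio = (q75 - q25) / 1.349  # 1.349 is the N(0,1) interquartile range
```

Plotting `t_stats` against sorted standard normal quantiles reproduces a quantile-quantile panel of the kind shown in Figures C.5 and C.9.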
Figure C.8: Simulated randomization distribution for τ̄̂‡_{·t}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).
(a) ϵ_{i,t} ∼ N(0, 1), N = 100 (b) ϵ_{i,t} ∼ Cauchy, N = 50,000
Notes: This figure plots the simulated randomization distribution for τ̄̂‡_{·t}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the simulated randomization distribution with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and N = 100. Panel (b) plots the simulated randomization distribution with Cauchy-distributed errors ϵ_{i,t} ∼ Cauchy and N = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Figure C.9: Quantile-quantile plots for the simulated randomization distribution for τ̄̂‡_{·t}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).
(a) ϵ_{i,t} ∼ N(0, 1), N = 100 (b) ϵ_{i,t} ∼ Cauchy, N = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄̂‡_{·t}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and N = 100. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy-distributed errors ϵ_{i,t} ∼ Cauchy and N = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
C.3.3 Simulations for the estimator of the unit-i average dynamic causal effects
We present simulation results for our estimator of the unit-i average dynamic causal effect, τ̄̂_{i·}(1, 0; 0), with T = 100 time periods when the potential outcomes are generated with normally distributed errors, and with T = 50,000 time periods when the errors are Cauchy distributed.
Normal approximations and size control: Figure C.10 plots the randomization distribution for τ̄̂_{i·}(1, 0; 0). We see a similar pattern as before: when the errors are normally distributed, the normal approximation is accurate, and it remains accurate when the errors are heavy-tailed. Figure C.11 provides quantile-quantile plots of the simulated randomization distributions to further illustrate the quality of the normal approximations. The null rejection rates for the hypothesis H0 : τ̄_{i·}(1, 0; 0) = 0 are reported in Table C.2 and, again, the rejection rates are close to the nominal 5% level.
Figure C.10: Simulated randomization distribution for τ̄̂_{i·}(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w).
(a) ϵ_{i,t} ∼ N(0, 1), T = 100 (b) ϵ_{i,t} ∼ Cauchy, T = 50,000
Notes: This figure plots the simulated randomization distribution for τ̄̂_{i·}(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w). The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the simulated randomization distribution with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and T = 100. Panel (b) plots the simulated randomization distribution with Cauchy-distributed errors ϵ_{i,t} ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Table C.2: Null rejection rate for the test of the null hypothesis H0 : τ̄_{i·}(1, 0; 0) = 0 based upon the normal asymptotic approximation.

(a) ϵ_{i,t} ∼ N(0, 1), T = 100

              p(w)
          0.25    0.5     0.75
ϕ  0.25   0.052   0.047   0.054
   0.5    0.049   0.049   0.048
   0.75   0.058   0.046   0.054

(b) ϵ_{i,t} ∼ Cauchy, T = 50,000

              p(w)
          0.25    0.5     0.75
ϕ  0.25   0.031   0.031   0.034
   0.5    0.048   0.039   0.043
   0.75   0.052   0.047   0.057

Notes: This table summarizes the null rejection rate for the test of the null hypothesis H0 : τ̄_{i·}(1, 0; 0) = 0 based upon the normal asymptotic approximation to the randomization distribution of τ̄̂_{i·}(1, 0; 0). Panel (a) reports the null rejection probabilities in simulations with ϵ_{i,t} ∼ N(0, 1) and T = 100. Panel (b) reports the null rejection probabilities in simulations with ϵ_{i,t} ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Figure C.11: Quantile-quantile plots for the simulated randomization distribution for τ̄̂_{i·}(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w).
(a) ϵ_{i,t} ∼ N(0, 1), T = 100 (b) ϵ_{i,t} ∼ Cauchy, T = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄̂_{i·}(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and T = 100. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy-distributed errors ϵ_{i,t} ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Rejection rates: Next, we investigate the rejection rate of the statistical test based on the normal asymptotic approximation for H0 : τ̄‡_{i·}(1, 0; 0) = 0 and H0 : τ̄‡_{i·}(1, 0; 1) = 0, plotting the rejection rates in Figure C.12. For p = 0, we once again observe that the test of H0 : τ̄‡_{i·}(1, 0; 0) = 0 has good power properties across a range of simulations. However, once again for p = 1, our conservative test has low power, and the persistence ϕ of the causal effects has an important effect on the power of our tests. Additionally, Figure C.13 shows the same power plots for T = 1000 time periods. In this case, we observe that the conservative test has good power against the weak null of no unit-i average dynamic causal effects for both p = 0, 1. This suggests that detecting unit-i average dynamic causal effects requires a long time dimension in the panel experiment.
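The role of the time dimension can be illustrated directly: for a single unit with independently assigned treatments, the studentized test statistic scales with √T, so power at a fixed effect size improves mechanically as T grows. The i.i.d. design and outcome equation below are illustrative assumptions, not the paper's exact simulation design:

```python
import numpy as np

rng = np.random.default_rng(2)

def unit_power(T, beta=0.5, p_w=0.5, n_sims=1000):
    """Power of a normal-approximation test of a zero unit-i average effect
    for a single unit observed over T periods (stylized i.i.d. design)."""
    rejections = 0
    for _ in range(n_sims):
        w = rng.binomial(1, p_w, size=T)
        y = beta * w + rng.standard_normal(T)  # assumed outcome equation
        # Horvitz-Thompson contrast averaged over the unit's time periods
        ht = y * (w / p_w - (1 - w) / (1 - p_w))
        se = ht.std(ddof=1) / np.sqrt(T)
        if abs(ht.mean() / se) > 1.96:
            rejections += 1
    return rejections / n_sims

power_short = unit_power(T=100)   # moderate power with a short panel
power_long = unit_power(T=1000)   # power grows with the time dimension T
```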
Figure C.12: Rejection probabilities for a test of the null hypothesis H0 : τ̄‡_{i·}(1, 0; 0) = 0 and H0 : τ̄‡_{i·}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).
Notes: This figure plots the rejection probabilities for a test of the null hypothesis H0 : τ̄‡_{i·}(1, 0; 0) = 0 and H0 : τ̄‡_{i·}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄‡_{i·}(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄‡_{i·}(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and T = 100. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Simulation results for the estimator of the lag-1, unit-i weighted average dynamic causal effect, τ̄‡_{i·}(1, 0; 1): We now present simulation results that analyze the properties of our estimator for the lag-1, unit-i weighted average dynamic causal effect, τ̄̂‡_{i·}(1, 0; 1). We choose the weights to place equal weight on the future treatment paths. Figure C.14 plots the simulated randomization
Figure C.13: Rejection probabilities for a test of the null hypothesis H0 : τ̄‡_{i·}(1, 0; 0) = 0 and H0 : τ̄‡_{i·}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).
Notes: This figure plots rejection probabilities for a test of the null hypothesis H0 : τ̄‡_{i·}(1, 0; 0) = 0 and H0 : τ̄‡_{i·}(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄‡_{i·}(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄‡_{i·}(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and T = 1000. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
distribution for τ̄̂‡_{i·}(1, 0; 1) and Figure C.15 plots the associated quantile-quantile plots. We observe that the normal approximation remains accurate for lagged dynamic causal effects.
Figure C.14: Simulated randomization distribution for τ̄̂‡_{i·}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).
(a) ϵ_{i,t} ∼ N(0, 1), T = 100 (b) ϵ_{i,t} ∼ Cauchy, T = 50,000
Notes: This figure plots the simulated randomization distribution for τ̄̂‡_{i·}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the simulated randomization distribution with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and T = 100. Panel (b) plots the simulated randomization distribution with Cauchy-distributed errors ϵ_{i,t} ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
Figure C.15: Quantile-quantile plots for the simulated randomization distribution for τ̄̂‡_{i·}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).
(a) ϵ_{i,t} ∼ N(0, 1), T = 100 (b) ϵ_{i,t} ∼ Cauchy, T = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄̂‡_{i·}(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵ_{i,t} ∼ N(0, 1) and T = 100. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy-distributed errors ϵ_{i,t} ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.
C.4 Additional empirical results
We estimate unit-specific average dynamic causal effects in the panel experiment conducted by Andreoni and Samuelson (2006). We focus on two randomly selected units in the experiment and construct estimates of their average i, t-th lag-0 dynamic causal effect, τ_{i,t}(1, 0; 0) (Definition 3.2.5). Figure C.16 shows the nonparametric estimates τ̂_{i,t}(1, 0; 0), t ∈ [T], for the two units. The figure also contains the nonparametric estimate of the average unit-i lag-0 dynamic causal effect, τ̄̂_{i·}(1, 0; 0) = (1/T) Σ_{t=1}^{T} τ̂_{i,t}(1, 0; 0). The point estimate of the average unit-i lag-0 dynamic causal effect is positive for both units, suggesting that a larger value of λ in the current game increases the likelihood of cooperation for both units. Since each unit only plays a total of twenty rounds, the estimated variance of these unit-specific estimators is quite large.
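As an illustration of how such unit-specific estimates are built, the sketch below computes per-round inverse-probability-weighted contrasts and their running average for a single hypothetical unit. The outcome model and assignment probability are stand-ins, not the Andreoni and Samuelson (2006) data, and the contrast is only in the spirit of the estimator in Remark 3.3.1:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for one unit's T = 20 rounds: treatment
# W_t = 1{lambda_t >= 0.6} drawn with probability p, binary cooperation Y_t.
T, p = 20, 0.5
w = rng.binomial(1, p, size=T)
y = rng.binomial(1, 0.3 + 0.2 * w)  # cooperation assumed likelier when W_t = 1

# Per-round inverse-probability-weighted contrast tau_hat_{i,t}(1,0;0)
tau_it = y * (w / p - (1 - w) / (1 - p))

# Running average (dashed black line in Figure C.16) and the unit-i average
# (dashed red line), i.e., the mean of the per-round contrasts
running = np.cumsum(tau_it) / np.arange(1, T + 1)
tau_bar_i = tau_it.mean()
```

With only twenty rounds, `tau_bar_i` averages few terms, which is why the unit-specific estimates in Figure C.16 are noisy.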
Figure C.16: Estimates of the weighted average i, t-th lag-0 dynamic causal effect (Definition 3.2.5) of W = 1{λ ≥ 0.6} on cooperation in period one for two units in the experiment of Andreoni and Samuelson (2006).
Notes: This figure plots estimates of the weighted average i, t-th lag-0 dynamic causal effect (Definition 3.2.5) of W = 1{λ ≥ 0.6} on cooperation in period one for two units in the experiment of Andreoni and Samuelson (2006). The solid black line plots the nonparametric estimator τ̂_{i,t}(1, 0; 0) given in Remark 3.3.1. The dashed black line plots the running average of the period-specific estimator for each unit: for each t ∈ [T], (1/t) Σ_{s=1}^{t} τ̂_{i,s}(1, 0; 0). The dashed red line plots the estimated weighted average unit-i lag-0 dynamic causal effect, τ̄̂_{i·}(1, 0; 0) = (1/T) Σ_{t=1}^{T} τ̂_{i,t}(1, 0; 0).
We next estimate period-specific, weighted average dynamic causal effects that pool information across units in order to gain precision. For each time period t ∈ [T], we construct estimates based on the nonparametric estimator of the weighted average time-t, lag-p dynamic causal effect, τ̄‡_{·t}(1, 0; p) = (1/N) Σ_{i=1}^{N} τ‡_{i,t}(1, 0; p), for p = 0, 1, 2, 3. For each value of p, the dashed black line in Figure C.17 plots the estimates τ̄̂‡_{·t}(1, 0; p) and the grey region plots a 95% pointwise conservative confidence band for the period-specific weighted average dynamic causal effects. For each value of p, there appears to be some heterogeneity in the period-specific weighted average dynamic causal effects.
To further investigate these dynamic causal effects, the solid blue line in Figure C.17 plots the nonparametric estimator of the total lag-p weighted average causal effect, τ̄‡(1, 0; p), for p = 0, 1, 2, 3, which further pools information across all units and time periods. The dashed blue lines plot the conservative confidence interval for the total lag-p weighted average causal effect. See the main text for further discussion of the total lag-p weighted average causal effect estimates.
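A minimal sketch of this pooling step, under the assumption that per-(i, t) contrasts are already in hand: average across units within each period for the period-specific estimates, use the cross-sectional variance of the contrasts as a conservative bound on the randomization variance for the band, and average over everything for the total estimate. The panel below is synthetic, not the experimental data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical N x T panel of per-(i, t) lag-0 contrasts; in the application
# these would come from applying the Remark 3.3.1 estimator to the data.
N, T, p = 50, 20, 0.5
w = rng.binomial(1, p, size=(N, T))
y = rng.binomial(1, 0.3 + 0.2 * w)
tau_it = y * (w / p - (1 - w) / (1 - p))

# Period-specific pooled estimates (dashed black line) with a conservative
# 95% pointwise band (grey region): the cross-sectional variance of the
# contrasts bounds the randomization variance of their mean from above.
tau_t = tau_it.mean(axis=0)
se_t = tau_it.std(axis=0, ddof=1) / np.sqrt(N)
lower, upper = tau_t - 1.96 * se_t, tau_t + 1.96 * se_t

# Total estimate pooling over all units and periods (solid blue line)
tau_total = tau_it.mean()
```

Pooling across N units shrinks the standard errors by roughly √N relative to the unit-specific estimates, which is the precision gain the text refers to.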
Figure C.17: Estimates of the time-t lag-p weighted average dynamic causal effect, τ̄‡_{·t}(1, 0; p), of W = 1{λ ≥ 0.6} on cooperation in period one based on the experiment of Andreoni and Samuelson (2006) for each time period t ∈ [T] and p = 0, 1, 2, 3.
Notes: This figure plots estimates of the time-t lag-p weighted average dynamic causal effect, τ̄‡_{·t}(1, 0; p), of W = 1{λ ≥ 0.6} on cooperation in period one based on the experiment of Andreoni and Samuelson (2006) for each time period t ∈ [T] and p = 0, 1, 2, 3. The black dashed line plots the nonparametric estimator of the time-t lag-p weighted average dynamic causal effect, τ̄̂‡_{·t}(1, 0; p), for each period t ∈ [T]. The grey region plots the 95% pointwise confidence band for τ̄‡_{·t}(1, 0; p) based on the conservative estimator of the asymptotic variance of the nonparametric estimator (Theorem 3.3.2). The solid blue line plots the nonparametric estimator of the total lag-p weighted average dynamic causal effect, τ̄̂‡(1, 0; p), and the dashed blue lines plot the 95% confidence interval for τ̄‡(1, 0; p) based on the conservative estimator of the asymptotic variance of the nonparametric estimator.