
Essays on Identification and Causality

Citation
Rambachan, Asheshananda. 2022. Essays on Identification and Causality. Doctoral dissertation,
Harvard University Graduate School of Arts and Sciences.

Permanent link
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37371934

Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available
under the terms and conditions applicable to Other Posted Material, as set forth at http://
nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Essays on Identification and Causality

A dissertation presented

by

Asheshananda Rambachan

to

The Department of Economics

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in the subject of

Economics

Harvard University

Cambridge, Massachusetts

April 2022
© 2022 Asheshananda Rambachan

All rights reserved.


Dissertation Advisor: Professor Isaiah Andrews
Author: Asheshananda Rambachan

Essays on Identification and Causality

Abstract

This dissertation contains three chapters in econometrics. A common theme is identification analysis, with a particular focus on understanding what researchers can learn from data under weak assumptions on economic behavior and dynamic causal effects.

The first chapter characterizes the behavioral and econometric assumptions under which researchers can identify whether expert decision makers, such as doctors, judges, and managers, make systematic prediction mistakes in observational empirical settings like medical testing, pretrial release, and hiring. Under these assumptions, I provide a statistical test for whether the decision maker makes systematic prediction mistakes and methods for conducting inference on the ways in which the decision maker’s predictions are systematically biased. As an empirical illustration, I analyze the pretrial release decisions of judges in New York City.

The second chapter, which is coauthored with Neil Shephard, develops the direct potential outcome system as a foundational framework for analyzing dynamic causal effects in observational time series settings. We provide novel conditions under which popular time series estimands, such as the impulse response function, local projections, and local projection with an instrumental variable, have nonparametric causal interpretations in terms of dynamic causal effects.

The third chapter, which is coauthored with Iavor Bojinov and Neil Shephard, proposes a rich class of finite population dynamic causal effects in panel experiments. We provide a nonparametric estimator that is unbiased for these dynamic causal effects over the randomization distribution of assignments, derive its finite population limiting distribution, and develop two methods for conducting inference. We further show that population linear fixed effect estimators do not recover causally interpretable estimands if there are dynamic causal effects and serial correlation in the assignment mechanism of the panel experiment.

Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

Introduction 1

1 Identifying Prediction Mistakes in Observational Data 3


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 An Empirical Model of Expected Utility Maximization . . . . . . . . . . . . . . . . . 12
1.2.1 Setting and Observable Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2 Motivating Empirical Applications . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2.3 Expected Utility Maximization Behavior . . . . . . . . . . . . . . . . . . . . . 16
1.2.4 Characterization Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Testing Expected Utility Maximization in Screening Decisions . . . . . . . . . . . . . 22
1.3.1 Characterization in Screening Decisions . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Constructing Bounds on the Missing Data with an Instrument . . . . . . . . 27
1.3.3 Reduction to Moment Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.4 Bounding Prediction Mistakes based on Characteristics . . . . . . . . . . . . . . . . . 31
1.4.1 Expected Utility Maximization at Inaccurate Beliefs . . . . . . . . . . . . . . . 32
1.4.2 Bounding Prediction Mistakes in Screening Decisions . . . . . . . . . . . . . 34
1.5 Do Judges Make Prediction Mistakes? . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.5.1 Pretrial Release Decisions in New York City . . . . . . . . . . . . . . . . . . . 37
1.5.2 Data and Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.5.3 Empirical Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.5.4 What Fraction of Judges Make Systematic Prediction Mistakes? . . . . . . . . 42
1.5.5 Bounding Prediction Mistakes based on Defendant Characteristics . . . . . . 45
1.5.6 Which Decisions Violate Expected Utility Maximization? . . . . . . . . . . . 46
1.6 The Effects of Algorithmic Decision-Making . . . . . . . . . . . . . . . . . . . . . . . 47
1.6.1 Social Welfare Under Candidate Decision Rules . . . . . . . . . . . . . . . . . 48
1.6.2 Measuring the Effects of Algorithms in Pretrial Release Decisions . . . . . . 50
1.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2 When Do Common Time Series Estimands Have Nonparametric Causal Meaning? 55
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.2 The Direct Potential Outcome System and Dynamic Causal Effects . . . . . . . . . . 59
2.2.1 The Direct Potential Outcome System . . . . . . . . . . . . . . . . . . . . . . . 59
2.2.2 Dynamic Causal Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.2.3 Links to macroeconometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.3 Estimands Based on Assignments and Outcomes . . . . . . . . . . . . . . . . . . . . 67
2.3.1 Impulse Response Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.3.2 Local Projection Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.3.3 Generalized Impulse Response Function . . . . . . . . . . . . . . . . . . . . . 71
2.3.4 Generalized Local Projection and Local Filtered Projection Estimands . . . . 73
2.4 The Instrumented Potential Outcome System . . . . . . . . . . . . . . . . . . . . . . . 73
2.4.1 The Instrumented System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.5 Estimands Based on Assignments, Instruments and Outcomes . . . . . . . . . . . . . 75
2.5.1 Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.5.2 IV Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.5.3 Generalized Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.5.4 Generalized IV and Filtered IV Estimands . . . . . . . . . . . . . . . . . . . . 80
2.6 Estimands Based on Instruments and Outcomes . . . . . . . . . . . . . . . . . . . . . 81
2.6.1 Ratio Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.6.2 Local Projection IV Estimand . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.6.3 Generalized Ratio Wald Estimand . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.6.4 Generalized Local Projection IV and Local Filtered Projection IV Estimands 85
2.7 Estimands Based Only on Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.7.1 Linear simultaneous equation approach . . . . . . . . . . . . . . . . . . . . . . 86
2.7.2 Causal meaning of the GIRF of Yk,t on Yj,t+h . . . . . . . . . . . . . . . . . . . 88
2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

3 Panel Experiments and Dynamic Causal Effects: A Finite Population Perspective 90


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.2 Potential outcome panel and dynamic causal effects . . . . . . . . . . . . . . . . . . . 93
3.2.1 Assignment panels and potential outcomes . . . . . . . . . . . . . . . . . . . 93
3.2.2 The potential outcome panel model . . . . . . . . . . . . . . . . . . . . . . . . 94
3.2.3 Assignment mechanism assumptions . . . . . . . . . . . . . . . . . . . . . . . 95
3.2.4 Dynamic causal effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.3 Nonparametric estimation and inference . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.3.1 Setup: adapted propensity score and probabilistic assignment . . . . . . . . 99
3.3.2 Estimation of the i, t-th lag-p dynamic causal effect . . . . . . . . . . . . . . . 100
3.3.3 Estimation of lag-p average causal effects . . . . . . . . . . . . . . . . . . . . . 102
3.3.4 Confidence intervals and testing for lag-p average causal effects . . . . . . . 103

3.4 Estimation in a linear potential outcome panel . . . . . . . . . . . . . . . . . . . . . . 104
3.4.1 Interpreting the unit fixed effects estimator . . . . . . . . . . . . . . . . . . . . 105
3.4.2 Interpreting the two-way fixed effects estimator . . . . . . . . . . . . . . . . . 106
3.5 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.5.1 Simulation design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.5.2 Normal approximations and size control . . . . . . . . . . . . . . . . . . . . . 109
3.5.3 Rejection rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.6 Empirical application in experimental economics . . . . . . . . . . . . . . . . . . . . 111
3.6.1 Inference on total lag-p weighted average dynamic causal effects . . . . . . . 112
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

References 115

Appendix A Appendix to Chapter 1 133


A.1 Additional Figures and Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
A.2 User’s Guide to Identifying Prediction Mistakes in Screening Decisions . . . . . . . 141
A.3 Additional Results for the Expected Utility Maximization Model . . . . . . . . . . . 144
A.3.1 Characterization of Expected Utility Maximization in Treatment Decisions . 144
A.3.2 ϵw-Approximate Expected Utility Maximization . . . . . . . . . . . . . . . . . 146
A.3.3 Expected Utility Maximization with Inaccurate Beliefs after Dimension
Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
A.3.4 The Policymaker’s First-Best Decision Rule . . . . . . . . . . . . . . . . . . . . 151
A.4 Additional Results for the Econometric Framework . . . . . . . . . . . . . . . . . . . 153
A.4.1 Constructing Bounds on the Missing Data through a Quasi-Random Instru-
ment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
A.4.2 Testing Expected Utility Maximization Behavior in Treatment Decisions . . . 154
A.4.3 Expected Social Welfare Under the Decision Maker’s Observed Choices . . . 160
A.5 Proofs of Main Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
A.6 Expected Utility Maximization Behavior with Continuous Characteristics . . . . . . 175
A.7 Alternative Bounds on the Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . 178
A.7.1 Direct Imputation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
A.7.2 Proxy Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
A.8 Summary Figures and Tables for New York City Pretrial Release . . . . . . . . . . . 183
A.9 Additional Empirical Results for New York City Pretrial Release . . . . . . . . . . . 193
A.9.1 Welfare Effects of Automation Policies: Race-by-Felony Charge Cells . . . . 193
A.9.2 Identifying Prediction Mistakes: Direct Imputation . . . . . . . . . . . . . . . 196
A.9.3 Defining the Outcome to be Any Pretrial Misconduct . . . . . . . . . . . . . . 200
A.9.4 Alternative Pretrial Release Definition . . . . . . . . . . . . . . . . . . . . . . . 203

Appendix B Appendix to Chapter 2 207
B.1 Proofs of Results for Assignments and Outputs . . . . . . . . . . . . . . . . . . . . . 207
B.1.1 Proof of Theorem 2.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
B.1.2 Proof of Theorem 2.3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
B.1.3 Proof of Theorem 2.3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
B.2 Proofs of Results for Assignments, Instruments and Outputs . . . . . . . . . . . . . 209
B.2.1 Proof of Theorem 2.5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
B.2.2 Proof of Theorem 2.5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
B.3 Proofs of Results for Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
B.3.1 Proof of Theorem 2.7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

Appendix C Appendix to Chapter 3 211


C.1 Proofs of main results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
C.2 Additional theoretical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
C.2.1 Prediction decomposition of the adapted propensity score . . . . . . . . . . . 216
C.2.2 Estimation as a repeated cross-section . . . . . . . . . . . . . . . . . . . . . . . 216
C.3 Additional simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
C.3.1 Additional simulations for the estimator of the total average dynamic causal
effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
C.3.2 Simulations for the estimator of the time-t average dynamic causal effects . 222
C.3.3 Simulations for the estimator of the unit-i average dynamic causal effects . . 229
C.4 Additional empirical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
C.4.1 Analysis of unit and time-specific average dynamic causal effects . . . . . . . 235

List of Tables

1.1 Estimated lower bound on the fraction of judges whose release decisions are
inconsistent with expected utility maximization at accurate beliefs about failure to
appear risk given defendant characteristics. . . . . . . . . . . . . . . . . . . . . . . . . 43
1.2 Location of the maximum studentized violation of revealed preference inequali-
ties among judges whose release decisions are inconsistent with expected utility
maximization at accurate beliefs about failure to appear risk given defendant
characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.1 Top line results for the causal interpretation of common estimands based on assign-
ments and outcomes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.2 Top line results for the causal interpretation of common estimands based on assign-
ments, instruments and outcomes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.3 Top line results for the causal interpretation of common estimands based on instru-
ments and outcomes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

3.1 Null rejection rate for the test of the null hypothesis H0 : τ̄(1, 0; 0) = 0 based upon
the normal asymptotic approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.2 Stage games from twice-played prisoners’ dilemma in the experiment conducted by
Andreoni and Samuelson (2006). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.3 Summary statistics for the experiment in Andreoni and Samuelson (2006). . . . . . 112
3.4 Estimates of the total lag-p weighted average dynamic causal effect for p = 0, 1, 2, 3. 113

A.1 Estimated lower bound on the fraction of judges whose release decisions are
inconsistent with expected utility maximization behavior at accurate beliefs about
any pretrial misconduct risk given defendant characteristics. . . . . . . . . . . . . . . 140
A.2 Summary statistics comparing the main estimation sample and cases heard by the
top 25 judges, broken out by defendant race. . . . . . . . . . . . . . . . . . . . . . . . 185
A.3 Summary statistics for released and detained defendants in the main estimation
sample and for cases heard by the top 25 judges. . . . . . . . . . . . . . . . . . . . . . 186
A.4 Summary statistics of misconduct rates among released defendants in the main
estimation sample and cases heard by the top 25 judges. . . . . . . . . . . . . . . . . 187

A.5 Summary statistics in the universe of all cases subject to a pretrial release decision
and main estimation sample in the NYC pretrial release data, broken out by
defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
A.6 Summary statistics for released and detained defendants in the universe of all cases
subject to a pretrial release decision and the main estimation sample in the NYC
pretrial release data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.7 Balance check estimates for the quasi-random assignment of judges for all defen-
dants and by defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
A.8 Balance check estimates for the quasi-random assignment of judges by defendant
race and age. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
A.9 Balance check estimates for the quasi-random assignment of judges by defendant
race and felony charge. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
A.10 Location of the maximum studentized violation of revealed preference inequali-
ties among judges whose release decisions are inconsistent with expected utility
maximization behavior at accurate beliefs about any pretrial misconduct risk. . . . 202
A.11 Estimated lower bound on the fraction of judges whose “release on recognizance”
decisions are inconsistent with expected utility maximization behavior at accurate
beliefs about behavior under bail conditions and failure to appear risk given
defendant characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

C.1 Null rejection rate for the test of the null hypothesis H0 : τ̄·t(1, 0; 0) = 0 based upon
the normal asymptotic approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . 222
C.2 Null rejection rate for the test of the null hypothesis H0 : τ̄i·(1, 0; 0) = 0 based upon
the normal asymptotic approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . 230

List of Figures

1.1 Observed failure to appear rate among released defendants and constructed bound
on the failure to appear rate among detained defendants by race-and-age cells for
one judge in New York City. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.2 Estimated bounds on implied prediction mistakes between lowest and highest
predicted failure to appear risk deciles made by judges within each race-by-age cell. 46
1.3 Ratio of total expected social welfare under algorithmic decision rule relative to
release decisions of judges that make detectable prediction mistakes about failure
to appear risk. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.4 Ratio of total expected social welfare under algorithmic decision rule that corrects
prediction mistakes relative to release decisions of judges that make detectable
prediction mistakes about failure to appear risk. . . . . . . . . . . . . . . . . . . . . . 53

3.1 Simulated randomization distribution for τ̄̂(1, 0; 0) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 108
3.2 Rejection probabilities for a test of the null hypothesis H0 : τ̄(1, 0; 0) = 0 and
H0 : τ̄†(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

A.1 Observed failure to appear rate among released defendants and constructed bound
on the failure to appear rate among detained defendants by race-and-felony charge
cells for one judge in New York City. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
A.2 Estimated bounds on implied prediction mistakes between top and bottom pre-
dicted failure to appear risk deciles made by judges within each race-by-felony
charge cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
A.3 Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that make detectable prediction mistakes about failure
to appear risk by defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
A.4 Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that make detectable prediction mistakes about failure to appear risk. 136
A.5 Ratio of total expected social welfare under algorithmic decision rule relative to
release decisions of judges that do not make detectable prediction mistakes about
failure to appear risk. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

A.6 Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes by
defendant race. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.7 Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that do not make detectable prediction mistakes. . . . . . . . . . . . 139
A.8 Histogram of number of cases heard by each judge in the top 25 judges. . . . . . . . 183
A.9 Receiver-operating characteristic (ROC) curves for ensemble prediction functions . 184
A.10 Ratio of total expected social welfare under algorithmic decision rules relative to
observed release decisions of judges that make detectable prediction mistakes over
race-by-felony charge cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
A.11 Ratio of total expected social welfare under full automation decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes over
race-by-felony charge cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
A.12 Fraction of judges whose release decisions are inconsistent with expected utility
maximization behavior at accurate beliefs about failure to appear risk using direct
imputation bounds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
A.13 95% confidence intervals for the implied prediction mistake of failure to appear risk
between the highest and lowest predicted failure to appear risk deciles using direct
imputation bounds with κ = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
A.14 95% confidence intervals for the implied prediction mistakes of any pretrial miscon-
duct risk between the highest and lowest predicted any pretrial misconduct risk
deciles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

C.1 Quantile-quantile plots for the simulated randomization distribution for τ̄̂(1, 0; 0)
under different choices of the parameter ϕ and treatment probability p(w). . . . . . 219
C.2 Simulated randomization distribution for τ̄̂†(1, 0; 1) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 220
C.3 Quantile-quantile plots for the simulated randomization distribution for τ̄̂†(1, 0; 1)
under different choices of the parameter ϕ and treatment probability p(w). . . . . . 221
C.4 Simulated randomization distribution for τ̄̂·t(1, 0; 0) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 223
C.5 Quantile-quantile plots for the simulated randomization distribution for τ̄̂·t(1, 0; 0)
under different choices of the parameter ϕ and treatment probability p(w). . . . . . 224
C.6 Rejection probabilities for a test of the null hypothesis H0 : τ̄·t(1, 0; 0) = 0 and
H0 : τ̄·t†(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
C.7 Rejection probabilities for a test of the null hypothesis H0 : τ̄·t(1, 0; 0) = 0 and
H0 : τ̄·t†(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

C.8 Simulated randomization distribution for τ̄̂·t†(1, 0; 1) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 227
C.9 Quantile-quantile plots for the simulated randomization distribution for τ̄̂·t†(1, 0; 1)
under different choices of the parameter ϕ and treatment probability p(w). . . . . . 228
C.10 Simulated randomization distribution for τ̄̂i·(1, 0; 0) under different choices of
the parameter ϕ and treatment probability p(w). The rows index the parameter
ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈
{0.25, 0.5, 0.75}. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
C.11 Quantile-quantile plots for the simulated randomization distribution for τ̄̂i·(1, 0; 0)
under different choices of the parameter ϕ and treatment probability p(w). . . . . . 230
C.12 Rejection probabilities for a test of the null hypothesis H0 : τ̄i·†(1, 0; 0) = 0 and
H0 : τ̄i·†(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
C.13 Rejection probabilities for a test of the null hypothesis H0 : τ̄i·†(1, 0; 0) = 0 and
H0 : τ̄i·†(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter
ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
C.14 Simulated randomization distribution for τ̄̂i·†(1, 0; 1) under different choices of the
parameter ϕ and treatment probability p(w). . . . . . . . . . . . . . . . . . . . . . . 233
C.15 Quantile-quantile plots for the simulated randomization distribution for τ̄̂i·†(1, 0; 1)
under different choices of the parameter ϕ and treatment probability p(w). . . . . . 234
C.16 Estimates of the weighted average i, t-th lag-0 dynamic causal effect (Definition 3.2.5)
of W = 1{λ ≥ 0.6} on cooperation in period one for two units in the experiment of
Andreoni and Samuelson (2006). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
C.17 Estimates of the time-t lag-p weighted average dynamic causal effect, τ̄·t†(1, 0; p), of
W = 1{λ ≥ 0.6} on cooperation in period one based on the experiment of Andreoni
and Samuelson (2006) for each time period t ∈ [T] and p = 0, 1, 2, 3. . . . . . . . . . 237

Acknowledgments

I have so many people to thank for filling my life as a Ph.D. student with happiness and joy.

I am grateful to my extraordinary advisors for their advice, mentorship, and unflinching support. Sendhil Mullainathan took me under his wing early on, and he has been a constant source of inspiration and fun ever since. A call or text from Sendhil could always put a smile on my face and reignite my passion for research. Isaiah Andrews is my role model in every sense. He constantly made time for me, taught me something new in every conversation, and showed me that brilliance does not conflict with caring, generosity, and kindness. Neil Shephard showed me the ropes of conducting research – from conceiving an idea to writing a draft, and from presentation to publication. My collaboration with Neil was the best apprenticeship I could have imagined. Elie Tamer provided expert advice and helped me see how I fit into the greater econometrics community. Gary Chamberlain patiently answered my many questions and provided thoughtful feedback on my half-baked ideas. He gave me the confidence that I belonged in the econometrics group at Harvard, and for that I am forever grateful.

Beyond my advisors, I am grateful to have had Jens Ludwig, Iavor Bojinov, Alexandra Chouldechova, Ed Glaeser, Pepe Montiel Olea, and Joshua Schwartzstein first as mentors and now as friends. I received invaluable advice from Jim Stock and Larry Katz. My undergraduate advisors at Princeton, Mark Watson and Hank Farber, inspired me to do a Ph.D. in Economics and continued to be a well of support and encouragement. I am fortunate to have collaborated with Jon Kleinberg, Nano Barahona, Matthew Gentzkow, and Jesse Shapiro. Their intellectual rigor is an inspiration, and I continue to learn so much from them.

Jonathan Roth and Amanda Coston were close collaborators, and even better friends. Andrea Hamaui, Francois-Xavier Ladant, Robert Minton, Ljubica Ristovska, Ricardo Rodriguez-Padilla, Giorgio Saponaro, Sagar Saxena, and Roman Sigalov filled my Ph.D. with raucous laughter. I am so grateful we were in the same cohort and to have become their friends.

My family was my bedrock of love and support throughout my Ph.D. My parents, Anant and Geeta, nurtured my dream of doing a Ph.D. in Economics, encouraged me to relentlessly pursue it, and supported me every step of the way. I owe my parents everything. The highlight of my Ph.D. was living within a mile of my brother Akshar and sister-in-law Apoorva for two years. Our weekly visits to Cambridge Common and Mario Kart battles were an oasis of fun amidst classes and research. My sister Ishanaa and brother-in-law James were never more than a FaceTime call away. Many adventure-filled weekend visits to D.C. or San Francisco often meant they were much closer. During my Ph.D., our family grew in size with the birth of my two nieces, Aadyaa and Amiyaa. They are two rays of sunshine that brighten my life every day.

Finally, I was blessed to meet the love of my life, Jessica, during my Ph.D. Her love, support, and humor gave me the strength to persevere, and she means more to me than I can possibly put in words. I am so lucky to have her in my life.
To my Nani, Sohnmatee Balkisoon; and the loving memories of my
Nana, Dibnarine Balkissoon; Ajee, Chanrawtee Rambachan; and Aja,
Deoraj Rambachan.

Introduction

This dissertation contains three chapters in econometrics. A common theme is identification

analysis, with a particular focus on understanding what researchers can learn from data under

weak assumptions on economic behavior and dynamic causal effects.

Decision makers, such as doctors, judges, and managers, make consequential choices based on

predictions of unknown outcomes. In the first chapter, I consider the following questions: Do these

decision makers make systematic prediction mistakes based on the available information? If so, in

what ways are their predictions systematically biased? Uncovering systematic prediction mistakes

is difficult as the preferences and information sets of decision makers are unknown to researchers.

In this paper, I characterize behavioral and econometric assumptions under which systematic

prediction mistakes can be identified in observational empirical settings such as hiring, medical

testing, and pretrial release. I derive a statistical test for whether the decision maker makes

systematic prediction mistakes under these assumptions and show how supervised machine

learning based methods can be used to apply this test. I provide methods for conducting inference

on the ways in which the decision maker’s predictions are systematically biased. As an illustration,

I apply this econometric framework to analyze the pretrial release decisions of judges in New

York City, and I estimate that at least 20% of judges make systematic prediction mistakes about

failure to appear risk given defendant characteristics.

In the second chapter, which is joint work with Neil Shephard, we introduce the nonparametric,

direct potential outcome system as a foundational framework for analyzing dynamic causal effects

of assignments on outcomes in observational time series settings. We place no functional form

restrictions on the potential outcome process nor restrictions on the extent to which past
assignments may causally affect outcomes. Using this framework, we provide conditions under which

common predictive time series estimands, such as the impulse response function, generalized

impulse response function, local projection, and local projection instrumental variables, have a

nonparametric causal interpretation in terms of such dynamic causal effects.

The third chapter, which is joint work with Iavor Bojinov and Neil Shephard, analyzes dynamic

causal effects in panel experiments. In panel experiments, we randomly assign units to different

interventions, measure their outcomes, and repeat the procedure in several periods. Using

the potential outcomes framework, we define finite population dynamic causal effects that capture

the relative effectiveness of alternative treatment paths. For a rich class of dynamic causal effects,

we provide a nonparametric estimator that is unbiased over the randomization distribution and

derive its finite population limiting distribution as either the sample size or the duration of

the experiment increases. We develop two methods for inference: a conservative test for weak

null hypotheses and an exact randomization test for sharp null hypotheses. We further analyze

the finite population probability limit of linear fixed effects estimators. These commonly-used

estimators do not recover a causally interpretable estimand if there are dynamic causal effects and

serial correlation in the assignments, highlighting the value of our proposed estimator.

Chapter 1

Identifying Prediction Mistakes in


Observational Data1

1.1 Introduction

Decision makers, such as doctors, judges, and managers, are commonly tasked with making

consequential choices based on predictions of unknown outcomes. For example, in deciding

whether to detain a defendant awaiting trial, a judge predicts what the defendant will do if released

based on information such as the defendant’s current criminal charge and prior arrest record. Are

these decision makers making systematic prediction mistakes based on this available information?

If so, in what ways are their predictions systematically biased? These foundational questions

in behavioral economics and psychology (e.g., Meehl, 1954; Tversky and Kahneman, 1974) have

renewed policy relevance and empirical life as machine learning based models increasingly replace

or inform decision makers in criminal justice, health care, labor markets, and consumer finance.2

1 I am especially grateful to Isaiah Andrews, Sendhil Mullainathan, Neil Shephard, Elie Tamer and Jens Ludwig
for their invaluable feedback, support, and advice. I thank Alex Albright, Nano Barahona, Laura Blattner, Iavor
Bojinov, Raj Chetty, Bo Cowgill, Will Dobbie, Xavier Gabaix, Matthew Gentzkow, Ed Glaeser, Yannai Gonczarowski,
Larry Katz, Daniel Martin, Ross Mattheis, Robert Minton, Ljubica Ristovska, Jonathan Roth, Suproteem Sarkar, Joshua
Schwartzstein, Jesse Shapiro, Chris Walker, and participants at the Brookings Institution’s Artificial Intelligence
Conference for many useful comments and suggestions. I also thank Hye Chang, Nicole Gillespie, Hays Golden, and
Ellen Louise Dunn for assistance at the University of Chicago Crime Lab. All empirical results based on New York City
pretrial data were originally reported in a University of Chicago Crime Lab technical report (Rambachan and Ludwig,
2021). I acknowledge financial support from the NSF Graduate Research Fellowship (Grant DGE1745303).
2 Risk assessment tools are used in criminal justice systems throughout the United States (Stevenson, 2018; Albright,
2019; Dobbie and Yang, 2019; Stevenson and Doleac, 2019; Yang and Dobbie, 2020). Clinical risk assessments aid

In assessing whether such machine learning based models can improve decision-making,

empirical researchers evaluate decision makers’ implicit predictions through comparisons of their

choices against those made by predictive models.3 Uncovering systematic prediction mistakes from

decisions is challenging, however, as both the decision maker’s preferences and information set

are unknown to us. For example, we do not know how judges assess the cost of pretrial detention.

Judges may uncover useful information through their courtroom interactions with defendants, but

we do not observe these interactions. Hence, the decision maker’s choices may diverge from the

model not because she is making systematic prediction mistakes, but rather she has preferences

that differ from the model’s objective function or observes information that is unavailable to the

model. While this empirical literature recognizes these challenges (e.g., Kleinberg et al., 2018a;

Mullainathan and Obermeyer, 2021), it lacks a unifying econometric framework for analyzing the

decision maker’s choices under the weakest possible assumptions about their preferences and

information sets.

This paper develops such an econometric framework for analyzing whether a decision maker

makes systematic prediction mistakes and for characterizing how predictions are systematically

biased. This clarifies what can (and cannot) be identified about systematic prediction mistakes

from data and empirically relevant assumptions about behavior, and maps those assumptions

into statistical inferences about systematic prediction mistakes. I consider empirical settings, such

as pretrial release, medical treatment or diagnosis, and hiring, in which a decision maker must

make decisions for many individuals based on a prediction of some unknown outcome using

each individual’s characteristics. These characteristics are observable to both the decision maker

and the researcher. The available data on the decision maker’s choices and associated outcomes

suffer from a missing data problem (Heckman, 1974; Rubin, 1976; Heckman, 1979; Manski, 1989):

doctors in diagnostic and treatment decisions (Obermeyer and Emanuel, 2016; Beaulieu-Jones et al., 2019; Abaluck
et al., 2020; Chen et al., 2020). For applications in consumer finance, see Einav et al. (2013), Fuster et al. (2018), Gillis
(2019), Dobbie et al. (2020), and Blattner and Nelson (2021) in economics, and see Khandani et al. (2010), Hardt et al.
(2016), Liu et al. (2018), and Coston et al. (2021) in computer science. For discussions of workforce analytics and resume
screening software, see Autor and Scarborough (2008), Jacob and Lefgren (2008), Rockoff et al. (2011), Feldman et al.
(2015), Hoffman et al. (2018), Erel et al. (2019), Li et al. (2020), Raghavan et al. (2020), and Frankel (2021).
3 See, for example, Kleinberg et al. (2015), Berk et al. (2016), Chalfin et al. (2016), Chouldechova et al. (2018), Cowgill

(2018), Hoffman et al. (2018), Kleinberg et al. (2018a), Erel et al. (2019), Ribers and Ullrich (2019), Li et al. (2020), Jung
et al. (2020a), and Mullainathan and Obermeyer (2021). Comparing a decision maker’s choices against a predictive
model has a long tradition in psychology (e.g., Dawes, 1971, 1979; Dawes et al., 1989; Camerer and Johnson, 1997; Grove
et al., 2000; Kuncel et al., 2013). See Camerer (2019) for a recent review of this literature.

the researcher only observes the outcome conditional on the decision maker’s choices (e.g., the

researcher only observes a defendant’s behavior upon release if a judge released them).4

This paper then makes four main contributions. First, I characterize behavioral and econometric

assumptions under which systematic prediction mistakes can be identified in these empirical

settings. Second, under these assumptions, I derive a statistical test for whether the decision

maker makes systematic prediction mistakes and show how machine learning based models can

be used to apply this test. Third, I provide methods for conducting inference on the ways in which

the decision maker’s predictions are systematically biased. These contributions provide, to my

knowledge, the first microfounded econometric framework for studying systematic prediction

mistakes in these empirical settings, enabling researchers to answer a wider array of behavioral

questions under weaker assumptions than existing empirical research.5 Finally, I apply this

econometric framework to analyze the pretrial release decisions of judges in New York City as an

empirical illustration.

I explore the restrictions imposed on the decision maker’s choices by expected utility
maximization, which models the decision maker as maximizing some (unknown to the researcher)

utility function at beliefs about the outcome given the characteristics as well as some private

information.6,7,8 Due to the missing data problem, the true conditional distribution of the outcome

given the characteristics is partially identified. The expected utility maximization model therefore

4 A large literature explores whether forecasters, households, or individuals have rational expectations in settings
where both outcomes and subjective expectations or forecasts are observed. See, for example, Manski (2004), Elliott
et al. (2005), Elliott et al. (2008), Gennaioli et al. (2016), Bordalo et al. (2020), D’Haultfoeuille et al. (2020), and Farmer et al.
(2021). I focus on settings in which we only observe an individual’s discrete choices and partial information about the
outcome.
5 Appendix A.2 provides a step-by-step user’s guide for empirical researchers interested in applying these methods.
6 Mourifie et al. (2019) and Henry et al. (2020) analyze the testable implications of Roy-style and extended Roy-style

selection. See Heckman and Vytlacil (2006) for an econometric review of Roy selection models. The expected utility
maximization model can be interpreted as a generalized Roy model, and I discuss these connections in Section 1.2.
7 A literature in decision theory explores conditions under which a decision maker’s random choice rule, which
summarizes their choice probabilities in each possible menu of actions, has a random utility model representation. See,
for example, Gul and Pesendorfer (2006), Gul et al. (2014), Lu (2016), and Natenzon (2019). I consider empirical settings
in which we only observe choice probabilities from a single menu.
8 Kubler et al. (2014), Echenique and Saito (2015), Chambers et al. (2016), Polisson et al. (2020), and Echenique et al.

(2021) use revealed preference analysis to characterize expected utility maximization behavior in consumer demand
settings, in which a consumer’s state-contingent consumption choices across several budget sets are observed. See the
review in Echenique (2020).

only restricts the decision maker’s beliefs given the characteristics to lie in this identified set,

what I call “accurate beliefs.” If there exists no utility function in a researcher-specified class

that rationalizes observed choices under this model, I therefore say the decision maker is making

systematic prediction mistakes based on the characteristics of individuals.

I derive a sharp characterization, based on revealed preference inequalities, of the identified

set of utility functions at which the decision maker’s choices are consistent with expected utility

maximization at accurate beliefs. If these revealed preference inequalities are satisfied at a

candidate utility function, then some distribution of private information can be constructed such

that the decision maker cannot do better than their observed choices in an expected utility sense.9

If the identified set of utility functions is empty, then the decision maker is making systematic

prediction mistakes as there is no combination of utility function and private information at which

their observed choices are consistent with expected utility maximization at accurate beliefs.

I then prove that without further assumptions systematic prediction mistakes are untestable. If

either all characteristics of individuals directly affect the decision maker’s utility function or the

missing data can take any value, then the identified set of utility functions is non-empty. Any

variation in the decision maker’s conditional choice probabilities can be rationalized by a utility

function and private information that sufficiently vary across all the characteristics. However,

placing an exclusion restriction on which characteristics may directly affect the utility function

and constructing informative bounds on the missing data restore the testability of expected utility

maximization behavior.10 Under such an exclusion restriction, variation in the decision maker’s

choices across characteristics that do not directly affect the utility function must only arise due

to variation in beliefs. The decision maker’s beliefs given the characteristics and her private

information must further be Bayes-plausible with respect to some distribution of the outcome

given the characteristics that lies in the identified set. Together this implies testable restrictions on

9 By searching for any distribution of private information that rationalizes the decision maker’s choices, my analysis
follows in the spirit of the robust information design literature (e.g., Kamenica and Gentzkow, 2011; Bergemann and
Morris, 2013, 2016, 2019; Kamenica, 2019). Syrgkanis et al. (2018) and Bergemann et al. (2019) use results from this
literature to study multiplayer games, whereas I analyze the choices of a single decision maker. Gualdani and Sinha
(2020) also analyzes single-agent settings under weak assumptions on the information environment.
10 The exclusion restriction on which characteristics may directly affect utility complements recent results on the
validity of “marginal outcome tests” for discrimination (Bohren et al., 2020; Canay et al., 2020; Gelbach, 2021; Hull, 2021).
In the special case of a binary decision and binary outcome, the expected utility maximization model is a generalization
of the extended Roy model analyzed in Canay et al. (2020) and Hull (2021). I formalize these connections in Section 1.3.

the decision maker’s choices across characteristics that do not directly affect utility. Behavioral

assumptions about the decision maker’s utility function and econometric assumptions to address

the missing data problem are therefore sufficient to identify systematic prediction mistakes.

These results clarify what conclusions can be logically drawn about systematic prediction

mistakes given the researcher’s assumptions on the decision maker’s utility function and the

missing data problem. These testable restrictions arise from the joint null hypothesis that the

decision maker maximizes expected utility at accurate beliefs and that their utility function

satisfies the conjectured exclusion restriction.11 If these restrictions are satisfied, we cannot

logically reject that the decision maker’s choices maximize expected utility at some utility function

in the researcher-specified class and beliefs about the outcomes given the characteristics that lie

in the identified set. Stronger conclusions would require stronger assumptions on the decision

maker’s utility function or tighter bounds on the missing data.

Testing these restrictions implied by expected utility maximization at accurate beliefs is

equivalent to a moment inequality problem. This is a well-studied problem in econometrics for

which many procedures are available (e.g., see the reviews by Canay and Shaikh, 2017; Molinari,

2020). The number of moment inequalities grows with the dimensionality of the observable

characteristics of individuals, which will typically be quite large in empirical applications. To

deal with this practical challenge, I discuss how supervised machine learning methods may be

used to reduce the dimension of this testing problem. Researchers may construct a prediction

function for the outcome on held out data and partition the characteristics into percentiles of

predicted risk based on this estimated prediction function. Testing implied revealed preference

inequalities across percentiles of predicted risk is a valid test of the joint null hypothesis that the

decision maker’s choices maximize expected utility at a utility function satisfying the conjectured

exclusion restriction and accurate beliefs. This provides, to my knowledge, the first microfounded

procedure for using supervised machine learning based prediction functions to identify systematic

prediction mistakes.
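
The dimension-reduction step just described can be sketched in a few lines. This is a hedged illustration on synthetic data: the outcome model, the linear probability predictor (standing in for whatever supervised learner a researcher prefers), and the use of deciles are assumptions of the sketch, not the paper’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data (an assumption of the sketch): x is a scalar
# characteristic, y the outcome observed among chosen individuals, and
# true risk is increasing in x.
n = 20_000
x = rng.uniform(size=n)
y = rng.binomial(1, 0.1 + 0.6 * x)

# 1) Estimate a prediction function on a held-out split (here a simple
#    linear probability model as a stand-in for any supervised learner).
held = np.arange(n) < n // 2
X_held = np.column_stack([np.ones(held.sum()), x[held]])
beta, *_ = np.linalg.lstsq(X_held, y[held], rcond=None)

# 2) Predicted risk on the evaluation split.
x_eval, y_eval = x[~held], y[~held]
risk = beta[0] + beta[1] * x_eval

# 3) Partition the evaluation split into deciles of predicted risk.
edges = np.quantile(risk, np.linspace(0, 1, 11))
decile = np.clip(np.digitize(risk, edges[1:-1]), 0, 9)

# Realized outcome rate by predicted-risk decile: the revealed preference
# inequalities are then tested across these ten cells rather than across
# the full characteristic space.
rates = np.array([y_eval[decile == d].mean() for d in range(10)])
print(np.round(rates, 2))
```

With risk monotone in `x`, the realized outcome rates rise across deciles, so the ten cells preserve the ordering information that the test needs.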

With this framework in place, I further establish that the data are informative about how

11 This finding echoes a classic insight in finance that testing whether variation in asset prices reflects violations of

rational expectations requires assumptions about admissible variation in underlying stochastic discount factors. See
Campbell (2003), Cochrane (2011) for reviews and Augenblick and Lazarus (2020) for a recent contribution.

the decision maker’s predictions are systematically biased. I extend the behavioral model to

allow the decision maker to have possibly inaccurate beliefs about the unknown outcome and

sharply characterize the identified set of utility functions at which the decision maker’s choices

are consistent with “inaccurate” expected utility maximization.12 This takes no stand on the

behavioral foundations for the decision maker’s inaccurate beliefs, and so it encompasses various

frictions or mental gaps such as inattention to characteristics or representativeness heuristics (e.g.,

Sims, 2003; Gabaix, 2014; Caplin and Dean, 2015; Bordalo et al., 2016; Handel and Schwartzstein,

2018). I derive bounds on an interpretable parameter that summarizes the extent to which the

decision maker’s beliefs overreact or underreact to the characteristics of individuals. For a fixed

pair of characteristic values, these bounds summarize whether the decision maker’s beliefs about

the outcome vary more (“overreact”) or less (“underreact”) than the true conditional distribution

of the outcome across these values. These bounds again arise because any variation in the decision

maker’s choices across characteristics that do not directly affect utility must only arise due to

variation in beliefs. Comparing observed variation in the decision maker’s choice probabilities

against possible variation in the probability of the outcome is therefore informative about the

extent to which the decision maker’s beliefs are inaccurate.
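
As a numeric illustration of this overreaction/underreaction comparison (the difference ratio and all numbers below are assumptions of the sketch, not the paper’s formal bounds):

```python
# Hypothetical binary-outcome example for a fixed pair of characteristic
# values x and x'.
p_true = {"x": 0.10, "x_prime": 0.50}   # true P(outcome | characteristics)
belief = {"x": 0.20, "x_prime": 0.35}   # decision maker's beliefs

# Beliefs' variation across (x, x') relative to the truth's variation:
# values below 1 indicate underreaction, above 1 overreaction.
reaction = (belief["x_prime"] - belief["x"]) / (p_true["x_prime"] - p_true["x"])
print(reaction < 1)  # beliefs move by 0.15 while true risk moves by 0.40
```

Because the missing data only bound the true conditional distribution, an analysis along these lines delivers bounds on, rather than point identification of, such a reaction parameter.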

As an empirical illustration, I analyze the pretrial release system in New York City, in which

judges decide whether to release defendants awaiting trial based on a prediction of whether

they will fail to appear in court.13 For each judge, I observe the conditional probability that she

releases a defendant given a rich set of characteristics (e.g., race, age, current charge, prior criminal

record, etc.) as well as the conditional probability that a released defendant fails to appear in

court. The conditional failure to appear rate among detained defendants is unobserved due to

the missing data problem. If all defendant characteristics may directly affect the judge’s utility

function or the conditional failure to appear rate among detained defendants may take any value,

then my theoretical results establish that the judge’s release decisions are always consistent with

12 The decision maker’s beliefs about the outcome conditional on the characteristics are no longer required to lie in

the identified set for the conditional distribution of the outcome given the characteristics.
13 Several empirical papers also study the New York City pretrial release system. Leslie and Pope (2017) estimates

the effects of pretrial detention on criminal case outcomes. Arnold et al. (2020b) and Arnold et al. (2020a) estimate
whether judges and pretrial risk assessments respectively discriminate against black defendants. Kleinberg et al. (2018a)
studies whether a statistical risk assessment could improve pretrial outcomes in New York City. I discuss the differences
between my analysis and this prior research in Section 1.5.

expected utility maximization behavior at accurate beliefs. We cannot logically rule out that the

judge’s release decisions reflect either a utility function that varies richly based on defendant

characteristics or sufficiently predictive private information absent further assumptions.

However, empirical researchers often assume that while judges may engage in taste-based

discrimination on a defendant’s race, other defendant characteristics such as prior pretrial
misconduct history only affect judges’ beliefs about failure to appear risk. Judges in New York City

are quasi-randomly assigned to defendants, which implies bounds on the conditional failure to

appear rate among detained defendants. Given such exclusion restrictions and quasi-experimental

bounds on the missing data, expected utility maximization behavior is falsified by “misrankings”

in the judge’s release decisions. Holding fixed defendant characteristics that may directly affect

utility (e.g., among defendants of the same race), do all defendants released by the judge have a

lower observed failure to appear rate than the researcher’s upper bound on the failure to appear

rate of all defendants detained by the judge? If not, there is no combination of a utility function

that satisfies the conjectured exclusion restriction nor private information such that the judge’s

choices maximize expected utility at accurate beliefs about failure to appear risk given defendant

characteristics.
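
The misranking check described in this paragraph can be sketched as follows. This is a stylized version: the cell structure and all numbers are hypothetical, and a single scalar upper bound per cell is a simplification of the quasi-experimental bounds.

```python
import numpy as np

# Stylized inputs for one judge and one fixed value of the utility-relevant
# characteristic w (e.g., one race group). Each entry is a cell x of the
# excluded characteristics; all numbers are hypothetical.
fta_released = np.array([0.05, 0.12, 0.31, 0.08])     # observed FTA rate, released
ub_fta_detained = np.array([0.40, 0.22, 0.28, 0.35])  # upper bound on FTA rate, detained

def misranking(fta_rel, ub_det):
    """Expected utility maximization at accurate beliefs implies every
    released group is (weakly) lower risk than every detained group, so a
    violation occurs whenever some released cell's observed FTA rate
    exceeds the upper bound on some detained cell's FTA rate."""
    return fta_rel.max() > ub_det.min()

# Here the third cell's released rate (0.31) exceeds the second cell's
# detained upper bound (0.22), so the judge is misranking.
print(misranking(fta_released, ub_fta_detained))
```
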

By testing for such misrankings in the pretrial release decisions of individual judges, I estimate,

as a lower bound, that at least 20% of judges in New York City from 2008-2013 make systematic

prediction mistakes about failure to appear risk based on defendant characteristics. Under a

range of exclusion restrictions and quasi-experimental bounds on the failure to appear rate among

detained defendants, there exists no utility function nor distribution of private information such

that the release decisions of these judges would maximize expected utility at accurate beliefs

about failure to appear risk. I further find that these systematic prediction mistakes arise because

judges’ beliefs underreact to variation in failure to appear risk based on defendant characteristics

between predictably low risk and predictably high risk defendants. Rejections of expected utility

maximization behavior at accurate beliefs are therefore driven by release decisions on defendants

at the tails of the predicted risk distribution.

Finally, to highlight policy lessons from this behavioral analysis, I explore the implications of

replacing decision makers with algorithmic decision rules in the New York City pretrial release

setting. Since supervised machine learning methods are tailored to deliver accurate predictions

(Mullainathan and Spiess, 2017; Athey, 2017), such algorithmic decision rules may improve

outcomes by correcting systematic prediction mistakes. I show that expected social welfare under

a candidate decision rule is partially identified, and inference on counterfactual expected social

welfare can be reduced to testing moment inequalities with nuisance parameters that enter the

moments linearly (e.g., see Gafarov, 2019; Andrews et al., 2019; Cho and Russell, 2020; Cox and

Shi, 2020). Using these results, I then estimate the effects of replacing judges who were found to

make systematic prediction mistakes with an algorithmic decision rule. Automating decisions

only where systematic prediction mistakes occur at the tails of the predicted risk distribution

weakly dominates the status quo, and can lead to up to 20% improvements in worst-case expected

social welfare, which is measured as a weighted average of the failure to appear rate among

released defendants and the pretrial detention rate. Automating decisions whenever the human

decision maker makes systematic prediction mistakes can therefore be a free lunch.14 Fully

replacing judges with the algorithmic decision rule, however, has ambiguous effects that depend

on the parametrization of social welfare. In fact, for some parametrizations of social welfare, I

find that fully automating decisions can lead to up to 25% reductions in worst-case expected

social welfare relative to the judges’ observed decisions. More broadly, designing algorithmic

decision rules requires carefully assessing their predictive accuracy and their effects on disparities

across groups (e.g., Barocas et al., 2019; Mitchell et al., 2019; Chouldechova and Roth, 2020). These

findings highlight that it is also essential to analyze whether the existing decision makers make
systematic prediction mistakes and, if so, on what decisions.

This paper relates to a large empirical

literature that evaluates decision makers’ implicit predictions through either comparisons of their

choices against those made by machine learning based models or estimating structural models of

decision making in particular empirical settings. While the challenges of unknown preferences

and information sets are recognized, researchers typically resort to strong assumptions. Kleinberg

et al. (2018a) and Mullainathan and Obermeyer (2021) restrict preferences to be constant across

both decisions and decision makers. Lakkaraju and Rudin (2017), Chouldechova et al. (2018),

14 This finding relates to a computer science literature on “human-in-the-loop” analyses of algorithmic decision
support systems (e.g., Tan et al., 2018; Green and Chen, 2019a,b; De-Arteaga et al., 2020; Hilgard et al., 2021). Recent
methods estimate whether a decision should be automated by an algorithm or instead be deferred to an existing decision
maker (Madras et al., 2018; Raghu et al., 2019; Wilder et al., 2020; De-Arteaga et al., 2021). I show that understanding
whether to automate or defer requires assessing whether the decision maker makes systematic prediction mistakes.

Coston et al. (2020), and Jung et al. (2020a) assume that observed choices were as-good-as randomly

assigned given the characteristics, eliminating the problem of unknown information sets. Recent

work introduces parametric models for the decision maker’s private information, such as Abaluck

et al. (2016), Arnold et al. (2020b), Jung et al. (2020b), and Chan et al. (2021). See also Currie

and Macleod (2017), Ribers and Ullrich (2020), and Marquardt (2021). I develop an econometric

framework for studying systematic prediction mistakes that only requires exclusion restrictions on

which characteristics affect the decision maker’s preferences but no further restrictions. I model

the decision maker’s information environment fully nonparametrically. This enables researchers

to both identify and characterize systematic prediction mistakes in many empirical settings under

weaker assumptions than existing research.

My identification results build on a growing literature in microeconomic theory that derives

the testable implications of various behavioral models in “state-dependent stochastic choice

(SDSC) data” (Caplin and Martin, 2015; Caplin and Dean, 2015; Caplin, 2016; Caplin et al., 2020;

Caplin and Martin, 2021). While useful in analyzing lab-based experiments, such characterization

results have had limited applicability so far due to the difficulty of collecting such SDSC data

(Gabaix, 2019; Rehbeck, 2020). I focus on common empirical settings in which the data suffer

from a missing data problem, and show that these settings can approximate ideal SDSC data

by using quasi-experimental variation to address the missing data problem. Quasi-experimental

variation in observational settings such as pretrial release, medical treatment or diagnosis, and

hiring can therefore solve the challenge of “economic data engineering” recently laid out by

Caplin (2021). Martin and Marx (2021) study the identification of taste-based discrimination

by a decision maker in a binary choice experiment, providing bounds on the decision maker’s

group-dependent threshold rule. The setting I consider nests theirs by allowing for several key

features of observational data such as missing data, multi-valued outcomes, and multiple choices.

Lu (2019) shows that a decision maker’s state-dependent utilities and beliefs can be identified

provided choice probabilities across multiple informational treatments are observed.

1.2 An Empirical Model of Expected Utility Maximization

A decision maker makes choices for many individuals based on a prediction of an unknown

outcome using each individual’s characteristics. Under what conditions do the decision maker’s

choices maximize expected utility at some (unknown to us) utility function and accurate beliefs

given the characteristics and some private information?

1.2.1 Setting and Observable Data

A decision maker makes choices for many individuals based on predictions of an unknown outcome. The decision maker selects a binary choice $c \in \{0, 1\}$ for each individual. Each individual is summarized by characteristics $(w, x) \in \mathcal{W} \times \mathcal{X}$ and potential outcomes $\vec{y} := (y_0, y_1)$.15 The potential outcome $y_c \in \mathcal{Y}$ is the outcome that would occur if the decision maker were to select choice $c \in \{0, 1\}$. The characteristics and potential outcomes have finite support, and I denote $d_w := |\mathcal{W}|$, $d_x := |\mathcal{X}|$. The random vector $(W, X, C, \vec{Y}) \sim P$ defined over $\mathcal{W} \times \mathcal{X} \times \{0, 1\} \times \mathcal{Y}^2$ summarizes the joint distribution of the characteristics, the decision maker’s choices, and the potential outcomes across all individuals. I assume $P(W = w, X = x) \geq \delta$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ for some $\delta > 0$ throughout the paper.

The researcher observes the characteristics of each individual as well as the decision maker’s choice. There is, however, a missing data problem: the researcher only observes the potential outcome associated with the choice selected by the decision maker (Rubin, 1974; Holland, 1986). Defining the observable outcome as $Y := CY_1 + (1 - C)Y_0$, the researcher therefore observes the joint distribution $(W, X, C, Y) \sim P$. I assume the researcher knows this population distribution with certainty to focus on the identification challenges in this setting. The researcher observes the decision maker’s conditional choice probabilities
$$\pi_c(w, x) := P(C = c \mid W = w, X = x) \quad \text{for all } (w, x) \in \mathcal{W} \times \mathcal{X}$$
as well as the conditional potential outcome probabilities
$$P_{Y_c}(y_c \mid c, w, x) := P(Y_c = y_c \mid C = c, W = w, X = x) \quad \text{for all } (w, x) \in \mathcal{W} \times \mathcal{X},\; c \in \{0, 1\}.$$

15 The characteristics (w, x) ∈ W × X will play different roles in the expected utility maximization model, and so I
introduce separate notation now.
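To fix ideas, the observable objects defined above can be computed directly from individual-level data. The simulation below is purely illustrative: the column names, sample size, and choice/outcome probabilities are hypothetical, not taken from the paper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical characteristics: w is payoff-relevant, x is excluded.
df = pd.DataFrame({
    "w": rng.integers(0, 2, n),
    "x": rng.integers(0, 3, n),
})
# Decision maker's binary choice C and the observed outcome Y.
df["c"] = rng.binomial(1, 0.3 + 0.2 * df["w"])
df["y"] = rng.binomial(1, 0.5, n)

# Conditional choice probabilities pi_1(w, x) = P(C = 1 | W = w, X = x).
pi_1 = df.groupby(["w", "x"])["c"].mean()

# Conditional outcome probabilities P(Y = 1 | C = c, W = w, X = x);
# only the potential outcome of the chosen arm is ever observed.
p_y = df.groupby(["w", "x", "c"])["y"].mean()
```

Only the outcome probabilities for the realized choice are identified here; the counterfactual arm must be bounded, as formalized in Assumption 1.2.1 below.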

The counterfactual potential outcome probabilities P_{Y0}(y0 | 1, w, x), P_{Y1}(y1 | 0, w, x) are not observed
due to the missing data problem.16 As a consequence, the choice-dependent potential outcome
probabilities

P_{Y⃗}(y⃗ | c, w, x) := P(Y⃗ = y⃗ | C = c, W = w, X = x)

are unobserved for all (w, x) ∈ W × X, c ∈ {0, 1}.

Notation: For a finite set A, let ∆(A) denote the set of all probability distributions on A. For
c ∈ {0, 1}, let P_{Y⃗}( · | c, w, x) ∈ ∆(Y²) denote the vector of conditional potential outcome probabilities
given C = c and characteristics (w, x) ∈ W × X. For any pair c, c̃ ∈ {0, 1}, I write P_{Y_c}(y_c | c̃, w, x) :=
P(Y_c = y_c | C = c̃, W = w, X = x), with P_{Y_c}( · | c̃, w, x) ∈ ∆(Y) the distribution of the potential
outcome Y_c given C = c̃ and characteristics (w, x). Analogously, P_{Y⃗}(y⃗ | w, x) := P(Y⃗ = y⃗ | W =
w, X = x) with P_{Y⃗}( · | w, x) ∈ ∆(Y²), and P_{Y_c}(y_c | w, x) := P(Y_c = y_c | W = w, X = x)
with P_{Y_c}( · | w, x) ∈ ∆(Y), denote the distributions of potential outcomes given the characteristics
(w, x) ∈ W × X.

Throughout the paper, I model the researcher’s assumptions about the missing data problem

in the form of bounds on the choice-dependent potential outcome probabilities.

Assumption 1.2.1 (Bounds on Missing Data). For each c ∈ {0, 1} and (w, x) ∈ W × X, there exists a
known subset B_{c,w,x} ⊆ ∆(Y²) satisfying

(i) P_{Y⃗}( · | c, w, x) ∈ B_{c,w,x},

(ii) ∑_{y_c̃ ∈ Y} P̃_{Y⃗}( · , y_c̃ | c, w, x) = P_{Y_c}( · | c, w, x) for all P̃_{Y⃗}( · | c, w, x) ∈ B_{c,w,x}.

Let B_{w,x} denote the collection {B_{0,w,x}, B_{1,w,x}} and B denote the collection of B_{w,x} at all (w, x) ∈ W × X.

In some cases, researchers may wish to analyze the decision maker's choices without placing
any further assumptions on the missing data, which corresponds to setting B_{c,w,x} equal to the
set of all choice-dependent potential outcome probabilities that are consistent with the joint
distribution of the observable data (W, X, C, Y) ∼ P. In other cases, researchers may use quasi-
experimental variation or introduce additional assumptions to provide informative bounds on the
choice-dependent potential outcome probabilities, as I discuss in Section 1.3.
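Condition (ii) of Assumption 1.2.1 is a marginal-consistency requirement: any candidate joint distribution in B_{c,w,x} must reproduce the observed distribution of the revealed arm. A minimal numerical sketch for a binary outcome (all probabilities hypothetical):

```python
import numpy as np

# Candidate joint distribution P(Y0 = y0, Y1 = y1 | C = 1, w, x),
# rows indexed by y0, columns by y1, for a binary outcome.
p_joint = np.array([[0.3, 0.4],
                    [0.1, 0.2]])

# Observed margin P(Y1 = y1 | C = 1, w, x): given C = 1, the arm Y1 is revealed.
p_y1_obs = np.array([0.4, 0.6])

def satisfies_margin(p_joint, p_obs):
    """Check Assumption 1.2.1(ii): summing out the unobserved arm (y0)
    must recover the observed distribution of the revealed arm (y1)."""
    return bool(np.isclose(p_joint.sum(), 1.0)
                and np.allclose(p_joint.sum(axis=0), p_obs))

satisfies_margin(p_joint, p_y1_obs)  # True for this candidate
```

Candidates that violate the margin, for example one placing all mass on y1 = 0, would be excluded from B_{1,w,x} regardless of what they assume about the unobserved arm.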

16 I adopt the convention that P(Y⃗ = y⃗ | C = c, W = w, X = x) = 0 if P(C = c | W = w, X = x) = 0.

Under Assumption 1.2.1, various features of the joint distribution (W, X, C, Y⃗) ∼ P are partially
identified. The sharp identified set for the distribution of the potential outcome vector given the
characteristics (w, x) ∈ W × X, denoted by H_P(P_{Y⃗}( · | w, x); B_{w,x}), equals the set of P̃_{Y⃗}( · | w, x) ∈
∆(Y²) satisfying, for all y⃗ ∈ Y²,

P̃_{Y⃗}(y⃗ | w, x) = P̃_{Y⃗}(y⃗ | 0, w, x)·π_0(w, x) + P̃_{Y⃗}(y⃗ | 1, w, x)·π_1(w, x)

for some P̃_{Y⃗}( · | 0, w, x) ∈ B_{0,w,x}, P̃_{Y⃗}( · | 1, w, x) ∈ B_{1,w,x}. If the bounds B_{0,w,x}, B_{1,w,x} are singletons for
all (w, x) ∈ W × X, then the joint distribution (W, X, C, Y⃗) ∼ P is point identified by the observable
data and the researcher's assumptions.
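In a binary screening decision (developed in Section 1.3), this mixture characterization reduces to familiar Manski-style bounds on P(Y* = 1 | w, x): the observed arm is mixed with the researcher's interval for the unobserved arm. A sketch, with hypothetical numbers:

```python
def outcome_prob_bounds(pi_1, p1_obs, p0_lo=0.0, p0_hi=1.0):
    """Sharp bounds on P(Y* = 1 | w, x) in a binary screening decision:
    mix the observed P(Y* = 1 | C = 1, w, x) = p1_obs with the researcher's
    interval [p0_lo, p0_hi] for the unobserved P(Y* = 1 | C = 0, w, x)."""
    pi_0 = 1.0 - pi_1
    return pi_1 * p1_obs + pi_0 * p0_lo, pi_1 * p1_obs + pi_0 * p0_hi

# Worst-case (no-assumption) bounds at pi_1 = 0.6, p1_obs = 0.2:
lo, hi = outcome_prob_bounds(0.6, 0.2)  # interval (0.12, 0.52)
```

When the interval for the unobserved arm collapses to a point, the two bounds coincide, matching the point-identification remark above.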

The main text of the paper assumes that the decision maker faces only two choices and that

the characteristics have finite support. My results directly extend to settings with multiple choices,

and I provide an extension to the case with continuous characteristics in Supplement A.6.

1.2.2 Motivating Empirical Applications

In this section, I illustrate how several motivating empirical applications map into this setting.

Example 1.2.1 (Medical Treatment). A doctor decides what medical treatment to give a patient based on a
prediction of how that medical treatment will affect the patient's health outcomes (Chandra and Staiger, 2007;
Manski, 2017; Currie and Macleod, 2017; Abaluck et al., 2020; Currie and Macleod, 2020). For example,
a doctor decides whether to give reperfusion therapy C ∈ {0, 1} to an admitted patient who has suffered
a heart attack (Chandra and Staiger, 2020). The potential outcomes Y0, Y1 ∈ {0, 1} denote whether the
patient would have died within 30 days of admission had the doctor withheld or given reperfusion therapy,
respectively. The characteristics (W, X) summarize rich information about the patient that is available
at the time of admission, such as demographic information, collected vital signs, and the patient's prior
medical history. The researcher observes the doctor's conditional treatment probability π_1(w, x), the 30-day
mortality rate among patients that received reperfusion therapy P_{Y1}(1 | 1, w, x), and the 30-day mortality
rate among patients that did not receive reperfusion therapy P_{Y0}(1 | 0, w, x). The counterfactual 30-day
mortality rates P_{Y0}(1 | 1, w, x) and P_{Y1}(1 | 0, w, x) are not observed. ▲

In a large class of empirical applications, the decision maker's choice does not have a direct
causal effect on the outcome of interest, but still generates a missing data problem through
selection. In these settings, the potential outcome given C = 0 satisfies Y0 ≡ 0, and Y1 := Y* is
a latent outcome that is revealed whenever the decision maker selects choice C = 1. Hence, the
observable outcome is Y := C·Y*. These screening decisions are a leading class of prediction policy
problems (Kleinberg et al., 2015).

Example 1.2.2 (Pretrial Release). A judge decides whether to detain or release defendants C ∈ {0, 1}
awaiting trial (Arnold et al., 2018; Kleinberg et al., 2018a; Arnold et al., 2020b). The latent outcome
Y* ∈ {0, 1} is whether a defendant would commit pretrial misconduct if released. The characteristics
(W, X) summarize information about the defendant that is available at the pretrial release hearing, such as
demographic information, the current charges filed against the defendant, and the defendant's prior criminal
record. The researcher observes the characteristics of each defendant, whether the judge released them, and,
only if the judge released them, whether the defendant committed pretrial misconduct. The judge's conditional
release rate π_1(w, x) and conditional pretrial misconduct rate among released defendants P_{Y*}(1 | 1, w, x)
are observed. The conditional pretrial misconduct rate among detained defendants P_{Y*}(1 | 0, w, x) is
unobserved. ▲

Example 1.2.3 (Medical Testing and Diagnosis). A doctor decides whether to conduct a costly medical
test or make a particular diagnosis (Abaluck et al., 2016; Ribers and Ullrich, 2019; Chan et al., 2021).
For example, shortly after an emergency room visit, a doctor decides whether to conduct a stress test
C ∈ {0, 1} on patients to determine whether they had a heart attack (Mullainathan and Obermeyer, 2021).
The latent outcome Y* ∈ {0, 1} is whether the patient had a heart attack. The characteristics (W, X)
summarize information that is available about the patient, such as their demographics, reported symptoms,
and prior medical history. The researcher observes the characteristics of each patient, whether the doctor
conducted a stress test, and, only if the doctor conducted a stress test, whether the patient had a heart attack.
The doctor's conditional stress testing rate π_1(w, x) and the conditional heart attack rate among stress-tested
patients P_{Y*}(1 | 1, w, x) are observed. The conditional heart attack rate among untested patients
P_{Y*}(1 | 0, w, x) is unobserved. ▲

Example 1.2.4 (Hiring). A hiring manager decides whether to hire job applicants C ∈ {0, 1} (Autor and
Scarborough, 2008; Chalfin et al., 2016; Hoffman et al., 2018; Frankel, 2021).17 The latent outcome Y* ∈ Y
is some measure of on-the-job productivity, such as length of tenure, since turnover is costly. The
characteristics (W, X) are various pieces of information about the applicant, such as demographics, education
level, and prior work history. The researcher observes the characteristics of each applicant, whether the
manager hired the applicant, and, only if hired, their length of tenure. The manager's conditional hiring rate
π_1(w, x) and the conditional distribution of tenure lengths among hired applicants P_{Y*}(y* | 1, w, x) are
observed. The distribution of tenure lengths among rejected applicants P_{Y*}(y* | 0, w, x) is unobserved. ▲

17 The setting also applies to job interview decisions (Cowgill, 2018; Li et al., 2020), where the choice C ∈ {0, 1} is
whether to interview an applicant and the outcome Y* ∈ {0, 1} is whether the applicant is ultimately hired by the firm.

Other examples of a screening decision include loan approvals (e.g., Fuster et al., 2018; Dobbie

et al., 2020; Blattner and Nelson, 2021; Coston et al., 2021), child welfare screenings (Chouldechova

et al., 2018), and disability insurance screenings (e.g., Benitez-Silva et al., 2004; Low and Pistaferri,

2015, 2019).

1.2.3 Expected Utility Maximization Behavior

I examine the restrictions imposed on the decision maker's choices by expected utility maximization.
I define the two main ingredients of the expected utility maximization model. A utility
function summarizes the decision maker's payoffs over choices, outcomes, and characteristics.
Private information is some additional random variable V ∈ V that summarizes all additional
information that is available to the decision maker but unobserved by the researcher.

Definition 1.2.1. A utility function U : {0, 1} × Y² × W → R specifies the payoff associated with each
choice-outcome pair, where U(c, y⃗; w) is the payoff associated with choice c and potential outcome vector y⃗
at characteristics w ∈ W. Let U denote the feasible set of utility functions specified by the researcher.

Definition 1.2.2. The decision maker's private information is a random variable V ∈ V.

Under the model, the decision maker observes the characteristics (W, X) as well as some
private information V ∈ V prior to selecting a choice. In empirical settings, researchers often
worry that the decision maker observes additional private information that is not recorded in
the observable data. Doctors may learn useful information about the patient's current health in
an exam. Judges may learn useful information about defendants from courtroom interactions.
But these interactions are often not recorded. Since it is unobservable to the researcher, I explore
restrictions on the decision maker's behavior without placing distributional assumptions on their
private information.

Based on this information set, the decision maker forms beliefs about the unknown outcome
and selects a choice to maximize expected utility. The expected utility maximization model is
summarized by a joint distribution over the characteristics, private information, choices, and
potential outcomes, denoted by (W, X, C, V, Y⃗) ∼ Q.

Definition 1.2.3. The decision maker's choices are consistent with expected utility maximization if there
exists a utility function U ∈ U and joint distribution (W, X, V, C, Y⃗) ∼ Q satisfying

i. Expected Utility Maximization: For all c ∈ {0, 1}, c′ ≠ c, (w, x, v) ∈ W × X × V such that
Q(c | w, x, v) > 0,

E_Q[U(c, Y⃗; W) | W = w, X = x, V = v] ≥ E_Q[U(c′, Y⃗; W) | W = w, X = x, V = v].

ii. Information Set: C ⊥⊥ Y⃗ | W, X, V under Q.

iii. Data Consistency: For all (w, x) ∈ W × X, there exist P̃_{Y⃗}( · | 0, w, x) ∈ B_{0,w,x} and
P̃_{Y⃗}( · | 1, w, x) ∈ B_{1,w,x} satisfying, for all y⃗ ∈ Y² and c ∈ {0, 1},

Q(w, x, c, y⃗) = P̃_{Y⃗}(y⃗ | c, w, x) · P(c, w, x).

Definition 1.2.4. The identified set of utility functions, denoted by H_P(U; B) ⊆ U, is the set of utility
functions U ∈ U such that there exists a joint distribution (W, X, V, C, Y⃗) ∼ Q satisfying Definition 1.2.3.

In words, the decision maker's choices are consistent with expected utility maximization if
three conditions are satisfied. First, if the decision maker selects a choice c with positive probability
given W = w, X = x, V = v under the model Q, then it must have been optimal to do so ("Expected
Utility Maximization"). The decision maker may flexibly randomize across choices whenever they
are indifferent. Second, the decision maker's choices must be independent of the outcome given
the characteristics and private information under the model Q ("Information Set"), formalizing
the sense in which the decision maker's information set consists of only (W, X, V).18 Finally, the
joint distribution of characteristics, choices, and outcomes under the model Q must be consistent
with the observable joint distribution P ("Data Consistency").

18 The "Information Set" condition is related to sensitivity analyses in causal inference that assume there is some
unobserved confounder such that the decision is unconfounded only conditional on both the observable characteristics
and the unobserved confounder (e.g., Rosenbaum, 2002; Imbens, 2003; Kallus and Zhou, 2018; Yadlowsky et al., 2020).
See Supplement A.7.1 for further discussion of this connection.

The key restriction is that only the characteristics W ∈ W directly enter the decision maker's
utility function. The decision maker's utility function satisfies an exclusion restriction on their
private information V ∈ V and the characteristics X: the private information V ∈ V and the
characteristics X ∈ X may affect only their beliefs. In medical treatment, the utility function
specifies the doctor's payoffs from treating a patient given their potential health outcomes.
Researchers commonly assume that a doctor's payoffs are constant across patients, and patient
characteristics only affect beliefs about potential health outcomes under the treatment (e.g.,
Chandra and Staiger, 2007, 2020).19 In pretrial release, the utility function specifies a judge's
relative payoffs from detaining a defendant that would not commit pretrial misconduct and
releasing a defendant that would commit pretrial misconduct. These payoffs may vary based
on only some defendant characteristics W. For example, the judge may engage in taste-based
discrimination against black defendants (Becker, 1957; Arnold et al., 2018, 2020b), be more lenient
towards younger defendants (Stevenson and Doleac, 2019), or be more harsh towards defendants
charged with violent crimes (Kleinberg et al., 2018a).

Since this is a substantive economic assumption, I discuss three ways to specify such exclusion

restrictions on the decision maker’s utility function. First, as mentioned, exclusion restrictions on

the decision maker’s utility function are common in existing empirical research. The researcher

may therefore appeal to established modelling choices to guide this assumption. Second, the

exclusion restriction may be normatively motivated, summarizing social or legal restrictions on

what observable characteristics ought not to directly enter the decision maker’s utility function.

Third, the researcher may conduct a sensitivity analysis and report how their conclusions vary

as the choice of payoff-relevant characteristics varies. Such a sensitivity analysis summarizes

how flexible the decision maker’s utility function must be across observable characteristics to

rationalize choices.

If Definition 1.2.3 is satisfied, then the decision maker's implied beliefs about the outcome
given the observable characteristics under the expected utility maximization model, denoted
by Q_{Y⃗}( · | w, x) ∈ ∆(Y²), lie in the identified set for the distribution of the outcome given the

19 Similarly, in medical testing and diagnosis decisions, researchers assume that a doctor’s preferences are constant

across patients, and patient characteristics only affect beliefs about the probability of an underlying medical condition
(e.g., Abaluck et al., 2016; Chan et al., 2021; Mullainathan and Obermeyer, 2021).

observable characteristics. This is an immediate consequence of Data Consistency in Definition

1.2.3.

Lemma 1.2.1. If the decision maker's choices are consistent with expected utility maximization, then
Q_{Y⃗}( · | w, x) ∈ H_P(P_{Y⃗}( · | w, x); B_{w,x}) for all (w, x) ∈ W × X.

Therefore, if the decision maker's choices are consistent with expected utility maximization, then
their implied beliefs Q_{Y⃗}( · | w, x) must be "accurate" in this sense. Conversely, if the decision

maker’s choices are inconsistent with expected utility maximization, then there is no configuration

of utility function and private information such that their choices would maximize expected utility

given any implied beliefs in the identified set for the distribution of the outcome conditional on

the characteristics. In this case, their implied beliefs are systematically mistaken.

Definition 1.2.5. The decision maker is making detectable prediction mistakes based on the observable
characteristics if their choices are inconsistent with expected utility maximization, meaning H_P(U; B) = ∅.

I refer to this as a "detectable prediction mistake" as the interpretation of a prediction mistake
under Definitions 1.2.3–1.2.5 is tied to both the researcher-specified bounds on the missing data
B_{0,w,x}, B_{1,w,x} (Assumption 1.2.1) and the feasible set of utility functions U (Definition 1.2.1). Less
informative bounds on the missing data imply that expected utility maximization places fewer
restrictions on behavior, as there are more candidate values of the missing choice-dependent
potential outcome probabilities that may rationalize choices. Observed behavior that is consistent
with expected utility maximization at bounds B_{0,w,x}, B_{1,w,x} may, in fact, be inconsistent
with expected utility maximization at alternative, tighter bounds B̃_{0,w,x}, B̃_{1,w,x}.20 Analogously, a
larger feasible set of utility functions U implies that expected utility maximization places fewer
restrictions on behavior, as the researcher is entertaining a larger set of utility functions that may
rationalize choices. Definition 1.2.5 must therefore be interpreted as a prediction mistake that can
be detected given the researcher's assumptions on both the missing data and the feasible set of
utility functions.

20 Consider an extreme case in which P_{Y⃗}( · | w, x) is partially identified under bounds B_{0,w,x}, B_{1,w,x} but point
identified under alternative bounds B̃_{0,w,x}, B̃_{1,w,x}. Under Definitions 1.2.3–1.2.5, a detectable prediction mistake at
bounds B̃_{0,w,x}, B̃_{1,w,x} means that the decision maker's implied beliefs Q_{Y⃗}( · | w, x) do not equal the point identified
quantity P_{Y⃗}( · | w, x), whereas a detectable prediction mistake at bounds B_{0,w,x}, B_{1,w,x} means that the decision maker's
implied beliefs Q_{Y⃗}( · | w, x) do not lie in the identified set H_P(P_{Y⃗}( · | w, x); B_{w,x}).

Remark 1.2.1. The expected utility maximization model relates to recent developments on Roy-style selection
(Mourifie et al., 2019; Henry et al., 2020) and marginal outcome tests for taste-based discrimination
(Bohren et al., 2020; Canay et al., 2020; Gelbach, 2021; Hull, 2021). Defining the expected benefit functions

Λ_0(w, x, v) = E_Q[U(0, Y⃗; W) | W = w, X = x, V = v],  Λ_1(w, x, v) = E_Q[U(1, Y⃗; W) | W = w, X = x, V = v],

the expected utility maximization model is a generalized Roy model that imposes that the observable
characteristics W ∈ W enter into the utility function and affect beliefs, whereas the observable characteristics
X ∈ X and private information V ∈ V affect only beliefs. The expected utility maximization model also
takes no stand on how the decision maker resolves indifferences, and so it is an incomplete econometric
model of decision making in the spirit of Tamer (2003). ■

1.2.4 Characterization Result

The decision maker's choices are consistent with expected utility maximization if and only if
there exists a utility function U ∈ U and values of the missing data that satisfy a series of revealed
preference inequalities.

Theorem 1.2.1. The decision maker's choices are consistent with expected utility maximization if and
only if there exists a utility function U ∈ U and P̃_{Y⃗}( · | 0, w, x) ∈ B_{0,w,x}, P̃_{Y⃗}( · | 1, w, x) ∈ B_{1,w,x} for all
(w, x) ∈ W × X satisfying

E_Q[U(c, Y⃗; W) | C = c, W = w, X = x] ≥ E_Q[U(c′, Y⃗; W) | C = c, W = w, X = x]  (1.1)

for all c ∈ {0, 1}, (w, x) ∈ W × X with π_c(w, x) > 0 and c′ ≠ c, where the joint distribution
(W, X, C, Y⃗) ∼ Q is given by Q(w, x, c, y⃗) = P̃_{Y⃗}(y⃗ | c, w, x)·P(c, w, x).

Corollary 1.2.1. The identified set of utility functions H_P(U; B) is the set of all utility functions U ∈ U
such that there exists P̃_{Y⃗}( · | 0, w, x) ∈ B_{0,w,x}, P̃_{Y⃗}( · | 1, w, x) ∈ B_{1,w,x} for all (w, x) ∈ W × X satisfying
(1.1).

Theorem 1.2.1 provides a necessary and sufficient characterization of expected utility maximization

behavior that only involves the data and the bounds on the choice-dependent potential outcome

probabilities. Importantly, the characterization no longer depends on the decision maker’s private

information, which allows me to derive interpretable conditions on behavior in decision problems

of interest and statistically test whether these inequalities are satisfied. In the next section, I use

this result to analyze the large class of screening decisions.
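To illustrate, in the binary screening case developed in Section 1.3, the inequalities (1.1) collapse to a threshold check: under the normalization U(0, 1; w) = U(1, 0; w) = 0 with U(0, 0; w), U(1, 1; w) < 0, selecting C = 1 is optimal when the belief that Y* = 1 is below τ(w) = U(0, 0; w)/(U(0, 0; w) + U(1, 1; w)). The helper below is a hypothetical sketch, not code from the paper; it evaluates the unobserved arm at the upper bound of the researcher's interval, the most favorable value for rationalizing choices.

```python
def rationalizes(u00, u11, p1_by_x, p0_hi_by_x):
    """Check the revealed-preference inequalities (1.1) at a fixed w in a
    binary screening decision for one candidate strict-preference utility
    (u00 = U(0,0;w) < 0, u11 = U(1,1;w) < 0).

    p1_by_x:    observed P(Y* = 1 | C = 1, w, x) for each x with pi_1 > 0
    p0_hi_by_x: upper bounds on P(Y* = 1 | C = 0, w, x) for each x with pi_0 > 0
    """
    tau = u00 / (u00 + u11)  # implied decision threshold in (0, 1)
    # C = 1 requires beliefs weakly below tau; C = 0 is rationalizable only
    # if some admissible value of the missing data lies weakly above tau.
    return all(p1 <= tau for p1 in p1_by_x) and all(p0 >= tau for p0 in p0_hi_by_x)

rationalizes(-1.0, -1.0, [0.2, 0.3], [0.6, 0.7])  # True: threshold 0.5 works
rationalizes(-1.0, -1.0, [0.6], [0.4])            # False: no value of the missing data fits
```

Scanning this check over candidate (u00, u11) pairs traces out the identified set of utility functions described in Corollary 1.2.1.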

The key insight is that checking whether observed choices are consistent with expected utility
maximization is an information design problem (Bergemann and Morris, 2019; Kamenica, 2019).
Observed choices are consistent with expected utility maximization at utility function U(c, y⃗; w) if
and only if, at each (w, x) ∈ W × X, an information designer could induce a decision maker with
accurate beliefs about the potential outcomes given the characteristics (i.e., some Q_{Y⃗}( · | w, x) ∈
H_P(P_{Y⃗}( · | w, x); B_{w,x})) to take the observed choices by providing additional information via
some information structure. The information structure is the decision maker's private information
under the expected utility maximization model. Due to the missing data problem, I must check
whether the information designer could induce the observed choices at any accurate beliefs in
the identified set H_P(P_{Y⃗}( · | w, x); B_{w,x}). This is possible if and only if the revealed preference
inequalities (1.1) are satisfied. In this sense, Theorem 1.2.1 builds on the "no improving action
switches" inequalities, which were originally derived by Caplin and Martin (2015) to analyze
choice behavior over state-dependent lotteries. The potential outcome vector and characteristics
of each individual can be interpreted as a payoff-relevant state that is partially observed by the
decision maker. The decision maker's treatment choice can therefore be interpreted as a choice
between state-dependent lotteries over their utility payoffs U(c, y⃗; w). By incorporating the missing
data problem, I tractably characterize expected utility maximization in treatment assignment
problems.

The proof of sufficiency for Theorem 1.2.1 shows that if the revealed preference inequalities (1.1)
are satisfied, then a joint distribution (W, X, V, C, Y⃗) ∼ Q can be constructed under the expected
utility maximization model that satisfies Data Consistency (Definition 1.2.3).21 I construct a
likelihood function for the private information Q_V( · | y⃗, w, x) such that if the decision maker
finds it optimal to select choice C = 0 or C = 1, then the decision maker's posterior beliefs Q_{Y⃗}( · |
w, x, v) equal the choice-dependent potential outcome probabilities P̃_{Y⃗}( · | 0, w, x), P̃_{Y⃗}( · | 1, w, x),

21 While its proof is constructive, Theorem 1.2.1 can also be established through Bergemann and Morris (2016)'s
equivalence result between the set of Bayesian Nash Equilibria and the set of Bayes Correlated Equilibria in
incomplete information games, where the potential outcome vector Y⃗ and the characteristics (W, X) are the state, the
initial information structure is the null information structure, the private information V is the augmenting signal
structure, and Data Consistency (Definition 1.2.3) is applied to the equilibrium conditions.

respectively. By this construction, these choice-dependent potential outcome probabilities are
Bayes-plausible posterior beliefs with respect to some conditional distribution for the potential
outcomes given the characteristics in the identified set H_P(P_{Y⃗}( · | w, x); B_{w,x}). The revealed
preference inequalities (1.1) imply that this construction satisfies Expected Utility Maximization,
and additional work remains to show that it also satisfies Data Consistency (Definition 1.2.3). In
this sense, the choice-dependent potential outcome probabilities summarize the decision maker's
posterior beliefs under any distribution of private information. The researcher's assumptions
about the missing data therefore restrict the possible informativeness of the decision maker's
private information.22

1.3 Testing Expected Utility Maximization in Screening Decisions

In this section, I apply Theorem 1.2.1 to characterize the testable implications of expected utility

maximization in screening decisions with a binary outcome, such as pretrial release and medical

testing, under various assumptions on the decision maker’s utility function and the missing data

problem. Testing these restrictions is equivalent to testing many moment inequalities, and I discuss

how supervised machine learning based methods may be used to reduce the dimensionality of

this testing problem.

1.3.1 Characterization in Screening Decisions

In a screening decision, the potential outcome under the decision maker's choice C = 0 satisfies
Y0 ≡ 0, and Y1 := Y* is a latent outcome that is revealed whenever the decision maker selects
choice C = 1. For exposition, I further assume that the latent outcome is binary, Y = {0, 1}, as in
the motivating applications of pretrial release and medical testing. Appendix A.3.1 extends these
results to treatment decisions with a scalar outcome.

Focusing on a screening decision with a binary outcome simplifies the setting in Section
1.2. The bounds on the choice-dependent latent outcome probabilities given choice C = 0 form
an interval for the conditional probability of Y* = 1 given choice C = 0, with B_{0,w,x} =
[P̲_{Y*}(1 | 0, w, x), P̄_{Y*}(1 | 0, w, x)] for all (w, x) ∈ W × X. The bounds given choice C = 1 are the
point identified conditional probability of Y* = 1 given choice C = 1, with B_{1,w,x} = {P_{Y*}(1 | 1, w, x)}
for all (w, x) ∈ W × X. Finally, it is without loss of generality to normalize two entries of the utility
function U(c, y*; w), and so I normalize U(0, 1; w) = 0, U(1, 0; w) = 0 for all w ∈ W.23

22 In Supplement A.7.1, I formally show that the researcher's bounds on the missing data (Assumption 1.2.1) restrict
the average informativeness of the decision maker's private information in a screening decision.

I derive conditions under which the decision maker's choices are consistent with expected
utility maximization at strict preferences, meaning the decision maker is assumed to strictly prefer
a unique choice at each latent outcome. This rules out trivial cases such as complete indifference.

Definition 1.3.1 (Strict Preferences). The utility functions U ∈ U satisfy strict preferences if U(0, 0; w) < 0
and U(1, 1; w) < 0 for all w ∈ W.

In the pretrial release example, focusing on strict preference utility functions means that the
researcher is willing to assume it is always costly for the judge either to detain a defendant (C = 0)
that would not commit pretrial misconduct (Y* = 0) or to release a defendant (C = 1) that would
commit pretrial misconduct (Y* = 1).

By applying Theorem 1.2.1, I characterize the conditions under which the decision maker's
choices in a screening decision with a binary outcome are consistent with expected utility
maximization at some strict preference utility function and private information. For each w ∈ W,
define X_1(w) := {x ∈ X : π_1(w, x) > 0} and X_0(w) := {x ∈ X : π_0(w, x) > 0}.

Theorem 1.3.1. Consider a screening decision with a binary outcome. Assume P_{Y*}(1 | 1, w, x) < 1 for
all (w, x) ∈ W × X with π_1(w, x) > 0. The decision maker's choices are consistent with expected utility
maximization at some strict preference utility function if and only if, for all w ∈ W,

max_{x ∈ X_1(w)} P_{Y*}(1 | 1, w, x) ≤ min_{x ∈ X_0(w)} P̄_{Y*}(1 | 0, w, x).

Otherwise, H_P(U; B) = ∅, and the decision maker is making detectable prediction mistakes based on the
observable characteristics.
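The max–min condition of Theorem 1.3.1 is directly computable once the researcher's upper bounds on the unobserved arm are in hand. A sketch at a fixed w (hypothetical helper and numbers, not from the paper); with worst-case bounds the upper bounds all equal one, so no mistake is ever detectable:

```python
def detectable_mistake(p1_by_x, p0_hi_by_x):
    """Theorem 1.3.1 check at a fixed w: a detectable prediction mistake
    occurs iff max_x P(Y* = 1 | C = 1, w, x) over x with pi_1 > 0 exceeds
    min_x of the upper bound on P(Y* = 1 | C = 0, w, x) over x with pi_0 > 0."""
    return max(p1_by_x) > min(p0_hi_by_x)

detectable_mistake([0.2, 0.4], [0.5, 0.9])  # False: a common threshold exists
detectable_mistake([0.6, 0.4], [0.5, 0.9])  # True: 0.6 > 0.5
```

When the condition holds, the interval [max_x P(Y* = 1 | 1, w, x), min_x of the upper bounds] collects the decision thresholds consistent with the observed choices, in the spirit of the corollary that follows.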

Corollary 1.3.1. The identified set of strict preference utility functions H_P(U; B) equals the set of all
utility functions satisfying, for all w ∈ W, U(0, 0; w) < 0, U(1, 1; w) < 0, and

max_{x ∈ X_1(w)} P_{Y*}(1 | 1, w, x) ≤ U(0, 0; w) / (U(0, 0; w) + U(1, 1; w)) ≤ min_{x ∈ X_0(w)} P̄_{Y*}(1 | 0, w, x).

23 In some settings, it may be more natural to instead normalize U(0, 0; w) = 0, U(1, 1; w) = 0.

In a screening decision with a binary outcome, expected utility maximization at strict preferences

requires the decision maker to make choices according to an incomplete threshold rule based

on their posterior beliefs given the characteristics and private information. The threshold only

depends on the characteristics W, and it is incomplete since it takes no stand on how possible

indifferences are resolved. Theorem 1.2.1 establishes that the choice-dependent latent outcome

probabilities summarize all possible posterior beliefs under the expected utility maximization

model. Applying an incomplete threshold rule to posterior beliefs under the expected utility

maximization model is therefore observationally equivalent to applying a threshold rule to the

choice-dependent latent outcome probabilities. Theorem 1.3.1 formalizes this argument, and

checks whether there exists some value of the unobservable choice-dependent latent outcome

probabilities that are consistent with the researcher’s bounds (Assumption 1.2.1) and would

reproduce the decision maker’s observed choices under such a threshold rule.

If the conditions in Theorem 1.3.1 are violated, there exists no strict preference utility function

nor private information such that the decision maker’s choices are consistent with expected utility

maximization at accurate beliefs. By examining cases in which these conditions are satisfied, I

next characterize conditions under which we cannot identify whether the decision maker makes

detectable prediction mistakes based on the characteristics.

First, Theorem 1.3.1 highlights the necessity of placing an exclusion restriction on which

observable characteristics directly affect the decision maker’s utility function. If all observable

characteristics directly affect the decision maker's utility function (i.e., X = ∅), then the decision

maker’s choices are consistent with expected utility maximization whenever the researcher

assumes the decision maker observes useful private information.

Corollary 1.3.2. Under the same conditions as Theorem 1.3.1, suppose X = ∅ and all observable
characteristics therefore directly affect the decision maker's utility function. If P_{Y*}(1 | 1, w) ≤ P̄_{Y*}(1 | 0, w)
for all w ∈ W, then the decision maker's choices are consistent with expected utility maximization at some
strict preference utility function.

This negative result arises because a characteristic-dependent threshold can always be constructed
that rationalizes the decision maker's observed choices if the researcher's assumptions allow the
probability of Y* = 1 given C = 0 to be at least as large as the observed probability of Y* = 1 given
C = 1 for all characteristics. In this case, the decision maker's observed choices are consistent with
expected utility maximization at a strict preference utility function that varies richly across the
characteristics w ∈ W. If the researcher suspects that the decision maker observes useful private
information, then imposing an exclusion restriction that some observable characteristics do not
directly enter into the utility function is necessary for identifying prediction mistakes.

Unfortunately, imposing such an exclusion restriction is not alone sufficient to restore the

testability of expected utility maximization. The researcher must still address the missing data

problem.

Corollary 1.3.3. Under the same conditions as Theorem 1.3.1, if PY˚ p1 | 0, w, xq “ 1 for all pw, xq P

W ˆ X , then the decision maker’s choices are consistent with expected utility maximization at some strict
preferences.

Without informative bounds on the unobservable choice-dependent latent outcome probabilities,

the decision maker’s choices may always be rationalized by the extreme case in which the decision

maker’s private information is perfectly predictive of the unknown outcome (i.e., PY˚ p1 | 0, w, xq “

1 for all pw, xq P W ˆ X ).

Corollaries 1.3.2-1.3.3 highlight that testing expected utility maximization, and therefore

detectable prediction mistakes, requires both behavioral assumptions on which characteristics may

directly affect the decision maker’s utility function and econometric assumptions that generate

informative bounds on the unobservable choice-dependent latent outcome probabilities. Under

such assumptions, Theorem 1.3.1 provides interpretable conditions to test for detectable prediction

mistakes. At any fixed w P W , does there exist some x P X such that the largest possible

probability of Y ˚ “ 1 given C “ 0 is strictly lower than the observed probability of outcome Y ˚ “ 1

given C “ 1 at some other x1 P X ? If so, then the decision maker cannot be maximizing expected

utility as the decision maker could do strictly better by raising their probability of selecting choice

C “ 0 at x1 and lowering their probability of selecting choice C “ 1 at x. Theorem 1.3.1 shows that

these “misranking” arguments are necessary and sufficient to test the joint null hypothesis that

the decision maker’s choices are consistent with expected utility maximization at accurate beliefs

and their preferences satisfy the conjectured exclusion restriction.
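The misranking check behind Theorem 1.3.1 can be sketched as a simple search over cells of the excluded characteristics at a fixed w. All numbers below are hypothetical: p_obs stands in for the observed choice-dependent latent outcome probabilities, and p_ub for upper bounds implied by the researcher’s assumptions (Assumption 1.2.1).

```python
# A sketch of the misranking check behind Theorem 1.3.1 at a fixed value of w.
# Hypothetical inputs: p_obs[x] is the observed P(Y* = 1 | C = 1, w, x) and
# p_ub[x] is an assumed upper bound on the unobservable P(Y* = 1 | C = 0, w, x).

def detectable_misranking(p_obs, p_ub):
    """Return a pair (x_released, x_detained) witnessing a violation of the
    threshold-rule conditions, or None if no misranking is detectable."""
    for x_rel, obs in p_obs.items():
        for x_det, ub in p_ub.items():
            # If some released group's observed risk strictly exceeds the
            # worst-case risk of some detained group, no strict preference
            # utility function excluding x can rationalize the choices.
            if obs > ub:
                return (x_rel, x_det)
    return None

# Hypothetical cells: released defendants at x = 2 fail to appear 30% of the
# time, while detained defendants at x = 0 have worst-case risk of only 25%.
p_obs = {0: 0.05, 1: 0.10, 2: 0.30}
p_ub = {0: 0.25, 1: 0.60, 2: 0.80}
print(detectable_misranking(p_obs, p_ub))  # (2, 0): a detectable mistake
```

Here the test rejects: in the pretrial setting, the decision maker could do strictly better by detaining more defendants at x = 2 and releasing more at x = 0.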

Example 1.3.1 (Pretrial Release). Theorem 1.3.1 requires that, holding fixed the defendant characteristics

that directly affect the judge’s utility function, all detained defendants must have a higher worst-case

probability of committing pretrial misconduct than any released defendant. Suppose the researcher assumes

that the judge may engage in taste-based discrimination based on the defendant race W, but the judge’s

utility function is unaffected by remaining defendant characteristics X (Arnold et al., 2018, 2020b). To test

for detectable prediction mistakes, the researcher must check, among defendants of the same race, whether

there exists some group of released defendants with a higher pretrial misconduct rate than the worst-case

pretrial misconduct rate of some group of detained defendants. In this case, the judge must be

misranking the probability of pretrial misconduct, and their choices are inconsistent with expected utility

maximization at strict preferences that only depend on defendant race. ▲

Remark 1.3.1 (Connection to Marginal Outcome Tests and Inframarginality). In a screening decision

with a binary outcome, my analysis complements recent results in Canay et al. (2020), Gelbach (2021),

and Hull (2021), which use an extended Roy model to explore the validity of marginal outcome tests for

taste-based discrimination. This literature exploits the continuity of private information to derive testable

implications of extended Roy selection in terms of underlying marginal treatment effect functions, which

requires that the researcher identify the conditional expectation of the outcome at each possible “marginal”

decision. In contrast, Theorem 1.3.1 involves only point identified and partially identified choice-dependent

latent outcome probabilities. Common sources of quasi-experimental variation lead to informative bounds

on these quantities without additional assumptions such as monotonicity or functional form restrictions

needed to estimate marginal treatment effect curves. ■

Remark 1.3.2 (Approximate Expected Utility Maximization). This identification analysis assumes

that the decision maker fully maximizes expected utility. The decision maker, however, may face cognitive

constraints that prevent them from doing so completely. Appendix A.3.2 considers a generalization in which

the decision maker “approximately” maximizes expected utility, meaning that the decision maker’s choices

must only be within ϵw ě 0 of optimal. The decision maker is boundedly rational in this sense, and selects a

choice to “satisfice” expected utility (Simon, 1955, 1956). Expected utility maximization (Definition 1.2.3)

is the special case with ϵw “ 0, and any behavior is rationalizable as approximately maximizing expected

26
utility for ϵw sufficiently large. The smallest rationalizing parameter ϵw ě 0 summarizes how far from

optimal the decision maker’s observed choices are, and I show that researchers can conduct inference on this

object. ■

1.3.2 Constructing Bounds on the Missing Data with an Instrument

Suppose there is a randomly assigned instrument that generates variation in the decision maker’s

choice probabilities in a screening decision. Such instruments commonly arise, for example,

through the random assignment of decision makers to individuals. Judges are randomly assigned

to defendants in pretrial release (Kling, 2006; Dobbie et al., 2018; Arnold et al., 2018; Kleinberg

et al., 2018a; Arnold et al., 2020b) and doctors may be randomly assigned to patients in medical

testing (Abaluck et al., 2016; Chan et al., 2021).24

Assumption 1.3.1 (Random Instrument). Let Z P Z be a finite support instrument, and the joint

distribution pW, X, Z, C, Y ˚ q „ P satisfies pW, X, Y ˚ q KK Z and PpW “ w, X “ x, Z “ zq ą 0 for all

pw, x, zq P W ˆ X ˆ Z .

The researcher observes the joint distribution pW, X, Z, C, Yq „ P, where Y “ C ¨ Y ˚ as before.

The goal is to construct bounds on the unobservable choice-dependent latent outcome probabilities

at each observable characteristic pw, xq P W ˆ X and at a particular value of the instrument z P Z .

In the case where decision makers are randomly assigned to cases, this corresponds to constructing

bounds for a particular decision maker.

Under Assumption 1.3.1, the unobservable choice-dependent latent outcome probabilities are

partially identified, denoting their sharp identified sets as H P pPY˚ p ¨ | 0, w, x, zqq.

Proposition 1.3.1. Suppose Assumption 1.3.1 holds and consider a screening decision with a binary outcome. Then, for any $(w, x, z) \in \mathcal{W} \times \mathcal{X} \times \mathcal{Z}$ with $\pi_0(w, x, z) > 0$, $\mathcal{H}_P(P_{Y^*}(\,\cdot \mid 0, w, x, z)) = [\underline{P}_{Y^*}(1 \mid 0, w, x, z), \overline{P}_{Y^*}(1 \mid 0, w, x, z)]$, where

$$\underline{P}_{Y^*}(1 \mid 0, w, x, z) = \max\left\{ \frac{\underline{P}_{Y^*}(1 \mid w, x) - P_{C,Y^*}(1, 1 \mid w, x, z)}{\pi_0(w, x, z)},\, 0 \right\},$$
24 There are other examples of instruments in empirical research. For example, Mullainathan and Obermeyer (2021)
argue that doctors are less likely to conduct stress tests for a heart attack on Fridays and Saturdays due to weekend
staffing constraints, even though patients that arrive on these days are no less risky. The introduction of or changes to
recommended guidelines may also affect decision makers’ choices (Albright, 2019; Abaluck et al., 2020).

" *
PY˚ p1 | w, xq ´ PC,Y˚ p1, 1 | w, x, zq
PY˚ p1 | 0, w, x, zq “ min ,1 ,
π0 pw, x, zq
and PY˚ p1 | w, xq “ maxz̃PZ tPC,Y˚ p1, 1 | w, x, z̃qu, PY˚ p1 | w, xq “ minz̃PZ tπ0 pw, x, z̃q ` PC,Y˚ p1, 1 |

w, x, z̃qu.

The bounds in Proposition 1.3.1 follow from worst-case bounds on PY˚ p1 | w, xq (e.g., Manski,

1989, 1994) and point identification of PC,Y˚ p1, 1 | w, x, zq, π0 pw, x, zq.25,26 For a fixed value z P Z ,

the researcher may therefore apply the identification results derived in Section 1.3.1 by defining

B0,w,x “ H P pPY˚ p ¨ | 0, w, x, zqq under Assumption 1.3.1.
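Under Assumption 1.3.1, the bounds in Proposition 1.3.1 can be computed directly from point-identified cell quantities. The following sketch uses hypothetical numbers; the dictionaries stand in for estimates at a single (w, x) cell, and the judge labels are purely illustrative.

```python
# Computing the bounds in Proposition 1.3.1 for a single cell (w, x), under
# Assumption 1.3.1. Hypothetical point-identified inputs, one entry per value
# of the instrument z: pi0[z] = P(C = 0 | w, x, z) and
# p11[z] = P(C = 1, Y* = 1 | w, x, z).

def instrument_bounds(pi0, p11, z):
    # Worst-case bounds on P(Y* = 1 | w, x), tightened by intersecting the
    # bounds implied by every instrument value.
    p_lower = max(p11.values())
    p_upper = min(pi0[zt] + p11[zt] for zt in pi0)
    # Sharp bounds on the unobservable P(Y* = 1 | C = 0, w, x, z) at z.
    lb = max((p_lower - p11[z]) / pi0[z], 0.0)
    ub = min((p_upper - p11[z]) / pi0[z], 1.0)
    return lb, ub

pi0 = {"judge_A": 0.40, "judge_B": 0.25}  # hypothetical detention rates
p11 = {"judge_A": 0.12, "judge_B": 0.18}  # hypothetical release-and-misconduct rates
print(instrument_bounds(pi0, p11, "judge_A"))
```

In this example the bounds for judge_B come out fully uninformative, which illustrates why instruments that shift choice probabilities across values of z are what tighten the bounds.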


Appendix A.4.1 extends these bounds to allow for the instrument to be quasi-randomly

assigned conditional on some additional characteristics, which will be used in the empirical

application to pretrial release decisions in New York City. Appendix A.4.2 extends these instrument

bounds to treatment decisions.

Under the expected utility maximization model, Assumption 1.3.1 requires that the decision

maker’s beliefs about the latent outcome given the observable characteristics do not depend on

the instrument.

Proposition 1.3.2. Suppose Assumption 1.3.1 holds. If the decision maker’s choices are consistent with

expected utility maximization at some utility function U and joint distribution pW, X, Z, V, C, Y ˚ q „ Q,

then Y ˚ KK Z | W, X under Q.

This is a consequence of Data Consistency in Definition 1.2.3. Requiring that the decision maker’s

beliefs about the outcome given the characteristics be accurate imposes that the instrument cannot

affect their beliefs about the outcome given the observable characteristics if it is randomly assigned.

Aside from this restriction, Assumption 1.3.1 places no further behavioral restrictions on the

expected utility maximization model. Consider the pretrial release setting in which the instrument

arises through the random assignment of judges, meaning Z P Z refers to a judge identifier.

25 An active literature focuses on the concern that the monotonicity assumption in Imbens and Angrist (1994) may

be violated in settings where decision makers are randomly assigned to individuals. de Chaisemartin (2017) and
Frandsen et al. (2019) develop weaker notions of monotonicity for these settings. Proposition 1.3.1 imposes no form of
monotonicity.
26 Lakkaraju et al. (2017) and Kleinberg et al. (2018a) use the random assignment of decision makers to evaluate a
statistical decision rule C̃ by imputing its true positive rate PpY ˚ “ 1 | C̃ “ 1q. In contrast, Proposition 1.3.1 constructs
bounds on a decision maker’s conditional choice-dependent latent outcome probabilities, PY˚ p1 | 0, w, xq.

Proposition 1.3.2 implies that if all judges make choices as-if they are maximizing expected utility

at accurate beliefs given defendant characteristics, then all judges must have the same beliefs

about the probability of pretrial misconduct given defendant characteristics. Judges may still

differ from one another in their preferences and private information.

Remark 1.3.3 (Other empirical strategies for constructing bounds). Supplement A.7 discusses two

additional empirical strategies for constructing bounds on the unobservable choice-dependent latent outcome

probabilities. First, researchers may use the observable choice-dependent latent outcome probabilities to

directly bound the unobservable choice-dependent latent outcome probabilities. Second, the researcher may

use a “proxy outcome,” which does not suffer the missing data problem and is correlated with the latent

outcome. For example, researchers often use future health outcomes as a proxy for whether patients had a

particular underlying condition at the time of the medical testing or diagnostic decision (Chan et al., 2021;

Mullainathan and Obermeyer, 2021). ■

1.3.3 Reduction to Moment Inequalities

Testing whether the decision maker’s choices in a screening decision with a binary outcome are

consistent with expected utility maximization at strict preferences (Theorem 1.3.1) reduces to

testing many moment inequalities.

Proposition 1.3.3. Consider a screening decision with a binary outcome. Suppose Assumption 1.3.1

holds, and 0 ă πc pw, x, zq ă 1 for all pw, x, zq P W ˆ X ˆ Z . The decision maker’s choices at z P Z are

consistent with expected utility maximization at some strict preference utility function if and only if for all

w P W , pairs x, x̃ P X and z̃ P Z

$$P_{Y^*}(1 \mid 1, w, x, z) - \overline{P}_{Y^*, \tilde{z}}(1 \mid 0, w, \tilde{x}, z) \le 0,$$

where $\overline{P}_{Y^*, \tilde{z}}(1 \mid 0, w, x, z) = \frac{\pi_0(w, x, \tilde{z}) + P_{C,Y^*}(1, 1 \mid w, x, \tilde{z})}{\pi_0(w, x, z)} - \frac{P_{C,Y^*}(1, 1 \mid w, x, z)}{\pi_0(w, x, z)}$.

The number of moment inequalities is equal to dw ¨ d2x ¨ pdz ´ 1q, and grows with the number

of support points of the characteristics and instruments. In empirical applications, this will be

quite large since the characteristics of individuals are both finite-valued and extremely rich. A

key challenge, therefore, in directly testing this sharp characterization is that the number of

observations in each cell of characteristics pw, xq P W ˆ X can be extremely small.

To deal with this challenge, the number of moment inequalities may be reduced by testing

implied revealed preference inequalities over any partition of the excluded characteristics. For

each w P W , define Dw : X Ñ t1, . . . , Nd u to be some function that partitions the support of

the excluded characteristics x P X into level sets tx : Dw pxq “ du. By iterated expectations, if

the decision maker’s choices are consistent with expected utility maximization at some strict

preference utility function, then their choices must satisfy implied revealed preference inequalities.

Define PY˚ py˚ | c, w, dq :“ PY˚ pY ˚ “ y˚ | C “ c, W “ w, Dw pXq “ dq and πc pw, dq :“ PpC “ c |

W “ w, Dw pXq “ dq.

Corollary 1.3.4. Consider a screening decision with a binary outcome, and assume PY˚ p1 | 1, w, xq ă 1 for

all pw, xq P W ˆ X with π1 pw, xq ą 0. If the decision maker’s choices are consistent with expected utility

maximization at some strict preference utility function, then for all w P W

$$\max_{d \in \mathcal{D}^1(w)} P_{Y^*}(1 \mid 1, w, d) \le \min_{d \in \mathcal{D}^0(w)} \overline{P}_{Y^*}(1 \mid 0, w, d),$$

where $\mathcal{D}^1(w) := \{d : \pi_1(w, d) > 0\}$ and $\mathcal{D}^0(w) := \{d : \pi_0(w, d) > 0\}$.

In practice, this can drastically reduce the number of moment inequalities that must be tested.

If Nd ! d x , researchers may instead test implied revealed preference inequalities in Corollary

1.3.4 using procedures that are valid in low-dimensional settings, which is a mature literature in

econometrics (Canay and Shaikh, 2017; Ho and Rosen, 2017; Molinari, 2020).

A natural choice is to construct the partitioning functions Dw p¨q using supervised machine

learning methods that predict the outcome on a set of held-out decisions. Suppose the researcher

estimates the prediction function fˆ : W ˆ X Ñ r0, 1s on a set of held-out decisions. Given the

estimated prediction function, the researcher may define Dw pxq by binning the characteristics

X into percentiles of predicted risk within each value w P W . The resulting implied revealed

preference inequalities search for misrankings in the decision maker’s choices across percentiles

of predicted risk. Alternatively, there may already exist a benchmark risk score. In pretrial

release systems, the widely-used Public Safety Assessment summarizes observable defendant

characteristics into an integer-valued risk score (e.g., Stevenson, 2018; Albright, 2019). In medical

testing or treatment decisions, commonly used risk assessments summarize observable patient

characteristics into an integer-valued risk score (e.g., Obermeyer and Emanuel, 2016; Lakkaraju and

Rudin, 2017). The researcher may therefore define the partition Dw pxq to be level sets associated

with this existing risk score.
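A sketch of how such a partition is used in practice: bin the excluded characteristics into risk groups d (for example, deciles of an estimated risk score or the levels of an existing assessment), then check the implied revealed preference inequalities of Corollary 1.3.4. All cell-level numbers below are hypothetical.

```python
# A sketch of testing the implied revealed preference inequalities in
# Corollary 1.3.4 after partitioning the excluded characteristics into risk
# bins d. All cell-level numbers are hypothetical.

def implied_rp_violated(cells):
    """Each cell is a dict with keys: 'pi1' (release rate pi_1(w, d)),
    'p_obs' (observed P(Y* = 1 | C = 1, w, d)), and 'p_ub' (upper bound on
    P(Y* = 1 | C = 0, w, d)), all within a single value of w."""
    released = [c["p_obs"] for c in cells if c["pi1"] > 0]
    detained = [c["p_ub"] for c in cells if c["pi1"] < 1]
    # Corollary 1.3.4: the largest observed risk among released cells must not
    # exceed the smallest worst-case risk among detained cells.
    return max(released) > min(detained)

cells = [
    {"pi1": 0.9, "p_obs": 0.04, "p_ub": 0.50},
    {"pi1": 0.7, "p_obs": 0.12, "p_ub": 0.35},
    {"pi1": 0.4, "p_obs": 0.33, "p_ub": 0.30},  # released risk exceeds 0.30
]
print(implied_rp_violated(cells))  # True: the inequalities are violated
```

With a handful of risk bins rather than the full support of x, each cell contains enough observations for standard moment inequality inference procedures to apply.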

Corollary 1.3.4 clarifies how supervised machine learning methods can be used to formally

test for prediction mistakes in the large class of screening decisions. In existing empirical work,

researchers directly compare the recommended choices of an estimated decision rule against the

choices of a human decision maker. See, for example, Meehl (1954), Dawes et al. (1989), Grove

et al. (2000) in psychology, Kleinberg et al. (2018a), Ribers and Ullrich (2019), Mullainathan and

Obermeyer (2021) in economics, and Chouldechova et al. (2018), Jung et al. (2020a) in computer

science. Such comparisons rely on strong assumptions that restrict preferences to be constant

across decisions and decision makers (Kleinberg et al., 2018a; Mullainathan and Obermeyer, 2021)

or that observed choices were as-good-as randomly assigned given the characteristics (Lakkaraju

and Rudin, 2017; Chouldechova et al., 2018; Jung et al., 2020a). In contrast, I show that estimated

prediction models should not be used for such direct comparisons, but instead be used to construct

low-dimensional partitions of the characteristics that are assumed to not directly affect preferences.

Given such a partition, checking whether the implied revealed preference inequalities in Corollary

1.3.4 are satisfied provides, to the best of my knowledge, the first microfounded procedure for

using out-of-sample prediction to test whether the decision maker’s choices are consistent with

expected utility maximization at accurate beliefs.

Appendix A.4.2 extends these results to treatment decisions, showing that testing whether the

decision maker’s choices are consistent with expected utility maximization in a treatment decision

reduces to testing a system of moment inequalities with nuisance parameters that enter linearly.

1.4 Bounding Prediction Mistakes based on Characteristics

So far, I have shown that researchers can test whether a decision maker’s choices are consistent

with expected utility maximization at accurate beliefs about the outcome, and therefore whether

the decision maker is making detectable prediction mistakes. By modifying the expected utility

maximization model and the revealed preference inequalities, researchers can further investigate

the ways in which the decision maker’s predictions are systematically biased.

1.4.1 Expected Utility Maximization at Inaccurate Beliefs

The definition of expected utility maximization (Definition 1.2.3) implied that the decision maker

acted as-if their implied beliefs about the outcome given the characteristics were accurate (Lemma

1.2.1). As a result, the revealed preference inequalities may be violated if the decision maker

acted as-if they maximized expected utility based on inaccurate beliefs, meaning that their implied

beliefs do not lie in the identified set for the distribution of the outcome given the characteristics

H P pP⃗Y p ¨ | w, xq; Bw,x q. This is a common behavioral hypothesis in empirical applications. Empiri-
cal researchers conjecture that judges may systematically mis-predict failure to appear risk based

on defendant characteristics, and the same concern arises in analyses of medical diagnosis and

treatment decisions.27

To investigate whether the decision maker’s choices maximize expected utility at inaccurate

beliefs, I modify “Data Consistency” in Definition 1.2.3.

Definition 1.4.1. The decision maker’s choices are consistent with expected utility maximization at

inaccurate beliefs if there exists some utility function U P U and joint distribution pW, X, V, C, ⃗Yq „ Q

satisfying (i) Expected Utility Maximization, (ii) Information Set, and

iii. Data Consistency with Inaccurate Beliefs: For all pw, xq P W ˆ X , there exists Pr⃗Y p ¨ | 0, w, xq P

B0,w,x and Pr⃗Y p ¨ | 1, w, xq P B1,w,x such that for all ⃗y P Y 2 and c P t0, 1u

QC pc | ⃗y, w, xq Pr⃗Y p⃗y | w, xqQpw, xq “ Pr⃗Y p⃗y | c, w, xqPpc, w, xq,

where Pr⃗Y p⃗y | w, xq “ Pr⃗Y p⃗y | 0, w, xqπ0 pw, xq ` Pr⃗Y p⃗y | 1, w, xqπ1 pw, xq.

Definition 1.4.1 requires that the joint distribution pW, X, V, C, ⃗Yq „ Q under the model matches

the joint distribution of the observable data pW, X, C, Yq „ P if the decision maker’s model-implied

beliefs given the characteristics, Q⃗Y p ¨ | w, xq, are replaced with some marginal distribution of the

outcome given the characteristics that lies in the identified set, Pr⃗Y p ¨ | w, xq P H P pP⃗Y p ¨ | w, xq; Bw,x q.

This imposes that the decision maker is acting as-if they correctly specified the likelihood of

their private information V | ⃗Y, W, X. Their prediction mistakes only arise from their inaccurate

27 Kleinberg et al. (2018a) write, “a primary source of error is that all quintiles of judges misuse the signal available
in defendant characteristics available in our data” (pg. 282-283). In the medical treatment setting, Currie and Macleod
(2017) write, “we are concerned with doctors, who for a variety of possible reasons, do not make the best use of the
publicly available information at their disposal to make good decisions” (pg. 5).

prior beliefs about the outcome given the characteristics. Notice that Definition 1.4.1 places

no restrictions on the decision maker’s implied prior beliefs Q⃗Y p ¨ | w, xq, so behavior that is

consistent with expected utility maximization at inaccurate beliefs could arise from several

behavioral hypotheses. The decision maker’s implied prior beliefs may, for instance, be inaccurate

due to inattention to the characteristics (e.g., Sims, 2003; Gabaix, 2014; Caplin and Dean, 2015) or

use of representativeness heuristics (e.g., Gennaioli and Shleifer, 2010; Bordalo et al., 2016, 2021).

The next result characterizes whether the decision maker’s observed choices are consistent

with expected utility maximization at inaccurate beliefs in terms of modified revealed preference

inequalities.

Theorem 1.4.1. Assume Pr⃗Y p ¨ | w, xq ą 0 for all Pr⃗Y p ¨ | w, xq P H P pP⃗Y p ¨ | w, xq; Bw,x q and all

pw, xq P W ˆ X . The decision maker’s choices are consistent with expected utility maximization at

inaccurate beliefs if and only if there exists a utility function U P U , Pr⃗Y p ¨ | 0, w, xq P B0,w,x and

Pr⃗Y p ¨ | 1, w, xq P B1,w,x for all pw, xq P W ˆ X , and non-negative weights ωp⃗y; w, xq satisfying

i. For all $c \in \{0, 1\}$, $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_c(w, x) > 0$, and $c' \ne c$,

$$\mathbb{E}_{\tilde{P}}\left[\omega(\vec{Y}; W, X)\, U(c, \vec{Y}; W) \mid C = c, W = w, X = x\right] \ge \mathbb{E}_{\tilde{P}}\left[\omega(\vec{Y}; W, X)\, U(c', \vec{Y}; W) \mid C = c, W = w, X = x\right],$$

ii. For all $(w, x) \in \mathcal{W} \times \mathcal{X}$, $\mathbb{E}_{\tilde{P}}\left[\omega(\vec{Y}; W, X) \mid W = w, X = x\right] = 1$,

where $\mathbb{E}_{\tilde{P}}[\,\cdot\,]$ is the expectation under the joint distribution $(W, X, C, \vec{Y}) \sim \tilde{P}$ defined as $\tilde{P}(w, x, c, \vec{y}) = \tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x)\, P(c, w, x)$.

These modified inequalities ask whether revealed preference inequalities are satisfied at a

reweighed utility function, where the weights ωp⃗y; w, xq are the likelihood ratio of the decision

maker’s implied prior beliefs relative to some conditional distribution of the potential outcomes

given the characteristics in the identified set. Since the decision maker’s prediction mistakes only

arise from misspecification of prior beliefs Q⃗Y p ¨ | w, xq, her posterior beliefs under the model are

proportional to the likelihood ratio between her prior beliefs and the underlying potential outcome

distribution. These likelihood ratio weights can be interpreted as a modification of the decision

maker’s utility function since expected utility is linear in beliefs and utility. This suggests that

expected utility maximization at inaccurate beliefs is equivalent to expected utility maximization at

accurate beliefs and preferences that are summarized by this reweighed utility function. Theorem

1.4.1 shows that this intuition completely characterizes expected utility maximization at inaccurate

beliefs.
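The mechanics of the likelihood-ratio weights can be illustrated with hypothetical numbers for a single binary-outcome cell; constraint (ii) of Theorem 1.4.1 then holds automatically because the weights are a ratio of two probability distributions.

```python
# The likelihood-ratio weights in Theorem 1.4.1 for one binary-outcome cell
# (w, x). Hypothetical inputs: q[y] is the decision maker's implied prior
# Q_{Y*}(y | w, x); p[y] is a candidate true distribution from the
# identified set.

def lr_weights(q, p):
    # omega(y; w, x) = Q(y | w, x) / P(y | w, x)
    return {y: q[y] / p[y] for y in p}

q = {0: 0.70, 1: 0.30}   # implied (possibly inaccurate) prior beliefs
p = {0: 0.55, 1: 0.45}   # candidate true distribution
omega = lr_weights(q, p)

# Constraint (ii) of Theorem 1.4.1 holds by construction:
# E_P[omega | w, x] = sum over y of (Q(y)/P(y)) * P(y) = sum over y of Q(y) = 1.
mean_omega = sum(omega[y] * p[y] for y in p)
print(round(mean_omega, 10))  # 1.0
```

Reweighting the utility function by omega folds the inaccurate prior into preferences, which is the equivalence the theorem formalizes.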

1.4.2 Bounding Prediction Mistakes in Screening Decisions

Theorem 1.4.1 implies that researchers can bound the extent to which the decision maker’s prior

beliefs given the characteristics overreact or underreact to variation in the characteristics in a

screening decision with a binary outcome.28 This enables researchers to report interpretable

bounds on the extent to which the decision maker’s predictions are systematically biased.

As a first step, Theorem 1.4.1 implies bounds on the decision maker’s reweighted utility

function ωpy˚ ; w, xqUpc, y˚ ; wq in a screening decision with a binary outcome.

Theorem 1.4.2. Consider a binary screening decision, and assume 0 ă π1 pw, xq ă 1 for all pw, xq P

W ˆ X . Suppose the decision maker’s choices are consistent with expected utility maximization at inaccurate
beliefs and some strict preference utility function Upc, y˚ ; wq at PrY˚ p ¨ | w, xq P H P pPY˚ p ¨ | w, xq; Bw,x q

satisfying 0 ă PrY˚ p1 | w, xq ă 1 for all pw, xq P W ˆ X . Then, there exists non-negative weights

ωpy˚ ; w, xq ě 0 satisfying for all pw, xq P W ˆ X

$$P_{Y^*}(1 \mid 1, w, x) \le \frac{\omega(0; w, x)\, U(0, 0; w)}{\omega(0; w, x)\, U(0, 0; w) + \omega(1; w, x)\, U(1, 1; w)} \le P_{Y^*}(1 \mid 0, w, x), \tag{1.2}$$

where $\omega(y^*; w, x) = Q_{Y^*}(y^* \mid w, x) / \tilde{P}_{Y^*}(y^* \mid w, x)$ and $Q_{Y^*}(y^* \mid w, x)$, $\tilde{P}_{Y^*}(y^* \mid w, x)$ are defined in Definition 1.4.1.

That is, in a screening decision with a binary outcome, expected utility maximization at

inaccurate beliefs is observationally equivalent to an incomplete threshold rule based on the

choice-dependent latent outcome probabilities, where the threshold now depends on the decision

maker’s reweighed utility function. This result may be exploited to derive an identified set on the

extent to which the decision maker overreacts or underreacts to variation in the characteristics.

28 These results generalize to multi-valued outcomes over the class of binary-valued utility functions. That is, for some

known Ỹ Ď Y , define Ỹ ˚ “ 1tY ˚ P Ỹ u. The class of two-valued utility functions takes the form upc, y˚ ; wq :“ upc, ỹ; wq
and satisfies strict preferences. Over this class of utility functions, the decision maker faces a screening decision with a
binary outcome.

Define $\delta(w, x) := \frac{Q_{Y^*}(1 \mid w, x)/Q_{Y^*}(0 \mid w, x)}{\tilde{P}_{Y^*}(1 \mid w, x)/\tilde{P}_{Y^*}(0 \mid w, x)}$ to be the relative odds ratio of the unknown outcome under the decision maker’s implied beliefs relative to the true conditional distribution, and $\tau(w, x) := \frac{\omega(0; w, x)\, U(0, 0; w)}{\omega(0; w, x)\, U(0, 0; w) + \omega(1; w, x)\, U(1, 1; w)}$ to be the decision maker’s reweighed utility threshold. If the reweighed utility threshold were known, then the decision maker’s implied prediction mistake could be backed out.
Corollary 1.4.1. Under the same conditions as Theorem 1.4.2,

$$\frac{(1 - \tau(w, x))/\tau(w, x)}{(1 - \tau(w, x'))/\tau(w, x')} = \frac{\delta(w, x)}{\delta(w, x')} \tag{1.3}$$

for any $w \in \mathcal{W}$ and $x, x' \in \mathcal{X}$.

The ratio $\delta(w, x)/\delta(w, x')$ summarizes the extent to which the decision maker’s implied beliefs about the outcome overreact or underreact to variation in the characteristics relative to the true conditional distribution. In particular, $\delta(w, x)/\delta(w, x')$ may be rewritten as the ratio of $\frac{Q_{Y^*}(1 \mid w, x)/Q_{Y^*}(0 \mid w, x)}{Q_{Y^*}(1 \mid w, x')/Q_{Y^*}(0 \mid w, x')}$ to $\frac{\tilde{P}_{Y^*}(1 \mid w, x)/\tilde{P}_{Y^*}(0 \mid w, x)}{\tilde{P}_{Y^*}(1 \mid w, x')/\tilde{P}_{Y^*}(0 \mid w, x')}$. The first term summarizes how the odds ratio of $Y^* = 1$ relative to $Y^* = 0$ varies across $(w, x)$ and $(w, x')$ under the decision maker’s implied beliefs, and the second term summarizes how the true odds ratio varies across the same characteristics. If $\delta(w, x)/\delta(w, x')$ is less than one, then the decision maker’s implied beliefs about the relative probability of $Y^* = 1$ versus $Y^* = 0$ react less to variation across the characteristics $(w, x)$ and $(w, x')$ than the true distribution. In this sense, their implied beliefs are underreacting across these characteristics. Analogously, if $\delta(w, x)/\delta(w, x')$ is strictly greater than one, then the decision maker’s implied beliefs about the relative probability of $Y^* = 1$ versus $Y^* = 0$ are overreacting across the characteristics $(w, x)$ and $(w, x')$.

Since Theorem 1.4.2 provides an identified set for the reweighted utility thresholds, an identified set for the implied prediction mistake $\delta(w, x)/\delta(w, x')$ can in turn be constructed by computing the ratio (1.3) for each pair $\tau(w, x), \tau(w, x')$ that satisfies (1.2).
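This construction can be sketched numerically. The threshold intervals below are hypothetical stand-ins for the bounds in (1.2); because the odds transformation (1 − τ)/τ is strictly decreasing in τ, the ratio in (1.3) is extremized at the interval endpoints.

```python
# Constructing the identified set for the implied prediction mistake
# delta(w, x) / delta(w, x') from Corollary 1.4.1. The threshold intervals
# are hypothetical stand-ins for the bounds in (1.2); odds(t) = (1 - t)/t
# is strictly decreasing, so the ratio in (1.3) is extremized at endpoints.

def delta_ratio_bounds(tau_x, tau_xp):
    """tau_x = (lo, hi) bracketing tau(w, x); tau_xp brackets tau(w, x')."""
    def odds(t):
        return (1.0 - t) / t
    lo = odds(tau_x[1]) / odds(tau_xp[0])   # smallest attainable ratio
    hi = odds(tau_x[0]) / odds(tau_xp[1])   # largest attainable ratio
    return lo, hi

lo, hi = delta_ratio_bounds((0.2, 0.4), (0.5, 0.7))
print(lo > 1.0)  # True: here every point of the set implies overreaction
```

When the resulting interval lies entirely above or below one, the sign of the systematic bias is identified despite the thresholds themselves being only partially identified.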
The implied prediction mistake $\delta(w, x)/\delta(w, x')$ summarizes how relative changes in the decision maker’s

beliefs across the characteristics pw, xq, pw, x1 q compare to relative changes in the underlying

distribution of outcomes. A particular value of the implied prediction mistake could therefore be

consistent with multiple values of the decision maker’s perceptions of risk QY˚ p1 | w, xq, QY˚ p1 |

w, x1 q. As an example, suppose that PrY˚ p1 | w, xq “ 4{5, PrY˚ p1 | w, x1 q “ 1{5 and the decision

maker’s perceptions of baseline risk are QY˚ p1 | w, xq “ 2{3, QY˚ p1 | w, x1 q “ 1{3. The true odds

ratio of $Y^* = 1$ relative to $Y^* = 0$ at the characteristics are $\tilde{P}_{Y^*}(1 \mid w, x)/\tilde{P}_{Y^*}(0 \mid w, x) = 4$ and $\tilde{P}_{Y^*}(1 \mid w, x')/\tilde{P}_{Y^*}(0 \mid w, x') = 1/4$. The decision maker’s perceived odds ratios are $Q_{Y^*}(1 \mid w, x)/Q_{Y^*}(0 \mid w, x) = 2$ and $Q_{Y^*}(1 \mid w, x')/Q_{Y^*}(0 \mid w, x') = 1/2$. In this case, the decision maker’s implied prediction mistake $\delta(w, x)/\delta(w, x')$ equals $1/4$. If instead the decision maker’s perceptions

of baseline risk were QY˚ p1 | w, xq “ 3{4, QY˚ p1 | w, x1 q “ 3{7, then the decision maker’s perceived

odds ratios would equal 3, 3{4 at characteristics pw, xq, pw, x1 q respectively. Even though the

decision maker’s perceptions of baseline risk are different, the implied prediction mistake again

equals 1{4. In both cases, the decision maker’s perceptions across characteristics pw, xq, pw, x1 q

underreact relative to the true variation in risk. The true odds ratio at pw, xq is 16 times larger

than the true odds ratio at pw, x1 q, but the decision maker perceives it to be only 4 times larger.
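The two cases above can be checked numerically; the sketch below reproduces the numbers from the text.

```python
# Reproducing the worked example: the implied prediction mistake is an
# odds-ratio comparison, so different perceived baseline risks can imply the
# same mistake. Numbers are taken from the text.

def delta(q, p):
    # delta(w, x): perceived odds of Y* = 1 relative to the true odds.
    return (q / (1 - q)) / (p / (1 - p))

p_x, p_xp = 4 / 5, 1 / 5              # true risk at (w, x) and (w, x')
case1 = delta(2 / 3, p_x) / delta(1 / 3, p_xp)
case2 = delta(3 / 4, p_x) / delta(3 / 7, p_xp)
print(round(case1, 10), round(case2, 10))  # 0.25 0.25
```

Both perception profiles imply the same fourfold underreaction, exactly as the text describes.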

Finally, these bounds on the implied prediction mistake are obtained only by assuming that

the decision maker’s utility function satisfies the researcher’s conjectured exclusion restriction

and strict preferences. Under such an exclusion restriction, variation in the decision maker’s

choices across excluded characteristics must only arise due to variation in beliefs. Examining

how choice-dependent latent outcome probabilities vary across characteristics relative to the

decision maker’s choices is therefore informative about the decision maker’s systematic prediction

mistakes. In this sense, preference exclusion restrictions are sufficient to partially identify the

extent to which variation in the decision maker’s implied beliefs are biased.29

After applying the dimension reduction strategy in Section 1.3.3, the implied prediction

mistake across values Dw pXq “ d, Dw pXq “ d1 , denoted by δpw, dq{δpw, d1 q, measures how the

decision maker’s implied beliefs about their own ex-post mistakes vary relative to the true probability

of ex-post mistakes across values Dw pXq “ d, Dw pXq “ d1 . See Appendix A.3.3 for details.

1.5 Do Judges Make Prediction Mistakes?

As an empirical illustration, I apply this econometric framework to analyze the pretrial release

decisions of judges in New York City, which is a leading example of a high-stakes screening

29 Thisresult relates to Proposition 1 in Martin and Marx (2021), which shows that utilities and prior beliefs are
not separately identified in state-dependent stochastic choice environments (see also Bohren et al. (2020)). Their result
arises because the authors focus on settings in which there are no additional characteristics of decisions beyond those
which affect both utility and beliefs.

decision.30 I find that at least 20% of judges in New York City make detectable prediction mistakes

in their pretrial release decisions. Under various exclusion restrictions on their utility functions,

their pretrial release decisions are inconsistent with expected utility maximization at accurate

beliefs about failure to appear risk given defendant characteristics. These systematic prediction

mistakes arise because judges fail to distinguish between predictably low risk and predictably

high risk defendants. Rejections of expected utility maximization at accurate beliefs are driven

primarily by release decisions on defendants at the tails of the predicted risk distribution.

1.5.1 Pretrial Release Decisions in New York City

I focus on the pretrial release system in New York City, which has been previously studied in

Leslie and Pope (2017), Kleinberg et al. (2018a) and Arnold et al. (2020b). The New York City

pretrial system is an ideal setting to apply this econometric framework in three important ways.

First, as discussed in Kleinberg et al. (2018a), the New York City pretrial system narrowly asks

judges to only consider failure to appear risk, not new criminal activity or public safety risk,

in deciding whether to release a defendant. The latent outcome Y ˚ is therefore whether the

defendant would fail to appear in court if released. Section 1.5.4 reports the robustness of my

empirical findings to other choices of outcome Y ˚ . Second, the pretrial release system in New

York City is one of the largest in the country, and consequently I observe many judges making a

large number of pretrial release decisions. Finally, judges in New York City are quasi-randomly

assigned to cases within court-by-time cells, which implies bounds on the conditional failure to

appear rate among detained defendants.

1.5.2 Data and Summary Statistics

I observe the universe of all arrests made in New York City between November 1, 2008 and

November 1, 2013. This contains information on 1,460,462 cases, of which 758,027 cases were

subject to a pretrial release decision.31 I apply additional sample restrictions to construct the main

30 Because the data are sensitive and only available through an official data sharing agreement with the New York
court system, I conducted this empirical analysis in conjunction with the University of Chicago Crime Lab (Rambachan
and Ludwig, 2021).
31 To construct the set of cases that were subject to a pretrial release decision, I apply the sample restrictions used in Kleinberg et al. (2018a). This removes (i) desk appearance tickets, (ii) cases that were disposed at arraignment,

estimation sample, which consists of 569,256 cases heard by 265 unique judges.32,33 My empirical

analysis tests whether each of the top 25 judges that heard the most cases make detectable

prediction mistakes about failure to appear risk in their pretrial release decisions. These top 25

judges heard 243,118 cases in the main estimation sample. Each judge heard at least 5,000 cases in

total (see Supplement Figure A.8).

For each case, I observe demographic information about the defendant such as their race,

gender, and age, information about the current charge filed against the defendant, the defendant’s

criminal record, and the defendant’s record of pretrial misconduct. I observe a unique identifier

for the judge assigned to each defendant, and whether the assigned judge released or detained

the defendant.34 If the defendant was released, I observe whether the defendant either failed to

appear in court or was re-arrested for a new crime.

Supplement Table A.2 provides descriptive statistics about the main estimation sample and

the cases heard by the top 25 judges, broken out by the race of the defendant. Overall, 72.0% of

defendants are released in the main estimation sample, whereas 73.6% of defendants assigned

to the top 25 judges were released. Defendants in the main estimation sample are similar on

demographic information and current charge information to defendants assigned to the top 25

judges. However, defendants assigned to the top 25 judges have less extensive prior criminal

records. Supplement Table A.3 reports the same descriptive statistics broken out by whether the

defendant was released or detained, revealing that judges respond to defendant characteristics in

their release decisions. Among defendants assigned to the top 25 judges, released and detained

and (iii) cases that were adjourned in contemplation of dismissal as well as duplicate cases.
32 The following cases are excluded: (i) cases involving non-white and non-black defendants; (ii) cases assigned to
judges with fewer than 100 cases; and (iii) cases heard in a court-by-time cell in which there were fewer than 100 cases
or only one unique judge, where a court-by-time cell is defined at the assigned courtroom by shift by day of week by
month by year level.
33 Supplement Tables A.5-A.6 compare the main estimation sample to the universe of 758,027 cases that were subject

to a pretrial release decision, broken out by the race of the defendant and by whether the defendant was released.
Cases in the main estimation sample have more severe charges and a lower release rate than the universe of cases
subject to a pretrial release decision.
34 Judges in New York City decide whether to release defendants without conditions (“release on recognizance”),
require that defendants post a chosen amount of bail, or deny bail altogether. Following prior empirical work on the
pretrial release setting (e.g., Arnold et al., 2018; Kleinberg et al., 2018a; Arnold et al., 2020b), I collapse these choices into
the binary decision of whether to release (either release on recognizance or set a bail among that the defendant pays) or
detain (either set a bail amount that the defendant does not pay or deny bail altogether). I report the robustness of my
findings to this choice in Section 1.5.4.

defendants differ demographically: 50.8% of released defendants are white and 19.7% are female,

whereas only 40.7% of detained defendants are white and only 10.6% are female. Released and

detained defendants also differ on their current charge and criminal record. For example, only

28.8% of defendants released by the top 25 judges face a felony charge, yet 58.6% of detained

defendants face a felony charge.

1.5.3 Empirical Implementation

I test whether the observed release decisions of judges in New York City maximize expected utility

at accurate beliefs about failure to appear risk given defendant characteristics as well as some

private information under various exclusion restrictions. I test whether the revealed preference

inequalities are satisfied assuming that either (i) no defendant characteristics, (ii) the defendant’s

race, (iii) the defendant’s race and age, or (iv) the defendant’s race and whether the defendant

was charged with a felony offense directly affect the judges’ utility function. I discretize age into

young and older defendants, where older defendants are those older than 25 years.

Constructing the Prediction Function

As a first step, I construct a partition of the excluded characteristics X P X to reduce the number

of moment inequalities as described in Section 1.3.3. I predict failure to appear Y ˚ P t0, 1u among

defendants released by all other judges within each value of the payoff-relevant characteristics

W P W , which are defined as either race-by-age cells or race-by-felony charge cells. The prediction

function is an ensemble that averages the predictions of an elastic net model and a random

forest.35 Over defendants released by the top 25 judges, the ensemble model achieves an area

under the receiver operating characteristic (ROC) curve, or AUC, of 0.693 when the payoff-relevant

characteristics are defined as race-by-age cells and an AUC of 0.694 when the payoff relevant

characteristics are defined as race-by-felony charge cells (see Supplement Figure A.9). Both

ensemble models achieve similar performance on released black and white defendants.

35 I use three-fold cross-validation to tune the penalties for the elastic net model. The random forest is constructed using the R package ranger at the default hyperparameter values (Wright and Ziegler, 2017).
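The ensemble step can be sketched as follows (a minimal numpy illustration on simulated data; the two noisy predictors stand in for the elastic net and random forest fits described above, and the rank-based formula computes the same AUC statistic reported in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(y, p):
    """Area under the ROC curve via its rank (Mann-Whitney) formulation."""
    order = np.argsort(p)
    ranks = np.empty(len(p))
    ranks[order] = np.arange(1, len(p) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# Simulated stand-in for released defendants in one payoff-relevant cell W = w.
risk = rng.uniform(0.05, 0.6, size=5000)          # true failure-to-appear risk
y = (rng.uniform(size=5000) < risk).astype(int)   # realized outcome Y*

# Two noisy predictors standing in for the elastic net and the random forest.
p_enet = np.clip(risk + rng.normal(0, 0.10, size=5000), 0.0, 1.0)
p_forest = np.clip(risk + rng.normal(0, 0.10, size=5000), 0.0, 1.0)

# Ensemble: average the predicted probabilities, then cut into risk deciles D_w(X).
p_hat = 0.5 * (p_enet + p_forest)
deciles = 1 + (10 * np.argsort(np.argsort(p_hat))) // len(p_hat)
```

The decile partition of the ensemble prediction plays the role of $D_w(X)$ in the moment inequalities that follow.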

Constructing Bounds through the Quasi-Random Assignment of Judges

Judges in New York City are quasi-randomly assigned to cases within court-by-time cells defined

at the assigned courtroom by shift by day of week by month by year level.36 To verify quasi-

random assignment, I conduct balance checks that regress a measure of judge leniency on a

rich set of defendant characteristics as well as court-by-time fixed effects that control for the

level at which judges are as-if randomly assigned to cases. I measure judge leniency using the

leave-one-out release rate among all other defendants assigned to a particular judge (Dobbie et al.,

2018; Arnold et al., 2018, 2020b). I conduct these balance checks pooling together all defendants

and separately within each payoff-relevant characteristic cell (defined by race-by-age cells or

race-by-felony charge cells), reporting the coefficient estimates in Supplement Tables A.7-A.9. In

each subsample, the estimated coefficients are economically small in magnitude. A joint F-test

fails to reject the null hypothesis of quasi-random assignment for the pooled main estimation

sample and for all subsamples, except for young black defendants.
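The leave-one-out leniency measure and its quintiles can be computed as in the following sketch (hypothetical case-level data; `judge` and `released` are arrays with one entry per case):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical case-level data: assigned judge and binary release decision.
n_cases, n_judges = 10_000, 25
judge = rng.integers(0, n_judges, size=n_cases)
released = (rng.uniform(size=n_cases) < 0.72).astype(int)

# Leave-one-out leniency: the release rate among all *other* cases of the same judge.
n_j = np.bincount(judge, minlength=n_judges)                     # caseload per judge
s_j = np.bincount(judge, weights=released, minlength=n_judges)   # releases per judge
leniency = (s_j[judge] - released) / (n_j[judge] - 1)

# The instrument Z is the quintile of the assigned judge's overall leniency.
judge_rate = s_j / n_j
cuts = np.quantile(judge_rate, [0.2, 0.4, 0.6, 0.8])
Z = 1 + np.searchsorted(cuts, judge_rate[judge], side="left")
```

Excluding each case from its own judge's release rate avoids the mechanical correlation between the instrument and that case's decision.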

I use the quasi-random assignment of judges to construct bounds on the unobservable failure to

appear rate among defendants detained by each judge in the top 25. I group judges into quintiles

of leniency based on the constructed leniency measure, and define the instrument Z P Z to be the

leniency quintile of the assigned judge. Applying the results in Appendix A.4.1, the bound on the failure to appear rate among defendants with $W = w$, $D_w(X) = d$ for a particular judge using leniency quintile $\tilde{z} \in \mathcal{Z}$ depends on the quantities $E[P(C = 1, Y^* = 1 \mid W = w, D_w(X) = d, Z = \tilde{z}, T) \mid W = w, D_w(X) = d]$ and $E[P(C = 0 \mid W = w, D_w(X) = d, Z = \tilde{z}, T) \mid W = w, D_w(X) = d]$, where $T \in \mathcal{T}$ denotes the court-by-time cells and the expectation averages over all cases assigned to this particular judge. I model the conditional probabilities $P(C = 1, Y^* = 1 \mid W = w, D_w(X), Z = z, T = t)$ and $P(C = 0 \mid W = w, D_w(X), Z = z, T = t)$ as

$$\mathbf{1}\{C = 1, Y^* = 1\} = \sum_{w,d,z} \beta^{c,y^*}_{w,d,z} \mathbf{1}\{W = w, D_w(X) = d, Z = z\} + \phi_t + \epsilon, \qquad (1.4)$$

$$\mathbf{1}\{C = 0\} = \sum_{w,d,z} \beta^{c}_{w,d,z} \mathbf{1}\{W = w, D_w(X) = d, Z = z\} + \phi_t + \nu, \qquad (1.5)$$

36 There are two relevant features of the pretrial release system in New York City that suggest judges are as-if

randomly assigned to cases (Leslie and Pope, 2017; Kleinberg et al., 2018a; Arnold et al., 2020b). First, bail judges are
assigned to shifts in each of the five county courthouses in New York City based on a rotation calendar system. Second,
there is limited scope for public defenders or prosecutors to influence which judge will decide any particular case.

over all cases in the main estimation sample, where $\phi_t$ are court-by-time fixed effects.37 I estimate the relevant quantities by taking the estimated coefficients $\hat{\beta}^{c}_{w,d,\tilde{z}}$, $\hat{\beta}^{c,y^*}_{w,d,\tilde{z}}$ and adding them to the average of the respective fixed effects associated with cases heard by the judge within each $W = w$, $D_w(X) = d$ cell.
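Equations (1.4)-(1.5) are linear probability models with cell dummies and court-by-time fixed effects; with a modest number of cells they can be estimated by plain OLS on dummy variables, as in this simplified sketch (simulated data; one fixed effect is omitted for identification):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_cells, n_ct = 4000, 8, 20

cell = rng.integers(0, n_cells, size=n)   # flattened (w, d, z) cell index
ct = rng.integers(0, n_ct, size=n)        # court-by-time cell t
y = (rng.uniform(size=n) < 0.10 + 0.02 * cell).astype(int)  # e.g. 1{C = 1, Y* = 1}

# Design: cell dummies plus court-by-time dummies (drop one to avoid collinearity).
X = np.hstack([
    (cell[:, None] == np.arange(n_cells)).astype(float),
    (ct[:, None] == np.arange(1, n_ct)).astype(float),
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat = coef[:n_cells]                           # cell coefficients
phi_hat = np.concatenate([[0.0], coef[n_cells:]])   # fixed effects (cell 0 set to 0)

# Fitted probability for each case: its cell coefficient plus its court-time effect.
p_fit = beta_hat[cell] + phi_hat[ct]
```

The averaging step in the text then replaces `phi_hat[ct]` with the mean fixed effect over the cases a particular judge heard within each $(w, d)$ cell.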

Figure 1.1: Observed failure to appear rate among released defendants and constructed bound on
the failure to appear rate among detained defendants by race-and-age cells for one judge in New
York City.

Notes: This figure plots the observed failure to appear rate among released defendants (orange, circles) and the bounds
on the failure to appear rate among detained defendants based on the judge leniency instrument (blue, triangles)
at each decile of predicted failure to appear risk and race-by-age cell for the judge that heard the most cases in the
main estimation sample. The judge leniency instrument Z P Z is defined as the assigned judge’s quintile of the
constructed, leave-one-out leniency measure. Judges in New York City are quasi-randomly assigned to defendants
within court-by-time cells. The bounds on the failure to appear rate among detained defendants (blue, triangles) are
constructed using the most lenient quintile of judges, and by applying the instrument bounds for a quasi-random
instrument (see Appendix A.4.1). Section 1.5.3 describes the estimation details for these bounds. Source: Rambachan
and Ludwig (2021).

Figure 1.1 plots the observed failure to appear rate among defendants released by the judge

that heard the most cases in the sample period, as well as the bounds on the failure to appear

rate among detained defendants associated with the most lenient quintile of judges. The bounds

are plotted at each decile of predicted risk for each race-by-age cell. Testing whether this judge’s

pretrial release decisions are consistent with expected utility maximization at accurate beliefs

37 Inestimating these fixed effects regressions, I follow the empirical specification in Arnold et al. (2020b), who
estimate analogous linear regressions to construct estimates of race-specific release rates and pretrial misconduct rates
among released defendants.

about failure to appear risk involves checking whether, holding fixed characteristics that directly

affect preferences, all released defendants have a lower observed probability of failing to appear in

court (orange, circles) than the estimated upper bound on the failure to appear rate of all detained

defendants (blue, triangles).
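The check described above can be made concrete with a stylized numeric example (made-up rates for one judge and one payoff-relevant cell $w$; deciles index predicted failure to appear risk):

```python
import numpy as np

# Observed FTA rate among released defendants, by predicted-risk decile 1..10.
fta_released = np.array([0.05, 0.08, 0.10, 0.13, 0.16, 0.20, 0.24, 0.29, 0.35, 0.44])
# Instrument-based upper bound on the FTA rate among detained defendants.
ub_detained = np.array([0.12, 0.15, 0.18, 0.21, 0.25, 0.30, 0.35, 0.41, 0.50, 0.60])

# Expected utility maximization at accurate beliefs requires released defendants in
# the top half of predicted risk to look no riskier than the bound for detained
# defendants in the bottom half of predicted risk.
violation = fta_released[5:].max() - ub_detained[:5].min()
consistent = violation <= 0
```

With these illustrative numbers the inequality fails (`violation` is 0.32), which is the pattern the formal tests below look for judge by judge.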

Appendix Figure A.1 plots the analogous quantities for each race-by-felony cell. Supplement

A.9 reports findings using an alternative empirical strategy for constructing bounds on the

conditional failure to appear rate among detained defendants, which constructs bounds on the

conditional failure to appear rate among detained defendants using the observed failure to appear

rate among released defendants.

1.5.4 What Fraction of Judges Make Systematic Prediction Mistakes?

By constructing the observed failure to appear rate among released defendants and bounds on

the failure to appear rate among detained defendants as in Figure 1.1 for each judge in the top 25,

I test whether their release decisions satisfy the implied revealed preference inequalities across

deciles of predicted failure to appear risk (Corollary 1.3.4). I test the moment inequalities that

compare the observed failure to appear rate among released defendants in the top half of the

predicted failure to appear risk distribution against the bounds on the failure to appear rate

among detained defendants in the bottom half of the predicted failure to appear risk distribution.

The number of true rejections of these implied revealed preference inequalities provides a lower

bound on the number of judges whose choices are inconsistent with the joint null hypothesis

that they are maximizing expected utility at accurate beliefs about failure to appear risk and their

utility functions satisfy the specified exclusion restriction.

I construct the variance-covariance matrix of the observed failure to appear rates among

released defendants and the bounds on the failure to appear rate among detained defendants

using the empirical bootstrap conditional on the payoff-relevant characteristics W, predicted

risk decile Dw pXq and leniency instrument Z. I use the conditional least-favorable hybrid test

developed in Andrews et al. (2019) since it is computationally fast given estimates of the moments

and the variance-covariance matrix and has desirable power properties.
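A simplified stand-in for this testing step can be sketched as follows. This is not the conditional least-favorable hybrid test of Andrews et al. (2019), but a basic least-favorable bootstrap test of the null that all moments are weakly negative, based on the maximum studentized violation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Estimated moments (null: each <= 0) and bootstrap replications of them.
m_hat = np.array([-0.02, 0.01, 0.04])
boot = m_hat + rng.normal(0.0, 0.03, size=(2000, 3))   # stand-in bootstrap draws

se = boot.std(axis=0, ddof=1)
t_stat = (m_hat / se).max()                  # max studentized violation

# Least-favorable critical value: bootstrap the max of the recentered moments.
t_boot = ((boot - m_hat) / se).max(axis=1)
crit = np.quantile(t_boot, 0.95)
reject = bool(t_stat > crit)
```

The hybrid test used in the paper improves on this least-favorable benchmark by conditioning on which moments appear close to binding, which yields better power when many inequalities are slack.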

Table 1.1 summarizes the results from testing the implied revealed preference inequalities

Table 1.1: Estimated lower bound on the fraction of judges whose release decisions are inconsistent
with expected utility maximization at accurate beliefs about failure to appear risk given defendant
characteristics.

Utility Functions Upc, y˚ ; wq


No Characteristics Race Race ` Age Race ` Felony Charge
Unadjusted Rejection Rate 48% 48% 48% 56%
Adjusted Rejection Rate 24% 24% 20% 32%
Notes: This table summarizes the results for testing whether the release decisions of each judge in the top 25 are
consistent with expected utility maximization at strict preference utility functions Upc, y˚ ; wq that (i) do not depend on
any defendant characteristics, (ii) depend on the defendant’s race, (iii) depend on both the defendant’s race and age,
and (iv) depend on both the defendant’s race and whether the defendant was charged with a felony offense. Bounds
on the failure to appear rate among detained defendants are constructed using the judge leniency instrument (see
Section 1.5.3). The unadjusted rejection rate reports the fraction of judges in the top 25 whose pretrial release decisions
violate the moment inequalities in Corollary 1.3.4 at the 5% level using the conditional least-favorable hybrid test (see
Section 1.5.4). The adjusted rejection rate reports the fraction of rejections after correcting for multiple hypothesis
testing using the Holm-Bonferroni step down procedure, which controls the family-wise error rate at the 5% level.
Source: Rambachan and Ludwig (2021).

for each judge in the top 25 under various exclusion restrictions. After correcting for multiple

hypothesis testing (controlling the family-wise error rate at the 5% level), the implied revealed

preference inequalities are rejected for at least 20% of judges. This provides a lower bound on

the number of true rejections of the implied revealed preference inequalities among the top 25

judges.38 For example, when both race and age are allowed to directly affect judges’ preferences,

violations of the implied revealed preference inequalities mean that the observed release decisions

could not have been generated by any possible discrimination based on the defendant’s race and

age nor variation in private information across defendants.

Kleinberg et al. (2018a) analyze whether judges in New York City make prediction mistakes

in their pretrial release decisions by directly comparing the choices of all judges against those

that would be made by an estimated, machine learning based decision rule (i.e., what the authors

call a “reranking policy”). By comparing the choices of all judges against the model, Kleinberg

et al. (2018a) is limited to making statements about average decision making across judges under

two additional assumptions: first, that judges’ preferences do not vary based on observable

38 The number of rejections returned by a procedure that controls the family-wise error rate at the α-level provides
a valid 1 ´ α lower bound on the number of true rejections. Given j “ 1, . . . , m null hypotheses, let k be the
number of false null hypotheses and let k̂ be the number of rejections by a procedure that controls the family-wise
error rate at the α-level. Observe that Ppk̂ ď kq “ 1 ´ Ppk̂ ą kq. Since tk̂ ą ku Ď tat least one false rejectionu,
Ppk̂ ą kq ď Ppat least one false rejectionq, which implies Ppk̂ ď kq ě 1 ´ Ppat least one false rejectionq ě 1 ´ α.

characteristics, and second, that preferences do not vary across judges. In contrast, I conduct my

analysis judge-by-judge, allow each judge’s utility function to flexibly vary based on defendant

characteristics, and allow for unrestricted heterogeneity in utility functions across judges.

Extensions to Baseline Empirical Implementation

Robustness to the Pretrial Misconduct Outcome: I defined the outcome Y ˚ P t0, 1u to be

whether a defendant would fail to appear in court if they were released. If this is incorrectly

specified and a judge’s preferences depend on some other definition of pretrial misconduct, then

rejections of expected utility maximization may be driven by mis-specification of the outcome. For

example, even though the New York City pretrial release system explicitly asks judges to only

consider failure to appear risk, judges may additionally base their release decisions on whether a

defendant would be arrested for a new crime (Arnold et al., 2018; Kleinberg et al., 2018a; Arnold

et al., 2020b). I reconduct my empirical analysis defining the outcome to be whether the defendant

would either fail to appear in court or be re-arrested for a new crime (“any pre-trial misconduct”).

Appendix Table A.1 shows that, for this alternative definition of pretrial misconduct, the pretrial

release decisions of at least 64% of judges in New York City are inconsistent with expected utility

maximization at preferences that satisfy these conjectured exclusion restrictions and accurate

beliefs about any pretrial misconduct risk given defendant characteristics.

Robustness to Pretrial Release Definition: My empirical analysis collapsed the pretrial release

decision into a binary choice of either to release or detain a defendant. In practice, judges in New

York City decide whether to release a defendant “on recognizance,” meaning the defendant is

released automatically without bail conditions, or set monetary bail conditions. Consequently,

judges could be making two distinct prediction mistakes: first, judges may be systematically

mis-predicting failure to appear risk; and second, judges may be systematically mis-predicting the

ability of defendants to post a specified bail amount.

In Supplement A.9.4, I redefine the judge’s choice to be whether to release the defendant on

recognizance and the outcome as both whether the defendant will pay the specified bail amount

and whether the defendant would fail to appear in court if released. Since the modified outcome

takes on multiple values, I use Theorem 1.2.1 to show that expected utility maximization at

accurate beliefs about bail payment ability and failure to appear risk can again be characterized as a

system of moment inequalities (Proposition A.9.1). I find that at least 32% of judges in New York

City make decisions that are inconsistent with expected utility maximization at accurate beliefs

about the ability of defendants to post a specified bail amount and failure to appear risk given

defendant characteristics.

1.5.5 Bounding Prediction Mistakes based on Defendant Characteristics

Given that a large fraction of judges make pretrial release decisions that are inconsistent with

expected utility maximization at accurate beliefs about failure to appear risk, I next apply the

identification results in Section 1.4.2 to bound the extent to which these judges’ implied beliefs

overreact or underreact to predictable variation in failure to appear risk based on defendant

characteristics. For each judge whose choices violate the implied revealed preference inequalities,

I construct a 95% confidence interval for their implied prediction mistakes δpw, dq{δpw, d1 q between

the top decile d and bottom decile d1 of the predicted failure to appear risk distribution. To do so,

I first construct a 95% joint confidence set for the reweighted utility thresholds τpw, dq, τpw, d1 q at

the bottom and top deciles of the predicted failure to appear risk distribution using test inversion

based on Theorem 1.4.2. I then construct a 95% confidence interval for the implied prediction mistake by calculating $\frac{(1 - \tau(w,d))/\tau(w,d)}{(1 - \tau(w,d'))/\tau(w,d')}$ for each pair $\tau(w,d)$, $\tau(w,d')$ of values that lie in the joint

confidence set as in Corollary 1.4.1.
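The mapping from the joint confidence set for the thresholds to a confidence interval for the implied prediction mistake can be sketched as follows (the grid of accepted threshold pairs is hypothetical; in practice it comes from the test inversion described above):

```python
import numpy as np

def delta_ratio(tau_d, tau_dp):
    """Implied prediction mistake delta(w,d)/delta(w,d') from reweighted thresholds."""
    return ((1 - tau_d) / tau_d) / ((1 - tau_dp) / tau_dp)

# Hypothetical 95% joint confidence set for (tau(w,d), tau(w,d')) at the top and
# bottom predicted-risk deciles, represented as a grid of accepted pairs.
taus_top = np.linspace(0.85, 0.95, 11)
taus_bottom = np.linspace(0.55, 0.65, 11)
ratios = [delta_ratio(td, tdp) for td in taus_top for tdp in taus_bottom]

ci_lo, ci_hi = min(ratios), max(ratios)   # CI for the implied prediction mistake
underreacts = ci_hi < 1                   # everywhere below one => underreaction
```

Taking the range of the ratio over all accepted pairs is what makes the resulting interval valid whenever the joint confidence set for the thresholds is.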

Figure 1.2 plots the constructed confidence intervals for the implied prediction mistakes

δpw, dq{δpw, d1 q for each judge over the race-and-age cells. Whenever informative, the confidence

intervals highlighted in orange lie everywhere below one, indicating that these judges are acting

as-if their implied beliefs about failure to appear risk underreact to predictable variation in failure

to appear risk. That is, these judges are acting as-if they perceive the change in failure to appear

risk between defendants in the top decile and bottom decile of predicted risk to be less than the true

change in failure to appear risk across these defendants. This could be consistent with judges

“over-regularizing” how their implicit predictions of failure to appear risk respond to variation in

the characteristics across these extreme defendants, and may therefore be suggestive of some form

of inattention (Gabaix, 2014; Caplin and Dean, 2015; Gabaix, 2019). Developing formal tests for these

Figure 1.2: Estimated bounds on implied prediction mistakes between lowest and highest predicted
failure to appear risk deciles made by judges within each race-by-age cell.

Notes: This figure plots the 95% confidence interval on the implied prediction mistake δpw, dq{δpw, d1 q between the top
decile d and bottom decile d1 of the predicted failure to appear risk distribution for each judge in the top 25 whose
pretrial release decisions violated the implied revealed preference inequalities (Table 1.1) and each race-by-age cell. The
implied prediction mistake δpw, dq{δpw, d1 q measures the degree to which judges’ beliefs underreact or overreact to
variation in failure to appear risk. When informative, the confidence intervals highlighted in orange show that judges
under-react to predictable variation in failure to appear risk from the highest to the lowest decile of predicted failure to
appear risk (i.e., the estimated bounds lie below one). These confidence intervals are constructed by first constructing a
95% joint confidence interval for a judge’s reweighed utility threshold τpw, dq, τpw, d1 q using test inversion based on the
moment inequalities in Theorem 1.4.2, and then constructing the implied prediction mistake δpw, dq{δpw, d1 q associated
with each pair τpw, dq, τpw, d1 q in the joint confidence set (Corollary 1.4.1). See Section 1.4.2 for theoretical details on
the implied prediction mistake and Section 1.5.5 for the estimation details. Source: Rambachan and Ludwig (2021).

behavioral mechanisms in empirical settings like pretrial release is beyond the scope of this paper.

Analogous estimates for race-and-felony charge cells are summarized in Figure A.2. Supplement A.9 shows that judges' implied beliefs underreact to variation in the latent outcome using

alternative bounds on the missing data and alternatively defining the latent outcome to be any

pretrial misconduct.

1.5.6 Which Decisions Violate Expected Utility Maximization?

As a final step to investigate why the release decisions of judges in New York City are inconsistent

with expected utility maximization at accurate beliefs, I also report the cells of defendants on

which the maximum studentized violation of the revealed preference inequalities in Corollary

1.3.4 occurs. This shows which defendants are associated with the largest violations of the revealed

preference inequalities.

Table 1.2: Location of the maximum studentized violation of revealed preference inequalities
among judges whose release decisions are inconsistent with expected utility maximization at
accurate beliefs about failure to appear risk given defendant characteristics.

Utility Functions Upc, y; wq


Race and Age Race and Felony Charge
Unadjusted Rejection Rate 48% 56%

White Defendants
Middle Deciles 0% 0%
Tail Deciles 25% 7.14%
Black Defendants
Middle Deciles 0% 0%
Tail Deciles 75% 92.85%
Notes: This table summarizes the location of the maximum studentized violation of revealed preference inequalities in
Corollary 1.3.4 among judges whose release decisions are inconsistent with expected utility maximization at accurate
beliefs and utility functions that depend on both the defendant’s race and age as well as the defendant’s race and
whether the defendant was charged with a felony. Bounds on the failure to appear rate among detained defendants are
constructed using the judge leniency instrument (see Section 1.5.3). Among judges whose release decisions violate
the revealed preference inequalities at the 5% level, I report the fraction of judges for whom the maximal studentized
violation occurs among white and black defendants on tail deciles (deciles 1-2, 9-10) and middle deciles (3-8) of
predicted failure to appear risk. Source: Rambachan and Ludwig (2021).

Among judges whose choices are inconsistent with expected utility maximization at accurate

beliefs, Table 1.2 reports the fraction of judges for whom the maximal studentized violation occurs

over the tails (deciles 1-2, 9-10) or the middle of the predicted failure to appear risk distribution

(deciles 3-8) for black and white defendants respectively. Consistent with the previous findings,

I find all maximal violations of the revealed preference inequalities occur over defendants that

lie in the tails of the predicted risk distribution. Furthermore, the majority occur over decisions

involving black defendants as well. These empirical findings together highlight that detectable

prediction mistakes primarily occur on defendants at the tails of the predicted risk distribution.

1.6 The Effects of Algorithmic Decision-Making

Finally, I illustrate that the preceding behavioral analysis has important policy implications about

the design of algorithmic decision systems by analyzing policy counterfactuals that replace judges

with algorithmic decision rules in the New York City pretrial release setting. As a technical step,

I show that average social welfare under a candidate decision rule is partially identified due to

the missing data problem, and provide simple methods for inference on this quantity.

1.6.1 Social Welfare Under Candidate Decision Rules

Focusing on binary screening decisions, consider a policymaker whose payoffs are summarized

by the social welfare function $U^*(0,0) < 0$, $U^*(1,1) < 0$. The policymaker evaluates a candidate decision rule $p^*(w,x) \in [0,1]$, which denotes the probability that $C = 1$ is chosen given $W = w$, $X = x$.

Due to the missing data problem, expected social welfare under the candidate decision rule is

partially identified. I characterize the identified set of expected social welfare under the candidate

decision rule, and show that testing the null hypothesis that expected social welfare is equal to

some value is equivalent to testing a series of moment inequalities with nuisance parameters

that enter linearly. These results extend to analyzing expected social welfare under the decision

maker’s observed choices (see Appendix A.4.3).

For a candidate decision rule $p^*(w,x)$, expected social welfare at $(w,x) \in \mathcal{W} \times \mathcal{X}$ is

$$\theta(w, x) = \ell(w, x; p^*, U^*)\, P_{Y^*}(1 \mid w, x) + \beta(w, x; p^*, U^*), \qquad (1.6)$$

where $\ell(w, x; p^*, U^*) := U^*(1,1)\, p^*(w,x) - U^*(0,0)(1 - p^*(w,x))$ and $\beta(w, x; p^*, U^*) := U^*(0,0)(1 - p^*(w,x))$. Total expected social welfare equals

$$\theta(p^*, U^*) = \beta(p^*, U^*) + \sum_{(w,x) \in \mathcal{W} \times \mathcal{X}} P(w,x)\, \ell(w, x; p^*, U^*)\, P_{Y^*}(1 \mid w, x), \qquad (1.7)$$

where $\beta(p^*, U^*) := \sum_{(w,x) \in \mathcal{W} \times \mathcal{X}} P(w,x)\, \beta(w, x; p^*, U^*)$. This definition of the social welfare function is strictly utilitarian. It does not incorporate additional fairness considerations that have received much attention in an influential literature in computer science and are particularly important in the criminal justice system (e.g., see Barocas and Selbst, 2016; Mitchell et al., 2019; Barocas et al., 2019; Chouldechova and Roth, 2020).39

Since PY˚ p1 | w, xq is partially identified due to the missing data problem, total expected

39 The identification and inference results could be extended to incorporate a social welfare function that varies
across groups defined by the characteristics as in Rambachan et al. (2021), or a penalty that depends on the average
characteristics of the individuals that receive C “ 1 as in Kleinberg et al. (2018b).

social welfare is also partially identified, and its sharp identified set is an interval.

Proposition 1.6.1. Consider a binary screening decision, a policymaker with social welfare function $U^*(0,0) < 0$, $U^*(1,1) < 0$ and a candidate decision rule $p^*(w,x)$. The sharp identified set of total expected social welfare, denoted by $\mathcal{H}_P(\theta(p^*, U^*); \mathcal{B})$, is an interval with $\mathcal{H}_P(\theta(p^*, U^*); \mathcal{B}) = [\underline{\theta}(p^*, U^*), \overline{\theta}(p^*, U^*)]$, where

$$\underline{\theta}(p^*, U^*) = \beta(p^*, U^*) + \min_{\{\tilde{P}_{Y^*}(\cdot \mid w, x)\}} \sum_{(w,x) \in \mathcal{W} \times \mathcal{X}} P(w,x)\, \ell(w, x; p^*, U^*)\, \tilde{P}_{Y^*}(1 \mid w, x)$$
$$\text{s.t. } \tilde{P}_{Y^*}(\cdot \mid w, x) \in \mathcal{H}_P(P_{Y^*}(\cdot \mid w, x); \mathcal{B}_{0,w,x}) \;\; \forall (w,x) \in \mathcal{W} \times \mathcal{X},$$

$$\overline{\theta}(p^*, U^*) = \beta(p^*, U^*) + \max_{\{\tilde{P}_{Y^*}(\cdot \mid w, x)\}} \sum_{(w,x) \in \mathcal{W} \times \mathcal{X}} P(w,x)\, \ell(w, x; p^*, U^*)\, \tilde{P}_{Y^*}(1 \mid w, x)$$
$$\text{s.t. } \tilde{P}_{Y^*}(\cdot \mid w, x) \in \mathcal{H}_P(P_{Y^*}(\cdot \mid w, x); \mathcal{B}_{0,w,x}) \;\; \forall (w,x) \in \mathcal{W} \times \mathcal{X}.$$
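Because the constraint set is a product of per-cell bounds, the two linear programs separate cell by cell and have a closed-form solution: put $\tilde{P}_{Y^*}(1 \mid w, x)$ at its lower bound where the objective coefficient is positive and at its upper bound where it is negative (and vice versa for the maximum). A sketch with hypothetical inputs:

```python
import numpy as np

# Hypothetical inputs for a binary screening decision over three (w, x) cells.
P_wx = np.array([0.3, 0.4, 0.3])          # P(w, x)
p_star = np.array([0.9, 0.6, 0.2])        # candidate release rule p*(w, x)
U11, U00 = -1.0, -2.0                     # U*(1,1), U*(0,0), both negative
lb = np.array([0.05, 0.15, 0.30])         # bounds on P_{Y*}(1 | w, x)
ub = np.array([0.15, 0.35, 0.70])

ell = U11 * p_star - U00 * (1 - p_star)   # l(w, x; p*, U*)
beta = (P_wx * U00 * (1 - p_star)).sum()  # beta(p*, U*)

# Per-cell objective coefficients; pick the bound that minimizes (or maximizes).
c = P_wx * ell
theta_lo = beta + np.where(c >= 0, c * lb, c * ub).sum()
theta_hi = beta + np.where(c >= 0, c * ub, c * lb).sum()
```

With bounds on many cells, the same logic applies coordinate by coordinate, so no general-purpose LP solver is needed for this step.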

The sharp identified set of total expected social welfare under a candidate decision rule is

characterized by the solution to two linear programs. Provided the candidate decision rule and

joint distribution of the characteristics pW, Xq are known, testing the null hypothesis that total

expected social welfare is equal to some candidate value is equivalent to testing a system of

moment inequalities with nuisance parameters that enter linearly.

Proposition 1.6.2. Consider a binary screening decision, a policymaker with social welfare function $U^*(0,0) < 0$, $U^*(1,1) < 0$ and a known candidate decision rule $p^*(w,x)$. Conditional on the characteristics $(W, X)$, testing the null hypothesis $H_0 \colon \theta(p^*, U^*) = \theta_0$ is equivalent to testing whether

$$\exists\, \delta \in \mathbb{R}^{d_w d_x - 1} \text{ s.t. } \tilde{A}_{(\cdot,1)} \left( \theta_0 - \ell^{\intercal}(p^*, U^*)\, P^{c=1,y^*=1} - \beta(p^*, U^*) \right) + \tilde{A}_{(\cdot,-1)}\, \delta \leq \begin{pmatrix} -\underline{P}^{c=0,y^*=1} \\ \overline{P}^{c=0,y^*=1} \end{pmatrix},$$

where $\ell(p^*, U^*)$ is the $d_w d_x$-dimensional vector with elements $P(w,x)\ell(w, x; p^*, U^*)$, $P^{c=1,y^*=1}$ is the $d_w d_x$-dimensional vector of moments $P_{C,Y^*}(1, 1 \mid w, x)$, $\underline{P}^{c=0,y^*=1}$, $\overline{P}^{c=0,y^*=1}$ are the $d_w d_x$-dimensional vectors of lower and upper bounds on $P_{C,Y^*}(0, 1 \mid w, x)$ respectively, and $\tilde{A}$ is a known matrix.40

40 For a matrix $B$, $B_{(\cdot,1)}$ refers to its first column and $B_{(\cdot,-1)}$ refers to all columns except its first column.

A confidence interval for total expected social welfare can then be constructed through test inversion. Testing procedures for moment inequalities with nuisance parameters are available

for high-dimensional settings in Belloni et al. (2018). Andrews et al. (2019) and Cox and Shi

(2020) develop inference procedures that exploit the additional linear structure and are valid in

low-dimensional settings. Using this testing reduction requires that the candidate decision rule be

some known function, which can be achieved by constructing the decision rule on held-out data.
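The test-inversion step can be illustrated numerically. The sketch below is stylized and not the paper's procedure: it ignores sampling uncertainty and critical values, and every object (the matrix, the bounds, the scalar in the candidate value) is invented. For each candidate value on a grid, it checks by linear programming whether some nuisance vector makes the inequality system feasible:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Invented stand-ins for the known matrix A-tilde and the vector of moment
# bounds; nothing here is taken from the paper.
A_tilde = rng.normal(size=(6, 3))
b = np.abs(rng.normal(size=6)) + 1.0

def accepted(theta0):
    """Check feasibility of: exists delta with
    A_tilde[:, 0] * s(theta0) + A_tilde[:, 1:] @ delta <= b."""
    s = theta0 - 0.5  # stand-in for the scalar theta0 - ell'P - beta
    res = linprog(np.zeros(2), A_ub=A_tilde[:, 1:], b_ub=b - A_tilde[:, 0] * s,
                  bounds=[(None, None)] * 2)  # zero objective: pure feasibility
    return res.status == 0

# The confidence set collects the accepted candidate values on a grid.
grid = np.linspace(-5.0, 5.0, 201)
accepted_values = [t for t in grid if accepted(t)]
print(len(accepted_values), "of", len(grid), "candidate values accepted")
```

A proper implementation would replace the feasibility check with one of the moment-inequality tests cited above, applied to estimated rather than known moments.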

1.6.2 Measuring the Effects of Algorithms in Pretrial Release Decisions

I use these results to compare total expected social welfare under the observed decisions of judges

in New York City against total expected social welfare under counterfactual algorithmic decision

rules. I vary the relative cost of detaining an individual that would not fail to appear in court

U ˚ p0, 0q (i.e., an “unnecessary detention”), normalizing U ˚ p1, 1q “ ´1. For a particular choice of

the social welfare function, I construct an algorithmic decision rule that decides whether to release

individuals based on a prediction of the probability of pretrial misconduct at each possible cell of

payoff relevant characteristics W and each decile of predicted failure to appear risk Dw pXq. The

decision rule is a threshold rule, whose cutoff depends on the particular parametrization of the

social welfare function. Appendix A.3.4 discusses the construction of this decision rule in more

detail.
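As a stylized illustration of the threshold structure (this is not the rule from Appendix A.3.4, and it adds the assumption that correct decisions carry zero payoff), the cutoff follows from comparing expected payoffs: releasing a defendant with misconduct risk $p$ yields $-p$ under the normalization $U^*(1,1) = -1$, while detaining yields $-(1-p)\,u$ with $u = |U^*(0,0)|$, so release is optimal whenever $p \le u/(1+u)$.

```python
import numpy as np

def release_rule(pred_risk, u):
    """Expected-utility threshold rule (illustrative sketch).

    With U*(1,1) = -1, correct decisions normalized to zero payoff, and
    u = |U*(0,0)| the cost of an unnecessary detention, release beats
    detention whenever -p >= -(1 - p) * u, i.e. p <= u / (1 + u).
    """
    cutoff = u / (1.0 + u)
    return (pred_risk <= cutoff).astype(int)  # 1 = release, 0 = detain

# Invented predicted failure-to-appear risk, one value per decile.
risk = np.array([0.05, 0.12, 0.20, 0.31, 0.40, 0.48, 0.55, 0.63, 0.80, 0.95])
print(release_rule(risk, u=0.5))  # cutoff 1/3: only the four lowest deciles released
```

Raising the cost of an unnecessary detention moves the cutoff toward one and releases more defendants, mirroring how the decision rule in the text varies with the parametrization of the social welfare function.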

I construct 95% confidence intervals for total expected social welfare under the algorithmic

decision rule and the judge’s observed released decisions. I report the ratio of worst-case total

expected social welfare under the algorithmic decision rule against worst-case total expected social

welfare under the judge’s observed release decisions, which summarizes the worst-case gain from

replacing the judge’s decisions with the algorithmic decision rule. I conduct this exercise for

each judge over the race-by-age cells, reporting the median, minimum and maximum gain across

judges. Supplement A.9 reports the same results over the race-by-felony charge cells.
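The worst-case comparison reduces to simple arithmetic on the interval estimates. In this sketch the confidence intervals are invented numbers; only the summary logic (the worst case is the smallest value in the 95% confidence interval, and the gain is the ratio of worst cases) follows the text.

```python
def worst_case(ci):
    """Worst-case welfare over a confidence interval is its smallest value."""
    return min(ci)

# Invented 95% CIs for total expected social welfare under each decision rule
# (welfare is negative here because both utilities are negative).
ci_algorithm = (-0.30, -0.18)
ci_judge = (-0.36, -0.22)

gain = worst_case(ci_algorithm) / worst_case(ci_judge)
print(gain)  # below one: the algorithm's worst case (-0.30) is less negative
```

With negative welfare values, the interpretation of the ratio depends on the sign normalization; the numbers here are purely illustrative.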

Automating Judges Who Make Detectable Prediction Mistakes

I compare the algorithmic decision rule against the release decisions of judges who were found

to make detectable prediction mistakes. Figure 1.3 plots the improvement in worst-case total

expected social welfare under the algorithmic decision rule that fully replaces these judges over

all decisions against the observed release decisions of these judges. For most values of the social

welfare function, worst-case total expected social welfare under the algorithmic decision rule is

strictly larger than worst-case total expected social welfare under these judges’ decisions. Recall

these judges primarily made detectable prediction mistakes over defendants in the tails of the

predicted failure to appear risk distribution. Over the remaining defendants, however, their

choices were consistent with expected utility maximization at accurate beliefs about failure to

appear risk. Consequently, the difference in expected social welfare between the algorithmic decision rule and the judges' decisions is driven by three forces: first, the algorithmic decision rule

corrects detectable prediction mistakes over the tails of the predicted risk distribution; second, the

algorithmic decision rule corrects possible misalignment between the policymaker’s social welfare

function and judges’ utility function over the remaining defendants; and third, the judges may

observe predictive private information over the remaining defendants that is unavailable to the

algorithmic decision rule.

For social welfare costs of unnecessary detentions ranging over U ˚ p0, 0q P r0.3, 0.8s, the

algorithmic decision rule either leads to no improvement or strictly lowers worst-case expected

total social welfare relative to these judges’ decisions. Figure A.3 plots the improvement in worst-

case total expected social welfare by the race of the defendant, highlighting that these costs are

particularly large over white defendants. At these values, judges' preferences may be sufficiently aligned with the policymaker, and judges may observe sufficiently predictive private information over the

remaining defendants that it is costly to fully automate their decisions. Figure A.4 compares the

release rates of the algorithmic decision rule against the observed release rates of these judges. The

release rate of the algorithmic decision rule is most similar to the observed release rate of these

judges precisely over the values of social welfare function where the judges’ decisions dominate

the algorithmic decision rule.

The behavioral analysis suggests that it would be most valuable to automate these judges'

decisions over defendants that lie in the tails of the predicted failure to appear risk distribution

where they make detectable prediction mistakes. I compare these judges’ observed release

decisions against an algorithmic decision rule that only automates decisions over defendants in

the tails of the predicted failure to appear risk distribution and otherwise defers to the judges’

observed decisions. This is a common way in which statistical risk assessments are used in pretrial

Figure 1.3: Ratio of total expected social welfare under algorithmic decision rule relative to release
decisions of judges that make detectable prediction mistakes about failure to appear risk.

Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule that
fully automates decision-making against the observed release decisions of judges who were found to make detectable
prediction mistakes. Worst-case total expected social welfare under each decision rule is computed by constructing 95% confidence intervals for total expected social welfare under the decision rule, and reporting the smallest value that lies in the confidence interval. These decision rules are constructed and evaluated over race-by-age cells and deciles of
predicted failure to appear risk. The x-axis plots the relative social welfare cost of detaining a defendant that would not
fail to appear in court U ˚ p0, 0q (i.e., an unnecessary detention). The solid line plots the median change across judges,
and the dashed lines report the minimum and maximum change across judges. See Section 1.6.2 for further details.
Source: Rambachan and Ludwig (2021).

release systems throughout the United States (Stevenson, 2018; Albright, 2019; Dobbie and Yang,

2019). This algorithmic decision rule only corrects the detectable prediction mistakes, and its

welfare effects are plotted in Figure 1.4. The algorithmic decision rule that only corrects prediction

mistakes weakly dominates the observed release decisions of judges, no matter the value of the

social welfare function. For some parametrizations, the algorithmic decision rule leads to 20%

improvements in worst-case social welfare relative to the observed release decisions of these

judges. Removing judicial discretion over cases where detectable prediction mistakes are made

but otherwise deferring to them over all other cases may therefore be a “free lunch.” This provides

a behavioral mechanism for recent machine learning methods that attempt to estimate whether a

decision should be made by an algorithm or deferred to a decision maker (e.g., Madras et al., 2018;

Raghu et al., 2019; Wilder et al., 2020). These results suggest that deciding whether to automate a

Figure 1.4: Ratio of total expected social welfare under algorithmic decision rule that corrects
prediction mistakes relative to release decisions of judges that make detectable prediction mistakes
about failure to appear risk.

Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule that
only replaces the judge on cases in the tails of the predicted failure to appear risk distribution (deciles 1-2, and deciles
9-10) against the judge’s observed release decisions. Worst case total expected social welfare under each decision rule
is computed by constructing 95% confidence intervals for total expected social welfare under the decision rule, and
reporting smallest value that lies in the confidence interval. These decisions rules are constructed and evaluated over
race-by-age cells and deciles of predicted failure to appear risk. The x-axis plots the relative social welfare cost of
detaining a defendant that would not fail to appear in court U ˚ p0, 0q (i.e., an unnecessary detention). The solid line
plots the median change across judges, and the dashed lines report the minimum and maximum change across judges.
See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).

decision or defer to a decision maker requires understanding whether the decision maker makes

systematic prediction mistakes, and if so on what decisions.

Automating Judges Who Do Not Make Detectable Prediction Mistakes

Figure A.5 reports the welfare effects of automating the release decisions of judges whose choices

were found to be consistent with expected utility maximization at accurate beliefs. Automating

these judge’s release decisions may strictly lower worst-case expected social welfare for a range of

social welfare costs of unnecessary detentions, These judges are making pretrial release decisions

as-if their preferences were sufficiently aligned with the policymaker over these parametrizations

of the social welfare function such that their private information leads to better decisions than the

algorithmic decision rule. Figure A.6 plots the results by defendant race and Figure A.7 compares

the release rates of the algorithmic decision rule against the observed release rates of these judges.

Understanding the welfare effects of automating a decision maker whose decisions are consistent

with expected utility maximization requires assessing the value of their private information against the

degree to which they are misaligned, which is beyond the scope of this paper.41

1.7 Conclusion

This paper develops an econometric framework for testing whether a decision maker makes

prediction mistakes in high stakes, empirical settings such as hiring, medical diagnosis and

treatment, pretrial release, and many others. I characterized expected utility maximization, where

the decision maker maximizes some utility function at accurate beliefs about the outcome given

the observable characteristics of each decision as well as some private information. I developed

tractable statistical tests for whether the decision maker makes systematic prediction mistakes

and methods for conducting inference on the ways in which their predictions are systematically

biased. Analyzing the pretrial release system in New York City, I found that a substantial fraction

of judges make systematic prediction mistakes about failure to appear risk given defendant

characteristics. Finally, I showed how this behavioral analysis may inform the design of algorithmic

decision systems by comparing counterfactual social welfare under alternative algorithmic release

rules against the observed release decisions of judges.

This paper highlights that empirical settings, such as pretrial release, medical diagnosis, and

hiring, can serve as rich laboratories for behavioral analysis. I provided a first step by exploring

the empirical content of expected utility maximization, a canonical model of decision making

under uncertainty, in these settings. An exciting direction is to explore the testable implications of

alternative behavioral models such as rational inattention (e.g., Sims, 2003; Gabaix, 2014; Caplin

and Dean, 2015) as well as forms of salience (e.g., Gennaioli and Shleifer, 2010; Bordalo et al.,

2016, 2021). Exploiting the full potential of these empirical settings is an important, policy-

relevant agenda at the intersection of economic theory, applications of machine learning, and

microeconometrics.

41 See Frankel (2021) for a principal-agent analysis of delegating decisions to a misaligned decision maker who

observes additional private information.

Chapter 2

When Do Common Time Series


Estimands Have Nonparametric Causal
Meaning?1

2.1 Introduction

In this paper, we introduce the nonparametric, direct potential outcome system as a foundational

framework for analyzing dynamic causal effects of assignments on outcomes in observational time

series settings. We consider settings in which there is a single unit (e.g., macroeconomy or market)

observed over time. At each time period t ě 1, the unit receives a vector of assignments Wt ,

and an associated vector of outcomes Yt is generated. The outcomes are causally related to the

assignments through a potential outcome process, which is a stochastic process that describes what

would be observed along counterfactual assignment paths. A dynamic causal effect is generically

defined as the comparison of the potential outcome process along different assignment paths at a

fixed point in time.

Importantly, we place no functional form restrictions on the potential outcome process, no

1 This chapter is joint work with Neil Shephard. We thank Iavor Bojinov, Gary Chamberlain, Fabrizia Mealli, James

M. Robins, Frank Schorfheide and James H. Stock for conversations that have helped develop our thinking on
causality. We are also grateful to many others, including conference participants, for their comments on earlier versions
of this paper. Rambachan gratefully acknowledges financial support from the NSF Graduate Research Fellowship
under Grant DGE1745303.

restrictions on the extent to which past assignments may causally affect outcomes, nor common

time series assumptions such as “invertibility” or “recoverability” on the system of potential

outcomes and assignments. Most leading econometric models used to study dynamic causal

effects in time series settings, such as the structural vector moving average model and (both

linear and non-linear) structural vector autoregressions, can be cast as special cases of the direct

potential outcome system by introducing these additional restrictions on the potential outcome

process or the full system. In this sense, the direct potential outcome system provides a flexible,

nonparametric foundation upon which to analyze dynamic causal effects in time series settings.

We then analyze conditions under which predictive time series estimands, such as the impulse

response function (Sims, 1980), generalized impulse response function (Koop et al., 1996), local

projection (Jordá, 2005) and local projection instrumental variables (Jordá et al., 2015), have a

nonparametric causal interpretation in terms of dynamic causal effects of assignments on outcomes.

That is, under what conditions do these common time series estimands have a nonparametric

causal interpretation as measuring how movements in the outcomes Yt`h for some h ě 0 are

dynamically caused by changes in the assignments Wt ? In exploring this question, we focus on

four data environments, which place alternative assumptions on what output is observed by the

researcher.

First, we analyze a benchmark case in which the researcher directly observes both the outcomes

Yt`h and the assignments Wt generated by the potential outcome system through time. We show

that impulse response functions, local projections, and generalized impulse response functions of

the outcome Yt`h on the assignment Wt identify a dynamic average treatment effect, a weighted

average of marginal average treatment effects, and a filtered average treatment effect respectively

if the assignments Wt are randomly assigned. Random assignment requires that the assignment be independent of the potential outcome process (which is familiar from cross-sectional causal inference) and that the assignments be independent over time. These results provide a

new perspective on a rapidly growing empirical literature in macroeconomics that constructs

measures of underlying economic shocks, and then uses these constructed measures to estimate

dynamic causal effects using reduced form methods. Nakamura and Steinsson (2018b) refers

to this empirical strategy as “direct causal inference.” Our first set of results therefore provides

conditions that these constructed shocks must satisfy in order for the resulting reduced form

estimates to be causally interpretable.

Second, we define a special case of the direct potential outcome system that incorporates instrumental variables Zt that causally affect the assignment Wt but not the outcome Yt`h . Provided

the researcher directly observes the instrument, the assignment, and outcome, it is natural to

consider the causal interpretation of the time-series analogue of common instrumental variable

estimands. We focus attention on dynamic instrumental variable estimands that take the ratio of an

impulse response function of the outcome Yt`h on the instrument Zt (a “reduced-form”) relative

to an impulse response function of the assignment Wt`h on the instrument Zt (a “first stage”).

We show that such dynamic instrumental variable estimands identify an appropriately defined

dynamic “local average treatment effect” in the spirit of Imbens and Angrist (1994), provided

the instrument is randomly assigned and satisfies a monotonicity condition that is familiar from

cross-sectional causal inference. The dynamic local average treatment effects that we characterize

measure the average dynamic treatment effect of the assignment on the h-period ahead outcome

conditional on the event that the instrument causally affects the treatment.
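The mechanics of such ratio estimands can be previewed in a static toy example (horizon $h = 0$); the simulated instrument, compliance types, and effect sizes below are all invented and are not drawn from the paper. The ratio of the "reduced form" to the "first stage" recovers the average effect among compliers, which is exactly the kind of local average treatment effect characterized here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Simulated instrument and compliance types (all invented for illustration).
Z = rng.integers(0, 2, size=n)                      # binary instrument
complier = rng.random(n) < 0.4                      # take W = Z
always_taker = (~complier) & (rng.random(n) < 0.5)  # take W = 1 regardless of Z
W = np.where(complier, Z, always_taker.astype(int))

effect = np.where(complier, 2.0, 0.5)               # heterogeneous effects
Y = effect * W + rng.normal(size=n)

reduced_form = Y[Z == 1].mean() - Y[Z == 0].mean()  # "IRF" of Y on Z
first_stage = W[Z == 1].mean() - W[Z == 0].mean()   # "IRF" of W on Z
late_hat = reduced_form / first_stage
print(late_hat)  # close to 2.0, the effect among compliers, not the population mix
```

Random assignment of the instrument and monotonicity (no defiers) are built into the simulation; they are the same conditions invoked in the text.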

We further analyze the case in which the researcher only observes the instrument Zt and

outcome Yt`h but not the assignment Wt itself. This is an important case. Empirical researchers in

macroeconomics increasingly use “external instruments” to identify the dynamic causal effects of

unobservable economic shocks on macroeconomic outcomes (e.g., see Jordá et al., 2015; Gertler and

Karadi, 2015; Nakamura and Steinsson, 2018a; Ramey and Zubairy, 2018; Stock and Watson, 2018;

Plagborg-Møller and Wolf, 2020; Jordá et al., 2020). In this research, it is common for empirical

researchers to analyze estimands that involve two distinct elements of the outcome vector Yj,t`h , Yk,t

and the instrument. For example, given an external instrument Zt for the unobserved monetary

policy shock Wt (e.g., Kuttner, 2001; Cochrane and Piazessi, 2002; Gertler and Karadi, 2015), it is

common to measure the dynamic causal effect of the monetary policy shock on unemployment

Yj,t`h by (1) estimating a reduced-form impulse response function of unemployment on the

external instrument, (2) estimating a “first stage” impulse response function of the Federal Funds

rate Yk,t on the external instrument, (3) report the ratio of these impulse responses (e.g., Jordá

et al., 2015; Stock and Watson, 2018; Jordá et al., 2020).

We show that dynamic IV estimands constructed in this manner are causally interpretable,

and nonparametrically identify a relative, dynamic local average treatment effect that measures the

causal response of the h-step ahead outcome Yj,t`h to a change in the treatment Wk,t that raises the

contemporaneous outcome Yk,t by one unit among compliers (i.e., in the monetary policy example,

the dynamic response of unemployment to a monetary policy shock that raises the Federal

funds rate by one unit). This result therefore provides a motivation for the recent interest in

external instruments in empirical macroeconomics — provided one exists, an external instrument

can be used to identify causally interpretable estimands without resorting to functional form

restrictions on the outcomes and without even directly observing the treatment itself.

Finally, we conclude by briefly discussing the most challenging data environment in which

the researcher only observes the outcomes Yt`h , but not the assignments Wt nor any external

instruments Zt . This case is considered by much of the foundational and influential research on

model-based approaches to analyzing dynamic causal effects in macroeconometrics (e.g., Sims,

1972, 1980). We consider this setting in order to place the direct potential outcome system in this

broader context, and illustrate that researchers can recover familiar model-based approaches by

introducing functional form restrictions on the potential outcome process.

Taking a step back, quantifying dynamic causal effects is one of the great themes of the broader

time series literature. Researchers use a variety of methods such as Granger causality (Wiener,

1956; Granger, 1969; White and Lu, 2010), highly structured models such as DSGE models (Herbst

and Schorfheide, 2015), state space modelling (Harvey and Durbin, 1986; Harvey, 1996; Brodersen

et al., 2015; Menchetti and Bojinov, 2021) as well as intervention analysis (Box and Tiao, 1975)

and regression discontinuity (Kuersteiner et al., 2018). The nonparametric potential outcome

framework we develop is distinct. References to some of the more closely related work will be

given in the next section. This paper is not focused on estimators and the associated distribution

theory: we do not have much to say in that regard which is novel.

Roadmap: Section 2.2 defines the direct potential outcome system and introduces the main class

of dynamic causal effects that we focus on throughout the paper. Section 2.3 looks at the causal

meaning of common statistical estimands based on seeing the realized assignments and outcomes.

The instrumented potential outcome system is defined in Section 2.4, which relates assignments

and instruments to outcomes. Section 2.5 studies the causal interpretation of estimands based

on seeing the realized assignments, instruments and outcomes. Section 2.6 looks at the causal

meaning of estimands where only the instruments and outcomes are observed. Section 2.7 looks

at the causal meaning of common statistical estimands where only the outcomes are observed. We

collect all proofs in the Appendix.

Notation: For a time series $\{A_t\}_{t \ge 1}$ with $A_t \in \mathcal{A}$ for all $t \ge 1$, let $A_{1:t} := (A_1, \ldots, A_t)$ and $\mathcal{A}^t := \times_{s=1}^{t} \mathcal{A}$. $A \perp\!\!\perp B$ says that random variables $A$ and $B$ are probabilistically independent.

2.2 The Direct Potential Outcome System and Dynamic Causal Effects

We now introduce the direct potential outcome system, which extends the design-based approach

developed in Bojinov and Shephard (2019) to stochastic processes. We define a large class of

casual estimands that summarize the dynamic causal effects of varying the assignment on future

outcomes. As an illustration, we show that the direct potential outcome system nests most leading

structural models in macroeconometrics as a special case.

The nonparametric potential outcome framework we develop relates to a vast literature on

dynamic treatment effects in small-T, large-N panels. The panel work of Robins (1986) and

Abbring and Heckman (2007), amongst others, led to an enormous literature on dynamic causal

effects in panel data (Murphy et al., 2001; Murphy, 2003; Heckman and Navarro, 2007; Lechner,

2011; Heckman et al., 2016; Boruvka et al., 2018; Lu et al., 2017; Blackwell and Glynn, 2018; Hernan

and Robins, 2019; Bojinov et al., 2021; Mastakouri et al., 2021). Beyond Bojinov and Shephard

(2019), our work is most closely related to Angrist and Kuersteiner (2011) and Angrist et al. (2018).

We discuss their work in Section 2.2.3.

2.2.1 The Direct Potential Outcome System

There is a single unit. At each time period t ě 1, the unit receives a dw -dimensional assignment

tWt utě1 . Associated with this assignment process, we observe a dy -dimensional outcome tYt utě1 .

The outcomes are causally related to the assignments through the potential outcome process,

which describes what outcome would be observed at time t along a particular path of assignments.

Assumption 2.2.1 (Assignment and Potential Outcome). The assignment process $\{W_t\}_{t \ge 1}$ satisfies $W_t \in \mathcal{W} := \times_{k=1}^{d_w} \mathcal{W}_k \subseteq \mathbb{R}^{d_w}$. The potential outcome process is, for any deterministic sequence $\{w_s\}_{s \ge 1}$ with $w_s \in \mathcal{W}$ for all $s \ge 1$, $\{Y_t(\{w_s\}_{s \ge 1})\}_{t \ge 1}$, where the time-$t$ potential outcome satisfies $Y_t(\{w_s\}_{s \ge 1}) \in \mathcal{Y} \subseteq \mathbb{R}^{d_y}$.

The simplest case is when the assignment is scalar and binary W “ t0, 1u, in which case Wt “ 1

corresponds to “treatment” and Wt “ 0 is “control.”

The potential outcome Yt ptws usě1 q may depend on future assignments tws usět`1 . Our next

assumption rules out this dependence, restricting the potential outcome to only depend on past

and contemporaneous assignments.

Assumption 2.2.2 (Non-anticipating Potential Outcomes). For each $t \ge 1$ and all deterministic $\{w_t\}_{t \ge 1}$, $\{w'_t\}_{t \ge 1}$ with $w_t, w'_t \in \mathcal{W}$,

$$Y_t(w_{1:t}, \{w_s\}_{s \ge t+1}) = Y_t(w_{1:t}, \{w'_s\}_{s \ge t+1}) \quad \text{almost surely}.$$

Assumption 2.2.2 is a stochastic process analogue of non-interference (Cox, 1958b; Rubin, 1980),

extending White and Kennedy (2009) and Bojinov and Shephard (2019). It still allows for rich

dependence on past and contemporaneous assignments. Under Assumption 2.2.2, we drop

references to the future assignments in the potential outcome process, and write

$$\{Y_t(\{w_s\}_{s \ge 1})\}_{t \ge 1} = \{Y_t(w_{1:t})\}_{t \ge 1}.$$

The set $\{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}^t\}$ collects all the potential outcomes at time $t$.

Together, the assignments and potential outcome generate the output of the system.

Assumption 2.2.3 (Output). The output is tWt , Yt utě1 “ tWt , Yt pW1:t qutě1 . The tYt utě1 is called the

outcome process.

The outcome process is the potential outcome process evaluated at the assignment process.

Finally, we assume that the assignment process is sequentially probabilistic, meaning that any

assignment vector may be realized with positive probability at time t given the history of the

observable stochastic processes up to time t ´ 1. Let tFt utě1 denote the natural filtration generated

by (the σ-algebra of) the realized twt , yt utě1 .

Assumption 2.2.4 (Sequentially probabilistic assignment process). The assignment process satisfies

0 ă PpWt “ w | Ft´1 q ă 1 with probability one for all w P W . Here the probabilities are determined by a

filtered probability space of tWt , tYt pw1:t q, w1:t P W t uutě1 .

This is the time series analogue of the “overlap” condition in cross-sectional causal studies. We

make this assumption throughout the paper in order to focus attention on the causal interpretation

of common time series estimands in the presence of rich dynamic causal effects. Understanding

how violations of Assumption 2.2.4 affect the causal interpretation and estimation of common

time series estimands is an important but separate issue.

By putting these assumptions together, we define a direct potential outcome system.

Definition 2.2.1 (Direct Potential Outcome System). Any tWt , tYt pw1:t q : w1:t P W t uutě1 satisfying

Assumptions 2.2.1-2.2.4 is a direct potential outcome system.

We refer to Definition 2.2.1 as a “direct” potential outcome system in order to emphasize that it

focuses on nonparametrically modelling the direct causal effects of the assignment process tWt u

on the outcomes tYt u. We do not, however, explicitly allow for the assignment Wt to have a causal

effect on future assignments Ws for s ą t. That is, we do not introduce a potential assignment

Wt pw1:t´1 q which would model the assignment Wt that would be realized along the assignment

path w1:t´1 P W t´1 and would open an indirect, causal mechanism that allows the assignment Wt

to indirectly affect future outcomes through its effect on future assignments.2 The assignment

process tWt u in the direct potential outcome system can still nonetheless have rich dependence.

Assumption 2.2.1 places no restrictions on how Wt , Ws for s ‰ t are probabilistically related.

In focusing on the direct causal effects of assignments on outcomes, we adopt a common

perspective in both macroeconometrics and financial econometrics. In particular, it is common

in macroeconometrics to focus on studying the direct causal effects of underlying economic

“shocks” on outcomes, which are thought to be underlying “random causes” that drive economic

fluctuations and are causally unrelated to one another (Frisch, 1933; Slutzky, 1937; Sims, 1980).

The empirical goal is to therefore trace out the dynamic causal effects of these primitive, economic

shocks tWt u on macroeconomic outcomes tYt u. We refer the readers to Ramey (2016), Stock

and Watson (2016), and Stock and Watson (2018) for recent discussions of this perspective in

macroeconometrics. We further discuss the connections between the assignments in a direct

potential outcome system and economic “shocks” in Section 2.3.

2 Such
indirect causal mechanisms are often studied in a large biostatistics literature on longitudinal causal effects
and dynamic treatment regimes – e.g., see Chapter 19 of Hernan and Robins (2019).

Remark 2.2.1 (Background processes). We could have further introduced the background process

tXt utě1 that is causally unaffected by the assignment process. Such a process would play the same role as

pre-assignment covariates in cross-sectional or longitudinal studies.

2.2.2 Dynamic Causal Effects

Any comparison of the potential outcome process at a particular point in time along different

possible realizations of the assignment process defines a dynamic causal effect. The dynamic causal effect at time $t$ for assignment path $w_{1:t} \in \mathcal{W}^t$ and counterfactual path $w'_{1:t} \in \mathcal{W}^t$ is $Y_t(w_{1:t}) - Y_t(w'_{1:t})$. Of course, this is an enormous class of dynamic causal effects as there are

exponentially many possible paths w1:t P W t . We therefore introduce causal estimands that

average over these dynamic causal effects along various underlying assignment paths.

To do so, let us introduce some shorthand. For t ě 1, h ě 0, and any fixed w P W , write the

time-pt ` hq potential outcome at the assignment process pW1:t´1 , w, Wt`1:t`h q as

Yt`h pwq :“ Yt`h pW1:t´1 , w, Wt`1:t`h q.

Notice that Yt`h “ Yt`h pWt q in this notation.

Definition 2.2.2 (Dynamic causal effects). For t ě 1, h ě 0, and any fixed w, w1 P W , the time-t,

h-period ahead impulse causal effect, filtered treatment effect, and average treatment effect are,

respectively:

Yt`h pwq ´ Yt`h pw1 q, ErYt`h pwq ´ Yt`h pw1 q | Ft´1 s, ErYt`h pwq ´ Yt`h pw1 qs.

The impulse causal effect measures the ceteris paribus causal effect of intervening to switch the

time-t assignment from w1 to w on the h-period ahead outcomes holding all else fixed along the

assignment process. The impulse causal effect is a random object since the potential outcome

process itself is stochastic, as are the past W1:t´1 and future Wt`1:t`h assignments.

The filtered treatment effect averages the impulse causal effect conditional on the natural filtration of assignments and observed outcomes up to time $t-1$. We use the nomenclature “filtered” following the stochastic process literature, where filtering refers to the sequential estimation of time-varying unobserved variables, e.g., the Kalman filter (Kalman, 1960; Durbin and Koopman, 2012), the particle filter (Gordon et al., 1993; Pitt and Shephard, 1999; Chopin and Papaspiliopoulos, 2020), and hidden discrete Markov models (Baum and Petrie, 1966; Hamilton, 1989). This labelling fits as the potential outcomes are unobserved.3

Finally, the average treatment effect further averages the filtered treatment effect over the filtration, yielding the unconditional expectation of the impulse causal effect $Y_{t+h}(w) - Y_{t+h}(w')$.

Remark 2.2.2. If new outcome variables were added to an existing causal study, the impulse causal effect and the average treatment effect for the existing variables would not change, but the filtered treatment effect might, as the new outcome variables would enlarge the filtration and so possibly alter the conditional expectation.

We further define analogous versions of the dynamic causal effects for a particular scalar assignment. For any fixed $w_k \in \mathcal{W}_k$, define

$$Y_{t+h}(w_k) := Y_{t+h}(W_{1:t-1}, W_{1:k-1,t}, w_k, W_{k+1:d_W,t}, W_{t+1:t+h}).$$

The corresponding time-$t$, $h$-period ahead impulse causal effect, filtered treatment effect, and average treatment effect for the $k$-th assignment are, respectively:

$$Y_{t+h}(w_k) - Y_{t+h}(w'_k), \qquad E[Y_{t+h}(w_k) - Y_{t+h}(w'_k) \mid \mathcal{F}_{t-1}], \qquad E[Y_{t+h}(w_k) - Y_{t+h}(w'_k)].$$
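To fix ideas, the three estimands can be computed by brute force in a toy simulation. The sketch below is ours, not from the text: it assumes an illustrative linear one-lag potential outcome process $Y_{t+h}(w_{1:t+h}) = \theta_0 w_{t+h} + \theta_1 w_{t+h-1} + \varepsilon_{t+h}$ with a binary assignment, intervenes only on the time-$t$ assignment, and holds everything else fixed. Because this toy process is linear, the one-period-ahead impulse causal effect equals $\theta_1$ for every draw, so the average treatment effect is also $\theta_1$.

```python
import numpy as np

rng = np.random.default_rng(0)

theta0, theta1 = 0.5, 0.25     # contemporaneous and one-lag effects (toy values)
n_sim, t, h = 10_000, 5, 1     # evaluate the time-t, h-period-ahead effect

def outcome(w_path, eps):
    """Potential outcome Y_{t+h}(w_{1:t+h}) for the toy linear one-lag process."""
    s = t + h - 1                        # 0-based index of period t + h
    return theta0 * w_path[s] + theta1 * w_path[s - 1] + eps[s]

effects = np.empty(n_sim)
for i in range(n_sim):
    w = rng.integers(0, 2, size=t + h).astype(float)   # random assignment path
    eps = rng.normal(size=t + h)                       # background randomness
    w_hi, w_lo = w.copy(), w.copy()
    w_hi[t - 1], w_lo[t - 1] = 1.0, 0.0  # intervene on the time-t assignment only
    effects[i] = outcome(w_hi, eps) - outcome(w_lo, eps)  # impulse causal effect

ate = effects.mean()   # average treatment effect; equals theta1 here by linearity
```

In this linear special case the impulse causal effect is degenerate (the same in every draw), which previews the discussion below of how linearity rules out state dependence; with a nonlinear `outcome` function, `effects` would vary across draws and the filtered treatment effect would average them within histories.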

The dynamic causal effects in Definition 2.2.2 summarize the causal effect of discrete interventions that switch the time-$t$ assignments on the outcomes. We finally introduce derivatives that summarize the marginal causal effects of incrementally varying the time-$t$ assignment (see, for example, Angrist and Imbens (1995) and Angrist et al. (2000) for analogous definitions in cross-sectional settings).

Definition 2.2.3. If they individually exist,

$$Y'_{t+h}(w_k) = \frac{\partial Y_{t+h}(w_k)}{\partial w_k}, \qquad E[Y'_{t+h}(w_k) \mid \mathcal{F}_{t-1}], \qquad E[Y'_{t+h}(w_k)],$$

are called the time-$t$, $h$-period ahead marginal impulse causal effect, the marginal filtered treatment effect, and the marginal average treatment effect, respectively.

3 We note that Lee and Salanie (2020) also use the phrase “filtered treatment effect” in analyzing a cross-sectional

setting with partially observed assignments.

2.2.3 Links to macroeconometrics

Before continuing, we highlight how the direct potential outcome system naturally links to

several recent developments and debates in macroeconometrics and encompasses many familiar

parametric models in that field. We start with the former.

First, the direct potential outcome system provides a unifying framework to analyze what

assumptions must be placed on the assignment process to endow causal meaning to common

statistical estimands without resorting to functional form assumptions. Workhorse models in

macroeconometrics, such as the structural vector moving average, assume linearity. However, this

nullifies state-dependence and asymmetry in dynamic causal effects. Researchers recognize the restrictiveness of linearity, yet attempt to weaken it on a case-by-case basis: for example, on the possibly nonlinear effects of oil prices (Kilian and Vigfusson, 2011b,a; Hamilton, 2011); on the nonlinear and state-dependent effects of monetary policy (Tenreyro and Thwaites, 2016; Jordá et al., 2020; Aruoba et al., 2021; Mavroeidis, 2021); and on state-dependent fiscal multipliers (Auerbach and Gorodnichenko, 2012b,a; Ramey and Zubairy, 2018; Cloyne et al., 2020). Similarly, the direct

potential outcome system does not rely on “invertibility” or “recoverability” assumptions about

the assignment and potential outcome processes (Chahrour and Jurado, 2021). Understanding

what can be identified about dynamic causal effects without relying on these assumptions is

an active area (Stock and Watson, 2018; Plagborg-Møller, 2019; Plagborg-Møller and Wolf, 2020;

Chahrour and Jurado, 2021).

Second, a rapidly growing body of empirical research in macroeconometrics attempts to estimate dynamic causal effects in settings where researchers directly observe both the assignments and outcomes, $\{w_t^{obs}, y_t^{obs}\}_{t \ge 1}$. In this line of work, empirical researchers creatively construct measures of the underlying economic shocks of interest $W_t$, and then use these constructed shocks to directly estimate dynamic causal effects on macroeconomic outcomes using reduced-form methods such as local projections (Jordá, 2005) or autoregressive distributed lag models (Baek and Lee, 2021). This line of work has recently been called “direct causal inference” by Nakamura and Steinsson (2018b) in order to contrast it with the dominant model-based approach to causal inference in macroeconomics in the tradition of Sims (1980). We refer the reader to Nakamura and Steinsson (2018b), Goncalves et al. (2021), and Baek and Lee (2021) for recent discussions of this growing empirical literature in macroeconomics. The direct potential outcome system provides a causal foundation for such reduced-form methods in time series, elucidating the assumptions that a constructed shock must satisfy in order for the reduced-form estimands to have a nonparametric causal interpretation.

Examples from Macroeconomics

Many leading causal models in macroeconomics can be cast as special cases of the direct potential

outcome system that place additional restrictions on the potential outcome process.

Example 2.2.1 (Structural vector moving average (SVMA) model). The SVMA model is the leading workhorse model for studying dynamic causal effects in macroeconometrics (e.g., Kilian and Lutkepohl, 2017; Stock and Watson, 2018). Any infinite-order SVMA model can be expressed as a direct potential outcome system by assuming that the potential outcome process satisfies the functional form restriction

$$Y_t(w_{1:t}) := \sum_{l=0}^{t-1} \Theta_l w_{t-l} + Y_t^*,$$

where $\{W_t\}_{t \ge 1}$ is the assignment process, $\{\Theta_l\}_{0 \le l < t}$ is a sequence of lag-coefficient matrices, and $\{Y_t^*\}_{t \ge 1}$ is a stochastic process that is causally unaffected by the assignment process. In this sense, the SVMA model imposes that the potential outcome process is linear in the assignment process. This mapping requires no assumptions on the dimensionality $d_w$ of the assignment process, the dimensionality $d_y$ of the potential outcome process, nor the lag-coefficient matrices. As discussed in Plagborg-Møller and Wolf (2020), such an infinite-order SVMA model is consistent with all discrete-time Dynamic Stochastic General Equilibrium models as well as all stable, linear structural vector autoregression (SVAR) models. We further discuss the SVMA model in Section 2.7. ▲
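As a quick numerical check (our own sketch, not part of the text), in a scalar SVMA with i.i.d. shocks the covariance-based impulse response at horizon $h$ recovers the lag coefficient $\Theta_h$; the geometric lag structure, the 20-lag truncation, and the noise scale below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

T, h = 200_000, 2
theta = 0.8 ** np.arange(20)       # lag-coefficient sequence Θ_l (toy values)
w = rng.normal(size=T)             # i.i.d. scalar shock (assignment) process
ystar = 0.1 * rng.normal(size=T)   # background process Y*_t, unaffected by w

# Y_t = Σ_l Θ_l w_{t-l} + Y*_t, truncating the moving average at 20 lags
y = ystar.copy()
for l, th in enumerate(theta):
    y[l:] += th * w[:T - l]

# Covariance-based impulse response at horizon h recovers Θ_h = 0.8^2
irf_h = np.cov(y[h:], w[:T - h])[0, 1] / w[:T - h].var()
```

Linearity is what makes a single number per horizon sufficient here; the nonlinear examples below break exactly this property.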

Example 2.2.2 (Nonlinear structural vector autoregressions (SVAR)). Recent advances in nonlinear SVARs can also be cast as special cases of the direct potential outcome system. As an illustration, consider the motivating example in Goncalves et al. (2021), which is a nonlinear SVAR of the form:

$$Y_{1,t}(w_{1:t}) = w_{1,t}, \qquad Y_{2,t}(w_{1:t}) = b + \beta Y_{1,t}(w_{1:t}) + \rho Y_{2,t-1}(w_{1:t-1}) + c\, f(Y_{1,t}(w_{1:t})) + w_{2,t},$$

where $f$ is a nonlinear function. Given a stochastic initial condition $Y_{2,0} := \epsilon_{2,0}$ that is causally unaffected by the assignment process, iterating this system of equations forward arrives at a potential outcome process $Y_{1,t}(w_{1:t}) = w_{1,t}$ and $Y_{2,t}(w_{1:t}) = g_{2,t}(w_{1:t}, \epsilon_{2,0}; \theta)$, where $g_{2,t}$ is a known function and $\theta := (b, c, \beta, \rho)$ are the parameters. This is a direct potential outcome system in which (1) $Y_{1,t}(w_{1:t})$ is non-random and depends only on the contemporaneous assignment, and (2) the randomness in $Y_{2,t}(w_{1:t})$ is driven by the initial condition. Other recent examples of nonlinear SVARs include Aruoba et al. (2021) and Mavroeidis (2021). ▲
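The forward iteration in this example can be sketched directly. The parameter values and the censoring function $f(x) = \max(x, 0)$ below are our own illustrative choices, used only to show how the nonlinearity makes the impulse causal effect asymmetric in the sign of the intervention — a feature the linear SVMA above rules out.

```python
# Toy parameter values for θ = (b, c, β, ρ) and an assumed nonlinear f
b, beta, rho, c = 0.0, 0.5, 0.9, 1.0
f = lambda x: max(x, 0.0)

def iterate(w1, w2, y2_init=0.0):
    """Forward-iterate the system to obtain the potential outcome path for Y_2."""
    y2, prev = [], y2_init
    for w1_t, w2_t in zip(w1, w2):
        y1_t = w1_t                        # Y_{1,t}(w_{1:t}) = w_{1,t}
        prev = b + beta * y1_t + rho * prev + c * f(y1_t) + w2_t
        y2.append(prev)
    return y2

T = 10
w2 = [0.0] * T
base = [0.0] * T
up = [1.0] + [0.0] * (T - 1)               # +1 intervention on w_{1,1}
down = [-1.0] + [0.0] * (T - 1)            # -1 intervention on w_{1,1}

# Impulse causal effects of opposite-signed interventions are asymmetric:
effect_up = iterate(up, w2)[0] - iterate(base, w2)[0]      # β + c·f(1) = 1.5
effect_down = iterate(base, w2)[0] - iterate(down, w2)[0]  # β + c·0    = 0.5
```

A positive intervention passes through both the linear and censored channels, while a negative one passes only through the linear channel, so the two effects differ even at impact.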

Example 2.2.3 (Potential outcome model in Angrist and Kuersteiner (2011), Angrist et al. (2018)). Angrist and Kuersteiner (2011) and Angrist et al. (2018) introduce a potential outcome model for time series settings that is a special case of the direct potential outcome system. Using our notation, Angrist and Kuersteiner (2011) introduce a system of structural equations in which, for $t \ge 1$,

$$Y_{1,t}(w_{1:t}) = f_{1,t}(Y_{1:t-1}(w_{1:t-1}), w_{1,t}; \epsilon_0), \qquad Y_{2,t}(w_{1:t}) = f_{2,t}(Y_{1,t}(w_{1:t}), w_{2,t}, w_{1:t-1}; \epsilon_0),$$

where $f_{1,t}, f_{2,t}$ are deterministic functions and $\epsilon_0$ is a random initial condition. These structural equations impose that $w_{1:t}$ only impacts $Y_{1,t}$ through $w_{1,t}$ directly and through $Y_{1,t-1}$ indirectly. Further, $w_{2,1:t}$ only impacts $Y_{2,t}$ contemporaneously. Related thinking includes White and Kennedy (2009) and White and Lu (2010). Through forward iteration of the system starting at $t = 1$, this can also be expressed as a direct potential outcome system. In this system of structural equations, the authors defined the collection of their time-$(t+h)$ potential outcomes as $\{Y_{t+h}(w_{1:t-1}^{obs}, w, W_{t+1:t+h}) : w \in \mathcal{W}_W\}$ and focused on $E[Y_{t+h}(w_{1:t-1}^{obs}, w, W_{t+1:t+h}) - Y_{t+h}(w_{1:t-1}^{obs}, w', W_{t+1:t+h})]$, which they called the “average policy effect.” ▲

Example 2.2.4 (Expectations). Macroeconomists often consider how assignments are influenced by the distribution of future outcomes and how outcomes in turn vary with assignments. For example, consumers and firms are modelled as forward-looking, so expectations about future outcomes influence behavior today. Consider a simple optimization-based version (e.g., Lucas, 1972; Sargent, 1981) in which the assignment process is given by

$$W_t \in \arg\max_{w_t} \max_{w_{t+1:T}} E[\, U(Y_{t:T}(w_{1:t-1}^{obs}, w_{t:T}), w_{t:T}) \mid y_{1:t-1}^{obs}, w_{1:t-1}^{obs} \,], \tag{2.1}$$

where $U$ is a utility function of future outcomes and assignments, while $\mathcal{F}_{t-1}$ is written out in long hand as $y_{1:t-1}^{obs}, w_{1:t-1}^{obs}$. For each possible $w_{t:T} \in \mathcal{W}^{T-t+1}$, the expectation is over the law of $Y_{t:T}(w_{1:t-1}^{obs}, w_{t:T}) \mid y_{1:t-1}^{obs}, w_{1:t-1}^{obs}$. This decision rule delivers the output $\{W_t, Y_t(W_{1:t})\}_{t \ge 1}$. This looks like a direct potential outcome system since Assumption 2.2.2 holds. The assignment $W_t$ could be a deterministic function of past data if the optimal choice is unique, which would violate Assumption 2.2.4. However, incorporating noise in the decision rule (2.1) would deliver a direct potential outcome system. ▲

2.3 Estimands Based on Assignments and Outcomes

In this section, we establish nonparametric conditions under which common statistical estimands

based on assignments and outcomes have causal meaning in the direct potential outcome system

tWt , tYt pw1:t q : w1:t P W t uutě1 , where researchers observe the realized assignments and realized

outcomes twtobs , yobs


t utě1 . We ask if the following statistical estimands have causal meaning:

impulse response function, local projection, generalized impulse response function and the local

filtered projection. Table 2.1 defines these estimands and summarizes our main results on their

causal interpretation under important restrictions on the assignment process and other technical

conditions. The rest of this Section spells out the details.

Table 2.1: Top line results for the causal interpretation of common estimands based on assignments and outcomes.

Impulse Response Function:
  Estimand: $E[Y_{t+h} \mid W_{k,t} = w_k] - E[Y_{t+h} \mid W_{k,t} = w'_k]$
  Causal interpretation: $E[Y_{t+h}(w_k) - Y_{t+h}(w'_k)]$

Local Projection:
  Estimand: $\mathrm{Cov}(Y_{t+h}, W_{k,t}) / \mathrm{Var}(W_{k,t})$
  Causal interpretation: $\int_{\mathcal{W}_k} E[Y'_{t+h}(w_k)] E[G_t(w_k)]\,dw_k \Big/ \int_{\mathcal{W}_k} E[G_t(w_k)]\,dw_k$

Generalized Impulse Response Function:
  Estimand: $E[Y_{t+h} \mid W_{k,t} = w_k, \mathcal{F}_{t-1}] - E[Y_{t+h} \mid W_{k,t} = w'_k, \mathcal{F}_{t-1}]$
  Causal interpretation: $E[Y_{t+h}(w_k) - Y_{t+h}(w'_k) \mid \mathcal{F}_{t-1}]$

Local Filtered Projection:
  Estimand: $E[\{Y_{t+h} - \hat Y_{t+h|t-1}\}\{W_{k,t} - \hat W_{k,t|t-1}\}] \Big/ E[\{W_{k,t} - \hat W_{k,t|t-1}\}^2]$
  Causal interpretation: $\int_{\mathcal{W}_k} E\big[E[Y'_{t+h}(w_k) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}]\big]\,dw_k \Big/ \int_{\mathcal{W}_k} E[G_{t|t-1}(w_k)]\,dw_k$

Notes: This table summarizes the main results for the causal interpretation of common estimands based on assignments and outcomes. Here $h \ge 0$, $w_k, w'_k \in \mathcal{W}_k$, $G_t(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t}])$ and $G_{t|t-1}(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t} \mid \mathcal{F}_{t-1}])$, while $\hat Y_{t+h|t-1} := E[Y_{t+h} \mid \mathcal{F}_{t-1}]$ and $\hat W_{k,t|t-1} := E[W_{k,t} \mid \mathcal{F}_{t-1}]$. Note that $E[G_t(w_k)] \ge 0$ and $E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}] \ge 0$.

In this section, there is no loss of generality in assuming the outcome $Y_{t+h}$ is univariate. The more general case is covered by running the analysis equation by equation.
2.3.1 Impulse Response Function

We begin by determining the conditions under which the unconditional impulse response function (Sims, 1980) is the $h$-period ahead average treatment effect. For $h \ge 0$ and deterministic $w_k, w'_k \in \mathcal{W}_k$, the impulse response function is defined by, if it exists,

$$IRF_{k,t,h}(w_k, w'_k) := E[Y_{t+h} \mid W_{k,t} = w_k] - E[Y_{t+h} \mid W_{k,t} = w'_k]. \tag{2.2}$$

$IRF_{k,t,h}(w_k, w'_k)$ can be decomposed into the average treatment effect and a selection bias term.

Theorem 2.3.1. Assume a direct potential outcome system, consider some $k = 1, \ldots, d_w$, $t \ge 1$, $h \ge 0$, fix $w_k, w'_k \in \mathcal{W}_k$, and assume $E[|Y_{t+h}(w_k) - Y_{t+h}(w'_k)|] < \infty$. Then,

$$IRF_{k,t,h}(w_k, w'_k) = E[Y_{t+h}(w_k) - Y_{t+h}(w'_k)] + \Delta_{k,t,h}(w_k, w'_k),$$

where

$$\Delta_{k,t,h}(w_k, w'_k) := \frac{\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\})}{E[1\{W_{k,t} = w_k\}]} - \frac{\mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\})}{E[1\{W_{k,t} = w'_k\}]}.$$
The impulse response function is therefore equal to the average treatment effect if and only if the selection bias term $\Delta_{k,t,h}(w_k, w'_k) = 0$. A sufficient condition for this to hold is that the two covariance terms are zero.

Notice that these covariance terms depend on how the assignment $W_{k,t}$ covaries with the potential outcome $Y_{t+h}(w_k)$. Since $Y_{t+h}(w_k) := Y_{t+h}(W_{1:t-1}, W_{1:k-1,t}, w_k, W_{k+1:d_W,t}, W_{t+1:t+h})$ by definition, the selection bias therefore depends on how the assignment $W_{k,t}$ relates to

1. past assignments $W_{1:t-1}$,

2. other contemporaneous assignments $W_{1:k-1,t}, W_{k+1:d_W,t}$,

3. future assignments $W_{t+1:t+h}$, and

4. the potential outcome process $Y_{t+h}(w_{1:t+h})$.

By placing further restrictions on the assignment process, we immediately arrive at sufficient conditions for $\Delta_{k,t,h}(w_k, w'_k)$ to be zero.

Theorem 2.3.2. Under the same conditions as Theorem 2.3.1, if

$$\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\}) = 0, \qquad \mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\}) = 0, \tag{2.3}$$

then $\Delta_{k,t,h}(w_k, w'_k) = 0$. Moreover, (2.3) is satisfied if

$$W_{k,t} \perp\!\!\!\perp Y_{t+h}(w_k) \quad \text{and} \quad W_{k,t} \perp\!\!\!\perp Y_{t+h}(w'_k), \tag{2.4}$$

which is in turn implied by

$$W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}, \tag{2.5}$$

which is in turn implied by

$$W_{k,t} \perp\!\!\!\perp \big(W_{1:t-1},\, W_{1:k-1,t},\, W_{k+1:d_W,t},\, W_{t+1:t+h},\, \{Y_{t+h}(w_{1:t+h}) : w_{1:t+h} \in \mathcal{W}^{t+h}\}\big). \tag{2.6}$$

Equation (2.6) says the selection bias is zero if the assignment $W_{k,t}$ is randomized in the sense that it is independent of all other assignments and the time-$(t+h)$ potential outcomes.
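A small simulation makes the role of the selection bias term concrete. The data-generating process below is our own illustrative assumption: with a randomized binary assignment the impulse response function recovers the average treatment effect, while an assignment that covaries with the potential outcomes does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau = 200_000, 1.0

u = rng.normal(size=n)            # unobserved component of the potential outcomes
y1, y0 = tau + u, u               # Y_{t+h}(1) and Y_{t+h}(0); true ATE is tau

# Randomized assignment: independent of the potential outcomes, as in (2.6)
w_rand = rng.integers(0, 2, size=n)
y = np.where(w_rand == 1, y1, y0)
irf_rand = y[w_rand == 1].mean() - y[w_rand == 0].mean()   # ≈ tau

# Selection: the assignment covaries with the potential outcomes
w_sel = (u + rng.normal(size=n) > 0).astype(int)
y = np.where(w_sel == 1, y1, y0)
irf_sel = y[w_sel == 1].mean() - y[w_sel == 0].mean()      # tau plus selection bias
```

Here `irf_sel` exceeds `tau` because units with high $u$ are over-represented among the treated, which is exactly the nonzero covariance term in the decomposition of Theorem 2.3.1.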

Recent reviews on dynamic causal effects in macroeconometrics by Ramey (2016) and Stock and Watson (2018) argue intuitively that the impulse response function of observed outcomes to “shocks” in parametric structural models, such as the SVMA, is analogous to an average treatment effect in a randomized experiment from cross-sectional causal inference.4 However, these statements rely either on intuitive descriptions of the statistical properties of shocks5 or on a specific parametric model for the potential outcome process to link the impulse response function to an average dynamic causal effect. Theorem 2.3.2 clarifies that if the assignment $W_{k,t}$ is randomly assigned in the sense of (2.6), then the impulse response function nonparametrically identifies an average treatment effect in the direct potential outcome system. In this sense, Theorem 2.3.2 provides an interpretation of “shock” in terms of a random assignment assumption on the assignment process in a direct potential outcome system.

Furthermore, Theorems 2.3.1-2.3.2 clarify a recent empirical literature that seeks to directly construct measures of the shocks of interest and measure dynamic causal effects through reduced-form estimates of impulse response functions — so called “direct causal inference” (e.g., see Nakamura and Steinsson, 2018b; Baek and Lee, 2021). In order for researchers to causally interpret reduced-form impulse response functions of outcomes on particular constructed shocks as nonparametrically identifying an average treatment effect, the constructed shocks must be randomized in the sense given in Theorem 2.3.2.

4 Stock and Watson (2018) write on pg. 922: “The macroeconometric jargon for this random treatment is a ’structural shock:’ a primitive, unanticipated economic force, or driving impulse, that is unforecastable and uncorrelated with other shocks. The macroeconomist’s shock is the microeconomists’ random treatment, and impulse response functions are the causal effects of those treatments on variables of interest over time, that is, dynamic causal effects.”

5 Ramey (2016) writes on pg. 75, “the shocks should have the following characteristics: (1) they should be exogenous with respect to the other current and lagged endogenous variables in the model; (2) they should be uncorrelated with other exogenous shocks; otherwise, we cannot identify the unique causal effects of one exogenous shock relative to another; and (3) they should represent either unanticipated movements in exogenous variables or news about future movements in exogenous variables.”

2.3.2 Local Projection Estimand

Under the conditions of Theorem 2.3.1, impulse response functions are causal, but nonparametrically estimating impulse response functions is in general challenging. If the assignment is observed by the researcher, it is therefore common to estimate impulse response functions using “local projections” (Jordá, 2005), which directly regress the $h$-step ahead outcome on a constant and the assignment. The corresponding local projection estimand is

$$LP_{k,t,h} := \frac{\mathrm{Cov}(Y_{t+h}, W_{k,t})}{\mathrm{Var}(W_{k,t})}. \tag{2.7}$$

Theorem 2.3.3 establishes that $LP_{k,t,h}$ identifies a weighted average of marginal causal effects of the assignment on the $h$-step ahead outcome.

Theorem 2.3.3. Under the same conditions as Theorem 2.3.1, further assume that:

i. The support of $W_{k,t}$ is a closed interval, $\mathcal{W}_k := [\underline{w}_k, \overline{w}_k] \subset \mathbb{R}$.

ii. Differentiability: $Y_{t+h}(w_k)$ is continuously differentiable in $w_k$, as is $E[Y'_{t+h}(w_k)]$.

iii. Independence: $W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}$.

Then, if it exists,

$$LP_{k,t,h} = \frac{\int_{\mathcal{W}_k} E[Y'_{t+h}(w_k)]\, E[G_t(w_k)]\, dw_k}{\int_{\mathcal{W}_k} E[G_t(w_k)]\, dw_k},$$

where $G_t(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t}])$, noting $E[G_t(w_k)] \ge 0$.

The local projection estimand $LP_{k,t,h}$ is therefore a weighted average of the marginal average treatment effects of $W_{k,t}$ on $Y_{t+h}$, where the weights $E[G_t(w_k)]$ are non-negative and sum to one. Thus, if the assignment $W_{k,t}$ is a shock in the sense stated in Theorem 2.3.2, the local projection estimand also has a nonparametric causal interpretation.

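The weighting formula in Theorem 2.3.3 can be checked numerically. In this sketch (ours, with an assumed quadratic potential outcome function), the OLS local projection slope and the weighted average of marginal effects computed with the theorem's weights $E[G_t(w_k)]$ agree.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

g = lambda w: w + 0.5 * w ** 2    # assumed nonlinear potential outcome function
g_prime = lambda w: 1.0 + w       # its marginal causal effect

w = rng.normal(size=n)            # randomized assignment, as in condition (iii)
y = g(w) + 0.2 * rng.normal(size=n)

# Local projection estimand (2.7): OLS slope of Y_{t+h} on W_{k,t}
lp = np.cov(y, w)[0, 1] / w.var()

# Theorem 2.3.3: the same number as a weighted average of marginal effects,
# with weights estimated from G_t(v) = 1{v <= W}(W - E[W]) on a grid
grid = np.linspace(-4.0, 4.0, 401)
wbar = w.mean()
weights = np.array([((w >= v) * (w - wbar)).mean() for v in grid])
weighted_avg = (g_prime(grid) * weights).sum() / weights.sum()
```

With a standard normal assignment the weights are symmetric around zero, so the asymmetric part of $g'$ averages out and both quantities are close to one in this toy design.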
2.3.3 Generalized Impulse Response Function

In non-linear time series models, it is common to focus on the conditional version of the impulse response function, the $h$-period ahead generalized impulse response function (Gallant et al., 1993; Koop et al., 1996; Gourieroux and Jasiak, 2005), which is

$$GIRF_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) := E[Y_{t+h} \mid W_{k,t} = w_k, \mathcal{F}_{t-1}] - E[Y_{t+h} \mid W_{k,t} = w'_k, \mathcal{F}_{t-1}]. \tag{2.8}$$

Mirroring our analysis of the impulse response function, we next show that $GIRF_{k,t,h}$ can be decomposed into the filtered treatment effect and a selection bias term.

Theorem 2.3.4. Assume a direct potential outcome system, consider some $k = 1, \ldots, d_w$, $t \ge 1$, and $h \ge 0$, and assume $E[|Y_{t+h}(w_k) - Y_{t+h}(w'_k)| \mid \mathcal{F}_{t-1}] < \infty$. Then, for any deterministic $w_k, w'_k \in \mathcal{W}$,

$$GIRF_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) = E[Y_{t+h}(w_k) - Y_{t+h}(w'_k) \mid \mathcal{F}_{t-1}] + \Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}),$$

where

$$\Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) := \frac{\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1})}{E[1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1}]} - \frac{\mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\} \mid \mathcal{F}_{t-1})}{E[1\{W_{k,t} = w'_k\} \mid \mathcal{F}_{t-1}]}.$$

Sufficient conditions for the selection bias term $\Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1})$ to equal zero are that the two conditional covariances are zero. Paralleling the unconditional case, Theorem 2.3.5 provides sufficient conditions under which the selection bias term equals zero.

Theorem 2.3.5. Under the same conditions as Theorem 2.3.4, if

$$\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1}) = 0, \qquad \mathrm{Cov}(Y_{t+h}(w'_k), 1\{W_{k,t} = w'_k\} \mid \mathcal{F}_{t-1}) = 0, \tag{2.9}$$

then $\Delta_{k,t,h}(w_k, w'_k) = 0$. Moreover, (2.9) is implied by

$$W_{k,t} \perp\!\!\!\perp Y_{t+h}(w_k) \mid \mathcal{F}_{t-1} \quad \text{and} \quad W_{k,t} \perp\!\!\!\perp Y_{t+h}(w'_k) \mid \mathcal{F}_{t-1}, \tag{2.10}$$

which is in turn implied by

$$[\, W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\} \,] \mid \mathcal{F}_{t-1}, \tag{2.11}$$

which is in turn implied by

$$[\, W_{k,t} \perp\!\!\!\perp \big(W_{1:k-1,t},\, W_{k+1:d_W,t},\, W_{t+1:t+h},\, \{Y_{t+h}(w_{1:t-1}^{obs}, w_{t:t+h}) : w_{t:t+h} \in \mathcal{W}^{h+1}\}\big) \,] \mid \mathcal{F}_{t-1}. \tag{2.12}$$

Therefore, under (2.9), the selection bias $\Delta_{k,t,h}(w_k, w'_k \mid \mathcal{F}_{t-1}) = 0$ and the generalized impulse response function identifies the filtered impulse causal effect. Notice how much weaker (2.12) is than (2.6), as it allows the assignment to depend flexibly and probabilistically on the past realised potential outcomes and realised assignments.

At first glance, (2.11) appears analogous to a typical unconfoundedness assumption from cross-sectional causal inference or a sequential randomization assumption from longitudinal causal inference. That is, it imposes that, conditional on the history up to time $t-1$, the assignment $W_{k,t}$ must be as good as randomly assigned. However, recall that the notation $Y_{t+h}(w_k)$ buries dependence on (i) other contemporaneous assignments $W_{1:k-1,t}, W_{k+1:d_W,t}$; (ii) future assignments $W_{t+1:t+h}$; and (iii) the potential outcomes at time $t+h$. Therefore, (2.12) in Theorem 2.3.5 provides further sufficient conditions under which (2.11) is satisfied, highlighting that it is sufficient to further impose that the assignment $W_{k,t}$ is jointly independent of all other contemporaneous and future assignments as well as the underlying potential outcomes.
Remark 2.3.1. How do the conditions in Theorem 2.3.2 relate to the conditions in Theorem 2.3.5? Applying the law of total covariance yields

$$\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\}) = E[\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1})] + \mathrm{Cov}(E[Y_{t+h}(w_k) \mid \mathcal{F}_{t-1}], E[1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1}]),$$

so $\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\}) = 0$ neither implies nor is implied by $\mathrm{Cov}(Y_{t+h}(w_k), 1\{W_{k,t} = w_k\} \mid \mathcal{F}_{t-1}) = 0$. Hence, the conditional and unconditional cases are non-nested. If we instead work probabilistically, then the condition

$$W_{k,t} \perp\!\!\!\perp \big(W_{1:t-1},\, W_{1:k-1,t},\, W_{k+1:d_W,t},\, W_{t+1:t+h},\, \{Y_{1:t+h}(w_{1:t+h}) : w_{1:t+h} \in \mathcal{W}^{t+h}\}\big),$$

which strengthens (2.6) to additionally require independence of the full potential outcome process, implies the condition (2.12). This second point is important practically. The generalized impulse response function tells us the filtered treatment effect provided that $[\, W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\} \,] \mid \mathcal{F}_{t-1}$. A temporally averaged generalized impulse response function therefore tells us the average treatment effect without the need to employ the harsher condition $W_{k,t} \perp\!\!\!\perp \{Y_{t+h}(w_k) : w_k \in \mathcal{W}_k\}$, as it sidesteps the use of the impulse response function.

2.3.4 Generalized Local Projection and Local Filtered Projection Estimands

Again, estimating generalized impulse response functions nonparametrically is challenging. Under the same conditions as Theorem 2.3.3, but replacing condition (iii) with Equation (2.11), the generalized local projection satisfies

$$\frac{\mathrm{Cov}(Y_{t+h}, W_{k,t} \mid \mathcal{F}_{t-1})}{\mathrm{Var}(W_{k,t} \mid \mathcal{F}_{t-1})} = \frac{\int_{\mathcal{W}_k} E[Y'_{t+h}(w_k) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}]\, dw_k}{\int_{\mathcal{W}_k} E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}]\, dw_k},$$

where $G_{t|t-1}(w_k) = 1\{w_k \le W_{k,t}\}(W_{k,t} - E[W_{k,t} \mid \mathcal{F}_{t-1}])$, noting $E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}] \ge 0$. The generalized local projection is equivalent to a weighted average of conditional average marginal effects of $W_{k,t}$ on $Y_{t+h}$, where the weights now depend on the natural filtration but are still non-negative and sum to one.

Of more practical importance is the local projection of $Y_{t+h} - \hat Y_{t+h|t-1}$ on $W_{k,t} - \hat W_{k,t|t-1}$, where $\hat Y_{t+h|t-1} := E[Y_{t+h} \mid \mathcal{F}_{t-1}]$ and $\hat W_{k,t|t-1} := E[W_{k,t} \mid \mathcal{F}_{t-1}]$. We call the associated estimand the local filtered projection, which is defined as

$$\frac{E[\{Y_{t+h} - \hat Y_{t+h|t-1}\}\{W_{k,t} - \hat W_{k,t|t-1}\}]}{E[\{W_{k,t} - \hat W_{k,t|t-1}\}^2]}.$$

Under the same conditions as needed for the generalized local projection, plus the existence of the relevant unconditional expectations, the local filtered projection estimand equals

$$\frac{\int_{\mathcal{W}_k} E\big[E[Y'_{t+h}(w_k) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(w_k) \mid \mathcal{F}_{t-1}]\big]\, dw_k}{\int_{\mathcal{W}_k} E[G_{t|t-1}(w_k)]\, dw_k}.$$

This is a long-run weighted average of the marginal filtered causal effect. The weights are non-negative and average to one over time.
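To illustrate the local filtered projection, here is a sketch of ours in which the linear dependence of the assignment and outcome on an observable history is an assumed toy structure: residualizing both the outcome and the assignment on $\mathcal{F}_{t-1}$ removes the confounding from the predictable part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(4)
n, tau = 200_000, 1.0

x = rng.normal(size=n)                # observable history, part of F_{t-1}
e = rng.normal(size=n)                # unpredictable part of the assignment
w = 0.8 * x + e                       # assignment responds to the past
y = tau * w + x + 0.3 * rng.normal(size=n)   # outcome also depends on the past

# Naive local projection: confounded by the predictable part of the assignment
lp_naive = np.cov(y, w)[0, 1] / w.var()

# Local filtered projection: subtract E[. | F_{t-1}] (here linear in x) first
w_res = w - (np.cov(w, x)[0, 1] / x.var()) * x
y_res = y - (np.cov(y, x)[0, 1] / x.var()) * x
lp_filtered = np.cov(y_res, w_res)[0, 1] / w_res.var()   # ≈ tau
```

In practice $\hat Y_{t+h|t-1}$ and $\hat W_{k,t|t-1}$ would be forecasts from the filtration rather than a single regressor, but the mechanics are the same: only the unpredictable movement in the assignment is used for identification.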

2.4 The Instrumented Potential Outcome System

We now use a special case of the direct potential outcome system to incorporate instrumental variables for the assignment process. This is useful as a rapidly growing literature in macroeconomics exploits instruments to identify dynamic causal effects (e.g., see Jordá et al., 2015; Gertler and Karadi, 2015; Ramey and Zubairy, 2018; Stock and Watson, 2018; Plagborg-Møller and Wolf, 2020; Jordá et al., 2020, among many others). Section 2.5 details the case where the researcher observes the assignments, the instruments, and the outcomes. Section 2.6 considers the case where only the instruments and the outcomes are observed.

2.4.1 The Instrumented System


␣ ␣ ((
We start by setting up an “augmented assignment” $V_t$, so that $\{V_t, \{Y_t(v_{1:t}) : v_{1:t} \in \mathcal{W}_V^t\}\}_{t \ge 1}$ is a direct potential outcome system.

The instrumented potential outcome system then imposes two further assumptions on the potential outcome system: (i) $\{V_t\}_{t \ge 1}$ splits into an “instrument” $\{Z_t\}_{t \ge 1}$ and a “potential assignment” $\{W_t(z_t) : z_t \in \mathcal{W}_Z\}_{t \ge 1}$ that is only causally affected by the contemporaneous instrument, meaning $V_t = (Z_t, W_t(Z_t))$; and (ii) the potential outcome process is only affected by the assignments $W_{1:t}$.

Definition 2.4.1 (Instrumented potential outcome system). Assume $W_t \in \mathcal{W}_W$, $Z_t \in \mathcal{W}_Z$, and write $V_t = (W_t, Z_t)$. Assume $\{V_t, \{Y_t(v_{1:t}) : v_{1:t} \in \mathcal{W}_W^t \times \mathcal{W}_Z^t\}\}_{t \ge 1}$ is a direct potential outcome system. Additionally, enforce three assumptions:

i. Contemporaneous Instrument: The “potential assignments” satisfy

$$W_{k,t}(\{z_s\}_{s \ge 1}) = W_{k,t}(z'_{1:t-1}, z_t, \{z'_s\}_{s \ge t+1}),$$
$$W_{1:k-1,t}(\{z_s\}_{s \ge 1}) = W_{1:k-1,t}(\{z'_s\}_{s \ge 1}),$$
$$W_{k+1:d_W,t}(\{z_s\}_{s \ge 1}) = W_{k+1:d_W,t}(\{z'_s\}_{s \ge 1}),$$

almost surely, for all $t \ge 1$ and all deterministic $\{z_t\}_{t \ge 1}$ and $\{z'_t\}_{t \ge 1}$. Write the potential assignments as $\{W_t(z_t) = (W_{1:k-1,t}, W_{k,t}(z_t), W_{k+1:d_W,t}) : z_t \in \mathcal{W}_Z\}$, while the assignment is $W_t = W_t(Z_t) = (W_{1:k-1,t}, W_{k,t}(Z_t), W_{k+1:d_W,t})$.

ii. Potential Outcome Exclusion:

$$Y_t((w_1, z_1), \ldots, (w_t, z_t)) = Y_t((w_1, z'_1), \ldots, (w_t, z'_t))$$

almost surely for all $w_{1:t} \in \mathcal{W}_W^t$ and $z_{1:t}, z'_{1:t} \in \mathcal{W}_Z^t$. Write the potential outcomes as $\{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}_W^t\}$ and the outcome as $Y_t = Y_t(W_{1:t})$.

iii. Output: The output is

$$\{Z_t, W_t, Y_t\}_{t \ge 1} = \{Z_t, W_t(Z_t), Y_t(W_{1:t})\}_{t \ge 1},$$

while $Z_t$ and $\{Z_t\}_{t \ge 1}$ are called the “contemporaneous instrument” and instrument process, respectively.

Any $\{Z_t, \{W_t(z_t) : z_t \in \mathcal{W}_Z\}, \{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}_W^t\}\}_{t \ge 1}$ satisfying (i)-(iii) is an instrumented potential outcome system.

The simplest case is when both the assignment and instrument are scalar and binary, $\mathcal{W}_W = \{0, 1\}$, $\mathcal{W}_Z = \{0, 1\}$. In this case, the instrument $Z_t = 1$ corresponds to “intention to treat” and $Z_t = 0$ to “intention to control.” There is treatment and control as intended when $W_t(1) = 1$ and $W_t(0) = 0$, but there can be noncompliance when $W_t(1) = 0$ or $W_t(0) = 1$.

Assumption (i) imposes that $Z_t$ is an instrument only for the time-$t$, $k$-th assignment. This formalizes common empirical intuition in macroeconometrics, where a constructed external instrument is often “targeted” towards a single economic shock of interest – for example, empirical researchers construct proxies for a monetary policy shock (e.g., Gertler and Karadi, 2015; Nakamura and Steinsson, 2018a; Jordá et al., 2020) or a fiscal policy shock (Ramey and Zubairy, 2018). Assumption (ii) is the familiar outcome exclusion restriction on the instrument from cross-sectional causal inference.

To use this structure, we also need a type of “relevance” condition on the instrument. Such

conditions will be stated as needed below.

2.5 Estimands Based on Assignments, Instruments and Outcomes

We now study the conditions under which leading statistical estimands based on assignments, instruments, and outcomes have causal meaning in the context of an instrumented potential outcome system $\{Z_t, \{W_t(z_t) : z_t \in \mathcal{W}_Z\}, \{Y_t(w_{1:t}) : w_{1:t} \in \mathcal{W}_W^t\}\}_{t \ge 1}$. We consider the case in which the researcher observes the instruments, the assignments, and the outcomes $\{z_t^{obs}, w_t^{obs}, y_t^{obs}\}_{t \ge 1}$.

Since the assignments themselves are assumed to be directly observable, we focus on dynamic IV estimands that take the ratio of the impulse response function of the outcome on the instrument to the impulse response function of the assignment on the instrument. We show that such dynamic IV estimands identify local average impulse causal effects in the sense of Imbens and Angrist (1994), Angrist et al. (1996), and Angrist et al. (2000). Our results in this section are most closely related to Jordá et al. (2020), who used a potential outcome model analogous to that introduced in Angrist et al. (2018) to understand the causal content of local projection-IV with a binary assignment and binary instrument.

In particular, we ask whether the following statistical estimands have causal meaning: the Wald estimand, the IV estimand, the generalized Wald estimand, and the filtered IV estimand. Table 2.2 defines these estimands and summarizes our main results on their causal interpretation under important restrictions on the assignment process and other technical conditions. The rest of this section spells out the details.

Table 2.2: Top line results for the causal interpretation of common estimands based on assignments, instruments and outcomes.

Wald:
  Estimand: $\dfrac{E[Y_{t+h} \mid Z_t = z] - E[Y_{t+h} \mid Z_t = z']}{E[W_{k,t} \mid Z_t = z] - E[W_{k,t} \mid Z_t = z']}$
  Causal interpretation: $\dfrac{\int_{\mathcal{W}} E[Y'_{t+h}(w_k) \mid H_t(w_k) = 1]\, E[H_t(w_k)]\, dw_k}{\int_{\mathcal{W}} E[H_t(w_k)]\, dw_k}$

IV:
  Estimand: $\dfrac{\mathrm{Cov}(Y_{t+h}, Z_t)}{\mathrm{Cov}(W_t, Z_t)}$
  Causal interpretation: $\dfrac{\int_{\mathcal{W}_Z} E[Y'_{t+h}(z_t)]\, E[G_t(z_t)]\, dz_t}{\int_{\mathcal{W}_Z} E[W'_t(z_t)]\, E[G_t(z_t)]\, dz_t}$

Generalized Wald:
  Estimand: $\dfrac{E[Y_{t+h} \mid Z_t = z, \mathcal{F}_{t-1}] - E[Y_{t+h} \mid Z_t = z', \mathcal{F}_{t-1}]}{E[W_{k,t} \mid Z_t = z, \mathcal{F}_{t-1}] - E[W_{k,t} \mid Z_t = z', \mathcal{F}_{t-1}]}$
  Causal interpretation: $\dfrac{\int_{\mathcal{W}} E[Y'_{t+h}(w_k) \mid H_t(w_k) = 1, \mathcal{F}_{t-1}]\, E[H_t(w_k) \mid \mathcal{F}_{t-1}]\, dw_k}{\int_{\mathcal{W}} E[H_t(w_k) \mid \mathcal{F}_{t-1}]\, dw_k}$

Filtered IV:
  Estimand: $\dfrac{E[(Y_{t+h} - \hat Y_{t+h|t-1})(Z_t - \hat Z_{t|t-1})]}{E[(W_{k,t} - \hat W_{k,t|t-1})(Z_t - \hat Z_{t|t-1})]}$
  Causal interpretation: $\dfrac{\int_{\mathcal{W}_Z} E\big[E[Y'_{t+h}(z_t) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}]\big]\, dz_t}{\int_{\mathcal{W}_Z} E\big[E[W'_t(z_t) \mid \mathcal{F}_{t-1}]\, E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}]\big]\, dz_t}$

Notes: This table summarizes the main results for the causal interpretation of common estimands based on assignments, instruments and outcomes. Here $h \ge 0$, $z, z' \in \mathcal{W}_Z$, $Y_{t+h}(z_t) := Y_{t+h}(W_{1:t-1}, W_{t,1:k-1}, W_k(z_t), W_{t,k+1:d_W}, W_{t+1:t+h})$, $Y'_{t+h}(z_t) := \partial Y_{t+h}(z_t)/\partial z_t$, $H_t(w_k) = 1\{W_{k,t}(z') \le w_k \le W_{k,t}(z)\}$, $G_t(z_t) = 1\{z_t \le Z_t\}(Z_t - E[Z_t])$ and $G_{t|t-1}(z_t) = 1\{z_t \le Z_t\}(Z_t - E[Z_t \mid \mathcal{F}_{t-1}])$, while $\hat Y_{t+h|t-1} = E[Y_{t+h} \mid \mathcal{F}_{t-1}]$, $\hat Z_{t|t-1} = E[Z_t \mid \mathcal{F}_{t-1}]$ and $\hat W_{k,t|t-1} = E[W_{k,t} \mid \mathcal{F}_{t-1}]$. Note that $E[G_t(z_t)] \ge 0$ and $E[G_{t|t-1}(z_t) \mid \mathcal{F}_{t-1}] \ge 0$.

2.5.1 Wald Estimand

Consider the classic Wald estimand

$$\frac{E[Y_{t+h} \mid Z_t = z] - E[Y_{t+h} \mid Z_t = z']}{E[W_{k,t} \mid Z_t = z] - E[W_{k,t} \mid Z_t = z']}.$$

The numerator is the impulse response of the outcome $Y_{t+h}$ on the instrument $Z_t$, which can be thought of as the “reduced-form.” The denominator is the impulse response function of the assignment $W_{k,t}$ on the instrument $Z_t$, which can be thought of as the “first-stage.” Our next result establishes that the Wald estimand identifies a weighted average of marginal causal effects for “compliers” provided that (i) the potential outcome process is continuously differentiable in the assignment; (ii) the instrument is independent of the potential assignment and outcome processes; (iii) a relevance condition holds; and (iv) a monotonicity condition, as introduced in Imbens and Angrist (1994), holds.

Theorem 2.5.1. Assume an instrumented potential outcome system, fix z, z1 P W Z and that

i. Differentiability: Y_{t+h}(w_k) is continuously differentiable on the closed interval w_k ∈ W_k := [\underline{w}_k, \overline{w}_k] ⊂ ℝ.

ii. Independence: The instrument satisfies Z_t ⊥⊥ {W_{k,t}(z) : z ∈ W^Z} and Z_t ⊥⊥ {Y_{t+h}(w_k) : w_k ∈ W_k}.

iii. Relevance: ∫_W E[1{W_{k,t}(z') ≤ w_k ≤ W_{k,t}(z)}] dw_k > 0.

iv. Monotonicity: W_{k,t}(z') ≤ W_{k,t}(z) with probability one.

Then, the Wald estimand equals, so long as it exists,

  ∫_W E[Y'_{t+h}(w_k) | H_t(w_k) = 1] E[H_t(w_k)] dw_k / ∫_W E[H_t(w_k)] dw_k,

where H_t(w_k) = 1{W_{k,t}(z') ≤ w_k ≤ W_{k,t}(z)}.

Provided the instrument is randomly assigned, relevant, and satisfies a monotonicity condition,

then the Wald estimand equals a weighted average of the marginal causal effects for “compliers”

(i.e., realizations of the potential assignment function for which moving the instrument from z'

to z changes the assignment). The marginal causal effect is the derivative of the h-step ahead

potential outcome process with respect to the k-th assignment, holding all else constant. The

weights are proportional to the probability of the potential assignment function being a “complier,”

so are non-negative and sum to one.

Since Y_{t+h}(w_k) := Y_{t+h}(W_{1:t-1}, W_{1:k-1,t}, w_k, W_{k+1:d_W,t}, W_{t+1:t+h}), Assumption (ii) implicitly restricts the relationship between the instrument Z_t and:

1. other assignments W_{1:k-1,1:t+h}, W_{k+1:d_W,1:t+h},

2. future and past potential assignments {W_{k,1:t-1}(z_{1:t-1}), W_{k,t+1:t+h}(z_{t+1:t+h}) : z_{1:t-1} ∈ Z^t, z_{t+1:t+h} ∈ Z^h},

3. future and past instruments Z_{1:t-1} and Z_{t+1:t+h}, and

4. the potential outcome process {Y_{j,t+h}(w_{1:t+h}) : w_{1:t+h} ∈ W^{t+h}}.

We could extend Theorem 2.3.2 to the instrumented potential outcome system, and show that

Assumption (ii) is implied by restricting the instrument Zt to be independent of each of these

quantities.

Remark 2.5.1 (Binary Assignment, Binary Instrument Case). Consider the simplest case with W_{k,t} ∈ {0, 1}, Z_t ∈ {0, 1} and z = 1, z' = 0. Although the math is different due to the discreteness of the assignment and instrument, under the same conditions as Theorem 2.5.1, we can show that the Wald estimand in this case equals

  E[{Y_{t+h}(1) − Y_{t+h}(0)} | W_{k,t}(1) − W_{k,t}(0) = 1],

which is the time-series generalization of the binary assignment, binary instrument local average treatment effect originally derived in Imbens and Angrist (1994).
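To make the binary assignment, binary instrument case concrete, the following small simulation (a hypothetical data generating process of our own, not part of the chapter's formal results) draws always-takers, never-takers, and compliers, randomizes a binary instrument, and checks that the sample Wald estimand lands near the complier average effect:

```python
# Illustrative simulation (hypothetical DGP, not from the text): with a binary
# instrument Z and binary assignment W satisfying monotonicity, the Wald
# estimand should recover the average effect among compliers (W(1) - W(0) = 1).
import random

random.seed(1)
n = 200_000
sum_y = {0: 0.0, 1: 0.0}   # running sums of Y within each instrument arm
sum_w = {0: 0.0, 1: 0.0}   # running sums of W within each instrument arm
count = {0: 0, 1: 0}

for _ in range(n):
    u = random.random()
    if u < 0.2:
        w0, w1 = 1, 1                    # always-taker
    elif u < 0.5:
        w0, w1 = 0, 0                    # never-taker
    else:
        w0, w1 = 0, 1                    # complier (monotonicity holds)
    effect = 2.0 if w1 > w0 else 1.0     # compliers have unit-level effect 2
    z = random.randint(0, 1)             # randomized binary instrument
    w = w1 if z == 1 else w0
    y = random.gauss(0.0, 1.0) + effect * w
    sum_y[z] += y
    sum_w[z] += w
    count[z] += 1

reduced_form = sum_y[1] / count[1] - sum_y[0] / count[0]
first_stage = sum_w[1] / count[1] - sum_w[0] / count[0]
wald = reduced_form / first_stage        # close to the complier effect of 2.0
```

The first stage estimates the complier share (0.5 here), and the ratio isolates the complier effect even though always-takers and never-takers have a different unit-level effect.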

2.5.2 IV Estimand

Rather than directly estimating the Wald estimand, it is natural to estimate a two-stage least

squares regression of the outcome Yt`h on the assignment Wk,t using the instrument Zt . The

associated IV estimand is
  IV_{k,t,h} := Cov(Y_{t+h}, Z_t) / Cov(W_t, Z_t).
This has a causal interpretation by applying Theorem 2.3.3 for the local projection estimand twice: to the numerator, the local projection of Y_{t+h} on Z_t, and to the denominator, the local projection of W_t on Z_t. The statement of the results uses the notation

  Y_{t+h}(z_t) := Y_{t+h}(W_{1:t-1}, W_{t,1:k-1}, W_k(z_t), W_{t,k+1:d_W}, W_{t+1:t+h}),

and Y'_{t+h}(z_t) := ∂Y_{t+h}(z_t)/∂z_t.

Theorem 2.5.2. Assume an instrumented potential outcome system. Further assume that

i. Differentiability: Y_{t+h}(z) and W_t(z) are continuously differentiable on the closed interval z ∈ W^Z = [\underline{z}, \overline{z}] ⊂ ℝ.

ii. Independence: Z_t ⊥⊥ {W_t(z) : z ∈ W^Z} and Z_t ⊥⊥ {Y_{t+h}(z) : z ∈ W^Z}.

iii. Relevance: ∫_{W^Z} E[W'_t(z_t)] E[G_t(z_t)] dz_t ≠ 0.

Then, it follows, if it exists, that

  IV_{k,t,h} = ∫_{W^Z} E[Y'_{t+h}(z_t)] E[G_t(z_t)] dz_t / ∫_{W^Z} E[W'_t(z_t)] E[G_t(z_t)] dz_t,

where G_t(z_t) = 1{z_t ≤ Z_t}(Z_t − E[Z_t]), noting E[G_t(z_t)] ≥ 0.
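As a stylized numerical check (a static linear DGP of our own choosing, far simpler than the general system covered by Theorem 2.5.2), the covariance ratio recovers the causal slope of the assignment on the outcome even when the assignment is confounded:

```python
# Stylized check (hypothetical linear DGP): the IV estimand Cov(Y, Z)/Cov(W, Z)
# recovers the causal coefficient beta even though W is confounded with Y.
import random

random.seed(2)
n = 200_000
beta = 1.5                        # true causal effect of W on Y
zs, ws, ys = [], [], []
for _ in range(n):
    z = random.gauss(0.0, 1.0)    # randomized instrument
    u = random.gauss(0.0, 1.0)    # confounder shifting both W and Y
    w = 0.8 * z + u               # "first stage"
    y = beta * w + 2.0 * u        # OLS of Y on W would be biased upward
    zs.append(z)
    ws.append(w)
    ys.append(y)

def cov(a, b):
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (v - mb) for x, v in zip(a, b)) / len(a)

iv = cov(ys, zs) / cov(ws, zs)    # close to beta
ols = cov(ys, ws) / cov(ws, ws)   # biased: absorbs the confounder
```

Comparing `iv` with `ols` shows the value of the instrument: the latter mixes the causal slope with the confounding channel.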

2.5.3 Generalized Wald Estimand

The generalized Wald estimand is a ratio of a reduced-form generalized impulse response function

to a first-stage generalized impulse response function. It is given by, for fixed z, z' ∈ W^Z,

  (E[Y_{t+h} | Z_t = z, F_{t-1}] − E[Y_{t+h} | Z_t = z', F_{t-1}]) / (E[W_{k,t} | Z_t = z, F_{t-1}] − E[W_{k,t} | Z_t = z', F_{t-1}]).    (2.13)

Theorem 2.5.3. Assume an instrumented potential outcome system, fix z, z' ∈ W^Z and that

i. Differentiability: Y_{t+h}(w_k) is continuously differentiable on the closed interval w_k ∈ W_k := [\underline{w}_k, \overline{w}_k] ⊂ ℝ.

ii. Independence: The instrument satisfies [Z_t ⊥⊥ {W_{k,t}(z) : z ∈ W^Z}] | F_{t-1} and [Z_t ⊥⊥ {Y_{t+h}(w_k) : w_k ∈ W_k}] | F_{t-1}.

iii. Relevance: ∫_W E[1{W_{k,t}(z') ≤ w_k ≤ W_{k,t}(z)} | F_{t-1}] dw_k > 0.

iv. Monotonicity: W_{k,t}(z') ≤ W_{k,t}(z) with probability one.

Then, the generalized Wald estimand equals, so long as it exists,

  ∫_W E[Y'_{t+h}(w_k) | H_t(w_k) = 1, F_{t-1}] E[H_t(w_k) | F_{t-1}] dw_k / ∫_W E[H_t(w_k) | F_{t-1}] dw_k,

where, again, H_t(w_k) = 1{W_{k,t}(z') ≤ w_k ≤ W_{k,t}(z)}.

The generalized Wald estimand analogously equals a weighted average of the marginal filtered

causal effects for “compliers,” where the weights are proportional to the probability of the potential

assignment function being a “complier” conditional on the filtration.

We next provide a sufficient condition for the instrument to be randomly assigned in terms of

conditional independence restrictions on these underlying processes.

Theorem 2.5.4. Assume that the instrument satisfies

  Z_t ⊥⊥ (Z_{t+1:t+h}, W_{1:k-1,t:t+h}, {W_{k,t+1:t+h}(z_{t+1:t+h}) : z_{t+1:t+h} ∈ Z^h}, W_{k+1:d_W,t:t+h}, {Y_{t+h}(w_{1:t+h}) : w_{1:t+h} ∈ W^{t+h}}) | F_{t-1}.

Then, Assumption (ii) in Theorem 2.5.3 is satisfied.

2.5.4 Generalized IV and Filtered IV Estimands

Estimating the generalized Wald estimand is not easy, particularly if Z_t is not discrete. Here we derive a causal interpretation for the generalized IV estimand

  Cov(Y_{t+h}, Z_t | F_{t-1}) / Cov(W_{k,t}, Z_t | F_{t-1}) = E[(Y_{t+h} − Ŷ_{t+h|t-1})(Z_t − Ẑ_{t|t-1}) | F_{t-1}] / E[(W_{k,t} − Ŵ_{k,t|t-1})(Z_t − Ẑ_{t|t-1}) | F_{t-1}],

where Ŷ_{t+h|t-1} = E[Y_{t+h} | F_{t-1}], Ŵ_{k,t|t-1} = E[W_{k,t} | F_{t-1}] and Ẑ_{t|t-1} = E[Z_t | F_{t-1}].

No new technical issues arise in dealing with this setup, but Assumption (ii) in Theorem 2.5.2

now becomes

  [Z_t ⊥⊥ {Y_{t+h}(z) : z ∈ W^Z}] | F_{t-1},    [Z_t ⊥⊥ {W_t(z) : z ∈ W^Z}] | F_{t-1}.    (2.14)

Then, the generalized IV estimand equals

  ∫_{W^Z} E[Y'_{t+h}(z_t) | F_{t-1}] E[G_{t|t-1}(z_t) | F_{t-1}] dz_t / ∫_{W^Z} E[W'_t(z_t) | F_{t-1}] E[G_{t|t-1}(z_t) | F_{t-1}] dz_t,

where G_{t|t-1}(z_t) = 1{z_t ≤ Z_t}(Z_t − E[Z_t | F_{t-1}]), noting E[G_{t|t-1}(z_t) | F_{t-1}] ≥ 0.

Of more practical importance is the filtered IV estimand

  E[(Y_{t+h} − Ŷ_{t+h|t-1})(Z_t − Ẑ_{t|t-1})] / E[(W_{k,t} − Ŵ_{k,t|t-1})(Z_t − Ẑ_{t|t-1})],

which can be estimated by instrumental variables applied to Y_{t+h} − Ŷ_{t+h|t-1} on W_{k,t} − Ŵ_{k,t|t-1} with instrument Z_t − Ẑ_{t|t-1}. Under the conditions of Theorem 2.5.2, but using (2.14) instead of
Assumption (ii), the filtered IV estimand becomes

  ∫_{W^Z} E[E[Y'_{t+h}(z_t) | F_{t-1}] E[G_{t|t-1}(z_t) | F_{t-1}]] dz_t / ∫_{W^Z} E[E[W'_t(z_t) | F_{t-1}] E[G_{t|t-1}(z_t) | F_{t-1}]] dz_t.

2.6 Estimands Based on Instruments and Outcomes

In this section, we study the nonparametric conditions under which common statistical estimands

based on only instruments and outcomes have causal meaning. We focus on an instrumented

potential outcome system

  {Z_t, {W_t(z_t) : z_t ∈ W^Z}, {Y_t(w_{1:t}) : w_{1:t} ∈ W_W^t}}_{t≥1},

in which the researcher only observes the instruments and the outcomes {z_t^obs, y_t^obs}_{t≥1}. We will sometimes refer to {F_t^{Z,Y}}_{t≥1} as the natural filtration generated by the realized {z_t^obs, y_t^obs}_{t≥1}.

In this context, it is common for empirical researchers to analyze estimands involving two elements of the outcome vector, Y_{j,t+h} and Y_{k,t}, and the instrument Z_t (we therefore return to using an explicit subscript on the outcome variable). Consider, for example, an empirical researcher who constructs an instrument Z_t for the monetary policy shock (e.g., an instrument of the form used in Kuttner (2001); Cochrane and Piazzesi (2002); Gertler and Karadi (2015) or Romer and Romer (2004)). In this case, the empirical researcher may measure the dynamic causal effect of the monetary policy shock W_{k,t} on unemployment Y_{j,t+h} by estimating the first-stage impulse response function of the federal funds rate Y_{k,t} on the instrument Z_t. See, for example, Jordà et al. (2015); Ramey and Zubairy (2018); Jordà et al. (2020) for recent empirical applications of this empirical strategy.

In particular, we ask if the following estimands have causal meaning: the ratio Wald, local projection IV, generalized ratio Wald, and local filtered projection IV estimands. We show that such dynamic IV estimands identify a "relative" local average impulse causal effect, which is a nonparametric generalization of the interpretation of such dynamic IV estimands in the existing literature on external instruments (Stock and Watson, 2018; Plagborg-Møller and Wolf, 2020; Jordà et al., 2020). Table 2.3 defines these estimands and summarizes our main results on their causal interpretation under important restrictions on the assignment process and other technical conditions. The rest of this section spells out the details.

Table 2.3: Top line results for the causal interpretation of common estimands based on instruments and outcomes.

Ratio Wald:
  Estimand:  (E[Y_{j,t+h} | Z_t = z] − E[Y_{j,t+h} | Z_t = z']) / (E[Y_{k,t} | Z_t = z] − E[Y_{k,t} | Z_t = z'])
  Causal interpretation:  ∫_W E[Y'_{j,t+h}(w_k) | H_t(w_k) = 1] E[H_t(w_k)] dw_k / ∫_W E[Y'_{k,t}(w_k) | H_t(w_k) = 1] E[H_t(w_k)] dw_k

Local Projection IV:
  Estimand:  Cov(Y_{j,t+h}, Z_t) / Cov(Y_{k,t}, Z_t)
  Causal interpretation:  ∫_{W^Z} E[Y'_{j,t+h}(z_k)] E[G_t(z_k)] dz_k / ∫_{W^Z} E[Y'_{k,t}(z_k)] E[G_t(z_k)] dz_k

Generalized Ratio Wald:
  Estimand:  (E[Y_{j,t+h} | Z_t = z, F_{t-1}^{Z,Y}] − E[Y_{j,t+h} | Z_t = z', F_{t-1}^{Z,Y}]) / (E[Y_{k,t} | Z_t = z, F_{t-1}^{Z,Y}] − E[Y_{k,t} | Z_t = z', F_{t-1}^{Z,Y}])
  Causal interpretation:  ∫_W E[Y'_{j,t+h}(w_k) | H_t(w_k) = 1, F_{t-1}^{Z,Y}] E[H_t(w_k) | F_{t-1}^{Z,Y}] dw_k / ∫_W E[Y'_{k,t}(w_k) | H_t(w_k) = 1, F_{t-1}^{Z,Y}] E[H_t(w_k) | F_{t-1}^{Z,Y}] dw_k

Local Filtered Projection IV:
  Estimand:  Cov(Y_{j,t+h} − Ŷ_{j,t+h|t-1}, Z_t − Ẑ_{t|t-1}) / Cov(Y_{k,t} − Ŷ_{k,t|t-1}, Z_t − Ẑ_{t|t-1})
  Causal interpretation:  ∫_{W^Z} E[E[Y'_{j,t+h}(z_k) | F_{t-1}^{Z,Y}] E[G_{t|t-1}(z_k) | F_{t-1}^{Z,Y}]] dz_k / ∫_{W^Z} E[E[Y'_{k,t}(z_k) | F_{t-1}^{Z,Y}] E[G_{t|t-1}(z_k) | F_{t-1}^{Z,Y}]] dz_k

Notes: This table summarizes the main results for the causal interpretation of common estimands based on instruments and outcomes. Here H_t(w_k) = 1{W_{k,t}(z') ≤ w_k ≤ W_{k,t}(z)}, G_t(z_t) = 1{z_t ≤ Z_t}(Z_t − E[Z_t]) and G_{t|t-1}(z_t) = 1{z_t ≤ Z_t}(Z_t − E[Z_t | F_{t-1}^{Z,Y}]), while Ŷ_{k,t+h|t-1} = E[Y_{k,t+h} | F_{t-1}^{Z,Y}] and Ẑ_{t|t-1} = E[Z_t | F_{t-1}^{Z,Y}]. Note that E[G_t(z_t)] ≥ 0 and E[G_{t|t-1}(z_t) | F_{t-1}^{Z,Y}] ≥ 0.

2.6.1 Ratio Wald Estimand

2.6.1 Ratio Wald Estimand

The ratio Wald estimand is

  (E[Y_{j,t+h} | Z_t = z] − E[Y_{j,t+h} | Z_t = z']) / (E[Y_{k,t} | Z_t = z] − E[Y_{k,t} | Z_t = z']),

which is the ratio of the two Wald estimands

  (E[Y_{j,t+h} | Z_t = z] − E[Y_{j,t+h} | Z_t = z']) / (E[W_{k,t} | Z_t = z] − E[W_{k,t} | Z_t = z'])   and   (E[Y_{k,t} | Z_t = z] − E[Y_{k,t} | Z_t = z']) / (E[W_{k,t} | Z_t = z] − E[W_{k,t} | Z_t = z']).

Hence we just need to collect the conditions for the validity of their causal representations, and then apply Theorem 2.5.1 twice.

Corollary 2.6.1. Assume an instrumented potential outcome system, z, z' ∈ W^Z and that

i. Differentiability: Y_{k,t}(w_k), Y_{j,t+h}(w_k) are continuously differentiable on the closed interval W_k := [\underline{w}_k, \overline{w}_k] ⊂ ℝ.

ii. Independence: Z_t ⊥⊥ {W_{k,t}(z) : z ∈ W^Z} and Z_t ⊥⊥ {Y_{k,t}(w_k), Y_{j,t+h}(w_k) : w_k ∈ W_k}.

iii. Relevance: ∫_W E[Y'_{k,t}(w_k) | H_t(w_k) = 1] E[H_t(w_k)] dw_k ≠ 0.

iv. Monotonicity: W_{k,t}(z') ≤ W_{k,t}(z) with probability one.

Then, the ratio Wald estimand equals, if it exists,

  ∫_W E[Y'_{j,t+h}(w_k) | H_t(w_k) = 1] E[H_t(w_k)] dw_k / ∫_W E[Y'_{k,t}(w_k) | H_t(w_k) = 1] E[H_t(w_k)] dw_k,

where H_t(w_k) = 1{W_{k,t}(z') ≤ w_k ≤ W_{k,t}(z)}.

In words, the ratio Wald estimand above identifies a relative local average impulse causal

effect under the instrumented potential outcome system. The numerator is a weighted average

of the marginal causal effects of Wk,t on the h-step ahead outcome Yj,t`h , where the weights are

proportional to the probability of compliance. Similarly, the denominator is a weighted average of

the marginal causal effects of Wk,t on the contemporaneous outcome Yk,t . Therefore, the ratio in

Corollary 2.6.1 measures the causal response of the h-step ahead outcome Yj,t`h to a change in the

treatment Wk,t that increases the contemporaneous outcome Yk,t by one unit on impact (among

compliers).

This is a nonparametric generalization of the well-known result that, in linear SVMA models (without invertibility), IV-based estimands identify relative impulse response functions (Stock and Watson, 2018; Plagborg-Møller and Wolf, 2020). Corollary 2.6.1 makes no functional form assumptions nor standard time series assumptions such as invertibility or recoverability. In this sense, Corollary 2.6.1 highlights the attractiveness of using external instruments to measure dynamic causal effects in observational time series data. Provided there exists an external instrument for the treatment W_{k,t} that is randomly assigned, relevant, and satisfies a monotonicity condition, the researcher can identify causally interpretable estimands without further assumptions and without even directly observing the treatment itself.
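The following toy simulation (a stylized DGP of our own) makes this point concrete: the treatment W is drawn but never used by the estimator, yet the ratio of the two reduced forms recovers the relative effect.

```python
# Stylized illustration (hypothetical DGP): the ratio Wald estimand recovers
# the relative effect theta (the response of Y_j per unit impact movement in
# Y_k) without the treatment W ever being observed by the estimator.
import random

random.seed(4)
n = 200_000
theta = 0.7
sum_yj = {0: 0.0, 1: 0.0}
sum_yk = {0: 0.0, 1: 0.0}
count = {0: 0, 1: 0}

for _ in range(n):
    z = random.randint(0, 1)                     # binary instrument
    w = 0.5 * z + random.gauss(0.0, 1.0)         # latent (unobserved) treatment
    y_k = w + random.gauss(0.0, 0.5)             # contemporaneous outcome
    y_j = theta * w + random.gauss(0.0, 0.5)     # h-step-ahead outcome
    sum_yj[z] += y_j
    sum_yk[z] += y_k
    count[z] += 1

reduced_form_j = sum_yj[1] / count[1] - sum_yj[0] / count[0]
reduced_form_k = sum_yk[1] / count[1] - sum_yk[0] / count[0]
ratio_wald = reduced_form_j / reduced_form_k     # close to theta = 0.7
```

Only the instrument and the two outcomes enter the estimator, mirroring the observational setting of this section.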

2.6.2 Local Projection IV Estimand

The local projection IV estimand

  Cov(Y_{j,t+h}, Z_t) / Cov(Y_{k,t}, Z_t)

is the ratio of the IV estimands Cov(Y_{j,t+h}, Z_t)/Cov(W_{k,t}, Z_t) and Cov(Y_{k,t}, Z_t)/Cov(W_{k,t}, Z_t). Therefore, we once again just need to collect the conditions for the validity of their causal representations, and apply Theorem 2.5.2 twice.

Corollary 2.6.2. Consider an instrumented potential outcome system. Further assume that

i. Differentiability: Y_{k,t}(z), Y_{j,t+h}(z), W_t(z) are continuously differentiable on the closed interval z ∈ W^Z = [\underline{z}, \overline{z}] ⊂ ℝ.

ii. Independence: Z_t ⊥⊥ {Y_{k,t}(z), Y_{j,t+h}(z) : z ∈ W^Z} and Z_t ⊥⊥ {W_t(z) : z ∈ W^Z}.

iii. Relevance: ∫_{W^Z} E[Y'_{k,t}(z_t)] E[G_t(z_t)] dz_t ≠ 0.

Then, the local projection IV estimand equals

  ∫_{W^Z} E[Y'_{j,t+h}(z_k)] E[G_t(z_k)] dz_k / ∫_{W^Z} E[Y'_{k,t}(z_k)] E[G_t(z_k)] dz_k,

where G_t(z_k) = 1{z_k ≤ Z_t}(Z_t − E[Z_t]), noting E[G_t(z_k)] ≥ 0.
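A minimal numerical sketch (again a stylized DGP of our own choosing) of the local projection IV estimand as a ratio of two covariances follows:

```python
# Minimal sketch (stylized DGP of our own): the local projection IV estimand
# Cov(Y_{j,t+h}, Z_t) / Cov(Y_{k,t}, Z_t) computed over T periods, where Y_k
# responds to the unobserved treatment on impact and Y_j responds h steps later.
import random

random.seed(5)
T = 200_000
rel_effect = 0.4                # relative effect of interest
num = den = 0.0

for _ in range(T):
    z_t = random.gauss(0.0, 1.0)                   # instrument at time t
    w_t = 0.9 * z_t + random.gauss(0.0, 1.0)       # unobserved treatment
    y_k_t = w_t + random.gauss(0.0, 0.3)           # impact response
    y_j_t_plus_h = rel_effect * w_t + random.gauss(0.0, 0.3)  # h-step response
    num += y_j_t_plus_h * z_t   # all variables are mean zero, so these sums
    den += y_k_t * z_t          # estimate the two covariances up to 1/T

lp_iv = num / den               # close to rel_effect = 0.4
```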

2.6.3 Generalized Ratio Wald Estimand

Researchers may also be interested in analyzing the generalized ratio Wald estimand:

  (E[Y_{j,t+h} | Z_t = z, F_{t-1}^{Z,Y}] − E[Y_{j,t+h} | Z_t = z', F_{t-1}^{Z,Y}]) / (E[Y_{k,t} | Z_t = z, F_{t-1}^{Z,Y}] − E[Y_{k,t} | Z_t = z', F_{t-1}^{Z,Y}]),

which is the ratio of generalized impulse response functions at different lags and for different outcome variables. Since this is the ratio of two generalized Wald estimands, we immediately arrive at the following corollary by applying Theorem 2.5.3 twice.

Corollary 2.6.3. Assume an instrumented potential outcome system, z, z' ∈ W^Z and that

i. Differentiability: Y_{k,t}(w_k), Y_{j,t+h}(w_k) are continuously differentiable on the closed interval W_k := [\underline{w}_k, \overline{w}_k] ⊂ ℝ.

ii. Independence: [Z_t ⊥⊥ {W_{k,t}(z) : z ∈ W^Z}] | F_{t-1}^{Z,Y} and [Z_t ⊥⊥ {Y_{k,t}(w_k), Y_{j,t+h}(w_k) : w_k ∈ W_k}] | F_{t-1}^{Z,Y}.

iii. Relevance: ∫_W E[Y'_{k,t}(w_k) | H_t(w_k) = 1, F_{t-1}^{Z,Y}] E[H_t(w_k) | F_{t-1}^{Z,Y}] dw_k ≠ 0.

iv. Monotonicity: conditional on F_{t-1}^{Z,Y}, W_{k,t}(z') ≤ W_{k,t}(z) with probability one.

Then, the generalized ratio Wald estimand equals

  ∫_W E[Y'_{j,t+h}(w_k) | H_t(w_k) = 1, F_{t-1}^{Z,Y}] E[H_t(w_k) | F_{t-1}^{Z,Y}] dw_k / ∫_W E[Y'_{k,t}(w_k) | H_t(w_k) = 1, F_{t-1}^{Z,Y}] E[H_t(w_k) | F_{t-1}^{Z,Y}] dw_k,

where H_t(w_k) = 1{W_{k,t}(z') ≤ w_k ≤ W_{k,t}(z)}.

The interpretation of Corollary 2.6.3 is analogous to the interpretation of the ratio Wald estimand

in Corollary 2.6.1, except now everything is conditional on the natural filtration.

2.6.4 Generalized Local Projection IV and Local Filtered Projection IV Estimands

In practice, researchers typically estimate generalized impulse response functions using a two-stage least-squares type estimator. This is also sometimes called "local projections with an external instrument" (Jordà et al., 2015). We first analyze the generalized local projection IV estimand

  Cov(Y_{j,t+h}, Z_t | F_{t-1}^{Z,Y}) / Cov(Y_{k,t}, Z_t | F_{t-1}^{Z,Y}),    (2.15)

which again is a ratio, this time of the generalized IV estimands at different lag lengths. Using the same arguments as Corollary 2.6.2, it has the causal interpretation

  ∫_{W^Z} E[Y'_{j,t+h}(z_k) | F_{t-1}^{Z,Y}] E[G_{t|t-1}(z_k) | F_{t-1}^{Z,Y}] dz_k / ∫_{W^Z} E[Y'_{k,t}(z_k) | F_{t-1}^{Z,Y}] E[G_{t|t-1}(z_k) | F_{t-1}^{Z,Y}] dz_k,

where G_{t|t-1}(z_k) = 1{z_k ≤ Z_t}(Z_t − E[Z_t | F_{t-1}^{Z,Y}]).

Of more practical relevance is the local filtered projection IV estimand

  Cov(Y_{j,t+h} − Ŷ_{j,t+h|t-1}, Z_t − Ẑ_{t|t-1}) / Cov(Y_{k,t} − Ŷ_{k,t|t-1}, Z_t − Ẑ_{t|t-1}),

where recall that, for example, Ŷ_{k,t+h|t-1} = E[Y_{k,t+h} | F_{t-1}^{Z,Y}] and Ẑ_{t|t-1} = E[Z_t | F_{t-1}^{Z,Y}]. Its properties are inherited from those of the generalized local projection IV. In particular, it equals

  ∫_{W^Z} E[E[Y'_{j,t+h}(z_k) | F_{t-1}^{Z,Y}] E[G_{t|t-1}(z_k) | F_{t-1}^{Z,Y}]] dz_k / ∫_{W^Z} E[E[Y'_{k,t}(z_k) | F_{t-1}^{Z,Y}] E[G_{t|t-1}(z_k) | F_{t-1}^{Z,Y}]] dz_k.

2.7 Estimands Based Only on Outcomes

The dominant approach to causal inference in macroeconometrics is a model-based approach in

the tradition of Sims (1980). See, for example, Ramey (2016) and Kilian and Lutkepohl (2017) for

recent reviews. In that literature, researchers introduce parametric models to study the dynamic

causal effects of unobservable “structural shocks,” which themselves must be inferred from the

outcomes. Here we link this to our setup, mostly to place our work in context and illustrate that

the enormous macroeconometric literature on simultaneous equation modelling can be nested

in the direct potential outcome system framework. Assume there is a direct potential outcome system

  {W_t, {Y_t(w_{1:t}) : w_{1:t} ∈ W_W^t}}_{t≥1},

where researchers only see the outcomes {y_t^obs}_{t≥1}.

2.7.1 Linear simultaneous equation approach

The causal inference approach of using only time series data on outcomes is in the storied tradition

of linear simultaneous equations models developed at the Cowles Foundation (e.g., Christ, 1994;

Hausman, 1983). The most essential causal challenges arise without any dynamic causal effects,

so we start with a static example as an illustration. Suppose that

  A_0 Y_t(w_{1:t}) = α + w_t,   w_{1:t} ∈ W^t,  t = 1, 2, ...,

where A_0 is a non-stochastic, square matrix. Notice that in this model the potential outcome process is deterministic and linear combinations of the potential outcomes equal the possible assignments for every t. If A_0 is additionally invertible, then

  Y_t(w_{1:t}) = A_0^{-1}(α + w_t),

which implies that the contemporaneous average treatment effect is E[Y_t(W_{1:t-1}, w) − Y_t(W_{1:t-1}, w')] = A_0^{-1}(w − w'), and the marginal average treatment effect is E[∂Y_t(w_{1:t})/∂w_t^T] = A_0^{-1}, whatever probabilistic assumption is made about W_{1:t-1}.

Furthermore, under this model, if we see (W_t, Y_t) = (W_t, Y_t(W_{1:t})), the second moments of the observables exist, and Var(W_t) is non-singular, then for every t,

  Cov(Y_t, W_t) Var(W_t)^{-1} = A_0^{-1},

which would make statistical inference rather straightforward. But the point of this simultaneous

equations literature is to carry out inference without directly observing the assignments — which

is a much harder task.

If, in addition to A_0 being invertible, we assume that Var(W_t) < ∞, then

  Var(Y_t) = A_0^{-1} Var(W_t) (A_0^{-1})^T.

Crucially, knowing Var(Y_t) is not enough to untangle A_0 and Var(W_t), and so knowledge of the second moments of the observables alone is not enough to learn the contemporaneous average treatment effect. In the linear simultaneous equations literature, this is resolved by a priori imposing more economic structure on the potential outcome process, such as placing more structure on the matrix A_0.

A central a priori constraint is the one highlighted by Sims (1980). He imposed that (a) A_0 is triangular, and (b) Var(W_t) is diagonal. For simplicity of exposition, look at the two-dimensional case and write (row by row)

  A_0 = [[1, 0], [−a_21, 1]],   A_0^{-1} = [[1, 0], [a_21, 1]],   Var(W_t) = [[σ_11^2, 0], [0, σ_22^2]],

then the elements within A_0 and Var(W_t) can be individually determined from Var(Y_t) if Var(Y_t) is of full rank. The same holds in higher dimensions. Hence, with additional restrictions on the potential outcome process, the contemporaneous causal effect can be determined from the data on the outcomes, without observing the assignments (or having access to instruments). There are alternative a priori constraints to this triangular structure which also work here, and the above structure extends to non-linear systems of equations g(Y_t(w_{1:t})) = w_t.
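A short deterministic check of this two-dimensional identification argument (the parameter values are ours): construct Var(Y_t) from a triangular A_0 and diagonal Var(W_t), then unwind it.

```python
# Deterministic 2x2 check (example values are ours): with A0 lower triangular
# with unit diagonal and Var(W_t) diagonal, Var(Y_t) = A0^{-1} Var(W_t) A0^{-T}
# pins down every structural parameter:
#   Var(Y)[0][0] = s11^2
#   Var(Y)[1][0] = a21 * s11^2
#   Var(Y)[1][1] = a21^2 * s11^2 + s22^2
a21, s11_sq, s22_sq = 0.5, 4.0, 9.0

# Variance of the outcomes implied by the structural parameters.
var_y = [
    [s11_sq,        a21 * s11_sq],
    [a21 * s11_sq,  a21 ** 2 * s11_sq + s22_sq],
]

# Recover the structural parameters from Var(Y) alone (Cholesky-style unwinding).
s11_sq_hat = var_y[0][0]
a21_hat = var_y[1][0] / var_y[0][0]
s22_sq_hat = var_y[1][1] - a21_hat ** 2 * var_y[0][0]
```

Without the triangular/diagonal restrictions, the three entries of the symmetric Var(Y_t) could not separate the four free parameters, which is exactly the identification problem discussed above.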

The linear "structural vector autoregressive" (SVAR) version of the linear simultaneous equations model has the same fundamental structure. Focusing on the one-lag model with no intercept for simplicity, the SVAR approach assumes that the potential outcome process satisfies

  A_0 Y_t(w_{1:t}) = w_t + A_1 Y_{t-1}(w_{1:t-1}).

Kilian and Lütkepohl (2017) provide a book-length review of this model structure and its various extensions and implications. Then A_0 (I − Φ_1 L) Y_t(w_{1:t}) = w_t, where L is the lag operator and Φ_1 = A_0^{-1} A_1. So

  Y_t(w_{1:t}) = A_0^{-1} w_t + Φ_1 Y_{t-1}(w_{1:t-1}),

which in turn implies that the potential outcome process also has an SVMA model representation

  Y_t(w_{1:t}) = A_0^{-1} w_t + Φ_1 A_0^{-1} w_{t-1} + Φ_1^2 A_0^{-1} w_{t-2} + ... + Φ_1^{t-1} A_0^{-1} w_1 + Φ_1^t Y_0.

In this case, the h-period ahead average treatment effect is

  E[Y_{t+h}(W_{1:t-1}, w, W_{t+1:t+h}) − Y_{t+h}(W_{1:t-1}, w', W_{t+1:t+h})] = Φ_1^h A_0^{-1} (w − w'),

and the h-period ahead marginal average treatment effect is E[∂Y_{t+h}(w_{1:t+h})/∂w_t'] = Φ_1^h A_0^{-1}. The

time series parameter Φ1 can be determined from the dynamics of the observable outcomes if

this process is stationary. But again A0 and VarpWt q cannot be separately identified from the

observable outcomes, so further structural assumptions are needed.
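For a concrete calculation of the dynamic causal effects in this model (the parameter values are ours), the h-period ahead marginal average treatment effect Φ_1^h A_0^{-1} can be computed directly:

```python
# Worked 2x2 example (parameter values are ours): compute the h-period-ahead
# marginal average treatment effect Phi1^h A0^{-1}, with Phi1 = A0^{-1} A1.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a0_inv = [[1.0, 0.0], [0.5, 1.0]]   # A0^{-1} for A0 = [[1, 0], [-0.5, 1]]
a1 = [[0.4, 0.0], [0.0, 0.3]]
phi1 = matmul(a0_inv, a1)           # Phi1 = A0^{-1} A1 = [[0.4, 0], [0.2, 0.3]]

h = 2
irf = a0_inv                        # h = 0: the impact effect is A0^{-1}
for _ in range(h):
    irf = matmul(phi1, irf)         # after the loop: Phi1^h A0^{-1}
```

Column j of `irf` traces the response of the outcome vector h periods after a unit movement in the j-th assignment, holding the other assignment paths fixed.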

2.7.2 Causal meaning of the GIRF of Y_{k,t} on Y_{j,t+h}

A broader analysis focuses on the h-step ahead generalized impulse response function of the

j-th outcome on the k-th outcome without placing functional form restrictions on the potential

outcome process. Here we provide a nonparametric causal meaning to it in terms of potential

outcomes. To do so, we will further assume that the potential outcome process is a deterministic

function of the assignments and that the assignments are independent across time.

Theorem 2.7.1. Consider a direct potential outcome system, and further assume that

i. the potential outcome process Y_t(w_{1:t}) is deterministic for all t ≥ 0, w_{1:t} ∈ W^t.

ii. for all t ≠ s, W_t ⊥⊥ W_s.

Then, so long as the corresponding moments exist,

  E[Y_{j,t+h} | Y_{k,t} = y_k, F_{t-1}^Y] − E[Y_{j,t+h} | Y_{k,t} = y'_k, F_{t-1}^Y]    (2.16)
    = E[ψ_{j,t+h}(W_{1:t}) | Y_{k,t} = y_k, F_{t-1}^Y] − E[ψ_{j,t+h}(W_{1:t}) | Y_{k,t} = y'_k, F_{t-1}^Y],    (2.17)

where ψ_{j,t+h}(w_{1:t}) := E[Y_{j,t+h}(w_{1:t}, W_{t+1:t+h})].

Theorem 2.7.1 illustrates that, without functional form restrictions on the potential outcome process, the generalized impulse response function of the j-th outcome on the k-th outcome has a causal interpretation in terms of shifting the entire conditional distribution of the treatments W_{1:t}. While this is a non-standard object, it can be interpreted as the causal effect of a stochastic intervention on the assignment path W_{1:t}, which has been an object of recent interest in a growing cross-sectional literature on causal inference in the presence of interference/spillovers across units – see, for example, Munoz and van der Laan (2012), Papadogeorgou et al. (2019), Papadogeorgou et al. (2021), and Wu et al. (2021). Nonetheless, this is a complex causal effect, as it measures an average causal effect of simultaneously shifting all assignments from time t = 1 to t.

2.8 Conclusion

In this paper, we developed the nonparametric, direct potential outcome system to study causal

inference in observational time series settings. We place no functional form restrictions on the

potential outcome process, no restrictions on the extent to which past assignments causally affect

the outcomes, nor common time series assumptions such as "invertibility" or "recoverability." The

direct potential outcome system therefore nests most leading econometric models used in time

series settings as a special case. We then studied conditions on the assignments under which

common time series estimands, such as the impulse response functions, generalized impulse

response function, and local projections, have a causal interpretation in terms of underlying

dynamic causal effects. We further showed that, provided the researcher observes an instrument that satisfies an appropriate unconfoundedness and monotonicity condition, common IV estimands such as local projection instrumental variables also have causal interpretations in terms

of local average dynamic causal effects. Taken together, the potential outcome system provides a

flexible, nonparametric foundation for making causal statements from observational time series of

outcomes, assignments, and instruments.

Chapter 3

Panel Experiments and Dynamic Causal


Effects: A Finite Population Perspective1

3.1 Introduction

Panel experiments, where we randomly assign units to different interventions, measuring their

response and repeating the procedure in several periods, form the basis of causal inference in

many areas of biostatistics (e.g., Murphy et al. (2001)), epidemiology (e.g., Robins (1986)), and

psychology (e.g., Lillie et al. (2011)). In experimental economics, many authors recognize the

benefits of panel-based experiments, for instance Bellemare et al. (2014, 2016) highlighted the

potentially large gains in power and Czibor et al. (2019) emphasized that panel-based experiments

may help uncover heterogeneity across units. Despite these benefits, panel experiments are used

infrequently in part due to the lack of a formal statistical framework and concerns about how the

impact of past treatments on subsequent outcomes may induce biases in conventional estimators

(Charness et al., 2012). In practice, authors typically assume away this complication by requiring

that the outcomes only depend on contemporaneous treatment, what is often called the “no

1 This chapter is joint work with Iavor Bojinov and Neil Shephard. This chapter is based on a paper published in
Quantitative Economics: Bojinov, I., Rambachan, A. and Shephard, N. “Panel Experiments and Dynamic Causal Effects:
A Finite Population Perspective.” 2021. Quantitative Economics, 12(4):1171-1196. We thank Isaiah Andrews, Robert
Minton, Karthik Rajkumar and Jonathan Roth for helpful discussions. We especially thank James Andreoni and Larry
Samuelson for kindly sharing their data. Finally, we are grateful to Gary Chamberlain for early conversations about
this project. Any remaining errors are our own. Rambachan gratefully acknowledges financial support from the NSF
Graduate Research Fellowship under Grant DGE1745303.

carryover assumption” (e.g., Abadie et al. (2017), Athey and Imbens (2018), Athey et al. (2018),

Imai and Kim (2019), Arkhangelsky and Imbens (2019), Imai and Kim (2020), de Chaisemartin

and D’Haultfoeuille (2020)). Even when researchers allow for carryover effects, they commonly

focus on incorporating the uncertainty due to sampling units from some super-population as

opposed to the design-based uncertainty, which arises due to the random assignment.2

In this paper, we tackle these challenges by defining a variety of new panel-based dynamic

causal estimands without invoking restrictions on the extent to which treatments can impact

subsequent outcomes. Our approach builds on the potential outcomes formulation of causal

inference and takes a purely design-based perspective on uncertainty, allowing us to be agnostic

to the outcomes model (Neyman, 1923; Kempthorne, 1955; Cox, 1958a; Rubin, 1974). Our main

estimands are various averages of lag-p dynamic causal effects, which capture how changes in

the assignments affect outcomes after p periods. We provide nonparametric estimators that are

unbiased over the randomization distribution induced by the random design. By exploiting the

underlying Martingale property of our unbiased estimators, we derive their finite population

asymptotic distribution as either the number of sample periods, experimental units, or both

increases. This is a new technique for proving finite population central limit theorems, which may

be broadly useful and of independent interest to researchers.

We develop two methods for conducting nonparametric inference on these dynamic causal

effects. The first uses the limiting distribution to perform conservative tests on weak null

hypotheses of no average dynamic causal effects. The second provides exact randomization tests

for sharp null hypotheses of no dynamic causal effects. We then highlight the usefulness of our

framework by deriving the finite population probability limit of commonly used linear estimation

strategies, such as the unit fixed effects estimator and the two-way fixed effects estimator. Such

estimators are biased for a contemporaneous causal effect whenever there exist carryover effects

and serial correlation in the assignment mechanism, underscoring the value of our proposed

nonparametric estimator.

Finally, we illustrate our theoretical results in a simulation study and apply our framework

to reanalyze a panel-based experiment. The simulation study illustrates our finite population

2 See Abadie et al. (2020) for a discussion of the difference between sampling-based and design-based uncertainty in
the cross-sectional setting.

central limit theorems under a variety of assumptions about the underlying potential outcomes

and assignment mechanism. We confirm that conservative tests based on the limiting distribution

of our nonparametric estimator control size well and have good rejection rates against a variety of

alternatives. We finish by reanalyzing a panel experiment conducted in Andreoni and Samuelson

(2006), which studies cooperative behavior in game theory and is a natural application of our

methods. Participants in the experiment played a twice-repeated prisoners’ dilemma many times,

and the payoff structure of the game was randomly varied across plays. The sequential nature of the

experiment raises the possibility that past assignments may impact future actions as participants

learn about the structure of the game over time. For example, the random variation in the payoff

structure may induce participants to explore possible strategies. This motivates us to analyze the

experiment using our methods that are robust to possible dynamic causal effects. We confirm the

authors’ original hypothesis that the payoff structure of the twice repeated prisoners’ dilemma has

significant contemporaneous effects on cooperative behavior. Moreover, we provide suggestive

evidence of dynamic causal effects in this experiment — the payoff structure of previously played

games may affect cooperative behavior in the current game, which may be indicative of such

learning.

Our design-based framework provides a unified generalization of the finite population literature in cross-sectional causal inference (as reviewed in Imbens and Rubin (2015)) and time

series experiments (Bojinov and Shephard, 2019) to panel experiments. Three crucial contributions

differentiate our work from the existing literature. First, we focus on a much richer class of

dynamic causal estimands, which answer a broader set of causal questions by summarizing

heterogeneity across both units and time periods. Second, we derive two new finite population

central limit theorems as the size of the population grows, and as both the duration and population

size increase. Third, we compute the bias present in standard linear estimators in the presence of

dynamic causal effects and serial correlation in the treatment assignment probabilities.

Our framework is also importantly distinct from foundational work by Robins (1986) and

co-authors, that uses treatment paths for causal panel data analysis and focuses on providing

super-population (or sampling-based) inference methods. In contrast, we avoid super-population

arguments entirely. Our estimands and inference procedures are conditioned on the potential

outcomes and all uncertainty arises solely from the randomness in assignments. Avoiding super-population arguments is often attractive in panel data applications. For example, a company

only operates in a finite number of markets (e.g., states or cities within the United States) and can

only conduct advertising or promotional experiments across markets. Such panel experiments

are increasingly common in industry (e.g. Bojinov et al., 2020a,b).3 In econometrics, Abadie et al.

(2017) highlight the appeal of this design-based perspective in panel data applications. However,

the panel-based potential outcome model developed in that work contains no dynamics as the

authors primarily focus on cross-sectional data with an underlying cluster structure. Similarly,

Athey and Imbens (2018), Athey et al. (2018) and Arkhangelsky and Imbens (2019) also introduce

a potential outcome model for panel data, but assume away carryover effects. Heckman et al.

(2016), Hull (2018), and Han (2019) consider a potential outcome model similar to ours but

again rely on super-population arguments to perform inference. Additionally, an influential

literature in econometrics focuses on estimating dynamic causal effects in panel data under rich

models that allow heterogeneity across units, but does not introduce potential outcomes to define

counterfactuals and also relies on super-population arguments for inference (e.g., see Arellano

and Bonhomme (2016), Arellano et al. (2017) and the review in Arellano and Bonhomme (2012)).

Notation: For an integer t ě 1 and a variable At , we write A1:t :“ pA1 , . . . , At q. We compactly

write index sets as rNs :“ t1, . . . , Nu and rTs :“ t1, . . . , Tu. Finally, for a variable Ai,t observed
řT
over i P rNs and t P rTs, define its average over t as Āi¨ :“ T1 t“1 Ai,t , its average over i as
N T N
t :“ N1 i“1 1
ř ř ř
Ai,t and its average over both i and t as Ā :“ NT t“1 i“1 Ai,t .
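The averaging notation can be illustrated with a small numerical sketch (the panel values below are hypothetical; numpy is used only for convenience):

```python
import numpy as np

# A hypothetical N x T panel of observations A[i, t] (N = 2 units, T = 3 periods).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

A_bar_i = A.mean(axis=1)  # unit averages over t, one per unit i
A_bar_t = A.mean(axis=0)  # period averages over i, one per period t
A_bar = A.mean()          # grand average over both i and t

print(A_bar_i)  # [2. 5.]
print(A_bar_t)  # [2.5 3.5 4.5]
print(A_bar)    # 3.5
```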

3.2 Potential outcome panel and dynamic causal effects

3.2.1 Assignment panels and potential outcomes

Consider a panel in which $N$ units (e.g., individuals or firms) are observed over $T$ time periods. For each unit $i \in [N]$ and period $t \in [T]$, we allocate an assignment $W_{i,t} \in \mathcal{W}$. The assignment is a random variable and we assume $|\mathcal{W}| < \infty$. For a binary assignment $\mathcal{W} = \{0, 1\}$, we refer to "1" as treatment and "0" as control.

3 Of course, in other applications, super-population arguments may be entirely natural. For example, in the mental healthcare digital experiments of Boruvka et al. (2018), it is compelling to use sampling-based arguments as the experimental units are drawn from a larger group of patients for whom we wish to make inferences since, if successful, the technology will be broadly rolled out.

The assignment path for unit $i$ is the sequence of assignments allocated to unit $i$, denoted $W_{i,1:T} := (W_{i,1}, \ldots, W_{i,T})' \in \mathcal{W}^T$. The cross-sectional assignment at time $t$ describes all assignments allocated at period $t$, denoted $W_{1:N,t} := (W_{1,t}, \ldots, W_{N,t})' \in \mathcal{W}^N$. The assignment panel is the $N \times T$ matrix $W_{1:N,1:T} \in \mathcal{W}^{N \times T}$ that summarizes the assignments given to all units over the sample period, where $W_{1:N,1:T} := (W_{1:N,1}, \ldots, W_{1:N,T}) = (W'_{1,1:T}, \ldots, W'_{N,1:T})'$.

A potential outcome describes what would be observed for a particular unit at a fixed point in

time along any assignment path.

Definition 3.2.1. The potential outcome for unit $i$ at time $t$ along assignment path $w_{i,1:T} \in \mathcal{W}^T$ is written as $Y_{i,t}(w_{i,1:T})$.

In principle, the potential outcome can depend upon the entire assignment path, allowing for

arbitrary spillovers across time periods. Definition 3.2.1 imposes that there are no treatment

spillovers across units (Cox, 1958a).4

3.2.2 The potential outcome panel model

We now define the potential outcome panel model by restricting the potential outcomes for a unit

in a given period not to be affected by future assignments.

Assumption 3.2.1. The potential outcomes are non-anticipating if, for all $i \in [N]$, $t \in [T]$, and $w_{i,1:T}, \tilde{w}_{i,1:T} \in \mathcal{W}^T$, $Y_{i,t}(w_{i,1:T}) = Y_{i,t}(\tilde{w}_{i,1:T})$ whenever $w_{i,1:t} = \tilde{w}_{i,1:t}$.

Non-anticipation still allows an arbitrary dependence on past and contemporaneous assignments, and arbitrary heterogeneity across units and time periods.5 Under Assumption 3.2.1, the potential outcome for unit $i$ at time $t$ only depends on the assignment path for unit $i$ up to time $t$, allowing us to write the potential outcomes as $Y_{i,t}(w_{i,1:t})$. As notation, let $Y_{i,t} = \{Y_{i,t}(w_{i,1:t}) : w_{i,1:t} \in \mathcal{W}^t\}$ denote the collection of potential outcomes for unit $i$ at time $t$ and $Y_{1:N,1:T} = \{Y_{i,t} : i \in [N], t \in [T]\}$ denote the collection of potential outcomes for all units across all time periods. Along an assignment panel $w_{1:N,1:t} \in \mathcal{W}^{N \times t}$ up to time $t$, let $Y_{1:N,1:t}(w_{1:N,1:t})$ denote the associated $N \times t$ matrix of outcomes for all units up to time $t$.

4 The idea of defining potential outcomes as a function of assignment paths first appears in Robins (1986) and has been further developed in subsequent work such as Robins (1994), Robins et al. (1999), Murphy et al. (2001), Boruvka et al. (2018) and Blackwell and Glynn (2018).

5 Allowing for rich heterogeneity in panel data models is often useful in many economic applications. For example, there is extensive heterogeneity across units in income processes (Browning et al., 2010) and the dynamic response of consumption to earnings (Arellano et al., 2017). Time-varying heterogeneity is also an important feature. For example, it is a classic point of emphasis in studying human capital formation; see Ben-Porath (1967), Griliches (1977) and, more recently, Cunha et al. (2006) and Cunha et al. (2010).

To connect the observed outcomes with the potential outcomes, we assume every unit complies with the assignment.6 For all $i \in [N]$, $t \in [T]$, the observed outcomes for unit $i$ are $y^{obs}_{i,1:T} = Y_{i,1:T}(w^{obs}_{i,1:T})$, where $w^{obs}_{i,1:T}$ is the observed assignment path for unit $i$.

A panel of units, assignments and outcomes in which the units are non-interfering and compliant with the assignments and the outcomes obey Assumption 3.2.1 is a potential outcome panel. For $N = 1$, the potential outcome panel reduces to the potential outcome time series model in Bojinov and Shephard (2019). For $T = 1$, it reduces to the cross-sectional potential outcome model (e.g., Holland (1986) and Imbens and Rubin (2015)).

3.2.3 Assignment mechanism assumptions

We focus on randomized experiments in which the assignment mechanisms for each period only

depend on past assignments and observed outcomes, but not on future potential outcomes nor

unobserved past potential outcomes.

Definition 3.2.2. The assignments are sequentially randomized if, for all $t \in [T]$ and any $w_{1:N,1:t-1} \in \mathcal{W}^{N \times (t-1)}$,
$$\Pr(W_{1:N,t} \mid W_{1:N,1:t-1} = w_{1:N,1:t-1}, Y_{1:N,1:T}) = \Pr(W_{1:N,t} \mid W_{1:N,1:t-1} = w_{1:N,1:t-1}, Y_{1:N,1:t-1}(w_{1:N,1:t-1})).$$

It is common to focus on sequentially randomized assignments in biostatistics and epidemiology

(Robins, 1986; Murphy, 2003). This is the panel data analogue of an “unconfounded” or “ignorable”

assignment mechanism in the literature on cross-sectional causal inference (as reviewed in Chapter

3 of Imbens and Rubin (2015)).7 Since future potential outcomes and counterfactual past potential outcomes are unobservable, any feasible assignment mechanism must be sequentially randomized.

6 In some applications, this assumption may be unrealistic. For example, in a panel-based clinical trial, we may worry that patients do not properly adhere to the assignment. In such cases, our analysis can be re-interpreted as focusing on dynamic intention-to-treat (ITT) effects.

7 If the researcher further observes characteristics $X_{i,t}$ that are causally unaffected by the assignments, then the definition of a sequentially randomized assignment mechanism can be modified to additionally condition on past and contemporaneous values of the characteristics $X_{1:N,1:t}$.

An important special case imposes further conditional independence structure across assignments. Let $W_{-i,t} := (W_{1,t}, \ldots, W_{i-1,t}, W_{i+1,t}, \ldots, W_{N,t})$ and let $\mathcal{F}_{1:N,t,T}$ be the filtration generated by $W_{1:N,1:t}$ and $Y_{1:N,1:T}$.

Definition 3.2.3. The assignments are individualistic for unit $i$ if, for all $t \in [T]$ and any $w_{1:N,1:t-1} \in \mathcal{W}^{N \times (t-1)}$,
$$\Pr(W_{i,t} \mid W_{-i,t}, \mathcal{F}_{1:N,t-1,T}) = \Pr(W_{i,t} \mid W_{i,1:t-1} = w_{i,1:t-1}, Y_{i,1:t-1}(w_{i,1:t-1})).$$

An individualistic assignment mechanism further imposes that, conditional on its own past assignments and outcomes, the assignment for unit $i$ at time $t$ is independent of the past assignments and outcomes of all other units as well as all other contemporaneous assignments. For example, the Bernoulli assignment mechanism, where $\Pr(W_{i,t} \mid W_{-i,t}, \mathcal{F}_{1:N,t-1,T}) = \Pr(W_{i,t})$ for all $i \in [N]$ and $t \in [T]$, is individualistic.
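As a minimal sketch (the assignment rule and outcome model below are hypothetical), an individualistic mechanism lets each unit's treatment probability depend only on that unit's own past data; the Bernoulli mechanism is the special case of a constant probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def assignment_prob(own_past_outcomes):
    """Hypothetical individualistic rule: treat with higher probability
    when the unit's own most recent outcome was low."""
    if len(own_past_outcomes) == 0:
        return 0.5  # Bernoulli(0.5) in the first period
    return 0.7 if own_past_outcomes[-1] < 0 else 0.3

N, T = 5, 10
W = np.zeros((N, T), dtype=int)  # assignment panel W_{1:N,1:T}
Y = np.zeros((N, T))             # observed outcomes
for t in range(T):
    for i in range(N):
        # Depends only on unit i's own past, never on other units.
        p = assignment_prob(Y[i, :t])
        W[i, t] = rng.binomial(1, p)
        Y[i, t] = W[i, t] + rng.normal()  # hypothetical outcome model
```

Because every assignment probability here lies strictly inside (0, 1), this mechanism is also probabilistic in the sense of Assumption 3.3.1 below.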

Example 3.2.1. Consider a food delivery firm that is testing the effectiveness of a new pricing policy across

ten major U.S. cities (Kastelman and Ramesh, 2018; Sneider and Tang, 2018). Each city is an experimental

unit, and the intervention administers the appropriate pricing policy for a duration of one hour. The outcome

is the total revenue generated during each hour of the experiment, t P rTs and from city i P rNs. The

firm wishes to learn the best policy for each city and the best overall policy across all cities. To do so, it

may conduct a panel experiment with an individualistic treatment assignment in which the probability

a particular pricing policy is administered in a given city over the next hour depends on prior observed

revenue in that city in earlier hours of the experiment.

Remark 3.2.1. Many adaptive experimental strategies (such as the one described in Example 3.2.1), in

which a series of units are sequentially exposed to random treatments whose probability vary depending on

the past observed data, satisfy our individualistic sequentially randomized assignment assumptions (e.g.,

Robbins, 1952; Lai and Robbins, 1985). Such experiments are widely used by technology companies to

quickly discern user preferences in recommendation algorithms (Li et al., 2010, 2016) and by academics

interested in improving their power against a particular hypothesis (van der Laan, 2008). There has been a

growing interest in drawing causal inferences based on the collected data in such adaptive experimental

designs (Hadad et al., 2021; Zhang et al., 2020). Since the assignment probabilities are known to the

researcher, our results can be viewed as providing finite population techniques for drawing causal conclusions

from adaptive experiments. In the special case of our framework where $N = 1$, $t \in [T]$ indexes individuals arriving over time, and there are no carryover effects, our results in the subsequent section are the finite

population analogue of the inference results in Hadad et al. (2021).8

Our finite population central limit theorems require that the assignment mechanism be

individualistic. In a non-individualistic assignment mechanism, the past outcomes of other

units may affect the contemporaneous assignment of a given unit, which introduces complex

dependence structure across units. A similar difficulty arises in the growing literature on relaxing

the non-interference assumptions in cross-sectional experiments, where researchers allow one

unit’s potential outcomes to depend on another unit’s assignments (e.g., see Sävje et al., 2019).

To derive the asymptotic distribution of causal estimators in such settings, researchers typically

require the assignment mechanism to be independent (Chin, 2018) or at least have only limited

dependence structure across units (Aronow and Samii, 2017).

3.2.4 Dynamic causal effects

A dynamic causal effect compares the potential outcomes for unit $i$ at time $t$ along different assignment paths, which we denote by $\tau_{i,t}(w_{i,1:t}, \tilde{w}_{i,1:t}) := Y_{i,t}(w_{i,1:t}) - Y_{i,t}(\tilde{w}_{i,1:t})$ for assignment paths $w_{i,1:t}, \tilde{w}_{i,1:t} \in \mathcal{W}^t$. We use these dynamic causal effects to build up causal estimands of interest.

Lag-p dynamic causal effects and average dynamic causal effects

Since the number of potential outcomes grows exponentially with the time period t, there is a

considerable number of possible causal estimands. To make progress, we restrict our attention to

a core class, referred to as the lag-p dynamic causal effects.

8 The setup with $N = 1$ was developed in Bojinov and Shephard (2019), but this connection to adaptive experiments has not been previously made.

Definition 3.2.4. For $0 \leq p < t$ and $w, \tilde{w} \in \mathcal{W}^{p+1}$, the $i,t$-th lag-$p$ dynamic causal effect is
$$\tau_{i,t}(w, \tilde{w}; p) := \begin{cases} \tau_{i,t}(\{w^{obs}_{i,1:t-p-1}, w\}, \{w^{obs}_{i,1:t-p-1}, \tilde{w}\}) & \text{if } p < t - 1, \\ \tau_{i,t}(w, \tilde{w}) & \text{otherwise.} \end{cases}$$
The $i,t$-th lag-$p$ dynamic causal effect measures the difference between the outcomes from following assignment path $w$ from period $t-p$ to $t$ compared to the alternative path $\tilde{w}$, fixing the assignments for unit $i$ to follow the observed path up to time $t-p-1$. Generally, when $N \gg T$, we recommend setting $p = t - 1$, removing the dependence on the observed path.9

By further restricting the paths w and w̃ to share common features, we obtain the weighted

average i, t-th lag-p dynamic causal effect.

Definition 3.2.5. For integers $p, q$ satisfying $0 \leq p < t$, $0 < q \leq p + 1$, the weighted average $i,t$-th lag-$p,q$ dynamic causal effect is
$$\tau^{\dagger}_{i,t}(w, \tilde{w}; p, q) := \sum_{v \in \mathcal{W}^{p-q+1}} a_v \, \tau_{i,t}((w, v), (\tilde{w}, v); p),$$
where $w, \tilde{w} \in \mathcal{W}^q$ and $\{a_v\}$ are non-stochastic weights chosen by the researcher that satisfy $\sum_{v \in \mathcal{W}^{p-q+1}} a_v = 1$ and $a_v \geq 0$ for all $v \in \mathcal{W}^{p-q+1}$.

The weighted average $i,t$-th lag-$p,q$ dynamic causal effect summarizes the ceteris paribus, average causal effect on outcomes at time $t$ of switching the assignment path between period $t-p$ and period $t-p+q$ from $w$ to $\tilde{w}$.10 In this sense, the weighted average lag-$p,q$ causal effect is a finite-population causal generalization of an impulse response function, which is a common estimand of interest in existing econometric research.11 Whenever $q = 1$, we drop the $q$ from the notation, simply writing $\tau^{\dagger}_{i,t}(w, \tilde{w}; p) := \tau^{\dagger}_{i,t}(w, \tilde{w}; p, 1)$.

The main estimands of interest in this paper are averages of the dynamic causal effects that

9 In a time series experiment with $N = 1$, Bojinov and Shephard (2019) introduced causal effects that depend on the observed assignment path because most potential outcomes are unobserved when there is only one experimental unit. In our more general panel experiments setting, an analogous problem arises when T is of a similar order as N.
10 For a binary assignment, setting $N = q = 1$ gives a special case that was studied in Bojinov and Shephard (2019).
11 For time series experiments, Rambachan and Shephard (2020) show that a particular version of the weighted average lag-$p$,1 causal effect is equivalent to the generalized impulse response function (Koop et al., 1996).

summarize how different assignments impact the experimental units.

Definition 3.2.6. For $p < T$ and $w, \tilde{w} \in \mathcal{W}^{p+1}$,

1. the time-$t$ lag-$p$ average dynamic causal effect is $\bar{\tau}_{\cdot t}(w, \tilde{w}; p) := \frac{1}{N}\sum_{i=1}^{N} \tau_{i,t}(w, \tilde{w}; p)$;

2. the unit-$i$ lag-$p$ average dynamic causal effect is $\bar{\tau}_{i\cdot}(w, \tilde{w}; p) := \frac{1}{T-p}\sum_{t=p+1}^{T} \tau_{i,t}(w, \tilde{w}; p)$;

3. the total lag-$p$ average dynamic causal effect is $\bar{\tau}(w, \tilde{w}; p) := \frac{1}{N(T-p)}\sum_{t=p+1}^{T}\sum_{i=1}^{N} \tau_{i,t}(w, \tilde{w}; p)$.

These estimands extend to the weighted average $i,t$-th lag-$p$ dynamic causal effect by analogously defining $\bar{\tau}^{\dagger}_{\cdot t}(w, \tilde{w}; p, q)$, $\bar{\tau}^{\dagger}_{i\cdot}(w, \tilde{w}; p, q)$, and $\bar{\tau}^{\dagger}(w, \tilde{w}; p, q)$.
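Given a matrix of unit-period effects, the three averages in Definition 3.2.6 are simply column, row, and grand means (a toy numerical sketch with hypothetical values):

```python
import numpy as np

# Hypothetical 3 x 4 matrix of i,t-th lag-p effects with p = 1;
# the lag-p estimands are defined for t = p+1, ..., T, so the
# columns correspond to t = 2, ..., 5.
tau = np.arange(12.0).reshape(3, 4)

time_avg = tau.mean(axis=0)   # time-t lag-p averages, one per period t > p
unit_avg = tau.mean(axis=1)   # unit-i lag-p averages, one per unit
total_avg = tau.mean()        # total lag-p average

print(time_avg)   # [4. 5. 6. 7.]
print(unit_avg)   # [1.5 5.5 9.5]
print(total_avg)  # 5.5
```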

We can augment any of the above averages to incorporate non-stochastic weights. For example, we could define weights $\{c_{i,t}\}_{i=1}^{N}$ and consider the weighted time-$t$ lag-$p$ average dynamic causal effect $\frac{1}{N}\sum_{i=1}^{N} c_{i,t}\, \tau_{i,t}(w, \tilde{w}; p)$. These weights, for instance, could be used to adjust for different assignment path probabilities up to time $t-p-1$, which are non-stochastic since the assignment mechanism is known.

3.3 Nonparametric estimation and inference

In this section, we develop a nonparametric Horvitz and Thompson (1952) type estimator of

the i, t-th lag-p dynamic causal effects and derive its properties. If the assignment mechanism

is individualistic (Definition 3.2.3) and probabilistic (defined below), our proposed estimator is

unbiased for the $i,t$-th lag-$p$ dynamic causal effects and their related averages over the assignment

mechanism. An appropriately scaled and centered version of our estimator for the average lag-p

dynamic causal effects becomes approximately normally distributed as either the number of units

or time periods grows large. These limiting results are finite population central limit theorems in

the spirit of Freedman (2008) and Li and Ding (2017).

3.3.1 Setup: adapted propensity score and probabilistic assignment

For each $i$, $t$, and any $w = (w_1, \ldots, w_{p+1}) \in \mathcal{W}^{p+1}$, the adapted propensity score summarizes the conditional probability of a given assignment path and is given by $p_{i,t-p}(w) := \Pr(W_{i,t-p:t} = w \mid W_{i,1:t-p-1}, Y_{i,1:t}(W_{i,1:t-p-1}, w))$. Even though the assignment mechanism is known, we only observe the outcomes along the realized assignment path, $Y_{i,1:t}(w^{obs}_{i,1:t})$, and so it is not possible to compute $p_{i,t-p}(w)$ for all assignment paths. However, we can compute the adapted propensity score along the observed assignment path, $p_{i,t-p}(w^{obs}_{i,t-p:t})$ (see Appendix C.2 for further discussion).

We next assume that the assignment mechanism is probabilistic.

Assumption 3.3.1 (Probabilistic Assignment). Consider a potential outcome panel. There exist $C_L, C_U \in (0, 1)$ such that $C_L < p_{i,t-p}(w) < C_U$ for all $i \in [N]$, $t \in [T]$ and $w \in \mathcal{W}^{p+1}$.

This is also commonly known as the “overlap” or “common support” assumption.

All expectations, denoted by $E[\cdot]$, are computed with respect to the probabilistic assignment mechanism. We write $\mathcal{F}_{i,t-p-1}$ for the filtration generated by $W_{i,1:t-p-1}$ and $\mathcal{F}_{1:N,t-p-1}$ for the filtration generated by $W_{1:N,1:t-p-1}$. Since we condition on all of the potential outcomes, conditioning on $W_{i,1:t-p-1}$ is the same as conditioning on both $W_{i,1:t-p-1}$ and $Y_{i,1:t-p-1}(W_{i,1:t-p-1})$.

3.3.2 Estimation of the i, t-th lag-p dynamic causal effect

For any $w, \tilde{w} \in \mathcal{W}^{p+1}$, the nonparametric estimator of $\tau_{i,t}(w, \tilde{w}; p)$ is
$$\hat{\tau}_{i,t}(w, \tilde{w}; p) := \frac{Y_{i,t}(w^{obs}_{i,1:t-p-1}, w)\,\mathbb{1}(w^{obs}_{i,t-p:t} = w)}{p_{i,t-p}(w)} - \frac{Y_{i,t}(w^{obs}_{i,1:t-p-1}, \tilde{w})\,\mathbb{1}(w^{obs}_{i,t-p:t} = \tilde{w})}{p_{i,t-p}(\tilde{w})},$$
where $\mathbb{1}\{A\}$ is an indicator function for an event $A$. Under individualistic assignments (Definition 3.2.3), the estimator simplifies to
$$\hat{\tau}_{i,t}(w, \tilde{w}; p) = \frac{y^{obs}_{i,t}\,\{\mathbb{1}(w^{obs}_{i,t-p:t} = w) - \mathbb{1}(w^{obs}_{i,t-p:t} = \tilde{w})\}}{p_{i,t-p}(w^{obs}_{i,t-p:t})}.$$
.

Theorem 3.3.1. Consider a potential outcome panel with an assignment mechanism that is individualistic (Definition 3.2.3) and probabilistic (Assumption 3.3.1). For any $w, \tilde{w} \in \mathcal{W}^{p+1}$,
$$E[\hat{\tau}_{i,t}(w, \tilde{w}; p) \mid \mathcal{F}_{i,t-p-1}] = \tau_{i,t}(w, \tilde{w}; p),$$
$$\mathrm{Var}(\hat{\tau}_{i,t}(w, \tilde{w}; p) \mid \mathcal{F}_{i,t-p-1}) = \gamma^2_{i,t}(w, \tilde{w}; p) - \tau_{i,t}(w, \tilde{w}; p)^2 =: \sigma^2_{i,t}, \quad (3.1)$$
where
$$\gamma^2_{i,t}(w, \tilde{w}; p) = \frac{Y_{i,t}(w^{obs}_{i,1:t-p-1}, w)^2}{p_{i,t-p}(w)} + \frac{Y_{i,t}(w^{obs}_{i,1:t-p-1}, \tilde{w})^2}{p_{i,t-p}(\tilde{w})}.$$
Further, for distinct $w, \tilde{w}, \bar{w}, \hat{w} \in \mathcal{W}^{p+1}$,
$$\mathrm{Cov}(\hat{\tau}_{i,t}(w, \tilde{w}; p), \hat{\tau}_{i,t}(\bar{w}, \hat{w}; p) \mid \mathcal{F}_{i,t-p-1}) = -\tau_{i,t}(w, \tilde{w}; p)\,\tau_{i,t}(\bar{w}, \hat{w}; p).$$
Finally, $\hat{\tau}_{i,t}(w, \tilde{w})$ and $\hat{\tau}_{j,t}(w, \tilde{w})$ are independent for $i \neq j$ conditional on $\mathcal{F}_{1:N,t-p-1}$.

Theorem 3.3.1 states that, for every $i, t$, the error in estimating $\tau_{i,t}(w, \tilde{w}; p)$ is a martingale difference sequence through time and conditionally independent across units. The variance of $\hat{\tau}_{i,t}(w, \tilde{w}; p)$ depends upon the potential outcomes under both the treatment and counterfactual paths and is generally not estimable. However, the variance is bounded from above by $\gamma^2_{i,t}(w, \tilde{w}; p)$, which we can estimate by
$$\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) = \frac{(y^{obs}_{i,t})^2\,\{\mathbb{1}(w^{obs}_{i,t-p:t} = w) + \mathbb{1}(w^{obs}_{i,t-p:t} = \tilde{w})\}}{p_{i,t-p}(w^{obs}_{i,t-p:t})^2}.$$
The following proposition establishes that $\hat{\gamma}^2_{i,t}(w, \tilde{w}; p)$ is an unbiased estimator of $\gamma^2_{i,t}(w, \tilde{w}; p)$ and that its error in estimating $\gamma^2_{i,t}(w, \tilde{w}; p)$ is also a martingale difference sequence through time and conditionally independent across units.

Proposition 3.3.1. Under the setup of Theorem 3.3.1, $E[\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) \mid \mathcal{F}_{i,t-p-1}] = \gamma^2_{i,t}(w, \tilde{w}; p)$. Additionally, $\hat{\gamma}^2_{i,t}(w, \tilde{w}; p)$ and $\hat{\gamma}^2_{j,t}(w, \tilde{w}; p)$ are independent for $i \neq j$ conditional on $\mathcal{F}_{1:N,t-p-1}$.

The variance bound $\gamma^2_{i,t}(w, \tilde{w}; p)$ is different from the typical Neyman variance bound, derived under the assumption of a completely randomized experiment (Imbens and Rubin, 2015, Chapter 5). In a completely randomized experiment, there is a negative correlation between any two units' assignments since the total number of units assigned to each treatment is fixed. In our setting, all units' assignments are conditionally independent under individualistic assignments, precluding us from exploiting the negative correlation in deriving a bound.

Remark 3.3.1. Since the weighted average $i,t$-th lag-$p,q$ dynamic causal effects (Definition 3.2.5) are linear combinations of the $i,t$-th lag-$p$ dynamic causal effects, we can directly apply Theorem 3.3.1 and Proposition 3.3.1. We provide the details for the case when $q = 1$.

For $w, \tilde{w} \in \mathcal{W}$ and $v \in \mathcal{W}^p$, the nonparametric estimator of $\tau^{\dagger}_{i,t}(w, \tilde{w}; p)$ is
$$\hat{\tau}^{\dagger}_{i,t}(w, \tilde{w}; p) = \sum_{v \in \mathcal{W}^p} a_v \left\{ \frac{Y_{i,t}(w^{obs}_{i,1:t-p-1}, w, v)\,\mathbb{1}(w^{obs}_{i,t-p:t} = (w, v))}{p_{i,t-p}(w, v)} - \frac{Y_{i,t}(w^{obs}_{i,1:t-p-1}, \tilde{w}, v)\,\mathbb{1}(w^{obs}_{i,t-p:t} = (\tilde{w}, v))}{p_{i,t-p}(\tilde{w}, v)} \right\}.$$
Under an individualistic assignment mechanism, this estimator simplifies to
$$\hat{\tau}^{\dagger}_{i,t}(w, \tilde{w}; p) = \frac{a_{w^{obs}_{i,t-p+1:t}}\, y^{obs}_{i,t}\,\{\mathbb{1}(w^{obs}_{i,t-p} = w) - \mathbb{1}(w^{obs}_{i,t-p} = \tilde{w})\}}{p_{i,t-p}(w^{obs}_{i,t-p:t})}.$$
This estimator is unbiased over the randomization distribution, and its variance can be bounded from above. For uniform weights, the rest of the generalizations follow immediately by noticing that we can replace all instances of $w$ and $\tilde{w}$ with $(w, v)$ and $(\tilde{w}, v)$.

3.3.3 Estimation of lag-p average causal effects

The martingale difference properties of the nonparametric estimator mean that the averaged plug-in estimators
$$\hat{\bar{\tau}}_{\cdot t}(w, \tilde{w}; p) := \frac{1}{N}\sum_{i=1}^{N} \hat{\tau}_{i,t}(w, \tilde{w}; p),$$
$$\hat{\bar{\tau}}_{i\cdot}(w, \tilde{w}; p) := \frac{1}{T-p}\sum_{t=p+1}^{T} \hat{\tau}_{i,t}(w, \tilde{w}; p),$$
$$\hat{\bar{\tau}}(w, \tilde{w}; p) := \frac{1}{N(T-p)}\sum_{i=1}^{N}\sum_{t=p+1}^{T} \hat{\tau}_{i,t}(w, \tilde{w}; p)$$
are also unbiased for the average causal estimands $\bar{\tau}_{\cdot t}(w, \tilde{w}; p)$, $\bar{\tau}_{i\cdot}(w, \tilde{w}; p)$, and $\bar{\tau}(w, \tilde{w}; p)$, respectively. We next derive the limiting distribution of appropriately scaled and centered versions of these averaged estimators.

Theorem 3.3.2. Consider a potential outcome panel with an individualistic (Definition 3.2.3) and probabilistic assignment mechanism (Assumption 3.3.1). Further assume that the potential outcomes are bounded.12 Then, for any $w, \tilde{w} \in \mathcal{W}^{p+1}$,
$$\frac{\sqrt{N}\,\{\hat{\bar{\tau}}_{\cdot t}(w, \tilde{w}; p) - \bar{\tau}_{\cdot t}(w, \tilde{w}; p)\}}{\sigma_{\cdot t}} \xrightarrow{d} N(0, 1) \text{ as } N \to \infty,$$
$$\frac{\sqrt{T-p}\,\{\hat{\bar{\tau}}_{i\cdot}(w, \tilde{w}; p) - \bar{\tau}_{i\cdot}(w, \tilde{w}; p)\}}{\sigma_{i\cdot}} \xrightarrow{d} N(0, 1) \text{ as } T \to \infty,$$
$$\frac{\sqrt{N(T-p)}\,\{\hat{\bar{\tau}}(w, \tilde{w}; p) - \bar{\tau}(w, \tilde{w}; p)\}}{\sigma} \xrightarrow{d} N(0, 1) \text{ as } NT \to \infty,$$
where $\sigma_{\cdot t}$, $\sigma_{i\cdot}$, and $\sigma$ are the square roots of the appropriate averages of $\sigma^2_{i,t}$, defined in Equation (3.1).

Likewise, for bounded potential outcomes with an individualistic and probabilistic assignment mechanism, the scaled variances are
$$N \cdot \mathrm{Var}(\hat{\bar{\tau}}_{\cdot t}(w, \tilde{w}; p) \mid \mathcal{F}_{1:N,t-p-1}) = E[\sigma^2_{\cdot t} \mid \mathcal{F}_{1:N,t-p-1}],$$
$$(T-p) \cdot \mathrm{Var}(\hat{\bar{\tau}}_{i\cdot}(w, \tilde{w}; p) \mid \mathcal{F}_{i,0}) = E[\sigma^2_{i\cdot} \mid \mathcal{F}_{i,0}],$$
$$N(T-p) \cdot \mathrm{Var}(\hat{\bar{\tau}}(w, \tilde{w}; p) \mid \mathcal{F}_{1:N,0}) = E[\sigma^2 \mid \mathcal{F}_{1:N,0}].$$

12 Assuming the potential outcomes are bounded is a common simplifying assumption in deriving finite population central limit theorems. As discussed in Li and Ding (2017), this assumption can often be replaced by a finite-population analogue of the Lindeberg condition in analyses of cross-sectional randomized experiments.

Following the same logic as earlier, we can establish unbiased and consistent estimators of the

variance bounds of the averaged estimators.

Proposition 3.3.2. Under the setup of Theorem 3.3.2, for any $w, \tilde{w} \in \mathcal{W}^{p+1}$,
$$E\left[\frac{1}{N}\sum_{i=1}^{N}\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) \,\Big|\, \mathcal{F}_{1:N,t-p-1}\right] = \frac{1}{N}\sum_{i=1}^{N}\gamma^2_{i,t}(w, \tilde{w}; p),$$
$$E\left[\frac{1}{T-p}\sum_{t=p+1}^{T}\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{T-p}\sum_{t=p+1}^{T}\gamma^2_{i,t}(w, \tilde{w}; p) \,\Big|\, \mathcal{F}_{i,0}\right] = 0,$$
$$E\left[\frac{1}{N(T-p)}\sum_{i=1}^{N}\sum_{t=p+1}^{T}\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{N(T-p)}\sum_{i=1}^{N}\sum_{t=p+1}^{T}\gamma^2_{i,t}(w, \tilde{w}; p) \,\Big|\, \mathcal{F}_{1:N,0}\right] = 0.$$
Moreover,
$$\frac{1}{N}\sum_{i=1}^{N}\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{N}\sum_{i=1}^{N}\gamma^2_{i,t}(w, \tilde{w}; p) \xrightarrow{p} 0 \text{ as } N \to \infty,$$
$$\frac{1}{T-p}\sum_{t=p+1}^{T}\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{T-p}\sum_{t=p+1}^{T}\gamma^2_{i,t}(w, \tilde{w}; p) \xrightarrow{p} 0 \text{ as } T \to \infty,$$
$$\frac{1}{N(T-p)}\sum_{i=1}^{N}\sum_{t=p+1}^{T}\hat{\gamma}^2_{i,t}(w, \tilde{w}; p) - \frac{1}{N(T-p)}\sum_{i=1}^{N}\sum_{t=p+1}^{T}\gamma^2_{i,t}(w, \tilde{w}; p) \xrightarrow{p} 0 \text{ as } NT \to \infty.$$

Proposition 3.3.2 shows that increasing the lag p increases our estimator’s variance, highlighting

an important trade-off: increasing the lag p reduces the dependence on the observed treatment

path at the cost of increased variance. Striking the correct balance depends on the context and the

design of the experiment.

Theorem 3.3.2 and Proposition 3.3.2 naturally extend to the weighted average i, t-th lag-p, q

dynamic causal effect from Definition 3.2.5 by using the estimator developed in Remark 3.3.1.

3.3.4 Confidence intervals and testing for lag-p average causal effects

Combining the variance bound estimators in Proposition 3.3.2 with the central limit theorems in Theorem 3.3.2, we can carry out conservative inference for $\bar{\tau}_{\cdot t}(w, \tilde{w}; p)$, $\bar{\tau}_{i\cdot}(w, \tilde{w}; p)$ and $\bar{\tau}(w, \tilde{w}; p)$. Such techniques can be used to construct conservative confidence intervals or tests of weak null hypotheses that the average dynamic causal effects are zero. For example, these may be $H_0 : \bar{\tau}_{i\cdot}(w, \tilde{w}; p) = 0$ for $i = 3$ or $H_0 : \bar{\tau}_{\cdot t}(w, \tilde{w}; p) = 0$ for $t = 4$.
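As a sketch of such a conservative interval (the inputs are hypothetical; this is not the authors' code), one can pair the point estimate with the estimated variance bound:

```python
import numpy as np

def conservative_ci(tau_hats, gamma_hat_sq, z=1.96):
    """Conservative two-sided CI for a time-t lag-p average dynamic
    causal effect, combining the Horvitz-Thompson point estimate with
    the variance-bound estimators of Proposition 3.3.2 (z = 1.96 gives
    roughly 95% nominal coverage).

    tau_hats     : length-N array of unit-level estimates at time t
    gamma_hat_sq : length-N array of variance-bound estimates
    """
    N = len(tau_hats)
    point = np.mean(tau_hats)
    # The estimated upper bound on Var(tau_bar_hat) is mean(gamma_hat_sq) / N.
    se_bound = np.sqrt(np.mean(gamma_hat_sq) / N)
    return point - z * se_bound, point + z * se_bound
```

Because the variance bound overstates the true variance, the resulting interval covers the estimand at least at the nominal rate in large samples.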

Alternatively, we may construct exact tests for sharp null hypotheses. An example of such a sharp null hypothesis is $H_0 : \tau_{i,t}(w, \tilde{w}; p) = 0$ for all $w, \tilde{w}$, all $i \in [N]$ and a specific $t = 4$. Since all potential outcomes are known under such sharp null hypotheses, we can simulate the assignment path $W_{i,t-p:t} \mid W^{obs}_{i,1:t-p-1}, y^{obs}_{i,1:t-p-1}$ for each unit $i$ and compute $\hat{\tau}_{i,t}(w, \tilde{w}; p)$ at each draw. Therefore, we may simulate the exact distribution of any test statistic under the sharp null hypothesis and compute an exact p-value for the observed test statistic. These randomization tests only require us to be able to simulate from the randomization distribution of the assignment paths. Therefore, such randomization tests may also be conducted if the treatment assignment mechanism is sequentially randomized (Definition 3.2.2).
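The simulation-based testing procedure can be sketched as follows, for a hypothetical Bernoulli design and a difference-in-means statistic (both chosen purely for illustration):

```python
import numpy as np

def randomization_p_value(y_obs, w_obs, draw_assignments, statistic,
                          n_draws=999, rng=None):
    """Simulated exact p-value for a sharp null of no causal effects.

    Under the sharp null, the observed outcomes equal the potential
    outcomes along every assignment path, so we only redraw the
    assignments from the (known) randomization distribution.
    """
    rng = rng if rng is not None else np.random.default_rng()
    t_obs = statistic(y_obs, w_obs)
    t_sim = np.array([statistic(y_obs, draw_assignments(rng))
                      for _ in range(n_draws)])
    return (1 + np.sum(np.abs(t_sim) >= np.abs(t_obs))) / (1 + n_draws)

# Hypothetical test statistic: difference in means between treated
# and control cells of the flattened panel.
def diff_in_means(y, w):
    return y[w == 1].mean() - y[w == 0].mean()
```

A usage sketch: for a Bernoulli(0.5) design over the flattened panel, pass `draw_assignments = lambda r: r.binomial(1, 0.5, size=y.size)`.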

3.4 Estimation in a linear potential outcome panel

This section explores the properties of commonly used linear estimators, such as the canonical unit fixed effects estimator and the two-way fixed effects estimator, under the potential outcome panel model. We establish that if there are dynamic causal effects and serial correlation in the treatment assignment mechanism, both the unit fixed effects estimator and the two-way fixed effects estimator are asymptotically biased for a weighted average of contemporaneous causal effects. In Appendix C.2, we consider analyzing the panel experiment as a repeated cross-section, estimating a separate linear model in each period $t$.

Throughout this section, we further assume that the potential outcomes themselves are a linear function of the assignment path.

Definition 3.4.1. A linear potential outcome panel is a potential outcome panel where
$$Y_{i,t}(w_{i,1:t}) = \beta_{i,t,0} w_{i,t} + \ldots + \beta_{i,t,t-1} w_{i,1} + \epsilon_{i,t} \quad \forall t \in [T] \text{ and } i \in [N],$$
and the non-stochastic coefficients $\beta_{i,t,0:t-1}$ and non-stochastic error $\epsilon_{i,t}$ do not depend upon the treatments.

We adapt notation used in Wooldridge (2005) for analyzing panel fixed effects models. For a generic random variable $A_{i,t}$, we compactly write the within-period transformed variable as $\dot{A}_{i,t} = A_{i,t} - \bar{A}_{\cdot t}$ and the within-unit transformed variable as $\check{A}_{i,t} = A_{i,t} - \bar{A}_{i\cdot}$. The within-unit and within-period transformed variable is $\ddot{A}_{i,t} = (A_{i,t} - \bar{A}) - (\bar{A}_{\cdot t} - \bar{A}) - (\bar{A}_{i\cdot} - \bar{A})$.
3.4.1 Interpreting the unit fixed effects estimator

Our next result characterizes the finite population probability limit of the unit fixed effects estimator, $\hat{\beta}_{UFE} = \sum_{i=1}^{N}\sum_{t=1}^{T} \check{Y}_{i,t}\check{W}_{i,t} \,\big/\, \sum_{i=1}^{N}\sum_{t=1}^{T} \check{W}^2_{i,t}$, under the linear potential outcome panel model. Define $\mathrm{Cov}(\check{W}_{i,t}, \check{W}_{i,s} \mid \mathcal{F}_{1:N,0,T}) := \check{\sigma}_{W,i,t,s}$ and $\check{\mu}_{i,t} := E[\check{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}]$.

Proposition 3.4.1. Assume a linear potential outcome panel and that the assignment mechanism is individualistic (Definition 3.2.3) with $\mathrm{Var}(\check{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}) := \check{\sigma}^2_{W,i,t} < \infty$ for each $i \in [N]$, $t \in [T]$. Further assume that as $N \to \infty$, the following sequences converge non-stochastically:
$$N^{-1}\sum_{i=1}^{N} \beta_{i,t,s}\,\check{\sigma}_{W,i,t,s} \to \check{\kappa}_{W,\beta,t,s} \quad \forall t \in [T] \;\&\; s \leq t,$$
$$N^{-1}\sum_{i=1}^{N} \check{\sigma}^2_{W,i,t} \to \check{\sigma}^2_{W,t} \quad \forall t \in [T],$$
$$N^{-1}\sum_{i=1}^{N} \check{Y}_{i,t}(0)\,\check{\mu}_{i,t} \to \check{\delta}_t \quad \forall t \in [T].$$
Then, as $N \to \infty$,
$$\hat{\beta}_{UFE} \xrightarrow{p} \frac{\sum_{t=1}^{T}\check{\kappa}_{W,\beta,t,t}}{\sum_{t=1}^{T}\check{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T}\sum_{s=1}^{t-1}\check{\kappa}_{W,\beta,t,s}}{\sum_{t=1}^{T}\check{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T}\check{\delta}_t}{\sum_{t=1}^{T}\check{\sigma}^2_{W,t}}.$$

Proposition 3.4.1 decomposes the finite population probability limit of the unit fixed effects estimator into three terms. The first term is a weighted average of contemporaneous dynamic causal coefficients, describing how the contemporaneous causal coefficients covary with the within-unit transformed assignments over the assignment mechanism. The second term captures how past causal coefficients covary with the within-unit transformed treatments and arises due to the presence of dynamic causal effects. The last term is an additional error that arises due to the possible relationship between the demeaned counterfactual $\check{Y}_{i,t}(0)$ and the average, demeaned treatment assignment. A sufficient condition for the last term to equal zero is for the counterfactual outcomes to be time invariant, $Y_{i,t}(0) = \alpha_i$, in which case $\check{Y}_{i,t}(0) = 0$ for all $i \in [N]$, $t \in [T]$. Therefore, the last term is zero whenever unit fixed effects correctly summarize the variation in the "control-only" counterfactual outcomes across units and time.

Proposition 3.4.1 is related to, yet crucially different from, results in Imai and Kim (2019), which show that the unit fixed effects estimator recovers a weighted average of unit-specific contemporaneous causal effects if there are no carryover effects. In contrast, we establish that the unit fixed effects estimator does not recover a weighted average of unit-specific contemporaneous causal effects in the presence of carryover effects and persistence in the treatment path assignment mechanism.

Example 3.4.1. Consider a linear potential outcome panel with $Y_{i,t}(w_{i,1:t}) = \beta_0 w_{i,t} + \beta_1 w_{i,t-1} + \epsilon_{i,t}$ for all $t > 1$ and $Y_{i,1}(w_{i,1}) = \beta_0 w_{i,1} + \epsilon_{i,1}$ for $t = 1$. Assume $\mathrm{Var}(\check{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}) = \check{\sigma}^2_{W,t}$ for all $t$ and $\mathrm{Cov}(\check{W}_{i,t}, \check{W}_{i,t-1} \mid \mathcal{F}_{1:N,0,T}) = \check{\sigma}_{W,t,t-1}$ for all $t > 1$ are constant across units. In this case, Proposition 3.4.1 implies
$$\hat{\beta}_{UFE} \xrightarrow{p} \beta_0 + \beta_1 \frac{\sum_{t=2}^{T}\check{\sigma}_{W,t,t-1}}{\sum_{t=1}^{T}\check{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T}\check{\delta}_t}{\sum_{t=1}^{T}\check{\sigma}^2_{W,t}}.$$
The unit fixed effects estimator converges in probability to the contemporaneous dynamic causal coefficient $\beta_0$ plus a bias that depends on two terms. The first component of the bias depends on the lag-1 dynamic causal coefficient and the covariance between assignments across periods; the second is the $\check{\delta}$ term, which vanishes when unit fixed effects correctly summarize the counterfactual outcomes under control.
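A numerical illustration of this bias (all design choices here are hypothetical): with $\beta_0 = 1$, $\beta_1 = 0.5$, and assignments that repeat their previous value with probability 0.8, the unit fixed effects estimator drifts away from $\beta_0$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2000, 10
beta0, beta1 = 1.0, 0.5

# Serially correlated assignments: each unit keeps its previous
# assignment with probability 0.8, independently across units.
W = np.zeros((N, T))
W[:, 0] = rng.binomial(1, 0.5, size=N)
for t in range(1, T):
    stay = rng.random(N) < 0.8
    W[:, t] = np.where(stay, W[:, t - 1], 1 - W[:, t - 1])

# Linear potential outcome panel with a lag-1 carryover effect.
Y = beta0 * W + rng.normal(size=(N, T))
Y[:, 1:] += beta1 * W[:, :-1]

# Unit fixed effects estimator via within-unit demeaning.
W_dm = W - W.mean(axis=1, keepdims=True)
Y_dm = Y - Y.mean(axis=1, keepdims=True)
beta_ufe = (Y_dm * W_dm).sum() / (W_dm ** 2).sum()
print(beta_ufe)  # noticeably above beta0 = 1 for this persistent design
```

With serially independent assignments instead, the covariance term is of order $1/T$, so the bias from carryover effects shrinks as the panel lengthens.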

3.4.2 Interpreting the two-way fixed effects estimator


Consider the two-way fixed effects estimator, $\hat{\beta}_{TWFE} = \sum_{i=1}^{N}\sum_{t=1}^{T} \ddot{Y}_{i,t}\ddot{W}_{i,t} \,\big/\, \sum_{i=1}^{N}\sum_{t=1}^{T} \ddot{W}^2_{i,t}$. Define $E[\ddot{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}] := \ddot{\mu}_{i,t}$ and $\mathrm{Cov}(\ddot{W}_{i,t}, \ddot{W}_{i,s} \mid \mathcal{F}_{1:N,0,T}) := \ddot{\sigma}_{W,i,t,s}$.

Proposition 3.4.2. Assume a linear potential outcome panel and that the assignment mechanism is individualistic with $\mathrm{Var}(\ddot{W}_{i,t} \mid \mathcal{F}_{1:N,0,T}) := \ddot{\sigma}^2_{W,i,t} < \infty$ for each $i \in [N]$, $t \in [T]$. Further assume that as $N \to \infty$, the following sequences converge non-stochastically:
$$N^{-1}\sum_{i=1}^{N} \beta_{i,t,s}\,\ddot{\sigma}_{W,i,t,s} \to \ddot{\kappa}_{W,\beta,t,s} \quad \forall t \in [T] \;\&\; s \leq t,$$
$$N^{-1}\sum_{i=1}^{N} \ddot{\sigma}^2_{W,i,t} \to \ddot{\sigma}^2_{W,t} \quad \forall t \in [T],$$
$$N^{-1}\sum_{i=1}^{N} \ddot{Y}_{i,t}(0)\,\ddot{\mu}_{i,t} \to \ddot{\delta}_t \quad \forall t \in [T].$$
Then, as $N \to \infty$,
$$\hat{\beta}_{TWFE} \xrightarrow{p} \frac{\sum_{t=1}^{T}\ddot{\kappa}_{W,\beta,t,t}}{\sum_{t=1}^{T}\ddot{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T}\sum_{s=1}^{t-1}\ddot{\kappa}_{W,\beta,t,s}}{\sum_{t=1}^{T}\ddot{\sigma}^2_{W,t}} + \frac{\sum_{t=1}^{T}\ddot{\delta}_t}{\sum_{t=1}^{T}\ddot{\sigma}^2_{W,t}}.$$

Similar to Proposition 3.4.1, the two-way fixed effects estimand can be decomposed into three components under the linear potential outcome panel model, where the interpretation of each component parallels that for the unit fixed effects estimator. A simple sufficient condition for the last term to equal zero is for the counterfactual outcomes to be additively separable into a time-specific and a unit-specific effect, $Y_{i,t}(0) = \alpha_i + \lambda_t$ for all $i \in [N]$, $t \in [T]$. The last term is therefore zero whenever unit and time fixed effects correctly summarize the variation in the "control-only" counterfactual outcomes across units and time.
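The sufficient condition is easy to check numerically: two-way demeaning (subtract unit means and time means, add back the grand mean) exactly annihilates any additively separable $\alpha_i + \lambda_t$. A minimal sketch, with arbitrary illustrative draws of the fixed effects:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 50, 8
alpha, lam = rng.normal(size=N), rng.normal(size=T)
Y0 = alpha[:, None] + lam[None, :]   # additively separable "control" outcomes

# Two-way demeaning: subtract unit means, subtract time means, add grand mean.
Y0_dd = (Y0 - Y0.mean(axis=1, keepdims=True)
            - Y0.mean(axis=0, keepdims=True) + Y0.mean())
print(np.abs(Y0_dd).max())  # ~0 up to floating point: the last term vanishes
```

Algebraically, $\dot{Y}_{i,t}(0) = (\alpha_i + \lambda_t) - (\alpha_i + \bar{\lambda}) - (\bar{\alpha} + \lambda_t) + (\bar{\alpha} + \bar{\lambda}) = 0$, which is exactly what the computation confirms.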

An active literature in econometrics analyzes the two-way fixed effects estimator under various

identifying assumptions. For example, de Chaisemartin and D’Haultfoeuille (2020) rule out

carryover effects and decompose the two-way fixed effects estimand under a “common-trends”

assumption that restricts how the potential outcomes under control evolve over time across

groups. Sun and Abraham (2020) decompose the two-way fixed effects estimand in staggered designs (in which units adopt the treatment at some period and remain treated thereafter) under a common-trends assumption. Borusyak and Jaravel (2017), Athey and Imbens (2018) and Goodman-Bacon (2020) also provide decompositions of the two-way fixed effects estimand in staggered designs.

Proposition 3.4.2 provides a decomposition in panel experiments without restrictions on the

carryover effects, whereas these existing decompositions are useful in observational settings where

other identifying assumptions may be plausible.

3.5 Simulation Study

We conduct a simulation study to investigate the finite sample properties of the asymptotic results

presented in Section 3.3. These simulations show that the normal approximation implied by the finite population central limit theorem (Theorem 3.3.2) is accurate for a moderate number of treatment periods and experimental units. The

proposed conservative tests for the weak null of no average dynamic causal effects have correct

size and reasonable rejection rates against a range of alternatives.

3.5.1 Simulation design

We generate the potential outcomes for the panel experiment using an autoregressive model,

\[
Y_{i,t}(w_{i,1:t}) = \phi_{i,t,1} Y_{i,t-1}(w_{i,1:t-1}) + \ldots + \phi_{i,t,t-1} Y_{i,1}(w_{i,1}) + \beta_{i,t,0} w_{i,t} + \ldots + \beta_{i,t,t-1} w_{i,1} + \epsilon_{i,t} \quad \forall t > 1, \tag{3.2}
\]

and $Y_{i,1}(w_{i,1}) = \beta_{i,1,0} w_{i,1} + \epsilon_{i,1}$, with $\phi_{i,t,1} = \phi$, $\phi_{i,t,s} = 0$ for $s > 1$, $\beta_{i,t,0} = \beta$ and $\beta_{i,t,s} = 0$ for $s > 0$. We vary the choice of $\phi$, which governs the persistence of the process, and $\beta$, which governs the size of the contemporaneous causal effects. We also vary the probability of treatment $p_{i,t-p}(w) = p(w)$ as well as the distribution of the errors $\epsilon_{i,t}$, which we sample from either a standard normal or a Cauchy distribution.
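This data-generating process reduces to an AR(1) in the observed outcome with only a contemporaneous treatment effect active. A minimal sketch of the simulation design (the function name and arguments are our own labels, not the paper's code):

```python
import numpy as np

def simulate_panel(N=100, T=10, phi=0.5, beta=1.0, p=0.5, cauchy=False, seed=0):
    """Simulate equation (3.2) with phi_{i,t,1} = phi, beta_{i,t,0} = beta,
    and all other coefficients zero, under Bernoulli(p) assignments."""
    rng = np.random.default_rng(seed)
    W = rng.binomial(1, p, size=(N, T))
    eps = rng.standard_cauchy((N, T)) if cauchy else rng.standard_normal((N, T))
    Y = np.zeros((N, T))
    Y[:, 0] = beta * W[:, 0] + eps[:, 0]
    for t in range(1, T):
        # Observed outcome: AR(1) carryover plus contemporaneous effect.
        Y[:, t] = phi * Y[:, t - 1] + beta * W[:, t] + eps[:, t]
    return W, Y

W, Y = simulate_panel()
print(W.shape, Y.shape)  # (100, 10) (100, 10)
```

Passing `cauchy=True` reproduces the heavy-tailed design used in panel (b) of the figures below.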

We document the performance of our nonparametric estimators over the randomization distribution, meaning that we first generate the potential outcomes $Y_{1:N,1:T}$ and simulate over different assignment panels $W_{1:N,1:T}$, holding the potential outcomes fixed. In the main text, we focus on evaluating the properties of our estimator for the total average dynamic causal effect $\hat{\bar{\tau}}(1,0;0)$. Appendix C.3 explores the properties of our estimators for the time-$t$ average $\hat{\bar{\tau}}_{\cdot t}(1,0;0)$ and the unit-$i$ average $\hat{\bar{\tau}}_{i\cdot}(1,0;0)$, as well as our estimators of the lag-1 weighted average dynamic causal effects $\hat{\bar{\tau}}^{\dagger}_{\cdot t}(1,0;1)$, $\hat{\bar{\tau}}^{\dagger}_{i\cdot}(1,0;1)$ and $\hat{\bar{\tau}}^{\dagger}(1,0;1)$.
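As a rough illustration of this exercise, the sketch below computes a Horvitz–Thompson-style inverse-propensity-weighted estimate of a contemporaneous effect and simulates its randomization distribution by redrawing Bernoulli assignments with the potential outcomes held fixed. The specific statistic here is our simplification (no carryover effects), not necessarily the paper's exact $\hat{\bar{\tau}}(1,0;0)$:

```python
import numpy as np

def tau_hat_lag0(Y, W, p):
    """Inverse-propensity-weighted contrast, averaged over units and periods:
    mean of Y * (1{W=1}/p - 1{W=0}/(1-p))."""
    return np.mean(Y * (W / p - (1 - W) / (1 - p)))

# Randomization distribution: hold potential outcomes fixed, redraw assignments.
rng = np.random.default_rng(1)
N, T, p, beta = 100, 10, 0.5, 1.0
eps = rng.standard_normal((N, T))          # fixed "control" outcomes Y(0)
draws = []
for _ in range(2000):
    W = rng.binomial(1, p, size=(N, T))
    Y = eps + beta * W                      # Y(w) = Y(0) + beta * w, no carryover
    draws.append(tau_hat_lag0(Y, W, p))
draws = np.array(draws)
print(draws.mean())   # close to beta = 1: the weighted contrast is unbiased here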

Figure 3.1: Simulated randomization distribution for $\hat{\bar{\tau}}(1,0;0)$ under different choices of the parameter $\phi$ and treatment probability $p(w)$.

(a) $\epsilon_{i,t} \sim N(0,1)$, $N = 100$, $T = 10$        (b) $\epsilon_{i,t} \sim$ Cauchy, $N = 500$, $T = 100$

Notes: This figure plots the simulated randomization distribution for $\hat{\bar{\tau}}(1,0;0)$ under different choices of the parameter $\phi$ (defined in Equation (3.2)) and treatment probability $p(w)$. The rows index the parameter $\phi \in \{0.25, 0.5, 0.75\}$. The columns index the treatment probability $p(w) \in \{0.25, 0.5, 0.75\}$. Panel (a) plots the simulated randomization distribution with normally distributed errors $\epsilon_{i,t} \sim N(0,1)$ and $N = 100$, $T = 10$. Panel (b) plots the simulated randomization distribution with Cauchy distributed errors $\epsilon_{i,t} \sim$ Cauchy and $N = 500$, $T = 100$. Results are computed over 5,000 simulations.

3.5.2 Normal approximations and size control

Figure 3.1 plots the randomization distribution for the estimator of the total average dynamic causal effect $\hat{\bar{\tau}}(1,0;0)$. We present results for the cases $N = 100$, $T = 10$ and $N = 500$, $T = 100$ (the results are similar when the roles of $N$ and $T$ are reversed). When the errors $\epsilon_{i,t}$ are normally distributed, the randomization distribution quickly converges to a normal distribution. When the errors are Cauchy distributed, the total number of units and time periods must be quite large for the randomization distribution to become approximately normal. There is little difference in the results across the values of $\phi$ and $p(w)$. Appendix C.3 provides quantile-quantile plots of the simulated randomization distributions to further illustrate the quality of the normal approximations. Testing based on the normal asymptotic approximation controls size effectively, staying close to the nominal 5% level (see Table 3.1).

Table 3.1: Null rejection rate for the test of the null hypothesis $H_0: \bar{\tau}(1,0;0) = 0$ based upon the normal asymptotic approximation.

                p(w)                              p(w)
          0.25    0.5     0.75              0.25    0.5     0.75
    0.25  0.050   0.047   0.048       0.25  0.028   0.029   0.032
ϕ   0.5   0.052   0.052   0.050   ϕ   0.5   0.046   0.039   0.044
    0.75  0.050   0.049   0.048       0.75  0.055   0.044   0.054

(a) $\epsilon_{i,t} \sim N(0,1)$, $N = 100$, $T = 10$        (b) $\epsilon_{i,t} \sim$ Cauchy, $N = 500$, $T = 100$

Notes: This table reports the null rejection rate for the test of the null hypothesis $H_0: \bar{\tau}(1,0;0) = 0$ based upon the normal asymptotic approximation to the randomization distribution of $\hat{\bar{\tau}}(1,0;0)$. Panel (a) reports the null rejection probabilities in simulations with $\epsilon_{i,t} \sim N(0,1)$ and $N = 100$, $T = 10$. Panel (b) reports the null rejection probabilities in simulations with $\epsilon_{i,t} \sim$ Cauchy and $N = 500$, $T = 100$. Results are computed over 5,000 simulations.

3.5.3 Rejection rate

Focusing on simulations with normally distributed errors, we next investigate the rejection rate of statistical tests based on the normal asymptotic approximations. To do so, we generate potential outcomes $Y_{1:N,1:T}$ under different values of $\beta$, which governs the magnitude of the contemporaneous causal effect. As we vary $\beta \in \{-1, -0.9, \ldots, 0.9, 1\}$, we also vary the parameter $\phi \in \{0.25, 0.5, 0.75\}$ and the probability of treatment $p(w) \in \{0.25, 0.5, 0.75\}$ to investigate how the rejection rate varies across a range of parameter values. We report the fraction of tests that reject the null hypothesis of zero average dynamic causal effects.

Figure 3.2 plots rejection rate curves against the weak null hypotheses $H_0: \bar{\tau}(1,0;0) = 0$ and $H_0: \bar{\tau}^{\dagger}(1,0;1) = 0$ as the parameter $\beta$ varies for different choices of the parameter $\phi$ and treatment probability $p(w)$. The rejection rate against $H_0: \bar{\tau}(1,0;0) = 0$ quickly converges to one as $\beta$ moves away from zero across a range of simulations, indicating that the conservative variance bound still leads to informative tests. When $\phi = 0.25$, the rejection rate against $H_0: \bar{\tau}^{\dagger}(1,0;1) = 0$ is relatively low, as lower values of $\phi$ imply less persistence in the causal effects across periods. When $\phi = 0.75$, there is substantial persistence in the causal effects across periods and we observe that the two rejection rate curves look similar.

Figure 3.2: Rejection probabilities for tests of the null hypotheses $H_0: \bar{\tau}(1,0;0) = 0$ and $H_0: \bar{\tau}^{\dagger}(1,0;1) = 0$ as the parameter $\beta$ varies under different choices of the parameter $\phi$ and treatment probability $p(w)$.

Notes: This figure plots rejection probabilities for tests of the null hypotheses $H_0: \bar{\tau}(1,0;0) = 0$ and $H_0: \bar{\tau}^{\dagger}(1,0;1) = 0$ as the parameter $\beta$ varies under different choices of the parameter $\phi$ and treatment probability $p(w)$. The rejection rate curve against $H_0: \bar{\tau}(1,0;0) = 0$ is plotted in blue and the rejection rate curve against $H_0: \bar{\tau}^{\dagger}(1,0;1) = 0$ is plotted in orange. The rows index the parameter $\phi \in \{0.25, 0.5, 0.75\}$. The columns index the treatment probability $p(w) \in \{0.25, 0.5, 0.75\}$. The simulations are conducted with normally distributed errors $\epsilon_{i,t} \sim N(0,1)$ and $N = 100$, $T = 10$. Results are averaged over 5,000 simulations.

Appendix C.3 analyzes the rejection rate curves against the weak null hypothesis on the time-$t$ average dynamic causal effects with $N = 100$ units and on the unit-$i$ average dynamic causal effects with $T = 100$ time periods. The conservative tests can have low power against these unit-specific or time period-specific weak null hypotheses in small experiments with few units or few time periods. Unless researchers are analyzing a panel experiment with a large cross-sectional or time dimension, we recommend that they focus on analyzing total lag-$p$ dynamic causal effects, which enables them to improve power by pooling information across both units and time periods.

3.6 Empirical application in experimental economics

We apply our methods to reanalyze a panel experiment from Andreoni and Samuelson (2006), which tested a game-theoretic model of "rational cooperation" by studying how variation in the payoff structure of a two-player, twice-played prisoners' dilemma affects the choices of players.

The payoffs of the game were determined by two parameters $x_1, x_2 \ge 0$ such that $x_1 + x_2 = 10$. In each period, both players simultaneously selected either C (cooperate) or D (defect) and subsequently received the payoffs associated with these choices. Table 3.2 summarizes the payoff structure. Let $\lambda = \frac{x_2}{x_1 + x_2} \in [0, 1]$ govern the relative payoffs between the two periods of the prisoners' dilemma; when $\lambda = 0$, all payoffs occurred in period one, and when $\lambda = 1$, all payoffs occurred in period two. The authors develop a model of rational cooperation that predicts that when $\lambda$ is large, players will cooperate more often in period one than when $\lambda$ is small.

Table 3.2: Stage games from the twice-played prisoners' dilemma in the experiment conducted by Andreoni and Samuelson (2006).

         C              D                         C              D
   C  (3x_1, 3x_1)   (0, 4x_1)             C  (3x_2, 3x_2)   (0, 4x_2)
   D  (4x_1, 0)      (x_1, x_1)            D  (4x_2, 0)      (x_2, x_2)

         Period one                               Period two

Notes: This table summarizes the payoffs in the stage games from the twice-played prisoners' dilemma in the experiment conducted by Andreoni and Samuelson (2006). The parameters satisfy $x_1, x_2 \ge 0$, $x_1 + x_2 = 10$ and $\lambda = x_2 / (x_1 + x_2)$. The choice C denotes "cooperate" and the choice D "defect."

To investigate this hypothesis, Andreoni and Samuelson (2006) conducted a panel-based experiment. In each session of the experiment, 22 subjects were recruited to play 20 rounds of the twice-played prisoners' dilemma in Table 3.2. In each round, participants were randomly matched into pairs, and each pair was then randomly assigned $\lambda \in \{0, 0.1, \ldots, 0.9, 1\}$ with equal probability. The authors conducted the experiment over five sessions, for a total sample of 110 participants and 2,200 observed choices.

This panel experiment is a natural application of our methods. The sequential nature of the

experiment raises the possibility that past assignments may impact future actions as participants

Table 3.3: Summary statistics for the experiment in Andreoni and Samuelson (2006).

                                      Counts
                                   0       1      Mean
Observed treatment, $W_{i,t}$     1136    1064    0.484
Observed outcome, $Y_{i,t}$        521    1679    0.763

Notes: This table reports summary statistics for the experiment in Andreoni and Samuelson (2006). The treatment $W_{i,t}$ equals one when the assigned value of $\lambda$ is larger than 0.6. The outcome $Y_{i,t}$ equals one whenever the participant cooperates in period one of the twice-played prisoners' dilemma. There are 110 participants and we observe 2,200 choices in total.

learn about the structure of the game over time. For example, random variation in the payoff

structure may induce players to explore the strategy space. Additionally, the authors originally

analyzed the experiment using regression models with unit-level fixed effects, which may be

biased in the presence of dynamic causal effects even if the potential outcomes are linear, as discussed in Section 3.4.

In our analysis, the outcome of interest $Y$ is an indicator that equals one whenever the participant cooperated in period one of the stage game, $N = 110$, and $T = 20$. The assignment $W \in \mathcal{W} = \{0, 1\}$ is binary and equals one whenever the assigned value of $\lambda$ is greater than 0.6, meaning that the payoffs are more concentrated in period two than period one of the stage game. We binarize the assignment in this manner to keep its cardinality (and therefore the number of possible assignment paths) manageable, while continuing to test the authors' core prediction on cooperative behavior. For a given pair of subjects, the assignment mechanism is Bernoulli with probability $p = 5/11$ for treatment and $p = 6/11$ for control.13 Table 3.3 summarizes the observed assignments and observed outcomes in the experiment.

3.6.1 Inference on total lag-p weighted average dynamic causal effects

We analyze the total lag-$p$ weighted average causal effect $\bar{\tau}^{\dagger}(1,0;p)$ for $p = 0, 1, 2, 3$, which pools information across all units and time periods to investigate dynamic causal effects.14 Based on the

13 One potential complication that may arise from the subjects playing against each other in the stage game is
possible spillovers or interference across units. The impact of such spillovers is, however, unlikely to be substantial as
the matches are anonymous, and no players play each other more than once. We ignore this concern in our analysis.
14 Appendix C.4 investigates unit-specific and period-specific weighted average lag-$p$ dynamic causal effects. Since there are only $N = 110$ units and $T = 20$ periods in the experiment, these estimates are noisier than our estimates of the total lag-$p$ weighted average dynamic causal effects.

conservative test in Section 3.3.4, the weak null hypothesis $H_0: \bar{\tau}^{\dagger}(1,0;0) = 0$ can be soundly rejected, indicating that the treatment has a positive contemporaneous effect on cooperation in period one of the stage game and confirming the hypothesis of Andreoni and Samuelson (2006). Table 3.4 summarizes these estimates of the total lag-$p$ weighted average causal effects. Interestingly, the point estimates are also positive at $p = 1, 2, 3$, suggesting there may be dynamic causal effects on cooperative behavior across rounds of the twice-played prisoners' dilemma. For example, the treatment may induce participants to learn about the value of cooperation, thereby producing persistent effects.

Table 3.4: Estimates of the total lag-$p$ weighted average dynamic causal effect for $p = 0, 1, 2, 3$.

                                                      lag-p
                                                0       1       2       3
Point estimate, $\hat{\bar{\tau}}^{\dagger}(1,0;p)$   0.285   0.058   0.134   0.089
Conservative p-value                          0.000   0.226   0.013   0.126
Randomization p-value                         0.000   0.263   0.012   0.114

Notes: This table reports estimates of the total lag-$p$ weighted average dynamic causal effect for $p = 0, 1, 2, 3$. The conservative p-value is associated with testing the weak null hypothesis of no average dynamic causal effects, $H_0: \bar{\tau}^{\dagger}(1,0;p) = 0$, using the conservative estimator of the asymptotic variance of the nonparametric estimator (Theorem 3.3.2). The randomization p-value is associated with the randomization test of the sharp null of no dynamic causal effects, $H_0: \tau_{i,t}(w, \tilde{w}; p) = 0$ for all $i \in [N]$, $t \in [T]$. The randomization p-values are constructed based on 10,000 draws.

We further investigate these results using randomization tests based on the sharp null of no dynamic causal effects. We construct the randomization distribution for the nonparametric estimator of the total lag-$p$ weighted average dynamic causal effect $\hat{\bar{\tau}}^{\dagger}(1,0;p)$ for $p = 0, 1, 2, 3$ under the sharp null hypothesis of no lag-$p$ dynamic causal effects for all units and time periods, $H_0: \tau_{i,t}(w, \tilde{w}; p) = 0$ for all $i \in [N]$, $t \in [T]$.15 Table 3.4 summarizes the randomization p-values for the total lag-$p$ weighted average causal effects. The p-value for the randomization test at $p = 0$ is approximately zero, strongly rejecting the sharp null of no contemporaneous dynamic causal effects for all units and again confirming the hypothesis of Andreoni and Samuelson (2006).

15 When simulating the randomization distribution, we redraw assignment paths in a manner that respects the
realized pairs of subjects in the experiment, meaning that subjects that are paired in the same round receive the same
assignment.
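Under the sharp null, the observed outcomes are invariant to the assignment path, so the randomization distribution can be simulated by redrawing assignments and recomputing the estimator on the same outcomes. A simplified sketch of that logic (it redraws i.i.d. Bernoulli assignments, ignoring the pair structure noted above, and uses an inverse-propensity-weighted statistic of our own as a stand-in for the paper's estimator; the data here are synthetic):

```python
import numpy as np

def randomization_pvalue(Y, W, p, n_draws=1000, seed=0):
    """Randomization test of the sharp null of no dynamic causal effects.
    Under the sharp null, outcomes do not change with the assignment, so we
    redraw Bernoulli(p) assignment paths, recompute the statistic on the same
    outcomes, and compare its distribution to the observed value."""
    rng = np.random.default_rng(seed)

    def estimate(Wmat):
        return np.mean(Y * (Wmat / p - (1 - Wmat) / (1 - p)))

    obs = estimate(W)
    draws = np.array([estimate(rng.binomial(1, p, size=W.shape))
                      for _ in range(n_draws)])
    return float(np.mean(np.abs(draws) >= np.abs(obs)))

# Toy check on synthetic data sized like the experiment (N = 110, T = 20),
# with outcomes generated independently of the assignments (sharp null true).
rng = np.random.default_rng(2)
W = rng.binomial(1, 5 / 11, size=(110, 20))
Y = rng.binomial(1, 0.75, size=(110, 20)).astype(float)
pv = randomization_pvalue(Y, W, 5 / 11, n_draws=500)
print(pv)
```

Because the null is true in this toy data, the p-value behaves like a draw from a roughly uniform distribution rather than concentrating near zero.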

3.7 Conclusion

This paper developed a potential outcome model for studying dynamic causal effects in a panel

experiment. We defined new panel-based dynamic causal estimands such as the lag-p dynamic

causal effect and introduced an associated nonparametric estimator. Our proposed estimator is

unbiased for lag-p dynamic causal effects over the randomization distribution, and we derived

its finite population asymptotic distribution. We developed tools for inference on these dynamic

causal effects: a conservative test for weak nulls and an exact randomization test for sharp nulls.

We showed that the linear unit fixed effects estimator and two-way fixed effects estimator are

asymptotically biased for the contemporaneous causal effects in the presence of dynamic causal

effects and persistence in the assignment mechanism. Finally, we illustrated our results through a

simulation study and analyzed a panel experiment on rational cooperation in games.

References

Abadie, A., Athey, S. C., Imbens, G. W. and Wooldridge, J. (2017). When Should You Adjust
Standard Errors for Clustering? Tech. rep., NBER Working Paper No. 24003.

—, —, — and — (2020). Sampling-based vs. design-based uncertainty in regression analysis.


Econometrica, 88 (1), 265–296.

Abaluck, J., Agha, L., Chan, D. C., Singer, D. and Zhu, D. (2020). Fixing Misallocation with
Guidelines: Awareness vs. Adherence. Tech. rep., NBER Working Paper No. 27467.

—, —, Kabrhel, C., Raja, A. and Venkatesh, A. (2016). The determinants of productivity in


medical testing: Intensity and allocation of care. American Economic Review, 106 (12), 3730–3764.

Abbring, J. H. and Heckman, J. J. (2007). Econometric evaluation of social programs, part III:
Using the marginal treatment effect to organize alternative econometric estimators to evaluate
social programs, and to forecast their effects in new environments. In J. J. Heckman and E. E.
Leamer (eds.), Handbook of Econometrics, vol. 6, Amsterdam, The Netherlands: North Holland,
pp. 5145–5303.

Albright, A. (2019). If You Give a Judge a Risk Score: Evidence from Kentucky Bail Decisions. Tech. rep.

Allen, R. and Rehbeck, J. (2020). Satisficing, Aggregation, and Quasilinear Utility. Tech. rep.

Andreoni, J. and Samuelson, L. (2006). Building rational cooperation. Journal of Economic Theory,
127, 117–154.

Andrews, I., Roth, J. and Pakes, A. (2019). Inference for Linear Conditional Moment Inequalities. Tech.
rep., NBER Working Paper No. 26374.

Angrist, J., Graddy, K. and Imbens, G. (2000). The interpretation of instrumental variables
estimators in simultaneous equation models with an application to the demand for fish. The
Review of Economic Studies, 67 (3), 499–527.

Angrist, J. D. and Imbens, G. W. (1995). Two-stage least squares estimation of average causal
effects in models with variable treatment intensity. Journal of the American Statistical Association,
90 (430), 431–442.

—, — and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal
of the American Statistical Association, 91, 444–455.

—, Jorda, O. and Kuersteiner, G. M. (2018). Semiparametric estimates of monetary policy effects:


string theory revisited. Journal of Business and Economic Statistics, 36, 381–387.

— and Kuersteiner, G. M. (2011). Causal effects of monetary shocks: Semiparametric conditional
independence tests with a multinomial propensity score. Review of Economics and Statistics, 93,
725–747.

Apesteguia, J. and Ballester, M. A. (2015). A measure of rationality and welfare. Journal of


Political Economy, 123 (6), 1278–1310.

Arellano, M., Blundell, R. and Bonhomme, S. (2017). Earnings and consumption dynamics: A
nonlinear panel data framework. Econometrica, 85, 693–734.

— and Bonhomme, S. (2012). Nonlinear panel data analysis. Annual Review of Economics, 3, 395–424.

— and — (2016). Nonlinear panel data estimation via quantile regressions. The Econometrics Journal,
19 (3), C61–C94.

Arkhangelsky, D. and Imbens, G. (2019). Double-Robust Identification for Causal Panel Data Models.
Tech. rep., arXiv preprint arXiv:1909.09412.

Arnold, D., Dobbie, W. and Hull, P. (2020a). Measuring Racial Discrimination in Algorithms. Tech.
rep., NBER Working Paper No. 28222.

—, — and — (2020b). Measuring Racial Discrimination in Bail Decisions. Tech. rep., NBER Working
Paper No. 26999.

—, — and Yang, C. (2018). Racial bias in bail decisions. The Quarterly Journal of Economics, 133 (4),
1885–1932.

Aronow, P. M. and Samii, C. (2017). Estimating average causal effects under general interference,
with application to a social network experiment. The Annals of Applied Statistics, 11 (4), 1912–1947.

Aruoba, S. B., Mlikota, M., Schorfheide, F. and Villalvazo, S. (2021). SVARs with Occasionally-
Binding Constraints. Tech. rep.

Athey, S. (2017). Beyond prediction: Using big data for policy problems. Science, 355 (6324),
483–485.

—, Bayati, M., Doudchenko, N., Imbens, G. and Koshravi, K. (2018). Matrix Completion Methods
for Causal Panel Data Models. Tech. rep., arXiv preprint arXiv 1710.10251.

— and Imbens, G. (2018). Design-based Analysis in Difference-In-Differences Settings with Staggered


Adoption. Tech. rep., arXiv preprint arXiv:1808.05293.

Auerbach, A. J. and Gorodnichenko, Y. (2012a). Fiscal Multipliers in Recession and Expansion,


University of Chicago Press, pp. 63–98.

— and — (2012b). Measuring the output responses to fiscal policy. American Economic Journal:
Economic Policy, 4, 1–27.

Augenblick, N. and Lazarus, E. (2020). Restrictions on Asset-Price Movements Under Rational


Expectations: Theory and Evidence. Tech. rep.

Autor, D. H. and Scarborough, D. (2008). Does job testing harm minority workers? evidence
from retail establishments. The Quarterly Journal of Economics, 123 (1), 219–277.

Baek, C. and Lee, B. (2021). A Guide to Autoregressive Distributed Lag Models for Impulse Response
Estimations. Tech. rep.

Barocas, S., Hardt, M. and Narayanan, A. (2019). Fairness and Machine Learning. fairmlbook.org,
http://www.fairmlbook.org.

— and Selbst, A. (2016). Big data’s disparate impact. California Law Review, 104, 671–732.

Baum, L. E. and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state
Markov chains. The Annals of Mathematical Statistics, 37, 1554–1563.

Beaulieu-Jones, B., Finlayson, S. G., Chivers, C., Chen, I., McDermott, M., Kandola, J., Dalca,
A. V., Beam, A., Fiterau, M. and Naumann, T. (2019). Trends and Focus of Machine Learning
Applications for Health Research. JAMA Network Open, 2 (10), e1914051–e1914051.

Becker, G. (1957). The Economics of Discrimination. University of Chicago Press.

Bellemare, C., Bissonnette, L. and Kroger, S. (2014). Statistical Power of Within and Between-
Subjects Designs in Economic Experiments. Tech. rep., IZA Working Paper No. 8583.

—, — and — (2016). Simulating power of economic experiments: the powerbbk package. Journal
of the Economic Science Association, 2, 157—-168.

Belloni, A., Bugni, F. and Chernozhukov, V. (2018). Subvector Inference in Partially Identified
Models with Many Moment Inequalities. Tech. rep., arXiv preprint, arXiv:1806.11466.

Ben-Porath, Y. (1967). The Production of Human Capital and the Life Cycle of Earnings. Journal
of Political Economy, 75, 352–365.

Benitez-Silva, H., Buchinsky, M. and Rust, J. (2004). How Large are the Classification Errors in the
Social Security Disability Award Process? Tech. rep., NBER Working Paper Series No. 10219.

Bergemann, D., Brooks, B. and Morris, S. (2019). Counterfactuals with Latent Information. Tech.
rep.

— and Morris, S. (2013). Robust predictions in games with incomplete information. Econometrica,
81 (4), 1251–1308.

— and — (2016). Bayes correlated equilibrium and the comparison of information structures in
games. Theoretical Economics, 11, 487–522.

— and — (2019). Information design: A unified perspective. Journal of Economic Literature, 57 (1),
44–95.

Berk, R. A., Sorenson, S. B. and Barnes, G. (2016). Forecasting domestic violence: A machine
learning approach to help inform arraignment decisions. Journal of Empirical Legal Studies, 13 (1),
94–115.

Blackwell, M. and Glynn, A. (2018). How to make causal inferences with time-series and
cross-sectional data. American Political Science Review, 112, 1067–1082.

Blattner, L. and Nelson, S. T. (2021). How Costly is Noise? Tech. rep., arXiv preprint,
arXiv:2105.07554.

Bohren, J. A., Haggag, K., Imas, A. and Pope, D. G. (2020). Inaccurate Statistical Discrimination: An
Identification Problem. Tech. rep., NBER Working Paper Series No. 25935.

Bojinov, I., Rambachan, A. and Shephard, N. (2021). Panel experiments and dynamic causal
effects: A finite population perspective. Quantitative Economics, 12 (4), 1171–1196.

—, Sait-Jacques, G. and Tingley, M. (2020a). Avoid the pitfalls of a/b testing. Harvard Business
Review, 98 (2), 48–53.

— and Shephard, N. (2019). Time series experiments and causal estimands: exact randomization
tests and trading. Journal of the American Statistical Association, 114 (528), 1665–1682.

—, Simchi-Levi, D. and Zhao, J. (2020b). Design and Analysis of Switchback Experiments. Tech. rep.,
arXiv preprint arXiv:2009.00148.

Bordalo, P., Coffman, K., Gennaioli, N. and Shleifer, A. (2016). Stereotypes. The Quarterly
Journal of Economics, 131 (4), 1753–1794.

—, Gennaioli, N., Ma, Y. and Shleifer, A. (2020). Overreaction in macroeconomic expectations.


American Economic Review, 110 (9), 2748–82.

—, — and Shleifer, A. (2021). Salience. Tech. rep., NBER Working Paper Series No. 29274.

Boruvka, A., Almirall, D., Witkiwitz, K. and Murphy, S. A. (2018). Assessing time-varying
causal effect moderation in mobile health. Journal of the American Statistical Association, 113,
1112–1121.

Borusyak, K. and Jaravel, X. (2017). Revisiting Event Study Designs, with an Application to the
Estimation of the Marginal Propensity to Consume. Tech. rep.

Box, G. E. P. and Tiao, G. C. (1975). Intervention analysis with applications to economic and
environmental problems. Journal of the American Statistical Association, 70, 70–79.

Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N. and Scott, S. L. (2015). Inferring causal
impact using bayesian structural time-series models. The Annals of Applied Statistics, 9, 247–274.

Browning, M., Ejrnaes, M. and Alvarez, J. (2010). Modelling income processes with lots of
heterogeneity. Review of Economic Studies, 77, 1353–1381.

Bugni, F. A., Canay, I. A. and Shi, X. (2015). Specification tests for partially identified models
defined by moment inequalities. Journal of Econometrics, 185, 259–282.

Camerer, C. F. (2019). Artificial intelligence and behavioral economics. In A. Agrawal, J. Gans


and A. Goldfarb (eds.), The Economics of Artificial Intelligence: An Agenda, University of Chicago
Press, pp. 587–608.

— and Johnson, E. J. (1997). The process-performance paradox in expert judgement. In W. M.


Goldstein and R. M. Hogarth (eds.), Research on Judgment and Decision Making: Currents, Connec-
tions, and Controversies, New York: Cambridge University Press.

Campbell, J. Y. (2003). Chapter 13 consumption-based asset pricing. In Financial Markets and Asset
Pricing, Handbook of the Economics of Finance, vol. 1, Elsevier, pp. 803–887.

Canay, I., Mogstad, M. and Mountjoy, J. (2020). On the Use of Outcome Tests for Detecting Bias in
Decision Making. Tech. rep., NBER Working Paper No. 27802.

Canay, I. A. and Shaikh, A. M. (2017). Practical and Theoretical Advances in Inference for Partially
Identified Models, Cambridge University Press, vol. 2, p. 271–306.

Caplin, A. (2016). Measuring and modeling attention. Annual Review of Economics, 8, 379–403.

— (2021). Economic Data Engineering. Tech. rep., NBER Working Paper Series No. 29378.

—, Csaba, D., Leahy, J. and Nov, O. (2020). Rational inattention, competitive supply, and
psychometrics. The Quarterly Journal of Economics, 135 (3), 1681–1724.

— and Dean, M. (2015). Revealed preference, rational inattention, and costly information acquisi-
tion. American Economic Review, 105 (7), 2183–2203.

— and Martin, D. (2015). A testable theory of imperfect perception. Economic Journal, 125, 184–202.

— and — (2021). Comparison of Decisions under Unknown Experiments. Tech. rep.

Chahrour, R. and Jurado, K. (2021). Recoverability and Expectations-Driven Fluctuations. Tech. rep.

Chalfin, A., Danieli, O., Hillis, A., Jelveh, Z., Luca, M., Ludwig, J. and Mullainathan, S.
(2016). Productivity and Selection of Human Capital with Machine Learning. American Economic
Review, 106 (5), 124–127.

Chambers, C. P., Liu, C. and Martinez, S.-K. (2016). A test for risk-averse expected utility. Journal
of Economic Theory, 163, 775–785.

Chan, D. C., Gentzkow, M. and Yu, C. (2021). Selection with Variation in Diagnostic Skill: Evidence
from Radiologists. Tech. rep., NBER Working Paper No. 26467.

Chandra, A. and Staiger, D. O. (2007). Productivity spillovers in health care: Evidence from the
treatment of heart attacks. Journal of Political Economy, 115 (1), 103–140.

— and — (2020). Identifying sources of inefficiency in healthcare. The Quarterly Journal of Economics,
135 (2), 785—-843.

Charness, G., Gneezy, U. and Kuhn, M. A. (2012). Experimental methods: Between-subject and
within-subject design. Journal of Economic and Business Organization, 81 (1), 1–8.

Chen, I. Y., Joshi, S., Ghassemi, M. and Ranganath, R. (2020). Probabilistic Machine Learning for
Healthcare. Tech. rep., arXiv preprint, arXiv:2009.11087.

Chernozhukov, V., Lee, S. and Rosen, A. M. (2013). Intersection bounds: Estimation and inference.
Econometrica, 81 (2), 667–737.

Chin, A. (2018). Central limit theorems via stein’s method for randomized experiments under
interference. arXiv preprint arXiv:1804.03105.

Cho, J. and Russell, T. M. (2020). Simple inference on functionals of set-identified parameters


defined by linear moments.

Chopin, N. and Papasphiliopoulos, O. (2020). An Introduction to Sequential Monte Carlo. Springer.

Chouldechova, A., Benavides-Prado, D., Fialko, O. and Vaithianathan, R. (2018). A case
study of algorithm-assisted decision making in child maltreatment hotline screening decisions.
Proceedings of the 1st Conference on Fairness, Accountability and Transparency, pp. 134–148.

— and Roth, A. (2020). A snapshot of the frontiers of fairness in machine learning. Communications
of the ACM, 63 (5), 82–89.

Christ, C. F. (1994). The Cowles Commission’s contributions to econometrics at Chicago, 1939-1955.


Journal of Economic Literature, 32, 30–59.

Cloyne, J. S., Óscar Jordá and Taylor, A. M. (2020). Decomposing the Fiscal Multiplier. Tech. rep.

Cochrane, J. H. (2011). Presidential address: Discount rates. The Journal of Finance, 66 (4), 1047–
1108.

— and Piazessi, M. (2002). The Fed and interest rates - a high-frequency identification. American
Economic Review, 92, 90–95.

Coston, A., Mishler, A., Kennedy, E. H. and Chouldechova, A. (2020). Counterfactual risk
assessments, evaluation and fairness. FAT* ’20: Proceedings of the 2020 Conference on Fairness,
Accountability, and Transparency, pp. 582––593.

—, Rambachan, A. and Chouldechova, A. (2021). Characterizing fairness over the set of good
models under selective labels.

Cowgill, B. (2018). Bias and Productivity in Humans and Machines: Theory and Evidence. Tech. rep.

Cox, D. R. (1958a). Planning of Experiments. Oxford, United Kingdom: Wiley.

— (1958b). The regression analysis of binary sequences (with discussion). Journal of the Royal
Statistical Society: Series B, 20, 215–42.

Cox, G. and Shi, X. (2020). Simple Adaptive Size-Exact Testing for Full-Vector and Subvector Inference
in Moment Inequality Models. Tech. rep.

Cunha, F., Heckman, J. J., Lochner, L. and Masterov, D. V. (2006). Chapter 12 Interpreting the
Evidence on Life Cycle Skill Formation. In E. Hanushek and F. Welch (eds.), Handbook of the
Economics of Education, vol. 1, Elsevier, pp. 697–812.

—, — and Schennach, S. M. (2010). Estimating the Technology of Cognitive and Noncognitive
Skill Formation. Econometrica, 78 (3), 883–931.

Currie, J. and Macleod, W. B. (2017). Diagnosing expertise: Human capital, decision making,
and performance among physicians. Journal of Labor Economics, 35 (1), 1–43.

— and — (2020). Understanding doctor decision making: The case of depression treatment.
Econometrica, 88 (3), 847–878.

Czibor, E., Jimenez-Gomez, D. and List, J. A. (2019). The dozen things experimental economists
should do (more of). Southern Economic Journal, 86 (2), 371–432.

Dawes, R. M. (1971). A case study of graduate admissions: Application of three principles of
human decision making. American Psychologist, 26 (2), 180–188.

— (1979). The robust beauty of improper linear models in decision making. American Psychologist,
34 (7), 571–582.

—, Faust, D. and Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 249 (4899),
1668–1674.

De-Arteaga, M., Dubrawski, A. and Chouldechova, A. (2021). Leveraging Expert Consistency to
Improve Algorithmic Decision Support. Tech. rep., arXiv preprint, arXiv:2101.09648.

—, Fogliato, R. and Chouldechova, A. (2020). A Case for Humans-in-the-Loop: Decisions in
the Presence of Erroneous Algorithmic Scores. New York, NY, USA: Association for Computing
Machinery, pp. 1–12.

de Chaisemartin, C. (2017). Tolerating defiance? local average treatment effects without mono-
tonicity. Quantitative Economics, 8 (2), 367–396.

— and D’Haultfoeuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment
effects. American Economic Review, 110 (9), 2964–96.

D’Haultfoeuille, X., Gaillac, C. and Maurel, A. (2020). Rationalizing Rational Expectations:
Characterization and Tests. Tech. rep., arXiv preprint, arXiv:2003.11537.

Dobbie, W., Goldin, J. and Yang, C. (2018). The effects of pretrial detention on conviction, future
crime, and employment: Evidence from randomly assigned judges. American Economic Review,
108 (2), 201–240.

—, Liberman, A., Paravisini, D. and Pathania, V. (2020). Measuring Bias in Consumer Lending.
Tech. rep.

— and Yang, C. (2019). Proposals for Improving the U.S. Pretrial System. Tech. rep., The Hamilton
Project.

Durbin, J. and Koopman, S. J. (2012). Time Series Analysis by State Space Methods. Oxford: Oxford
University Press, 2nd edn.

Echenique, F. (2020). New developments in revealed preference theory: Decisions under risk,
uncertainty, and intertemporal choice. Annual Review of Economics, 12, 299–316.

— and Saito, K. (2015). Savage in the market. Econometrica, 83 (4), 1467–1495.

—, — and Imai, T. (2021). Approximate Expected Utility Rationalization. Tech. rep., arXiv preprint,
arXiv:2102.06331.

Einav, L., Jenkins, M. and Levin, J. (2013). The impact of credit scoring on consumer lending.
Rand Journal of Economics, 44 (2), 249–274.

Elliott, G., Komunjer, I. and Timmermann, A. (2008). Biases in macroeconomic forecasts:
Irrationality or asymmetric loss? Journal of the European Economic Association, 6 (1), 122–157.

—, Timmermann, A. and Komunjer, I. (2005). Estimation and testing of forecast rationality under
flexible loss. The Review of Economic Studies, 72 (4), 1107–1125.

Erel, I., Stern, L. H., Tan, C. and Weisbach, M. S. (2019). Selecting Directors Using Machine
Learning. Tech. rep., NBER Working Paper Series No. 24435.

Fang, Z., Santos, A., Shaikh, A. M. and Torgovitsky, A. (2020). Inference for Large-Scale Linear
Systems with Known Coefficients. Tech. rep.

Farmer, L., Nakamura, E. and Steinsson, J. (2021). Learning About the Long Run. Tech. rep.

Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C. and Venkatasubramanian, S. (2015).
Certifying and removing disparate impact. Proceedings of the 21st ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 259–268.

Frandsen, B. R., Lefgren, L. J. and Leslie, E. C. (2019). Judging Judge Fixed Effects. Tech. rep.,
NBER Working Paper Series No. 25528.

Frankel, A. (2021). Selecting applicants. Econometrica, 89 (2), 615–645.

Freedman, D. A. (2008). On regression adjustments to experimental data. Advances in Applied
Mathematics, 40 (2), 180–193.

Frisch, R. (1933). Propagation Problems and Impulse Problems in Dynamic Economics. London, United
Kingdom: Allen and Unwin.

Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T. and Walther, A. (2018). Predictably Unequal?
The Effects of Machine Learning on Credit Markets. Tech. rep.

Gabaix, X. (2014). A sparsity-based model of bounded rationality. The Quarterly Journal of Economics,
129 (4), 1661–1710.

— (2019). Behavioral inattention. In B. D. Bernheim, S. DellaVigna and D. Laibson (eds.), Handbook
of Behavioral Economics: Applications and Foundations, vol. 2, North Holland, pp. 261–343.

Gafarov, B. (2019). Inference in high-dimensional set-identified affine models. Tech. rep., arXiv preprint,
arXiv:1904.00111.

Gallant, A. R., Rossi, P. E. and Tauchen, G. (1993). Nonlinear dynamic structures. Econometrica,
61, 871–907.

Gelbach, J. (2021). Testing Economic Models of Discrimination in Criminal Justice. Tech. rep.

Gennaioli, N., Ma, Y. and Shleifer, A. (2016). Expectations and investment. NBER Macroeconomics
Annual, 30, 379–431.

— and Shleifer, A. (2010). What comes to mind. The Quarterly Journal of Economics, 125 (4),
1399–1433.

Gertler, M. L. and Karadi, P. (2015). Monetary policy surprises, credit costs, and economic
activity. American Economic Journal: Macroeconomics, 7, 44–76.

Gillis, T. (2019). False Dreams of Algorithmic Fairness: The Case of Credit Pricing. Tech. rep.

Gonçalves, S., Herrera, A. M., Kilian, L. and Pesavento, E. (2021). Impulse response analysis for
structural dynamic models with nonlinear regressors. Tech. rep.

Goodman-Bacon, A. (2020). Difference-in-Differences with Variation in Treatment Timing. Tech. rep.

Gordon, N. J., Salmond, D. J. and Smith, A. F. M. (1993). A novel approach to nonlinear and
non-Gaussian Bayesian state estimation. IEE-Proceedings F, 140, 107–113.

Gourieroux, C. and Jasiak, J. (2005). Nonlinear innovations and impulse responses with applica-
tion to VaR sensitivity. Annales d’Economie et de Statistique, 78, 1–31.

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral
methods. Econometrica, 37, 424–438.

Green, B. and Chen, Y. (2019a). Disparate interactions: An algorithm-in-the-loop analysis of
fairness in risk assessments. In Proceedings of the Conference on Fairness, Accountability, and
Transparency, FAT* ’19, New York, NY, USA: Association for Computing Machinery, pp. 90–99.

— and — (2019b). The principles and limits of algorithm-in-the-loop decision making. Proceedings
of the ACM on Human-Computer Interaction, 3 (CSCW).

Griliches, Z. (1977). Estimating the Returns to Schooling: Some Econometric Problems. Economet-
rica, 45, 1–22.

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E. and Nelson, C. (2000). Clinical versus
mechanical prediction: A meta-analysis. Psychological Assessment, 12 (1), 19–30.

Gualdani, C. and Sinha, S. (2020). Identification and Inference in Discrete Choice Models with Imperfect
Information. Tech. rep., arXiv preprint, arXiv:1911.04529.

Gul, F., Natenzon, P. and Pesendorfer, W. (2014). Random choice as behavioral optimization.
Econometrica, 82, 1873–1912.

— and Pesendorfer, W. (2006). Random expected utility. Econometrica, 74 (1), 121–146.

Hadad, V., Hirshberg, D. A., Zhan, R., Wager, S. and Athey, S. (2021). Confidence intervals
for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences,
118 (15).

Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and its Applications. San Diego, California,
USA: Academic Press.

Hamilton, J. (1989). A new approach to the economic analysis of nonstationary time series and
the business cycle. Econometrica, 57, 357–384.

Hamilton, J. D. (2011). Nonlinearities and the macroeconomic effects of oil prices. Macroeconomic
Dynamics, 15 (S3), 364–378.

Han, S. (2019). Identification in nonparametric models for dynamic treatment effects. Journal of
Econometrics, forthcoming.

Handel, B. and Schwartzstein, J. (2018). Frictions or mental gaps: What’s behind the information
we (don’t) use and when do we care? Journal of Economic Perspectives, 32 (1), 155–178.

Hardt, M., Price, E. and Srebro, N. (2016). Equality of opportunity in supervised learning.
NIPS’16 Proceedings of the 30th International Conference on Neural Information Processing Systems,
pp. 3323–3331.

Harvey, A. C. (1996). Intervention analysis with control groups. International Statistical Review, 64,
313–328.

— and Durbin, J. (1986). The effects of seat belt legislation on British road casualties: A case study
in structural time series modelling. Journal of the Royal Statistical Society: Series A, 149, 187–227.

Hausman, J. A. (1983). Specification and estimation of simultaneous equation models. In
Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, vol. 1, North-Holland,
pp. 391–448.

Heckman, J. J. (1974). Shadow prices, market wages, and labor supply. Econometrica, 42 (4),
679–694.

— (1979). Sample selection bias as a specification error. Econometrica, 47 (1), 153–161.

—, Humphries, J. E. and Veramendi, G. (2016). Dynamic treatment effects. Journal of Econometrics,
191, 276–292.

— and Navarro, S. (2007). Dynamic discrete choice and dynamic treatment effects. Journal of
Econometrics, 136, 341–396.

— and Vytlacil, E. J. (2006). Econometric evaluation of social programs, part i: Causal models,
structural models and econometric policy evaluation. In Handbook of Econometrics, vol. 6, pp.
4779–4874.

Henry, M., Meango, R. and Mourifie, I. (2020). Revealing Gender-Specific Costs of STEM in an
Extended Roy Model of Major Choice. Tech. rep.

Herbst, E. and Schorfheide, F. (2015). Bayesian Estimation of DSGE Models. Princeton, New Jersey,
USA: Princeton University Press.

Hernan, M. A. and Robins, J. M. (2019). Causal Inference. Boca Raton, Florida, USA: Chapman &
Hall, forthcoming.

Hilgard, S., Rosenfeld, N., Banaji, M. R., Cao, J. and Parkes, D. (2021). Learning representations
by humans, for humans. In M. Meila and T. Zhang (eds.), Proceedings of the 38th International
Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 139, pp. 4227–4238.

Ho, K. and Rosen, A. M. (2017). Partial Identification in Applied Research: Benefits and Challenges,
Cambridge University Press, vol. 2, pp. 307–359.

Hoffman, M., Kahn, L. B. and Li, D. (2018). Discretion in hiring. The Quarterly Journal of Economics,
133 (2), 765–800.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association,
81, 945–960.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement
from a finite universe. Journal of the American Statistical Association, 47, 663–685.

Hull, P. (2018). Estimating Treatment Effects in Mover Designs. Tech. rep., arXiv preprint,
arXiv:1804.06721.

— (2021). What Marginal Outcome Tests Can Tell Us About Racially Biased Decision-Making. Tech. rep.,
NBER Working Paper Series No. 28503.

Imai, K. and Kim, I. (2019). When should we use unit fixed effects regression models for causal
inference with longitudinal data? American Journal of Political Science, 63, 467–490.

— and — (2020). On the use of two-way fixed effects regression models for causal inference with
panel data. Political Analysis, forthcoming.

Imbens, G. W. (2003). Sensitivity to exogeneity assumptions in program evaluation. American
Economic Review, 93 (2), 126–132.

— and Angrist, J. D. (1994). Identification and estimation of local average treatment effects.
Econometrica, 62, 467–475.

— and Rubin, D. B. (2015). Causal Inference for statistics, social and biomedical sciences: an introduction.
Cambridge, United Kingdom: Cambridge University Press.

Jacob, B. A. and Lefgren, L. (2008). Can principals identify effective teachers? evidence on
subjective performance evaluation in education. Journal of Labor Economics, 26 (1), 101–136.

Jordá, O. (2005). Estimation and inference of impulse responses by local projections. American
Economic Review, 95, 161–182.

Jordá, Ó., Schularick, M. and Taylor, A. M. (2015). Betting the house. Journal of International
Economics, 96, S2–S18.

—, — and Taylor, A. M. (2020). The effects of quasi-random monetary experiments. Journal of
Monetary Economics, 112, 22–40.

Jung, J., Concannon, C., Shroff, R., Goel, S. and Goldstein, D. G. (2020a). Simple rules to
guide expert classifications. Journal of the Royal Statistical Society Series A, 183 (3), 771–800.

—, Shroff, R., Feller, A. and Goel, S. (2020b). Bayesian sensitivity analysis for offline policy
evaluation. AIES ’20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 64–70.

Kallus, N., Mao, X. and Zhou, A. (2018). Interval Estimation of Individual-Level Causal Effects Under
Unobserved Confounding. Tech. rep., arXiv preprint arXiv:1810.02894.

— and Zhou, A. (2018). Confounding-robust policy improvement. Advances in Neural Information
Processing Systems 31 (NIPS 2018).

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic
Engineering, Transactions ASME, Series D, 82, 35–45.

Kamenica, E. (2019). Bayesian persuasion and information design. Annual Review of Economics, 11,
249–272.

— and Gentzkow, M. (2011). Bayesian persuasion. American Economic Review, 101, 2590–2615.

Kastelman, D. and Ramesh, R. (2018). Switchback tests and randomized experimentation under
network effects at doordash. URL: https://medium.com/@DoorDash/switchback-tests-and-randomized-
experimentation-under-network-effects-at-doordash-f1d938ab7c2a.

Kempthorne, O. (1955). The randomization theory of experimental inference. Journal of the
American Statistical Association, 50, 946–967.

Khandani, A. E., Kim, A. J. and Lo, A. W. (2010). Consumer credit-risk models via machine-
learning algorithms. Journal of Banking & Finance, 34 (11), 2767 – 2787.

Kilian, L. and Lütkepohl, H. (2017). Structural Vector Autoregressive Analysis. Cambridge, United
Kingdom: Cambridge University Press.

Kilian, L. and Vigfusson, R. J. (2011a). Are the responses of the U.S. economy asymmetric in
energy price increases and decreases? Quantitative Economics, 2, 419–453.

— and — (2011b). Nonlinearities in the oil price-output relationship. Macroeconomic Dynamics,
15 (S3), 337–363.

Kitagawa, T. (2020). The Identification Region of the Potential Outcome Distributions under Instrument
Independence. Tech. rep., Cemmap Working Paper CWP23/20.

Kitamura, Y. and Stoye, J. (2018). Nonparametric analysis of random utility models. Econometrica,
86 (6), 1883–1909.

Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J. and Mullainathan, S. (2018a). Human
decisions and machine predictions. Quarterly Journal of Economics, 133 (1), 237–293.

—, Ludwig, J., Mullainathan, S. and Obermeyer, Z. (2015). Prediction policy problems. American
Economic Review: Papers and Proceedings, 105 (5), 491–495.

—, —, — and Rambachan, A. (2018b). Algorithmic fairness. AEA Papers and Proceedings, 108,
22–27.

Kling, J. R. (2006). Incarceration length, employment, and earnings. American Economic Review,
96 (3), 863–876.

Koop, G., Pesaran, M. H. and Potter, S. M. (1996). Impulse response analysis in nonlinear
multivariate models. Journal of Econometrics, 74, 119–147.

Kubler, F., Selden, L. and Wei, X. (2014). Asset demand based tests of expected utility maximiza-
tion. American Economic Review, 104 (11), 3459–3480.

Kuersteiner, G., Phillips, D. and Villamizar-Villegas, M. (2018). Effective sterilized foreign
exchange intervention? Evidence from a rule-based policy. Journal of International Economics, 118,
118–138.

Kuncel, N. R., Klieger, D. M., Connelly, B. S. and Ones, D. S. (2013). Mechanical versus clinical
data combination in selection and admissions decisions: A meta-analysis. Journal of Applied
Psychology, 98 (6), 1060–1072.

Kuttner, K. (2001). Monetary policy surprises and interest rates: evidence from the Fed funds
futures market. Journal of Monetary Economics, 47, 523–544.

Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in
Applied Mathematics, 6 (1), 4–22.

Lakkaraju, H., Kleinberg, J., Leskovec, J., Ludwig, J. and Mullainathan, S. (2017). The selective
labels problem: Evaluating algorithmic predictions in the presence of unobservables. KDD ’17
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 275–284.

— and Rudin, C. (2017). Learning cost-effective and interpretable treatment regimes. Proceedings
of the 20th International Conference on Artificial Intelligence and Statistics, 54, 166–175.

Lechner, M. (2011). The relation of different concepts of causality used in time series and
microeconomics. Econometric Reviews, 30, 109–127.

Lee, S. and Salanie, B. (2020). Filtered and unfiltered treatment effects with targeting instruments,
unpublished paper: Department of Economics, Columbia University.

Leslie, E. and Pope, N. G. (2017). The unintended impact of pretrial detention on case outcomes:
Evidence from new york city arraignments. The Journal of Law and Economics, 60 (3), 529–557.

Li, D., Raymond, L. and Bergman, P. (2020). Hiring as Exploration. Tech. rep., NBER Working
Paper Series No. 27736.

Li, L., Chu, W., Langford, J. and Schapire, R. E. (2010). A contextual-bandit approach to
personalized news article recommendation. In Proceedings of the 19th international conference on
World wide web, pp. 661–670.

Li, S., Karatzoglou, A. and Gentile, C. (2016). Collaborative filtering bandits. In Proceedings of
the 39th International ACM SIGIR conference on Research and Development in Information Retrieval,
pp. 539–548.

Li, X. and Ding, P. (2017). General Forms of Finite Population Central Limit Theorems with
Applications to Causal Inference. Journal of the American Statistical Association, 112 (520).

Lillie, E. O., Patay, B., Diamant, J., Issell, B., Topol, E. J. and Schork, N. J. (2011). The n-of-1
clinical trial: the ultimate strategy for individualizing medicine? Personalized Medicine, 8 (2),
161–173.

Liu, L. T., Dean, S., Rolf, E., Simchowitz, M. and Hardt, M. (2018). Delayed impact of fair
machine learning. Proceedings of the 35th International Conference on Machine Learning.

Low, H. and Pistaferri, L. (2015). Disability insurance and the dynamics of the incentive insurance
trade-off. American Economic Review, 105 (10), 2986–3029.

— and — (2019). Disability Insurance: Error Rates and Gender Differences. Tech. rep., NBER Working
Paper No. 26513.

Lu, J. (2016). Random choice and private information. Econometrica, 84 (6), 1983–2027.

— (2019). Bayesian identification: A theory for state-dependent utilities. American Economic Review,
109 (9), 3192–3228.

Lu, X., Su, L. and White, H. (2017). Granger causality and structural causality in cross-section and
panel data. Econometric Theory, 33, 263–291.

Lucas, R. E. (1972). Expectations and the neutrality of money. Journal of Economic Theory, 4,
103–124.

Madras, D., Pitassi, T. and Zemel, R. (2018). Predict Responsibly: Improving Fairness and Accuracy
by Learning to Defer. Tech. rep., arXiv preprint, arXiv:1711.06664.

Manski, C. F. (1989). Anatomy of the selection problem. Journal of Human Resources, 24 (3), 343–360.

— (1994). The selection problem. In C. Sims (ed.), Advances in Econometrics: Sixth World Congress,
vol. 1, Cambridge University Press, pp. 143–170.

— (2004). Measuring expectations. Econometrica, 72 (5), 1329–1376.

— (2017). Improving Clinical Guidelines and Decisions Under Uncertainty. Tech. rep., NBER Working
Paper No. 23915.

Marquardt, K. (2021). Mis(sed) Diagnosis: Physician Decision Making and ADHD. Tech. rep.

Martin, D. and Marx, P. (2021). A Robust Test of Prejudice for Discrimination Experiments. Tech. rep.

Mastakouri, A., Scholkopf, B. and Janzing, D. (2021). Necessary and sufficient conditions for
causal feature selection in time series with latent common causes. Proceedings of Machine Learning
Research, 139, 7502–7511.

Mavroeidis, S. (2021). Identification at the Zero Lower Bound. Tech. rep.

Meehl, P. E. (1954). Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the
Evidence. University of Minnesota Press.

Menchetti, F. and Bojinov, I. (2021). Estimating the Effectiveness of Permanent Price Reductions for
Competing Products Using Multivariate Bayesian Structural Time Series Models. Tech. rep.

Mitchell, S., Potash, E., Barocas, S., D’Amour, A. and Lum, K. (2019). Prediction-Based Decisions
and Fairness: A Catalogue of Choices, Assumptions, and Definitions. Tech. rep., arXiv Working Paper,
arXiv:1811.07867.

Molinari, F. (2020). Microeconometrics with partial identification. In Handbook of Econometrics,
vol. 7, pp. 355–486.

Mourifie, I., Henry, M. and Meango, R. (2019). Sharp bounds and testability of a Roy model of STEM
major choices. Tech. rep., arXiv preprint arXiv:1709.09284.

Mullainathan, S. and Obermeyer, Z. (2021). Diagnosing Physician Error: A Machine Learning
Approach to Low-Value Health Care. Tech. rep., NBER Working Paper Series, Working Paper No.
26168.

— and Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic
Perspectives, 31 (2), 87–106.

Munoz, I. D. and van der Laan, M. (2012). Population intervention causal effects based on
stochastic interventions. Biometrics, 68, 541–549.

Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society B,
65, 331–366.

—, van der Laan, M. J., Robins, J. M. and Group, C. P. P. R. (2001). Marginal mean models for
dynamic regimes. Journal of the American Statistical Association, 96, 1410–1423.

Nakamura, E. and Steinsson, J. (2018a). High-frequency identification of monetary non-neutrality:
The information effect. The Quarterly Journal of Economics, 133, 1283–1330.

— and — (2018b). Identification in macroeconomics. Journal of Economic Perspectives, 32 (3), 59–86.

Natenzon, P. (2019). Random choice and learning. Journal of Political Economy, 127 (1), 419–457.

Neyman, J. (1923). On the Application of Probability Theory to Agricultural Experiments. Essay
on Principles. Section 9. Statistical Science, 5 (4), 465–472.

Obermeyer, Z. and Emanuel, E. J. (2016). Predicting the future - big data, machine learning, and
clinical medicine. The New England Journal of Medicine, 375 (13), 1216–9.

Papadogeorgou, G., Imai, K., Lyall, J. and Li, F. (2021). Causal Inference with Spatio-temporal
Data: Estimating the Effects of Airstrikes on Insurgent Violence in Iraq. Tech. rep., arXiv preprint,
arXiv:2003.13555.

—, Mealli, F. and Zigler, C. M. (2019). Causal inference with interfering units for cluster and
population level treatment allocation programs. Biometrics, 75, 778–787.

Pitt, M. K. and Shephard, N. (1999). Filtering via simulation: auxiliary particle filter. Journal of
the American Statistical Association, 94, 590–599.

Plagborg-Møller, M. (2019). Bayesian inference for structural impulse response functions.
Quantitative Economics, 10 (1), 145–184.

Plagborg-Møller, M. and Wolf, C. K. (2020). Instrumental variable identification of dynamic
variance decompositions.

Polisson, M., Quah, J. K. H. and Renou, L. (2020). Revealed preferences over risk and uncertainty.
American Economic Review, 110 (6), 1782–1820.

Raghavan, M., Barocas, S., Kleinberg, J. and Levy, K. (2020). Mitigating bias in algorithmic hiring:
Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability,
and Transparency, p. 469–481.

Raghu, M., Blumer, K., Corrado, G., Kleinberg, J., Obermeyer, Z. and Mullainathan, S.
(2019). The Algorithmic Automation Problem: Prediction, Triage, and Human Effort. Tech. rep., arXiv
preprint, arXiv:1903.12220.

Rambachan, A., Kleinberg, J., Ludwig, J. and Mullainathan, S. (2021). An Economic Approach to
Regulating Algorithms. Tech. rep., NBER Working Paper Series No. 27111.

— and Ludwig, J. (2021). Empirical Analysis of Prediction Mistakes in New York City Pretrial Data.
Tech. rep., University of Chicago Crime Lab Technical Report.

— and Roth, J. (2020). An Honest Approach to Parallel Trends. Tech. rep.

— and Shephard, N. (2020). Econometric analysis of potential outcomes time series: instruments, shocks,
linearity and the causal response function. Tech. rep., arXiv preprint arXiv:1903.01637.

Ramey, V. A. (2016). Macroeconomics shocks and their propagation. In J. B. Taylor and H. Uhlig
(eds.), Handbook of Macroeconomics, vol. 2A, Amsterdam, The Netherlands: North Holland, pp.
71–162.

— and Zubairy, S. (2018). Government spending multipliers in good times and in bad: Evidence
from US historical data. Journal of Political Economy, 126, 850–901.

Rehbeck, J. (2020). Revealed Bayesian Expected Utility with Limited Data. Tech. rep.

Ribers, M. A. and Ullrich, H. (2019). Battling Antibiotic Resistance: Can Machine Learning Improve
Prescribing? Tech. rep., arXiv preprint arXiv:1906.03044.

— and — (2020). Machine Predictions and Human Decisions with Variation in Payoffs and Skills. Tech.
rep.

Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American
Mathematical Society, 58 (5), 527–535.

Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained
exposure periods: application to control of the healthy worker survivor effect. Mathematical
Modelling, 7, 1393–1512.

— (1994). Correcting for non-compliance in randomization trials using structural nested mean
models. Communications in Statistics — Theory and Methods, 23, 2379–2412.

—, Greenland, S. and Hu, F.-C. (1999). Estimation of the causal effect of a time-varying exposure
on the marginal mean of a repeated binary outcome. Journal of the American Statistical Association,
94, 687–700.

Rockoff, J. E., Jacob, B. A., Kane, T. J. and Staiger, D. O. (2011). Can you recognize an effective
teacher when you recruit one? Education Finance and Policy, 6 (1), 43–74.

Romer, C. D. and Romer, D. H. (2004). A new measure of monetary shocks: derivation and
implications. American Economic Review, 94, 1055–1084.

Rosenbaum, P. R. (2002). Observational Studies. Springer.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized
studies. Journal of Educational Psychology, 66 (5), 688–701.

— (1976). Inference and missing data. Biometrika, 63 (3), 581–592.

— (1980). Randomization analysis of experimental data: The Fisher randomization test comment.
Journal of the American Statistical Association, 75, 591–593.

Russell, T. M. (2019). Sharp bounds on functionals of the joint distribution in the analysis of
treatment effects. Journal of Business & Economic Statistics.

Sargent, T. J. (1981). Interpreting economic time series. Journal of Political Economy, 89, 213–248.

Sävje, F., Aronow, P. M. and Hudgens, M. G. (2019). Average treatment effects in the presence of
unknown interference. Tech. rep., arXiv preprint arXiv:1711.06399.

Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69 (1),
99–118.

— (1956). Rational choice and the structure of the environment. Psychological Review, 63 (2),
129–138.

Sims, C. A. (1972). Money, income and causality. American Economic Review, 62, 540–552.

— (1980). Macroeconomics and reality. Econometrica, 48, 1–48.

— (2003). Implications of rational inattention. Journal of Monetary Economics, 50 (3), 665–690.

Slutzky, E. (1937). The summation of random causes as the source of cyclic processes. Econometrica,
5, 105–146.

Sneider, C. and Tang, Y. (2018). Experiment rigor for switchback experiment analysis. URL:
https://doordash.engineering/2019/02/20/experiment-rigor-for-switchback-experiment-analysis/.

Stevenson, M. (2018). Assessing risk assessment in action. Minnesota Law Review, 103.

— and Doleac, J. (2019). Algorithmic Risk Assessment in the Hands of Humans. Tech. rep.

Stock, J. H. and Watson, M. W. (2016). Dynamic factor models, factor-augmented vector autore-
gressions, and structural vector autoregressions in macroeconomics. In J. B. Taylor and H. Uhlig
(eds.), Handbook of Macroeconomics, vol. 2A, pp. 415–525.

— and — (2018). Identification and estimation of dynamic causal effects in macroeconomics using
external instruments. Economic Journal, 128, 917–948.

Sun, L. and Abraham, S. (2020). Estimating Dynamic Treatment Effects in Event Studies with Heteroge-
neous Treatment Effects. Tech. rep., arXiv preprint, arXiv:1804.05785.

Syrgkanis, V., Tamer, E. and Ziani, J. (2018). Inference on Auctions with Weak Assumptions on
Information. Tech. rep., arXiv preprint, arXiv:1710.03830.

Tamer, E. (2003). Incomplete simultaneous discrete response model with multiple equilibria. The
Review of Economic Studies, 70 (1).

Tan, S., Adebayo, J., Inkpen, K. and Kamar, E. (2018). Investigating Human + Machine Complemen-
tarity for Recidivism Predictions. Tech. rep., arXiv preprint, arXiv:1808.09123.

Tenreyro, S. and Thwaites, G. (2016). Pushing on a string: US monetary policy is less powerful
in recessions. American Economic Journal: Macroeconomics, 8 (4), 43–74.

Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science,
185 (4157), 1124–1131.

van der Laan, M. J. (2008). The construction and analysis of adaptive group sequential designs.

White, H. and Kennedy, P. (2009). Retrospective estimation of causal effects through time. In
J. L. Castle and N. Shephard (eds.), The Methodology and Practice of Econometrics: A Festschrift in
Honour of David F. Hendry, Oxford University Press, pp. 59–87.

— and Lu, X. (2010). Granger causality and dynamic structural systems. Journal of Financial
Econometrics, 8, 193–243.

Wiener, N. (1956). The theory of prediction. In E. F. Beckenbach (ed.), Modern Mathematics, New
York, USA: McGraw-Hill, pp. 165–190.

Wilder, B., Horvitz, E. and Kamar, E. (2020). Learning to complement humans. In Proceedings
of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, International
Joint Conferences on Artificial Intelligence Organization, pp. 1526–1533.

Wooldridge, J. M. (2005). Fixed-effects and related estimators for correlated random-coefficient
and treatment effect panel data models. Review of Economics and Statistics, 87, 385–390.

Wright, M. N. and Ziegler, A. (2017). ranger: A fast implementation of random forests for high
dimensional data in c++ and r. Journal of Statistical Software, Articles, 77 (1), 1–17.

Wu, X., Weinberger, K. R., Wellenius, G. A., Dominici, F. and Braun, D. (2021). Assessing
the causal effects of a stochastic intervention in time series data: Are heat alerts effective
in preventing deaths and hospitalizations?, unpublished paper: Department of Biostatistics,
Harvard T.H. Chan School of Public Health.

Yadlowsky, S., Namkoong, H., Basu, S., Duchi, J. and Tian, L. (2020). Bounds on the condi-
tional and average treatment effect with unobserved confounding factors. Tech. rep., arXiv preprint
arXiv:1808.09521.

Yang, C. and Dobbie, W. (2020). Equal protection under algorithms: A new statistical and legal
framework. Michigan Law Review, 119 (2), 291–396.

Zhang, K. W., Janson, L. and Murphy, S. A. (2020). Inference for Batched Bandits. Tech. rep., arXiv
preprint arXiv:2002.03217.

132
Appendix A

Appendix to Chapter 1

A.1 Additional Figures and Tables

Figure A.1: Observed failure to appear rate among released defendants and constructed bound
on the failure to appear rate among detained defendants by race-and-felony charge cells for one
judge in New York City.

Notes: This figure plots the observed failure to appear rate among released defendants (orange, circles) and the
bounds based on the judge leniency for the failure to appear rate among detained defendants (blue, triangles) at each
decile of predicted failure to appear risk and race-by-felony charge cell for the judge that heard the most cases in
the main estimation sample. The judge leniency instrument Z ∈ Z is defined as the assigned judge's quintile of the
constructed, leave-one-out leniency measure. Judges in New York City are quasi-randomly assigned to defendants
within court-by-time cells. The bounds on the failure to appear rate among detained defendants (blue, triangles) are
constructed using the most lenient quintile of judges, and by applying the instrument bounds for a quasi-random
instrument (see Appendix A.4.1). Section 1.5.3 describes the estimation details for these bounds. Source: Rambachan
and Ludwig (2021).

Figure A.2: Estimated bounds on implied prediction mistakes between top and bottom predicted
failure to appear risk deciles made by judges within each race-by-felony charge cell.

Notes: This figure plots the 95% confidence interval on the implied prediction mistake δ(w, d)/δ(w, d′) between the top
decile d and bottom decile d′ of the predicted failure to appear risk distribution for each judge in the top 25 whose
pretrial release decisions violated the implied revealed preference inequalities (Table 1.1) and each race-by-felony
charge cell. The implied prediction mistake δ(w, d)/δ(w, d′) measures the degree to which judges' beliefs underreact
or overreact to variation in failure to appear risk. The confidence intervals highlighted in orange show that judges
underreact to predictable variation in failure to appear risk from the highest to the lowest decile of predicted failure to
appear risk (i.e., the estimated bounds lie below one). These confidence intervals are constructed by first forming a
95% joint confidence interval for a judge's reweighted utility thresholds τ(w, d), τ(w, d′) using test inversion based on the
moment inequalities in Theorem 1.4.2, and then computing the implied prediction mistake δ(w, d)/δ(w, d′) associated
with each pair τ(w, d), τ(w, d′) in the joint confidence set (Corollary 1.4.1). See Section 1.4.2 for theoretical details on
the implied prediction mistake and Section 1.5.5 for the estimation details. Source: Rambachan and Ludwig (2021).

Figure A.3: Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that make detectable prediction mistakes about failure to appear risk
by defendant race.

Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decisions against the judge's observed release decisions among judges who were found to make
detectable prediction mistakes, broken out by defendant race. Worst-case total expected social welfare under each
decision rule is computed by first constructing a 95% confidence interval for total expected social welfare under the
decision rule and reporting the smallest value that lies in the confidence interval. These decision rules are constructed
and evaluated over race-by-age cells and deciles of predicted failure to appear risk. The x-axis plots the relative social
welfare cost U*(0, 0) of detaining a defendant that would not fail to appear in court (i.e., an unnecessary detention).
The solid line plots the median change across judges that make mistakes, and the dashed lines report the minimum
and maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).

Figure A.4: Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that make detectable prediction mistakes about failure to appear risk.

Notes: This figure reports the overall release rate of the algorithmic decision rule that fully automates decisions against
the judge's observed release rates among judges who were found to make detectable prediction mistakes. These
decision rules are constructed and evaluated over race-by-age cells and deciles of predicted risk. The x-axis plots the
relative social welfare cost U*(0, 0) of detaining a defendant that would not fail to appear in court (i.e., an unnecessary
detention). The solid line plots the median release rate across judges that make detectable prediction mistakes, and
the dashed lines report the minimum and maximum release rates across judges. See Section 1.6.2 for further details.
Source: Rambachan and Ludwig (2021).

Figure A.5: Ratio of total expected social welfare under algorithmic decision rule relative to release
decisions of judges that do not make detectable prediction mistakes about failure to appear risk.

Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decision-making against the judge's observed release decisions among judges whose choices were
consistent with expected utility maximization at accurate beliefs about failure to appear risk. Worst-case total expected
social welfare under each decision rule is computed by constructing a 95% confidence interval for total expected social
welfare under the decision rule and reporting the smallest value that lies in the confidence interval. These decision rules
are constructed and evaluated over race-by-age cells and deciles of predicted failure to appear risk. The x-axis plots the
relative social welfare cost U*(0, 0) of detaining a defendant that would not fail to appear in court (i.e., an unnecessary
detention). The solid line plots the median change across judges, and the dashed lines report the minimum and
maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).

Figure A.6: Ratio of total expected social welfare under algorithmic decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes by defendant race.

Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decision-making against the judge's observed release decisions among judges whose choices were
consistent with expected utility maximization behavior at accurate beliefs, broken out by defendant race. Worst-case
total expected social welfare under each decision rule is computed by first constructing a 95% confidence interval for
total expected social welfare under the decision rule and reporting the smallest value that lies in the confidence interval.
These decision rules are constructed and evaluated over race-by-age cells and deciles of predicted failure to appear
risk. The x-axis plots the relative social welfare cost U*(0, 0) of detaining a defendant that would not fail to appear
in court (i.e., an unnecessary detention). The solid line plots the median change across judges, and the dashed lines
report the minimum and maximum change across judges. See Section 1.6.2 for further details. Source: Rambachan and
Ludwig (2021).

Figure A.7: Overall release rates under algorithmic decision rule relative to the observed release
rates of judges that do not make detectable prediction mistakes.

Notes: This figure reports the overall release rate of the algorithmic decision rule that fully automates decisions
against the judge's observed release rates among judges whose choices were consistent with expected utility
maximization behavior at accurate beliefs. These decision rules are constructed and evaluated over race-by-age cells
and deciles of predicted failure to appear risk. The x-axis plots the relative social welfare cost U*(0, 0) of detaining a defendant
that would not fail to appear in court (i.e., an unnecessary detention). The solid line plots the median release
rate across judges that do not make systematic prediction mistakes, and the dashed lines report the minimum and
maximum release rates across judges. See Section 1.6.2 for further details. Source: Rambachan and Ludwig (2021).

Table A.1: Estimated lower bound on the fraction of judges whose release decisions are inconsistent
with expected utility maximization behavior at accurate beliefs about any pretrial misconduct risk
given defendant characteristics.

                            Utility Functions U(c, y*; w)
                            No Characteristics    Race    Race + Age    Race + Felony Charge
Adjusted Rejection Rate            76%             72%        64%               92%

Notes: This table summarizes the results for testing whether the release decisions of each judge in the top 25 are
consistent with expected utility maximization behavior at strict preference utility functions U(c, y*; w) that (i) do not
depend on any characteristics, (ii) depend on the defendant's race, (iii) depend on both the defendant's race and age,
and (iv) depend on both the defendant's race and whether the defendant was charged with a felony offense. The
outcome Y* is whether the defendant would commit any pretrial misconduct (i.e., either fail to appear in court or be
re-arrested for a new crime) upon release. Bounds on the any-pretrial-misconduct rate among detained defendants are
constructed using the judge leniency instrument (see Section 1.5.3). I first construct the unadjusted rejection rate by
testing whether the pretrial release decisions of each judge in the top 25 are consistent with the moment inequalities in
Corollary 1.3.4 at the 5% level using the conditional least-favorable hybrid test, following the same procedure described
in Section 1.5.4. The adjusted rejection rate reports the fraction of rejections after correcting for multiple hypothesis
testing using the Holm-Bonferroni step-down procedure, which controls the family-wise error rate at the 5% level. See
Section 1.5.4 for further discussion of this table. Source: Rambachan and Ludwig (2021).

A.2 User's Guide to Identifying Prediction Mistakes in Screening Decisions

This section provides a step-by-step guide on how the identification results in Sections 1.3-1.4 for

a screening decision with a binary outcome may be applied by empirical researchers.

Suppose we observe a decision maker making many decisions, and for each decision we observe the characteristics of the decision (W, X), the decision maker's choice C ∈ {0, 1}, and the outcome Y := C · Y*, where Y* ∈ {0, 1} is the latent outcome of interest. We observe the dataset {(W_i, X_i, C_i, Y_i)}_{i=1}^n, which is an i.i.d. sample from the joint distribution (W, X, C, Y) ~ P. As discussed in Section 1.2.2, pretrial release decisions, medical testing and diagnostic decisions, and hiring decisions are all examples of a screening decision. The basic algorithm for testing whether the decision maker makes systematic prediction mistakes about the outcome Y* based on the characteristics (W, X) is:

Step 1: Specify which characteristics W directly affect the utility function U(c, y*; w). As discussed in Section 1.2.3 and Section 1.3.1, this is the key restriction on behavior imposed by

the expected utility maximization model. Researchers may motivate this choice in two ways.

First, in a particular setting, there may be common assumptions about the specification of a

decision maker’s utility function used in empirical research. The researcher may directly appeal

to these established modelling choices to guide the choice of this exclusion restriction. Second,

the exclusion restriction may be chosen to summarize various social or legal restrictions on what

characteristics ought not to directly affect the utility function. I recommend that researchers report

a sensitivity analysis that examines how their conclusions change under alternative assumptions

about which characteristics W directly affect the utility function. This explores the extent to which

conclusions about systematic prediction mistakes are robust to alternative assumptions on the

decision maker’s utility function.

Step 2: Construct partitions D_w(X) of the excluded characteristics X. For each value of the characteristics w ∈ W, construct a partition of the remaining excluded characteristics D_w : X → {1, . . . , N_d} as discussed in Section 1.3.3. The researcher may approach constructing such partitions in two ways. First, there may be data on held-out decisions (e.g., decisions made by other decision makers), and so the partition may be constructed using supervised machine learning based prediction methods to predict the outcome on the held-out decisions. In this case, estimate a prediction function f̂ : W × X → [0, 1] on a set of held-out decisions and define D_w(x) by binning the characteristics X into percentiles of predicted risk within each value w ∈ W. In the empirical application, I constructed partitions by predicting whether released defendants would fail to appear in court using decisions made by all judges in New York City except for the top 25 judges (see Section 1.5.3). Second, there may be an existing integer-valued risk score that can be used to construct the partitions, and the researcher may simply choose the partition D_w(x) to be the level sets associated with this existing risk score.
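As a concrete illustration, the binning in Step 2 can be sketched as below. This is a minimal sketch, not the paper's implementation: the function `build_partition`, the toy data, and the risk function passed as `f_hat` are hypothetical stand-ins for a prediction function estimated on held-out decisions.

```python
import numpy as np

def build_partition(w_vals, x_vals, f_hat, n_bins=10):
    """Bin the excluded characteristics X into within-cell deciles of
    predicted risk f_hat(w, x), separately for each value of the
    payoff-relevant characteristics W (a sketch of Step 2)."""
    w_vals, x_vals = np.asarray(w_vals), np.asarray(x_vals)
    risk = np.array([f_hat(w, x) for w, x in zip(w_vals, x_vals)])
    d = np.empty(len(w_vals), dtype=int)
    for w in np.unique(w_vals):
        mask = w_vals == w
        # interior quantile cut points within the cell; digitize then
        # assigns bins 1..n_bins
        cuts = np.quantile(risk[mask], np.linspace(0, 1, n_bins + 1)[1:-1])
        d[mask] = np.digitize(risk[mask], cuts) + 1
    return d

# toy data with a hypothetical held-out prediction function
rng = np.random.default_rng(0)
w = rng.integers(0, 2, size=1000)
x = rng.uniform(size=1000)
d = build_partition(w, x, f_hat=lambda w_, x_: 0.3 * w_ + 0.7 * x_)
```

Quantiles are computed separately within each w cell so that the partition D_w(·) varies with w, as the notation in the text requires.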

Step 3: Estimate the observable choice-dependent outcome probabilities. Given the partition D_w : X → {1, . . . , N_d} for each w ∈ W, estimate the observable choice-dependent outcome probabilities at each cell (w, d):

$$ \hat{P}_{Y^*}(1 \mid 1, w, d) := \frac{n^{-1} \sum_{i=1}^{n} C_i Y_i 1\{W_i = w, D_w(X_i) = d\}}{n^{-1} \sum_{i=1}^{n} C_i 1\{W_i = w, D_w(X_i) = d\}}. $$
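A minimal sketch of this sample analog follows; the arrays and the cell labels are hypothetical toy inputs.

```python
import numpy as np

def p_hat_y1_given_released(C, Y, W, D, w, d):
    """Sample analog of P_{Y*}(1 | C=1, W=w, D_w(X)=d): the observed
    outcome rate among released cases in cell (w, d)."""
    C, Y, W, D = map(np.asarray, (C, Y, W, D))
    cell = (W == w) & (D == d)
    num = np.mean(C * Y * cell)   # n^{-1} sum_i C_i Y_i 1{cell}
    den = np.mean(C * cell)       # n^{-1} sum_i C_i 1{cell}
    return num / den if den > 0 else np.nan

# toy data: cell (w=1, d=2) has three released cases with outcomes 1, 0, 1
C = [1, 1, 1, 0, 1]
Y = [1, 0, 1, 0, 1]
W = [1, 1, 1, 1, 0]
D = [2, 2, 2, 2, 2]
rate = p_hat_y1_given_released(C, Y, W, D, w=1, d=2)  # 2/3
```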

Step 4: Estimate bounds on the unobservable choice-dependent outcome probabilities. Construct an upper bound on the unobservable choice-dependent outcome probabilities P_{Y*}(1 | 0, w, d) at each cell w ∈ W, d ∈ {1, . . . , N_d}. This may be done in several ways.

First, as discussed in Section 1.3.2, there may be a randomly assigned instrument Z ∈ Z (i.e., satisfying (Y*, W, X) ⊥⊥ Z) that generates random variation in the decision maker's choices. In this case, we can construct an upper bound on P_{Y*}(1 | 0, w, d, z) at each value z̃ ∈ Z of the form

$$ \bar{P}_{Y^*, \tilde{z}}(1 \mid 0, w, d, z) = \frac{\pi_0(w, d, \tilde{z}) + P_{C, Y^*}(1, 1 \mid w, d, \tilde{z}) - P_{C, Y^*}(1, 1 \mid w, d, z)}{\pi_0(w, d, z)}. $$

This upper bound is directly estimated by

$$ \hat{\bar{P}}_{Y^*, \tilde{z}}(1 \mid 0, w, d, z) = \frac{\hat{\pi}_0(w, d, \tilde{z}) + \hat{P}_{C, Y^*}(1, 1 \mid w, d, \tilde{z})}{\hat{\pi}_0(w, d, z)} - \frac{\hat{P}_{C, Y^*}(1, 1 \mid w, d, z)}{\hat{\pi}_0(w, d, z)}, $$

where

$$ \hat{\pi}_0(w, d, z) := \frac{n^{-1} \sum_{i=1}^{n} (1 - C_i) 1\{W_i = w, D_w(X_i) = d, Z_i = z\}}{n^{-1} \sum_{i=1}^{n} 1\{W_i = w, D_w(X_i) = d, Z_i = z\}}, \qquad \hat{P}_{C, Y^*}(1, 1 \mid w, d, z) := \frac{n^{-1} \sum_{i=1}^{n} C_i Y_i 1\{W_i = w, D_w(X_i) = d, Z_i = z\}}{n^{-1} \sum_{i=1}^{n} 1\{W_i = w, D_w(X_i) = d, Z_i = z\}}. $$

If the instrument is quasi-randomly assigned (i.e., satisfies (Y*, W, X) ⊥⊥ Z | T), then apply the identification results in Appendix A.4.1. In the empirical application, I used the quasi-random assignment of judges to cases to construct bounds on the unobservable failure to appear rate among detained defendants (see Section 1.5.3).
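The instrument-based upper bound can be sketched directly from the formula above; this assumes the cell-level quantities π̂_0 and P̂_{C,Y*}(1, 1 | ·) have already been estimated, and the numbers below are hypothetical.

```python
def instrument_upper_bound(pi0_z, p11_z, pi0_ztilde, p11_ztilde):
    """Upper bound on the unobservable P_{Y*}(1 | C=0, w, d, z) implied by
    a second instrument value z~ (sketch of the Step 4 formula):
        [pi0(w,d,z~) + P_{C,Y*}(1,1|w,d,z~) - P_{C,Y*}(1,1|w,d,z)] / pi0(w,d,z)
    """
    return (pi0_ztilde + p11_ztilde - p11_z) / pi0_z

# hypothetical cell probabilities at the judge's own instrument value z
# and at a more lenient value z~
bound = instrument_upper_bound(pi0_z=0.4, p11_z=0.10,
                               pi0_ztilde=0.2, p11_ztilde=0.12)
# (0.2 + 0.12 - 0.10) / 0.4 = 0.55
```

In the empirical application the role of z̃ is played by the most lenient quintile of judges, so the bound tightens as the comparison instrument value detains fewer defendants.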

Second, researchers may introduce additional assumptions that bound the unobservable choice-dependent outcome probabilities using the observable choice-dependent outcome probabilities. I refer to this as "direct imputation." In direct imputation, the researcher specifies κ_{w,d} ≥ 0 for each cell w ∈ W, d ∈ {1, . . . , N_d} and assumes that P_{Y*}(1 | 0, w, d) ≤ (1 + κ_{w,d}) P_{Y*}(1 | 1, w, d). See Supplement A.7 for further details. I recommend that researchers report a sensitivity analysis of their conclusions based on the choices of κ_{w,d} ≥ 0. I illustrate such a sensitivity analysis for direct imputation in Supplement A.9 for the New York City pretrial release setting.
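A sensitivity analysis over the direct-imputation parameter can be sketched as below; the released-case rate and the κ grid are hypothetical.

```python
def direct_imputation_bounds(p_released, kappas):
    """Direct-imputation upper bounds on P_{Y*}(1 | 0, w, d):
    (1 + kappa) * P_{Y*}(1 | 1, w, d), capped at 1, over a grid of
    sensitivity parameters kappa >= 0 (a sketch with hypothetical inputs)."""
    return {k: min((1.0 + k) * p_released, 1.0) for k in kappas}

# hypothetical cell: observed misconduct rate 0.3 among released cases
bounds = direct_imputation_bounds(p_released=0.3,
                                  kappas=[0.0, 0.5, 1.0, 3.0])
```

Reporting conclusions across the κ grid makes transparent how much of the missing-data problem the assumed bound is doing.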

Finally, researchers may also observe an additional proxy outcome that does not suffer from the missing data problem, which can therefore be used to construct bounds on the unobservable choice-dependent outcome probabilities provided the researcher introduces bounds on the joint distribution of the proxy outcome and the latent outcome. See Supplement A.7 for further details.

Step 5: Test whether the decision maker makes systematic prediction mistakes. Testing whether the decision maker makes systematic prediction mistakes, given a choice of directly payoff-relevant characteristics, a partition of the excluded characteristics, and bounds on the unobservable choice-dependent outcome probabilities, is equivalent to testing whether the moment inequalities in Corollary 1.3.4 are satisfied. Testing these moment inequalities tests the null hypothesis that the decision maker's choices are consistent with expected utility maximization behavior at preferences that satisfy the researcher's conjectured exclusion restriction. Researchers may pick their preferred moment inequality testing procedure from the econometrics literature (Canay and Shaikh, 2017; Molinari, 2020). In the empirical application in Section 1.5, I use the conditional least-favorable hybrid test developed in Andrews et al. (2019), since it is computationally fast given estimates of the moments and the variance-covariance matrix and has desirable power properties.
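The point-estimate analog of the implied inequalities in Corollary 1.3.4 is easy to state: within each w, the largest observed outcome rate among released cases across cells d must not exceed the smallest upper bound for detained cases. The sketch below checks only this point-estimate condition with hypothetical numbers; formal inference requires a moment-inequality test such as the one referenced in the text.

```python
def violates_revealed_preference(p_released, p_detained_upper):
    """Flag values of w whose estimated cells violate the point-estimate
    analog of the implied inequalities:
        max_d P(1 | 1, w, d) <= min_d upper_bound(1 | 0, w, d).
    Inputs map each w to a dict over cells d (hypothetical numbers)."""
    flagged = []
    for w in p_released:
        if max(p_released[w].values()) > min(p_detained_upper[w].values()):
            flagged.append(w)
    return flagged

# hypothetical estimates over two cells d in {1, 10} for two values of w
p_rel = {"felony": {1: 0.10, 10: 0.40}, "misd": {1: 0.05, 10: 0.20}}
p_det = {"felony": {1: 0.30, 10: 0.55}, "misd": {1: 0.25, 10: 0.60}}
flags = violates_revealed_preference(p_rel, p_det)  # ["felony"]
```

Here the "felony" cells violate the inequality (0.40 > 0.30), mirroring the logic that a released cell predictably riskier than a detained cell's upper bound is inconsistent with an incomplete threshold rule.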

Step 6: Conduct inference on how biased the decision maker's predictions are. To conduct inference on how biased the decision maker's predictions are between cells (w, d) and (w, d′), for w ∈ W and d, d′ ∈ {1, . . . , N_d}, first construct a joint confidence set for the decision maker's reweighted utility thresholds τ(w, d), τ(w, d′) at cells (w, d), (w, d′) based on the moment inequalities in (A.2). This can be done through test inversion: for a grid of possible values of the reweighted thresholds, test the null hypothesis that the moment inequalities in (A.2) are satisfied at each point in the grid and collect all points at which we fail to reject the null hypothesis. Second, for each value in the joint confidence set, construct the ratio in (A.3). This provides a confidence set for the decision maker's implied prediction mistake between cells (w, d) and (w, d′). If this confidence set for the implied prediction mistake lies everywhere below one, then the decision maker's beliefs about the latent outcome given the characteristics are underreacting to predictable variation in the latent outcome. Analogously, if this confidence set for the implied prediction mistake lies everywhere above one, then the decision maker's beliefs are overreacting. See Section 1.4.2 for further discussion of the interpretation of this implied prediction mistake.
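The test-inversion loop in Step 6 might be organized as follows. This is only a sketch: `keep_pair` is a hypothetical stand-in for the moment-inequality test at a candidate threshold pair, which the sketch does not implement, and the grid is arbitrary.

```python
import numpy as np

def implied_mistake_set(tau_grid, keep_pair):
    """Collect threshold pairs (tau, tau') not rejected by `keep_pair`
    and return the range of implied odds ratios
        [(1 - tau)/tau] / [(1 - tau')/tau'],
    a sketch of the confidence set for the implied prediction mistake."""
    ratios = []
    for t in tau_grid:
        for t2 in tau_grid:
            if keep_pair(t, t2):
                ratios.append(((1 - t) / t) / ((1 - t2) / t2))
    return min(ratios), max(ratios)

grid = np.linspace(0.1, 0.9, 9)
# hypothetical test: only pairs with t < t2 survive
lo, hi = implied_mistake_set(grid, keep_pair=lambda t, t2: t < t2)
```

Under this hypothetical acceptance rule the surviving odds ratios all exceed one, which in the text's terminology would indicate overreaction; a set lying everywhere below one would indicate underreaction.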

A.3 Additional Results for the Expected Utility Maximization Model

In this section of the appendix, I provide additional results for the expected utility maximization

model that are mentioned in the main text.

A.3.1 Characterization of Expected Utility Maximization in Treatment Decisions

In Section 1.3, I analyzed the testable implications of expected utility maximization behavior at

accurate beliefs in screening decisions with a binary outcome. I now show that these identification

results extend to treatment decisions with a multi-valued outcome for particular classes of utility

functions U . These extensions also apply to screening decisions with a multi-valued outcome.

Linear Utility

First, I analyze the conditions under which the decision maker's choices are consistent with expected utility maximization at a linear utility function of the form U(c, ⃗y; w) = β(w)y − λ(w)c, where Y ∈ ℝ and β(w) > 0, λ(w) > 0 for all w ∈ W. This is an extended Roy model in which the benefit function depends only on the realized outcome, linearly.¹

¹ Henry et al. (2020) study an extended Roy model under the assumption that the utility function satisfies U(0, ⃗y; w) = Y_0, U(1, ⃗y; w) = Y_1 − λ(Y_1) for some function λ(·). The authors derive testable restrictions on behavior under this extended Roy model provided the researcher observes a stochastically monotone instrumental variable.

As notation, let µ_{Y_1−Y_0}(c, w, x) := E[Y_1 − Y_0 | C = c, W = w, X = x]. Define X^0(w) := {x ∈ X : π_0(w, x) > 0} and X^1(w) := {x ∈ X : π_1(w, x) > 0}.

Theorem A.3.1. Consider a treatment decision. The decision maker's choices are consistent with expected utility maximization at some utility function U(c, ⃗y; w) = β(w)y − λ(w)c if and only if, for all w ∈ W,

$$ \max_{x \in \mathcal{X}^0(w)} \underline{\mu}_{Y_1 - Y_0}(0, w, x) \le \min_{x \in \mathcal{X}^1(w)} \bar{\mu}_{Y_1 - Y_0}(1, w, x), $$

where

$$ \underline{\mu}_{Y_1 - Y_0}(0, w, x) = \min\{ \mu_{Y_1 - Y_0}(0, w, x) : \tilde{P}_{\vec{Y}}(\,\cdot \mid 0, w, x) \in \mathcal{B}_{0,w,x} \}, $$
$$ \bar{\mu}_{Y_1 - Y_0}(1, w, x) = \max\{ \mu_{Y_1 - Y_0}(1, w, x) : \tilde{P}_{\vec{Y}}(\,\cdot \mid 1, w, x) \in \mathcal{B}_{1,w,x} \}. $$

Proof. This result follows immediately from applying the inequalities in Theorem 1.2.1. Over (w, x) ∈ W × X such that π_1(w, x) > 0, λ(w)/β(w) ≤ µ_{Y_1−Y_0}(1, w, x) must be satisfied. Analogously, over (w, x) ∈ W × X such that π_0(w, x) > 0, λ(w)/β(w) ≥ µ_{Y_1−Y_0}(0, w, x) must be satisfied. The result is then immediate.

In a treatment decision with a binary outcome, Theorem A.3.1 immediately implies negative results about the testability of expected utility maximization behavior that are analogous to those stated in the main text for a screening decision with binary outcomes. I state these negative results as a corollary. Define

$$ \underline{P_{\vec{Y}}(0,1 \mid 0, w, x) - P_{\vec{Y}}(1,0 \mid 0, w, x)} = \min\left\{ \tilde{P}_{\vec{Y}}(0,1 \mid 0, w, x) - \tilde{P}_{\vec{Y}}(1,0 \mid 0, w, x) : \tilde{P}_{\vec{Y}}(\,\cdot \mid 0, w, x) \in \mathcal{B}_{0,w,x} \right\} $$

and

$$ \overline{P_{\vec{Y}}(0,1 \mid 1, w, x) - P_{\vec{Y}}(1,0 \mid 1, w, x)} = \max\left\{ \tilde{P}_{\vec{Y}}(0,1 \mid 1, w, x) - \tilde{P}_{\vec{Y}}(1,0 \mid 1, w, x) : \tilde{P}_{\vec{Y}}(\,\cdot \mid 1, w, x) \in \mathcal{B}_{1,w,x} \right\}. $$

Corollary A.3.1. Consider a treatment decision with a binary outcome. The decision maker's choices are consistent with expected utility maximization at some utility function U(c, ⃗y; w) = β(w)y − λ(w)c if either:

(i) All characteristics affect utility (i.e., X = ∅) and $\underline{P_{\vec{Y}}(0,1 \mid 0, w) - P_{\vec{Y}}(1,0 \mid 0, w)} \le \overline{P_{\vec{Y}}(0,1 \mid 1, w) - P_{\vec{Y}}(1,0 \mid 1, w)}$ for all w ∈ W.

(ii) The researcher's bounds on the choice-dependent potential outcome probabilities are uninformative, meaning that for both c ∈ {0, 1}, B_{c,w,x} is the set of all P̃_{⃗Y}(· | c, w, x) satisfying $\sum_{y_{\tilde{c}} \in \mathcal{Y}} \tilde{P}_{\vec{Y}}(\,\cdot\,, y_{\tilde{c}} \mid c, w, x) = P_{Y_c}(\,\cdot \mid c, w, x)$ for all P̃_{⃗Y}(· | c, w, x) ∈ B_{c,w,x}.

Proof. Case (i) follows immediately from Theorem A.3.1. Case (ii) follows since, under uninformative bounds on the missing data, $\overline{P_{\vec{Y}}(0,1 \mid 1, w, x) - P_{\vec{Y}}(1,0 \mid 1, w, x)} = P_{Y_1}(1 \mid 1, w, x)$ and $\underline{P_{\vec{Y}}(0,1 \mid 0, w, x) - P_{\vec{Y}}(1,0 \mid 0, w, x)} = -P_{Y_0}(1 \mid 0, w, x)$.

Binary-Valued Utility Function

I analyze the conditions under which the decision maker's choices are consistent with expected utility maximization at a utility function that is a simple function over Y. That is, for some known Ỹ ⊆ Y, define Ỹ_c = 1{Y_c ∈ Ỹ} and Ỹ = C Ỹ_1 + (1 − C) Ỹ_0. Consider the class of utility functions of the form u(c, ⃗y; w) := u(c, ỹ; w). For this class of utility functions, the decision maker faces a treatment decision with a binary outcome, and so the previous analysis applies if we further assume that u(c, ỹ; w) = β(w)ỹ − λ(w)c.

A.3.2 ϵ_w-Approximate Expected Utility Maximization

The expected utility maximization model in the main text assumes that the decision maker

exactly maximizes expected utility given their preferences and beliefs about the outcome given

the characteristics and some private information. The decision maker, however, may suffer from

various cognitive limitations that prevent them from exact maximization. That is, the decision

maker may be boundedly rational and therefore make choices to “satisfice” rather than fully

optimize (Simon, 1955, 1956). I next weaken the definition of expected utility maximization to

allow the decision maker to be an ϵ_w-approximate expected utility maximizer.

Definition A.3.1 (ϵ_w-approximate expected utility maximization). The decision maker's choices are consistent with ϵ_w-approximate expected utility maximization if there exists a utility function U ∈ U, ϵ_w ≥ 0 for all w ∈ W, and a joint distribution (W, X, V, C, ⃗Y) ~ Q satisfying:

i. Approximate Expected Utility Maximization: For all c ∈ {0, 1}, c′ ≠ c, (w, x, v) ∈ W × X × V such that Q(c | w, x, v) > 0,

$$ E_Q\left[ U(c, \vec{Y}; W) \mid W = w, X = x, V = v \right] \ge E_Q\left[ U(c', \vec{Y}; W) \mid W = w, X = x, V = v \right] - \epsilon_w. $$

(ii) Information Set, (iii) Data Consistency.

That is, the decision maker's choices are consistent with ϵ_w-approximate expected utility maximization if their choices are within ϵ_w ≥ 0 of being optimal. This is analogous to the proposed measure of violations of expected utility maximization in consumer optimization in Echenique et al. (2021). Allen and Rehbeck (2020) propose a measure of whether a decision maker's choices in consumer optimization are ϵ-rationalizable by a quasi-linear utility function, whereas my interest is focused on how well the decision maker's choices are approximated by expected utility maximization in empirical treatment decisions. Apesteguia and Ballester (2015) propose a "swaps index" to measure how much an observed preference relation violates utility maximization and expected utility maximization, which summarizes the number of choices that must be swapped in order to rationalize the data. Notice that the special case with ϵ_w = 0 nests the definition of expected utility maximization given in the main text (Definition 1.2.3).

Theorem 1.2.1 directly extends to characterize ϵ_w-approximate expected utility maximization behavior in treatment decisions.

Theorem A.3.2. The decision maker's choices are consistent with ϵ_w-approximate expected utility maximization if and only if there exists a utility function U ∈ U, ϵ_w ≥ 0 for all w ∈ W, and P̃_{⃗Y}(· | 0, w, x) ∈ B_{0,w,x}, P̃_{⃗Y}(· | 1, w, x) ∈ B_{1,w,x} for all (w, x) ∈ W × X satisfying

$$ E_Q\left[ U(c, \vec{Y}; W) \mid C = c, W = w, X = x \right] \ge E_Q\left[ U(c', \vec{Y}; W) \mid C = c, W = w, X = x \right] - \epsilon_w \qquad \text{(A.1)} $$

for all c ∈ {0, 1}, (w, x) ∈ W × X with π_c(w, x) > 0 and c′ ≠ c, where the joint distribution (W, X, C, ⃗Y) ~ Q is given by Q(w, x, c, ⃗y) = P̃_{⃗Y}(⃗y | c, w, x) P(c, w, x).

Proof. The proof follows the same argument as the proof of Theorem 1.2.1.

In words, the decision maker's choices are consistent with ϵ_w-approximate expected utility maximization if and only if they approximately satisfy the revealed preference inequalities derived in the main text.

The value of Theorem A.3.2 comes in applying the approximate revealed preference inequalities to analyze particular decision problems. I next use this result to characterize whether the decision maker's choices are consistent with approximate expected utility maximization at strict preferences in a screening decision with a binary outcome.

Theorem A.3.3. Consider a screening decision with a binary outcome. Assume P_{Y*}(1 | 1, w, x) < 1 for all (w, x) ∈ W × X with π_1(w, x) > 0. The decision maker's choices are consistent with ϵ_w-approximate expected utility maximization at some strict preference utility function if and only if, for all w ∈ W, there exists ϵ̃_w ≥ 0 satisfying

$$ \max_{x \in \mathcal{X}^1(w)} P_{Y^*}(1 \mid 1, w, x) - \tilde{\epsilon}_w \le \min_{x \in \mathcal{X}^0(w)} P_{Y^*}(1 \mid 0, w, x) + \tilde{\epsilon}_w, $$

where X^1(w) := {x ∈ X : π_1(w, x) > 0} and X^0(w) := {x ∈ X : π_0(w, x) > 0}.

Proof. The proof follows the same argument as the proof of Theorem 1.3.1, which I provide for completeness. For all (w, x) ∈ W × X with π_1(w, x) > 0, Theorem A.3.2 requires that

$$ P_{Y^*}(1 \mid 1, w, x) \le \frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} + \frac{\epsilon_w}{-U(0,0;w) - U(1,1;w)}. $$

Analogously, for all (w, x) ∈ W × X with π_0(w, x) > 0, Theorem A.3.2 requires that

$$ \frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} - \frac{\epsilon_w}{-U(0,0;w) - U(1,1;w)} \le P_{Y^*}(1 \mid 0, w, x). $$

The result is immediate after defining ϵ̃_w = ϵ_w / (−U(0,0;w) − U(1,1;w)) ≥ 0.

That is, the decision maker's choices are consistent with ϵ_w-approximate expected utility maximization if and only if the decision maker is acting as if she applies an approximate, incomplete threshold rule in selecting her choices. This means that the observed choice-dependent outcome probabilities and bounds on the unobservable choice-dependent outcome probabilities must satisfy a relaxation of the inequalities given in Theorem 1.3.1. As shown in the proof, the relaxation satisfies ϵ̃_w = ϵ_w / (−U(0,0;w) − U(1,1;w)). By further normalizing the scale of the decision maker's utility so that −U(0,0;w) − U(1,1;w) = 1, it follows that ϵ̃_w = ϵ_w. Notice that as ϵ_w grows large for each w ∈ W, the decision maker's choices are always rationalizable under ϵ_w-approximate expected utility maximization. Therefore, searching for the minimal value ϵ̃_w ≥ 0 such that the inequalities are satisfied provides a simple summary measure of how "far from optimal" the decision maker's choices are (in an expected utility sense). This minimal value is given by

$$ \tilde{\epsilon}_w = \frac{1}{2} \left( \max_{x \in \mathcal{X}^1(w)} P_{Y^*}(1 \mid 1, w, x) - \min_{x \in \mathcal{X}^0(w)} P_{Y^*}(1 \mid 0, w, x) \right)_+, $$

where (A)_+ = max{A, 0}. Alternatively, the minimal value can also be characterized as the smallest ϵ̃_w ≥ 0 that satisfies the following system of moment inequalities:

$$ P_{Y^*}(1 \mid 1, w, x) - P_{Y^*}(1 \mid 0, w, x') - 2\tilde{\epsilon}_w \le 0 $$

for all w ∈ W and (x, x′) ∈ X^1(w) × X^0(w).
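The closed form for the minimal relaxation is straightforward to compute from estimated cell probabilities; the inputs below are hypothetical.

```python
def min_epsilon(p_released_by_cell, p_detained_upper_by_cell):
    """Smallest relaxation eps_tilde >= 0 making the approximate revealed
    preference inequalities hold for a fixed w (the closed form in the
    text): 0.5 * max( max_x P(1|1,w,x) - min_x P(1|0,w,x), 0 )."""
    gap = (max(p_released_by_cell.values())
           - min(p_detained_upper_by_cell.values()))
    return 0.5 * max(gap, 0.0)

# hypothetical cells: the released rate exceeds the detained bound by 0.10
eps = min_epsilon({1: 0.10, 2: 0.40}, {1: 0.30, 2: 0.55})  # 0.05
```

A value of zero recovers exact expected utility maximization, so the size of ϵ̃_w summarizes how far the observed choices are from the incomplete threshold rule.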

Finally, the implied revealed preference inequalities over partitions of the excluded characteristics x ∈ X given in Corollary 1.3.4 can analogously be extended. In particular, for partitions D_w : X → {1, . . . , N_d}, if the decision maker's choices are consistent with ϵ_w-approximate expected utility maximization, then there exists ϵ̃_w ≥ 0 satisfying

$$ \max_{d} P_{Y^*}(1 \mid 1, w, d) - \tilde{\epsilon}_w \le \min_{d} P_{Y^*}(1 \mid 0, w, d) + \tilde{\epsilon}_w. $$

Therefore, researchers can again characterize the implied relaxation ϵ̃_w ≥ 0 over the constructed partitions D_w(X).

A.3.3 Expected Utility Maximization with Inaccurate Beliefs after Dimension Reduction

In this section, I show that the identification result for expected utility maximization behavior with inaccurate beliefs extends to coarsening the excluded characteristics. Let D_w : X → {1, . . . , N_d} be a function that partitions the observable characteristics X into level sets {x ∈ X : D_w(x) = d}. The next result follows by applying iterated expectations to Lemma A.5.3.

Proposition A.3.1. Suppose the decision maker's choices are consistent with expected utility maximization behavior at inaccurate beliefs and some utility function U ∈ U. Then, for each w ∈ W, d ∈ {1, . . . , N_d}, c ∈ {0, 1}, and c′ ≠ c,

$$ \sum_{\vec{y} \in \mathcal{Y}^{N_c}} Q_{C,\vec{Y}}(c, \vec{y} \mid w, D_w(x) = d)\, U(c, \vec{y}; w) \ge \sum_{\vec{y} \in \mathcal{Y}^{N_c}} Q_{C,\vec{Y}}(c', \vec{y} \mid w, D_w(x) = d)\, U(c', \vec{y}; w), $$

where

$$ Q_{C,\vec{Y}}(c, \vec{y} \mid w, D_w(x) = d) = \left( \sum_{x \,:\, D_w(x) = d} P_C(c \mid \vec{y}, w, x)\, Q_{\vec{Y}}(\vec{y} \mid w, x)\, P(x \mid w) \right) \Big/ P(D_w(x) = d \mid w), $$

$$ P_C(c \mid \vec{y}, w, x) = \frac{ \tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x)\, \pi_c(w, x) }{ \sum_{c'} \tilde{P}_{\vec{Y}}(\vec{y} \mid c', w, x)\, \pi_{c'}(w, x) }. $$

Provided that P_{C,⃗Y}(c, ⃗y | w, x) > 0 for all (c, ⃗y) ∈ C × Y^{N_c} and (w, x) ∈ W × X, Proposition A.3.1 can be recast as checking whether there exist non-negative weights ω(c, ⃗y; w, d) ≥ 0 satisfying

$$ \sum_{\vec{y} \in \mathcal{Y}^{N_c}} \omega(c, \vec{y}; w, d)\, P_{C,\vec{Y}}(c, \vec{y} \mid w, D_w(x) = d)\, U(c, \vec{y}; w) \ge \sum_{\vec{y} \in \mathcal{Y}^{N_c}} \omega(c', \vec{y}; w, d)\, P_{C,\vec{Y}}(c', \vec{y} \mid w, D_w(x) = d)\, U(c', \vec{y}; w) $$

and $E_P\left[ \omega(C, \vec{Y}; W, D_w(X)) \mid W = w, D_w(X) = d \right] = 1$.

I next apply this result in a screening decision with a binary outcome. In this special case,

this result may be applied to derive bounds on the decision maker’s reweighed utility threshold

through

ωp0, 0; w, dqUp0, 0; wq
PY˚ p1 | 1, w, dq ď ď PY˚ p1 | 0, w, dq, (A.2)
ωp0, 0; w, dqUp0, 0; wq ` ωp1, 1; w, dqUp1, 1; wq

where PY˚ py˚ | c, w, dq :“ PpY ˚ “ y˚ | C “ c, W “ w, Dw pXq “ dq. Next, define M “ 1tC “


ωp0,0;w,dqUp0,0;wq
0, Y ˚ “ 0u ` 1tC “ 1, Y ˚ “ 1u, τpw, dq “ ωp0,0;w,dqUp0,0;wq`ωp1,1;w,dqUp1,1;wq . Examining w P W ,

d, d1 P t1, . . . , Nd u, we arrive at
QpC“1,Y ˚ “1|M“1,w,dq{QpC“0,Y ˚ “0|M“1,w,dq
p1 ´ τpw, dqq{τpw, dq QpC“1,Y ˚ “1|M“1,w,d1 q{QpC“0,Y ˚ “0|M“1,w,d1 q
“ PpC“1,Y ˚ “1|M“1,w,dq{PpC“0,Y ˚ “0|M“1,w,dq
. (A.3)
p1 ´ τpw, d1 qq{τpw, d1 q ˚ 1 ˚ 1
PpC“1,Y “1|M“1,w,d q{PpC“0,Y “0|M“1,w,d q

By examining values in the identified set of reweighted utility thresholds defined on the coarsened

characteristic space, bounds may be constructed on a parameter that summarizes the decision

maker’s beliefs about her own “ex-post mistakes.” That is, how does the decision maker’s belief

about the relative probability of choosing C “ 0 and outcome Y ˚ “ 0 occurring vs. choosing C “ 1

and outcome Y ˚ “ 1 occurring compare to the true probability? If these bounds lie everywhere

below one, then the decision maker’s beliefs are under-reacting to variation in risk across the cells

pw, dq and pw, d1 q. If these bounds lie everywhere above one, then the decision maker’s beliefs are

over-reacting.
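As a numerical illustration of this diagnostic (function and argument names are my own): given the odds of pC “ 1, Y ˚ “ 1q versus pC “ 0, Y ˚ “ 0q within two cells d and d1, computed once under the decision maker's beliefs Q and once under the truth P, the ratio on the right-hand side of (A.3) classifies beliefs as under- or over-reacting.

```python
def reaction_diagnostic(q_odds_d, q_odds_d2, p_odds_d, p_odds_d2):
    """q_odds_*: odds of (C=1, Y*=1) vs (C=0, Y*=0) under the decision maker's
    beliefs Q in cells (w, d) and (w, d'); p_odds_*: the same odds under the
    true distribution P. Returns the ratio from (A.3) and its interpretation."""
    r = (q_odds_d / q_odds_d2) / (p_odds_d / p_odds_d2)
    if r < 1:
        return r, "under-reaction"
    if r > 1:
        return r, "over-reaction"
    return r, "calibrated"
```

In the application, the bounds on τpw, dq imply a range for this ratio rather than a point value; beliefs are classified only when the whole range lies on one side of one.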

A.3.4 The Policymaker’s First-Best Decision Rule

Consider a policymaker with social welfare function U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0 as in Section

1.6.1. I construct an algorithmic decision rule based on analyzing how the policymaker would

make choices herself in the binary screening decision. Rambachan et al. (2021) refer to this as the

“first-best problem” in their analysis of algorithmic decision rules.

Due to the missing data problem, the conditional probability of Y ˚ “ 1 given the characteristics

is partially identified and I assume the policymaker adopts a max-min evaluation criterion to

evaluate decision rules. Let p˚ pw, xq P r0, 1s denote the probability the policymaker selects C “ 1

given W “ w, X “ x. At each pw, xq P W ˆ X , the policymaker chooses p˚ pw, xq to maximize

min PrY˚ p1|w,xq t p˚ pw, xq PrY˚ p1 | w, xqU ˚ p1, 1q ` p1 ´ p˚ pw, xqqp1 ´ PrY˚ p1 | w, xqqU ˚ p0, 0q u

s.t. P̲Y˚ p1 | w, xq ď PrY˚ p1 | w, xq ď P̄Y˚ p1 | w, xq,

where P̲Y˚ p1 | w, xq and P̄Y˚ p1 | w, xq denote the lower and upper bounds on PY˚ p1 | w, xq.

Proposition A.3.2. Consider a binary screening decision and a policymaker with social welfare function

U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0, who chooses p˚ pw, xq P r0, 1s to maximize her worst-case expected utility.
Defining τ ˚ pU ˚ q :“ U ˚ p0, 0q { rU ˚ p0, 0q ` U ˚ p1, 1qs, her max-min decision rule is

p˚ pw, x; U ˚ q “ 1 if P̄Y˚ p1 | w, xq ď τ ˚ ,

p˚ pw, x; U ˚ q “ 0 if P̲Y˚ p1 | w, xq ě τ ˚ ,

p˚ pw, x; U ˚ q “ τ ˚ if P̲Y˚ p1 | w, xq ă τ ˚ ă P̄Y˚ p1 | w, xq.

Proof. To show this result, I consider cases.

Case 1: Suppose PpY ˚ “ 1 | W “ w, X “ xq ď τ ˚ . In this case,

PpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ě PpY ˚ “ 0 | W “ w, X “ xqU ˚ p0, 0q

for all PpY ˚ “ 1 | W “ w, X “ xq satisfying PpY ˚ “ 1 | W “ w, X “ xq ď PpY ˚ “ 1 | W “ w, X “

xq ď PpY ˚ “ 1 | W “ w, X “ xq. Therefore, it is optimal to set p˚ pw, xq “ 1 in this case.

Case 2: Suppose PpY ˚ “ 1 | W “ w, X “ xq ě τ ˚ . In this case,

PpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ď PpY ˚ “ 0 | W “ w, X “ xqU ˚ p0, 0q

for all PpY ˚ “ 1 | W “ w, X “ xq satisfying PpY ˚ “ 1 | W “ w, X “ xq ď PpY ˚ “ 1 | W “ w, X “

xq ď PpY ˚ “ 1 | W “ w, X “ xq. Therefore, it is optimal to set p˚ pw, xq “ 0 in this case.

Case 3: Suppose PpY ˚ “ 1 | W “ w, X “ xq ă τ ˚ ă PpY ˚ “ 1 | W “ w, X “ xq. Begin by

noticing that p˚ pw, xq “ τ ˚ delivers constant expected payoffs for all PpY ˚ “ 1 | W “ w, X “ xq

satisfying PpY ˚ “ 1 | W “ w, X “ xq ď PpY ˚ “ 1 | W “ w, X “ xq ď PpY ˚ “ 1 | W “ w, X “ xq.

As a function of PpY ˚ “ 1 | W “ w, X “ xq and p˚ pw, xq, expected social welfare equals

p˚ pw, xqPpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ` p1 ´ p˚ pw, xqqPpY ˚ “ 0 | W “ w, X “ xqU ˚ p0, 0q.

The derivative with respect to PpY ˚ “ 1 | W “ w, X “ xq equals p˚ pw, xqU ˚ p1, 1q ´ p1 ´

p˚ pw, xqqU ˚ p0, 0q, which equals zero if p˚ pw, xq “ τ ˚ . Moreover, worst-case expected social welfare at p˚ pw, xq “ τ ˚ is equal to the constant U ˚ p0, 0qU ˚ p1, 1q { rU ˚ p0, 0q ` U ˚ p1, 1qs. I show that any other choice of

p˚ pw, xq delivers strictly lower worst-case expected social welfare in this case.

Consider any p˚ pw, xq ă τ ˚ . At this choice, expected social welfare is minimized at PpY ˚ “

1 | W “ w, X “ xq. But, at PpY ˚ “ 1 | W “ w, X “ xq, the derivative of expected social welfare

with respect to p˚ pw, xq equals PpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ´ p1 ´ PpY ˚ “ 1 | W “ w, X “

xqqU ˚ p0, 0q, which is strictly positive since PpY ˚ “ 1 | W “ w, X “ xq ă τ ˚ . This implies that

p˚ pw, xqPpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ` p1 ´ p˚ pw, xqqp1 ´ PpY ˚ “ 1 | W “ w, X “ xqqU ˚ p0, 0q ă

τ ˚ PpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ` p1 ´ τ ˚ qp1 ´ PpY ˚ “ 1 | W “ w, X “ xqqU ˚ p0, 0q “


U ˚ p0, 0qU ˚ p1, 1q { rU ˚ p0, 0q ` U ˚ p1, 1qs.
Therefore, worst-case expected social welfare for any p˚ pw, xq ă τ ˚ is strictly less than worst-case

expected social welfare at p˚ pw, xq “ τ ˚ .

Consider any p˚ pw, xq ą τ ˚ . At this choice, expected social welfare is minimized at PpY ˚ “

1 | W “ w, X “ xq. But, at PpY ˚ “ 1 | W “ w, X “ xq, the derivative of expected social welfare

with respect to p˚ pw, xq equals PpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ´ p1 ´ PpY ˚ “ 1 | W “ w, X “

xqqU ˚ p0, 0q, which is strictly negative since PpY ˚ “ 1 | W “ w, X “ xq ą τ ˚ . This implies that

p˚ pw, xqPpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ` p1 ´ p˚ pw, xqqp1 ´ PpY ˚ “ 1 | W “ w, X “ xqqU ˚ p0, 0q ă

τ ˚ PpY ˚ “ 1 | W “ w, X “ xqU ˚ p1, 1q ` p1 ´ τ ˚ qp1 ´ PpY ˚ “ 1 | W “ w, X “ xqqU ˚ p0, 0q “


U ˚ p0, 0qU ˚ p1, 1q { rU ˚ p0, 0q ` U ˚ p1, 1qs.
Therefore, worst-case expected social welfare for any p˚ pw, xq ą τ ˚ is strictly less than worst-case

expected social welfare at p˚ pw, xq “ τ ˚ .

The policymaker makes choices based on a threshold rule, where the threshold τ ˚ depends on the

relative costs to ex-post errors assigned by the social welfare function. If the upper bound on the

probability of Y ˚ “ 1 conditional on the characteristics is sufficiently low, then the policymaker

chooses C “ 1 with probability one. If the lower bound on the probability of Y ˚ “ 1 is sufficiently

high, then the policymaker chooses C “ 0 with probability one. Otherwise, if the identified set for

PpY ˚ “ 1 | W “ w, X “ xq contains the threshold τ ˚ , the policymaker randomizes her decision and

selects C “ 1 with probability exactly equal to τ ˚ .
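The threshold rule in Proposition A.3.2 is straightforward to compute once bounds on PpY ˚ “ 1 | W “ w, X “ xq are in hand. A minimal sketch of that rule (function and argument names are my own):

```python
def first_best_rule(p_lower, p_upper, U00, U11):
    """Max-min choice probability p*(w, x; U*) from Proposition A.3.2, given
    lower/upper bounds on P(Y* = 1 | W = w, X = x) and social welfare terms
    U00 = U*(0,0) < 0 and U11 = U*(1,1) < 0."""
    tau = U00 / (U00 + U11)  # tau* lies in (0, 1) since both payoffs are negative
    if p_upper <= tau:
        return 1.0   # even the worst-case risk is low enough to choose C = 1
    if p_lower >= tau:
        return 0.0   # even the best-case risk is too high
    return tau       # identified set straddles tau*: randomize at tau*
```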

In my empirical analysis in Section 1.6.2, I evaluate the choices of judges against this first-

best decision rule applied to each cell of payoff relevant characteristics W and each decile of

predicted risk Dw pXq. The bounds on the probability of Y ˚ “ 1 conditional on the characteristics

is constructed using the quasi-random assignment of judges as discussed in Section 1.5.3, and the

threshold τ ˚ varies as the social welfare function U ˚ p0, 0q, U ˚ p1, 1q varies.

A.4 Additional Results for the Econometric Framework

In this section of the appendix, I provide additional results that are useful for implementing the

econometric framework on testing for and characterizing systematic prediction mistakes.

A.4.1 Constructing Bounds on the Missing Data through a Quasi-Random Instrument

In this section, I modify Assumption 1.3.1 to only impose that the instrument be quasi-randomly

assigned conditional on some additional characteristics t P T with finite support. The joint

distribution pW, X, T, Z, C, Y ˚ q „ P satisfies

pW, X, Y ˚ q KK Z | T (A.4)

and Ppw, x, t, zq ą 0 for all pw, x, t, zq P W ˆ X ˆ T ˆ Z . This is useful because, in my empirical

application to pretrial release decisions in New York City, bail judges are quasi-randomly assigned

to cases within a court-by-time cell. See Section 1.5 for details.

Under (A.4), researchers can derive bounds on the unobservable choice-dependent outcome

probabilities in a screening decision with a binary latent outcome. By iterated expectations,

PY˚ p1 | w, x, zq “ ř tPT PY˚ p1 | w, x, z, tqPpt | w, x, zq “ ř tPT PY˚ p1 | w, x, z̃, tqPpt | w, x, zq,

where the last equality follows by quasi-random assignment. Furthermore, for each value of t P T

and z P Z , PY˚ p1|w, x, z, tq is bounded by

PC,Y˚ p1, 1 | w, x, z, tq ď PY˚ p1 | w, x, z, tq ď π0 pw, x, z, tq ` PC,Y˚ p1, 1 | w, x, z, tq.

Therefore, for a given z P Z , valid lower and upper bounds on PY˚ p1 | w, x, zq are given by

E rPC,Y˚ p1, 1 | W, X, z̃, Tq | W “ w, X “ x, Z “ zs ď PY˚ p1 | w, x, zq,

PY˚ p1 | w, x, zq ď E rPC,Y˚ p1, 1 | W, X, z̃, Tq ` π0 pW, X, z̃, Tq | W “ w, X “ x, Z “ zs

for any z̃ P Z . Since PC,Y˚ p1, 1 | w, x, zq is observed, this naturally implies bounds on PC,Y˚ p0, 1 |

w, x, zq. This in turn gives a bound on PY˚ p1 | 0, w, x, zq since πc pw, x, zq is also observed (assuming

πc pw, x, zq ą 0).
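The displayed bounds average the cell-level quantities over the distribution of T. A minimal sketch of that computation (dictionary layout and names are my own, not from the text): p11 and pi0 hold the observed PC,Y˚ p1, 1 | w, x, z̃, tq and π0 pw, x, z̃, tq for one pw, xq cell, and pt holds Ppt | w, x, zq for the target instrument value z.

```python
def outcome_prob_bounds(p11, pi0, pt, z_tilde):
    """Lower/upper bounds on P(Y* = 1 | w, x, z) implied by instrument value
    z_tilde, averaging over the conditioning characteristics T.
    p11[(z, t)] = P(C=1, Y*=1 | w, x, z, t); pi0[(z, t)] = P(C=0 | w, x, z, t);
    pt[t] = P(T=t | w, x, z)."""
    lower = sum(pt[t] * p11[(z_tilde, t)] for t in pt)
    upper = sum(pt[t] * (p11[(z_tilde, t)] + pi0[(z_tilde, t)]) for t in pt)
    return lower, upper
```

Since the display holds for any z̃ P Z, intersecting these bounds across z̃ tightens them.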

A.4.2 Testing Expected Utility Maximization Behavior in Treatment Decisions

In this section, I extend the econometric framework for analyzing screening decisions in Section

1.3 to treatment decisions. First, I discuss how the researcher may construct bounds on the

unobservable choice-dependent potential outcome probabilities using an instrument. Second, I

discuss how testing the revealed preference inequalities in these settings reduces to testing many

moment inequalities with nuisance parameters that enter linearly.

Constructing Bounds with an Instrument in Treatment Decisions

As in the main text, let Z P Z be a finite support instrument. Assume that the joint distribution

pW, X, Z, C, ⃗Yq „ P satisfies pW, X, ⃗Yq KK Z and PpW “ w, X “ x, Z “ zq ą 0 for all pw, x, zq P W ˆ X ˆ Z . In this case, the conditional joint distribution pC, ⃗Yq | W, X, Z is partially identified
and the next result provides bounds on this quantity.

Proposition A.4.1. Consider any pw, x, zq P W ˆ X ˆ Z . If PrC,⃗Y p¨, ¨ | w, x, zq P H P pPC,⃗Y p¨, ¨ | w, x, zqq,

then it satisfies

(i) For all y P Y ,

ř y0 PY PrC,⃗Y p1, y0 , y | w, x, zq “ PC,Y1 p1, y | w, x, zq and ř y1 PY PrC,⃗Y p0, y, y1 | w, x, zq “ PC,Y0 p0, y | w, x, zq.

(ii) For all z̃ P Z , ⃗y P Y ˆ Y ,

0 ď PrC,⃗Y p1, ⃗y | w, x, zq ` PrC,⃗Y p0, ⃗y | w, x, zq ď PC,Y1 p1, y1 | w, x, z̃q ` PC,Y0 p0, y0 | w, x, z̃q.

Proof. Consider a particular value pw, x, zq P W ˆ X ˆ Z . Notice that P⃗Y p ¨ | w, x, zq “ P⃗Y p ¨ | w, x, z̃q

by random assignment of the instrument. Furthermore, notice that P⃗Y p⃗y | w, x, zq “ PC,⃗Y p0, ⃗y |

w, x, zq ` PC,⃗Y p1, ⃗y | w, x, zq, where

0 ď PC,⃗Y p0, ⃗y | w, x, zq ď PC,Y0 p0, y0 | w, x, zq,

0 ď PC,⃗Y p1, ⃗y | w, x, zq ď PC,Y1 p1, y1 | w, x, zq.

Therefore, we observe that P⃗Y p⃗y | w, x, zq must be bounded below by 0 (trivially) and bounded

above by PC,Y0 p0, y0 | w, x, zq ` PC,Y1 p1, y1 | w, x, zq for each z P Z . The result is then immediate by

also requiring data consistency.

These simple bounds are non-sharp, but they can be tightened by using Artstein's Theorem.

Results in Russell (2019) and Kitagawa (2020) characterize the sharp identified set of potential

outcomes in treatment assignment problems and imply sharp bounds on P⃗Y p ¨ | w, x, zq. We can

replace the non-sharp bounds in (ii) in Proposition A.4.1 with these sharp bounds to obtain tight

bounds on the conditional joint distribution pC, ⃗Yq | W, X, Z. The number of inequalities in these sharp bounds grows exponentially in the support of the potential outcomes and equals 2^|Y ˆY | .

Testing for Prediction Mistakes Reduces to Testing Many Moment Inequalities with Linear

Nuisance Parameters

Suppose the researcher wishes to test whether the decision maker’s choices are consistent with

expected utility maximization behavior at some utility function U P U in a general treatment decision, where recall that U is the set of feasible utility functions specified by the researcher. Denote this null hypothesis by H0 pU q. As a stepping stone, I provide a reduction to show how

the researcher may test whether the decision maker’s choices are consistent with expected utility

maximization behavior at a particular utility function U P U . Denote this particular null hypothesis

as H0 pUq. As discussed in Bugni et al. (2015), the researcher may construct a conservative test of

H0 pU q by constructing a confidence interval for the identified set of utility functions through test

inversion on H0 pUq and checking whether this confidence interval is empty.

With this in mind, consider a fixed utility function U P U and suppose the researcher constructs

bounds on the unobservable choice-dependent outcome probabilities using a randomly assigned

instrument. Testing H0 pUq is equivalent to testing a possibly high-dimensional set of moment

inequalities with linear nuisance parameters. I will prove this result for the non-sharp bounds

stated in Proposition A.4.1, but the result extends to using sharp bounds based on Artstein's Inequality.

Proposition A.4.2. Let Ny :“ |Y |. Assume there is a randomly assigned instrument. The decision

maker’s choices at z P Z are consistent with expected utility maximization behavior at utility function

U : t0, 1u ˆ Y ˆ Y ˆ W Ñ R if there exists δ̃ P R2dw dx Ny pNy ´1q satisfying

ÃpUqp¨,1:mq µpPq ` ÃpUqp¨,´p1:mqq δ̃ ď b,

where ÃpUq is a matrix of known constants that depend on the specified utility function U, b is a vector of

known constants, µpPq is a m :“ 2dw d x Ny ` dw d x Ny2 pNz ´ 1q dimensional vector that collects together

the observable moments and bounds based on the instrument.2

2 For a matrix B, the notation Bp¨,1:mq denotes the submatrix containing the first m columns of B and Bp¨,´1:mq

Proof. Applying the non-sharp bounds in Proposition A.4.1, I begin by restating Lemma A.5.1 in terms of the nuisance parameters PrC,⃗Y p ¨ | w, x, zq P ∆pC ˆ Y ˆ Y q, stacked as the vector

PrC,⃗Y p ¨ | w, x, zq “ pPrC,⃗Y pc1 , ⃗y1 | w, x, zq, . . . , PrC,⃗Y pc1 , ⃗y Ny2 | w, x, zq, . . . , PrC,⃗Y pc Nc , ⃗y1 | w, x, zq, . . . , PrC,⃗Y pc Nc , ⃗y Ny2 | w, x, zqq⊺ .

Lemma A.4.1. The decision maker’s choices at z P Z are consistent with expected utility maximization

behavior at utility function U if for all pw, xq P W ˆ X there exists PrC,⃗Y p ¨ | w, x, zq P ∆pC ˆ Y ˆ Y q

satisfying

i. For all c P t0, 1u, c̃ ‰ c,

ÿ ÿ
PrC,⃗Y pc, ⃗y | w, x, zqUpc, ⃗y; w, zq ě PrC,⃗Y pc, ⃗y | w, x, zqUpc̃, ⃗y; w, zq.
⃗y ⃗y

ii. For all y P Y , ř y1 PrC,⃗Y p0, y, y1 | w, x, zq “ PC,Y0 p0, y | w, x, zq and ř y0 PrC,⃗Y p1, y0 , y | w, x, zq “ PC,Y1 p1, y | w, x, zq.

iii. For all z̃ P Z , ⃗y P Y ˆ Y ,

0 ď PrC,⃗Y p1, ⃗y | w, x, zq ` PrC,⃗Y p0, ⃗y | w, x, zq ď PC,Y1 p1, y1 | w, x, z̃q ` PC,Y0 p0, y0 | w, x, z̃q.

For each c P t0, 1u and c̃ ‰ c, define the 1 ˆ Ny2 dimensional row vector Acw,x,z pUq as

Acw,x,z pUq “ pUpc̃, ⃗y1 ; w, zq ´ Upc, ⃗y1 ; w, zq, . . . , Upc̃, ⃗y Ny2 ; w, zq ´ Upc, ⃗y Ny2 ; w, zqq.

For each pw, xq P W ˆ X , define the 2 ˆ 2Ny2 dimensional block diagonal matrix Aw,x,z pUq as

Aw,x,z pUq “ diagpA0w,x,z pUq, A1w,x,z pUqq.

denotes the submatrix containing all columns except the first m of B.

Define the 2dw d x ˆ 2dw d x Ny2 dimensional block diagonal matrix Az pUq as

Az pUq “ diagpAw1 ,x1 ,z pUq, Aw1 ,x2 ,z pUq, . . . , Awdw ,xdx ,z pUqq.
Letting PrC,⃗Y p ¨ | zq “ pPrC,⃗Y p ¨ | w1 , x1 , zq, . . . , PrC,⃗Y p ¨ | wdw , xdx , zqq, the revealed preference constraints in (i) of Lemma A.4.1 can be rewritten as

Az pUq PrC,⃗Y p ¨ | zq ď 0,

where PrC,⃗Y p ¨ | zq is a 2dw d x Ny2 ˆ 1 dimensional vector.

We may construct the 2dw d x Ny ˆ 2dw d x Ny2 dimensional matrix Bz,eq that forms the data consistency conditions in (ii) of Lemma A.4.1. For each z̃ P Z , we may construct the dw d x Ny2 ˆ 2dw d x Ny2 matrices Bz,z̃ that form the upper bounds in (iii) of Lemma A.4.1. Stack these together to form Bz . Finally, define the dw d x ˆ 2dw d x Ny2 matrix Dz,eq that imposes that PrC,⃗Y p ¨ | w, x, zq sums to one and the 2dw d x Ny2 ˆ 2dw d x Ny2 matrix Dz,` that imposes that each element of PrC,⃗Y p ¨ | w, x, zq is non-negative. We next introduce non-negative slack parameters associated with each inequality constraint in (ii)-(iii), and rewrite (ii)-(iii) in Lemma A.4.1 as


B̄z δ “ µpPq, where B̄z :“ rBz,eq , 0 ; Bz , Is stacks the equality conditions in (ii) on top of the inequality conditions in (iii) with slacks appended, δ :“ pPrC,⃗Y p ¨ | zq⊺ , s⊺z q⊺ collects the nuisance parameters and the slack parameters sz , µpPq is the vector that collects together the observable data and the bounds, and δ is a 2dw d x Ny2 ` dw d x Ny2 pNz ´ 1q dimensional vector.

Therefore, we observe that Lemma A.4.1 implies that the decision maker’s choices at z P Z are

consistent with expected utility maximization behavior at utility function U if and only if there

exists vector δ satisfying
Ãz pUqδ ď b and B̄z δ “ µpPq, where

Ãz pUq :“ rAz pUq, 0 ; Dz,eq , 0 ; ´Dz,eq , 0 ; Dz,` , 0 ; 0, Dz,` s and b :“ p0⊺ , 1⊺ , ´1⊺ , 0⊺ , 0⊺ q⊺ .

The matrix B̄z has full row rank, nrowpB̄z q “ 2dw d x Ny ` dw d x Ny2 pNz ´ 1q and ncolpB̄z q “ 2dw d x Ny2 ` dw d x Ny2 pNz ´ 1q, so nrowpB̄z q ď ncolpB̄z q. Therefore, we may define Hz “ rB̄z ; Γz s to be the full-rank, square matrix that pads B̄z with linearly independent rows Γz . Then,

Ãz pUqδ “ Ãz pUqHz´1 Hz δ “ Ãz pUqHz´1 pµpPq⊺ , δ̃⊺ q⊺ ,

where δ̃ :“ Γz δ is a 2dw d x Ny pNy ´ 1q dimensional vector. This completes the proof, with the slight abuse of notation of also defining ÃpUq :“ Ãz pUqHz´1 .

Proposition A.4.2 showed that testing whether the decision maker’s choices are consistent with

expected utility maximization behavior at a candidate utility function may be reduced to testing many moment inequalities with linear nuisance parameters. This same testing problem may be

also reduced to testing whether there exists a non-negative solution to a large, linear system of

equations, which was recently studied in Kitamura and Stoye (2018) and Fang et al. (2020).
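In practice, the feasibility question behind Proposition A.4.2 is handed to a linear programming solver. For intuition only, the sketch below treats the special case of a single scalar nuisance parameter, where feasibility of tδ : ai δ ď bi for all iu reduces to intersecting half-lines; this is my own one-dimensional illustration, not the inference procedures developed in the cited papers.

```python
def feasible_scalar_nuisance(a, b):
    """Is there a scalar delta with a[i] * delta <= b[i] for every i?"""
    lo, hi = float("-inf"), float("inf")
    for ai, bi in zip(a, b):
        if ai > 0:
            hi = min(hi, bi / ai)   # constraint reads delta <= bi / ai
        elif ai < 0:
            lo = max(lo, bi / ai)   # constraint reads delta >= bi / ai
        elif bi < 0:
            return False            # 0 * delta <= bi is violated
    return lo <= hi
```

With a vector-valued nuisance parameter the same question is a generic linear feasibility problem, which is why the reduction to linear systems studied in the cited papers is useful.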

For each w P W , define Dw : X Ñ t1, . . . , Nd u to be some function that partitions the support

of the characteristics x P X into the level sets tx : Dw pxq “ du. Through an application of iterated

expectations, if the decision maker’s choices are consistent with expected utility maximization

behavior at some utility function U, then their choices must satisfy implied revealed preference

inequalities.

Corollary A.4.1. Suppose the decision maker’s choices are consistent with expected utility maximization

behavior at some utility function U. Then, for all w P W and d P t1, . . . , Nd u,
EQ rUpc, ⃗Y; Wq | C “ c, W “ w, Dw pXq “ ds ě EQ rUpc1 , ⃗Y; Wq | C “ c, W “ w, Dw pXq “ ds,

for all c P t0, 1u, pw, dq P W ˆ t1, . . . , Nd u with πc pw, dq :“ PpC “ c | W “ w, Dw pXq “ dq ą 0 and

c1 ‰ c, where EQ r¨s is the expectation under Q and for some Pr⃗Y p ¨ | 0, w, xq P B0,w,x , PrY˚ p ¨ | 1, w, xq P

B1,w,x such that


QpW “ w, Dw pXq “ d, C “ c, ⃗Y “ ⃗yq “ ř x : Dw pxq“d Pr⃗Y p⃗y | c, w, xqPpc, w, xq.

Therefore, in treatment decisions, researchers may test a lower dimensional set of moment

inequalities with linear nuisance parameters that characterize the set of implied revealed preference

inequalities. This is useful as recent work develops computationally tractable and powerful inference procedures for lower-dimensional moment inequalities with linear nuisance parameters, such as

Andrews et al. (2019), Cox and Shi (2020) and Rambachan and Roth (2020).

A.4.3 Expected Social Welfare Under the Decision Maker’s Observed Choices

Consider a policymaker with social welfare function U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0 as in Section 1.6.1.

Total expected social welfare under the decision maker’s observed choices is given by

θ DM pU ˚ q “ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ` U ˚ p0, 0qPpC “ 0q ´ U ˚ p0, 0q ř pw,xqPW ˆX PC,Y˚ p0, 1 | w, xqPpw, xq.

Since PC,Y˚ p0, 1 | w, xq is partially identified, total expected social welfare under the decision

maker’s observed choices is also partially identified and the sharp identified set is an interval.

Proposition A.4.3. Consider a screening decision with a binary outcome and a policymaker with social

welfare function U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0. The sharp identified set of total expected social welfare under

the decision maker’s observed choices, denoted by H P pθ DM pU ˚ q; B q, is an interval with H P pθ DM pU ˚ q; B q “


rθ̲ DM pU ˚ q, θ̄ DM pU ˚ qs, where

θ̲ DM pU ˚ q “ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ` U ˚ p0, 0qPpC “ 0q ´ U ˚ p0, 0qP̲pC “ 0, Y ˚ “ 1q,

θ̄ DM pU ˚ q “ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ` U ˚ p0, 0qPpC “ 0q ´ U ˚ p0, 0qP̄pC “ 0, Y ˚ “ 1q,

where

P̄pC “ 0, Y ˚ “ 1q “ max ř pw,xqPW ˆX Ppw, xq PrC,Y˚ p0, 1 | w, xq subject to PrC,Y˚ p0, 1 | w, xq P H P pPC,Y˚ p0, 1 | w, xq; B0,w,x q for all pw, xq P W ˆ X , with the maximum taken over the collection tPrC,Y˚ p0, 1 | w, xq : pw, xq P W ˆ X u,

and P̲pC “ 0, Y ˚ “ 1q is the optimal value of the analogous minimization problem.

The sharp identified set of the conditional probability of C “ 0, Y ˚ “ 1 given the characteristics

is an interval, and therefore the sharp identified set of total expected social welfare under the

decision maker’s observed choices can be characterized as the solution to two linear programs.

Provided the joint distribution of the characteristics pW, Xq is known, testing the null

hypothesis that total expected social welfare is equal to some candidate value is equivalent to

testing a system of moment inequalities with a large number of nuisance parameters that enter

the moments linearly. A confidence interval for total expected social welfare under the decision

maker’s observed choices can be constructed through test inversion.
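Because the welfare objective depends on the nuisance probabilities only through a weighted sum with positive weights ´U ˚ p0, 0qPpw, xq, the two linear programs in Proposition A.4.3 are solved by plugging in the cell-level bounds endpoint by endpoint. A minimal sketch of that computation (names are my own):

```python
def welfare_bounds(U00, U11, p_c1y1, p_c0, pwx, lb, ub):
    """Bounds on theta^DM(U*) = U*(1,1) P(C=1,Y*=1) + U*(0,0) P(C=0)
    - U*(0,0) * sum_x P(w,x) * P~(0,1 | w,x), where each nuisance probability
    P~(0,1 | w,x) lies in [lb[i], ub[i]] and pwx[i] = P(w, x)."""
    base = U11 * p_c1y1 + U00 * p_c0
    s_min = sum(p * l for p, l in zip(pwx, lb))
    s_max = sum(p * u for p, u in zip(pwx, ub))
    # -U*(0,0) > 0, so welfare is increasing in the weighted sum: the box
    # constraints bind at their endpoints and no LP solver is needed here.
    return base - U00 * s_min, base - U00 * s_max
```

Inference on these bounds still requires the moment inequality machinery of Proposition A.4.4, since the population probabilities are estimated.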

Proposition A.4.4. Consider a binary screening decision and a policymaker with social welfare function

U ˚ p0, 0q ă 0, U ˚ p1, 1q ă 0. Conditional on the characteristics pW, Xq, testing the null hypothesis

H0 : θ DM pU ˚ q “ θ0 is equivalent to testing whether


Dδ P Rdw dx ´1 s.t. ÃDM p¨,1q pθ0 ´ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ´ U ˚ p0, 0qPpC “ 0qq ` ÃDM p¨,´1q δ ď p´P̲C,Y˚ p0, 1q⊺ , P̄C,Y˚ p0, 1q⊺ q⊺ ,

where P̲C,Y˚ p0, 1q and P̄C,Y˚ p0, 1q are the dw d x -dimensional vectors of lower and upper bounds on PC,Y˚ p0, 1 | w, xq respectively, and ÃDM is a known matrix.

r “ 0, Y ˚ “ 1 | W “ w, X “ xq and let PrC,Y˚ p0, 1q


Proof. As notation, let PrC,Y˚ p0, 1 | w, xq :“ PpC

denote the dw d x dimensional vector with entries equal to PrC,Y˚ p0, 1 | w, xq. From the definition of

θ DM pU ˚ q, the null hypothesis H0 : θ DM pU ˚ q “ θ0 is equivalent to the null hypothesis that there

exists PrC,Y˚ p0, 1q satisfying

ÿ
´U ˚ p0, 0q PrC,Y˚ p0, 1 | w, xqPpW “ w, X “ xq “
pw,xqPW ˆX

θ0 ´ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ´ U ˚ p0, 0qPpC “ 0q

and for each pw, xq P W ˆ X

PpC “ 0, Y ˚ “ 1 | W “ w, X “ xq ď PrC,Y˚ p0, 1 | w, xq ď PpC “ 0, Y ˚ “ 1 | W “ w, X “ xq.


We can express these bounds in the form A PrC,Y˚ p0, 1q ď p´P̲p0, 1q⊺ , P̄p0, 1q⊺ q⊺ , where A “ r´I ; Is is a known matrix. Therefore, defining ℓpU ˚ q to be the dw d x dimensional vector with entries ´U ˚ p0, 0qPpW “

w, X “ xq, we observe that the null hypothesis H0 : θ DM pU ˚ q “ θ0 is equivalent to the null

hypothesis

D PrC,Y˚ p0, 1q satisfying ℓ⊺ pU ˚ q PrC,Y˚ p0, 1q “ θ0 ´ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ´ U ˚ p0, 0qPpC “ 0q and
A PrC,Y˚ p0, 1q ď p´P̲C,Y˚ p0, 1q⊺ , P̄C,Y˚ p0, 1q⊺ q⊺ .

Next, we apply a change of basis argument. Define the full rank matrix Γ, whose first row is equal

to ℓ⊺ pU ˚ q. Then, the null hypothesis H0 : θ DM pU ˚ q “ θ0 can be further rewritten as


D PrC,Y˚ p0, 1q satisfying AΓ´1 Γ PrC,Y˚ p0, 1q ď p´P̲C,Y˚ p0, 1q⊺ , P̄C,Y˚ p0, 1q⊺ q⊺ ,

where Γ PrC,Y˚ p0, 1q “ pΓp1,¨q PrC,Y˚ p0, 1q, Γp´1,¨q PrC,Y˚ p0, 1q⊺ q⊺ “ pθ0 ´ U ˚ p1, 1qPpC “ 1, Y ˚ “ 1q ´ U ˚ p0, 0qPpC “ 0q, δ⊺ q⊺ , defining δ “ Γp´1,¨q PrC,Y˚ p0, 1q and Ã “ AΓ´1 .

A.5 Proofs of Main Results

Proof of Theorem 1.2.1

I prove the following Lemma, and then show that it implies Theorem 1.2.1.

Lemma A.5.1. The decision maker’s choices are consistent with expected utility maximization behavior if

and only if there exists a utility function U P U , Pr⃗Y p ¨ | 0, w, xq P B0,w,x and Pr⃗Y p ¨ | 1, w, xq P B1,w,x that

satisfies for all pw, xq P W ˆ X , c P t0, 1u, c1 ‰ c,

ř ⃗yPY ˆY Pr⃗Y p⃗y | c, w, xqPC pc | w, xqUpc, ⃗y; wq ě ř ⃗yPY ˆY Pr⃗Y p⃗y | c, w, xqPC pc | w, xqUpc1 , ⃗y; wq.

Proof of Lemma A.5.1: Necessity Suppose that the decision maker’s choices are consistent

with expected utility maximization behavior at some utility function U and joint distribution

pW, X, V, C, ⃗
Yq „ Q.

First, I show that if the decision maker’s choices are consistent with expected utility maxi-

mization behavior at some utility function U, joint distribution pW, X, V, C, ⃗


Yq „ Q and private

information with support V , then her choices are also consistent with expected utility maximiza-

tion behavior at some finite support private information. Partition the original signal space V

into the subsets Vt0u , Vt1u , Vt0,1u , which collect together the signals v P V at which the decision

maker strictly prefers C “ 0, strictly prefers C “ 1 and is indifferent between C “ 0, C “ 1

respectively. Define the finite support signal space V


r “ tvt0u , vt1u , vt0,1u u and the finite support

rPV
private information V r as

Qp
r Vr “ vt0u | ⃗
Y “ ⃗y, W “ w, X “ xq “ QpV P Vt0u | ⃗
Y “ ⃗y, W “ w, X “ xq

Qp
r Vr “ vt1u | ⃗
Y “ ⃗y, W “ w, X “ xq “ QpV P Vt1u | ⃗
Y “ ⃗y, W “ w, X “ xq

Qp
r Vr “ vt0,1u | ⃗
Y “ ⃗y, W “ w, X “ xq “ QpV P Vt0,1u | ⃗
Y “ ⃗y, W “ w, X “ xq.

Define QpC
r “0|V
r “ vt0u , W “ w, X “ xq “ 1, QpC
r “1|V
r “ vt1u , W “ w, X “ xq “ 1 and

QpC “ 1, V P Vt0,1u | W “ w, X “ xq
QpC
r “1|V
r “ vt0,1u , W “ w, X “ xq “ .
QpV P Vt0,1u | W “ w, X “ xq

Define the finite support expected utility representation for the decision maker by the utility
r C, ⃗
function U and the random vector pW, X, V, r where Qpw,
Yq „ Q, r x, v, c, ⃗yq “ Qpw, x, ⃗yqQp
r vr |

w, x, ⃗yqQpc
r | w, x, vrq. The information set and expected utility maximization conditions are satisfied

by construction. Data consistency is satisfied since it is satisfied at the original private information

V P V . To see this, notice that for all pw, x, ⃗yq P W ˆ X ˆ Y ˆ Y

PpC “ 1, ⃗Y “ ⃗y | W “ w, X “ xq “ QpC “ 1, V P V , ⃗Y “ ⃗y | W “ w, X “ xq

“ QpC “ 1, V P Vt1u , ⃗Y “ ⃗y | W “ w, X “ xq ` QpC “ 1, V P Vt0,1u , ⃗Y “ ⃗y | W “ w, X “ xq

“ Q̃pC “ 1, Ṽ “ vt1u , ⃗Y “ ⃗y | W “ w, X “ xq ` Q̃pC “ 1, Ṽ “ vt0,1u , ⃗Y “ ⃗y | W “ w, X “ xq

“ ř ṽPṼ Q̃pC “ 1, Ṽ “ ṽ, ⃗Y “ ⃗y | W “ w, X “ xq “ Q̃pC “ 1, ⃗Y “ ⃗y | W “ w, X “ xq.

The same argument applies to PpC “ 0, ⃗


Y “ ⃗y | W “ w, X “ xq. Therefore, for the remainder of

the necessity proof, it is without loss of generality to assume the private information V P V has

finite support.

I next show that if there exists an expected utility representation for the decision maker’s

choices, then the stated inequalities in Lemma A.5.1 are satisfied by adapting the necessity

argument given the “no-improving action switches inequalities” in Caplin and Martin (2015).

Suppose that the decision maker’s choices are consistent with expected utility maximization

behavior at some utility function U and joint distribution pW, X, V, C, ⃗


Yq „ Q. Then, for each

c P t0, 1u, pw, x, vq P W ˆ X ˆ V


¨ ˛ ¨ ˛
ÿ ÿ
QC pc | w, x, vq ˝ Q⃗Y p⃗y | w, x, vqUpc, ⃗y; wq‚ ě QC pc | w, x, vq ˝ Q⃗Y p⃗y | w, x, vqUpc1 , ⃗y; wq‚
⃗yPY ˆY ⃗yPY ˆY

holds for all c ‰ c1 . If QC pc | w, x, vq “ 0, this holds trivially. If QC pc | w, x, vq ą 0, this holds

through the expected utility maximization condition. Multiply both sides by QV pv | w, xq to arrive

at ¨ ˛
ÿ
QC pc | w, x, vqQV pv | w, xq ˝ Q⃗Y p⃗y | w, x, vqUpc, ⃗y; wq‚ ě
⃗yPY ˆY
¨ ˛
ÿ
QC pc | w, x, vqQV pv | w, xq ˝ Q⃗Y p⃗y | w, x, vqUpc1 , ⃗y; wq‚.
⃗yPY ˆY

Next, use information set to write QC,⃗Y pc, ⃗y | w, x, vq “ Q⃗Y p⃗y | w, x, vqQC pc | w, x, vq and arrive at
¨ ˛
ÿ
QV pv | w, xq ˝ QC,⃗Y pc, ⃗y | w, x, vqUpc, ⃗y; w, xq‚ ě
⃗yPY ˆY

164
¨ ˛
ÿ
QV pv | w, xq ˝ QC,⃗Y pc, ⃗y | w, x, vqUpc1 , ⃗y; wq‚.
⃗yPY ˆY

Finally, we use QC,⃗Y pc, ⃗y, v | w, xq “ QC,⃗Y pc, ⃗y | w, x, vqQV pv | w, xq and then further sum over

v P V to arrive at
˜ ¸ ˜ ¸
ÿ ÿ ÿ ÿ
QV,C,⃗Y pv, c, ⃗y | w, xq Upc, ⃗y; wq ě QV,C,⃗Y pv, c, ⃗y | w, xq Upc1 , ⃗y; wq
⃗yPY ˆY vPV ⃗yPY ˆY vPV
ÿ ÿ
QC,⃗Y pc, ⃗y | w, xqUpc, ⃗y; wq ě QC,⃗Y pc, ⃗y | w, xqUpc1 , ⃗y; wq.
⃗yPY ˆY ⃗yPY ˆY

The inequalities in Lemma A.5.1 then follow from an application of data consistency.

Proof of Lemma A.5.1: Sufficiency To establish sufficiency, I show that if the conditions in

Lemma A.5.1 holds, then private information v P V can be constructed that recommends choices

c P t0, 1u and an expected utility maximizer would find it optimal to follow these recommendations

as in the sufficiency argument in Caplin and Martin (2015) for the “no-improving action switches”

inequalities.

Towards this, suppose that the conditions in Lemma A.5.1 are satisfied at some Pr⃗Y p ¨ | c, w, xq P

Bc,w,x for all c P t0, 1u, pw, xq P W ˆ X . As notation, let v P V :“ t1, 2, 3u index the non-empty subsets of t0, 1u, namely tt0u, t1u, t0, 1uu.

For each pw, xq P W ˆ X , define Cw,x :“ tc : πc pw, xq ą 0u Ď C to be the set of choices

selected with positive probability, and partition Cw,x into subsets that have identical choice-

dependent potential outcome probabilities. There are V̄w,x ď |Cw,x | such subsets. Each subset

of this partition of Cw,x is a subset in the power set 2t0,1u , and so I associate each subset in this

partition with its associated index v P V . Denote these associated indices by the set Vw,x . Denote

the choice-dependent potential outcome probability associated with the subset labelled v by

P⃗Y p ¨ | v, w, xq P ∆pY ˆ Y q. Finally, define Q⃗Y p⃗y | w, xq “ ř cPt0,1u Pr⃗Y p⃗y | c, w, xqπc pw, xq.

Define the random variable V P V according to

QV pv | w, xq “ ř c : P⃗Y p ¨|c,w,xq“P⃗Y p ¨|v,w,xq πc pw, xq if v P Vw,x , and

QV pv | ⃗y, w, xq “ P⃗Y p⃗y | v, w, xqQV pv | w, xq { Q⃗Y p⃗y | w, xq if v P Vw,x and Q⃗Y p⃗y | w, xq ą 0, with QV pv | ⃗y, w, xq “ 0 otherwise.
%0 otherwise.

Next, define the random variable C P C according to


$ ¨ ˛

’ L ÿ
&πc pw, xq


’ ˝ πc̃ pw, xq‚ if v P Vw,x and P⃗Y p ¨ | c, w, xq “
QC pc | v, w, xq “ c̃ : P⃗Y p ¨|c̃,w,xq“P⃗Y p ¨|v,w,xq

’ P⃗Y p ¨ | v, w, xq



%0 otherwise.

Together, this defines the random vector pW, X, ⃗


Y, V, Cq „ Q with the joint distribution

Qpw, x, ⃗y, v, cq “ PW,X pw, xqQ⃗Y p⃗y | w, xqQV pV “ v | ⃗y, w, xqQC pc | v, w, xq.

We now check that this construction satisfies information set, expected utility maximization and

data consistency. First, information set is satisfied since QC,⃗Y pc, ⃗y | w, x, vq “ Q⃗Y p⃗y | w, x, vqQC pc |

w, x, vq by construction. Next, for any pw, xq P W ˆ X and c P Cw,x , define vc,w,x P Vw,x to be the

label satisfying P⃗Y p ¨ | c, w, xq “ P⃗Y p ¨ | v, w, xq. For PC,⃗Y pc, ⃗y | w, xq ą 0, observe that

PC,⃗Y pc, ⃗y | w, xq “ Pr⃗Y p⃗y | c, w, xqPC pc | w, xq

“ Q⃗Y p⃗y | w, xq ¨ rP⃗Y p⃗y | vc,w,x , w, xq ř c̃ : P⃗Y p ¨|c̃,w,xq“P⃗Y p ¨|vc,w,x ,w,xq PC pc̃ | w, xq { Q⃗Y p⃗y | w, xqs ¨ rPC pc | w, xq { ř c̃ : P⃗Y p ¨|c̃,w,xq“P⃗Y p ¨|vc,w,x ,w,xq PC pc̃ | w, xqs “

Q⃗Y p⃗y | w, xqQV pvc,w,x | ⃗y, w, xqQC pc | vc,w,x , w, xq “


ÿ ÿ
Q⃗Y p⃗y | w, xqQV pv | ⃗y, w, xqQC pc | v, w, xq “ QV,C,⃗Y pv, c, ⃗y | w, xq “ QC,⃗Y pc, ⃗y | w, xq.
vPV vPV

Moreover, whenever PC,⃗Y pc, ⃗y | w, xq “ 0, Q⃗Y p⃗y | vc,w,x , w, xqQC pc | vc,w,x , w, xq “ 0. Therefore, data

consistency holds. Finally, by construction, for QC pC “ c | V “ vc,w,x , W “ w, X “ xq ą 0,

Qp⃗Y “ ⃗y | V “ vc,w,x , W “ w, X “ xq “ QpV “ vc,w,x | ⃗Y “ ⃗y, W “ w, X “ xqQp⃗Y “ ⃗y | W “ w, X “ xq { QpV “ vc,w,x | W “ w, X “ xq “ Prp⃗Y “ ⃗y | C “ c, W “ w, X “ xq.

Therefore, expected utility maximization is satisfied since the inequalities in Lemma A.5.1 were

assumed to hold and data consistency holds.

Lemma A.5.1 implies Theorem 1.2.1: Define the joint distribution Q as Qpw, x, c, ⃗yq “ Pr⃗Y p⃗y |

c, w, xqPpc, w, xq. Then, rewrite conditions (i)-(ii) in Lemma A.5.1 as: for all c P t0, 1u and c1 ‰ c,

ÿ ÿ
QC,⃗Y pc, ⃗y | w, xqUpc, ⃗y; wq ě QC,⃗Y pc, ⃗y | w, xqUpc1 , ⃗y; wq.
⃗yPY ˆY ⃗yPY ˆY

Notice that if PC pc | w, xq “ 0, then QC,⃗Y pc, ⃗y | w, xq “ 0. Therefore, the inequalities involving

c P C with πc pw, xq “ 0 are satisfied. Next, inequalities involving c P C with πc pw, xq ą 0 can be

equivalently rewritten as

ř ⃗yPY ˆY Q⃗Y p⃗y | c, w, xqUpc, ⃗y; wq ě ř ⃗yPY ˆY Q⃗Y p⃗y | c, w, xqUpc1 , ⃗y; wq.

The statement of Theorem 1.2.1 follows by noticing that

ř ⃗yPY ˆY Q⃗Y p⃗y | c, w, xqUpc, ⃗y; wq “ EQ rUpc, ⃗Y; wq | C “ c, W “ w, X “ xs,

ř ⃗yPY ˆY Q⃗Y p⃗y | c, w, xqUpc1 , ⃗y; wq “ EQ rUpc1 , ⃗Y; wq | C “ c, W “ w, X “ xs.

Proof of Theorem 1.3.1

Lemma A.5.2. The decision maker's choices are consistent with expected utility maximization behavior at some strict preference utility function if and only if there exists a strict preference utility function $U$ satisfying

i. $P_{Y^*}(1 \mid 1, w, x) \leq \frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)}$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_1(w, x) > 0$,

ii. $\frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} \leq P_{Y^*}(1 \mid 0, w, x)$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_0(w, x) > 0$.

Proof. This is an immediate consequence of applying Lemma A.5.1 to analyzing expected utility maximization at strict preferences in a screening decision with a binary outcome. For all $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_1(w, x) > 0$, Lemma A.5.1 requires $P_{Y^*}(1 \mid 1, w, x) \leq \frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)}$. For all $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_0(w, x) > 0$, Lemma A.5.1 requires $\frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} \leq P_{Y^*}(1 \mid 0, w, x)$. Applying the bounds $\underline{P}_{Y^*}(1 \mid 0, w, x) \leq P_{Y^*}(1 \mid 0, w, x) \leq \overline{P}_{Y^*}(1 \mid 0, w, x)$ then delivers the result. □

By Lemma A.5.2, the human DM's choices are consistent with expected utility maximization behavior if and only if there exists a strict preference utility function $U$ satisfying
\[
\max_{x \in \mathcal{X}_1(w)} P_{Y^*}(1 \mid 1, w, x) \leq \frac{U(0,0;w)}{U(0,0;w) + U(1,1;w)} \leq \min_{x \in \mathcal{X}_0(w)} \overline{P}_{Y^*}(1 \mid 0, w, x)
\]
for all $w \in \mathcal{W}$. This system of inequalities admits a solution only if the stated conditions in Theorem 1.3.1 are satisfied. The characterization of the identified set of utility functions also follows from Lemma A.5.2. □
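The rationalizability condition just derived reduces, for each $w$, to comparing a maximum of observed misconduct rates among the released with a minimum of assumed upper bounds among the detained. A minimal sketch in Python; the cell labels and all probabilities below are hypothetical, invented purely for illustration:

```python
# Hypothetical check of the condition in Theorem 1.3.1: choices are consistent
# with expected utility maximization at strict preferences iff
#   max_x P(Y*=1 | C=1, w, x) <= min_x Pbar(Y*=1 | C=0, w, x),
# in which case tau = U(0,0;w) / (U(0,0;w) + U(1,1;w)) can lie in between.

def rationalizable(p_released, p_detained_upper):
    """p_released[x]: observed misconduct rate among released, per cell x.
    p_detained_upper[x]: assumed upper bound on the unobserved misconduct
    rate among detained defendants, per cell x. Returns (feasible, interval)."""
    lo = max(p_released.values())
    hi = min(p_detained_upper.values())
    return lo <= hi, (lo, hi)

# Hypothetical cells x = 'young'/'old' for a fixed characteristic w.
p_released = {"young": 0.20, "old": 0.15}
p_detained_upper = {"young": 0.45, "old": 0.30}

feasible, (lo, hi) = rationalizable(p_released, p_detained_upper)
print(feasible, lo, hi)  # True 0.2 0.3
```

Any threshold in the returned interval corresponds to a rationalizing strict preference utility function.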

Proof of Proposition 1.3.1

Under Assumption 1.3.1, $P_{Y^*}(1 \mid w, x, z) = P_{Y^*}(1 \mid w, x, \tilde{z}) = P_{Y^*}(1 \mid w, x)$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$ and $z, \tilde{z} \in \mathcal{Z}$. Furthermore, $P_{Y^*}(1 \mid w, x)$ is bounded above and below by
\[
P_{C,Y^*}(1, 1 \mid w, x, z) \leq P_{Y^*}(1 \mid w, x) \leq \pi_0(w, x, z) + P_{C,Y^*}(1, 1 \mid w, x, z)
\]
for all $z \in \mathcal{Z}$.

These bounds on $P_{Y^*}(1 \mid w, x)$ are sharp by a simple application of Artstein's Theorem. I drop the conditioning on $(w, x) \in \mathcal{W} \times \mathcal{X}$ to simplify notation, and so this argument applies conditionally on each $(w, x) \in \mathcal{W} \times \mathcal{X}$. The screening decision setting establishes the model correspondence $G \colon \mathcal{Y} \to \mathcal{Z} \times \mathcal{C} \times (\mathcal{Y} \cup \{0\})$, where $G(y^*) = \{(z, c, y) : y = y^* 1\{c = 1\}\}$. The reverse correspondence is given by $G^{-1}(z, c, y) = \{y^* : y = y^* 1\{c = 1\}\}$. The observable joint distribution $(Z, C, Y) \sim P$ characterizes a random set $G^{-1}(Z, C, Y)$ via the generalized likelihood $T(A \mid Z = z) = P\left( (C, Y) : G^{-1}(z, C, Y) \cap A \neq \emptyset \right)$ for all $A \in 2^{\mathcal{Y}}$. Artstein's Theorem implies that there exists a random variable $Y^*$ that rationalizes the observed data through the model correspondence $G$ if and only if there exists some $Y^* \sim \tilde{P}$ satisfying
\[
\tilde{P}(A) \leq T(A \mid Z = z) \quad \text{for all } A \in 2^{\mathcal{Y}} \text{ and } z \in \mathcal{Z}.
\]
Let $\tilde{\mathcal{P}}_{Y^*}(\,\cdot \mid z)$ be the set of distributions on $\mathcal{Y}$ that satisfy these inequalities at a given $z \in \mathcal{Z}$. A sharp characterization of the identified set for the marginal distribution of $Y^*$ is then given by $\mathcal{H}_P(P_{Y^*}(\,\cdot\,)) = \bigcap_{z \in \mathcal{Z}} \tilde{\mathcal{P}}_{Y^*}(\,\cdot \mid z)$. For $\mathcal{Y} = \{0, 1\}$, these inequalities give, for each $z \in \mathcal{Z}$,
\begin{align*}
\tilde{P}(Y^* = 0) &\leq P(C = 0 \mid Z = z) + P(C = 1, Y = 0 \mid Z = z), \\
\tilde{P}(Y^* = 1) &\leq P(C = 0 \mid Z = z) + P(C = 1, Y = 1 \mid Z = z).
\end{align*}
Since $\tilde{P}(Y^* = 0) + \tilde{P}(Y^* = 1) = 1$, these inequalities may be further rewritten as requiring, for each $z \in \mathcal{Z}$,
\[
P(C = 1, Y = 1 \mid Z = z) \leq \tilde{P}(Y^* = 1) \leq P(C = 0 \mid Z = z) + P(C = 1, Y = 1 \mid Z = z).
\]
This delivers sharp bounds on the marginal distribution of $Y^*$ conditional on any $z \in \mathcal{Z}$ since the instrument is assumed to be independent of the outcome of interest. The sharpness of the bounds on $P(C = 0, Y^* = 1 \mid W = w, X = x, Z = z)$ immediately follows since $P(C = 1, Y^* = 1 \mid W = w, X = x, Z = z) = P(C = 1, Y = 1 \mid W = w, X = x, Z = z)$ is observed. □
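The intersected instrument bounds above can be computed directly. A small numerical sketch: for each instrument value $z$, $P(Y^* = 1)$ lies in $[P(C{=}1, Y{=}1 \mid z),\; \pi_0(z) + P(C{=}1, Y{=}1 \mid z)]$, and the identified set intersects these intervals over $z$. All probabilities below are hypothetical, for a fixed $(w, x)$ cell:

```python
# Numerical sketch of the instrument bounds in Proposition 1.3.1:
#   P(C=1, Y=1 | z) <= P(Y*=1) <= pi_0(z) + P(C=1, Y=1 | z) for each z,
# intersected across instrument values z.

def instrument_bounds(cells):
    """cells: dict z -> (P(C=1, Y=1 | z), P(C=0 | z)). Returns the
    intersection of the per-z bounds on P(Y* = 1)."""
    lo = max(p11 for p11, _ in cells.values())
    hi = min(pi0 + p11 for p11, pi0 in cells.values())
    return lo, hi

# Hypothetical leniency instrument with two values.
cells = {"lenient": (0.18, 0.10), "strict": (0.12, 0.35)}
lo, hi = instrument_bounds(cells)
print(round(lo, 2), round(hi, 2))  # 0.18 0.28
```

A more lenient instrument value (small $\pi_0$) tightens the upper bound, which is exactly why instrument variation shrinks the identified set.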

Proof of Theorem 1.4.1

To prove this result, I first establish the following lemma, and then show Theorem 1.4.1 follows as a consequence.

Lemma A.5.3. Assume $\tilde{P}_{\vec{Y}}(\,\cdot \mid w, x) > 0$ for all $\tilde{P}_{\vec{Y}}(\,\cdot \mid w, x) \in \mathcal{H}_P(P_{\vec{Y}}(\,\cdot \mid w, x); B_{w,x})$ and all $(w, x) \in \mathcal{W} \times \mathcal{X}$. The decision maker's choices are consistent with expected utility maximization behavior at inaccurate beliefs if and only if there exist a utility function $U \in \mathcal{U}$, prior beliefs $Q_{\vec{Y}}(\,\cdot \mid w, x) \in \Delta(\mathcal{Y} \times \mathcal{Y})$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$, and $\tilde{P}_{\vec{Y}}(\,\cdot \mid 0, w, x) \in B_{0,w,x}$, $\tilde{P}_{\vec{Y}}(\,\cdot \mid 1, w, x) \in B_{1,w,x}$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$, satisfying for all $c \in \{0, 1\}$, $c' \neq c$,
\[
\sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} Q_{\vec{Y}}(\vec{y} \mid w, x) \tilde{P}_C(c \mid \vec{y}, w, x) U(c, \vec{y}; w) \geq \sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} Q_{\vec{Y}}(\vec{y} \mid w, x) \tilde{P}_C(c \mid \vec{y}, w, x) U(c', \vec{y}; w),
\]
where $\tilde{P}_C(c \mid \vec{y}, w, x) = \frac{\tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x) \pi_c(w, x)}{\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x)}$ and $\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x) = \tilde{P}_{\vec{Y}}(\vec{y} \mid 0, w, x) \pi_0(w, x) + \tilde{P}_{\vec{Y}}(\vec{y} \mid 1, w, x) \pi_1(w, x)$.
Proof of Lemma A.5.3: Necessity. To show necessity, we apply the same steps as the proof of necessity for Lemma A.5.1. First, by an analogous argument as given in the proof of necessity for Lemma A.5.1, it is without loss of generality to assume the private information $V \in \mathcal{V}$ has finite support. Second, following the same steps as the proof of necessity for Lemma A.5.1, I arrive at
\[
\sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} Q_{C,\vec{Y}}(c, \vec{y} \mid w, x) U(c, \vec{y}; w) \geq \sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} Q_{C,\vec{Y}}(c, \vec{y} \mid w, x) U(c', \vec{y}; w).
\]
Then, we immediately observe that $Q_{C,\vec{Y}}(c, \vec{y} \mid w, x) = Q_C(c \mid \vec{y}, w, x) Q_{\vec{Y}}(\vec{y} \mid w, x) = \tilde{P}_C(c \mid \vec{y}, w, x) Q_{\vec{Y}}(\vec{y} \mid w, x)$, where the last equality follows via data consistency with inaccurate beliefs.

Proof of Lemma A.5.3: Sufficiency. To show sufficiency, suppose that the conditions in Lemma A.5.3 are satisfied at some $\tilde{P}_{\vec{Y}}(\,\cdot \mid c, w, x) \in B_{c,w,x}$ for $c \in \{0, 1\}$, $(w, x) \in \mathcal{W} \times \mathcal{X}$ and some $Q_{\vec{Y}}(\,\cdot \mid w, x) \in \Delta(\mathcal{Y} \times \mathcal{Y})$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$.

Define the joint distribution $(W, X, C, \vec{Y}) \sim \tilde{P}$ according to $\tilde{P}(w, x, c, \vec{y}) = \tilde{P}_C(c \mid \vec{y}, w, x) Q_{\vec{Y}}(\vec{y} \mid w, x) P(w, x)$, where $\tilde{P}_C(\,\cdot \mid \vec{y}, w, x)$ is defined in the statement of the lemma. Given the inequalities in the lemma, we can construct a joint distribution $(W, X, V, C, \vec{Y}) \sim Q$ to satisfy information set, expected utility maximization behavior and data consistency (Definition 1.2.3) for the constructed joint distribution $(W, X, C, \vec{Y}) \sim \tilde{P}$, following the same sufficiency argument as given in Lemma A.5.1. This constructed joint distribution $(W, X, V, C, \vec{Y}) \sim Q$ will be an expected utility maximization representation under inaccurate beliefs.

As notation, define $\tilde{P}_C(c \mid w, x)$ to be the probability of $C = c$ given $W = w$, $X = x$ and $\tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x)$ to be the choice-dependent potential outcome probability given $C = c$, $W = w$, $X = x$ under the constructed joint distribution $(W, X, C, \vec{Y}) \sim \tilde{P}$. Let $v \in \mathcal{V} := \{1, \ldots, 3\}$ index the possible subsets in the power set $2^{\{0,1\}}$.

For each $(w, x) \in \mathcal{W} \times \mathcal{X}$, define $\mathcal{C}_{w,x} := \{ c : \tilde{P}(c \mid w, x) > 0 \} \subseteq \mathcal{C}$ to be the set of choices selected with positive probability, and partition $\mathcal{C}_{w,x}$ into subsets that have identical constructed choice-dependent outcome probabilities. There are $\bar{V}_{w,x} \leq |\mathcal{C}_{w,x}|$ such subsets. Associate each subset in this partition with its associated index $v \in \mathcal{V}$ and denote the possible values as $\mathcal{V}_{w,x}$. Denote the choice-dependent outcome probability associated with the subset labelled $v$ by $\tilde{P}_{\vec{Y}}(\,\cdot \mid v, w, x) \in \Delta(\mathcal{Y} \times \mathcal{Y})$.

Define the random variable $V \in \mathcal{V}$ according to
\[
Q(V = v \mid w, x) = \sum_{c \,:\, \tilde{P}_{\vec{Y}}(\,\cdot \mid c, w, x) = \tilde{P}_{\vec{Y}}(\,\cdot \mid v, w, x)} \tilde{P}_C(c \mid w, x) \quad \text{if } v \in \mathcal{V}_{w,x},
\]
\[
Q(V = v \mid \vec{y}, w, x) =
\begin{cases}
\frac{\tilde{P}_{\vec{Y}}(\vec{y} \mid v, w, x) Q(V = v \mid w, x)}{Q_{\vec{Y}}(\vec{y} \mid w, x)} & \text{if } v \in \mathcal{V}_{w,x} \text{ and } Q_{\vec{Y}}(\vec{y} \mid w, x) > 0, \\
0 & \text{otherwise.}
\end{cases}
\]
Next, define the random variable $C \in \mathcal{C}$ according to
\[
Q(C = c \mid v, w, x) =
\begin{cases}
\frac{\tilde{P}_C(c \mid w, x)}{\sum_{\tilde{c} \,:\, \tilde{P}_{\vec{Y}}(\,\cdot \mid \tilde{c}, w, x) = \tilde{P}_{\vec{Y}}(\,\cdot \mid v, w, x)} \tilde{P}_C(\tilde{c} \mid w, x)} & \text{if } v \in \mathcal{V}_{w,x} \text{ and } \tilde{P}_{\vec{Y}}(\,\cdot \mid c, w, x) = \tilde{P}_{\vec{Y}}(\,\cdot \mid v, w, x), \\
0 & \text{otherwise.}
\end{cases}
\]
Together, this defines the random vector $(W, X, \vec{Y}, V, C) \sim Q$, whose joint distribution is defined as
\[
Q(w, x, \vec{y}, v, c) = P(w, x) Q_{\vec{Y}}(\vec{y} \mid w, x) Q_V(v \mid \vec{y}, w, x) Q_C(c \mid v, w, x).
\]
We now check that this construction satisfies information set, expected utility maximization and data consistency. First, information set is satisfied since $Q_{C,\vec{Y}}(c, \vec{y} \mid w, x, v) = Q_{\vec{Y}}(\vec{y} \mid w, x, v) Q_C(c \mid w, x, v)$ by construction. Next, for any $(w, x) \in \mathcal{W} \times \mathcal{X}$ and $c \in \mathcal{C}_{w,x}$, define $v_{c,w,x} \in \mathcal{V}_{w,x}$ to be the label satisfying $\tilde{P}_{\vec{Y}}(\,\cdot \mid c, w, x) = \tilde{P}_{\vec{Y}}(\,\cdot \mid v_{c,w,x}, w, x)$. For $\tilde{P}_{C,\vec{Y}}(c, \vec{y} \mid w, x) > 0$, observe that
\begin{align*}
\tilde{P}_{C,\vec{Y}}(c, \vec{y} \mid w, x) &= \tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x) \tilde{P}_C(c \mid w, x) \\
&= \frac{Q_{\vec{Y}}(\vec{y} \mid v_{c,w,x}, w, x) \sum_{\tilde{c} \,:\, \tilde{P}_{\vec{Y}}(\,\cdot \mid \tilde{c}, w, x) = \tilde{P}_{\vec{Y}}(\,\cdot \mid v_{c,w,x}, w, x)} \tilde{P}_C(\tilde{c} \mid w, x)}{Q_{\vec{Y}}(\vec{y} \mid w, x)} \cdot Q_{\vec{Y}}(\vec{y} \mid w, x) \cdot \frac{\tilde{P}_C(c \mid w, x)}{\sum_{\tilde{c} \,:\, \tilde{P}_{\vec{Y}}(\,\cdot \mid \tilde{c}, w, x) = \tilde{P}_{\vec{Y}}(\,\cdot \mid v_{c,w,x}, w, x)} \tilde{P}_C(\tilde{c} \mid w, x)} \\
&= Q_{\vec{Y}}(\vec{y} \mid w, x) Q_V(v_{c,w,x} \mid \vec{y}, w, x) Q_C(c \mid v_{c,w,x}, w, x) \\
&= \sum_{v \in \mathcal{V}} Q_{\vec{Y}}(\vec{y} \mid w, x) Q_V(v \mid \vec{y}, w, x) Q_C(c \mid v, w, x) = \sum_{v \in \mathcal{V}} Q(v, c, \vec{y} \mid w, x).
\end{align*}
Moreover, whenever $\tilde{P}_{C,\vec{Y}}(c, \vec{y} \mid w, x) = 0$, $Q_{\vec{Y}}(\vec{y} \mid v_{c,w,x}, w, x) Q_C(c \mid v_{c,w,x}, w, x) = 0$. Therefore, data consistency (Definition 1.2.3) holds for the constructed joint distribution $(W, X, C, \vec{Y}) \sim \tilde{P}$. Since $\tilde{P}_{C,\vec{Y}}(c, \vec{y} \mid w, x) = \tilde{P}_C(c \mid \vec{y}, w, x) Q_{\vec{Y}}(\vec{y} \mid w, x)$ by construction, $(W, X, V, C, \vec{Y}) \sim Q$ satisfies data consistency at inaccurate beliefs (Definition 1.4.1). Finally, for $Q(C = c \mid V = v_{c,w,x}, W = w, X = x) > 0$,
\begin{align*}
Q(\vec{Y} = \vec{y} \mid V = v_{c,w,x}, W = w, X = x) &= \frac{Q(V = v_{c,w,x} \mid \vec{Y} = \vec{y}, W = w, X = x) Q(\vec{Y} = \vec{y} \mid W = w, X = x)}{Q(V = v_{c,w,x} \mid W = w, X = x)} \\
&= \tilde{P}(\vec{Y} = \vec{y} \mid C = c, W = w, X = x)
\end{align*}
and $\tilde{P}(C = c \mid W = w, X = x) > 0$. Therefore, using data consistency and the inequalities in Lemma A.5.3, we have that
\[
\sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} Q_{\vec{Y}}(\vec{y} \mid v, w, x) U(c, \vec{y}; w) \geq \sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} Q_{\vec{Y}}(\vec{y} \mid v, w, x) U(c', \vec{y}; w),
\]
which follows from the fact that $Q_{\vec{Y}}(\vec{y} \mid w, x) \tilde{P}_C(c \mid \vec{y}, w, x) = Q_{C,\vec{Y}}(c, \vec{y} \mid w, x)$ by data consistency and the construction of $\tilde{P}$, and $Q(\vec{Y} = \vec{y} \mid V = v_{c,w,x}, W = w, X = x) = \tilde{P}(\vec{Y} = \vec{y} \mid C = c, W = w, X = x)$ as just shown. Therefore, expected utility maximization is also satisfied.
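The grouping step used in the sufficiency constructions (pooling choices with identical choice-dependent outcome distributions into cells of $V$, aggregating their choice probabilities into $Q(V = v \mid w, x)$, and renormalizing within cells to get $Q(C = c \mid v, w, x)$) can be sketched as follows; the choice labels, outcome distributions, and probabilities are all hypothetical:

```python
# Sketch of the private-information construction: choices with identical
# choice-dependent outcome distributions share a cell v of V; Q(V=v | w, x)
# sums their choice probabilities; Q(C=c | v, w, x) renormalizes within cells.

def build_private_information(p_outcome_by_choice, p_choice):
    """p_outcome_by_choice[c]: tuple of outcome probabilities given choice c.
    p_choice[c]: P(C=c | w, x). Returns (Q(V=v), Q(C=c | V=v))."""
    cells = {}
    for c, dist in p_outcome_by_choice.items():
        if p_choice[c] > 0:
            cells.setdefault(dist, []).append(c)
    q_v = {v: sum(p_choice[c] for c in cs) for v, cs in enumerate(cells.values())}
    q_c_given_v = {v: {c: p_choice[c] / q_v[v] for c in cs}
                   for v, cs in enumerate(cells.values())}
    return q_v, q_c_given_v

# Choices 0 and 2 induce the same outcome distribution, so they share a cell.
p_outcome_by_choice = {0: (0.7, 0.3), 1: (0.5, 0.5), 2: (0.7, 0.3)}
p_choice = {0: 0.2, 1: 0.5, 2: 0.3}
q_v, q_c_given_v = build_private_information(p_outcome_by_choice, p_choice)
print(q_v)  # {0: 0.5, 1: 0.5}
```

Within each cell, the renormalized choice probabilities reproduce the identity used in the displayed chain of equalities.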

Rewrite inequalities in Lemma A.5.3 in terms of weights: Define $\tilde{P}$ as in the statement of the theorem. Rewrite the condition in Lemma A.5.3 as: for all $c \in \{0, 1\}$ and $c' \neq c$,
\[
\sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} \frac{Q_{\vec{Y}}(\vec{y} \mid w, x)}{\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x)} \tilde{P}_{C,\vec{Y}}(c, \vec{y} \mid w, x) U(c, \vec{y}; w) \geq \sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} \frac{Q_{\vec{Y}}(\vec{y} \mid w, x)}{\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x)} \tilde{P}_{C,\vec{Y}}(c, \vec{y} \mid w, x) U(c', \vec{y}; w).
\]
Notice that if $\pi_c(w, x) = 0$, then $\tilde{P}_{C,\vec{Y}}(c, \vec{y} \mid w, x) = 0$. Therefore, the inequalities involving $c \in \{0, 1\}$ with $\pi_c(w, x) = 0$ are trivially satisfied. Next, inequalities involving $c \in \{0, 1\}$ with $\pi_c(w, x) > 0$ can be equivalently rewritten as
\[
\sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} \frac{Q_{\vec{Y}}(\vec{y} \mid w, x)}{\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x)} \tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x) U(c, \vec{y}; w) \geq \sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} \frac{Q_{\vec{Y}}(\vec{y} \mid w, x)}{\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x)} \tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x) U(c', \vec{y}; w).
\]
The result follows by noticing that
\[
\sum_{\vec{y} \in \mathcal{Y} \times \mathcal{Y}} \tilde{P}_{\vec{Y}}(\vec{y} \mid c, w, x) \frac{Q_{\vec{Y}}(\vec{y} \mid w, x)}{\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x)} U(c, \vec{y}; w) = \mathbb{E}_{\tilde{P}}\left[ \frac{Q_{\vec{Y}}(\vec{Y} \mid w, x)}{\tilde{P}_{\vec{Y}}(\vec{Y} \mid w, x)} U(c, \vec{Y}; w) \;\middle|\; C = c, W = w, X = x \right]
\]
and defining the weights as $\omega(\vec{y}; w, x) = \frac{Q_{\vec{Y}}(\vec{y} \mid w, x)}{\tilde{P}_{\vec{Y}}(\vec{y} \mid w, x)}$. □
Proof of Theorem 1.4.2

Under the stated conditions, the necessity statement in Theorem 1.4.1 implies that for all $(w, x) \in \mathcal{W} \times \mathcal{X}$,
\begin{align*}
\omega(1; w, x) U(1, 1; w, x) P_{Y^*}(1 \mid 1, w, x) &\geq \omega(0; w, x) U(0, 0; w, x) P_{Y^*}(0 \mid 1, w, x), \\
\omega(0; w, x) U(0, 0; w, x) \tilde{P}_{Y^*}(0 \mid 0, w, x) &\geq \omega(1; w, x) U(1, 1; w, x) \tilde{P}_{Y^*}(1 \mid 0, w, x),
\end{align*}
where $\omega(y^*; w, x) = \frac{Q_{Y^*}(y^* \mid w, x)}{\tilde{P}_{Y^*}(y^* \mid w, x)}$. Re-arranging these inequalities, we observe that
\[
P_{Y^*}(1 \mid 1, w, x) \leq \frac{\omega(0; w, x) U(0, 0; w, x)}{\omega(0; w, x) U(0, 0; w, x) + \omega(1; w, x) U(1, 1; w, x)} \leq \tilde{P}_{Y^*}(1 \mid 0, w, x).
\]
The result then follows by applying the bounds on $\tilde{P}_{Y^*}(1 \mid 0, w, x)$ in a screening decision with a binary outcome. □
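The role of the weights $\omega$ in the threshold just derived can be illustrated numerically. A sketch under hypothetical utilities and weights; none of these numbers come from the text, and they are chosen only to show how belief distortions shift the implied detention threshold:

```python
# Sketch of the reweighted threshold in Theorem 1.4.2:
#   tau(w, x) = w0*U00 / (w0*U00 + w1*U11),
# where w_y = omega(y; w, x) = Q(y | w, x) / Ptilde(y | w, x) scales the
# strict-preference utilities by the ratio of prior beliefs to the true
# outcome distribution.

def belief_weighted_threshold(u00, u11, omega0, omega1):
    num = omega0 * u00
    return num / (num + omega1 * u11)

u00, u11 = -1.0, -2.0  # hypothetical strict-preference utilities (both < 0)
tau_accurate = belief_weighted_threshold(u00, u11, 1.0, 1.0)
# A DM who overweights the bad outcome (omega1 > 1) behaves as if the
# threshold were lower, so the same choices can look mistaken at omega = 1:
tau_inaccurate = belief_weighted_threshold(u00, u11, 1.0, 1.5)
print(round(tau_accurate, 3), round(tau_inaccurate, 3))
```

At accurate beliefs ($\omega \equiv 1$) the expression collapses to the threshold from Theorem 1.3.1.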

Proof of Proposition 1.6.2

First, rewrite $\theta(p^*, U^*)$ as
\[
\beta(p^*, U^*) + \ell^\top(p^*, U^*) P_{C,Y^*}(1, 1) + \ell^\top(p^*, U^*) P_{C,Y^*}(0, 1),
\]
where $\ell^\top(p^*, U^*)$ is defined in the statement of the proposition and $P_{C,Y^*}(1, 1)$, $P_{C,Y^*}(0, 1)$ are the $d_w d_x$ vectors whose elements are the moments $P_{C,Y^*}(1, 1 \mid w, x) := P(C = 1, Y^* = 1 \mid W = w, X = x)$, $P_{C,Y^*}(0, 1 \mid w, x) := P(C = 0, Y^* = 1 \mid W = w, X = x)$ respectively. The null hypothesis $H_0 \colon \theta(p^*, U^*) = \theta_0$ is equivalent to the null hypothesis that there exists $\tilde{P}_{C,Y^*}(0, 1)$ satisfying
\[
\ell^\top(p^*, U^*) \tilde{P}_{C,Y^*}(0, 1) = \theta(p^*, U^*) - \beta(p^*, U^*) - \ell^\top(p^*, U^*) P_{C,Y^*}(1, 1)
\]
and, for all $(w, x) \in \mathcal{W} \times \mathcal{X}$,
\[
\underline{P}(C = 0, Y^* = 1 \mid W = w, X = x) \leq \tilde{P}_{C,Y^*}(0, 1 \mid w, x) \leq \overline{P}(C = 0, Y^* = 1 \mid W = w, X = x).
\]
We can express these bounds in the form $A \tilde{P}_{C,Y^*}(0, 1) \leq \begin{pmatrix} -\underline{P}_{C,Y^*}(0, 1) \\ \overline{P}_{C,Y^*}(0, 1) \end{pmatrix}$, where $A = \begin{pmatrix} -I \\ I \end{pmatrix}$ is a known matrix and $\underline{P}_{C,Y^*}(0, 1)$, $\overline{P}_{C,Y^*}(0, 1)$ are the $d_w d_x$ vectors of lower and upper bounds respectively. Therefore, the null hypothesis $H_0 \colon \theta(p^*, U^*) = \theta_0$ is equivalent to the null hypothesis that there exists $\tilde{P}_{C,Y^*}(0, 1)$ satisfying $\ell^\top(p^*, U^*) \tilde{P}_{C,Y^*}(0, 1) = \theta_0 - \beta(p^*, U^*) - \ell^\top(p^*, U^*) P_{C,Y^*}(1, 1)$ and
\[
A \tilde{P}_{C,Y^*}(0, 1) \leq \begin{pmatrix} -\underline{P}_{C,Y^*}(0, 1) \\ \overline{P}_{C,Y^*}(0, 1) \end{pmatrix}.
\]
Next, we apply a change of basis argument. Define the full rank matrix $\Gamma$, whose first row is equal to $\ell^\top(p^*, U^*)$. The null hypothesis $H_0 \colon \theta(p^*, U^*) = \theta_0$ can be further rewritten as: there exists $\tilde{P}_{C,Y^*}(0, 1)$ satisfying
\[
A \Gamma^{-1} \Gamma \tilde{P}_{C,Y^*}(0, 1) \leq \begin{pmatrix} -\underline{P}_{C,Y^*}(0, 1) \\ \overline{P}_{C,Y^*}(0, 1) \end{pmatrix},
\]
where $\Gamma \tilde{P}_{C,Y^*}(0, 1) = \begin{pmatrix} \Gamma_{(1,\cdot)} \tilde{P}_{C,Y^*}(0, 1) \\ \Gamma_{(-1,\cdot)} \tilde{P}_{C,Y^*}(0, 1) \end{pmatrix} = \begin{pmatrix} \theta_0 - \beta(p^*, U^*) - \ell^\top(p^*, U^*) P_{C,Y^*}(1, 1) \\ \delta \end{pmatrix}$, defining $\delta = \Gamma_{(-1,\cdot)} \tilde{P}_{C,Y^*}(0, 1)$ and $\tilde{A} = A \Gamma^{-1}$. □
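The change-of-basis step can be mirrored numerically. A sketch with a hypothetical three-cell example: the vector $\ell$, the bounds, and the candidate moment vector are all invented for illustration, and $\Gamma$ stacks $\ell^\top$ on top of coordinate rows:

```python
# Numerical illustration of the change of basis in Proposition 1.6.2: the
# moment inequalities A p <= b become (A Gamma^{-1}) (Gamma p) <= b, so the
# null pins down only the first coordinate of Gamma p (the target functional)
# while the remaining coordinates delta are free nuisance parameters.
import numpy as np

ell = np.array([0.5, 0.3, 0.2])                  # hypothetical weights ell(p*, U*)
Gamma = np.vstack([ell, [0, 1, 0], [0, 0, 1]])   # full rank, first row = ell
assert np.linalg.matrix_rank(Gamma) == 3

lower = np.array([0.1, 0.2, 0.1])                # hypothetical bounds on P(C=0, Y*=1 | w, x)
upper = np.array([0.4, 0.5, 0.3])
A = np.vstack([-np.eye(3), np.eye(3)])
b = np.concatenate([-lower, upper])

p = np.array([0.2, 0.3, 0.25])                   # a candidate vector inside the bounds
assert np.all(A @ p <= b + 1e-12)

A_tilde = A @ np.linalg.inv(Gamma)
# The transformed inequalities agree with the originals on the transformed vector:
assert np.allclose(A_tilde @ (Gamma @ p), A @ p)
theta_component = ell @ p                        # first coordinate of Gamma p
print(round(float(theta_component), 3))
```

Testing $H_0$ then amounts to asking whether the polyhedron $\{\delta : \tilde{A}(\theta_0, \delta)^\top \leq b\}$ is non-empty.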
A.6 Expected Utility Maximization Behavior with Continuous Characteristics

In this section, I extend the setting described in Section 1.2 to allow for the characteristics $X \in \mathcal{X}$ to be continuously distributed. I focus this extension on the case of a screening decision for simplicity. The outcome $Y^* \in \mathcal{Y}$, the choices $C \in \mathcal{C} := \{0, 1\}$ and the characteristics $W \in \mathcal{W}$ are still finite. I now allow the characteristics $X \in \mathcal{X} \subseteq \mathbb{R}^{d_x}$ to be continuously distributed. The random vector $(W, X, C, Y^*) \sim P$ is defined over $\mathcal{W} \times \mathcal{X} \times \mathcal{C} \times \mathcal{Y}$ and summarizes the joint distribution of the characteristics, choices and latent outcome. I assume the joint distribution $P$ admits a density $p(w, x, c, y^*)$ that is continuous in $x$ at each value $(w, c, y^*) \in \mathcal{W} \times \mathcal{C} \times \mathcal{Y}$ and satisfies $p(w, x) > 0$ for all $(w, x) \in \mathcal{W} \times \mathcal{X}$.

The researcher observes the characteristics $(W, X)$ and the decision maker's choice $C$ in each decision, but only observes the outcome $Y^*$ if the decision maker selected $C = 1$. Therefore, the researcher observes the joint distribution $(W, X, C, Y) \sim P$, where $Y := Y^* \cdot 1\{C = 1\}$. The researcher places bounds on the unobservable choice-dependent outcome probabilities by specifying a family of conditional densities over $(x, y^*)$ conditional on $W = w$, $C = c$, denoted by $B_{c,w}$. Whenever the decision maker selects $C = 1$, the researcher observes $(W, X, C, Y^*)$, and so $B_{1,w}$ is a singleton that only contains the observable density $p(x, y^* \mid C = 1, W = w)$. Over the choice $c = 0$, the set $B_{0,w}$ is a set of joint densities $\tilde{p}(x, y^* \mid C = 0, W = w)$ such that (i) $p(x, y^* \mid C = 0, W = w) \in B_{0,w}$, and (ii) $\sum_{y^* \in \mathcal{Y}} \tilde{p}(x, y^* \mid C = 0, W = w) = p(X = x \mid C = 0, W = w)$ for all $\tilde{p}(x, y^* \mid C = 0, W = w) \in B_{0,w}$.

The expected utility maximization model requires minimal modification to account for the continuous characteristics. The definition of a utility function and private information is unchanged. Definition 1.2.3 is extended to ask whether there exists a random vector $(W, X, C, V, Y^*) \sim Q$ that admits a density $q(w, x, v, c, y^*)$ that is consistent with the observable data ("Data Consistency") by replacing the probability mass functions with the appropriate probability density functions. Analogously, the characterization of expected utility maximization behavior also extends directly.

Theorem A.6.1. The decision maker's choices are consistent with expected utility maximization behavior if and only if there exist a utility function $U \in \mathcal{U}$ and $\tilde{p}(x, y^* \mid C = 0, W = w) \in B_{0,w}$ for all $w \in \mathcal{W}$ such that
\[
\mathbb{E}_Q\left[ U(c, Y^*; W) \mid C = c, W = w, X = x \right] \geq \mathbb{E}_Q\left[ U(c', Y^*; W) \mid C = c, W = w, X = x \right]
\]
for all $c \in \{0, 1\}$, $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $\pi_c(w, x) > 0$ and $c' \neq c$, where $\mathbb{E}_Q[\,\cdot\,]$ is the expectation under $Q$ with density $q$ satisfying
\begin{align*}
q(w, x, 1, y^*) &= p(w, x, 1, y^*), \\
q(w, x, 0, y^*) &= \tilde{p}(x, y^* \mid C = 0, W = w)\, p(C = 0, W = w).
\end{align*}

Proof. The proof of this result is analogous to the proof of Theorem 1.2.1. Towards this, I first extend Lemma A.5.1 to the case with continuous characteristics (which analyzes the case with multiple choices and $|\mathcal{C}| = N_c$). Throughout, I write $p_{C,Y^*}(c, y^* \mid w, x) := P(C = c, Y^* = y^* \mid W = w, X = x)$ as shorthand, where notation such as $p_{X,Y^*}(x, y^* \mid w, c)$ is defined analogously.

Lemma A.6.1. The decision maker's choices are consistent with expected utility maximization behavior if and only if there exists a utility function $U \in \mathcal{U}$ that satisfies

i. For all $c \in \mathcal{C}^y$, $(w, x) \in \mathcal{W} \times \mathcal{X}$ and $c' \neq c$,
\[
\sum_{y^* \in \mathcal{Y}} p_{C,Y^*}(c, y^* \mid w, x) U(c, y^*; w) \geq \sum_{y^* \in \mathcal{Y}} p_{C,Y^*}(c, y^* \mid w, x) U(c', y^*; w).
\]
ii. For all $c \in \mathcal{C} \setminus \mathcal{C}^y$ and $w \in \mathcal{W}$, there exists $\tilde{p}_{C,Y^*}(\,\cdot \mid w, c) \in B_{w,c}$ such that
\[
\sum_{y^* \in \mathcal{Y}} \tilde{p}_{C,Y^*}(c, y^* \mid w, x) U(c, y^*; w) \geq \sum_{y^* \in \mathcal{Y}} \tilde{p}_{C,Y^*}(c, y^* \mid w, x) U(c', y^*; w)
\]
for all $x \in \mathcal{X}$ and $c' \neq c$, where $\tilde{p}_{C,Y^*}(c, y^* \mid w, x) = \tilde{p}_{X,Y^*}(x, y^* \mid w, c)\, p_{W,C}(w, c) / p_{W,X}(w, x)$.

Proof of Necessity for Lemma A.6.1: The proof of necessity follows the same argument as the proof of necessity of Lemma A.5.1 by replacing the probability mass function $Q$ with the density $q$. □

Proof of Sufficiency for Lemma A.6.1: The proof of sufficiency follows the proof of sufficiency of Lemma A.5.1 by again simply replacing all probability mass functions with the appropriate density function. □

Theorem A.6.1 then follows directly from Lemma A.6.1 by considering the special case with $\mathcal{C} = \{0, 1\}$ and $\mathcal{C}^y = \{1\}$.

Theorem A.6.1 can be applied to analyze the special case in which the latent outcome $Y^*$ is binary. The bounds on the unobservable choice-dependent outcome probability $B_{0,w}$ are simply joint densities $\tilde{p}(x, Y^* = 0 \mid W = w, C = 0)$, $\tilde{p}(x, Y^* = 1 \mid W = w, C = 0)$ that are continuous in $x \in \mathcal{X}$ and satisfy $\tilde{p}(x, Y^* = 0 \mid W = w, C = 0) + \tilde{p}(x, Y^* = 1 \mid W = w, C = 0) = p(x \mid W = w, C = 0)$. From Theorem A.6.1, the decision maker's choices are consistent with expected utility maximization behavior at some strict preference utility function $U$ if and only if for all $w \in \mathcal{W}$ there exist $U(0, 0; w) < 0$, $U(1, 1; w) < 0$ satisfying
\begin{align}
\sup_{x \in \mathcal{X}_1(w)} p(Y^* = 1 \mid C = 1, W = w, X = x) &\leq \frac{U(0, 0; w)}{U(0, 0; w) + U(1, 1; w)} \tag{A.5} \\
\frac{U(0, 0; w)}{U(0, 0; w) + U(1, 1; w)} &\leq \inf_{x \in \mathcal{X}_0(w)} \overline{p}(Y^* = 1 \mid C = 0, W = w, X = x), \tag{A.6}
\end{align}
where $\mathcal{X}_0(w) := \{x \in \mathcal{X} : \pi_0(w, x) > 0\}$, $\mathcal{X}_1(w) := \{x \in \mathcal{X} : \pi_1(w, x) > 0\}$ and $\overline{p}(Y^* = 1 \mid C = 0, W = w, X = x)$ is the upper bound on $\tilde{p}(x, Y^* = 1 \mid W = w, C = 0) / p(x \mid W = w, C = 0)$ over densities satisfying the bounds $B_{c=0,w}$. This provides a sharp characterization of the identified set of strict preference utility functions in terms of "intersection bounds." Therefore, researchers may test whether a decision maker's choices are consistent with expected utility maximization behavior at accurate beliefs using inference procedures developed in, for example, Chernozhukov et al. (2013).

Finally, Theorem A.6.1 can also be simplified through dimension reduction over the continuously distributed characteristics $X \in \mathcal{X}$. Consider functions $D_w \colon \mathcal{X} \to \{1, \ldots, N_d\}$ for each $w \in \mathcal{W}$ that partition the characteristic space into level sets $\{x \in \mathcal{X} : D_w(x) = d\}$. In a binary screening decision, if the decision maker's choices are consistent with expected utility maximization behavior at some strict preference utility function $U$ that satisfies an exclusion restriction on the characteristics $X$, then
\begin{align}
\max_{d \in \{1, \ldots, N_d\}} P(Y = 1 \mid C = 1, W = w, D_w(X) = d) &\leq \frac{U(0, 0; w)}{U(0, 0; w) + U(1, 1; w)} \tag{A.7} \\
\frac{U(0, 0; w)}{U(0, 0; w) + U(1, 1; w)} &\leq \min_{d \in \{1, \ldots, N_d\}} \overline{P}(Y^* = 1 \mid C = 0, W = w, D_w(X) = d) \tag{A.8}
\end{align}
holds for all $w \in \mathcal{W}$.
A.7 Alternative Bounds on the Missing Data

In the main text, I discussed how researchers can construct bounds on the missing data using a randomly assigned instrument. I now discuss alternative assumptions under which researchers can construct bounds on the missing data. I define these alternative assumptions for the leading case of a screening decision with a binary outcome.

A.7.1 Direct Imputation

The simplest empirical strategy for constructing bounds on the unobservable choice-dependent outcome probabilities is "direct imputation." In a binary screening decision, direct imputation uses the observable $P_{Y^*}(1 \mid 1, w, x)$ to bound the unobservable $P_{Y^*}(1 \mid 0, w, x)$.

Assumption A.7.1. For each $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $0 < \pi_1(w, x) < 1$, there exists $\kappa_{w,x} \geq 0$ satisfying
\[
P_{Y^*}(1 \mid 1, w, x) \leq P_{Y^*}(1 \mid 0, w, x) \leq (1 + \kappa_{w,x}) P_{Y^*}(1 \mid 1, w, x).
\]

The parameter $\kappa_{w,x} \geq 0$ specifies how different the unobservable choice-dependent outcome probability may be relative to the observable choice-dependent outcome probability. In pretrial release decisions, setting $\kappa_{w,x} = 1$ means that the researcher is willing to assume that the conditional probability of pretrial misconduct among detained defendants is no more than two times the conditional probability of pretrial misconduct among released defendants. Such bounding assumptions are used in, for example, Kleinberg et al. (2018a) and Jung et al. (2020a).

In practice, the researcher may wish to test whether the decision maker is making systematic prediction mistakes under various choices of the parameter $\kappa_{w,x}$, and thereby conduct a sensitivity analysis of how robust the behavioral conclusions are to various assumptions about the unobservable choice-dependent outcome probabilities. I illustrate such a sensitivity analysis in Appendix A.9, where I report how the fraction of judges for whom we can reject expected utility maximization behavior varies as the parameter $\kappa_{w,x}$ varies in the New York City pretrial release setting.

Finally, Assumption A.7.1 has a natural interpretation under the expected utility maximization model. The parameter $\kappa_{w,x}$ bounds the average informativeness of the decision maker's private information $V \in \mathcal{V}$.
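The direct imputation bounds are straightforward to compute. A minimal sketch, with a hypothetical released-defendant misconduct rate, showing how the implied interval for the detained-defendant rate widens as $\kappa$ grows (a cap at one keeps the upper bound a valid probability):

```python
# Sketch of the direct-imputation bounds in Assumption A.7.1: the unobserved
# misconduct rate among detained defendants is assumed to lie in
#   [P(1 | 1, w, x), (1 + kappa) * P(1 | 1, w, x)], capped at 1.

def direct_imputation_bounds(p_released, kappa):
    lo = p_released
    hi = min(1.0, (1.0 + kappa) * p_released)
    return lo, hi

p_released = 0.25                      # hypothetical P(Y*=1 | C=1, w, x)
for kappa in (0.0, 0.5, 1.0):          # sensitivity analysis over kappa
    print(kappa, direct_imputation_bounds(p_released, kappa))
```

Sweeping over a grid of $\kappa$ values in this way is exactly the sensitivity analysis described above.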
Proposition A.7.1. Consider a screening decision with a binary outcome and suppose Assumption A.7.1 holds. If the decision maker's choices are consistent with expected utility maximization behavior at some private information $V \in \mathcal{V}$ and joint distribution $(W, X, V, C, Y^*) \sim Q$, then for each $(w, x) \in \mathcal{W} \times \mathcal{X}$ with $0 < \pi_1(w, x) < 1$ and $0 < P_{Y^*}(1 \mid 1, w, x) < 1$,

a. $1 \leq \dfrac{Q(C = 0 \mid Y^* = 1, W = w, X = x) / Q(C = 1 \mid Y^* = 1, W = w, X = x)}{Q(C = 0 \mid W = w, X = x) / Q(C = 1 \mid W = w, X = x)} \leq 1 + \kappa_{w,x}$,

b. $1 - \kappa_{w,x} \dfrac{P_{Y^*}(1 \mid 1, w, x)}{P_{Y^*}(0 \mid 1, w, x)} \leq \dfrac{Q(C = 0 \mid Y^* = 0, W = w, X = x) / Q(C = 1 \mid Y^* = 0, W = w, X = x)}{Q(C = 0 \mid W = w, X = x) / Q(C = 1 \mid W = w, X = x)} \leq 1$.

Proof. Notice that
\[
\frac{Q(C = 0 \mid Y^* = 1, W = w, X = x) / Q(C = 1 \mid Y^* = 1, W = w, X = x)}{Q(C = 0 \mid W = w, X = x) / Q(C = 1 \mid W = w, X = x)} = \frac{Q(Y^* = 1 \mid C = 0, W = w, X = x)}{Q(Y^* = 1 \mid C = 1, W = w, X = x)}.
\]
Since the decision maker's choices are consistent with expected utility maximization behavior, $(W, X, V, C, Y^*) \sim Q$ satisfies the data consistency condition in Definition 1.2.3 at some $\tilde{P}_{Y^*}(\,\cdot \mid 0, w, x)$ satisfying the bounds in Assumption A.7.1 for each $(w, x) \in \mathcal{W} \times \mathcal{X}$. Therefore, $Q(Y^* = 1 \mid C = 0, W = w, X = x) = \tilde{P}_{Y^*}(1 \mid 0, w, x)$ and it immediately follows that $\frac{Q(Y^* = 1 \mid C = 0, W = w, X = x)}{Q(Y^* = 1 \mid C = 1, W = w, X = x)} = \frac{\tilde{P}_{Y^*}(1 \mid 0, w, x)}{P_{Y^*}(1 \mid 1, w, x)} \in [1, 1 + \kappa_{w,x}]$ under Assumption A.7.1. This proves (a). To show (b), notice that the bounds in Assumption A.7.1 imply that
\[
P_{Y^*}(0 \mid 1, w, x) - \kappa_{w,x} P_{Y^*}(1 \mid 1, w, x) \leq \tilde{P}_{Y^*}(0 \mid 0, w, x) \leq P_{Y^*}(0 \mid 1, w, x).
\]
Moreover, as before, we also have that
\[
\frac{Q(C = 0 \mid Y^* = 0, W = w, X = x) / Q(C = 1 \mid Y^* = 0, W = w, X = x)}{Q(C = 0 \mid W = w, X = x) / Q(C = 1 \mid W = w, X = x)} = \frac{\tilde{P}_{Y^*}(0 \mid 0, w, x)}{P_{Y^*}(0 \mid 1, w, x)}
\]
and (b) then follows immediately. □

The direct imputation bounds imply bounds on the relative odds ratio of the decision maker's choice probabilities conditional on the outcome and the characteristics relative to their choice probabilities conditional on only the characteristics. This places a bound on the average informativeness of the decision maker's private information under the expected utility maximization model since
\begin{align*}
Q(C = 1 \mid Y^* = 1, W = w, X = x) &= \mathbb{E}_Q\left[ Q(C = 1 \mid V, W = w, X = x) \mid Y^* = 1, W = w, X = x \right], \\
Q(C = 1 \mid Y^* = 0, W = w, X = x) &= \mathbb{E}_Q\left[ Q(C = 1 \mid V, W = w, X = x) \mid Y^* = 0, W = w, X = x \right]
\end{align*}
under the Information Set condition in Definition 1.2.3. In this sense, the direct imputation bounds are related to classic approaches for modelling violations of unconfoundedness in causal inference such as Rosenbaum (2002), which model violations of unconfoundedness by postulating that there exist some unobserved characteristics $V$ that govern selection and place bounds on the magnitude of the relative odds ratio of the propensity score conditional on $V$ and the observable characteristics versus the propensity score conditional on just the observable characteristics. See Imbens (2003), which develops a tractable parametric model for such a violation of unconfoundedness in a treatment assignment problem. Kallus et al. (2018) and Yadlowsky et al. (2020) derive bounds on the conditional average treatment effect and average treatment effect under related models for violations of unconfoundedness, and provide methods for inference on the derived bounds.

A.7.2 Proxy Outcomes

In some empirical applications, the researcher may observe an additional proxy outcome $\tilde{Y} \in \tilde{\mathcal{Y}}$, which does not suffer from the missing data problem but is correlated with the outcome $Y^* \in \mathcal{Y}$. By specifying bounds on the relationship between the proxy outcome $\tilde{Y}$ and the outcome $Y^*$, the researcher may construct bounds on the unobservable choice-dependent outcome probabilities.

Proxy outcomes are common in medical and consumer lending settings. For example, Mullainathan and Obermeyer (2021) observe each patient's longer term health outcomes regardless of whether a stress test for a heart attack was conducted during a particular emergency room visit. A patient's longer term health outcomes are related to whether the patient actually had a heart attack, no matter the testing decisions of doctors. Similarly, Chan et al. (2021) observe whether each patient had a future pneumonia diagnosis within one week of an initial examination, regardless of whether a doctor ordered an MRI at the initial examination. Future pneumonia diagnoses may be a useful proxy for whether the doctor failed to correctly diagnose pneumonia during the initial examination. In mortgage approvals, Blattner and Nelson (2021) observe each loan applicant's default performance on other credit products, regardless of whether each loan applicant was approved for a mortgage. A loan applicant's default performance on other credit products is related to whether they would have defaulted on the mortgage.

Assumption A.7.2 (Proxy Outcomes). There exists a proxy outcome $\tilde{Y} \in \tilde{\mathcal{Y}}$, and the researcher observes the joint distribution $(W, X, C, \tilde{Y}, Y^* \cdot C) \sim P$. Assume $P(W = w, X = x, \tilde{Y} = \tilde{y}) > 0$ for all $(w, x, \tilde{y}) \in \mathcal{W} \times \mathcal{X} \times \tilde{\mathcal{Y}}$.

Over decisions in which the decision maker selected $C = 1$, the researcher observes the joint distribution of the proxy outcome and the outcome $P(\tilde{Y} = \tilde{y}, Y^* = y^* \mid C = 1, W = w, X = x)$. By placing assumptions on how the joint distribution of the proxy outcome and the outcome conditional on $C = 0$ is bounded by the observable joint distribution of the proxy outcome and the outcome conditional on $C = 1$, the researcher can construct bounds on the unobservable choice-dependent outcome probabilities of the form given in Assumption 1.2.1. Let $P_{Y^*}(y^* \mid \tilde{y}, c, w, x) := P(Y^* = y^* \mid \tilde{Y} = \tilde{y}, C = c, W = w, X = x)$, $P_{\tilde{Y}}(\tilde{y} \mid c, w, x) := P(\tilde{Y} = \tilde{y} \mid C = c, W = w, X = x)$, and $\pi_c(\tilde{y}, w, x) := P(C = c \mid \tilde{Y} = \tilde{y}, W = w, X = x)$.

Assumption A.7.3 (Proxy Bounds). For each $(w, x, \tilde{y}) \in \mathcal{W} \times \mathcal{X} \times \tilde{\mathcal{Y}}$ satisfying $0 < \pi_1(\tilde{y}, w, x) < 1$, there exist bounds $\underline{\kappa}_{\tilde{y},w,x}, \overline{\kappa}_{\tilde{y},w,x} \geq 0$ satisfying
\[
(1 - \underline{\kappa}_{\tilde{y},w,x}) P_{Y^*}(1 \mid \tilde{y}, 1, w, x) \leq P_{Y^*}(1 \mid \tilde{y}, 0, w, x) \leq (1 + \overline{\kappa}_{\tilde{y},w,x}) P_{Y^*}(1 \mid \tilde{y}, 1, w, x).
\]

Proposition A.7.2. Consider a binary screening decision in which Assumptions A.7.2-A.7.3 hold. For each $(w, x) \in \mathcal{W} \times \mathcal{X}$,
\begin{align*}
\sum_{\tilde{y} \in \tilde{\mathcal{Y}}} (1 - \underline{\kappa}_{\tilde{y},w,x}) P_{Y^*}(1 \mid \tilde{y}, 1, w, x) P_{\tilde{Y}}(\tilde{y} \mid 0, w, x) &\leq P_{Y^*}(1 \mid 0, w, x), \\
P_{Y^*}(1 \mid 0, w, x) &\leq \sum_{\tilde{y} \in \tilde{\mathcal{Y}}} (1 + \overline{\kappa}_{\tilde{y},w,x}) P_{Y^*}(1 \mid \tilde{y}, 1, w, x) P_{\tilde{Y}}(\tilde{y} \mid 0, w, x).
\end{align*}

The advantage of this approach is that it may be easier to express domain-specific knowledge through the use of proxy outcomes. For example, in the mortgage approvals setting in Blattner and Nelson (2021), the proxy bounds summarize the extent to which the mortgage default rate among accepted applicants that also defaulted on other credit products differs from the counterfactual mortgage default rate among rejected applicants that also defaulted on other credit products. In the medical testing setting in Mullainathan and Obermeyer (2021), the proxy bounds summarize the extent to which the heart attack rate among tested patients that went on to die within 30 days of their emergency room visit differs from the heart attack rate among untested patients that went on to die within 30 days of their emergency room visit.

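The bounds in Proposition A.7.2 average proxy-cell-specific bounds over the observed proxy distribution among the unselected. A numerical sketch with a hypothetical binary proxy; all probabilities and $\kappa$ values are invented for illustration:

```python
# Numerical sketch of the proxy-outcome bounds in Proposition A.7.2:
# averaging per-proxy-cell bounds on P(Y*=1 | ytilde, 0, w, x) over the
# observed proxy distribution among the unselected, P(ytilde | 0, w, x),
# bounds P(Y*=1 | 0, w, x).

def proxy_bounds(p_y_given_proxy_released, p_proxy_detained, k_lo, k_hi):
    lo = sum((1 - k_lo[t]) * p_y_given_proxy_released[t] * p_proxy_detained[t]
             for t in p_proxy_detained)
    hi = sum((1 + k_hi[t]) * p_y_given_proxy_released[t] * p_proxy_detained[t]
             for t in p_proxy_detained)
    return lo, hi

p_y_given_proxy_released = {0: 0.10, 1: 0.40}   # P(Y*=1 | ytilde, C=1, w, x)
p_proxy_detained = {0: 0.30, 1: 0.70}           # P(ytilde | C=0, w, x)
k_lo = {0: 0.2, 1: 0.2}
k_hi = {0: 0.5, 1: 0.5}
lo, hi = proxy_bounds(p_y_given_proxy_released, p_proxy_detained, k_lo, k_hi)
print(round(lo, 3), round(hi, 3))  # 0.248 0.465
```

An informative proxy concentrates $P(\tilde{y} \mid 0, w, x)$ on cells where the outcome rate is well measured, which tightens the resulting interval.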
A.8 Summary Figures and Tables for New York City Pretrial Release

Figure A.8: Histogram of number of cases heard by each judge in the top 25 judges.

Notes: This figure plots a histogram of the number of cases heard per judge in the top 25 judges that are the focus of
my empirical analysis in the New York City pretrial release data. Every judge in the top 25 made at least 5,000 pretrial
release decisions over the sample period. See Section 1.5.2 for further details. Source: Rambachan and Ludwig (2021).

Figure A.9: Receiver-operating characteristic (ROC) curves for ensemble prediction functions

(a) Race-by-age cells

(b) Race-by-felony charge cells

Notes: This figure plots the Receiver-Operating Characteristic (ROC) curves for the ensemble prediction that predicts
failure to appear among defendants that were released by the top 25 judges. It reports the ROC curve for the ensemble
prediction function constructed within race-by-age cells and race-by-felony charge cells separately. Age is binarized
into young and older defendants, where older defendants are defined as defendants older than 25 years. The ensemble
prediction function is constructed over cases heard by the remaining bail judges and evaluated out-of-sample on cases
heard by the top 25 judges. The ensemble prediction function averages the predictions of a random forest, which is
estimated using the R package ranger at the default hyperparameter values (Wright and Ziegler, 2017), and an
elastic net model, whose hyperparameters are tuned using three-fold cross-validation. The ROC curve plots the false
positive rate on the x-axis and the true positive rate on the y-axis. The out-of-sample area under the curve (AUC)
on all defendants equals 0.693 for the ensemble prediction function constructed over race-by-age cells and 0.694 for
the ensemble prediction function constructed over race-by-felony cells. See Section 1.5.3 for further details. Source:
Rambachan and Ludwig (2021).

All Defendants White Defendants Black Defendants
Estimation Top Estimation Top Estimation Top
Sample Judges Sample Judges Sample Judges
(1) (2) (3) (4) (5) (6)
Released before trial 0.720 0.736 0.757 0.777 0.687 0.699
Defendant Characteristics
White 0.475 0.481 1.000 1.000 0.000 0.000
Female 0.173 0.173 0.154 0.152 0.190 0.192
Age at Arrest 31.95 31.75 32.03 31.88 31.87 31.63
Arrest Charge
Number of Charges 1.152 1.167 1.187 1.217 1.119 1.121
Felony Charge 0.372 0.367 0.367 0.356 0.376 0.377
Any Drug Charge 0.253 0.224 0.253 0.217 0.253 0.230
Any DUI Charge 0.047 0.049 0.070 0.072 0.027 0.027
Any Violent Crime Charge 0.375 0.395 0.358 0.379 0.390 0.410
Property Charge 0.130 0.132 0.122 0.123 0.138 0.140

Defendant Priors
Any FTA 0.516 0.497 0.443 0.419 0.582 0.570
Number of FTAs 2.177 2.034 1.633 1.492 2.670 2.537
Any Misdemeanor Arrest 0.683 0.667 0.615 0.596 0.744 0.734
Any Misdemeanor Conviction 0.383 0.368 0.334 0.315 0.427 0.418
Any Felony Arrest 0.581 0.566 0.503 0.482 0.652 0.644
Any Felony Conviction 0.285 0.271 0.234 0.215 0.331 0.323
Any Violent Felony Arrest 0.398 0.387 0.306 0.292 0.481 0.476
Any Violent Felony Conviction 0.119 0.114 0.084 0.078 0.150 0.147
Total Cases 569,256 243,118 270,704 117,073 298,552 126,045

Table A.2: Summary statistics comparing the main estimation sample and cases heard by the top 25 judges, broken out by defendant
race.

Notes: This table provides summary statistics about defendant and case characteristics for the main estimation sample and cases heard by the top 25 judges in the
NYC pretrial release data for all defendants and separately by defendant race. See Section 1.5.2 for further discussion. Source: Rambachan and Ludwig (2021).
Released Detained
All Defendants Defendants Defendants
Estimation Top Estimation Top Estimation Top
Sample Judges Sample Judges Sample Judges
(1) (2) (3) (4) (5) (6)
Released before trial 0.720 0.736 1.000 1.000 0.000 0.000
Defendant Characteristics
White 0.475 0.481 0.499 0.508 0.412 0.407
Female 0.173 0.173 0.199 0.197 0.107 0.106
Age at Arrest 31.95 31.75 31.22 31.20 33.82 33.29
Arrest Charge
Number of Charges 1.152 1.167 1.148 1.162 1.161 1.182
Felony Charge 0.372 0.367 0.288 0.288 0.588 0.586
Any Drug Charge 0.253 0.224 0.229 0.204 0.314 0.279
Any DUI Charge 0.047 0.049 0.062 0.063 0.010 0.010
Any Violent Crime Charge 0.375 0.395 0.388 0.409 0.341 0.355
Property Charge 0.130 0.132 0.115 0.114 0.171 0.181

Defendant Priors
Any FTA 0.516 0.497 0.409 0.395 0.793 0.784
Number of FTAs 2.177 2.034 1.362 1.295 4.284 4.103
Any Misdemeanor Arrest 0.683 0.667 0.610 0.598 0.871 0.863
Any Misdemeanor Conviction 0.383 0.368 0.284 0.278 0.637 0.621
Any Felony Arrest 0.581 0.566 0.487 0.477 0.824 0.814
Any Felony Conviction 0.285 0.271 0.200 0.194 0.505 0.487
Any Violent Felony Arrest 0.398 0.387 0.315 0.309 0.614 0.608
Any Violent Felony Conviction 0.119 0.114 0.081 0.080 0.216 0.210
Total Cases 569,256 243,118 410,394 179,143 158,862 63,975

Table A.3: Summary statistics for released and detained defendants in the main estimation sample and for cases heard by the top 25
judges.

Notes: This table provides summary statistics about defendant and case characteristics for the main estimation sample and the cases heard by the top 25 judges in the
NYC pretrial release data for all defendants and by whether the defendant was released or detained. See Section 1.5.2 for further discussion. Source: Rambachan
and Ludwig (2021).
All Defendants White Defendants Black Defendants
Estimation Top Estimation Top Estimation Top
Sample Judges Sample Judges Sample Judges
(1) (2) (3) (4) (5) (6)
Failure to Appear (FTA) 0.151 0.146 0.135 0.131 0.167 0.161
Rearrest (NCA) 0.261 0.257 0.230 0.225 0.292 0.289
Any Misconduct 0.331 0.324 0.297 0.290 0.366 0.359
Total Cases 410,394 179,143 205,174 91,026 205,220 88,117

Table A.4: Summary statistics of misconduct rates among released defendants in the main estimation sample and cases heard by the
top 25 judges.

Notes: This table summarizes the observed misconduct rates among released defendants for the main estimation sample and the cases heard by the top 25 judges in
the New York City pretrial release data for all defendants and separately by the race of the defendant. See Section 1.5.2 for further discussion. Source: Rambachan
and Ludwig (2021).
All Defendants White Defendants Black Defendants
Full Estimation Full Estimation Full Estimation
Sample Sample Sample Sample Sample Sample
(1) (2) (3) (4) (5) (6)
Released before trial 0.736 0.720 0.765 0.757 0.691 0.687

Defendant Characteristics
White 0.457 0.475 1.000 1.000 0.000 0.000
Female 0.169 0.173 0.153 0.154 0.184 0.190
Age at Arrest 32.06 31.95 32.06 32.03 31.88 31.87
Arrest Charge
Number of Charges 1.165 1.152 1.176 1.187 1.114 1.119
Felony Charge 0.335 0.372 0.332 0.367 0.346 0.376
Any Drug Charge 0.244 0.253 0.251 0.253 0.252 0.253
Any DUI Charge 0.053 0.047 0.074 0.070 0.028 0.027
Any Violent Crime Charge 0.365 0.375 0.348 0.358 0.380 0.390
Property Charge 0.135 0.130 0.127 0.122 0.145 0.138

Defendant Priors
Any FTA 0.499 0.516 0.442 0.443 0.586 0.582
Number of FTAs 2.099 2.177 1.635 1.633 2.707 2.670
Any Misdemeanor Arrest 0.668 0.683 0.616 0.615 0.747 0.744
Any Misdemeanor Conviction 0.371 0.383 0.335 0.334 0.430 0.427
Any Felony Arrest 0.565 0.581 0.502 0.503 0.654 0.652
Any Felony Conviction 0.273 0.285 0.232 0.234 0.334 0.331
Any Violent Felony Arrest 0.384 0.398 0.306 0.306 0.484 0.481
Any Violent Felony Conviction 0.114 0.119 0.084 0.084 0.152 0.150
Total Cases 758,027 569,256 347,006 270,704 370,793 298,552

Table A.5: Summary statistics in the universe of all cases subject to a pretrial release decision and main estimation sample in the NYC
pretrial release data, broken out by defendant race.

Notes: This table provides summary statistics about defendant and case characteristics for the sample of all cases subject to a pretrial release decision and the main
estimation sample in the NYC pretrial release data, broken out for all defendants and by defendant race. See Section 1.5.2 for further discussion. Source: Rambachan
and Ludwig (2021).
All Defendants Released Defendants Detained Defendants
Full Estimation Full Estimation Full Estimation
Sample Sample Sample Sample Sample Sample
(1) (2) (3) (4) (5) (6)
Released before trial 0.736 0.720 1.000 1.000 0.000 0.000

Defendant Characteristics
White 0.457 0.475 0.476 0.499 0.406 0.412
Female 0.169 0.173 0.192 0.199 0.105 0.107
Age at Arrest 32.06 31.95 31.41 31.22 33.90 33.82
Arrest Charge
Number of Charges 1.165 1.152 1.166 1.148 1.161 1.161
Felony Charge 0.335 0.372 0.258 0.288 0.549 0.588
Any Drug Charge 0.244 0.253 0.221 0.229 0.307 0.314
Any DUI Charge 0.053 0.047 0.068 0.062 0.011 0.010
Any Violent Crime Charge 0.365 0.375 0.377 0.388 0.333 0.341
Property Charge 0.135 0.130 0.119 0.115 0.179 0.171

Defendant Priors
Any FTA 0.499 0.516 0.394 0.409 0.792 0.793
Number of FTAs 2.099 2.177 1.310 1.362 4.304 4.284
Any Misdemeanor Arrest 0.668 0.683 0.596 0.610 0.870 0.871
Any Misdemeanor Conviction 0.371 0.383 0.275 0.284 0.639 0.637
Any Felony Arrest 0.565 0.581 0.472 0.487 0.823 0.824
Any Felony Conviction 0.273 0.285 0.190 0.200 0.503 0.505
Any Violent Felony Arrest 0.384 0.398 0.302 0.315 0.612 0.614
Any Violent Felony Conviction 0.114 0.119 0.077 0.081 0.216 0.216
Total Cases 758,027 569,256 558,167 410,394 199,860 158,862

Table A.6: Summary statistics for released and detained defendants in the universe of all cases subject to a pretrial release decision and
the main estimation sample in the NYC pretrial release data.

Notes: This table provides summary statistics about defendant and case characteristics for the sample of all cases subject to a pretrial release decision and the main
estimation sample in the NYC pretrial release data, broken out for all defendants and by whether the defendant was released or detained. See Section 1.5.2 for
further discussion. Source: Rambachan and Ludwig (2021).
Table A.7: Balance check estimates for the quasi-random assignment of judges for all defendants
and by defendant race.

White Black
All Defendants Defendants Defendants
(1) (2) (3)
Defendant Characteristics
Black −0.00011
(0.00008)
Female 0.000003 0.00005 −0.00003
(0.00013) (0.00017) (0.00017)
Age −0.00001 −0.00002 −0.000002
(0.000003) (0.00001) (0.000004)
Arrest Charge
Number of Charges −0.000003 −0.000003 0.000003
(0.00001) (0.00001) (0.00003)
Felony Charge 0.00009 −0.00012 0.00027
(0.00015) (0.00017) (0.00018)
Any Drug Charge −0.00012 −0.00010 −0.00013
(0.00013) (0.00018) (0.00016)
Any Violent Crime Charge −0.00004 −0.00013 0.00004
(0.00010) (0.00015) (0.00014)
Any Property Charge −0.00033 −0.00029 −0.00035
(0.00016) (0.00019) (0.00025)
Any DUI Charge 0.00044 0.00039 0.00028
(0.00024) (0.00027) (0.00039)
Defendant Priors
Prior FTA −0.00004 −0.00011 0.00003
(0.00010) (0.00016) (0.00013)
Prior Misdemeanor Arrest 0.00007 0.00003 0.00011
(0.00010) (0.00013) (0.00015)
Prior Felony Arrest 0.00006 0.00003 0.00009
(0.00014) (0.00021) (0.00019)
Prior Violent Felony Arrest −0.00013 −0.00008 −0.00016
(0.00011) (0.00019) (0.00016)
Prior Misdemeanor Conviction 0.00016 0.00021 0.00011
(0.00013) (0.00017) (0.00016)
Prior Felony Conviction −0.00019 0.00011 −0.00040
(0.00012) (0.00018) (0.00015)
Prior Violent Felony Conviction −0.00008 −0.00024 0.00002
(0.00015) (0.00021) (0.00020)
Joint p-value 0.06953 0.15131 0.41840
Court × Time FE ✓ ✓ ✓
Cases 569,256 270,704 298,552
Notes: This table reports OLS estimates for regressions of constructed judge leniency on defendant and case characteris-
tics in the main estimation sample. These regressions are estimated over all defendants and separately by defendant
race. Standard errors (in parentheses) are clustered at the defendant-judge level. The joint p-value is based on the
F-statistic for whether all defendant and case characteristics are jointly significant. See Section 1.5.3 for further details.
Source: Rambachan and Ludwig (2021).
Table A.8: Balance check estimates for the quasi-random assignment of judges by defendant race
and age.

White Defendants Black Defendants
Young Older Young Older
(1) (2) (3) (4)
Defendant Characteristics
Female −0.00008 0.00017 −0.00007 −0.00005
(0.00025) (0.00019) (0.00024) (0.00024)
Age −0.000004 −0.00001 −0.00006 −0.00001
(0.00004) (0.00001) (0.00003) (0.00001)
Arrest Charge
Number of Charges −0.00002 −0.000003 −0.00002 0.00001
(0.00003) (0.000005) (0.00006) (0.00003)
Felony Charge 0.00002 −0.00024 0.00019 0.00033
(0.00023) (0.00019) (0.00023) (0.00022)
Any Drug Charge −0.00033 0.00004 −0.00046 0.00004
(0.00033) (0.00022) (0.00025) (0.00020)
Any Violent Crime Charge −0.00025 −0.00010 −0.00016 0.00018
(0.00026) (0.00019) (0.00024) (0.00018)
Any Property Charge −0.00005 −0.00046 −0.00017 −0.00045
(0.00034) (0.00023) (0.00031) (0.00029)
Any DUI Charge 0.00021 0.00042 −0.00160 0.00062
(0.00045) (0.00030) (0.00072) (0.00044)
Defendant Priors
Prior FTA −0.00013 −0.00015 0.00034 −0.00021
(0.00026) (0.00021) (0.00022) (0.00020)
Prior Misdemeanor Arrest 0.00026 −0.00018 −0.00008 0.00034
(0.00021) (0.00017) (0.00022) (0.00022)
Prior Felony Arrest −0.00008 0.00018 0.00035 −0.00025
(0.00026) (0.00027) (0.00030) (0.00024)
Prior Violent Felony Arrest −0.00024 −0.00001 −0.00020 −0.00019
(0.00030) (0.00023) (0.00025) (0.00021)
Prior Misdemeanor Conviction 0.00040 0.00023 0.00040 0.00004
(0.00029) (0.00025) (0.00028) (0.00018)
Prior Felony Conviction 0.00052 0.00005 −0.00094 −0.00016
(0.00049) (0.00019) (0.00033) (0.00017)
Prior Violent Felony Conviction −0.00029 −0.00020 0.00113** −0.00012
(0.00077) (0.00022) (0.00054) (0.00021)
Joint p-value 0.85104 0.44370 0.038862 0.16062
Court × Time FE ✓ ✓ ✓ ✓
Cases 99,536 171,168 119,156 179,396
Notes: This table reports OLS estimates for regressions of the judge leniency measure on defendant and case character-
istics in the main estimation sample. These regressions are estimated separately over subsamples defined by defendant
race and age, where “young” is defined as less than or equal to 25 years and “old” is defined as older than 25 years.
Standard errors (in parentheses) are clustered at the defendant-judge level. The joint p-value is based on the F-statistic
for whether all defendant and case characteristics are jointly significant. See Section 1.5.3 for further details. Source:
Rambachan and Ludwig (2021).
Table A.9: Balance check estimates for the quasi-random assignment of judges by defendant race
and felony charge.

White Defendants Black Defendants
Felony No Felony Felony No Felony
Charge Charge Charge Charge
(1) (2) (3) (4)
Defendant Characteristics
Female 0.00003 0.00001 −0.00003 −0.00004
(0.00023) (0.00021) (0.00026) (0.00021)
Age −0.00002 −0.00001 0.000004 −0.000004
(0.00001) (0.00001) (0.00001) (0.00001)
Arrest Charge
Number of Charges −0.000002 −0.00004 −0.000005 0.00003
(0.00001) (0.00003) (0.00003) (0.00007)
Any Drug Charge −0.00022 −0.00008 −0.00012 −0.00008
(0.00028) (0.00024) (0.00031) (0.00023)
Any Violent Crime Charge −0.00043 0.00001 0.00038 −0.00013
(0.00030) (0.00018) (0.00026) (0.00017)
Any Property Charge −0.00038 −0.00038 0.00023 −0.00070
(0.00027) (0.00028) (0.00029) (0.00035)
Any DUI Charge 0.00047 0.00049 0.00100 0.00012
(0.00057) (0.00030) (0.00093) (0.00042)
Defendant Priors
Prior FTA −0.00014 −0.00005 0.00012 −0.00003
(0.00023) (0.00020) (0.00024) (0.00015)
Prior Misdemeanor Arrest 0.00024 −0.00012 0.00009 0.00010
(0.00025) (0.00017) (0.00028) (0.00018)
Prior Felony Arrest −0.00007 −0.000005 −0.00043 0.00040
(0.00036) (0.00023) (0.00032) (0.00022)
Prior Violent Felony Arrest −0.00042 0.00012 −0.00001 −0.00020
(0.00029) (0.00021) (0.00025) (0.00018)
Prior Misdemeanor Conviction −0.00009 0.00050 0.00042 −0.00013
(0.00030) (0.00021) (0.00027) (0.00017)
Prior Felony Conviction 0.00010 0.00024 −0.00040 −0.00041
(0.00034) (0.00023) (0.00025) (0.00019)
Prior Violent Felony Conviction 0.00040 −0.00084 −0.00004 0.0000001
(0.00036) (0.00030) (0.00028) (0.00024)
Joint p-value 0.05623 0.27401 0.24607 0.24712
Court × Time FE ✓ ✓ ✓ ✓
Cases 99,463 171,241 112,517 186,035
Notes: This table reports OLS estimates for regressions of the constructed judge leniency measure on various defendant
and case characteristics. These regressions are estimated separately over subsamples defined on the race of the
defendant and whether the defendant was charged with a felony offense. Standard errors, reported in parentheses, are
clustered at the defendant and judge level. The joint p-value is based on the F-statistic for whether all defendant and
case characteristics are jointly significant. See Section 1.5.3 for further details. Source: Rambachan and Ludwig (2021).

A.9 Additional Empirical Results for New York City Pretrial Release

I now present additional empirical results on the behavior of judges in the New York City pretrial

release system.

A.9.1 Welfare Effects of Automation Policies: Race-by-Felony Charge Cells

Section 1.6.2 of the main text compared the total expected social welfare under the observed

release decisions by judges in New York City against the total expected social welfare under

counterfactual algorithmic decisions, conducting this exercise over race-by-age cells and deciles

of predicted failure to appear risk. In this section of the Supplement, I report the results of the

same analysis over race-by-felony charge cells and deciles of predicted failure to appear risk for

completeness and find analogous results as reported in the main text.

Automating Judges Who Make Prediction Mistakes

Figure A.10a plots the improvement in worst-case total expected social welfare under the algo-

rithmic decision rule that fully replaces judges who were found to make detectable prediction

mistakes against the observed release decisions of these judges.

For most values of the social welfare function, the algorithmic decision rule dominates the

observed choices of these judges, but for social welfare costs of unnecessary detentions ranging

over U*(0, 0) ∈ [0.3, 0.7], the algorithmic decision rule either leads to no improvement or strictly

lowers worst-case expected total social welfare relative to the judges’ observed decisions.

Figure A.10b therefore plots the improvement in worst-case total expected social welfare under

the algorithmic decision rule that only corrects detectable prediction mistakes at the tails of the

predicted failure to appear risk distribution against the observed release decisions of these judges.

As found in the main text, the algorithmic decision rule that only corrects detectable prediction

mistakes appears to weakly dominate the observed release decisions of judges, no matter the

value of the social welfare function.

Automating Judges Who Do Not Make Prediction Mistakes

I next compare welfare effects of automating the release decisions of judges whose choices were

found to be consistent with expected utility maximization behavior at accurate beliefs about

Figure A.10: Ratio of total expected social welfare under algorithmic decision rules relative to
observed release decisions of judges that make detectable prediction mistakes over race-by-felony
charge cells.

(a) Welfare improvement of full automation decision rule

(b) Welfare improvement of decision rule that corrects prediction mistakes

Notes: This figure reports the change in worst-case total expected social welfare under two algorithmic decision rules
against the judge’s observed release decisions among judges who were found to make detectable prediction mistakes.
Worst-case total expected social welfare under each decision rule is computed by first constructing a 95% confidence
interval for total expected social welfare under the decision rule, and reporting the smallest value that lies in the confidence
interval. These decision rules are constructed and evaluated over race-by-felony cells and deciles of predicted failure
to appear risk. The x-axis plots the relative social welfare cost of detaining a defendant that would not fail to appear
in court, U*(0, 0) (i.e., an unnecessary detention). The solid line plots the median change across judges that make
mistakes, and the dashed lines report the minimum and maximum change across judges. See Section 1.6.2 of the main
text and Supplement A.9.1 for further details. Source: Rambachan and Ludwig (2021).

failure to appear risk. Figure A.11 plots the improvement in worst-case total expected social

welfare under the algorithmic decision rule that fully replaces these judges against their observed

release decisions. As in the main text, I find that automating these judge’s release decisions may

strictly lower worst-case expected total social welfare for a large range of social welfare costs of

unnecessary detentions.

Figure A.11: Ratio of total expected social welfare under full automation decision rule relative to
observed decisions of judges that do not make detectable prediction mistakes over race-by-felony
charge cells.

Notes: This figure reports the change in worst-case total expected social welfare under the algorithmic decision rule
that fully automates decision-making against the judge’s observed release decisions among judges whose choices were
consistent with expected utility maximization behavior at accurate beliefs about failure to appear risk. Worst-case
total expected social welfare under each decision rule is computed by first constructing a 95% confidence interval for
total expected social welfare under the decision rule, and reporting the smallest value that lies in the confidence interval.
These decision rules are constructed and evaluated over race-by-felony cells and deciles of predicted risk. The x-axis
plots the relative social welfare cost of detaining a defendant that would not fail to appear in court, U*(0, 0) (i.e., an
unnecessary detention). The solid line plots the median change across these judges, and the dashed lines
report the minimum and maximum change across judges. See Section 1.6.2 of the main text and Supplement A.9.1 for
further details. Source: Rambachan and Ludwig (2021).
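A minimal numerical sketch of the "worst-case" construction used in these figure notes (take the smallest value in a 95% confidence interval for expected welfare), assuming a normal-approximation interval; the welfare draws below are synthetic placeholders for the output of the bounds analysis:

```python
import numpy as np

def worst_case_welfare(welfare_draws):
    # Smallest value in a normal-approximation 95% CI for the mean welfare.
    w = np.asarray(welfare_draws, dtype=float)
    mean = w.mean()
    se = w.std(ddof=1) / np.sqrt(w.size)
    return mean - 1.96 * se

rng = np.random.default_rng(2)
print(round(worst_case_welfare(rng.normal(-0.2, 0.1, size=400)), 3))
```

Comparing decision rules by these lower endpoints is deliberately conservative: a rule "improves" on the judges only if even its worst plausible welfare beats theirs.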

A.9.2 Identifying Prediction Mistakes: Direct Imputation

Sections 1.5-1.4 of the main text tested whether the pretrial release decisions of judges in New

York City were consistent with expected utility maximization behavior at accurate beliefs about

failure to appear risk under various exclusion restrictions on their preferences by constructing

bounds on the failure to appear rate of detained defendants using the quasi-random assignment

of judges.

I now show how the same test may be conducted using direct imputation (Supplement A.7.1)

to construct bounds on the failure to appear rate of detained defendants. The key input in

constructing the direct imputation bounds is the parameter κw,d ≥ 0 for each value W = w,

Dw(X) = d, which bounds the failure to appear rate among detained defendants relative to the

observed failure to appear rate among released defendants. I assume that κw,d ≡ κ does not vary

across values W = w, Dw(X) = d and conduct my analysis under various assumptions on the

magnitude of κ ≥ 0. Comparing how the results change as κ ≥ 0 varies can be interpreted as a

sensitivity analysis on the informativeness of the judges’ private information: how do conclusions

about behavior change as we allow judges to have more accurate private information?
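To fix ideas, the following sketch shows how κ-indexed imputation bounds behave. The bracket [p_rel, min(1, (1 + κ) · p_rel)] used here is an illustrative assumption only; the exact functional form is stated in Supplement A.7.1. κ = 0 imputes the released defendants' observed rate directly, and larger κ allows the judges' private information to be more informative:

```python
# Illustrative direct-imputation bounds on the detained failure-to-appear
# rate in a (w, d) cell, as a function of kappa. The functional form
# [p_rel, min(1, (1 + kappa) * p_rel)] is assumed for illustration only,
# where p_rel is the observed rate among released defendants in the cell.
def imputation_bounds(p_released, kappa):
    lower = p_released
    upper = min(1.0, (1.0 + kappa) * p_released)
    return lower, upper

for kappa in (0, 2, 10):
    print(kappa, imputation_bounds(0.15, kappa))
```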

What Fraction of Judges Make Prediction Mistakes? Using the direct imputation bounds, I

test whether the release decisions of each judge in the top 25 are consistent with expected utility

maximization behavior at strict preference utility functions that (i) do not depend on any

observable characteristics, (ii) depend on only the defendant’s race, (iii) depend on both the

defendant’s race and age, or (iv) depend on on the defendant’s race and whether the defendant

was charged with a felony offense. I test the inequalities in Corollary 1.3.4 across deciles of

predicted risk within each possible W cell. As in the main text, I include inequalities that compare

the observed failure to appear rate among released defendants at predicted risk deciles six to ten

against the direct imputation bounds on the failure to appear rate among detained defendants

at predicted risk deciles one to five. I construct the variance-covariance matrix of the moments

using the empirical bootstrap conditional on the observable characteristics W and predicted risk

decile Dw(X) for the cases assigned to a particular judge. Figure A.12 reports the fraction of

judges in the top 25 for whom we can reject expected utility maximization behavior at strict

preferences under various assumptions on which observable characteristics W affect the utility

function. The adjusted rejection rate reports the fraction of rejections after correcting for multiple hypotheses

using the Holm-Bonferroni step-down procedure, which controls the family-wise error rate at the

5% level.
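The Holm-Bonferroni step-down procedure referenced here is simple to state in code. The following sketch returns, for a list of per-judge p-values, which hypotheses are rejected while controlling the family-wise error rate at level α:

```python
# Holm-Bonferroni step-down procedure: sort p-values, compare the k-th
# smallest against alpha / (m - k), and stop at the first failure.
def holm_reject(pvalues, alpha=0.05):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if pvalues[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail
    return reject

print(holm_reject([0.001, 0.04, 0.03, 0.2]))
```

Because the thresholds shrink as more hypotheses are tested, the adjusted rejection rate is weakly below the unadjusted rate by construction.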
Figure A.12: Fraction of judges whose release decisions are inconsistent with expected utility
maximization behavior at accurate beliefs about failure to appear risk using direct imputation
bounds.

Notes: This figure summarizes the results for testing whether the release decisions of each judge in the top 25 are
consistent with expected utility maximization behavior at strict preference utility functions U(c, y*; w) that (i) do not depend on
any observable characteristics, (ii) depend on the defendant's race, (iii) depend on both the defendant's race and age,
and (iv) depend on both the defendant's race and whether the defendant was charged with a felony offense. Bounds on
the failure to appear rate among detained defendants are constructed using direct imputation (see Supplement A.7.1)
for κ ∈ {0, 1, . . . , 10}. I first construct the unadjusted rejection rate by testing whether the pretrial release decisions
of each judge in the top 25 are consistent with the moment inequalities in Corollary 1.3.4 at the 5% level using the
conditional least-favorable hybrid test. I construct the variance-covariance matrix of the moments using the empirical
bootstrap conditional on the payoff-relevant characteristics W and the predicted risk decile Dw(X). The adjusted
rejection rate reports the fraction of rejections after correcting for multiple hypothesis testing using the Holm-Bonferroni
step-down procedure, which controls the family-wise error rate at the 5% level. This is the same procedure described in
Section 1.5.4. Source: Rambachan and Ludwig (2021).

What Types of Prediction Mistakes are Being Made? I next apply the identification results in

Section 1.4 to analyze the types of prediction mistakes based on observable characteristics made

by judges in the New York City pretrial release data, constructing bounds on the unobservable

failure to appear rate among detained defendants using direct imputation. Figure A.13a reports

95% confidence intervals for the identified set of values δ(w, d)/δ(w, d′) between the highest decile d

and lowest decile d′ of predicted risk within each race-by-age W cell using the direct imputation

bounds with κ = 2. Figure A.13b plots the 95% confidence intervals for the identified set on

the same object within each race-by-felony charge W cell. As found in Section 1.5.5, judges

appear to underreact to predictable variation in failure to appear risk. Whenever these bounds are

informative, they lie strictly below one.

Figure A.13: 95% confidence intervals for the implied prediction mistake of failure to appear risk
between the highest and lowest predicted failure to appear risk deciles using direct imputation
bounds with κ = 2.

(a) Race-by-age W cells

(b) Race-by-felony charge W cells

Notes: This figure plots the 95% confidence interval for the identified set on the implied prediction mistake
δ(w, d)/δ(w, d′) between the highest predicted failure to appear risk decile d and the lowest predicted failure to
appear risk decile d′ within each race-by-age cell and race-by-felony charge cell. The bounds on the failure to appear
rate among detained defendants are constructed using direct imputation with κ = 2 (Section A.7.1) and for each judge
in the top 25 whose choices are inconsistent with expected utility maximization behavior at these bounds (Figure
A.12). These confidence intervals are constructed by first constructing a 95% joint confidence interval for a judge's
reweighted utility thresholds τ(w, d), τ(w, d′) using test inversion based on the moment inequalities in Theorem 1.4.2,
and then constructing the implied prediction mistake δ(w, d)/δ(w, d′) associated with each pair τ(w, d), τ(w, d′) in the
joint confidence set (Corollary 1.4.1). See Section 1.4.2 for theoretical details on the implied prediction mistake. Source:
Rambachan and Ludwig (2021).

A.9.3 Defining the Outcome to be Any Pretrial Misconduct

Section 1.5 of the main text considered an extension to my baseline empirical results on detectable

prediction mistakes in the New York City pretrial release system that defined the outcome of

interest to be whether a defendant would commit “any pretrial misconduct” (i.e., either fail to

appear in court or be re-arrested for a new crime). I now characterize the extent to which judges’

predictions of any pretrial misconduct are systematically biased using the identification results in

Section 1.4 of the main text.

What Types of Prediction Mistakes are Being Made? Figure A.14a reports 95% confidence

intervals for the identified set of the implied prediction mistake δ(w, d)/δ(w, d′) between the

highest decile d and lowest decile d′ of predicted pretrial misconduct risk within each race-by-age W cell.

Figure A.14b plots the 95% confidence intervals for the identified set on the same object within

each race-by-felony charge W cell. Judges appear to systematically underreact to predictable

variation in pretrial misconduct risk between defendants at the tails of the pretrial misconduct

risk distribution. Whenever these bounds are informative, they lie strictly below one.

Furthermore, among judges whose choices are inconsistent with expected utility maximization

behavior at accurate beliefs about pretrial misconduct risk, Table A.10 reports the location of

the studentized maximal violation of the revealed preference inequalities and shows the fraction

of judges for whom the maximal violation occurs over the tails of the predicted distribution

(deciles 1-2, 9-10) or the middle of the predicted risk distribution (deciles 3-8) for black and

white defendants respectively. I again find that maximal violations of the revealed preference

inequalities mainly occur over defendants that lie in the tails of the predicted risk distribution, and

furthermore the majority occur over black defendants at the tails of the predicted risk distribution.

Figure A.14: 95% confidence intervals for the implied prediction mistakes of any pretrial misconduct
risk between the highest and lowest predicted any pretrial misconduct risk deciles.

(a) Race-by-age W cells

(b) Race-by-felony charge W cells

Notes: This figure plots the 95% confidence interval for the identified set on δ(w, d)/δ(w, d′) between the highest
predicted any pretrial misconduct risk decile d and the lowest predicted any pretrial misconduct risk decile d′ within
each race-by-age cell and race-by-felony charge cell. The outcome Y* is whether the defendant would commit any
pretrial misconduct upon release (i.e., either fail to appear in court or be re-arrested for a new crime). Bounds on the
any pretrial misconduct rate among detained defendants are constructed using the judge leniency instrument (see
Section 1.5.3). Confidence intervals are constructed for each judge whose choices are inconsistent with expected utility
maximization at these bounds (Table A.1). These confidence intervals are constructed by first constructing a 95% joint
confidence interval for a judge's reweighted utility thresholds τ(w, d), τ(w, d′) using test inversion based on the moment
inequalities in Theorem 1.4.2, and then constructing the implied prediction mistake δ(w, d)/δ(w, d′) associated with
implied prediction mistake. Source: Rambachan and Ludwig (2021).

Table A.10: Location of the maximum studentized violation of revealed preference inequalities
among judges whose release decisions are inconsistent with expected utility maximization behavior
at accurate beliefs about any pretrial misconduct risk.

Utility Functions U(c, y; w)


Race and Age Race and Felony Charge
Unadjusted Rejection Rate 84% 98%

White Defendants
Middle Deciles 0.00% 0.00%
Tail Deciles 4.76% 4.16%
Black Defendants
Middle Deciles 9.52% 16.66%
Tail Deciles 85.71% 79.16%
Notes: This table summarizes the location of the maximum studentized violation of revealed preference inequalities
among judges whose release decisions are inconsistent with expected utility maximization behavior at accurate beliefs
about pretrial misconduct risk and preferences that depend on both the defendant’s race and age as well as the
defendant’s race and whether the defendant was charged with a felony. Bounds on the failure to appear rate among
detained defendants are constructed using the judge leniency instrument (see Section 1.5.3). Among judges whose
release decisions violate the revealed preference inequalities at the 5% level, I report the fraction of judges for whom the
maximal studentized violation occurs among white and black defendants at the tails of the any pretrial misconduct
predicted risk distribution (deciles 1-2, 9-10) and at the middle of the any pretrial misconduct predicted risk distribution
(deciles 3-8). The outcome Y* is whether the defendant would commit any pretrial misconduct upon release (i.e., either
fail to appear in court or be re-arrested for a new crime). Source: Rambachan and Ludwig (2021).

A.9.4 Alternative Pretrial Release Definition

In Section 1.5 of the main text, I tested whether the pretrial release decisions of judges in New York

City were consistent with expected utility maximization behavior at accurate beliefs about failure

to appear risk under various exclusion restrictions on their preferences. To do so, I collapsed the

pretrial release decision into a binary choice of simply whether to release or detain the defendant.

However, in practice, judges in New York City choose what bail conditions and monetary

amount to set for a defendant. Defendants may either be “released on recognizance” (i.e.,

automatically released without any bail conditions) or the judge may set some monetary bail, in

which case the defendant is only released if they can post the set bail amount. To account for this,

I now extend my baseline empirical implementation by defining a judge’s choice as whether to

release the defendant on recognizance.

Identification Result for Alternative Pretrial Release Definition

To develop this extension, I first apply the identification results in Section 1.2 to analyze this

modified decision problem. Let C P t0, 1u denote whether the judge chose “release on recognizance”

(C “ 1). Let W P W denote the directly payoff relevant defendant characteristics and X P X

denote the excluded defendant characteristics as before. The latent outcome is now defined as

the pair pR˚ , Y ˚ q, where R˚ P t0, 1u denotes whether the defendant would satisfy the monetary

bail condition set by the judge (i.e., would the defendant be able to pay the bail amount set by

the judge?) and Y ˚ P t0, 1u is whether the defendant would fail to appear in court if released.

Let R P t0, 1u denote the observed release decision. The observed release decision satisfies R “

C ` p1 ´ CqR˚ , meaning that the defendant is released if the judge selects release on recognizance

or the judge sets monetary bail conditions and the defendant satisfies them.

I assume that the judge’s utility function takes the same form as in Section 1.3 of the main

text. The judge receives some payoff if a defendant is released that goes on to fail to appear in

court or a defendant is detained that would not fail to appear in court. That is, I consider the

set of utility functions satisfying Upc, r˚ , y˚ ; wq “ Upr, y˚ ; wq, Up0, 1; wq “ 0, Up1, 0; wq “ 0 and

Up0, 0; wq ă 0, Up1, 1; wq ă 0.

I apply Theorem 1.2.1 to derive conditions under which the judge’s choices are consistent with

203
expected utility maximization behavior at accurate beliefs about both failure to appear risk and

the ability of defendant’s to meet the bail conditions. As in the main text, for each w P W , define

X 1 pwq :“ tx P X : π1 pw, xq ą 0u and X 0 pwq :“ tx P X : π0 pw, xq ą 0u.

Proposition A.9.1. Assume PY˚ p1 | 1, w, xq ă 1 for all pw, xq P W ˆ X with π1 pw, xq ą 0 and

PpR “ 0 | C “ 0, W “ w, X “ xq ą 0 for all pw, xq P W ˆ X with π0 pw, xq ą 0. The decision maker’s

choices are consistent with expected utility maximization behavior at some strict preference utility function

if and only if for all w P W

max PpY ˚ “ 1 | C “ 1, W “ w, X “ xq ď min PpY ˚ “ 1 | R “ 0, C “ 0, W “ w, X “ xq.


xPX 1 pwq xPX 0 pwq

Proof. The inequalities in Theorem 1.2.1 imply that the judge’s choices are consistent with expected

utility maximization behavior at accurate beliefs if and only if

(1) for all pw, xq P W ˆ X with PpC “ 1 | W “ w, X “ xq ą 0.

´Up0, 0; wq
PpY ˚ “ 1 | C “ 1, W “ w, X “ xq ď
´Up0, 0; wq ´ Up1, 1; wq

(2) for all pw, xq P W ˆ X with PpC “ 0 | W “ w, X “ xq ą 0,

PpY ˚ “ 1, R “ 1 | C “ 0qUp1, 1; wq ` PpY ˚ “ 0, R “ 0 | C “ 0qUp0, 0; wq ě

PpY ˚ “ 1 | C “ 0, W “ w, X “ xqUp1, 1; wq.

The conditions in (2) may be re-arranged as

PpY ˚ “ 0, R “ 0 | C “ 0, W “ w, X “ xqUp0, 0; wq ě PpY ˚ “ 1, R “ 0 | C “ 0, W “ w, X “ xqUp1, 1q,

where PpY ˚ “ 0, R “ 0 | C “ 0, W “ w, X “ xq “ PpR “ 0 | C “ 0, W “ w, X “ xq ´ PpY ˚ “ 1, R “

0 | C “ 0, W “ w, X “ xq. Substituting this in and re-arranging then delivers that

PpY ˚ “ 1, R “ 0 | C “ 0, W “ w, X “ xq p´Up0, 0; wq ´ Up1, 1; wqq ě

´PpR “ 0 | C “ 0, W “ w, X “ xqUp0, 0; wq.

The result is then immediate.

The judge’s choices are inconsistent with expected utility maximization behavior at accurate

204
beliefs about both failure to appear risk and the ability of defendant’s to satisfy the monetary bail

conditions if and only if the maximal failure to appear rate among defendants that were released

on recognizance is less than the minimal bound on the failure to rate among defendants that

could not satisfy their monetary bail conditions. The same dimension reduction techniques may

be applied from the main text to reduce the number of moment inequalities that must be tested.

Empirical Implementation and Results

To apply this identification result, I test whether the implied revealed preference inequalities

in Proposition A.9.1 are satisfied over the deciles of predicted failure to appear risk that were

constructed in Section 1.5 of the main text. I use the quasi-random assignment of judges to cases

to construct bounds on the unobservable failure to appear rate among detained defendants that

could not satisfy their monetary bail conditions. The only modification is that I now estimate the

observed failure to appear rate among defendants that were released on recognizance.

The results are summarized in Table A.11 below. I find that at least 32% of judges make

detectable prediction mistakes about failure to appear risk and the ability of defendants to satisfy

their monetary bail conditions. This suggests that a large fraction of judges are making prediction

mistakes in their joint predictions of failure to appear risk and the ability of defendants to satisfy

their monetary bail conditions in their release on recognizance vs. monetary bail decisions.

205
Table A.11: Estimated lower bound on the fraction of judges whose “release on recognizance”
decisions are inconsistent with expected utility maximization behavior at accurate beliefs about
behavior under bail conditions and failure to appear risk given defendant characteristics.

Utility Functions
No Characteristics Race Race ` Age Race ` Felony Charge
Adjusted Rejection Rate 32% 32% 32% 52%
Notes: This table summarizes the results of the robustness exercise to assess whether the “release on recognizance” vs
monetary bail decisions of judges are consistent with expected utility maximization behavior at strict preference utility
functions that either (i) do not depend on any characteristics, (ii) depend on the defendant’s race, (iii) depend on both
the defendant’s race and age, and (iv) depend on both the defendant’s race and whether the defendant was charged
with a felony offense. The outcome is defined to be whether the defendant would be released under the chosen bail
condition (i.e., either the judge decides to release the defendant on recognizance or the defendant satisfies the bail
conditions set by the judge) and FTA if released. I first construct the unadjusted rejection rate by testing whether the
pretrial release decisions of each judge in the top 25 are consistent with the moment inequalities in Corollary 1.3.4 at
the 5% level using the conditional least-favorable hybrid test using the same procedure described in Section 1.5.4. The
adjusted rejection rate reports the fraction of rejections after multiple hypothesis correction using the Holm-Bonferroni
step down procedure. See Section A.9.4 for discussion. Source: Rambachan and Ludwig (2021).

206
Appendix B

Appendix to Chapter 2

B.1 Proofs of Results for Assignments and Outputs

B.1.1 Proof of Theorem 2.3.1

To prove this result, we begin by rewriting ErYj,t`h 1tWk,t “ wk us. Notice that

ErYj,t`h 1tWk,t “ wk us

“ ErYj,t`h pW1:t´1 , wk , W´k,t , Wt`1:t`h q1tWk,t “ wk us

“ ErYj,t`h pW1:t´1 , wk , W´k,t , Wt`1:t`h qsEr1tWk,t “ wk us


` ˘
`Cov Yj,t`h pW1:t´1 , wk , W´k,t , Wt`1:t`h q, 1tWk,t “ wk u .

Therefore, it immediately follows that

ErYj,t`h | Wk,t “ wk s “ ErYj,t`h pW1:t´1 , wk , W´k,t , Wt`1:t`h qs


` ˘
Cov Yj,t`h pW1:t´1 , wk , W´k,t , Wt`1:t`h q, 1tWk,t “ wk u
` .
Er1tWk,t “ wk us
The result is then immediate by (i) applying the same calculation to ErYj,t`h 1tWk,t “ w1k us, (ii)

taking the difference, and (iii) applying the definition of Yj,t`h pwk q. l

207
B.1.2 Proof of Theorem 2.3.3

The style proof extends Angrist et al. (2000) in their analysis of the Wald estimand in a cross-

sectional setting. Begin by writing Yt`h “ Yt`h pWk,t q as


ż Wk,t
BYt`h pw̃k q
Yt`h “ Yt`h pwk q ` B w̃k
wk dw̃k
ż wk
BYt`h pw̃k q
“ Yt`h pwk q ` 1tw̃k ď Wk,t udw̃k
wk B w̃k

by the fundamental theorem of calculus. Then, it follows that

CovpYt`h , Wk,t q “ ErYt`h pWk,t ´ ErWk,t sqs


p1q
“ ErpYt`h ´ Yt`h pwk qqpWk,t ´ ErWk,t sq
«˜ż ¸ ff
wk
BYt`h pw̃k q
“E 1tw̃k ď Wk,t udw̃k pWk,t ´ ErWk,t sq
wk B w̃k
ż wk „ ȷ
BYt`h pw̃k q
“ E 1tw̃k ď Wk,t upWk,t ´ ErWk,t sq dw̃k
wk B w̃k
ż wk „ ȷ
p2q BYt`h pw̃k q
“ E E r1tw̃k ď Wk,t upWk,t ´ ErWk,t sqs dw̃k
wk B w̃k

where (1) and (2) follow since Wk,t KK tYt`h pwk q : wk P Wk u. Interchanging the order of the

derivation and the expectation delivers the result. Analogously,


ż Wk,t ż wk
Wk,t “ wk ` dw̃k “ wk ` 1tw̃k ď Wk,t udw̃k ,
wk wk

so
ż wk
VarpWk,t q “ ErpWk,t ´ wk qpWk,t ´ ErWk,t sqs “ E r1tw̃k ď Wk,t upWk,t ´ ErWk,t sqs dw̃k .
wk

The result then follows immediately. To see that the resulting weights are non-negative, observe

that for w̃k P rwk , wk s

E r1tWk,t ě w̃k u pWk,t ´ ErWk,t sqs

“ E r1tWk,t ě w̃k uWk,t s ´ Er1tWk,t ě w̃k usErWk,t s

“ pE rWk,t | Wk,t ě w̃k s ´ ErWk,t sq PpWk,t ě w̃k q ě 0

since E rWk,t | Wk,t ě w̃k s ě ErWk,t s for w̃k P rwk , wk s. l

208
B.1.3 Proof of Theorem 2.3.4

The proof is analogous to the proof of Theorem 2.3.1. We start by rewriting ErYj,t`h 1tWk,t “ wk u |

Ft´1 s, noticing that


ErYj,t`h 1tWk,t “ wk u | Ft´1 s

“ ErYj,t`h pw1:t´1
obs
, wk , W´k,t , Wt`1:t`h q1tWk,t “ wk u | Ft´1 s

“ ErYj,t`h pw1:t´1
obs
, wk , W´k,t , Wt`1:t`h q | Ft´1 sEr1tWk,t “ wk u | Ft´1 s
´ ¯
obs
`Cov Yj,t`h pw1:t´1 , wk , W´k,t , Wt`1:t`h q, 1tWk,t “ wk u | Ft´1 .

Therefore, we have shown that

ErYj,t`h | Wk,t “ wk , Ft´1 s “ ErYj,t`h pw1:t´1


obs
, wk , W´k,t , Wt`1:t`h q | Ft´1 s
` obs , w , W
˘
Cov Yj,t`h pw1:t´1 k ´k,t , Wt`1:t`h q, 1tWk,t “ wk u | Ft´1
` .
Er1tWk,t “ wk u | Ft´1 s
The result follows by (i) applying the same calculation to ErYj,t`h 1tWk,t “ w1k u | Ft´1 s, (ii) taking

the difference, and (iii) applying the definition of the potential outcome Yj,t`h pwk q. l

B.2 Proofs of Results for Assignments, Instruments and Outputs

B.2.1 Proof of Theorem 2.5.1

To prove this result, we first observe that

ErYj,t`h | Zt “ zs “ ErYj,t`h pw1:t´1


obs
, Wk,t pzq, W´k,t , Wt`1:t`h q | Zt “ zs

“ ErYj,t`h pw1:t´1
obs
, Wk,t pzq, W´k,t , Wt`1:t`h qs

by (iii). Therefore,

ErYj,t`h | Zt “ zs ´ ErYj,t`h | Zt “ z1 s

“ ErYj,t`h pw1:t´1
obs obs
, Wk,t pzq, W´k,t , Wt`1:t`h q ´ Yj,t`h pw1:t´1 , Wk,t pz1 q, W´k,t , Wt`1:t`h qs.

Next, we can further rewrite this previous expression as


ż Wj,t pzq ż
BYj,t`h pwk q BYj,t`h pwk q
Er dwk s “ Er 1tWk,t pz1 q ď wk ď Wk,t pzqudwk s
1
Wj,t pz q Bw k W Bw k

209
where we used the definition Yj,t`h pwk q :“ Yj,t`h pW1:t´1 , wk,t , W´k,t , Wt`1:t`h q. Finally, assuming

that we can exchange the order of integration and expectation, we arrive at


ż
BYj,t`h pwk q
Er 1tWk,t pz1 q ď wk ď Wk,t pzqusdwk
W Bwk
ż
BYj,t`h pwk q
“ Er , Wk,t p0q ď wk ď Wk,t p1qsEr1tWk,t pz1 q ď wk ď Wk,t pzqusdwk .
W Bw k

We may apply the same argument to the denominator (again assuming that we can exchange the

order of integration and expectation) to arrive at

ErWk,t | Zt “ zs ´ ErWk,t | Zt “ z1 s “
ż
ErWk,t pzq ´ Wk,t pz qs “
1
Er1tWk,t pz1 q ď wk ď Wk,t pzqus.
W

Taking the ratio then delivers the desired result. l

B.2.2 Proof of Theorem 2.5.3

The proof is the same as the Proof of Theorem 2.5.1, except we must now condition on Ft´1

throughout. l

B.3 Proofs of Results for Outputs

B.3.1 Proof of Theorem 2.7.1

Then, if the subsequent moments exist, we have that

Y Y
ErYj,t`h |pYk,t “ yk q, Ft´1 s “ ErYj,t`h pW1:t q|pYk,t “ yk q, Ft´1 s, Assumption (i)
Y Y
“ ErErYj,t`h pW1:t`h q|pYk,t “ yk q, W1:t , Ft´1 s|Yk,t “ yk , Ft´1 s, Adam’s law
Y
“ ErErYj,t`h pW1:t`h q|W1:t qs|pYk,t “ yk q, Ft´1 s, Assumption (i)
Y
“ Erψj,t`h pW1:t q|pYk,t “ yk q, Ft´1 s, Assumption (ii)

the last line holds as the future assignments are not informed by the historical ones. Applying

this result twice gives the first result. l

210
Appendix C

Appendix to Chapter 3

This supplement contains the following sections: Section C.1 contains the proofs of the main

results in the paper, Section C.2 contains additional theoretical results discussed in the main text

of the paper, Section C.3 contains additional results from the simulation study and Section C.4

contains additional results for the empirical application to a panel experiment in experimental

economics.

C.1 Proofs of main results

Proof of Theorem 3.3.1

We begin the proof with a Lemma that will be used later on.

Lemma C.1.1. Assume a potential outcome panel with an assignment mechanism that is individ-

ualistic (Definition 3.2.3) and probabilistic (Assumption 3.3.1). Define, for any w P W pp`1q , the

random function Zi,t´p:t pwq :“ pi,t´p pwq´1 1tWi,t´p:t “ wu. Then, over the assignment mecha-

nism, EpZi,t´p:t pwq|Fi,t´p´1 q “ 1 and VarpZi,t´p:t pwq|Fi,t´p´1 q “ pi,t´p pwq´1 p1 ´ pi,t´p pwqq, and

CovpZi,t´p:t pwq, Zi,t´p:t pw̃q|Fi,t´p´1 q “ ´1 for all w ‰ w̃. Moreover, Zi,t´p:t pwq and Zj,t´p:t pwq are,

conditioning on F1:N,t´p´1 , independent for i ‰ j.

Proof. The expectation is by construction, the variance comes from the variance of a Bernoulli trial.

The conditional independence is by the individualistic assignment assumption.

211
For any w, w̃ P W pp`1q , let ui,t´p pw, w̃; pq “ τ̂i,t pw, w̃; pq ´ τi,t pw, w̃; pq be the estimation error.

Now

obs obs
ui,t´p pw, w̃; pq “ Yi,t pwi,1:t´p´1 , wqpZi,t´p:t pwq ´ 1q ´ Yi,t pwi,1:t´p´1 , w̃qpZi,t´p:t pw̃q ´ 1q.

Hence the conditional expectation is zero by Lemma C.1.1. Then,

obs
Varpui,t´p pw, w̃; pq|Fi,t´p´1 q “ Yi,t pwi,1:t´p´1 , wq2 VarpZi,t´p:t pwq|Fi,t´p´1 q
obs
` Yi,t pwi,1:t´p´1 , w̃q2 VarpZi,t´p:t pw̃q|Fi,t´p´1 q
obs obs
´ 2Yi,t pwi,1:t´p´1 , wqYi,t pwi,1:t´p´1 , w̃qCovpZi,t´p:t pw̃q, Zi,t´p:t pw̃|Fi,t´p´1 q
obs
“ Yi,t pwi,1:t´p´1 , wq2 pi,t´p pwq´1 p1 ´ pi,t´p pwqq
obs
` Yi,t pwi,1:t´p´1 , w̃q2 pi,t´p pw̃q´1 p1 ´ pi,t´p pw̃qq
obs obs
´ 2Yi,t pwi,1:t´p´1 , wqYi,t pwi,1:t´p´1 , w̃q.

Simplifying gives the result on the variance of the estimation error. Then,

Covpui,t´p pw, w̃; pq, ui,t´p pw̄, ŵ; pq|Fi,t´p´1 q


obs obs
“ Yi,t pwi,1:t´p´1 , wqYi,t pwi,1:t´p´1 , w̄qCovpZi,t´p:t pwq, Zi,t´p:t pw̄q|Fi,t´p´1 q
obs obs
´ Yi,t pwi,1:t´p´1 , wqYi,t pwi,1:t´p´1 , ŵqCovpZi,t´p:t pwq, Zi,t´p:t pŵq|Fi,t´p´1 q
obs obs
´ Yi,t pwi,1:t´p´1 , w̃qYi,t pwi,1:t´p´1 , w̄qCovpZi,t´p:t pw̃q, Zi,t´p:t pw̄q|Fi,t´p´1 q
obs obs
Yi,t pwi,1:t´p´1 , w̃qYi,t pwi,1:t´p´1 , ŵqCovpZi,t´p:t pw̃q, Zi,t´p:t pŵq|Fi,t´p´1 q
obs obs obs obs
“ ´Yi,t pwi,1:t´p´1 , wqYi,t pwi,1:t´p´1 , w̄q ` Yi,t pwi,1:t´p´1 , wqYi,t pwi,1:t´p´1 , ŵq
obs obs obs obs
` Yi,t pwi,1:t´p´1 , w̃qYi,t pwi,1:t´p´1 , w̄q ´ Yi,t pwi,1:t´p´1 , w̃qYi,t pwi,1:t´p´1 , ŵq

Finally, conditional independence of the errors follows due to the individualistic assignment of

treatments. l

212
Proof of Proposition 3.3.1

The proof of this result is analogous to the proof of Theorem 3.3.1. We state the analogue of

Lemma C.1.1 for completeness.

Lemma C.1.2. Assume a potential outcome panel with an assignment mechanism that is individual-

istic (Definition 3.2.3) and probabilistic (Assumption 3.3.1). Define, for any w P W pp`1q , the ran-

dom function Vi,t´p:t pwq :“ pi,t´p pwq´2 1tWi,t´p:t “ wu. Then, over the assignment mechanism,

EpVi,t´p:t pwq|Fi,t´p´1 q “ pi,t´p pwq´1 and VarpVi,t´p:t pwq|Fi,t´p´1 q “ pi,t´p pwq´3 p1 ´ pi,t´p pwqq,

and CovpVi,t´p:t pwq, Vi,t´p:t pw̃q|Fi,t´p´1 q “ pi,t´p pwq´1 pi,t´p pw̃q´1 for all w ‰ w̃. Moreover, Vi,t´p:t pwq

and Vj,t´p:t pwq are, conditioning on F1:N,t´p´1 , independent for i ‰ j.

2 pw, w̃; pq ´ γ2 pw, w̃; pq be the estimation error.


For any w, w̃ P W pp`1q , let vi,t´p pw, w̃; pq “ γ̂i,t i,t

Now

obs obs
vi,t´p pw, w̃; pq “ Yi,t pwi,1:t´p´1 , wq2 pVi,t´p:t pwq ´ pi,t´p pwq´1 q ` Yi,t pwi,1:t´p´1 , w̃q2 pVi,t´p:t pw̃q ´ p̃i,t´p pwq´1 q.

Therefore, the conditional expectation is zero by Lemma C.1.2. The conditional independence of

the errors follows due to the individualistic assignment of the treatments. l

Proof of Theorem 3.3.2

Only the third results requires a new proof. The first result is a reinterpretation of the classic cross-

sectional result using a triangular array central limit theorem, for the usual Lindeberg condition

must hold due to the bounded potential outcomes and the treatments being probabilistic. The

second result follows from results in Bojinov and Shephard (2019), who use a martingale difference

array central limit theorem.

The third result, which holds for NT going to infinity, can be split into three parts. For NT to

go to infinity we must have either: (i) T goes to infinity with N finite, (ii) N goes to infinity with

T finite, or (iii) both N and T go to infinity. In the case (i), we apply the martingale difference CLT

but now we have preaveraged the cross-sectional errors over the N terms for each time period.

The preaverage is still a martingale difference, so the technology is the same. In the case (ii) we

preaverage the time aspect. Then we are back to a standard triangular array CLT. As we have both

(i) and (ii), then (iii) must hold. l

213
Proof of Proposition 3.3.2

The unbiasedness statements follow directly from Proposition 3.3.1. The proofs of the consistency

statements are analogous to the proof of Theorem 3.3.2. The first result follows from an application

of the triangular array law of law of large numbers, which may be applied due to the bounded

potential outcomes and the treatments being probabilistic. The second statement follows from an

application of a martingale difference sequence law of large numbers (Theorem 2.13 in Hall and

Heyde (1980)). The third statement can be again proved in three cases: (i) T goes to infinity with

N finite, (ii) N goes to infinity with T finite, or (iii) both N and T go to infinity as in the proof of

Theorem 3.3.2 and applying the appropriate law of large numbers. l

Proof of Proposition 3.4.1

Begin by writing the observed outcomes as


t
ÿ
Yi,t “ Yi,t p0q ` β i,t,t´s Wi,s .
s“1

1 ř T řt
Similarly, write Ȳi¨ “ Ȳi¨ p0q ` βW i¨ , where βW i¨ “ T t“1 s“1 β i,t,t´s Wi,s . The transformed

outcome can be then written as


t
ÿ
Y
qi,t “ β i,t,t´s Wi,s ´ βW i¨ ` Y
qi,t p0q.
s“1

Consider the numerator of the unit fixed effects estimator. Substituting in, we arrive at
˜ ¸
N T N T N ÿ T t´1 N T
1 ÿÿq q 1 ÿÿ 1 ÿ ÿ 1 ÿÿq
Yi,t Wi,t “ β i,t,0 Wi,t Wi,t `
q β i,t,t´s Wi,s Wi,t `
q Yi,t p0qW
q i,t
NT NT NT NT
i“1 t“1 i“1 t“1 i“1 t“1 s“1 i“1 t“1
˜ ¸ ˜ ¸ ˜ ¸
T N T t´1 N T N
1 ÿ 1 ÿ 1 ÿ ÿ 1 ÿ 1 ÿ 1 ÿ
“ β i,t,0 Wi,t W
q i,t ` β i,t,t´s Wi,s W
q i,t ` Y
qi,t p0qW q i,t .
T t“1 N T t“1 s“1 N T t“1 N
i“1 i“1 i“1

214
Therefore, for fixed T as N Ñ 8,
˜ ¸
T N T
1 ÿ 1 ÿ p 1 ÿ
β i,t,0 Wi,t W
q i,t ÑÝ κqW,β,t,t ,
T t“1 N T t“1
i“1
˜ ¸
T t´1 N T t´1
1 ÿ ÿ 1 ÿ p 1 ÿÿ
β i,t,t´s Wi,s Wq i,t ÑÝ κqW,β,t,s ,
T t“1 s“1 N T t“1 s“1
i“1
˜ ¸
T N T
1 ÿ 1 ÿq p 1 ÿq
Yi,t p0qWqi Ñ Ý δt .
T t“1 N T t“1
i“1

1 řT ř N p 1 řT
q2 Ý 2 .
Similarly, the denominator converges to NT t“1 i“1 Wi,t Ñ T σW,t
t“1 q The result then follows

by Slutsky. l

Proof of Proposition 3.4.2

Begin by writing
t
ÿ
Yi,t “ Yi,t p0q ` β i,t,t´s Wi,s .
s“1

Then, Ȳ¨t “ Ȳ¨t p0q ` βW ¨t , Ȳi¨ “ Ȳi¨ p0q ` βW i¨ and Ȳ “ Ȳp0q ` βW. Therefore,
˜ ¸
t
9q 9q ÿ ` ˘ ` ˘
Yi,t “ Yi,t p0q ` β i,t,t´s Wi,s ´ βW ´ βW ´ βW ´ βW ´ βW . ¨t i¨
s“1

Consider the numerator of the unit fixed effects estimator. Substituting in,
N T N T N T t´1 N T
1 ÿ ÿ q9 | 9 1 ÿÿ 9 1 ÿÿÿ 9 1 ÿ ÿ q9 9
Yi,t Wi,t “ β i,t,0 Wi,t W
| i,t ` β i,t,t´s Wi,s W
| i,t ` Yi,t p0qW
| i,t
NT NT NT NT
i“1 t“1 i“1 t“1 i“1 t“1 s“1 i“1 t“1
˜ ¸ ˜ ¸ ˜ ¸
T N T N t´1 T N
1 ÿ 1 ÿ 9 1 ÿ 1 ÿÿ 9 1 ÿ 1 ÿ q9 9
“ β i,t,0 Wi,t Wi,t `
| β i,t,t´s Wi,s Wi,t `
| Yi,t p0qWi,t .
|
T t“1 N T t“1 N s“1
T t“1 N
i“1 i“1 i“1

Therefore,
N
1 ÿ 9 p
β i,t,0 Wi,t W
| Ý κq9 W,β,t,t ,
i,t Ñ
N
i“1
N t´1 t´1
1 ÿ ÿ 9 p ÿ 9
β i,t,t´s Wi,s W
| i,t Ñ
Ý κqW,β,t,s ,
N
i“1 s“1 s“1
N
1 ÿ 9
q9 i,t p0qW p 9
Y | Ý δqt .
i,t Ñ
N
i“1

A similar argument applies to the denominator and the result follows. l

215
C.2 Additional theoretical results

C.2.1 Prediction decomposition of the adapted propensity score

Recall the definition of the adapted propensity score in Section 3.3

pi,t´p pwq :“ PrpWi,t´p:t “ w|Wi,1:t´p´1 , Yi,1:t pWi,1:t´p´1 , wqq.

The adapted propensity score can be decomposed using individualistic assignment (Definition

3.2.3) and the prediction decomposition.

Lemma C.2.1. For a potential outcome panel satisfying individualistic assignment (Definition 3.2.3) and

any w P W pp`1q , the adapted propensity score can be factorized as

pi,t´p pwq “ PrpWi,t´p “ w1 |Wi,1:t´p´1 , Yi,1:t´p´1 pWi,1:t´p´1 qq


p
ź
ˆ PrpWi,t´p`s “ ws`1 |Wi,1:t´p´1 , Wi,t´p:t´p`s´1 “ w1:s , Yi,1:t´p`s´1 pWi,1:t´p´1 , w1:s qq.
s“1

Proof. Use the prediction decomposition for assignments, given all outcomes,

pi,t´p pwq “ PrpWi,t´p “ w1 |Wi,1:t´p´1 , Yi,1:t pWi,1:t´p´1 , wqq


p
ź
ˆ PrpWi,t´p`s “ ws`1 |Wi,1:t´p´1 , Wi,t´p:t´p`s´1 “ w1:s , Yi,1:t pWi,1:t´p´1 , wqq.
s“1

and then simplify using the individualistic assignment of treatments.

Even though the assignment mechanism is known, we only observe the outcomes along the
obs q, and so it is not possible to use Lemma C.2.1 to compute
realized assignment path Yi,1:t pwi,1:t

pi,t´p pwq for all assignment path. We can, however, compute the adapted propensity score along
obs
the observed assignment path, pi,t´p pwi,t´p:t q, since the associated outcomes are observed.

C.2.2 Estimation as a repeated cross-section

Denote Y9 1:N,t “ pY9 1,t , ..., Y9 N,t q1 , W


9 i,1:t “ pWi,t ´ W̄¨t , Wi,t´1 ´ W̄¨t´1 , ..., Wi,1 ´ W̄¨1 q1 and W
9 1:N,t “

9 1,1:t , ..., W
pW 9 N,1:t q1 . The least squares coefficient in the regression of Y9 1:N,t on W
9 1:N,t is β̂
1:N,t “
91 W
pW 9 ´1 9 1 9
1:N,t 1:N,t q W1:N,t Y1:N,t . Proposition C.2.1 derives the finite population limiting distribution of

β̂1:N,t as the number of units grows large.

216
Proposition C.2.1. Assume a potential outcome panel and consider the “control” only path, for 0 P W let
” ı
w̃i,1:t “ 0. Let µ9 i,t be the t ˆ 1 vector whose u-th element is E W9 i,t´pu´1q | F1:N,0,T and Ωi,t be the t ˆ t

9 i,t´pu´1q , W
matrix whose u, v-th element is CovpW 9 i,t´pv´1q |F1:N,0,T q. Additionally assume that:

1. The potential outcome panel is linear (Definitions 3.4.1) and homogeneous with βit ” βt “

pβ t,0 , . . . , β t,t´1 q for all t.

2. Wi,1:t is an individualistic stochastic assignment path and, over the randomization distribution,
2
VarpWi,t |F1:N,0,T q “ σW,i,t ă 8 for each i P rNs, t P rTs.

3. As N Ñ 8,

řN
(a) Non-stochastically, N ´1 i“1 Ωi,t Ñ Γ2,t , where Γ2,t is positive definite.
řN d
(b) N ´1{2 9
i“1 pWi,1:t Ý Np0, Γ1,t q.
´ µ9 i,t qY9 i,t p0q|F1:N,0,T Ñ
řN
(c) Non-stochastically, N ´1 9
i“1 Yi,t p0qµ
9 i,t Ñ δ9t .

Then, over the randomization distribution, as N Ñ 8,

? d
Np β̂1:N,t ´ βt ´ Γ´1 9 Ý Np0, Γ´1
2,t δt q|F1:N,0,T Ñ 2,t Γ1,t Γ2,t q.
´1

Proof. Under linear potential outcomes,


t´1
ÿ
Yi,t pWi,1:t q ´ Yi,t pW̃i,1:t q “ β i,t,s pWi,t´s ´ W̃i,t´s q.
s“0

Focus on the counterfactual W̃i,1:t “ 0, then


t´1
ÿ
Yi,t “ Yi,t pWi,1:t q “ Ȳ¨t p0q ` β i,t,s Wi,t´s ` Y9 i,t p0q.
s“0

Therefore, the within-period transformed outcome equals


t´1 N
ÿ 1 ÿ
Y9 i,t “ Yi,t ´ Ȳ¨t “ tβ i,t,s Wi,t´s ´ β j,t,s Wj,t´s u ` Y9 i,t p0q.
s“0
N
j“1

Further imposing homogeneity, it simplifies to


t´1 N
ÿ 1 ÿ
Y9 i,t “ tβ t,s pWi,t´s ´ Wj,t´s u ` Y9 i,t p0q.
s“0
N
j“1

217
Stacking everything across units, this becomes Y9 1:N,t “ W
9 1:N,t β t ` Y9 1:N,t p0q, and so the linear

projection coefficient is given by

91 W
β̂ t “ pW 9 ´1 9 1 9 91 9 ´1 9 1 9
1:N,t 1:N,t q W1:N,t Y1:N,t “ β t ` pW1:N,t W1:N,t q W1:N,t Y1:N,t p0q.

The important unusual point here is that Y9 1:N,t p0q is non-stochastic and that W
9 1:N,t is random,

exactly the opposite of the case often discussed in the statistical analysis of linear regression. Now
N
1 91 9 1:N,t “ 1
ÿ
9 i,1:t W
91 ,
W1:N,t W W i,1:t
N N
i“1

and
N N N
1 ÿ 9 1 ÿ 9 1 ÿ
Wi,1:t Y9 i,t p0q “ pWi,1:t ´ µ9 i,t qY9 i,t p0q ` µi,t Y9 i,t p0q.
N N N
i“1 i“1 i“1

Then, under the assumption of individualistic assignment (Definition 3.2.3),


N
1 ÿ 9 p
Wi,1:t W i,1:t Ý Γ2,t ,
9 1 |F1:N,0,T Ñ
N
i“1

recalling Y9 i,t p0q is non-stochastic and applying Assumptions 3(b) and 3(c), then Slutsky’s theorem

delivers the result. l

C.3 Additional simulation results

C.3.1 Additional simulations for the estimator of the total average dynamic causal
effects

Quantile-quantile plot for the normal approximation: Figure C.1 provides quantile-quantile

plots of the simulated randomization distribution for the estimator τ̄ˆ p1, 0; 0q presented in Section

3.5 of the main text.

Simulation results for the estimator of the lag-1 total weighted average dynamic causal effect,

τ̄ : p1, 0; 1q: We now present simulation results that analyze the properties of our estimator for

the lag-1 total weighted average dynamic causal effect, τ̄ˆ : p1, 0; 1q. We choose the weights to av to

place equal weight on the future treatment paths. Figure C.2 plots the simulated randomization

distribution for τ̄ˆ : p1, 0; 1q and Figure C.3 plots the associated quantile-quantile plot. We observe

218
Figure C.1: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆ p1, 0; 0q
under different choices of the parameter ϕ and treatment probability ppwq.

(a) ϵi,t „ Np0, 1q, N “ 100, T “ 10 (b) ϵi,t „ Cauchy, N “ 500, T “ 100
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆ p1, 0; 0q under
different choices of the parameter ϕ and treatment probability ppwq. The quantile-quantile plots compare the quantiles
of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis).
The 45 degree line is plotted in solid orange. The rows index the parameter ϕ P t0.25, 0.5, 0.75u, and the columns index
the treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the quantile-quantile plots for simulated randomization
distribution with normally distributed errors ϵi,t „ Np0, 1q and N “ 100, T “ 10. Panel (b) plots the quantile-quantile
plots simulated randomization distribution with Cauchy distribution errors ϵi,t „ Cauchy and N “ 500, T “ 100.
Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details.

219
that the normal approximation remains accurate for lagged dynamic causal effects.

Figure C.2: Simulated randomization distribution for τ̄ˆ : p1, 0; 1q under different choices of the
parameter ϕ and treatment probability ppwq.

(a) ϵi,t „ Np0, 1q, N “ 100, T “ 10 (b) ϵi,t „ Cauchy, N “ 500, T “ 100
Notes: This figure plots the simulated randomization distribution for τ̄ˆ : p1, 0; 1q under different choices of the parameter
ϕ and treatment probability ppwq. The rows index the parameter ϕ P t0.25, 0.5, 0.75u, and the columns index the
treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the simulated randomization distribution with normally
distributed errors ϵi,t „ Np0, 1q and N “ 100, T “ 10. Panel (b) plots the simulated randomization distribution with
Cauchy distribution errors ϵi,t „ Cauchy and N “ 500, T “ 10. Results are computed over 5,000 simulations. See
Section 3.5 of the main text for further details.

220
Figure C.3: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆ : p1, 0; 1q
under different choices of the parameter ϕ and treatment probability ppwq.

(a) ϵi,t „ Np0, 1q, N “ 100, T “ 10 (b) ϵi,t „ Cauchy, N “ 500, T “ 100
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆ : p1, 0; 1q under
different choices of the parameter ϕ and treatment probability ppwq. The quantile-quantile plots compare the quantiles
of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis).
The 45 degree line is plotted in solid orange. The rows index the parameter ϕ P t0.25, 0.5, 0.75u, and the columns index
the treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the quantile-quantile plots for simulated randomization
distribution with normally distributed errors ϵi,t „ Np0, 1q and T “ 1000. Panel (b) plots the quantile-quantile plots
simulated randomization distribution with Cauchy distribution errors ϵi,t „ Cauchy and T “ 50, 000. Results are
computed over 5,000 simulations. See Section 3.5 of the main text for further details.

221
C.3.2 Simulations for the estimator of the time-t average dynamic causal effects

We present simulation results for our estimator of the time-t average dynamic causal effect,

τ̄ˆ¨,t p1, 0; 0q, with N “ 100 units when the potential outcomes are generated with normally dis-

tributed errors and N “ 50, 000 with Cauchy distributed errors.

Normal approximations and size control: Figure C.4 plots the randomization distribution for

the estimator of the contemporaneous time-t average dynamic causal effect, τ̄ˆ¨t p1, 0; 0q, under

the null hypothesis of β “ 0 for different combinations of the parameter ϕ P t0.25, 0.5, 0.75u and

treatment probability ppwq P t0.25, 0.5, 0.75u. When the errors ϵi,t are normally distributed, the

randomization distribution quickly converges to a normal distribution – the normal approximation

is accurate when there are only N “ 100 units in the experiment. As expected, when the errors are

Cauchy distributed, the number of units must be quite large for the randomization distribution to

become approximately normal. There is little difference in the results across the values of ϕ and

ppwq. Figure C.5 provides quantile-quantile plots of the simulated randomization distributions

to further illustrate the quality of the normal approximations. Testing based on the normal

asymptotic approximation controls size effectively, staying close to the nominal 5% level (the exact

rejection rates for the null hypothesis, H0 : τ̄¨t p1, 0; 0q “ 0 are reported in Table C.1).

Table C.1: Null rejection rate for the test of the null hypothesis H0 : τ̄¨t p1, 0; 0q “ 0 based upon the
normal asymptotic approximation.

ppwq ppwq
0.25 0.5 0.75 0.25 0.5 0.75
0.25 0.046 0.048 0.048 0.25 0.031 0.031 0.034
ϕ 0.5 0.049 0.049 0.050 ϕ 0.5 0.048 0.039 0.043
0.75 0.050 0.049 0.045 0.75 0.052 0.047 0.057
(a) ϵi,t „ Np0, 1q, N “ 100 (b) ϵi,t „ Cauchy, N “ 50, 000
Notes: This table summarizes the null rejection rate for the test of the null hypothesis H0 : τ̄¨t p1, 0; 0q “ 0 based
upon the normal asymptotic approximation to the randomization distribution of τ̄ˆ¨t p1, 0; 0q. Panel (a) reports the null
rejection probabilities in simulations with ϵi,t „ Np0, 1q and N “ 100. Panel (b) reports the null rejection probabilities
in simulations with ϵi,t „ Cauchy and N “ 50, 000. Results are computed over 5,000 simulations. See Section 3.5 of the
main text for further details on the simulation design.

Rejection rates: Figure C.6 plots rejection rate curves against the null hypotheses as the parameter

β varies for different choices of the parameter ϕ and treatment probability ppwq in simulations

222
Figure C.4: Simulated randomization distribution for τ̄ˆ¨t p1, 0; 0q under different choices of the
parameter ϕ and treatment probability ppwq.

(a) ϵi,t „ Np0, 1q, N “ 100 (b) ϵi,t „ Cauchy, N “ 50, 000
Notes: This figure plots the simulated randomization distribution for τ̄ˆ¨t p1, 0; 0q under different choices of the parameter
ϕ and treatment probability ppwq. The rows index the parameter ϕ P t0.25, 0.5, 0.75u and the columns index the
treatment probability ppwq P t0.25, 0.5, 0.75u. Panel (a) plots the simulated randomization distribution with normally
distributed errors ϵi,t „ Np0, 1q and N “ 100. Panel (b) plots the simulated randomization distribution with Cauchy
distribution errors ϵi,t „ Cauchy and N “ 50, 000. Results are computed over 5,000 iterations. See Section 3.5 of the
main text for further details on the simulation design.

223
Figure C.5: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆ¨t p1, 0; 0q
under different choices of the parameter ϕ and treatment probability ppwq.

(a) ϵi,t ∼ N(0, 1), N = 100 (b) ϵi,t ∼ Cauchy, N = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆ·t(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵi,t ∼ N(0, 1) and N = 100. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy distributed errors ϵi,t ∼ Cauchy and N = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

with N = 100 units. For p = 0, the rejection rate against H0 : τ̄·t(1, 0; 0) = 0 quickly converges to one as β moves away from zero across a range of simulations. This is encouraging, as it indicates that the conservative variance bound still leads to informative tests. However, when p = 1, the persistence of the causal effects ϕ has an important effect on the power of our tests. In particular, when ϕ = 0.25, the rejection rate against H0 : τ̄†·t(1, 0; 1) = 0 is quite low for all values of β, since lower values of ϕ imply less persistence in the causal effects across periods. When ϕ = 0.75, there is substantial persistence across periods and we observe that the rejection rate curves improve for p = 1. Additionally, Figure C.7 shows the same power plots for N = 1000 units. We again observe that power is relatively low for low values of ϕ, but when ϕ = 0.75, the rejection rate curves for p = 0, 1 appear similar. This suggests that detecting lagged dynamic causal effects requires larger sample sizes.
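The rejection-rate curves summarized here are Monte Carlo estimates: simulate the design many times, run the level-5% test on each draw, and average the rejection indicator over simulations. The sketch below illustrates the mechanics in a deliberately stripped-down setting (a single-period design with a simple difference-in-means z-test; the function name, design, and test are illustrative simplifications, not the paper's estimator or conservative variance bound):

```python
import numpy as np

def rejection_rate(beta, p=0.5, N=100, sims=2000, seed=0):
    """Monte Carlo rejection rate of a nominal 5% two-sided z-test of
    H0: no treatment effect, in a stylized one-period design
    (illustrative stand-in for the estimator in the main text)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(sims):
        W = rng.binomial(1, p, size=N)           # Bernoulli(p) treatment
        Y = beta * W + rng.standard_normal(N)    # outcomes with effect beta
        n1 = W.sum()
        if n1 < 2 or N - n1 < 2:
            continue                             # degenerate draw, skip
        tau_hat = Y[W == 1].mean() - Y[W == 0].mean()
        se = np.sqrt(Y[W == 1].var(ddof=1) / n1
                     + Y[W == 0].var(ddof=1) / (N - n1))
        rejections += abs(tau_hat / se) > 1.96
    return rejections / sims
```

Under the null (beta = 0) the rate should sit near the nominal 0.05, mirroring the size-control tables; as |beta| grows it climbs toward one, mirroring the rejection-rate curves.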

Figure C.6: Rejection probabilities for a test of the null hypotheses H0 : τ̄·t(1, 0; 0) = 0 and H0 : τ̄†·t(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).

Notes: This figure plots the rejection probabilities for a test of the null hypotheses H0 : τ̄·t(1, 0; 0) = 0 and H0 : τ̄†·t(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄·t(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄†·t(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵi,t ∼ N(0, 1) and N = 100. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

Simulation results for the estimator of the lag-1, time-t weighted average dynamic causal effect, τ̄†·t(1, 0; 1): We now present simulation results that analyze the properties of our estimator for the lag-1, time-t weighted average dynamic causal effect, τ̄ˆ†·t(1, 0; 1). We choose the weights so as to

Figure C.7: Rejection probabilities for a test of the null hypotheses H0 : τ̄·t(1, 0; 0) = 0 and H0 : τ̄†·t(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).

Notes: This figure plots the rejection probabilities for a test of the null hypotheses H0 : τ̄·t(1, 0; 0) = 0 and H0 : τ̄†·t(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄·t(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄†·t(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵi,t ∼ N(0, 1) and N = 1000. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

place equal weight on the future treatment paths. Figure C.8 plots the simulated randomization distribution for τ̄ˆ†·t(1, 0; 1) and Figure C.9 plots the associated quantile-quantile plot. We observe that the normal approximation remains accurate for lagged dynamic causal effects.
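The quantile-quantile comparisons in these figures can also be summarized numerically: standardize the simulated draws of the estimator and measure the largest gap between their quantiles and the corresponding standard-normal quantiles. A sketch under the same stylized one-period design as above (the function name, design, and quantile grid are illustrative choices, not the paper's):

```python
import numpy as np
from statistics import NormalDist

def qq_deviation(draw_errors, N=100, p=0.5, sims=5000, seed=1):
    """Largest absolute gap between quantiles of the standardized simulated
    randomization distribution of a difference-in-means estimator and
    standard-normal quantiles (illustrative QQ-plot summary)."""
    rng = np.random.default_rng(seed)
    taus = []
    for _ in range(sims):
        W = rng.binomial(1, p, size=N)
        Y = draw_errors(rng, N)                  # outcomes under the null
        if 0 < W.sum() < N:
            taus.append(Y[W == 1].mean() - Y[W == 0].mean())
    z = (np.asarray(taus) - np.mean(taus)) / np.std(taus)
    probs = np.linspace(0.05, 0.95, 19)
    normal_q = np.array([NormalDist().inv_cdf(q) for q in probs])
    return float(np.max(np.abs(np.quantile(z, probs) - normal_q)))
```

With normal errors the gap is small even at moderate N; with Cauchy errors it remains large at the same N, which is why the heavy-tailed panels in these figures use far larger samples.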

Figure C.8: Simulated randomization distribution for τ̄ˆ†·t(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).

(a) ϵi,t ∼ N(0, 1), N = 100 (b) ϵi,t ∼ Cauchy, N = 50,000
Notes: This figure plots the simulated randomization distribution for τ̄ˆ†·t(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the simulated randomization distribution with normally distributed errors ϵi,t ∼ N(0, 1) and N = 100. Panel (b) plots the simulated randomization distribution with Cauchy distributed errors ϵi,t ∼ Cauchy and N = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

Figure C.9: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆ†·t(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).

(a) ϵi,t ∼ N(0, 1), N = 100 (b) ϵi,t ∼ Cauchy, N = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆ†·t(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵi,t ∼ N(0, 1) and N = 100. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy distributed errors ϵi,t ∼ Cauchy and N = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

C.3.3 Simulations for the estimator of the unit-i average dynamic causal effects

We present simulation results for our estimator of the unit-i average dynamic causal effect, τ̄ˆi,·(1, 0; 0), with T = 100 time periods when the potential outcomes are generated with normally distributed errors and T = 50,000 time periods with Cauchy distributed errors.

Normal approximations and size control: Figure C.10 plots the randomization distribution for τ̄ˆi·(1, 0; 0). We see a similar pattern as before: when the errors are normally distributed, the randomization distribution converges quickly to a normal distribution, but it takes longer to do so when the errors are heavy-tailed. Figure C.11 provides quantile-quantile plots of the simulated randomization distributions to further illustrate the quality of the normal approximations. The null rejection rates for the hypothesis H0 : τ̄i,·(1, 0; 0) = 0 are reported in Table C.2 and, again, the test controls size well across a wide range of parameters.
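Size control for the unit-i test can be probed the same way: fix one unit, draw its treatment path i.i.d. Bernoulli(p(w)) across T periods, and check that a nominal 5% test rejects about 5% of the time under the null even when the noise is serially correlated. The vectorized sketch below uses an AR(1) error process as a stand-in for the persistence parameter ϕ; because treatment is randomized each period, the test labels are exchangeable and size is controlled despite the serial correlation (the design, function name, and z-test are illustrative simplifications, not the paper's exact estimator):

```python
import numpy as np

def unit_null_rejection(phi=0.5, p=0.5, T=100, sims=2000, seed=2):
    """Null rejection rate of a nominal 5% z-test for a single unit observed
    over T periods with AR(1) errors (persistence phi) and randomized
    treatment each period (illustrative size-control check)."""
    rng = np.random.default_rng(seed)
    W = rng.binomial(1, p, size=(sims, T)).astype(float)
    shocks = rng.standard_normal((sims, T))
    Y = np.empty((sims, T))
    Y[:, 0] = shocks[:, 0]
    for t in range(1, T):                        # AR(1) noise, no causal effect
        Y[:, t] = phi * Y[:, t - 1] + shocks[:, t]
    n1 = W.sum(axis=1)
    n0 = T - n1
    m1 = (W * Y).sum(axis=1) / n1                # treated-period mean
    m0 = ((1 - W) * Y).sum(axis=1) / n0          # control-period mean
    v1 = (W * (Y - m1[:, None]) ** 2).sum(axis=1) / (n1 - 1)
    v0 = ((1 - W) * (Y - m0[:, None]) ** 2).sum(axis=1) / (n0 - 1)
    z = (m1 - m0) / np.sqrt(v1 / n1 + v0 / n0)
    return float(np.mean(np.abs(z) > 1.96))
```

Across values of phi the simulated null rejection rate should stay close to 0.05, consistent with the pattern in Table C.2.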

Figure C.10: Simulated randomization distribution for τ̄ˆi·(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w).

(a) ϵi,t ∼ N(0, 1), T = 100 (b) ϵi,t ∼ Cauchy, T = 50,000
Notes: This figure plots the simulated randomization distribution for τ̄ˆi·(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w). The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the simulated randomization distribution with normally distributed errors ϵi,t ∼ N(0, 1) and T = 100. Panel (b) plots the simulated randomization distribution with Cauchy distributed errors ϵi,t ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

Table C.2: Null rejection rate for the test of the null hypothesis H0 : τ̄i·(1, 0; 0) = 0 based upon the normal asymptotic approximation.

(a) ϵi,t ∼ N(0, 1), T = 100
            p(w) = 0.25   p(w) = 0.5   p(w) = 0.75
  ϕ = 0.25      0.052        0.047        0.054
  ϕ = 0.5       0.049        0.049        0.048
  ϕ = 0.75      0.058        0.046        0.054

(b) ϵi,t ∼ Cauchy, T = 50,000
            p(w) = 0.25   p(w) = 0.5   p(w) = 0.75
  ϕ = 0.25      0.031        0.031        0.034
  ϕ = 0.5       0.048        0.039        0.043
  ϕ = 0.75      0.052        0.047        0.057
Notes: This table summarizes the null rejection rate for the test of the null hypothesis H0 : τ̄i·(1, 0; 0) = 0 based upon the normal asymptotic approximation to the randomization distribution of τ̄ˆi·(1, 0; 0). Panel (a) reports the null rejection probabilities in simulations with ϵi,t ∼ N(0, 1) and T = 100. Panel (b) reports the null rejection probabilities in simulations with ϵi,t ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

Figure C.11: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆi·(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w).

(a) ϵi,t ∼ N(0, 1), T = 100 (b) ϵi,t ∼ Cauchy, T = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆi·(1, 0; 0) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵi,t ∼ N(0, 1) and T = 100. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy distributed errors ϵi,t ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

Rejection rates: Next, we investigate the rejection rate of the statistical test based on the normal asymptotic approximation for H0 : τ̄†i·(1, 0; 0) = 0 and H0 : τ̄†i·(1, 0; 1) = 0, plotting the rejection rates in Figure C.12. For p = 0, we once again observe that the rejection rate against H0 : τ̄†i·(1, 0; 0) = 0 has good power properties across a range of simulations. However, for p = 1, our conservative test once again has low power, and the persistence of the causal effects ϕ has an important effect on the power of our tests. Additionally, Figure C.13 shows the same power plots for T = 1000 time periods. In this case, we observe that the conservative test has good power against the weak null of no unit-i average dynamic causal effects for both p = 0, 1. This suggests that detecting unit-i average dynamic causal effects requires a long time dimension in the panel experiment.

Figure C.12: Rejection probabilities for a test of the null hypotheses H0 : τ̄†i·(1, 0; 0) = 0 and H0 : τ̄†i·(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).

Notes: This figure plots the rejection probabilities for a test of the null hypotheses H0 : τ̄†i·(1, 0; 0) = 0 and H0 : τ̄†i·(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄†i·(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄†i·(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵi,t ∼ N(0, 1) and T = 100. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

Simulation results for the estimator of the lag-1, unit-i weighted average dynamic causal effect, τ̄†i,·(1, 0; 1): We now present simulation results that analyze the properties of our estimator for the lag-1, unit-i weighted average dynamic causal effect, τ̄ˆ†i,·(1, 0; 1). We choose the weights so as to place equal weight on the future treatment paths. Figure C.14 plots the simulated randomization

Figure C.13: Rejection probabilities for a test of the null hypotheses H0 : τ̄†i·(1, 0; 0) = 0 and H0 : τ̄†i·(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w).

Notes: This figure plots the rejection probabilities for a test of the null hypotheses H0 : τ̄†i·(1, 0; 0) = 0 and H0 : τ̄†i·(1, 0; 1) = 0 as the parameter β varies under different choices of the parameter ϕ and treatment probability p(w). The rejection rate curve against H0 : τ̄†i·(1, 0; 0) = 0 is plotted in blue and the rejection rate curve against H0 : τ̄†i·(1, 0; 1) = 0 is plotted in orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. The simulations are conducted with normally distributed errors ϵi,t ∼ N(0, 1) and T = 1000. Results are averaged over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

distribution for τ̄ˆ†i,·(1, 0; 1) and Figure C.15 plots the associated quantile-quantile plot. We observe that the normal approximation remains accurate for lagged dynamic causal effects.

Figure C.14: Simulated randomization distribution for τ̄ˆ†i·(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).

(a) ϵi,t ∼ N(0, 1), T = 100 (b) ϵi,t ∼ Cauchy, T = 50,000
Notes: This figure plots the simulated randomization distribution for τ̄ˆ†i·(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the simulated randomization distribution with normally distributed errors ϵi,t ∼ N(0, 1) and T = 100. Panel (b) plots the simulated randomization distribution with Cauchy distributed errors ϵi,t ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

Figure C.15: Quantile-quantile plots for the simulated randomization distribution for τ̄ˆ†i·(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w).

(a) ϵi,t ∼ N(0, 1), T = 1000 (b) ϵi,t ∼ Cauchy, T = 50,000
Notes: This figure shows quantile-quantile plots for the simulated randomization distribution for τ̄ˆ†i·(1, 0; 1) under different choices of the parameter ϕ and treatment probability p(w). The quantile-quantile plots compare the quantiles of the simulated randomization distribution (y-axis) against the quantiles of a standard normal random variable (x-axis). The 45 degree line is plotted in solid orange. The rows index the parameter ϕ ∈ {0.25, 0.5, 0.75}, and the columns index the treatment probability p(w) ∈ {0.25, 0.5, 0.75}. Panel (a) plots the quantile-quantile plots for the simulated randomization distribution with normally distributed errors ϵi,t ∼ N(0, 1) and T = 1000. Panel (b) plots the quantile-quantile plots for the simulated randomization distribution with Cauchy distributed errors ϵi,t ∼ Cauchy and T = 50,000. Results are computed over 5,000 simulations. See Section 3.5 of the main text for further details on the simulation design.

C.4 Additional empirical results

C.4.1 Analysis of unit and time-specific average dynamic causal effects

We estimate unit-specific average dynamic causal effects in the panel experiment conducted by Andreoni and Samuelson (2006). We focus on two randomly selected units in the experiment and construct estimates of their average i, t-th lag-0 dynamic causal effect, τi,t(1, 0; 0) (Definition 3.2.5). Figure C.16 shows the nonparametric estimates τ̂i,t(1, 0; 0), t ∈ [T], for the two units. The figure also contains the nonparametric estimate of the average unit-i lag-0 dynamic causal effect, τ̄ˆi·(1, 0; 0) = (1/T) Σ_{t=1}^{T} τ̂i,t(1, 0; 0). The results show that the point estimate of the average unit-i lag-0 dynamic causal effect is positive for both units, suggesting that a larger value of λ in the current game increases the likelihood of cooperation for both units. Since each unit only plays a total of twenty rounds, the estimated variance of these unit-specific estimators is quite large.
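The running-average line plotted in the unit-specific figure is simply the cumulative mean of the period-specific estimates, whose final entry equals the unit-i average estimate. A minimal sketch (the function name is ours; the inputs stand in for the period-specific estimates τ̂i,t(1, 0; 0)):

```python
import numpy as np

def running_average(tau_hats):
    """Running average (1/t) * sum over s <= t of the period-specific
    estimates; the last entry equals the unit-i average effect estimate."""
    tau_hats = np.asarray(tau_hats, dtype=float)
    return np.cumsum(tau_hats) / np.arange(1, len(tau_hats) + 1)
```

For example, period-specific estimates of 1.0, 2.0, 3.0 give a running average of 1.0, 1.5, 2.0.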

Figure C.16: Estimates of the weighted average i, t-th lag-0 dynamic causal effect (Definition 3.2.5) of W = 1{λ ≥ 0.6} on cooperation in period one for two units in the experiment of Andreoni and Samuelson (2006).

Notes: This figure plots estimates of the weighted average i, t-th lag-0 dynamic causal effect (Definition 3.2.5) of W = 1{λ ≥ 0.6} on cooperation in period one for two units in the experiment of Andreoni and Samuelson (2006). The solid black line plots the nonparametric estimator τ̂i,t(1, 0; 0) given in Remark 3.3.1. The dashed black line plots the running average of the period-specific estimator for each unit: for each t ∈ [T], (1/t) Σ_{s=1}^{t} τ̂i,s(1, 0; 0). The dashed red line plots the estimated weighted average unit-i lag-0 dynamic causal effect, τ̄ˆi·(1, 0; 0) = (1/T) Σ_{t=1}^{T} τ̂i,t(1, 0; 0).

We next estimate period-specific weighted average dynamic causal effects that pool information across units in order to gain precision. For each time period t ∈ [T], we construct estimates based on the nonparametric estimator of the weighted average time-t, lag-p dynamic causal effect, τ̄†·t(1, 0; p) = (1/N) Σ_{i=1}^{N} τ†i,t(1, 0; p), for p = 0, 1, 2, 3. For each value of p, the dashed black line in Figure C.17 plots the estimates τ̄ˆ†·t(1, 0; p) and the grey region plots a 95% pointwise conservative confidence band for the period-specific weighted average dynamic causal effects. For each value of p, there appears to be some heterogeneity in the period-specific weighted average dynamic causal effects across time periods.
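A pointwise band of this kind is built by adding and subtracting the normal critical value times the standard error at each period separately. The sketch below illustrates the construction (the function name is ours, and the standard errors are inputs here, not the paper's conservative variance estimator):

```python
import numpy as np

def pointwise_band(estimates, std_errors, crit=1.96):
    """95% pointwise confidence band: estimate +/- crit * SE at each period.
    Pointwise rather than simultaneous: each period's interval has 95%
    coverage on its own."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    return est - crit * se, est + crit * se
```

Because the band is pointwise, it does not guarantee 95% coverage of the entire path of effects simultaneously; a simultaneous band would need a larger critical value.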

To further investigate these dynamic causal effects, the solid blue line in Figure C.17 plots the nonparametric estimator of the total lag-p weighted average causal effect, τ̄†(1, 0; p), for p = 0, 1, 2, 3, which further pools information across all units and time periods. The dashed blue lines plot the conservative confidence interval for the total lag-p weighted average causal effect. See the main text for further discussion of the total lag-p weighted average causal effect estimates.

Figure C.17: Estimates of the time-t lag-p weighted average dynamic causal effect, τ̄†·t(1, 0; p), of W = 1{λ ≥ 0.6} on cooperation in period one based on the experiment of Andreoni and Samuelson (2006), for each time period t ∈ [T] and p = 0, 1, 2, 3.

Notes: This figure plots estimates of the time-t lag-p weighted average dynamic causal effect, τ̄†·t(1, 0; p), of W = 1{λ ≥ 0.6} on cooperation in period one based on the experiment of Andreoni and Samuelson (2006), for each time period t ∈ [T] and p = 0, 1, 2, 3. The black dashed line plots the nonparametric estimator of the time-t lag-p weighted average dynamic causal effect, τ̄ˆ†·t(1, 0; p), for each period t ∈ [T]. The grey region plots the 95% pointwise confidence band for τ̄†·t(1, 0; p) based on the conservative estimator of the asymptotic variance of the nonparametric estimator (Theorem 3.3.2). The solid blue line plots the nonparametric estimator of the total lag-p weighted average dynamic causal effect, τ̄ˆ†(1, 0; p), and the dashed blue lines plot the 95% confidence interval for τ̄†(1, 0; p) based on the conservative estimator of the asymptotic variance of the nonparametric estimator.
