
How Can We Make Experimental Research Results More Reliable and Replicable?

John A. List, U. Chicago, ANU, and NBER
@Econ_4_Everyone
Building Confidence in (and Knowledge from)
Experimental Results
1. Scientific research aims to create a stock of knowledge.
Optimally adding to this stock requires confidence in the
received estimates.
2. A key question concerning confidence: after a research finding has been
claimed, what is the post-study probability that it is true?
3. Two unique features of the experimental approach situate it
well to deepen the stock of scientific knowledge: selective
data generation and the ability to enhance the notion, and role,
of replications.
A Simple Bayesian Framework to Build Knowledge

PSP: Probability that a declaration of a research finding, made upon reaching statistical significance, is true.
α: Level of statistical significance
1 - β: Level of power
π: Can think of this as the prior
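A minimal sketch of the computation behind this framework (assuming, consistent with the exhibits that follow, α = 0.05 and PSP = π(1 − β) / [π(1 − β) + α(1 − π)], i.e., the probability that the tested relationship is real given a statistically significant finding):

def psp_significant(prior, power, alpha=0.05):
    """Post-study probability that a claimed finding is true,
    given a statistically significant result.

    prior : probability the tested relationship is real (pi)
    power : 1 - beta, the chance of detecting a real relationship
    alpha : significance level (false positive rate)
    """
    true_positives = prior * power          # real effects that reach significance
    false_positives = (1 - prior) * alpha   # null effects significant by chance
    return true_positives / (true_positives + false_positives)

# Example: a long-shot hypothesis (pi = 0.01) tested at 80% power
print(round(psp_significant(0.01, 0.80), 2))  # 0.14, matching Exhibit 15.5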
Some Inference
Exhibit 15.5: PSPs With and Without a Statistically Significant Finding

                          Power (1 - β)
Prior (π)     0.20    0.30    0.50    0.70    0.80

PSP (reject null)
0.01          0.04    0.06    0.09    0.12    0.14
0.05          0.17    0.24    0.34    0.42    0.46
0.10          0.31    0.40    0.53    0.61    0.64
0.20          0.50    0.60    0.71    0.78    0.80
0.30          0.63    0.72    0.81    0.86    0.87
0.40          0.73    0.80    0.87    0.90    0.91
0.50          0.80    0.86    0.91    0.93    0.94
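For reference, a short script (again assuming α = 0.05) that reproduces the "PSP (reject null)" panel above:

alpha = 0.05
priors = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50]
powers = [0.20, 0.30, 0.50, 0.70, 0.80]

print("pi   " + "".join(f"{p:>7.2f}" for p in powers))
for pi in priors:
    row = [pi * pw / (pi * pw + alpha * (1 - pi)) for pw in powers]
    print(f"{pi:<5.2f}" + "".join(f"{v:>7.2f}" for v in row))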


What Can Go Wrong?
Controlling the False Positive Rate

 Statistical Error (alpha)
 Human Error (how we generate/evaluate/interpret data)
 Human Fraud (less rare than we hope)

 The import of replication becomes clear


One Example of Human Error: MHT

[Figure: two histograms, "Reported P-value" and "Holm-corrected P-value", each showing the fraction of estimates by P-value (reference marks at 0.05 and 1.0).]
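As an illustration of the kind of multiple hypothesis testing (MHT) adjustment shown above, a minimal sketch of the Holm step-down correction (a generic implementation, not the code behind the figure):

def holm_correction(p_values):
    """Holm-Bonferroni step-down adjustment of a family of P-values.

    Returns adjusted P-values in the original order; an adjusted value
    below the chosen alpha is significant after controlling the
    family-wise error rate across all tested hypotheses.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[idx])  # scale by hypotheses remaining
        running_max = max(running_max, adj)         # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Three of these look significant at 0.05 before correction; only one does after.
print(holm_correction([0.01, 0.04, 0.03, 0.20]))  # [0.04, 0.09, 0.09, 0.2]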
Building Confidence: What Can We Do?

1) Reduce Bias

2) Promote Transparency

3) Promote Scrutiny
1. One Kind of Bias
A common belief is that reporting significant results carries much greater import than reporting null results.

 Scientific journals might prefer statistically significant “newsworthy” results
 Funders might reward scholars who produce noteworthy insights
 Ultimately, scientists might conclude that journal publications and streams of funding matter a great deal in tenure decisions

Yet, from a scientific perspective of building knowledge, such skewed preferences are flawed (see List, 2024).
Null Results Are Informative Too

Exhibit 15.5: PSPs With and Without a Statistically Significant Finding

                          Power (1 - β)
Prior (π)     0.20    0.30    0.50    0.70    0.80

PSP (reject null)
0.01          0.04    0.06    0.09    0.12    0.14
0.05          0.17    0.24    0.34    0.42    0.46
0.10          0.31    0.40    0.53    0.61    0.64
0.20          0.50    0.60    0.71    0.78    0.80
0.30          0.63    0.72    0.81    0.86    0.87
0.40          0.73    0.80    0.87    0.90    0.91
0.50          0.80    0.86    0.91    0.93    0.94

PSP (null result)
0.01          0.01    0.01    0.01    0.00    0.00
0.05          0.04    0.04    0.03    0.02    0.01
0.10          0.09    0.08    0.06    0.03    0.02
0.20          0.17    0.16    0.12    0.07    0.05
0.30          0.27    0.24    0.18    0.12    0.08
0.40          0.36    0.33    0.26    0.17    0.12
0.50          0.46    0.42    0.34    0.24    0.17
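A companion sketch for the null-result panel, under the same assumptions (α = 0.05), where the PSP given a non-significant result is πβ / [πβ + (1 − α)(1 − π)]:

def psp_null(prior, power, alpha=0.05):
    """Probability the tested relationship is real, given a null (non-significant) result."""
    missed_effects = prior * (1 - power)        # real effects the study failed to detect
    true_negatives = (1 - prior) * (1 - alpha)  # null effects correctly not rejected
    return missed_effects / (missed_effects + true_negatives)

# A null result from a well-powered study is far more informative:
print(round(psp_null(0.30, 0.80), 2))  # 0.08: a prior of 0.30 drops sharply
print(round(psp_null(0.30, 0.20), 2))  # 0.27: low power barely moves the prior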


Implications
 If our goal is to build scientific knowledge, then recognizing and rewarding null results, especially those that move priors, is important

 Side benefit: this will reduce the level of bias in our science.
2. Promote Transparency
 Pre-Registration (must be well timed)
 Pre-Analysis Plans (must be well timed)
 Registered Reports (not for all journals)

Scientific transparency alone does not verify the validity of the received results. Rather, it permits an exploration of the received claims.

In this manner, transparency and scrutiny are complements in enhancing knowledge building.
Implications
 When building scientific knowledge it is important to
understand that there is a crucial distinction between
the probability that a reported significant finding in the
literature represents a real relationship and the
probability that an individual experiment has
uncovered a real relationship.

 Side benefits of enhanced transparency: reduces bias and provides a better depiction of what the literature is finding.
3. Promote Scrutiny (Replications)
Pure replication: examine the same question using the underlying original data set.
Robustness analysis: use the exact same data as the original analysis, but modify the data or the empirical methods to see if the results are robust.
Same population replication: run a new experiment closely following the original protocol to test whether similar results can be generated using random draws from the same underlying population.
Similar population replication: conduct an experiment with UCLA undergraduates to replicate a previous lab experiment conducted with University of Maryland undergraduates.
Disparate population replication: examine the same question and model using a population dissimilar from the original experiment.
Finally, the sixth and broadest replication category entails testing the hypotheses of the original study using a new research design; this is known as a conceptual replication.
How Fast Can We Build Confidence? The Power of Replication

PSP after 1, 2, 3, and 4 consecutive statistically significant findings (the original study plus same-population replications), by prior π:

              Power (1-β) = 0.80           Power (1-β) = 0.50
Prior (π)    1     2     3     4          1     2     3     4
0.01        0.14  0.72  0.98  1.00       0.09  0.50  0.91  0.99
0.02        0.25  0.84  0.99  1.00       0.17  0.67  0.95  1.00
0.05        0.46  0.93  1.00  1.00       0.34  0.84  0.98  1.00
0.10        0.64  0.97  1.00  1.00       0.53  0.92  0.99  1.00
0.20        0.80  0.98  1.00  1.00       0.71  0.96  1.00  1.00
0.30        0.87  0.99  1.00  1.00       0.81  0.98  1.00  1.00
0.40        0.91  0.99  1.00  1.00       0.87  0.99  1.00  1.00
0.50        0.94  1.00  1.00  1.00       0.91  0.99  1.00  1.00
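A sketch of the sequential updating that generates this kind of table: each statistically significant replication feeds the previous PSP back in as the new prior (assuming independent studies and α = 0.05):

def psp_significant(prior, power, alpha=0.05):
    """PSP after one statistically significant result."""
    return prior * power / (prior * power + alpha * (1 - prior))

def psp_after_studies(prior, power, n_studies, alpha=0.05):
    """PSP after n_studies consecutive significant results (original study + replications)."""
    belief = prior
    for _ in range(n_studies):
        belief = psp_significant(belief, power, alpha)  # yesterday's posterior is today's prior
    return belief

# A long-shot hypothesis (pi = 0.01) at 80% power:
for n in range(1, 5):
    print(n, round(psp_after_studies(0.01, 0.80, n), 2))  # 0.14, 0.72, 0.98, 1.0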


What About Other Types of Replication?

                          Power (1 - β)
Prior (π)     0.20    0.30    0.50    0.70    0.80

ɖ = 0.00
0.01          0.04    0.06    0.09    0.12    0.14
0.05          0.17    0.24    0.34    0.42    0.46
0.10          0.31    0.40    0.53    0.61    0.64
0.20          0.50    0.60    0.71    0.78    0.80
0.30          0.63    0.72    0.81    0.86    0.87
0.40          0.73    0.80    0.87    0.90    0.91
0.50          0.80    0.86    0.91    0.93    0.94

ɖ = 0.10
0.01          0.02    0.03    0.04    0.05    0.05
0.05          0.09    0.12    0.17    0.21    0.23
0.10          0.18    0.22    0.30    0.36    0.39
0.20          0.33    0.39    0.49    0.56    0.59
0.30          0.45    0.52    0.62    0.68    0.71
0.40          0.56    0.63    0.72    0.77    0.79
0.50          0.66    0.72    0.79    0.83    0.85

ɖ = 0.25
0.01          0.01    0.02    0.02    0.03    0.03
0.05          0.07    0.08    0.10    0.12    0.13
0.10          0.13    0.16    0.19    0.23    0.25
0.20          0.26    0.29    0.35    0.40    0.43
0.30          0.37    0.41    0.48    0.54    0.56
0.40          0.48    0.52    0.59    0.64    0.66
0.50          0.58    0.62    0.68    0.73    0.75

ɖ = 0.50
0.01          0.01    0.01    0.01    0.02    0.02
0.05          0.06    0.06    0.07    0.08    0.08
0.10          0.11    0.12    0.14    0.15    0.16
0.20          0.22    0.24    0.26    0.29    0.30
0.30          0.33    0.35    0.38    0.41    0.42
0.40          0.43    0.45    0.49    0.52    0.53
0.50          0.53    0.55    0.59    0.62    0.63
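The panels above are consistent with a PSP formula in which ɖ captures the chance that a study comes up statistically significant for reasons unrelated to the truth of the relationship (for example, bias, or a replication design or population that departs from the original). A hedged sketch under that reading:

def psp_with_departure(prior, power, d, alpha=0.05):
    """PSP of a significant finding when, with weight d, studies turn up
    significant regardless of whether the true relationship holds
    (one reading of the d parameter in the panels above)."""
    true_branch = prior * (power + d * (1 - power))
    false_branch = (1 - prior) * (alpha + d * (1 - alpha))
    return true_branch / (true_branch + false_branch)

# d = 0 reproduces the baseline panel; larger d drags the PSP down.
print(round(psp_with_departure(0.10, 0.80, 0.00), 2))  # 0.64
print(round(psp_with_departure(0.10, 0.80, 0.10), 2))  # 0.39
print(round(psp_with_departure(0.10, 0.80, 0.50), 2))  # 0.16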


Implications
 When building knowledge it is important to have rapid scrutiny
so the course of science can quickly correct.

 We always focus on false positives, but one might argue that researchers’ tolerance for false negatives has potentially irreversible effects on the development of scientific knowledge: since false negative results are less likely to be followed up than false positives, self-correction is less likely to occur in these cases.

 Side benefits of scrutiny: reduces bias and provides a better depiction of what the literature is finding.
What Could Go Wrong?
 The Great Endangered Species!
 Maniadis et al. (2017) survey experimental papers published between 1975 and 2014 in the top 150 journals in economics and estimate that the fraction of replication studies among all experimental papers in their sample is 4.2%.

 Changing Incentives
 Replications typically bring little recognition (few journals are interested) and can even induce scorn.
 JESA and JPE: Micro are promising steps.
 Need to change authors’ incentives to collaborate with replicators. Should
positive replications of one’s work be considered a “super cite”?
Promoting Reproducibility
 Butera and List (2017): original investigators of a study commit to publishing their results only as a working paper and offer coauthorship of a second paper to others who are willing to replicate.

 Dreber et al. (2015) suggest using prediction markets with experts as a quick and low-cost way to obtain information about reproducibility.
Cites
Abrams, Eliot, Jonathan Libgober, and John A. List. 2020. “Research Registries: Facts,
Myths, and Possible Improvements.” NBER.
Alevy, Jonathan, John List, and Wiktor Adamowicz. 2010. “How Can Behavioral Economics Inform Non-Market Valuation? An Example from the Preference Reversal Literature.” NBER Working Paper 16036, National Bureau of Economic Research. https://doi.org/10.3386/w16036.
Benjamin, Daniel J., James O. Berger, Magnus Johannesson, Brian A. Nosek, E.-J. Wagenmakers, Richard Berk, Kenneth A. Bollen, et al. 2017. “Redefine Statistical Significance.” Nature Human Behaviour 2 (1): 6–10. https://doi.org/10.1038/s41562-017-0189-z.
Butera, Luigi, Philip J. Grossman, Daniel Houser, John A. List, and Marie-Claire Villeval. 2020. “A New Mechanism to Alleviate the Crises of Confidence in Science-With An Application to the Public Goods Game.” NBER Working Paper 26801, February. https://doi.org/10.3386/w26801.
Butera, Luigi, and John A. List. 2017. “An Economic Approach to Alleviate the Crises of Confidence in Science: With an Application to the Public Goods Game.” NBER Working Paper 23335, April. https://doi.org/10.3386/w23335.
Cites
Camerer, Colin F., Anna Dreber, Eskil Forsell, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael
Kirchler, et al. 2016. “Evaluating Replicability of Laboratory Experiments in Economics.” Science 351 (6280):
1433–36. https://doi.org/10.1126/science.aaf0918.
Dreber, Anna, Thomas Pfeiffer, Johan Almenberg, Siri Isaksson, Brad Wilson, Yiling Chen, Brian A. Nosek,
and Magnus Johannesson. 2015. “Using Prediction Markets to Estimate the Reproducibility of Scientific
Research.” Proceedings of the National Academy of Sciences 112 (50): 15343–47.
https://doi.org/10.1073/pnas.1516179112.
Levitt, Steven D., and John A. List. 2009. “Field Experiments in Economics: The Past, the Present, and the
Future.” European Economic Review 53 (1): 1–18. https://doi.org/10.1016/j.euroecorev.2008.12.001.
List, John A. 2004. “Neoclassical Theory versus Prospect Theory: Evidence from the Marketplace.”
Econometrica 72 (2): 615–25.
Maniadis, Zacharias, Fabio Tufano, and John A. List. 2014. “One Swallow Doesn’t Make a Summer: New
Evidence on Anchoring Effects.” American Economic Review 104 (1): 277–90.
https://doi.org/10.1257/aer.104.1.277.
———. 2015. “How to Make Experimental Economics Research More Reproducible: Lessons from Other
Disciplines and a New Proposal.” Research in Experimental Economics 18 (January).
https://doi.org/10.1108/S0193-230620150000018008.
———. 2017. “To Replicate or Not to Replicate? Exploring Reproducibility in Economics through the Lens of a
Model and a Pilot Study.” The Economic Journal 127 (605): F209–35. https://doi.org/10.1111/ecoj.12527.
Tufano, Fabio, and John A. List. 2021. “On the Importance of ‘Null Effects’ in Economics.” Unpublished
Manuscript.
