Towards Bayesian Model-Based Demography: Agency, Complexity and Uncertainty in Migration Studies
Jakub Bijak
With contributions by
Philip A. Higham · Jason Hilton · Martin Hinsch
Sarah Nurse · Toby Prike · Oliver Reinhardt
Peter W.F. Smith · Adelinde M. Uhrmacher
Tom Warnke
Methodos Series
Volume 17
Series Editors
Daniel Courgeau, Institut National d’Études Démographiques
Robert Franck, Université Catholique de Louvain
Editorial Board
Peter Abell, London School of Economics
Patrick Doreian, University of Pittsburgh
Sander Greenland, UCLA School of Public Health
Ray Pawson, Leeds University
Cees Van De Eijk, University of Amsterdam
Bernard Walliser, Ecole Nationale des Ponts et Chaussées, Paris
Björn Wittrock, Uppsala University
Guillaume Wunsch, Université Catholique de Louvain
This Book Series is devoted to examining and solving the major methodological
problems social sciences are facing. Take for example the gap between empirical
and theoretical research, the explanatory power of models, the relevance of
multilevel analysis, the weakness of cumulative knowledge, the role of ordinary
knowledge in the research process, or the place which should be reserved to “time,
change and history” when explaining social facts. These problems are well known
and yet they are seldom treated in depth in scientific literature because of their
general nature. So that these problems may be examined and solutions found, the
series prompts and fosters the setting-up of international multidisciplinary research
teams, and it is work by these teams that appears in the Book Series. The series can
also host books produced by a single author which follow the same objectives.
Proposals for manuscripts and plans for collective books will be carefully examined.
The epistemological scope of these methodological problems is obvious, and resorting to the philosophy of science becomes a necessity. The main objective of the Series nevertheless remains the methodological solutions that can be applied to the problems at hand. The books of the Series are therefore closely connected to research practice.
Jakub Bijak
Social Statistics & Demography
University of Southampton
Southampton, UK
With contributions by
Philip A. Higham, University of Southampton, Southampton, UK
Jason Hilton, University of Southampton, Southampton, UK
Martin Hinsch, University of Southampton, Southampton, UK
Sarah Nurse, University of Southampton, Southampton, UK
Toby Prike, University of Southampton, Southampton, UK
Oliver Reinhardt, University of Rostock, Rostock, Germany
Peter W. F. Smith, University of Southampton, Southampton, UK
Adelinde M. Uhrmacher, University of Rostock, Rostock, Germany
Tom Warnke, University of Rostock, Rostock, Germany
ISBN 978-3-030-83038-0
ISBN 978-3-030-83039-7 (eBook)
https://doi.org/10.1007/978-3-030-83039-7
© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit
to the original author(s) and the source, provide a link to the Creative Commons license and indicate if
changes were made.
The images or other third party material in this book are included in the book’s Creative Commons
license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To those who had to leave their homes to find
a better future elsewhere
Foreword
This book, perfectly in line with the aims of the Methodos Series, proposes micro-
foundations for migration and other population studies through the development of
model-based methods involving Bayesian statistics. This line of thought follows
and completes two previous volumes of the series. First, the volume Probability and
social science, which I published in 2012 (Courgeau, 2012), shows that Bayesian
methods overcome the main difficulties that objective statistical methods may
encounter in social sciences. Second, the volume Methodological Investigations in
Agent-Based Modelling, published by Eric Silverman (2018), shows that this research programme adds a new avenue of empirical relevance to demographic research.
I would like to highlight here the history and epistemology of some themes of
this book, which seem to be very promising and important for future research.
The notion of probability originated with Blaise Pascal’s treatise of 1654 (Pascal
1654). As he was dealing with games of pure chance, i.e., assuming that the dice on
which he was reasoning were not loaded, Pascal was addressing objective probabil-
ity, for the chances of winning were determined by the fact that the game had not
been tampered with. However, he took the reasoning further in 1670, introducing
epistemic probability for unique events, such as the existence of God. In a section
of the Pensées (Pascal 1670), he showed how an examination of chance may lead to
a decision of a theological nature. Even if we can criticise its premises, this reasoning comes close to the Bayesian notion of epistemic probability introduced a century later by Thomas Bayes (1763), defined in terms of the knowledge that humanity can have of objects.
Let us see in more detail how these two principal concepts differ.
The objectivist approach assumes that the probability of an event exists indepen-
dently of the statistician, who tries to estimate it through successive experiments. As
the number of trials tends to infinity, the ratio of the cases where the event occurs to
the total number of observations tends towards this probability. But the very hypoth-
esis that this probability exists cannot be clearly demonstrated. As Bruno de Finetti
said clearly: probability does not exist objectively, that is, independently of the
human mind (De Finetti, 1974).
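The objectivist limiting-frequency idea described above can be illustrated with a short simulation. This is a minimal sketch in Python; the event probability of 0.3 and the seed are arbitrary illustrative choices, not anything from the text.

```python
import random

def empirical_frequency(n_trials, p=0.3, seed=7):
    """Relative frequency of an event with (assumed known) probability p
    after n_trials independent trials."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p for _ in range(n_trials))
    return hits / n_trials

# The relative frequency drifts towards p as the number of trials grows,
# but the probability p itself is never observed directly.
for n in (100, 10_000, 1_000_000):
    print(n, empirical_frequency(n))
```

Note that the simulation must presuppose the very probability p whose independent existence, as de Finetti objected, cannot be demonstrated.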
The epistemic approach, in contrast, focuses on the knowledge that we can have of a phenomenon. The epistemic statistician takes advantage of new information on this phenomenon to improve their prior opinion on its probability, using Bayes' theorem to calculate the posterior probability. Of course, this estimate depends on the chosen prior, but when this choice is made with appropriate care, the result improves considerably on the objective probability.
When it comes to using these two concepts in order to make a decision, the two approaches differ even more. When an objectivist provides a 95% confidence interval for an estimate, they can only say that if they were to draw a large number of samples of the same size, then 95% of the intervals constructed in this way would contain the unknown parameter. Clearly, this complex definition does not fit with what might be expected of it. The Bayesians, in contrast, starting from their initial hypotheses, can state plainly that a Bayesian 95% credibility interval is an interval in which they are justified in thinking that there is a 95% probability of finding the unknown parameter.
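The contrast between the two interval notions can be sketched in a few lines of Python. Everything here is an assumption made for the example: the data of 60 successes in 100 trials, the Wald form of the confidence interval, and the uniform Beta(1, 1) prior.

```python
import random
from math import sqrt

def wald_confidence_interval(successes, n, z=1.96):
    """Frequentist 95% (Wald) confidence interval for a binomial proportion:
    a statement about the long-run behaviour of the procedure, not about the
    probability that the parameter lies in this particular interval."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return (p - z * se, p + z * se)

def credible_interval(successes, n, draws=200_000, seed=42):
    """Bayesian 95% credibility interval under a uniform Beta(1, 1) prior:
    the posterior is Beta(successes + 1, n - successes + 1), and the interval
    directly carries a 95% probability of containing the unknown proportion."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(successes + 1, n - successes + 1)
                     for _ in range(draws))
    return (samples[int(0.025 * draws)], samples[int(0.975 * draws)])

print(wald_confidence_interval(60, 100))  # roughly (0.50, 0.70)
print(credible_interval(60, 100))         # numerically similar, but with a
                                          # direct probability interpretation
```

The two intervals nearly coincide numerically, yet only the second licenses the direct probability statement that the foreword describes.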
One may wonder why the Bayesian approach, which seems better suited to the social sciences and demography, has taken so long to gain acceptance among researchers in these domains. The first reason is the complexity of the calculations, which computers can now undertake. A telling example is Pierre-Simon de Laplace (1778), who needed some twenty pages of complex calculations and approximations, mainly devoted to formulae, to solve with the epistemic approach a simple problem: comparing the birth frequencies of girls and boys. A second reason is a desire for an objective demography, drawing conclusions from data alone, with a minimal role for personal judgement.
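What once required pages of approximations now takes a few lines. The sketch below is a hypothetical reconstruction, not Laplace's own computation: it uses illustrative baptism counts loosely based on the Paris records of his era (an assumption for the example) and a uniform prior to estimate the probability that boys are born more frequently than girls.

```python
import random

def prob_boys_exceed_half(boys, girls, draws=100_000, seed=0):
    """Posterior probability that the proportion of male births exceeds 1/2,
    under a uniform Beta(1, 1) prior: the posterior is
    Beta(boys + 1, girls + 1), sampled here by simple Monte Carlo."""
    rng = random.Random(seed)
    return sum(rng.betavariate(boys + 1, girls + 1) > 0.5
               for _ in range(draws)) / draws

# Illustrative counts, with boys slightly outnumbering girls as in the
# historical records (assumed here, not quoted from the text):
print(prob_boys_exceed_half(251_527, 241_945))
```

With counts of this magnitude the posterior mass above one half is, for all practical purposes, total, which is the conclusion Laplace reached analytically.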
Jakub Bijak was one of the first demographers to use Bayesian models, for
migration forecasting (Bijak, 2010). He showed that the Bayesian approach can
offer an umbrella framework for decision-making, by providing a coherent mecha-
nism of inference. In this book, with his colleagues, he provides us with a more
complete analysis of Bayesian modelling for demography.
Social sciences, and more particularly demography, were launched by John Graunt
(1662), just eight years after the notion of probability was conceived. In his
volume on the Bills of Mortality, Graunt used an objective probability model to
estimate the age-specific probabilities of dying, under hypotheses that were rough,
but the only conceivable ones at this time (Courgeau, 2012, pp. 28–34).
Later, Leonard Euler (1760) extended Graunt’s model to the reproduction of the
human species, introducing fertility and mortality. He used three hypotheses in
order to justify his model. The first was based on the vitality specific to humans,
measured by the probability of dying at each age for the members of a given popula-
tion. These probabilities were assumed to remain the same in the future. The second
hypothesis was based on the principle of propagation, which depended on marriage
and fertility, measured by a rough approximation of fertility in a population. Again,
these probabilities were to remain constant in the future. The third and last hypoth-
esis was that the two principles of mortality and propagation are independent of
each other. From these principles, Euler could calculate all the other probabilities
that population scientists would want to estimate. Again, this model was computed
under the objectivist probability assumptions and led to the concept of a stable
population.
Later, in the twentieth century, Samuel Preston and Ansley Coale (1982) gener-
alised this model to other populations, leading to a broader set of models of popula-
tion dynamics: stable, semi-stable, and quasi-stable populations (Bourgeois-Pichat,
1994). These models were always designed assuming the objectivist interpretation
of probability.
More recently, Francesco Billari and Alexia Prskawetz (2003) introduced to demography the agent-based approach, already in use since the 1970s in many other disciplines (sociology, biology, epidemiology, technology, network theory, etc.). This approach was first based on objectivist probabilities, but more recently Bayesian inference techniques have been introduced as an alternative methodology for analysing simulation models.
For Billari and Prskawetz, agent-based models presuppose rules of behaviour and enable verifying whether these micro-level rules can explain macroscopic regularities. Hence, such models start from presuppositions, as hypothetical theoretical models, but there is no clear way to construct these presuppositions, nor to verify whether they really explain a given macroscopic regularity. The need to choose a behavioural theory hampers the widespread use of agent-based rules in demography, and depending on the selected theoretical model, the results produced by an agent-based model may differ greatly.
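The point that the choice of micro-level behavioural rule drives the macro-level outcome can be made concrete with a deliberately minimal toy model, entirely hypothetical and not one of the models discussed in this book: agents hold a binary opinion and update it by observing three random peers, either conforming to or opposing their local majority.

```python
import random

def simulate(rule, n_agents=200, steps=20_000, init_p=0.5, seed=1):
    """Toy agent-based model: returns the final share of agents in state 1.
    rule='conform': copy the majority of three random peers, which pushes
    the population towards consensus; rule='contrarian': do the opposite,
    which keeps the population split near 50/50."""
    rng = random.Random(seed)
    state = [1 if rng.random() < init_p else 0 for _ in range(n_agents)]
    for _ in range(steps):
        i = rng.randrange(n_agents)              # pick one agent to update
        peers = sum(state[rng.randrange(n_agents)] for _ in range(3))
        majority = 1 if peers >= 2 else 0
        state[i] = majority if rule == "conform" else 1 - majority
    return sum(state) / n_agents

# Same population, same interaction structure; only the micro rule differs:
print(simulate("conform", init_p=0.8))   # near-unanimous
print(simulate("contrarian"))            # hovers around one half
```

With identical populations and interaction structures, the two rules generate qualitatively different macroscopic regularities, which is precisely why the choice of behavioural theory matters so much.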
A second criticism of agent-based models was formulated by John Holland (2012, p. 48), who noted that "agent-based models offer little provision for agent conglomerates that provide building blocks and behaviour at higher orders of organisation." Indeed, micro-level rules are hard to link with aggregate-level rules, and it seems difficult to believe that macro-level rules can always be modelled with a micro approach: such rules generally transcend the behaviours of the component agents.
Finally, Rosaria Conte and colleagues (2012, p. 340) wondered, “how to find out
the simple local rules? How to avoid ad hoc and arbitrary explanations? […] One
criterion has often been used, i.e., choose the conditions that are sufficient to gener-
ate a given effect. However, this leads to a great deal of alternative options, all of
which are to some extent arbitrary.”
In the face of these criticisms, this book gives preference to a model-based approach, which had already been proposed by us in Courgeau et al. (2016).
Conclusion
This historical and epistemological foreword on the two main, well-justified approaches relied upon in this book by Jakub Bijak and his colleagues, Bayesian modelling and model-based demography, leaves aside many other important points that the reader will discover: migration theory, more particularly international migration theory; simulation in demography, with the very interesting set of Routes and Rumours models; cognition and decision making; computational challenges and their solutions; replicability and transparency in modelling; and many more.
I greatly hope that the reader will discover the importance of these approaches, not only for demography and migration studies but also for all other social sciences.
About the Authors
Peter W. F. Smith is Professor of Social Statistics in the Department of Social
Statistics and Demography at the University of Southampton, and a Fellow of the
British Academy. His research includes developing statistical methods for handling
non-response, longitudinal data, and applications in demography, medicine, and
health sciences.
Adelinde M. Uhrmacher is Professor at the Institute for Visual and Analytic
Computing of the University of Rostock and head of the Modeling and Simulation
Group. She holds a PhD in Computer Science from the University of Koblenz and a
Habilitation in Computer Science from the University of Ulm.
Acknowledgements
In addition, some specific ideas presented in this book emerged through interac-
tions and discussions with colleagues across different areas of modelling as well as
migration research and practice. In particular, we are grateful to the organisers and
participants of the following workshops: ‘Uncertainty and Complexity of Migration’,
held in London on 20–21 November 2018; ‘Rostock Retreat on Simulation’, organ-
ised in Rostock on 1–3 July 2019; ‘Agent-Based Models for Exploring Public Policy
Planning’, held at the Lorentz Center@Snellius in Leiden on 15–19 July 2019; and
‘Modelling Migration and Decisions’, organised in Southampton on 21 January
2020. Particular credit, in a non-individually-attributable manner compliant with the requirements of the Chatham House rule, goes (in alphabetical order) to:
Rob Axtell, David Banks, Ann Blake, Alexia Fürnkranz-Prskawetz, André Grow,
Katarzyna Jaśko, Leah Johnson, Nico Keilman, Ben Klemens, Elzemiek Kortlever,
Giampaolo Lanzieri, Antonietta Mira, Petra Nahmias, Adrian Raftery, Hana
Ševčíková, Eric Silverman, Ann Singleton, Vadim Sokolov, Sarah Wise, Teddy
Wilkin and Dominik Zenner. When working on such a multidimensional topic as
migration, having a multitude of perspectives to rely on has been invaluable, and we
are grateful to everyone for sharing their views.
We are also indebted to Jo Russell for very careful proofreading and copyediting
of the draft manuscript, and for helping us achieve greater clarity of the sometimes
complex ideas and arguments. Naturally, all the remaining errors and omissions are
ours alone.
On a private note, the lead author (Jakub Bijak) wishes to thank Kasia, Jurek and
Basia for their support and patience throughout the writing and editing of this book
during the long lockdown days of 2020–2021. Besides, having to explain several migration processes in a way that would be accessible to a Year 5 primary school student helped me better understand some of the arguments presented in this book.
Contents
Part I  Preliminaries

1 Introduction
  1.1 Why Bayesian Model-Based Approaches for Studying Migration?
  1.2 Aims and Scope of the Book
  1.3 Structure of the Book
  1.4 Intended Audience and Different Paths Through the Book

2 Uncertainty and Complexity: Towards Model-Based Demography
  2.1 Uncertainty and Complexity in Demography and Migration
  2.2 High Uncertainty and Impact: Why Model Asylum Migration?
  2.3 Shifting Paradigm: Description, Prediction, Explanation
  2.4 Towards Micro-foundations in Migration Modelling
  2.5 Philosophical Foundations: Inductive, Deductive and Abductive Approaches
  2.6 Model-Based Demography as a Research Programme

Glossary
Index
Part I
Preliminaries
Chapter 1
Introduction
Jakub Bijak
1.1 Why Bayesian Model-Based Approaches for Studying Migration?
This book presents and reflects on the process of developing a simulation model of
international migration route formation, with a population of intelligent, cognitive
agents, their social networks, and policy-making institutions, all interacting with
one another. The overarching aim of this work is to bring new insights into the
1.2 Aims and Scope of the Book
Fig. 1.1 Position of the proposed approach among formal migration modelling methods; the novelty and contribution of this book comprise quantitative and qualitative data, Bayesian experimental design, innovative cognitive experiments, and a bespoke modelling language. (Source: own elaboration, based on Bijak (2010: 48))
built inductively, from the bottom up, addressing important epistemological limita-
tions of population sciences.
The book builds upon the foundations laid out in the existing body of work, at the
same time aiming to address the methodological and practical challenges identified
in the recent population and migration modelling literature. Starting from a previous
review of formal models of migration (Bijak, 2010), our proposed approach is spe-
cifically based on the five elements that have not been combined in modelling
before. In particular, the existing micro-level approaches to migration studies,
including microeconomic and sociological explanations, as well as inspirations
from existing agent-based and microsimulation models, are combined here with
macro-level statistical analysis of migration processes and outcomes, with the ulti-
mate aim of informing decisions and policy analysis (see Fig. 1.1).
The novel elements of this book include combining qualitative and quantitative data in the formal modelling process (Polhill et al., 2010), learning about social mechanisms through Bayesian methods of experimental design, and incorporating experimental information on human decision making and behaviour. Additionally, we further develop a dedicated programming language,
ML3, to facilitate modelling migration, extending the earlier work in that area
(Warnke et al., 2017). These different themes draw from the existing state of the art
in migration modelling, and enhance it by adding new elements, as summarised in
Fig. 1.1.
From the scientific angle, we aim to advance both the philosophical and practical
aspects of modelling. This is done, first, by applying the concepts and ideas sug-
gested in the contemporary literature to develop a model of migration routes in an
iterative, multi-stage process. Second, these parallel aims are addressed by offering
practical solutions for implementing and furthering the model-based research pro-
gramme in demography (van Bavel & Grow, 2016; Courgeau et al., 2016; Silverman,
2018; Burch, 2018), and in social sciences more broadly (Hedström & Swedberg,
1998; Franck, 2002; Hedström, 2005).
The book draws inspiration from a wide literature. From a philosophical per-
spective, key ideas that underpin the theoretical discussions in this book can be
found in Franck (2002), Courgeau (2012), Courgeau et al. (2016), Silverman (2018)
and Burch (2018). Many of the desired practical features of modelling, including the modular nature of model construction, were called for by Gray et al. (2017) and Richiardi (2017), while the need for additional, non-
traditional sources of information, including qualitative and experimental data, was
advocated by Polhill et al. (2010) and Conte et al. (2012), respectively.
At the same time, methods for a statistical analysis of computational experiments
have also been discussed in many important reference works, for example in Santner
et al. (2003). Specific applications of the existing statistical methods of analysing
agent-based models can be found in Ševčíková et al. (2007), Bijak et al. (2013),
Pope and Gimblett (2015) or Grazzini et al. (2017). The use of such methods – mainly Bayesian – has also been suggested elsewhere in the demographic literature, for example by Willekens et al. (2017). To that end, we propose a coherent
methodology for embedding the model development process into a wider frame-
work of Bayesian statistics and experimental design, offering a blueprint for an
iterative process of construction and statistical analysis of computational models for
social realms.
1.3 Structure of the Book

We have divided this book into three parts, devoted to: Preliminaries (Part I),
Elements of the modelling process (Part II), and Model results, applications, and
reflections (Part III). This structure enables different readers to focus on specific
areas, depending on interest, without necessarily having to read the more technical
details referring to individual aspects of the modelling process.
Part I lays down the foundations for the presented work. Chapter 2 focuses on the
rationale and philosophical underpinnings of the Bayesian model-based approach.
The discussion starts with general remarks on uncertainty and complexity in demog-
raphy and migration studies. The uncertainty of migration processes is briefly
reviewed, with focus on the ambiguities in the concepts, definitions and imprecise
measurement; simplifications and pitfalls of the attempts at explanation; and on
inherently uncertain predictions. A risk-management typology of international
migration flows is revisited, focusing on asylum migration as the most uncertain and
highest-impact form of mobility. In this context, we discuss the rationale for using
computational models for asylum migration. To address the challenges posed by
such complex and uncertain processes as migration, we seek inspiration in different
philosophical foundations of demographic epistemology: inductive, deductive and
abductive (inference to the best explanation). Against this background, we introduce
a research programme of model-based demography, and evaluate its practical appli-
cability to studying migration.
Part II presents five elements of the proposed modelling process – the building
blocks of Bayesian model-based description and analysis of the emergence of
migration routes. It begins in Chap. 3 with a high-level discussion of the process of
developing agent-based models, starting from general principles, and then moving
focus to the specific example of migration. We review and evaluate existing exam-
ples of agent-based migration models in the light of a discussion of the role of for-
mal modelling in (social) sciences. Next, we discuss the different parts of migration
models, including their spatial dimension, treatment of various sources of uncer-
tainty, human decisions, social interactions and the role of information. The discus-
sion is illustrated by presenting a prototype, theoretical model of migrant route
formation and the role of information exchange, called Routes and Rumours, which
is further developed in subsequent parts of the book, and used as a running example
to illustrate different aspects of the model-building process. The chapter concludes
by identifying the main knowledge gaps in the existing models of migration. This
chapter is accompanied by Appendix A, where the architecture of the Routes and
Rumours model is described in more detail.
Chapter 4 introduces the motivating example for the application of the Routes
and Rumours model – asylum migration from Syria to Europe, linked to the so-
called European asylum crisis of 2015–16. In this chapter, we present the process of
constructing a dedicated knowledge base. The starting point is a discussion of vari-
ous types of quantitative and qualitative data that can be used in formal modelling,
including information on migration concepts, theories, factors, drivers and mecha-
nisms. We also briefly present the case study of Syrian asylum migration.
Subsequently, the data related to the case study are catalogued and formally assessed
by using a common quality framework. We conclude by proposing a blueprint for
including different data types in modelling. The chapter is supplemented by a detailed meta-inventory and quality assessment of data, provided in Appendix B and avail-
able online, on the website of the research project Bayesian Agent-based Population
Studies, underpinning the work presented throughout this book (www.baps-
project.eu).
Chapter 5 is dedicated to presenting the general framework for analysing the
results of computational models of migration. First, we offer a description of the
statistical aspects of the model construction process, starting from a brief tutorial on
uncertainty quantification in complex computational models. The tutorial includes
Bayesian methods of uncertainty quantification; an introduction to experimental
design; the theory of meta-modelling and emulators; methods for uncertainty and
sensitivity analysis, as well as calibration. The general setup for designing and run-
ning computer experiments with agent-based migration models is illustrated by a
running example based on the Routes and Rumours model introduced in Chap. 3.
The accompanying Appendix C contains selected results of the illustrative uncer-
tainty and sensitivity analysis presented in this chapter, as well as a brief overview
of software packages for carrying out the experimental design and model analysis.
The cognitive psychological experiments are discussed in Chap. 6, following the
rationale for making agent-based models more realistic and empirically grounded.
Building on the psychological literature on decision making under uncertainty, the
chapter starts with an overview of the design of cognitive experiments. This is fol-
lowed by a presentation of three such experiments, focusing on discrete choice
under uncertainty, elicitation of subjective probabilities and risk, and choice between
leading migration drivers. We conclude the chapter by providing reflections on
including the results of experiments in agent-based models, and the potential of
using immersive interactive experiments in this context. Supplementary material
included in Appendix D contains information on the study protocol and selected
ethical aspects of experimental research and data collection.
Chapter 7, concluding the second part of the book, presents the computational
aspects of the modelling work. We discuss the key features of domain-specific and
general-purpose programming languages, by using an example of languages
recently developed for demographic applications. In particular, the discussion
focuses on modelling, model execution, and running simulation experiments in dif-
ferent languages. The key contributions of this chapter are to present a bespoke
domain-specific language, aimed at combining agent-based modelling with simula-
tion experiments, and formally describing the logical structure of models by using a
concept of provenance modelling. Appendix E includes further information about
the provenance description of the migration simulation models developed through-
out this book, based on the Routes and Rumours template.
Part III offers a reflection on the selected outcomes of the modelling process and
their potential scientific and policy implications. In particular, Chap. 8 is devoted to
discussing the results of applying the model-based analytical template, combining
all the building blocks listed above, and aimed at answering specific substantive
research questions. We therefore follow the model development process, from the
purely theoretical version to a more realistic one, called Risk and Rumours, subse-
quently including additional empirical and experimental data, in the version called
Risk and Rumours with Reality. At the core of this chapter are the results of experi-
ments with different models, and the analysis of their sensitivity and uncertainty.
Subsequently, we reflect on the model-building process and computational imple-
mentation of the models, as well as their key limitations. The chapter concludes by
exploring the remaining (residual) uncertainty in the models, and highlighting areas
for future data collection. The underlying model architecture is an extension of the
Routes and Rumours one, presented in Chap. 3 and Appendix A.
Subsequently, in Chap. 9, we outline the scientific and policy implications of
modelling and its results. First, we discuss perspectives for furthering the model-
based research agenda in social sciences, reflecting on the scientific risk-benefit
trade-offs of the proposed approach. The usefulness of modelling for policy is then
explored through a variety of possible uses, from scenario analysis, to foresight
studies, stress testing and calibration of early warnings. To that end, we also present
several migration scenarios, based on two models introduced in Chap. 8 (Risk and
Rumours, and Risk and Rumours with Reality), aiming to simulate the impacts of
actual policy decisions using an example of a risk-related information campaign.
The chapter concludes with a discussion of the key limitations and practical recom-
mendations for the users of the model-based approach.
The discussion in Chap. 10 focuses on the key role of transparency and replica-
bility in modelling. Starting from a summary of the recent ‘replicability crisis’ in
psychology, and lessons learned from this experience, we offer additional argu-
ments for strengthening the formal documentation of the models constructed,
including through the use of formal provenance modelling. The general implica-
tions for modelling and modellers, as well as for the users of models, are pre-
sented next.
Finally, the simulation results serve as a starting point for a broader reflection on
the potential contribution of simulation-based approaches to migration research and
social sciences generally. In that spirit, Chap. 11 concludes the book by summaris-
ing the theoretical, methodological and practical outcomes of the approach pre-
sented in the book in the light of recent developments in population and migration
studies. We present further potential and limitations of Bayesian model-based
approaches, alongside the lessons learned from implementing the modelling pro-
cess proposed in the book. Key practical implications for migration policy are also
summarised. As concluding thoughts, we discuss ways forward for developing sta-
tistically embedded model-based computational approaches, including an assess-
ment of the viability of the whole model-based research programme.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 2
Uncertainty and Complexity: Towards
Model-Based Demography
Jakub Bijak
2.1 Uncertainty and Complexity in Demography and Migration
The past, present, and especially the future size and composition of human popula-
tions are all, to some extent, uncertain. Population dynamics results from the inter-
play between the three main components of population change – mortality, fertility
and migration – which differ with regard to their predictability. Long-term trends
indicate that mortality is typically the most stable and hence the most predictable of
the three demographic components. At the same time, the uncertainty of migration
is the highest, and exhibits the most volatility in the short term (NRC, 2000).
Next to being uncertain, demographic processes are also complex in that they
result from a range of interacting biological and social drivers and factors, acting in
non-linear ways, with human agency – and free will – exercised by the different
actors involved. There are clear links between uncertainty and complexity: for mor-
tality, the biological component is very high; contemporary fertility is a result of a
mix of biological and social factors as well as individual choice; whereas migra-
tion – unlike mortality or fertility – is a process with hardly any biological input, in
which human choice plays a pivotal role. This is one of the main reasons why human
migration belongs to the most uncertain and volatile demographic processes, being
as it is a very complex social phenomenon, with a multitude of underpinning factors
and drivers.
On the whole, uncertainty in migration studies is pervasive (Bijak & Czaika,
2020). Migration is a complex demographic and social process that is not only dif-
ficult to conceptualise and to measure (King, 2002; Poulain et al., 2006), but also –
even more – to explain (Arango, 2000), predict (Bijak, 2010), and control (Castles,
2004). Even at the conceptual level, migration does not have a single definition, and
its conceptual challenges are further exacerbated by the very imprecise instruments,
such as surveys or registers, which are used to measure it.
Historically, attempts to formalise the analysis of migration have been proposed
since at least the seminal work of Ravenstein (1885). Contemporarily, a variety of
alternative approaches co-exist, largely compartmentalised along disciplinary
boundaries: from neo-classical micro-economics, to sociological observations on
networks and institutions (for a review, see Massey et al., 1993), or macro-level
geographical studies of gravity (Cohen et al., 2008), to ‘mobility transition’
(Zelinsky, 1971) and unifying theories such as migration systems (Mabogunje,
1970; Kritz et al., 1992), or Massey’s (2002) less-known synthesising attempt.
At the same time, the very notions of risk and uncertainty, as well as possible
ways of managing them, are central to contemporary academic debates on migra-
tion (e.g. Williams & Baláž, 2011). Some theories, such as the new economics of
migration (Stark & Bloom, 1985; Stark, 1991) even point to migration as an active
strategy of risk management on the part of the decision-making unit, which in this
case is a household rather than an individual. Similar arguments have been given in
the context of environment-related migration, where mobility is perceived as one of
the possible strategies for adapting to the changing environmental circumstances in
the face of the unknown (Foresight, 2011).
Still, there is general agreement that none of the existing explanations offered for
migration processes are fully satisfactory, and theoretical fragmentation is at least
partially to blame (Arango, 2000). Similarly, given the meagre successes of predictive
migration models (Bijak et al., 2019), the contemporary consensus is that the best
that can be achieved with available methods and data is a coherent, well-calibrated
description of uncertainty, rather than the reduction of this uncertainty through addi-
tional knowledge (Bijak, 2010; Azose & Raftery, 2015). Due to ambiguities in
migration concepts and definitions, imprecise measurement, overly simplistic attempts
at explanation, as well as inherently uncertain prediction, it appears that the demo-
graphic studies of migration, especially looking at macro-level or micro-level pro-
cesses alone, have reached fundamental epistemological limits.
Recently, Willekens (2018) reviewed the factors behind the uncertainty of migra-
tion predictions, including the poor state of migration data and theories, additionally
pointing to the existence of many motives for migration, difficulty in delineating
migration versus other types of mobility, and the presence of many actors, whose
interactions shape migration processes. In addition, the intricacies of the legal,
political and security dimensions make international migration processes even more
complex from an analytical point of view.
The existing knowledge gaps in migration research can be partially filled by
explicitly and causally modelling the individuals (agents) and their decision-making
processes in computer simulations (Klabunde & Willekens, 2016; Willekens, 2018).
In particular, as advocated by Gray et al. (2016), the psychological aspects of human
decisions can be based on data from cognitive experiments similar to those carried
out in behavioural economics (Ariely, 2008). Some of the currently missing infor-
mation can be also supplemented by collecting dedicated data on various facets of
migration processes. Given their vast uncertainty, this could be especially important
in the context of asylum migration flows, as discussed later in this chapter.
2.2 High Uncertainty and Impact: Why Model Asylum Migration?
Among the different types of migration, those related to various forms of involuntary, violence-induced mobility, including asylum and refugee movements, have the highest uncertainty and the highest potential impact on both the
origin and destination societies (see, e.g. Bijak et al., 2019). Such flows are some of
the most volatile and therefore the least predictable. They are often a rapid response
to very unstable and powerful drivers, notably including armed conflict or environ-
mental disasters, which lead people to leave their homes in a very short period
(Foresight, 2011). Despite their involuntary origins, the different types of forced mobility, including asylum migration, like all migration flows, prominently feature human agency at their core: this is well known both from the scholarly literature (Castles, 2004) and from journalistic accounts of migrant journeys
(Kingsley, 2016).
As a result, and also because asylum migration is difficult to disentangle precisely from other types of mobility, involuntary flows evade attempts at precise definition. Of course, many definitions related to specific populations of interest exist, beginning with the UN designation of a refugee, following the 1951
Convention and the 1967 Protocol, as someone who:
“owing to well-founded fear of being persecuted for reasons of race, religion, nationality,
membership of a particular social group or political opinion, is outside the country of his
[sic!] nationality and is unable or, owing to such fear, is unwilling to avail himself of the
protection of that country; or who, not having a nationality and being outside the country of
his former habitual residence as a result of such events, is unable or, owing to such fear, is
unwilling to return to it.” (UNHCR, 1951/1967; Art. 1 A (2))
On the other hand, the following definition of the International Association for
the Study of Forced Migration (IASFM), characterises forced migrations very
broadly, as:
“Movements of refugees and internally displaced people (displaced by conflicts) as well as
people displaced by natural or environmental disasters, chemical or nuclear disasters, fam-
ine, or development projects” (after Forced Migration Review; https://www.fmreview.org,
as of 1 September 2021).
Studies of these different types of migration tend to rely on different theoretical approaches and do not share many common insights (FitzGerald, 2015).
Comprehensive theoretical treatment of different types of migration on the
voluntary-forced spectrum is rare, with examples including the important work by
Zolberg (1989).
One pragmatic solution can be to focus on various factors and drivers of migra-
tion, an approach systematised in the classical push-pull framework of Everett Lee
(1966), and since extended by many authors, including Arango (2000), Carling and
Collins (2018), or Van Hear et al. (2018). Specifically in the context of forced migra-
tion, Öberg (1996) mentioned the importance of ‘hard factors’, such as conflict,
famine, persecution or disasters, pushing involuntary migrants out from their places
of residence, and leading to resulting migration flows being less self-selected. A
contemporary review of factors and drivers of asylum-related migration was pub-
lished in the EASO (2016) report, while a range of economic aspects of asylum
were reviewed by Suriyakumaran and Tamura (2016).
In addition, the uncertainty of asylum migration measurement has many idiosyncratic features, besides those it shares with other forms of mobility. In particular, the focus on counting administrative events rather than people results in limited
information being available on the context and on migration processes themselves
(Singleton, 2016). As a result, on the one hand, some estimates include duplications
of the records related to the same persons; while on the other hand, some of the
flows are at the same time undercounted due to their clandestine nature (idem).
The politicisation of asylum statistics, and their uses and misuses to fit with any
particular political agenda, are other important reasons for being cautious when
interpreting the numbers of asylum migrants (Bakewell, 1999; Crisp, 1999).
Attempts to overcome some of the measurement issues are currently
undertaken through increasing use of biometric techniques, such as the EURODAC
system in the European Union (Singleton, 2016), as well as through experimental
work with new data, such as mobile phone records or ‘digital footprints’ of social
media usage (Hughes et al., 2016). This results in a patchwork of sources covering
different aspects of the flows under study, as illustrated in Chap. 4 on the example
of Syrian migration to Europe.
Despite these very high levels of uncertainty, formal quantitative modelling of
various forms of asylum-related migration remains very much needed. Its key uses
are both longer-term policy design and short-term operational planning,
including direct humanitarian responses to crises, provision of food, water, shelter
and basic aid. In this context, decisions under such high levels of uncertainty require
the presence of contingency plans and flexibility, in order to improve resilience of
the migration policies and operational management systems. This perspective, in
turn, requires new analytical approaches, the development of which coincides with
a period of self-reflection on the theoretical state of demography, or broader popula-
tion studies, in the face of uncertainty (Burch, 2018). These developments are there-
fore very much in line with the direction of changes of the main aims of demographic
enquiries over the past decades, which are briefly summarised next.
Demography remains a strongly empirical area of social sciences, with many policy implications (Morgan & Lynch, 2001), for which computational models can offer attractive analytical tools.
So far, the empirical slant has constituted one of the key strengths of demography
as a discipline of social sciences; however, there is increasing concern about the
lack of theories explaining the population phenomena of interest (Burch, 2003,
2018). This problem is particularly acute in the case of the micro-foundations of
demography being largely disconnected from the macro-level population processes
(Billari, 2015). The quest for micro-foundations, ensuring links across different lev-
els of the problem, thus becomes one of the key theoretical and methodological
challenges of contemporary demography and population sciences.
In order to be realistic and robust, migration (or, more broadly, population) theories
and scenarios need to be grounded in solid micro-foundations. Still, in the uncertain
and messy social reality, especially for processes as complex as migration, the mod-
elling of micro-foundations of human behaviour has its natural limits. In econom-
ics, Frydman and Goldberg (2007) argued that such micro-foundations may merely
involve a qualitative description of tendencies, rather than any quantitative predic-
tions. Besides, even in the best-designed theoretical framework, there is always
some residual, irreducible aleatory uncertainty. Assessing and managing this uncer-
tainty is crucial in all social areas, but especially so in the studies of migration, given
its volatility, impact and political salience (Disney et al., 2015).
In other disciplines, such as in economics, the acknowledgement of the role of
micro-foundations has been present at least since the Lucas critique of macroeco-
nomic models, whereby conscious actions of economic agents invalidate predic-
tions made at the macro (population) level (Lucas, 1976). The related methodological
debate has flourished for over at least four decades (Weintraub, 1977; Frydman &
Goldberg, 2007). The response of economic modelling to the Lucas critique largely
involved building large theoretical models, such as those belonging to the Dynamic
Stochastic General Equilibrium (DSGE) class, which would span different levels of
analysis, micro – individuals – as well as macro – populations (see e.g. Frydman &
Goldberg, 2007 for a broad theoretical discussion, and Barker & Bijak, 2020 for a
specific migration-related overview).
Existing migration studies offer just a few overarching approaches with a poten-
tial to combine the micro and macro-level perspectives: from multi-level models,
that belong to the state of the art in statistical demography (Courgeau, 2007), to
conceptual frameworks that potentially encompass micro-level as well as macro-
level migration factors. The key examples of the latter include the push and pull
migration factors (Lee, 1966), with recent modifications, such as the push-pull-plus
framework (Van Hear et al., 2018), and the value-expectancy model of De Jong and
Fawcett (1981). In the approach that we propose in this book, however, the link
between the different levels of analysis is of a statistical and computational nature.
(For an overview of related approaches, see e.g. Zaidi et al., 2009; Bélanger & Sabourin, 2017.) In migration
research, several examples of constructing agent-based models exist, such as
Kniveton et al. (2011) or Klabunde et al. (2017), with a more detailed survey of such
models offered in Chap. 3.
In general, agent-based models have complex and non-linear structures, which
prohibit a direct analysis of their outcome uncertainty. Promising methods which
could enable indirect analysis include Gaussian process (GP) emulators or meta-
models – statistical models of the underlying computational models (Kennedy &
O’Hagan, 2001; Oakley & O’Hagan, 2002), or the Bayesian melding approach
(Poole & Raftery, 2000), implemented in agent-based transportation simulations
(Ševčíková et al., 2007). In demography, prototype GP emulators have been tested
on agent-based models of marriage and fertility (Bijak et al., 2013; Hilton & Bijak,
2016). A general framework for their implementation is that of (Bayesian) statistical
experimental design (Chaloner & Verdinelli, 1995), with other approaches that can
be used for estimating agent-based models including, for example, Approximate
Bayesian Computations (Grazzini et al., 2017). A detailed discussion, review and
assessment of such methods follows in Chap. 5.
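To make the emulator idea concrete, the following sketch fits a minimal Gaussian process to a handful of runs of a cheap stand-in 'simulator' and then predicts, with uncertainty, the output at an untried parameter value. This is a hypothetical illustration only: the one-line simulator, the squared-exponential kernel, its hyperparameter values and the design points are all invented for the example, not taken from the models cited above.

```python
import numpy as np

def rbf_kernel(x1, x2, length=1.0, var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_emulator(x_train, y_train, x_new, length=1.0, var=1.0, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP fitted to simulator runs."""
    K = rbf_kernel(x_train, x_train, length, var) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_new, x_train, length, var)
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = rbf_kernel(x_new, x_new, length, var) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Cheap stand-in for an expensive agent-based simulation: one input, one output.
def simulator(theta):
    return np.sin(3 * theta) + 0.5 * theta

x_design = np.linspace(0.0, 2.0, 6)   # a small design of simulator runs
y_design = simulator(x_design)
x_query = np.array([0.7])             # an untried parameter value
mean, var_hat = gp_emulator(x_design, y_design, x_query, length=0.5)
```

Here six simulator runs stand in for thousands: the emulator's mean approximates the simulator at new inputs, while its variance quantifies the remaining uncertainty about the simulator's behaviour, which is the core idea behind the emulator-based methods discussed in Chap. 5.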
Before embarking on the modelling work, it is worth ensuring that the out-
comes – models – have realistic potential for increasing our knowledge and under-
standing of demographic processes. The discussion about the relationship between modelling and the main tenets of the scientific method remains open. To that end, we discuss the epistemological foundations of model-based approaches next, with a focus on the question of the origins of knowledge in formal modelling.
2.5 Philosophical Foundations: Inductive, Deductive and Abductive Approaches
There are several different ways of carrying out scientific inference and generating
new knowledge. Deductive reasoning has been developed over millennia,
from classical syllogisms, whereby the conclusions are already logically entailed in
the premises, to the hypothetico-deductive scientific method of Karl Popper
(1935/1959), whereby hypotheses can be falsified by non-conforming data. The
deductive approaches strongly rely on hypotheses, which are dismissed by the pro-
ponents of the inductive approaches due to their arbitrary nature (Courgeau
et al., 2016).
Classical inductive reasoning, in turn, which underpins the philosophical
foundations of the modern scientific method, dates back to Francis Bacon (1620). It
relies on inducing the formal principles governing the processes or phenomena of
interest (Courgeau et al., 2016), at several different levels of explanation. These
principles, in turn, help identify the key functions of the processes or phenomena,
which are required for these processes or phenomena to occur, and to take such form
as they have. The identified functions then guide the observation of the empirical
properties, so that in effect, the observed variables describing these properties can
illuminate the functional structures of the processes or phenomena as well as the
functional mechanisms that underpin them.1
When it comes to hypotheses, the main problem seems to be not so much their
existence, but their haphazard and often not properly justified provenance. To help
address this criticism, a third, less-known way of making scientific inference has
been proposed: abduction, also referred to as ‘inference to the best explanation’.
The idea dates back to the work of Charles S. Peirce (1878/2014), an American
philosopher of science working in the second half of the nineteenth century and the
early twentieth century. His new, pragmatic way of making a philosophical argu-
ment can be defined as “inference from the body of data to an explaining hypothe-
sis” (Burks, 1946: 301).
Seen in that way, abduction appears as the first phase in the process of scientific discovery, in which a novel hypothesis is set up (Burks, 1946), whereas deduction
allows subsequently for deriving testable consequences, while modern induction
allows their testing, for example through statistical inference. As an alternative clas-
sification, Lipton (1991) labelled abduction as a separate form of inductive reason-
ing, offering ‘vertical inference’ (idem: 69) from observable data to unobservable
explanations (theory), allowing for the process of discovery. The consequences of
the latter can subsequently follow deductively (idem). Thanks to the construction
and properties of abductive reasoning, this perspective has found a significant following within the social simulation literature, to the point of equating the methods with the underpinning epistemology. To that end, Lorenz (2009: 144) explicitly stated
that “simulation model is an abductive process”.
Some interpretations of abductive reasoning stress the pivotal role it plays in the
sequential nature of the scientific method, as the stage where new scientific ideas
come from in a process of creativity. At the core of the abductive process is surprise:
observing a surprising result leads to inferring the hypothesis that could have led to
its emergence. In this way, the (prior) beliefs, confronted by a surprise, lead to doubt
and enable further, creative inference (Burks, 1946; Nubiola, 2005), which in itself
has some conceptual parallels with the mechanism of Bayesian statistical knowl-
edge updating.
There is a philosophical debate as to whether the emergence of model properties
as such is of ontological or epistemological nature. In other words, whether model-
ling can generate new facts, or rather help uncover the patterns through improved
knowledge about the mechanisms and processes (Frank et al., 2009). The latter
interpretation is less restrictive and more pragmatic (idem), and thus seems better
suited for social applications. As an example, in demography, a link between dis-
covery (surprise) and inference (explanation) was recently established and
formalised by Billari (2015), who argued that the act of discovery typically occurs at the population (macro) level, but explanation additionally needs to include individual (micro)-level foundations.

1 The notion of classical induction is different from the concept of induction as developed, for example, by John Stuart Mill, where observables are generalised into conclusions by eliminating those that do not aid the understanding of the processes under study, for example in the process of experimenting (Jacobs, 1991). The two types of induction should not be confused. On this point, I am indebted to Robert Franck and Daniel Courgeau for detailed philosophical explanations.
Abduction, as ‘inference to the best explanation’, is also a very pragmatic way of
carrying out the inferential reasoning (Lipton, 1991/2004). What is meant by the
‘best explanation’ can have different interpretations, though. First, it can be the best
of the candidate explanations of the probable or approximate truth. Second, it can
be subject to an additional condition that the selected hypothesis is satisfactory or
‘good enough’. Third, it can be such an explanation, which is ‘closer to the truth’
than the alternatives (Douven, 2017).
The limitations of all these definitions are chiefly linked to a precise definition of
the criterion for optimality in the first case, satisfactory quality criteria in the sec-
ond, as well as relative quality and the space of candidate explanations in the third.
One important consideration here is the parsimony of explanation – the Ockham’s
razor principle would suggest preferring simple explanations to more complex ones,
as long as they remain satisfactory. Another open question is which of these three
alternative definitions, if any, is actually used in human reasoning (Douven, 2017).
In any case, the lack of a single and unambiguous answer points to a lack of strict
identifiability of abductive solutions to particular inferential problems: under differ-
ent considerations, many candidate explanations can be admissible, or even opti-
mal. This ambiguity is the price that needs to be paid for creativity and discovery.
As pointed out by Lorenz (2009), abductive reasoning bears the risk of an abductive
fallacy: given that abductive explanations are sufficient, but not necessary, the
choice of a particular methodology or a specific model can be incorrect.
These considerations have been elaborated in detail in the philosophy of science
literature. In his comprehensive treatment of the approach, Lipton (1991/2004) reit-
erated the pragmatic nature of inference to the best explanation, and made a distinc-
tion between two types of reasoning: ‘likeliest’, being the most probable, and
‘loveliest’, offering the most understanding. The former interpretation has clear
links with the probabilistic reasoning (Nubiola, 2005), and in particular, with
Bayes’s theorem (Lipton, 2004; Douven, 2017). This is why abduction and Bayesian
inference can be even seen to be ‘broadly compatible’ (Lipton, 2004: 120), as long
as the elements of the statistical model (priors and likelihoods) are chosen based on
how well they can be thought to explain the phenomena and processes under study.
In relation to the discussion of psychological realism of the models of human rea-
soning and decision making (e.g. Tversky & Kahneman, 1974, 1992), formal
Bayesian reasoning can offer rationality constraints for the heuristics used for
updating beliefs (Lipton, 2004).
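This 'likeliest explanation' reading can be illustrated with a toy Bayesian update, in which two candidate hypotheses are compared by their posterior probability after a surprising observation. All numbers below are invented for the illustration.

```python
# Two invented candidate explanations for a surprising observation.
priors = {"H1": 0.7, "H2": 0.3}       # initial degrees of belief
likelihoods = {"H1": 0.1, "H2": 0.8}  # P(observation | hypothesis)

# Bayes's theorem: posterior is proportional to prior times likelihood,
# normalised by the total probability of the observation (the evidence).
unnorm = {h: priors[h] * likelihoods[h] for h in priors}
evidence = sum(unnorm.values())
posteriors = {h: unnorm[h] / evidence for h in unnorm}

# The 'likeliest' explanation in Lipton's sense: highest posterior probability.
best = max(posteriors, key=posteriors.get)
```

The pattern mirrors the abductive sequence sketched above: the initially favoured H1 is overtaken by H2 once the observation proves far likelier under H2, so surprise revises belief, exactly as in Bayesian knowledge updating.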
There are important implications of these philosophical discussions both for
modelling, as well as for practical and policy applications. To that end, Brenner and
Werker (2009) argued that simulation models built by at least partially following abductive principles have the potential to reduce the error and uncertainty in the
outcome. In particular, looking at the modelled structures of the policy or practical
problem can help safeguard against at least some of the unintended and undesirable
consequences (idem), especially when they can be identified through departures
from rationality.
2.6 Model-Based Demography as a Research Programme
In that respect, to help models achieve their full potential, the different philosophical perspectives ideally need to be combined. As deduction on its own relies
on assumptions, induction implies uncertainty, and abduction does not produce
uniquely identifiable results, the three perspectives should be employed jointly,
although even then, uncertainty cannot be expected to disappear (Lipton, 2004;
Brenner & Werker, 2009). These considerations are reflected in the nascent research
programme for model-based demography, the main tenets of which we discuss
in turn.
The broad tenets of this approach are followed throughout this book,
and its individual components are presented in Part II.
In the model-based programme, as proposed by Courgeau et al. (2016), the
objective of modelling is to infer the functional structures that generate the observed
social properties. Here, the empirical observables are necessary, but not sufficient
elements in the process of scientific discovery, given that for any set of observables,
there can be a range of non-implausible models generating matching outcomes
(idem). At the same time, as noted by Brenner and Werker (2009), the modelling
process needs to explicitly recognise that the errors in inference are inevitable, but
modellers should aim to reduce them as much as possible.
In what can be seen as a practical solution for implementing a version of the
model-based programme, Brenner and Werker (2009:3.6) advocated four steps of
the modelling process:
(1) Setting up the model based on all available empirical knowledge, starting from a simple vari-
ant, and allowing for free parameters, wherever data are not available (abduction);
(2) Running the model and calibrating it against the empirical data for some chosen outputs,
excluding the implausible ranges of the parameter space (induction, in the modern sense);
(3) On that basis, classifying observations into classes, enabling alignment of theoretical explana-
tions implied by the model structure with empirical observations (another abduction);
(4) Using the calibrated model for scenario and policy analysis (which per se is a deductive exercise, notwithstanding the abductive interpretation given by Brenner & Werker, 2009).
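Read computationally, the four steps form a calibrate-then-deduce loop. The sketch below is a schematic reading of them, using a deliberately trivial toy model with one free 'intensity' parameter and a made-up observed count; it is an illustrative interpretation, not the authors' own procedure.

```python
# (1) Abduction: set up a deliberately simple model with a free parameter.
def toy_model(intensity, n_agents=100):
    """Toy stand-in for a simulation: expected number of migrants."""
    return n_agents * intensity

observed_migrants = 30  # invented 'empirical' output to calibrate against

# (2) Induction (in the modern sense): run the model across the parameter
# space and exclude values whose output falls outside a plausibility band.
candidates = [i / 100 for i in range(101)]
plausible = [a for a in candidates
             if abs(toy_model(a) - observed_migrants) <= 5]

# (3) Another abduction (loosely): summarise the retained parameter values
# as the candidate explanation implied by the model structure.
explanation = sum(plausible) / len(plausible)

# (4) Deduction: use the calibrated model for a 'what if' policy scenario,
# e.g. an intervention assumed to double the migration intensity.
scenario_outcome = toy_model(2 * explanation)
```

Each numbered comment corresponds to one of the steps above; in a real application the toy model would be an agent-based simulation, and the rejection band would come from a formal calibration method of the kind reviewed in Chap. 5.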
In this way, the key elements of the model-based programme become explicitly
embedded in a wider framework for model-based policy advice, which makes full
use of three different types of reasoning – inductive, abductive and deductive – at
three different stages of the process. Additionally, the process can implicitly involve
two important checks – verification of consistency of the computer code with the
conceptual model, and validation of the modelling results against the observed
social phenomena (see David, 2009 for a broad discussion).
As a compromise between the ideal, fully inductive model-based programme
advocated by Courgeau et al. (2016) and the above guidance by Brenner and Werker
(2009), we propose a pragmatic variant of the model-based approach, which is sum-
marised in Fig. 2.1. The modelling process starts by defining the specific research
question or policy challenge that needs explaining – the model needs to be specific
to the research aims and domain (Gilbert & Ahrweiler, 2009, see also Chap. 3).
These choices subsequently guide the collection of information on the properties of
the constituent parts of the problem. The model construction then ideally follows
the classical inductive principles, where the functional structure of the problem, the
contributing factors, mechanisms and the conceptual model are inferred. If a fully
inductive approach is not feasible, the abductive reasoning to provide the ‘best
explanation’ of the processes of interest can offer a pragmatic alternative.
Subsequently, the model, once built, is internally verified, implemented and exe-
cuted, and the results are then validated by aligning them with observations. This
step can be seen as a continuation of the inductive process of discovery. The nature
of the contributing functions, structures and mechanisms is unravelled, by identify-
ing those elements of the modelled processes without which those processes would
[Fig. 2.1 shows four linked elements (conceptual and mathematical modelling of the structure; computational model design, execution and analysis; guidance for data collection and further observations; scenario analysis and policy advice), connected by inductive and/or abductive steps and by deductive steps.]
Fig. 2.1 Basic elements of the model-based research programme. (Source: own elaboration based on Courgeau et al., 2016: 43, and Brenner and Werker, 2009)
not occur, or would manifest themselves in a different form. At this stage, the model
can also help identify (deduce) the areas for further data collection, which would
lead to subsequent model refinements. At the same time, also in a deductive manner,
the model generates derived scenarios, which can serve as input to policy advice.
These scenarios can give grounds to new or amended research or policy questions,
at which point the process can be repeated (Fig. 2.1).
Models obtained by applying the above principles can therefore both enable sce-
nario analysis and help predict structural features and outcomes of various policy
scenarios. The model outcomes, in an obvious way, depend on empirical inputs,
with Brenner and Werker (2009) having highlighted some important pragmatic
trade-offs, for example between validity of results and availability of resources,
including research time and empirical data. These pragmatic concerns point to the
need for initiating the modelling process by defining the research problem, then
building a simple model, as a first-order approximation of the reality to guide intu-
ition and further data collection, followed by creating a full descriptive and empiri-
cally grounded version of the model.
At a more general level, modelling can be located on a continuum from general
(nomological) approaches (Hempel, 1962), aimed at uncovering idealised laws,
theories and regularities, to specific, unique and descriptive (idiographic) ones
(Gilbert & Ahrweiler, 2009). The blueprint for modelling proposed in this book
aims to help scan at least a segment of this conceptual spectrum for analysing the
research problem at hand.
In epistemological terms, the guiding principles of abductive reasoning can
be seen as a pragmatic approximation of a fully inductive process of scientific
enquiry, which is difficult whenever our knowledge about the functions, structures
and mechanisms is limited, incomplete, of poor quality, or even completely missing. In
the context of social phenomena, such as migration, these limitations are paramount.
This is why the approach adopted throughout the book sees the classical induction
as the ideal philosophy to underpin model-based enquiries, and the abductive rea-
soning as a possible real-life placeholder for some specific aspects. In this way, we
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Part II
Elements of the Modelling Process
Chapter 3
Principles and State of the Art of Agent-Based Migration Modelling
one hand, as well as the development of the global climate in reaction to human
influence on the other.
At this point it is important to note that everyday use of language tends to obscure
what we really do when building a model. We tend to talk about real world systems
in terms of discrete nouns, such as ‘the weather’, ‘this population’, or ‘international
migration’. This has two effects: first, it implies that these are things or objects
rather than observable properties of dynamic, complex processes. Second, it sug-
gests that these phenomena are easy to define with clear borders. This leads to a –
surprisingly widespread – ‘naive theory of modelling’ where we have a ‘thing’ (or
an ‘object’ of modelling) that we can build a canonical, ‘best’ ‘model of’, in the
same way we can draw an image of an object.
In reality, however, for both types of inference described above, how we build
our model is strictly defined by the problem we use it to solve: either by the set of
assumptions and behaviours we attempt to link, or by the specific set of observables
we want to extrapolate. That means that for a given empirical ‘object’ (such as ‘the
weather’), we might build substantially different models depending on what aspect
of that ‘object’ we are actually interested in. In short, which model we build is deter-
mined by the question we ask (Edmonds et al., 2019).
As an illustration, let us assume that we want to model a specific stretch of river.
Things we might possibly be interested in could be – just to pick a few arbitrary
examples – the likelihood of flooding in adjacent areas, sustainable levels of fishing
or the decay rate of industrial chemicals. We could attempt to build a generic river
model that could be used in all three cases, but that would entail vastly more effort
than necessary for each of the single cases. To understand flooding risk, for exam-
ple, population dynamics of the various animal species in the river are irrelevant.
Not only that, building unnecessary complexity into the model is in fact actively
harmful as it introduces more sources of error (Romanowska, 2015). It is therefore
prudent to keep the model as simple as possible. Thus, even though we will in all
three cases build a model ‘of the river’, the overlap between the models will be
limited.
3.1.3 Complications
The main foundational task in modelling therefore consists in defining and delineat-
ing the system. First, the system needs to be defined horizontally – that is, which
part of the world do we consider peripheral and which parts should be part of the
model? Second, it needs also to be specified vertically – which details do we con-
sider important? This can be quite challenging as there is fundamentally no
straightforward way to determine which processes are relevant for the model output
(Barth et al., 2012; Poile & Safayeni, 2016).
Defining the system can become less of a challenge, as long as we are working
in the context of a proof-of-causality modelling effort, since finding which assump-
tions produce a specific kind of behaviour is precisely the aim of this type of model-
ling. However, as soon as we intend to use our model to extrapolate system
behaviour, trying to include all processes that might affect the dynamics we are
interested in, while leaving out those that only unnecessarily complicate the model,
becomes a difficult task. As a further complication, we are in practice constrained
by various additional factors, such as availability of data, complexity of implemen-
tation, and computational and analytical tractability of the simulation (Silverman,
2018). Even with a clear-cut question in mind, designing a suitable model is there-
fore still as much an art as a science.
Almost all social phenomena – including migration – involve at least two levels of
aggregation. At the macroscopic level of the social aggregate – such as a city, social
group, region, country or population – we can observe conspicuous patterns or regu-
larities: large numbers of people travel on similar routes, a population separates into
distinct political factions, or neighbourhoods in a city are more homogeneous than
expected by chance. The mechanisms producing these patterns, however, lie in the
interactions between the components of these aggregates – usually individuals, but
also groups, institutions, and so on, as well as between the different levels of
aggregation.
In order to understand or predict the aggregate patterns we can therefore try to
analyse regularities in the behaviour of the aggregate (which can be done with some
success, see e.g. Ahmed et al., 2016), or we can try to derive the aggregate behav-
iour from the behaviour of the components. The latter is the guiding principle
behind agent-based modelling/models (ABM): instead of attempting to model the
dynamics of a social group as such, the behaviour of the agents making up the group
and their interactions are modelled. Group-level phenomena are then expected to
emerge naturally from these lower-level mechanisms.
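To make this micro-to-macro logic concrete, consider a deliberately minimal toy example (our own illustration, not a model from the literature): agents on a ring who only pass a piece of information to their immediate neighbours. No individual agent encodes anything about the population-level outcome, yet a regular, predictable diffusion pattern emerges at the aggregate level:

```python
# Minimal toy: information spreading on a ring of agents.
# Each step, every informed agent informs its two immediate neighbours.
# A macro-level pattern (steady outward diffusion) emerges from purely
# local micro-level interactions.

def spread_on_ring(n_agents: int, start: int = 0) -> int:
    """Return the number of steps until all agents are informed."""
    informed = {start}
    steps = 0
    while len(informed) < n_agents:
        # Each informed agent interacts only with its direct neighbours.
        newly = {(i - 1) % n_agents for i in informed} | \
                {(i + 1) % n_agents for i in informed}
        informed |= newly
        steps += 1
    return steps

# The aggregate outcome is predictable even though no agent 'knows' it:
# information reaches the whole ring after roughly n/2 steps.
print(spread_on_ring(10))  # 5 steps for 10 agents
```

The same logic underlies agent-based models proper: the analyst specifies only the local interaction rule, and the group-level regularity is an output of the simulation rather than an input.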
Which modelling paradigm is best suited to a given problem depends to a large
degree on the problem itself; however, a few general observations concerning the
suitability of ABMs for a given problem can be made. If we want to build an explan-
atory model, it is immediately clear that agent-based models are a useful – or in
many cases the only reasonable – approach. Even for predictive modelling, how-
ever, such models have become very popular in the last decades. The advantages
and disadvantages of this method have been discussed at length elsewhere (Bryson
et al., 2007; Lomnicki, 1999; Peck, 2012; Poile & Safayeni, 2016; Silverman,
2018), but to sum up the most important points: agent-based models are computa-
tionally expensive, not easy to implement (well), difficult to parameterise, and are
3.2.2 Uncertainty
To make things even more difficult, for most of the research questions relevant to
the migration processes we are unable to exclude that differences as well as interac-
tions between individuals are an essential part of the dynamics we are interested in.
At least as a starting point, this commits us to agent-based modelling as the default
architecture.
In the context of migration modelling, the agent-based methodology presents
two major challenges. First, as mentioned earlier, many of the processes involved in
our target system are not well defined. We therefore have to be careful to take the
uncertainty resulting from this lack of definition into account. This is no easy task
even for a simple model, let alone for a complicated agent-based model. Second,
agent-based models tend to be computationally expensive, which reduces the range
of parameter values that can be tested, and thus ultimately the level of detail of any
results, including those obtained through sensitivity analysis.
3.3 Agent-Based Models of Migration: Introducing the Routes and Rumours Model
For a long time, theoretical migration research has been dominated by statistical or
equation-based flow models in the economic tradition (Greenwood, 2005). However,
the rise of agent-based modelling in the social sciences in the last decades has left
its mark on migration research as well. A full review of migration-related ABM
studies is outside the scope of this book (but see for example Klabunde & Willekens,
2016 or McAlpine et al., 2021). Instead, we present a number of key aspects of
ABMs in general and migration models in particular, and discuss how they have
been approached in the existing literature.
Throughout the book we also present a running example taken from our own
modelling efforts related to a model of migrant route formation linked to informa-
tion spread (Routes and Rumours), different elements of which are described in
successive boxes throughout this book. We attempt to clarify the points made in the
main text by applying them to our example in turn. Insofar as relevant for this chap-
ter, the documentation of the model can be found in Appendix A.
A key dimension along which to distinguish existing modelling efforts is the pur-
pose for which the respective models have been built. The majority of ABMs of
migration are built with a concrete real-world scenario in mind, often with a specific
focus on one aspect of the situation: Hailegiorgis et al. (2018) for example aimed to
predict how climate change might affect emigration from rural communities (among
other aspects) in Ethiopia. They used data specific to that situation (including local
geography) for their model. Entwisle et al. (2016) studied the effect of different
climate change scenarios on migration in north Thailand using a very detailed
model that includes data on local weather patterns and agriculture. Frydenlund et al.
(2018) attempted to predict where people displaced by conflict in the Democratic
Republic of Congo would migrate. Their model, among other features, includes
local geographical and elevation data.
Many of these very concrete models, however, while being calibrated to a spe-
cific situation are meant to provide more general insights. Suleimenova and Groen
(2020), for example, modelled the effect of policy decisions on the number of arriv-
als in refugee camps in South Sudan. Their study was intended to provide direct
support to humanitarian efforts in the area. At the same time, it serves as a showcase
for a new modelling approach that the authors have developed.
A minority of studies eschew data and specific scenarios, and instead focus on
more general theoretical questions. Collins and Frydenlund (2016), for example,
investigated the effect of group formation on the travel speed of refugees using a
purely theoretical model without any relation to specific real-world situations. In a
similar vein, Reichlová (2005) explored the consequences of including safety and
social needs in a migration model. Although her study was explicitly motivated by
real-world phenomena, the model itself and the question behind it are purely
theoretical.
Finally, some models are built without a specific domain question in mind. In
these cases, the authors often explore methodological issues or put their model forth
as a framework to be used by more applied studies down the line (e.g. Groen, 2016;
Lin et al., 2016; Suleimenova et al., 2017). Others simply explore the dynamics aris-
ing from a set of assumptions without further reference to real-world phenomena
(e.g. Silveira et al., 2006, or Hafızoğlu & Sen, 2012).
The research question underpinning the Routes and Rumours model is defined in
Box 3.1.
Fig. 3.1 An example topology of the world in the Routes and Rumours model: Settlements are
depicted with circles, and links with lines, their thickness corresponding to traffic intensity
space as a proxy for social distance (see Sect. 3.3.2) and defining an individual’s
‘social network’ as all individuals within a specific distance in that space (e.g.
Reichlová, 2005; Silveira et al., 2006). More elaborate models explicitly set up links
between individuals and/or households (Simon, 2019; Smith et al., 2010; Werth &
Moss, 2007), which in some cases are assumed to change over time (e.g. Klabunde,
2011; Barbosa et al., 2013).
The effects that networks are assumed to have on individuals vary and in many
cases more than one effect is built into models. Most commonly, networks directly
affect individuals’ migration decisions either by providing social utility (e.g.
Reichlová, 2005; Silveira et al., 2006; Simon, 2019) or social norms (Smith et al.,
2010; Barbosa et al., 2013). Another common function is the transmission of infor-
mation on the risk or benefits of migration (Barbosa et al., 2013; Klabunde, 2011;
Simon et al., 2018). Direct economic benefits of networks are only taken into
account in a few cases (Klabunde, 2011; Simon, 2019; Werth & Moss, 2007).
Apart from social networks, a few other types of interaction occur in agent-based
models of migration. In some studies, agents make their migration decisions with-
out any direct influence from others but interact with them in other ways, such as
economically (Naivinit et al., 2010; Naqvi & Rehm, 2014) or by learning
(Hailegiorgis et al., 2018), which affects their economic status and thus the likeli-
hood of migrating.
Information and exchange of that information between migrants are the main
processes we assumed to be relevant for the emergence of migration routes, and
consequently had to be a core part of our model. The information dynamics within
the model, as well as the mechanism for the update of agents’ beliefs, are sum-
marised in Box 3.4.
Box 3.4: Information Dynamics and Beliefs Update in the Routes and
Rumours Model
Agents in our model start out knowing very little about the area they are trav-
elling through, but accumulate knowledge either by exploring locally or by
exchanging information with agents they meet or are in contact with. This
information is not only incomplete most of the time, but may also be inaccurate.
Through exchange, incorrect information can even spread through the population.
For each property of the environment – say, risk associated with a transport
link – an agent has an estimate as well as a confidence value. Collecting infor-
mation improves the estimate and increases the confidence. During informa-
tion exchange with other agents, however, confidence can even decrease if
both agents have very different opinions.
Our model of information exchange therefore had to fulfil a number of
conditions: (a) knowledge can be wrong and/or incomplete, (b) knowledge
can be exchanged between individuals, yet, crucially, the exchange depends not on
the objective but only on the subjective reliability of the information, and
(c) agents therefore need an estimate of how certain they are that their infor-
mation is correct.
Since existing models of belief dynamics do not fulfil all of these criteria,
we designed a new (sub-) model of information exchange.
Formally, we used a mass action approach to model the interaction between the
certainty t ∈ (0, 1) and doubt d = 1 − t components of two agents’ beliefs. During
interactions we assumed that these components interact independently in a way
that agents can be convinced (doubt transforming to certainty through the interac-
tion with certainty), converted (certainty of one belief is changed to certainty of a
different belief through the interaction with certainty) or confused (certainty is
changed to doubt by interacting with certainty if the beliefs differ sufficiently).
For two agents A and B, we calculated the difference in belief as

    δv = (vA − vB) / (vA + vB).

The post-interaction doubt and belief value of agent A are then given by

    d′A = dA dB + (1 − ci) dA tB + cu δv tA tB,

    v′A = [tA dB vA + ci dA tB vB + tA tB (1 − cu δv) ((1 − ce) vA + ce vB)] / (1 − d′A),

where the parameters ci, ce and cu govern, respectively, the convincing, converting
and confusing interactions described above, and (1 − d′A) = t′A is the updated certainty.
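The update rule above can be transcribed into a few lines of code. The Python sketch below is our own illustration, not the book's implementation (which is written in Julia); in particular, treating δv as the absolute relative difference, and the names of the function and its arguments, are our assumptions:

```python
def update_belief(vA, tA, vB, tB, ci, ce, cu):
    """One-sided belief update for agent A after interacting with agent B.

    vA, vB -- the agents' value estimates; tA, tB -- their certainties.
    ci, ce, cu -- parameters for the convincing, converting and confusing
    interactions (our reading of the notation; illustrative only).
    """
    dA, dB = 1.0 - tA, 1.0 - tB     # doubt components, d = 1 - t
    dv = abs(vA - vB) / (vA + vB)   # relative difference in belief (sign convention assumed)
    # New doubt: residual doubt, unconvinced doubt, and confusion terms.
    dA_new = dA * dB + (1.0 - ci) * dA * tB + cu * dv * tA * tB
    tA_new = 1.0 - dA_new
    # New value: certainty-weighted mix of retained, adopted and blended values.
    vA_new = (tA * dB * vA
              + ci * dA * tB * vB
              + tA * tB * (1.0 - cu * dv) * ((1.0 - ce) * vA + ce * vB)) / tA_new
    return vA_new, tA_new

# Two agents holding the same value reinforce each other's certainty:
v, t = update_belief(1.0, 0.5, 1.0, 0.5, ci=0.5, ce=0.5, cu=0.5)
print(round(v, 3), round(t, 3))  # value stays 1.0, certainty rises to 0.625
```

Note how, when the two values agree (δv = 0), the confusion term vanishes and the interaction can only increase agent A's certainty, which matches the qualitative description of the convincing and converting mechanisms.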
Julia, a new language developed by a group from MIT (Bezanson et al., 2014),
has recently started to challenge this trade-off. It has been designed with a focus on
technical computing and the express goal of combining the accessibility of a
dynamically typed scripting language like Python or R with the efficiency of a stati-
cally typed language like C++ or Rust. A combination of different techniques is
used to achieve this goal. In order to keep the language easily accessible, it employs
a straightforward syntax (borrowing heavily from MATLAB) and dynamic typing
with optional type annotations. Runtime efficiency is accomplished by combining
strong type inference with just-in-time compilation based on the LLVM platform
(Lattner & Adve, 2004). Following a few relatively straightforward guidelines, it is
therefore possible to write code in Julia that is nearly as fast as C, C++ or Fortran
while being substantially simpler and more readable.
Beyond simplicity and efficiency, however, Julia offers additional benefits.
Similar to languages such as R or Python, it comes with interactive execution envi-
ronments, such as a REPL (read-eval-print loop) and a notebook interface that can
greatly speed up prototyping. It also has a powerful macro system built in that has,
for example, been used to enable near-mathematical notation for differential equa-
tions and computer algebra. Some specific notes related to the Julia implementation
are summarised in Box 3.5.
As we can see, ABMs have become firmly established as a method available for
migration modelling. Their application ranges from purely theoretical models to
efforts to predict aspects of migration calibrated to a specific real-world situation. A
variety of different topics have been tackled such as the effects of climate change on
migration via agriculture, the spread of migration experiences through social net-
works, the formation of groups by travelling migrants, or how the local threat of
violence affects numbers of arrivals in refugee camps. Methodologically, these
models vary considerably as well, including for example GIS-based spatial repre-
sentation, decision models based on the theory of planned behaviour, or a spatially
explicit ecological model that predicts agricultural yields.
On the other hand, some notable counter-examples notwithstanding, many mod-
els in this field still tend to be simple, poorly calibrated or not calibrated at all,
narrow in focus, and littered with ad hoc assumptions. In many cases, this is despite best efforts on
the part of the authors. Not only is agent-based modelling in general a very ‘data
hungry’ method, but in addition – as further discussed in Chap. 4 and in Sect. 3.2 in
this chapter – migration is a phenomenon that is inherently difficult to access
empirically.
While macroscopic data on e.g. number of arrivals, countries of origin or demo-
graphic composition are sometimes reasonably accessible, microscopic data, in par-
ticular on individual decision making, can be nearly impossible to obtain (Klabunde
& Willekens, 2016). Consequently, decision making – arguably the most important
part of a model concerned with an aspect of human behaviour – is in most models
at best calibrated with regression data (but see Simon et al., 2016 for a notable
exception) and often neither calibrated, nor in other ways justified (e.g. Hébert
et al., 2018).
Unfortunately, even calibration or validation against easier-to-obtain macroscopic
data is not a given. Even some predictive studies restrict themselves to the most
basic forms of validation, for example by simply showing model outcomes next to
real data (e.g. Groen et al., 2020; Lin et al., 2016; Suleimenova & Groen, 2020). For
a purely theoretical model, a lack of empirical reference is not necessarily a cause
for concern. But if it is the express goal of a study to be applicable to a concrete
real-world situation, then a certain effort towards understanding the amount as well
as the causes of uncertainty in the model results should be expected. As demonstrated
by some authors who go to great lengths to include the available data and to calibrate
their models against it, high-quality modelling efforts do exist (e.g. Naivinit
et al., 2010; Simon et al., 2018; Hailegiorgis et al., 2018).
Another point to note is the relative paucity of theoretical studies attempting to
find general mechanisms – as opposed to generating predictions of a specific situa-
tion – in the tradition of Schelling (1971) or Epstein and Axtell (1996). Of the exist-
ing examples, some stand in the tradition of abstract modelling approaches employed
in physics, so that it is difficult to assess the generality of their results (Hafızoğlu &
Sen, 2012; Silveira et al., 2006). All these issues additionally reinforce the need for
the model-based research programme, advocated in Chap. 2, going beyond the state
of the art in agent-based modelling, and including other approaches and sources of
empirical information. As argued before, such efforts should be ideally guided by
the principles of classical inductive reasoning.
Generally, however, we can see that formal modelling can open up new areas for
migration studies. Many questions remain untouched, providing promising areas for
future research. On the whole, as argued above, any modelling exercise should not
be aimed at a precise description, explanation or prediction of
migration processes, which is an impossible task, but at identifying gaps in data and
knowledge. Furthermore, for any given migration system, there is no canonical
model. As argued before, the models need to be built for specific purposes, and with
particular research questions in mind. Of course, many such questions still have
direct practical, policy or scientific relevance. Examples of such questions may
include:
• What is the uncertainty of migration across a range of time horizons? What can
be a reasonable horizon for attempts at predicting migration, under a reasonable
description of uncertainty?
• How are the observed flows of migration likely to be formed, who might be
migrating, and who would stay behind? What is the role of historical trends,
migrant networks, or other drivers?
• What drives the emergence of migration routes, policies and political impacts of
migration? Are migration policies only exogenous variables, or are they endog-
enous, driven by migration flows?
• More generally, does migration lead to feedback effects, for example through the
impacts on societies, policies or markets, and how is it mediated by the level of
integration of migrants?
• What are the root causes of migration, and how does migration interact with
other aspects of social life? To what extent are various actors (migrants, institu-
tions, intermediaries…) involved?
• How are migration decisions formed and put into action? Do cognitive compo-
nents dominate, or are emotions highly involved as well? Does it vary between
different migration types?
The specific questions, which can be driven by policy or scientific needs, will
determine the model architecture and data requirements. Next, we discuss a way of
assessing the data requirements of the model through formal analysis.
Chapter 4
Building a Knowledge Base for the Model
In this chapter, after summarising the key conceptual challenges related to the mea-
surement of asylum migration, we briefly outline the history of recent migration
flows from Syria to Europe. This case study is intended to guide the development of
a model of migration route formation, used throughout this book as an illustration
of the proposed model-based research process. Subsequently, for the case study, we
offer an overview of the available data types, making a distinction between the
sources related to the migration processes, as well as to the context within which
migration occurs. We then propose a framework for assessing different aspects of
data, based on a review of similar approaches suggested in the literature, and this
framework is subsequently applied to a selection of available data sources. The
chapter concludes with specific recommendations for using the different forms of
data in formal modelling, including in the uncertainty assessment.
4.1 Key Conceptual Challenges of Measuring Asylum Migration and Its Drivers
drivers. At the same time, problems with data on asylum migration are manifold and
well documented (see Chap. 2). The aim of the work presented in this chapter is to
collate as much information as possible on the chosen case study for use in the
modelling exercise, and to assess its quality and reliability in a formal way, allowing
for an explicit description of data uncertainty. In this way, it can still be possible to
use all the available relevant information, taking its relative quality into account
when deciding how much weight each source should be given, and
what uncertainty needs to be reflected in the model.
In this context, it was particularly important to choose a migration case study
with a large enough number of migrants, and with a broad range of available infor-
mation and sources of data on different aspects of the flows. This is especially per-
tinent in order to allow investigation of the different theoretical and methodological
dimensions of the migration processes by formally modelling their properties and
the underlying migrant behaviour. Consequently, knowledge about the different
aspects of data collection and quality of information, and a methodology for reflect-
ing this knowledge in the model, become very important elements of the modelling
endeavour in their own right.
In this chapter, we present an assessment of data related to the recent asylum
migration from Syria to Europe in 2011–19. As mentioned above, we chose the case
study not only due to its humanitarian and policy importance, and the high impact
this migration had both on Syria and on the European societies, but also taking into
account data availability. This chapter is accompanied by Appendix B, which lists
the key sources of data on Syrian migration and its drivers. The listing includes
details on the data types, content and availability, as well as a multidimensional
assessment of their usefulness for migration models, following the framework intro-
duced in this chapter.
Even though one of the central themes of the computational modelling endeav-
ours is to reflect the complexity of migration, the theoretical context of our under-
standing of population flows has traditionally been relatively basic. As mentioned in
Chap. 2, within a vast majority of the existing frameworks, decisions are based on
structural differentials, such as employment rates, resulting in observed overall
migration flows (for reviews, see e.g. Massey et al., 1993; Bijak, 2010). In his clas-
sical work, Lee (1966) aimed to explain the migration process as a weighing up of
factors or ‘drivers’ which influence decisions to migrate, while Zelinsky (1971)
described different features of a ‘mobility transition’, which could be directly
observed. Most of the traditional theories do not reflect the complexity of migration
(Arango, 2000), and typically fail to link the macro- and micro-level features of the
migration processes, which is a key gap that needs addressing through modelling.
More recently, there have been attempts to move the conceptual discussion for-
ward and to bridge some of these gaps. A contemporary ‘push-pull plus’ model
(Van Hear et al., 2018) adds complexity to the original theory of Lee (1966), but
fails to provide a framework that can be operationalised in an applied empirical
context. The ‘capability’ framework of Carling and Schewel (2018) stresses the
importance of individual aspirations and ability to migrate, but again fails to map
the concepts clearly onto the empirical reality. In general, the disconnection between
4.2 Case Study: Syrian Asylum Migration to Europe 2011–19
In this section, we look at recent Syrian migration to Europe (2011–19) through the
lens of the available data sources, and propose a unified framework to assess the
different aspects in which the data may be useful for modelling. From a historical
perspective, recent large-scale Syrian migration has a distinct start, following the
widespread protests in 2011 and the outbreak of the civil war. After more than a year
of unrest, in June 2012 the UN declared the Syrian Arab Republic to be in a state of
civil war, which continues at the time of writing, more than nine years later. Whereas
previous levels of Syrian emigration remained relatively low, the nature of the
conflict, involving multiple armed groups, government forces and external nations,
has resulted in an estimated 6.7 million people fleeing Syria since 2011 and a further
6.1 million internally displaced by the end of 2019, according to the UNHCR (2021,
see also Fig. 4.1). The humanitarian crisis caused by the Syrian conflict, which had
its dramatic peak in 2015–16, has continued throughout the whole decade.
Initial scoping of the modelling work suggests the availability of a wide range of
different types of data that have been collected on the recent Syrian migration into
Europe. In particular, the key UNHCR datasets show the number of Syrians who
were displaced each year, as measured by the number of registered asylum seekers,
refugees and other ‘persons of concern’, and the main destinations of asylum seek-
ers and refugees who have either registered with the UNHCR or applied for asylum.
The information is broken down by basic characteristics, including age, sex and location of registration, distinguishing people located within refugee camps and outside.
As shown in Fig. 4.1, neighbouring countries in the region (chiefly Turkey,
Lebanon and Jordan, as well as Iraq and Egypt) feature heavily as countries of
asylum, together with a number of European destinations, in particular, Germany
and Sweden. The scale of the flows, as well as the level of international interest and media coverage, means that the development of migrant routes and strategies has often been observed and recorded as it occurs. In many cases, the situation of the Syrian asylum seekers and refugees is also very precarious. By the
UNHCR’s account, by the end of 2017, nearly 460,000 people still lived in
camps, mostly in the region, in need of more ‘durable solutions’, such as safe
repatriation or resettlement. (This number has started to decline, and nearly
halved by mid-2019). A further five million were dispersed across the communi-
ties in the ‘urban, peri-urban and rural areas’ of the host countries (UNHCR,
2021). The demographic structure of the Syrian refugee population generates
challenges in the destination countries with respect to education provision and
labour market participation, with about 53% of people of working age (18–59 years), 2% seniors aged over 60, and 45% children and young people under 18 (UNHCR, 2021).
When it comes to asylum migration journeys to Europe, visible routes and cor-
ridors of Syrian migration emerged, in recent years concentrating on the Eastern
Mediterranean sea crossing between Turkey and Greece, as well as the secondary
land crossings in the Western Balkans, and the Central Mediterranean sea route
between Libya and Italy (Frontex, 2018). By the end of 2017, Syrian asylum
migrants were still the most numerous group – over 20,000 people – among those
apprehended on the external borders of the EU (of whom nearly 14,000 were on the
Eastern Mediterranean sea crossing route). However, these numbers were considerably down from the 2015 peak of nearly 600,000 apprehensions in total, and nearly 500,000 in the Eastern Mediterranean (idem, pp. 44–46). These numbers can
be supplemented by other sad statistics: the estimated numbers of fatalities, espe-
cially referring to people who have drowned while attempting to cross the
Mediterranean. The IOM minimum estimates cite over 19,800 drownings in the
period 2014–19, of which 16,300 were in the Central Mediterranean. In about 850 cases, the victims were people who came from the Middle East, a majority presumed to be Syrian (IOM, 2021). In the same period, the relative risk of drowning increased to the current rate of around 1.6%, substantially higher (2.4%) for the Central Mediterranean route (idem).

Fig. 4.1 Number of Syrian asylum seekers, refugees, and internally displaced persons (IDPs), 2011–19, and the distribution by country in 2019. (Source: UNHCR, 2021)
As concerns the destinations themselves, the asylum policies and recognition
rates (the proportion of asylum applicants who receive positive decisions granting
them refugee status or other form of humanitarian protection) clearly differ across
the destination countries, and also play a role in shaping the asylum data. Still, in the
case of Syrian asylum seekers, these differences across the European Union are not
large. According to the Eurostat data,1 between 2011 and 2019, over 95% of decisions on the applications of Syrian nationals were positive, and these rates were more or less stable across the EU, with the exception of Hungary (with only 36% positive
decisions, and a relatively very low number of decisions made). It is worth noting
here that administrative data on registrations and decisions have obvious limitations
related to the timeliness of registration of new arrivals and processing of the appli-
cations, sometimes leading to backlogs, which may take months or even years to
clear. Moreover, the EU statistics refer to asylum applications lodged; lodging is the final step in the multi-stage asylum application process, consisting of a formal acknowledgement by the relevant authorities that the application is under consideration (European Commission, 2016).
At the same time, besides the official statistics from the registration of Syrian
refugees and asylum seekers by national and international authorities, specific
operational needs and research objectives have led to the emergence of many other
data sources. In this way, in addition to the key official statistics, such as those of
the UNHCR, there exist many disparate information sets, which deal with some
very specific aspects of Syrian migration flows and their drivers. These sources
extend beyond the fact of registration, providing much deeper insights into some
aspects of migration processes and their context. Still, the trade-offs of using such
sources typically include their narrower coverage and lack of representativeness of
the whole refugee and asylum seeker populations. Hence, there is a need for a uni-
fied methodology for assessing the different quality aspects of different data
sources, which we propose and illustrate in the remainder of this chapter. In addition, we present a more detailed survey of these sources in Appendix B, current as of May 2021, together with an assessment of their suitability for modelling.
1 All statistics quoted in this paragraph come from the ‘Asylum and managed migration’ (migr) domain, table ‘First instance decisions on applications by citizenship, age and sex’ (migr_asydcfsta), extracted on 1 February 2021.
4.3 Data Overview: Process and Context
For data collection and use in modelling, we suggest following a two-stage process of data assessment. The first stage is to
identify all available data relevant to the different elements involved in the decision
making and migration flows being modelled. The second stage is then to introduce
an assessment of uncertainty so that it can be formally taken into account and incor-
porated into the model.
Depending on the purpose and the intended use in different parts of the model,
the data sources can be classified by type; broadly, these can be viewed as providing
either process-related or contextual information. The distinction here is made
between data relating specifically to the migration processes, including the charac-
teristics of migrants themselves, their journey and decisions on the one hand, and
contextual information, which covers the wider situation at the origin, destination
and transit countries, on the other. Relevant data on context can include, for exam-
ple, macro-economic conditions, the policy environment, and the conflict situation
in the country of origin or destination.
In addition, in order to allow the data to be easily accessed and appropriately utilised in the model, the sources can be further classified depending on the level of aggregation (macro or micro), as well as the paradigm under which they were collected (quantitative or qualitative). These categories, alongside a description of source type
(for example, registers, surveys, censuses, administrative or operational data, jour-
nalistic accounts, or legal texts) are the key components of meta-information related
to individual data sources, and are useful for comparing similar sources during the
quality assessment.
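To make the classification concrete, this meta-information can be sketched as a small record type. The encoding below is our own illustration — the field names, enumerations and the example source description are assumptions, not a schema given in the text:

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    PROCESS = "process"   # migration process: migrants, journeys, decisions
    CONTEXT = "context"   # wider situation at origin, destination, transit

class Level(Enum):
    MACRO = "macro"       # populations, flows, events
    MICRO = "micro"       # individual migrants

class Paradigm(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"

@dataclass
class DataSourceMeta:
    """Meta-information describing a single data source."""
    name: str
    scope: Scope
    level: Level
    paradigm: Paradigm
    source_type: str      # e.g. register, survey, census, operational data

# Hypothetical example: UNHCR registration counts described as a
# macro-level, quantitative, process-related register.
unhcr = DataSourceMeta(
    name="UNHCR persons of concern",
    scope=Scope.PROCESS,
    level=Level.MACRO,
    paradigm=Paradigm.QUANTITATIVE,
    source_type="register",
)
```

Records of this kind make it straightforward to filter and compare similar sources during the quality assessment described below.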
The conceptual mapping of the different stages of the migration process and their
respective contexts onto a selection of key data sources is presented in Fig. 4.2, with
context influencing the different stages of the process, and the process itself being
simplified into the origin, journey and destination stages. For each of these stages,
several types of sources of information may be typically available, although certain
types (surveys, interviews, ‘new data’ such as information on mobile phone loca-
tions or communication exchange, social media networks, or similar) are likely to
be more associated with some aspects than with others. From this perspective, it is
also worth noting that while the process-related information can be available both at
the macro level (populations, flows, events), or at the micro level (individual
migrants), the contextual data typically refer to the macro scale.
Hence, to follow the template for the model-building process sketched in Chap.
2, the first step in assessing the availability of data for any migration-related model-
ling endeavour is to identify the critical aspects of the model, without which the
processes could not be properly described, and which can be usefully covered by the
existing data sources, with a varying degree of accuracy. Next, we present examples
of such process- and context-related aspects.
Fig. 4.2 Conceptual relationships between the process and context of migrant journeys and the corresponding data sources. (Source: own elaboration)
Among the process-related data, describing the various features of migration flows and migrants, be it for individual actors involved in migration (micro level) or for whole populations (macro level), the main types of information that can be particularly useful for modelling are listed below.
Origin Populations. Information on the origin country population, such as data from a census or health surveys, can be used for benchmarking. Data on age and sex
distributions as well as other social and economic characteristics can be helpful in
identifying specific subpopulations of interest, as well as in allowing for heteroge-
neity in the populations of migrants and stayers.
Journey. Any information available about the specific features of the journey itself
also forms part of the process-related information. This could include data about
durations of the different segments of the trip, or distinct features of the process of
moving, which can be gauged for example from retrospective accounts or surveys,
including qualitative interviews or journalistic accounts. Similarly, information on
intermediaries, smugglers, and so on, as long as it is available and even remotely
reliable, can be a part of the picture of the migrant journeys.
Policies and Institutions. Specifically related to the destination context, but also extending beyond it, information on various aspects of migration policy and law enforcement, including visa, asylum and settlement policies in destination and transit countries, as well as their changes in response to migration, additionally helps
paint a more complete picture of the dynamic legal context of migrant decisions and
of their possible interactions with those of other actors (border agents, policy mak-
ers, and so on).
Route Features. Contextual data on, for example, geographic terrain, networks,
borders, barriers, transport routes and law enforcement can be used to assess differ-
ent and variable levels of friction of distance, which can have long- and short-term
impact on migration decisions and on actual flows (corresponding to intervening
obstacles in Lee’s framework). Here, information on the level of resources that are
required for the journey, including availability of humanitarian aid, or intricacies of
the smuggling market, as well as information on migrant access to resources, can
provide additional insights into the migration routes and trajectories. Resources typically deplete over time and along the journey, which again impacts on decisions by determining the route, destination choice, and so on. This aspect can form a part of the
set of route features mentioned above, or feature as a separate category, depending
on the importance of the resource aspect for the analysis and modelling.
The multidimensionality of migration results in a patchwork of sources of infor-
mation covering different aspects of the flows and the context in which they are
taking place, often involving different populations and varying accuracy of mea-
surement, which can be combined with the help of formal modelling (Willekens,
1994). At the same time, it implies the need for greater rigour and transparency, and
a careful consideration of the data quality and their usefulness for a particular pur-
pose, such as modelling.
Different process and context data are characterised by varying degrees of uncer-
tainty, stemming from different features of the data collection processes, varying
sample sizes, as well as a range of other quality characteristics. The quality of data
itself is a multidimensional concept, which requires adequate formal analysis through
a lens of a common assessment framework adopted for a range of different data
sources that are to be used in the modelling exercise. We discuss methodological and
practical considerations related to the design of such an assessment framework next,
illustrated by an application to the case of recent Syrian migration to Europe.
4.4 Quality Assessment Framework for Migration Data
No perfect data exist, let alone concerning migration processes. The measurement
of asylum migration requires particular care, going beyond the otherwise challeng-
ing measurement of other forms of human mobility (see e.g. Willekens, 1994). As
mentioned in Chap. 2, the most widespread ways to measure asylum migration pro-
cesses involve administrative data on events, which include very limited
information about the context (Singleton, 2016). Other, well-known issues with the
statistics involve duplicated records of the same people, for whom multiple events
have been recorded, as well as the presence of undercount due to the clandestine
nature of many asylum-related flows (Vogel & Kovacheva, 2008). The use of asy-
lum statistics for political purposes adds another layer of complexity, and necessi-
tates extra care when interpreting the data (Bakewell, 1999).
More generally, official migration statistics, like all types of data, are social and political constructs, which strongly reflect the policy and research priorities prevalent at the time (for an example, see Bijak & Koryś, 2009). For this reason, the purpose
and mechanisms of data collection also need to be taken into account in the assess-
ment, as different types of information may carry various inherent biases. Given the
potential dangers of relying on any single data source, which may be biased, when
describing migration flows through modelling, multiple sources ideally need to be
used concurrently, and be subject to formal quality assessment, as set out below.
Assessing the quality of sources can allow us to make use of a greater range of
information that may otherwise be discarded. Trustworthiness and transparency of
data are particularly important for a politically sensitive topic of migration against
the backdrop of armed conflict at the origin, and political controversies at the desti-
nation. Official legal texts, especially more recent ones, include references to data
quality – European Regulation 862/2007 on migration and asylum statistics refers
to and includes provisions for quality control and for assessing the “quality, compa-
rability and completeness” of data (Art. 9).2 Similarly, Regulation 763/2008 on
population and housing censuses explicitly lists several quality criteria to be applied
to the assessment of census data: relevance, accuracy, timeliness, accessibility, clar-
ity, comparability, and coherence (Art. 6).3
Existing studies indicate several important aspects in assessing the quality of
data from different sources. A key recent review of survey data specifically targeting
asylum migrants, compiled by Isernia et al. (2018), provides a broad overview, as
well as listing some specific elements to be considered in the data analysis. Surveys
selected for this review highlight definitional issues with identifying the appropriate
target population. Aspiring to clarity in definitional issues is an enduring theme in
migration studies, asylum migration included (Bijak et al., 2017).
There are also several examples of existing academic studies in related areas, which aim at assessing the quality of sources of information.
2
Regulation (EC) No 862/2007 of the European Parliament and of the Council of 11 July 2007 on
Community statistics on migration and international protection, OJ L 199, 31.7.2007, p. 23–29,
with subsequent amendments.
3
Regulation (EC) No 763/2008 of the European Parliament and of the Council of 9 July 2008 on
population and housing censuses, OJ L 218, 13.8.2008, p. 14–20.
4.4.2 Proposed Dimensions of Data Assessment: Example of Syrian Asylum Migration
The aim and nature of the modelling process imply that, while clarity of definitions
is important, it is also possible to encompass a wider range of information sources
and to assign different relative importance to these sources in the model. Our pro-
posal for a quality assessment framework and uncertainty measures for different
types of data is therefore multidimensional, as set out below. In particular, we pro-
pose six generic criteria for data assessment:
1. Purpose for data collection and its relevance for modelling
2. Timeliness and frequency of data collection and publication
3. Trustworthiness and absence of biases
4. Sufficient levels of disaggregation
5. Target population and definitions including the population of interest (in our case
study, Syrian asylum migrants)
6. Transparency of the data collection methods
The need to identify the target population precisely is common for all types of
data on migrants, but there are additional quality criteria specific to registers and
survey-based sources. Thus, for register-based information an additional criterion
relates to its completeness, while for surveys, their design, sampling strategy, sam-
ple sizes, and response rates are all aspects that need to be clearly set out in order to
be assessed for rigour and good practice in data collection (Isernia et al., 2018).
In our framework, all criteria are evaluated according to a five-point scale, based
on the traffic lights approach (green, amber, red), but also including half-way cate-
gories (green-amber and amber-red). The specific classification descriptors for
assigning a particular source to a given class across all the criteria are listed in
Table 4.1. Finally, for each source, a summary rating is obtained by averaging over
the existing classes. This meta-information on data quality can be subsequently
used in modelling either by adjusting the raw data, for example when these are
known to be biased, or by reflecting the data uncertainty, when there are reasons to
believe that they are broadly correct, yet imprecise.
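For illustration, the five-point traffic-light scale and the averaging step can be sketched in a few lines of code. The numeric coding (red = 0 up to green = 2 in half-point steps) and the example ratings are our own assumptions; the text specifies only the classes and the averaging over those criteria that could be assessed:

```python
# Assumed numeric coding of the five-point traffic-light scale.
SCORES = {
    "red": 0.0,
    "amber-red": 0.5,
    "amber": 1.0,
    "green-amber": 1.5,
    "green": 2.0,
}

def summary_rating(ratings):
    """Average the numeric scores over the criteria that were assessed.

    `ratings` maps criterion names to traffic-light classes; criteria
    that could not be assessed are given as None and are skipped,
    mirroring the 'averaging over the existing classes'.
    """
    scores = [SCORES[r] for r in ratings.values() if r is not None]
    if not scores:
        raise ValueError("no criteria were assessed")
    return sum(scores) / len(scores)

# Hypothetical source: strong on purpose and transparency, weaker on
# disaggregation; timeliness could not be assessed.
example = {
    "purpose": "green",
    "timeliness": None,
    "trustworthiness": "green-amber",
    "disaggregation": "amber-red",
    "definitions": "amber",
    "transparency": "green",
}
print(summary_rating(example))  # 1.4
```

The summary rating then serves as the meta-information on overall data quality referred to in the text.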
Table 4.1 Proposed framework for formal assessment of the data sources for modelling the recent
Syrian asylum migration to Europe
The result of applying the seven quality criteria to 28 data sources identified as potentially relevant to modelling Syrian migration is summarised in Table 4.2 and presented in detail in Appendix B.
Table 4.2 Summary information on selected data sources related to Syrian migration into Europe
4.5 The Uses of Data in Simulation Modelling
One important consideration when choosing data to aid modelling is that the infor-
mation used needs to be subsidiary to the research or policy questions that will be
answered through models. For example, consider questions about the journey (process), such as: do migrants choose the route with the shortest geographic distance, or is that choice mediated by resources, networks and access to information?
Exploring possible answers to this question would require gathering different
sources of data, for example around general concepts such as ‘friction’ or ‘resources’,
and would allow the modeller to go far beyond standard geographic measures of
distance or economic measures of capital, respectively.
The arguments presented above lead to three main recommendations regarding
the use of data in the practice of formal modelling.
First, there are no perfect data, so the expectations related to using them need to
be realistic. There may be important trade-offs between different sources in terms of
various evaluation criteria. For this reason, any data assessment has to be multidi-
mensional, as different purposes may imply focus on different desired features of
the data.
Second, any source of uncertainty, ambiguity or other imperfection in the data
has to be formally reflected and propagated into the model. A natural language for
expressing this uncertainty is one of probabilities, such as in the Bayesian statistical
framework.
Third, the context of data collection always has to be borne in mind. Migration
statistics – being to a large extent social and political constructs – are especially
prone to becoming ‘statistical artefacts’ (see e.g. Bijak & Koryś, 2009), being dis-
torted, and sometimes misinterpreted. With that in mind, the use of particular data
needs to be ideally driven by the specific research and policy requirements rather
than mere convenience.
One key extension of the formal evaluation of various data sources is to investi-
gate the importance of the different pieces of knowledge, and to address the chal-
lenge of coherently incorporating the data on both micro- and macro-level processes,
as well as the contextual information, together with their uncertainty assessment, in
a migration model. If that could be successfully achieved, the results of the model-
ling can additionally help identify the future directions of data collection, strength-
ening the evidence base behind asylum migration and helping shape more realistic
policy responses.
A natural formal language for describing the data quality or, in other words, the
different dimensions of the uncertainty of the data sources, is provided by probabil-
ity distributions, which can be easily included in a fully probabilistic (Bayesian)
model for analysis. In the probabilistic description, two key aspects of data quality
come to the fore: bias – by how much the source is over- or under-estimating the
real process – which can be modelled using the location parameters of the relevant distributions (such as the mean, median and so on), and variance – how precise the source is – which can be described by scale parameters (such as the variance, standard deviation, precision, etc.). As in the statistical analysis of prediction errors,
there may be important trade-offs between these two aspects: for example, with
sample surveys, increasing the sample size is bound to decrease the variance, but if
the sampling frame is mis-specified, this can come at the expense of an increasing
bias – the estimates will be more precise, but in the wrong place.
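The bias-variance distinction can be illustrated with a stylised simulation, using entirely invented numbers: source A is unbiased but noisy (large scale parameter), while source B undercounts by 20% but is precise (small scale parameter) — in other words, 'in the wrong place':

```python
import numpy as np

rng = np.random.default_rng(seed=1)
true_flow = 10_000          # hypothetical 'true' monthly flow
n = 10_000                  # number of simulated measurements

# Source A: unbiased location, large scale (noisy but centred).
source_a = rng.normal(loc=true_flow, scale=2_000, size=n)

# Source B: biased location (20% undercount), small scale (precise
# but systematically too low).
source_b = rng.normal(loc=0.8 * true_flow, scale=200, size=n)

bias_a = source_a.mean() - true_flow   # close to 0
bias_b = source_b.mean() - true_flow   # close to -2,000
print(f"A: bias={bias_a:+.0f}, sd={source_a.std():.0f}")
print(f"B: bias={bias_b:+.0f}, sd={source_b.std():.0f}")
```

Which source is preferable depends on the purpose: for an unbiased picture of the level of flows, A is safer; for detecting changes over time, the precise but biased B may be more informative.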
Of the eight quality assessment criteria listed in Table 4.1, the first two (purpose
and timeliness) are of a general nature, and – depending on the aim of the modelling
endeavours – can be decisive in terms of whether or not a given source can be used
at all. The remaining ones can be broadly seen as contributing either to the bias of a source (for example, the definitions of the target populations and the trustworthiness of data collection) or to its variance.
Fig. 4.3 Representing data quality aspects through probability distributions: stylised examples.
(Source: own elaboration)
A natural way to include the uncertainty assessment of the different types of data
sources is then, for the inputs, to feed the data into the model in a probabilistic form
(as probability distributions), and, for the outputs, to include in the model an addi-
tional error term that is intended to capture the difference between the processes
being modelled and their empirical measurements (see Chap. 5). Box 4.1 presents
an illustration related to a set of possible data sources, which may serve to augment
the Routes and Rumours model introduced in Chap. 3 and to develop it further,
together with their key characteristics and overall assessment. More details for these
sources are offered in Appendix B.
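A minimal sketch of feeding inputs into a model 'in probabilistic form' is the precision-weighted pooling of independent normal measurements under a flat prior, a standard conjugate normal-normal result. The sources, point estimates and standard deviations below are invented for illustration and do not come from the chapter:

```python
import math

def combine_normal(estimates, sds):
    """Precision-weighted combination of independent normal measurements.

    Each source i reports estimate x_i with standard deviation s_i;
    under a flat prior the posterior for the underlying quantity is
    normal with precision sum(1/s_i^2) and mean equal to the
    precision-weighted average of the x_i.
    """
    weights = [1.0 / s**2 for s in sds]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    sd = math.sqrt(1.0 / total)
    return mean, sd

# Hypothetical example: a precise register (sd 200) and a noisy
# survey-based estimate (sd 2,000) of the same flow.
mean, sd = combine_normal([10_000, 8_000], [200, 2_000])
print(f"combined: {mean:.0f} ± {sd:.0f}")
```

The pooled estimate sits close to the more precise source; if that source is biased, the pooling inherits its bias, which is why the bias assessment needs to enter the model alongside the variance.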
Box 4.1: Datasets Potentially Useful for Augmenting the Routes and
Rumours Model
As described in Chap. 3, temporal detail and spatial information are important
for this model in order to understand more about the emergence of migration
routes. We focused on the Central Mediterranean route, utilising data on those
intercepted leaving Libya or Tunisia, losing their lives during the sea crossing,
or being registered upon arrival in Italy. One exception was the retrospective
Flight 2.0 survey, carried out in Germany, which looked into the use of infor-
mation by migrants during their journey. All the data included below are
quantitative, reported at the macro-level (although Flight 2.0 recorded micro-
level survey data), and relate to the migration process. The available data are
listed in Table 4.3 below; for this model monthly totals were used. In addition,
OpenStreetMap (see source S02 in Appendix B) data provide real-world geographic detail. For a general quality assessment of data sources, see Appendix
B, where the more detailed notes for each dataset provide additional relevant
information and give some brief explanation of the reasoning behind particu-
lar quality ratings.
Table 4.3 Selection of data sources which can inform the Routes and Rumours model, with their
key features and quality assessment
(Table columns: Reference in Appendix B; Source; Content focus; Source and time detail; Quality rating; Bias & variance)
Of course, there are also other methods for dealing with missing, incomplete or
fragmented data, coming from statistics, machine learning and other emerging areas
of broader ‘data science’. The review of such methods remains beyond the scope of
this book, but it suffices to name a few, such as various approaches to imputation,
which have been covered extensively e.g. in Kim and Shao (2014), or data match-
ing, which in machine learning is also referred to as data fusion, also covered by a
broad literature (e.g. Bishop et al., 1975/2007; D’Orazio et al., 2006; Herzog et al.,
2007). A comprehensive recent review of the field was provided by Little and Rubin
(2020). In the migration context, some of these methods, such as micro-level match-
ing, are not very feasible, unless individual-level microdata are available with
enough personal detail to enable the matching. For ethical reasons, this should not
be possible outside of very secure environments under strictly controlled condi-
tions; therefore this may not be the right option for most applied migration research
questions. Better and more realistic options include the reconciliation of macro-level data through statistical modelling, such as in the Integrated Modelling of
European Migration work (Raymer et al., 2013), producing estimates of migration
flows within Europe with a description of uncertainty. Such estimates can then be
subject to a quality assessment as well, and be included in the models following the
general principles outlined above.
4 Part of the discussion is inspired by a debate panel on migration modelling, held at the workshop
on the uncertainty and complexity of migration, in London on 20–21 November 2018. The discus-
sion, conducted under the Chatham House rule (no individual attribution), covered two main top-
ics: migration knowledge gaps and ways to fill them, and making simulation models useful
for policy. We are grateful to (in alphabetical order) Ann Blake, Nico Keilman, Giampaolo
Lanzieri, Petra Nahmias, Ann Singleton, Teddy Wilkin and Dominik Zenner for sharing their views.
4.6 Towards Better Migration Data: A General Reflection
social constructs and the product of their times, and as such, are not politically neu-
tral. These features put the onus on the modellers and users, who need to be aware
of the social and political baggage associated with the data. Besides the need to be
conscious of the context of the data collection, there can be a trap associated with
bringing in too much of the analysts’ and modellers’ own life experience to model-
ling. This, in turn, requires particular attention in the context of modelling of migra-
tion processes that are global in nature, or consider different cultural contexts than
the modellers’ own.
Similar reservations hold from the modelling point of view, especially when
dealing with agent-based models attempting to represent human behaviour. Such
models often imply making very strong value judgements and assumptions, for
example with respect to the objective functions of individual agents, or the con-
straints under which they operate. The values that are reflected in the models need
to be made explicit, also to acknowledge the role of the research stakeholders, for
the sake of transparency and to ensure public trust in the data. It has to be clear who
defines the research problem underlying the modelling, and what their motiva-
tions were.
Another aspect of trust relates to the new forms of data, such as digital traces
from social media or mobile phones, where their analytical potential needs to be
counterbalanced by strong ethical precautions related to ensuring privacy. This is
especially crucial in the context of individual-level data linking, where many differ-
ent sources of data taken together can reveal more about individuals than is justified
by the research needs, or than should be ethically admissible. This also constitutes
a very important challenge for traditional data providers and custodians, such as
national and international statistical offices and other parts of the system of official
statistics, whose future mission can include acting as legal, ethical and method-
ological safeguards of the highest professional standards with respect to migration
data collection, processing, storage and dissemination.
Another important point is that the modelling process, especially if employed in
an iterative manner, as argued in Chap. 2 and throughout this book, can act as an
important pathway towards discovering further gaps in the existing knowledge and
data. This is a more readily attainable aim than a precise description or explanation
of migration processes, not to mention their prediction. Additionally, this is the
place for a continuous dialogue between the modellers and stakeholders, as long as
the underpinning ideas and concepts are well defined, simple, clear and transparent,
and the expectations as to what the data and models can and cannot deliver are
realistic.
To achieve these aims, open communication about the strengths and limitations of
data and models is crucial, which is one of the key arguments behind an explicit
treatment of different aspects of data quality, as discussed above. These features can
help both the data producers and users better navigate the different guises of the
uncertainty and complexity of migration processes, by setting the minimum quality
standards – or even requirements – that should be expected from the data and
models alike. A prerequisite for that is a high level of statistical and scientific liter-
acy, not only of the users and producers of data and models, but also ideally among
the general public. To that end, while the focus of this chapter is on the limitations
of various sources of data, and what aspects of information they are able to provide,
the next one looks specifically at the ways in which the formal model analysis can
help shed light on information gaps in the model, and also utilise empirical informa-
tion at different stages of the modelling process.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 5
Uncertainty Quantification, Model Calibration and Sensitivity
model via probability distributions; errors in the computer code; and residual vari-
ability, left after accounting for every other source.
The tools of probability and statistics, and in particular Bayesian statistics, offer
a natural way of describing these different sources of uncertainty, by expressing
every modelled quantity as a random variable with a probability distribution. The
mechanism of Bayesian inference, by which the prior quantities (distributions) are
combined with the likelihood of the data to yield posterior quantities, helps bring
together the different sources of knowledge – data and prior knowledge, the latter
for example elicited from experts in a given domain.
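As a minimal, self-contained illustration of this mechanism (a stylised example, not taken from the book), a conjugate Beta-Binomial model combines a Beta prior on an unknown proportion with binomial data, yielding a Beta posterior in closed form; the numbers below are hypothetical:

```python
# Conjugate Beta-Binomial updating: prior x likelihood -> posterior.
# The prior and survey numbers are hypothetical, chosen for illustration only.
alpha_prior, beta_prior = 2.0, 8.0   # prior belief: proportion around 0.2
successes, trials = 30, 100          # observed data (e.g. survey responses)

# Posterior parameters follow directly from conjugacy:
alpha_post = alpha_prior + successes
beta_post = beta_prior + (trials - successes)

posterior_mean = alpha_post / (alpha_post + beta_post)
print(round(posterior_mean, 3))  # 32/110, i.e. 0.291
```

The posterior mean sits between the prior mean (0.2) and the data proportion (0.3), weighted by their respective precision, which is the general pattern of Bayesian updating described above.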
There is a long history of mutual relationships between Bayesian statistics and
social sciences, including demography, dating back to the seminal work of Thomas
Bayes and Pierre-Simon de Laplace in the late eighteenth century (Courgeau, 2012,
see also Foreword to this book). A thorough introduction to Bayesian statistics is
beyond the scope of this book, but more specific details on Bayesian inference and
applications in social sciences can be found in some of the excellent textbooks and
reference works (Lynch, 2007; Gelman et al., 2013; Bryant & Zhang, 2018), while
the use of Bayesian methods in demography was reviewed in Bijak and Bryant (2016).
The Bayesian approach is especially well-suited for carrying out a comprehen-
sive analysis of uncertainty in complex computational models, as it can cover vari-
ous sources and forms of error in a coherent way, from the estimation of the models,
to prediction, and ultimately to offering tools for supporting decision making under
uncertainty. In this way, Bayesian inference offers an explicit, coherent description of uncertainty at various levels of analysis (parameters, models, decisions), allows expert judgement to play an important role, especially given the deficiencies of data (which are commonplace in such areas as migration), and can potentially offer a more realistic assessment of uncertainty than traditional methods (Bijak, 2010).
Uncertainty quantification (UQ) – a research area spanning statistics, applied mathematics and computing, which looks into uncertainty and inference in large, and possibly analytically intractable, computational models – has seen rapid development since the early twenty-first century (O’Hagan, 2013; Smith, 2013; Ghanem et al., 2019). The two key aspects of UQ include propagating the uncertainty through
the model and learning about model parameters from the data (calibration), with the
ultimate aim of quantifying and ideally reducing the uncertainty of model predic-
tions (idem). The rapid development of UQ as a separate area of research, with
distinct methodology, has been primarily motivated by the increase in the number
and importance of studies involving large-scale computational models, mainly in
physical and engineering applications, from astronomy, to weather and climate,
biology, hydrology, aeronautics, geology and nuclear fusion (Smith, 2013), although
with social science applications lagging behind. A recent overview of UQ was offered by Smith (2013), and a selection of specific topics was given detailed treatment in the living reference collection of Ghanem et al. (2019). For the reasons
mentioned before, Bayesian methods, with their coherent probabilistic language for
describing all unknowns, offer natural tools for UQ applications.
The main principles of UQ include a comprehensive description of different
sources of uncertainty (error) in computational models of the complex systems
under study, and inference about the properties of these systems on that basis. To do
that, it relies on specific methods from other areas of statistics, mathematics and
computing, which are tailored to the UQ problems. These methods, to a large extent,
rely on the use of meta-models (or emulators, sometimes also referred to as surro-
gate models) to approximate the dynamics of the complex computational models,
and facilitate other uses. Specific methods that have an important place in UQ
include uncertainty analysis, which looks at how uncertainty is propagated through
the model, and sensitivity analysis, which aims to assess which elements of the
model and, in particular, which parameters matter for the model outputs (Oakley &
O’Hagan, 2002). In addition, for models with predictive ambitions, methods for calibrating them to the observed data become crucially important (Kennedy & O’Hagan, 2001). We discuss these different groups of methods in more detail in the remainder of this chapter, starting from a general introduction to the area of statistical experimental design, which underpins the construction and calibration of meta-models, and therefore provides foundations for many of the UQ tools and their applications.
5.2 Preliminaries of Statistical Experimental Design

The use of tools of statistical experimental design in the analysis of the results of
agent-based models starts from the premise that agent-based models, no matter how
opaque, are indeed experiments. By running the model at different parameter values
and with different settings – that is, experimenting by repeated execution of the
model in silico (Epstein & Axtell, 1996) – we learn about the behaviour of the
model, and hopefully the underlying system, more than would be possible other-
wise. This is especially important given the sometimes very complex, non-
transparent and analytically intractable nature of many computational simulations.
Throughout this chapter, we will define an experiment as a process of measuring
a “stochastic response corresponding to a set of … input variables” (Santner et al.,
2003, p. 2). A computer experiment is a special case, based on a mathematical the-
ory, implemented by using numerical methods with appropriate computer hardware
and software (idem). Potential advantages of computer experiments include their
built-in features, such as replicability, relatively high speed and low cost, as well as
their ability to analyse large-scale complex systems. Whereas the quality standards
of natural experiments are primarily linked to the questions of randomisation (as in
randomised control trials), blocking of similar objects to ensure homogeneity, and
replication of experimental conditions, computer experiments typically rely on
deterministic or stochastic simulations, and require transparency and thorough doc-
umentation as minimum quality standards (idem).
Computer experiments also differ from traditional, largely natural experiments in their wider applicability, including to social and policy questions, and in their different ethical implications compared with experiments requiring direct human participation. In some social contexts, other experiments would not be possible or ethical. For example,
analysing optimal ways of evacuating people facing immediate danger (such as fire
or flood), very important for tailoring operational response, cannot involve live
experiments in actual dangerous conditions. In such cases, computer experiments
can provide invaluable insights into the underlying processes, possibly coupled with
ethically sound natural experiments carried out in safe conditions, for example on
the ways large groups of people navigate unknown landscapes.
To make the most of computer experiments, their appropriate planning and design become key. To maximise our information gains from experimentation, which typically comes at a considerable computational cost (as measured in computing time), we need to know at which parameter values and with which settings the models need to be run. The modern statistical theory and practice
of experimental design dates back to the agricultural work of Sir Ronald Fisher
(1926), with the methodological foundations fully laid out, for example, in the
much-cited works of Fisher (1935/1958) and Cox (1958/1992). Since then, the
design of experiments has been the subject of many refinements and extensions,
with applications specifically relevant for analysing computer models discussed in
Santner et al. (2003) and Fang et al. (2006), among others.
The key objectives of the statistical design of experiments are to help understand
the relationship between the inputs and the outcome (response), and to maximise
information gain from the experiments – or to minimise the error – under computa-
tional constraints, such as time and cost of conducting the experiments. The addi-
tional objectives may include aiding the analytical aims listed before, such as the
uncertainty or sensitivity analysis, or model-based prediction.
As for the terminology, throughout this chapter we use the following definitions,
based on the established literature conventions. Most of these definitions follow the
conventions presented in the Managing Uncertainty in Complex Models online
compendium (MUCM, 2021).
Model (simulator) “A representation of some real-world system, usually imple-
mented as a computer program” (MUCM, 2021), which is transforming inputs
into outputs;
Factor (input) “A controllable variable of interest” (Fang et al., 2006, p. 4), which
can include model parameters or other characteristics of model specification.
Response (output) A variable representing “specific properties of the real system”
(Fang et al., 2006, p. 4), which are of interest to the analyst. The output is a result
of an individual run (implementation) of a model for a given set of inputs.
Calibration The analytical process of “adjusting the inputs so as to make the simu-
lator predict as closely as possible the actual observation points” (MUCM, 2021);
Calibration parameter “An input which has … a single best value” with respect to
the match between the model output and the data (reality), and can be therefore
used for calibration (MUCM, 2021);
Model discrepancy (inadequacy) The residual difference between the observed
reality and the output calibrated at the best inputs (calibration parameters);
Meta-model (emulator, surrogate) A statistical or mathematical model of the
underlying complex computational model. In this chapter, we will mainly look at
statistical emulators.
Fig. 5.1 Concepts of the model discrepancy (left), design (middle) and training sample (right).
For the discrepancy example, the real process (solid line) is f(x) = 1.2 sin(8πx), and the model
(dashed line) is a polynomial of order 6, fitted by using ordinary least squares. The calibration
parameters are then the coefficients of the polynomial, and the model discrepancy is the difference
between the values of the two functions. (Source: own elaboration)
Design “A choice of the set of points in the space of simulator inputs at which the
simulator is run” (MUCM, 2021), and which then serve as the basis for model
analysis;
Training sample Data comprising inputs from the design space, as well as the
related outputs, which are used to build and calibrate an emulator for subsequent
use in the analysis.
The diagrams in Fig. 5.1 illustrate the concepts of model discrepancy, design and
training sample.
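The discrepancy example from Fig. 5.1 can be reproduced in a few lines of Python (a sketch assuming an evaluation grid of 201 equally spaced points on [0, 1], which is not specified in the original figure):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)          # evaluation grid on [0, 1]
real = 1.2 * np.sin(8 * np.pi * x)      # the 'real' process from Fig. 5.1

coeffs = np.polyfit(x, real, deg=6)     # calibration parameters: OLS-fitted
model = np.polyval(coeffs, x)           # polynomial of order 6 (the model)

discrepancy = real - model              # model discrepancy at each grid point
```

Because a polynomial of order 6 cannot follow four full periods of a sine wave, the discrepancy remains large everywhere, which is exactly the point the figure makes: calibration at the best inputs does not remove model inadequacy.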
There are different types of design spaces, which are briefly presented here fol-
lowing their standard description in the selected reference works (Cox, 1958/1992;
Santner et al., 2003; Fang et al., 2006). To start with, a factorial design is based on
combinations of design points at different levels of various inputs, which in practice
means being a subset of a hyper-grid in the full parameter space, conventionally
with equidistant spacing between the grid points for continuous variables. As a spe-
cial case, the full factorial design includes all combinations of all possible levels
of all inputs, whereas a fractional factorial design can be any subset of the full
design. Due to practical considerations, and the ‘combinatorial explosion’ of the
number of possible design points with the increasing number of parameters, limit-
ing the analysis to a fractional factorial design, for the sake of efficiency, is a prag-
matic necessity.
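In code, a full factorial design is simply the Cartesian product of the factor levels, and a fractional design any subset of it. A minimal Python sketch, with hypothetical factor names and levels:

```python
import itertools
import random

# hypothetical factors, three levels each
levels = {
    "p_transfer": [0.1, 0.5, 0.9],
    "n_agents": [100, 500, 1000],
    "error_sd": [0.01, 0.05, 0.1],
}

# full factorial design: all combinations of all levels of all inputs
full = list(itertools.product(*levels.values()))
print(len(full))  # 3 * 3 * 3 = 27 design points

# fractional factorial design: here, a simple random subset of the full design
random.seed(1)
fraction = random.sample(full, k=9)
```

The 'combinatorial explosion' is visible already here: adding a fourth three-level factor would triple the full design to 81 points, while the fraction can be kept at a fixed computational budget.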
There are many ways in which fractional factorial designs can be constructed.
One option involves random design, with design points randomly selected from the
full hyper-grid, e.g. by using simple random sampling, or – more efficiently – strati-
fied sampling, with the hyper-grid divided into several strata in order to ensure good
coverage of different parts of the parameter space. An extension of the stratified
design is the Latin Hypercube design – a multidimensional generalisation of a two-
dimensional idea of a Latin Square, where only one item can be sampled from each
row and each column, similarly to a Sudoku puzzle. In the multidimensional case,
only one item can be sampled for each level in every dimension; that is, for every
input (idem).
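A Latin Hypercube sample on the unit hypercube can be generated with a short routine (a minimal numpy sketch; dedicated library implementations offer optimised, space-filling variants): each dimension is divided into n strata, and a random permutation places exactly one point in each stratum of every input:

```python
import numpy as np

def latin_hypercube(n_points, n_dims, rng):
    """Draw a Latin Hypercube sample on the unit hypercube [0, 1)^d."""
    u = rng.random((n_points, n_dims))        # jitter within each stratum
    sample = np.empty((n_points, n_dims))
    for j in range(n_dims):
        # a random permutation assigns one point to each of the n strata,
        # mirroring the 'one item per row and column' Latin Square rule
        strata = rng.permutation(n_points)
        sample[:, j] = (strata + u[:, j]) / n_points
    return sample

design = latin_hypercube(10, 3, np.random.default_rng(0))
# each column has exactly one point in each stratum [k/10, (k+1)/10)
```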
Fig. 5.2 Examples of a full factorial (left), fractional factorial (middle), and a space-filling Latin
Hypercube design (right). (Source: own elaboration)
Fig. 5.3 Visualisation of a transposed Definitive Screening Design matrix D′ for 17 parameters. Black squares correspond to high parameter values (+1), white to low ones (−1), and grey to middle ones (0). (Source: own elaboration)
learning side, these approaches also have common features with support vector
machines (Tipping, 2001). As the ARD and SBL methods are quite involved, we do
not discuss them here in more detail, but a fuller treatment of some of the related
approaches can be found, for example, in Neal (1996).
5.3 Analysis of Experiments: Response Surfaces and Meta-Modelling
There are several ways in which the results of complex computational experiments
can be analysed. The two main types of analysis, linking to different research objec-
tives, include explanation of the behaviour of the systems being modelled, as well
as the prediction of this behaviour outside of the set of observed data points. In this
chapter, broadly following the framework of Kennedy and O’Hagan (2001), we
look specifically at four types of explanations:
• Response of the model output to changes in inputs, both descriptive and
model-based.
• Sensitivity analysis, aimed at identifying the inputs which influence the changes
in output.
• Uncertainty analysis, describing the output uncertainty induced by the uncer-
tain inputs.
• Calibration, aimed at identifying a combination of inputs, for which the model
fits the observed data best, by optimising a set of calibration parameters (see
Sect. 5.2).
Notably, Kleijnen (1995) argued that these types of analysis (or equivalent ones)
also serve an internal modelling purpose, which is model validation, here under-
stood as ensuring “a satisfactory range of accuracy consistent with the intended
application of the model” (Sargent, 2013: 12). This is an additional model quality
requirement beyond a pure code verification, which is aimed at ensuring that “the
computer program of the computerized model and its implementation are correct”
(idem). In other words, carrying out different types of explanatory analysis, ideally
together, helps validate the model internally – in terms of inputs and outputs – as
well as externally, in relation to the data. Different aspects of model validation are
reviewed in a comprehensive paper by Sargent (2013).
At the same time, throughout this book we interpret prediction as a type of analysis involving both interpolation between the observed sample points and extrapolation beyond the domain delimited by the training sample. Extrapolation
comes with obvious caveats related to going beyond the range of training data, espe-
cially in a multidimensional input space. Predictions can also serve the purpose of
model validation, both out-of-sample, by assessing model errors on new data points,
outside of the training sample, as well as in-sample (cross-validation), on the same
Fig. 5.4 Examples of piecewise-linear response surfaces: a 3D graph (left) and contour plot
(right). (Source: own elaboration)
1 It is worth noting that, according to Cressie (1990), similar methods were independently proposed already in the 1940s by Herman Wold, Andrey Nikolaevich Kolmogorov and Norbert Wiener.
Kennedy and O’Hagan (2001) and Oakley and O’Hagan (2002), presenting the con-
struction and estimation of Bayesian GP emulators.
The basic description of the GP emulation approach, presented here after
Kennedy and O’Hagan (2001, 431–434), is as follows. Let the (multidimensional)
model inputs x from the input (parameter) space X, x ∈ X, be mapped onto a one-
dimensional output y ∈ Y, by the means of a function f, such that y = f(x). The func-
tion f follows a GP distribution, if “for every n = 1, 2, 3, …, the joint distribution of
f(x1), …, f(xn) is multivariate normal for all x1, …, xn ∈ X” (idem: 432). This distri-
bution has a mean m, typically operationalised as a linear regression function of
inputs or their transformations h(⋅), such that m(x) = h(x)’ β, with some regression
hyperparameters β. The GP covariance function includes a common variance term
across all inputs, σ², as well as a non-negative definite correlation matrix between inputs, c(⋅,⋅). The GP model can therefore be formally written as:

f(⋅) | β, σ², R ~ MVN(m(⋅); σ² c(⋅,⋅)) (5.1)
The correlation matrix c(⋅,⋅) can be parameterised, for example, based on the
distances between the input points, with a common choice of c(x1, x2) =
c(x1 – x2) = exp(−(x1 – x2)’ R (x1 – x2)), with a roughness matrix R = diag(r1, …, rn),
indicating the strength of response of the emulator to particular inputs. To reflect the
uncertainty of the computer code, the matrix c(⋅,⋅) can additionally include a sepa-
rate variance term, called a nugget. Kennedy and O’Hagan (2001) discuss in more
detail different options of model parameterisation, choices of priors for model
parameters, as well as the derivation of the joint posterior, which then serves to cali-
brate the model given the data. We come back to some of these properties in Sect.
5.5, devoted to model calibration.
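The mechanics of GP prediction can be sketched in a few lines (a deliberately simplified illustration of the formulation above: the mean function is set to zero and the hyperparameters σ², R are fixed rather than given priors and estimated, unlike in the full Bayesian treatment of Kennedy and O'Hagan):

```python
import numpy as np

def correlation(x1, x2, roughness):
    """c(x1 - x2) = exp(-r (x1 - x2)^2) for one-dimensional inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-roughness * d ** 2)

# training sample: runs of a toy 'simulator' f(x) = sin(2*pi*x)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train)

r, nugget = 25.0, 1e-8                  # fixed roughness and code-error term
K = correlation(x_train, x_train, r) + nugget * np.eye(len(x_train))

x_new = np.array([0.25, 0.77])
k_new = correlation(x_new, x_train, r)

# zero-mean GP posterior (emulator) mean: k(x*, X) K^{-1} y
y_pred = k_new @ np.linalg.solve(K, y_train)
```

With a near-zero nugget the emulator interpolates the training points almost exactly, while between them the predictions are smooth weighted averages of the nearby outputs, with weights governed by the roughness matrix.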
In addition to the basic approach presented above, many extensions and generali-
sations have been developed as well. One such extension concerns GP meta-models
with heteroskedastic covariance matrices, allowing emulator variance to differ
across the parameter space. This is especially important in the presence of phase
transitions in the model domain, whereby model behaviour can be different, depend-
ing on the parameter combinations. This property can be modelled for example by
fitting two GPs at the same time: one for the mean, and one for the (log) variance of
the output of interest. Examples of such models can be found in Kersting et al.
(2007) and Hilton (2017), while the underpinning design principles are discussed in
more detail in Tack et al. (2002).
Another extension concerns multidimensional outputs, where we need to look at
several output variables at the same time, but cannot assume independence between
them. Among the ideas that were proposed to tackle that, there are natural generali-
sations, such as the use of multivariate emulators, notably multivariate GPs (e.g.
Fricker et al., 2013). Alternative approaches include dimensionality reduction of the
output, for example through carrying out the Principal Components Analysis (PCA),
producing orthogonal transformations of the initial output, or Independent
Component Analysis (ICA), producing statistically independent transformations
Box 5.2: Gaussian Process Emulator Construction for the Routes and
Rumours Model
The design space with seven parameters of interest, described in Box 5.1, was used to train and fit a set of four GP emulators, one for each output. The emulation was done twice, assuming that the parameters are either uniformly or
normally distributed. The emulators for all four output variables (mean_freq_
plan, stdd_link_c, corr_opt_links and prop_stdd) additionally included code
uncertainty, described by the ‘nugget’ variance term. The fitting was done in
GEM-SA (Kennedy & Petropoulos, 2016). In terms of the quality of fit, the
root mean square standardised errors (RMSSE) were found to be in the range
between 1.59 for mean_freq_plan and 1.95 for stdd_link_c, based on a
leave-20%-out cross-validation exercise, which, compared with the ideal
value of 1, indicated a reasonable fit quality. Figure 5.5 shows an example
analysis of a response surface and its error for one selected output, mean_
freq_plan, and two inputs, p_transfer_info and p_info_contacts, based on the
fitted emulator. Similar figures for the other outputs are included in Appendix
C. For this piece of analysis, all the input and output variables have been
standardised.
Fig. 5.5 Estimated response surface of the proportion of time the agents follow a plan vs two input
parameters, probabilities of information transfer and of communication with contacts: mean pro-
portion (top) and its standard deviation (bottom). (Source: own elaboration)
5.4 Uncertainty and Sensitivity Analysis

Once fitted, emulators can serve a range of analytical purposes. The most immediate
ones consider the impact of various model inputs on the output (response). Questions
concerning the uncertainty of the output and its susceptibility to the changes in
inputs are common. To address these questions, uncertainty analysis looks at how
much error gets propagated from the model inputs into the output, and sensitivity
analysis deals with how changes in individual inputs and their different combina-
tions affect the response variable.
Of the two types of analysis, uncertainty analysis is more straightforward, espe-
cially when it is based on a fitted emulator such as a GP (5.1), or another meta-
model. Here, establishing the output uncertainty typically requires simulating from
the assumed distributions for the inputs and from posterior distributions of the emu-
lator parameters, which then get propagated into the output, allowing a Monte
Carlo-type assessment of the resulting uncertainty. For simpler models, it may be
also possible to derive the output uncertainty distributions analytically.
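Schematically, Monte Carlo propagation amounts to sampling the inputs from their assumed distributions, pushing the samples through the (meta-)model and summarising the output distribution. A sketch, with an arbitrary closed-form function standing in for a fitted emulator:

```python
import numpy as np

rng = np.random.default_rng(42)

# stand-in for the posterior mean of a fitted emulator with two inputs
def emulator_mean(a, b):
    return np.sin(a) + 0.5 * b ** 2

# assumed (prior) input distributions
a = rng.normal(loc=1.0, scale=0.2, size=100_000)
b = rng.uniform(0.0, 1.0, size=100_000)

output = emulator_mean(a, b)            # uncertainty propagated to the output

# Monte Carlo summary of the resulting output uncertainty
mean = output.mean()
lo, hi = np.quantile(output, [0.025, 0.975])
```

In a full treatment, the emulator parameters would also be sampled from their posterior distributions, so that both input uncertainty and emulator (code) uncertainty are reflected in the interval.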
On the other hand, the sensitivity analysis involves several options, which need
to be considered by the analyst to ascertain the relative influence of input variables.
Specifically for agent-based models, ten Broeke et al. (2016) discussed three lines
of enquiry, to which sensitivity analysis can contribute. These include insights into
mechanisms generating the emergent properties of models, robustness of these
insights, and quantification of the output uncertainty depending on the model inputs
(ten Broeke et al., 2016: 2.1).
Sensitivity analysis can also come in many guises. Depending on the subset of
the parameter space under study, one can distinguish local and global sensitivity
analysis. Intuitively, the local sensitivity analysis looks at the changes of the
response surfaces in the neighbourhoods of specific points in the input space, while
the global analysis examines the reactions of the output across the whole space (as
long as an appropriate, ideally space-filling design is selected). Furthermore, sensi-
tivity analysis can be either descriptive or variance-based, and either model-free or
model-based, the latter involving approaches based on regression and other meta-
models, such as GP emulators.
The descriptive approaches to evaluating output sensitivity typically involve
graphical methods: the visual assessment (‘eyeballing’) of response surface plots
(such as in Fig. 5.4), correlations and scatterplots can provide first insights into the
responsiveness of the output to changes in individual inputs. In addition, some of
the simple descriptive methods can be also model-based, for example those using
standardised regression coefficients (Saltelli et al., 2000, 2008). This approach
relies on estimating a linear regression model of an output variable y based on all standardised inputs, z_ij = (x_ij − x̄_i)/σ_i, where x̄_i and σ_i are the mean and standard deviation of the ith input, calculated over all design points j. Having estimated a regression model on the whole design space Z = {(z_ij, y_j)}, we can subsequently compare the absolute values of the estimated coefficients to infer the relative influence of their corresponding inputs on the model output.
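This procedure reduces to an ordinary least squares fit on standardised inputs. A sketch with a synthetic design and a hypothetical response, in which the second input is constructed to be the most influential once the differing input scales are removed:

```python
import numpy as np

rng = np.random.default_rng(7)

# synthetic design: 200 points, three inputs on very different scales
X = rng.random((200, 3)) * np.array([1.0, 10.0, 100.0])
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.001 * X[:, 2] + rng.normal(0, 0.1, 200)

# standardise each input: z_ij = (x_ij - mean_i) / sd_i
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# OLS fit with an intercept; |coefficients| rank the inputs' influence
A = np.column_stack([np.ones(len(Z)), Z])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
importance = np.abs(coef[1:])
```

On the raw scale the third input has the largest coefficient range, but after standardisation the second input dominates, which is precisely why the standardisation step matters.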
Variance-based approaches, in turn, aim at assessing how much of the output
variance is due to the variation in individual inputs and their combinations. Here
again, both model-free and model-based approaches exist, which differ in terms of
whether the variance decomposition is analysed directly, based on model inputs and
outputs, or whether it is based on some meta-model that is fitted to the data first. As
observed by Ginot et al. (2006), one of the simplest, although seldom used, methods here is the analysis of variance (ANOVA), coupled with the factorial design. Here,
as in the classical ANOVA approach, the overall sum of squared differences between
individual outputs and their mean value can be decomposed into the sums of squares
related to all individual effects (inputs), plus a residual sum of squares (Ginot et al.,
2006). This approach offers a quick approximation of the relative importance of the
various inputs.
The state-of-the-art approaches, however, are typically based on the decomposition of variance and on so-called Sobol’ indices. Both in model-free and model-based approaches, the template for the analysis is the same. Formally, let the overall output variance in a model with K inputs be denoted by V = Var[f(x)]. Let us then define the sensitivity variances for individual inputs i and all their multi-way combinations, denoted by V_i, V_ij, …, V_12…K. These sensitivity variances measure by how much the overall variance V would reduce if we observed particular sets of inputs, x_i, {x_i, x_j}, …, {x_1, x_2, …, x_K}, respectively. Formally, the sensitivity variances can be defined as V_S = V − E{Var[f(x) | x_S = x*_S]}, where S denotes any non-empty set of individual inputs and their combinations. The overall variance V can then be additively decomposed into terms corresponding to the inputs and their respective combinations (e.g. Saltelli et al., 2000: 381):

V = Σ_i V_i + Σ_{i<j} V_ij + … + V_12…K
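For a toy additive function with independent uniform inputs, the first-order sensitivity variances can be estimated by brute-force Monte Carlo, conditioning on one input at a time (a sketch; in practice, more efficient Sobol’-sequence estimators are used):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2):
    # additive test function: analytically, V1/V = 1/5 and V2/V = 4/5
    return x1 + 2.0 * x2

n, m = 2000, 2000
x1 = rng.random((n, 1))        # outer samples of the conditioned input
x2 = rng.random((1, m))        # inner samples of the remaining input
Y = f(x1, x2)                  # n x m grid of model evaluations

V = Y.var()                    # overall output variance
# Var of E[f | x_i], equal to V - E{Var[f | x_i]} by the law of total variance
V1 = Y.mean(axis=1).var()
V2 = Y.mean(axis=0).var()

S1, S2 = V1 / V, V2 / V        # first-order Sobol' indices
```

Because the test function is purely additive, the two first-order indices sum to (approximately) one, with no variance left for the interaction term.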
approaches, which can be embedded within the Bayesian decision analysis, cou-
pling the estimates with loss functions related to specific outputs (idem).
In addition to the methods for global sensitivity analysis, local methods may
include evaluating partial derivatives of the output function f(⋅) – or its emulator – in
the interesting areas of the parameter space (Oakley & O’Hagan, 2004). In practice,
this is often done by the means of a ‘one-factor-at-a-time’ method, where one of the
model inputs is varied, while others are kept fixed (ten Broeke et al., 2016). This
approach can help identify the type and shape of one-way relationships (idem). In
terms of a comprehensive treatment of the various aspects of sensitivity analysis, a
detailed overview and discussion can be found in Saltelli et al. (2008), while a fully
probabilistic treatment, involving Bayesian GP emulators, can be found in Oakley
and O’Hagan (2004). In the context of agent-based models, ten Broeke et al. (2016)
have provided additional discussion and interpretations, while applications to demo-
graphic simulations can be found for example in Bijak et al. (2013) and Silverman
et al. (2013).
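The ‘one-factor-at-a-time’ idea can be sketched as a central finite-difference approximation of the partial derivatives at a base point of interest (with an arbitrary function standing in for the model or its emulator):

```python
import numpy as np

def model(x):
    # stand-in for an emulated response surface with three inputs
    return np.sin(x[0]) + x[1] ** 2 + 0.1 * x[2]

base = np.array([0.5, 0.5, 0.5])   # point of interest in the input space
h = 1e-5                           # finite-difference step size

gradient = np.zeros(len(base))
for i in range(len(base)):
    step = np.zeros(len(base))
    step[i] = h
    # vary one input at a time, keeping all others fixed at the base point
    gradient[i] = (model(base + step) - model(base - step)) / (2 * h)
```

The resulting gradient gives the local sensitivities at the base point only; unlike the global variance-based indices above, it says nothing about behaviour elsewhere in the parameter space.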
To illustrate some of the key concepts, the example of the model of migration
routes is continued in Box 5.3 (with further details in Appendix C). This example
summarises results of the uncertainty and global variance-based sensitivity analysis,
based on the fitted GP emulators.
Box 5.3: Uncertainty and Sensitivity of the Routes and Rumours Model
In terms of the uncertainty of the emulators presented in Box 5.2, the fitted
variance of the GPs for standardised outputs, representing the uncertainty
induced by the input variables and the intrinsic randomness (nugget) of the
stochastic model code, ranged from 1.14 for mean_freq_plan, to 1.50 for
stdd_link_c, to 1.65 for corr_opt_links. The nugget terms were equal to 0.009, 0.020 and 0.019, respectively. For the cross-replicate output variable, prop_stdd, the
variances were visibly higher, with 4.15 overall and 0.23 attributed to the
code error.
As for the sensitivity analysis, for all four outputs the parameters related to
information exchange proved most relevant, especially the probability of
exchanging information through communication, as well as the information
error – a finding that was largely independent of the priors assumed for the
parameters (Fig. 5.6). In no case did the parameters related to exploration matter much.
Fig. 5.6 Variance-based sensitivity analysis: variance proportions associated with individual variables and their interactions, under different priors, for the outputs mean_freq_plan, stdd_link_c, corr_opt_links and prop_stdd. (Source: own elaboration)

5.5 Bayesian Methods for Model Calibration
Emulators, such as the GPs introduced in Sect. 5.3, can serve as tools for calibrating
the underlying complex models. There are many ways in which this objective can
be achieved. Given that the emulators can be built and fitted by using Bayesian
methods, a natural option for calibration is to utilise full Bayesian inference about
the distributions of inputs and outputs based on data (Kennedy & O’Hagan 2001;
Oakley & O’Hagan, 2002; MUCM, 2021). Specifically in the context of agent-
based models, various statistical methods and aspects of model analysis are also
reviewed in Banks and Norton (2014) and Heard et al. (2015).
88 5 Uncertainty Quantification, Model Calibration and Sensitivity
The fully Bayesian approach proposed by Kennedy and O’Hagan (2001) focuses
on learning about the calibration parameters θ of the model or, for complex models,
its emulator, based on data. Such parameters are given prior assumptions, which are
subsequently updated based on observed data to yield calibrated posterior distribu-
tions. However, as mentioned in Sect. 5.3, even at the calibrated values of the input
parameters, model discrepancy – a difference between the model outcomes and obser-
vations – remains, and needs to be formally acknowledged too. Hence, the general
version of the calibration model for the underlying computational model (or meta-
model) f based on the training sample x and the corresponding observed data z(x), has
the following form (Kennedy & O’Hagan, 2001: 435; notation after Hilton, 2017):
z(x) = ρ f(x, θ) + δ(x) + ε(x). (5.3)
In this model, δ(x) represents the discrepancy term, ε(x) is the residual observa-
tion error, and ρ is the scaling constant. GPs are the conventional choices of priors
both for f(x, θ) and δ(x). For the latter term, the informative priors for the relevant
parameters typically need to be elicited from domain experts in a subjective
Bayesian fashion, to avoid problems with the non-identifiability of both GPs (idem).
The calibrated model (5.3) can be subsequently used for prediction, and also for
carrying out additional uncertainty and sensitivity checks, as described before.
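To make the structure of model (5.3) concrete, the sketch below generates synthetic observations z(x) from an assumed simulator f, a discrepancy δ(x) drawn from a GP prior with a squared-exponential covariance, and i.i.d. observation error ε(x). All functions and numerical values are illustrative assumptions, not those of Kennedy and O’Hagan (2001):

```python
import numpy as np

rng = np.random.default_rng(1)

def sq_exp_cov(x, var=0.05, length=0.3):
    # squared-exponential covariance for the discrepancy GP prior
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def f(x, theta):
    # illustrative stand-in for the simulator (or its emulator)
    return np.sin(theta * x)

x = np.linspace(0.0, 1.0, 50)
theta_true, rho, sigma_eps = 3.0, 1.1, 0.05

cov = sq_exp_cov(x) + 1e-9 * np.eye(x.size)             # jitter for stability
delta = rng.multivariate_normal(np.zeros(x.size), cov)  # discrepancy delta(x)
eps = rng.normal(0.0, sigma_eps, size=x.size)           # observation error
z = rho * f(x, theta_true) + delta + eps                # model (5.3)
```

Calibration then proceeds in the opposite direction: θ is treated as unknown, and its prior is updated given the pairs (x, z), with GP priors placed on both f and δ.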
Existing applications to agent-based models of demographic or other social pro-
cesses are scarce, with the notable exception of the analysis of a demographic micro-
simulation model of population dynamics in the United Kingdom, presented by
Hilton (2017), and, more recently, an analysis of ecological demographic models, as
well as epidemiological ‘compartment’ models discussed by Hooten et al. (2021).
Emulator-based and other more involved statistical approaches are especially
applicable wherever the models are too complex and their parameter spaces have
too many dimensions to be treated, for example, by using simple Monte Carlo algo-
rithms. In such cases, besides GPs or other similar emulators, several other
approaches can be used as alternative or complementary to the fully Bayesian infer-
ence. We briefly discuss these next. Detailed explanations of these methods are
beyond the scope of this chapter, but can be explored further in the references (see
also Hooten et al., 2020 for a high-level overview, with a slightly different emphasis).
• Approximate Bayesian Computation (ABC). This method relies on sampling
from the prior distributions for the parameters of a complex model, comparing the
resulting model outputs with actual data, and rejecting those samples for which the
difference between the outputs and the data exceeds a pre-defined threshold. As
the method does not involve evaluating the likelihood function, it can be computa-
tionally less costly than alternative approaches, although it can very quickly
become inefficient in many-dimensional parameter spaces. The theory underpin-
ning this approach dates to Tavaré et al. (1997), with more recent overviews offered
in Marin et al. (2012) and Sisson et al. (2018). Applications to calibrating agent-
based models in the ecological context were discussed by van der Vaart et al. (2015).
• Bayes linear methods, and history matching. In this approach, the emulator is
specified in terms of the two first moments (mean and covariance function) of the
output function, and a simplified (linear) Bayesian updating is used to derive the
expected posterior moments given the model inputs and outputs from the train-
ing sample, under the squared error loss (Vernon et al., 2010). Once built, the
emulator is fitted to the observed empirical data by comparing them with the
model outputs by using measures of implausibility, in an iterative process known
as history matching (idem). For many practical applications, especially those
involving high-dimensional parameter spaces, the history matching approach
is computationally more efficient than the fully Bayesian approach of Kennedy
and O’Hagan (2001), although at the expense of providing an approximate solu-
tion (for more detailed arguments, see e.g. the discussion of Vernon et al., 2010,
or Hilton, 2017). Examples of applying these methods to agent-based approaches
include a model of HIV epidemics by Andrianakis et al. (2015), as well as mod-
els of a demographic simulation and fertility developments in response to labour
market changes (the so-called Easterlin effect) by Hilton (2017).
• Bayesian melding. This approach ‘melds’ two types of prior distributions for the
model output variable: ‘pre-model’, set for individual model inputs and param-
eters and propagated into the output, and ‘post-model’, set directly at the level of
the output. The two resulting prior distributions for the output are weighted (lin-
early or logarithmically) by being assigned weights a and (1–a), respectively,
and the posterior distribution is calculated based on such a weighted prior. The
underpinning theory was proposed by Raftery et al. (1995) and Poole and Raftery
(2000). In a recent extension, Yang and Gua (2019) proposed treating the pooling
parameter a as another hyper-parameter of the model, which is also subject to
estimation through the means of Bayesian inference. An example of an applica-
tion of Bayesian melding to an agent-based modelling of transportation can be
found in Ševčíková et al. (2007).
• Polynomial chaos. This method, originally stemming from applied mathematics
(see O’Hagan, 2013), uses polynomial approximations to model the mapping
between model inputs and outputs. In other words, the output is modelled as a
function of inputs by using a series of polynomials with individual and mixed
terms, up to a specified degree. The method was explained in more detail from
the point of view of uncertainty quantification in O’Hagan (2013), where it was
also compared with GP-based emulators. The conclusion of the comparison was
that, albeit computationally promising, polynomial chaos does not (yet) account
for all different sources of uncertainty, which calls for closer communication
between the applied mathematics and statistics/uncertainty quantification com-
munities. A relevant example, using polynomial chaos in an agent-based model
of a fire evacuation, was offered by Xie et al. (2014).
• Recursive Bayesian approach. This method, designed by Hooten et al. (2019,
2020), aims to make full use of the natural Bayesian mechanism for sequential
updating in the context of time series or similar processes, whereby the posterior
distributions of the parameters of interest are updated one observation at a time.
The approach relies on a recursive partition of the posterior for the whole series
into a sequence of sub-series of different lengths (Hooten et al. 2020), which can
be computed iteratively. The computational details and the choice of appropriate
sampling algorithms were discussed in more detail in Hooten et al. (2019).
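A minimal rejection sketch of the ABC scheme described above, with an assumed toy ‘simulator’ (a normal sample whose mean is the unknown parameter) and the sample mean as the summary statistic:

```python
import numpy as np

rng = np.random.default_rng(2)

# 'observed' data, generated here from an assumed true mean of 2.0
obs = rng.normal(2.0, 1.0, size=200)
obs_summary = obs.mean()

n_draws, tol = 20_000, 0.1
mu = rng.uniform(-5.0, 5.0, size=n_draws)       # draws from the prior
sims = rng.normal(mu[:, None], 1.0, size=(n_draws, 200))
dist = np.abs(sims.mean(axis=1) - obs_summary)  # distance between summaries
accepted = mu[dist < tol]                       # approximate posterior sample
```

The accepted draws approximate the posterior distribution of the mean; shrinking the tolerance sharpens the approximation, at the cost of rejecting more samples.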
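The implausibility measure at the heart of history matching can be sketched as follows; the emulator mean and the variance components below are illustrative numbers, not values from Vernon et al. (2010):

```python
import numpy as np

def implausibility(z_obs, em_mean, em_var, disc_var, obs_var):
    """History-matching implausibility: standardised distance between the
    observation and the emulator prediction, allowing for emulator,
    discrepancy and observation uncertainty."""
    return np.abs(z_obs - em_mean) / np.sqrt(em_var + disc_var + obs_var)

theta = np.linspace(0.0, 10.0, 101)      # candidate input values
em_mean = (theta - 4.0) ** 2             # illustrative emulator mean
em_var = np.full_like(theta, 0.5)        # emulator (code) uncertainty
z_obs, disc_var, obs_var = 1.0, 0.3, 0.2

I = implausibility(z_obs, em_mean, em_var, disc_var, obs_var)
not_ruled_out = theta[I < 3.0]           # conventional cut-off of 3
```

Inputs with implausibility above the cut-off are ruled out; the emulator is then refitted over the remaining region, and the process iterates.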
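For Bayesian melding, the logarithmic pooling of the ‘pre-model’ and ‘post-model’ priors for the output can be sketched numerically on a grid; both densities and the weight a = 0.5 are illustrative assumptions:

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

grid = np.linspace(-10.0, 10.0, 2001)
dx = grid[1] - grid[0]

pre_model = normal_pdf(grid, 0.0, 2.0)   # prior propagated through the model
post_model = normal_pdf(grid, 3.0, 1.0)  # prior set directly on the output

a = 0.5
pooled = pre_model**a * post_model**(1.0 - a)  # logarithmic pooling
pooled /= pooled.sum() * dx                    # renormalise on the grid

pooled_mean = (grid * pooled).sum() * dx
```

For two normal densities, logarithmic pooling yields another normal whose precision is the weighted sum of the two precisions; with the values above, the pooled mean is 2.4.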
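The idea behind polynomial chaos – approximating the input-output mapping by a polynomial series with individual and mixed terms – can be sketched with ordinary monomials and least squares (a full polynomial chaos expansion would instead use orthogonal polynomials matched to the input distributions). The ‘simulator’ below is an illustrative function that the degree-2 basis can represent exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulator(x1, x2):
    # illustrative stand-in for a model output
    return 1.0 + 2.0 * x1 - x2 + 0.5 * x1 * x2

# training design and outputs
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = simulator(X[:, 0], X[:, 1])

def basis(X):
    # polynomial terms up to degree 2, including the mixed term x1*x2
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

coef, *_ = np.linalg.lstsq(basis(X), y, rcond=None)

# surrogate prediction at a new input point
x_new = np.array([[0.3, -0.4]])
y_hat = basis(x_new) @ coef
```

Once fitted, the coefficients provide a cheap surrogate for the simulator, and (for orthogonal bases) sensitivity measures can be read off from them directly.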
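Finally, the principle of one-observation-at-a-time updating can be sketched with a conjugate Beta-Bernoulli model, where the sequential posterior coincides exactly with the batch posterior; the recursive scheme of Hooten et al. (2019, 2020) generalises this idea, via partitions into sub-series, to models without such closed forms:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.binomial(1, 0.7, size=100)   # illustrative 0/1 observations

# sequential conjugate updating: the posterior after each observation
# becomes the prior for the next (Beta(a, b) prior, Bernoulli likelihood)
a, b = 1.0, 1.0
for y in data:
    a += y
    b += 1 - y

# batch posterior for comparison: Beta(1 + sum(y), 1 + n - sum(y))
a_batch = 1.0 + data.sum()
b_batch = 1.0 + len(data) - data.sum()
```

The sequential and batch results agree exactly here, which is what makes conjugate models a useful sanity check for recursive algorithms.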
Fig. 5.7 Calibrated posterior distributions for Routes and Rumours model parameters
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 6
The Boundaries of Cognition and Decision Making
This chapter outlines the role that individual-level empirical evidence gathered from
psychological experiments and surveys can play in informing agent-based models,
and the model-based approach more broadly. To begin with, we provide an over-
view of the way that this empirical evidence can be used to inform agent-based
models. Additionally, we provide three detailed exemplars that outline the develop-
ment and implementation of experiments conducted to inform an agent-based model
of asylum migration, as well as how such data can be used. There is also an extended
discussion of important considerations and potential limitations when conducting
laboratory or online experiments and surveys, followed by a brief introduction to
exciting new developments in experimental methodology, such as gamification and
virtual reality, that have the potential to address some of these limitations and open
the door to promising and potentially very fruitful new avenues of research.
6.1 The Role of Individual-Level Empirical Evidence in Agent-Based Models
Agents are the key feature that distinguish agent-based models from other forms of
micro-simulation. Specifically, within agent-based models, agents can interact with
one another in dynamic and non-deterministic ways, allowing macro-level patterns
and properties to emerge from the micro-level characteristics and interactions within
the model. This key feature of agent-based models means that insights into individual behaviour from psychology and behavioural economics – covering behaviours, personalities, judgements, and decisions – are even more crucial than for other modelling efforts. Within this chapter, we provide an outline as to why it is important to
incorporate insights from the study of human behaviour within agent-based models,
and give examples of the processes that can be used to do this. As in other chapters
within this book, agent-based models of migration are used as an exemplar; however, the information and processes described are applicable to a wide swathe of agent-based models.
Traditionally, many modelling efforts, including agent-based models of demo-
graphic processes, have relied on normative models of behaviour, such as expected
utility theory, and have assumed that agents behave rationally. However, descriptive
models of behaviour, commonly used within psychology and behavioural econom-
ics, provide an alternative approach with a focus on behaviour, judgements, and
decisions observed using experimental and observational methods. There are many
important trade-offs to consider when deciding which approaches to use for an
agent-based model and which level of specificity or detail to use. For example, nor-
mative models may be more likely to be tractable and already formalised, which
gives some key advantages (Jager, 2017). In contrast, many social scientific theories
based on observations from areas such as psychology, sociology, and political sci-
ence may provide much more detailed and nuanced descriptions of how people
behave, but are also more likely to be specified using verbal language that is not
easily formalised. Therefore, to convert these social science theories from verbal
descriptions of empirical results into a form that can be formalised within an agent-
based model requires the modeller to make assumptions (Sawyer, 2004). For exam-
ple, there may be a clear empirical relationship between two variables but the
specific causal mechanism that underlies this relationship may not be well estab-
lished or formalised (Jager, 2017). Similarly, there may be additional variables
within an agent-based model that were not incorporated in the initial theory or
included in the empirical data. In situations such as these, it often falls to the indi-
vidual modeller(s) to make assumptions about how to formalise the theory, provide
formalised causal mechanisms, and extend the theory to incorporate any additional
variables and their potential interactions and impacts.
When it comes to agent-based models of migration, the extent to which empirical
insights from the social sciences are used to add complexity and depth to the agents
varies greatly (e.g., see Klabunde & Willekens, 2016 for a review of decision mak-
ing in agent-based models of migration). Additionally, because migration is a com-
plex process that has wide-ranging impacts, there are many options and areas in
which additional psychological realism can be added to agent-based models. For
example, the personality of the agent is likely to play a role and may be incorporated
through giving each agent a propensity for risk taking. Previous research has shown
that increased tolerance to risk is associated with a greater propensity to migrate
(Akgüç et al., 2016; Dustmann et al., 2017; Gibson & McKenzie, 2011; Jaeger
et al., 2010; Williams & Baláž, 2014), and therefore incorporating this psychologi-
cal aspect within an agent-based model may allow for unique insights to be drawn
(e.g., how different levels of heterogeneity in risk tolerance influence the patterns
formed, or whether risk tolerance matters more in some migration contexts than
others). Additionally, the influence of social networks on migration has been well
established (Haug, 2008) so this is also a key area where there may be benefits to
adding realism to an agent-based model (Klabunde & Willekens, 2016; Gray et al.,
2017). A review of existing models and empirical studies of decision making in the
context of migration is offered by Czaika et al. (2021).
6.2 Prospect Theory and Discrete Choice

As discussed above, models have often assumed that people behave rationally, that is, the way that they ‘should’ behave based on normative models of optimal behaviour.
However, research within psychology and behavioural economics has called many
of these assumptions into question. The most famous example of this is prospect
theory, developed by Kahneman and Tversky (1979) and subsequently updated to
become cumulative prospect theory (Tversky & Kahneman, 1992). Based on empir-
ical data, prospect theory proposes that people deviate from the optimal or rational
approaches because of biases in the way that they translate information from the
objective real-world situation to their subjective internal representations of the
world. This has clear implications for how people subsequently make judgements
and decisions. Some of the specific empirical findings related to judgement and
decision making that are incorporated within prospect theory include loss aversion,
overweighting/underweighting of probabilities, differential responses to risk (risk
seeking for losses and risk aversion for gains), and framing effects.
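The components listed above have standard functional forms in cumulative prospect theory (Tversky & Kahneman, 1992). The sketch below uses the median parameter estimates reported in that paper (α = β = 0.88, λ = 2.25, γ = 0.61); in our application these would be replaced by the values elicited in the migration-context experiment:

```python
import numpy as np

ALPHA, BETA, LAMBDA, GAMMA = 0.88, 0.88, 2.25, 0.61

def value(x):
    """Value function: concave for gains, convex and steeper (by the
    loss-aversion coefficient LAMBDA) for losses."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    v = np.empty_like(x)
    gains = x >= 0
    v[gains] = x[gains] ** ALPHA
    v[~gains] = -LAMBDA * (-x[~gains]) ** BETA
    return v

def weight(p):
    """Inverse-S probability weighting: small probabilities are
    overweighted, moderate-to-large ones underweighted."""
    p = np.asarray(p, dtype=float)
    return p**GAMMA / (p**GAMMA + (1.0 - p) ** GAMMA) ** (1.0 / GAMMA)
```

An agent could then evaluate a 50-50 gamble as weight(0.5) * value(gain) + weight(0.5) * value(-loss), relative to its reference point.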
Prospect theory was also a useful first area in which to conduct experiments to
inform agent-based models of migration because, unlike many other theories of
judgement and decision making based on empirical findings, it is already formalised
and can therefore be implemented more easily within models. Indeed, in previous
work, de Castro et al. (2016) applied prospect theory to agent-based models of
financial markets, contrasting these models with agent-based models in which
agents behaved according to expected utility theory. De Castro et al. (2016) found
that simulations in which agent behaviour was based on prospect theory were a bet-
ter match to real historical market data than when agent behaviour was based on
expected utility theory. Although the bulk of research on prospect theory has focused
on financial contexts (for reviews see Barberis, 2013; Wakker, 2010), there is also
growing experimental evidence that prospect theory is applicable to other contexts.
For example, support for the theory has been found when outcomes of risky deci-
sions are measured in time (Abdellaoui & Kemel, 2014) or related to health such as
the number of lives saved (Kemel & Paraschiv, 2018), life years (Attema et al.,
2013), and quality of life (Attema et al., 2016).
Czaika (2014) applied prospect theory to migration patterns at a macro-level,
finding that the patterns of intra-European migration into Germany were consistent
with several aspects of prospect theory, such as reference dependence, loss aversion,
and diminished sensitivity. However, because this analysis did not collect micro-
level data from individual migrants, it is necessary to assume that the macro-level
patterns observed occur (at least partially) due to individual migrants behaving in a
way that is consistent with prospect theory. This is a very strong assumption, which
risks falling into the trap of the ecological fallacy. At the same time, however, there
are also a variety of studies that have examined risk preferences of both economic
migrants (Akgüç et al., 2016; Jaeger et al., 2010) and migrants seeking asylum
(Ceriani & Verme, 2018; Mironova et al., 2019), and can therefore provide data
about some individual level behaviour, judgments and decisions to inform agent-
based models of migration. Bocquého et al. (2018) extended this line of research
further, using the parametric method of Tanaka et al. (2010) to elicit utility functions
from asylum seekers in Luxembourg, finding that the data supported prospect
theory over expected utility theory. However, these previous studies examining risk
and the application of prospect theory to migration still used standard financial
tasks, rather than collecting data within a migration context specifically.
Based on the broad base of existing empirical support, we decided to apply pros-
pect theory to our agent-based models of migration and therefore designed a dedi-
cated experiment to elicit prospect theory parameters within a migration context.
There are a variety of potential approaches that can be used to elicit prospect theory
parameters (potential issues due to divergent experimental approaches are discussed
in Sect. 6.4). To avoid making a priori assumptions about the shape of the utility
function, we chose to use a non-parametric methodology adapted from Abdellaoui
et al. (2016; methodology presented in Table 6.1). Participants made a series of
choices between two gambles within a financial and a migration context. For each
choice, both gambles presented a potential gain or loss in monthly income (50%
chance of gaining and 50% chance of losing income; see Fig. 6.1 for an example
trial). Using this methodology, we elicited six points of the utility function for gains
and six points for losses. We then analysed the elicited utility functions for financial
and migration decisions to test for loss aversion, whether there was evidence of
concavity for gains and/or convexity for losses, and whether there were differences
between the migration and financial contexts (see Appendix D for more details on
the preregistration of the hypotheses, sample sizes, and ethical issues).
There are many ways that the results from these experiments can be used to
inform agent-based models of migration. The first and perhaps simplest way is to
add loss aversion to the model. Because the data collected were within the context
of relative changes in gains and losses for potential destination countries, these
results can be used within the model to create a distribution of population level loss
aversion, from which each agent is assigned an individual level of loss aversion (to
allow for variation across agents). Therefore, rather than making assumptions about
the extent of loss aversion present within a migration context, instead, each agent
within the model would weight potential losses more heavily than potential gains,
following the empirical findings from the experiment in a migration context.
Similarly, after fitting a function to the elicited points for gains and losses, it is pos-
sible to again use this information to inform the shape of the utility functions that
are given to agents within the model. That is, the data can be used to inform the
extent to which agents place less weight on potential gains and losses as they get
further from the reference point (usually implemented as either the current status
quo or the currently expected outcome). For example, the empirical data inform us
whether people consider a gain of $200 in income to be twice as good as a gain of
$100, or only one and a half times as good when they are making a decision.
An additional advantage of including the financial context within the same
experiment is that it allows for direct comparisons between that context and a migra-
tion context. Therefore, because there is a wide body of existing research on deci-
sion making within financial contexts, if the results are similar across conditions
then that may provide some supporting evidence that this body of research can be
relied on when applied to migration contexts. Conversely, if the results reveal that
there are differences between the contexts, then this highlights that modellers should show caution when applying financial insights to other contexts. The presence of differences between contexts would highlight the need to collect additional data within the specific context of interest, rather than relying on assumptions, formalisations, or parameter estimates developed in a different context.

Table 6.1 The step-by-step elicitation procedure: for each step, an elicitation equation (an indifference, denoted ~, between two gambles), the value elicited, and the prespecified values (adapted from Abdellaoui et al., 2016)
Notes: elicitation procedure taken from Abdellaoui et al. (2016) with some prespecified values altered. The step column shows the order in which values are elicited from participants. The elicitation equation shows the structure used for each elicitation. The value elicited column shows the value that is being elicited at that step. Elicited values were initially set so that both gambles had equivalent utility. The prespecified values column shows the values within the elicitation equations that are prespecified rather than being elicited. The size of the prespecified values was chosen to be approximately equidistant in terms of utility rather than in terms of raw values. Therefore, there is a larger gap between the medium and large stakes than between the medium and small stakes, to account for diminishing sensitivity for values further from the reference point. x0 = reference point, x1+ through x6+ = the six points of the utility function elicited for gains, x1− through x6− = the six points of the utility function elicited for losses, p = probability of outcomes, G = a prespecified (large) gain, L = an elicited loss equivalent to G in terms of utility, l = a prespecified loss, ℒ = an elicited loss, g = a prespecified (small) gain, 𝒢 = an elicited gain. The tilde (~) denotes approximate equivalence or indifference between the two alternative options.
Fig. 6.1 An example of the second gain elicitation (x2+) within a migration context and with medium stakes. As shown in panel A, x2+ is initially set so that both gambles have equivalent utility. The value of x2+ is then adjusted in panels B to F, depending on the choices made, eliciting the value of x2+ that leads to indifference between the two gambles. (Source: own elaboration in Qualtrics)

6.3 Eliciting Subjective Probabilities
The key questions for the second set of psychological experiments emerged from
the initial agent-based models presented in Chap. 3 and analysed in Chap. 5. These
models highlighted the important role that information sharing and communication
between agents can play in influencing the formation and reinforcement of migra-
tion routes. Because these aspects played a key role in influencing the results pro-
duced by the models, (as indicated by the preliminary sensitivity analysis of the
influence of the individual model inputs on a range of outputs, see Chap. 5), it
became clear that we needed to gather more information about the processes
involved to ensure the model was empirically grounded.
The choice of information sources in this experiment was informed by previous research conducted in the Flight 2.0/Flucht 2.0 research project on the media sources used by asylum seekers before, during, and after their
ect on the media sources used by asylum seekers before, during, and after their
journeys from their country of origin to Germany (Emmer et al., 2016; see also
Chap. 4 and Appendix B). The specific sources that were chosen for inclusion in the
experiment were: a news article, a family member, an official organisation, someone
with relevant personal experience, and the travel organiser (i.e., the person organis-
ing the boat trip). Additionally, we randomised the verbal likelihood that was com-
municated by each source to be one of the following: very likely, likely, unlikely, or
very unlikely (one verbal likelihood presented per source). For example, a partici-
pant may read that a family member says a migration boat journey across the sea is
likely to be safe, that an official organisation says the trip is unlikely to be safe, that
someone with relevant personal experience says it is very unlikely to be safe, and so
on (see Fig. 6.2 for an example).
Fig. 6.2 Vignette for the migration context (panel A), followed by the screening question to ensure
participants paid attention (panel B) and an example of the elicitation exercise, in which partici-
pants answer questions based on information from a news article (panels C to F). (Source: own
elaboration in Qualtrics)
After seeing each piece of information, participants judged the likelihood of trav-
elling safely (0–100) and made a binary decision to travel (yes/no). Additionally,
they indicated how confident they were in their likelihood judgement, and whether
they would share the information and their likelihood judgement with another trav-
eller. Participants also made overall judgements of the likelihood of travelling safely
and hypothetical travel decisions based on all the pieces of information, and indi-
cated their confidence in their overall likelihood judgement, and whether they would
share their overall likelihood judgement. At the end of the experiment, participants
indicated how much they trusted the five sources in general, as well as whether they
had ever seriously considered or made plans to migrate to a new country, and
whether they had previously migrated to a new country (again, see Appendix D for
details on the preregistration, sample sizes, and ethical issues).
Conducting this experiment provided a rich array of data that can be used to
inform an agent-based model of asylum seeker migration. For example, it becomes
relatively straightforward to assign numerical judgements about safety to information that agents receive within an agent-based model, because data have been collected on how people (experiment participants) interpret phrases such as ‘the boat
journey across the sea is likely to be safe’. It is also possible to see whether these
interpretations vary depending on the source of the information, such as whether
‘likely to be safe’ should be interpreted differently by an agent within the model
depending on whether the information comes from a family member or an official
organisation. Additionally, because we collected overall ratings it is possible to
examine how people combine and integrate information from multiple sources to
form overall judgements. This information can be used within an agent-based model
to assign relative weights to different information sources, such as weighting an
official organisation as 50% more influential than a news article, a family member
as 30% less influential than someone with relevant personal experience, and so on.
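As a sketch of how such relative weights could enter a model, an agent’s overall judgement can be formed as a weighted average of the source-specific judgements. All weights and ratings below are made-up illustrations, not estimates from the experiment:

```python
import numpy as np

# illustrative relative weights for the five information sources
# (assumed values; in the model these would come from the data)
weights = {
    "news_article": 1.0,
    "family_member": 1.2,
    "official_organisation": 1.5,
    "personal_experience": 1.7,
    "travel_organiser": 0.8,
}

# safety judgements (0-100) an agent has received from each source
judgements = {
    "news_article": 60.0,
    "family_member": 75.0,
    "official_organisation": 40.0,
    "personal_experience": 30.0,
    "travel_organiser": 90.0,
}

w = np.array([weights[s] for s in judgements])
j = np.array([judgements[s] for s in judgements])
overall = float(np.sum(w * j) / np.sum(w))   # weighted overall judgement
```

More elaborate integration rules (for instance, down-weighting sources the agent distrusts, or discounting redundant information) could replace the simple weighted average.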
To more explicitly illustrate this, the data collected in this experiment were used
to inform the model presented in Chap. 8. Specifically, because for each piece of
information participants received they provided both a numerical likelihood of
safety rating and a binary yes/no decision regarding whether they would travel, it
was possible to calculate the decision threshold at which people become willing to
travel, as well as how changes in the likelihood of safety ratings influence the prob-
ability that someone will decide to travel. We could then use these results to inform
parameters within the model that specify how changes in an agent’s internal repre-
sentation of the safety of travelling translate into changes in the probability of them
making specific travel decisions.
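A hedged sketch of such a mapping: a logistic curve turning an agent’s internal safety rating (0–100) into a probability of deciding to travel. The threshold and slope values below are illustrative placeholders for parameters that would be estimated from the paired likelihood ratings and yes/no decisions collected in the experiment:

```python
import numpy as np

def travel_probability(safety_rating, threshold=65.0, slope=0.15):
    """Map an internal safety rating (0-100) to a probability of deciding
    to travel; at the threshold the probability is exactly 0.5."""
    return 1.0 / (1.0 + np.exp(-slope * (safety_rating - threshold)))
```

Within the simulation, each agent would draw a uniform random number and travel if it falls below travel_probability of its current safety rating.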
6.4 Conjoint Analysis of Migration Drivers

In the third round of experiments, conjoint analysis is used to elicit the relative weightings of a variety of migration drivers. Specifically, the focus is on characteristics of potential destination countries and analysing which of these characteristics
have the strongest influence on people’s choices between destinations. The impetus
for this experimental focus again came from some key questions within both the
model and the migration literature more broadly. In relation to the model, this line
of experimental inquiry arose because the model uses a graphical representation of
space that the agents attempt to migrate across towards several potential end cities
(end points), with numerous paths and cities present along the way.
In the initial implementations of the Routes and Rumours model, there was no
differentiation between the available end points. That is, the agents within the model
simply wanted to reach any of the available end cities/points and did not have any
preference for some specific end cities over others. This modelling implementation
choice was made to get the model operational and to provide results regarding the
importance of communication between agents and agent exploration of the paths/
cities. However, to enhance the realism of the agent-based model and make it more
directly applicable to the real-world scenarios that we would like to model, it
became clear that it was important for the end cities to vary in their characteristics
and the extent to which agents desire to reach them. Therefore, it was important to
gather empirical data about the characteristics of potential end destinations for
migration as well as how people weight the different characteristics of these desti-
nations and make trade-offs when choosing to migrate.
Previous research has examined the various factors that influence the desirability
of migration destination countries (Carling & Collins, 2018). Recently, a taxonomy
of migration drivers has been developed, made up of nine dimensions of drivers and
24 individual driving factors that fit within these nine dimensions (Czaika &
Reinprecht, 2020). The nine dimensions identified were: demographic, economic,
environmental, human development, individual, politico-institutional, security,
socio-cultural, and supra-national. The breadth of areas covered by these dimen-
sions helps to emphasise the large array of characteristics that may influence the
choices migrants make about the destination countries of interest.
An experimental approach has also previously been used to examine the importance
of a variety of migration drivers, in Baláž et al. (2016) and Baláž
and Williams (2018). Both these studies examined how participants searched for
information related to wages, living costs, climate, crime rate, life satisfaction,
health, freedom and security, and similarity of language (Baláž et al., 2016), as well
as the unemployment rate, attitudes towards immigrants, and whether a permit is
needed to work in the country (Baláž & Williams, 2018). Additionally, in both stud-
ies participants were asked about their previous experience with migration so that
results could be compared between migrants and non-migrants. The results of these
studies showed that, consistent with many existing neo-classical approaches to
migration studies (Borjas, 1989; Harris & Todaro, 1970; Sjaastad, 1962; Todaro,
1969), participants were most likely to request information on economic factors and
also weighted these factors the most strongly in their decisions. Specifically, wages
and cost of living were the most requested pieces of information and had the highest
decision weights. However, they also found that participants with previous migra-
tion experience placed more emphasis on non-economic factors, being more likely
to request information about life satisfaction and to give more weight to life
satisfaction when making their decisions. This suggests that non-economic factors
can also play an important role in migration, and that experience of migration may
make people more likely to consider and place emphasis on these non-economic
factors.
Building on the questions derived from the agent-based model and this previous
literature, we decided to conduct an experiment informing the conjoint analysis of
the weightings of a variety of migration drivers. Specifically, the approach taken
was to examine the existing literature to identify the key characteristics of destina-
tion countries that are present and may be relevant for the destination countries
within our model. Therefore, we examined the migration drivers included in the
previous experimental work (Baláž et al., 2016; Baláž & Williams, 2018) as well as
the taxonomy of driver dimensions and individual driver factors (Czaika &
Reinprecht, 2020), along with a broader literature review, to come up with a long-form
list of migration drivers that could potentially be included. Then, through
discussions with colleagues and experts within the area of migration studies,1 we
reduced the list to focus on the key drivers of interest, while also ensuring that
the specific drivers chosen provide at least partial coverage across the full breadth
of the driver dimensions identified by Czaika and Reinprecht (2020). Specifically,
the country-level migration drivers chosen for inclusion were: average wage level,
employment level, number of migrants from the country of origin already present,
cultural and linguistic links with the country of origin, climate and safety from
extreme weather events, openness of migration policies, personal safety and politi-
cal stability, education and training opportunities, income equality and standard of
living, and public infrastructure and services (e.g., health).
Having identified the key drivers for inclusion, the approach used to examine this
specific question was an experiment using a conjoint analysis design (Hainmueller
et al., 2014, 2015). In a conjoint analysis experiment, participants are presented
with a series of trials, each of which presents alternatives that contain information
on a number of key attributes (in this case, migration drivers). This approach allows
researchers to gain information about the causal role of a number of attributes within
a single experiment, rather than conducting multiple experiments or one excessively
long experiment that examines the role of each individual attribute one at a time
(Hainmueller et al., 2014). Additionally, because all of the attributes are presented
together on each trial, it is possible to establish the weightings of each attribute rela-
tive to the other presented attributes. That is, a conjoint analysis design allows the
analyst to establish not only whether wages have an effect, but how strong that
effect is relative to other drivers such as employment level or education and training
opportunities. An example of the implementation of the conjoint analysis experi-
ment is presented in Fig. 6.3.
Fig. 6.3 Example of a single trial in the conjoint analysis experiment (panel A) and the questions participants answer for each trial (panel B). (Source: own elaboration in Qualtrics)

Another benefit of the conjoint analysis approach is that, because weightings are
revealed at least somewhat implicitly (rather than in designs that explicitly ask
participants about the weightings or importance they place on specific attributes),
and because multiple attributes are presented at the same time, participants may be
less influenced by social desirability, as they can use any of the attributes present
to justify their decision. This is supported by a study by Hainmueller et al.
(2015) who found that a paired conjoint analysis design did best at matching the
relative weightings of attributes for decisions on applications for citizenship in
Switzerland when these weightings were compared to a real-world benchmark (the
actual results of referendums on citizenship applications). For these reasons, within
the present study we also ask participants to explicitly state how much they weight
each variable, allowing for greater understanding of how well people’s stated and
revealed preferences align with each other. This comparison between implicit and
explicit weightings is also expected to reveal the extent to which people are aware
of, and able or willing to communicate the relative value they place on the country
attributes that motivate them to choose one destination country over another.
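As a minimal sketch of how implicit weightings can be recovered from such choice data, the function below computes, for each attribute level, the share of profiles carrying that level that were chosen. These 'marginal means' are a simple stand-in for the AMCE estimators described by Hainmueller et al. (2014); the data structure is hypothetical.

```python
from collections import defaultdict

def marginal_means(trials):
    """Choice share per (attribute, level) from paired conjoint trials.

    trials: list of (chosen_profile, rejected_profile) pairs, each profile
    a dict mapping attribute name -> level, e.g. {"wages": "high"}.
    """
    shown = defaultdict(int)    # how often each (attribute, level) appeared
    chosen = defaultdict(int)   # how often it appeared in the chosen profile
    for winner, loser in trials:
        for profile, won in ((winner, 1), (loser, 0)):
            for attr, level in profile.items():
                shown[(attr, level)] += 1
                chosen[(attr, level)] += won
    return {key: chosen[key] / shown[key] for key in shown}
```

A level whose choice share sits well above 0.5 is, implicitly, weighted heavily; comparing these shares with participants' explicitly stated weightings gives the stated-versus-revealed contrast discussed above.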
The results from this conjoint analysis experiment can be used to inform the
agent-based model by collecting empirical data on the relative weightings of vari-
ous migration drivers. Additionally, because the experimental data are collected at
an individual level, it is also possible to observe to what extent these weightings are
heterogeneous between individuals (e.g., whether some individuals place more
emphasis on safety while others care more about economic opportunities). These
relative weightings can then be combined with real-world data on actual migration
destination countries or cities to calculate ‘desirability’ scores for potential migra-
tion destinations within the model, either at an aggregate level or, if considerable
heterogeneity is present, by calculating individual desirability scores for each agent
to properly reflect the differences in relative weightings found in the empirical data.
The model can then be rerun with migration destinations that vary in terms of desir-
ability to examine what effects this has on aspects such as agent behaviour, route
formation, and total number of agents arriving at each destination.
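One simple way to turn elicited weightings and real-world country data into the 'desirability' scores mentioned above is a normalised weighted sum; this aggregation rule is our illustrative assumption, not a functional form prescribed in the text.

```python
def desirability(attributes, weights):
    """Weighted-sum desirability of a destination.

    attributes: driver name -> value normalised to [0, 1].
    weights: driver name -> relative weighting (any positive scale).
    """
    total = sum(weights.values())
    return sum(weights[name] * attributes[name] for name in weights) / total
```

Heterogeneity is then straightforward to represent: aggregate-level runs can use one shared weights dictionary, while per-agent runs draw a separate weights dictionary for each agent.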
6.5 Design, Implementation, and Limitations of Psychological Experiments
for Agent-Based Models
In this section, we also discuss the limitations of the experimental approaches used
(and of many psychological experiments more broadly) and suggest ways to
overcome these limitations.
When designing a psychological experiment it is important to consider the
potential for confounds to influence the outcome (Kovera, 2010). Confounding
occurs when there are multiple aspects that vary across experimental conditions,
meaning that it is not possible to infer whether the changes seen are due to the
intended experimental manipulation, or occur because of another aspect that differs
between the conditions. For example, in the experiment discussed in Sect. 6.3, we
were interested in the influence of information source on the judgements and deci-
sions that were made. Therefore, we included information from sources such as a
news article, an official organisation, and a family member. However, we ensured
that the actual information provided to participants was kept consistent regardless of
the source (e.g., ‘the migrant sea route is unlikely to be safe’) rather than varying the
information across the source formats, such as by presenting a full news article
when the source was a news article or a short piece of dialogue when the source was
a family member. To examine the role of source, it was crucial that the actual infor-
mation provided was kept consistent because otherwise it would be impossible to
tell whether differences found were due to changes in the source or because of
another characteristic such as the length or format of the information provided.
However, the drawback in choosing to keep the information presented identical
across sources is that the stimuli used are less representative of their real-world
counterparts (i.e., the news articles used in the study are less similar to real-world
news articles), highlighting that gaining additional experimental control to limit
potential confounds can come at the cost of decreasing external validity.
Another key issue to consider is the importance of measurement (for a detailed
review see Flake & Fried, 2020). Although a full discussion and evaluation is
beyond the scope of the current chapter, some measurement-related issues
are made particularly clear by the experiment described in Sect. 6.2. Within
this study, we wanted to elicit parameters related to prospect theory. However,
previous research by Bauermeister et al. (2018) found that the prospect theory
estimates of risk attitudes and probability weightings for the same participants
depended on the specific elicitation methodology used. Specifically, Bauermeister
et al. compared the methodology from Tanaka et al. (2010) and Wakker and Deneffe
(1996), and found that the elicited estimates for participants were more risk averse
when the former approach was used, whereas they were more biased in their prob-
ability weightings when the latter method was applied (with greater underweighting
of high probabilities and overweighting of low probabilities). This raises serious
concerns around the robustness of findings, because it suggests that the estimates of
prospect theory parameters gathered may be conditional on the experimental meth-
odology used and therefore these estimates are incredibly difficult to generalise and
apply to an agent-based model. We attempted to address these issues by using the
non-parametric methodology of Abdellaoui et al. (2016), since it requires fewer
assumptions than many other elicitation methods. However, the findings of
Bauermeister et al. (2018) still highlight the extent to which the results of studies
can be highly conditional on the specific methodology and context in which the
study takes place, and therefore may be difficult to generalise.
Issues with the typical samples used within psychology and other social sciences
have been well documented for many years now (Henrich et al., 2010). Specifically,
it has long been pointed out that the populations used for social science research are
much more Western, Educated, Industrialised, Rich, and Democratic (WEIRD) than
the actual human population of the Earth (Henrich et al., 2010; Rad et al., 2018).
This bias means that much of the data within the social sciences literature that can
be used to inform agent-based models may not be applicable whenever the social
process or system being modelled is not itself comprised solely of WEIRD agents.
Even though this issue has been known about for quite some time, there has not yet
been much of a shift within the literature to address it. Arnett (2008) found that
between 2003 and 2007, 96% of the participants of experiments reported in top
psychology journals were from WEIRD samples.
More recently, Rad et al. (2018) found that 95% of the participants of the experi-
ments published in Psychological Science between 2014 and 2017 were from
WEIRD samples, suggesting that even though a decade had passed, there had been
little change in the extent to which non-WEIRD populations are underrepresented
within the psychological literature. Despite there being relatively little research
conducted with non-WEIRD samples, that research has produced considerable
evidence that there are cultural differences across many areas of human psychology
and behaviour, such as visual perception, morality, mating preferences, reasoning,
biases, and economic preferences (for reviews see Apicella et al., 2020; Henrich
et al., 2010). Of particular relevance for the experiments discussed in the previous
sections, Falk et al. (2018) found that economic preferences vary considerably
between countries, and Rieger et al. (2017) found that, although the results from
nearly all of the 53 countries they surveyed were descriptively consistent with
prospect theory, the estimates for the parameters of cumulative prospect theory differed
considerably between countries. Therefore, if there is a desire to use results from the
broader literature or from a specific study to inform an agent-based model, then it is
important for researchers to ensure that the participants included within their studies
are representative of the population(s) of interest, rather than continuing to sample
almost entirely from WEIRD populations and countries.
The issue of the extent to which findings from experimental contexts can be gen-
eralised to the real-world has also received considerable attention across a wide
range of fields (Highhouse, 2007; Mintz et al., 2006; Polit & Beck, 2010; Simons
et al., 2017). As highlighted by Highhouse (2007), many critiques of experimental
methodology place an unnecessarily large emphasis on surface-level ecological
validity. That is, the extent to which the materials and experimental setting appear
similar to the real-world equivalent (e.g., how much the news articles used as mate-
rials within a study look like real-world news articles). However, provided the meth-
odology used allows for proper understanding of “the process by which a result
comes about” (Highhouse, 2007, p. 555), then even if the experiment differs consid-
erably from the real world, the information gained is still helpful for developing
theoretical understanding that can then be tested and applied more broadly. In the
context of asylum migration, additional insights can be gained from some related
areas, for example on evacuations during terrorist attacks or natural disasters
(Lovreglio et al., 2016), where agent-based models are successfully used to predict
and manage actual human behaviour (e.g., Christensen & Sasaki, 2008; Cimellaro
et al., 2019; see also the example of Xie et al. (2014) in Chap. 5). Conceptually, one
common factor in such circumstances could be the notion of fear (Kok, 2016).
Nonetheless, migration is an area in which the limitations of lab- or online-based
experimental methods and the difficulty of truly capturing and understanding the
real-world phenomena of interest become clear. Deciding to migrate introduces
considerable disruption and upheaval to an individual or family’s life, along with
potential excitement at new opportunities and discoveries that might await them.
How then can a simple experiment or survey conducted in a lab or online via a web
browser possibly come close to capturing the real-world stakes or the magnitude of
the decisions that are faced by people when they confront these situations in the real
world? This problem is likely even more pronounced for migrants seeking asylum,
who are likely to be making decisions under considerable stress and where the deci-
sions that they make could have actual life or death consequences. Given the large
body of evidence showing that emotion can strongly influence a wide range of
human behaviours, judgments, and decisions (Lerner et al., 2015; Schwarz, 2000),
it becomes clear that it is incredibly difficult to generalise and apply findings from
laboratory and online experimental settings in which the degree of emotional
arousal, emotional engagement, and the stakes at play are so greatly reduced from
the real-world situations and phenomena of interest.
For the purpose of the modelling work presented in this book, we focus therefore
on incorporating the empirical information elicited on the subjective measures
(probabilities) related to risky journeys and the related confidence assessment (Sect.
6.3). The process is summarised in Box 6.1.
Within the model, agents form these beliefs based on their experiences
travelling through the world as well as by exchanging information with other
agents. There is also a scaling parameter for risk, risk_scale, which is greater
than 1. Based on the above, for risk-related decisions, an agent's safety
estimate for a given link (s) is derived as:

s = t_risk * (1 - v_risk)^risk_scale * 100
The logit of the probability to leave for a given link (p) is then calculated as:
logit(p) = I + S * s
The results of the experiment in Sect. 6.3 are incorporated within the model
through the values of the intercept I and slope S. These variables take agent-
specific values drawn from a bivariate normal distribution, the parameters for
which come from the results of a logistic regression conducted on the data
collected in the experiment. In this way, the information gained from the psy-
chological experiment about how safety judgments influence people’s will-
ingness to travel is combined with the beliefs that agents within the model
have formed, thereby influencing the probability that agents will make the
decision to travel along a particular link on their route.
6.6 Immersive Decision Making in the Experimental Context

The development of more immersive and engaging experimental setups can provide
an exciting avenue to address several of the concerns outlined in the previous sec-
tion. Increasing immersion within experimental studies is particularly helpful for
addressing concerns related to realism and emotional engagement of participants.
One potentially beneficial approach that can be used to increase emotional engage-
ment, and thereby at least partially close the emotional gap between the experimen-
tal and the real-world, is through ‘gamification’. Research has shown that people are
motivated by games and that playing games can satisfy several psychological needs
such as needs for competence, autonomy, and relatedness (Przybylski et al., 2010;
Ryan et al., 2006).
Additionally, Sailer et al. (2017) showed that a variety of aspects of game design
can be used to increase feelings of competence, meaningfulness, and social con-
nectedness, feelings that many researchers are likely to want to elicit in participants
to increase immersion and emotional engagement while they are completing an
experiment. Using gamification to increase participant engagement and motivation
does not even require the inclusion of complex or intensive game design elements.
Lieberoth (2014) found that when participants were asked to engage in a discussion
of environmental issues, simply framing the task as a game through giving partici-
pants a game board, cards with discussion items, and pawns increased task engage-
ment and self-reported intrinsic motivation, even though there were no actual game
mechanics.
To improve the immersion and emotional engagement of participants in experimental
studies of migration, we plan to use gamification aspects in future experiments.
Specifically, we aim to design a choose-your-own adventure style of game to
explore judgements and decision making within an asylum migration context.
Inspiration for this approach came from interactive choose-your-own adventure
style projects that were developed by the BBC (2015) and Channel 4 (2015) to edu-
cate the public about the experiences of asylum seekers on their way to Europe.2 We
plan to use the agent-based models of migration that have been developed to help
generate an experimental setup, and then combine this with aspects of gamification
to develop an experiment that can be ‘played’ by participants. For example, by map-
ping out the experiences, choices, and obstacles that agents within the agent-based
models encounter as well as the information that they possess, it is possible to gen-
erate sequences of events and choices that occur, and then design a choose-your-
own adventure style game in which real-world participants must go through the
same sequences of events and choices that the agents within the model face. This
allows for the collection of data from real-world participants that can be directly
used to calibrate and inform the setup of the agents within the agent-based model,
while simultaneously also having the advantage of being more immersive, engag-
ing, and motivating for the participants completing the experiment.
Improvements in technology also allow for the development of even more
advanced and immersive experiments in the future, using approaches such as video
game modifications (Elson & Quandt, 2016), and virtual reality (Arellana et al.,
2020; Farooq et al., 2018; Kozlov & Johansen, 2010; Mol, 2019; Moussaïd et al.,
2016; Rossetti & Hurtubia, 2020). Elson and Quandt (2016) highlighted that by
using modifications to video games, it is possible for researchers to have control
over many aspects of a video game, allowing them to design experiments by opera-
tionalising and manipulating variables and creating stimulus materials so that par-
ticipants in experimental and control groups can play through an experiment in an
immersive and engaging virtual environment. At the same time, observational stud-
ies based on information from online games allow for studying many aspects of
social reality and social dynamics, which may be relevant for agent-based models,
such as networks and their structures, collaboration and competition, or inequalities
(e.g. Tsvetkova et al., 2018).
The increased availability and decreased costs of virtual reality headsets have
also allowed for researchers to test the effectiveness of presenting study materials
and experiments within virtual reality. Virtual reality has already been used to
2 For the interactive versions of these online tools, see https://www.bbc.co.uk/news/world-middle-east-32057601 and http://twobillionmiles.com/ (as of 1 January 2021).
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 7
Agent-Based Modelling and Simulation with Domain-Specific Languages
7.1 Introduction
DSLs for modelling aim to close the gap between model documentation and model
implementation, with the ultimate goal of conflating both into an executable
documentation. Two desirable properties of a DSL for modelling are practical
expressiveness, describing the ease of specifying a model in the language as well
as how clearly more complex mechanisms can be expressed, and succinctness.
Whereas the number of lines of code used can serve as an indication of the latter,
the former is difficult to measure. Practical expressiveness must not be confused
with formal expressiveness, which measures how many models can theoretically be
expressed in the language, or, in other words, the genericity of the language
(Felleisen, 1991).
7.2 Domain-Specific Languages for Modelling
7.2.1 Requirements
The DSL must therefore be powerful and flexible enough to express them. In addi-
tion, the language must not be limited to a single model of decision making, to
enable an implementation and comparison of different decision models.
Continuous Time. In agent-based modelling, there are roughly two ways to con-
sider the passing of time. The first approach is the so-called ‘fixed-increment time
advance,’ where all agents have the opportunity to act on equidistant time points.
Although that approach is the dominant one, it can cause problems that threaten the
validity of the simulation results (Law, 2006, pp. 72 ff.). First, the precise timing of
events is lost, which prohibits the analysis of the precise duration between events
(Willekens, 2009). Second, events must be ordered for execution at a time point,
which can introduce errors in the simulation. The alternative approach is called
‘next-event time advance’ and allows agents to act at any point on a continuous time
scale. This approach is very rarely used in agent-based modelling, but can solve the
problems above. Therefore, a DSL for agent-based modelling of migration should
allow agents to act in continuous time.
Based on the above requirements we selected the Modelling Language for Linked
Lives (ML3). ML3 is an external domain-specific modelling language for agent-
based demographic models. In this context, external means that it is a new language
independent of any other, as opposed to an internal DSL that is embedded in a host
language and makes use of host language features. ML3 was designed to model life
courses of interconnected individuals in continuous time, specifically with the mod-
elling of migration decisions in mind (Warnke et al., 2017). That makes ML3 a natu-
ral candidate for application in this project. In the following Box 7.1, we give a short
description of ML3, with examples taken from a version of the Routes and Rumours
model introduced in Chap. 3, available at https://github.com/oreindt/routes-rumours-ml3, and relate it to the requirements formulated above.
Agents of the type Migrant have three attributes: their capital, which is
a real number (defined by the type real after the colon), for example an amount
in euro; a Boolean attribute that denotes whether they are currently moving or
staying at one location; and the number of locations visited so far.
Agents can be created freely during the simulation. To remove them, they
may be declared ‘dead’. Dead agents do still exist, but no longer act on their
own. They may, however, still influence the behaviour of agents who remain
connected to them.
Links: Relationships between entities are modelled by links. Links, denoted
by <->, are bidirectional connections between agents of either the same type
(e.g., migrants forming a social network), or two different types (e.g., migrants
residing at a location that is also modelled as an agent). They can represent
one-to-one relations (e.g., two agents in a partnership), one-to-many relations
(e.g., many migrants may be at any one location, but any migrant is only at one
location), or many-to-many relations (e.g., every migrant can have multiple other
migrant contacts, and may be contacted by multiple other migrants). The
following defines the link between migrants and their current location in the
Routes and Rumours model:
location:Location[1]<->[n]Migrant:migrants
The value of this function is calculated from the base cost of movement
(the model parameter costs_move), scaled by the friction of the connection
between the two locations, which is gained by filtering all outgoing ones
using the predefined function filter, and then unwrapping the only element
from the set of results using only(). The keyword ego refers to the agent
the function is applied to. Procedures are defined similarly, with -> replacing
the :=.
Rules: Agents’ behaviour is defined by rules. Every rule is associated with
one agent type, so that different types of agents behave differently. Besides
the agent type, any rule has three parts: a guard condition that defines who
acts, i.e., what state and environment an agent of that type must be in to show
this behaviour; a rate expression that defines when they act; and an effect
that defines what they do. With this three-part formulation, ML3 rules are
closely related to stochastic guarded commands (Henzinger et al., 2011). The
following (slightly shortened) excerpt from the Routes and Rumours model shows
the rule that models how migrants begin their move from one location to
the next:
1 Migrant
2 | !ego.in_transit // guard
3 @ ego.move_rate() // rate
4 -> ego.in_transit := true // effect
5 ego.destination := ego.decide_destination()
The rule applies to all living agents of the type Migrant (line 1). Like in
a function or procedure, ego refers to one specific agent to which the rule is
applied. According to the guard denoted by | (line 2), the rule applies to all
In general, the guard and rate may be arbitrary expressions, and may make use of
the agent’s attributes, links (and attributes and links of linked agents as well), and
function calls. The effect may be an arbitrary sequence of imperative commands,
including assignments, conditions, loops, and procedure calls. The possibility of
using arbitrary expressions and statements in the rules is included to give ML3
ample expressiveness to define complex behaviour and decision processes. The use
of functions and procedures allows for encapsulating parts of these processes to
keep rules concise, and therefore readable and maintainable.
For each type of agent, multiple rules can be defined to model different parts of
their behaviour, and the behaviour of different types of agents is defined in separate
rules. The complete model can therefore be composed from multiple sub-models
covering different processes, each consisting of one or more rules. Formally, a set of
ML3 rules defines a Generalised Semi-Markov Process (GSMP), or a Continuous-
time Markov Chain (CTMC) if all of the rules use the default exponential rates. The
resulting stochastic process was defined precisely in Reinhardt et al. (2021).
7.2.3 Discussion
Any domain-specific modelling language suggests (or even enforces), by the metaphors it applies and the functionality it offers, a certain style of model. Apart from the notion of linked agents, which is central to agent-based models, for ML3 the notion of behaviour modelled as a set of concurrent processes in continuous time is also of key importance. This is in stark contrast to commonly applied ABM frameworks such as NetLogo (Wilensky, 1999), Repast (North et al., 2013), or Mesa (Masad & Kazil, 2015), which are designed for modelling in a stepwise, discrete-time approach. If events in a simulation model are to occur in continuous time, these events need to be scheduled manually (Warnke et al., 2016). In this regard, and with its firm grounding in stochastic processes, ML3 is more closely related to stochastic process algebras, which have also been applied to agent-based systems before (Bortolussi et al., 2015). Most importantly, this approach results in a complete separation of the model itself and its execution. ML3’s rules describe these processes
3 @poisson(move_rate(agent, sim.par))
4 ~ ! agent.in_transit
5 => start_move!(agent, sim.model.world, sim.par)
Line 1 is equivalent to line 1 in the ML3 rule (Box 7.1), with the difference that in ML3 the connection to an agent type is declared individually for every rule, while this version does it for a whole set of processes. Lines 3 to 5 contain the same three elements (guard, rate, effect) as ML3 rules, but with the order of the first two switched. The effect is encapsulated in a single function, start_move, which contains code equivalent to the effect of the ML3 rule. This Julia version does not, however, completely separate the simulation logic from the model itself: it requires instructions in the effect to trigger the rescheduling of events described in the next section.
We begin the simulation with an initial population of agents, our state s, which is assumed at some point in time t (see Fig. 7.1a). As described in Sect. 7.2, each ML3 agent has a certain type, and for each type of agent there are a number of stochastic rules that describe their behaviour. Each pair of a living agent a and a rule r matching the agent’s type, where the rule’s guard condition is fulfilled, yields a possible state transition (or event), given by the rule’s effect applied to the agent. It is associated with a stochastic waiting time T until its occurrence, determined by an exponential distribution whose parameter λr(a, s) is given by the rule’s rate applied to the agent. To advance the simulation, we have to determine the event with the smallest waiting time Δt, execute its effect to get a new state s′, and advance the time to the time of that event, t′ = t + Δt.
As per the semantics of the language, the waiting time T is exponentially distributed:

P(T ≤ t) = 1 − e^(−λr(a, s) · t). (7.1)
Fig. 7.1 Scheduling and rescheduling of events. We begin in state s at some time t depicted as the
position on the horizontal time line (a). Events (squares) are scheduled (b). The earliest event is
selected and executed (c), resulting in a new state s′ at the time of that event (d). Then, affected
events must be rescheduled (e)
Inverting this distribution allows a waiting time to be sampled from a uniformly distributed random number u ∈ (0, 1):

Δt = −(1/λr(a, s)) · ln u. (7.2)
Using this method, we can sample a waiting time for every possible event (Fig. 7.1b). We can then select the first event and execute it (Fig. 7.1c). In practice, the selection of the first event is implemented using a priority queue (also called the event queue), a data structure that stores pairs of objects (here: events) and priorities (here: times), and allows very efficient retrieval of the object with the highest priority (here: the earliest time).
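This next-event mechanics, sampling an exponential waiting time per enabled event via Eq. (7.2) and retrieving the earliest one from a priority queue, can be sketched in a few lines of Python (an illustrative sketch with made-up events and rates, not the ML3 simulator itself):

```python
import heapq
import math
import random

def sample_waiting_time(rate, rng):
    """Inverse-transform sampling of Eq. (7.2): dt = -ln(u) / rate."""
    u = rng.random()
    return -math.log(u) / rate

rng = random.Random(42)
t = 0.0
queue = []  # priority queue of (event time, event label)

# Schedule three hypothetical events with different rates
for label, rate in [("move", 0.5), ("communicate", 2.0), ("explore", 1.0)]:
    heapq.heappush(queue, (t + sample_waiting_time(rate, rng), label))

# Advance the simulation to the earliest scheduled event
t_next, event = heapq.heappop(queue)
assert t_next >= t   # time only moves forward
t = t_next
```

Python's heapq is a min-heap, so the pair with the smallest time is always retrieved first, which is exactly the "highest priority" needed here.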
After the execution of this event, the system is in a new state s′ at a new time t′. Further, we still have sampled execution times for all events, except the one that was executed (Fig. 7.1d). Unfortunately, in this changed state, these times might no longer be correct. Some events might no longer be possible at all (e.g., if the event was the arrival of a migrant at their destination, other events of this agent no longer apply). For others, the waiting time distribution might have changed. And some events might not have been possible in the old state, but are in the new one (e.g., if a new migrant entered the system, new events will be added). In the worst case, the new state will require the re-sampling of all waiting times. In a typical agent-based model, however, the behaviour of any one agent will not directly affect the behaviour of many other agents, so most sampled times will remain valid. Only those events that are affected will need to be re-sampled (Fig. 7.1e). In the ML3 simulator this is achieved using a dependency structure, which links events to attribute and link values of agents. When the waiting time is sampled, all used attributes and links are stored as dependencies of that event. After an event is executed, the events dependent on the changed attributes and links can then be retrieved. A detailed and more technical description of this dependency structure can be found in Reinhardt and Uhrmacher (2017).
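A minimal version of such a dependency structure might map agent attributes to the events whose waiting times were computed from them. The following is an illustrative Python sketch with hypothetical identifiers; the actual ML3 mechanism is described in Reinhardt and Uhrmacher (2017):

```python
from collections import defaultdict

# Map (agent id, attribute name) -> set of event ids that read this attribute
dependencies = defaultdict(set)

def record_dependency(agent_id, attribute, event_id):
    """Called while a waiting time is sampled: remember what it was based on."""
    dependencies[(agent_id, attribute)].add(event_id)

def affected_events(changes):
    """After an event executes, collect the events whose times must be re-sampled."""
    affected = set()
    for agent_id, attribute in changes:
        affected |= dependencies.pop((agent_id, attribute), set())
    return affected

# Event 1's waiting time depended on agent 7's 'in_transit' attribute,
# event 2's on agent 7's 'location'
record_dependency(7, "in_transit", event_id=1)
record_dependency(7, "location", event_id=2)

# An executed event flips agent 7's 'in_transit'; only event 1 is rescheduled
assert affected_events([(7, "in_transit")]) == {1}
```

Only the events returned by affected_events need new waiting times; all others keep their previously sampled times, which is what makes the approach efficient.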
In Box 7.2 below, Algorithm 1 shows the algorithm described above in pseudo-code, and Algorithm 2 shows the sampling of a waiting time for a single event.
7.3.2 Discussion
The simulation algorithm described above is abstract in the sense that it is independent of the concrete model. The model itself is only a parameter of the simulation algorithm – in the pseudo-code in Algorithm 1 in Box 7.2 it is called m. As a result, the simulator, i.e., the implementation of the simulation algorithm, is model-independent. All the execution logic can hence be reused for multiple models. This not only facilitates model development, but also makes it economical to put more effort into the simulator, as this effort benefits many models.
On the one hand, this effort can be put into quality assurance, resulting in better
tested, more reliable software. A simulator that has been tested with many different
models will generally be more trustworthy than an ad hoc implementation for a
single model (Himmelspach & Uhrmacher, 2009). On the other hand, this effort can
be put into advanced simulation techniques. One of these techniques we have already covered: using continuous time. The simulation logic for a discrete-time model is often just a simple loop, where the events of a single time step are processed in order and time is advanced to the next step. The simulation algorithm described above is considerably more complex than that, but with the simulator being reusable, the additional effort is well invested. Separation of the modelling and the simulation concerns thus serves as an enabler for continuous-time simulation. Similarly, more efficient simulation algorithms developed for the language – e.g., parallel or distributed simulators (Fujimoto, 2000), simulators that exploit code generation (Köster et al., 2020), or ones that approximate the execution of discrete events (Gillespie, 2001) – will benefit all simulation models defined in it.
The latter leads us back to an important relationship between the expressiveness of the language and the feasibility and efficiency of its execution. The more expressive the modelling language, and the more freedom it gives to the modeller, the harder it is to execute models, and especially to do so efficiently. The approximation technique of tau-leaping (Gillespie, 2001), for example, cannot simply be applied to ML3, as it requires the model state and state changes to be expressed as vectors, and state updates as vector additions. ML3 states – networks of agents – cannot easily be represented that way. Ideally, every feature of the language is necessary for the model, so that implementing the model is possible, but its execution is not unnecessarily inefficient. DSLs, being tailored to a specific class of models, may achieve this.
7.4.1 Basics
The fundamental idea behind using a DSL for specifying experiments is to provide
a syntax that captures typical aspects of simulation experiment descriptions. Using
this provided syntax, a simulation experiment can be described succinctly. This
way, a DSL for experiment specification ‘abstracts over’ individual simulation
experiments, by creating a general framework covering different specific cases. The
commonalities of the experiments then become part of the DSL, and the actual experiment descriptions expressed in it focus on the specifics of the individual experiments.
One experiment specification DSL is the ‘Simulation Experiment Specification
on a Scala Layer’ (SESSL), an internal DSL that is embedded in the object-
functional programming language Scala (Ewald & Uhrmacher, 2014). SESSL uses
a more refined approach to abstracting over simulation experiments. Between the
language core and the individual experiments, SESSL employs simulation-system-
specific bindings that abstract over experiments with a specific simulation system.
Whereas the language core contains general experiment aspects, such as writing observed simulation output to files, the bindings package experiment aspects tailored to a specific simulation approach, such as specifying which simulation outputs to observe. This way, SESSL can cater to the differences between, for example,
conducting experiments with population-based and agent-based simulation models:
whereas population-based models allow a direct observation of macro-level out-
puts, agent-based models might require aggregating over agents and agent attri-
butes. Another difference is the specification of the initial model state, which, for an
ML3 model, might include specifying how to construct a random network of links
between agents.
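SESSL's layered design, a general language core plus simulation-system-specific bindings, can be illustrated with a skeletal analogue in Python (purely hypothetical class and method names; SESSL itself is an internal Scala DSL):

```python
class Experiment:
    """Language core: aspects shared by all experiments."""

    def __init__(self):
        self.observations = {}

    def write_output(self, path):
        """E.g., writing observed simulation output to files."""
        ...

class AgentBasedBinding(Experiment):
    """Binding: aspects specific to one simulation system."""

    def observe_agent_count(self, name, agent_type):
        # Agent-based models require aggregation over agents,
        # so the binding provides an agent-count observable
        self.observations[name] = ("count", agent_type)

exp = AgentBasedBinding()
exp.observe_agent_count("migrants", agent_type="Migrant")
```

The core knows nothing about agents; the binding adds the agent-based vocabulary, mirroring how the SESSL excerpt below observes an agentCount for the type "Migrant".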
To illustrate how experimentation with SESSL works, we now consider an exam-
ple experiment specified with SESSL’s binding for ML3 (Reinhardt et al., 2018).
The following listing shows an excerpt of an experiment specification for the Routes
9 initializeWith(JSON("init50.json"))
10 val migrants = observe("migrants" ~ agentCount(agentType = "Migrant"))
11 // additional lines elided
12 }
13 }
7.4.3 Reproducibility
7.4.5 Discussion
Understanding how the data and theories have entered the model-generating process is central to assessing a simulation model and the simulation results generated with it. This understanding also plays a pivotal role in
Fig. 7.2 Provenance graph for model analysis based on Box 5.1 in Chap. 5. (Source: own
elaboration)
and simulation runs were performed on the 37 resulting design points. We model
these two steps as a single process (run), which generated two entities: the design
points (DP) produced in the design step, and the data produced by the simulation
runs (D).
Subsequently, GP emulators were fitted to the data in the next step (fit), yielding the emulators and the information about sensitivity they contain (S) as a result. If this analysis had been conducted using a DSL such as SESSL (see Sect. 7.4), or even a general-purpose programming language, the processes (run) and (fit) would have yielded the corresponding code as additional products, which would appear as additional entities and could be used to easily reproduce the results. However, the analysis was performed with GEM-SA, a purely GUI-based tool, so there is no script or anything equivalent.
Figure 7.3 (see Appendix E for details) shows a broader view of the whole modelling process in less detail, including multiple iterations of models (Mi), their analysis, psychological experiments, and data assessment. The whole analysis shown in Fig. 7.2 is then folded into the process a1, the first step of the broader analysis of the Routes and Rumours model. The analysis shown above uses that model (M3) as an input, and produces sensitivity information as an output (S1). The process is additionally linked to the methodology proposed by Kennedy and O’Hagan (2001), denoted as (K01), and thereby indirectly related to the later steps of the process, in which a similar analysis is repeated on subsequent versions of the model.
To give the provenance graph meaning, appropriate information about the individual entities and activities must be provided. The type of entity or activity determines what information is necessary. That might be a textual description (e.g., ODD for models, or a verbal description of the processes, as in Box 5.1), code (potentially in a domain-specific language), or the actual data and relevant metadata for data entities. In our case, to provide sources of this information, in Appendix E we mostly refer to the appropriate chapters and sections of this book.
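In code, such a provenance record might pair each node with a type-appropriate description and let lineage be traced mechanically. The following is a minimal sketch using the entity labels from the discussion of Fig. 7.2 (the "used" lists are simplified for illustration):

```python
# Nodes of a small provenance graph: entities (data products) and activities
entities = {
    "DP": {"type": "data", "description": "design points produced in the design step"},
    "D":  {"type": "data", "description": "simulation output at the design points"},
    "S":  {"type": "data", "description": "fitted GP emulators / sensitivity information"},
}
activities = {
    "run": {"used": [], "generated": ["DP", "D"]},   # design + simulation, one process
    "fit": {"used": ["D"], "generated": ["S"]},      # emulator fitting
}

def lineage(entity, acts):
    """Trace which activities (transitively) contributed to an entity."""
    for name, act in acts.items():
        if entity in act["generated"]:
            upstream = [lineage(e, acts) for e in act["used"] if e != entity]
            return {name: upstream}
    return {}

# S was generated by (fit), which used D, which was generated by (run)
assert lineage("S", activities) == {"fit": [{"run": []}]}
```

A real provenance system (e.g., one following the W3C PROV model) would attach the descriptions, code, and metadata discussed above to each node; the sketch only shows that the graph itself is a small, queryable data structure.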
Fig. 7.3 Overview of the provenance of the model-building process – for details, see Appendix E. (Source: own elaboration)
7.6 Conclusion
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Part III
Model Results, Applications, and
Reflections
Chapter 8
Towards More Realistic Models
This chapter is devoted to the presentation of a more realistic version of the model,
Risk and Rumours, which extends the previous, theoretical version (Routes and
Rumours) by including additional empirical and experimental information follow-
ing the process described in Part II of this book. We begin by offering a reflection
on the integration of the five elements of the modelling process, followed by a more
detailed description of the Risk and Rumours model, and how it differs from the
previous version. Subsequently, we present selected results of the uncertainty and
sensitivity analysis, enabling us to make further inference on the information gaps
and areas for potential data collection. We also present model calibration for an
empirically grounded version of the model, Risk and Rumours with Reality. In that
way, we can evaluate to what extent the iterative modelling process has enabled a
reduction in the uncertainty of the migrant route formation. In the final part of the
chapter, we reflect on the model-building process and its implementation.
The move from a data-free, theoretical agent-based model to one that represents the underlying social processes and reality more closely requires making advances in all five areas presented in Part II of this book. The model itself needs to be further developed to answer more specific research questions in a more realistic scenario; the data and experimental information need to be collected, ideally guided by the statistical analysis where possible; and the modelling language and formalism need to be chosen so that they serve the new modelling aims and purposes.
In the context of the migration model presented in this book, we have therefore
set out to create a more realistic version of the simulation of the migration routes
into Europe. To make the model better resemble real-life scenarios, the notion of
personal risk was introduced into the modelled world – in this case, the chance of
not being able to make it safely to the destination and, in extreme cases, of perishing
along the way. This was intended to align the scenario more closely with the sad
reality of the deadly maritime crossings from North Africa and Turkey into Europe,
especially via the Central Mediterranean route, where at least 17,400 people have
perished between 2014 and January 2021 – a majority of the more than 21,300
deaths in the whole Mediterranean basin in that period1 (Frontex, 2018; IOM, 2021,
see also Chap. 4).
In particular, by extending the model and its purpose, we were interested in
investigating whether our model could be used to test the claim – which was made
by some parties within the EU – that an increased risk on the Mediterranean would
lead to a decrease in ‘pull factors’ of migration and thus a decrease in the number of
arrivals (for a critical discussion of this idea, see e.g. the Death by Rescue report by
Heller and Pezzani 2016, as well as other studies, overviews and briefs, such as
Cusumano & Pattison, 2018; Cusumano & Villa, 2019; and Gabrielsen Jumbert,
2020). This is the type of research question that does not necessarily imply predic-
tive capabilities in a simulation model, but rather seeks to illuminate the mecha-
nisms and trade-offs involved in the interplay between risk, information,
communication, and decisions.
In our case, the starting point for the model extension was the theoretical Routes
and Rumours model, presented in Chap. 3 and Appendix A. Each of the subsequent
building blocks – the empirical data, statistical analysis, psychological experiments,
and the discussion around the choice of an appropriate programming language – as
well as the changes made to the model itself as it was further developed to serve the
purpose, were then used to augment the simulated reality in the light of the knowl-
edge that became available as the modelling process unfolded.
Of course, as discussed before, identifying the empirical basis for the model
proved challenging. Of the many different data sources on asylum migration dis-
cussed in Chap. 4 and Appendix B, only a handful were directly applicable to the
new version of the model, and of those, only a couple ended up being used. The
potentially applicable sources concentrated mainly on the process data on registered
arrivals in Europe, (uncertain) risk-related data on the deaths in the Mediterranean,
and survey-based indications of the sources of information used by migrants along
the way (see Box 4.1).
The statistical analysis discussed in Chap. 5 served as a way of focusing the
model on the most important aspects of the route dynamics, while at the same time
allowing its development in other areas. To that end, the key findings regarding the
sensitivity of the model outputs to a small set of information-related variables
enabled us to concentrate on the key defining features of the underlying social
mechanisms driving route formation, which in this case was focused on information
exchange. At the same time, as was expected given the nature of migration
1 The relative risk of death is also far higher on the Central Mediterranean route than elsewhere: the minimum estimates suggest a risk of dying of 2.4% in 2016–19 (confirmed deaths and disappearances to attempted crossings), as compared to 0.4% on the other Mediterranean routes, Eastern and Western – a six-fold difference (IOM, 2021).
processes, the levels of uncertainty surrounding the modelled route formation and
the impact of its drivers (via model parameters), remained high – and higher than in
the Routes and Rumours model.
On the one hand, the results of the statistical analysis carried out on the first, theoretical version of the model (Routes and Rumours) therefore helped to delineate the possible uses of the psychological experiments in enhancing the simulation. In
particular, the design of the second set of experiments discussed in Chap. 6, looking
at the attitudes to risk and eliciting subjective probabilities of a safe journey depend-
ing on the source of information, was directly informed by both the model design
and sensitivity analysis reported above. The data from this experiment were then
directly used in informing the way the agents respond to different types of informa-
tion in the current model version.
On the other hand, the choice of a modelling language also influenced the model-
building, albeit indirectly. Despite the model development continuing in a general-
purpose programming language (Julia) rather than a domain-specific one (ML3),
the new version as described in Chap. 3 includes some aspects of the model formal-
ism and semantics, uncovered through parallel implementation in both languages
(Reinhardt et al., 2019). This mainly relates to using the continuous definition of
time and to modelling of events through the waiting times, as recommended in
Chap. 7. At the same time, the provenance description of the model helped us to understand the mechanics of the modelling process itself, and offered a more systematic way in which to extend the first version of the model.
Throughout the remainder of this chapter, we present the results of following the
modelling process discussed before, in the form of a more realistic and empirically
grounded, yet still explanatory rather than predictive model of migration route for-
mation. In comparison with Routes and Rumours, the focus goes beyond the role of
information and choice between different options under uncertainty, and now addi-
tionally includes risk and risk avoidance, with potentially very serious consequences
for the agents. Next, we discuss the motivation for the specific elements of the construction of the resulting Risk and Rumours model, and provide a detailed description of its constituent parts.
Most of the capabilities required by our model in order to be able to test whether
increased risk could lead to a reduction in arrivals were already in place in the
Routes and Rumours version, except for one crucial one: the presence of risk, and
the rules governing the agents’ decisions in relation to risky circumstances, the
addition of which was the key feature of the new version, called Risk and Rumours.
Other than that, in the previous version the agents already reacted in real (simulated)
time to the changes in travel conditions. Here, the continuous time paradigm offers
a much more natural environment for framing the process of information flow and
belief update, devoid of the artificial constraints imposed by the granularity of time
steps and scheduling problems in discrete simulations (Chap. 7). Furthermore, the
agents’ decisions are based not only on their subjective (and possibly imperfect)
knowledge, which could be exchanged with other agents, mediated by the levels of
trust, or gained by exploring the environment, but also by different levels of risk and
attitudes towards it.
Contrary to the previous version, and to keep the Risk and Rumours model con-
sistent, both internally and with the reality it aims to represent, in this version of the
model it is possible for agents to die, which removes them from the simulation
entirely. For the sake of simplicity, we assume that the agents can only die when
moving across transport links. As with the other processes in the continuous-time
version of the model, death happens stochastically at a certain rate. The rate of death
for a given link is calculated from a risk value associated with each link that repre-
sents the expected probability of an agent dying when crossing that link, and the
expected time it takes to cross that link. The death rates can be taken from the
empirical data, such as the Missing Migrants project (see Chap. 4), either applied
directly as model inputs, or used to calibrate the outputs.
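One standard way to perform such a conversion, assuming a constant hazard over the crossing (a hedged illustration, not necessarily the model's exact formula), is to choose the rate λ so that crossing for the expected duration d yields the given death probability p, i.e. 1 − e^(−λd) = p, hence λ = −ln(1 − p)/d:

```python
import math

def death_rate(p_death, expected_crossing_time):
    """Constant hazard rate such that crossing for the expected time
    yields the given death probability: 1 - exp(-rate * t) = p."""
    return -math.log(1.0 - p_death) / expected_crossing_time

# A link with a 2.4% crossing risk (cf. the Central Mediterranean estimate
# cited above) and a crossing taking one time unit:
rate = death_rate(0.024, 1.0)

# Sanity check: the probability recovered from the rate matches the input
p_recovered = 1.0 - math.exp(-rate * 1.0)
assert abs(p_recovered - 0.024) < 1e-12
```

For small probabilities, λ ≈ p/d, so the rate is close to the raw probability per unit of crossing time.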
The agents’ information on the transport links now also includes corresponding
knowledge about risk, which they are able to learn about and communicate in the
same way as for the links’ friction and other properties of their environment (see
Chap. 3). Still, this is the one aspect of the new version of the model that is of crucial
importance from the point of view of examining substantive research questions,
many of which – implicitly or explicitly – rely on some assumptions about the atti-
tudes of prospective migrants towards risk, and on the decisions taken in this light.
To that end, the risk-based decision making in the current version of the model is
directly informed by the empirical experiments on subjective probabilities, risk atti-
tudes and confidence in the ensuing decisions according to the source of informa-
tion, as described in Sect. 6.3. Here, we used a logistic regression of the (stated)
probability of making a decision to travel against the (stated) perceived level of risk,
to parameterise a bivariate normal distribution. From this distribution, we draw for
each agent individual values for the slope S and intercept I of the logit-linear func-
tion mapping the probability of travel, p (as per the experimental setup), and the
agent’s perceived risk, s. As discussed in more detail in Box 6.1 in Sect. 6.5, the logit of the probability to travel can then be calculated as logit(p) = I + S · s. In this version of the model this value is transformed back into a probability, and used as part of the cost calculation on which the agents’ path planning is based. For specific details on
the calculation of risk functions, including the role of risk scaling factors, see Box
6.1 in Sect. 6.5, as well as the online material referenced in Appendix A.
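The per-agent mapping described above can be sketched as follows. This is an illustrative sketch only: the bivariate normal parameters below are placeholders, not the estimated values, which are given in Box 6.1 and the online material referenced in Appendix A.

```python
import math
import random

def sample_agent_coefficients(rng, mean_I=2.0, mean_S=-4.0,
                              sd_I=0.5, sd_S=0.5, rho=-0.3):
    """Draw a correlated (intercept I, slope S) pair from a bivariate normal,
    using the standard Cholesky construction from two independent normals."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    I = mean_I + sd_I * z1
    S = mean_S + sd_S * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    return I, S

def travel_probability(I, S, perceived_risk):
    """logit(p) = I + S * s, transformed back into a probability."""
    logit = I + S * perceived_risk
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(1)
I, S = sample_agent_coefficients(rng)   # one draw per agent

# With a negative slope, higher perceived risk lowers the travel probability
assert travel_probability(I, S, 0.9) < travel_probability(I, S, 0.1)
```

Each agent keeps its own (I, S) pair for the whole simulation, so heterogeneity in risk attitudes across agents comes from the bivariate distribution rather than from repeated sampling.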
In terms of the topology of the new version of the model, for simulating the effect
of elevated risk we implemented a ‘virtual Mediterranean’ by keeping the risk at
very low levels (0.001) for most links in the world, but increasing it in all links
overlapping a rectangular region that ran across half of the width of the simulated
area (the red – darker – central area in Fig. 8.1, showing the model topology).
In order to be able to run simulation experiments based on complex pre-defined
scenarios such as, for example, policy interventions or changes in the agents’ envi-
ronment over time, we further added a generic ‘plug-in’ scenario system to the
Fig. 8.1 Topology of the Risk and Rumours model: the simulated world with a link risk repre-
sented by colour (green/lighter – low, red/darker – high) and traffic intensity shown as line width.
In this scenario, cautious agents (left) take traffic routes around the high-risk area, whereas agents
exhibiting risky behaviour (right) take the shortest paths, crossing through the dangerous parts of
the map. (Source: own elaboration)
model. This makes it possible to load additional code during the runtime of the
simulation that, for example, changes the values of some parameters at a pre-defined
time, or occasionally modifies the properties of some parts of the simulated world.
Examples of policy-relevant simulations generated by this model are described in more detail in Chap. 9. Their implementation required three such ‘plug-in’ scenario modules: two of them simulate simple changes in the external conditions of departures (the migrant-generating process) and travel, namely a change in the departure rate at a given time, and a change in the level of risk in the high-risk area at a given time. The third module simulates a government information campaign to
make migrants aware of the high risk of crossing a dangerous area (here, our virtual
Mediterranean) under varying levels of trust in official information sources informed
by the Flight 2.0/Flucht 2.0 survey (see Box 4.1 in Sect. 4.5, and Appendix B for
source details), as well as by the psychological experiment on eliciting subjective
probabilities, reported in Chap. 6 (Sect. 6.2).
In this module, the information campaign has been implemented by introducing a simulated ‘government agent’ with full knowledge of the high-risk area, who interacts with a certain probability with agents present in the entry cities (see Appendix A). If an interaction takes place, the migrant agent in question exchanges information with the government agent analogously to the information exchange happening during regular agent contacts, albeit with modified trust levels.
In addition to providing insights into the topology of the modelled world, Fig. 8.1
offers some preliminary descriptive findings about the role of risk and risk attitudes,
based on a single model run. In this example, the agents are on average either more
or less risk-taking, which is in line with the qualitative findings of the first cognitive
experiment, on eliciting the prospect curves (Sect. 6.2). These differences in
attitudes to risk have a clear impact on the number of journeys undertaken by agents through the high-risk area. As expected, the more cautious agents are more likely to attempt travelling around it, while in the scenario with higher risk tolerance, the intensity of travel through the high-risk area is visibly elevated. Some further substantive
questions, which can be posed within the context of the Risk and Rumours setup,
are examined for several policy-relevant scenarios generated by the model, pre-
sented in Chap. 9. Before that, however, an important intermediate question is: what
is driving the behaviour observed in the model? As discussed in Chap. 5, the uncer-
tainty and sensitivity analysis can offer at least some indications in that respect. We
discuss this step of the analysis of the model behaviour next.
To analyse the behaviour of the Risk and Rumours model itself, we follow the template from Chap. 5, with a few modifications. To start with, we limit the analysis to four model parameters related to information exchange, which were previously identified as key in Chap. 5, and one parameter related to the speed of exploration of the local environment (speed_expl), plus five additional free parameters, not identified from the data, yet crucial for the mechanism of the model. These additional parameters are related to the perceptions of risk, and the detailed list of all ten parameters used for uncertainty and sensitivity analysis is provided in Table 8.1.
Table 8.1 Parameters of the Risk and Rumours model used in the uncertainty and sensitivity analysis

p_drop_contact – Probability of an agent losing a contact from their network. Range: [0, 1]
p_info_contacts – Probability of an agent communicating with their own contacts. Range: [0, 1]
p_transfer_info – Probability of exchanging information through communication. Range: [0, 1]
error – Measure of information error (0: perfect information, 1: full noise). Range: [0, 1]a
speed_expl – Speed of taking up information when exploring locally. Range: [0, 1]
risk_scale – Measure of how the chance of survival scales to the perceived safety as measured in the experimental data from Chap. 6. Range: [4, 20]
p_notice_death, speed_risk – Two parameters that determine how likely it is that an agent notices another agent’s death, and how strongly that affects risk perception. Range: [0, 1] each
speed_expl_risk – A parameter depicting how quickly the perceived risk is updated by local exploration of the environment. Range: [0, 1]
path_penalty_risk – Penalty in terms of additional costs for risk associated with a given stretch of route, relative to movement and resource costs. Range: [0, ∞)b

Notes: aFor uncertainty and sensitivity analysis, limited to [0, 0.5] given minimal variability beyond this range. bFor the analysis, limited to [0, 10] for practical reasons. (Source: own elaboration)
8.3 Uncertainty, Sensitivity, and Areas for Data Collection 143
This time, our focus is on two key outputs: the number of arrivals, and the num-
ber of drownings, as the ultimate human cost of undertaking perilous migration
journeys. Both of these outputs are analysed globally, but can also be looked at as
time series of the relevant variables for more specific policy-related questions and
for setting up coherent scenarios, as discussed further in Chap. 9.
Given the number of parameters to be studied in this version of the model, there
is no need to carry out extensive pre-screening, so the analysis can focus on assess-
ing the uncertainty of the outputs and their sensitivity to the individual model inputs,
in order to unravel the dynamics of the system and interactions between its different
components. As before, standard experimental design, based on Latin Hypercube
Samples, is applied, with 80 design points and five replicates per point.
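The experimental design described above can be sketched in code. The analyses in this book use GEM-SA; the Python sketch below is purely illustrative, drawing an 80-point Latin Hypercube design over the ten parameter ranges of Table 8.1 (with error and path_penalty_risk truncated as per the table notes) and replicating each design point five times.

```python
# Illustrative sketch of the experimental design (the book's analysis uses
# GEM-SA): an 80-point Latin Hypercube Sample over the ten parameter
# ranges of Table 8.1, with five replicates per design point.
import numpy as np
from scipy.stats import qmc

# Ranges from Table 8.1; error and path_penalty_risk are truncated to
# [0, 0.5] and [0, 10] respectively, as in the table notes.
lower = [0, 0, 0, 0.0, 0, 4, 0, 0, 0, 0]
upper = [1, 1, 1, 0.5, 1, 20, 1, 1, 1, 10]

sampler = qmc.LatinHypercube(d=10, seed=1)
unit_design = sampler.random(n=80)             # 80 points in [0, 1]^10
design = qmc.scale(unit_design, lower, upper)  # rescaled to parameter ranges

# Five stochastic replicates per point: the inputs are repeated, while the
# simulator's own random seed varies between replicates.
runs = np.repeat(design, 5, axis=0)
print(runs.shape)  # (400, 10): 400 model runs in total
```

In the actual analysis, each of the 400 input rows would be passed to the simulator, with the replicate outputs used to separate code variability (the 'nugget') from input-driven variability when fitting the emulator.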
The main results of the sensitivity and uncertainty analysis of the Risk and
Rumours model are reported in Table 8.2. For the two outputs considered – the
number of arrivals and the number of deaths – three parameters related to informa-
tion exchange, introduced in Chap. 5, remain of pivotal importance. The key param-
eter is the probability of exchanging information through direct communication
(p_transfer_info), followed by the probability of communicating with an agent’s
contacts (p_info_contacts) and of losing contacts (p_drop_contact). Among the newly added parameters, depicting the relationships with risk, the most important are those related to the speed of updating the information about risk (speed_expl_risk) and to the mapping between the objective risk of death and its subjective assessment (risk_scale). The interactions between these parameters also play a role
in shaping both outputs, as shown in Table 8.2.
The mean and variance levels of the expected model outputs indicate that on average, across the whole ten-dimensional parameter space, each run with 10,000 travelling agents generates nearly 7800 arrivals and 2200 deaths, although with some non-negligible variation. The resulting death rate, of around 22%, is clearly an order of magnitude higher than would be observed even on a high-risk maritime crossing, such as the Central Mediterranean. This suggests that the model needs to be properly calibrated to the empirical data on deaths in order for it to be more representative of the underlying reality of migration journeys. The estimated total variance in the code output translates into standard deviations of nearly 1150 for arrivals and over 650 for deaths, indicating considerable disparities across the whole parameter space. On the other hand, the impact of code uncertainty on the total estimated emulator variance is relatively small: the σ² term for the code variability 'nugget' is two orders of magnitude smaller than the overall fitted variance term of the emulator, σ². On the whole, the fit of the underlying GP emulator is reasonable, with the root mean squared standardised error (RMSSE) above two for both outputs; this is somewhat larger than the ideal level of one, at which the emulator results would closely match the model outputs.
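The cross-validation diagnostics reported in Table 8.2 can be stated compactly. The functions below use the standard formulations (GEM-SA's exact conventions may differ in detail), and the toy data are made up solely to show that an emulator with well-calibrated predictive standard deviations attains an RMSSE close to one.

```python
# Standard formulations of the cross-validation diagnostics in Table 8.2;
# RMSSE near one indicates that the emulator's predictive standard
# deviations match the actual size of its errors.
import numpy as np

def rmse(y, mu):
    """Root mean squared error of emulator means mu vs model outputs y."""
    return np.sqrt(np.mean((y - mu) ** 2))

def rmspe(y, mu):
    """Root mean squared percentage error, in percent."""
    return 100 * np.sqrt(np.mean(((y - mu) / y) ** 2))

def rmsse(y, mu, sd):
    """Root mean squared standardised error: errors divided by the
    emulator's predictive standard deviations."""
    return np.sqrt(np.mean(((y - mu) / sd) ** 2))

# Toy held-out set with a known error scale of 150.
rng = np.random.default_rng(0)
y = rng.normal(7800, 1150, size=200)   # stand-in 'model outputs'
mu = y + rng.normal(0, 150, size=200)  # emulator means, errors of sd 150
sd = np.full(200, 150.0)               # well-calibrated predictive sd
print(rmse(y, mu), rmsse(y, mu, sd))   # RMSSE should come out close to one
```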
Figure 8.2 illustrates the response surfaces with respect to the two parameters
describing the relationship with risk (risk_scale and speed_expl_risk), over their
space of variability defined in Table 8.1, [4, 20] × [0, 1]. The predicted values of the
GP emulator, means and standard deviations, are shown for the two outputs: num-
bers of arrivals and deaths. For simplicity, only the results assuming Normal prior
144 8 Towards More Realistic Models
Table 8.2 Uncertainty and sensitivity analysis for the Risk and Rumours model
Sensitivity analysis
Input\output Arrivals Deaths
Input prior: Normal Uniform Normal Uniform
p_drop_contact 3.006 2.851 10.700 9.130
p_info_contacts 6.092 4.990 15.823 16.784
p_transfer_info 57.644 48.593 40.864 38.264
error 0.145 0.176 2.330 2.712
speed_expl 0.718 0.564 0.533 0.597
risk_scale 2.746 4.297 3.863 3.868
p_notice_death 0.184 0.215 0.138 0.152
speed_risk 0.183 0.212 0.261 0.195
speed_expl_risk 4.597 4.739 10.097 9.371
path_penalty_risk 0.991 1.562 0.655 0.542
Interactions 18.260 22.809 11.522 12.790
Residual 5.433 8.994 3.215 5.595
Total % explained 94.567 91.006 96.785 94.405
Uncertainty analysis (Normal prior)
Mean of expected code output 7763.92 2236.99
Variance of expected code output 4608.59 777.78
Mean total variance in code output 1,315,010 428,657
Fitted σ² 1.3160 1.2289
Nugget σ² 0.0111 0.0193
Cross-validation (leave 20% out)
RMSE 152.30 116.33
RMSPE (%) 67.73% 6.05%
RMSSE (standardised) 2.5165 2.3836
The experiments were run on 80 Latin Hypercube Sample design points, with five repetitions per
point. The values in bold correspond to inputs with visible (>2.5%) shares of attributed variance.
(Source: own elaboration in GEM-SA, Kennedy & Petropoulos, 2016)
distributions of inputs are shown, and the values for the remaining parameters are
set at arbitrary, yet realistic values.² As can be seen from Fig. 8.2, both outputs show
clear gradients along both risk-related parameter dimensions, with arrivals increas-
ing and deaths decreasing with both risk_scale and speed_expl_risk, and with lower
uncertainty estimated for ‘middle’ values of both parameters than around the edges
of the respective graphs.
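Response surfaces like those in Fig. 8.2 can be reproduced in outline with any GP library. In this sketch, scikit-learn stands in for GEM-SA, a WhiteKernel plays the role of the code-variability 'nugget', and the 'simulator' is a made-up stand-in for the Risk and Rumours model, so the resulting surfaces are purely illustrative.

```python
# Sketch of a GP-emulator response surface over (risk_scale,
# speed_expl_risk), as plotted in Fig. 8.2. The toy simulator below is a
# hypothetical stand-in, NOT the actual agent-based model.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def toy_simulator(risk_scale, speed_expl_risk, rng):
    # Arrivals loosely increasing in both risk-related parameters.
    return 6000 + 150 * risk_scale + 1500 * speed_expl_risk + rng.normal(0, 50)

rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(4, 20, 60), rng.uniform(0, 1, 60)])
y = np.array([toy_simulator(a, b, rng) for a, b in X])

# RBF kernel for the smooth response, WhiteKernel for the 'nugget'.
kernel = 1.0 * RBF(length_scale=[5.0, 0.5]) + WhiteKernel(noise_level=1e-2)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Emulator means and standard deviations over a grid, one value per cell.
g1, g2 = np.meshgrid(np.linspace(4, 20, 25), np.linspace(0, 1, 25))
grid = np.column_stack([g1.ravel(), g2.ravel()])
mean, sd = gp.predict(grid, return_std=True)
print(mean.shape, sd.shape)
```

Plotting `mean` and `sd` reshaped to the 25 × 25 grid yields the kind of mean and standard-deviation panels shown in Fig. 8.2.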
The results of the sensitivity analysis additionally point to the areas of further
data collection, in particular with respect to information transfers over networks
(parameters p_transfer_info, p_info_contacts, and p_drop_contact), mapping of
² Here, we assume p_info_contacts = p_transfer_info = 0.8, p_drop_contact = 0.5, p_info_mingle = 0.5, error = 0.1, p_notice_death = 0.8, speed_risk = 0.7, and path_penalty_risk = 5. Note that, as per the outcomes of the sensitivity analysis reported in Table 8.2, only the first three of these parameters really matter.
Fig. 8.2 Response surfaces of the two output variables, numbers of arrivals and deaths, for the two
parameters related to risk. (Source: own elaboration in GEM-SA, Kennedy & Petropoulos, 2016)
objective and subjective risk measures (risk_scale), and the speed of updating
the information about risk through observation (speed_expl_risk). These are the
areas where the information gains in the model are likely to be the highest, and at
the same time, where the existing evidence base is scarce or non-existent. Here, as
discussed in Chap. 6, carrying out more interactive and immersive cognitive experiments on decision making holds the promise of producing results less influenced by respondent bias, which is a particular concern for respondents with no lived experience of migration, not to mention asylum migration. Setting up
such an experiment can additionally be helped by carrying out a dedicated qualita-
tive survey, specifically targeted at asylum seekers and refugees, the results of which
would inform the experimental protocol and help manage some ethical issues
related to the sensitivity of the topic.
Still, even within the confines of the current model, there is scope for further
inclusion of selected data sources, discussed in Chap. 4, in order to make it even more closely aligned with the reality the model aims to represent. We discuss these additions, leading to the creation of a new version of the model, called Risk and Rumours
with Reality, and the process of calibrating this model to observed data by using
Bayesian statistical methods, in the next section of this chapter.
8.4 Risk and Rumours with Reality: Adding Empirical Calibration
As discussed before, during the so-called ‘migration crisis’ following the Arab
Spring and the Syrian civil war, attempts to cross the Mediterranean via the Central
route, from Libya and Tunisia to Italy and Malta, saw a massive increase (Chap. 4).
The European Union reacted to these developments by implementing a ‘deterrence’
strategy, in cooperation with North African states. This strategy relied on making it
harder for humanitarian rescue missions to operate in the Mediterranean, while at
the same time boosting efforts by coast guards in Libya and Tunisia to intercept
asylum seekers’ boats before they could reach international waters. As mentioned
before, the available data indicate that between 2015 and 2019 these policy changes
could have led to a strong increase in interceptions at the African coast, and also to
a greater number of fatalities, especially on the Central Mediterranean route
(Frontex, 2018; IOM, 2021; see Sects. 4.2 and 8.1). The concomitant reduction in
sea arrivals in Southern Europe, however, seems to indicate that, their harrowing humanitarian costs notwithstanding, these policy changes at least accomplished their declared goal.
It should be possible to test if this ‘deterrence hypothesis’ is true – that is, whether
the effect of deterrence can indeed explain the reduction in the number of arrivals –
by using an empirically calibrated model of migration that includes the effects of
perceived risk on the migrants’ decisions. A full test of the hypothesis goes beyond
the scope of this book; however, in the following discussion we demonstrate the first
steps towards such a test, by calibrating the Risk and Rumours model against the
refugee situation in the Mediterranean in the years 2016–2019, and thus creating a
new version, Risk and Rumours with Reality. Setting up the modelling framework
for this exercise involved four additional processes: (1) specifying the topology of
the transport network, (2) extracting and assessing data on fatality and interception
rates, (3) reassessing the sensitivity of the adjusted model to key parameters, and
finally (4) calibrating the parameter values based on the empirical information.
To begin with, to define a geographically-plausible model topology for the net-
work of cities and links between them in the model, we extracted the geographical
locations of the most important cities in North Africa, the Levant and on the Turkish
coast as well as some important landing points for refugee boats in Italy, Malta,
Cyprus and Greece from OpenStreetMap (using OpenRouteService – source
S02 in Appendix B). From the same data source, we calculated travel distances
between these locations to be used as a proxy for the friction parameter. The result-
ing map is shown in Fig. 8.3.
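The topology-building step can be illustrated as follows. In the book, road travel distances come from OpenRouteService; in this hedged sketch, great-circle (haversine) distances between a few approximate, purely illustrative coordinates stand in as a crude proxy for the link friction.

```python
# Hedged sketch of the topology step: great-circle distances between
# example nodes as a stand-in for the OpenRouteService travel distances
# used as a friction proxy. Coordinates are approximate and illustrative.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Approximate (lat, lon) of a few example nodes of the network.
cities = {
    "Tripoli": (32.89, 13.19),
    "Tunis": (36.80, 10.18),
    "Lampedusa": (35.50, 12.61),
    "Valletta": (35.90, 14.51),
}

# One friction value per undirected link between distinct nodes.
friction = {
    (a, b): haversine_km(*cities[a], *cities[b])
    for a in cities for b in cities if a < b
}
for link, dist in sorted(friction.items()):
    print(link, round(dist))
```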
In terms of data for the period 2016–2019, the number of interceptions at the
Tunisian and Libyan coasts as well as numbers of presumed fatalities are available
from IOM (2021) (see also Chap. 4, with sources 11 and 12 listed and discussed in
more detail in Appendix B). Since we do not know the number of departures, we
have to infer fatality and interception rates for each year by using arrivals (idem) in
the corresponding year. For this, we assume that every migrant will attempt
Fig. 8.3 Basic topological map of the Risk and Rumours with Reality model with example routes:
green/lighter (overland) with lower risk, and red/darker (maritime) with higher risk. Line thickness
corresponds to travel intensity over a particular route for a randomly-selected model run, with
dashed lines denoting unused routes. (Source: own elaboration based on OpenStreetMap)
departure until they either manage to make the crossing, or die. Intercepted migrants
wait a certain amount of time and then make another attempt. Based on these assumptions, we can estimate the interception probability as p_i = N_i/(N_i + N_a + N_d), and the probability of dying as p_d = N_d/(N_i + N_a + N_d), where N_i denotes the number of interceptions, N_a the number of arrivals, and N_d the number of fatalities.
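In code, the two rates are one line each; the counts below are placeholders rather than the IOM (2021) figures.

```python
# Interception and fatality probabilities as defined above, from yearly
# counts of interceptions (N_i), arrivals (N_a) and deaths (N_d).
# The example counts are hypothetical placeholders.
def crossing_rates(n_int, n_arr, n_dead):
    """Return (p_i, p_d) = (N_i, N_d) / (N_i + N_a + N_d)."""
    total = n_int + n_arr + n_dead  # estimated attempted crossings
    return n_int / total, n_dead / total

p_i, p_d = crossing_rates(n_int=20000, n_arr=100000, n_dead=3000)
print(round(p_i, 4), round(p_d, 4))  # 0.1626 0.0244
```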
In the third step, we revisited the sensitivity and uncertainty of the revised ver-
sion of the model to different parameters, with the detailed results reported in
Table 8.3. In this iteration of the analysis, there is a noteworthy decrease in the share
of the variance explained by individual parameters in comparison with previous
model versions. There is also a visibly higher impact of the parameter interactions, as well as of other, residual factors driving the model behaviour that are not yet fully accounted for in the model, such as the changes in the intensity of migrant departures.
Table 8.3 Uncertainty and sensitivity analysis for the Risk and Rumours with Reality model
Sensitivity analysis
Input\output Arrivals Deaths
Input prior: Normal Uniform Normal Uniform
p_drop_contact 2.454 4.413 14.361 9.539
p_info_contacts 7.292 9.118 4.877 5.550
p_transfer_info 0.855 0.740 0.923 1.094
error 0.781 0.676 2.390 2.499
speed_expl 2.985 4.134 7.619 4.844
risk_scale 3.135 4.495 1.923 1.589
p_notice_death 0.874 0.756 0.688 0.814
speed_risk 0.668 0.578 1.319 1.564
speed_expl_risk 1.589 2.540 0.885 1.050
path_penalty_risk 3.413 3.973 0.575 0.682
Interactions 34.389 39.076 64.153 51.182
Residual 41.566 29.502 0.287 19.594
Total % explained 58.434 70.499 99.713 80.406
Uncertainty analysis (Normal prior)
Mean of expected code output 9483.28 179.59
Variance of expected code output 8311.37 2.27
Mean total variance in code output 576,153 183.68
Fitted σ² 1.6179 1.0892
Nugget σ² 0.0158 0.3946
Cross-validation (leave 20% out)
RMSE 105.786 13.87
RMSPE (%) 1.15% 9.06%
RMSSE (standardised) 1.2577 2.4834
The experiments were run on 80 Latin Hypercube Sample design points, with five repetitions per
point. The values in bold correspond to inputs with visible (>2.5%) shares of attributed variance.
(Source: own elaboration in GEM-SA, Kennedy & Petropoulos, 2016)
To increase the alignment of the model with reality further, by using the three outputs discussed above, N_i, N_a and N_d, we selected a number of parameters that had emerged as the most important in the sensitivity analysis – such as path_penalty_risk, p_info_contacts, p_drop_contact and speed_expl – as well as the two most important parameters determining the agents' sensitivity to risk – risk_scale and path_penalty_risk. We subsequently calibrated the model using a Population
Monte Carlo ABC algorithm (Beaumont et al., 2009) with the rates of change in the
numbers of arrivals and interceptions between the years, as well as the fatality rates
per year, as summary statistics. The rates of change were used in order to at least approximately remove the possible biases identified for these sources during the data assessment presented in Chap. 4 (in Table 4.3), tacitly assuming that these biases remain constant over time. A similar rationale applied to the use of fatality rates: here, the assumption was that the biases in the numerator (number of deaths) and in the denominator (attempted crossings) were of the same, or similar, magnitude.
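The summary statistics described above can be sketched as follows; the yearly series are hypothetical placeholders, not the observed Central Mediterranean counts. The point of using ratios is that a constant multiplicative bias in a series cancels in the year-on-year rates of change.

```python
# Sketch of the summary statistics used in the ABC calibration:
# year-on-year rates of change in arrivals and interceptions, plus yearly
# fatality rates. All counts below are hypothetical placeholders.
import numpy as np

arrivals      = np.array([181000., 119000., 23000., 11000.])  # four years
interceptions = np.array([ 15000.,  20000., 15000.,  9000.])
deaths        = np.array([  4500.,   2800.,  1300.,   750.])

def summary_stats(arr, inter, dead):
    change_arr = arr[1:] / arr[:-1]    # constant bias b cancels: (b*x2)/(b*x1)
    change_int = inter[1:] / inter[:-1]
    fatality = dead / (arr + inter + dead)  # similar biases above and below
    return np.concatenate([change_arr, change_int, fatality])

s_obs = summary_stats(arrivals, interceptions, deaths)
print(s_obs.round(3))  # 3 + 3 rates of change, then 4 fatality rates
```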
We ran the model for 2000 simulation runs spread over ten iterations, with 500
time periods for each run, corresponding to 5 years in historical time, 2015–19, with
the first year treated as a burn-in period. Under this setup, however, the model turned out not to converge very well. Therefore, we additionally included the between-year changes in departure rates among the parameters to be calibrated. With this change, we were able to closely approximate the development of the real numbers of arrivals and fatalities for the years 2016–19 in our model (see also Chap. 9).
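The calibration loop can be illustrated with a minimal, one-parameter version of the Population Monte Carlo ABC scheme of Beaumont et al. (2009): particles accepted under a shrinking tolerance are resampled, perturbed with a Gaussian kernel, and reweighted by the ratio of prior to proposal density. The toy 'simulator' and Uniform(0, 1) prior below are stand-ins for the agent-based model and its parameter priors.

```python
# Minimal one-parameter sketch of Population Monte Carlo ABC (Beaumont
# et al., 2009). The toy 'simulator' stands in for a full model run;
# tolerances shrink over the iterations, as in the calibration above.
import numpy as np

rng = np.random.default_rng(3)

def simulator(theta):
    # Stand-in model run returning a noisy summary statistic.
    return theta + rng.normal(0.0, 0.1)

s_obs = 0.5                          # 'observed' summary statistic
n_part = 200
tolerances = [0.5, 0.3, 0.2, 0.1]    # shrinking acceptance thresholds

# Iteration 1: plain rejection sampling from the Uniform(0, 1) prior.
particles, weights = [], []
while len(particles) < n_part:
    theta = rng.uniform(0.0, 1.0)
    if abs(simulator(theta) - s_obs) < tolerances[0]:
        particles.append(theta)
        weights.append(1.0 / n_part)
particles, weights = np.array(particles), np.array(weights)

# Subsequent iterations: resample, perturb, reweight.
for eps in tolerances[1:]:
    tau = np.sqrt(2.0 * np.var(particles))  # kernel scale, as in Beaumont et al.
    new_p, new_w = [], []
    while len(new_p) < n_part:
        base = rng.choice(particles, p=weights / weights.sum())
        theta = base + rng.normal(0.0, tau)
        if not 0.0 <= theta <= 1.0:         # outside the prior's support
            continue
        if abs(simulator(theta) - s_obs) < eps:
            # Weight: prior density (= 1) over the kernel mixture density
            # (proportional; the Gaussian constant cancels on normalising).
            kern = np.exp(-0.5 * ((theta - particles) / tau) ** 2) / tau
            new_p.append(theta)
            new_w.append(1.0 / np.sum(weights * kern))
    particles, weights = np.array(new_p), np.array(new_w)

posterior_mean = np.average(particles, weights=weights)
print(round(posterior_mean, 2))
```

In the actual calibration, theta is a parameter vector, the distance compares vectors of summary statistics such as those described above, and each 'simulator' call is a full model run, which is what the budget of 2000 runs over ten iterations roughly corresponds to.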
In parallel, we have carried out calibration for two outputs together (arrivals and
interceptions) based on the GP emulator approach, the results of which confirmed
those obtained for the ABC algorithm. Specifically, we have estimated the GP emu-
lator on a sample of 400 LHS design points, with twelve repetitions at each point,
and 13 input variables, including three sets of departure rates (for 2017–19). The emulator performance and fit were found to be reasonable, and the results proved to be sensitive to the prior assumptions about the variance of the model discrepancy term (see also Chap. 5).
Selected results of the model calibration exercise are presented in Fig. 8.4 in terms of the posterior estimates of selected model parameters: as with the ABC estimates, we did not learn much about most of the model inputs, except for those related to departures. This outcome confirmed that our main results and qualitative conclusions were broadly stable across the two methods of calibration (ABC and GP emulators), strengthening the substantive interpretations made on their basis.
To illustrate the calibration outcomes, Fig. 8.5 presents the trajectories of the model runs for the calibrated period. These two figures – Figs. 8.4 and 8.5 – are equivalent to Figs. 5.7 and 5.8 presented in Chap. 5 for the purely theoretical model (Routes and Rumours), but this time including actual empirical data, both on inputs and outputs, and allowing for a time-varying model response.
In the light of the results for the three successive model iterations, one important
question from the point of view of the iterative modelling process is: to what extent
Fig. 8.4 Selected calibrated posterior distributions for the Risk and Rumours with Reality model parameters, obtained by using the GP emulator. (Source: own elaboration)
Fig. 8.5 Simulator output distributions for the uncalibrated (black/darker lines) and calibrated
(green/lighter lines) Risk and Rumours with Reality model. For calibrated outputs, the simulator
was run at a sample of input points from their calibrated posterior distributions. (Source: own
elaboration)
Table 8.4 Uncertainty analysis – comparison between the three models: Routes and Rumours,
Risk and Rumours, and Risk and Rumours with Reality, for the number of arrivals, under Normal
prior for inputs
Indicator\Model | Routes & Rumours | Risk & Rumours | Risk & Rumours with Reality
Mean of expected code output | 9272.02 | 7763.92 | 9483.28
Variance of expected code output | 46.41 | 4608.59 | 8311.37
Mean total variance in code output | 17,639 | 1,315,010 | 576,153
Fitted σ² | 9.4513 | 1.3160 | 1.6179
Nugget σ² | 0.3062 | 0.0111 | 0.0158
Source: own elaboration in GEM-SA, Kennedy & Petropoulos, 2016
does adding more empirically relevant detail to the model, but at the expense of
increased complexity, change the uncertainty of the model output? To that end,
Table 8.4 compares the results of the uncertainty analysis for the number of arrivals
in three versions of the model: two theoretical (Routes and Rumours and Risk and
Rumours), and one more empirically grounded (Risk and Rumours with Reality).
The results of the comparison are unequivocal: the key indicator of how uncertain the model results are, the mean total variance in code output (shown in bold in Table 8.4), is nearly two orders of magnitude larger for the more sophisticated version of the theoretical model, Risk and Rumours, than for the basic one, Routes and Rumours. On the other hand, the inclusion of additional data in Risk and Rumours with Reality enabled reducing this uncertainty more than two-fold. Still,
the variance of the expected code output turned out to be the largest for the empiri-
cally informed model version.
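The magnitudes quoted above can be checked directly against the values in Table 8.4:

```python
# Checking the comparison, using the mean total variances from Table 8.4.
var_routes  = 17_639      # Routes and Rumours
var_risk    = 1_315_010   # Risk and Rumours
var_reality = 576_153     # Risk and Rumours with Reality

print(round(var_risk / var_routes, 1))   # 74.6: close to two orders of magnitude
print(round(var_risk / var_reality, 2))  # 2.28: a more than two-fold reduction
```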
At the same time, the reduction in the mean model output for the number of arrivals is not surprising: in Risk and Rumours, ceteris paribus, many agents may die during their journey, especially while crossing the high-risk routes. In the Risk and Rumours with Reality version, the level of this risk is smaller by an order of magnitude (and more realistic). This brings the mean output back to the levels seen for the Routes and Rumours version, which is also more credible in the light of the empirical data, although this time with a more realistic variance estimate. In addition, the fitted variance parameters of the GP emulator are smaller for both Risk and Rumours models, meaning that, within the total variability, the uncertainty related to the emulator fit and code variability is even smaller. In the more refined versions of the model, it is the uncertainty induced by the unknown inputs that matters most.
Altogether, our results point to the possible further extensions of the models of
migrant routes, as well as to the importance of adding both descriptive detail and
empirical information into the models, but also to their intrinsic limitations.
Reflections on these issues, and on other, practical aspects of the process of model
construction and implementation, are discussed next.

8.5 Reflections on the Model Building and Implementation

In terms of the practical side of the construction of the model, and in particular the
more complex and more empirically grounded versions (respectively, Risk and
Rumours, and Risk and Rumours with Reality), the modifications that were neces-
sary to make the model ready for more empirically oriented studies were surpris-
ingly easy to implement. In part, this was due to the transition to an event-based
paradigm which, as set out in Chap. 7, tends to lead to a more modular model
architecture.
Additionally, we found that it was straightforward to implement a very general
scenario system in the model. Largely this is because Julia – a general-purpose pro-
gramming language used for this purpose – is a dynamic language that makes it
easy to apply modifications to the existing code at runtime. Traditionally, dynamic languages (such as Python, Ruby or Perl) have paid for this advantage with substantially slower execution speed and have therefore rarely been used for time-critical modelling. Statically-compiled languages such as C++, on the other hand, while much faster, make it much harder to carry out these types of runtime modifications. Julia's just-in-time compilation, however, offers the possibility of combining the high speed of a static language with the flexibility of a dynamic one, making it an excellent choice for agent-based modelling.
As concerns the combination of theoretical modelling with empirical experi-
ments, one conclusion we can draw is that having a theoretical model first makes
designing the empirical version substantially easier. Only after implementing,
running, and analysing the first version of the model (see Chap. 3) were we able to
determine which pieces of empirical information would be most useful in develop-
ing the model further. This also makes a strong case for using a model-based
approach not only as a tool for theoretical research, but also as a method to guide
and inspire empirical studies, reinforcing the case for iterative model-based enqui-
ries, advocated throughout this book (see Courgeau et al., 2016).
In terms of the future work enabled by the modelling efforts presented in this
book, the changes implemented in the model through the process we describe would
also make it easy to tackle larger, empirically oriented projects that go beyond the
scope of this work. In particular, with a flexible scenario system in place, we could
model arbitrary changes to the system over time. For example, using detailed data
on departures, arrivals and fatalities around the Mediterranean (see Chap. 4) as well
as the timing of some crucial policy changes in the EU affecting death rates, we
would be able to better calibrate the model parameters to empirical data. In the next
step, we could then run a detailed analysis of policy scenarios (see Chap. 9) using
the calibrated model to make meaningful statements on whether an increased risk
does indeed lead to a reduction of arrivals.
Similar types of scenarios can involve complex patterns of change in the border
permeability, asylum policy developments, and either support or hostility directed
towards refugees in different parts of Europe between 2015 and 2020. A well-
calibrated model, together with an easy way to set up complex scenarios, would
allow investigating the effectiveness of actual as well as potential policy measures,
relative to their declared aims, as well as humanitarian criteria. An example of
applying this approach in practice based on the Risk and Rumours with Reality
model is presented in Chap. 9. In addition, the adversarial nature of some of the
agents within the model, such as law enforcement agents and migrant smugglers,
can be explicitly recognised and modelled (for a thorough, statistical treatment of
the adversarial decision making processes, see Banks et al., 2015).
At a higher level, model validation remains a crucial general challenge in com-
plex computational modelling. As laid out in Chaps. 4, 5 and 6, and demonstrated
above, the additional data and 'custom-made' empirical studies, coupled with a comprehensive sensitivity and uncertainty analysis of model outcomes, can be a very useful way of directly improving aspects of a model that are known to be underdefined. In
order to be able to test the overall validity of the model, however, it ideally has to be
tested and calibrated against known outcomes.
One possible way of doing that would entail focusing on a limited real-world
scenario with relatively good availability of data. The assumption would then be
that a good fit to the data in a particular scenario implies a good fit in other scenarios
as well. For example, we could use detailed geographical data on transport topology
in a small area in the Balkans, combined with data on presence of asylum seekers in
camps, coupled with registration and flow data, to calibrate the model parameters.
An indication of the ‘empirical’ quality of the model is then its ability to track his-
torical changes in these numbers, spontaneous or in reaction to external factors.
Given the level of spatial detail that would be required to design and calibrate such
models, they remain beyond the scope of our work; however, even the version of the
model presented throughout this book, and more broadly the iterative process of
arriving at successive model versions in an inductive framework, enables drawing
some conclusions and recommendations for practical and policy uses.
This discussion leads to a more general point: what lessons have we learned from
the iterative and gradual process of model-building and its practical implementation? The proposed process, with five clearly defined building blocks, allows for greater control over the model and its different constituent parts. Analytical (and
theoretical) rigour, coherence of the assumptions and results, as well as an in-built
process of discovery of the previously unknown features of the phenomena under
study, can be gained as a result. Even though some elements of this approach cannot
be seen as a purely inductive way of making scientific advances, the process none-
theless offers a clear gradient of continuous ascent in terms of the explanatory
power of models built according to the principles proposed in this book, following
Franck (2002) and Courgeau et al. (2016).
In terms of the analysis, the coherent description of phenomena at different levels of aggregation also helps illuminate their mutual relationships and trade-offs, as well as – through the sensitivity analysis – identify the influential parts of the process for further enquiries. Needless to say, for each of the five building blocks in
their own right, including data analysis, cognitive experiments, model implementa-
tion and analysis, as well as language development, interesting discoveries can
be made.
At the same time, it is also crucial to reflect on what the process does not allow.
The proposed approach is unlikely to bring about a meaningful reduction of the uncertainty of the social processes and phenomena being modelled.
This is especially visible in the situations where uncertainty and volatility are very
high to start with, such as for asylum migration. This point is particularly well illus-
trated by the uncertainty analysis presented in the previous section: introducing
more realism in the model in practice meant adding more complexity, with further
interacting elements and elusive features of the human behaviour thrown into the
design mix. It is no surprise then that, as in our case, this striving for greater realism
and empirical grounding has ultimately led to a large increase in the associated
uncertainty of the model output.
In situations such as those described in this chapter, there are simply too many
‘moving parts’ and degrees of freedom in the model for the reduction of uncertainty
to be even contemplated. Crucially, this uncertainty is very unlikely to be reduced
with the available data: even when many data sources are seemingly available, as in
the case of Syrian migration to Europe (Chap. 4), the empirical material that corre-
sponds exactly to the modelling needs, and can be mapped onto the sometimes
abstract concepts used in the model (e.g., trust, confidence, information), is likely to
be limited. This requires the modellers to make compromises and sometimes arbitrary decisions, or to leave the model parameters underspecified and uncertain, which increases the errors of the outputs further.
These limitations underline high levels of aleatory uncertainty in the modelling
of such a volatile process as asylum migration. Even if the inductive model-building
process can help reduce the epistemic uncertainty to some extent, by furthering our
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 9
Bayesian Model-Based Approach: Impact
on Science and Policy
In this chapter, we summarise the scientific and policy implications of the Bayesian
model-based approach, starting from an evaluation of its possible advantages, limi-
tations, and potential to influence further scientific developments, policy and prac-
tice. We focus here specifically on the role of limits of knowledge and reducible
(epistemic), as well as irreducible (aleatory) uncertainty. To that end, we also reflect
on the scientific risk-benefit trade-offs of applying the proposed approaches. We
discuss the usefulness of the proposed methods for policy, exploring a variety of uses,
from scenario analysis, to foresight studies, stress testing and early warnings, as
well as contingency planning, illustrated with examples generated by the Risk and
Rumours models presented earlier in this book. We conclude the chapter by provid-
ing several practical recommendations for the potential users of our approach,
including a blueprint for producing and assessing the impact of policy interventions
in various parts of the social system being modelled.
9.1 Bayesian Model-Based Migration Studies: Evaluation and Perspectives
questions concerning migration itself, but also with respect to human behaviour
more generally, that will substantially improve our ability to model and understand
social systems. At the same time, we can utilise different types of data (micro and
macro, quantitative and qualitative, contextual and process-related) in a way that
explicitly recognises their quality and describes uncertainty to be included in the
models. This is especially important given the paucity of data on such complex
processes as migration: here, a formal audit of data quality, as presented in Chap. 4,
is a natural starting point.
Still, large gaps in available empirical knowledge of migration remain, which
makes any kind of formal modelling challenging. For one, data on many processes
that are known to be important are missing or sparse, especially at individual level.
Even with a case study such as the recent Syrian asylum migration, there are parts
of the process with little or no data, and the data that exist rarely measure specifi-
cally what the modellers may want them to. The challenge is to identify and describe
the limitations of the data while also identifying how and where they may be useful
in the model, and to make consistent comparisons across a wide range of data
sources, with a clearly set out audit framework.
More fundamentally, however, we often do not even know which of the possible
underlying processes occur in reality, and even if they do, how they affect migration.
Besides, human behaviour is intrinsically hard to model, and is not understood in
full detail. Finally, the combination of a large spatially distributed system with
the fact that imperfect spatial knowledge is a key part of the system dynamics leads
to some technical challenges, due to the sheer size of the problem being modelled.
One key piece of new knowledge generated from the psychological experiments
thus far is that migration decision making deviates from the rationality assumptions
often used. We found that people exhibit loss aversion when making migration deci-
sions (they weight losses more heavily than gains of the same magnitude), as well
as that people show diminished sensitivity for gains in monthly income (i.e., they
are less responsive to potential gains as they get further from their current income
level). We have also found that people differentially weight information about the
safety of a migration journey depending on the source of the information.
Specifically, this information seems to be weighted most strongly when it comes
from an official organisation, while the second most influential source of informa-
tion seems to be other migrants with relevant personal experience.
When conducting cognitive experiments and adding greater psychological real-
ism to agent-based models of migration, several important obstacles remain. One
key challenge is how to simulate complex real-world environments within the con-
fines of an online or lab-based experiment. Migration decisions have the potential to
change one’s life to a very large extent, be associated with considerable upheaval,
and, in the case of asylum migration, occur in life-threatening circumstances. For
ethical reasons, no lab-based or online experiment can come close to replicating the
real-world stakes or magnitude of these decisions. This is a major challenge for both
designing migration decision-making experiments and for applying existing insights
from the decision-making literature to migration. Another important challenge is
that migration decisions are highly context dependent and influenced by a huge
number of factors. Therefore, even if it were possible to gain insight into specific
aspects of migration decision making, important challenges would remain: estab-
lishing the extent to which these insights are applicable across migration decision-
making contexts, and understanding and/or making reasonable assumptions about
how various factors interact.
In terms of computation, the languages we developed show that the benefits of
domain-specific modelling languages (e.g., separation of model and simulation,
and easy implementation of continuous time), already known in other application
domains (such as cell biology), can also apply to agent-based models in the social
sciences. The models gradually developed and refined in this project, and other
models of social processes intended to give a better understanding of the dynamics
resulting from individual behaviour, place a strong emphasis on the agents’ knowl-
edge and decision making.
However, modelling knowledge, valuation of new information, and decision
making requires much more flexible and powerful modelling languages than the
ones typically used in other areas. For example, we found that the modelling lan-
guage needs to support complex data structures to represent knowledge. As the
resulting language would share many features of general-purpose programming lan-
guages, it should be embedded into such a general-purpose language, rather than be
implemented as an external domain-specific language.
In addition, our parallel implementation of the core model in two different pro-
gramming languages demonstrated the value of independent validation of simula-
tion code. To understand and evaluate a simulation model, it is not enough to know
how it works; it is also necessary to know why it is designed that way. Provenance
models can supplement (or partially replace) existing model documentation stan-
dards (such as the ODD or ODD+D protocols, the ‘+D’ in the latter referring to
Decisions, Müller et al., 2013; Grimm et al., 2020; see also Chap. 7), showing the
history and the foundations of a simulation model. This is especially pertinent for
those models, such as ours, which are to be constructed in an iterative manner, by
following the inductive model-based approach.
At the same time, the key challenge for this kind of model seems to be designing
the language in such a way that it is:
• powerful and flexible enough;
• easy to use, easy to learn and (perhaps most importantly) easy to read; and
• possible to execute efficiently.
For the provenance models, a key challenge is to identify the entities and pro-
cesses that need to be included, and the relevant meta-information about them.
Some of this is common to all simulation studies, independent of the modelling
method or the application domain. At the same time, other aspects are application-
specific (e.g., certain kinds of data are specific to demography, or to migration stud-
ies, and some information specific to these types of data is relevant). This
meta-information can be gathered with the help of existing documentation standards,
such as ODD, which additionally underscores the need for a comprehensive data
and data quality audit, as outlined in Chap. 4.
9.2 Advancing the Model-Based Agenda Across Scientific Disciplines1

1 This section includes additional contributions by participants of the workshop “Modelling
Migration and Decisions”, Southampton, 21 January 2020. Many thanks go to André Grow,
Katarzyna Jaśko, Elzemiek Kortlever, Eric Silverman, and Sarah Wise for providing the voices
in the discussion.
importantly – which knowledge gaps remain. At the moment, there is still untapped
potential in using digital trace data, for example from mobile phones or social
media, to inform modelling. Of course, such data would need to come not only with
proper ethical safeguards, but also with knowledge of what they actually represent,
and an honest acknowledgement of their limitations.
As the data inventory grows and the quality assessment framework is applied to
different settings, the criteria for comparison may be applied more consistently.
For example, it is easier to assess the relative quality of a particular type of source
if a similar source has already been assessed. On the whole, the data assessment
tools may also be used to identify additional gaps in available data, by helping
decide which data would be appropriate for the purpose and of sufficient quality,
and therefore can inform targeted future data collection. The quality assessment
framework can also encourage the application of rigorous methods of data collection
and processing prior to publication, in line with the principles of open science.
Besides any statistical analysis, the use of empirical data in modelling can
involve face validity tests of the individual model output trajectories, which would
confirm the viability of individual-level assumptions. This approach would provide
confirmation, rather than validation, of the model workings, and the process of
identifying data gaps and requirements could be iterative. At a more general level,
having specific principles and guidelines for using different types of individual data
sources in modelling endeavours would be helpful – in particular, it would directly
feed into the provenance description of the formal relationships within the model, in
a modular fashion. There is a need for introducing minimum reporting requirements
for documentation, noting that the provenance models discussed in Chap. 7 are in
fact complementary, rather than competing with narrative-based approaches, such
as the ODD(+D) protocols (Müller et al., 2013; Grimm et al., 2020).
With cognitive experiments for modelling, one key area for future advancement
is the development of experimental setups that reduce the gap between experiments
and the real-world situations they are attempting to investigate. The more immersive
and interactive experiment suggested in Chap. 6 would attempt to advance experi-
mental work on decision making in this direction, and we expect that future work
will continue to develop along these lines. Additionally, it will be crucial for future
experimental work to examine the interplay of multiple factors that influence migra-
tion decisions simultaneously, rather than focusing on individual factors one
at a time.
As also mentioned in Chap. 6, another key challenge is how to map the data from
the experimental population to a specific population of interest, such as migrants,
including asylum seekers or refugees. The external validity of the experiments, and
their capacity for generalisation, is especially important given the cultural and
socio-economic differences between experiment participants. One promising pos-
sibility, subject to ethical considerations, consists in ‘dual track’ experimentation on
different populations at the same time, to try to estimate the biases involved. This
could be done, for example, via social media, targeting the groups of interest, and
comparing the demographic profiles with the samples collected by using traditional
methods.
realm. Still, having a catalogue of models, and possibly their individual sub-
modules, can offer future modellers a very helpful toolbox for describing and
explaining the mechanisms being modelled. At the same time, the modellers need to
be clear about the model epistemology and limitations, and it is best when a model
serves to describe one, well-defined phenomenon. In this way, models can serve as
a way to formalise and embody the “theories of the middle range”, a term originally
coined by Merton (1949) to denote “partial explanation of phenomena … through
identification of core causal mechanisms” (Hedström & Udehn, 2011), and further
codified within the wider Analytical Sociology research programme (Hedström &
Swedberg, 1998; Hedström, 2005; Hedström & Ylikoski, 2010).2 In this way, the
modelling gives up on the unrealistic aspiration of offering grand theories of social
phenomena. This in turn enables the modellers to focus on answering the research
questions at the ‘right’ level of analysis, a choice which may well be a pragmatic and
empirical one.
Third, the pragmatic considerations around how to carry out model-based migra-
tion enquiries in practice are often difficult and idiosyncratic, but this can be par-
tially overcome by identifying examples of existing good practice and greater
precision about the type of research questions such models can answer. At the same
time, there is an acute need to be mindful of the epistemological limitations of
various modelling approaches. A related issue of how to make any modelling exer-
cises suitable and attractive for users and policy-makers additionally requires a
careful managing of expectations, to highlight the novelty and potential of the pro-
posed modelling approaches, while making sure that what is offered remains realis-
tic and can be actually delivered.
One important remaining research challenge, where we envisage a concentration
of more work in the coming years, is how to combine the different constituent
elements of the modelling process. Here again, having agreed guidelines
and examples of good practice would be helpful, both for the research community
and the users. In terms of the quality of input data and other information sources,
there is a need to be explicit about what various sources of information can tell us,
as well as about the quality aspects – and here, explicit modelling of the model
provenance can help, as argued in Chap. 7 (see, in particular, Fig. 7.3).
In future endeavours, for multi-component modelling to succeed, establishing
and retaining open channels for conversation and collaboration across different sci-
entific disciplines is crucial, despite natural constraints in terms of publication and
conference ‘silos’. For informed modelling of complex processes such as migration,
it is imperative to involve interdisciplinary research teams, with modelling and
analytical experts, and diverse yet complementary subject-matter expertise. Open
discussions around good practice, exploring different approaches to modelling and
decisions, matter a lot both for the practitioners as well as theorists and methodolo-
gists, especially in such a complex and uncertain area as migration. Importantly, this
also matters if models are to be used as tools of policy support and advice. We dis-
cuss the specific aspects of that challenge next.
2 We are particularly grateful to André Grow for drawing our attention to this interpretation.
9.3 Policy Impact: Scenario Analysis, Foresight, Stress Testing, and Planning
In the context of practical implications for the users of formal models, it is a truism
to say that any decisions to try to manage or influence complex processes, such as
migration, are made under conditions of high uncertainty. Broadly speaking, as sig-
nalled in Chap. 2, we can distinguish two main types of uncertainty. The epistemic
uncertainty is related to imperfect knowledge of the past, present, or future charac-
teristics of the processes we model. The aleatory uncertainty, in turn, is linked to
the inherent and irreducible randomness and non-determinism of the world and
social realm (for a discussion in the context of migration, see Bijak & Czaika,
2020). The role of these two components changes over time, as conjectured in
Fig. 9.1, with diminishing returns from current knowledge in the more distant
future, which is dwarfed by the aleatory aspects, driven by ever-increasing com-
plexity. Importantly, the influences of uncertain events and drivers accumulate over
time, and there is greater scope for surprises over longer time horizons.
In the case of migration, the epistemic uncertainty is related to the conceptualisa-
tion and measurement of migration and its key drivers and their multi-dimensional
environments or ‘driver complexes’, acting across many levels of analysis (Czaika
& Reinprecht, 2020). In addition, the methods used for modelling and for assessing
human decisions in the migration context also have a largely epistemic character.
Conversely, systemic shocks and unpredictable events affecting migration and its
drivers are typically aleatory, as are the unpredictable aspects of human behaviour,
especially at the individual level (Bijak & Czaika, 2020). At a fundamental level, the
future of any social or physical system remains largely open and indeterministic,
Fig. 9.1 Stylised relationship between the epistemic and aleatory uncertainty in migration model-
ling and prediction
Early warnings and stress testing are particularly useful for short term, operational
purposes, such as humanitarian relief, border operations, or similar. What is required
of formal models in such applications is a very detailed description, ideally aligned
with empirical data. This description should be linked to the relevant policy or oper-
ational outcomes of interest, especially if the models are to be benchmarked to some
quantitative features of the real migration system. Here, the models can be addition-
ally augmented by using non-traditional data sources, such as digital traces from
mobile phones, internet searches or social media, due to their unparalleled timeli-
ness. In particular, formal simulation models can help calibrate early warning
systems, by allowing the response thresholds to be set at appropriate levels (see
Napierała et al., 2021). At the same time, models can help with stress testing of the existing
migration management tools and policies, by indicating with what (and how
extreme) events such tools and policies can cope. One stylised example of such
applications for the Risk and Rumours version of the migration route formation
model is presented in Box 9.1.
Fig. 9.2 Cusum early warnings based on the simulated numbers of daily arrivals at the destination
in the migrant route model, with different reaction thresholds
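A cusum-type warning rule of the kind shown in Fig. 9.2 can be sketched in a few lines. This is a minimal illustration with made-up arrival numbers, target, drift, and threshold values, not the code behind the figure (see Napierała et al., 2021 for the method itself):

```python
def cusum_alarms(arrivals, target, drift, threshold):
    """One-sided cusum: accumulate excesses of daily arrivals over an
    expected level; raise a warning whenever the cumulative statistic
    crosses the reaction threshold, then reset."""
    s, alarms = 0.0, []
    for day, x in enumerate(arrivals):
        s = max(0.0, s + (x - target - drift))
        if s > threshold:
            alarms.append(day)
            s = 0.0  # reset after an alarm has been raised
    return alarms

# Illustrative series: a stable period followed by a surge in arrivals
series = [10] * 50 + [14] * 20
print(cusum_alarms(series, target=10, drift=0.5, threshold=15))
# → [54, 59, 64, 69]
```

Lowering the threshold makes the system react sooner but also more often, which is precisely the calibration trade-off the simulated arrival series can be used to explore.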
At the other end of the temporal spectrum, foresight and scenario-based analyses,
deductively obtained from the model results (see Chap. 2), are typically geared for
higher-level, more strategic applications. Given the length of the time horizons,
such approaches can offer mainly qualitative insights, and can help with carrying
out stimulus-response (‘what-if’) analyses, as discussed later. This also means
that these models can be more approximate and broad-brush than those tailored for
operational applications, and can have more limited detail of the system description.
An illustration of how an agent-based model can be used to generate scenarios of
the emergence of various migration route topologies is offered in Box 9.2, in this
case with specific focus on how migration responds to unpredictable exogenous
shocks, rather than examining the reactions of flows to policy interventions, which
is discussed next.
Fig. 9.3 Scenarios of the numbers of arrivals (top) and fatalities (bottom), assuming an increased
volume of departures at t = 150, and deteriorating chances of safe crossing from t = 200. Results
shown for the low and high effects of risk on path choice (‘risk-taking’ and ‘cautious’) and levels
of initial knowledge and communication (‘informed’ and ‘uninformed’), including between-
replicate variation (grey shade)
Whether the insights discussed above can be also gained from the model cali-
brated to the actual data series is another matter. To test it, in Box 9.4 we repeat the
‘what-if’ exercise introduced before, but this time for the Risk and Rumours with
Reality version of the model, calibrated by using the Approximate Bayesian
Computation (ABC) approach, described in Sect. 8.4.
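The rejection flavour of ABC used for such calibration can be sketched as follows. The toy simulator below (Gaussian daily arrivals with an unknown rate) merely stands in for the agent-based model, and the prior, summary statistic, and tolerance are all illustrative assumptions rather than the settings used in the book:

```python
import random

def simulate_arrivals(rate, n_days, rng):
    """Stand-in for the simulation model: noisy daily arrivals around
    an unknown rate. In the book the simulator is the calibrated
    agent-based model; this toy only illustrates the ABC mechanics."""
    return [rng.gauss(rate, rate ** 0.5) for _ in range(n_days)]

def abc_rejection(observed, prior_sample, distance, tol, n_draws, rng):
    """Basic ABC rejection: draw parameters from the prior, simulate,
    and keep the draws whose summary lies within `tol` of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        simulated = simulate_arrivals(theta, len(observed), rng)
        if distance(observed, simulated) < tol:
            accepted.append(theta)
    return accepted  # an approximate posterior sample

rng = random.Random(1)
data = simulate_arrivals(12.0, 100, rng)           # 'observed' series
mean = lambda xs: sum(xs) / len(xs)
post = abc_rejection(
    observed=data,
    prior_sample=lambda r: r.uniform(5, 20),       # vague prior on the rate
    distance=lambda o, s: abs(mean(o) - mean(s)),  # summary-statistic distance
    tol=0.5, n_draws=2000, rng=rng,
)
print(len(post), mean(post))  # accepted draws concentrate near the true rate
```

The accepted draws form an approximate posterior without a likelihood ever being written down, which is what makes ABC attractive for simulation models whose likelihood is intractable.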
On the whole, the results of scenarios, such as those presented in Boxes 9.3 and
9.4, can go some way towards answering substantive research and policy questions.
This also holds for the questions posed in Chap. 8, as to whether increased risk – as
well as information about risk – can bring about a reduction in fatalities among
migrants by removing one possible ‘pull factor’ of migration. As can be seen from
the results, this is not so simple: due to the presence of many trade-offs and
interactions between risk, people’s attitudes, preferences, information, and trust, the
effect can be neutral, or even the opposite of what was intended. This is especially
important in situations when different agents may follow different – and
sometimes conflicting – objectives (see Banks et al., 2015). These findings – even if
interpreted carefully – strengthen the arguments against withdrawing support for
migrants crossing the perilous terrain, such as the Central Mediterranean (see Heller
& Pezzani, 2016; Cusumano & Pattison, 2018; Cusumano & Villa, 2019; Gabrielsen
Jumbert, 2020).
Fig. 9.4 Outcomes of different ‘what-if’ scenarios for arrivals (top) and deaths (bottom) based on
a public information campaign introduced at t = 210 in response to the increase in fatalities
the underlying behavioural dynamics of the agents and their interactions. Of course,
the process of modelling does not have to end here: in the spirit of inductive model-
based enquiries, these results indicate the need to get more detailed information
both on the mechanisms and on observable features of the migration reality, so that
the journey towards further discoveries can follow in a ‘continuous ascent’ of
knowledge, in line with the broad inductive philosophy of the model-based approach.
9.4 Towards a Blueprint for Model-Based Policy and Decision Support
In practice, the identification of the way in which the models can support policy or
practice should always start from the concrete needs of the users and decision mak-
ers, in other words, from identifying the questions that need answering. Here, the
Fig. 9.5 Outcomes of the ‘what-if’ scenarios for arrivals (top) and deaths (bottom) based on a
public information campaign introduced at t = 210, for the calibrated Risk and Rumours with
Reality model
Fig. 9.6 Blueprint for identifying the right decision support by using formal models
to the reality they represent, their results and predictions, especially quantitative (in
the short run), but also qualitative (in the long run), can be far off. As any model-
based prediction is difficult, and long-term quantitative predictions particularly so
(Frigg et al., 2014), the expectations of model users need to be carefully managed to
avoid false overpromise.
Still, especially in the context of fundamental and irreducible uncertainty, pos-
sibly the most important role of models as decision support tools is to illuminate
different trade-offs. If the outputs are probabilistic, and the user-specific loss func-
tions are known, indicating possible losses under different scenarios of over- and
underprediction, the Bayesian statistical decision analysis can help (for a fuller
migration-related argument, see Bijak, 2010). Still, even without these elements,
and even with qualitative model outputs alone, different decision or policy options
can be traded off according to some key dimensions: benefits versus risk, greater
efficiency versus preparedness, liberty versus security. These are some of the key
considerations especially for public policy, given its non-profit nature, for which
hedging against risk is preferable to maximising potential benefits or rewards. At the end of
the day, policies, and the related modelling questions, are ultimately a matter of
values and public choice: modelling can make the options, their price tags and
trade-offs more explicit, but is no replacement for the choices themselves, the
responsibility for which rests with decision makers.
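To make the trade-off logic concrete, the sketch below works through a newsvendor-style Bayesian decision analysis: given draws from a posterior predictive distribution of daily arrivals and asymmetric costs of over- and under-provision, the optimal capacity is the one minimising posterior expected loss. All numbers here (the costs and the synthetic Normal(100, 20) predictive) are hypothetical illustrations, not outputs of the models in this book:

```python
import random

def expected_loss(capacity, scenarios, c_over=1.0, c_under=5.0):
    """Posterior expected loss of providing `capacity` places, when
    unmet need (under-provision) is penalised more heavily than idle
    capacity. The asymmetric costs are illustrative assumptions."""
    total = 0.0
    for demand in scenarios:
        if capacity >= demand:
            total += c_over * (capacity - demand)
        else:
            total += c_under * (demand - capacity)
    return total / len(scenarios)

rng = random.Random(7)
# stand-in for posterior predictive draws of daily arrivals
scenarios = [max(0.0, rng.gauss(100, 20)) for _ in range(5000)]

best = min(range(50, 201), key=lambda c: expected_loss(c, scenarios))
print(best)  # sits above the mean of 100: hedging against shortfall
```

Because the cost of under-provision exceeds that of over-provision, the optimal capacity lands above the predictive mean: the asymmetric loss function formalises exactly the "hedging against risk" stance discussed above.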
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 10
Open Science, Replicability,
and Transparency in Modelling
Toby Prike
Recent years have seen large changes to research practices within psychology and a
variety of other empirical fields in response to the discovery (or rediscovery) of the
pervasiveness and potential impact of questionable research practices, coupled with
well-publicised failures to replicate published findings. In response to this, and as
part of a broader open science movement, a variety of changes to research practice
have started to be implemented, such as publicly sharing data, analysis code, and
study materials, as well as the preregistration of research questions, study designs,
and analysis plans. This chapter outlines the relevance and applicability of these
issues to computational modelling, highlighting the importance of good research
practices for modelling endeavours, as well as the potential of provenance model-
ling standards, such as PROV, to help discover and minimise the extent to which
modelling is impacted by unreliable research findings from other disciplines.
10.1 The Replication Crisis and Questionable Research Practices
Over the past decade many scientific fields, perhaps most notably psychology, have
undergone considerable reflection and change to address serious concerns and
shortcomings in their research practices. This chapter focuses on psychology
because it is the field most closely associated with the replication crisis and there-
fore also the field in which the most research and examination has been conducted
(Nelson et al., 2018; Schimmack, 2020; Shrout & Rodgers, 2018). However, the
issues discussed are not restricted entirely to psychology, with clear evidence that
similar issues can be found in many scientific fields. These include closely related
fields such as experimental economics (Camerer et al., 2016) and the social sciences
more broadly (Camerer et al., 2018), as well as more distant fields such as biomedi-
cal research (Begley & Ioannidis, 2015), computational modelling (Miłkowski
et al., 2018), cancer biology (Nosek & Errington, 2017), microbiome research
(Schloss, 2018), ecology and evolution (Fraser et al., 2018), and even within meth-
odological research (Boulesteix et al., 2020). Indeed, many of the lessons learned
from the crisis within psychology and the subsequent periods of reflection and
reform of methodological and statistical practices apply to a broad range of scien-
tific fields. Therefore, while examining the issues with methodological and statisti-
cal practices in psychology, it may also be useful to consider the extent to which
these practices are prevalent within other research fields with which the modeller is
familiar, as well as the research fields on which the findings of the modelling exercise
either rely, or to which they are applied.
Although there was already a long history of concerns being raised about the
statistical and methodological practices within psychology (Cohen, 1962; Sterling,
1959), a succession of papers in the early 2010s brought these issues to the fore and
raised awareness and concern to a point where the situation could no longer be
ignored. For many within psychology, the impetus that kicked off the replication
crisis was the publication of an article by Bem (2011) entitled “Feeling the future:
Experimental evidence for anomalous retroactive influences on cognition and
affect.” Within this paper, Bem reported nine experiments, with a cumulative sam-
ple size of more than 1000 participants and statistically significant results in eight of
the nine studies, supporting the existence of paranormal phenomena. This placed
researchers in the position of having to believe either that Bem had provided consid-
erable evidence in favour of anomalous phenomena that were inconsistent with the
rest of the prevailing scientific understanding of the universe, or that there were
serious issues and flaws in the psychological research practices used to produce the
findings.
Further issues were highlighted through the publication of two studies on ques-
tionable research practices in psychology, “False-positive psychology: Undisclosed
flexibility in data collection and analysis allows presenting anything as significant”
by Simmons et al. (2011), and “Measuring the prevalence of questionable research
practices with incentives for truth telling”, by John et al. (2012). Using two example
experiments and a series of simulations, Simmons et al. (2011) demonstrated how a
combination of questionable research practices could lead to false-positive rates of
60% or higher, far higher than the 5% maximum false-positive rate implied by the
endorsement of p < 0.05 as the standard threshold for statistical significance.
Specifically, the authors showed that collecting multiple dependent variables, not
specifying the number of participants in advance, controlling for gender or the inter-
action of gender with treatment, or having three conditions but preferentially choos-
ing to report either all three or only two of the conditions, can lead to large increases
in the false-positive rates that become even more extreme when several of these
research practices are combined. To drive home the point further, Simmons et al.
(2011) conducted a real study with 20 undergraduate students and then used the
analytical flexibility available to them and the lax reporting standards for statistical
analyses to report an impossible finding: that they had ‘found’ that listening to the
song “When I’m Sixty-Four” rather than “Kalimba” led to participants being
younger, with the test statistic F(1, 17) = 4.92 implying a ‘significant’ p-value,
p = 0.040.
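The inflation Simmons et al. (2011) describe can be reproduced with a few lines of simulation. The sketch below illustrates just one of the practices – optional stopping, i.e. testing repeatedly as participants accrue – under a true null hypothesis; the sample sizes, number of looks, and the normal approximation to the t-test are illustrative choices, not those of the original paper:

```python
import random
from statistics import NormalDist, mean, stdev

def z_test_p(a, b):
    """Two-sample test p-value using a normal approximation to the
    t distribution (adequate for illustrating the inflation effect)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def one_study(rng, looks=(20, 30, 40), alpha=0.05):
    """Simulate optional stopping under the null: test after each batch
    of participants and stop as soon as p < alpha."""
    a, b = [], []
    for n in looks:
        while len(a) < n:
            a.append(rng.gauss(0, 1))  # both groups drawn from the
            b.append(rng.gauss(0, 1))  # same distribution: H0 is true
        if z_test_p(a, b) < alpha:
            return True  # a false positive
    return False

rng = random.Random(42)
n_sims = 4000
fp_rate = sum(one_study(rng) for _ in range(n_sims)) / n_sims
print(f"False-positive rate with optional stopping: {fp_rate:.3f}")
# nominal alpha is 0.05, but peeking at the data inflates it well above that
```

Combining several such practices (multiple dependent variables, optional covariates, selective reporting of conditions) compounds the inflation further, which is how Simmons et al. (2011) arrived at false-positive rates of 60% or more.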
Closely following the Simmons et al. (2011) paper, John et al. (2012) published
a survey on the research practices of psychologists, finding that the type of practices
Simmons et al. (2011) had shown to be highly problematic were commonplace.
Responses to the full list of questionable research practices included in the survey
varied considerably (see John et al., 2012 for full results for all ten questionable
research practices). Some research practices were considered much less defensible,
such as outright falsification of data (admitted to by 0.6–1.7% of the sample of
researchers, depending on the condition) or making misleading or untrue statements
within the paper such as, “In a paper, claiming that results are unaffected by demo-
graphic variables (e.g., gender) when one is actually unsure (or knows that they
do)”, (admitted to by 3.0–4.5% of the sample, depending on condition). Even more
commonplace was the benefit of hindsight: the statement, “In a paper, reporting an
unexpected finding as having been predicted from the start”, was admitted to by
27.0–35.0% of the sample, again depending on condition (John et al., 2012, passim).
Other research practices examined in the survey were considered more defensi-
ble and were admitted to by a majority of the psychologists surveyed, but can still
contribute to massively increased false positive rates prevalent in the literature. For
example, 55.9–58.0% of the sample admitted to, “Deciding whether to collect more
data after looking to see whether the results were significant”, and 63.4–66.5% of
the sample admitted to, “In a paper, failing to report all of a study’s dependent mea-
sures” (idem). It is also important to note that these are conservative estimates based
on the willingness of individual psychologists to admit that they personally had
engaged in questionable research practices, and therefore the actual prevalence of
questionable research practices is likely far higher. John et al. (2012) also calculated
prevalence estimates based on respondents’ answers to questions about the percent-
age of other psychologists who have engaged in a questionable research practice as
well as the percentage of those other psychologists who have engaged in a question-
able research practice and would admit to having done so, and for nearly all of the
questionable research practices these estimates were considerably higher than the
number who actually made self-admissions within the survey (idem).
The publication of a large-scale replication attempt of 100 psychological find-
ings by the Open Science Collaboration (2015) showed the practical extent of the
problems highlighted by Simmons et al. (2011) and John et al. (2012). Although 97
of the 100 original studies included for replication reported statistically significant
results, only 36 of the replication attempts ended up statistically significant, despite
having statistically well-powered designs (with an average power – the probability of correctly rejecting a false null hypothesis – equal to 0.92), and despite matching the
original studies closely, including using original materials wherever possible. Other
large-scale replication efforts, including the Many Labs projects within psychology
(Ebersole et al., 2016; Klein et al., 2014, 2018), projects in fields such as experi-
mental economics (Camerer et al., 2016), and the social sciences more broadly
(Camerer et al., 2018), as well as more distant fields, such as cancer biology (Nosek
& Errington, 2017), have highlighted that, to varying extents, there are serious
issues with the reliability and replicability of findings published within many scien-
tific areas.
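To give a concrete sense of what ‘statistically well-powered’ means here, the power of a two-sample design can be computed from the noncentral t distribution. This is an illustrative calculation, not the Open Science Collaboration's own; the function name is ours.

```python
import numpy as np
from scipy import stats

def power_two_sample_t(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided, two-sample t-test for standardised effect size d."""
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2)            # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(reject) = P(T > t_crit) + P(T < -t_crit) under the noncentral t
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

# Smallest per-group sample size giving power 0.92 for a 'medium' effect (d = 0.5):
n = 2
while power_two_sample_t(0.5, n) < 0.92:
    n += 1
print(n, round(power_two_sample_t(0.5, n), 3))
```

Reaching power of 0.92 for a medium-sized effect already requires samples of roughly ninety participants per group, which underlines how demanding the replication designs were.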
10.2 Open Science and Improving Research Practices
Once the issues outlined above were clearly highlighted, many scholars within psy-
chology decided that reform was necessary, and serious changes within the field
needed to be made.1 Changes to current practices were recommended at several
levels of the scientific process, including at the level of individual authors, review-
ers, publishers, and funders (Munafò et al., 2017; Nosek et al., 2015; Simmons
et al., 2011). Some of the changes to research practice that have been most com-
monly recommended and widely engaged with by researchers include openly pub-
lishing the data and analysis code online, openly publishing study materials online,
and the preregistration of study methodology and analysis plans (Christensen
et al., 2019).
The change in research practice that has seen the earliest and greatest uptake by
researchers is the public sharing of data and/or analysis code (Christensen et al.,
2019). Making the data and analysis code underlying research claims openly avail-
able has many potential benefits for both science as a whole and for individual
researchers who engage in the practice. Benefits to the scientific process from the
open sharing of data include: allowing other scientists to re-analyse data to help verify the results and check for errors, and providing safeguards against misconduct, such as data fabrication or the exploitation of analytical flexibility, because other scientists can discover, for example, that a result is entirely reliant on a specific covariate. It also allows other researchers to reuse the data for a variety of purposes
(Tenopir et al., 2011). If data are publicly available, then they may be reanalysed to
answer new questions that were not initially examined by the researchers. Without
open data, these reanalyses would not be possible and therefore the scientific knowl-
edge would either not be generated at all, or would require the recollection of the
same, or highly similar data, leading to waste and inefficiency in the use of resources
(usually public funding; Tenopir et al., 2011).
There are also good reasons for individual researchers to publicly post their data
even if they are motivated by their own self-interest. Articles with publicly available
data have an advantage in the number of citations received (Christensen et al., 2019;
Piwowar & Vision, 2013), and willingness to share data are associated with the
strength of evidence and quality of the reporting of statistical results (Wicherts
et al., 2011). However, even though the uptake of the public posting of data and
software code is growing quickly and should be lauded, there are still many prob-
lematic areas, such as incomplete data, missing instructions, and insufficient infor-
mation provided. These issues mean that even when data are publicly shared,
independent researchers may still regularly face considerable hurdles and/or not
actually be able to analytically reproduce the results reported in the paper (Hardwicke
et al., 2018; Obels et al., 2020; Stagge et al., 2019; Wang et al., 2016).
¹ Although it has to be noted that there was also pushback from some scholars – see Schimmack (2020) for further discussion of the responses to the replication crisis.
Another common and rapidly growing area of open science is the public posting
of study materials or instruments and experimental procedures (Christensen et al.,
2019). Like open data and analysis code, this practice has the benefit of increasing
transparency and making it clear to editors, reviewers, and readers of articles, what
exactly was done within the study. This increased transparency allows for easier
assessment of whether there are potential confounds or other flaws in the study
methodology that may have impacted on the conclusions. It also allows for easier
assessment of the appropriateness and validity of the stimuli and materials used.
Openly sharing materials and procedures also has the additional benefits of making
it far easier for other researchers to conduct direct replications of the research (i.e.,
taking the same materials and procedures and collecting new data to independently
verify the results), as well as to conduct follow-up studies that attempt to conceptually replicate, adapt, or expand on some or all aspects of the study without the
need to contact the original authors and/or to expend time and resources reproduc-
ing or creating new study materials and procedures. These practices are in addition
to ensuring the reproducibility of the results, which is here understood as ensuring that the software or computer code applied to a given dataset produces the same set of results as reported in the study. (For a broad terminological discussion of replicability and reproducibility, which are terms that still remain far from being unambiguously defined and used, see e.g. National Academies of Sciences, Engineering, and Medicine, 2019.)

One major change in research practice that has the potential to greatly reduce questionable research practices and improve the quality of science is preregistration: registering the aims, methods and hypotheses of a study with an independent information custodian before data collection takes place (Nosek et al., 2018; Wagenmakers et al., 2012). Although preregistration is still currently less common than openly sharing data, code, and materials, the uptake of the practice is increasing rapidly (Christensen et al., 2019). Preregistration has been referred to as ‘the cure’ for analytical flexibility or ‘p-hacking’: the practice of fine-tuning analyses until the desired, or at least a publishable, result, as measured by the magnitude of p-values, can be obtained (Nelson et al., 2018, p. 519).

When researchers preregister their studies, they need to outline in advance what their research questions and hypotheses are, as well as their plans for analysing the data to answer these questions and verify the hypotheses (Nosek et al., 2018; Wagenmakers et al., 2012). Therefore, if done correctly, preregistration ensures that the analyses conducted are confirmatory, which is a required assumption for null hypothesis significance testing. It also allows both the researchers themselves and other consumers of research products to have much greater confidence that the results can be relied upon, and that the false-positive rate has not been greatly inflated through questionable research practices (Simmons et al., 2011). In this way, preregistration is also useful for the researchers conducting the research, as it helps them to avoid biases and misleading themselves (Nosek et al., 2018). Upon discovering an unexpected but impactful result in the data, or that controlling for a variable or excluding participants based on a specific criterion leads to a statistically significant finding that can be published, it can be easy for hindsight bias and wishful thinking to lead researchers to justify these analytical decisions to both themselves and others, and to believe that they predicted or planned them all along (also known as ‘HARKing’ – “hypothesising after results are known”; Kerr, 1998).
However, preregistration alone is not likely to solve the problems with research
malpractice unless reviewers, editors, publishers, and readers ensure that research-
ers actually follow their preregistered hypotheses and analysis plans. Registration of
clinical trials has been commonplace for some time now, yet published trials still
regularly diverge from the prespecified registrations, with publications switching
and/or not reporting the primary outcomes listed in trial registries (Goldacre et al.,
2019; Jones et al., 2015), and journals showing resistance to attempts to highlight or
correct issues when informed of discrepancies between the trial registries and the
articles they had published (Goldacre et al., 2019). Going even further than prereg-
istration, a growing number of journals now offer a registered report format in
which studies are reviewed based on the underlying research question(s), study
design, and analysis plan, and can then be given in-principle acceptance, meaning that the study will be published regardless of the results, provided the authors adhere to the pre-agreed protocols (Chambers, 2013, 2019; Nosek & Lakens, 2014; Simons
et al., 2014).
In addition to the changes in research practice outlined above, there has also been
considerable discussion about the use of statistics within psychology and other sci-
entific fields, including a special issue of The American Statistician entitled
“Statistical Inference in the 21st Century: A World Beyond p < 0.05”. Within the
special issue, and in various other articles, books, and publications, the contributors
have criticised the use of p-values, and particularly the p < 0.05 cut-off convention-
ally used to determine ‘statistical significance’, as well as the phrase ‘statistically
significant’ itself. Indeed, the editors of The American Statistician recommended
that the phrase ‘statistically significant’ no longer be used (Wasserstein et al., 2019).
There is still much disagreement about what new statistical practices should be
adopted or how researchers should move forward, with a variety of potential solu-
tions proposed. For example, some have recommended that the p < 0.05 threshold
be redefined to p < 0.005 instead (Benjamin et al., 2018), whereas others have advo-
cated for a shift away from null hypothesis significance testing towards Bayesian
analyses and inference (Wagenmakers et al., 2018). At the same time, some other
authors, notably Gigerenzer and Marewski (2015), have warned about the idolisation of simple Bayesian measures, such as Bayes factors: just as happened with p-values, indolent statistical reporting can occur under the Bayesian paradigm as much as under the frequentist one. Although there is still some disagreement about the possible future directions for statistical analysis and inference, the
general guidance provided by the editors of The American Statistician – “Accept
uncertainty. Be thoughtful, open, and modest.” (Wasserstein et al., 2019, p. 2) – pro-
vides a direction for future empirical enquiries.
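Gigerenzer and Marewski's caution is easy to demonstrate. Even a textbook Bayes factor for a normal mean (a minimal sketch with simulated data; the standard deviation σ is assumed known for tractability) changes with the scale τ of the prior under the alternative, so reporting a single number without its prior is as mechanical as reporting a bare p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.2, 1.0, size=50)        # simulated data with a small true effect
n, xbar, sigma = len(x), x.mean(), 1.0

# Marginal density of the sample mean under each hypothesis:
#   H0: mu = 0            -> xbar ~ N(0, sigma^2 / n)
#   H1: mu ~ N(0, tau^2)  -> xbar ~ N(0, tau^2 + sigma^2 / n)
def bf01(tau: float) -> float:
    """Bayes factor in favour of H0, given the prior scale tau under H1."""
    m0 = stats.norm.pdf(xbar, 0.0, np.sqrt(sigma**2 / n))
    m1 = stats.norm.pdf(xbar, 0.0, np.sqrt(tau**2 + sigma**2 / n))
    return m0 / m1

for tau in (0.2, 1.0, 5.0):
    print(f"tau = {tau}: BF01 = {bf01(tau):.2f}")
```

The strength, and potentially the direction, of the reported evidence shifts with the choice of τ, which is why thoughtful reporting of the whole analysis, rather than a mechanical cut-off, is needed under either paradigm.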
10.3 Implications for Modellers
The above discussion has outlined a series of issues that have occurred within psy-
chology and a variety of other experimental and empirical domains of science, as
well as some of the solutions that are already being implemented and potential
future directions for further improvements in methodology and statistics. The fol-
lowing section relates these considerations back to the specific domains of compu-
tational modelling and simulation, highlighting the relevance of the lessons learned
for researchers and practitioners within these domains. There is documented evi-
dence of similar issues occurring within computational modelling, and issues within
empirical fields can also impact computational modelling because of the intercon-
nectedness of scientific disciplines.
Many of the issues highlighted above are also relevant for computational model-
ling, and even in circumstances where a concern is not directly applicable to model-
ling challenges, there are some analogous concerns (Miłkowski et al., 2018; Stodden
et al., 2013). As with the practice of sharing data, analysis code, study materials, and
study procedures for empirical studies, clearly and transparently documenting mod-
els is vital for other researchers to be able to verify and expand upon existing work.
Chapter 7 of this book highlights several existing methods that modellers can use to
document or describe simulation models, such as the ODD protocol (Overview,
Design concepts, Details; Grimm et al., 2006), or provenance standards, such as
PROV (Groth & Moreau, 2013).
Similar to the sharing of data and analysis code, there are often serious issues
with attempting to computationally reproduce existing models and simulations even
if code is provided. This can happen because of a range of factors, such as the exclusion of important information from publications and the failure to properly document model and/or simulation code (Miłkowski et al., 2018). As with sharing data and
analysis code for empirical work, transparently sharing documentation and descrip-
tions of computational models has the advantage of allowing other researchers to
test and verify the extent to which outputs are dependent on specific modelling
choices made in the modelling process, how sensitive the model is to changes in
various inputs (see Chap. 5 for more details on sensitivity analysis), and/or the
extent to which the results change (or remain consistent) when the model uses dif-
ferent data or is applied in a different context (e.g., if a model of asylum migration
from Syria is applied to asylum migration from Afghanistan).
Computational modelling often requires far more decisions regarding design,
formalisation, and implementation than standard experimental or empirical work,
and in some cases is more exploratory in nature. Therefore, preregistration does not
seem like a readily applicable or appropriate format to be transferred to all aspects
of computational modelling, although it is certainly still applicable to at least some
aspects (e.g., if models are to be compared, it is useful to preregister the models that
will be compared as well as how the comparison will be conducted; see Lee et al.,
2019 for more information). Nonetheless, there are several strategies that can be
used to try and reduce the extent to which modellers have the flexibility to tinker
with their models to find the specific settings that produce the desired (publishable)
results.
One option here is for modellers to develop and rely on prespecified architectures
within their models, such as the BEN (Behavior with Emotions and Norms) archi-
tecture, which provides modules that can add aspects such as emotions, personality,
and social relationships to agent-based models (Bourgais et al., 2020). Alternatively,
independent researchers can recreate a model without referring to or relying on the
original model code, which can help to test the extent to which outputs are depen-
dent on modelling choices for which there are a variety of plausible and defensible
alternative options (see Silberzahn et al., 2018 for an analogous example with sta-
tistical analyses). Reinhardt et al. (2019) have provided a detailed discussion of the
processes and lessons learned from implementing the same model in two different
modelling languages, one a general-purpose language using discrete-time and the
other a domain-specific modelling language using continuous time.
In addition to the open science and methodological concerns within computa-
tional modelling, related research practices within psychology and other empirical
fields can also have considerable impact on modelling practice because of the inter-
play between scientific disciplines and how computational models may rely on or be
informed by findings from empirical work. Therefore, the tendency for many empir-
ical fields to simply rely on finding ‘statistically significant’ effects rather than
attempt to accurately estimate effect sizes or test them for robustness limits the
extent to which these findings can be usefully and easily applied to computational
models. Additionally, if a computational model is informed by, or relies on, empiri-
cal findings to justify mechanisms and processes within the model (e.g., the deci-
sion making of agents within an agent-based model), then if those findings are
unreliable and/or based on questionable research practices, this may effectively
undermine the whole model.
These limitations once again highlight the advantage of provenance modelling
standards, such as PROV (Groth & Moreau, 2013; Ruscheinski & Uhrmacher,
2017), as a format for documenting and describing models. PROV allows informa-
tion to be stored in a structured format that can be queried, thereby allowing it to be
easily seen which entities a model relies on (see Chap. 7). Therefore, if new research
highlights issues within the existing literature (e.g., a failed replication within psy-
chology), or new discoveries are made, it is a relatively simple and straightforward
task to search PROV information, and discover which models have incorporated
this information as an entity, and therefore may have at least some aspects of the
model that need to be reconsidered or updated.
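The kind of query this enables can be sketched with a toy provenance store (all names below are hypothetical; a real workflow would query PROV-JSON or PROV-O documents with a proper query language rather than a Python dictionary):

```python
# Toy provenance records: which entities (e.g. empirical studies, datasets)
# each model version 'used', in the spirit of the PROV 'used' relation.
provenance = {
    "migration_model_v2": {"used": ["survey_2019", "study_smith_2015"]},
    "migration_model_v3": {"used": ["survey_2019", "study_jones_2020"]},
}

def models_using(entity: str, store: dict) -> list[str]:
    """Return every model whose recorded provenance includes the entity."""
    return sorted(m for m, record in store.items() if entity in record["used"])

# If 'study_smith_2015' later fails to replicate, the affected models can be
# identified mechanically rather than by re-reading every paper:
print(models_using("study_smith_2015", provenance))
print(models_using("survey_2019", provenance))
```

The value of the structured format is precisely that such dependency questions become one-line queries rather than literature reviews.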
This strategy could also be combined with sensitivity analysis (see Chap. 5) to
establish the extent to which the model outputs are sensitive to aspects that rely on
the entity now called into question, and therefore whether it is necessary to update
the model in light of the new information. Additionally, PROV has the potential to
contribute to the empirical literature by highlighting specific entities (e.g., research
studies) that are commonly featured within models. Such studies may therefore
become a high priority for large-scale replication efforts, not only to ensure the reli-
ability and robustness of the findings, but also to identify potential moderators
(mediating and confounding variables) and boundary conditions.
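As a toy illustration of how such a sensitivity check might look (all names and numbers here are hypothetical; Chap. 5 describes the formal machinery), one can vary the questioned input over a plausible range and inspect the spread of the model outputs:

```python
import numpy as np

def model_output(effect_size: float) -> float:
    """Stand-in for a simulation run that uses an empirically derived
    effect size (e.g. from a study that later failed to replicate)."""
    # A deliberately simple, deterministic response surface.
    return 100 * (1 + 0.8 * effect_size)

# One-at-a-time sensitivity: vary the questioned effect size over the range
# spanned by the original estimate and a later replication estimate.
original, replication = 0.45, 0.10
outputs = [model_output(e) for e in np.linspace(replication, original, 5)]
spread = max(outputs) - min(outputs)
print(f"Output range attributable to the questioned entity: {spread:.1f}")
# A small spread relative to other uncertainties suggests the model need not
# be updated; a large one means the questioned input materially drives results.
```

Such a check turns the qualitative question ‘does this failed replication matter for my model?’ into a quantitative one.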
The choice of specific tools and solutions notwithstanding, one lesson for mod-
ellers that can be learned from the replicability crisis is clear: transparency and
proper documentation of the different stages of the modelling process are vital for
generating trust in the modelling endeavours and in the results that the models gen-
erate. For the results to be scientifically valid, they need to be reproducible and
replicable in the broadest possible sense – and documenting the provenance of mod-
els is a necessary step in the right direction.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 11
Conclusions: Towards a Bayesian
Modelling Process
11.1 Bayesian Model-Based Population Studies: Moving the Boundaries
Given the current state of knowledge, what are the perspectives for computational
migration and population modelling? The two intertwined challenges, those of
uncertainty and complexity, can be broken down into a range of specific knowledge
gaps, dependent on the context and research questions being addressed. The explan-
atory power of simulation models (for a general discussion, see Franck, 2002 and
Courgeau et al., 2016), well suited for tackling the complexity of social processes,
such as migration, can be coupled with the statistical analysis aimed at the quantifi-
cation of uncertainty. Throughout this book, we have argued for the use of model-
ling and its encompassing statistical analysis as elements of a language for describing
and formalising relationships between elements of complex systems. We discuss
some of the specific points and lessons next.
The main high-level argument put forward in this book is that model building
is – or needs to be – a continuing process, which aims to reduce the complexity of
social reality. The formal sensitivity analysis helps retain focus on the important
aspects, while disregarding those whose impact is only marginal. All the constitut-
ing building blocks of this process are therefore important: starting from the com-
putational model itself, and its implementation in a suitable programming language,
through empirical data, information on human decision making – which, as in our
case, can come from experiments – and the statistical analysis of each model ver-
sion. All of these elements contribute to our greater ability to understand the model
workings, while retaining realism about the degree to which the model remains a
faithful description of the reality it aims to represent. The formalisation of model
analysis also allows us to explore the model behaviour and outcomes in a rigorous
way, while being transparent about the assumptions made. In this way, we can illu-
minate the micro-level mechanisms (micro-foundations) that generate the popula-
tion-level processes we observe at the macro scale, while formally acknowledging
the different sources of their uncertainty.
Of course, when it comes to representing reality, any model resembles the actual processes more closely only under specific conditions. Adding more detail and data helps approximate the reality, but this comes at the cost of increased uncertainty. In doing so, the models also run the risk of losing generality, and their nature becomes more descriptive than predictive or explanatory. At the same time, as shown in Chap. 9, there are trade-offs involved in the
different purposes of modelling, too: better predictive capabilities of a model can
lead to a loss of explanatory power of the underlying mechanisms, if it is dominated
by the information used for model calibration.
In such cases, additional effort is required in terms of data collection and assess-
ment, to make sure that the model-based description of an idiosyncratic social pro-
cess is as accurate as possible. The successive model iterations may then not be
strictly embedded within one another, so that the ‘ascent’ of knowledge, which
would be ideally seen in the classical inductive approach, is not necessarily mono-
tonic (Courgeau et al., 2016). Still, even in such cases, the more detailed models can
offer more accurate approximations of the reality. Formal description of the model-
building process, for example by using provenance modelling tools discussed in
Chap. 7, can help shed light on that, while keeping track of the developments in the
individual building blocks in the successive model versions.
At the same time, such models can retain some ability to generalise their out-
comes, although at the price of increased uncertainty. To that end, models can still
make some theoretical contributions (Burch, 2018), especially if ‘theory’ is not
interpreted in a strict nomological way, as a set of well-established propositions
from which the predictions can be simply deduced (Hempel, 1962). Instead, the
models can answer well-posed explanatory questions (‘how?’) in a credible man-
ner – offering increasingly plausible descriptions of the underlying social mecha-
nisms, as long as their construction follows several iterations of the outlined process,
checking the model-based predictions against the observed reality. At the same
time, some residual (aleatory) uncertainty always remains, especially in the model-
ling of social processes, and addressing it requires going beyond models alone.
In the light of the above findings, the modelling processes can also be given
novel interpretations. Social phenomena, such as migration, are very complicated
and complex inverse problems, which in the absence of an omniscient Laplace’s
demon – a hypothetical being with complete knowledge of the world, devoid of epistemic uncertainty – do not have unique solutions (see Frigg et al., 2014). The
scientific challenges of model identifiability are therefore akin to the studies of non-
response or missing information, but this time carried out on a space of several pos-
sible (and plausible) models. Model choice becomes yet another source of the
uncertainty of the description of the process under study, alongside the data, param-
eters, expert input, and so on. Still, the iterative model construction process advo-
cated throughout this book enables building models of increasing analytical and
explanatory potential, which at the same time remain computationally tractable.
This is yet another argument for turning to the philosophy of Bayesian statistical
inference: the initial model specification is but a prior in the space of all possible
models, and the modelling process by which we can arrive at the increasingly accu-
rate approximations of reality is akin to Bayesian model selection. Of course, there
is an obvious limitation here of being restricted to a class of models pre-defined by
the modellers’ choices and, ultimately, their imagination (see also the discussion of
inductive and abductive reasoning in Chap. 2). The inductive process of iterative
learning about the dynamics of complex phenomena, besides being potentially
Bayesian itself, can also include several other Bayesian elements, describing the
uncertainty of different constituting parts, such as individual decisions of agents in
the model (and updating of knowledge), model estimation and calibration, and
meta-modelling.
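The idea of treating the initial specification as a prior over a space of models can be made concrete with a minimal conjugate example (the counts and priors below are made up for illustration): for Poisson data with Gamma priors the marginal likelihood is available in closed form, so posterior model probabilities follow directly from Bayes' theorem.

```python
import numpy as np
from scipy.special import gammaln

counts = np.array([12, 9, 15, 11, 8])    # hypothetical yearly event counts

def log_marginal(y: np.ndarray, a: float, b: float) -> float:
    """Log marginal likelihood of y_i ~ Poisson(lam), lam ~ Gamma(a, rate=b)."""
    n, s = len(y), y.sum()
    return (gammaln(a + s) - gammaln(a) + a * np.log(b)
            - (a + s) * np.log(b + n) - gammaln(y + 1).sum())

# Two candidate 'models' differing only in their prior beliefs about the rate:
log_m = np.array([log_marginal(counts, 2.0, 0.2),    # diffuse prior, mean 10
                  log_marginal(counts, 40.0, 1.0)])  # confident prior, mean 40
prior = np.array([0.5, 0.5])                         # equal prior model weights
post = np.exp(log_m - log_m.max()) * prior
post /= post.sum()                                   # posterior model probabilities
print(post)
```

Here the data (mean ≈ 11) overwhelmingly favour the first model. In realistic settings the marginal likelihoods are intractable and must themselves be approximated, which is where the computational machinery discussed earlier in the book comes in.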
The status quo in demography and population studies, on which this work builds,
can be broadly described as the domination of empiricism at the expense of more
theoretical enquiries (Xie, 2000), with an increasing recognition that some areas of
theoretical void can be filled by formal models (see Burch, 2003, 2018). At the same
time, recent years have seen promising advances in the demographic and social sci-
ence methodology. The modelling approaches of statistical demography, including
Bayesian ones, hardly existent until the second half of the twentieth century, are
now a well-established part of mainstream population sciences (Courgeau, 2012;
Bijak & Bryant, 2016), while agent-based and other computational approaches,
despite recent advances (Billari & Prskawetz, 2003; van Bavel & Grow, 2016),
remain somewhat of a novelty. So far, as discussed in Daniel Courgeau’s Foreword,
these two modelling approaches have remained hardly connected, and connecting
them was one of the main motivations behind undertaking the work presented in
this book.
Against this background, our achievements can be seen both at the level of the
individual constituent parts of the modelling process, presented in Chaps. 3, 4, 5, 6,
and 7, as well as – if still tentatively – the way in which they can coherently work
together. To that end, advances made at the level of process development and docu-
mentation, together with their philosophical underpinnings, offer a blueprint for
constructing empirically relevant computational models for studying population
(and, more broadly, social) research questions. The opening up of population and
other social sciences for new approaches and insights from other disciplines can be
an important step towards moving the boundaries of analytical possibilities for
studying the complex and the uncertain social world. However, despite all the
advances, some important obstacles on this journey remain, which we discuss next.
11.2 Limitations and Lessons Learned: Barriers and Trade-Offs
From the discussion so far, key challenges for advancing the Bayesian model-based
agenda for population and broader social sciences are already clear. The main one
relates to putting the different building blocks together in a unified, interdisciplinary
modelling workflow. Interdisciplinarity itself is of lesser concern: most disciplines in the social sciences are familiar and comfortable with the high-level notion of modelling as an approximation of reality, so all that is needed for successful bridging of disciplinary barriers is a willingness to engage with other perspectives, open communication, and definitions of concepts and ideas that are clear enough to be understood across disciplines.
A much greater challenge lies in the fusion of different building blocks at an
operational level: how to include experimental results in the simulation model?
How to operationalise data and model uncertainty? How to implement the model in
a way that balances computational efficiency with the transparency of code? These
are just a few examples of questions that need answering for this approach to reach
its full potential. Some ideas for dealing with these challenges have been proposed throughout this book, but they are just the tip of the iceberg. To
develop some of these ideas further, and to come up with robust practical recom-
mendations, a higher-level reflection is needed. Such a synthetic view and advice
could be offered, for example, from the point of view of philosophy of science, sci-
ence and technology studies, or similar meta-disciplines.
Another key challenge relates to the empirical information being too sparse and
not exactly well tailored, either for the model requirements, or for answering indi-
vidual research questions. What is contained in the publicly available datasets is
often, at least to some extent, different to what is needed for modelling purposes.
This leads to important problems at several levels. First, the models can be only
partially identified through data, with many data gaps and free parameters com-
pounding the output uncertainty. Second, the quality of the existing data may be
low, with their uncertainty assessment contributing additional errors into the model.
Third, the use of proxies for variables that conceptually may be somewhat different
(e.g. GDP per capita instead of income, or Euclidean distance between capital cities
of origin and destination countries instead of the distance travelled), can introduce
additional biases and uncertainty, not all aspects of which may be readily visible
even after a thorough quality assessment (see Chap. 4). The operationalisation prob-
lem is particularly acute for such variables and concepts as, for example, trust, risk-
aversion, or many other psychological traits, for which no standard measures exist.
At the same time, as shown in Chaps. 5 and 8, modelling coupled with a formal
sensitivity analysis can provide a way of identifying the data and knowledge gaps,
and consequently of filling them with information collected through dedicated
means. From the point of view of individual research questions, this can be resource-intensive, sometimes prohibitively so, as it requires additional time, labour and money for collecting new data. Yet when such data can be generated and deposited in an open-access repository, these activities offer positive externalities for the broader research community, with applications of the collected data going beyond a particular piece of research (see Chap. 10). The same holds for tailor-made
experiments, for which an additional aspect of the sensitivity analysis involves veri-
fying the impact of psychologically plausible decision rules and mechanisms against
the default placeholder assumptions, such as rational choice and maximum utility
(Chap. 6).
The interpretation of models as tools for broadening the understanding of the processes at hand – by illuminating information gaps, feedbacks, unintended consequences, and other aspects of individual-level human decisions and their impact on observed macroscopic, population-level patterns – is one of the many non-predictive applications of formal modelling (Epstein, 2008). Indeed, as in the examples presented in this book, purely predictive uses of models are of secondary importance. Complex social and population processes involve so much uncertainty that not only is a proper description of its full extent difficult, but any formal decision analysis based on such predictive models would be severely limited, and may be hardly possible at all.
In the case of complex social processes, even once everything that is potentially
known or knowable has been accounted for, and the corresponding epistemic uncer-
tainty, related to imperfect knowledge, has been reduced, the residual uncertainty
remains large. Even the most carefully designed and calibrated models still reflect
the underlying messy and complex social reality, which is characterised by rela-
tively large and irreducible aleatory uncertainty, related to the intrinsic randomness
of the social world. For such applications, the focus of the analysis shifts from exact
prediction and the resulting well-defined cost-benefit decision analysis, to aiding
the broader preparedness and planning. In this way, the models can play an impor-
tant role in testing the impact of different scenarios and assumptions, including
qualitative ones, in a logically coherent simulated environment (Chap. 9).
The main lessons learned from the model-based endeavours, however, are about
trade-offs. Of course, such trade-offs also exist at the level of the model analysis,
with changes in some variables having non-trivial impact on others through non-
linear relationships and feedback loops. Still, from the methodological point of
view, even more important may be the process-level trade-offs, such as the one between increasing the level of detail in describing the social phenomena (topology of the world, decision processes, agents’ memory and learning, and so on) and the computational constraints, including run times and memory use.
Every building block of the modelling process includes trade-offs as well. For
data, the choice may be between their bias and variance; for experiments, between
different levels of cognitive plausibility and less realistic default assumptions; for
implementation, between general-purpose and domain-specific languages; for the
analysis, between descriptive and more sophisticated analytical tools; and for docu-
mentation, between description and formalisation. As in real life, modelling leaves
plenty of room for choice, but the model-based process we suggest in this book is
designed to help make these choices and their consequences transparent and explicit.
11.3 Towards Model-Based Social Enquiries: The Way Forward
So, in summary, what can formal models and the lessons learned from following an
interdisciplinary modelling process potentially offer population and other social sci-
entists? The specific findings and more general reflections reported throughout this
book point to important insights that can be generated by modelling, not necessarily
limited to the specific research question or questions, but also leading to chance
discoveries of some related process features, which can in turn produce new insights
or lines of enquiry. In this way, modelling increases not only our understanding of
the pre-defined features of the processes, but also the more general characteristics
of the process dynamics. This is especially important for such complex and uncer-
tain phenomena as migration flows. At the same time, it is also important to reflect
on the practical limitations of furthering the model-based agenda, and health warn-
ings related to the interpretation of the model results.
The key lessons from the work we describe throughout this book are threefold.
First, modelling of a complex social phenomenon itself is a process, not a one-off
endeavour. The process is iterative, and its aim is an ever-better sequence of approx-
imations of the problem at hand, in line with the inductive philosophical principles
of the scientific method, possibly coupled, where needed, with the pragmatic tenets
of abductive reasoning (see Chap. 2). Second, the many facets of the modelling process – as well as of the process being modelled, especially in the social realm – require true interdisciplinarity and interconnectedness between the different perspectives, rather than working in individual, discipline-specific silos.
Third, the formal acknowledgement of uncertainty – in the data, parameters, and
models themselves – needs to be central to the modelling efforts. Given the complex
and highly structured nature of social problems, Bayesian methods provide an
appropriate formal language for describing this uncertainty in different guises.
These principles, coupled with a thorough and meticulous documentation of the
work, both for legacy purposes and possible replication (see Chap. 10), are the main
scientific guidelines for model development and implementation.
At the same time, the impact of models is not limited to the scientific arena. To
make the most of the modelling endeavours targeted at practical applications, as
argued in Chap. 9, the involvement of the users and other relevant audiences in the
modelling process needs amplifying. This in turn requires greater modelling literacy
on the part of the model users, next to statistical literacy (Sharma, 2017). The onus of ensuring greater literacy is on modellers, though: the communication of model
workings and limitations needs to be specific and trustworthy, and provided at the
right level of technical detail for the audience to understand. The levels of trust can
be, of course, heightened by following established conventions in modelling (see
Chap. 3): carrying out a thorough assessment of the available data (Chap. 4) and a
multi-dimensional assessment of uncertainty (Chap. 5); following established ethi-
cal principles in gathering information that requires it (Chap. 6); and providing
meticulous documentation of the process, for example through ODD and
provenance description (Chap. 7). In short, the keys to good communication and
effective user involvement are transparency, rigour, and awareness of the limitations
of modelling. At the same time, the very purpose of model-building, and any practi-
cal uses of the models, are also related to societal values and can have ethical dimen-
sions, which needs to be borne in mind.
There are other practical obstacles related to interdisciplinary modelling. Large
and properly multi-perspective modelling endeavours are themselves complex,
time-consuming and costly, having to rely on interdisciplinary teams. For commu-
nication within teams, a common language needs to be established, ensuring that
the joint efforts are targeting shared problems. Even within the best-functioning
teams, however, scientific challenges at the connecting points between the disci-
plines are inevitable (see Chap. 8). At the same time, overcoming them takes time
and patience. Some interesting discoveries reported in this book were a result of our
evolution in thinking about the modelling process and its components over the
course of a five-year project. That there are few existing examples of such modelling projects and endeavours is exactly why this work is both so needed and so difficult. This is also why large-scale scientific investments, offering funding beyond disciplinary silos and explicitly recognising modelling as a cross-cutting activity, are of crucial importance. They provide the necessary structures to help scientists from different areas connect by making them learn – and speak – the same language: the language of formal models.
Of course, modelling cannot solve all problems faced by population sciences,
migration studies, or social enquiries more generally. As argued above, the aleatory
uncertainty, some of which is related to human behaviour and agency, remains irre-
ducible: this is in fact a welcome sign of the power of human spirit, free will and
imagination. Still, formal models can help us get answers to questions that are more
complex and sophisticated – and hopefully also more interesting and relevant – than
those allowed by the more traditional social science tools. This is the beginning of
a longer journey into the world of modelling, and despite the price that has to be
paid for engaging in such activities, this is definitely worth doing, for the sake of
exploring new intellectual horizons, designing more robust solutions to practical
and policy problems, and ultimately making the social world a bit less uncertain.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Appendices: Supporting Information
Martin Hinsch
The aim of the model is to investigate the formation of migration routes and how
they are affected by the availability and exchange of information. In our model
agents attempt to traverse a landscape unknown to them, having to rely on either local exploration or communication with other agents to find the best path across. The following gives a general overview of the model. For a more detailed description, as well as the source code, we refer the reader to Hinsch and Bijak (2019); links to the online repository with model code and documentation are available at: www.baps-project.eu.
Entities
Entities directly represented in the model are agents, settlements, and trans-
port links.
Agents
The agents represent migrants undertaking a journey from the origin to the destina-
tion. At any time, agents are either present at a settlement or a transport link or they
have arrived at the destination.
Contacts
Each agent has a list of other agents that it is in contact with (representing their
social network), and can exchange information with (see information below).
Knowledge
Each agent has a potentially incomplete and inaccurate set of knowledge items con-
cerning the world. Each item describes the properties and topology of a settlement
or a transport link.
Settlements
Settlements are located at a specific position on the map and differ in quality and
resources. Settlements are connected to each other by random transport links (see Setup below).
Transport Links
Links always connect two settlements. The only property of links is friction, which
subsumes length and difficulty of travel.
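The entities above can be sketched as simple data structures. The following is a minimal Python sketch, with names and fields chosen for illustration rather than taken from the published model code:

```python
from dataclasses import dataclass, field

@dataclass
class Settlement:
    x: float
    y: float
    quality: float
    resources: float
    links: list = field(default_factory=list)   # indices of connected links

@dataclass
class Link:
    a: int            # index of one endpoint settlement
    b: int            # index of the other endpoint settlement
    friction: float   # subsumes length and difficulty of travel

@dataclass
class Belief:
    estimate: float   # the agent's idea of a property's numerical value
    certainty: float  # confidence that the estimate is correct, in [0, 1]

@dataclass
class Agent:
    location: int                                   # settlement or link occupied
    contacts: list = field(default_factory=list)    # other agents (social network)
    knowledge: dict = field(default_factory=dict)   # entity id -> Belief
    arrived: bool = False
```

The knowledge dictionary implements the "potentially incomplete and inaccurate set of knowledge items": entities the agent has never heard of simply have no entry.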
Interactions
The only entities to change state over the course of the simulation are the agents.
They do that by interacting with cities, links and other agents. Agents can exchange
information with agents either in their contact list or present at the same location as
them. They can travel along transport links and collect information on their current
and neighbouring locations. For more details see Section A2 on model-specific pro-
cesses below.
Information
Information and how agents use and exchange it is a crucial part of the model. Each
item of knowledge an agent has – for example, the quality of a specific settlement –
is described by an estimate and a level of certainty. That is, an agent has an idea of
the numerical value of a given property and how certain it is that the value is correct.
For a given agent, these numbers change either when the agent explores its envi-
ronment or when it exchanges information with other agents. When collecting
information from the environment, the estimate becomes more accurate while the
certainty increases. Information exchange is a bit more complicated. Generally
speaking, the more certain an agent is (i.e. the higher its certainty value) the stronger
the effect on the other agent’s estimate. At the same time, agents with similar beliefs
(i.e. similar values for estimate) will reinforce each other and their certainty will
increase, while for very dissimilar beliefs certainty can decrease.
Travel
Agents start out at entry settlements (origin locations) at one edge of the map and
attempt to reach exit settlements (destination locations) at the other edge.
Agents decide if and where to go purely based on the subjective information they
have available. If an agent does not have enough information to find a route to an
exit, it will attempt to improve its local position (if possible) by travelling to an
adjacent city that is ‘better’ than the current one, where quality is determined by the
city properties (quality and resources), the travel distance or effort (i.e. friction) and
the city’s proximity to the exit edge of the map.
If an agent knows enough to find a complete route, it will attempt to travel the
route with the lowest costs, where costs are again a function of city properties and
travel effort.
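Route choice over subjective knowledge amounts to a shortest-path search restricted to what the agent knows. Below is a Dijkstra-style sketch, in which the `neighbours` and `cost` callables stand in for the model's actual knowledge structures and cost function:

```python
import heapq

def best_route(start, exits, neighbours, cost):
    """Find the cheapest *known* path from start to any exit settlement.

    neighbours(city) yields (next_city, link) pairs the agent knows about;
    cost(city, link) combines travel effort (friction) and city properties.
    Returns the path as a list of cities, or None if the agent's knowledge
    is insufficient to reach an exit.
    """
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        total, city, path = heapq.heappop(queue)
        if city in exits:
            return path
        if city in seen:
            continue
        seen.add(city)
        for nxt, link in neighbours(city):
            if nxt not in seen:
                heapq.heappush(queue, (total + cost(nxt, link), nxt, path + [nxt]))
    return None
```

Returning None corresponds to the first case above: the agent lacks a complete route and falls back on local improvement instead.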
Setup
Before the start of the simulation a map of settlements and links is generated and
their property values assigned. To generate the topology we use a random geometric
graph: all cities are placed at random locations, then cities that are closer than a
given threshold are connected with a transport link. In addition, we place a fixed
number of entry and exit settlements at the respective edge of the map and connect
them with the nearest ‘regular’ settlements.
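The random-geometric-graph step can be sketched as follows; the number of cities and the connection threshold are arbitrary illustration values, and the placement of entry and exit settlements is omitted:

```python
import math
import random

def make_map(n_cities=30, threshold=0.25, seed=1):
    """Generate a random geometric graph on the unit square.

    Cities are placed at random positions; any pair closer than `threshold`
    is connected by a transport link whose friction is here taken to be
    proportional to distance (an illustrative choice).
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n_cities)]
    links = []
    for i in range(n_cities):
        for j in range(i + 1, n_cities):
            d = math.dist(pos[i], pos[j])
            if d < threshold:
                links.append((i, j, d))
    return pos, links
```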
At the beginning of the simulation, no agents are present. Newly added agents (see Processes below) start out at entry cities with, depending on the scenario, either no or only rudimentary knowledge of the world, and with some randomly selected contacts to other agents pre-assigned.
A2. Processes
Contacts
Agents that are not travelling can add other agents that are present at the same loca-
tion to their list of contacts. The rate of gaining contacts depends on the number of
agents present at the location.
Leaving
Agents that are not travelling can leave. This means they change their location to a
transport link and thus become travelling agents. The rate at which agents leave is
constant.
Arriving
Agents that are travelling can arrive at the next location (and thus become non-
travelling agents). If they arrive at an exit they immediately become inactive (they
can still communicate information to their contacts, however). Arrival rates depend
on a link’s friction.
Communication
At any time, agents can exchange information with one of their contacts. The rate at
which this happens depends on the number of contacts an agent has.
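Because the processes above are defined through rates rather than fixed time steps, they suggest a continuous-time simulation. Here is a minimal Gillespie-style scheduler sketch, with constant placeholder rates; in the model itself the rates depend on state (number of contacts, number of co-located agents, link friction, and so on):

```python
import random

def gillespie(rates, t_end, seed=1):
    """Draw a sequence of (time, event) pairs from competing exponential clocks.

    `rates` maps an event name (e.g. 'leave', 'communicate') to its current
    rate. Waiting times are exponential with the total rate; the event type
    is then chosen proportionally to its individual rate.
    """
    rng = random.Random(seed)
    t, history = 0.0, []
    total = sum(rates.values())
    while total > 0:
        t += rng.expovariate(total)          # waiting time to the next event
        if t >= t_end:
            break
        u, acc = rng.random() * total, 0.0
        for event, rate in rates.items():    # pick an event proportionally to its rate
            acc += rate
            if u <= acc:
                history.append((t, event))
                break
    return history

events = gillespie({"leave": 0.1, "communicate": 0.5}, t_end=100.0)
```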
A3. Illustration
Fig. A.1 Realised (top) and hypothetical optimal (bottom) migration routes with migrants travel-
ling left to right. Circles represent cities, transport links are shown as lines. Links without any
traffic are drawn dashed, and links with traffic are solid. Line thickness represents the total traffic over the entire run of the simulation. Source: own elaboration
Source type: RCT survey. Characteristics: quantitative/qualitative; process/context; macro-level/micro-level. Time detail: Oct–Nov 2018. Geography: Senegal.
Content description: a one-off impact evaluation study, employing a survey-based randomised control trial (RCT) amongst the participants of IOM information and intervention programmes, aiming to assess the efficiency of peer-to-peer information campaigns about the realities of migration amongst prospective migrants from Senegal.
Link: https://publications.iom.int/books/migrants-messengers-impact-peer-peer-communication-potential-migrants-senegal-impact.
Access information: individual-level data are not publicly available, but the report and the accompanying technical annex contain aggregate results tables.
Table C.1 Selected software packages for experimental design, model analysis, and uncertainty
quantification
Software – Description – URL
R packages – R packages related to uncertainty quantification – https://cran.r-project.org/
lhs – Package for creating Latin hypercube samples – …/package=lhs/
AlgDesign – Package for creating different (algorithmic) experimental designs, including factorial ones – …/package=AlgDesign/
DiceKriging – Package for estimating and analysing computer experiments with non-Bayesian kriging models – …/package=DiceKriging/
rsm – Package for generating response surface models and creating surface plots – …/package=rsm/
tgp – Treed GPs: package for a general, flexible, non-parametric class of meta-models – …/package=tgp/
BACCO – Toolkit for applying the Kennedy and O’Hagan (2001) framework to emulation and calibration – …/package=BACCO
gptk – GP Toolkit: package for a range of GP-based regression model functions – …/package=gptk/
GEM-SA – Gaussian Emulation Machine for Sensitivity Analysis (see Kennedy & Petropoulos, 2016) – http://www.tonyohagan.co.uk/academic/GEM
Gaussian Processes – A repository of links to various GP-related routines, mainly in Matlab, Python and C++ – http://www.gaussianprocess.org/
UQLab – Comprehensive, general-purpose software for uncertainty quantification, based on Matlab – https://www.uqlab.com/
Source: own elaboration. Links current as of 1 February 2021
Error 0.2538 0.5488 0.0669 36.4927 36.6574 35.0669 76.4010 14.7351 68.0989 20.1476 43.4140 51.6134
p_find_links 0.4081 0.5613 0.1191 0.2482 0.9514 0.0002 2.6256 1.3189 2.8085 2.3934 3.8855 3.3441
p_find_dests 1.4957 0.6918 0.6128 0.1820 0.2583 0.0479 0.7313 0.8168 0.5810 0.8978 1.5653 0.1153
speed_expl_stay 0.5536 0.6144 0.3574 1.0826 1.0858 0.7769 1.3740 0.8992 0.7412 1.1358 2.2779 1.2338
speed_expl_move 0.2003 0.4141 0.0003 0.5533 0.6358 0.2910 2.4765 0.8633 0.2852 1.6236 2.8415 2.4769
qual_weight_x 0.2386 0.5018 0.0050 0.9840 1.1857 0.7549 0.3081 1.0565 0.1810 1.5166 5.9727 3.4927
qual_weight_res 0.4842 0.6208 0.1180 0.2899 0.4654 0.1639 0.8429 0.9435 0.7635 0.8720 2.4428 0.2459
path_weight_frict 0.8701 0.8619 0.5510 1.5559 1.4843 1.3565 0.2935 0.9555 0.0574 1.0483 2.7944 1.7892
weight_traffic 0.5218 0.6555 0.2951 0.2848 0.3933 0.0462 0.5493 0.9833 0.3406 1.0284 1.8629 0.0022
costs_stay 0.2331 0.4304 0.0480 0.2707 0.8047 0.0238 0.2927 0.9534 0.0633 0.8261 3.5357 0.1534
costs_move 2.0486 0.4799 0.0025 2.6870 3.5960 1.6021 0.3941 0.9004 0.2902 2.4457 1.6440 0.7625
ben_resources 1.4450 0.5813 0.1549 0.3710 0.5254 0.0953 4.4356 1.1965 0.0963 1.1298 1.8900 0.0177
Interactions 4.7815 6.5045 0.0000 5.7987 7.5184 0.0000 6.3422 12.4944 0.0000 10.9863 0.0000 0.0000
Total % explained: 99.45 69.21 92.54 99.85 99.27 85.99 99.53 42.70 74.47 56.66 100.00 80.75
Notes: for each output, the sensitivity was assessed three times: twice in GEM-SA (Kennedy & Petropoulos, 2016), under two different random seeds, and through ANOVA in R. The values in bold correspond to the inputs with a high (>5%) share of the variance attributed to individual variables.
ᵃ The experiments were run on 37 Definitive Screening Design points: for prop_stdd one repetition per point, for all other outputs ten per point. Source: own elaboration
Table C.3 Key results of the uncertainty and sensitivity analysis for the Routes and Rumours (data-free) version of the migration model
Input assumptions: Normal prior Uniform prior
Sensitivity analysis
Input\Output (columns repeated under each prior): mean_freq_plan, stdd_link_c, corr_opt_links, prop_stddᵃ
p_drop_contact 0.2906 4.4151 0.8502 5.0949 0.4802 5.7799 1.0706 5.2907
p_info_mingle 6.9762 4.8300 9.5387 9.4807 9.6796 5.8638 8.5181 11.3356
p_info_contacts 9.3481 0.3196 3.2030 6.1303 9.5604 0.3077 3.7471 2.8154
p_transfer_info 71.7956 22.2017 44.4823 30.7951 66.7805 16.1100 39.0571 21.6262
Error 0.0990 27.5070 18.7827 7.1979 0.1130 24.9533 17.6223 7.6150
Exploration 0.7538 3.0526 2.3338 4.0429 0.5882 3.6425 2.9926 4.1407
Interactions 8.6676 26.8109 16.6100 33.0812 9.5531 28.1409 19.7976 39.5982
Residual 2.0692 10.8631 4.1992 4.1771 3.2450 15.2020 7.1946 7.5782
Total % explained 97.9308 89.1369 95.8008 95.8229 96.7550 84.7980 92.8054 92.4218
Uncertainty analysis
Mean of expected code output 0.4296 50.5192 0.3219 0.0173 0.4130 53.1535 0.3024 0.0178
Variance of expected code output 0.0000 1.2563 0.0000 0.0000 0.0000 1.2931 0.0000 0.0000
Mean total variance in code output 0.0068 278.7480 0.0141 0.0000 0.0080 348.9120 0.0161 0.0001
Fitted sigma^2 1.1363 1.4992 1.6505 4.1507 1.1363 1.4992 1.6505 4.1507
Nugget sigma^2 0.0094 0.0203 0.0194 0.2307 0.0094 0.0203 0.0194 0.2307
RMSE 0.0069 3.7551 0.0199 0.0085 0.0069 3.7551 0.0199 0.0085
RMSPE (%) 2.72% 4.77% 7.89% 74.51% 2.72% 4.77% 7.89% 74.51%
RMSSE (standardised) 1.5894 1.9498 1.6701 1.7812 1.5894 1.9498 1.6701 1.7812
ᵃ The experiments were run on 65 Latin Hypercube Sample design points: for prop_stdd one repetition per point, for all other outputs six per point. The values
in bold denote inputs with high (>5%) share of attributed variance. Source: own elaboration in GEM-SA (Kennedy & Petropoulos, 2016)
Fig. C.1 Estimated response surface of the standard deviation of the number of visits over all
links vs two input parameters, probabilities of information transfer and information error: mean
(top) and standard deviation (bottom). Source: own elaboration
Fig. C.2 Estimated response surface of the correlation of the number of passages over links with
the optimal scenario vs two input parameters, probabilities of information transfer and information
error: mean (top) and standard deviation (bottom). Source: own elaboration
Fig. C.3 Estimated response surface of the standard deviation of traffic between replicate runs vs
two input parameters, probabilities of information transfer and of communication with local
agents: mean (top) and standard deviation (bottom). Source: own elaboration
Toby Prike
Experiment Link:
https://southampton.qualtrics.com/jfe/form/SV_e9uicjzpa30RDeu
Open Science Framework Link: https://osf.io/vx4d9/
Because the research in this study involved participants making choices between
gambles, there was the potential that it could cause harm or distress to some participants.
Experiment Link:
https://southampton.qualtrics.com/jfe/form/SV_20kQsSP0cyi6o06
Open Science Framework Link: https://osf.io/3qrs8
In this study, the salience of the topics (risk involved in migration and travel dur-
ing a pandemic) in the public consciousness, and the general, high-level formulation
of the individual tasks, questions and responses, without specific recourse to indi-
vidual experience, meant that the ethical issues were minimal. Any residual issues
were controlled through an appropriate research design, participant information and
debriefing, which can be seen under the experiment link above. This study has
received approval from the University of Southampton Ethics Committee, via the
Ethics and Research Governance Online (ERGO) system, submission number
56865. Given that the timing of data collection coincided with the COVID-19 pan-
demic of 2020, the experiments were carried out exclusively online, via Amazon
Mechanical Turk. The data collection took place in June 2020.
² See the version cited on https://www.icrg.org/resources/brief-biosocial-gambling-screen (as of 1 February 2021).
Experiment Link:
https://southampton.qualtrics.com/jfe/form/SV_2h4jGJH1PA9qJsq
Open Science Framework Link: https://osf.io/ayjcq/
In this study, we asked about aspects of a country that influence its desirability as
a migration destination. Because the migration drivers and countries were included
at an abstract level and without specific recourse to individual experience, the ethi-
cal issues were minimal. Any residual issues were controlled through an appropriate
research design, participant information and debriefing, which can be seen under
the experiment link above. This study has received approval from the University of
Southampton Ethics Committee, via the Ethics and Research Governance Online
(ERGO) system, submission number 65103. Given that the timing of data collection
coincided with the COVID-19 pandemic, the experiments were carried out exclu-
sively online, via the Prolific platform. The data collection took place in October 2021.
Oliver Reinhardt
Table E.1 (continued)
Entity Description
R2 OpenScienceFramework repository for ex2 (preregistration, data, code):
https://osf.io/ws63f/
R3 OpenScienceFramework repository for ex3 (preregistration, data, code):
https://osf.io/ayjcq/
RF Risk functions derived from the subjective probabilities (Box 6.1)
RQ1 Research question: Does information exchange between migrants play a role in the
formation of migration routes? (Box 3.1)
RQ2 Research question: How do risk perception and risk avoidance affect the formation of
migration routes? (Chap. 8)
RQ3 Research question: In a realistic scenario, can more information lead to fewer
fatalities? (Chap. 9, Sect. 9.3)
RW Relative weights of migration drivers
S1 Sensitivity information about all 17 parameters of the Routes and Rumours model (Box 5.1)
S2 Sensitivity information about the Routes and Rumours model (Box 5.3)
S3 Sensitivity information about the Risk and Rumours model (Table 8.2)
S4 Sensitivity information about the Risk and Rumours with Reality model (Table 8.3)
SCI Scenario inputs (Box 9.2)
SCO Scenario outcomes (Box 9.2)
SIO Simulated intervention outcomes (Box 9.3)
SIO’ Simulated intervention outcomes (Box 9.4)
SP Subjective probabilities elicited in the second experiment (Sect. 6.3)
SR Scientific reports about migration route formation, e.g., (Massey et al., 1993; Castles,
2004; Alam & Geller, 2012; Klabunde & Willekens, 2016; Wall et al., 2017)
SU1 Survey (demonstration link:
https://sotonpsychology.eu.qualtrics.com/jfe/form/SV_e4FTbu1MidTCsyW)
SU2 Survey (demonstration link:
https://sotonpsychology.eu.qualtrics.com/jfe/form/SV_41PZg9XavyKFNl3)
SU3 Survey (demonstration link:
https://sotonpsychology.eu.qualtrics.com/jfe/form/SV_cMzaslXJ47MrErk)
U2 Uncertainty information about the Routes and Rumours model (Box 5.3)
U3 Uncertainty information about the Risk and Rumours model (Table 8.2)
U4 Uncertainty information about the Risk and Rumours with Reality model (Table 8.3)
UF Utility functions from the first experiment (Sect. 6.2)
W19 Paper on interpreting verbal probabilities used to inform ex2 study design (Wintle
et al., 2019)
Abdellaoui, M., & Kemel, E. (2014). Eliciting prospect theory when consequences are measured
in time units: “Time is not money”. Management Science, 60, 1844–1859.
Abdellaoui, M., Bleichrodt, H., L’Haridon, O., & van Dolder, D. (2016). Measuring loss aversion
under ambiguity: A method to make prospect theory completely observable. Journal of Risk
and Uncertainty, 52, 1–20.
Ahmed, M. N., Barlacchi, G., Braghin, S., Calabrese, F., Ferretti, M., Lonij, V., Nair, R., Novack,
R., Paraszczak, J., & Toor, A. S. (2016). A multi-scale approach to data-driven mass migration
analysis. In The Fifth Workshop on Data Science for Social Good. SoGood@ECML-PKDD.
Ajzen, I. (1985). From intentions to actions: A theory of planned behavior. In J. Kuhl & J. Beckmann
(Eds.), Action control. From cognition to behavior (pp. 11–39). Springer.
Akgüç, M., Liu, X., Tani, M., & Zimmermann, K. F. (2016). Risk attitudes and migration. China
Economic Review, 37, 166–176.
Alam, S. J., & Geller, A. (2012). Networks in agent-based social simulation. In A. Heppenstall,
A. Crooks, L. See, & M. Batty (Eds.), Agent-based models of geographical systems
(pp. 199–216). Springer.
Andrianakis, I., Vernon, I. R., McCreesh, N., McKinley, T. J., Oakley, J. E., Nsubuga, R. N.,
Goldstein, M., & White, R. G. (2015). Bayesian history matching of complex infectious disease
models using emulation: A tutorial and a case study on HIV in Uganda. PLoS Computational
Biology, 11(1), e1003968.
Angione, C., Silverman, E., & Yaneske, E. (2020). Using machine learning to emulate agent-based
simulations. Mimeo. arXiv. https://arxiv.org/abs/2005.02077. (as of 1 August 2020)
Apicella, C., Norenzayan, A., & Henrich, J. (2020). Beyond WEIRD: A review of the last decade
and a look ahead to the global laboratory of the future. Evolution and Human Behavior, 41(5),
319–329.
Arango, J. (2000). Explaining migration: A critical view. International Social Science Journal,
52, 283–296.
Arellana, J., Garzón, L., Estrada, J., & Cantillo, V. (2020). On the use of virtual immersive real-
ity for discrete choice experiments to modelling pedestrian behaviour. Journal of Choice
Modelling, 37, 100251.
Ariely, D. (2008). Predictably irrational. The hidden forces that shape our decisions. Harper
Collins.
Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less
American. American Psychologist, 63(7), 602–614.
Attema, A. E., Brouwer, W. B., & L’Haridon, O. (2013). Prospect theory in the health domain: A
quantitative assessment. Journal of Health Economics, 32, 1057–1065.
Attema, A. E., Brouwer, W. B., L’Haridon, O., & Pinto, J. L. (2016). An elicitation of utility for
quality of life under prospect theory. Journal of Health Economics, 48, 121–134.
Axtell, R., Epstein, J., Dean, J., et al. (2002). Population growth and collapse in a multiagent
model of the Kayenta Anasazi in Long House Valley. Proceedings of the National Academy of
Sciences of the United States of America, 99(Suppl. 3), 7275–7279.
Azose, J. J., & Raftery, A. E. (2015). Bayesian probabilistic projection of international migration.
Demography, 52(5), 1627–1650.
Bacon, F. (1620). Novum organum. J. Bill. English translation by J. Spedding, R. L. Ellis, and
D. D. Heath (1863) in The Works (Vol. VIII). Taggard and Thompson.
Bakewell, O. (1999). Can we ever rely on refugee statistics? Radical Statistics, 72, art. 1. Accessible
via: www.radstats.org.uk/no072/article1.htm (as of 1 February 2019)
Baláž, V., & Williams, A. M. (2018). Migration decisions in the face of upheaval: An experimental
approach. Population, Space and Place, 24, e2115.
Baláž, V., Williams, A. M., & Fifekova, E. (2016). Migration decision making as complex choice:
Eliciting decision weights under conditions of imperfect and complex information through
experimental methods. Population, Space and Place, 22, 36–53.
Banks, D., & Norton, J. (2014). Agent-based modeling and associated statistical aspects. In
International Encyclopaedia of the Social and Behavioural Sciences (2nd ed., pp. 78–86).
Oxford University Press.
Banks, D. L., Rios Aliaga, J. M., & Rios Insua, D. (2015). Adversarial risk analysis. CRC Press.
Barberis, N. C. (2013). Thirty years of prospect theory in economics: A review and assessment.
Journal of Economic Perspectives, 27, 173–196.
Barbosa Filho, H. S., Lima Neto, F. B., & Fusco, W. (2013). Migration, communication and social
networks – An agent-based social simulation. In R. Menezes, A. Evsukoff, & M. C. González
(Eds.), Complex networks. Studies in computational intelligence (Vol. 424, pp. 67–74).
Springer.
Barker, E. R., & Bijak, J. (2020). Conceptualisation and analysis of migration uncertainty: Insights
from macroeconomics (QuantMig project deliverable D9.1). University of Southampton. Via
https://www.quantmig.eu
Barth, R., Meyer, M., & Spitzner, J. (2012). Typical pitfalls of simulation modeling – Lessons
learned from armed forces and business. Journal of Artificial Societies and Social Simulation,
15(2), 5.
Bauermeister, G.-F., Hermann, D., & Musshoff, O. (2018). Consistency of determined risk atti-
tudes and probability weightings across different elicitation methods. Theory and Decision,
84(4), 627–644.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical
Transactions of the Royal Society of London, 53, 370–418.
BBC News. (2015). Syrian journey: Choose your own escape route. Accessible via:
https://www.bbc.co.uk/news/world-middle-east-32057601 (as of 1 February 2021).
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., & Robert, C. P. (2009). Adaptive approximate
Bayesian computation. Biometrika, 96(4), 983–990.
Begley, C. G., & Ioannidis, J. P. A. (2015). Reproducibility in science. Circulation Research,
116(1), 116–126.
Bélanger, A., & Sabourin, P. (2017). Microsimulation and population dynamics. An introduction
to Modgen 12. Springer.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences
on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425.
Ben-Akiva, M., de Palma, A., McFadden, D., Abou-Zeid, M., Chiappori, P.-A., de Lapparent,
M., Durlauf, S. N., Fosgerau, M., Fukuda, D., Hess, S., Manski, C., Pakes, A., Picard, N., &
Walker, J. (2012). Process and context in choice models. Marketing Letters, 23, 439–456.
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R.,
Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M.,
Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., … Johnson,
V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10.
Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2014). Julia: A fresh approach to numeri-
cal computing. SIAM Review, 59(1), 65–98.
Bijak, J. (2010). Forecasting international migration in Europe: A Bayesian view. Springer.
Bijak, J., & Bryant, J. (2016). Bayesian demography 250 years after Bayes. Population Studies,
70(1), 1–19.
Bijak, J., & Czaika, M. (2020). Assessing uncertain migration futures – A typology of the unknown
(QuantMig project deliverable D1.1). University of Southampton and Danube University
Krems. Via https://www.quantmig.eu
Bijak, J., & Koryś, I. (2009). Poland. In H. Fassman, U. Reeger, & W. Sievers (Eds.), Statistics and
reality: Concepts and measurements of migration in Europe (pp. 195–216). AUP.
Bijak, J., & Lubman, S. (2016). The disputed numbers: In search of the demographic basis for
studies of Armenian population losses, 1915–1923. In A. Demirdjian (Ed.), The Armenian
genocide legacy (pp. 26–43). Palgrave.
Bijak, J., & Wiśniowski, A. (2010). Bayesian forecasting of immigration to selected European
countries by using expert knowledge. Journal of the Royal Statistical Society: Series A, 173(4),
775–796.
Bijak, J., Kupiszewska, D., Kupiszewski, M., Saczuk, K., & Kicinger, A. (2007). Population and
labour force projections for 27 European countries, 2002-2052: Impact of international migra-
tion on population ageing. European Journal of Population, 23(1), 1–31.
Bijak, J., Hilton, J., Silverman, E., & Cao, V. D. (2013). Reforging the wedding ring: Exploring a
semi-artificial model of population for the UK with Gaussian process emulators. Demographic
Research, 29, 729–766.
Bijak, J., Forster, J. J., & Hilton, J. (2017). Quantitative assessment of asylum-related migration: A
survey of methodology (Report for the European Asylum Support Office). EASO.
Bijak, J., Disney, G., Findlay, A. M., Forster, J. J., Smith, P. W. F., & Wiśniowski, A. (2019).
Assessing time series models for forecasting international migration: Lessons from the United
Kingdom. Journal of Forecasting, 38(5), 470–487.
Bijak, J., Higham, P., Hilton, J. D., Hinsch, M., Nurse, S., Prike, T., Reinhardt, O., Smith, P. W. F.,
& Uhrmacher, A. M. (2020). Modelling migration: Decisions, processes and outcomes. In
Proceedings of the Winter Simulation Conference 2020 (pp. 2613–2624). IEEE.
Billari, F. C. (2015). Integrating macro- and micro-level approaches in the explanation of popula-
tion change. Population Studies, 69(S1), S11–S20.
Billari, F., & Prskawetz, A. (Eds.). (2003). Agent-based computational demography: Using simu-
lation to improve our understanding of demographic behaviour. Plenum.
Billari, F. C., Fent, T., Prskawetz, A., & Scheffran, J. (Eds.). (2006). Agent-based computa-
tional modelling. Applications in demography, social, economic and environmental sciences.
Physica-Verlag.
Billari, F., Aparicio Diaz, B., Fent, T., & Prskawetz, A. (2007). The “Wedding–Ring”. An agent-
based marriage model based on social interaction. Demographic Research, 17(3), 59–82.
Bishop, Y. M., Fienberg, S. E., & Holland, P. W. (1975/2007). Discrete multivariate analysis:
Theory and practice (Reprint ed.). Springer.
Bocquého, G., Deschamps, M., Helstroffer, J., Jacob, J., & Joxhe, M. (2018). Risk and refugee
migration. Sciences Po OFCE Working Paper hal-02198118. Paris, France.
Bohra-Mishra, P., & Massey, D. S. (2011). Individual decisions to migrate during civil conflict.
Demography, 48(2), 401–424.
Bonabeau, E. (2002). Agent-based modeling: Methods and techniques for simulating human
systems. Proceedings of the National Academy of Sciences of the United States of America,
99(Suppl 3), 7280–7287.
Borjas, G. J. (1989). Economic theory and international migration. International Migration
Review, 23(3), 457–485.
Bortolussi, L., De Nicola, R., Galpin, V., Gilmore, S., Hillston, J., Latella, D., Loreti, M., &
Massink, M. (2015). CARMA: Collective adaptive resource-sharing Markovian agents.
Electronic Proceedings in Theoretical Computer Science, 194, 16–31.
Boulesteix, A.-L., Groenwold, R. H. H., Abrahamowicz, M., Binder, H., Briel, M., Hornung, R.,
Morris, T. P., Rahnenführer, J., & Sauerbrei, W., for the STRATOS Simulation Panel. (2020).
Introduction to statistical simulations in health research. BMJ Open, 10, e039921.
Boukouvalas, A., & Cornford, D. (2008). Dimension reduction for multivariate emulation
(Technical report NCRG/2008/006. Neural Computing Research Group). Aston University.
Bourgais, M., Taillandier, P., & Vercouter, L. (2020). BEN: An architecture for the behavior of
social agents. Journal of Artificial Societies and Social Simulation, 23(4), 12.
Bourgeois-Pichat, J. (1994). La dynamique des populations. Populations stables, semi stables,
quasi stables. Institut national d’études démographiques, Presses Universitaires de France.
Brenner, T., & Werker, C. (2009). Policy advice derived from simulation models. Journal of
Artificial Societies and Social Simulation, 12(4), 2.
Briñol, P., & Petty, R. E. (2009). Source factors in persuasion: A self-validation approach. European
Review of Social Psychology, 20(1), 49–96.
Bryant, J., & Zhang, J. (2018). Bayesian demographic estimation and forecasting. CRC Press.
Bryson, J. J., Ando, Y., & Lehmann, H. (2007). Agent-based modelling as scientific method: A case
study analysing primate social behaviour. Philosophical transactions of the Royal Society of
London. Series B, Biological Sciences, 362(1485), 1685–1698.
Budde, K., Smith, J., Wilsdorf, P., Haack, F., & Uhrmacher, A. M. (2021). Relating simulation
studies by provenance – Developing a family of Wnt signaling models. PLoS Computational
Biology, 17(8), e1009227.
Budescu, D. V., Por, H.-H., Broomell, S. B., & Smithson, M. (2014). The interpretation of IPCC
probabilistic statements around the world. Nature Climate Change, 4, 508–512.
Burch, T. (2003). Demography in a new key: A theory of population theory. Demographic
Research, 9, 263–284.
Burch, T. (2018). Model-based demography. Essays on integrating data, technique and theory
(Demographic Research Monographs, Vol. 14). Springer.
Burks, A. W. (1946). Peirce’s theory of abduction. Philosophy of Science, 13(4), 301–306.
Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M.,
Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S.,
Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory
experiments in economics. Science, 351(6280), 1433–1436.
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M.,
Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E.,
Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicabil-
ity of social science experiments in nature and science between 2010 and 2015. Nature Human
Behaviour, 2(9), 637–644.
Carling, J., & Collins, F. (2018). Aspiration, desire and drivers of migration. Journal of Ethnic and
Migration Studies, 44(6), 909–926.
Carling, J., & Schewel, K. (2018). Revisiting aspiration and ability in international migration.
Journal of Ethnic and Migration Studies, 44(6), 945–963.
Casini, L., Illari, P., Russo, F., & Williamson, J. (2011). Models for prediction, explanation and
control: Recursive Bayesian networks. Theoria, 26(1), 5–33.
Castles, S. (2004). Why migration policies fail. Ethnic and Racial Studies, 27(2), 205–227.
Castles, S., de Haas, H., & Miller, M. J. (2014). The age of migration: International population
movements in the modern world (5th ed.). Palgrave.
Cellier, F. E. (1991). Continuous system modeling. Springer.
Ceriani, L., & Verme, P. (2018). Risk preferences and the decision to flee conflict (Policy research
working paper no. 8376). World Bank.
Chaiken, S., & Maheswaran, D. (1994). Heuristic processing can bias systematic processing:
Effects of source credibility, argument ambiguity, and task importance on attitude judgement.
Journal of Personality and Social Psychology, 66, 460–473.
Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science,
10(3), 273–304.
Chambers, C. D. (2013). Registered reports: A new publishing initiative at cortex. Cortex, 49(3),
609–610.
Chambers, C. (2019). The registered reports revolution: Lessons in cultural reform. Significance,
16(4), 23–27.
Channel 4 News. (2015). Two billion miles. Accessible via: http://twobillionmiles.com/ (as of 1
February 2021).
Christensen, K., & Sasaki, Y. (2008). Agent-based emergency evacuation simulation with individu-
als with disabilities in the population. Journal of Artificial Societies and Social Simulation,
11(3), 9.
Christensen, G., Dafoe, A., Miguel, E., Moore, D. A., & Rose, A. K. (2019a). A study of the impact
of data sharing on article citations using journal policies as a natural experiment. PLoS One,
14(12), e0225883.
Christensen, G., Wang, Z., Paluck, E. L., Swanson, N., Birke, D. J., Miguel, E., & Littman,
R. (2019b). Open Science practices are on the rise: The state of social science (3S) survey.
MetaArXiv. Preprint.
Cimellaro, G. P., Mahin, S., & Domaneschi, M. (2019). Integrating a human behavior model within
an agent-based approach for blasting evacuation. Computer-Aided Civil and Infrastructure
Engineering, 34, 3–20.
Clark, R. D., III, & Maass, A. (1988). The role of social categorization and perceived source cred-
ibility in minority influence. European Journal of Social Psychology, 18, 381–394.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. The
Journal of Abnormal and Social Psychology, 65(3), 145–153.
Cohen, J. E., Roig, M., Reuman, D. C., & GoGwilt, C. (2008). International migration beyond
gravity: A statistical model for use in population projections. Proceedings of the National
Academy of Sciences of the United States of America, 105(40), 15268–15274.
Coleman, J. S. (1986). Social theory, social research, and a theory of action. American Journal of
Sociology, 91(6), 1309–1335.
Collier, N., & Ozik, J. (2013). Repast Simphony batch runs getting started. Available from
https://repast.sourceforge.net/docs/RepastBatchRunsGettingStarted.pdf (as of 1 January 2021).
Collins, F. L. (2018). Desire as a theory for migration studies: Temporality, assemblage and becom-
ing in the narratives of migrants. Journal of Ethnic and Migration Studies, 44(6), 964–980.
Collins, A. J., & Frydenlund, E. (2016). Agent-based modeling and strategic group forma-
tion: A refugee case study. In Proceedings of the Winter Simulation Conference 2016
(pp. 1289–1300). IEEE.
Collins, A. J., Etemadidavan, S., & Pazos-Lago, P. (2020). A human experiment using a hybrid
agent-based model. In Proceedings of the Winter Simulation Conference 2020. IEEE.
Commons, M. L., Nevin, J. A., & Davison, M. C. (Eds.). (2013). Signal detection: Mechanisms,
models, and applications. Psychology Press.
Conte, R., & Paolucci, M. (2014). On agent-based modeling and computational social science.
Frontiers in Psychology, 5(668), 1–9.
Conte, R., Gilbert, N., Bonelli, G., Cioffi-Revilla, C., Deffuant, G., Kertesz, J., Loreto, V., Moat,
S., Nadal, J.-P., Sanchez, A., Nowak, A., Flache, A., San Miguel, M., & Helbing, D. (2012).
Manifesto of computational social science. European Physical Journal Special Topics, 214,
325–346.
Courgeau, D. (1985). Interaction between spatial mobility, family and career life cycle: A French
survey. European Sociological Review, 1(2), 139–162.
Courgeau, D. (2007). Multilevel synthesis. From the group to the individual. Springer.
Courgeau, D. (2012). Probability and social science: Methodological relationships between the
two approaches. Springer.
Courgeau, D., Bijak, J., Franck, R., & Silverman, E. (2016). Model-based demography: Towards
a research agenda. In J. Van Bavel & A. Grow (Eds.), Agent-based modelling in population
studies: Concepts, methods, and applications (pp. 29–51). Springer.
Cox, D. R. (1958/1992). Planning of experiments. Wiley.
Cressie, N. (1990). The origins of kriging. Mathematical Geology, 22, 239–252.
Crisp, J. (1999). Who has counted the refugees? UNHCR and the politics of numbers (New issues
in refugee research, No. 12). UNHCR.
Cusumano, E., & Pattison, J. (2018). The non-governmental provision of search and rescue in the
Mediterranean and the abdication of state responsibility. Cambridge Review of International
Affairs, 31(1), 53–75.
Cusumano, E., & Villa, M. (2019). Sea rescue NGOs: A pull factor of irregular migration?
(Migration policy Centre policy brief 22/2019). European University Institute.
Czaika, M. (2014). Migration and economic prospects. Journal of Ethnic and Migration Studies,
41, 58–82.
Czaika, M., & Reinprecht, C. (2020). Drivers of migration: A synthesis of knowledge (IMI working
paper no. 163). University of Amsterdam.
Czaika, M., Bijak, J., & Prike, T. (2021). Migration decision-making and its four key dimensions.
The Annals of the American Academy of Political and Social Science, forthcoming.
David, N. (2009). Validation and verification in social simulation: Patterns and clarification of ter-
minology. In F. Squazzoni (Ed.), Epistemological aspects of computer simulation in the social
sciences (Lecture Notes in Artificial Intelligence, 5466) (pp. 117–119). Springer.
Davies, O. L., & Hay, W. A. (1950). The construction and uses of fractional factorial designs in
industrial research. Biometrics, 6(3), 233–249.
de Castro, P. A. L., Barreto Teodoro, A. R., de Castro, L. I., & Parsons, S. (2016). Expected
utility or prospect theory: Which better fits agent-based modeling of markets? Journal of
Computational Science, 17, 97–102.
De Finetti, B. (1974). Theory of probability (Vol. 2). Wiley.
de Haas, H. (2010). Migration and development: A theoretical perspective. International Migration
Review, 44(1), 227–264.
De Jong, G. F., & Fawcett, J. T. (1981). Motivations for migration: An assessment and a value-
expectancy research model. In G. F. De Jong & R. W. Gardener (Eds.), Migration decision
making: Multidisciplinary approaches to microlevel studies in developed and developing coun-
tries (pp. 13–57). Pergamon.
de Laplace, P.-S. (1780). Mémoire sur les probabilités. Mémoires de l’Académie Royale des
Sciences de Paris, 1781, 227–332.
De Nicola, R., Latella, D., Loreti, M., & Massink, M. (2013). A uniform definition of stochastic
process calculi. ACM Computing Surveys, 46(1), 1–35.
DeGroot, M. H. (2004). Optimal statistical decisions (Wiley Classics Library ed.). Wiley.
Dekker, R., Engbersen, G., Klaver, J., & Vonk, H. (2018). Smart refugees: How Syrian asylum
migrants use social media information in migration decision making. Social Media & Society,
4(1), 2056305118764439.
Devroye, L. (1986). Non-uniform random variate generation. Springer.
Di Paolo, E. A., Noble, J., & Bullock, S. (2000). Simulation models as opaque thought experi-
ments. In ALife 7 conference proceedings (pp. 497–506). MIT Press.
Diaz, B. A., Fent, T., Prskawetz, A., & Bernardi, L. (2011). Transition to parenthood: The role of
social interaction and endogenous networks. Demography, 48(2), 559–579.
Disney, G., Wiśniowski, A., Forster, J. J., Smith, P. W. F., & Bijak, J. (2015). Evaluation of existing
migration forecasting methods and models (Report for the Migration Advisory Committee).
Centre for Population Change.
D’Orazio, M., Di Zio, M., & Scanu, M. (2006). Statistical matching: Theory and practice. Wiley.
Douven, I. (2017). Abduction. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy
(Summer 2017 ed.). Available via https://plato.stanford.edu/archives/sum2017/entries/abduction
(as of 1 October 2018).
Drogoul, A., Vanbergue, D., & Meurisse, T. (2003). Multi-agent based simulation: Where are the
agents? In J. S. Sichman, F. Bousquet, & P. Davidsson (Eds.), Multi-agent-based simulation
II. Lecture Notes in Computer Science, (Vol. 2581, pp. 1–15). Springer.
Dunsch, F., Tjaden, J., & Quiviger, W. (2019). Migrants as messengers: The impact of peer-to-
peer communication on potential migrants in Senegal. Impact evaluation report. International
Organization for Migration.
Dustmann, C., Fasani, F., Meng, X., & Minale, L. (2017). Risk attitudes and household migration
decisions (IZA discussion papers no. 10603). Institute for the Study of Labor (IZA).
EASO. (2016). The push and pull factors of asylum-related migration. A literature review (Report
by Maastricht University and the global migration data analysis Centre (GMDAC) for the
European Asylum Support Office). EASO.
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B.,
Baranski, E., Bernstein, M. J., Bonfiglio, D. B. V., Boucher, L., Brown, E. R., Budiman, N. I.,
Cairo, A. H., Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman, J. A.,
Conway, J. G., … Nosek, B. A. (2016). Many labs 3: Evaluating participant pool quality across
the academic semester via replication. Journal of Experimental Social Psychology, 67(Special
Issue: Confirmatory), 68–82.
Edmonds, B., Le Page, C., Bithell, M., Grimm, V., Meyer, R., Montañola-Sales, C., Ormerod, P.,
Root, H., & Squazzoni, F. (2019). Different modelling purposes. Journal of Artificial Societies
and Social Simulation, 22(3), 6.
Elson, M., & Quandt, T. (2016). Digital games in laboratory experiments: Controlling a complex
stimulus through modding. Psychology of Popular Media Culture, 5(1), 52–65.
Emmer, M., Richter, C., & Kunst, M. (2016). Flucht 2.0: Mediennutzung durch Flüchtlinge vor,
während und nach der Flucht. Institut für Publizistik, FU Berlin.
Entwisle, B., Williams, N. E., Verdery, A. M., Rindfuss, R. R., Walsh, S. J., Malanson, G. P.,
Mucha, P. J., et al. (2016). Climate shocks and migration: An agent-based modeling approach.
Population and Environment, 38(1), 47–71.
Epstein, J. M. (2008). Why model? Journal of Artificial Societies and Social Simulation, 11(4), 12.
Epstein, J. M., & Axtell, R. (1996). Growing artificial societies: Social science from the bottom
up (Complex Adaptive Systems). MIT Press.
Erdal, M. B., & Oeppen, C. (2018). Forced to leave? The discursive and analytical significance of
describing migration as forced and voluntary. Journal of Ethnic and Migration Studies, 44(6),
981–998.
Euler, L. (1760). Recherches générales sur la mortalité et la multiplication du genre humain.
Histoire de l’Académie Royale des Sciences et des Belles Lettres de Berlin, 16, 144–164.
European Commission. (2015). Communication from the Commission to the European Parliament,
the Council, the European Economic and Social Committee and the Committee of the Regions:
Commission work programme 2016. COM(2015) 610 final. European Commission.
European Commission. (2016). Fact sheet: Reforming the common European asylum system:
Frequently asked questions. European Commission, 13 July 2016. Accessible via:
http://europa.eu/rapid/press-release_MEMO-16-2436_en.htm (as of 26 September 2019).
European Commission. (2020). Communication from the Commission to the European Parliament,
the Council, the European Economic and Social Committee and the Committee of the Regions
on a new pact on migration and asylum. COM(2020) 609 final. European Commission.
Ewald, R., & Uhrmacher, A. M. (2014). SESSL: A domain-specific language for simulation exper-
iments. ACM Transactions on Modeling and Computer Simulation (TOMACS), 24(2), 1–25.
Falk, A., Becker, A., Dohmen, T., Enke, B., Huffman, D., & Sunde, U. (2018). Global Evidence on
Economic Preferences. The Quarterly Journal of Economics, 133(4), 1645–1692.
Fang, K.-T., Li, R., & Sudjianto, A. (2006). Design and modeling for computer experiments. CRC.
Farooq, B., Cherchi, E., & Sobhani, A. (2018). Virtual immersive reality for stated preference travel
behavior experiments: A case study of autonomous vehicles on Urban roads. Transportation
Research Record, 2672(50), 35–45.
Feldman, R. H. L. (1984). The influence of communicator characteristics on the nutrition attitudes
and behavior of high school students. Journal of School Health, 54, 149–151.
Felleisen, M. (1991). On the expressive power of programming languages. Science of Computer
Programming, 17(1), 35–75.
Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture
of Great Britain, 33, 503–513.
Fisher, R. A. (1935/1971). The design of experiments. Macmillan.
FitzGerald, D. S. (2015). The sociology of international migration. In C. B. Brettell & J. F. Hollifield
(Eds.), Migration theory, talking across disciplines (3rd ed., pp. 115–147). Routledge.
Flake, J. K., & Fried, E. I. (2020). Measurement Schmeasurement: Questionable measurement
practices and how to avoid them. Advances in Methods and Practices in Psychological Science,
3(4), 456–465.
Foresight. (2011). Migration and global environmental change: Future challenges and opportuni-
ties. Final project report. Government Office for Science.
Fowler, M., with Parsons, R. (2010). Domain-specific languages. Addison-Wesley.
Franck, R. (Ed.). (2002). The explanatory power of models. Kluwer Academic Publishers.
Frank, U., Squazzoni, F., & Troitzsch, K. G. (2009). EPOS-epistemological perspectives on simu-
lation: An introduction. In F. Squazzoni (Ed.), Epistemological aspects of computer simulation
in the social sciences (Lecture Notes in Artificial Intelligence, 5466) (pp. 1–11). Springer.
Fraser, H., Parker, T., Nakagawa, S., Barnett, A., & Fidler, F. (2018). Questionable research prac-
tices in ecology and evolution. PLoS One, 13(7), e0200303.
Fricker, T. E., Oakley, J. E., & Urban, N. M. (2013). Multivariate Gaussian process emulators with
nonseparable covariance structures. Technometrics, 55(1), 47–56.
Frigg, R., Bradley, S., Du, H., & Smith, L. A. (2014). Laplace’s demon and the adventures of his
apprentices. Philosophy of Science, 81(1), 31–59.
Frontex. (2018). Risk analysis for 2018. Frontex.
Frydenlund, E., Foytik, P., Padilla, J. J., & Ouattara, A. (2018). Where are they headed next?
Modeling emergent displaced camps in the DRC using agent-based models. In Proceedings of
the Winter Simulation Conference 2018. IEEE.
Frydman, R., & Goldberg, M. D. (2007). Imperfect knowledge economics. Princeton
University Press.
Fujimoto, R. M. (2000). Parallel and distributed simulation systems (Wiley series on parallel and
distributed computing). Wiley.
Gabrielsen Jumbert, M. (2020). The “pull factor”: How it became a central premise in European
discussions about cross-Mediterranean migration. Available at:
www.law.ox.ac.uk/research-subject-groups/centre-criminology/centreborder-criminologies/blog/2020/03/pull-factor-how
(as of 1 February 2021).
GAO. (2006). Darfur crisis: Death estimates demonstrate severity of crisis, but their accuracy
and credibility could be enhanced (Report to congressional requesters GAO-07-24). US
Government Accountability Office.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., & Rubin, D. (2013). Bayesian Data
Analysis (3rd ed.). CRC Press/Chapman and Hall.
Ghanem, R., Higdon, D., & Owhadi, H. (2019). Handbook of uncertainty quantification. Living
reference work. Online resource, available at
https://link.springer.com/referencework/10.1007/978-3-319-11259-6 (as of 1 November 2019).
Gibson, J., & McKenzie, D. (2011). The microeconomic determinants of emigration and return
migration of the best and brightest: Evidence from the Pacific. Journal of Development
Economics, 95, 18–29.
Gigerenzer, G. (2008). Rationality for mortals: How people cope with uncertainty. OUP.
Gigerenzer, G., & Marewski, J. N. (2015). Surrogate science: The idol of a universal method for
scientific inference. Journal of Management, 41(2), 421–440.
Gilbert, N., & Ahrweiler, P. (2009). The epistemologies of social simulation research. In
F. Squazzoni (Ed.), Epistemological aspects of computer simulation in the social sciences
(Lecture Notes in Artificial Intelligence, 5466) (pp. 12–28). Springer.
Gilbert, N., & Terna, P. (2000). How to build and use agent-based models in social science. Mind
& Society, 1(1), 57–72.
Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of
Physical Chemistry, 81(25), 2340–2361.
Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically reacting sys-
tems. The Journal of Chemical Physics, 115(4), 1716–1733.
Ginot, V., Gaba, S., Beaudouin, R., Aries, F., & Monod, H. (2006). Combined use of local and
ANOVA-based global sensitivity analyses for the investigation of a stochastic dynamic model:
Application to the case study of an individual-based model of a fish population. Ecological
Modelling, 193(3–4), 479–491.
Godfrey-Smith, P. (2009). Models and fictions in science. Philosophical Studies: An International
Journal for Philosophy in the Analytic Tradition, 143(1), 101–116.
Goldacre, B., Drysdale, H., Dale, A., Milosevic, I., Slade, E., Hartley, P., Marston, C., Powell-
Smith, A., Heneghan, C., & Mahtani, K. R. (2019). COMPare: A prospective cohort study
correcting and monitoring 58 misreported trials in real time. Trials, 20(1), 118.
Graunt, J. (1662). Natural and political observations mentioned in a following index, and made
upon the bills of mortality. Tho. Roycroft for John Martin, James Allestry, and Tho. Dicas.
Gray, J., Bijak, J., & Bullock, S. (2016). Deciding to disclose – A decision theoretic agent model
of pregnancy and alcohol misuse. In J. Van Bavel & A. Grow (Eds.), Agent-based modelling in
population studies: Concepts, methods, and applications (pp. 301–340). Springer.
Gray, J., Hilton, J., & Bijak, J. (2017). Choosing the choice: Reflections on modelling decisions
and behaviour in demographic agent-based models. Population Studies, 71(Supp), 85–97.
Grazzini, J., Richiardi, M.G., & Tsionas, M. (2017). Bayesian estimation of agent-based models.
Journal of Economic Dynamics and Control, 77(1), 26–47.
Greenwood, M. J. (2005). Modeling Migration. In K. Kempf-Leonard (Ed.), Encyclopedia of
social measurement (pp. 725–734). Elsevier.
Grimm, V., Revilla, E., Berger, U., Jeltsch, F., Mooij, W. M., Railsback, S. F., Thulke, H.-H.,
Weiner, J., Wiegand, T., & DeAngelis, D. L. (2005). Pattern-oriented modeling of agent-based
complex systems: Lessons from ecology. Science, 310(5750), 987–991.
Grimm, V., Berger, U., Bastiansen, F., Eliassen, S., Ginot, V., Giske, J., Goss-Custard, J., Grand, T.,
Heinz, S. K., Huse, G., Huth, A., Jepsen, J. U., Jørgensen, C., Mooij, W. M., Müller, B., Pe’er,
G., Piou, C., Railsback, S. F., Robbins, A. M., … DeAngelis, D. L. (2006). A standard proto-
col for describing individual-based and agent-based models. Ecological Modelling, 198(1–2),
115–126.
Grimm, V., Augusiak, J., Focks, A., Frank, B. M., Gabsi, F., Johnston, A. S. A., Liu, C., Martin,
B. T., Meli, M., Radchuk, V., Thorbek, P., & Railsback, S. F. (2014). Towards better mod-
elling and decision support: Documenting model development, testing, and analysis using
TRACE. Ecological Modelling, 280, 129–139.
Grimm, V., Railsback, S. F., Vincenot, C. E., Berger, U., Gallagher, C., DeAngelis, D. L., Edmonds,
B., Ge, J., Giske, J., Groeneveld, J., Johnston, A. S. A., Milles, A., Nabe-Nielsen, J., Polhill,
J. G., Radchuk, V., Rohwäder, M.-S., Stillman, R. A., Thiele, J. C., & Ayllón, D. (2020). The
ODD protocol for describing agent-based and other simulation models: A second update to
improve clarity, replication, and structural realism. Journal of Artificial Societies and Social
Simulation, 23(2), 7.
Groen, D. (2016). Simulating refugee movements: Where would you go? Procedia Computer
Science, 80, 2251–2255.
242 References
Groen, D., Bell, D., Arabnejad, H., Suleimenova, D., Taylor, S. J. E., & Anagnostou, A. (2020).
Towards modelling the effect of evolving violence on forced migration. In Proceedings of the
Winter Simulation Conference 2019 (pp. 297–307). IEEE.
Groth, P., & Moreau, L. (2013). PROV-overview – An overview of the PROV family of documents.
Technical report. World Wide Web Consortium.
Grow, A., & Van Bavel, J. (2015). Assortative mating and the reversal of gender inequality in edu-
cation in Europe: An agent-based model. PLoS One, 10(6), e0127806.
Gurak, D. T., & Caces, F. (1992). Migration networks and the shaping of migration systems.
In M. M. Kritz, L. L. Lim, & H. Zlotnik (Eds.), International migration systems: A global
approach (pp. 150–176). Clarendon Press.
Hafızoğlu, F. M., & Sen, S. (2012). Analysis of opinion spread through migration and adop-
tion in agent communities. In I. Rahwan, W. Wobcke, S. Sen, & T. Sugawara (Eds.), PRIMA
2012: Principles and practice of multi-agent systems (Lecture Notes in Computer Science)
(pp. 153–167). Springer.
Hahn, U., Harris, A. J. L., & Corner, A. (2009). Argument content and argument source: An explo-
ration. Informal Logic, 29, 337–367.
Hailegiorgis, A., Crooks, A., & Cioffi-Revilla, C. (2018). An agent-based model of rural house-
holds’ adaptation to climate change. Journal of Artificial Societies and Social Simulation,
21(4), 4.
Hainmueller, J., Hopkins, D. J., & Yamamoto, T. (2014). Causal inference in conjoint analysis:
Understanding multidimensional choices via stated preference experiments. Political Analysis,
22(1), 1–30.
Hainmueller, J., Hangartner, D., & Yamamoto, T. (2015). Validating vignette and conjoint survey
experiments against real-world behavior. Proceedings of the National Academy of Sciences,
112(8), 2395–2400.
Hainy, M., Müller, W. G., & Wynn, H. P. (2014). Learning functions and approximate Bayesian
computation design: ABCD. Entropy, 16(8), 4353–4374.
Hanczakowski, M., Zawadzka, K., Pasek, T., & Higham, P. A. (2013). Calibration of metacogni-
tive judgments: Insights from the underconfidence-with-practice effect. Journal of Memory
and Language, 69, 429–444.
Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C.,
Hofelich Mohr, A., Clayton, E., Yoon, E. J., Henry Tessler, M., Lenne, R. L., Altman, S.,
Long, B., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility:
Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society
Open Science, 5(8), 180448.
Harris, J. R., & Todaro, M. P. (1970). Migration, unemployment and development: A two-sector
analysis. American Economic Review, 60(1), 126–142.
Harris, A. J. L., Hahn, U., Madsen, J. K., & Hsu, A. S. (2016). The appeal to expert opinion:
Quantitative support for a Bayesian network approach. Cognitive Science, 40(6), 1496–1533.
Hassani-Mahmooei, B., & Parris, B. (2012). Climate change and internal migration patterns in
Bangladesh: An agent-based model. Environment and Development Economics, 17, 763–780.
Haug, S. (2008). Migration networks and migration decision making. Journal of Ethnic and
Migration Studies, 34(4), 585–605.
Heard, D., Dent, G., Schiffeling, T., & Banks, D. (2015). Agent-based models and microsimula-
tion. Annual Review of Statistics and Its Application, 2, 259–272.
Hébert, G. A., Perez, L., & Harati, S. (2018). An agent-based model to identify migration pathways
of refugees: The case of Syria. In L. Perez, E.-K. Kim, & R. Sengupta (Eds.), Agent-based
models and complexity science in the age of geospatial big data (pp. 45–58). Springer.
Hedström, P. (2005). Dissecting the social: On the principles of analytical sociology. Springer.
Hedström, P., & Swedberg, R. (Eds.). (1998). Social mechanisms. An analytical approach to social
theory. Cambridge University Press.
Hedström, P., & Udehn, L. (2011). Analytical sociology and theories of the middle range. In
P. Hedström & P. Bearman (Eds.). The Oxford handbook of analytical sociology (online).
https://doi.org/10.1093/oxfordhb/9780199215362.013.2
Hedström, P., & Ylikoski, P. (2010). Causal mechanisms in the social sciences. Annual Review of
Sociology, 36, 49–67.
Heiland, F. (2003). The collapse of the Berlin Wall: Simulating state-level east to west German
migration patterns. In F. C. Billari & A. Prskawetz (Eds.), Agent-based computational
demography. Using simulation to improve our understanding of demographic behaviour
(pp. 73–96). Kluwer.
Heller, C., & Pezzani, L. (2016). Death by rescue: The lethal effects of the EU’s policies of non-
assistance at sea. Goldsmiths University of London.
Hempel, C. G. (1962). Deductive-nomological vs. statistical explanation. In Scientific explanation,
space, and time. Minnesota studies in the philosophy of science (Vol. 3, pp. 98–169). University
of Minnesota Press.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature,
466(7302), 29.
Henzinger, T., Jobstmann, B., & Wolf, V. (2011). Formalisms for specifying Markovian population
models. International Journal of Foundations of Computer Science, 22(4), 823–841.
Herzog, T. N., Scheuren, F. J., & Winkler, W. E. (2007). Data quality and record linkage tech-
niques. Springer.
Higdon, D., Gattiker, J., Williams, B., & Rightley, M. (2008). Computer model calibration using
high-dimensional output. Journal of the American Statistical Association, 103(482), 570–583.
Higham, P. A., Zawadzka, K., & Hanczakowski, M. (2015). Internal mapping and its impact
on measures of absolute and relative metacognitive accuracy. In The Oxford handbook of
metamemory. Oxford University Press.
Highhouse, S. (2007). Designing experiments that generalize. Organizational Research Methods,
12(3), 554–566.
Hilton, J. (2017). Managing uncertainty in agent-based demographic models. PhD Thesis,
University of Southampton.
Hilton, J., & Bijak, J. (2016). Design and analysis of demographic simulations. In J. Van Bavel &
A. Grow (Eds.), Agent-based modelling in population studies: Concepts, methods, and appli-
cations (pp. 211–235). Springer.
Himmelspach, J., & Uhrmacher, A. M. (2009). What contributes to the quality of simulation
results? In: 2009 INFORMS Simulation Society research workshop (pp. 125–129). Available
via http://eprints.mosi.informatik.uni-rostock.de/346/ (as of 1 February 2021).
Hinsch, M., & Bijak, J. (2019). Rumours lead to self-organized migration routes. Paper for the
Agent-based Modelling Hub, Artificial Life conference 2019, Newcastle. Available via www.
baps-project.eu (as of 1 August 2021).
Hobcraft, J. (2007). Towards a scientific understanding of demographic behaviour. Population –
English Edition, 62(1), 47–51.
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging:
A tutorial. Statistical Science, 14(4), 382–417.
Holland, J. H. (2012). Signals and boundaries. MIT Press.
Hooten, M. B., Johnson, D. S., & Brost, B. M. (2021). Making recursive Bayesian inference acces-
sible. American Statistician, 75(2), 185–194.
Hooten, M., Wikle, C., & Schwob, M. (2020). Statistical implementations of agent-based demo-
graphic models. International Statistical Review, 88(2), 441–461.
Hopcroft, J. E., & Ullman, J. D. (1979). Introduction to automata theory, languages, and computa-
tion. Addison-Wesley.
Hovland, C., & Weiss, W. (1951). The influence of source credibility on communication effective-
ness. The Public Opinion Quarterly, 15, 635–650.
Hughes, C., Zagheni, E., Abel, G. J., Wiśniowski, A., Sorichetta, A., Weber, I., & Tatem, A. J. (2016).
Inferring migrations: Traditional methods and new approaches based on mobile phone, social
media, and other big data. Feasibility study on inferring (labour) mobility and migration in the
European Union from big data and social media data (Report for the European Commission).
Publications Office of the EU.
Hugo, G., Abbasi-Shavazi, M. J., & Kraly, E. P. (Eds.). (2018). Demography of refugee and forced
migration (International studies in population, Vol. 13). Springer.
IOM. (2021). Missing migrants: Mediterranean. IOM GMDAC. Accessible via: https://missingmigrants.iom.int/region/mediterranean? (as of 9 February 2021)
Isernia, P., Urso, O., Gyuzalyan, H., & Wilczyńska, A. (2018). A review of empirical surveys of
asylum-related migrants. Report, European Asylum Support Office.
Jacobs, S. (1991). John Stuart Mill on induction and hypotheses. Journal of the History of
Philosophy, 29(1), 69–83.
Jaeger, D. A., Dohmen, T., Falk, A., Huffman, D., Sunde, U., & Bonin, H. (2010). Direct evidence
on risk attitudes and migration. The Review of Economics and Statistics, 92(3), 684–689.
Jager, W. (2017). Enhancing the realism of simulation (EROS): On implementing and developing
psychological theory in social simulation. Journal of Artificial Societies and Social Simulation,
20(3), 14.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable
research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
Johnson, B. R. (2010). Eliminating the mystery from the concept of emergence. Biology and
Philosophy, 25(5), 843–849.
Jones, B., & Nachtsheim, C. J. (2011). A class of three-level designs for definitive screening in the
presence of second-order effects. Journal of Quality Technology, 43(1), 1–15.
Jones, B., & Nachtsheim, C. J. (2013). Definitive screening designs with added two-level categori-
cal factors. Journal of Quality Technology, 45(2), 121–129.
Jones, C. W., Keil, L. G., Holland, W. C., Caughey, M. C., & Platts-Mills, T. F. (2015). Comparison
of registered and published outcomes in randomized controlled trials: A systematic review.
BMC Medicine, 13(1), 282.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47(2), 263–291.
Kamiński, B. (2015). Interval metamodels for the analysis of simulation input-output relations.
Simulation Modelling Practice and Theory, 54, 86–100.
Kashyap, R., & Villavicencio, F. (2016). The dynamics of son preference, technology diffusion, and
fertility decline underlying distorted sex ratios at birth: A simulation approach. Demography,
53(5), 1261–1281.
Kemel, E., & Paraschiv, C. (2018). Deciding about human lives: An experimental measure of risk
attitudes under prospect theory. Social Choice and Welfare, 51, 163–192.
Kennedy, M. C., & O’Hagan, A. (2001). Bayesian calibration of computer models. Journal of the
Royal Statistical Society B, 63(3), 425–464.
Kennedy, M. C., & Petropoulos, G. P. (2016). GEM-SA: The Gaussian emulation machine for
sensitivity analysis. In G. P. Petropoulos & P. K. Srivastava (Eds.), Sensitivity analysis in earth
observation modelling (pp. 341–361). Elsevier.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2(3), 196–217.
Kersting, K., Plagemann, C., Pfaff, P., & Burgard, W. (2007). Most likely heteroscedastic Gaussian
process regression. In Z. Ghahramani (Ed.), Proceedings of the 24th International Conference
on Machine Learning, Corvallis, OR, 2007. Association for Computing Machinery.
Keyfitz, N. (1971). Models. Demography, 8(4), 571–580.
Keyfitz, N. (1972). On future population. Journal of the American Statistical Association, 67(338),
347–363.
Keyfitz, N. (1981). The limits of population forecasting. Population and Development Review,
7(4), 579–593.
Kim, J. K., & Shao, J. (2014). Statistical methods for handling incomplete data. CRC Press/
Chapman & Hall.
King, R. (2002). Towards a new map of European migration. International Journal of Population
Geography, 8(2), 89–106.
Kingsley, P. (2016). The new Odyssey: The story of Europe’s refugee crisis. Faber & Faber.
Kirk, P. D. W., Babtie, A. C., & Stumpf, M. P. H. (2015). Systems biology (un)certainties. Science,
350, 386–388.
Klabunde, A. (2011). What explains observed patterns of circular migration? An agent-based
model. In 17th International Conference on Computing in Economics and Finance (pp. 1–26).
Klabunde, A. (2014). Computational economic modeling of migration. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.2470525 (as of 1 February 2021).
Klabunde, A., & Willekens, F. (2016). Decision making in agent-based models of migration: State
of the art and challenges. European Journal of Population, 32(1), 73–97.
Klabunde, A., Zinn, S., Leuchter, M., & Willekens, F. (2015). An agent-based decision model
of migration, embedded in the life course: Description in ODD+D format (MPIDR working
paper WP 2015-002). Max Planck Institute for Demographic Research.
Klabunde, A., Zinn, S., Willekens, F., & Leuchter, M. (2017). Multistate modelling extended by
behavioural rules: An application to migration. Population Studies, 71(Supp), 51–67.
Kleijnen, J. P. C. (1995). Verification and validation of simulation models. European Journal of
Operational Research, 82(1), 145–162.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., Bocian,
K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W.,
Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., … Nosek,
B. A. (2014). Investigating variation in replicability. Social Psychology, 45(3), 142–152.
Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., Aveyard, M.,
Axt, J. R., Babalola, M. T., Bahník, Š., Batra, R., Berkics, M., Bernstein, M. J., Berry, D. R.,
Bialobrzeska, O., Binan, E. D., Bocian, K., Brandt, M. J., Busching, R., … Nosek, B. A. (2018).
Many labs 2: Investigating variation in replicability across samples and settings. Advances in
Methods and Practices in Psychological Science, 1(4), 443–490.
Kniveton, D., Smith, C., & Wood, S. (2011). Agent-based model simulations of future changes in
migration flows for Burkina Faso. Global Environmental Change, 21, S34–S40.
Kok, L. D. (2016). Forecasting violence induced human mobility flows: Introducing fear to the
decision model. Steps towards establishing a conceptual framework of violence induced
human mobility (Report for Intergovernmental Consultations on Migration, Asylum and
Refugees). IGC.
Köster, T., Warnke, T., & Uhrmacher, A. M. (2020). Partial evaluation via code generation for static
stochastic reaction network models. In Proceedings of the 2020 ACM SIGSIM conference on
principles of advanced discrete simulation, Association for Computing Machinery, Miami, FL,
USA, SIGSIM-PADS ’20 (pp. 159–170).
Kovera, M. B. (2010). Confounding. In N. J. Salkind (Ed.), Encyclopedia of research design. Sage.
Kozlov, M. D., & Johansen, M. K. (2010). Real behavior in virtual environments: Psychology
experiments in a simple virtual-reality paradigm using video games. Cyberpsychology,
Behavior and Social Networking, 13(6), 711–714.
Kritz, M., Lim, L. L., & Zlotnik, H. (Eds.). (1992). International migration systems: A global
approach. Clarendon Press.
Kulu, H., & Milewski, N. (2007). Family change and migration in the life course: An introduction.
Demographic Research, 17(19), 567–590.
Lattner, C., & Adve, V. (2004). LLVM: A compilation framework for lifelong program analysis &
transformation. In Proceedings of the international symposium on code generation and optimi-
zation: Feedback-directed and runtime optimization. CGO ’04. IEEE.
Law, A. (2006). Simulation modeling and analysis (4th ed.). McGraw-Hill.
Lazega, E., & Snijders, T. A. B. (Eds.). (2016). Multilevel network analysis for the social sciences.
Theory, methods and applications. Springer.
Lee, E. S. (1966). A theory of migration. Demography, 3(1), 47–57.
Lee, M. D., Criss, A. H., Devezer, B., Donkin, C., Etz, A., Leite, F. P., Matzke, D., Rouder, J. N.,
Trueblood, J. S., White, C. N., & Vandekerckhove, J. (2019). Robust modeling in cognitive
science. Computational Brain & Behavior, 2(3), 141–153.
Lerner, J. S., Li, Y., Valdesolo, P., & Kassam, K. S. (2015). Emotion and decision making. Annual
Review of Psychology, 66(1), 799–823.
Leurs, K., & Smets, K. (2018). Five questions for digital migration studies: Learning from
digital connectivity and forced migration in(to) Europe. Social Media & Society, 4(1),
205630511876442.
Lieberoth, A. (2014). Shallow gamification: Testing psychological effects of framing an activity as
a game. Games and Culture, 10(3), 229–248.
Liepe, J., Filippi, S., Komorowski, M., & Stumpf, M. P. H. (2013). Maximizing the information
content of experiments in systems biology. PLoS Computational Biology, 9, e1002888.
Lin, L., Carley, K. M., & Cheng, S.-F. (2016). An agent-based approach to human migration move-
ment. In Proceedings of the Winter Simulation Conference 2016 (pp. 3510–3520). IEEE.
Lipton, P. (1991/2004). Inference to the best explanation (1st/2nd ed.). Routledge.
Little, R. J. A., & Rubin, D. B. (2020). Statistical analysis with missing data (3rd ed.). Wiley.
Lomnicki, A. (1999). Individual-based models and the individual-based approach to population
ecology. Ecological Modelling, 115(2–3), 191–198.
Lorenz, T. (2009). Abductive fallacies with agent-based modelling and system dynamics. In
F. Squazzoni (Ed.), Epistemological aspects of computer simulation in the social sciences
(Lecture Notes in Artificial Intelligence, 5466) (pp. 141–152). Springer.
Lovreglio, R., Ronchi, E., & Nilsson, D. (2016). An evacuation decision model based on perceived
risk, social influence and behavioural uncertainty. Simulation Modelling Practice and Theory,
66, 226–242.
Lucas, R. E., Jr. (1976). Econometric policy evaluation: A critique. Carnegie-Rochester Conference
Series on Public Policy, 1, 19–46.
Lutz, W. (2012). Demographic metabolism: A predictive theory of socioeconomic change.
Population and Development Review, 38(Suppl), 283–301.
Lynch, S. M. (2007). Introduction to applied Bayesian statistics and estimation for social scien-
tists. Springer.
Mabogunje, A. L. (1970). Systems approach to a theory of rural-urban migration. Geographical
Analysis, 2(1), 1–18.
MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4(3), 415–447.
Macmillan, N. A., & Creelman, C. D. (2004). Detection theory: A user’s guide (2nd ed.). Erlbaum.
Maddux, J. E., & Rogers, R. W. (1980). Effects of source expertness, physical attractiveness, and
supporting arguments on persuasion: A case of brains over beauty. Journal of Personality and
Social Psychology, 39, 235–244.
Marin, J. M., Pudlo, P., Robert, C. P., & Ryder, R. J. (2012). Approximate Bayesian computational
methods. Statistics and Computing, 22(6), 1167–1180.
Masad, D., & Kazil, J. L. (2015). Mesa: An agent-based modeling framework. In K. Huff &
J. Bergstra (Eds.), Proceedings of the 14th Python in science conference (pp. 51–58).
Massey, D. S. (2002). A synthetic theory of international migration. In V. Iontsev (Ed.), World in
the mirror of international migration (pp. 142–152). MAX Press.
Massey, D. S., & Zenteno, R. M. (1999). The dynamics of mass migration. Proceedings of the
National Academy of Sciences of the United States of America, 96(9), 5328–5335.
Massey, D. S., Arango, J., Hugo, G., Kouaouci, A., Pellegrino, A., & Taylor, J. E. (1993). Theories
of international migration: Review and appraisal. Population and Development Review, 19(3),
431–466.
Mauboussin, A., & Mauboussin, M. J. (2018). If you say something is “likely,” how
likely do people think it is? Harvard Business Review, July 3. https://hbr.org/2018/07/
if-you-say-something-is-likely-how-likely-do-people-think-it-is
McAlpine, A., Kiss, L., Zimmerman, C., & Chalabi, Z. (2021). Agent-based modeling for migra-
tion and modern slavery research: A systematic review. Journal of Computational Social
Science, 4, 243–332.
McAuliffe, M., & Koser, K. (2017). A long way to go. Irregular migration patterns, processes,
drivers and decision making. ANU Press.
McGinnies, E., & Ward, C. D. (1980). Better liked than right: Trustworthiness and expertise as
factors in credibility. Personality and Social Psychology Bulletin, 6, 467–472.
McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for select-
ing values of input variables in the analysis of output from a computer code. Technometrics,
21(2), 239–245.
Merton, R. K. (1949). Social theory and social structure. The Free Press.
Miłkowski, M., Hensel, W. M., & Hohol, M. (2018). Replicability or reproducibility? On the
replication crisis in computational neuroscience and sharing only relevant detail. Journal of
Computational Neuroscience, 45(3), 163–172.
Mintz, A., Redd, S. B., & Vedlitz, A. (2006). Can we generalize from student experiments to the
real world in political science, military affairs, and international relations? Journal of Conflict
Resolution, 50(5), 757–776.
Mironova, V., Mrie, L., & Whitt, S. (2019). Risk tolerance during conflict: Evidence from Aleppo,
Syria. Journal of Peace Research, 56(6), 767–782.
Mol, J. M. (2019). Goggles in the lab: Economic experiments in immersive virtual environments.
Journal of Behavioral and Experimental Economics, 79, 155–164.
Morgan, S. P., & Lynch, S. M. (2001). Success and future of demography. The role of data and
methods. Annals of the New York Academy of Sciences, 954, 35–51.
Moussaïd, M., Kapadia, M., Thrash, T., Sumner, R. W., Gross, M., Helbing, D., & Hölscher,
C. (2016). Crowd behaviour during high-stress evacuations in an immersive virtual environ-
ment. Journal of the Royal Society Interface, 13(122), 20160414.
MUCM. (2021). Managing uncertainty in complex models. Online resource, via https://mogp-
emulator.readthedocs.io/en/latest/methods/meta/MetaHomePage.html (as of 1 March 2021).
Müller, B., Bohn, F., Dreßler, G., et al. (2013). Describing human decisions in agent-based mod-
els – ODD + D, an extension of the ODD protocol. Environmental Modelling & Software,
48, 37–48.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert,
N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto
for reproducible science. Nature Human Behaviour, 1(1), 0021.
Naivinit, W., Le Page, C., Trébuil, G., & Gajaseni, N. (2010). Participatory agent-based modeling
and simulation of rice production and labor migrations in Northeast Thailand. Environmental
Modelling & Software, 25(11), 1345–1358.
Napierała, J., Hilton, J., Forster, J. J., Carammia, M., & Bijak, J. (2021). Toward an early warn-
ing system for monitoring asylum-related migration flows in Europe. International Migration
Review, forthcoming.
Naqvi, A. A., & Rehm, M. (2014). A multi-agent model of a low income economy: Simulating the
distributional effects of natural disasters. Journal of Economic Interaction and Coordination,
9(2), 275–309.
National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replica-
bility in science. The National Academies Press.
Neal, R. M. (1996). Bayesian learning for neural networks. Springer.
Nelson, L. D., Simmons, J., & Simonsohn, U. (2018). Psychology’s renaissance. Annual Review
of Psychology, 69(1), 511–534.
Noble, J., Silverman, E., Bijak, J., et al. (2012). Linked lives: The utility of an agent-based approach
to modeling partnership and household formation in the context of social care. In Proceedings
of the Winter Simulation Conference 2012. IEEE.
North, M. J., Collier, N. T., Ozik, J., Tatara, E. R., Macal, C. M., Bragen, M., & Sydelko, P. (2013).
Complex adaptive systems modeling with Repast Simphony. Complex Adaptive Systems
Modeling, 1(1), 3.
Nosek, B. A., & Errington, T. M. (2017). Reproducibility in cancer biology: Making sense of
replications. eLife, 6, e23383.
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of
published results. Social Psychology, 45(3), 137–141.
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, S.,
Chambers, C. D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J.,
Glennerster, R., Goroff, D., Green, D. P., Hesse, B., Humphreys, M., … Yarkoni, T. (2015).
Promoting an open research culture. Science, 348(6242), 1422.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration rev-
olution. Proceedings of the National Academy of Sciences of the United States of America,
115(11), 2600–2606.
Nowak, A., Rychwalska, A., & Borkowski, W. (2013). Why simulate? To develop a mental model.
Journal of Artificial Societies and Social Simulation, 16(3), 12.
NRC [National Research Council]. (2000). Beyond six billion. Forecasting the world’s population.
National Academies Press.
Nubiola, J. (2005). Abduction or the logic of surprise. Semiotica, 153(1/4), 117–130.
O’Hagan, A. (2013). Polynomial chaos: A tutorial and critique from a statistician’s perspective.
Mimeo, University of Sheffield. Via http://tonyohagan.co.uk/academic/pdf/Polynomial-chaos.
pdf (as of 1 November 2019)
Oakley, J., & O’Hagan, A. (2002). Bayesian inference for the uncertainty distribution of computer
model outputs. Biometrika, 89, 769–784.
Oakley, J. E., & O’Hagan, A. (2004). Probabilistic sensitivity analysis of complex models: A
Bayesian approach. Journal of the Royal Statistical Society B, 66(3), 751–769.
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of open data and
computational reproducibility in registered reports in psychology. Advances in Methods and
Practices in Psychological Science, 3(2), 229–237.
Öberg, S. (1996). Spatial and economic factors in future South-North migration. In W. Lutz (Ed.),
The future population of the world: What can we assume today? (pp. 336–357). Earthscan.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science.
Science, 349(6251), aac4716.
Ozik, J., Wilde, M., Collier, N., & Macal, C. M. (2014). Adaptive simulation with Repast Simphony
and Swift. In Lecture Notes in Computer Science. Springer.
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1), 100–115.
Pascal, B. (1654). Traité du triangle arithmétique, avec quelques autres traités sur le même sujet.
Guillaume Desprez.
Pascal, B. (1670). Pensées. Editions de Port Royal.
Peck, S. (2012). Agent-based models as fictive instantiations of ecological processes. Philosophy,
Theory and Practice in Biology, 4(3), 1–2.
Peirce, C. S. (1878/2014). Deduction, induction and hypothesis. In C. De Waal (Ed.), Illustrations
of the logic of science (pp. 167–184) [original text from Popular Science Monthly, 13,
470–482, ibid].
Peng, D., Warnke, T., & Uhrmacher, A. M. (2015). Domain-specific languages for flexibly experi-
menting with stochastic models. Simulation Notes Europe, 25(2), 117–122.
Petty, R. E., Cacioppo, J. T., & Goldman, R. (1981). Personal involvement as a determinant of
argument-based persuasion. Journal of Personality and Social Psychology, 41, 847–855.
Pilditch, T. D., Madsen, J. K., & Custers, R. (2020). False prophets and Cassandra’s curse: The role
of credibility in belief updating. Acta Psychologica, 202, 102956.
Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ,
1, e175.
Poile, C., & Safayeni, F. (2016). Using computational modeling for building theory: A double-
edged sword. Journal of Artificial Societies and Social Simulation, 19(3), 8.
Polhill, J. G., Sutherland, L.-A., & Gotts, N. M. (2010). Using qualitative evidence to enhance an
agent-based modelling system for studying land use change. Journal of Artificial Societies and
Social Simulation, 13(2), 10.
Polit, D. F., & Beck, C. T. (2010). Generalization in quantitative and qualitative research: Myths
and strategies. International Journal of Nursing Studies, 47(11), 1451–1458.
Poole, D., & Raftery, A. E. (2000). Inference for deterministic simulation models: The Bayesian
melding approach. Journal of the American Statistical Association, 95(452), 1244–1255.
Pope, A. J., & Gimblett, R. (2015). Linking Bayesian and agent-based models to simulate complex
social-ecological systems in semi-arid regions. Frontiers in Environmental Science, 3, art. 55.
Popper, K. R. (1935). Logik der Forschung. Julius Springer Verlag, Wien [(1959) The logic of
scientific discovery. Hutchinson].
Popper, K. R. (1982). The open universe. An argument for indeterminism. Hutchinson.
Pornpitakpan, C. (2004). The persuasiveness of source credibility: A critical review of five decades’
evidence. Journal of Applied Social Psychology, 34, 243–281.
Poulain, M., Perrin, N., & Singleton, A. (Eds.). (2006). Towards harmonised European statistics
on international migration. Presses Universitaires de Louvain.
Preston, S. H., & Coale, A. J. (1982). Age structure, growth, attrition and accession: A new
synthesis. Population Index, 48(2), 217–259.
Przybylski, A. K., Rigby, C. S., & Ryan, R. M. (2010). A motivational model of video game
engagement. Review of General Psychology, 14(2), 154–166.
Rad, M. S., Martingano, A. J., & Ginges, J. (2018). Toward a psychology of Homo sapiens:
Making psychological science more representative of the human population. Proceedings of
the National Academy of Sciences, 115(45), 11401.
Raftery, A. E., Givens, G. H., & Zeh, J. E. (1995). Inference from a deterministic population
dynamics model for bowhead whales. Journal of the American Statistical Association, 90(430),
402–416.
Rahmandad, H., & Sterman, J. D. (2012). Reporting guidelines for simulation-based research in
social sciences. System Dynamics Review, 28(4), 396–411.
Railsback, S. F., Lytinen, S. L., & Jackson, S. K. (2006). Agent-based simulation platforms:
Review and development recommendations. Simulation, 82(9), 609–623.
Ranjan, P., & Spencer, N. (2014). Space-filling Latin hypercube designs based on randomization
restrictions in factorial experiments. Statistics & Probability Letters, 94, 239–247.
Rao, A. S., & Georgeff, M. P. (1991). Modeling rational agents within a BDI architecture. In
Proceedings of the second international conference on principles of knowledge representation
and reasoning, KR’91, Cambridge, MA (pp. 473–484). Morgan Kaufmann.
Ravenstein, E. G. (1885). The laws of migration. Journal of the Statistical Society of London,
48(2), 167–227.
Raymer, J., Wiśniowski, A., Forster, J. J., Smith, P. W. F., & Bijak, J. (2013). Integrated modeling
of European migration. Journal of the American Statistical Association, 108(503), 801–819.
Read, S. J., & Monroe, B. M. (2008). Computational models in personality and social psychol-
ogy. In R. Sun (Ed.), The Cambridge handbook of computational psychology (pp. 505–529).
Cambridge University Press.
Reichlová, N. (2005). Can the theory of motivation explain migration decisions? (Working papers
IES, 97). Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies.
Reinhardt, O., & Uhrmacher, A. M. (2017). An efficient simulation algorithm for continuous-time
agent-based linked lives models. In Proceedings of the 50th Annual Simulation Symposium,
International Society for Computer Simulation, San Diego, CA, USA, ANSS ’17 (pp. 9:1–9:12).
Reinhardt, O., Hilton, J., Warnke, T., Bijak, J., & Uhrmacher, A. (2018). Streamlining simulation
experiments with agent-based models in demography. Journal of Artificial Societies and Social
Simulation, 21(3), 9.
250 References
Reinhardt, O., Uhrmacher, A. M., Hinsch, M., & Bijak, J. (2019). Developing agent-based
migration models in pairs. In Proceedings of the Winter Simulation Conference 2019
(pp. 2713–2724). IEEE.
Reinhardt, O., Warnke, T., & Uhrmacher, A. M. (2021). A language for agent-based discrete-event
modeling and simulation of linked lives. ACM Transactions on Modeling and Computer
Simulation (under review).
Richiardi, M. (2017). The future of agent-based modelling. Eastern Economic Journal, 43(2),
271–287.
Rieger, M. O., Wang, M., & Hens, T. (2017). Estimating cumulative prospect theory parameters
from an international survey. Theory and Decision, 82(4), 567–596.
Rogers, A., & Castro, L. J. (1981). Model migration schedules (IIASA Report RR8130). IIASA.
Rogers, A., Little, J., & Raymer, J. (2010). The indirect estimation of migration: Methods for deal-
ing with irregular, inadequate, and missing data. Springer.
Romanowska, I. (2015). So you think you can model? A guide to building and evaluating archaeo-
logical simulation models of dispersals. Human Biology, 87(3), 169–192.
Rossetti, T., & Hurtubia, R. (2020). An assessment of the ecological validity of immersive videos
in stated preference surveys. Journal of Choice Modelling, 34, 100198.
Roustant, O., Ginsbourger, D., & Deville, Y. (2012). DiceKriging, DiceOptim: Two R packages
for the analysis of computer experiments by kriging-based metamodelling and optimisation.
Journal of Statistical Software, 51(1), 1–55.
Ruscheinski, A., & Uhrmacher, A. (2017). Provenance in modeling and simulation studies –
Bridging gaps. In Proceedings of the Winter Simulation Conference 2017 (pp. 872–883). IEEE.
Ruscheinski, A., Wilsdorf, P., Dombrowsky, M., & Uhrmacher, A. M. (2019). Capturing and
reporting provenance information of simulation studies based on an artifact-based workflow
approach. In Proceedings of the 2019 ACM SIGSIM conference on principles of advanced
discrete simulation (pp. 185–196). Association for Computing Machinery.
Ryan, R. M., Rigby, C. S., & Przybylski, A. (2006). The motivational pull of video games: A self-
determination theory approach. Motivation and Emotion, 30(4), 344–360.
Sailer, M., Hense, J. U., Mayr, S. K., & Mandl, H. (2017). How gamification motivates: An experi-
mental study of the effects of specific game design elements on psychological need satisfac-
tion. Computers in Human Behavior, 69, 371–380.
Salecker, J., Sciaini, M., Meyer, K. M., & Wiegand, K. (2019). The nlrx R package: A next-
generation framework for reproducible NetLogo model analyses. Methods in Ecology and
Evolution, 10(11), 1854–1863.
Saltelli, A., Tarantola, S., & Campolongo, F. (2000). Sensitivity analysis as an ingredient of model-
ing. Statistical Science, 15(4), 377–395.
Saltelli, A., Chan, K., & Scott, E. M. (2008). Sensitivity analysis. Wiley.
Sánchez-Querubín, N., & Rogers, R. (2018). Connected routes: Migration studies with digital
devices and platforms. Social Media & Society, 4(1), 1–13.
Santner, T. J., Williams, B. J., & Notz, W. I. (2003). The design and analysis of computer experi-
ments. Springer.
Sargent, R. G. (2013). Verification and validation of simulation models. Journal of Simulation,
7(1), 12–24.
Sawyer, R. K. (2004). Social explanation and computational simulation. Philosophical
Explorations, 7(3), 219–231.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2),
143–186.
Schelling, T. C. (1978). Micromotives and macrobehavior. Norton.
Schimmack, U. (2020). A meta-psychological perspective on the decade of replication failures in
social psychology. Canadian Psychology/Psychologie Canadienne, 61(4), 364–376.
Schloss, P. D. (2018). Identifying and overcoming threats to reproducibility, replicability, robust-
ness, and generalizability in microbiome research. MBio, 9(3), e00525-18.
Schmolke, A., Thorbek, P., DeAngelis, D. L., & Grimm, V. (2010). Ecological models support-
ing environmental decision making: A strategy for the future. Trends in Ecology & Evolution,
25(8), 479–486.
Schwarz, N. (2000). Emotion, cognition, and decision making. Cognition and Emotion, 14(4),
433–440.
Sechrist, G. B., & Milford-Szafran, L. R. (2011). “I depend on you, you depend on me. Shouldn’t
we agree?”: The influence of interdependent relationships on individuals’ racial attitudes.
Basic and Applied Social Psychology, 33, 145–156.
Sechrist, G. B., & Young, A. F. (2011). The influence of social consensus information on intergroup
attitudes: The moderating effects of ingroup identification. The Journal of Social Psychology,
151, 674–695.
Ševčíková, H., Raftery, A. E., & Waddell, P. A. (2007). Assessing uncertainty in urban simulations
using Bayesian melding. Transportation Research Part B, 41(6), 652–669.
Sharma, S. (2017). Definitions and models of statistical literacy: A literature review. Open Review
of Educational Research, 4(1), 118–133.
Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction:
Broadening perspectives from the replication crisis. Annual Review of Psychology, 69(1),
487–510.
Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., Bahník, Š., Bai,
F., Bannard, C., Bonnier, E., Carlsson, R., Cheung, F., Christensen, G., Clay, R., Craig, M. A.,
Dalla Rosa, A., Dam, L., Evans, M. H., Flores Cervantes, I., … Nosek, B. A. (2018). Many
analysts, one data set: Making transparent how variations in analytic choices affect results.
Advances in Methods and Practices in Psychological Science, 1(3), 337–356.
Silveira, J. J., Espíndola, A. L., & Penna, T. J. P. (2006). Agent-based model to rural-urban
migration analysis. Physica A: Statistical Mechanics and its Applications, 364, 445–456.
Silverman, E. (2018). Methodological investigations in agent-based modelling, with applications
for the social sciences (Methodos series, vol. 13). Springer.
Silverman, E., Bijak, J., Hilton, J., Cao, V. D., & Noble, J. (2013). When demography met social
simulation: A tale of two modelling approaches. Journal of Artificial Societies and Social
Simulation, 16(4), 9.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed
flexibility in data collection and analysis allows presenting anything as significant. Psychological
Science, 22(11), 1359–1366.
Simon, M. (2019). Path dependency and adaptation: The effects of policy on migration systems.
Journal of Artificial Societies and Social Simulation, 22(2), 2.
Simon, M., Schwartz, C., Hudson, D., & Johnson, S. D. (2016). Illegal migration as adaptation:
An agent based model of migration dynamics. In 2016 APSA Annual Meeting & Exhibition.
Simon, M., Schwartz, C., Hudson, D., & Johnson, S. D. (2018). A data-driven computational
model on the effects of immigration policies. Proceedings of the National Academy of Sciences,
115(34), E7914–E7923.
Simons, D. J., Holcombe, A. O., & Spellman, B. A. (2014). An introduction to registered replica-
tion reports at perspectives on psychological science. Perspectives on Psychological Science,
9(5), 552–555.
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on generality (COG): A proposed
addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123–1128.
Singleton, A. (2016). Migration and asylum data for policy-making in the European Union – The
problem with numbers (CEPS paper no. 89). Centre for European Policy Studies.
Sisson, S. A., Fan, Y., & Beaumont, M. (2018). Handbook of approximate Bayesian computation.
Chapman and Hall/CRC.
Sjaastad, L. A. (1962). The costs and returns of human migration. Journal of Political Economy,
70(5), 80–93.
Smajgl, A., & Bohensky, E. (2013). Behaviour and space in agent-based modelling: Poverty pat-
terns in East Kalimantan, Indonesia. Environmental Modelling and Software, 45, 8–14.
Smaldino, P. E. (2016). Models are stupid, and we need more of them. In R. Vallacher, S. Read, &
A. Nowak (Eds.), Computational models in social psychology (pp. 311–331). Psychology Press.
Smith, R. C. (2013). Uncertainty quantification: Theory, implementation, and applications. SIAM.
Smith, C. (2014). Modelling migration futures: Development and testing of the Rainfalls agent-
based migration model – Tanzania. Climate and Development, 6(1), 77–91.
Smith, C., Wood, S., & Kniveton, D. (2010). Agent based modelling of migration decision mak-
ing. In Proceedings of the European workshop on multi-agent systems (EUMAS-2010) (p. 15).
Sobol’, I. M. (2001). Global sensitivity indices for nonlinear mathematical models and their Monte
Carlo estimates. Mathematics and Computers in Simulation, 55(1–3), 271–280.
Sokolowski, J. A., Banks, C. M., & Hayes, R. L. (2014). Modeling population displacement
in the Syrian city of Aleppo. In Proceedings of the Winter Simulation Conference 2014
(pp. 252–263). IEEE.
Spiegelhalter, D. J., & Riesch, H. (2011). Don’t know, can’t know: Embracing deeper uncertain-
ties when analysing risks. Philosophical Transactions of the Royal Society A, 369(1956),
4730–4750.
Stagge, J. H., Rosenberg, D. E., Abdallah, A. M., Akbar, H., Attallah, N. A., & James, R. (2019).
Assessing data availability and research reproducibility in hydrology and water resources.
Scientific Data, 6(1), 190030.
Stan Development Team. (2021). Stan modeling language users guide and reference manual.
Retrieved from http://mc-stan.org/index.html (as of 1 February 2020).
Stark, O. (1991). The migration of labor. Basil Blackwell.
Stark, O., & Bloom, D. E. (1985). The new economics of labor migration. American Economic
Review, 75(2), 173–178.
Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from
tests of significance—Or vice versa. Journal of the American Statistical Association, 54, 30–34.
Stillwell, J., Bell, M., Ueffing, P., Daras, K., Charles-Edwards, E., Kupiszewski, M., &
Kupiszewska, D. (2016). Internal migration around the world: Comparing distance travelled
and its frictional effect. Environment and Planning A, 48(8), 1657–1675.
Stodden, V., Guo, P., & Ma, Z. (2013). Toward reproducible computational research: An empirical
analysis of data and code policy adoption by journals. PLoS One, 8(6), e67111.
Strevens, M. (2016). How idealizations provide understanding. In S. R. Grimm, C. Baumberger, &
S. Ammon (Eds.), Explaining understanding: New essays in epistemology and the philosophy
of science (pp. 37–49). Routledge.
Suhay, E. (2015). Explaining group influence: The role of identity and emotion in political confor-
mity and polarization. Political Behavior, 37, 221–251.
Suleimenova, D., & Groen, D. (2020). How policy decisions affect refugee journeys in South
Sudan: A study using automated ensemble simulations. Journal of Artificial Societies and
Social Simulation, 23(1), 17.
Suleimenova, D., Bell, D., & Groen, D. (2017). Towards an automated framework for agent-based
simulation of refugee movements. In Proceedings of the Winter Simulation Conference 2017
(pp. 1240–1251). IEEE.
Suriyakumaran, A., & Tamura, Y. (2016). Asylum provision: A review of economic theories.
International Migration, 54(4), 18–30.
Swets, J. A. (2014). Signal detection theory and ROC analysis in psychology and diagnostics:
Collected papers. Psychology Press.
Tabeau, E. (2009). Victims of the Khmer Rouge regime in Cambodia, April 1975 to January 1979:
A critical assessment of existing estimates and recommendations for court. Expert report,
Extraordinary Chambers of the Courts of Cambodia.
Tack, L., Goos, P., & Vandebroek, M. (2002). Efficient Bayesian designs under heteroscedasticity.
Journal of Statistical Planning and Inference, 104(2), 469–483.
Tanaka, T., Camerer, C. F., & Nguyen, Q. (2010). Risk and time preferences: Linking experimental
and household survey data from Vietnam. American Economic Review, 100, 557–571.
Tavaré, S., Balding, D. J., Griffiths, R. C., & Donnelly, P. (1997). Inferring coalescence times from
DNA sequence data. Genetics, 145(2), 505–518.
ten Broeke, G., van Voorn, G., & Ligtenberg, A. (2016). Which sensitivity analysis method should
I use for my agent-based model? Journal of Artificial Societies and Social Simulation, 19(1), 5.
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., Manoff, M., & Frame,
M. (2011). Data sharing by scientists: Practices and perceptions. PLoS One, 6(6), e21101.
Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The art and science of prediction.
Random House.
Thompson, E. L., & Smith, L. A. (2019). Escape from model-land. Economics: The Open-Access,
Open-Assessment E-Journal, 13(40), 1–17. https://www.econstor.eu/handle/10419/204779
Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of
Machine Learning Research, 1, 211–244.
Tobin, S. J., & Raymundo, M. M. (2009). Persuasion by causal arguments: The motivating role of
perceived causal expertise. Social Cognition, 27(1), 105–127.
Todaro, M. P. (1969). A model of labor migration and urban unemployment in less developed
countries. The American Economic Review, 59(1), 138–148.
Troitzsch, K. G. (2017). Using empirical data for designing, calibrating and validating simulation
models. In W. Jager, R. Verbrugge, A. Flache, G. de Roo, L. Hoogduin, & C. Hemelrijk (Eds.),
Advances in social simulation 2015 (pp. 413–427). Springer.
Tsvetkova, M., Wagner, C., & Mao, A. (2018). The emergence of inequality in social groups:
Network structure and institutions affect the distribution of earnings in cooperation games.
PLoS One, 13(7), e0200965. https://doi.org/10.1371/journal.pone.0200965
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science,
185(4157), 1124–1131.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of
uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323.
UN. (2016). New York declaration for refugees and migrants (Resolution adopted by the General
Assembly on 19 September 2016. A/RES/71/1). United Nations.
UNHCR. (1951/1967). Text of the 1951 convention relating to the status of refugees; text of the
1967 protocol relating to the status of refugees; resolution 2198 (XXI) adopted by the United
Nations General Assembly with an introductory note by the Office of the United Nations High
Commissioner for Refugees. UNHCR.
UNHCR. (2021). UNHCR refugee statistics. Retrieved from https://www.unhcr.org/refugee-statistics/ (as of
1 February 2021).
Van Bavel, J., & Grow, A. (Eds.). (2016). Agent-based modelling in population studies: Concepts,
methods, and applications. Springer.
Van der Vaart, E., Beaumont, M. A., Johnston, A. S. A., & Sibly, R. M. (2015). Calibration and
evaluation of individual-based models using approximate Bayesian computation. Ecological
Modelling, 312, 182–190.
Van Deursen, A., Klint, P., & Visser, J. (2000). Domain-specific languages: An annotated bibliog-
raphy. Sigplan Notices, 35(6), 26–36.
Van Hear, N., Bakewell, O., & Long, K. (2018). Push-pull plus: Reconsidering the drivers of
migration. Journal of Ethnic and Migration Studies, 44(6), 927–944.
Vernon, I., Goldstein, M., & Bower, R. G. (2010). Galaxy formation: A Bayesian uncertainty
analysis. Bayesian Analysis, 5(4), 619–669.
Vogel, D., & Kovacheva, V. (2008). Classification report: Quality assessment of estimates on stocks
of irregular migrants (Report of the Clandestino project). Hamburg Institute of International
Economics.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012).
An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6),
632–638.
Wagenmakers, E.-J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love, J., Selker, R., Gronau,
Q. F., Šmíra, M., Epskamp, S., Matzke, D., Rouder, J. N., & Morey, R. D. (2018). Bayesian infer-
ence for psychology. Part I: Theoretical advantages and practical ramifications. Psychonomic
Bulletin & Review, 25(1), 35–57.
Wakker, P. P. (2010). Prospect theory: For risk and ambiguity. Cambridge University Press.
Wakker, P., & Deneffe, D. (1996). Eliciting von Neumann-Morgenstern utilities when probabilities
are distorted or unknown. Management Science, 42(8), 1131–1150.
Wall, M., Otis Campbell, M., & Janbek, D. (2017). Syrian refugees and information precarity. New
Media & Society, 19(2), 240–254.
Waltemath, D., Adams, R., Bergmann, F. T., Hucka, M., Kolpakov, F., Miller, A. K., Moraru, I. I.,
Nickerson, D., Sahle, S., Snoep, J. L., & Le Novère, N. (2011). Reproducible computational
biology experiments with SED-ML – The simulation experiment description markup language.
BMC Systems Biology, 5(1), 198.
Wang, S., Verpillat, P., Rassen, J., Patrick, A., Garry, E., & Bartels, D. (2016). Transparency and
reproducibility of observational cohort studies using large healthcare databases. Clinical
Pharmacology & Therapeutics, 99(3), 325–332.
Warnke, T., Klabunde, A., Steiniger, A., Willekens, F., & Uhrmacher, A. M. (2016). ML3: A lan-
guage for compact modeling of linked lives in computational demography. In Proceedings of
the Winter Simulation Conference 2015. IEEE.
Warnke, T., Reinhardt, O., Klabunde, A., Willekens, F., & Uhrmacher, A. M. (2017). Modelling
and simulating decision processes of linked lives: An approach based on concurrent processes
and stochastic race. Population Studies, 71(Supp), 69–83.
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”.
The American Statistician, 73(Sup1), 1–19.
Weintraub, E. R. (1977). The microfoundations of macroeconomics: A critical survey. Journal of
Economic Literature, 15(1), 1–23.
Weisberg, M. (2007). Three kinds of idealization. Journal of Philosophy, 104(12), 639–659.
Werth, B., & Moss, S. (2007). Modelling migration in the Sahel: An alternative to cost-benefit
analysis. In S. Takahashi, D. Sallach, & J. Rouchier (Eds.), Advancing social simulation: The
first world congress (pp. 331–342). Springer Japan.
Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related
to the strength of the evidence and the quality of reporting of statistical results. PLoS One,
6(11), e26828.
Wilensky, U. (1999). NetLogo. https://ccl.northwestern.edu/netlogo (as of 1 February 2021).
Wilensky, U. (2018). NetLogo 6.0.3 user manual: BehaviorSpace guide. Available from:
https://ccl.northwestern.edu/netlogo/docs/behaviorspace.html (as of 1 February 2021).
Willekens, F. (1990). Demographic forecasting: State-of-the-art and research needs. In C. A. Hazeu
& G. A. B. Frinking (Eds.), Emerging issues in demographic research (pp. 9–66). Elsevier.
Willekens, F. (1994). Monitoring international migration flows in Europe: Towards a statistical
data base combining data from different sources. European Journal of Population, 10(1), 1–42.
Willekens, F. (2009). Continuous-time microsimulation in longitudinal analysis. In A. Zaidi,
A. Harding, & P. Williamson (Eds.), New frontiers in microsimulation modelling (pp. 413–436).
Ashgate.
Willekens, F. (2018). Towards causal forecasting of international migration. Vienna Yearbook of
Population Research, 16, 1–20.
Willekens, F., Massey, D., Raymer, J., & Beauchemin, C. (2016). International migration under the
microscope. Science, 352(6288), 897–899.
Willekens, F., Bijak, J., Klabunde, A., & Prskawetz, A. (Eds.). (2017). The science of choice: An
introduction. Population Studies, 71(Supp), 1–13.
Williams, A. D., & Baláž, V. (2011). Migration, risk, and uncertainty: Theoretical perspectives.
Population, Space and Place, 18(2), 167–180.
Williams, A. M., & Baláž, V. (2014). Mobility, risk tolerance and competence to manage risks.
Journal of Risk Research, 17(8), 1061–1088.
Williamson, D., & Goldstein, M. (2015). Posterior belief assessment: Extracting meaningful sub-
jective judgements from Bayesian analyses with complex statistical models. Bayesian Analysis,
10(4), 877–908.
Wilsdorf, P., Dombrowsky, M., Uhrmacher, A. M., Zimmermann, J., & van Rienen, U. (2019).
Simulation experiment schemas – Beyond tools and simulation approaches. In Proceedings of
the 2019 Winter Simulation Conference. IEEE.
Wilsdorf, P., Haack, F., Budde, K., Ruscheinski, A., & Uhrmacher, A. M. (2020). Conducting
systematic, partly automated simulation studies – Unde venis et quo vadis. In 17th International
Conference of Numerical Analysis and Applied Mathematics, 020001 (AIP Conference Pro-
ceedings 2293(1)). AIP Publishing LLC.
Wintle, B. C., Fraser, H., Wills, B. C., Nicholson, A. E., & Fidler, F. (2019). Verbal probabilities:
Very likely to be somewhat more confusing than numbers. PLoS One, 14, e0213522.
Wipf, D. P., & Nagarajan, S. S. (2008). A new view of automatic relevance determination. In
J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in neural information process-
ing systems 20 (pp. 1625–1632). Curran Associates.
Xie, Y. (2000). Demography: Past, present and future. Journal of the American Statistical
Association, 95(450), 670–673.
Xie, Q., Lu, S., Cóstola, D., & Hensen, J. L. M. (2014). An arbitrary polynomial chaos-based
approach to analyzing the impacts of design parameters on evacuation time under uncertainty.
In D. Nilsson, P. van Hees, & R. Jansson (Eds.), Fire safety science–proceedings of the eleventh
international symposium (pp. 1077–1090). International Association for Fire Safety Science.
Yang, L., & Guo, Y. (2019). Combining pre- and post-model information in the uncertainty
quantification of non-deterministic models using an extended Bayesian melding approach.
Information Sciences, 502, 146–163.
Zaidi, A., Harding, A., & Williamson, P. (Eds.). (2009). New frontiers in microsimulation
modelling. Routledge.
Zawadzka, K., & Higham, P. A. (2015). Judgments of learning index relative confidence, not sub-
jective probability. Memory & Cognition, 43(8), 1168–1179.
Zeigler, B., & Sarjoughian, H. S. (2017). Guide to modeling and simulation of systems of systems
(Simulation foundations, methods and applications) (2nd ed.). Springer.
Zeigler, B. P., Muzy, A., & Kofman, E. (2019). Theory of modeling and simulation (3rd ed.).
Academic.
Zelinsky, W. (1971). The hypothesis of the mobility transition. Geographical Review, 61(2),
219–249.
Zinn, S. (2012). A mate-matching algorithm for continuous-time microsimulation models.
International Journal of Microsimulation, 5(1), 31–51.
Zolberg, A. R. (1989). The next waves: Migration theory for a changing world. International
Migration Review, 23, 403–430.
Index
A
Abduction, 23–25, 27
Agency, 4, 14, 15, 37, 38, 115, 163, 191
Agent, 4, 15, 34, 60, 71, 93, 113, 137, 156, 182, 187, passim
Agent-based model
  documentation of, 130 (see also Overview, Design concepts, Details (ODD); Provenance)
  examples of, 8
Amazon Mechanical Turk, 220, 221
Analysis of variance (ANOVA), 78, 85, 213, 215
Approximate Bayesian Computation (ABC), 22, 88, 148, 149, 169, 171, 224, 226
Asylum
  migration, 4, 7, 8, 13, 15–17, 37, 51–56, 60–65, 68, 93, 95, 109, 111, 138, 145, 153, 155, 156, 164, 181
  policies, 56, 152
  recognition rates, 56
  seekers, 16, 19, 54–56, 96, 100–102, 111, 145, 146, 152, 159, 164, 199, 205–207, 210, 211
Attitude, 26, 59, 103, 107, 139–142, 169, 205, 220

B
Balkan route, 38
Bayesian
  demography, 72
  estimation, 89
  learning, 77
  measures, 180
  melding, 22, 89
  methods, 6, 8, 72, 87–91
  modelling, 90, 185–191
  probability interpretation, 65
  recursive approach (see Recursive Bayesian Approach)
  uncertainty quantification (see Uncertainty, quantification (UQ))
Bayes linear methods, 88
Behaviour
  individual, 21, 33, 93, 157
  micro-level (see Behaviour, individual)
  risky, 141
Behavioural economics, 15, 93, 94, 96
Belief
  differences in, 45
  update of, 44
Bias, 26, 61–63, 65–67, 96, 108, 145, 148, 159, 179, 180, 188, 189, 224

C
Calibration
  of computational models, 88
  of probability distributions, 87, 88, 90, 91, 150
Causality, 21, 34
Chance, 36, 97, 137, 142, 167, 168, 190
Cognition, 93–112, 176
Cognitive psychology, 5, 8, 21
Communication, 57, 69, 78, 83, 86, 89, 99, 100, 103, 138, 142, 143, 166–168, 188, 190, 191, 193, 196, 219

J
Julia, 46, 120, 139, 151, 160, 213
  See also Programming languages

K
Knowledge, 4, 5, 8, 11, 14, 15, 18, 22, 23, 25, 27, 28, 33, 38, 45, 47–49, 51–72, 115, 119, 120, 138, 140, 141, 154–160, 162, 163, 166, 168, 171, 178, 185–189, 194, 195, 198, 226

L
Laplace’s demon, 186
  See also Uncertainty, epistemic
Latin Hypercube Sample
  space-filling, 76
Links, 14, 34, 52, 71, 104, 115, 140, 193, passim
Loss, 76, 77, 86, 89, 96–98, 156, 167, 174, 186
  See also Utility
Lucas critique, 20

M
Map, see Topology
Mechanistic (theory), see Functional-mechanistic approach
Mediterranean, The
  Central route, 146
  Eastern route, 54, 138, 209
  Western route, 54, 138
Meta-cognition, see Cognition
Meta-model, see Emulator
Methodology, 7, 24, 25, 29, 37, 52, 56, 63, 72, 80, 93, 97, 107, 108, 131, 178, 179, 181, 187, 198, 200, 202, 205–207, 224
Micro-foundations, 5, 13, 19–22, 166, 186
Microsimulations, 6, 19, 21, 25, 88
Migrants
  asylum-seeking (see Asylum-seekers)
  journeys of, 15, 58, 59, 155, 169, 207
Migration
  asylum-related, 16, 17, 51, 198
  concepts, 8, 14
  data (see Data on migration)
  definitions, 3–4, 13–17, 20–22, 33–49, 51–58, 60–64, 68–70, 102–106, 155–158, 197–212, 222
  drivers, 4, 9, 102–106, 166, 220, 222, 224, 225
  environment, 14, 16, 21, 45, 46, 57, 59, 68, 103, 156, 162, 220
  explanations of
    causal, 19
    see also Explanation; Migration theory
  flows, 5, 7, 13, 15, 17, 18, 48, 51, 52, 56–58, 61, 68, 164, 166, 167, 169, 190, 198
  forced, 16, 17, 53, 208
  international, 4, 5, 7, 15, 21, 35, 42, 43, 203, 208, 212
  laws (see Migration theory)
  management
    efficiency of, 11
  networks, 14, 26, 40, 43, 44, 47, 48, 53, 57, 60, 94, 115, 155, 203, 207
  policies, 10, 17, 37, 48, 60, 104, 167, 208, 209, 212 (see also Policy)
  predictability of, 5, 13, 18
  predictions, 7, 14, 15, 18–20, 40, 42, 47, 48, 69, 162
  processes, 3, 4, 6, 7, 14, 15, 17, 26, 33, 37, 38, 42, 48, 51–53, 56–60, 64, 67, 69, 155, 170, 198, 208
  push and pull factors
    ‘hard’ factors, 17
    push-pull-plus, 20
    ‘soft’ factors, 17
  routes
    formation of, 4, 8, 33, 38, 40, 51, 90, 106, 138, 139, 164, 193, 196, 197, 224, 225
    friction of, 43, 53, 60
    see also Migrants, journeys of
  studies, 3, 5–7, 10, 13, 14, 16, 20, 48, 61, 103, 104, 155–157, 191, 211
  theory
    failures of, 20, 187
    neoclassical, 95
    new economics of migration, 14
  uncertainty, 15–18, 26, 37–38, 47, 48, 72, 77, 82, 86, 90, 163
ML3, 6, 113, 116–122, 124–126, 128, 130, 133, 139, 144, 223, 224, 226
  See also Programming languages
Mobility, 4, 7, 13–15, 17, 19, 52, 60, 160
Model
  agent-based (see Agent-based model)
  analysis of, 8, 70, 75, 87, 131, 158, 186, 189, 214, 223
  canonical (lack of), 35, 48, 160
  computational, 51, 74, 88, 182, 186
  design of, 139, 214
  development of, 7, 9, 113, 120, 124, 139, 158, 190, 223
  discrepancy, 71, 74, 75, 88, 90, 149

  complexity of, 185 (see also Complexity)
Space, see Topology
Statistical significance, 176, 180
Structure, 3, 7–10, 22–28, 40, 41, 54, 66, 98, 104, 109, 111, 116, 120–123, 157, 160, 191, 223
Surprise, see Discovery
Surrogate, see Emulator
Syntax, 46, 116, 117, 120, 121, 125
Syrian Arab Republic (Syria)
  civil war in, 53, 146
  refugees from, 54, 56, 199, 206, 207, 211
System, 14, 16–18, 33–37, 40, 46, 48, 69, 71–74, 79, 108, 115, 117, 119–122, 125, 127, 129, 139, 140, 143, 151, 152, 155, 156, 158, 160, 162–167, 169, 185, 207, 221, 222, 224, 226
  See also Complexity

T
Theories of the middle range, 161
Theory of planned behaviour, 26, 42, 47
  See also Decision
Time
  continuous, 116, 119, 120, 124, 139, 140, 157, 182
  discrete, 119, 124, 182, 195, 224
  fixed-increment time advance, 116
  next-event time advance, 116
Topology
  grid-based, 41
  map-based, 147
  network-based, 146
Trade-offs, 4, 9, 28, 44, 46, 56, 65, 82, 94, 103, 106, 133, 138, 153, 155, 158, 163, 164, 169, 174, 186, 188–189
Traffic, 41, 62, 78, 112, 141, 197, 213, 215, 219
Training sample, 75, 79, 88, 89
  See also Latin Hypercube Sample
Transparency, 10, 60–63, 66, 69, 73, 82, 160, 171, 175–183, 188, 191, 198–208

U
Uncertainty
  aleatory, 20, 153, 155, 162, 163, 186, 189, 191
  analysis, 73, 79, 84, 143, 144, 148, 150, 153, 213, 216
  of the computer code, 81
  of decision making (see Decision making, under uncertainty)
  in demography, 7, 13–29
  epistemic, 153, 162, 163, 187, 189
  of migration (see Migration, uncertainty)
  in migration studies, 14
  of prediction, 7, 14, 15, 18, 72, 74, 88 (see also Population processes, predictability of)
  quantification (UQ), 8, 10, 71–91, 158, 214
  sources of, 8, 71, 72, 89
United Nations High Commissioner for Refugees (UNHCR), 15, 54–56, 199, 200
Utility
  elicitation of, 220
  function, 96–98, 220, 225
  maximisation of, 42

V
Validation, see Model, validation of
Verification, see Model, verification of
Volatility, 13, 19, 20, 153

W
WEIRD agents (Western, Educated, Industrialised, Rich, and Democratic), 108