Modeling and Forecasting of COVID-19 Using A Hybrid Dynamic Model Based On SEIRD With ARIMA Corrections
Epidemics
journal homepage: www.elsevier.com/locate/epidemics
Keywords: COVID-19 Scenario Modeling Hub; Agent-based model; Heterogeneity; Complexity; Digital twin

UVA-EpiHiper is a national scale agent-based model to support the US COVID-19 Scenario Modeling Hub (SMH). UVA-EpiHiper uses a detailed representation of the underlying social contact network along with data measured during the course of the pandemic to initialize and calibrate the model. In this paper, we study the role of heterogeneity on model complexity and resulting epidemic dynamics using UVA-EpiHiper. We discuss various sources of heterogeneity that we encounter in the use of UVA-EpiHiper to support modeling and analysis of epidemic dynamics under various scenarios. We also discuss how this affects model complexity and computational complexity of the corresponding simulations. Using round 13 of the SMH as an example, we discuss how UVA-EpiHiper was initialized and calibrated. We then discuss how the detailed output produced by UVA-EpiHiper can be analyzed to obtain interesting insights. We find that despite the complexity in the model, the software, and the computation incurred by an agent-based model in scenario modeling, it is capable of capturing various heterogeneities of real-world systems, especially those in networks and behaviors, and enables analyzing heterogeneities in epidemiological outcomes between different demographic, geographic, and social cohorts. In applying UVA-EpiHiper to round 13 scenario modeling, we find that disease outcomes are different between and within states, and between demographic groups, which can be attributed to heterogeneities in population demographics, network structures, and initial immunity.
1. Introduction

The world just witnessed the largest pandemic since 1918. The COVID-19 pandemic led to significant social, economic and health impacts worldwide. Computational models played an important role in supporting the policy makers during the pandemic. The US COVID-19 Scenario Modeling Hub (SMH) (Scenario Modeling Hub, 2023a) was formed in late 2020 to support policy makers. The consortium of modeling teams over the last 3 years considered 18 rounds of different what-if scenarios and created ensemble models that provided senior level policy makers analytical insights. UVA-EpiHiper was one of the 15 models that have participated in this community effort. It has contributed projections to the COVID-19 SMH in all rounds from 6 to 13 and most recently round 17. UVA-EpiHiper has also been adapted to participate in the Flu Scenario Modeling Hub (Scenario Modeling Hub, 2023b), the RSV Scenario Modeling Hub (Scenario Modeling Hub, 2023c), and the European COVID-19 Scenario Hub (Scenario Modeling Hub, 2023d). UVA-adaptive was the other team from University of Virginia (UVA) that also supported the SMH. The UVA teams, after much deliberation, decided to keep these two models active throughout the last 3 years, even though the computational and human costs were significant. Our decision was based on the fact that we wanted to provide results based on two different models; the key difference between them was the level of aggregation. A companion paper on UVA-adaptive (Porebski et al., 2024) describes the other effort. Here we focus on UVA-EpiHiper.

UVA-EpiHiper is a national-scale individual-level agent-based model among the models that participated in the COVID-19 SMH. This was one of its unique features. Its modeling capabilities allow it to incorporate heterogeneities in various surveillance data sets; to implement heterogeneities in the disease model, contact network, and behavior between individuals and among subpopulations; and to produce projections that can be stratified to study outcome heterogeneities among geographical, social, or demographical groups. Such capabilities are made computationally efficient by workflows that can handle computational heterogeneities.
∗ Corresponding authors.
E-mail addresses: [email protected] (J. Chen), [email protected] (M. Marathe).
https://doi.org/10.1016/j.epidem.2024.100779
Received 15 August 2023; Received in revised form 20 May 2024; Accepted 17 June 2024
Available online 27 June 2024
1755-4365/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
J. Chen et al. Epidemics 48 (2024) 100779
It is crucial to be able to model the heterogeneities at all scales to better understand an epidemic and better evaluate interventions. This is challenging for the compartmental models or even the metapopulation models. An agent-based networked model is a natural solution. While such a model enables us to model heterogeneous data, policies, individual behavior, and individual disease progression, it poses challenges regarding computations, data needed for initialization, and analyses due to its complexity. This paper describes how the UVA-EpiHiper model handles the heterogeneities with its capabilities and how we address the heterogeneities in the computation and analytics.

Overview of UVA-EpiHiper. UVA-EpiHiper includes a high performance computing oriented pipeline designed for scalable epidemic analytics. The pipeline has five steps. Step 1: Build a digital twin of the social contact network. The digital twin is statistically similar to the real-world network but preserves the privacy and confidentiality of individuals. Step 2: Initialize the digital twin with surveillance data. Step 3: Use a high performance computing oriented simulation (EpiHiper) to calibrate and execute a statistical experiment design to study the specific decision-theoretic questions. Step 4: Create aggregate projections from the simulation outputs to be comparable with data from surveillance. Step 5: Analyze the simulation generated detailed data to obtain policy insights. Additional details about the pipeline can be found in Bhattacharya et al. (2021, 2023).

1.1. Summary of contributions

In this paper, we focus on three central questions: (𝑖) How does UVA-EpiHiper capture various heterogeneities of real-world systems? (𝑖𝑖) How do these heterogeneities affect UVA-EpiHiper in terms of model complexity, software complexity, computational complexity, and analytical complexity? (𝑖𝑖𝑖) How do the heterogeneities impact the resulting epidemic dynamics in various counterfactual scenarios studied as a part of SMH? For the first question, we describe the various forms of heterogeneity that are represented within the UVA-EpiHiper framework. This includes: network heterogeneity, disease model and intervention heterogeneity, initialization heterogeneity in terms of diverse data streams, and analytical heterogeneity that stems from the need to analyze large detailed output data. For the second question, we describe how each of these heterogeneities impacts UVA-EpiHiper with respect to three important measures: (𝑎) model complexity – the complexity of representing the underlying model, (𝑏) software complexity, and (𝑐) computational complexity – the computational resources used to complete the needed analysis. For the third question, as a concrete example, we discuss how UVA-EpiHiper was used to support SMH Round 13. Two natural broad questions arise with such detailed models: (𝑖) when are such models needed and (𝑖𝑖) how does one validate such models. Both are important questions and will be discussed in Section 6.

1.2. Related work

Over the last three decades, agent-based models have become a popular modeling paradigm in epidemiology. Such models include e.g. EpiSimdemics (Barrett et al., 2008), EpiFast (Bisset et al., 2009), FRED (Grefenstette et al., 2013), Indemics (Bisset et al., 2014), EMOD (Bershteyn et al., 2018), and more recent models developed for COVID-19 modeling: Covid-Sim (Ferguson et al., 2020), Covasim (Kerr et al., 2021), OpenABM-Covid19 (Hinch et al., 2021), OpenCOVID (Shattock et al., 2022), and the model in Shoukat et al. (2020). A few agent-based models, including our UVA-EpiHiper, have participated in the SMH. The NotreDame-FRED model (Moore et al., 2024) is based on FRED with modifications for COVID-19 and is mainly used for Indiana. The UF-ABM model (Pillai et al., 2023) is an agent-based model developed to study the COVID-19 pandemic in Florida. The COVSIM model (Rosenstrom et al., 2024) is a stochastic agent-based COVID-19 simulation model for North Carolina. A few compartmental models have participated in the SMH too, e.g. Porebski et al. (2024), Bouchnita et al. (2024), Srivastava (2023). UVA-EpiHiper is the only agent-based model in SMH that models the whole of the USA. These agent-based models usually have an explicit representation of the individuals and the underlying social contact network on which the disease spreads. Compared with compartmental mass action models, such a representation can capture heterogeneity between individual agents and in the contact structure. The agent-based models allow us to directly capture behaviors and interventions, and to study targeted policies and response strategies. Our UVA-EpiHiper model aims to provide the following features which are crucial in the SMH work: scalability in terms of ability to simulate epidemics and interventions on national scale networks (with 100–300 million nodes), capabilities and expressiveness in terms of disease and intervention modeling, and ease of specification in terms of ability to specify disease models and interventions.

The use of agent-based models for scenario planning and projection also has a rich history. See the paper by Runge et al. (2023) for a more detailed account. Over the past two decades, we have used agent-based models to support scenario projections in epidemic science for various sponsors. Examples of our work include: (𝑖) supporting Office of Homeland Security (OHS) and Joint Task Force Civil Support (JTF-CS) on Smallpox (Eubank et al., 2004), (𝑖𝑖) targeted layered containment (TLC) study done for the National Security Council (NSC) (Halloran et al., 2008), (𝑖𝑖𝑖) studies on H1N1 pandemic (Barrett et al., 2011a; Chen et al., 2010, 2018), (𝑖𝑣) studies on Ebola epidemic (Rivers et al., 2014; Venkatramanan et al., 2018), (𝑣) tabletop exercise of pandemic planning done for the Defense Threat Reduction Agency (DTRA) and senior officials in the USG (Barrett et al., 2011b, 2015), (𝑣𝑖) studies done for the Department of Defense (DoD) to support MEDCOM and National Guard (Barrett et al., 2012). During the COVID-19 outbreak, we have continued to support DTRA and Virginia Department of Health (VDH) on various scenario planning exercises, using both metapopulation and agent-based models (Venkatramanan et al., 2019; Chen et al., 2019). The present paper describes our work done in the context of the ongoing Scenario Modeling Hub (SMH) effort (Scenario Modeling Hub, 2023a). This collaborative effort is novel in that it has brought together a diverse group of modelers, policy makers, and analysts to design and implement complex scenarios and has used the ensemble results to inform policy makers as they plan and respond during an ongoing epidemic outbreak. SMH has repeatedly shown that an ensemble of the projections from a diverse set of models provides a more robust set of projections than a single model.

2. Heterogeneities in modeling capabilities of UVA-EpiHiper

UVA-EpiHiper is an individual-based model, in which each individual is explicitly modeled and represented. In this section we describe (𝑖) how we model heterogeneities among individuals in terms of their demographic and socio-economic attributes; (𝑖𝑖) how we model heterogeneities in the social contact network based on individuals interacting with each other when they have activities at the same locations, instead of a particular structural graph model; (𝑖𝑖𝑖) how we model heterogeneous disease transmission and progression for different individuals; (𝑖𝑣) how we model heterogeneous behavior regarding mitigation measures, including compliance with pharmaceutical interventions (PIs) and non-pharmaceutical interventions (NPIs).

The UVA-EpiHiper model was designed to support a broad range of disease models and a large class of interventions needed to support policy makers in complex scenarios. The formal model is time stepped and captures the following elements of a contagion propagating over a network 𝐺(𝑉, 𝐸) of vertices 𝑉 with an associated set 𝒳 = {𝑋1, 𝑋2, …, 𝑋𝑚} of health states: (𝑖) transmission of a contagion from infectious vertices to susceptible vertices, (𝑖𝑖) disease progression within each vertex that has become infected, and (𝑖𝑖𝑖) interventions, which are formal procedures applied to the states of an associated set of vertices or edges (intervention target) when a certain predicate (trigger condition) is satisfied. We often refer to vertices as people, and although that does not need to be the case, it will be assumed in the following.
2.1. Models of disaggregated populations and social contact networks

The UVA-EpiHiper model uses a digital twin of the population of the study region. Such a digital twin captures the people with demographic attributes, their partition into households with household attributes, an activity sequence for each individual, and a set of residence and activity locations where people conduct their activities. The mapping of activities to locations allows one to infer a contact network which forms the basis for disease transmission in the UVA-EpiHiper model. The construction of the digital twin, which is illustrated in Fig. 1, is carried out so that the synthetic population and network closely resemble their real counterparts on dimensions relevant for epidemic scenarios.

In the constructed digital twin, individual demographic attributes include age, gender, race/ethnicity, employment status, etc. Household attributes include household size, income, and location (latitude/longitude). The construction of households and individuals is based on iterative proportional fitting (IPF) (Beckman et al., 1996) using Public Use Microdata Samples (PUMS) (U.S. Census, 2021b) and US Census demographic distributions, and is conducted at the resolution of census block groups. A set of geographically embedded synthetic locations is constructed through detailed modeling and data fusion involving PostGIS and machine learning-based techniques, using the Microsoft Building Data (Microsoft, 2020), HERE (HERE, 2020) and BuildingFootprintUSA (BuildingFootprintUSA, 2020) point-of-interest (POI) data, National Center of Education Statistics (NCES) (NCES, 2021) data on school and college locations, land-use classification data (HERE, 2020), and urban/rural classifications (U.S. Census, 2021a).

Each individual is then assigned an activity sequence through Classification and Regression Trees (CART) and Finite Volume Method (FVM) (Lum et al., 2016; Breiman, 1984), using harmonized data from the National Household Travel Survey (NHTS) (U.S. Department of Transportation, Federal Highway Administration, 2020) and the American Time Use Survey (ATUS) (U.S. Department of Labor, Bureau of Labor Statistics, 2020). Each activity in an individual’s activity sequence includes a type (e.g. home, work, school, college, shopping, religion, or other), a start time, duration in seconds, and a location. The location of each activity is assigned with a set of rules, using the American Community Survey (ACS) commute flow data (U.S. Census, 2020a) and the LEHD Origin–Destination Employment Statistics (LODES) (U.S. Census, 2020b). The location assignment of activities can be represented by the people-location network 𝐺𝑃𝐿 illustrated in the middle panel of Fig. 1.

From 𝐺𝑃𝐿 we derive a contact network 𝐺𝑃 in which vertices are the people and edges are people–people contacts, as in the right panel of Fig. 1. We apply an extension of the Erdős–Rényi random graph model (Erdős and Rényi, 1959) to each location to connect people visiting the location simultaneously. The model is calibrated based on SocioPatterns and POLYMOD data (SocioPatterns, 2023; Cattuto et al., 2010; Mossong et al., 2008; Prem et al., 2017). The network 𝐺𝑃 forms a baseline for disease transmission in UVA-EpiHiper, and can be changed by interventions. A more detailed overview of the construction methodology and its validation is provided in Mortveit et al. (2020).

Other significant efforts on constructing digital twin populations include Tatem (2017), Weber et al. (2021), and Socioeconomic Data and Applications Center (2020) on gridded populations, and Mistry et al. (2021), Wheaton et al. (2009), and Gallagher et al. (2018), whose data structure and details are closer to that in our work. Some epidemic simulation models construct networks on the fly (Kerr et al., 2021; Shattock et al., 2022) or obtain scaling (in terms of population size) by allowing each agent to represent a given number 𝑘 of actual individuals. A unique aspect of our digital twin is the level of detail included and the diverse data sets used to synthesize the twin. As a result, a population and the resulting network constructed using our approach has extensive heterogeneity in all aspects including the individuals and their attributes, in their household structures, in their activity patterns, and in the locations they visit and the contacts they form at these locations. The heterogeneity is reflected spatially, temporally and socially (e.g. mixing patterns).

The heterogeneity is manifested in the structural and dynamical properties of the resulting social contact network 𝐺𝑃. For example, the ratio of the number of people to the number of activity locations will clearly influence the number of simultaneous visits and thus the potential for interactions (edges). Similarly, the number of activities and their duration will shape densities and properties (labels) of edges: more activities will typically lead to more interactions, albeit of shorter duration. The network 𝐺𝑃 may thus get a larger average degree 𝑑̄ as people have more activities, but durations of contact will diminish. Flow data such as the American Community Survey (ACS) commute data (U.S. Census, 2020a) will influence the spatial embedding of 𝐺𝑃: as the average commute distance increases, one would expect the network to have ‘‘longer’’ edges (connecting people residing farther away from each other), which in turn could make an epidemic outbreak spread faster throughout a region and thus be harder to contain.

To illustrate the resulting heterogeneity, we computed a few structural measures for the contact networks 𝐺𝑃 generated across the set of US states. A more detailed account will be discussed in a separate manuscript that focuses on the construction of the digital twin. In our analysis, we found that some measures are sensitive to details whereas others are not. For example, the average degree 𝑑̄ varies across the range 27.59 ≤ 𝑑̄ ≤ 47.43 for the US states while the relative size 𝑟 of a giant component is quite stable, satisfying 0.97 ≤ 𝑟 ≤ 1.00. A giant component of a network is a connected component that contains a significant fraction of all the nodes, and the fraction is called its relative size. Focusing on the populations and networks for the states of Massachusetts (MA) and Michigan (MI), their average degrees are 𝑑̄MA = 30.96 and 𝑑̄MI = 39.57. There is, however, virtually no difference between MA and MI in the average contact duration (total hours per person): 𝑇̄MA = 102 and 𝑇̄MI = 103. Their degree distributions and core number distributions are shown in Fig. 2, further illustrating differences in structural characteristics. While such differences may not impact certain kinds of dynamics over the population networks, they can still offer valuable insight when determining efficient interventions. This structural insight, obtained through the constructive model for the populations and arising as emergent properties thereof, could be challenging to capture in agent-based models that do not consider this level of detail in their design approach.

2.2. Models of within-host disease progression and between-host disease transmission

The disease model in UVA-EpiHiper consists of within-host disease progression and between-host disease transmission. The former refers to an individual transitioning from one health state to another health state independent of other individuals. The latter refers to the disease being transmitted from an infectious individual A to a susceptible individual B, causing B to transition to an infected state. The disease progression is represented using a probabilistic timed transition system (PTTS) (Bisset et al., 2014; Barrett et al., 2008) over the set of health states 𝒳. PTTS extend the classical finite state machine by allowing one to represent probabilistic and timed state transitions as specified by per-edge dwell time distributions. For disease transmission, consider a susceptible person 𝑃 in health state 𝑋𝑘 who is in contact with an infectious person 𝑃′ in state 𝑋𝑖. We combine the state infectivity 𝜄(𝑋𝑖) and state susceptibility 𝜎(𝑋𝑘) of their health states with the infectivity scaling factor 𝛽𝜄(𝑃′) of 𝑃′ and the susceptibility scaling factor 𝛽𝜎(𝑃) of 𝑃 to form the propensity 𝜌 associated with the contact configuration 𝑇𝑖,𝑗,𝑘 = 𝑇(𝑋𝑖, 𝑋𝑗, 𝑋𝑘) for the potential transition of the health state of person 𝑃 to 𝑋𝑗 as:

\[
\rho(P, P', T_{i,j,k}, e) = \left[ T \cdot \tau \times w_e \times \alpha_e \right] \times \left[ \beta_\sigma(P) \cdot \sigma(X_k) \right] \times \left[ \beta_\iota(P') \cdot \iota(X_i) \right] \times \omega(T_{i,j,k}) \tag{1}
\]
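A minimal Python sketch of how Eq. (1) could be evaluated and then used in a Gillespie-style draw (Gillespie, 1976, 1977) follows. It multiplies the factors of the propensity for one contact, then decides whether a person is infected within one time step and to which contact the infection is attributed. The function names, the argument layout, and the reading of propensities as rates per unit time step are assumptions made for illustration; they are not taken from the EpiHiper code base.

```python
import random

def propensity(T, tau, w_e, alpha_e, beta_sigma, sigma, beta_iota, iota, omega):
    """Product of the factors in Eq. (1) for one contact and one contact configuration.

    T: contact duration, tau: transmissibility, w_e: edge weight,
    alpha_e: 1 if the edge is active else 0, beta_sigma * sigma: scaled
    susceptibility of P, beta_iota * iota: scaled infectivity of P',
    omega: transmission weight of the contact configuration.
    """
    return (T * tau * w_e * alpha_e) * (beta_sigma * sigma) * (beta_iota * iota) * omega

def sample_infection(propensities, dt=1.0, rng=random):
    """Return the index of the contact blamed for an infection, or None.

    Assumption: propensities act as rates, so the waiting time to infection is
    exponential with rate sum(propensities); infection occurs if it falls in dt.
    """
    total = sum(propensities)
    if total <= 0.0:
        return None
    if rng.expovariate(total) > dt:
        return None
    # Attribute the infection to one contact, weighted by its propensity.
    return rng.choices(range(len(propensities)), weights=propensities, k=1)[0]

# Illustrative numbers only.
rho_active = propensity(T=2.0, tau=0.05, w_e=1.0, alpha_e=1,
                        beta_sigma=1.0, sigma=1.0, beta_iota=1.0, iota=0.5, omega=1.0)
rho_closed = propensity(T=2.0, tau=0.05, w_e=1.0, alpha_e=0,
                        beta_sigma=1.0, sigma=1.0, beta_iota=1.0, iota=0.5, omega=1.0)
infector = sample_infection([rho_closed, rho_active], rng=random.Random(0))
```

Because the activity indicator multiplies the whole product, a deactivated edge contributes zero propensity and can never be selected as the source of an infection.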
Fig. 1. An illustration of the population and network components used in UVA-EpiHiper. On the left, synthetic people with demographic attributes and household structure are
illustrated along with their assignment to residence locations. Each person is assigned an activity sequence consisting of activities such as work and shopping. These activities are
mapped to appropriate activity locations (e.g., a government worker goes to work at a location with a government classification) as illustrated by the dashed paths. This complete
assignment of activities to locations is formally represented as the people-location network 𝐺𝑃 𝐿 illustrated in the middle panel, which in turn gives rise to a social contact network
𝐺𝑃 as shown on the right. The latter captures person–person contacts, including their duration and location.
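The step from the people-location network to the contact network can be sketched as follows. This is a simplified illustration that uses a plain Erdős–Rényi rule with a single hypothetical connection probability `p` per pair of co-present visitors; the calibrated extension used for UVA-EpiHiper is not reproduced here. The tuple layout of `visits` and the function name are assumptions.

```python
import itertools
import random
from collections import defaultdict

def derive_contacts(visits, p, rng):
    """Derive person-person contacts from visits to locations.

    visits: iterable of (person, location, start_seconds, duration_seconds).
    Two people at the same location with overlapping stays are connected with
    probability p; the contact stores its location and overlap in seconds.
    """
    by_location = defaultdict(list)
    for person, location, start, duration in visits:
        by_location[location].append((person, start, start + duration))

    contacts = []
    for location, stays in by_location.items():
        for (a, a0, a1), (b, b0, b1) in itertools.combinations(stays, 2):
            overlap = min(a1, b1) - max(a0, b0)
            if a != b and overlap > 0 and rng.random() < p:
                contacts.append((a, b, location, overlap))
    return contacts

# Toy example: two office workers overlap for four hours; the student meets nobody.
visits = [
    ("alice", "office", 9 * 3600, 8 * 3600),
    ("bob", "office", 13 * 3600, 4 * 3600),
    ("carol", "school", 8 * 3600, 6 * 3600),
]
contacts = derive_contacts(visits, p=1.0, rng=random.Random(1))
```

With `p=1.0` every co-present pair is connected, which makes the toy output deterministic; smaller values thin out contacts at crowded locations.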
Fig. 2. Heterogeneities between states in terms of structural properties of their contact networks 𝐺𝑃 . The top row shows (undirected) degree distributions and the bottom row
shows the core number distributions for Massachusetts (left) and Michigan (right).
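The structural measures discussed for the contact networks (average degree, relative size of the giant component, and core numbers) can be computed as in the sketch below. It uses only the Python standard library and a small made-up graph; the function names are illustrative, and the simple peeling loop is quadratic, so a bucket-based linear-time variant would be needed at state or national scale.

```python
from collections import deque

def average_degree(adj):
    """Mean number of neighbors per node in an undirected adjacency dict."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)

def giant_component_fraction(adj):
    """Relative size r of the largest connected component (breadth-first search)."""
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)
    return largest / len(adj)

def core_numbers(adj):
    """Core number of each node by repeatedly removing a minimum-degree node."""
    degree = {u: len(neighbors) for u, neighbors in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        u = min(remaining, key=degree.__getitem__)
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core

# Toy graph: a triangle a-b-c, a pendant node d attached to c, and an isolated node e.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}, "e": set()}
d_bar = average_degree(adj)
r = giant_component_fraction(adj)
cores = core_numbers(adj)
```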
In Eq. (1), 𝑇 is the contact duration of the edge 𝑒 = (𝑃′, 𝑃, 𝑤𝑒, 𝛼𝑒, 𝑇), 𝑤𝑒 is the edge weight, and 𝛼𝑒 is an indicator variable encoding whether or not the edge is active (e.g., disabled because of an ongoing school closure). Finally, 𝜔(𝑇𝑖,𝑗,𝑘) is the transmission weight of the contact configuration 𝑇𝑖,𝑗,𝑘, and 𝜏 is the transmissibility. Propensities are determined at each time step and for each person 𝑃 using Eq. (1) for all edges 𝑒 and contact configurations 𝑇𝑖,𝑗,𝑘, generating a sequence 𝜌𝑃. A Gillespie process (Gillespie, 1976, 1977) is used to determine if 𝑃 becomes infected. Also, the person 𝑃′, to whom one attributes 𝑃 becoming infected, is chosen randomly with probabilities weighted by the corresponding propensities.

In Fig. 3, we show a simplified version of the COVID-19 disease model implemented by UVA-EpiHiper in the SMH work. It illustrates the PTTS and the associated probability 𝑝 and dwell time 𝑑 of each state transition. Each state in the diagram is also associated with susceptibility 𝜎 and infectivity 𝜄, which can be regulated by different susceptibility factor 𝛽𝜎 and infectivity factor 𝛽𝜄 for different individuals. In UVA-EpiHiper, heterogeneities resulting from various diseases and their manifestations can be represented by (𝑖) a different PTTS structure for every individual; (𝑖𝑖) the same PTTS structure across all individuals, but with different parameterizations. The former can arise between different types of hosts, e.g., human and vector. The latter is more common, where theoretically we can assign to each individual unique values for the parameters (𝑝𝑖, 𝑑𝑖) depending on the individual’s attributes such as their age, their immune profile and medical history. In actual studies, we often partition the population by a set of variables, e.g. by age into age groups. People in the same group have the same disease model parameterizations. This substantially reduces the complexity of representing and implementing the disease model in UVA-EpiHiper, and improves the computational efficiency of the resulting simulations.

The disease models listed in Table 1 are variations of the COVID-19 disease model, which is an extension of the classical SEIR model, with extra features added over the rounds of scenario modeling to handle various scenarios. Their complexity can be represented by the number of states and the number of state transitions.

Our scenario modeling work began long before SMH. In January 2020, we started with the COVID v1 model to study asymptomatic ratio and various outcomes including hospitalization, ventilation, and death. In March 2020, we augmented our disease model to COVID v2 to implement age stratification. We partition the population into five age groups: p (preschooler, 0–4), s (school age, 5–17), a (adult, 18–49), o (older adult, 50–64), and g (golden age, 65+), and assign
Fig. 3. An illustration of disease models that can be potentially implemented in UVA-EpiHiper. The subgraph connected by solid arrows forms a PTTS for within-host progression.
Each arrow is associated with a state transition probability 𝑝 and a dwell time 𝑑 which may be a random variable. The dashed arrows are associated with disease transmission,
for which other nodes in the network are involved.
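A probabilistic timed transition system of the kind described in this caption can be sketched in a few lines of Python. Each outgoing transition carries a probability 𝑝 and a dwell time sampler 𝑑. The state names, probabilities, and dwell times below are made up for illustration and do not reproduce the COVID-19 model used in the SMH work; transmission (the dashed arrows) is deliberately left out.

```python
import random

# Hypothetical PTTS: state -> list of (next_state, probability, dwell_time_sampler).
# State names and all numbers are illustrative assumptions.
PTTS = {
    "E": [("I_asymp", 0.4, lambda rng: 3),
          ("I_symp", 0.6, lambda rng: rng.choice([2, 3, 4]))],
    "I_asymp": [("R", 1.0, lambda rng: 5)],
    "I_symp": [("H", 0.1, lambda rng: 4),
               ("R", 0.9, lambda rng: 6)],
    "H": [("D", 0.2, lambda rng: 7),
          ("R", 0.8, lambda rng: 8)],
}

def next_transition(state, rng):
    """Sample (next_state, dwell_days) for `state`, or None if it has no exits."""
    edges = PTTS.get(state, [])
    if not edges:
        return None
    weights = [p for _, p, _ in edges]
    target, _, dwell = rng.choices(edges, weights=weights, k=1)[0]
    return target, dwell(rng)

def trajectory(start, rng):
    """Follow within-host progression from `start` until a terminal state."""
    state, day = start, 0
    path = [(state, day)]
    step = next_transition(state, rng)
    while step is not None:
        state, dwell = step
        day += dwell
        path.append((state, day))
        step = next_transition(state, rng)
    return path

path = trajectory("E", random.Random(7))
```

Giving each age group its own copy of the probabilities and dwell samplers, while keeping the same dictionary structure, corresponds to the "same PTTS structure, different parameterizations" case.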
Table 1
Variations of COVID-19 disease model implemented by UVA-EpiHiper with different features and complexity.

| Disease | Features | Implementation | Complexity: States | Complexity: Transmissions | Complexity: Progressions |
|---|---|---|---|---|---|
| COVID v1 | asymptomatic state; severe outcomes | add states to a basic SEIR model | 13 | 6 | 16 |
| COVID v2 | v1 + age stratification | states/transitions for each age group; transmissions across age groups | 90 | 225 | 100 |
| COVID v3 | v2 + vaccines | vaccinated states with different transitions | 105 | 300 | 120 |
| COVID v4 | v3 + multivariant | variant-specific infectious states | 140 | 600 | 185 |
| COVID v5 | v4 + immune waning/escape | transition from R to W; transmission across variants | 170 | 975 | 250 |
different probabilities of transition to H (hospitalized) and D (dead) states, to model e.g. higher likelihood of severe outcomes in senior people if they are infected with COVID-19. At the end of 2020, when vaccines began to be administered, we expanded our model to COVID v3 with vaccinated states. Along the rounds of SMH, we extended the disease model for various doses of vaccines, initially 𝑉1 for being vaccinated with one mRNA dose (either Pfizer or Moderna), 𝑉2 for being vaccinated with two doses, and 𝑉𝑗𝑗 for being vaccinated with Johnson&Johnson vaccine (from round 6 to round 9). When boosters started to be administered, we added 𝑉𝑏1 for the first booster (from round 10 to round 13), then 𝑉𝑏2 for the second booster (round 17). These different vaccinated states differ by associated susceptibility, to represent different efficacies of corresponding vaccines/doses. From SMH round 6, to model multiple variants, we upgraded our disease model to COVID v4 by creating infectious states for each variant and expanded disease transmissions between each combination of infectious state and susceptible state. From SMH round 11, we updated our disease model to COVID v5, in which we implemented asymmetric immune escape, where a node with immunity to an older variant (e.g. Delta), obtained either by infection or by vaccination, can be infected by a newer variant (e.g. Omicron), but not the opposite. In UVA-EpiHiper, immunity waning is implemented as a discrete process by state transition from a state with obtained immunity (natural or vaccinal) to a state with waned immunity.

2.3. Models of pharmaceutical and non-pharmaceutical interventions

UVA-EpiHiper interventions describe both pharmaceutical (PIs) and non-pharmaceutical interventions (NPIs). A UVA-EpiHiper intervention consists of a trigger condition 𝐶, an intervention target 𝑇, and a collection of operations that are applied against the variables associated with the elements of the target, or against variables not attached to target entities (through the once construct).

The trigger condition 𝐶 is a Boolean expression formed using time, sizes of sets, and values of variables. The trigger sets as well as the target sets consist of vertices or edges and can be formed using UVA-EpiHiper internal attributes of individuals (nodes) or contacts (edges). These attributes can be augmented through an external trait database defining properties of individuals or contact locations. The target set may be sampled and different operations can be specified for the sampled and non-sampled subsets.

Operations are ordered, first by execution time and second by priority. Sub-sequences of operations of the same priority are processed in random order. The operations are organized into the control blocks specified in Listing 1. More details about the semantics of each block can be found in Appendix A.2. In each operation, a variable is assigned the value of an expression, but the assignment is scheduled for execution after a delay 𝑑 ≥ 0 relative to the current time step. One may additionally assign it an integer priority (default value 0) and a condition, which is a Boolean expression that must hold at execution time to avoid the operation being canceled.

Intervention complexity. As a result of this formal set-based structure, UVA-EpiHiper is able to implement a rich class of interventions without any coding. That is, modelers have full control over the interventions without the help of programmers. Table 2 describes some common interventions implemented in UVA-EpiHiper for various studies. Table 4 details their representational complexity. The column ‘‘Traits’’ refers to the usage of custom, time-varying attributes of nodes, whereas the column ‘‘Demographics’’ gives the number of fields accessed in the UVA-EpiHiper person trait database. The demographic information in these experiments is only used during initialization. Therefore it has limited influence on the running time.

We investigate impacts of the disease model and intervention complexity with a performance study on the synthetic population of Virginia using the COVID v2 model in Table 1. The study included eight
Table 2
Interventions and their descriptions.
Intervention | Description
Voluntary home isolation | Individuals who notice symptoms comply voluntarily (sampling with a compliance rate) with the recommendation to stay at home. All non-home edges of the compliant individuals are deactivated for 15 days.
School closure | All students who go to school or college stay at home. This is implemented by deactivating edges for which either source or target activity is either school or college. Note that teachers or other school employees are still in contact with each other.
Stay at home orders | These include date dependent stay at home orders (SH), reversal of such orders (RO), as well as an automated policy (PS) where the order is triggered by more than 2000 hospitalizations and reversed once the number drops below 2000. Prior to the simulation the individuals who will comply with these orders are selected (sampling with compliance rate) to have the trait complyWhenOrdered. Non-home edges of either the target or the source node who is complying are deactivated and activated either at predetermined dates (SH, RO) or according to the prescribed threshold (PS).
Test and isolate asymptomatic cases | Individuals, up to a number determined by testing capacity, who do not show symptoms are tested. Voluntary home isolation is applied to those tested with positive results.
Contact tracing distance 1 | Close contacts reported by confirmed cases are recommended to quarantine themselves at home. If they comply (sampling with a compliance rate), their non-home edges are deactivated for 15 days.
Contact tracing distance 2 | Close contacts reported by confirmed cases and the close contacts of these contacts are recommended to quarantine themselves at home. If they comply (sampling with a compliance rate), their non-home edges are deactivated for 15 days.
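The trigger, target, and operation structure of an intervention can be illustrated with the automated stay at home policy (PS) from Table 2. The following is a minimal Python sketch under stated assumptions, not EpiHiper's intervention syntax: the function names and the dictionary-based edge representation are hypothetical, and only the 2000-hospitalization threshold and the complyWhenOrdered trait are taken from the table.

```python
import random

THRESHOLD = 2000  # hospitalization threshold stated in Table 2


def select_compliant(node_ids, compliance_rate, rng):
    """Before the simulation, sample the nodes that receive the
    complyWhenOrdered trait (hypothetical helper)."""
    return {n for n in node_ids if rng.random() < compliance_rate}


def update_order(order_active, hospitalizations):
    """Trigger condition: impose the order above the threshold and
    reverse it once the count drops below the threshold."""
    if hospitalizations > THRESHOLD:
        return True
    if hospitalizations < THRESHOLD:
        return False
    return order_active  # exactly at the threshold: keep the current state


def apply_order(edges, compliant, order_active):
    """Operation: deactivate (or reactivate) non-home edges whose source
    or target node is compliant. Home edges are never touched.
    Simplification: reactivation ignores other interventions that may
    have deactivated the same edge."""
    for edge in edges:
        touches_compliant = edge["source"] in compliant or edge["target"] in compliant
        if edge["activity"] != "home" and touches_compliant:
            edge["active"] = not order_active
```

A driver loop would call `update_order` once per simulated day with the current hospitalization count and then pass the result to `apply_order`; a seeded `random.Random` instance keeps the compliance sampling reproducible across replicates.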
Table 3
Differences in the HPC clusters used for executing EpiHiper simulation workflows.
Cluster | Rivanna | Bridges 2 | Anvil
# Total nodes | 115 | 488 | 1000
# CPU cores per node | 40 | 128 | 128
RAM per node (GB) | 384 | 256 | 256
CPU make | Intel Xeon Gold 6148 | AMD EPYC 7742 | AMD EPYC 7763
Network adapters | Mellanox ConnectX-5 | Mellanox ConnectX-6 | Mellanox ConnectX-6
OS | CentOS Linux 7 | CentOS Linux 8 | Rocky Linux 8.7
Filesystem | GPFS | Lustre | GPFS
computational experiments (labeled I through VIII in Table 5) with increasing complexity. For reference, each experiment was conducted on compute nodes with dual CPUs having 20 cores each and 375 GB total memory with its specified collection of interventions for 15 replicates.
Fig. 8 shows the clear impact of intervention complexity on running time. For each experiment we show the run time contribution from each of the main simulation tasks (intervention, transmission, update, synchronization, output, and initialization). The height of the ■-colored bar segment shows the time spent executing the interventions across experiments I through VIII. We find that contact tracing interventions (CTD1, CTD2) have higher computational complexity than other interventions, as indicated by Table 4.

3. Heterogeneities in the input data

The UVA-EpiHiper model is data-driven, i.e., it takes disease surveillance data and vaccination data to initialize the state of the system including the initial health state of each individual, and to schedule interventions for each individual. In this section, we describe (𝑖) how we take age stratified, county level daily confirmed case data to assign individuals to different initial health states and immunity classes, including naively susceptible with no immunity, partially susceptible with waned immunity, and non-susceptible with full immunity; (𝑖𝑖) how we take state level daily vaccine administration data stratified by age and dose to assign individuals to various vaccinated states at the beginning of the simulation.

3.1. Initializations of UVA-EpiHiper

The heterogeneities in UVA-EpiHiper not only originate from its models for disease spread and social contact networks, but also come from the data used to initialize the models and to drive the simulations. In this section, we describe how UVA-EpiHiper takes surveillance data sets of various resolutions and uses them to initialize the health state of the population at individual level. We argue that UVA-EpiHiper is able to leverage the details available in these data sets and that the heterogeneities in the input are reflected in the output, which in scenario modeling is the projections of epidemic outcomes.
In UVA-EpiHiper, we do not always model the whole history of a pandemic from its very beginning. Specifically, in scenario modeling of the COVID-19 pandemic which started in the U.S. from early 2020, we initialize the system from time 𝑡_0, where 𝑡_0 + 𝛿 is the beginning of the scenarios. We choose 𝛿 such that UVA-EpiHiper has sufficient ‘‘ramp-up’’ time to catch up with the most recent development of the epidemic. Based on experiments, 𝛿 is about 1–2 months. For example, for scenario modeling in March 2022, we initialize our model from February 2022. We use the history before 𝑡_0 to derive the distribution of immunity level in the population and initialize each individual’s health state at 𝑡_0. This way, we do not need to calibrate and compute the whole trajectory of the system from 𝑡 = 0 to 𝑡 = 𝑡_0. We note that this approach is common in e.g. Influenza modeling, where it is impossible to go back to the first cases.
In the COVID-19 model of UVA-EpiHiper, there are four immunity classes: full susceptibility (S), natural immunity (R), vaccinal immunity (V), partial immunity (W). The nodes with vaccinal immunity are partially protected from infection and severe outcomes. The nodes with natural immunity are fully protected from infection (in the absence of immune escape). The nodes with partial immunity are partially protected from infection. We initialize each node to one of the health states corresponding to these immunity classes in the following way.
Data from The New York Times (The New York Times, 2023), based on reports from state and local health agencies, provides daily cumulative number of confirmed cases for each county 𝑐: 𝑋_{𝑐,𝑡} from Jan. 21st, 2020. From CDC website, we have the cumulative number of cases in each age group at national level, from which we compute a distribution of cases among age groups. We apply this distribution to 𝑋_{𝑐,𝑡} to obtain age stratified cumulative confirmed cases 𝑋_{𝑐,𝑎,𝑡} for each county 𝑐, each age group 𝑎, and each day 𝑡. With a per-age group case ascertainment rate 𝛼_𝑎 we
Table 4
UVA-EpiHiper interventions and factors influencing their representational complexity.
Intervention | Details | Node sets | Edge sets | Set operations | Traits | Demographics
VHI | Voluntary home isolation | 1 | 2 | 2 | 1 | 2
SC | School closure | 0 | 1 | 0 | 1 | 2
SH, RO, PS | order, reverse, alternate stay at home order | 1 | 1 | 1 | 1 | 2
TA | Test and isolation of asymptomatic cases | 1 | 5 | 3 | 2 | 2
CTD1 | Contact tracing distance 1 | 4 | 5 | 5 | 3 | 2
CTD2 | Contact tracing distance 2 | 7 | 7 | 8 | 3 | 2
Fig. 4. Projected cumulative number of infections (per 100 K population size) over time after March 13, 2022, across different US states in different scenarios. Each curve is the
average over 50 simulation replicates. Between states, the curves show various differences in magnitude (normalized by population size), trends, and impact of waning and new
variant.
Fig. 5. Projected cumulative number of infections (per 100 K population size) over time after March 13, 2022, across different health districts in the state of Virginia, in different
scenarios. Each curve is the average over 50 simulation replicates. Between health districts, the curves show similar trends and impact of waning and new variant, but different
magnitude (even after normalization by population size).
• Difference in compute node configurations requires separate optimization for different compute configurations, otherwise causes underuse of compute on systems with low memory/CPU ratios.
• Difference in filesystem availability and limits requires customization of workflows to account for the different setups, and hinders debugging on systems with more expensive filesystems.
• Compute reservations vs. custom QoS (quality of service): This difference requires extensive planning on systems that only support reservations to make large amounts of compute available, otherwise we risk resource waste in case things do not go as planned.
• Difference in access control requirements makes development of automated workflows for multi-cluster systems difficult.
Based on our application requirements we concluded that a multi-cluster workflow pipeline should ideally be able to satisfy the following requirements:
• Be able to efficiently execute multi-node distributed memory MPI (Message Passing Interface) applications. In particular be able to leverage existing HPC schedulers (such as Slurm), and use Process Management Interface (PMI) for quick startup of multi-node MPI applications.
Fig. 6. Spatial heterogeneity — correlation plots: Comparing the overall spread in a state with the spread in top three counties by population. Two scenarios (B and D) and two
states (MA and KS) are considered for comparison. In each case, 50 replicates are considered. The plots are ordered as follows: state followed by top three counties ordered by
decreasing population. The diagonal plots show the histogram of infections per 100 K individuals over the simulation. The off-diagonal plots show the distribution of Pearson’s
correlation coefficient of the infection time series (epicurves) corresponding to each pair of regions. Plots for scenarios A and C are in the appendix.
• Be able to run on modern secure clusters where login and services like ssh are secured using single sign-on with central authentication, two-factor authentication, networks isolated with VPNs, etc.
• Be able to support complex task dependencies via task-dependency graphs. Additionally, they must support dynamic on-the-fly task creation, which is important for settings such as calibration.
• Be able to do task semantic-aware fault detection and recovery.
• Be able to support disparate HPC site-specific configurations.
• Provide a simple centralized interface to submit tasks that can be run on available HPC clusters.
Wormulon. To be able to answer policy driven questions related to epidemiological modeling in real-time we developed a custom workflow pipeline called Wormulon (Bhattacharya et al., 2023). Wormulon’s design was driven by practical considerations and was guided heavily by the multi-cluster system development issues described above. While many systems existed that addressed some of the above issues, such as HPC cluster schedulers like Slurm (Yoo et al., 2003) and PBS (Feng et al., 2007), pilot-based systems such as Radical Pilot (Merzky et al., 2021), multi-cluster schedulers like Argo and Balsam (Childers et al., 2017; Salim et al., 2019) and Leiden Grid Infrastructure (Somers, 2019), and modern big data and machine learning-oriented schedulers like Mesos (Hindman et al., 2011), Yarn (Vavilapalli et al., 2013), Dask (Rocklin, 2015) and Ray (Moritz et al., 2018), none of these systems satisfied all of the requirements stated above. Wormulon was developed to satisfy our task-specific needs. This workflow pipeline helps UVA-EpiHiper to handle computational heterogeneities and provide real-time epidemiological modeling capability.
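The requirement for task-dependency graphs with on-the-fly task creation can be illustrated with a toy executor. This is a minimal single-process Python sketch and makes no claim about Wormulon's actual interface: the `run_tasks` function, the task dictionary layout, and the task names are all hypothetical. A task runs once its dependencies have finished, and a task may return new tasks, as a calibration step might after inspecting simulation results.

```python
def run_tasks(tasks):
    """Run tasks in dependency order.

    `tasks` maps a task name to a pair (dependencies, function).
    A function may return a dict of new tasks in the same format,
    which are added to the pending set (dynamic task creation).
    Returns the list of task names in execution order.
    """
    pending = dict(tasks)
    done = set()
    order = []
    while pending:
        # A task is ready when all of its dependencies have completed.
        ready = [name for name, (deps, _) in pending.items() if set(deps) <= done]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for name in ready:
            _, function = pending.pop(name)
            new_tasks = function() or {}
            done.add(name)
            order.append(name)
            pending.update(new_tasks)
    return order


def simulate():
    """Placeholder for a simulation job; creates no new tasks."""
    return None


def evaluate():
    """Placeholder for a calibration check that requests one more run."""
    return {"simulate_1": (["evaluate"], simulate)}
```

Calling `run_tasks({"simulate_0": ([], simulate), "evaluate": (["simulate_0"], evaluate)})` executes the initial simulation, then the evaluation, and finally the simulation that the evaluation created. A production pipeline would additionally dispatch ready tasks to cluster schedulers and handle failures, which this sketch omits.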
Fig. 7. Spatial heterogeneity — epicurves: Continuing from Fig. 6, we compare disease spread over time in a state with that in top three counties by population size. Plots are for
a subset of cascades. The x-axis of each plot shows days from simulation starting date (2022-02-13); the y-axis shows new infection counts per 100 K individuals. The title of each
plot contains the cascade number followed by correlation coefficients between the state infection counts and the county infection counts in the descending order of populations.
Plots for scenarios A and C are in the appendix.
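The correlation coefficients reported in Figs. 6 and 7 compare the state epicurve with each county epicurve. The following is a minimal Python sketch of that comparison, assuming each epicurve is a list of daily new infection counts of equal length; the function names and the input layout are hypothetical and are not taken from the UVA-EpiHiper analysis code.

```python
from math import sqrt


def per_100k(counts, population):
    """Normalize a series of infection counts to counts per 100 K individuals."""
    return [100_000 * c / population for c in counts]


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long time series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def state_county_correlations(state_curve, county_curves):
    """Correlate the state epicurve with each county epicurve.

    `county_curves` maps a county name to its epicurve."""
    return {name: pearson(state_curve, curve) for name, curve in county_curves.items()}
```

Because Pearson's coefficient is invariant to positive rescaling, the normalization to counts per 100 K affects the plotted magnitudes but not the correlation values. Repeating the computation for each cascade gives the distribution of coefficients over replicates.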
5. Example: Scenario modeling hub round 13

Round 13 of the Scenario Modeling Hub (SMH) focused on modeling the epidemiological implications of waning immunity from previous infections and vaccinations, and emergence of a new significant SARS-CoV-2 variant. It was performed in March 2022 and aimed to project epidemic outcomes from mid-March 2022 to mid-March 2023.
Overall Round 13 models four scenarios: (𝑖) Scenario A: optimistic immunity waning with no new variant emergence; (𝑖𝑖) Scenario B: optimistic immunity waning with emergence of variant X; (𝑖𝑖𝑖) Scenario C: pessimistic immunity waning with no new variant emergence; and (𝑖𝑣) Scenario D: pessimistic immunity waning with emergence of variant X. Immunity includes protections gained from infection and vaccination. Waning is modeled as a transition from an immune state to a partially immune state. In the optimistic (or pessimistic) scenarios, the transition takes place after a median time of 10 (or 4) months with 40% (or 60%) reduction in protection compared to the baseline immunity. Variant X was assumed to have the same transmissibility and severity as existing variants. But an individual with immunity against existing variants has a 30% larger risk to be infected by X. A detailed description of the scenarios can be found at Scenario Modeling Hub (2022). Calibration results are shown in Fig. 9.
The spread pattern of a contagion is a result of the disease characteristics (such as virulence and variants) and the dynamics of the population affected by it (immunity levels, nature of interactions, epidemic response, etc.). To better understand the complex phase space of the agent-based model, we conducted several fine-grained analyses at different scales (node- and node-subset-level) across different Round 13 scenarios and different networks (state- and county-level). To this end, we apply the graphical viewpoint of simulation outputs; each instance of the simulation output can be viewed as cascade graph ensembles, where each cascade (Newman, 2003) is a highly structured who-infected-who graph with rich structural (network related) and dynamical (disease and disease response related) attributes. We study
Fig. 8. Impact of intervention complexity on computation time in experiments I–VIII (see text) factored by the main simulation tasks which include ■ intervention, ■ transmission,
■ update, ■ synchronization, ■ output, and ■ initialization. The unit on the 𝑦-axis is seconds. Clearly time spent on ■ intervention increases with the complexity of interventions
involved in the experiment. Specifically, contact tracing interventions (CTD1, CTD2) significantly increase computational complexity of simulations. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 9. Simulated data (black) vs. target data (red) in the calibration period for each state. Figure shows how well calibration result fits the time series of case counts from
surveillance (The New York Times, 2023). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
how heterogeneity in space, age, and socioeconomic status manifests in the spread patterns.

Summary of findings. By analyzing the projected epidemic outcomes in different scenarios of Round 13, as well as detailed simulated cascades, we find that heterogeneities in our populations and networks, disease models, interventions, and initialization data do lead to heterogeneities in the outcomes: (Section 5.1) aggregate disease outcomes are different between states or at sub-state level in terms of magnitude, trends, or impact of scenario axes considered by the SMH, and differences in network structures between states may have contributed to the differences in disease outcomes; (Section 5.2) disease transmission is dominated by interactions between children which mostly occur in schools. We argue that even when an agent-based model like UVA-EpiHiper produces results at aggregate levels similar to a compartmental or metapopulation model, it provides the possible trajectories, with individual level details, among all trajectories that lead to the same aggregate outcomes. The cascade data generated by UVA-EpiHiper simulations is an example. This enables one to analyze not only what happens but also how it happens, and to obtain unique insight regarding potential interventions to mitigate disease spread.

5.1. Heterogeneities between and within states

Epidemic outcomes over time. We first analyze how heterogeneity between and within different state networks impacts the projected epidemic outcomes. In Fig. 4 we show projected cumulative number of infections over time for all scenarios in each US state. We observe that different states have different trends (e.g. between California and Oregon), different magnitude (e.g. between California and Washington) even after normalization to counts per 100 K population, different impact
Fig. 10. Decision tree for impacts of immunity waning and emergence of variant X on state level cumulative attack rate. The model includes features on demographics, network
structure, initial immunity level, and vaccine coverage, and is fitted using rpart (Therneau and Atkinson, 2023). Figure (generated by rpart.plot (Milborrow, 2024)) shows
that between-state heterogeneity can be partly explained by average household size, average contact duration (the total hours each individual has with all their contacts, averaged
over all individuals), and initial immunity level (the immunity each individual has at the beginning of projection period averaged over all individuals).
of waning (e.g. minimal gap between scenarios A and C in Maryland but significant difference between A and C in Vermont), or different impact of new variant (e.g. minimal gap between scenarios A and B in Vermont but significant difference between A and B in Maryland).
Based on whether waning and/or new variant X have significant impacts on the cumulative infections (normalized by population size), we categorize the states into four classes: neither has a significant impact, only variant X does, only waning does, and both have significant impacts. We fit a decision tree using features on demographics (average age, average household size), network structure (average contact duration), initial immunity level, and vaccine uptake. We find that the most important predictors are average household size, average contact duration, and initial immunity (see Fig. 10).
On the other hand, in Fig. 5 we observe that different health districts in Virginia seem to have similar trends of cumulative infections over time, as well as similar impact of waning and new variant.[1] But we do observe different magnitude in infection numbers in different health districts even after normalization to counts per 100 K people: higher in Northern Virginia but much lower in the southwest districts.
[1] It is to be noted that while the team had access to heterogeneous vaccine uptake data within Virginia, these were not used for the SMH models, to ensure uniform approach across states. This could partially explain the apparent similarity across Virginia health districts.

Detailed comparison between two states. The heterogeneity in population distributions across the study region induces subgraphs of varying density in the contact network. How does the spread in dense subgraphs compare with the spread in the whole network? In this analysis, we consider Massachusetts (MA) and Kansas (KS) which, comparatively, have very different county population distributions. MA has one large county, with population around double that of the next largest county, while the top two counties of KS are comparable in size. We analyze the evolution of the infection count over time in the state and in each of the top three counties, and compare them with each other. For scenarios B and D, both of which correspond to emergence of a new variant, we show within-state correlations in Fig. 6 and projected epidemic curves in Fig. 7. The plots for scenarios A and C are in Figs. 11 and 12 in the appendix.
In Fig. 6 we observe that, in both networks and for both scenarios, the state counts are generally more correlated with the top county than with others. For the MA network, we generally observe that the infection curves in the top three counties are representative of the trend for the entire state. One reason for this could be that all these counties are geographically very close to each other (around the Boston area), and therefore, there is a lot of mixing between the populations of these areas. Fig. 13 (in the appendix) shows high edge density between the top county (Middlesex) and the other two counties.
In the case of KS, we observe a similar trend, but there are instances of widely differing infection curves. See for example cascades 2, 4, 7, 8, and 9 in Fig. 7(d). Unlike MA, the top counties of KS are geographically far. Fig. 13 shows relatively low edge density across counties when compared with MA. In Fig. 6, many outliers can be observed for KS in both Scenarios B and D. This can be attributed to the spatial uncertainty in the weekly importation rates of the new variant in the case of Scenarios B and D. In the case of Scenarios A and C, we generally observe very high correlation values.

5.2. Heterogeneities between demographic groups

Here we study the transmissions between and within subpopulations induced by different age groups. Again we consider the KS and MA networks. We partition each state population into three subpopulations: children 𝑐 (0–17), adults 𝑎 (18–64), and elderly 𝑔 (65+). Fig. 14 (in the appendix) shows the number of infections per cascade by age group relative to the subpopulation sizes across scenarios. We first note that in both the KS and MA networks, approximately 60% of nodes are adults and 25% are children. However, the number of infection events (including reinfections) does not reflect the subpopulation sizes. The share of infections for the children is around 40% while this subpopulation only constitutes around 25% of the total population. The higher attack rate in the children, compared with the other groups, can be explained by the initial immunity level and node degrees in the contact network. Children have higher attack rates than adults due to lower initial immunity levels in children (see Fig. 15(a)). On the other hand, children have higher attack rates than elderly due to higher network degrees of children nodes (see Fig. 15(b)).
We use labeled path motif counts to characterize and quantify the transmissions. For example, a child-infecting-adult event is denoted by 𝑐 → 𝑎. We also consider longer chains of transmission, such as paths of length two. The results of the KS network are in Fig. 16 in the appendix. We observe that transmissions within and across
Fig. 11. Continued from Fig. 6: Spatial heterogeneity — correlation plots: Comparing the overall spread in a state with the spread in top three counties by population. Two
scenarios (A and C) and two states (MA and KS) are considered for comparison. In each case, 50 replicates are considered. The plots are ordered as follows: state followed by top
three counties ordered by decreasing population size. The diagonal plots show the histogram of number of infections per 100 K individuals over the simulation. The off-diagonal
plots show the distribution of Pearson’s correlation coefficient of the infection time series (epicurves) corresponding to each pair of regions.
subpopulations of children and adults dominate the counts. These correspond to mainly school, workplace, and household transmissions. On average, the number of 𝑐 → 𝑐 counts (and 𝑐 → 𝑐 → 𝑐 counts among paths of length 2) are the highest, mainly corresponding to school and household transmissions. Many of the transmissions to the elderly are through household contacts. Note that the number of 𝑔 → 𝑎 → 𝑔 counts is much higher than 𝑔 → 𝑐 → 𝑔. The former count corresponds to transmissions in households as well as in assisted living facilities, while the latter corresponds to only household transmissions. We have similar observations for the MA network in Fig. 17 in the appendix.

6. Discussion

In Section 1, we raised two broad questions that naturally arise when developing and using ABMs such as UVA-EpiHiper: (𝑖) when are such models needed and (𝑖𝑖) how does one validate such models. Both are important questions; we discuss them briefly below.

Role of ABMs in epidemic scenario modeling. Modeling environments such as UVA-EpiHiper (also see other papers in this issue on use of agent-based models (Moore et al., 2024; Pillai et al., 2023; Rosenstrom et al., 2024)) provide diversity to the pool of models used in the SMH. They are often computationally resource intensive and challenging from the standpoint of software maintenance and model enhancements. On the other hand, such models allow one to: (𝑖) get a more detailed picture of the epidemic spread and incorporate the diverse data sets that are often available; (𝑖𝑖) look at the model output in ways that are not typically possible in compartmental and statistical models, e.g. looking at the transmission trees to understand chains of transmission; and (𝑖𝑖𝑖) study the impact of the epidemic and intervention on any desired subgroup so far as it is represented in the digital twin. The last part is important: the basic premise is that models do not have to be made for every question that might arise but a more general model can be used. For instance, a simple compartmental model might not have age-stratified population. Making inferences on age
Fig. 12. Spatial heterogeneity — epicurves: continued from Fig. 7. We compare disease spread over time in a state with that in top three counties by population size. The x-axis
of each plot shows days from simulation starting date (2022-02-13); the y-axis shows new infection counts per 100 K individuals. Plots are for a subset of cascades. The title of
each plot contains the cascade number followed by correlation coefficients between the state infection counts and the county infection counts in the descending order of population
size. Plots for Scenarios B and D are in the main text.
Fig. 13. Edge densities within and across counties. For the non-diagonal entries, the plot shows the ratio of number of edges between two counties to the population of the county
in the row. For the diagonal entries, it is the ratio of the number of edges in the county to the population of the county.
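The edge density ratios described in the caption of Fig. 13 can be computed as in the sketch below. It is a minimal Python illustration, assuming an undirected contact network given as a list of node pairs (each edge listed once), a mapping from node to county, and county population sizes; these input structures and the function name are hypothetical.

```python
from collections import Counter


def edge_density(edges, node_county, population):
    """Return a dict keyed by (row county, column county).

    Off-diagonal entry: number of edges between the two counties divided
    by the population of the row county. Diagonal entry: number of edges
    within the county divided by its population."""
    counts = Counter()
    for u, v in edges:
        cu, cv = node_county[u], node_county[v]
        counts[(cu, cv)] += 1
        if cu != cv:
            # An undirected edge contributes to both (cu, cv) and (cv, cu).
            counts[(cv, cu)] += 1
    return {
        (row, col): counts[(row, col)] / population[row]
        for row in population
        for col in population
    }
```

Because each entry is normalized by the population of the row county, the resulting matrix is generally not symmetric, even though the underlying edge counts are.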
Fig. 14. Fraction of infections per age group and its comparison with population sizes. For each age group, we show the number of infections as a fraction of total infections in different scenarios. About 40% of all infections belong to each of 𝑐 and 𝑎, while 20% belong to 𝑔. We also show the number of individuals in each age group as a fraction of total population. Adults (𝑎) clearly have a lower attack rate relative to their group size. This is true in both networks and all scenarios.
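The comparison in Fig. 14 amounts to dividing each group's share of infections by its share of the population. The short Python sketch below works through the approximate figures quoted in the paper (infection shares of about 40%, 40%, and 20%; population shares of about 25% children and 60% adults). The 15% population share for the elderly is an assumption, taken as the remainder, and the function name is hypothetical.

```python
def relative_attack_rate(infection_share, population_share):
    """Ratio of infection share to population share per group.

    A value above 1 means the group is over-represented among infections."""
    return {
        group: infection_share[group] / population_share[group]
        for group in infection_share
    }


# Approximate shares from the text; the elderly population share is assumed.
infection_share = {"c": 0.40, "a": 0.40, "g": 0.20}
population_share = {"c": 0.25, "a": 0.60, "g": 0.15}
ratios = relative_attack_rate(infection_share, population_share)
```

With these rounded inputs the ratio is about 1.6 for children, about 0.67 for adults, and about 1.33 for the elderly, which is consistent with adults being under-represented among infections relative to their group size.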
Fig. 15. (a) In all scenarios, initial immunity levels in children and elderly are lower than in adults. (b) Children and adults have more contacts than elderly. The higher attack
rate in children is due to the combined effect from both (a) and (b). Figure shows analysis results for MA. The results for KS are similar.
Fig. 16. Transmission motifs for studying the propagation within and across different subpopulations. In this case, we have plotted labeled path motif counts corresponding to
adults 𝑎, children 𝑐 and elderly 𝑔. We have shown counts of labeled path motifs of lengths 1 and 2 in the transmission cascade. Each row focuses on one subpopulation. The left
column corresponds to how the rest of the population affects the focus population. The center column corresponds to how the focus population affects the rest of the population.
The right column corresponds to how the transmission occurs from focus population to itself. The results are for the KS network.
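Counting labeled path motifs of lengths 1 and 2 in a transmission cascade can be sketched as follows. This is a minimal Python illustration, assuming a cascade is given as a list of (infector, infectee) pairs and each node carries an age-group label 𝑐, 𝑎, or 𝑔; the function name and data layout are hypothetical. As a simplification, the sketch ignores infection times, so with reinfections it does not check that the second transmission of a length-2 path happened after the first.

```python
from collections import Counter, defaultdict


def motif_counts(transmissions, group):
    """Count labeled path motifs in a who-infected-whom cascade.

    Returns two Counters: length-1 motifs keyed by (label_u, label_v)
    for each transmission u -> v, and length-2 motifs keyed by
    (label_u, label_v, label_w) for each chain u -> v -> w."""
    infected_by = defaultdict(list)
    for u, v in transmissions:
        infected_by[u].append(v)
    length1, length2 = Counter(), Counter()
    for u, v in transmissions:
        length1[(group[u], group[v])] += 1
        for w in infected_by.get(v, []):
            length2[(group[u], group[v], group[w])] += 1
    return length1, length2
```

For example, a child who infects a second child, who in turn infects an adult, contributes one 𝑐 → 𝑐 motif, one 𝑐 → 𝑎 motif, and one 𝑐 → 𝑐 → 𝑎 motif.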
Fig. 17. Continued from Fig. 16: Transmission motifs for studying the propagation within and across different subpopulations. In this case, we have plotted labeled path motif
counts corresponding to adults 𝑎, children 𝑐 and elderly 𝑔. We have shown counts of labeled path motifs of lengths 1 and 2 in the transmission cascade. Each row focuses on one
subpopulation. The left column corresponds to how the rest of the population affects the focus population. The center column corresponds to how the focus population affects the
rest of the population. The right column corresponds to how the transmission occurs from focus population to itself. The results are for the MA network.
stratified populations would need compartmental models to include age stratification. But now if one wants to understand the heterogeneity in space, the model will have to add compartments for say each county and each age group. In agent-based models such as UVA-EpiHiper, one gets this for free and it is largely a question of analyzing the output data rather than constantly changing the model structure. Another use of such models in our opinion is that they provide an impetus to collect highly resolved data sets. The advent of personalized digital devices has already facilitated collection of highly detailed and personalized data sets. We believe these data sets can be used to initialize and calibrate ABMs leading to a more nuanced picture of the epidemic outbreak, see Grekousis and Liu (2021), Aleta et al. (2020), Cencetti et al. (2021), Vogt et al. (2022) for further discussion on this topic.

Validating agent-based epidemic models used for scenario modeling. Validation of complex systems and large ABMs is challenging as well (Carley, 2017; Adiga et al., 2019; Popper, 2005; Oreskes, 2003; Forrester, 1971; Senge and Forrester, 1980; Oreskes, 2018). When using such models for scenario modeling, we can consider three components of validation: (𝑖) Data (or external) validation: comparing model output data with real life, in-situ, and in-vivo measurements where state-space predictions by the model are matched with measured data; (𝑖𝑖) Structural validation: ensure that local functions or rules used to represent agent (component) interaction, behavior, and decision-making are correct and adequate; and (𝑖𝑖𝑖) Functional validation: the model should reproduce global well-known structural features of the complex system that is being modeled. In general, data validation alone is not adequate for large scale ABMs, where data matching exercises are usually postdictions of historical information such as matching an epidemiological model output to an infection time series of the 1968 flu season. While useful, such examination can also be misleading and inadequate. In general, postdiction is challenging for SMH; see Runge et al. (2023) for further discussions. Nevertheless, one way to do this is to consider synthetic data based scenarios. We have begun initial discussion on such scenarios as a future SMH round. Beyond retroactive and predictive validity, external validity should also reproduce important features of the state-space of the complex system that is being modeled. Over the last two decades, we have developed a formal computational theory of coevolving graphical dynamical systems (CGDS) (Adiga et al., 2019). The theory allows us to address questions related to structural validation; e.g. comparing two simulations, ensuring the networks are synthesized correctly, etc. Nevertheless this remains an active area of research and much needs to be done in terms of developing formal methods.

Limitations. The heterogeneities in an agent-based model are limited by the heterogeneities in the input data. For example, health disparities between race and ethnicity groups can be modeled directly by UVA-EpiHiper, but meaningfully only if we have relevant data, such as surveillance of cases and deaths and vaccine coverage in each racial/ethnic group. We did not model racial/ethnic disparities in SMH rounds until the recent equity round, where we have race/ethnicity specific data for California and North Carolina to calibrate our model. In SMH work, for the calibration of UVA-EpiHiper we have mainly used surveillance data on confirmed cases as the target. As the collection of such data was discontinued by The New York Times (The New York Times, 2023) in March 2023 and other agents (e.g. The Johns Hopkins Coronavirus Resource Center (The Johns Hopkins Coronavirus Resource Center, 2023)), we have to look for other data sources as the calibration target. Wastewater surveillance data is an option. We plan to expand our digital twin with an additional layer of wastewater surveillance and update the UVA-EpiHiper pipeline to integrate wastewater data-based calibration.

7. Concluding remarks

We described the UVA-EpiHiper modeling framework that has been used to support the US COVID-19 SMH over the last 2.5 years. UVA-EpiHiper and UVA-adaptive (Porebski et al., 2024) were two models used by the UVA team throughout the pandemic. The development, extension, and use of the model as the pandemic evolved required significant efforts by the team. New problems arose as the pandemic evolved and in general the models had to be updated constantly. The SMH played an important role in guiding the development of the model. In each round
J. Chen et al. Epidemics 48 (2024) 100779
the scenarios were novel and required new capabilities to be added to the modeling environment. The lively discussions that took place on Fridays proved invaluable in this regard.

As we move forward, the UVA-EpiHiper modeling framework will need to be enhanced for new questions that are likely to arise. This includes: (i) further improving the performance of the system; (ii) new capabilities in terms of modeling multi-network dynamical processes (e.g., modeling mask wearing or hesitancy and its coevolution at the individual level); (iii) taking new data sources into account to improve model calibration; (iv) modeling inter-state disease transmission; and (v) network-aware initializations that take into account different susceptibility levels of nodes due to their network properties (e.g., degree).

CRediT authorship contribution statement

Jiangzhuo Chen: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Parantapa Bhattacharya: Data curation, Formal analysis, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Stefan Hoops: Data curation, Formal analysis, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Dustin Machi: Funding acquisition, Resources, Software, Writing – review & editing. Abhijin Adiga: Data curation, Formal analysis, Validation, Visualization, Writing – original draft, Writing – review & editing. Henning Mortveit: Data curation, Formal analysis, Validation, Visualization, Writing – original draft, Writing – review & editing. Srinivasan Venkatramanan: Data curation, Funding acquisition, Visualization, Writing – review & editing. Bryan Lewis: Conceptualization, Investigation, Methodology, Writing – review & editing. Madhav Marathe: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the members of UVA Biocomplexity Institute and UVA Research Computing. We also thank all members of the SMH for lively discussions throughout the last few years; these discussions were helpful in creating many of the extensions reported in the paper. In particular, we thank Cecile Viboud for carefully going through our model during the consortium's own "peer review process". Her questions and suggestions were thoughtful and helped us improve our models a great deal. Finally, we thank staff members at the PSC, Purdue Supercomputing center and UVA Research Computing for their exceptional help and support over the past three years to ensure that our studies could be done in a timely manner. This work was supported in part by the following grants: University of Virginia Strategic Investment Fund (Award Number SIF160), National Science Foundation Grants CCF-1918656 (Expeditions), OAC-1916805 (CINES), IIS-1955797, VDH, United States Grant PV-BII VDH COVID-19 Modeling Program VDH-21-501-0135, DTRA subcontract/ARA S-D00189-15-TO-01-UVA, NIH 2R01GM109718-07, and CDC MIND cooperative agreement U01CK000589. This work used resources, services, and support from the COVID-19 HPC Consortium (https://covid19-hpcconsortium.org/), a private–public effort uniting government, industry, and academic leaders who are volunteering free compute time and resources in support of COVID-19 research, as well as computing resources provided by the NSF XSEDE program.

Appendix

A.1. EpiHiper: HPC-enabled simulation platform for agent-based epidemic models

The UVA-EpiHiper modeling process was conducted with a clear path towards a highly efficient and scalable implementation. It is based on our simulation engine, EpiHiper, which can routinely handle populations of size 10^8 to 10^9 and their detailed contact networks. The EpiHiper software architecture is a hybrid MPI/OpenMP design that is implemented in C++ for high performance. The contact networks can be represented either as text or binary files, with the option to perform pre-partitioning for the desired target combinations of compute nodes and cores. The population (vertices) and their contacts (edges) can be equipped with customizable traits which are exposed to EpiHiper through a Postgres database. This database, which can be shared among computational experiments, has been finely tuned to handle a large number of simultaneous queries, particularly as they occur at the initialization stage of large EpiHiper compute jobs. Similarly, EpiHiper supports a location trait database. As detailed in Section 2.1, each edge is associated with a location, and locations can be augmented with attributes and presented to EpiHiper through this database. This provides a flexible approach to modulating transmission by location (or location type) and constructing highly location-specific interventions. The same applies to interventions cast in terms of the person/edge trait database.

One of the key design choices that sets EpiHiper apart from other epidemic simulation tools (e.g., Bershteyn et al., 2018; Ferguson et al., 2020; Hinch et al., 2021; Kerr et al., 2021; Shattock et al., 2022) is that the disease models and interventions are specified externally. Many tools support configuration of existing models and interventions, but new disease models or interventions require adding new code to the simulation code base. The design decision for EpiHiper was to cleanly disentangle this aspect and, at least in principle, lower the barrier to use by removing the need for the user to have programming skills or to understand the software design and implementation when encountering such cases. Furthermore, while initializing the simulation, extensive checks are performed to ensure that the contact network, the person and location trait database, the contagion model, the initialization, and the interventions are consistent and that all required operations can be performed at runtime. If validation fails, a detailed message pinpointing the location of the problem in the configuration files is generated to aid troubleshooting.

A.2. Semantics of interventions in UVA-EpiHiper

Semantics of intervention blocks. The operations within the once block are executed whenever the trigger condition C holds, even if the target set is empty. This block is used to set variables that are not attached to elements of the intervention target (e.g., the number of available vaccines on a given day). All operations within the foreach block are applied to the matching variables of the target elements. Aspects such as compliance are handled through the sampling block: several sampling methods are supported where operations are applied to the sampled and nonsampled sets. We note that recursive application of operation ensembles is supported in the sampling control structure.

Semantics of operations. The syntax of operations is provided in Listing 1. In an operation, a variable is assigned the value of an expression, the assignment being scheduled for execution using a non-negative offset delay relative to the current time step. The assignment may optionally be assigned an integer priority (default value 0) and a condition (default value True), which is a Boolean expression that must hold at the scheduled execution time for the assignment to be carried out.
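To make the operation semantics in this appendix concrete, the following is a minimal Python sketch. It is not EpiHiper code (EpiHiper is implemented in C++ and configured externally). It schedules assignments with a non-negative delay, an integer priority (default 0), and a condition (default True), and processes them by scheduled time and then by priority, with a random order among equal priorities. The names Operation, Scheduler, and demo, the dictionary-based state, the choice that a smaller priority value is processed first, and the evaluation of the expression at processing time are all illustrative assumptions; the text does not specify them.

```python
import heapq
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def always_true(state: State) -> bool:
    """Default condition: the assignment is always allowed."""
    return True


# The five assignment operators that appear in the operation grammar.
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "=": lambda old, value: value,
    "*=": lambda old, value: old * value,
    "/=": lambda old, value: old / value,
    "+=": lambda old, value: old + value,
    "-=": lambda old, value: old - value,
}


@dataclass
class Operation:
    """Hypothetical container for one scheduled assignment."""
    variable: str
    operator: str
    expression: Callable[[State], float]
    delay: int = 0
    priority: int = 0
    condition: Callable[[State], bool] = always_true


class Scheduler:
    """Hypothetical scheduler illustrating the ordering rules (not EpiHiper's API)."""

    def __init__(self, state: State, seed: int = 0) -> None:
        self.state = state
        self.now = 0
        self._rng = random.Random(seed)
        self._counter = itertools.count()
        self._queue: List[Tuple[int, int, float, int, Operation]] = []

    def schedule(self, op: Operation) -> None:
        if op.delay < 0:
            raise ValueError("delay must be non-negative")
        if op.operator not in OPERATORS:
            raise ValueError(f"unknown operator {op.operator!r}")
        # Sort key: execution time, then priority (ASSUMPTION: smaller first),
        # then a random tie-breaker. The counter only prevents the heap from
        # ever comparing two Operation objects.
        entry = (self.now + op.delay, op.priority, self._rng.random(),
                 next(self._counter), op)
        heapq.heappush(self._queue, entry)

    def step(self) -> None:
        """Process all operations due at the current time step, then advance."""
        while self._queue and self._queue[0][0] <= self.now:
            op = heapq.heappop(self._queue)[-1]
            # The condition is checked at processing time, not at scheduling time.
            if op.condition(self.state):
                old = self.state[op.variable]
                # ASSUMPTION: the expression is also evaluated at processing time.
                value = op.expression(self.state)
                self.state[op.variable] = OPERATORS[op.operator](old, value)
        self.now += 1


def demo() -> State:
    state: State = {"vaccines": 100.0}
    sched = Scheduler(state)
    # Both operations are due at time step 1.
    sched.schedule(Operation("vaccines", "-=", lambda s: 30.0, delay=1, priority=0))
    sched.schedule(Operation("vaccines", "=", lambda s: 0.0, delay=1, priority=1,
                             condition=lambda s: s["vaccines"] < 50.0))
    sched.step()  # time 0: nothing is due yet
    sched.step()  # time 1: 100 - 30 = 70, then the reset is skipped (70 >= 50)
    return state
```

Calling demo() leaves vaccines at 70: the decrement is processed first, and the reset is skipped because its condition is false at the moment it is processed. This is the kind of conflict resolution that priorities and conditions are meant to give the intervention designer.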
Operation execution. All operations enter a priority queue which is sorted first by scheduled execution time and second by priority. Within a time step, all operations scheduled are processed in priority order. Collections of operations of equal priority are processed in random order. Finally, an operation is executed only if its condition is true at the time of processing. Priorities and conditions allow for fine-grained conflict resolution. Regarding the processing order of interventions, one needs to pay careful attention when designing interventions where the order of operations may matter. It is the responsibility of the user constructing the (set of) interventions to assign priorities and conditions to ensure interventions are applied in the intended order.

Set construction. UVA-EpiHiper interventions can target any subset of individuals or edges. Set elements can be selected by internal attributes, which may be user defined (traits). Furthermore, the external PostgreSQL database allows a user to define arbitrary properties of individuals and locations. Association between UVA-EpiHiper and the database is achieved through unique IDs. Thus sets created through internal attributes or external properties can be combined through set operations (intersection and union). This allows the user to construct any subset of individuals or edges, which may be used as targets or in triggers.

Listing 1: The UVA-EpiHiper block structure for operations used in interventions expressed in grammar form.

    operationEnsemble :=
        once <operationList>
        foreach <operationList>
        sampling <samplingSpecification>
            sampled <operationEnsemble>
            nonsampled <operationEnsemble>

    samplingSpecification :=
        (
            relativeSampling ( individual | group ) <percentage> |
            absoluteSampling <integer>
        )

    operationList := <operation>+

    operation := <variable> <operator> <expression> \
        delay(<integer>) \
        [ priority(<integer>) ] \
        [ condition(<bool_expression>) ]

    operator := ( = | *= | /= | += | -= )

References

Adiga, Abhijin, Barrett, Chris, Eubank, Stephen, Kuhlman, Chris J., Marathe, Madhav V., Mortveit, Henning, et al., 2019. Validating agent-based models of large networked systems. In: 2019 Winter Simulation Conference. WSC, IEEE, pp. 2807–2818.
Aleta, Alberto, Martin-Corral, David, Pastore y Piontti, Ana, Ajelli, Marco, Litvinova, Maria, Chinazzi, Matteo, Dean, Natalie E., Halloran, M. Elizabeth, Longini, Jr., Ira M., Merler, Stefano, et al., 2020. Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19. Nat. Hum. Behav. 4 (9), 964–971.
Barrett, Chris, Beckman, Richard, Bisset, Keith, Chen, Jiangzhuo, DuBois, Thomas, Eubank, Stephen, Kumar, V.S. Anil, Lewis, Bryan, Marathe, Madhav V., Srinivasan, Aravind, et al., 2012. Optimizing epidemic protection for socially essential workers. In: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium. pp. 31–40.
Barrett, Christopher L., Bisset, Keith R., Eubank, Stephen G., Feng, Xizhou, Marathe, Madhav V., 2008. EpiSimdemics: An efficient algorithm for simulating the spread of infectious disease over large realistic social networks. In: SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. pp. 1–12.
Barrett, Chris, Bisset, Keith, Leidig, Jonathan, Marathe, Achla, Marathe, Madhav, 2011a. Economic and social impact of influenza mitigation strategies by demographic class. Epidemics 3 (1), 19–31.
Barrett, Christopher L., Eubank, Stephen, Marathe, Achla, Marathe, Madhav V., Pan, Zhengzheng, Swarup, Samarth, 2011b. Information integration to support model-based policy informatics. Innov. J. 16 (1).
Barrett, Christopher, Eubank, Stephen, Marathe, Achla, Marathe, Madhav, Swarup, Samarth, 2015. Synthetic information environments for policy informatics: a distributed cognition perspective. In: Governance in the Information Era. Routledge, pp. 285–302.
Beckman, Richard J., Baggerly, Keith A., McKay, Michael D., 1996. Creating synthetic baseline populations. Transp. Res. A 30 (6), 415–429.
Bershteyn, Anna, Gerardin, Jaline, Bridenbecker, Daniel, Lorton, Christopher W., Bloedow, Jonathan, Baker, Robert S., et al., 2018. Implementation and applications of EMOD, an individual-based multi-disease modeling platform. Pathog. Dis. 76 (5).
Bhattacharya, Parantapa, Chen, Jiangzhuo, Hoops, Stefan, Machi, Dustin, Lewis, Bryan, Venkatramanan, Srinivasan, et al., 2023. Data-driven scalable pipeline using national agent-based models for real-time pandemic response and decision support. Int. J. High Perform. Comput. Appl. 37 (1), 4–27.
Bhattacharya, Parantapa, Machi, Dustin, Chen, Jiangzhuo, Hoops, Stefan, Lewis, Bryan, Mortveit, Henning, et al., 2021. AI-driven agent-based models to study the role of vaccine acceptance in controlling COVID-19 spread in the US. In: 2021 IEEE International Conference on Big Data. pp. 1566–1574.
Bisset, Keith R., Chen, Jiangzhuo, Deodhar, Suruchi, Feng, Xizhou, Ma, Yifei, Marathe, Madhav V., 2014. Indemics: An interactive high-performance computing framework for data-intensive epidemic modeling. ACM Trans. Model. Comput. Simul. 24 (1).
Bisset, Keith R., Chen, Jiangzhuo, Feng, Xizhou, Kumar, V.S. Anil, Marathe, Madhav V., 2009. EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. In: Proceedings of the 23rd International Conference on Supercomputing. pp. 430–439.
Bouchnita, Anass, Bi, Kaiming, Fox, Spencer J., Meyers, Lauren Ancel, 2024. Projecting Omicron scenarios in the US while tracking population-level immunity. Epidemics 100746.
Breiman, L., 1984. Classification and regression trees. In: Wadsworth statistics/probability series, Wadsworth International Group, New York.
BuildingFootprintUSA, 2020. https://www.buildingfootprintusa.com/. (Last accessed 1 April 2020).
Carley, Kathleen M., 2017. Validating Computational Models. Technical Report CMU-ISR-17-105, Institute for Software Research, Carnegie Mellon University.
Cattuto, Ciro, Van den Broeck, Wouter, Barrat, Alain, Colizza, Vittoria, Pinton, Jean-François, Vespignani, Alessandro, 2010. Dynamics of person-to-person interactions from distributed RFID sensor networks. PLoS One 5 (7), e11596.
Cencetti, Giulia, Santin, Gabriele, Longa, Antonio, Pigani, Emanuele, Barrat, Alain, Cattuto, Ciro, Lehmann, Sune, Salathe, Marcel, Lepri, Bruno, 2021. Digital proximity tracing on empirical contact networks for pandemic control. Nat. Commun. 12 (1), 1655.
Centers for Disease Control and Prevention, 2023. COVID data tracker. https://covid.cdc.gov/covid-data-tracker/. (Last accessed 24 March 2023).
Chen, Jiangzhuo, Hoops, Stefan, Lewis, Bryan L., Mortveit, Henning S., Venkatramanan, Srini, Wilson, Amanda, 2019. EpiHiper: Modeling and Implementation. Technical Report 2019-003, NSSAC, University of Virginia.
Chen, Jiangzhuo, Marathe, Achla, Marathe, Madhav, 2010. Coevolution of epidemics, social networks, and individual behavior: a case study. In: Advances in Social Computing: Proceedings of the Third International Conference on Social Computing, Behavioral Modeling, and Prediction. pp. 218–227.
Chen, Jiangzhuo, Marathe, Achla, Marathe, Madhav, 2018. Feedback between behavioral adaptations and disease dynamics. Sci. Rep. 8 (1), 12452.
Childers, J. Taylor, Uram, Thomas D., Benjamin, Doug, LeCompte, Thomas J., Papka, Michael E., 2017. An Edge Service for Managing HPC Workflows. In: Proceedings of the Fourth International Workshop on HPC User Support Tools.
Erdős, P., Rényi, A., 1959. On random graphs I. Publ. Math. Debrecen 6, 290–297.
Eubank, Stephen, Guclu, Hasan, Kumar, V.S. Anil, Marathe, Madhav V., Srinivasan, Aravind, Toroczkai, Zoltan, Wang, Nan, 2004. Modelling disease outbreaks in realistic urban social networks. Nature 429 (6988), 180–184.
Feng, Hanhua, Misra, Vishal, Rubenstein, Dan, 2007. PBS: A unified priority-based scheduler. In: Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. pp. 203–214.
Ferguson, Neil, Laydon, Daniel, Nedjati Gilani, Gemma, Imai, Natsuko, Ainslie, Kylie, Baguelin, Marc, et al., 2020. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand.
Forrester, Jay W., 1971. Counterintuitive behavior of social systems. Theory Decis. 2 (2), 109–140.
Gallagher, Shannon, Richardson, Lee F., Ventura, Samuel L., Eddy, William F., 2018. SPEW: Synthetic populations and ecosystems of the world. J. Comput. Graph. Statist. 27 (4), 773–784.
Gillespie, D.T., 1976. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403–434.
Gillespie, Daniel T., 1977. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81 (25), 2340–2361.
Grefenstette, John J., Brown, Shawn T., Rosenfeld, Roni, DePasse, Jay, Stone, Nathan T.B., Cooley, Phillip C., et al., 2013. FRED (A Framework for Reconstructing Epidemic Dynamics): an open-source software system for modeling infectious diseases and control strategies using census-based populations. BMC Public Health 13 (1), 940.
Grekousis, George, Liu, Ye, 2021. Digital contact tracing, community uptake, and proximity awareness technology to fight COVID-19: a systematic review. Sustain. Cities Soc. 71, 102995.
Halloran, M. Elizabeth, Ferguson, Neil M., Eubank, Stephen, Longini, Jr., Ira M., Cummings, Derek A.T., Lewis, Bryan, et al., 2008. Modeling targeted layered containment of an influenza pandemic in the United States. Proc. Natl. Acad. Sci. 105 (12), 4639–4644.
HERE, 2020. http://www.here.com. (Accessed April 2020).
Hinch, Robert, Probert, William JM, Nurtay, Anel, Kendall, Michelle, Wymant, Chris, Hall, Matthew, et al., 2021. OpenABM-Covid19—An agent-based model for non-pharmaceutical interventions against COVID-19 including contact tracing. PLoS Comput. Biol. 17 (7), e1009146.
Hindman, Benjamin, Konwinski, Andy, Zaharia, Matei, Ghodsi, Ali, Joseph, Anthony D., Katz, Randy H., Shenker, Scott, Stoica, Ion, 2011. Mesos: A platform for fine-grained resource sharing in the data center. In: Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation. NSDI, pp. 295–308.
Kerr, Cliff C., Stuart, Robyn M., Mistry, Dina, Abeysuriya, Romesh G., Rosenfeld, Katherine, Hart, Gregory R., et al., 2021. Covasim: an agent-based model of COVID-19 dynamics and interventions. PLoS Comput. Biol. 17 (7), e1009149.
Lum, Kristian, Chungbaek, Youngyun, Eubank, Stephen, Marathe, Madhav, 2016. A two-stage, fitted values approach to activity matching. Int. J. Transp. 4, 41–56.
Merzky, André, Turilli, Matteo, Titov, Mikhail, Saadi, Aymen Al, Jha, Shantenu, 2021. Design and performance characterization of RADICAL-pilot on leadership-class platforms, CoRR abs/2103.00091.
Microsoft, 2020. U.S. building footprints. https://github.com/Microsoft/USBuildingFootprints.
Milborrow, Stephen, 2024. Rpart.plot: Plot 'rpart' models: An enhanced version of 'plot.rpart'.
Mistry, Dina, Litvinova, Maria, Pastore y Piontti, Ana, Chinazzi, Matteo, Fumanelli, Laura, Gomes, Marcelo F.C., et al., 2021. Inferring high-resolution human mixing patterns for disease modeling. Nature Commun. 12 (1), 323.
Moore, Sean, Cavany, Sean, Perkins, T. Alex, Espana, Guido Felipe Camargo, 2024. Projecting the future impact of emerging SARS-CoV-2 variants under uncertainty: modeling the initial omicron outbreak. Epidemics 47, 100759.
Moritz, Philipp, Nishihara, Robert, Wang, Stephanie, Tumanov, Alexey, Liaw, Richard, Liang, Eric, et al., 2018. Ray: A distributed framework for emerging AI applications. In: 13th USENIX Symposium on Operating Systems Design and Implementation OSDI 18. pp. 561–577.
Mortveit, Henning S., Adiga, Abhijin, Barrett, Chris L., Chen, Jiangzhuo, Chungbaek, Youngyun, Eubank, Stephen, et al., 2020. Synthetic Populations and Interaction Networks for the U.S.. Technical Report 2019-025, NSSAC, University of Virginia.
Mossong, Joël, Hens, Niel, Jit, Mark, Beutels, Philippe, Auranen, Kari, Mikolajczyk, Rafael, et al., 2008. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. 5, 1.
NCES, 2021. National Center for Education Statistics. http://nces.ed.gov. (Last accessed December 2021).
Newman, Mark E.J., 2003. The structure and function of complex networks. SIAM Rev. 45 (2), 167–256.
Oreskes, Naomi, 2003. The role of quantitative models in science. Model. Ecosyst. Sci. 13, 27.
Oreskes, Naomi, 2018. Why believe a computer? Models, measures, and meaning in the natural world. In: The Earth Around Us. Routledge, pp. 70–82.
Pillai, Alexander N., Toh, Kok Ben, Perdomo, Dianela, Bhargava, Sanjana, Stoltzfus, Arlin, Longini, Jr., Ira M., Pearson, Carl A.B., Hladish, Thomas J., 2023. Agent-based modeling of the COVID-19 pandemic in Florida. Epidemics 47, 100774.
Popper, Karl, 2005. The Logic of Scientific Discovery. Routledge.
Porebski, Przemyslaw, Venkatramanan, Srinivasan, Adiga, Aniruddha, Klahn, Brian, Hurt, Benjamin, Wilson, Mandy L., Chen, Jiangzhuo, Vullikanti, Anil, Marathe, Madhav, Lewis, Bryan, 2024. Data-driven mechanistic framework with stratified immunity and effective transmissibility for COVID-19 scenario projections. Epidemics 47, 100761.
Prem, Kiesha, Cook, Alex R., Jit, Mark, 2017. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLOS Comput. Biol. 13, e1005697.
Rivers, Caitlin M., Lofgren, Eric T., Marathe, Madhav, Eubank, Stephen, Lewis, Bryan L., 2014. Modeling the impact of interventions on an epidemic of Ebola in sierra leone and liberia. PLoS Curr. 6.
Rocklin, Matthew, 2015. Dask: Parallel computation with blocked algorithms and task scheduling. In: Proceedings of the 14th Python in Science Conference, Vol. 130. Citeseer, p. 136.
Rosenstrom, Erik T., Ivy, Julie S., Mayorga, Maria E., Swann, Julie L., 2024. COVSIM: A stochastic agent-based COVID-19 SIMulation model for North Carolina. Epidemics 46, 100752.
Runge, Michael C., Shea, Katriona, Howerton, Emily, Yan, Katie, Hochheiser, Harry, Rosenstrom, Erik, Probert, William J.M., Borchering, Rebecca, Marathe, Madhav V., Lewis, Bryan, Venkatramanan, Srinivasan, Truelove, Shaun, Lessler, Justin, Viboud, Cécile, 2023. Scenario design for infectious disease projections: Integrating concepts from decision analysis and experimental design. medRxiv.
Salim, Michael A., Uram, Thomas D., Childers, J. Taylor, Balaprakash, Prasanna, Vishwanath, Venkatram, Papka, Michael E., 2019. Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows. https://arxiv.org/abs/1909.08704v1.
Scenario Modeling Hub, 2022. COVID-19 Scenario Modeling Hub Round 13 scenarios. https://github.com/midas-network/covid19-scenario-modeling-hub/blob/master/previous-rounds/README_Round13.md.
Scenario Modeling Hub, 2023a. The COVID-19 Scenario Modeling Hub. https://covid19scenariomodelinghub.org/. (Last accessed 31 December 2023).
Scenario Modeling Hub, 2023b. Flu Scenario Modeling Hub. https://fluscenariomodelinghub.org/. (Last accessed 31 December 2023).
Scenario Modeling Hub, 2023c. RSV Scenario Modeling Hub. https://github.com/midas-network/rsv-scenario-modeling-hub. (Last accessed 31 December 2023).
Scenario Modeling Hub, 2023d. The European COVID-19 Scenario Hub. https://covid19scenariohub.eu/. (Last accessed 31 December 2023).
Senge, Peter M., Forrester, Jay W., 1980. Tests for building confidence in system dynamics models. Syst. Dyn. TIMS Stud. Manag. Sci. 14, 209–228.
Shattock, Andrew J., Le Rutte, Epke A., Dünner, Robert P., Sen, Swapnoleena, Kelly, Sherrie L., Chitnis, Nakul, Penny, Melissa A., 2022. Impact of vaccination and non-pharmaceutical interventions on SARS-CoV-2 dynamics in Switzerland. Epidemics 38, 100535.
Shoukat, Affan, Wells, Chad R., Langley, Joanne M., Singer, Burton H., Galvani, Alison P., Moghadas, Seyed M., 2020. Projecting demand for critical care beds during COVID-19 outbreaks in Canada. Can. Med. Assoc. J. 192 (19), E489–E496.
Socioeconomic Data and Applications Center, 2020. Gridded population of the world (GPW), v4. https://sedac.ciesin.columbia.edu/data/collection/gpw-v4. (Last accessed 1 April 2020).
SocioPatterns, 2023. http://www.sociopatterns.org/. (Last accessed 17 February 2023).
Somers, Mark, 2019. Leiden Grid Infrastructure. https://lgi.tc.lic.leidenuniv.nl/LGI/docs/LGI.pdf.
Srivastava, Ajitesh, 2023. The variations of SIkJalpha model for COVID-19 forecasting and scenario projections. Epidemics 45, 100729.
Tatem, Andrew J., 2017. WorldPop, open data for spatial demography. Sci. Data 4.
The Johns Hopkins Coronavirus Resource Center, 2023. COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. https://github.com/CSSEGISandData/COVID-19.
The New York Times, 2023. Coronavirus (Covid-19) data in the United States. https://github.com/nytimes/covid-19-data. (Last accessed 24 March 2023).
Therneau, Terry, Atkinson, Beth, 2023. Rpart: Recursive partitioning and regression trees.
U.S. Census, 2020a. 2011–2015 5-year ACS commuting flows. (Last accessed April 2020).
U.S. Census, 2020b. Longitudinal Employer-Household Dynamics. https://lehd.ces.census.gov/data/. (Last accessed 26 November 2020).
U.S. Census, 2021a. 2010 Census Urban and Rural Classification. (Last accessed June 2021).
U.S. Census, 2021b. Public Use Microdata Sample (PUMS). (Last accessed 24 May 2021).
U.S. Department of Labor, Bureau of Labor Statistics, 2020. The American Time Use Survey (ATUS). (Last accessed February 2020).
U.S. Department of Transportation, Federal Highway Administration, 2020. The National Household Travel Survey (NHTS). (Last accessed February 2020).
Vavilapalli, Vinod Kumar, Murthy, Arun C., Douglas, Chris, Agarwal, Sharad, Konar, Mahadev, Evans, Robert, et al., 2013. Apache hadoop yarn: Yet another resource negotiator. In: Proceedings of the 4th Annual Symposium on Cloud Computing. pp. 1–16.
Venkatramanan, Srinivasan, Chen, Jiangzhuo, Fadikar, Arindam, Gupta, Sandeep, Higdon, Dave, Lewis, Bryan, Marathe, Madhav, Mortveit, Henning, Vullikanti, Anil, 2019. Optimizing spatial allocation of seasonal influenza vaccine under temporal constraints. PLoS Comput. Biol. 15 (9), e1007111.
Venkatramanan, Srinivasan, Lewis, Bryan, Chen, Jiangzhuo, Higdon, Dave, Vullikanti, Anil, Marathe, Madhav, 2018. Using data-driven agent-based models for forecasting emerging infectious diseases. Epidemics 22, 43–49.
Vogt, Florian, Haire, Bridget, Selvey, Linda, Katelaris, Anthea L., Kaldor, John, 2022. Effectiveness evaluation of digital contact tracing for COVID-19 in New South Wales, Australia. Lancet Public Health 7 (3), e250–e258.
Weber, Eric, Moehl, Jessica, Weston, Spencer, Rose, Amy, Sims, Kelly, 2021. LandScan USA 2020 [Data set]. Oak Ridge National Laboratory. http://dx.doi.org/10.48690/1523373.
Wheaton, William D., Cajka, James C., Chasteen, Bernadette M., Wagener, Diane K., Cooley, Philip C., Ganapathi, Laxminarayana, Roberts, Douglas J., Allpress, Justine L., 2009. Synthesized population databases: A US geospatial database for agent-based models. Methods Rep. (RTI Press) 2009 (10), 905.
Yoo, Andy B., Jette, Morris A., Grondona, Mark, 2003. Slurm: Simple linux utility for resource management. In: Workshop on Job Scheduling Strategies for Parallel Processing. Springer, pp. 44–60.