Evolutionary Effects on Epidemic Spread
Complex Networks
Rashad Eletreby¹, Yong Zhuang¹, Kathleen M. Carley², Osman Yağan¹, and H. Vincent Poor³
arXiv:1810.04545v2 [physics.soc-ph] 2 Nov 2019
¹Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA
²Institute for Software Research, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA
³Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA
November 5, 2019
Abstract
A common theme among the proposed models for network epidemics is the assumption that
the propagating object, i.e., a pathogen (in the context of infectious disease propagation) or a
piece of information (in the context of information propagation), is transferred across the nodes
without going through any modification or evolution. However, in real-life spreading processes,
pathogens often evolve in response to changing environments and medical interventions and
information is often modified by individuals before being forwarded. In this paper, we investi-
gate the evolution of spreading processes, such as infectious diseases or information, in complex
networks with the aim of i) revealing the role of evolutionary adaptations on the threshold,
probability, and final size of epidemics; and ii) exploring the interplay between the structural
properties of the network and the process of evolution. We start by considering the case where
co-infection with different pathogen strains (respectively, different variations of information) is
not possible, i.e., a susceptible individual may only be infected with a single pathogen strain
(respectively, a single variant of the information). In this case, we develop a mathematical
theory that accurately predicts the epidemic threshold and the expected epidemic size as func-
tions of the characteristics of the spreading process, the evolutionary pathways of the pathogen
(respectively, information), and the structure of the underlying contact network. In addition to
the mathematical theory, we perform extensive simulations on random and real-world contact
networks to verify our theory and reveal the significant shortcomings of the classical mathe-
matical models that do not capture evolution. Our results reveal that the classical, single-type
bond-percolation models may accurately predict the threshold and final size of epidemics, but
their predictions on the probability of emergence are inaccurate on both random and real-world
networks. This inaccuracy sheds light on a fundamental disconnect between the classical bond-percolation models and real-life spreading processes that entail evolution. Then, we
consider the case when co-infection is possible, i.e., a susceptible individual who receives simulta-
neous infections with multiple pathogen strains (respectively, multiple variations of information)
becomes co-infected. We show by computer simulations that co-infection gives rise to a rich set of dynamics: it can amplify or inhibit the spreading dynamics and, more remarkably, change the order of the phase transition from second-order to first-order. We investigate the delicate
interplay between the characteristics of co-infection, the structure of the underlying contact
network, and the evolutionary pathways of the pathogen (respectively, information) and reveal
the cases where such interplay induces a first-order phase transition for the expected epidemic
size.
1 Introduction
What causes an outbreak of a disease? How can we predict its emergence and control its progression? Over the past several decades, multidisciplinary research efforts have converged on these questions, aiming to provide a better understanding of the intricate dynamics of disease propagation and accurate predictions of its course [1–12]. At the heart of these research efforts is the development of mathematical models that provide insights into predicting, assessing, and
controlling potential outbreaks [13–16]. The early mathematical models relied on the homogeneous
mixing assumption, meaning that an infected individual is equally likely to infect any other individ-
ual in the population, without regard to her location, age, or the people with whom she interacts.
Homogeneity made it possible to write a set of differential equations that characterize the speed and scale of propagation (in the limit of large population size), providing insights into how the parameters of a
disease, e.g., its basic reproductive number, indicate whether a disease will die out, or an epidemic
will emerge [5, 16].
In real life, however, the spread of a disease is highly dependent on the contact patterns between individuals. In particular, a person may only infect those with whom she interacts, and the number of contacts people have varies dramatically between individuals. These basic observations render the homogeneous mixing models inaccurate, as they tend to underestimate the epidemic size in the initial stages of the outbreak and overestimate it towards the end [17]. As a result of these shortcomings, network epidemics has emerged as a mathematical modeling approach that takes the
underlying contact network into consideration [1, 3, 18–20]. Since then, a large body of research
has looked into the delicate interplay between the structural properties of the contact network and
the dynamics of propagation, leading to accurate predictions of the spatio-temporal progression
of disease outbreaks. In addition to diseases, opinions and information also propagate through
networks in patterns similar to those of epidemics [21]. Hence, research efforts on information
propagation draw on the theory of infectious diseases to model the dynamics of propagation [8, 22–
25]. Throughout, we use the term spreading processes to denote a general class of processes that
propagate in contact networks, such as infectious diseases and information.
A common theme among the proposed models for network epidemics is the assumption that
the propagating object, i.e., a virus or a piece of information, is transferred across the nodes
without going through any modification or evolution [5,22,23,26–31]. However, in real-life spreading
processes, pathogens often evolve in response to changing environments and medical interventions
[9, 32–35], and information is often modified by individuals before being forwarded [36, 37]. In
fact, 60% of the approximately 400 emerging infectious diseases that have been identified since 1940 are zoonotic¹ [39, 40]. A zoonotic disease is initially poorly adapted, poorly replicated, and inefficiently transmitted [41]; hence, its ability to move from animal-to-human transmission to human-to-human transmission depends on the pathogen evolving to a strain that is well adapted to the human host. For instance, genetic variations in some critical genes were reported to be essential for the transition from animal-to-human transmission to human-to-human transmission in the severe acute respiratory syndrome (SARS) outbreak of 2002–2003 [42].
¹ A zoonosis is any disease or infection that is naturally transmissible from vertebrate animals to humans [38].
Similar patterns of evolution are observed in the way information propagates among individuals. On social media platforms, one observes daily how information mutates unintentionally, or perhaps intentionally at the hands of an adversary [36]. At a high level, an individual may mutate the information by exaggeration, hoping for her variant to go viral. Mutations may also occur unintentionally. More broadly, Dawkins [43] argued that ideas and information spread and evolve between individuals in patterns similar to genes, in the sense that they self-replicate, mutate, and respond to selective pressure as they interact with their host. In short, ignoring evolution leads us to underestimate the severity of the epidemic and to miss the intricate interplay between the dynamics of propagation and evolution.
In this paper, we aim to bridge the disconnect between how spreading processes propagate and evolve in real life and the current mathematical and simulation models that do not capture evolution. In particular, we investigate the evolution of spreading processes with the aim of i) revealing
the role of evolutionary adaptations on the threshold, probability, and final size of epidemics; and
ii) understanding the interplay between the structural properties of the network and the process of
evolution. Throughout, we use the term epidemics to denote disease/information outbreaks that
result in a positive fraction of infected individuals in the limit of large network size and self-limited
outbreaks to denote small disease/information outbreaks for which the fraction of infected individu-
als tends to zero in the limit of large network size. We also use the term strain to denote a pathogen
strain in the context of infectious disease propagation, or a particular variation of the information
in the context of information propagation. At a high level, strains represent homogeneous groups
within species [44] and they generally possess unique features such as virulence, infectivity, growth
rate, etc.
In modeling the evolution of spreading processes, we adopt the multiple-strain model that was
introduced by Alexander and Day in [33]. Their model can be briefly outlined as follows (more
details are given in Section 3). Consider a multiple-strain spreading process that starts with an
individual, i.e., the seed, receiving infection (from an external reservoir) with strain-1 of a particu-
lar pathogen (respectively, information). The seed infects each of her contacts independently with
probability T1 , called the transmissibility of strain-1. Once a susceptible individual receives the
infection from the seed, the pathogen may evolve within that new host prior to any subsequent
infections. In particular, the pathogen may remain as strain-1 with probability µ11 or mutate to
strain-2 (that has transmissibility T2 ) with probability µ12 = 1 − µ11 . If the pathogen remains as
strain-1 (respectively, mutates to strain-2) within a newly infected host, then that host infects
each of her susceptible neighbors in the subsequent stages independently with probability T1 (re-
spectively, T2 ). As the process continues to grow, if any susceptible individual receives strain-1,
the pathogen may remain as strain-1 with probability µ11 or mutate to strain-2 with probability
µ12 = 1 − µ11 prior to subsequent infections. Similarly, if any susceptible individual receives strain-
2, the pathogen may remain as strain-2 with probability µ22 or mutate to strain-1 with probability
µ21 = 1 − µ22 prior to subsequent infections. The process continues to grow until no additional
infections are possible. We remark that it is straightforward to extend the model to the general case with m possible strains for some finite integer m ≥ 2. More details are given in Section 4.
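The process described above can be simulated directly on a finite contact network. The sketch below is a minimal illustration, not the authors' simulator, and rests on several assumptions: the graph is given as an adjacency list; the dynamics proceed in discrete generations in which each newly infected host gets exactly one transmission attempt per susceptible neighbor; and a host that receives several infections in the same generation keeps a single strain, chosen with probability proportional to how many infections of each strain it received (the x/(x + y) rule adopted below for the case without co-infection). Strains are 0-indexed, so index 0 is strain-1, and the function name `simulate_multistrain` is hypothetical.

```python
import random
from collections import Counter


def simulate_multistrain(adj, T, mu, seed_node=0, rng=None):
    """Hypothetical sketch of the multiple-strain spreading process.

    adj -- adjacency list: adj[u] is the list of neighbors of node u
    T   -- T[i] is the transmissibility of strain index i (index 0 = strain-1)
    mu  -- row-stochastic mutation matrix: mu[i][j] = P(strain i mutates to j)
    Returns a dict mapping each infected node to the strain it carries
    after within-host mutation.
    """
    rng = rng or random.Random()
    m = len(T)
    carried = {seed_node: 0}  # the seed receives strain-1 from the external reservoir
    frontier = [seed_node]    # hosts that transmit in the current generation
    while frontier:
        # Count, for every susceptible node, the infections of each strain it receives.
        exposures = {}
        for u in frontier:
            s = carried[u]
            for v in adj[u]:
                if v not in carried and rng.random() < T[s]:
                    exposures.setdefault(v, Counter())[s] += 1
        next_frontier = []
        for v, counts in exposures.items():
            strains = list(counts)
            # No co-infection: keep one strain, chosen with probability
            # proportional to the number of infections received (x / (x + y)).
            received = rng.choices(strains, weights=[counts[k] for k in strains])[0]
            # Within-host evolution before any onward transmission.
            carried[v] = rng.choices(range(m), weights=mu[received])[0]
            next_frontier.append(v)
        frontier = next_frontier  # hosts of earlier generations have recovered
    return carried


if __name__ == "__main__":
    rng = random.Random(42)
    n, mean_degree = 500, 4.0
    # Erdős–Rényi graph, used here only as a convenient test network.
    adj = [[] for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < mean_degree / (n - 1):
                adj[a].append(b)
                adj[b].append(a)
    result = simulate_multistrain(adj, T=[0.2, 0.6], mu=[[0.7, 0.3], [0.1, 0.9]], rng=rng)
    print(len(result), Counter(result.values()))
```

Each host is recorded with the strain it carries after within-host mutation, since that is the strain it transmits onward; this bookkeeping choice is ours and is not taken from the paper.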
Note that as multiple strains propagate throughout the population, a susceptible individual may
simultaneously get into infectious contact with neighbors infected with strain-1 as well as neighbors
infected with strain-2. This gives rise to the possibility of a susceptible individual becoming co-
infected with multiple pathogen strains. Indeed, co-infection with multiple pathogen strains is
prevalent in disease-causing protozoa, helminths, bacteria, fungi, and viruses, and is known to have significant implications [44–48]. However, from a mathematical standpoint, the possibility of
co-infections creates phase discontinuities (see Section 7) that render the process mathematically
intractable.
We start by considering the case when co-infection is ignored, meaning that a susceptible in-
dividual may only be infected with a single strain. In particular, a susceptible individual who
simultaneously receives x infections of strain-1 and y infections of strain-2 becomes infected by
strain-1 (respectively, strain-2) with probability x/(x + y) (respectively, y/(x + y)). In this case, we
develop a mathematical theory that draws on the tools developed for analyzing the zero-temperature
random-field Ising model on Bethe lattices [49] as well as on random graphs [50, 51]. Our theory
fully characterizes the process and accurately predicts the epidemic threshold, expected epidemic
size and the expected fraction of individuals infected by each strain (all at steady state). These
metrics are computed as functions of the characteristics of the spreading process (i.e., T1 and T2 ),
evolutionary adaptations (i.e., µ11 and µ22 ), and the structure of the underlying contact network
(e.g., its degree distribution).
In addition to the mathematical theory, we perform extensive simulations on random graphs with arbitrary degree distributions (generated by the configuration model [52–54]) as well as on real-world networks (obtained from the SNAP dataset [55]) to verify our theory and reveal the significant shortcomings of the classical mathematical models that do not capture evolution. In particular, we show that the classical, single-type bond-percolation models [3, 56–58] may accurately predict the threshold and final size of epidemics, but their predictions of the probability of emergence are significantly inaccurate on both random and real-world networks. This inaccuracy sheds light on a fundamental disconnect between the classical single-type bond-percolation models and real-life spreading processes that entail evolution.
We then focus on the case where co-infection is possible. Although recent studies have shown
that co-infection with multiple pathogen strains is prevalent in nature [44–48], there has been
a lack of models that explain its occurrence, reveal its implications, and investigate its delicate
interplay with the underlying contact network. Note that a considerable amount of literature
has examined the case where co-infection with multiple diseases is possible [59–62], yet multiple-
disease co-infection is fundamentally different from multiple-strain co-infection (see Section 2). In
this paper, we use computer simulations to explore the case where multiple-strain co-infection
is possible. In particular, a susceptible individual who gets infected with strain-1 and strain-2
simultaneously becomes co-infected, and starts to transmit the co-infection, i.e., the mixture of the
two strains, with a transmissibility Tco .
The transmissibility Tco could be larger than the maximum of T1 and T2 (e.g., modeling synergistic cooperation between the two resident strains) or smaller than their minimum (e.g., modeling antagonistic competition between the two resident strains), and it may also fall anywhere in between. We show that co-infection gives rise to a rich set of dynamics: it can amplify or inhibit the spreading dynamics and, more remarkably, change the order of the phase transition from second-order to first-order. We investigate the interplay between the characteristics of co-infection, the structure of the underlying contact network, and evolutionary adaptations, and reveal the cases where such interplay induces a first-order phase transition for the expected epidemic size.
Summary: We consider the evolution of spreading processes in complex networks. We start with the case where co-infection is ignored. In this case, we develop a mathematical theory that unravels the relationship between the characteristics of the spreading process, the structure of the underlying contact network, and the process of evolution, thereby providing accurate predictions of the epidemic threshold, expected epidemic size, and the expected fraction of individuals infected by each strain at steady state. In addition to the mathematical theory, we perform extensive simulations on random and real-world networks to verify our theory and reveal the significant shortcomings of the classical mathematical models that do not capture evolution. Then, we use computer simulations to explore the case where co-infection is possible and show that co-infection can change the order of the phase transition from second-order to first-order. We investigate the interplay between the characteristics of co-infection, the structure of the underlying contact network, and evolutionary adaptations, and explain how such interplay controls the order of the phase transition for the expected epidemic size.
Structure: The rest of the paper is organized as follows. In Section 2, we survey the related
work on evolution and co-infection. In Section 3, we present the multiple-strain model for evolution and describe how we model the underlying contact network. In Section 4, we present and derive the main results of this work, while in Section 5, we confirm our theoretical results via computer simulations. In Section 6, we consider evolution on real-world networks obtained from the SNAP dataset [55] and reveal the significant shortcomings of the classical mathematical models that do not capture evolution. We empirically consider the case where co-infection is possible in Section 7. Finally, Section 8 concludes the paper.
2 Related Work
2.1 Evolution of Infectious Diseases
A large body of research has investigated the role of evolutionary adaptations in enabling pathogen
establishment in human populations [9,35,40,42,63,64]. A pronounced example of such evolutionary
adaptations is the emergence of zoonoses. In particular, zoonotic diseases are poorly adapted and
inefficiently transmitted at first [41], yet they may eventually (through evolutionary adaptations)
cross the species barrier and start to spread from human to human. In fact, a key event that
is thought to have caused the emergence of the 1918 H1N1 pandemic is a recombination in the
hemagglutinin gene that resulted in a novel virus with increased virulence [65]. Other evolutionary
adaptations include genetic changes (e.g., Salmonella enterica), recombination or reassortment
(e.g., H5N1 influenza), and hybridization (e.g., Phytophthora alni) [63].
To date, most of the research studies on the evolution of infectious diseases either assume a homogeneous-mixing host population or focus entirely on the ecological or environmental factors of pathogen evolution. Recent advances in network epidemics, however, pave the way for new insights into the delicate interplay between the structural properties of the host contact network and the process of evolution. In what follows, we review the recent
progress in creating a modeling framework that captures the spread and evolution of infectious
diseases on realistic host contact networks.
In [33], Alexander and Day proposed a network-based framework that characterizes the spread
and evolution of an introduced pathogen on a contact network. Their main objective was to
investigate the probability of emergence, and its relation to mutation probabilities, pathogens’
transmissibilities, and the structure of the underlying contact network. Using a multi-type branch-
ing process [66, 67], they derived recursive relations governing the probability of emergence for a
given initial strain of the pathogen. The initial strain was assumed to have a poor transmissibility,
hence, evolution to a strain with sufficient transmissibility was necessary for emergence. Alexan-
der and Day explored the potential risk factors that could lead to such evolutionary emergence
of the pathogen. In particular, they showed that for a given transmissibility, heterogeneity in
network structure can significantly increase the risk of emergence. Moreover, certain mutational
schemes (e.g., reverse mutation) have limited impact on the probability of emergence, while others
(e.g., simultaneous point mutations or recombination) have a dramatic effect on the probability of
emergence.
The framework proposed by Alexander and Day in [33] represents a crucial first step towards
understanding the role of evolutionary adaptations in driving the emergence of infectious diseases,
but it provides no insight into the expected epidemic size (denoted by S) or, more precisely, the expected fraction of individuals infected by each strain (denoted by S1 and S2, respectively). Also, the multi-type branching formalism inherently assumes a tree structure of the underlying graph; hence, co-infection (which mainly occurs due to the existence of cycles) is essentially ignored in their framework. Finally, the results presented in [33] were verified on neither theoretical nor real-world contact networks. Our paper addresses those limitations by means of i) developing a mathematical
theory that characterizes the epidemic threshold, expected epidemic size and the expected fraction
of individuals infected by each strain; ii) validating our results (as well as Alexander and Day’s
results) on theoretical and real-world contact networks; and iii) investigating the case when co-
infection is possible.
When the timescale of evolution is much longer than the timescale of propagation, mutations
might occur after the original pathogen has invaded the population. In [32], Leventhal et al.
considered an SIS process that starts with a pathogen (of single-strain) invading the population.
As the disease reaches an endemic equilibrium, a second strain of the disease appears in a random
infected individual. The authors assumed that co-infection is not possible, i.e., an infected host carries either strain-1 or strain-2, but not both. Moreover, hosts infected by either strain have perfect immunity against the other strain. The authors investigated the probability that the second strain invades the population and drives the resident strain to extinction, i.e., the fixation probability.
Results from both theoretical and real-world networks suggested that the heterogeneity in network
structure (which facilitated the spread of the resident strain) lowers the fixation probability, hence
enhancing the resiliency of the resident strain to invasion by new variants.
In contrast to [32], our paper considers the case when the epidemiological and evolutionary pro-
cesses occur on a similar time scale. In particular, each new infection event entails an opportunity
for mutation, leading to an entirely different model (with a different scope) from the one proposed by
Leventhal et al. in [32]. The model considered in our paper is reasonable for pathogens with long
infectious periods, e.g., HIV, or pathogens with short infectious periods but high mutation rates,
large population sizes, and short generation times, e.g., RNA viruses [68]. Furthermore, Leven-
thal et al. [32] ignore the case where co-infection is possible. However, recent studies revealed the
prevalence of multiple-strain co-infection in disease-causing protozoa, bacteria, and viruses [44–48].
Since humans, animals, plants, and other organisms may become co-infected with multiple diseases, a growing body of research has attempted to explore the emergence of this phenomenon and its consequences in complex networks [59–62]. However, most of these studies focus on the case where co-infection results from simultaneous exposure to multiple diseases (or pathogen species), rather than multiple strains of the same pathogen. In [59], Cai et al. considered the case when two diseases are spreading on the same contact network. A susceptible host that has not been exposed to either disease has probability p to get infected by an infective neighbor; the infection probabilities are the same for both diseases. Infected hosts recover after exactly one time step and gain immunity against the disease that they were infected with, but not against the other disease. A host that has been infected by one disease (whether still active or already recovered) has a probability q (with q > p) to get infected by the other disease, i.e., an infection with one disease weakens the immune system of the infected individual and makes her more susceptible to the second disease. Cai et al. revealed that co-infection dynamics could give rise to a hybrid phase transition, where the probability of emergence exhibits a second-order transition, while the fraction of doubly infected nodes exhibits a first-order transition.
Figure 1: Information mutation on Twitter. A collection of four tweets, (a)–(d), posted within a 10-minute window during the 2011 Arab Spring in Cairo, Egypt. The tweets were posted in response to the same underlying event, namely, the marching of protesters towards the presidential palace in order to force the then president, Mubarak, to resign. Information mutation gave rise to several variants with potentially different consequences. Observe that (a) reports peaceful, traditional demonstrations, while (d) suggests that the country is on the brink of collapse. User names are hidden for anonymity and tweet ids are given instead.
In Section 7, we consider the case where co-infection with multiple strains of the same pathogen
is possible, giving rise to a different class of epidemiological processes than those considered in
[59–62]. Our model is motivated by the recent research findings that revealed the prevalence
of multiple-strain co-infection in disease-causing protozoa, helminths, bacteria, fungi, and viruses
[44–48]. From a modeling standpoint, the key difference between the two processes is that evolution is a prerequisite for co-infection in our model. In particular, the epidemic process in [59] i) does not entail any mutation events and ii) starts with a doubly infected seed, i.e., an infected host that initially carries both diseases. In contrast, our epidemic process starts with a host receiving infection with only one strain of the pathogen, e.g., strain-1; hence, the emergence of other strains (which is dictated by the underlying mutational scheme, transmissibility, and network structure) is a prerequisite for co-infection. Moreover, our co-infection process differs fundamentally in the way a host becomes co-infected. Unlike the model given in [59], we assume perfect cross-immunity, i.e., a host that has recovered from strain-1 develops immunity against both strain-1 and strain-2. Hence, the only pathway for co-infection is when a susceptible host is exposed simultaneously to one or more infections of strain-1 and one or more infections of strain-2.
a power-law distribution for low mutation rates, yet it deviates from the power-law behavior for
high mutation rates. Theoretical predictions based on Yule processes [69] (in the limits of very low
and very large mutation rates) were shown to have a close resemblance to the empirically observed
distributions.
The scope of [36] was limited to one type of propagation, i.e., the copy-paste mechanism, and mutations were only characterized by the edit distance² between a given variant of the meme and its original version. Moreover, the copy-paste mechanism is no longer representative of modern social networks, where individuals have the option to “Share” a post rather than copying and pasting it. In addition, using the edit distance as the sole metric for mutation essentially ignores the semantic differences between two versions of the meme. The theoretical model presented in [36] is technically different from the multiple-strain model [33], yet it resembles a very special case of the latter when i) Ti = 1 for all i = 1, 2, . . .; and ii) the evolutionary pathways are limited to one-step irreversible mutations. Even then, the Yule model was considered in [36] only in the limits of very low and very high mutation rates. In contrast to [36], our paper explores information propagation and evolution from a mathematical modeling perspective, aiming to lay down the foundations for a universal model of information propagation and evolution across a wide variety of social media platforms and evolutionary pathways.
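Footnote 2 defines the edit distance of [36] as the number of character additions and deletions needed to turn one meme variant into another. A minimal sketch of that quantity, assuming plain character strings, uses the identity that the insertion/deletion distance equals len(a) + len(b) − 2·LCS(a, b), where LCS is the length of the longest common subsequence. The name `indel_distance` is hypothetical, and [36] may have computed the distance differently in practice. Unlike the Levenshtein distance, a substitution costs two operations here (one deletion plus one addition).

```python
def indel_distance(a: str, b: str) -> int:
    """Number of single-character additions and deletions that turn a into b."""
    # Longest common subsequence by dynamic programming, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    # Characters outside the common subsequence are deleted from a or added from b.
    return len(a) + len(b) - 2 * lcs


if __name__ == "__main__":
    print(indel_distance("kitten", "sitting"))  # 5 (the Levenshtein distance would be 3)
```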
3 Model Definitions
3.1 A multiple-strain model for evolution
In [33], Alexander and Day proposed a multiple-strain model that accounts for evolution. Their model is captured by two matrices, namely, the transmissibility matrix T and the mutation matrix µ, both with dimensions m × m for a finite integer m ≥ 2 denoting the number of possible strains. The transmissibility matrix T is an m × m diagonal matrix, with Ti representing the transmissibility of strain-i, i.e.,
$$\mathbf{T} = \begin{pmatrix} T_1 & 0 & \cdots & 0 \\ 0 & T_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T_m \end{pmatrix}.$$
The mutation matrix µ is an m × m matrix with µij denoting the probability that strain-i mutates to strain-j. Note that $\sum_j \mu_{ij} = 1$; hence, µ is a row-stochastic matrix. One example of the transmissibility and mutation matrices was given by Antia et al. in [34], where the fitness landscape consisted of m strains, with strains 1 through m − 1 having identical transmissibility such that $R_{0,i} < 1$ for i = 1, . . . , m − 1, with $R_{0,i}$ denoting the basic reproductive number of strain-i. Strain-m has transmissibility Tm such that $R_{0,m} > 1$; hence, the emergence of the pathogen requires evolution from strain-1 to strain-m. Antia et al. considered the so-called one-step irreversible mutation [33, 34], where the pathogen must acquire m − 1 mutations (in order and one at a time) to evolve to strain-m, i.e.,
$$\mathbf{T} = \begin{pmatrix} T_1 & 0 & \cdots & 0 & 0 \\ 0 & T_1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & T_1 & 0 \\ 0 & 0 & \cdots & 0 & T_m \end{pmatrix}$$
and
$$\boldsymbol{\mu} = \begin{pmatrix} 1-\mu & \mu & 0 & \cdots & 0 & 0 \\ 0 & 1-\mu & \mu & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 1-\mu & \mu \\ 0 & 0 & \cdots & 0 & 0 & 1 \end{pmatrix}.$$
² The edit distance was defined in [36] as the number of character additions and deletions that must be performed in order to obtain one variant of the meme from another.
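The one-step irreversible scheme can be constructed programmatically. The sketch below uses hypothetical helper names, builds T and µ as nested Python lists (strains 0-indexed, so index m − 1 is strain-m), and checks that µ is row-stochastic. As an illustration only, it also computes the strain carried after a fixed number of within-host mutation opportunities along a single chain of transmissions, which is one row of µ raised to that power. This chain view deliberately ignores whether each transmission succeeds; it only shows why reaching strain-m from strain-1 takes at least m − 1 generations, with probability µ^(m−1) at exactly that depth.

```python
def one_step_irreversible(m, T1, Tm, mu):
    """Return (T, M) for the one-step irreversible scheme; strains are 0-indexed."""
    # Diagonal transmissibility matrix: strains 1..m-1 share T1, strain-m has Tm.
    T = [[0.0] * m for _ in range(m)]
    for i in range(m):
        T[i][i] = Tm if i == m - 1 else T1
    # Mutation matrix: strain i moves to i + 1 with probability mu; the last strain is absorbing.
    M = [[0.0] * m for _ in range(m)]
    for i in range(m - 1):
        M[i][i] = 1.0 - mu
        M[i][i + 1] = mu
    M[m - 1][m - 1] = 1.0
    return T, M


def is_row_stochastic(M, tol=1e-12):
    """True if every row is non-negative and sums to 1."""
    return all(abs(sum(row) - 1.0) <= tol and min(row) >= 0.0 for row in M)


def strain_distribution(M, start, generations):
    """Row `start` of M**generations: the strain distribution after that many
    within-host mutation opportunities along one chain of transmissions."""
    v = [0.0] * len(M)
    v[start] = 1.0
    for _ in range(generations):
        v = [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M))]
    return v


if __name__ == "__main__":
    T, M = one_step_irreversible(m=3, T1=0.2, Tm=0.8, mu=0.1)
    print(is_row_stochastic(M))          # True
    print(strain_distribution(M, 0, 2))  # approximately [0.81, 0.18, 0.01]
```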
The multiple-strain model proposed by Alexander and Day [33] works as follows. Consider
a spreading process that starts with an individual, i.e., the seed, receiving infection with strain-
1 from an external reservoir. Since strain-1 has transmissibility T1 , the seed infects each of her
contacts independently with probability T1 . Once a susceptible individual receives the infection
from the seed, the pathogen may evolve within that new host prior to any subsequent infections.
In particular, the pathogen may remain as strain-1 with probability µ11 or mutate to strain-i
(that has transmissibility Ti ) with probability µ1i for i = 2, . . . , m. If the pathogen remains as
strain-1 (respectively, mutates to strain-i), then the host infects each of her susceptible neighbors
in the subsequent stages independently with probability T1 (respectively, Ti ). Observe that as
the process continues to grow, multiple strains may coexist in the population as governed by the
transmissibility matrix T and the mutation matrix µ. At an intermediate stage, if any susceptible individual receives strain-j, the pathogen may remain as strain-j with probability µjj or mutate to strain-ℓ with probability µjℓ for ℓ ∈ {1, 2, . . . , m} \ {j} prior to subsequent infections. The process terminates when no additional infections are possible. A graphical illustration for the case when m = 2 is given in Figure 2. In this paper, we focus on the case where m = 2; however, it is straightforward to extend our theory to handle the general case with m strains. More details are given in Section 4.
$$\hat{p}_k = \frac{k p_k}{\langle k \rangle}, \quad k = 1, 2, \ldots,$$
where ⟨k⟩ denotes the mean degree, i.e., $\langle k \rangle = \sum_k k p_k$.
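The weighting p̂k = k pk/⟨k⟩ is why the degree of a neighbor differs from the degree of a typical node: high-degree nodes are reached through more edges. A small sketch, assuming the degree distribution is supplied as a Python list `p` with `p[k]` equal to pk (`neighbor_degree_distribution` is a hypothetical helper), computes p̂ exactly and then checks it by picking random edge ends (stubs), as in a configuration-model construction.

```python
import random
from fractions import Fraction


def neighbor_degree_distribution(p):
    """Given p[k] = P(degree = k), return {k: k * p_k / <k>} for k >= 1 with p_k > 0."""
    mean_k = sum(k * pk for k, pk in enumerate(p))  # <k> = sum_k k p_k
    return {k: k * pk / mean_k for k, pk in enumerate(p) if k >= 1 and pk > 0}


if __name__ == "__main__":
    # Half the nodes have degree 1 and half have degree 3, so <k> = 2.
    p = [Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 2)]
    print(neighbor_degree_distribution(p))  # {1: Fraction(1, 4), 3: Fraction(3, 4)}

    # Monte Carlo check: a uniformly random edge end (configuration-model stub)
    # belongs to a degree-3 node with probability 3 * (1/2) / 2 = 3/4.
    rng = random.Random(0)
    degrees = [1] * 5000 + [3] * 5000
    stubs = [d for d in degrees for _ in range(d)]
    picks = [rng.choice(stubs) for _ in range(100_000)]
    print(picks.count(3) / len(picks))  # approximately 0.75
```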
Figure 2: The multiple-strain model for evolution. (a) The process starts with a single
individual, i.e., the seed, receiving infection with strain-1 (highlighted in orange) from an external
reservoir. (b) The seed infects each of her susceptible neighbors (highlighted in green) independently
with probability T1 . (c) The pathogen mutates independently within hosts. The pathogen remains
as strain-1 with probability µ11 or mutates to strain-2 (highlighted in blue) with probability µ12 .
(d) Individuals whose pathogen has mutated to strain-i infect their neighbors independently with
probability Ti . (e) The pathogen mutates independently within hosts. The pathogen remains as
strain-2 with probability µ22 or mutates to strain-1 with probability µ21 .
4 Analysis
4.1 The Probability of Emergence
The analysis of the probability of emergence was established by Alexander and Day in [33]; below, we give a brief summary of their results for completeness. Their approach is based on a multi-type branching process [66, 67] that starts with an initial infective of a particular type, e.g., type-1, and then proceeds by infecting each of her neighbors independently with a probability that is determined by the infecting strain. Each infected neighbor mutates independently with a probability that is also determined by the infecting strain, and the process proceeds similarly in subsequent stages. The process differs from the standard single-type branching process in that individuals of different types may coexist in any generation (other than generation 0), with a different offspring distribution for each type; hence the name multi-type [66, 67].
Let γi(s1, s2, . . . , sm) be the probability generating function (PGF) for the number of infections of each type transmitted by an initial infective of type-i. It holds that
$$\gamma_i(s_1, s_2, \ldots, s_m) = g\Big(1 - T_i + T_i \sum_{j=1}^{m} \mu_{ij} s_j\Big),$$
for i = 1, . . . , m, where g(s) denotes the PGF of the degree distribution, i.e., $g(s) = \sum_{k=0}^{\infty} p_k s^k$.
Moreover, with Γi(s1, s2, . . . , sm) denoting the PGF for the number of infections of each type transmitted by a later-generation infective of type-i (i.e., a typical intermediate host in the process), it holds that
$$\Gamma_i(s_1, s_2, \ldots, s_m) = G\Big(1 - T_i + T_i \sum_{j=1}^{m} \mu_{ij} s_j\Big),$$
for i = 1, . . . , m, where G(s) denotes the PGF of the excess degree distribution, i.e.,
$$G(s) = \sum_{k=1}^{\infty} \frac{k p_k}{\langle k \rangle} s^{k-1}.$$
Recall that k pk/⟨k⟩ gives the probability that a randomly chosen neighbor of a randomly chosen vertex has degree k, and note that the excess degree is k − 1 because one edge has already been traversed to reach that neighbor.
The probability of extinction starting from one later-generation infective of type-i, denoted qi, is the smallest non-negative root of the equation qi = Γi(q1, . . . , qm), solved simultaneously for all i = 1, . . . , m. Finally, if the whole process starts with an initial infective of type-i, the overall extinction probability is given by
$$g\Big(1 - T_i + T_i \sum_{j=1}^{m} \mu_{ij} q_j\Big).$$
It was shown in [33] that the above process resembles a multi-type branching process with mean matrix³ given by
$$\mathbf{M} = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle}\, \mathbf{T}\boldsymbol{\mu}. \tag{1}$$
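As a minimal sketch of how the extinction probabilities qi and the resulting probability of emergence can be computed, the Python snippet below iterates q ← Γ(q) starting from q = 0 and then evaluates one minus the overall extinction probability. It assumes a Poisson degree distribution with mean λ, for which the degree PGF g and the excess-degree PGF G coincide (both equal e^{λ(s−1)}); the function names are hypothetical and are not taken from [33], and strains are indexed from 0.

```python
import math


def poisson_pgf(s, lam):
    """PGF of a Poisson(lam) degree distribution; for Poisson degrees g(s) = G(s)."""
    return math.exp(lam * (s - 1.0))


def extinction_probs(T, mu, lam, tol=1e-13, max_iter=1_000_000):
    """Smallest non-negative solution of q_i = Gamma_i(q_1, ..., q_m).

    Starting from q = 0, the iterates increase monotonically to the smallest
    fixed point because each Gamma_i is non-decreasing in every argument.
    """
    m = len(T)
    q = [0.0] * m
    for _ in range(max_iter):
        new = [poisson_pgf(1 - T[i] + T[i] * sum(mu[i][j] * q[j] for j in range(m)), lam)
               for i in range(m)]
        if max(abs(a - b) for a, b in zip(new, q)) < tol:
            return new
        q = new
    return q


def emergence_prob(T, mu, lam, seed_type=0):
    """One minus the overall extinction probability for a seed carrying strain seed_type."""
    q = extinction_probs(T, mu, lam)
    i = seed_type
    s = 1 - T[i] + T[i] * sum(mu[i][j] * q[j] for j in range(len(T)))
    return 1.0 - poisson_pgf(s, lam)


if __name__ == "__main__":
    T = [0.2, 0.5]                        # first parameter set used in the figures
    mu = [[0.75, 0.25], [0.25, 0.75]]
    for lam in (2.0, 4.0):
        print(lam, emergence_prob(T, mu, lam))
```

With a single strain (T = [T], µ = [[1]]), the same code reduces to the familiar single-type result 1 − e^{−λT(1−q)}, which is a convenient sanity check.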
The theory of multi-type branching processes states that if the dominant eigenvalue of M is
less than or equal to one, then the process goes extinct with probability 1. Otherwise, there is a
positive probability of non-extinction. Hence, the phase transition occurs when
$$\rho(\mathbf{M}) = 1, \tag{2}$$
where ρ(M) denotes the spectral radius, i.e., the largest eigenvalue (in absolute value) of M.
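Condition (2) can be checked numerically. The sketch below, restricted to m = 2 strains, builds M from (1) given the first two moments ⟨k⟩ and ⟨k²⟩ of the degree distribution and computes its spectral radius in closed form. The helper names are hypothetical, and the closed form relies on M having non-negative entries, which guarantees real eigenvalues.

```python
import math


def mean_matrix(T, mu, k_mean, k2_mean):
    """M = ((<k^2> - <k>) / <k>) * T mu for two strains, with T = diag(T[0], T[1])."""
    c = (k2_mean - k_mean) / k_mean
    return [[c * T[i] * mu[i][j] for j in range(2)] for i in range(2)]


def spectral_radius_2x2(a):
    """Largest eigenvalue of a 2x2 matrix with non-negative entries.

    The discriminant (a00 - a11)^2 + 4 a01 a10 is non-negative, so both eigenvalues
    are real, and the larger one dominates in absolute value because the trace is >= 0.
    """
    tr = a[0][0] + a[1][1]
    disc = (a[0][0] - a[1][1]) ** 2 + 4.0 * a[0][1] * a[1][0]
    return 0.5 * (tr + math.sqrt(disc))


def is_supercritical(T, mu, k_mean, k2_mean):
    """True when rho(M) > 1, i.e., when an epidemic occurs with positive probability."""
    return spectral_radius_2x2(mean_matrix(T, mu, k_mean, k2_mean)) > 1.0


# Poisson degrees with mean 3: <k> = 3 and <k^2> = 3 + 3^2 = 12, so the prefactor is 3.
T, mu = [0.2, 0.5], [[0.75, 0.25], [0.25, 0.75]]
print(spectral_radius_2x2(mean_matrix(T, mu, 3.0, 12.0)))  # approximately 1.2
print(is_supercritical(T, mu, 3.0, 12.0))
```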
to simultaneous infections by both strain-1 and strain-2 from her neighbors at level ℓ − 1. In the remainder of this section, we assume that co-infection is not possible; hence, a node that receives x infections of strain-1 and y infections of strain-2 becomes infected by strain-1 (respectively, strain-2) with probability x/(x + y) (respectively, y/(x + y)). In Section 7, we empirically consider the case where co-infection is possible, i.e., a node that receives simultaneous infections by both strains becomes co-infected and spreads the co-infection in subsequent rounds. In this case, co-infection may be modeled as an additional strain that has transmissibility Tco and never mutates back to strain-1 or strain-2.
Throughout, we say that a node is either inactive, if it has not received any infection (i.e., it is still susceptible), or active and type-i, if it has been infected and then mutated to strain-i, for i = 1, 2. With a slight abuse of notation, let $q_{\ell+1,i}$ be the probability that a node at level ℓ + 1, say node v, is active and type-i. Furthermore, let $q_{\ell+1} = q_{\ell+1,1} + q_{\ell+1,2}$, i.e., $q_{\ell+1}$ is the total probability that a node at level ℓ + 1 is active. We start with an arbitrary initial distribution $\{q_{0,1}, q_{0,2}\}$ satisfying $q_{0,1} > 0$ and $q_{0,2} > 0$, and then update it level by level until we reach the root. Note that if the degree of node v is k, then node v uses one edge to connect to her parent at level ℓ + 2 and k − 1 edges to connect to her neighbors at level ℓ. Conditioning on the excess degree $\tilde{d}$ of node v gives
$$q_{\ell+1,i} = \sum_{k=1}^{\infty} \frac{k p_k}{\langle k \rangle}\, \mathbb{P}\big[\text{node } v \text{ becomes active and type-}i \,\big|\, \tilde{d} = k-1\big].$$
Next, we further condition on the number of active neighbors of type-1 and type-2. The numbers of active neighbors of both types follow a multinomial distribution: a neighbor at level ℓ is active and type-1 with probability $q_{\ell,1}$, active and type-2 with probability $q_{\ell,2}$, or inactive with probability $1 - q_\ell = 1 - q_{\ell,1} - q_{\ell,2}$. Let $I_i$ denote the number of active neighbors of type-i. Thus,
$$q_{\ell+1,i} = \sum_{k=1}^{\infty} \frac{k p_k}{\langle k \rangle} \sum_{k_1=0}^{k-1} \sum_{k_2=0}^{k-1-k_1} \binom{k-1}{k_1}\binom{k-1-k_1}{k_2} (q_{\ell,1})^{k_1} (q_{\ell,2})^{k_2} (1 - q_{\ell,1} - q_{\ell,2})^{k-1-k_1-k_2} \cdot \mathbb{P}\big[\text{node } v \text{ becomes active and type-}i \,\big|\, I_1 = k_1, I_2 = k_2\big].$$
Let X and Y denote the number of infections received from type-1 and type-2 neighbors, respectively. Conditioned on having k1 and k2 active neighbors of type-1 and type-2, respectively, we have
$$X \sim \mathrm{Binomial}(k_1, T_1), \qquad Y \sim \mathrm{Binomial}(k_2, T_2).$$
Consider a particular realization (x, y) of the random variables (X, Y). If x > 0 and y = 0, then node v becomes infected by strain-1 and then mutates to type-i with probability µ1i. Similarly, if x = 0 and y > 0, then node v becomes infected by strain-2 and then mutates to type-i with probability µ2i. Finally, if x > 0 and y > 0, then node v becomes infected by strain-1 (respectively, strain-2) with probability x/(x + y) (respectively, y/(x + y)) and then mutates to type-i with probability µ1i (respectively, µ2i). Hence, by conditioning on X and Y, we have
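$$\begin{aligned}
\mathbb{P}\big[A \,\big|\, I_1 = k_1, I_2 = k_2\big] &= \sum_{x=0}^{k_1}\sum_{y=0}^{k_2} \binom{k_1}{x}\binom{k_2}{y} T_1^{x} T_2^{y} (1-T_1)^{k_1-x} (1-T_2)^{k_2-y}\, \mathbb{P}\big[A \,\big|\, X = x, Y = y\big] \\
&= \sum_{x=0}^{k_1}\sum_{y=0}^{k_2} \binom{k_1}{x}\binom{k_2}{y} T_1^{x} T_2^{y} (1-T_1)^{k_1-x} (1-T_2)^{k_2-y} \left( \mu_{1i}\mathbb{1}[x>0, y=0] + \mu_{2i}\mathbb{1}[x=0, y>0] + \Big(\frac{x\mu_{1i}}{x+y} + \frac{y\mu_{2i}}{x+y}\Big)\mathbb{1}[x>0, y>0] \right),
\end{aligned}$$
where A denotes the event that node v becomes active and type-i. Note that
$$\sum_{x=0}^{k_1}\sum_{y=0}^{k_2} \binom{k_1}{x}\binom{k_2}{y} T_1^{x} T_2^{y} (1-T_1)^{k_1-x} (1-T_2)^{k_2-y}\, \mu_{1i}\mathbb{1}[x>0, y=0] = \mu_{1i} (1-T_2)^{k_2}\big(1 - \mathbb{P}(X = 0)\big) = \mu_{1i}\, a_2 b_1,$$
where $a_j = (1 - T_j)^{k_j}$ and $b_j = 1 - a_j$ for j = 1, 2; by the same argument, the term with $\mathbb{1}[x=0, y>0]$ equals $\mu_{2i}\, a_1 b_2$.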
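Thus, we have
$$q_{\ell+1,i} = \sum_{k=1}^{\infty} \frac{k p_k}{\langle k \rangle} \sum_{k_1=0}^{k-1} \sum_{k_2=0}^{k-1-k_1} \binom{k-1}{k_1}\binom{k-1-k_1}{k_2} (q_{\ell,1})^{k_1} (q_{\ell,2})^{k_2} (1 - q_{\ell,1} - q_{\ell,2})^{k-1-k_1-k_2} \cdot \left( b_1 a_2 \mu_{1i} + a_1 b_2 \mu_{2i} + \sum_{x=0}^{k_1}\sum_{y=0}^{k_2} \binom{k_1}{x}\binom{k_2}{y} T_1^{x} T_2^{y} (1-T_1)^{k_1-x} (1-T_2)^{k_2-y} \Big(\frac{x\mu_{1i}}{x+y} + \frac{y\mu_{2i}}{x+y}\Big)\mathbb{1}[x>0, y>0] \right), \tag{3}$$
for ℓ = 0, 1, . . . and i = 1, 2.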
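The bracketed factor in (3) can be checked by brute force. The sketch below (hypothetical helper names; strains and types are indexed from 0, so mu[0][i] stands for µ1i) evaluates the conditional probability directly from the rule "strain-1 with probability x/(x + y), then mutate", and compares it with the split form b1a2µ1i + a1b2µ2i + (sum over x > 0, y > 0), with aj = (1 − Tj)^{kj} and bj = 1 − aj. Summing over both types should also give 1 − a1a2, the probability of receiving at least one infection.

```python
import math


def binom_pmf(n, k, p):
    """P(Binomial(n, p) = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def bracket_direct(k1, k2, T1, T2, mu, i):
    """P(node becomes active and type-i | k1 type-1 and k2 type-2 active neighbors)."""
    total = 0.0
    for x in range(k1 + 1):
        for y in range(k2 + 1):
            if x + y == 0:
                continue  # no infection received: the node stays inactive
            # strain-1 w.p. x/(x+y), strain-2 otherwise, then mutation to type-i
            p_type_i = (x * mu[0][i] + y * mu[1][i]) / (x + y)
            total += binom_pmf(k1, x, T1) * binom_pmf(k2, y, T2) * p_type_i
    return total


def bracket_split(k1, k2, T1, T2, mu, i):
    """Same quantity written as b1*a2*mu_1i + a1*b2*mu_2i + sum over x > 0, y > 0."""
    a1, a2 = (1 - T1) ** k1, (1 - T2) ** k2
    b1, b2 = 1 - a1, 1 - a2
    mixed = sum(binom_pmf(k1, x, T1) * binom_pmf(k2, y, T2)
                * (x * mu[0][i] + y * mu[1][i]) / (x + y)
                for x in range(1, k1 + 1) for y in range(1, k2 + 1))
    return b1 * a2 * mu[0][i] + a1 * b2 * mu[1][i] + mixed


mu = [[0.75, 0.25], [0.25, 0.75]]
for k1, k2 in [(0, 3), (2, 0), (3, 4)]:
    for i in (0, 1):
        diff = bracket_direct(k1, k2, 0.2, 0.5, mu, i) - bracket_split(k1, k2, 0.2, 0.5, mu, i)
        assert abs(diff) < 1e-12
```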
Observe that, under the assumption that nodes do not become inactive once they turn active, the quantities $q_{\ell,i}$ appearing in (3) are non-decreasing in ℓ, and thus they converge to a limit $q_{\infty,i}$ for i = 1, 2. The final fraction of nodes that are active and type-i is equal (in expected value) to the probability that the root of the tree (at level ℓ → ∞) is active and type-i. Note that if the tree root has degree k, then all k of her edges are used to connect with her neighbors at the lower level. Hence,
$$Q_i = \sum_{k=0}^{\infty} p_k \sum_{k_1=0}^{k} \sum_{k_2=0}^{k-k_1} \binom{k}{k_1}\binom{k-k_1}{k_2} (q_{\infty,1})^{k_1} (q_{\infty,2})^{k_2} (1 - q_{\infty,1} - q_{\infty,2})^{k-k_1-k_2} \cdot \left( b_1 a_2 \mu_{1i} + a_1 b_2 \mu_{2i} + \sum_{x=0}^{k_1}\sum_{y=0}^{k_2} \binom{k_1}{x}\binom{k_2}{y} T_1^{x} T_2^{y} (1-T_1)^{k_1-x} (1-T_2)^{k_2-y} \Big(\frac{x\mu_{1i}}{x+y} + \frac{y\mu_{2i}}{x+y}\Big)\mathbb{1}[x>0, y>0] \right), \tag{4}$$
where Qi for i = 1, 2 denotes the probability that the tree root is active and type-i and q∞,i for
i = 1, 2 is the steady-state solution of the recursive equations (3). Note that Q = Q1 + Q2 is the
total probability that the tree root is active.
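The recursion (3) and the root probabilities (4) can be evaluated numerically. The following sketch assumes a Poisson degree distribution with mean λ truncated at a maximum degree kmax (an approximation introduced here, not part of the analysis), precomputes the bracketed factor for every (k1, k2) using the equivalent single-sum form (xµ1i + yµ2i)/(x + y) over (x, y) ≠ (0, 0), iterates (3) from a small positive initial distribution, and then evaluates (4). Function and parameter names are hypothetical, and strains are indexed from 0.

```python
import math


def bracket(k1, k2, T1, T2, mu, i):
    """Bracketed factor of (3)-(4): P(active and type-i | k1, k2 active children)."""
    total = 0.0
    for x in range(k1 + 1):
        px = math.comb(k1, x) * T1**x * (1 - T1) ** (k1 - x)
        for y in range(k2 + 1):
            if x + y == 0:
                continue
            py = math.comb(k2, y) * T2**y * (1 - T2) ** (k2 - y)
            total += px * py * (x * mu[0][i] + y * mu[1][i]) / (x + y)
    return total


def solve_tree(lam, T1, T2, mu, kmax=25, q0=(0.01, 0.01), tol=1e-10, max_iter=10_000):
    """Return ((q_inf1, q_inf2), (Q1, Q2)) for a Poisson(lam) network truncated at kmax."""
    pk = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(kmax + 1)]
    excess_w = [k * pk[k] / lam for k in range(kmax + 1)]          # k p_k / <k>
    table = [[[bracket(k1, k2, T1, T2, mu, i) for k2 in range(kmax + 1 - k1)]
              for k1 in range(kmax + 1)] for i in (0, 1)]

    def level(weights, q1, q2, shift):
        """Sum over k of weights[k] * multinomial(k - shift edges) * bracket, both types."""
        out = [0.0, 0.0]
        r = max(0.0, 1.0 - q1 - q2)                                 # inactive child
        for k in range(shift, kmax + 1):
            n = k - shift                                           # edges to children
            for k1 in range(n + 1):
                for k2 in range(n - k1 + 1):
                    w = (weights[k] * math.comb(n, k1) * math.comb(n - k1, k2)
                         * q1**k1 * q2**k2 * r ** (n - k1 - k2))
                    out[0] += w * table[0][k1][k2]
                    out[1] += w * table[1][k1][k2]
        return out

    q1, q2 = q0
    for _ in range(max_iter):
        n1, n2 = level(excess_w, q1, q2, shift=1)                   # recursion (3)
        converged = abs(n1 - q1) + abs(n2 - q2) < tol
        q1, q2 = n1, n2
        if converged:
            break
    Q1, Q2 = level(pk, q1, q2, shift=0)                             # root, equation (4)
    return (q1, q2), (Q1, Q2)


if __name__ == "__main__":
    (q1, q2), (Q1, Q2) = solve_tree(4.0, 0.2, 0.5, [[0.75, 0.25], [0.25, 0.75]])
    print(f"S1 = {Q1:.4f}, S2 = {Q2:.4f}, S = {Q1 + Q2:.4f}")
```

Precomputing the bracket table keeps each iteration of (3) to a triple loop over (k, k1, k2). A useful check: with no mutation (µ equal to the identity) and T1 = T2 = T, the total Q1 + Q2 collapses to the single-type bond-percolation size S = 1 − e^{−λTS}.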
Observe that $q_{\infty,1} = q_{\infty,2} = 0$ gives a trivial fixed point of the recursive equations (3), and this trivial solution leads to Q = 0 by virtue of (4). Although the trivial fixed point is always a valid solution of (3), it can be unstable, in which case another solution with $q_{\infty,1} > 0$ and $q_{\infty,2} > 0$ may exist. To test whether the trivial fixed point is stable, we check the spectral radius of the Jacobian matrix $\mathbf{J}(q_{\ell,1}, q_{\ell,2})$ corresponding to the linearization of (3) at $q_{\ell,1} = q_{\ell,2} = 0$. If this spectral radius is larger than one, then the trivial fixed point is unstable, indicating that there exists another solution with $q_{\infty,1} > 0$ and $q_{\infty,2} > 0$, which implies the existence of a giant component. The Jacobian matrix is given by
$$\mathbf{J}(q_{\ell,1}, q_{\ell,2})\Big|_{q_{\ell,1}=q_{\ell,2}=0} = \begin{bmatrix} \dfrac{\partial q_{\ell+1,1}}{\partial q_{\ell,1}} & \dfrac{\partial q_{\ell+1,1}}{\partial q_{\ell,2}} \\[2mm] \dfrac{\partial q_{\ell+1,2}}{\partial q_{\ell,1}} & \dfrac{\partial q_{\ell+1,2}}{\partial q_{\ell,2}} \end{bmatrix}_{q_{\ell,1}=q_{\ell,2}=0} = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} \begin{bmatrix} T_1\mu_{11} & T_2\mu_{21} \\ T_1\mu_{12} & T_2\mu_{22} \end{bmatrix} = \frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle} (\mathbf{T}\boldsymbol{\mu})^{\top}.$$
Since a square matrix and its transpose have the same set of eigenvalues, the Jacobian matrix has the same spectral radius as the mean matrix M in (1), as would be expected, which yields the same condition (2) for the phase transition. Nevertheless, the generating-function approach used by Alexander and Day [33] is useful in its own right, as it enables quantifying the probability of emergence.
We remark that it is straightforward to extend our analysis to the general case with m strains, for some finite integer m ≥ 2, as long as the underlying process is indecomposable [33, 66, 67]. At a high level, indecomposable processes are those for which each pathogen strain i eventually gives rise to strain-j at some generation nij ≥ 1, for i, j = 1, 2, . . . , m. In other words, if an indecomposable process starts with an infection with strain-i, then, as the process continues to grow, all other strains will eventually emerge. This property holds if, for every pair of strains (i, j), there exists a positive integer nij such that $\mathbf{M}^{n_{ij}}(i, j) > 0$ [33]. If the underlying process is decomposable, then there exist classes of strain types such that strain types belonging to the same class can eventually give rise to one another, but not to other strain types. The existence of multiple classes leads to multiple solutions of the set of equations (4), depending on the initial distribution $\{q_{0,1}, q_{0,2}, \ldots, q_{0,m}\}$. Hence, to guarantee the uniqueness of the solution of (4) and for mathematical tractability, we limit our formalism to the case when the underlying process is indecomposable.
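Indecomposability can be checked mechanically. For a non-negative matrix, $\mathbf{M}^{n}(i, j) > 0$ holds exactly when there is a path of length n from i to j in the directed graph with an edge i → j whenever Mij > 0 (equivalently Ti µij > 0, assuming ⟨k²⟩ > ⟨k⟩), so the condition above amounts to every strain being reachable from every strain in one or more steps. The sketch below (hypothetical helper name, strains indexed from 0) tests this with a depth-first search from each strain.

```python
def is_indecomposable(T, mu):
    """True if every strain can eventually give rise to every strain (including itself).

    Uses the support of M: an edge i -> j exists whenever T[i] * mu[i][j] > 0.
    """
    m = len(T)

    def reachable_from(start):
        seen, stack = set(), [start]
        while stack:
            i = stack.pop()
            for j in range(m):
                if T[i] * mu[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen  # strains reachable in one or more generations

    return all(len(reachable_from(i)) == m for i in range(m))


print(is_indecomposable([0.2, 0.5], [[0.75, 0.25], [0.25, 0.75]]))  # True
print(is_indecomposable([0.2, 0.5], [[0.9, 0.1], [0.0, 1.0]]))      # False: strain-2 never reverts
```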
5 Numerical results
5.1 The Structure of the Contact Network
In this section, we consider synthetic contact networks generated randomly by the configuration model; real-world networks are considered in Section 6. In particular, we consider contact networks with Poisson degree distribution as well as power-law degree distribution. For a Poisson degree distribution with mean λ,
$$p_k = e^{-\lambda}\frac{\lambda^k}{k!}, \qquad k = 0, 1, \ldots$$
Since ⟨k⟩ = λ and ⟨k²⟩ = λ² + λ, the prefactor (⟨k²⟩ − ⟨k⟩)/⟨k⟩ in (1) equals λ. In this case, condition (2) implies that the phase transition occurs when
$$\lambda \times \rho(\mathbf{T}\boldsymbol{\mu}) = 1, \tag{5}$$
where ρ(Tµ) denotes the spectral radius of the matrix product Tµ. Observe that condition (5) embodies the structure of the contact network (represented by λ for a contact network with Poisson degree distribution), the characteristics of propagation (represented by the matrix T), and the process of evolution (represented by µ); hence, it reveals how these properties interact to yield an epidemic.
Similar to (5), condition (6) indicates how the structure of the underlying network, the characteristics of propagation, and the process of evolution are intertwined, and under what conditions their relationship would induce an epidemic.
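For Poisson networks, (5) gives the critical mean degree directly as λ* = 1/ρ(Tµ). The short sketch below (hypothetical function name, m = 2) computes it in closed form. For the first parameter set used in Figure 3 (T1 = 0.2, T2 = 0.5, µ11 = µ22 = 0.75), ρ(Tµ) = 0.4 and λ* = 2.5; for the second (T1 = 0.4, T2 = 0.8, µ11 = 0.3, µ22 = 0.7), Tµ has rank one with ρ(Tµ) = 0.68, so λ* = 1/0.68 ≈ 1.47.

```python
import math


def critical_mean_degree_poisson(T, mu):
    """Solve lambda * rho(T mu) = 1 (condition (5)) for a two-strain model."""
    a = [[T[i] * mu[i][j] for j in range(2)] for i in range(2)]   # T mu with T = diag(T)
    tr = a[0][0] + a[1][1]
    disc = (a[0][0] - a[1][1]) ** 2 + 4.0 * a[0][1] * a[1][0]    # >= 0 for non-negative a
    rho = 0.5 * (tr + math.sqrt(disc))
    return 1.0 / rho


print(critical_mean_degree_poisson([0.2, 0.5], [[0.75, 0.25], [0.25, 0.75]]))  # about 2.5
print(critical_mean_degree_poisson([0.4, 0.8], [[0.3, 0.7], [0.3, 0.7]]))      # about 1.4706
```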
[Figure 3 plots: panels (a) and (c) use a Poisson degree distribution and panels (b) and (d) a power-law degree distribution. Each panel shows the epidemic size S, S1, and S2 (experiment and theory) against the mean degree (1 to 10), with the transition point marked.]
Figure 3: Evolution on Poisson and power-law contact networks. The network size n is 2 × 10^5 and the number of independent experiments for each data point is 500. Blue circles, brown plus signs, and green triangles denote the empirical average epidemic size, the average fraction of nodes infected with strain-1, and the average fraction of nodes infected with strain-2, respectively. The red, blue, and yellow lines denote the theoretical average total epidemic size, average fraction of nodes infected with strain-1, and average fraction of nodes infected with strain-2, respectively. Theoretical results are obtained by solving the system of equations (4) with the corresponding parameter set. (a)-(b) We set T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75. (c)-(d) We set T1 = 0.4, T2 = 0.8, µ11 = 0.3, and µ22 = 0.7, implying that an infected node, regardless of the type of infection it has, mutates to strain-1 (respectively, strain-2) with probability 0.3 (respectively, 0.7), independently. In all cases, we observe good agreement with our theoretical results.
Methods: We use the configuration model to create random graphs with particular degree distributions. In particular, we sample a degree sequence from the corresponding distribution and then use the configuration model to construct a random graph with that degree sequence. We use igraph [71] (through its C++ and Python interfaces) for simulations. Our simulation code is available online⁴. Unless otherwise stated, we start the process by selecting a node uniformly at random and infecting it with strain-1. The node infects each neighbor independently with probability T1. Each infected neighbor independently mutates to strain-1 with probability µ11 or to strain-2 with probability µ12. As the process continues to grow, both strains might exist in the population. An intermediate node that becomes infected with strain-i mutates to strain-1 with probability µi1 or to strain-2 with probability µi2, for i = 1, 2. When cycles start to appear, a susceptible node could be exposed to multiple infections at once. If a node is simultaneously exposed to x infections of strain-1 and y infections of strain-2, it becomes infected with strain-1 (respectively, strain-2) with probability x/(x + y) (respectively, y/(x + y)), for any non-negative integers x and y with x + y > 0. A node that receives an infection at round i first mutates (by the end of round i) and then attempts to infect her neighbors at round i + 1. The node is considered recovered at round i + 2, i.e., a node is infective for only one round.
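The simulation procedure described above can be sketched in a few dozen lines. The snippet below is a minimal pure-Python illustration, not the authors' igraph-based code: it assumes a Poisson degree sequence drawn with Knuth's multiplication method, builds the graph by random stub matching (dropping self-loops and keeping multi-edges, a simplification), and runs the round-based two-strain process with the x/(x + y) rule and mutation before the next round. All function names are hypothetical, and strains are indexed from 0.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1


def configuration_model(degrees, rng):
    """Random stub matching; self-loops are dropped and multi-edges kept (a simplification)."""
    degrees = list(degrees)
    if sum(degrees) % 2:                        # stubs must pair up
        degrees[rng.randrange(len(degrees))] += 1
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    adj = [[] for _ in degrees]
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:
            adj[u].append(v)
            adj[v].append(u)
    return adj


def mutate(strain, mu, rng):
    """Within-host mutation: become type 0 w.p. mu[strain][0], otherwise type 1."""
    return 0 if rng.random() < mu[strain][0] else 1


def simulate(adj, T, mu, rng, seed_node=None):
    """One run of the two-strain process; returns (S1, S2) as fractions of all nodes."""
    n = len(adj)
    state = [None] * n                          # None: susceptible; 0/1: strain-1/strain-2
    if seed_node is None:
        seed_node = rng.randrange(n)
    state[seed_node] = 0                        # the seed transmits strain-1 with T[0]
    infective = [seed_node]
    while infective:
        exposures = {}                          # node -> [x, y] strain-1/strain-2 transmissions
        for u in infective:
            for v in adj[u]:
                if state[v] is None and rng.random() < T[state[u]]:
                    exposures.setdefault(v, [0, 0])[state[u]] += 1
        infective = []                          # previous infectives recover after one round
        for v, (x, y) in exposures.items():
            received = 0 if rng.random() < x / (x + y) else 1
            state[v] = mutate(received, mu, rng)   # mutate before infecting next round
            infective.append(v)
    return state.count(0) / n, state.count(1) / n


if __name__ == "__main__":
    rng = random.Random(42)
    adj = configuration_model([poisson_sample(4.0, rng) for _ in range(20_000)], rng)
    runs = [simulate(adj, [0.2, 0.5], [[0.75, 0.25], [0.25, 0.75]], rng) for _ in range(50)]
    print("mean S1:", sum(r[0] for r in runs) / len(runs))
    print("mean S2:", sum(r[1] for r in runs) / len(runs))
```

Collecting all exposures of a round before assigning any new infection is what makes simultaneous exposures (the cycle case) follow the x/(x + y) rule rather than first-come-first-served.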
Observe that we have µ11 = µ21 and µ22 = µ12 for the second parameter set. Hence, an infected
node, regardless of what type of infection it has, mutates to strain-1 (respectively, strain-2) with
probability 0.3 (respectively, 0.7), independently. This is a special case that can easily be treated
by our formalism given in Section 4.
In Figure 3a and Figure 3b, we use the first parameter set and run 500 independent experi-
ments for each data point. We demonstrate our results on contact networks with Poisson degree
distribution (Figure 3a) and Power-law degree distribution with exponential cutoff (Figure 3b). For
Figure 3b, we set Γ = 15 and vary γ with the mean degree. In particular, the mean degree λ is given by
$$\lambda = \frac{\mathrm{Li}_{\gamma-1}\big(e^{-1/\Gamma}\big)}{\mathrm{Li}_{\gamma}\big(e^{-1/\Gamma}\big)}, \tag{7}$$
where Li_s(·) denotes the polylogarithm. Hence, we can numerically solve (7) to obtain the value of γ corresponding to a given value of λ.
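Solving (7) for γ is a one-dimensional root-finding problem. The sketch below assumes the power-law-with-cutoff distribution pk ∝ k^{−γ} e^{−k/Γ} for k ≥ 1 (the form whose mean is (7)), evaluates the polylogarithm by a truncated series, and uses bisection, which is valid because the mean degree decreases strictly as γ grows. The function names, truncation length, and search interval are our own choices, not taken from the text.

```python
import math


def polylog(s, z, terms=3000):
    """Truncated series Li_s(z) = sum_{k>=1} z^k / k^s, adequate for 0 < z < 1 not too close to 1."""
    return sum(z**k / k**s for k in range(1, terms + 1))


def mean_degree(gamma, cutoff=15.0):
    """Right-hand side of (7): Li_{gamma-1}(e^{-1/cutoff}) / Li_gamma(e^{-1/cutoff})."""
    z = math.exp(-1.0 / cutoff)
    return polylog(gamma - 1.0, z) / polylog(gamma, z)


def solve_gamma(lam, cutoff=15.0, lo=0.0, hi=30.0, tol=1e-10):
    """Bisection for the exponent gamma that yields mean degree lam."""
    if not mean_degree(hi, cutoff) < lam < mean_degree(lo, cutoff):
        raise ValueError("target mean degree is outside the bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_degree(mid, cutoff) > lam:
            lo = mid          # mean still too large: increase gamma
        else:
            hi = mid
    return 0.5 * (lo + hi)


for lam in (2.0, 5.0, 10.0):
    g = solve_gamma(lam)
    print(f"lambda = {lam}: gamma = {g:.4f}, check = {mean_degree(g):.6f}")
```

With Γ = 15, γ = 0 already gives a mean degree of 1/(1 − e^{−1/15}) ≈ 15.5, so the interval [0, 30] brackets every mean degree between just above 1 and about 15.5, which covers the range plotted in Figure 3.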
In order to validate the analytical results given in Section 4, we plot the theoretical values of S, S1, and S2 obtained by solving the system of equations (4) with the corresponding parameter set. We also plot a vertical line at the critical mean degree that corresponds to the phase transition (see (5) and (6)). Our experimental results are in close agreement with our theoretical results on both contact networks. In Figure 3c and Figure 3d, we repeat the same procedure with the second parameter set and again observe close agreement with our theoretical results on both contact networks.
⁴ https://github.com/reletreby/evolution.git
- T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75 for Figure 4.a, and
- T1 = 0.4, T2 = 0.8, µ11 = 0.3 and µ22 = 0.7 for Figure 4.b.
Note that in Figure 4, we plot the probability of emergence conditioned on the initial node receiving infection with strain-1⁵. We observe good agreement between our experimental results and the theoretical results given in [33]. This is expected: the multi-type branching framework assumes that the underlying graph is tree-like, an assumption that works best for networks with a vanishingly small clustering coefficient, e.g., networks generated by the configuration model.
[Figure 4 plots: panels (a) and (b), Poisson degree distribution; probability of emergence (experiment and theory) against the mean degree (1 to 10), with the transition point marked.]
Figure 4: The probability of emergence on contact networks with Poisson degree distribution. The network size n is 5 × 10^5 and the number of independent experiments for each data point is 10^4. Blue circles denote the empirical probability of emergence, while the red line denotes the theoretical probability of emergence according to [33]. (a) We set T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75. (b) We set T1 = 0.4, T2 = 0.8, µ11 = 0.3, and µ22 = 0.7. Our experimental results support the validity of the formalism presented by Alexander and Day in [33].
In other words, if the left-hand side of (8) is strictly larger than 1, a giant component emerges, indicating an epidemic. Otherwise, we have self-limited outbreaks.
Comparing (2) to (8) suggests a matching that results in the same condition for the phase transition. More precisely, if we set
$$T_{BP} = \rho(\mathbf{T}\boldsymbol{\mu}), \tag{9}$$
then both (2) and (8) collapse to the same condition for a given contact network. In what follows, we explore the extent to which classical, single-type bond-percolation models (under the matching condition (9)) can predict the threshold, probability, and final size of epidemics that entail evolution, i.e., information or diseases that propagate according to the multiple-strain model given in Section 3. We focus on contact networks with Poisson degree distribution generated by the configuration model, and devote Section 6 to real-world networks.
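Under the matching condition (9), the single-type bond-percolation prediction is easy to compute. The sketch below assumes a Poisson degree distribution with mean λ, for which the giant component of the bond-percolated network solves S = 1 − e^{−λ T_BP S}; the helper names are hypothetical and the spectral radius is computed in closed form for m = 2.

```python
import math


def matched_transmissibility(T, mu):
    """T_BP = rho(T mu) for two strains, with T = diag(T[0], T[1])."""
    a = [[T[i] * mu[i][j] for j in range(2)] for i in range(2)]
    tr = a[0][0] + a[1][1]
    disc = (a[0][0] - a[1][1]) ** 2 + 4.0 * a[0][1] * a[1][0]
    return 0.5 * (tr + math.sqrt(disc))


def bond_percolation_size_poisson(lam, t, tol=1e-13, max_iter=1_000_000):
    """Giant-component fraction of a Poisson(lam) network bond-percolated with probability t.

    Iterating S <- 1 - exp(-lam t S) from S = 1 converges to the non-trivial root when
    lam * t > 1 and to 0 otherwise.
    """
    s = 1.0
    for _ in range(max_iter):
        new = 1.0 - math.exp(-lam * t * s)
        if abs(new - s) < tol:
            return new
        s = new
    return s


T, mu = [0.2, 0.5], [[0.75, 0.25], [0.25, 0.75]]
t_bp = matched_transmissibility(T, mu)          # about 0.4
for lam in (2.0, 4.0, 6.0):
    print(lam, bond_percolation_size_poisson(lam, t_bp))
```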
In Figure 5, we extend Figure 4 by adding the experimental results for the final epidemic size as well as the corresponding theoretical values for the probability of emergence on a bond-percolated network under the matching condition (9). Note that the probability of emergence is equivalent to the final epidemic size for single-type, bond-percolated networks [3]. Observe that the classical single-type bond-percolation model accurately captures the threshold and final size of epidemics but provides significantly inaccurate predictions of the probability of emergence. A similar pattern is observed in Section 6 for real-world networks. This inaccuracy sheds light on a fundamental disconnect between classical, single-type bond-percolation models and real-life spreading processes that entail evolution. We explain the intuition behind our findings in Appendix A.
[Figure 5 plots: panels (a) and (b), Poisson degree distribution; probability of emergence and epidemic size against the mean degree (1 to 10), showing the experimental probability of emergence, its theoretical value, the experimental epidemic size S, the bond-percolation prediction S (≡ Prob.), and the transition point.]
Figure 5: Reduction to single-type bond percolation. The network size n is 5 × 10^5 and the number of independent experiments for each data point is 10^4. Blue circles and brown plus signs denote the empirical average epidemic size and the empirical probability of emergence, respectively. The navy blue line denotes the theoretical probability of emergence according to [33], while the red line denotes the theoretical average epidemic size (as well as the probability of emergence) predicted by the single-type bond-percolation framework under the matching condition (9). (a) We set T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75. (b) We set T1 = 0.4, T2 = 0.8, µ11 = 0.3, and µ22 = 0.7. The classical, single-type bond-percolation models may accurately predict the threshold and final size of epidemics, but their predictions of the probability of emergence are clearly inaccurate.
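T and mutation matrix µ given by
$$\mathbf{T} = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} \quad \text{and} \quad \boldsymbol{\mu} = \begin{bmatrix} 1-\mu & \mu \\ 0 & 1 \end{bmatrix}.$$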
Assume also that T1 < T2. The process starts by picking an individual uniformly at random and infecting her with strain-1. Fix the mean degree of the underlying network to λ. Let λ1 and λ2 denote the phase transition points (i.e., critical mean degrees) for a single-strain, bond-percolated network with transmissibility T1 and T2, respectively; since T1 < T2, we have λ2 < λ1 (e.g., λi = 1/Ti for Poisson networks). Observe that Tµ is upper triangular with eigenvalues T1(1 − µ) and T2, so ρ(Tµ) = T2; hence, in view of (2), the phase transition is entirely controlled by the parameters of strain-2, i.e., it occurs at λ2. Indeed, we can conclude from (2) that for λ < λ2, the probability of emergence is zero (in the limit of large network size). We can write
$$\mathbb{P}[\text{emergence}] = \mathbb{P}[\text{emergence} \mid \text{at least one mutation}] \times P_\mu + \mathbb{P}[\text{emergence} \mid \text{no mutation}] \times (1 - P_\mu), \tag{10}$$
where Pµ denotes the probability that at some point along the chain of infections (starting from the type-1 seed), a node would be infected by strain-1 but then mutate to strain-2. In other words, Pµ captures the probability that at some point during the propagation, a type-2 node would emerge.
Observe that for λ < λ1, we have P[emergence | no mutation] = 0 in the limit of large network size (since $P_1^{BP} = 0$ on this interval), while for λ ≥ λ1, we have Pµ = 1 in the limit of large network size⁶. Hence, the second term in (10) is always zero in the limit of large network size, leading to
$$\mathbb{P}[\text{emergence}] = \mathbb{P}[\text{emergence} \mid \text{at least one mutation}] \times P_\mu.$$
Note that on the range λ2 ≤ λ < λ1, we have P[emergence | at least one mutation] = $P_2^{BP}$. However, on the range λ ≥ λ1, strain-1 nodes are able to form a giant component on their own. Hence, in cases where a strain-2 node emerges at some point but fails to infect any of her neighbors, strain-1 nodes could still trigger the emergence of the disease. It follows that P[emergence | at least one mutation] ≥ $P_2^{BP}$ on the range λ ≥ λ2. The bound is tight whenever T2 is significantly larger than T1, for the following reason. Whenever T2 is significantly larger than T1, the average number of secondary infections of strain-2 is much larger than that of strain-1. Hence, infections with strain-2 propagate much faster and block potential pathways for strain-1 to propagate. In this case, the overall probability of emergence is tightly controlled by $P_2^{BP}$. Next, we turn our attention to deriving Pµ.
Consider a tree of infections that starts with a single node infected with strain-1. Let H be the probability that strain-2 never appears throughout the tree, i.e., H is the probability that the tree of infections starting from the seed does not give rise to strain-2 at any intermediate point. Similarly, let h be the probability that a subtree of infections starting from a type-1 host does not give rise to strain-2 at any intermediate point. Recall that G(·) is the PGF of the excess degree distribution, while g(·) is the PGF of the degree distribution. By conditioning on the excess degree as well as the number of secondary infections, we get
$$\begin{aligned} h &= \sum_{k=1}^{\infty} \frac{k p_k}{\langle k \rangle} \sum_{x=0}^{k-1} \binom{k-1}{x} \big(T_1(1-\mu)\big)^{x} (1-T_1)^{k-1-x}\, h^{x} \\ &= \sum_{k=1}^{\infty} \frac{k p_k}{\langle k \rangle} \big(1 - T_1 + T_1(1-\mu)h\big)^{k-1} \\ &= G\big(1 - T_1 + T_1(1-\mu)h\big). \end{aligned} \tag{11}$$
⁶ When λ ≥ λ1, a giant component of type-1 nodes emerges. Since µ > 0 and the number of nodes in the giant component tends to infinity in the limit of large network size, the probability that none of these nodes mutates to strain-2 is zero.
The validity of (11) can be explained as follows. The root of any subtree, say node v, has already used an edge to receive an infection with strain-1 from her parent. Hence, if the degree of node v is k, then node v uses only k − 1 edges to infect her offspring, which leads us to use the excess degree distribution. Furthermore, conditioned on the excess degree being k − 1, the numbers of secondary infections of each type generated by node v follow a multinomial distribution with parameters (k − 1; T1(1 − µ), T1µ, 1 − T1). In particular, conditioned on node v being type-1 and having an excess degree of k − 1, the probability of generating x infections of type-1 and y infections of type-2 is
$$\binom{k-1}{x}\binom{k-1-x}{y} \big(T_1(1-\mu)\big)^{x} (T_1\mu)^{y} (1-T_1)^{k-1-x-y}.$$
However, the only relevant terms for the computation of h are those with y = 0, since all terms with y > 0 contribute zero probability to h by definition. Finally, h^x denotes the probability that the subtrees emanating from the x offspring are themselves free of any strain-2 node.
Recall that H denotes the probability that strain-2 never appears throughout the tree (starting from the root), and note that if the tree root has degree k, then all k of her edges are used to connect with her neighbors at the lower level. Hence, in view of (11), we can write
$$H = g\big(1 - T_1 + T_1(1-\mu)h_\infty\big),$$
where h∞ denotes the steady-state solution of (11). It is now immediate that Pµ = 1 − H, leading to
$$\mathbb{P}[\text{emergence}] \geq (1 - H)\, P_2^{BP}. \tag{12}$$
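Equations (11) and (12) are straightforward to evaluate. The sketch below assumes a Poisson degree distribution with mean λ, for which g and G coincide; it solves (11) by fixed-point iteration from h = 1 (the map is increasing and convex with a single crossing in (0, 1) whenever T1µ > 0), computes Pµ = 1 − H, and multiplies it by the single-strain emergence probability P2BP, which for Poisson networks solves P = 1 − e^{−λT2P}. The function names are hypothetical.

```python
import math


def pgf_poisson(s, lam):
    """g(s) = G(s) = exp(lam (s - 1)) for a Poisson(lam) degree distribution."""
    return math.exp(lam * (s - 1.0))


def p_mutation(lam, T1, mu, tol=1e-14, max_iter=1_000_000):
    """P_mu = 1 - H for the one-way mutation model, using (11) and H = g(...)."""
    h = 1.0
    for _ in range(max_iter):
        new = pgf_poisson(1 - T1 + T1 * (1 - mu) * h, lam)   # equation (11)
        converged = abs(new - h) < tol
        h = new
        if converged:
            break
    H = pgf_poisson(1 - T1 + T1 * (1 - mu) * h, lam)
    return 1.0 - H


def p_bond_percolation(lam, t, tol=1e-14, max_iter=1_000_000):
    """Emergence probability (= final size) of single-strain bond percolation on Poisson(lam)."""
    p = 1.0
    for _ in range(max_iter):
        new = 1.0 - math.exp(-lam * t * p)
        converged = abs(new - p) < tol
        p = new
        if converged:
            break
    return p


def emergence_lower_bound(lam, T1, T2, mu):
    """Right-hand side of (12): (1 - H) * P_2^BP."""
    return p_mutation(lam, T1, mu) * p_bond_percolation(lam, T2)


for lam in (2.0, 5.0, 12.0):                     # parameters of Figure 6a
    print(lam, emergence_lower_bound(lam, 0.1, 1.0, 0.01))
```

Two limiting cases are easy to verify by hand: with µ = 0 no mutant can appear, so Pµ = 0, and with µ = 1 every infection by the seed is already strain-2, so Pµ = 1 − e^{−λT1}.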
To confirm the validity of (12), we run computer simulations on random networks generated by the configuration model with Poisson degree distribution. In Figure 6, we set the network size to n = 2 × 10^5 and perform 10^4 independent experiments for each data point. In Figure 6a, we set T1 = 0.1, T2 = 1, and µ = 0.01. Observe that the bound given by (12) is tight, as T2 is significantly larger than T1. In general, we would expect a tight bound whenever λ2 ≤ λ < λ1 (i.e., 1 ≤ λ < 10 for this parameter set). As λ increases beyond λ1, the tightness of the bound depends on the ratio of T2 to T1. This is illustrated in Figure 6b for the case when T1 = 0.2 and T2 = 0.3.
The availability of an explicit expression for the probability of mutation allows us to explore the effects of mutation on the overall probability of emergence. Indeed, the way the probability of emergence varies with the mean degree closely resembles the way Pµ varies, as illustrated in Figure 6. Hence, in what follows, we focus on the behavior of Pµ with respect to changes in the mean degree. In Figure 7, we set T1 = 0.1 and plot Pµ against the mean degree for a network with Poisson degree distribution. We observe that different values of µ affect the shape of Pµ (and hence the probability of emergence) in a remarkable way. First, for all values of µ ∈ (0, 1), the behavior of Pµ appears to be strikingly different from the universality class of percolation models; e.g., compare with the shape of the probability of emergence (respectively, $P_2^{BP}$) in Figure 4 (respectively, Figure 6). Second, the effect of the mutation probability on Pµ appears to be significant as the mean degree increases from small values, reaches its peak right before the critical mean degree corresponding to $P_1^{BP}$, and then decays as the mean degree increases further.
[Figure 6 plots: panels (a) and (b), Poisson degree distribution; probability of emergence against the mean degree (0 to 20 in (a), 0 to 10 in (b)), showing the experimental probability, Pµ × P2BP, P2BP, Pµ (panel (b)), and the transition point.]
Figure 6: Approximating the probability of emergence. The network size n is 2 × 10^5 and the number of independent experiments for each data point is 10^4. Blue circles denote the empirical probability of emergence, while the red line denotes the theoretical approximation of the probability of emergence according to (12). The light blue dashed line denotes the probability of emergence for a single-strain, bond-percolated network with T2. (a) We set T1 = 0.1, T2 = 1, and µ = 0.01. (b) We set T1 = 0.2, T2 = 0.3, and µ = 0.01. We observe good agreement between the experimental results and the theoretical approximation given by (12) whenever λ2 ≤ λ < λ1 or whenever T2 is significantly larger than T1.
The reasoning behind this observation is intuitive. Recall that the process starts with a single infection with strain-1, and note that Pµ is influenced by the structure of the underlying contact network, the transmissibility of strain-1, and the particular value of µ. As the mean degree λ increases towards λ1, the length of the tree of infections starting from the seed⁷ also increases; however, no cycles appear and the epidemic propagates on a finite, tree-like percolated network (since λ < λ1). Increasing the length of the tree increases the probability that at least one intermediate node mutates to strain-2, but because the tree is finite, the particular value of µ is crucial to Pµ. Namely, a small value of µ makes it less likely that a mutant emerges before the chain of infections terminates, while a relatively larger value could drive the emergence of strain-2 and allow the epidemic to escape extinction. Put differently, the finiteness of the chain of infections when λ < λ1 creates a limited number of opportunities for mutation, so the particular value of µ bears the burden of generating a mutant and driving the whole process to emergence. However, as λ increases beyond λ1, cycles start to appear and a giant component of nodes infected with strain-1 emerges. In this case, the chain of infections is no longer finite, and any positive value of µ results in a mutation almost surely in the limit of large network size. In other words, when λ ≥ λ1, the structure of the underlying network starts to facilitate the emergence of strain-2, hence reducing the dependence on µ.
⁷ The length of the tree of infections can be interpreted as the size of the component (of a bond-percolated network with T1) that contains the seed.
[Figure 7 plot: Poisson degree distribution; Pµ against the mean degree (5 to 25) for µ = 0.01, 0.1, 0.2, 0.3, and 0.4, with the transition point for P1BP marked. Inset: P0.4 − P0.01 against the mean degree (0 to 40).]
Figure 7: Effect of mutation. We set T1 = 0.1 and plot Pµ against the mean degree for a network with Poisson degree distribution. Different values of µ have a different impact on Pµ, and the impact is most pronounced before the critical mean degree corresponding to a single-strain, bond-percolated network with T1. Inset: the difference between the value of Pµ when µ = 0.4 and the value of Pµ when µ = 0.01, as a function of the mean degree of the underlying contact network.
- Facebook [55, 75]: The contact network among the friends of 10 users (including those 10
users).
- Twitter [55,75]: The contact network among the friends of 1000 users (including those 1000
users).
- Slashdot [55, 76]: The network contains friend/foe links between the users of Slashdot.
- Higgs [55, 77]: The Higgs data set was collected by monitoring the spreading processes on Twitter before, during, and after the announcement of the discovery of a new particle with the features of the elusive Higgs boson on July 4, 2012. Nodes correspond to the authors of the collected tweets, and edges represent the followee/follower relationships between them.
Network       |N|       |E|          λoriginal   Φoriginal   Φ{λ=1}   Φ{λ=10}   Φrandom
Facebook      4,039     88,234       43.7        0.519       0.011    0.117     0.0107
Twitter       81,306    1,342,296    33          0.170       0.005    0.051     0.0004
Slashdot      82,168    504,230      12.3        0.024       0.001    0.019     0.0001
Higgs         456,626   12,508,413   54.8        0.008       0.0001   0.001     0.0001
High school   773       6,342        16.4        0.094       0.019    0.059     0.020
Hospital      73        543          14.87       0.446       0.090    0.296     0.183
Figure 8: Real-world contact networks. We consider four real-world contact networks in the context of information propagation, namely, the Facebook, Twitter, Slashdot, and Higgs networks from the SNAP [55] collection. We also consider two real-world contact networks in the context of infectious disease propagation, namely, a contact network among students, teachers, and staff at a US high school [73] and a contact network among professional staff and patients in a hospital in Lyon, France [74]. For each network, we indicate the number of nodes |N|, the number of edges |E|, the mean degree of the original network λoriginal, and the clustering coefficient of the original network Φoriginal. Φ{λ=1} (respectively, Φ{λ=10}) denotes the clustering coefficient of the original network after removing a random subset of edges such that the resulting mean degree is 1 (respectively, 10). Φrandom denotes the average clustering coefficient (over 200 independent realizations) of a random network generated by the configuration model with Poisson degree distribution. The random network has the same number of nodes and the same (original) mean degree as the corresponding real-world network.
In the context of infectious disease propagation, we consider the following two contact networks:
- High school network [73]: The contact network observed at a US high school during a typical school day. The dataset covers 762,868 interactions between students, teachers, and staff. Each interaction between two individuals is characterized by their identification numbers as well as the duration of the interaction. Two individuals could have multiple interactions throughout the day; we sum the durations of these interactions to obtain the total contact time between the two individuals over the whole day. We then sample a static graph from this dataset by assigning an edge between nodes u and v with probability tuv/tmax, where tuv denotes the total contact time between nodes u and v throughout the day and tmax denotes the maximum total contact time observed in the dataset.
- Hospital network [74]: The contact network observed in a short-stay geriatric unit of a university hospital in Lyon, France. The dataset covers five days of interactions between professional staff members and patients. As with the high school network, we compute the total contact time between each pair of individuals (over the span of five days) and then sample a static graph from the dataset by assigning an edge between nodes u and v with probability tuv/tmax.
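The duration-weighted sampling step used for both networks can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the raw data are available as (u, v, duration) records, and the record format and the function name sample_static_graph are hypothetical. Pairs are treated as undirected.

```python
import random
from collections import defaultdict


def sample_static_graph(interactions, seed=None):
    """Sample a static contact graph from a list of interactions.

    interactions: iterable of (u, v, duration) records (hypothetical format);
    a pair may appear many times. Durations are summed into t_uv, and the
    undirected edge (u, v) is kept with probability t_uv / t_max.
    """
    rng = random.Random(seed)
    total = defaultdict(float)                 # t_uv for each unordered pair
    for u, v, duration in interactions:
        if u == v:
            continue                           # ignore self-contacts
        total[(min(u, v), max(u, v))] += duration
    if not total:
        return set()
    t_max = max(total.values())                # largest total contact time
    # rng.random() lies in [0, 1), so the pair with t_uv = t_max is always kept
    return {pair for pair, t_uv in total.items() if rng.random() < t_uv / t_max}


records = [(1, 2, 20.0), (2, 1, 40.0), (2, 3, 30.0), (3, 4, 6.0)]
print(sorted(sample_static_graph(records, seed=42)))
```

Normalizing by tmax means the most-connected pair is always linked, while rarely interacting pairs are usually dropped.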
More details on the networks, including their clustering coefficients, are given in Figure 8. We treat all edges as undirected (bidirectional).
6.1 Methods
To conduct a fair comparison between the formalism given in Section 4.A and the single-type bond-percolation framework, we fix the transmissibility matrix T and the mutation matrix µ, hence fixing ρ(Tµ) and TBP (according to (9)). For each contact network, we vary the mean degree, denoted λ, between 1 and 10. For each value of λ, we remove a random subset of edges such that the resulting network has (approximately) mean degree λ. The random removal of edges does lower the clustering coefficient of the network; however, the resulting subgraph remains highly clustered compared to random networks with the same mean degree (see Figure 8). In other words, the sampled networks still exhibit specific structural properties that distinguish them from synthetic contact networks generated by the configuration model (with a Poisson degree distribution of the same mean degree). After the mean degree is adjusted, the process proceeds as in Section 5.B.
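A minimal sketch of the edge-thinning step, together with the average local clustering coefficient reported in Figure 8, is given below. It assumes an undirected edge list, keeps round(λ|N|/2) edges chosen uniformly at random, and follows the common convention that nodes with fewer than two neighbors contribute zero clustering. The function names are hypothetical, and the authors' exact procedure may differ.

```python
import random
from collections import defaultdict


def thin_to_mean_degree(edges, num_nodes, target_mean_degree, seed=None):
    """Keep a uniformly random subset of undirected edges so that the mean
    degree 2 * |E'| / num_nodes is approximately target_mean_degree."""
    edges = list(edges)
    keep = round(target_mean_degree * num_nodes / 2)
    keep = max(0, min(keep, len(edges)))       # cannot keep more than we have
    return random.Random(seed).sample(edges, keep)


def average_clustering(edges, num_nodes):
    """Average local clustering coefficient over all num_nodes nodes;
    nodes with fewer than two neighbors contribute 0."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue
        # each link among the neighbors is counted once from each endpoint
        links = sum(len(nbrs[a] & ns) for a in ns) / 2
        total += links / (k * (k - 1) / 2)
    return total / num_nodes


# Triangle 0-1-2 plus a pendant edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(average_clustering(edges, 4))                      # (1 + 1 + 1/3 + 0) / 4
print(len(thin_to_mean_degree(edges, 4, 1.0, seed=1)))   # keeps 2 edges
```

As a sanity check, a Poisson random network has expected clustering of roughly λ/|N|, which agrees with the Φrandom column of Figure 8 for the larger networks (e.g., 43.7/4,039 ≈ 0.0108 for Facebook).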
6.2 Results
In Figure 9, we plot the probability of emergence for the six contact networks shown in Figure 8. We compare the results obtained by computer simulations with those obtained by the multiple-strain formalism (Section 4.A) and the single-type bond-percolation framework. We set T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75. It follows that TBP = 0.4 according to (9).
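The value TBP = 0.4 can be checked numerically. The sketch below assumes, consistently with the reference to ρ(Tµ) in Section 6.1 and with the example in Appendix A, that (9) sets TBP equal to the spectral radius of Tµ, where T = diag(T1, T2) and µ is the row-stochastic mutation matrix (so µ12 = 1 − µ11 and µ21 = 1 − µ22). For a nonnegative 2 × 2 matrix the eigenvalues are real, so the spectral radius is the larger root of the characteristic polynomial. The function names are hypothetical.

```python
import math


def perron_root_2x2(a, b, c, d):
    """Spectral radius of the nonnegative matrix [[a, b], [c, d]].
    Its eigenvalues are real because (a - d)^2 + 4bc >= 0."""
    return (a + d + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2


def t_bp(t1, t2, mu11, mu22):
    """Matched single-strain transmissibility, ASSUMED to be rho(T mu) with
    T = diag(t1, t2) and row-stochastic mu (mu12 = 1 - mu11, mu21 = 1 - mu22)."""
    mu12, mu21 = 1.0 - mu11, 1.0 - mu22
    # T mu = [[t1*mu11, t1*mu12], [t2*mu21, t2*mu22]]
    return perron_root_2x2(t1 * mu11, t1 * mu12, t2 * mu21, t2 * mu22)


print(t_bp(0.2, 0.5, 0.75, 0.75))   # parameters used in this section
```

With these parameters, Tµ = [[0.15, 0.05], [0.125, 0.375]] has eigenvalues 0.4 and 0.125, so TBP is approximately 0.4 up to floating-point rounding.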
Similar to our observations on random networks (Section 5.E), the single-type bond-percolation framework yields significantly inaccurate predictions of the probability of emergence when the underlying process entails evolution. This limitation is therefore not specific to random networks; it applies to real-world networks as well. Appendix A explains the intuition behind these observations. In contrast, the multiple-strain formalism provides remarkably accurate predictions, especially on contact networks with low clustering coefficients. This is expected: the multi-type branching framework assumes that the underlying graph is locally tree-like, an assumption that holds approximately for networks with small clustering coefficients, so the multiple-strain formalism should be most accurate on such networks.
In this section, we seek to shed light on the effects of co-infection on information/disease propagation. In particular, we investigate the extent to which co-infection dynamics could enhance or suppress the scale of epidemics. Of particular interest is whether co-infection could change the order of the phase transition from second-order (as is the case with most epidemic models) to first-order, leading to a phenomenon that is commonly described as avalanche outbreaks [59]. To that end, we extend the multiple-strain model given in Section 3 to account for co-infection. In particular, a susceptible individual who comes into infectious contact with type-1 and type-2 hosts simultaneously becomes co-infected and starts to spread the co-infection. Henceforth, we consider the case where the co-infection has its own transmissibility Tco and does not mutate back to either strain-1 or strain-2. In other words, a co-infected host infects each of her neighbors independently with probability Tco, and the infected neighbors become co-infected with probability 1.
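The co-infection rule can be summarized as a small state-update function. This is an illustrative sketch, not the authors' simulator. It assumes that co-infection is decided from the types of the hosts that successfully transmitted to a susceptible node in the same time step, and that an exposure to a single strain ℓ results in strain j with probability µℓj. The names resolve_new_infection, STRAIN_1, STRAIN_2, and COINFECTED are hypothetical.

```python
import random

STRAIN_1, STRAIN_2, COINFECTED = "strain-1", "strain-2", "co-infected"


def resolve_new_infection(source_types, mu, rng):
    """State of a susceptible node after one time step (illustrative sketch).

    source_types: types of the hosts that transmitted to the node in this step.
    mu: mutation probabilities, mu[l][j] = P(strain-l exposure yields strain j).
    Returns None if the node stays susceptible.
    """
    types = set(source_types)
    if not types:
        return None
    # simultaneous strain-1 and strain-2 contacts, or any co-infected contact,
    # yield co-infection (co-infection never mutates back)
    if COINFECTED in types or {STRAIN_1, STRAIN_2} <= types:
        return COINFECTED
    (strain,) = types                  # exactly one strain remains
    return STRAIN_1 if rng.random() < mu[strain][STRAIN_1] else STRAIN_2


mu = {STRAIN_1: {STRAIN_1: 0.75, STRAIN_2: 0.25},
      STRAIN_2: {STRAIN_1: 0.25, STRAIN_2: 0.75}}
rng = random.Random(0)
print(resolve_new_infection([STRAIN_1, STRAIN_2], mu, rng))  # co-infected
print(resolve_new_infection([STRAIN_1, STRAIN_1], mu, rng))  # strain-1 or strain-2
```

Keeping the rule separate from the contact loop makes it easy to swap in alternative assumptions, for instance applying mutation before deciding co-infection.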
As in Section 5, we consider contact networks with a Poisson degree distribution and with a power-law degree distribution with exponential cutoff. For both cases, we set T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75. Moreover, we set the network size to 2 × 10^6 and the number of independent experiments for each data point to 5 × 10^3. To illustrate how co-infection dynamics control the order of the phase transition, we simulate and compare the process for two values of Tco, namely Tco = 0.1 and Tco = 0.8. Finally, we plot the epidemic size of a single-strain network bond-percolated with transmissibility Tco [3], denoted by S_co^BP.
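In all cases, co-infection emerges at the phase transition point that characterizes an epidemic of strain-1 and strain-2, i.e., at the mean degree for which ρ(M) = 1, where M is given by

$$
M = \frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}
\begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix}
\begin{bmatrix} \mu_{11} & \mu_{12} \\ \mu_{21} & \mu_{22} \end{bmatrix}
$$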
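For a Poisson degree distribution, ⟨k²⟩ − ⟨k⟩ = λ², so the prefactor of M reduces to the mean degree λ and ρ(M) = λρ(Tµ), with T = diag(T1, T2). The worked example below is our own illustration for the parameters used here (T1 = 0.2, T2 = 0.5, µ11 = µ22 = 0.75, hence µ12 = µ21 = 0.25, assuming a row-stochastic mutation matrix):

$$
T\mu = \begin{bmatrix} 0.2 & 0 \\ 0 & 0.5 \end{bmatrix}
\begin{bmatrix} 0.75 & 0.25 \\ 0.25 & 0.75 \end{bmatrix}
= \begin{bmatrix} 0.15 & 0.05 \\ 0.125 & 0.375 \end{bmatrix},
\qquad
\det(T\mu - xI) = x^2 - 0.525\,x + 0.05 = (x - 0.4)(x - 0.125),
$$
$$
\rho(M) = \lambda\,\rho(T\mu) = 0.4\,\lambda = 1 \;\Longrightarrow\; \lambda_c = 2.5 .
$$

This value is consistent with the mean-degree window (2.5 to 2.6) examined for the Poisson case in Figure 11(a).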
As seen in Figure 10, a first-order phase transition is observed on both contact networks when Tco = 0.8, owing to the corresponding first-order transition of Sco: the value of Sco jumps discontinuously from zero to (approximately) the value of S_co^BP for a single-strain, bond-percolated network with Tco = 0.8. In general, we conjecture that a first-order phase transition emerges whenever Tco is large enough that S_co^BP > 0 at the critical point ρ(M) = 1. If, however, Tco is small enough that S_co^BP = 0 when ρ(M) = 1, then a second-order phase transition is observed. Our simulation results for Tco = 0.1 confirm this behavior.
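The conjecture can be made concrete for the Poisson case. There, ρ(M) = λρ(Tµ), and by our calculation ρ(Tµ) = 0.4 for these parameters (the same value as TBP in Section 6.2), so ρ(M) = 1 at λ = 2.5. S_co^BP is then the giant-component fraction of a Poisson network bond-percolated with Tco, the largest root of s = 1 − exp(−λTco s), which is positive only if λTco > 1. The sketch below evaluates it at λ = 2.5 for the two values of Tco used in Figure 10; the helper name is hypothetical.

```python
import math


def poisson_giant_fraction(r, tol=1e-12, max_iter=100_000):
    """Largest root of s = 1 - exp(-r * s): the epidemic size of a
    single-strain, bond-percolated Poisson network with r = mean degree * T."""
    if r <= 1.0:
        return 0.0                       # no giant component at or below r = 1
    s = 1.0
    for _ in range(max_iter):
        s_next = 1.0 - math.exp(-r * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


LAMBDA_C = 2.5  # our calculation: rho(M) = 1 at this mean degree (Poisson case)
for t_co in (0.8, 0.1):
    print(t_co, round(poisson_giant_fraction(LAMBDA_C * t_co), 3))
```

For Tco = 0.8, λTco = 2 and S_co^BP is about 0.80 at the critical point, matching the discontinuous jump in Figure 10(a); for Tco = 0.1, λTco = 0.25 < 1, so S_co^BP = 0 and the transition is second-order.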
To validate the order of the phase transition when Tco = 0.8, we conduct an extensive simulation study around the phase transition point on both contact networks. In Figure 11, we set the number of nodes n to 15 × 10^6 (to alleviate finite-size effects) and the number of experiments to 10^4 for each data point. We use the same parameters that were used to generate Figure 10, i.e., T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75. Our results confirm that the phase transition is indeed first-order on both contact networks: the value of Sco jumps discontinuously to (approximately) the corresponding value of S_co^BP with Tco = 0.8.
8 Conclusion
In this paper, we have investigated the evolution of spreading processes on complex networks
and developed a mathematical theory that unravels the relationship between the characteristics of
the spreading process, evolution, and the structure of the contact network on which the process
spreads. Our mathematical theory was complemented by an extensive simulation study on both random and real-world contact networks. The simulation results confirmed the accuracy of our theory and revealed significant shortcomings of the classical mathematical models that do not capture
evolution. A matching condition between single- and multiple-strain models was proposed and
evaluated in the context of probability of emergence, epidemic size, and epidemic threshold. Under
the proposed matching condition, our results revealed that the classical bond-percolation models
may accurately predict the threshold and final size of epidemics that entail evolution, but their
predictions on the probability of emergence are significantly inaccurate on both random and real-
world networks. Hence, our formalism is necessary to bridge the disconnect between how spreading
processes propagate and evolve on complex networks, and the current mathematical models that
do not capture evolution.
We then derived a lower bound on the probability of emergence to gain further insight into the effects of mutation. The bound was derived for the special case of one-step irreversible mutation. Our results revealed that the probability of mutation plays a key role in determining the shape and behavior of the probability of emergence. Moreover, the way the particular value of µ influences the probability of emergence varies with the connectivity of the underlying contact network. Finally, we considered the case where co-infection is possible and showed that co-infection dynamics control the order of the phase transition in an interesting way. In particular, depending on co-infection dynamics, the phase transition of the epidemic size could change from second-order to first-order, in contrast to percolation models, whose phase transitions are typically second-order.
Acknowledgement
This work has been supported (in part) by the National Science Foundation through grant CCF-
1813637, (in part) by the Army Research Office through grant W911NF-17-1-0587, and (in part) by
the Office of Naval Research through grants N0001418SB001 and N000141512797. The first author
was funded in part by the Dowd Fellowship from the College of Engineering at Carnegie Mellon
University. The authors would like to thank Philip and Marsha Dowd for their financial support
and encouragement. The first author would like to thank Ms. Mary Turocy from the School of
Medicine at University of California San Francisco for her helpful and constructive comments.
References
[1] A.-L. Barabási and M. Pósfai, Network science. Cambridge university press, 2016.
[2] C. Fraser, S. Riley, R. M. Anderson, and N. M. Ferguson, “Factors that make an infectious
disease outbreak controllable,” Proceedings of the National Academy of Sciences of the United
States of America, vol. 101, no. 16, pp. 6146–6151, 2004.
[3] M. E. Newman, “Spread of epidemic disease on networks,” Phys. Rev. E, vol. 66, no. 1, p.
016128, 2002.
[5] R. M. Anderson, R. M. May, and B. Anderson, Infectious diseases of humans: dynamics and
control. Wiley Online Library, 1992, vol. 28.
[6] R. Pastor-Satorras and A. Vespignani, “Epidemic dynamics and endemic states in complex
networks,” Phys. Rev. E, vol. 63, no. 6, p. 066117, 2001.
[8] C. Granell, S. Gómez, and A. Arenas, “Competing spreading processes on multiplex networks:
awareness and epidemics,” Phys. Rev. E, vol. 90, no. 1, p. 012808, 2014.
[9] D. M. Morens, G. K. Folkers, and A. S. Fauci, “The challenge of emerging and re-emerging
infectious diseases,” Nature, vol. 430, no. 6996, p. 242, 2004.
[10] N. D. Wolfe, C. P. Dunavan, and J. Diamond, “Origins of major human infectious diseases,”
Nature, vol. 447, no. 7142, p. 279, 2007.
[14] C. I. Siettos and L. Russo, “Mathematical modeling of infectious disease dynamics,” Virulence,
vol. 4, no. 4, pp. 295–306, 2013.
[16] M. J. Keeling and P. Rohani, Modeling infectious diseases in humans and animals. Princeton
University Press, 2011.
[17] S. Bansal, B. T. Grenfell, and L. A. Meyers, “When individual behaviour matters: homoge-
neous and network models in epidemiology,” Journal of the Royal Society Interface, vol. 4,
no. 16, pp. 879–891, 2007.
[18] M. J. Keeling and K. T. Eames, “Networks and epidemic models,” Journal of the Royal Society
Interface, vol. 2, no. 4, pp. 295–307, 2005.
[20] J. C. Miller and I. Z. Kiss, “Epidemic spread in networks: Existing methods and current
challenges,” Mathematical modelling of natural phenomena, vol. 9, no. 2, pp. 4–42, 2014.
[21] R. Durrett, “Some features of the spread of epidemics and information on a random graph,”
Proceedings of the National Academy of Sciences, vol. 107, no. 10, pp. 4491–4498, 2010.
[22] Y. Zhuang and O. Yağan, “Information propagation in clustered multilayer networks,” IEEE
Transactions on Network Science and Engineering, vol. 3, no. 4, pp. 211–224, 2016.
[23] O. Yağan, D. Qian, J. Zhang, and D. Cochran, “Conjoining speeds up information diffusion
in overlaying social-physical networks,” IEEE Journal on Selected Areas in Communications,
vol. 31, no. 6, pp. 1038–1048, 2013.
[24] L. Huang, K. Park, and Y.-C. Lai, “Information propagation on modular networks,” Phys.
Rev. E, vol. 73, no. 3, p. 035103, 2006.
[25] Y. Moreno, M. Nekovee, and A. F. Pacheco, “Dynamics of rumor spreading in complex net-
works,” Phys. Rev. E, vol. 69, no. 6, p. 066130, 2004.
[26] P. S. Dodds and D. J. Watts, “Universal behavior in a generalized model of contagion,” Phys.
Rev. Letters, vol. 92, no. 21, p. 218701, 2004.
[27] F. D. Sahneh, C. Scoglio, and P. Van Mieghem, “Generalized epidemic mean-field model for
spreading processes over multilayer complex networks,” IEEE/ACM Transactions on Network-
ing, vol. 21, no. 5, pp. 1609–1620, 2013.
[29] M. E. Newman, S. Forrest, and J. Balthrop, “Email networks and the spread of computer
viruses,” Phys. Rev. E, vol. 66, no. 3, p. 035101, 2002.
[31] D. Qian, O. Yağan, L. Yang, and J. Zhang, “Diffusion of real-time information in social-physical
networks,” in IEEE GLOBECOM, 2012, pp. 2072–2077.
[33] H. Alexander and T. Day, “Risk factors for the evolutionary emergence of pathogens,” Journal
of The Royal Society Interface, vol. 7, no. 51, pp. 1455–1474, 2010.
[34] R. Antia, R. R. Regoes, J. C. Koella, and C. T. Bergstrom, “The role of evolution in the
emergence of infectious diseases,” Nature, vol. 426, no. 6967, p. 658, 2003.
[35] K. S. Pfennig, “Evolution of pathogen virulence: the role of variation in host phenotype,”
Proceedings of the Royal Society of London B: Biological Sciences, vol. 268, no. 1468, pp.
755–760, 2001.
[36] L. A. Adamic, T. M. Lento, E. Adar, and P. C. Ng, “Information evolution in
social networks,” in ACM WSDM, 2016, pp. 473–482. [Online]. Available: http://doi.acm.org/10.1145/2835776.2835827
[37] Y. Zhang, S. Zhou, Z. Zhang, J. Guan, and S. Zhou, “Rumor evolution in social networks,”
Phys. Rev. E, vol. 87, no. 3, p. 032133, 2013.
[42] H.-D. Song, C.-C. Tu, G.-W. Zhang, S.-Y. Wang, K. Zheng, L.-C. Lei, Q.-X. Chen, Y.-W.
Gao, H.-Q. Zhou, H. Xiang, H.-J. Zheng, S.-W. W. Chern, F. Cheng, C.-M. Pan, H. Xuan,
S.-J. Chen, H.-M. Luo, D.-H. Zhou, Y.-F. Liu, J.-F. He, P.-Z. Qin, L.-H. Li, Y.-Q. Ren, W.-J.
Liang, Y.-D. Yu, L. Anderson, M. Wang, R.-H. Xu, X.-W. Wu, H.-Y. Zheng, J.-D. Chen,
G. Liang, Y. Gao, M. Liao, L. Fang, L.-Y. Jiang, H. Li, F. Chen, B. Di, L.-J. He, J.-Y. Lin,
S. Tong, X. Kong, L. Du, P. Hao, H. Tang, A. Bernini, X.-J. Yu, O. Spiga, Z.-M. Guo, H.-Y.
Pan, W.-Z. He, J.-C. Manuguerra, A. Fontanet, A. Danchin, N. Niccolai, Y.-X. Li, C.-I. Wu,
and G.-P. Zhao, “Cross-host evolution of severe acute respiratory syndrome coronavirus in
palm civet and human,” Proceedings of the National Academy of Sciences, vol. 102, no. 7, pp.
2430–2435, 2005. [Online]. Available: http://www.pnas.org/content/102/7/2430
[44] O. Balmer and M. Tanner, “Prevalence and implications of multiple-strain infections,” The
Lancet infectious diseases, vol. 11, no. 11, pp. 868–878, 2011.
[45] H. Susi, B. Barrès, P. F. Vale, and A.-L. Laine, “Co-infection alters population dynamics of
infectious disease,” Nature communications, vol. 6, p. 5975, 2015.
[46] A. F. Read and L. H. Taylor, “The ecology of genetically diverse infections,” Science, vol. 292,
no. 5519, pp. 1099–1102, 2001.
[48] S. Alizon, J. C. de Roode, and Y. Michalakis, “Multiple infections and the evolution of viru-
lence,” Ecology letters, vol. 16, no. 4, pp. 556–567, 2013.
[50] J. P. Gleeson and D. J. Cahalane, “Seed size strongly affects cascades on random
networks,” Phys. Rev. E, vol. 75, p. 056103, May 2007. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.75.056103
[51] J. P. Gleeson, “Cascades on correlated and modular random networks,” Phys. Rev. E, vol. 77, p.
046117, Apr 2008. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.77.046117
[52] M. Molloy and B. Reed, “A critical point for random graphs with a given degree sequence,”
Random structures & algorithms, vol. 6, no. 2-3, pp. 161–180, 1995.
[53] B. Bollobás, Random graphs. Cambridge university press, 2001, vol. 73.
[54] M. E. Newman, S. H. Strogatz, and D. J. Watts, “Random graphs with arbitrary degree
distributions and their applications,” Phys. Rev. E, vol. 64, no. 2, p. 026118, 2001.
[55] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network dataset collection,” http://snap.stanford.edu/data, Jun. 2014.
[56] L. J. Allen, F. Brauer, P. Van den Driessche, and J. Wu, Mathematical epidemiology. Springer,
2008, vol. 1945.
[57] C. Moore and M. E. Newman, “Exact solution of site and bond percolation on small-world
networks,” Phys. Rev. E, vol. 62, no. 5, p. 7059, 2000.
[58] L. Meyers, “Contact network epidemiology: Bond percolation applied to infectious disease
prediction and control,” Bulletin of the American Mathematical Society, vol. 44, no. 1, pp.
63–86, 2007.
[60] N. Azimi-Tafreshi, “Cooperative epidemics on multiplex networks,” Phys. Rev. E, vol. 93, p.
042303, Apr 2016. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.93.042303
[62] P.-B. Cui, F. Colaiori, and C. Castellano, “Mutually cooperative epidemics on power-
law networks,” Phys. Rev. E, vol. 96, p. 022301, Aug 2017. [Online]. Available:
https://link.aps.org/doi/10.1103/PhysRevE.96.022301
[63] M. E. Woolhouse, D. T. Haydon, and R. Antia, “Emerging pathogens: the epidemiology and
evolution of species jumps,” Trends in ecology & evolution, vol. 20, no. 5, pp. 238–244, 2005.
[65] M. S. Klempner and D. S. Shapiro, “Crossing the species barrier–one small step to man, one
giant leap to mankind,” New England Journal of Medicine, vol. 350, no. 12, pp. 1171–1172,
2004.
[66] C. J. Mode, Multitype branching processes: theory and applications. American Elsevier Pub.
Co., 1971, vol. 34.
[69] G. U. Yule et al., “II. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S.,” Phil. Trans. R. Soc. Lond. B, vol. 213, no. 402-410, pp. 21–87, 1925.
[71] G. Csardi and T. Nepusz, “The igraph software package for complex network research,”
InterJournal, vol. Complex Systems, p. 1695, 2006. [Online]. Available: http://igraph.org
[72] C. S. Gokhale, Y. Iwasa, M. A. Nowak, and A. Traulsen, “The pace of evolution across fitness
valleys,” Journal of Theoretical Biology, vol. 259, no. 3, pp. 613–620, 2009.
[74] P. Vanhems, A. Barrat, C. Cattuto, J.-F. Pinton, N. Khanafer, C. Régis, B.-a. Kim, B. Comte,
and N. Voirin, “Estimating potential infection transmission routes in hospital wards using
wearable proximity sensors,” PLOS ONE, vol. 8, no. 9, pp. 1–9, 09 2013.
[75] J. Leskovec and J. J. Mcauley, “Learning to discover social circles in ego networks,” in Advances
in neural information processing systems, 2012, pp. 539–547.
[77] M. De Domenico, A. Lima, P. Mougel, and M. Musolesi, “The anatomy of a scientific rumor,”
Scientific reports, vol. 3, p. 2980, 2013.
[78] J. C. de Roode, M. E. Helinski, M. A. Anwar, and A. F. Read, “Dynamics of multiple in-
fection and within-host competition in genetically diverse malaria infections,” The American
Naturalist, vol. 166, no. 5, pp. 531–542, 2005.
[79] C. Lord, B. Barnard, K. Day, J. Hargrove, J. McNamara, R. Paul, K. Trenholme, and M. Wool-
house, “Aggregation and distribution of strains in microparasites,” Philosophical Transactions
of the Royal Society B: Biological Sciences, vol. 354, no. 1384, pp. 799–807, 1999.
[80] F. Cézilly, M.-J. Perrot-Minnot, and T. Rigaud, “Cooperation and conflict in host manipu-
lation: interactions among macro-parasites and micro-organisms,” Frontiers in microbiology,
vol. 5, p. 248, 2014.
[81] C. Tollenaere, H. Susi, and A.-L. Laine, “Evolutionary and epidemiological implications of
multiple infection in plants,” Trends in plant science, vol. 21, no. 1, pp. 80–90, 2016.
[82] S. Lass, P. J. Hudson, J. Thakar, J. Saric, E. Harvill, R. Albert, and S. E. Perkins, “Generating
super-shedders: co-infection increases bacterial load and egg production of a gastrointestinal
helminth,” Journal of the Royal Society Interface, vol. 10, no. 80, p. 20120588, 2013.
[83] E. Kenah and J. M. Robins, “Second look at the spread of epidemics on networks,” Phys. Rev.
E, vol. 76, no. 3, p. 036113, 2007.
Later on, Kenah and Robins [83] proved that this isomorphism to a bond-percolation prob-
lem is valid only when the distribution of the infectious periods is degenerate, i.e., τi = τ0 for
all i = 1, 2, . . ., where τ0 is a constant. Kenah and Robins showed that when the distribution of
the infectious periods is non-degenerate, there is no bond-percolation probability that will make
the bond-percolation model isomorphic to the SIR model. The fundamental reason behind their
findings is the fact that the infection events across edges emanating from node i are conditionally
independent given τi , but marginally dependent unless τi = τ0 with probability one. That said, Ke-
nah and Robins showed that even when the distribution of the infectious periods is non-degenerate,
the mapping to a bond-percolation process can still be used to accurately predict the epidemic
threshold and epidemic size.
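The source of this dependence can be seen in a small calculation. Assume the standard SIR form in which, given its infectious period τ, an infective node transmits across each edge independently with probability 1 − exp(−βτ), where β is a per-edge transmission rate. The value of β and the two-point distribution of τ below are illustrative choices, and the function name is hypothetical. The probability that two given edges both transmit is E[(1 − e^{−βτ})²], which exceeds the product of the marginals (E[1 − e^{−βτ}])² by the variance of 1 − e^{−βτ}. The two are equal only when τ is degenerate.

```python
import math


def edge_transmission_probs(beta, tau_values, tau_probs):
    """Return (P[a given edge transmits], P[two given edges both transmit])
    for an infective node whose infectious period tau takes tau_values[i]
    with probability tau_probs[i]. Given tau, each edge transmits
    independently with probability 1 - exp(-beta * tau) (assumed SIR form)."""
    t = [1.0 - math.exp(-beta * tau) for tau in tau_values]
    p_one = sum(p * ti for p, ti in zip(tau_probs, t))
    p_both = sum(p * ti * ti for p, ti in zip(tau_probs, t))
    return p_one, p_both


# Non-degenerate infectious period: tau = 1 or 3 with equal probability
p_one, p_both = edge_transmission_probs(0.5, [1.0, 3.0], [0.5, 0.5])
print(p_both, p_one ** 2)        # p_both > p_one**2: edges are positively correlated

# Degenerate infectious period: tau = 2 with probability one
q_one, q_both = edge_transmission_probs(0.5, [2.0], [1.0])
print(q_both, q_one ** 2)        # equal up to rounding: edges are independent
```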
The multiple-strain model presented by Alexander and Day exhibits a similar form of correlation between infection events. In particular, infection events are conditionally independent given the type of the infective node. Namely, conditioned on node i being infected with strain-ℓ, node i infects each of her neighbors independently with probability Tℓ. However, infection events are marginally dependent unless the transmissibility Ti of every infective node i equals the same constant T0 with probability one, a condition that essentially reduces the dynamics to that of single-strain processes without evolution. To give an example, consider a 2-regular network, where each node has exactly 2 neighbors. Let T1 = 1 and µ11 = µ21 = µ. In this case, we have TBP = µ + T2(1 − µ) by virtue of (9). We can now easily compute the probability that infecting a randomly selected node results in an outbreak of size one. Under the bond-percolation framework, this is given by (1 − TBP)^2 = (1 − µ − T2(1 − µ))^2 = (1 − µ)^2(1 − T2)^2. However, the multiple-strain formalism predicts a zero probability for this event should the initial node be infected with strain-1, since T1 = 1 means that the seed infects both of her neighbors with certainty. Indeed, the probability predicted by the bond-percolation framework matches the one predicted by the multiple-strain formalism only if T2 = 1 or µ = 1, a condition that diminishes the role of evolution and reduces the dynamics to that of single-strain processes.
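The two predictions in this example can be evaluated directly. The sketch below hard-codes the example's assumptions (a 2-regular network, T1 = 1, µ11 = µ21 = µ, and TBP = µ + T2(1 − µ) as stated above). The function name is hypothetical.

```python
T1 = 1.0  # strain-1 transmits with certainty, as in the example


def size_one_probabilities(mu, t2):
    """Probability that infecting a node of a 2-regular network yields an
    outbreak of size one, with mu11 = mu21 = mu.

    Returns (bond-percolation prediction, multiple-strain prediction for a
    strain-1 seed)."""
    t_bp = mu + t2 * (1.0 - mu)          # matched transmissibility from (9), T1 = 1
    bond = (1.0 - t_bp) ** 2             # neither edge of the seed is occupied
    multi = (1.0 - T1) ** 2              # strain-1 seed fails on both edges
    return bond, multi


print(size_one_probabilities(mu=0.5, t2=0.5))
```

For µ = T2 = 0.5, bond percolation predicts 0.0625, whereas the multiple-strain formalism gives 0 for a strain-1 seed; the bond-percolation value vanishes only when T2 = 1 or µ = 1.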
[Figure 9 plots: probability of emergence versus mean degree (1 to 10) on the real-world contact networks of Figure 8. Panels: (a) Facebook network, (b) Twitter network, (c) to (f) remaining networks. Curves: Prob. (Theoretical), Prob. (Experiment), and Prob. Bond Percolation.]
[Figure 10 plots: epidemic size versus mean degree (1 to 10). Panels: (a) Poisson degree distribution, Tco = 0.8; (b) Poisson degree distribution, Tco = 0.1; (c) power-law degree distribution, Tco = 0.8; (d) power-law degree distribution, Tco = 0.1. Curves: S|{Sco > 0}, S, S1, S2, Sco, and S_co^BP.]
Figure 10: Co-infection dynamics determine the order of phase transition. We set T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75 for all subfigures. The network size n is 2 × 10^6 and the number of independent experiments for each data point is 5 × 10^3. Blue circles denote the average total epidemic size S, and red stars denote the average total epidemic size S conditioned on Sco being greater than zero, i.e., conditioned on the existence of a positive fraction of co-infected nodes. Blue plus signs, orange triangles, and yellow squares denote the fraction of nodes infected with strain-1, strain-2, and co-infection, respectively. The black dashed line denotes the epidemic size for a single-strain, bond-percolated network with Tco, i.e., S_co^BP. (a) and (c): A first-order phase transition is observed when Tco = 0.8, owing to the corresponding first-order transition of Sco. Co-infection emerges at the phase transition point that characterizes an epidemic of strain-1 and strain-2. At this point, the value of Sco jumps discontinuously to (approximately) the corresponding value of S_co^BP with Tco = 0.8. Observe that S_co^BP > 0 at the transition point; hence, a first-order phase transition is observed. (b) and (d): Co-infection still emerges right at the phase transition point. However, since Tco is small, S_co^BP = 0 at the transition point. Hence, a second-order phase transition is observed.
[Figure 11 plots: epidemic size versus mean degree near the transition point. Panels: (a) Poisson degree distribution, Tco = 0.8 (mean degree 2.5 to 2.6); (b) power-law degree distribution, Tco = 0.8 (mean degree 1.5 to 1.6). Curves: S1, S2, Sco, and S_co^BP, with the transition point marked.]
Figure 11: Validating the order of phase transition. We set the network size n to 15 × 10^6, the number of independent experiments for each data point to 10^4, T1 = 0.2, T2 = 0.5, and µ11 = µ22 = 0.75. Our results confirm that the phase transition is indeed first-order on both contact networks. The value of Sco jumps discontinuously to (approximately) the corresponding value of S_co^BP with Tco = 0.8.