Genotyping coronavirus SARS-CoV-2: methods and implications

Changchuan Yin ∗
arXiv:2003.10965v1 [q-bio.GN] 24 Mar 2020

Department of Mathematics, Statistics, and Computer Science

University of Illinois at Chicago
Chicago, IL 60607
USA

Abstract

The emerging global infectious coronavirus disease COVID-19, caused by the novel Severe
Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has presented critical threats
to global public health and the economy since it was identified in late December
2019 in China. The virus has gone through various pathways of evolution. For
understanding the evolution and transmission of SARS-CoV-2, genotyping of virus
isolates is of great importance. We present an accurate method for effectively
genotyping SARS-CoV-2 viruses using complete genomes. The method employs
multiple sequence alignments of the genome isolates with the SARS-CoV-2
reference genome. The SNP genotypes are then compared by Jaccard distances to
track the relationship of virus isolates. The genotyping analysis of SARS-CoV-2
isolates from around the globe reveals that specific multiple mutations are the predominant
mutation type during the current epidemic. Our method serves as a promising tool for
monitoring and tracking the epidemic of pathogenic viruses through their gradual and
local genetic variations. The genotyping analysis shows that the genes encoding
the S protein, RNA polymerase, RNA primase, and nucleoprotein undergo
frequent mutations. These mutations are critical for vaccine development in disease
control.

1 Highlights
• We genotyped 558 SARS-CoV-2 isolates from the globe as of March 23, 2020.
• Frequent mutations in SARS-CoV-2 genomes are in the genes encoding the S protein and
RNA polymerase, RNA primase, and nucleoprotein.
• We established a method for monitoring and tracing SARS-CoV-2 mutations.

2 Introduction
The novel coronavirus in humans, first discovered in Wuhan, China, in December 2019, was initially
named as 2019-nCoV and then designated as SARS-CoV-2 due to its taxonomic and genomic
relationships with the species Severe acute respiratory syndrome-related coronavirus (Gorbalenya
et al., 2020). The present outbreak of the coronavirus-associated acute respiratory disease is named
coronavirus disease 2019 (COVID-19) by WHO. Since the start of the COVID-19 epidemic, more than 332,930
people from 147 countries and territories have been confirmed infected and more than 14,510 have
died from the rapidly spreading SARS-CoV-2 virus as of March 23, 2020 (WHO, 2020).
∗ Corresponding author, cyin1@uic.edu

Preprint arXiv.org, March 25, 2020

Coronaviruses (CoVs) are a family of enveloped positive-strand RNA viruses infecting vertebrates,
named for the crown-like spikes on their surface. Coronaviruses belong to the family
Coronaviridae and the order Nidovirales. They are widely spread in humans, other mammals, and
birds, and can cause diseases of the respiratory, intestinal, hepatic, and nervous systems. Human
coronaviruses (HCoVs) were first identified in the mid-1960s. The seven known HCoVs are CoV-229E
(alpha coronavirus), CoV-NL63 (alpha coronavirus), CoV-OC43 (beta coronavirus), CoV-HKU1 (beta
coronavirus), Severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory
syndrome coronavirus (MERS-CoV), and the current SARS-CoV-2. CoV-229E and CoV-OC43 were
identified as causes of the common cold in adults in the mid-1960s. Disease manifestations associated with
CoV-HKU1 and CoV-NL63 include the common cold and chronic pneumonia. CoV-HKU1
has been reported predominantly in children in the United States and is less common among adults.
Three highly pathogenic coronaviruses, SARS-CoV, MERS-CoV, and SARS-CoV-2, which emerged
in 2002, 2012, and 2019, respectively, have caused severe respiratory disease and thousands of deaths
worldwide (Chen, 2020).
SARS-CoV-2 harbors a linear single-stranded positive-sense RNA genome. The
SARS-CoV-2 genome consists of a leader sequence, ORF1ab encoding proteins for RNA replication,
and genes for non-structural proteins (nsps) and structural proteins. The genomic leader sequence
of about 265 bp is a unique characteristic of coronavirus replication and plays critical roles in the
gene expression of coronavirus during its discontinuous sub-genomic replication (Li et al., 2005).
ORF1ab encodes replicase polyproteins required for viral RNA replication and transcription (Chen
et al., 2020). Expression of the C-proximal portion of ORF1ab requires (–1) ribosomal frame-shifting.
The first non-structural protein (nsp) encoded by ORF1ab is Papain-like proteinase (PL proteinase,
nsp3). Nsp3 is an essential component, and the largest one, of the replication and transcription complex. The
PL proteinase in nsp3 cleaves nsps 1-3 and blocks host innate immune response, promoting cytokine
expression (Serrano et al., 2009; Lei et al., 2018). Nsp4, encoded in ORF1ab, is responsible for
forming double-membrane vesicles (DMVs). Other nsps include the 3-chymotrypsin-like
proteinase (3CLpro) and nsp6. 3CLpro is essential for RNA replication and
is responsible for processing the C-terminus of nsp4 through nsp16 in all coronaviruses
(Anand et al., 2003). Therefore, the conserved structure and catalytic sites of 3CLpro may serve as
attractive targets for antiviral drugs (Kim et al., 2012). Together, nsp3, nsp4, and nsp6 can induce
DMVs (Angelini et al., 2013).
SARS-coronavirus RNA replication is unique, involving two RNA-dependent RNA polymerases
(RNA pol). The first RNA polymerase is primer-dependent non-structural protein 12 (nsp12), and the
second RNA polymerase is nsp8. In contrast to nsp12, nsp8 has the primase capacity for de novo
replication initiation without primers (Te Velthuis et al., 2012). Nsp7 and nsp8 are important in the
replication and transcription of SARS-CoV-2. The SARS-coronavirus nsp7 and nsp8 complex is a
multimeric RNA polymerase for both de novo initiation and primer extension (Prentice et al., 2004;
Te Velthuis et al., 2012). Nsp8 also interacts with ORF6 accessory protein. Nsp9 replicase protein of
SARS-coronavirus binds RNA and interacts with nsp8 for its functions (Sutton et al., 2004).
Furthermore, the SARS-CoV-2 genome encodes four structural proteins. The structural proteins
possess much higher immunogenicity for T cell responses than the non-structural proteins (Li et al.,
2008). The structural proteins are involved in various viral processes, including virus particle
formation. The structural proteins include spike (S), envelope (E), membrane protein (M), and
nucleoprotein (N), which are common to all coronaviruses (Marra et al., 2003; Ruan et al., 2003).
The spike (S) protein is a glycoprotein with two domains, S1 and S2. Spike protein S1 attaches
the virion to the cell membrane by interacting with the host receptor ACE2, initiating the infection
(Wong et al., 2004). After internalization of the virus into the endosomes of the host cell, the S
glycoprotein undergoes conformational changes. The S protein is then cleaved by cathepsin CTSL,
unmasking the fusion peptide of S2 and thereby activating membrane fusion within endosomes.
Spike protein domain S2 mediates fusion of the virion and cellular membranes by acting as a class
I viral fusion protein. Notably, the spike glycoprotein of SARS-CoV-2 contains a
furin-like cleavage site (Coutard et al., 2020). The furin recognition site is important for
proteolytic cleavage of the S protein and may therefore contribute to the zoonotic infection of the virus. The
envelope (E) protein interacts with membrane protein M in the budding compartment of the host
cell. The M protein holds dominant cellular immunogenicity (Liu et al., 2010). Nucleoprotein
(ORF9a) packages the positive-strand viral RNA genome into a helical ribonucleocapsid (RNP)
during virion assembly through its interactions with the viral genome and membrane protein M (He
et al., 2004). Nucleoprotein plays an important role in enhancing the efficiency of subgenomic viral
RNA transcription as well as viral replication.
Increasing epidemiological and clinical evidence indicates that SARS-CoV-2 has stronger
transmission power than SARS-CoV but lower pathogenicity (Guan et al., 2020). However, the
mechanism of high transmission of SARS-CoV-2 is unclear. DNA sequence comparisons using
single nucleotide polymorphisms (SNPs) are often used for evolutionary studies and can be especially
beneficial in recognizing the mutated coronavirus genomes, where high mutations can occur due to
an error-prone RNA-dependent RNA polymerase in genome replication.
To understand the virus evolution of SARS-CoV-2 from the genome mutation context, we establish
the SNP genotyping method and investigate the genotype changes during the transmission of SARS-
CoV-2. Our results show that the genotypes of the virus are not uniformly distributed among the
complete genomes of SARS-CoV-2. This genotyping study discovers a few highly frequent mutations
in the SARS-CoV-2 genomes. The highly frequent SNP mutations might be associated with the
changes in transmissibility and virulence of the virus. The mutations are located in the S protein, RNA
polymerase, RNA primase, and nucleoprotein, which are fundamental proteins for vaccine efficacy.
Therefore, the high-frequency SNP mutations are important factors when developing vaccines for
preventing the infection of SARS-CoV-2 coronavirus.

3 Methods and algorithms

3.1 Multiple sequence alignments (MSA)

A total of 558 complete genome sequences of SARS-CoV-2 strains from infected individuals are
retrieved from the GISAID database (Shu and McCauley, 2017) as of March 23, 2020. Only
high-coverage complete genomes are included in the dataset. The countries and territories that
are affected by SARS-CoV-2 and have shared complete genomes of SARS-CoV-2 are Australia (AU),
Belgium (BE), Brazil (BR), Canada (CA), Chile (CL), China (CN), Czech Republic (CZ), Denmark
(DK), England (UK), Finland (FI), France (FR), Georgia (GE), Germany (DE), Hong Kong (HK),
Hungary (HU), India (IN), Ireland (IE), Italy (IT), Japan (JP), Korea (KR), Kuwait (KW), Mexico
(MX), Netherlands (NL), New Zealand (NZ), Scotland (UK), Singapore (SG), Switzerland (CH),
Sweden (SE), Taiwan (TW), Thailand (TH), United Kingdom (UK), United States (US), and Vietnam
(VN). The complete genome sequences are aligned with the reference genome of SARS-CoV-2 by
the MSA tool Clustal Omega using the default parameters (Sievers and Higgins, 2014). The aligned
genomes are then re-positioned according to the reference SARS-CoV-2 genome (GenBank accession
number: NC_045512.2).

3.2 SNP genotyping

The SNP mutations including nucleotide changes and the corresponding positions in a genome are
called an SNP profile. The SNP profiles of SARS-CoV-2 isolates are retrieved and parsed from the
aligned genomes according to the reference genome SARS-CoV-2. The SNP profile of the complete
genome of a virus can be considered as the genotype of the virus.
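
As an illustration of how an SNP profile might be read off an alignment, the following is a minimal sketch, not the author's program. It assumes the aligned reference and isolate are available as two equal-length strings with "-" for gaps; the function name snp_profile is hypothetical. Positions are numbered on the ungapped reference, and ambiguous bases (such as N or Y) and indels are skipped, which is a simplifying assumption of this sketch.

```python
def snp_profile(aligned_ref, aligned_sample):
    """Return a set of SNPs such as '241C>T', numbered on the ungapped reference.

    Assumption of this sketch: both inputs are rows of the same alignment,
    so they have equal length and use '-' for gaps.
    """
    if len(aligned_ref) != len(aligned_sample):
        raise ValueError("aligned sequences must have equal length")
    snps = set()
    ref_pos = 0  # 1-based position on the ungapped reference genome
    for r, s in zip(aligned_ref.upper(), aligned_sample.upper()):
        if r == "-":
            continue  # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        # Only unambiguous substitutions are recorded; gaps and IUPAC codes are skipped.
        if r in "ACGT" and s in "ACGT" and r != s:
            snps.add(f"{ref_pos}{r}>{s}")
    return snps


# Toy alignment (not real SARS-CoV-2 sequence): one substitution at reference position 2.
print(snp_profile("AC-GTAC", "ATTGTNC"))  # {'2C>T'}
```

Numbering on the ungapped reference keeps SNP positions comparable across isolates that were aligned in separate batches.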

3.3 Jaccard distance of the SNP variants

The Jaccard similarity coefficient J(A, B) of two sets A and B is defined as the intersection size of
the two sets divided by the union size of two sets (Equation (1)) (Levandowsky and Winter, 1971).
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \tag{1}$$
The Jaccard distance is a metric on the collection of finite sets. The Jaccard distance dJ (A, B) of
two sets A and B is scored by the difference between 100% and the Jaccard similarity coefficient
(Equation (2)).
$$d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} \tag{2}$$
The Jaccard distance measure of SNP variants takes account of the ordering of SNP mutations.
Therefore, the genetic distance of two genomes corresponds to the Jaccard distance of their SNP
variants. The Jaccard distance of SNP variants was adopted in the phylogenetic analysis of human or
bacterial genomes (Comas et al., 2009; Yu et al., 2017; Yin and Yau, 2019). In this study, we use the
Jaccard distance of the SNP mutations of virus genomes to measure the dissimilarity of virus isolates.
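
Equations (1) and (2) translate directly into set operations. The sketch below is illustrative only; it assumes each SNP profile is a Python set of strings such as "241C>T", and the function name jaccard_distance is hypothetical. Treating two empty profiles as distance 0 is a convention chosen here, since Equation (2) is undefined when the union is empty.

```python
def jaccard_distance(a, b):
    """Jaccard distance d_J(A, B) = (|A ∪ B| - |A ∩ B|) / |A ∪ B| of two SNP sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention: two profiles without SNPs are identical
    return (len(union) - len(a & b)) / len(union)


# Two genotypes taken from the SNPs discussed in this paper.
g1 = {"241C>T", "3037C>T", "23403A>G"}
g2 = {"241C>T", "3037C>T", "14408C>T", "23403A>G"}
print(jaccard_distance(g1, g2))  # 0.25 (three shared SNPs out of four in the union)
```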

3.4 Transmission analysis of virus isolates by SNP genotyping

Because a mutation is rarely reversed, a virus accumulates more SNPs over time. Let A and B represent
two SNP sets of the virus. If A is a proper subset of B, i.e., A ⊂ B and A ≠ B, then B can be considered
one of A's descendants, and A can be considered an ancestor of B. To this end, we propose the
directed Jaccard distance DJ (A, B) of two SNP sets A and B as the measure of their mutual relationship
(Equation (3)). If B is a descendant of A, then DJ (A, B) is positive; otherwise, if A is a
descendant of B, DJ (A, B) is negative. Among all the descendants of an SNP set A, the closest descendant
is the one having the minimum DJ (A, B).
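$$D_J(A, B) = \operatorname{sgn}\,(1 - J(A, B)) = \begin{cases} \dfrac{|A \cup B| - |A \cap B|}{|A \cup B|}, & \text{if } A \cap B \cong A \\[1ex] \dfrac{|A \cap B| - |A \cup B|}{|A \cup B|}, & \text{if } A \cap B \cong B \end{cases} \tag{3}$$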
For two SNP sets A and B, if A ∩ B ≠ ∅, A ⊄ B and B ⊄ A, then the two viruses are relatives,
sharing common SNP mutations. If two SNP sets are neither descendant-ancestor nor relatives,
the corresponding two viruses are isolated mutants. Hence, the relevance of virus isolates can be
identified from the directed Jaccard measure on the SNP genotypes.
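
The directed measure and the three relationship types described above can be sketched as follows. This is an illustrative reading of Equation (3), not the author's code: the function names directed_jaccard and relationship are hypothetical, SNP profiles are assumed to be Python sets, and returning None for pairs that are not in an ancestor-descendant relation is a choice made for this sketch. Note that under this reading an isolate with no SNPs (identical to the reference) is an ancestor of every mutated isolate.

```python
def directed_jaccard(a, b):
    """Directed Jaccard distance D_J(A, B) under the reading of Equation (3) used here.

    Positive if A is a proper subset of B (B descends from A),
    negative if B is a proper subset of A (A descends from B),
    None if the two sets are not in an ancestor-descendant relation.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return None
    d = (len(union) - len(a & b)) / len(union)
    if a < b:   # proper subset test on Python sets
        return d
    if b < a:
        return -d
    return None


def relationship(a, b):
    """Describe B relative to A: identical, descendant, ancestor, relatives, or isolated."""
    a, b = set(a), set(b)
    if a == b:
        return "identical"
    if a < b:
        return "descendant"
    if b < a:
        return "ancestor"
    if a & b:
        return "relatives"   # shared SNPs, but neither set contains the other
    return "isolated"


ancestor = {"8782C>T", "28144T>C"}
descendant = {"8782C>T", "28144T>C", "18060C>T"}
print(relationship(ancestor, descendant))      # descendant
print(directed_jaccard(ancestor, descendant))  # positive, about 0.333
```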
Although the sources of SARS-CoV-2 samples vary, we assume that the virus samples were randomly collected
for sequencing. If a virus strain among all sequenced viruses has many descendants in the genome
set, we infer that this strain has high transmissibility, and that the SNP mutations in
this strain may be important for increased transmissibility.
We calculate the directed Jaccard distances of the SNP mutations to identify the relationships of virus
strains and thereby determine the virus transmission pattern. The pipeline for SNP genotyping
and analysis is described in Algorithm 1.

Input: The complete genomes of SARS-CoV-2 strains

Output: SNP genotypes of SARS-CoV-2 strains
Step:
1. Divide the complete genomes of SARS-CoV-2 strains into subsets based on the originating
territories.
2. Add the reference genome of SARS-CoV-2 to each subset of the complete genomes.
3. Perform multiple sequence alignments for each subset of genomes using Clustal Omega.
4. Convert the alignment files to SNP profiles using the reference genome of SARS-CoV-2.
5. Merge the SNP profiles of all virus genomes.
6. Calculate the pairwise directed Jaccard distances of all the SNP profiles.
7. Analyze the descendants, ancestors, and relative relationships of each SNP genotype from
the Jaccard distances.
Algorithm 1: SNP genotyping analysis of SARS-CoV-2.
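
Steps 5 to 7 of Algorithm 1 can be sketched in a few lines. The example below is a hedged illustration under stated assumptions, not the author's implementation: it assumes that step 4 produced one dictionary per territory mapping an isolate label to its SNP set, and the isolate labels, the toy data, and the function name count_descendants are all hypothetical. A genotype's descendants are counted as the isolates whose SNP set is a proper superset of it.

```python
def count_descendants(profiles):
    """Map each distinct genotype (frozenset of SNPs) to its number of descendant isolates.

    profiles: dict of isolate label -> set of SNP strings (merged across territories).
    """
    genotypes = [frozenset(snps) for snps in profiles.values()]
    counts = {}
    for g in set(genotypes):
        # An isolate descends from g if its SNP set strictly contains g.
        counts[g] = sum(1 for other in genotypes if g < other)
    return counts


# Step 5: merge per-territory SNP profiles (toy data, hypothetical labels).
cn = {"CN|1": {"8782C>T", "28144T>C"}}
us = {
    "US|1": {"8782C>T", "28144T>C", "18060C>T"},
    "US|2": {"8782C>T", "28144T>C", "18060C>T", "17747C>T"},
}
merged = {**cn, **us}

# Steps 6 and 7: relate genotypes and rank them by number of descendants.
counts = count_descendants(merged)
for genotype, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(sorted(genotype), n)
```

The pairwise comparison is quadratic in the number of isolates, which is acceptable for a few hundred genomes; a larger dataset would likely need an index over shared SNPs.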

3.5 Data and computer programs

The genomic analysis is performed using computer programs written in Python with the Biopython library
(Cock et al., 2009). The computer programs and the updated SNP profiles of SARS-CoV-2 isolates
are available upon request.

4 Results
4.1 Genotyping SARS-CoV-2 coronavirus isolates from the globe

We retrieve the SNP genotypes of 442 SARS-CoV-2 strains in the GISAID database from around the globe. To
investigate the SNP distributions among all the virus isolates, we plot the SNP profiles of all the virus
isolates and compare the frequency of each SNP mutation in the virus sets. The results
show large mutation diversity in these virus isolates.
The mutations arise because the RNA-dependent RNA
polymerase (RdRp) of RNA viruses lacks proofreading; however, the mutation frequency analysis shows
that the mutations are not equally distributed. The SNP mutations can be single mutations or multiple mutations at a few fixed
positions. The impacts and roles that these SNP mutations have on the pathogenicity and transmission
ability of SARS-CoV-2 remain to be determined by biochemical experiments. These diverse mutations
might impact both transmissibility and pathogenicity of SARS-CoV-2.
The first common SNP mutation in the SARS-CoV-2 genome is in the leader sequence (241C>T), an
important genomic site for discontinuous sub-genomic replication. The leader sequence mutation
241C>T co-evolved with three important mutations, 3037C>T, 14408C>T, and 23403A>G, which
fall in nsp3 (synonymous mutation), RNA polymerase (nsp12, P323L), and spike
glycoprotein (S protein, D614G), respectively. Three of these co-mutations (241C>T, 14408C>T, and
23403A>G) affect sites critical for RNA replication (241C>T, 14408C>T) and the S protein
(23403A>G), which binds to the ACE2 receptor. We observe that these four co-mutations are prevalent
in the virus isolates from Europe, where COVID-19 infections are generally more
severe than in other geographical regions. Combined, these four co-mutations may confer
increased transmissibility of the virus.
SARS-coronavirus RNA replication is unique, involving two RNA-dependent RNA polymerases
(RdRp). The first RNA polymerase is primer-dependent non-structural protein 12 (nsp12), whereas
the second RNA polymerase is nsp8. Nsp8 has the primase capacity for de novo initiation of RNA
replication without primers (Te Velthuis et al., 2012). The most abundant SNP mutation in SARS-
CoV-2 isolates is (28144T>C) in nsp8 protein, in which amino acid leucine (L) is mutated to serine (S).
Our result is consistent with a previous study on 103 SARS-CoV-2 genomes in which SARS-CoV-2
virus is classified as S and L types by the two co-mutations (8782C>T and 28144T>C) (Zhang et al.,
2020).
The third most abundant SNP mutation is (26144G>T) in nonstructural protein 3 (nsp3: G251V). The
protein nsp3 works with nsp4 and nsp6 to induce double-membrane vesicles (DMV), a membrane
complex that acts as a platform for RNA replication and assembly (Angelini et al., 2013).
The significant SNP mutation (23403A>G) is located in the gene encoding spike glycoprotein (S
protein: D614G). The S protein in the SARS-CoV-2 virus is an important determinant of the host
range and pathogenicity. The S protein attaches the virion to the cell membrane by binding with
the host ACE2 receptor (Xiao et al., 2003). The mutation D614G is located in the putative S1–S2
junction region near the furin recognition site (R667) for the cleavage of S protein when the virion
enters or exits cells (Follis et al., 2006). However, the actual functional impact of this high-frequency
SNP mutation (23403A>G) in the S protein (D614G) is unclear. The binding affinity of the mutant
S protein (D614G) for the ACE2 receptor remains to be determined by biochemical experiments.
Notably, the SNP analysis also shows that the primer-independent RNA primase (nsp8)
contains more mutations than any other protein (28144T>C, 28881G>A, 28882G>A,
and 28883G>C). The RNA polymerase and primase mutations may confer resistance to mutagenic
nucleotide analogs via increased fidelity. A previous study indicated that a single mutation in RNA
polymerase can improve the replication fidelity of an RNA virus (Pfeiffer and Kirkegaard, 2003). If
a mutation is lethal or reduces transmission ability, it may not be passed on and may
die out. The SNP profiles demonstrate that the mutations in the envelope glycoprotein and RNA
polymerases predominate. Only mutations in the S protein that bind strongly to the cell
ACE2 receptor while escaping the immune system response have a chance to survive. Therefore,
these critical mutations are the results of natural selection in virus evolution.
In the SARS-CoV-2 strains found in the US, the nucleocapsid (N) protein gene has three mutations
(28881G>A, 28882G>A, and 28883G>C). The N protein of SARS-CoV is responsible for the
formation of the helical nucleocapsid during virion assembly. The N protein may cause an immune
response and has potential value in vaccine development (Zhao et al., 2005). These mutations should
be considered when developing a vaccine using the N protein.

Figure 1: Distribution of SNP mutations of SARS-CoV-2 isolates from the globe. (a) The SNP
profiles of mutations in 442 SARS-CoV-2 isolates. (b) Frequencies of the single SNP mutations on
the genome. The nucleotide positions are on the reference genome of SARS-CoV-2.

Table 1: High-frequency single SNP genotypes in SARS-CoV-2.

SNP mutation   Protein mutation                                    Frequency
241C>T         leader sequence                                     178
3037C>T        synonymous mutation (nsp3, F105F)                   182
8782C>T        synonymous mutation (nsp4, S75S)                    138
11083G>T       nsp6, L37F                                          115
14408C>T       RNA pol (nsp12, P323L)                              182
17747C>T       helicase, P504L                                     55
17858A>G       helicase, Y541C                                     55
18060C>T       synonymous mutation (3'-to-5' exonuclease, L6L)     62
23403A>G       spike glycoprotein (S protein), D614G               183
26144G>T       ORF3a, G251V                                        49
27046C>T       membrane glycoprotein, T175M                        33
28144T>C       RNA primase (nsp8, L84S)                            140
28881G>A       nucleocapsid phosphoprotein (R203K)                 74
28882G>A       nucleocapsid phosphoprotein (R202R)                 74
28883G>C       nucleocapsid phosphoprotein (G204R)                 74

Note: The SNP mutation positions are on the reference genome. Nucleotide T represents nucleotide
U in SARS-CoV-2 RNA virus genome. The frequencies of mutations are computed from total 558
SARS-CoV-2 strains.
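
The frequency column of Table 1 counts how many isolates carry each SNP. A minimal sketch of such a tally is shown below, assuming the merged SNP profiles are held in a dictionary from isolate label to SNP set; the labels and the toy data are hypothetical and do not reproduce the GISAID dataset, and snp_frequencies is an illustrative name.

```python
from collections import Counter


def snp_frequencies(profiles):
    """Count, for every SNP, the number of isolates whose profile contains it."""
    counts = Counter()
    for snps in profiles.values():
        counts.update(set(snps))  # set() guards against duplicates within one profile
    return counts


# Toy profiles (hypothetical isolate labels).
profiles = {
    "iso1": {"241C>T", "23403A>G"},
    "iso2": {"241C>T"},
    "iso3": {"11083G>T"},
}
freq = snp_frequencies(profiles)
print(freq.most_common(1))  # [('241C>T', 2)]
```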

Table 2: Co-mutations with high descendants in SARS-CoV-2.

SNP co-mutations                        Proteins                      Descendants
8782C>T, 28144T>C, 18060C>T             RNA pol (nsp8)                54
241C>T, 3037C>T, 23403A>G, 28144T>C     S protein, RNA pol (nsp8)     82
241C>T, 3037C>T, 14408C>T, 23403A>G     RNA pol (nsp12), S protein    81

Note: The SNP mutation positions are on the reference genome. Nucleotide T represents nucleotide
U in SARS-CoV-2 RNA genome. The frequencies of mutations are computed from total 558
SARS-CoV-2 strains.

4.2 Evolution of SARS-CoV-2 coronavirus by genotyping

To spread, a pathogenic virus must multiply within the host to ensure transmission, while simultaneously
avoiding host morbidity or death. Therefore, during the evolution of a virus, the transmissibility
of the virus usually increases, whereas the pathogenicity decreases (Alizon et al., 2009).
In the SNP profiles of SARS-CoV-2 strains, high-frequency mutations predominate among the virus
isolates; therefore, these high-frequency mutations probably contribute to increased transmissibility.
We analyze and trace the SNP profiles, from 442 SARS-CoV-2 strains, that have at least 10 descendants. The
results suggest a number of high-frequency mutations that are associated with different critical
proteins. The SNP distribution is not random but is concentrated at some
positions, and genotypes with mutations at these positions have more descendants. These high-frequency
mutations may confer high transmissibility on the virus (Table 2). Excluding the leader sequence mutation and the synonymous
mutations (3037C>T, 8782C>T, 18060C>T), we classify the SNP mutations into four major groups
based on the impacted proteins (Fig. 2): (1) single mutation in nsp6 (11083G>T) (Fig. 2(a)), (2)
single mutation in ORF3a (26144G>T) (Fig. 2(b)), (3) single mutation in RNA polymerase (nsp8)
(8782C>T, 28144T>C) (Fig. 2(c)), and (4) double mutations in the S protein and RNA polymerase
(241C>T, 3037C>T, 14408C>T, 23403A>G) (Fig. 2(d)). The strains in one group are derived from
the same ancestor strain in that group according to their SNP profiles.
The results show that most SNP mutations in SARS-CoV-2 isolates from China and some from Europe
and the USA are located at two positions (8782C>T, 28144T>C) (Fig. 2(c)). Later, this strain
acquired a mutation at a new position (8782C>T, 28144T>C, 18060C>T). These mutations are from the early
phase of the strain.

Figure 2: The SNP profiles of four major genotypes. (a) Genotype I (11083G>T), (b) Genotype
II (26144G>T), (c) Genotype III (8782C>T, 28144T>C), (d) Genotype IV (241C>T, 3037C>T,
14408C>T, 23403A>G). The strains in a genotype group originate from the same ancestor. The
strains from the same region are marked in the same color.

The important and prevalent co-mutations (241C>T, 3037C>T, 23403A>G) occurred mostly in
SARS-CoV-2 isolates from European countries. This strain then acquired an additional mutation, giving the
genotype (241C>T, 3037C>T, 14408C>T, 23403A>G) (Fig. 2(d)). The impacted critical proteins are
RNA primase (nsp8), RNA pol (nsp12), and the S protein. Most of these strains are found in European
countries (Fig. 2(d)). Italy is being heavily affected by SARS-CoV-2, with 59,138 confirmed cases
and 5,476 deaths as of March 23, 2020 (WHO, 2020). These critical mutations may be
correlated with the severe infections in Europe.
From the SNP profiles of the viruses collected across the globe at different times, we may estimate that one
mutation can occur in one generation. For example, in two consecutive infection cases in the USA (IL)
(US|IL1|EPI_ISL_404253|2020-01-21, US|IL2|EPI_ISL_410045|2020-01-28), the virus gained
one mutation (28854C>Y) between two members of the same community. Over the length of its 30 kb
genome, SARS-CoV-2 has accumulated from a single mutation up to 14 mutations
(NL|EPI_ISL_413591|2020-03-02), as seen from December 2019 to March 23, 2020. Therefore, we
may estimate that the transmission of SARS-CoV-2 has reached 14 generations since its first infection
of humans in December 2019.

Besides the SNP mutations, we also observed a few deletion or insertion mutations in SARS-CoV-2
isolates. These deletion and insertion mutations are infrequent, and whether they
can spread cannot be determined from the limited genome data.

5 Discussions
Our study has a few notable limitations due to the nature of the genome data. Because the sample
collection dates may not reflect the actual infection dates, the transmission path analysis is only
approximate. Caution should also be exercised in interpreting the genotyping analysis: some countries have
not sequenced enough virus samples, so the frequencies of the genotype groups may be unbalanced
owing to the unavailability of complete genomes in some countries and regions. Whether any of these
common SNP mutations will result in biological and clinical differences remains to be determined.
In this study, we use the complete genomes of SARS-CoV-2 for SNP genotype calling. However, in
an emergency, the complete genomes may not be available for SNP genotyping. In this case,
the SNP variant calling process may directly use the raw NGS reads (Yin and Yau, 2019). The SNP
variants can then be obtained by mapping the NGS reads to the reference genome with BWA
(Li, 2013), followed by GATK variant calling (McKenna et al., 2010).

6 Conclusion
The SARS-CoV-2 epidemic has caused a substantial health emergency and economic stress in the
world. Therefore, understanding the nature of this virus and deriving methods to monitor the spread
of the virus in the epidemic are critical in disease control. Our results show several molecular facets of
SARS-CoV-2 pertinent to this epidemic. The discovery of genotypes linked to geographic and
temporal clusters of infections suggests that genome SNP signatures can be used to track and monitor
the epidemic.
Rapid detection of different genotypes of SARS-CoV-2 is important for an efficient response to the
COVID-19 outbreak. Discriminating and relating viral isolates can be useful in genetic epidemiology.
Determining the origin and monitoring the transmission pattern of the pathogenic agents are critical
to controlling the outbreak. In this work, the SNP genotyping of SARS-CoV-2 was developed by
adapting fast MSA of the complete genomes of SARS-CoV-2 and SNP analytics using the directed
Jaccard distance of the SNP profiles. The genotyping analysis provides insights into the frequent
mutations that may confer fast transmissibility on the virus. The major mutations are in the critical
proteins, including the S protein, RNA polymerase, RNA primase, and nucleoprotein. Therefore,
these high-frequency SNP mutation sites must be considered when designing a vaccine for preventing
the infection of SARS-CoV-2.

7 Abbreviations
• COVID-19: coronavirus disease 2019
• DMV: double-membrane vesicle
• GATK: the genome analysis toolkit
• MSA: multiple sequence alignment
• NGS: next generation sequencing
• SARS: severe acute respiratory syndrome
• SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
• SNP: single nucleotide polymorphism
• WHO: the world health organization

8 Acknowledgments
We sincerely appreciate the researchers worldwide who sequenced and shared the complete genome
data of SARS-CoV-2 and other coronaviruses from GISAID (https://www.gisaid.org/). This research
is dependent on these precious data.

References
Alizon, S., Hurford, A., Mideo, N., Van Baalen, M., 2009. Virulence evolution and the trade-off
hypothesis: history, current state of affairs and the future. Journal of Evolutionary Biology 22,
245–259.
Anand, K., Ziebuhr, J., Wadhwani, P., Mesters, J.R., Hilgenfeld, R., 2003. Coronavirus main
proteinase (3CLpro) structure: basis for design of anti-SARS drugs. Science 300, 1763–1767.
Angelini, M.M., Akhlaghpour, M., Neuman, B.W., Buchmeier, M.J., 2013. Severe acute respiratory
syndrome coronavirus nonstructural proteins 3, 4, and 6 induce double-membrane vesicles. MBio
4, e00524–13.
Chen, J., 2020. Pathogenicity and transmissibility of 2019-nCoV — a quick overview and comparison
with other emerging viruses. Microbes and Infection 22, 69–71.
Chen, Y., Liu, Q., Guo, D., 2020. Emerging coronaviruses: genome structure, replication, and
pathogenesis. Journal of Medical Virology 92, 418–423.
Cock, P.J., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T.,
Kauff, F., Wilczynski, B., et al., 2009. Biopython: freely available python tools for computational
molecular biology and bioinformatics. Bioinformatics 25, 1422–1423.
Comas, I., Homolka, S., Niemann, S., Gagneux, S., 2009. Genotyping of genetically monomorphic
bacteria: DNA sequencing in mycobacterium tuberculosis highlights the limitations of current
methodologies. PloS One 4, e7815.
Coutard, B., Valle, C., de Lamballerie, X., Canard, B., Seidah, N., Decroly, E., 2020. The spike
glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV
of the same clade. Antiviral Research 176, 104742.
Follis, K.E., York, J., Nunberg, J.H., 2006. Furin cleavage of the SARS coronavirus spike glycoprotein
enhances cell–cell fusion but does not affect virion entry. Virology 350, 358–369.
Gorbalenya, A., et al., 2020. The species severe acute respiratory syndrome-related coronavirus:
classifying 2019-nCoV and naming it SARS-CoV-2. Nature Microbiology.
Guan, W.j., Ni, Z.y., Hu, Y., Liang, W.h., Ou, C.q., He, J.x., Liu, L., Shan, H., Lei, C.l., Hui, D.S.,
et al., 2020. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of
Medicine.
He, R., Leeson, A., Ballantine, M., Andonov, A., Baker, L., Dobie, F., Li, Y., Bastien, N., Feldmann, H.,
Strocher, U., et al., 2004. Characterization of protein–protein interactions between
the nucleocapsid protein and membrane protein of the SARS coronavirus. Virus Research 105,
121–125.
Kim, Y., Lovell, S., Tiew, K.C., Mandadapu, S.R., Alliston, K.R., Battaile, K.P., Groutas, W.C.,
Chang, K.O., 2012. Broad-spectrum antivirals against 3C or 3C-like proteases of picornaviruses,
noroviruses, and coronaviruses. Journal of Virology 86, 11754–11762.
Lei, J., Kusov, Y., Hilgenfeld, R., 2018. Nsp3 of coronaviruses: Structures and functions of a large
multi-domain protein. Antiviral Research 149, 58–74.
Levandowsky, M., Winter, D., 1971. Distance between sets. Nature 234, 34.
Li, C.K.f., Wu, H., Yan, H., Ma, S., Wang, L., Zhang, M., Tang, X., Temperton, N.J., Weiss, R.A.,
Brenchley, J.M., et al., 2008. T cell responses to whole SARS coronavirus in humans. The Journal
of Immunology 181, 5490–5500.
Li, H., 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.
arXiv preprint arXiv:1303.3997.
Li, T., Zhang, Y., Fu, L., Yu, C., Li, X., Li, Y., Zhang, X., Rong, Z., Wang, Y., Ning, H., et al., 2005.
siRNA targeting the leader sequence of SARS-CoV inhibits virus replication. Gene Therapy 12,
751–761.
Liu, J., Sun, Y., Qi, J., Chu, F., Wu, H., Gao, F., Li, T., Yan, J., Gao, G.F., 2010. The membrane protein
of severe acute respiratory syndrome coronavirus acts as a dominant immunogen revealed by a
clustering region of novel functionally and structurally defined cytotoxic T-lymphocyte epitopes.
Journal of Infectious Diseases 202, 1171–1180.
Marra, M.A., Jones, S.J., Astell, C.R., Holt, R.A., Brooks-Wilson, A., Butterfield, Y.S., Khattra, J.,
Asano, J.K., Barber, S.A., Chan, S.Y., et al., 2003. The genome sequence of the SARS-associated
coronavirus. Science 300, 1399–1404.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K.,
Altshuler, D., Gabriel, S., Daly, M., et al., 2010. The Genome Analysis Toolkit: a MapReduce
framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303.
Pfeiffer, J.K., Kirkegaard, K., 2003. A single mutation in poliovirus RNA-dependent RNA polymerase
confers resistance to mutagenic nucleotide analogs via increased fidelity. Proceedings of the
National Academy of Sciences 100, 7289–7294.
Prentice, E., McAuliffe, J., Lu, X., Subbarao, K., Denison, M.R., 2004. Identification and characterization
of severe acute respiratory syndrome coronavirus replicase proteins. Journal of Virology 78,
9977–9986.
Ruan, Y., Wei, C.L., Ling, A.E., Vega, V.B., Thoreau, H., Thoe, S.Y.S., Chia, J.M., Ng, P., Chiu, K.P.,
Lim, L., et al., 2003. Comparative full-length genome sequence analysis of 14 SARS coronavirus
isolates and common mutations associated with putative origins of infection. The Lancet 361,
1779–1785.
Serrano, P., Johnson, M.A., Chatterjee, A., Neuman, B.W., Joseph, J.S., Buchmeier, M.J., Kuhn, P.,
Wüthrich, K., 2009. Nuclear magnetic resonance structure of the nucleic acid-binding domain of
severe acute respiratory syndrome coronavirus nonstructural protein 3. Journal of Virology 83,
12998–13008.
Shu, Y., McCauley, J., 2017. GISAID: Global initiative on sharing all influenza data – from vision to
reality. Eurosurveillance 22.
Sievers, F., Higgins, D.G., 2014. Clustal Omega, accurate alignment of very large numbers of
sequences, in: Multiple sequence alignment methods. Springer, pp. 105–116.
Sutton, G., Fry, E., Carter, L., Sainsbury, S., Walter, T., Nettleship, J., Berrow, N., Owens, R., Gilbert,
R., Davidson, A., et al., 2004. The nsp9 replicase protein of SARS-coronavirus, structure and
functional insights. Structure 12, 341–353.
Te Velthuis, A.J., van den Worm, S.H., Snijder, E.J., 2012. The SARS-coronavirus nsp7+ nsp8
complex is a unique multimeric RNA polymerase capable of both de novo initiation and primer
extension. Nucleic Acids Research 40, 1737–1747.
WHO, 2020. Coronavirus disease 2019 (COVID-19) situation report – 63. Coronavirus Disease
(COVID-2019) Situation Reports 00, 00–00.
Wong, S.K., Li, W., Moore, M.J., Choe, H., Farzan, M., 2004. A 193-amino acid fragment of
the SARS coronavirus S protein efficiently binds angiotensin-converting enzyme 2. Journal of
Biological Chemistry 279, 3197–3201.
Xiao, X., Chakraborti, S., Dimitrov, A.S., Gramatikoff, K., Dimitrov, D.S., 2003. The SARS-CoV S
glycoprotein: expression and functional characterization. Biochemical and Biophysical Research
Communications 312, 1159–1164.
Yin, C., Yau, S.S.T., 2019. Whole genome single nucleotide polymorphism genotyping of Staphylococcus
aureus. Communications in Information and Systems 19, 57–80.
Yu, C., Baune, B.T., Licinio, J., Wong, M.L., 2017. A novel strategy for clustering major depression
individuals using whole-genome sequencing variant data. Scientific Reports 7, 44389.
Zhang, L., Shen, F.m., Chen, F., Lin, Z., 2020. Origin and evolution of the 2019 novel coronavirus.
Clinical Infectious Diseases.
Zhao, P., Cao, J., Zhao, L.J., Qin, Z.L., Ke, J.S., Pan, W., Ren, H., Yu, J.G., Qi, Z.T., 2005. Immune
responses against SARS-coronavirus nucleocapsid protein induced by DNA vaccine. Virology
331, 128–135.

Preprint arXiv.org, March 25, 2020
