Immune Epitope Map of the Reported Protein Sequences of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)
CURRENT STATUS: POSTED

Preprint · March 2020


DOI: 10.21203/rs.3.rs-18689/v1


All content following this page was uploaded by Leonardo Guevarra Jr. on 25 March 2020.



Preprint: Please note that this article has not completed peer review.


Leonardo A. Guevarra Jr.


University of Santo Tomas

[email protected] Author

Gianne Eduard L. Ulanday


National Institutes of Health - University of the Philippines Manila

ORCiD: https://orcid.org/0000-0002-1093-9287

SUBJECT AREAS
Virology Bioinformatics
KEYWORDS
coronavirus, SARS-CoV-2, novel coronavirus, 2019-nCoV, epitope map,
COVID-19, severe acute respiratory syndrome coronavirus 2

Abstract

Identifying immunogenic sequences of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins is important in developing epitope-based vaccines and diagnostics. This step is critical in designing potent vaccines and highly specific diagnostic tools that can help prevent the spread of this disease.

In this study, we used in silico analysis tools to identify immunogenic epitopes in the reported sequences of SARS-CoV-2 proteins and to find similar sequences among known viral proteins. The amino acid sequences of the SARS-CoV-2 proteins were acquired from the National Center for Biotechnology Information (NCBI) database. B-cell epitope prediction was done using in silico analysis tools available at the Immune Epitope Database and Analysis Resource (IEDB). Blastp was performed on the identified immunogenic sequences to determine similarities with known viral proteins and to deduce their possible locations in the coronavirus.

We identified B-cell epitopes of the SARS-CoV-2 polyprotein, surface glycoprotein, envelope protein, membrane glycoprotein, nucleocapsid phosphoprotein, orf3, orf7a, and orf8. No epitope was identified in orf6 and orf10. The predicted immunogenic epitopes of SARS-CoV-2 showed high similarity with the 2003 SARS-CoV. However, unique epitopes were identified in non-structural proteins (NSP) 1 and 3 and in the surface glycoprotein of SARS-CoV-2.

Introduction

Coronaviruses are single-stranded, positive-sense RNA viruses classified into four genera, namely alpha-, beta-, delta-, and gammacoronaviruses. The former two genera primarily infect mammals, whereas the latter two primarily infect birds [1,2]. The coronavirus genome is the largest among the RNA viruses and includes a variable number (around 6 to 11) of open reading frames (orf). Coronavirus replication is somewhat unique in that it involves ribosomal frameshifting, or slippage, and a large replicase gene with an open reading frame (orf1ab). The replicase gene occupies around two-thirds of the genome and encodes the 16 non-structural proteins (NSPs). The remaining one-third of the genome (~10 kb) encodes the structural and accessory proteins [1,3]. The main structural proteins are the viral envelope-bound membrane protein (M), envelope protein (E), and spike protein (S), and the RNA-bound nucleocapsid protein (N) [3,4]. A fifth structural protein, the hemagglutinin esterase (HE), may be present, but only among betacoronaviruses [5]. The 16 non-structural proteins are responsible for viral gene replication, protein scaffold formation, proteolytic maturation of proteins, and protection from the host's immune response [6].

Until recently, six coronaviruses (CoVs) were known to infect humans: HCoV-229E, HCoV-OC43, HCoV-NL63, HCoV-HKU1, SARS-CoV, and MERS-CoV, which evolved between 1960 and 2015 [7]. By the end of 2019, however, a new coronavirus was detected in China among individuals suffering from acute respiratory distress [8]. From the initial cases, which were linked to the Huanan seafood and wildlife market in Wuhan City, Hubei Province, Central China, this emerging zoonotic infection has now reached 25 countries in Asia, North America, Europe, and Australia [9,10]. The exact source of exposure leading to this event is still under investigation.

Researchers worldwide rushed to sequence the viral genome to aid state authorities in building their diagnostic and rapid containment capabilities. This emerging threat has caused unprecedented alarm among states and was immediately recognized by the World Health Organization (WHO) as a Public Health Emergency of International Concern [9,11]. As of 15 March 2020, the number of confirmed cases of coronavirus disease 2019 (COVID-19) worldwide has exceeded 153 thousand, and the disease has claimed 5,375 lives [12].

Coronaviruses have been notoriously implicated in recent high-profile, cross-border outbreaks affecting human populations. Phylogenetic studies of this viral family suggest a high capacity for transmission across species barriers, with members having been found in bats, pigs, camels, and humans. The increasing frequency of genetic recombination, coupled with extensive human-animal interface activities, leads to a higher probability of zoonotic spillover events [13–15]. The emergence of novel pathogens such as SARS-CoV-2 poses a serious threat to human health, potentially of global proportions, because of knowledge gaps about the pathogen causing the disease and the lack of pre-formed immunity among individuals [16]. This knowledge gap, particularly on the molecular characteristics of SARS-CoV-2, is a barrier to creating strategies for controlling the spread of the infection, including the development of rapid diagnostic devices and the design of vaccines [17]. Fortunately, bioinformatics tools such as epitope analysis resources and sequence identity analysis tools can be exploited to identify and map immunogenic sequences and their possible locations in viral polyproteins [18,19].

In an effort to help close the existing knowledge gap on the identity and genomic characteristics of SARS-CoV-2, we aimed to identify, using in silico prediction tools, B-cell epitopes of SARS-CoV-2 that can serve as a basis for future recombinant engineering work and vaccine design studies. We also aimed to determine similarities between the in silico-predicted epitopes and other viral proteins found in public databases, especially those of viruses closely related to SARS-CoV-2. We focused on SARS-related coronaviruses (SARS-CoV) and other significant members of the betacoronavirus genus, as these are the apparent nearest relatives of SARS-CoV-2 based on current phylogenetic data.

Results

Using in silico epitope prediction tools available in the Immune Epitope Database and Analysis Resource (IEDB), we identified potentially immunogenic epitopes in the reported amino acid sequences of the SARS-CoV-2 polyprotein, surface glycoprotein, orf3, envelope protein, membrane glycoprotein, orf7a, orf8, and nucleocapsid phosphoprotein. For the polypeptide sequences of orf6 and orf10, no peptide was found to be potentially immunogenic; all values were lower than the cut-off. Supplementary Table 1 and Supplementary Table 2 present the positions, sequences, and antigenicity, surface accessibility, and hydrophilicity scores of the predicted epitopes.

The 10-mer peptide sequences with the highest antigenicity scores are both found in the envelope protein. These sequences are located at positions 50-59 (SLVKPSFYVY) and 51-60 (LVKPSFYVYS), which can form a single immunogenic epitope of the SARS-CoV-2 envelope protein. The sequence with the highest surface accessibility score is located at positions 382-391 (LPQRQKKQQT) of the nucleocapsid phosphoprotein, while the sequence predicted to be the most hydrophilic is located at positions 237-246 (KGQQQGQT), also of the nucleocapsid phosphoprotein.

Combining continuous adjacent sequences of the predicted 10-mer epitopes generated 111 epitopes for the polyprotein, 22 for the surface glycoprotein, three for orf3, a single 11-mer epitope for the envelope protein, five for the membrane glycoprotein, four for orf7a, five for orf8, and six for the nucleocapsid phosphoprotein. These sequences are presented in Table 1 and Table 2.
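The merging of adjacent 10-mers can be illustrated with the two envelope-protein peptides reported above. The Python sketch below is an illustration only, not the authors' implementation: the function name merge_adjacent is hypothetical, and the rule that only windows whose start positions differ by exactly one residue are merged is an assumption (it is consistent with overlapping but non-consecutive entries, such as 319-328 and 321-330, being reported separately).

```python
from typing import List, Optional, Tuple


def merge_adjacent(peptides: List[Tuple[int, str]]) -> List[Tuple[int, int, str]]:
    """Merge equal-length peptides whose 1-based start positions are consecutive.

    Returns (start, end, sequence) tuples with inclusive 1-based positions.
    Assumption: "continuous" means start positions that differ by exactly one.
    """
    merged: List[Tuple[int, int, str]] = []
    prev_start: Optional[int] = None
    for start, pep in sorted(peptides):
        if prev_start is not None and start == prev_start + 1:
            first, _, seq = merged[-1]
            # A one-residue shift contributes exactly one new C-terminal residue.
            merged[-1] = (first, start + len(pep) - 1, seq + pep[-1])
        else:
            merged.append((start, start + len(pep) - 1, pep))
        prev_start = start
    return merged


# The two top-scoring envelope-protein 10-mers from the text.
envelope_hits = [(50, "SLVKPSFYVY"), (51, "LVKPSFYVYS")]
print(merge_adjacent(envelope_hits))  # [(50, 60, 'SLVKPSFYVYS')]
```

Under this assumption the two windows collapse into one 11-mer at positions 50-60, which agrees with the single 11-mer epitope reported for the envelope protein.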

High homology was observed between the predicted immunogenic epitopes of SARS-CoV-2 and the proteins of the SARS-related coronavirus (SARS-CoV). The epitopes of the SARS-CoV-2 polyprotein have homologous sequences in NSP1, NSP3 (replicase and proteinase domains), NSP7 (replicase light chain), NSP8 (replicase heavy chain), NSP9 (replicase), NSP10, NSP12, NSP13 (helicase), NSP14 (guanine-N7 methyltransferase), and NSP15. This was also observed for the epitopes predicted for the reported sequences of the SARS-CoV-2 surface glycoprotein, envelope protein, and orf7a, which have homologous sequences in the SARS-CoV spike glycoprotein, small envelope protein, and orf7a accessory protein, respectively. Unique epitopes were also found at positions 488-497 (VETVKGLDYK), 555-564 (AQNSVRVLQKA), 713-725 (SKGLYRKCVKSRE), 1006-1015 (VEVQPQLEME), 1045-1054 (IVEEAKKVKP), 1048-1057 (EAKKVKPTVV), 1227-1236 (QDDKKIKACV), 2041-2051 (CEDLKPVSEEV), 2045-2056 (KPVSEEVVENPT), 2551-2564 (ESSAKSASVYYSQL), and 2655-2665 (LKLSHQSDIEV) of the SARS-CoV-2 polyprotein, and at positions 44-53 (RSSVLHSTQD), 319-328 (RVQPTESIVR), and 321-330 (QPTESIVRFP) of the SARS-CoV-2 surface glycoprotein.

Discussion

In the two decades prior to the current SARS-CoV-2 outbreak, two coronaviruses gained prominence due to their novelty, infectivity, and virulence: the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) in 2002-2003 and the Middle East Respiratory Syndrome coronavirus (MERS-CoV) in 2012 (Son et al., 2017). The lessons learned in both epidemics are being applied by scientists around the world in the current SARS-CoV-2 outbreak, as evidenced by increased data transparency and broader information sharing among stakeholders [20].

The availability of current technologies has also paved the way for a quicker response to human diseases. Molecular biology-based technologies, including advances in sequencing methods, have helped in the characterization of pathogens. Whole-genome sequencing can be done with remarkable speed, accuracy, and depth of information [21]. In addition, bioinformatics tools and global genomic and proteomic databases have aided scientists worldwide in understanding molecular structures and characteristics and, hence, in developing strategies to control human diseases [22].

The application of computational methods in immunology, such as in silico epitope prediction, has enabled researchers to focus on and prioritize immune targets for experimental epitope mapping, saving time and resources, which is crucial for expedient epidemic containment and response [23–25]. In silico epitope mapping has helped researchers rapidly identify epitopes essential to rational vaccine design and to the development of epitope-based serological diagnostic devices [26,27]. In this paper, we present putative epitopes of SARS-CoV-2 proteins, including sequence similarities with other viral proteins, which may potentially be used in the development of an epitope-based vaccine against this recently emerged infection. One finding presented in this paper that may have an impact on disease control strategies is the high homology between the immune epitopes of SARS-CoV-2 and those of the 2003 SARS-CoV, which also originated in China.

We identified high sequence homology between SARS-CoV-2 NSP1, NSP3, NSP7, NSP8, NSP9, NSP10, NSP12, NSP13, NSP14, NSP15, and surface glycoprotein and the corresponding proteins of SARS-CoV. This is consistent with previous reports on the phylogenetic relatedness of SARS-CoV-2 and SARS-CoV, although the highest genetic sequence similarity was observed with a bat-derived SARS-like virus (~88% genetic identity), which supports a zoonotic origin [2,28,29]. These observations may have implications for therapeutic and surveillance strategies. First, protein similarities in the NSPs and surface glycoprotein of these two betacoronaviruses may yield cross-protection between SARS-CoV-2 and SARS-CoV, as previously observed in other human coronavirus infections. Second, they may explain possible similarities in the mechanism of infection and, hence, in treatment. Third, they can help avoid the error of using epitopes that are homologous between SARS-CoV-2 and SARS-CoV in antibody-based detection; in serological assays, such epitopes are known to cause an inability to correctly discriminate between closely related pathogens and thus to decrease the specificity of the test [30–32].

The protein with the greatest number of epitopes homologous with SARS-CoV, based on the blastp performed, is the surface glycoprotein. Seventeen of the 21 in silico-predicted epitopes of the SARS-CoV-2 surface glycoprotein are at least 64% homologous with sequences of the SARS-CoV spike glycoprotein. This observation is important because the surface glycoprotein is pathogenically and serologically significant owing to its role in the fusion of the viral and host cell membranes; it is therefore a good prospect for an epitope-based vaccine, given its ability to elicit virus-neutralizing antibodies [33,34].

In the polyprotein, the portion with the highest number of predicted epitopes is the putative position of the NSP3 protein, located between amino acid positions 920 and 2665. This portion also contains the largest number (8 of 11) of SARS-CoV-2 unique epitopes, not only for the polyprotein but for all the reported proteins analyzed, based on the blastp analysis we performed. The finding is not surprising, given that NSP3 is the largest non-structural protein of CoVs and has been reported to be heavily involved in proteolytic processing and polyprotein maturation. Furthermore, NSP3 has been reported to be involved in multiple interactions with other NSPs, providing cooperative enzymatic functions. Surprisingly, NSP3 is highly divergent among CoVs, with mutations leading to evolutionary adaptations specific to certain coronaviruses [35–37]. The NSP3/4 macrodomain and transmembrane units are also critical for the ability of coronaviruses to evade the immune system. Experimental studies in both SARS-CoV and MERS-CoV revealed that subunits of NSP3/4 induce the formation of double-membrane vesicles (DMVs), specialized replicative organelles (ROs) that enhance viral RNA synthesis while hiding double-stranded RNA from detection by the innate immune system [6,38,39]. One study reported the detection of the proteinases NSP3 and NSP5 in the mature virion along with the structural proteins [40]. This phenomenon should be elucidated further, as data on NSP3 are relatively scarce compared with data on the structural proteins.

On the other hand, the identified unique sequences, especially those in relevant proteins such as the surface glycoprotein, can be further explored experimentally to confirm their feasibility and uniqueness against other viruses, particularly other coronaviruses. During the SARS outbreak, it was difficult to distinguish actual SARS cases from common cold virus infections using serological tests because of the high seroprevalence of antibodies against common cold viruses in the population, aggravated by the presence of cross-reactive antibodies against conserved coronavirus epitopes. Nevertheless, serological testing has the advantages of detecting asymptomatic infections, monitoring disease progression, and enabling the study of post-infection transmission dynamics [25,41].

Methods

The amino acid sequences of the SARS-CoV-2 polyprotein (GenBank: QHO60603.1), surface glycoprotein (GenBank: QHO60594.1), orf3 (QHO60595.1), envelope protein (QHO60596.1), membrane glycoprotein (QHO60597.1), orf6 (QHO60598.1), orf7a (QHO60599.1), orf8 (QHO60560.1), nucleocapsid phosphoprotein (QHO60561.1), and orf10 (QHO60595.1) were acquired from the National Center for Biotechnology Information (NCBI). The reported GenPept sequences were used in the identification of linear continuous B-cell epitopes.

The criteria used in identifying putative immunogenic epitopes were antigenicity, surface accessibility, and hydrophilicity, predicted using established computational tools available at the Immune Epitope Database and Analysis Resource (IEDB) [42–44]. The window size was set to 10 amino acids. The antigenicity, surface accessibility, and hydrophilicity scores derived from the IEDB analysis were compared with the cut-off value computed for each parameter. Peptides that scored below the cut-off for at least one of the three parameters were excluded. Adjacent predicted immunogenic sequences that are positioned continuously were considered and reported as one epitope.
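As a rough illustration of the exclusion rule, the Python sketch below slides a 10-residue window over a sequence and keeps only the windows that meet the cut-off for all three parameters. Everything concrete in it is a hypothetical placeholder: the toy sequence, the per-window scores, the cut-off values, and the names windows, retained, and filter_windows are invented for this example and are not IEDB output or part of any IEDB interface. A score exactly equal to the cut-off is assumed to pass, since only peptides scoring below it are excluded.

```python
from typing import Dict, Iterator, List, Tuple

WINDOW = 10  # window size stated in the Methods


def windows(sequence: str, size: int = WINDOW) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every full-length window."""
    for i in range(len(sequence) - size + 1):
        yield i + 1, sequence[i:i + size]


def retained(scores: Dict[str, float], cutoffs: Dict[str, float]) -> bool:
    """Keep a window only if no parameter scores below its cut-off."""
    return all(scores[name] >= cutoffs[name] for name in cutoffs)


def filter_windows(
    sequence: str,
    window_scores: List[Dict[str, float]],
    cutoffs: Dict[str, float],
) -> List[Tuple[int, str]]:
    """Pair each window with its scores and drop windows that fail any cut-off."""
    return [
        (start, peptide)
        for (start, peptide), scores in zip(windows(sequence), window_scores)
        if retained(scores, cutoffs)
    ]


# Hypothetical inputs: a toy 12-residue sequence gives three 10-mer windows.
toy_sequence = "ACDEFGHIKLMN"
toy_cutoffs = {"antigenicity": 1.0, "accessibility": 1.0, "hydrophilicity": 1.5}
toy_scores = [
    {"antigenicity": 1.1, "accessibility": 1.2, "hydrophilicity": 2.0},  # passes all
    {"antigenicity": 1.0, "accessibility": 0.9, "hydrophilicity": 2.0},  # fails one
    {"antigenicity": 1.0, "accessibility": 1.0, "hydrophilicity": 1.5},  # equals cut-offs
]
print(filter_windows(toy_sequence, toy_scores, toy_cutoffs))
```

With these placeholder values the first and third windows are retained and the second is excluded, because its surface accessibility score falls below the cut-off even though the other two parameters pass.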

After epitope prediction, a sequence homology search of the predicted immunogenic epitopes was done to identify related viral proteins. For each epitope, the proteins reported to have similarity with it, their organisms of origin, and their percent identity with the query sequence were noted and reported. Putative amino acid positions in SARS-CoV-2 were compared with a reference alignment of a bat coronavirus sequence (data not shown) and with positions reported in recent literature [18,19,29].
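The percent identity values can be sanity-checked against the aligned peptide pairs listed in Table 1. The Python sketch below is a simplified stand-in, not blastp itself: it assumes the query and subject strings are already aligned and of equal length, that an asterisk marks a subject position with no aligned residue, and that identity is counted over the full query length. The function name percent_identity is hypothetical.

```python
def percent_identity(query: str, subject: str, mask: str = "*") -> int:
    """Percent of query positions matched by the aligned subject peptide.

    Assumptions: both strings are pre-aligned and equally long, and `mask`
    marks subject positions that have no aligned residue.
    """
    if len(query) != len(subject):
        raise ValueError("query and subject must be aligned to the same length")
    if not query:
        raise ValueError("query must not be empty")
    matches = sum(1 for q, s in zip(query, subject) if s != mask and q == s)
    return round(100 * matches / len(query))


# Peptide pairs taken from the NSP1 rows of Table 1.
print(percent_identity("NEKTHVQLSLP", "****HVQLSLP"))  # 64
print(percent_identity("VEEVLSEARQ", "VEEALSEARE"))    # 80
print(percent_identity("EKGVLPQLEQP", "EKGVLPQLEQP"))  # 100
```

These three results agree with the values tabulated for the same rows. Rows in which the two peptides differ in length would need a gapped alignment, which this sketch deliberately rejects.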

Declarations

Authors' Contributions

LAG performed the in silico epitope mapping and blastp analysis and prepared the manuscript.

GLU identified the possible locations of the identified epitopes in coronavirus proteins and prepared the manuscript.

Competing Interests

The authors declare no competing interests.

References

1. Song, Z. et al. From SARS to MERS, thrusting coronaviruses into the spotlight. Viruses, 11, 59. https://doi.org/10.3390/v11010059 (2019).

2. Tang, J. W., Tambyah, P. A., & Hui, D. S. C. Emergence of a novel coronavirus causing respiratory illness from Wuhan, China. J Infect. https://doi.org/10.1016/j.jinf.2020.01.014 (2020).

3. Fehr, A., & Perlman, S. Coronaviruses: An Overview of Their Replication and Pathogenesis. Methods Mol Biol, 1282, 1–282 (2015).

4. Li, F. Structure, Function, and Evolution of Coronavirus Spike Proteins. Annu Rev Virol, 3(1), 237–261 (2016).

5. de Groot et al. Part II - The Positive Sense Single Stranded RNA Viruses Family Coronaviridae. In Virus Taxonomy: Ninth report of the International Committee on Taxonomy of Viruses (pp. 806–828) (2012).

6. Fehr, A. R. et al. The nsp3 Macrodomain Promotes Virulence in Mice with Coronavirus-Induced Encephalitis. J Virol, 89(3), 1523–1536 (2015).

7. Sharmin, R., & Islam, A. B. M. M. K. Conserved antigenic sites between MERS-CoV and Bat-coronavirus are revealed through sequence analysis. Source Code Biol Med, 11(1), 1–6. https://doi.org/10.1186/s13029-016-0049-7 (2016).

8. Chen, N. et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. The Lancet, 395(10223), 507–513 (2020).

9. Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, 6736(20), 1–10. https://doi.org/10.1016/S0140-6736(20)30183-5 (2020).

10. Nishiura, H. et al. The Extent of Transmission of Novel Coronavirus in Wuhan, China, 2020. J Clin Med, 9(2), 330. https://doi.org/10.3390/jcm9020330 (2020).

11. World Health Organization (WHO). Novel Coronavirus (2019-nCoV) Situation report 16. WHO Bulletin, (JANUARY), 1–7 (2020).

12. World Health Organization (WHO). Coronavirus disease 2019 (COVID-19) Situation Report-55 (2020).

13. Cui, J., Li, F., & Shi, Z. L. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol, 17(3), 181–192 (2019).

14. Li, H. et al. Human-animal interactions and bat coronavirus spillover potential among rural residents in Southern China. Biosafety and Health, 1(2), 84–90 (2019).

15. Zhu, Z. et al. Predicting the receptor-binding domain usage of the coronavirus based on kmer frequency on spike protein. Infect Genet Evol, 61, 183–184 (2018).

16. Nishiura, H., Linton, N. M., & Akhmetzhanov, A. R. Serial interval of novel coronavirus (COVID-19) infections. Int J Infect Dis, 113332. https://doi.org/10.1016/j.jelechem.2019.113332 (2020).

17. European Center for Disease Prevention and Control (ECDC). Outbreak of acute respiratory syndrome associated with a novel coronavirus, China; First cases imported in the EU/EEA; second update (2020).

18. Hassan, A. et al. Pangenome and immuno-proteomics analysis of Acinetobacter baumannii strains revealed the core peptide vaccine targets. BMC Genomics, 17(732). https://doi.org/10.1186/s12864-016-2951-4 (2016).

19. Potocnakova, L., Bhide, M., & Pulzova, L. B. An Introduction to B-Cell Epitope Mapping and in Silico Epitope Prediction. J Immunol Res. https://doi.org/10.1155/2016/6760830 (2016).

20. Liu, S. L., & Saif, L. Emerging Viruses without Borders: The Wuhan Coronavirus. Viruses, 12(2), 9–10 (2020).

21. Heather, J. M., & Chain, B. The sequence of sequencers: The history of sequencing DNA. Genomics, 107(1). https://doi.org/10.1016/j.ygeno.2015.11.003 (2016).

22. Diniz, W. J. S., & Canduri, F. Bioinformatics: An overview and its applications. Genet Mol Res, 16(1). https://doi.org/10.4238/gmr16019645 (2017).

23. Backert, L., & Kohlbacher, O. Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med, 7(1). https://doi.org/10.1186/s13073-015-0245-0 (2015).

24. Kringelum, J. V., Nielsen, M., Padkjær, S. B., & Lund, O. Structural analysis of B-cell epitopes in antibody:protein complexes. Mol Immunol, 53(1–2), 24–34 (2013).

25. Meyer, B., Drosten, C., & Müller, M. A. Serological assays for emerging coronaviruses: Challenges and pitfalls. Virus Res, 194, 175–183 (2014).

26. Hua, C. et al. Computationally-driven identification of antibody epitopes. ELife, 6. https://doi.org/10.7554/eLife.29023 (2017).

27. Shey, R. A. et al. In-silico design of a multi-epitope vaccine candidate against onchocerciasis and related filarial diseases. Scientific Reports, 9(1). https://doi.org/10.1038/s41598-019-40833-x (2019).

28. Lu, R. et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet, 6736(20). https://doi.org/10.1016/S0140-6736(20)30251-8 (2020).

29. Wu, A. et al. Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China. Cell Host Microbe. https://doi.org/10.1016/j.chom.2020.02.001 (2020).

30. Capriles, P. V. S. Z. et al. Structural modelling and comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo sapiens: Putative drug targets for Chagas' disease treatment. BMC Genomics, 11(1). https://doi.org/10.1186/1471-2164-11-610 (2010).

31. Chan, K. H. et al. Serological responses in patients with severe acute respiratory syndrome coronavirus infection and cross-reactivity with human coronaviruses 229E, OC43, and NL63. Clin Diag Lab Immunol, 12(11), 1317–1321 (2005).

32. Che, X. et al. Antigenic Cross-Reactivity between Severe Acute Respiratory Syndrome-Associated Coronavirus and Human Coronaviruses 229E and OC43. J Infect Dis, 191(12), 2033–2037 (2005).

33. Guillen, J., Perez-Berna, A. J., Moreno, M. R., & Villalaín, J. Identification of the Membrane-Active Regions of the Severe Acute Respiratory Syndrome Coronavirus Spike Membrane Glycoprotein Using a 16/18-Mer Peptide Scan: Implications for the Viral Fusion Mechanism. J Virol, 11(8), 781–783 (2005).

34. Zhu, Z. Potent cross-reactive neutralization of SARS coronavirus isolates by human monoclonal antibodies. PNAS, 104(29), 12123–12128 (2007).

35. Forni, D. et al. Extensive Positive Selection Drives the Evolution of Nonstructural Proteins in Lineage C Betacoronaviruses. J Virol, 90(7), 3627–3639 (2016).

36. Imbert, I. et al. The SARS-Coronavirus PLnc domain of nsp3 as a replication/transcription scaffolding protein. Virus Res, 133(2), 136–148 (2008).

37. Lei, J., Kusov, Y., & Hilgenfeld, R. Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein. Antiviral Res, 149, 58–74 (2018).

38. Angelini, M., Akhlaghpour, M., Neuman, B. W., & Buchmeier, M. J. Severe Acute Respiratory Syndrome Coronavirus Nonstructural Proteins 3, 4, and 6 Induce Double-Membrane Vesicles. MBio, 4(4), 165. https://doi.org/10.1128/mBio.00524-13 (2013).

39. Oudshoorn, D. et al. Expression and cleavage of middle east respiratory syndrome coronavirus nsp3-4 polyprotein induce the formation of double-membrane vesicles that mimic those associated with coronaviral RNA replication. MBio, 8(6). https://doi.org/10.1128/mBio.01658-17 (2017).

40. Neuman, B. W. et al. Proteomics Analysis Unravels the Functional Repertoire of Coronavirus Nonstructural Protein 3. J Virol, 82(11), 5279–5294 (2008).

41. Niedrig, M. et al. First external quality assurance of antibody diagnostic for SARS-new coronavirus. J Clin Virol, 34(1), 22–25 (2005).

42. Kolaskar, A. S., & Tongaonkar, P. C. A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett, 276(1–2), 172–174 (1990).

43. Parker, J. M. R., Guo, D., & Hodges, R. S. New Hydrophilicity Scale Derived from High-Performance Liquid Chromatography Peptide Retention Data: Correlation of Predicted Surface Residues with Antigenicity and X-ray-Derived Accessible Sites. Biochemistry, 25(19), 5425–5432 (1986).

44. Vita, R. et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res, 47(D1), D339–D343. https://doi.org/10.1093/nar/gky1006 (2019).

Tables
Table 1. Peptide Sequence of Continuous Epitopes of SARS-CoV-2 Polyprotein and Similar
Proteins of Viral Origins as Identified by blastp Analysis
Putat Position SARS-CoV-2 Sequence of % Organism Source and Putative Posit
ive Epitope Subject Peptide H Protein of Related Sequence protein
prote Sequence o in SARS-
in in m CoV-2
SARS o
-CoV- l
2 o
g
y

NSP1 9-19 NEKTHVQLSLP ****HVQLSLP 64% SARS-CoV NSP1 NSP6/7

35-44 VEEVLSEARQ VEEALSEARE 80% SARS-CoV NSP1 NSP7

38-47 VLSEARQHLK LSEAREHLK 80% SARS-CoV NSP1

57-67 EKGVLPQLEQP EKGVLPQLEQP 100 SARS-CoV NSP1 NSP8


%
66-75 QPYVFIKRSD QPYVFIKRSD 100 SARS-CoV NSP1
%
NSP2 301-311 RSVYPVASPNE RSVYP****** 45% Human alphaherpesvirus
envelop glycoprotein
488-497 VETVKGLDYK - - -

495-504 DYKAFKQIVE **KAFMQVVE 60% Escherichia phage T7 RNA NSP9


polymerase
555-564 AQNSVRVLQKA - - -

618-627 GTVYEKLKPV *TVYDRLK** 50% Sindbis virus envelop


polyprotein
711-720 THSKGLYRKC THGKGHYR** 60% Rotavirus NSP2

713-725 SKGLYRKCVKSRE - - - NSP10

765-774 PLEQPTSEAV PIEQPT**** 50% Human alphaherpesvirus


capsid
794-803 KDTEKYCALA ***EKYCVL* 50% Influenza A(H5N1) virus RNA
Polymerase
NSP3 920-929 MYCSFYPPDED MYCSFYPPDE* 91% SARS-CoV NSP3 NSP12

1006- VEVQPQLEME - - -
1015
1045- IVEEAKKVKP - - -
1054

15
1048- EAKKVKPTVV - - -
1057
1180- KNLYDKLVSS KNLYDK**** 60% Influenza A(H5N2) virus
1189 hemagglutinin
1198- QVEQKIAEIP QIEDKIEEI* 60% SARS-CoV spike
1207 glycoprotein
1213- PFITESKPSV PFITDS**SV 70% Rotavirus A capsid protein
1222
1227- QDDKKIKACV - - -
1236
1338- TVEEAKTVLKKCKS TLEEAKTALKKCKS 87% SARS-CoV unique domain
1352 A A
1398- IVSTIQRKYK IMATIQRKYK 80% SARS-CoV NSP3 (Replicase)
1407
1423- YFYTSKTTVA FFYTSKEPVA 70% SARS-CoV NSP3 (Replicase)
1432
1425- YTSKTTVASL YTSKEPVAS 70% SARS-CoV NSP3 (Replicase)
1434
1528- LKRGDKSVYY LKRGDKIVY* 80% SARS-CoV NSP3 (Replicase)
1537
1628- RVEAFEYYHTT RSEAFEYYHT* 82% SARS-CoV NSP3 (Replicase)
1638
1691- NPPALQDAYYRA NAPALQEAYYRA 83% SARS-CoV NSP3 (Replicase) NSP13
1702
1774- LSYEQFKKGV **FEEFKKG* 50% Crimean-Congo
1783 Hemorrhagic Fever virus
nucleoprotein
1788- TCGKQATKYL *CGRDATQYL 60% SARS-CoV NSP3 (Proteinase)
1797
1790- GKQATKYLVQQES GRDATQYLVQQES 71% SARS-CoV NSP3 (Proteinase)
1803 P *
1808- SAPPAQYELKH SAPPAEYKL** 64% SARS-CoV NSP3 (Proteinase)
1818
1827- YTGNYQCGHY YTGNYQCGHY 100 SARS-CoV NSP3 (Proteinase)
1836 %
1873- YTTTIKPVTY YTTTIK**** 60% SARS-CoV NSP3 (Proteinase)
1882
1877- IKPVTYKLDGV *KPVTY***** 45% Rift valley river virus
1887 glycoprotein
1909- EQPIDLVPNQP EQPIDLVPTQP 91% SARS-CoV NSP3 (Replicase)
1919
1914- LVPNQPYPNA LVPTQPLPNA 80% SARS-CoV NSP3 (Replicase)
1923
1968- VAIDYKHYTP VAIDYRHY** 70% SARS-CoV NSP3 (Replicase)
1977
2041- CEDLKPVSEEV - - -
2051
2045- KPVSEEVVENPT - - -
2056
2483-2492 PTDQSSYIVD **DQSWSYIVE 70% Influenza A(H6N1) hemagglutinin
2514-2523 ERHSLSHFVN *****SHFVN 50% Human calicivirus capsid
2551-2564 ESSAKSASVYYSQL - - -
2642-2652 VDSDVETKDVV VNRDVQTSDV* 55% Feline calicivirus viral protein1 capsid NSP14
2655-2665 LKLSHQSDIEV - - -
NSP4 2792-2801 PVHVMSKHTD *VHVMRK*** 50% FMD virus RNA polymerase
2844-2853 SYTNDKACPL SYTNNK**** 50% Influenza(H1N1) hemagglutinin
2926-2937 SGKPVPYCYDTN ****VPYIYDT* 50% Sacbrood virus viral protein 3
2942-2951 SVAYESLRPD **AYDSLR** 50% Human herpes virus glycoprotein
2974-2983 RVVTTFDSEY *VVTT*DISE* 70% DENV2 NS3 (RNA helicase)
3244-3256 SGSDVLYQPPQTS SGKDVFYLPPE** 62% DENV3 NS5 (Polymerase) NSP15
3249-3258 LYQPPQTSIT LYQPPTASVT 70% Murine coronavirus RNA polymerase
NSP5 3502-3512 YEPLTQDHVDI YEPLTQDHVDI 100% SARS-CoV Proteinase(Main)
NSP6 3719-3729 YKVYYGNALDQ YK**YLGPGNSLDQ 64% H-1 parvovirus capsid NSP15/16
3810-3819 VYDYLVSTQE *YDYLV**** 50% Bombyx mori cypovirus1 RNA polymerase


Table 2. Peptide Sequence of Continuous Epitopes of SARS-CoV-2 Structural and orf
Proteins and Similar Proteins of Viral Origins as Identified by blastp Analysis

Protein Location Start SARS-CoV-2 Epitope Sequence of Subject % Homology Organism Source and
in Sequence Peptide Sequen
SARS-CoV-2
Surface 33-42 TRGVYYPDKV *RGVYYPD** 70% SARS-CoV spike glycopro
Glycoprotein 37-47 YYPDKVFRSSV YYPDEIFRS** 64% SARS-CoV spike glycopro
44-53 RSSVLHSTQD - - -
295-305 PLSETKCTLKS PLAELKCSVKS 64% SARS-CoV spike glycopro
319-328 RVQPTESIVR - - -
321-330 QPTESIVRFP - - -
364-373 DYSVLYNSAS DYSVLYNS** 80% SARS-CoV spike glycopro
491-500 PLQSYGFQPT ***SYGFQ** 50% Hepatitis C virus RNA po
501-510 NGVGYQPYRV *GIGYQPYRV 80% SARS-CoV spike glycopro
575-584 AVRDPQTLEI *VRDPKTSEI 70% SARS-CoV spike glycopro
776-787 KNTQEVFAQVKQ *NTREVFAQVKQ 91% SARS-CoV spike glycopro
783-793 AQVKQIYKTPP AQVKQMYKTP* 82% SARS-CoV spike glycopro
802-813 FSQILPDPSKPS FSQILPDPLKP* 91% SARS-CoV spike glycopro
911-920 VTQNVLYENQ VTQNVLYENQ 100% SARS-CoV spike glycopro
913-922 QNVLYENQKL QNVLYENQK* 90% SARS-CoV spike glycopro
947-957 KLQDVVNQNAQ KLQDVVNQNAQ 100% SARS-CoV spike glycopro
982-992 SRLDKVEAEVQ SRLDKVEAEVQ 100% SARS-CoV spike glycopro
1002-1011 QSLQTYVTQQ QSLQTYVTQQ 100% SARS-CoV spike glycopro
1064-1074 HVTYVPSQERN HVTYVPAQEKN 82% SARS-CoV spike glycopro
1133-1142 INNTVYDPLQ VNNTVYDPLQ 90% SARS-CoV spike glycopro
1260-1269 VYDPLQPELD DSEPVLKGVK 100% SARS-CoV spike glycopro
orf3 132-141 KCRSKNPLLY *****NPLLY 50% Human betaherpesvirus
202-212 VLHSYFTSDYY *****FTSDY* 45% Cowpox virus serine pro
210-221 DYYQLYSTQLST *YYELYPT**** 42% Chikungunya virus glyco
Envelope protein 50-60 SLVKPSFYVYS SLVKPTVYVYS 91% SARS-CoV small envelop
Membrane Glycoprotein 10-19 VEELKKLLEQ *EQLAKLLEQ 70% Infectious pancreatic ne polymerase
129-139 LTRPLLESELV **RPLMEPEL* 45% Human alphaherpesviru glycosylase
170-179 VATSRTLSYY *****TLSYY 50% Escherichia virus2 C pro
175-185 TLSYYKLGASQ *LTYYKL**** 50% Human orthopneumoviru
178-188 YYKLGASQRVA ****GASQRV* 55% Hepacivirus E2 glycopro
orf7a 16-25 ELYHYQECVR ELYHYQECVR 100% SARS-CoV orf 7a accesso
68-78 PDGVKHVYQLR *DGTRHTYQLR 55% SARS-CoV orf 7a accesso
71-80 VKHVYQLRAR **HTYQLRAR 70% SARS-CoV orf 7a accesso
73-83 HVYQLRARSVS HTYQLRARSVS 91% SARS-CoV orf 7a accesso
orf8 22-32 LQSCTQHQPYV LQSCT****** 45% Influenza A (H17N10) vir
25-35 CTQHQPYVVD CNQSTPYYVVD 60% Human gammaherpesvi protein R
27-38 QHQPYVVDDPCP **HPYVLDD*** 42% Enterovirus D68 viral pro
35-44 DPCPIHFYSK *****HFYSK 50% Rabies virus SADB19 Lar Protein
110-119 EYHDVRVVLD **HDAVRIILD 40% Delta coronavirus spike
Nucleocapsid 79-88 SPDDQIGYYR *PDDQIGYYR 90% SARS-CoV nucleocapsid
phosphoprotein 237-246 KGQQQQGQTV **QQQQG*** 50% Bourbon virus envelop g
239-250 QQQQGQTVTKKS QQQQG******* 42% Bourbon virus envelop g
375-384 KADETQALPQ *ADETKAL** 60% Betacoronavirus England
378-394 ETQALPQRQKKQQTVTL *****PPRQKKQ**** 35% Sindbis virus coat protei
401-410 DDFSKQLQQS DDF**QLQQ* 70% Norovirus Hu VP1
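
For many rows in the tables above, the % Homology value is consistent with the number of positions where the subject peptide carries the same residue as the SARS-CoV-2 epitope, divided by the epitope length, with an asterisk marking a non-matching position. The following is a minimal sketch of that reading, not the authors' stated procedure. The function name percent_homology is hypothetical, and the code assumes the two peptides are aligned position by position without gaps, which does not hold for every row.

```python
def percent_homology(epitope: str, subject: str) -> int:
    """Percent of epitope positions whose residue is identical in the subject.

    Assumption: both strings are aligned position by position, and '*'
    in the subject marks a position with no matching residue.
    """
    matches = sum(
        1 for e, s in zip(epitope, subject) if s != "*" and e == s
    )
    # Round half up to a whole percentage, as the tables report integers.
    return int(100 * matches / len(epitope) + 0.5)


# Rows taken from the tables above.
print(percent_homology("TRGVYYPDKV", "*RGVYYPD**"))
print(percent_homology("YYPDKVFRSSV", "YYPDEIFRS**"))
```

Rows whose subject peptide differs in length from the epitope (for example 3719-3729) involve gapped alignments and fall outside this simple position-wise count.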


Supplementary Files

This is a list of supplementary files associated with this preprint.

SARS-CoV-2 Epitope Mapping_Guevarra&Ulanday_Supplementary.docx
