MadhumithaS gp1

GeneMark and Genscan were used to predict genes and proteins in the SARS-CoV-2 genome. GeneMark predicted 9 genes and their protein sequences; BLAST searches showed that each matched a known SARS-CoV-2 protein with 99-100% identity. Genscan predicted a single full-length polypeptide that matched the SARS-CoV-2 ORF1ab polyprotein over 79% of its length. GeneMark is therefore considered more accurate for SARS-CoV-2 genome prediction, based on the consistency of its predictions and their identity to known proteins.

Uploaded by sriram

Genomics and Proteomics

Madhumitha S
First graded assignment: Annotation of the SARS-CoV-2 Genome
Semester 7, B.Tech Biotechnology; Reg. No. 121010076
Sequence used in this assignment: SARS-CoV-2 isolate Wuhan-Hu-1, complete genome (GenBank MN908947.3)

Software: Genscan
Link: http://hollywood.mit.edu/GENSCAN.html

Organism: Vertebrate

Suboptimal exon cutoff (optional): 1.00

Print Options: Predicted peptides only

DNA sequence: GenBank accession MN908947, version 3 (https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3)

Output:

NO EXONS FOUND AT GIVEN PROBABILITY CUTOFF


Predicted peptide sequence(s):

>/tmp/08_21_20-05:26:32.fasta|GENSCAN_predicted_peptide_1|8673_aa
MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGV
LPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYR
//
SGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQ
KKQQTVTLLPAADLDDFSKQLQQSMSSADSTQA

Interpretation:

Genscan predicted one full-length polypeptide of 8,673 amino acids (shown abridged above).

Key to the Genscan output columns:
Gn.Ex : gene number, exon number
Init = Initial exon (ATG to 5' splice site)
Intr = Internal exon (3' splice site to 5' splice site)
Term = Terminal exon (3' splice site to stop codon)
PlyA = poly-A signal (consensus: AATAAA)
Begin : beginning of exon or signal (numbered on input strand)
End : end point of exon or signal (numbered on input strand)
Len : length of exon or signal (bp)
Fr : reading frame (a forward strand codon ending at x has frame x mod 3)
P : probability of exon (sum over all parses containing exon)

The exon probability (P) values are low and vary widely from exon to exon, which suggests that the Genscan prediction is not reliable.
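The legend defines each exon by its Begin and End coordinates, its length Len and its probability P. A check of the kind described above could be sketched in Python as follows. The `Exon` record, its field names and the 0.5 cutoff are hypothetical choices for illustration; this does not parse Genscan's real output format, and the example exons are invented, not taken from this run.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    """Simplified exon record built from the Genscan legend columns.

    Field names are hypothetical; this does not parse Genscan's real output.
    """
    gn_ex: str    # gene number, exon number, e.g. "1.01"
    kind: str     # Init, Intr, Term or PlyA
    begin: int    # start of exon (numbered on input strand)
    end: int      # end of exon (numbered on input strand)
    length: int   # reported length in bp
    p: float      # exon probability


def codon_frame(x: int) -> int:
    """Frame of a forward-strand codon ending at position x (legend: x mod 3)."""
    return x % 3


def check_exon(exon: Exon, min_p: float = 0.5) -> list[str]:
    """Return warnings for one exon; min_p is an arbitrary illustrative cutoff."""
    warnings = []
    if exon.end - exon.begin + 1 != exon.length:
        warnings.append(f"{exon.gn_ex}: Len does not match Begin/End")
    if exon.p < min_p:
        warnings.append(f"{exon.gn_ex}: low exon probability P={exon.p:.3f}")
    return warnings


if __name__ == "__main__":
    # Hypothetical exons for illustration only (not from this Genscan run).
    for exon in (Exon("1.01", "Init", 100, 399, 300, 0.95),
                 Exon("1.02", "Intr", 500, 649, 150, 0.12)):
        print(exon.gn_ex, check_exon(exon) or "ok")
```

Because Begin and End are both counted, the length check uses End - Begin + 1.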

The predicted protein sequence was searched with BLASTp against the non-redundant (nr) protein sequence database.

The search matched the SARS-CoV-2 ORF1ab polyprotein with 79% query coverage and 100% identity.

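Query coverage is the share of the query sequence that falls inside the aligned regions (HSPs), whereas percent identity is the share of identical residues within the alignment, so a hit can have 100% identity while covering only part of the query. A minimal Python sketch of both calculations, assuming 1-based inclusive HSP coordinates and merging overlapping HSPs so shared positions count once; this approximates how coverage is reported and is not BLAST's own code. The HSP numbers in the usage are hypothetical.

```python
def query_coverage(hsps: list[tuple[int, int]], query_length: int) -> float:
    """Percent of query positions covered by at least one HSP.

    hsps holds (query_start, query_end) pairs, 1-based and inclusive.
    Overlapping HSPs are merged so shared positions are counted once.
    """
    covered = 0
    last_end = 0  # highest query position counted so far
    for start, end in sorted(hsps):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / query_length


def percent_identity(identical: int, alignment_length: int) -> float:
    """Percent of aligned columns whose residues are identical."""
    return 100.0 * identical / alignment_length


if __name__ == "__main__":
    # Hypothetical 100-residue query with two overlapping HSPs.
    print(query_coverage([(1, 50), (40, 79)], 100))  # 79.0
    print(percent_identity(79, 79))                  # 100.0
```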
Software: GeneMark
Link: http://exon.gatech.edu/GeneMark/

Option chosen: gene prediction in viruses and phages, using GeneMark.hmm

DNA sequence: GenBank accession MN908947, version 3

Output format: LST

Output options: all selected

Output:

GeneMark.hmm PROKARYOTIC (Version 3.26)

Date: Wed Aug 19 15:00:35 2020
Sequence file name: seq.fna
Model file name: GeneMark_hmm_heuristic.mod
RBS: false
Model information: Heuristic_model_for_genetic_code_1_and_GC_38
FASTA definition line: NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
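The model name suggests that the heuristic model was selected for genetic code 1 and a GC content of about 38%. A minimal Python sketch of a GC-content calculation is shown below; `read_fasta_sequence` and the file name in the comment are hypothetical helpers for a single-record FASTA file, and the 30-nt fragment in the usage is far too short to be representative of the genome.

```python
def read_fasta_sequence(path: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA file."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    # First 30 nt of gene_1 as printed in this output. A whole genome could be
    # loaded with read_fasta_sequence("MN908947.3.fasta") (hypothetical file).
    print(round(gc_content("ATGGAGAGCCTTGTCCCTGGTTTCAACGAG"), 1))  # 53.3
```

Ambiguous bases such as N are excluded from the denominator, so they do not dilute the percentage.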
Predicted genes
Gene #  Strand  LeftEnd  RightEnd  Gene length  Class
1       +       266      13483     13218        1
2       +       13810    21555     7746         1
3       +       21536    25384     3849         1
4       +       25393    26220     828          1
5       +       26523    27191     669          1
6       +       27202    27387     186          1
7       +       27394    27759     366          1
8       +       27894    28259     366          1
9       +       28274    29533     1260         1
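Because LeftEnd and RightEnd are inclusive, each gene length is RightEnd - LeftEnd + 1, and for an ORF that ends in a stop codon the protein length is length/3 - 1 (13218 nt gives the 4405 aa reported for gene_1). The Python sketch below checks every row of the table this way and reports neighbouring genes that overlap. The function names are hypothetical, and the overlap check assumes the genes are sorted by LeftEnd on one strand, as they are here.

```python
# Rows of the GeneMark.hmm table: (gene, strand, left_end, right_end, reported_length)
GENES = [
    (1, "+", 266, 13483, 13218),
    (2, "+", 13810, 21555, 7746),
    (3, "+", 21536, 25384, 3849),
    (4, "+", 25393, 26220, 828),
    (5, "+", 26523, 27191, 669),
    (6, "+", 27202, 27387, 186),
    (7, "+", 27394, 27759, 366),
    (8, "+", 27894, 28259, 366),
    (9, "+", 28274, 29533, 1260),
]


def gene_length(left: int, right: int) -> int:
    """Length in bp for inclusive coordinates (both end bases counted)."""
    return right - left + 1


def protein_length(nt_length: int) -> int:
    """Number of codons minus the terminal stop codon."""
    return nt_length // 3 - 1


def overlaps(genes):
    """Return (gene_a, gene_b, overlap_bp) for consecutive genes that overlap.

    Assumes genes are sorted by left end and lie on the same strand.
    """
    found = []
    for (a, _, _, a_right, _), (b, _, b_left, _, _) in zip(genes, genes[1:]):
        if b_left <= a_right:
            found.append((a, b, a_right - b_left + 1))
    return found


if __name__ == "__main__":
    for gene, _strand, left, right, reported in GENES:
        nt = gene_length(left, right)
        print(gene, nt == reported, nt % 3 == 0, protein_length(nt))
    print("overlaps:", overlaps(GENES))
```

With the table values, every computed length matches the reported one and is a whole number of codons; the only overlap found is 20 bp between genes 2 and 3.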
Predicted protein sequences:

>gene_1|GeneMark.hmm|4405_aa|+|266|13483 >NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGVLPQLEQPYVFIKRSD
ARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYRK
//

>gene_9|GeneMark.hmm|419_aa|+|28274|29533 >NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHG
//
WPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKK
ADETQALPQRQKKQQTVTLLPAADLDDFSKQLQQSMSSADSTQA
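Each GeneMark.hmm FASTA header packs the gene ID, tool name, length with unit, strand and coordinates into "|"-separated fields, followed by the input genome's definition line. The Python sketch below parses such a header and checks the reported length against the coordinates. The names `parse_header` and `is_consistent` are hypothetical, the field layout is inferred only from the headers shown here, and the header is treated as a single line.

```python
from typing import NamedTuple


class GeneHeader(NamedTuple):
    gene_id: str
    length: int
    unit: str      # "aa" for protein records, "nt" for nucleotide records
    strand: str
    left: int
    right: int
    source: str    # definition line of the input genome


def parse_header(header: str) -> GeneHeader:
    """Split a header like '>gene_1|GeneMark.hmm|4405_aa|+|266|13483 >NC_...'."""
    fields, _, source = header.lstrip(">").partition(" >")
    gene_id, _tool, length_unit, strand, left, right = fields.split("|")
    length, unit = length_unit.split("_")
    return GeneHeader(gene_id, int(length), unit, strand,
                      int(left), int(right), source.strip())


def is_consistent(h: GeneHeader) -> bool:
    """Check the reported length against the inclusive coordinates."""
    nt = h.right - h.left + 1
    if h.unit == "nt":
        return h.length == nt
    return h.length == nt // 3 - 1  # protein length excludes the stop codon


if __name__ == "__main__":
    h = parse_header(">gene_9|GeneMark.hmm|419_aa|+|28274|29533 >NC_045512.2 Severe acute "
                     "respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome")
    print(h.gene_id, h.length, h.unit, is_consistent(h))  # gene_9 419 aa True
```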

Predicted gene nucleotide sequences:

>gene_1|GeneMark.hmm|13218_nt|+|266|13483 >NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
ATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGAC
GTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCA
//
>gene_9|GeneMark.hmm|1260_nt|+|28274|29533 >NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
ATGTCTGATAATGGACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGC
//
GCCTTACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCTTCTTCCTGCTGCAGATTTGGATGATTTCTCCAAA
CAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAA
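The nucleotide and protein outputs are linked by the standard genetic code (the GeneMark.hmm model was built for genetic code 1). A minimal Python sketch of that translation is given below. It encodes the NCBI standard table as a 64-letter string in TCAG order, a common compact encoding rather than anything GeneMark exposes, and translates the first 75 nt of gene_1 printed above.

```python
# NCBI standard genetic code (table 1): amino acids for all 64 codons, with the
# first, second and third codon positions each cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(seq: str, to_stop: bool = True) -> str:
    """Translate a DNA/RNA sequence in frame 1; '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):  # complete codons only
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X for ambiguous codons
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    gene_1_start = "ATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGAC"
    print(translate(gene_1_start))  # MESLVPGFNEKTHVQLSLPVLQVRD
```

The result matches the first 25 residues of the gene_1 protein. Likewise, the last line of gene_9 translates to QLQQSMSSADSTQA before the TAA stop codon, the same C-terminus as the predicted protein.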

Interpretation:

GeneMark predicted a total of nine genes, each encoding a polypeptide.

Each predicted protein sequence was searched with BLASTp against the non-redundant (nr) protein database; the matches are listed below.

Gene number  Matched sequence               Query cover  % Identity  Accession
Gene_1       ORF1a polyprotein              100%         100%        YP_009725295.1
Gene_2       orf1ab polyprotein             100%         100%        QHW06038.1
Gene_3       surface glycoprotein, partial  100%         99%         CAD0240757.1
Gene_4       Chain A, Protein 3a            100%         100%        6XDC_A
Gene_5       membrane glycoprotein          100%         100%        YP_009724393.1
Gene_6       ORF6 protein                   100%         100%        YP_009724394.1
Gene_7       ORF7a protein                  100%         100%        YP_009724395.1
Gene_8       ORF8 protein                   100%         100%        YP_009724396.1
Gene_9       nucleocapsid phosphoprotein    100%         100%        YP_009724397.2

All matched sequences are proteins of Severe acute respiratory syndrome coronavirus 2.
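The comparison in this table can be summarised by splitting the hits at a coverage and identity threshold. The Python sketch below copies the table rows into a list; the function name `split_hits` and the default thresholds of 100% are illustrative choices, not part of the BLAST output.

```python
# (gene, matched protein, query cover %, identity %, accession), from the table
HITS = [
    ("Gene_1", "ORF1a polyprotein", 100, 100, "YP_009725295.1"),
    ("Gene_2", "orf1ab polyprotein", 100, 100, "QHW06038.1"),
    ("Gene_3", "surface glycoprotein, partial", 100, 99, "CAD0240757.1"),
    ("Gene_4", "Chain A, Protein 3a", 100, 100, "6XDC_A"),
    ("Gene_5", "membrane glycoprotein", 100, 100, "YP_009724393.1"),
    ("Gene_6", "ORF6 protein", 100, 100, "YP_009724394.1"),
    ("Gene_7", "ORF7a protein", 100, 100, "YP_009724395.1"),
    ("Gene_8", "ORF8 protein", 100, 100, "YP_009724396.1"),
    ("Gene_9", "nucleocapsid phosphoprotein", 100, 100, "YP_009724397.2"),
]


def split_hits(hits, min_cover=100, min_identity=100):
    """Separate genes meeting both thresholds from those that do not."""
    exact, other = [], []
    for gene, _protein, cover, identity, _accession in hits:
        if cover >= min_cover and identity >= min_identity:
            exact.append(gene)
        else:
            other.append(gene)
    return exact, other


if __name__ == "__main__":
    exact, other = split_hits(HITS)
    print(len(exact), "exact matches; below threshold:", other)
```

Only Gene_3 falls below the threshold, and only on identity (99%). The Genscan peptide (79% query cover) would fail on coverage.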

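GeneMark.hmm gave more accurate predictions for the SARS-CoV-2 genome than Genscan. All nine GeneMark.hmm proteins matched known SARS-CoV-2 proteins with 100% query coverage and 99-100% identity, whereas Genscan returned a single 8,673 aa polypeptide that matched ORF1ab over only 79% of its length, with low and inconsistent exon probabilities. A likely reason is that Genscan's vertebrate model looks for spliced exon-intron structures, while SARS-CoV-2 genes are unspliced open reading frames.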
