Sequence Analysis - Alignment

Here are the BLAST results for the given sequences: KR093978 - BLASTn search shows this sequence has 100% identity to Bacillus cereus strain BGSC 6E1 chromosome, complete genome. KR093979 - BLASTn search shows this sequence has 100% identity to Bacillus cereus strain BGSC 6E1 chromosome, complete genome. KR093980 - BLASTn search shows this sequence has 100% identity to Bacillus cereus strain BGSC 6E1 chromosome, complete genome. KR093981 - BLASTn search shows this sequence has 100% identity to Bacillus cereus strain BGSC 6E1 chromosome, complete genome. KR093982

Uploaded by

filson.riyadi

Chapter - 3

Biological Sequence Analysis & Alignment

3.1. Sequence comparison, similarity alignment
3.2. Database Similarity Searching: Database Searching Tools & Formats (BLAST, FASTA, etc.)
3.3. Sequence alignment methods (local & global) and algorithms
3.4. Sequence alignment techniques: Pairwise & Multiple Sequence Alignment
3.5. Tools for Sequence alignment: ClustalW, T-Coffee, etc.
3.6. Alignment Interpretation and Scoring methods
Biological Sequence Analysis
• Biological Sequence
➢ is a single, continuous molecule of nucleic acid or protein.
➢ the methodologies implemented under sequence analysis include:
- sequence alignment (pairwise & multiple sequence alignment),
- phylogenetic analysis,
- motif & domain search/prediction,
- identification of novel genes for drug discovery.
• Sequence alignment
➢ is a way of arranging DNA, RNA or amino acid (AA) sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships.
➢ the goal of alignment is to find the conserved regions (if present) between two or more sequences;
➢ these conserved regions are presumed to be important functional regions (domains or motifs) in the sequences.
The dilemma: DNA or protein?
Search by similarity: using the nucleotide sequence, or using the amino acid sequence?
• By translating into an amino acid sequence, are we losing information? Yes! The genetic code is degenerate (two or more codons can represent the same amino acid).
➢ Very different DNA sequences may code for similar protein sequences → we certainly do not want to miss those cases!
• Conclusion:
It is almost always better to compare coding sequences in their amino acid form, especially if they are very divergent. For very highly similar sequences, comparing at the nucleotide level may give better results.
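The degeneracy argument can be checked with a small sketch. The two DNA strings below are hypothetical examples chosen for illustration, and `translate` and `count_matches` are helper names introduced here, not part of any library. The codon table is the standard genetic code written in the usual TCAG order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (reading frame 1) into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_matches(a, b):
    """Count the positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


dna1 = "CTTAGCCGT"  # hypothetical example sequence
dna2 = "TTGTCAAGA"  # hypothetical example sequence

print(count_matches(dna1, dna2), "of", len(dna1), "nucleotides match")
print(translate(dna1), translate(dna2))
```

The two sequences share few nucleotides yet translate to the same peptide, which is the case a nucleotide-level search would risk missing.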
Biological Sequence Similarity
• it tells us about:
1. Homologous genes (homologs)
- are genes that derive from a common ancestral gene.
- homology is an evolutionary relationship that either exists or does not.
- high similarity is evidence for homology. Similar sequences may be orthologs or paralogs.
2. Orthologous genes
- are homologous genes in different organisms (separated by a speciation event), typically with shared function.
3. Paralogous genes
- are homologous genes in one organism that derive from gene duplication → often have divergent function.
Orthologs and paralogs are often viewed in a single tree
Homologous and Paralogous
Causes for sequence (dis)similarity
• Mutation (substitution): a nucleotide at a certain location is replaced by another nucleotide (e.g.: ATA → AGA)
➢ Transition mutations: change from a purine to a purine or from a pyrimidine to a pyrimidine. E.g.: A to G; G to A; C to T; T to C
➢ Transversion mutations: change from a purine to a pyrimidine or vice versa.
✓ Synonymous & non-synonymous mutations
• Insertion: at a certain location one new nucleotide is inserted between two existing nucleotides (e.g.: AA → AGA)
• Deletion: at a certain location one existing nucleotide is deleted (e.g.: ACTG → AC-G)
✓ Indel: an insertion or a deletion
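The mutation classes above can be sketched as a small classifier. This is a minimal illustration, assuming DNA letters A, C, G, T and `-` for a gap; the function names are introduced here for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-nucleotide change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def classify_aligned(a, b):
    """Label every column of two aligned, equal-length sequences ('-' marks a gap)."""
    labels = []
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            labels.append("indel")
        else:
            labels.append(classify_substitution(x, y))
    return labels


print(classify_substitution("A", "G"))   # purine to purine
print(classify_substitution("T", "G"))   # the ATA -> AGA example: pyrimidine to purine
print(classify_aligned("ACTG", "AC-G"))  # the deletion example
```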
Classification of sequence alignment algorithms
➢ two main classes of sequence alignment methods:
- global alignments and local alignments.
➢ in contrast to local alignments, where only portions of the sequences are aligned, the entire sequences are aligned in global alignments.
➢ Global alignments are useful for aligning closely related sequences, whereas local alignments are more suitable when comparing distantly related sequences.
➢ Pairwise & multiple alignments are the basic tools to compare sequences.
➢ An alignment is called global when closely related sequences of roughly the same length are aligned together;
➢ the alignment is carried out from the start to the end of each sequence while searching for the best possible alignment.
→ Needleman-Wunsch algorithm
➢ Local alignment is mainly used for sequences that differ in length or that share only short regions of similarity.
→ this method finds local matches within a stretch of the sequences instead of looking at the entire sequence.
→ Smith-Waterman algorithm
→ BLAST (Basic Local Alignment Search Tool) is the most commonly used tool for local sequence alignment & similarity search.
➢ gaps are used to show that a residue (nucleotide or AA) is without a match in the other sequence; in an evolutionary context the gaps represent insertions or deletions.
➢ when an alignment is constructed, the identity & similarity can be quantified.
• Identity is the number (often given as a percentage) of aligned positions at which the nucleotides or AAs match exactly.
• Similarity goes further: it also counts positions where different residues are scored as similar (e.g. conservative AA substitutions), and the score accounts for gaps.
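As a minimal sketch of how identity is quantified once an alignment exists: the function below takes two already-aligned strings of equal length, with `-` for gaps. Note the assumption that the denominator is the full alignment length and that gap columns count as mismatches; tools differ on this convention.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption for this sketch: gap columns count as mismatches and the
    denominator is the total number of columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        x == y and x != "-" for x, y in zip(aligned_a, aligned_b)
    )
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACTG", "AC-G"))
```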
• Global alignment (top) includes matches ignored by local alignment (bottom)
Global: 15% identity
Local: 30% identity
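The difference between the two classes can be sketched with one scoring routine. This is an illustrative score-only version of the Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with a linear gap penalty; the scoring values and the function name are assumptions made for the example, and no traceback is performed.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b.

    local=False: Needleman-Wunsch style, both sequences aligned end to end.
    local=True:  Smith-Waterman style, best-scoring pair of sub-regions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment starts free.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


a, b = "TTTTACGT", "ACGTGGGG"  # hypothetical sequences sharing only "ACGT"
print(alignment_score(a, b))               # global: whole sequences, end to end
print(alignment_score(a, b, local=True))   # local: best-matching region only
```

Because a local alignment is free to ignore poorly matching flanks, its score is never lower than the global score for the same pair and scoring scheme.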
Sequence Similarity & Scoring Methods
1. Dot-Matrix Method
➢ is done by putting one sequence along the y-axis on the left side & the other sequence along the x-axis on top.
➢ this method generates a simple matrix in which each cell is a measure of similarity between the two residues at that position in the horizontal & vertical sequences.
2. Dynamic Programming
➢ this is a general optimization method that is also used in computer science, mathematics, management science & economics.
➢ for alignment, it finds the optimal alignment by combining optimal alignments of shorter subsequences (e.g. Needleman-Wunsch, Smith-Waterman).
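Both methods are built on a matrix indexed by the two sequences. As a minimal sketch of the dot-matrix method, assuming the simplest similarity measure (1 for identical residues, 0 otherwise) and hypothetical input sequences:

```python
def dot_matrix(seq_x, seq_y):
    """Rows of the dot plot: seq_y runs down the left side, seq_x across the top."""
    return [[1 if y == x else 0 for x in seq_x] for y in seq_y]


def render(seq_x, seq_y):
    """Draw the matrix as text, with '*' for a match and '.' for a mismatch."""
    lines = ["  " + " ".join(seq_x)]
    for y, row in zip(seq_y, dot_matrix(seq_x, seq_y)):
        lines.append(y + " " + " ".join("*" if cell else "." for cell in row))
    return "\n".join(lines)


print(render("GATTACA", "GATCA"))
```

Runs of matches along a diagonal mark regions of similarity; a shift from one diagonal to another suggests an insertion or deletion.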
Multiple Sequence Alignment
• EBI ClustalW Server
Preparing Multiple Sequence
➢ “*” indicates that the residues or nucleotides in that column are identical in all sequences in the alignment.
➢ “:” indicates that conserved substitutions have been observed.
➢ “.” indicates that semi-conserved substitutions have been observed.
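How such a consensus line is derived can be sketched as follows. This is only an illustration: the `strong_groups` default is a small illustrative subset of residue groups and not the full table a real aligner uses, the "." (semi-conserved) class is not implemented, and the input sequences are hypothetical.

```python
def conservation_line(aligned, strong_groups=("MILV", "FYW", "STA")):
    """Build a ClustalW-style marker line for a list of equal-length aligned sequences.

    '*' : column is identical in all sequences
    ':' : all residues in the column fall within one of `strong_groups`
    ' ' : anything else, including columns that contain a gap
    """
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        if "-" in residues:
            marks.append(" ")
        elif len(residues) == 1:
            marks.append("*")
        elif any(residues <= set(group) for group in strong_groups):
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


alignment = ["MKVLA", "MKILA", "MRVLA"]  # hypothetical aligned protein fragments
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```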
• MAFFT: Multiple Alignment using Fast Fourier Transform
• MUSCLE: MUltiple Sequence Comparison by Log-Expectation
• T-Coffee: Tree-based Consistency Objective Function For alignment Evaluation

Exercise - 1
1. Pairwise alignment – online + CLC Genomics Workbench
2. Multiple alignment – online + CLC Genomics Workbench
3. Local alignment – online
4. Global alignment – online
Database searching tips
➢ use the latest database version.
➢ use BLAST (Basic Local Alignment Search Tool) first.
➢ search both strands when using FASTA.
➢ translate sequences where relevant.
➢ E < 0.05 is statistically significant, and usually biologically interesting.
➢ if the query has repeated segments, delete them & repeat the search.
➢ BLAST is the most used algorithm in bioinformatics, to the point of being a verb: "to blast".
➢ BLAST allows rapid sequence comparison of a query sequence against a database.
➢ The BLAST algorithm is fast, accurate, & accessible both via the web & the command line.
➢ it is popular because it offers a good balance of sensitivity & speed, and is reliable & flexible.
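Much of BLAST's speed comes from its word-search heuristic: instead of aligning the query against every database sequence in full, it first looks for short shared words and only examines sequences that contain one. The sketch below shows that seeding idea only, with exact word matches and hypothetical sequences; real BLAST adds steps not shown here (such as extending and scoring each seed), and the function names are introduced for this example.

```python
from collections import defaultdict


def index_words(sequence, k):
    """Map every length-k word in `sequence` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where an exact k-letter word is shared."""
    subject_index = index_words(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in subject_index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


print(find_seeds("ACGTA", "TTACGTAT"))  # shared words: candidate regions to extend
print(find_seeds("AAAA", "CCCC"))       # no shared word: the subject is skipped
```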
BLAST
➢ The BLAST tool is fast & can be used to analyse thousands of sequences, & even to compare two genomes.
➢ BLAST is freely available to everyone.
➢ The BLAST tool is straightforward to use and produces very informative output.
➢ BLAST is a word-search heuristic method which eliminates irrelevant sequences early & so saves search time.
➢ BLAST has several subprograms:
➢ BLASTn - aligns a nucleotide query sequence against a nucleotide database.
➢ BLASTp - aligns a protein query sequence against a protein database.
➢ BLASTx - aligns a nucleotide query against a protein database by comparing the six-frame conceptual translation of the query.
➢ tBLASTx - aligns the six-frame translation of a nucleotide query against the six-frame translation of a nucleotide database.
➢ tBLASTn - aligns a protein query sequence against a translated nucleotide database.
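The "six-frame conceptual translation" used by BLASTx, tBLASTx and tBLASTn can be sketched as follows: three reading frames on the given strand plus three on its reverse complement. This is an illustration using the standard genetic code and a hypothetical input; the frame labels (`+1` to `-3`) and function names are choices made for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of `dna`; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate a DNA sequence in all six reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCC").items():
    print(frame, protein)
```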
[Screenshots of the NCBI BLAST web interface. BLASTn: search set, program selection, result, graphic summary, description, alignment, tree view. BLASTp against PDB: search set, graphic summary, description, tree view.]
A practical example of sequence alignment
http://www.ncbi.nlm.nih.gov

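BLAST results
➢ E = 0.0 means the E value is too small to be reported (given here as ≤ 10^-1000).
➢ E value: the expectation value, i.e. the number of hits with a score at least this good that you would expect to find by chance in a database of this size. For small values it is close to the probability of finding such a hit by chance. The lower the E, the more significant the hit.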
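The relationship between score and E value can be sketched with the bit-score form of the Karlin-Altschul formula, E = m × n × 2^(−S′), where m is the query length, n the database length and S′ the bit score. The numbers below are hypothetical, and the sketch uses raw lengths as a simplification; it is not a reimplementation of how BLAST computes its reported values.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), with S' the bit score. Simplifying assumption:
    raw lengths are used for m and n.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 100-residue query against a 1,000,000-residue database.
print(expect_value(30, 100, 1_000_000))
print(expect_value(50, 100, 1_000_000))    # each extra bit halves E
print(expect_value(2000, 100, 1_000_000))  # too small for a float: prints 0.0
```

The last call shows one way an E value can end up displayed as zero: the number is smaller than the smallest value the representation can hold.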
Exercise - 2
• BLAST the following sequences by accession number:
- KR093978
- KR093979
- KR093980
- KR093981
- KR093982
