Bioinformatics I

Marineil C. Gomez
School of Chemical, Biological and Materials Engineering and Sciences
Mapua University
Pairwise Sequence Alignment
Outline
⚫ Definitions of homology, orthologs, and paralogs
⚫ Global and Local Alignment
⚫ Scoring Matrices
⚫ Nucleotide Models
⚫ Protein Models
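Why Do Alignment?
⚫ Is the gene/protein related to any other gene/protein?
⚫ Relatedness:
⚫ At the sequence level: homology
⚫ At the functional level: common functions
⚫ DNA vs protein alignment
⚫ 4 nucleotides vs 20 amino acids
⚫ “Wobble” at the third codon position: a nucleotide change there often leaves the encoded amino acid unchanged
⚫ Amino acids with similar properties can be counted as similar, not only identical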
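Definitions
⚫ Homology
− the state of having the same or similar relation, relative position, or structure; for genes and proteins, similarity that results from descent from a common ancestor.
− A qualitative relationship: two sequences are either homologous or not.
⚫ Similarity and Identity
− Quantitative descriptions of the relatedness of sequences.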
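Definitions
⚫ Orthologs – homologous genes in different species that diverged through speciation; they usually keep the same function, even when sequence similarity is low
⚫ Example: human and mouse beta globin
⚫ Paralogs – homologous genes within the same species that arose by gene duplication; they often evolve distinct functions
⚫ Examples: myoglobin and hemoglobin; alpha 1 and alpha 2 globin
⚫ Similar – (amino acids) different residues that have similar biochemical properties
⚫ Identical – the same amino acid residue or nucleotide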
Homology in protein structure
These proteins are homologous (descended from a common ancestor) and share very similar three-dimensional structures. However, pairwise alignment of their amino acid sequences reveals that they share very limited amino acid identity.
Paralogous human globins
Relatedness of genes and proteins
Can be assessed by pairwise alignment
Pairwise alignment of human beta globin (the “query”) and myoglobin (the “subject”).
Pairwise Alignment
• Query and subject: the two sequences being compared
• Identical amino acids: exactly the same amino acid or nucleotide at an aligned position
• Similar amino acids: different residues with similar biochemistry, reflecting conservative substitutions
• Gaps: insertions or deletions (indels)
Pairwise Alignment
The percent similarity of two protein sequences is the sum
of both identical and similar matches.
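To make the identity/similarity distinction concrete, here is a small sketch that counts both from two sequences that are already aligned. The similarity groups are an assumption chosen for illustration only; tools such as BLAST instead report a pair as a “positive” when its substitution-matrix score is greater than zero. The function names are hypothetical.

```python
# Hypothetical groups of biochemically similar amino acids (illustrative only).
SIMILAR_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"), set("NQ"), set("ST")]


def is_similar(a: str, b: str) -> bool:
    """True if two *different* residues fall in the same similarity group."""
    return a != b and any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_similarity(query: str, subject: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both strings must already be aligned: equal length, '-' marks a gap.
    Percentages use the full alignment length, gaps included (an assumption;
    some tools use a different denominator).
    """
    if not query or len(query) != len(subject):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for a, b in zip(query, subject) if a == b and a != "-")
    similar = sum(1 for a, b in zip(query, subject) if is_similar(a, b))
    length = len(query)
    percent_identity = 100.0 * identical / length
    # Percent similarity counts identical matches plus similar (conservative) ones.
    percent_similarity = 100.0 * (identical + similar) / length
    return percent_identity, percent_similarity


print(percent_identity_similarity("ACDEFGHIK", "ACEEYG-IR"))
```

In this toy pair, 5 of 9 columns are identical and 3 more (D/E, F/Y, K/R) are similar, so percent similarity is larger than percent identity, as the slide states.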

֎ The purpose of a pairwise alignment is to assess the degree of similarity and the possibility of homology between two molecules.

֎ It is never correct to say that two proteins share a certain percent homology, because they are either homologous or not.

֎ It is not appropriate to describe two sequences as “highly homologous”; instead, it can be said that they share a high degree of similarity.
Dynamic Programming Approach to Pairwise Alignment
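A minimal sketch of the dynamic programming approach, assuming a simple match/mismatch score and a linear gap penalty (real protein aligners use a substitution matrix such as BLOSUM62 and usually affine gap penalties). With `local=False` the matrix is filled as in Needleman-Wunsch global alignment; with `local=True` every cell is clamped at zero and the traceback starts from the best cell, as in Smith-Waterman local alignment. The function name `align` and its default scores are illustrative choices.

```python
def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2,
          local: bool = False) -> tuple[int, str, str]:
    """Align a and b by dynamic programming with a linear gap penalty.

    local=False: global (Needleman-Wunsch). local=True: local (Smith-Waterman).
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # F[i][j]: best score aligning a[:i] with b[:j]
    # (local: best score of an alignment ending exactly at i, j).
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        for i in range(1, n + 1):
            F[i][0] = i * gap  # a[:i] aligned entirely against gaps
        for j in range(1, m + 1):
            F[0][j] = j * gap
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
            if local:
                F[i][j] = max(F[i][j], 0)  # a local alignment may start anywhere
                if F[i][j] > best:
                    best, best_pos = F[i][j], (i, j)

    # Traceback: global from the bottom-right cell; local from the best cell
    # until a zero cell is reached.
    i, j = best_pos if local else (n, m)
    score = F[i][j]
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if local and F[i][j] == 0:
            break
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score, "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("GATTACA", "GCATGCU", match=1, mismatch=-1, gap=-1))
print(align("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2, local=True))
```

With match +1, mismatch -1, and gap -1, the global score for GATTACA vs GCATGCU is 0. The local run scores 13 by aligning GTTAC from the first sequence with GTTGAC from the second, using one gap, and ignores the poorly matching ends.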
Pairwise Alignment with Dotplots

A dotplot is a graphical method for comparing two sequences.

One protein or nucleic acid sequence is placed along the x axis and the other along the y axis.

Positions of identity are scored with a dot. A region of identity between two sequences forms a diagonal line.
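A dotplot is simple to compute. The sketch below puts a dot wherever residues are identical. The sliding `window` and `threshold` parameters are an added assumption (a common way to suppress isolated random dots), not something described on the slide; with the defaults (`window=1`, `threshold=1`) it is the plain identity dotplot. Function names are hypothetical.

```python
def dotplot(seq_x: str, seq_y: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Return a grid where grid[j][i] is True if seq_x at i and seq_y at j match.

    A dot is placed when at least `threshold` of the `window` positions
    starting at (i, j) are identical.
    """
    grid = []
    for j in range(len(seq_y) - window + 1):      # rows follow the y axis
        row = []
        for i in range(len(seq_x) - window + 1):  # columns follow the x axis
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid


def render(grid: list[list[bool]]) -> str:
    """Draw the grid as text: '*' for a dot, '.' otherwise."""
    return "\n".join("".join("*" if dot else "." for dot in row) for row in grid)


# Comparing a sequence with itself gives the main diagonal; the internal
# repeat (ACGT twice) adds parallel off-diagonal lines.
print(render(dotplot("ACGTACGT", "ACGTACGT")))
```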
Dotplot showing multiple repeats
Dotplot with Inversions
Polymorphisms and mutations in dotplots
When two sequences are aligned, what scores should they be assigned?
Scoring Matrices (Proteins)
⚫ Dayhoff Model – Substitution Model
⚫ Margaret Dayhoff (1978)
− provides the basis of a quantitative scoring system for
pairwise alignments between any proteins, whether they are
closely or distantly related
⚫ BLOSUM
⚫ Steven Henikoff and Jorja G. Henikoff
⚫ JTT
Nucleotide Substitution Models
Illustration of multiple substitutions at the same site, or multiple hits. An ancestral sequence diverged into two sequences, which have since accumulated nucleotide substitutions independently along the two lineages.
Only two differences are observed between the two present-day sequences, so the proportion of different sites is p̂ = 2/8 = 0.25, whereas as many as 10 substitutions actually occurred (seven on the left lineage and three on the right lineage), so the true distance is 10/8 = 1.25 substitutions per site.
Markov Models of Substitution
Relative substitution rates between nucleotides under three Markov-chain models of nucleotide substitution: JC69 (Jukes and Cantor 1969), K80 (Kimura 1980), and HKY85 (Hasegawa et al. 1985).

The thickness of the lines represents the substitution rates, while the sizes of the circles represent the steady-state distribution.
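The gap between the observed p̂ and the true number of substitutions is what these models correct for. As a sketch, the standard distance formulas for JC69, d = -(3/4) ln(1 - 4p/3), and K80, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) with P the proportion of transitions and Q the proportion of transversions, can be coded directly. The code assumes ungapped, aligned, uppercase ACGT sequences; the function names are hypothetical.

```python
import math

PURINES = set("AG")  # the pyrimidines are C and T


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites that differ (the observed p-hat)."""
    if not s1 or len(s1) != len(s2):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (1969) distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k80_distance(s1: str, s2: str) -> float:
    """Kimura (1980) distance from transition (P) and transversion (Q) proportions."""
    if not s1 or len(s1) != len(s2):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    n = len(s1)
    # Transition: purine<->purine or pyrimidine<->pyrimidine change.
    transitions = sum(a != b and (a in PURINES) == (b in PURINES) for a, b in zip(s1, s2))
    # Transversion: purine<->pyrimidine change.
    transversions = sum((a in PURINES) != (b in PURINES) for a, b in zip(s1, s2))
    P, Q = transitions / n, transversions / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("K80 distance is undefined for these proportions")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


print(p_distance("ACGTACGT", "TCGTACGA"))  # 0.25
print(jc69_distance(0.25))
```

For p̂ = 0.25 the JC69 estimate is about 0.304 substitutions per site. That is larger than p̂ but still far below the 1.25 in the illustration, because a model-based distance corrects for multiple hits only on average and cannot recover every hidden change in an eight-site example.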
Amino Acid Substitution Models
⚫ Empirical Models
⚫ Attempt to describe the relative rates of substitution between amino acids without explicitly considering the factors that influence the evolutionary process. They are often constructed by analysing large quantities of sequence data compiled from databases.
⚫ PAM (Dayhoff), JTT
⚫ Mechanistic Models
⚫ Consider the biological processes involved in amino acid substitution, such as mutational biases in the DNA, translation of codons into amino acids, and acceptance or rejection of the resulting amino acid by natural selection.
⚫ Mechanistic models have more interpretative power and are particularly useful for studying the forces and mechanisms of gene sequence evolution.
PAM Substitution Matrix
⚫ The first empirical amino acid substitution matrix, constructed by Dayhoff and colleagues (1978).
⚫ They compiled and analysed the protein sequences available at the time, using a parsimony argument to reconstruct ancestral protein sequences and tabulating amino acid changes along the branches of the phylogeny.
⚫ To reduce the impact of multiple hits, the authors used only similar sequences that differed from one another at < 15% of sites. Inferred changes were merged across all branches without regard for their different lengths.
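Dayhoff's PAM1 matrix corresponds to about one accepted point mutation per 100 residues, and matrices for larger evolutionary distances are extrapolated by multiplying PAM1 by itself (PAM250 = PAM1^250). The sketch below shows that extrapolation with a made-up three-residue matrix; the values are hypothetical, chosen so that each row sums to 1 and about 1% of residues change per step.

```python
def mat_mult(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Multiply two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_power(M: list[list[float]], n: int) -> list[list[float]]:
    """Compute M**n by repeated squaring (M**0 is the identity)."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = M
    while n > 0:
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result


# Hypothetical "PAM1" over three residues: row i gives the probability that
# residue i is replaced by residue j after 1 PAM of evolution.
PAM1_TOY = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.003, 0.007, 0.990],
]

pam250 = mat_power(PAM1_TOY, 250)
for row in pam250:
    print([round(x, 3) for x in row])
```

As n grows, the rows drift toward the same stationary distribution and the diagonal shrinks, which is why high-PAM matrices suit distant relationships. This extrapolation step is what BLOSUM avoids.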
JTT Matrix
The Dayhoff matrix was updated by Jones et al. (1992), who analysed a much larger collection of protein sequences using the same approach as Dayhoff et al. (1978).
BLOSUM (BLOcks SUbstitution Matrix)
⚫ Introduced by Steven Henikoff and Jorja Henikoff (1992).
⚫ Used to score alignments between evolutionarily divergent protein sequences.
⚫ Based on local alignments.
⚫ They scanned the BLOCKS database (>500 groups of local multiple alignments of distantly related protein sequences) for highly conserved regions of protein families (regions without gaps in the sequence alignment) and then counted the relative frequencies of amino acids and their substitution probabilities.
⚫ They then calculated a log-odds score for each of the 190 possible substitution pairs of the 20 standard amino acids.
⚫ All BLOSUM matrices are based on observed alignments; unlike the PAM matrices, they are not extrapolated from comparisons of closely related proteins.
⚫ BLOSUM62 is the default matrix used by BLAST for protein searches.
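A simplified sketch of the BLOSUM log-odds calculation for a single ungapped block: count residue pairs within each column, convert the counts to target frequencies q_ij, derive background frequencies p_i = q_ii + Σ_{j≠i} q_ij / 2, and score s_ij = round(2 log2(q_ij / e_ij)) in half-bit units, where e_ij = p_i² for identical pairs and 2 p_i p_j otherwise. It deliberately omits the step that clusters sequences above a percent-identity cutoff (the 62 in BLOSUM62), so it does not reproduce the published matrices; the toy block and function name are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(block: list[str]) -> dict[tuple[str, str], int]:
    """Half-bit log-odds scores from one ungapped block of aligned sequences."""
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("all sequences in a block must have the same length")

    # Count unordered residue pairs within each column.
    pair_counts = Counter()
    for col in range(length):
        column = [seq[col] for seq in block]
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency of each residue: p_i = q_ii + sum_{j != i} q_ij / 2.
    p = Counter()
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2

    scores = {}
    for (a, b), q_ab in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(q_ab / expected))
    return scores


toy_block = ["ACDA", "ACEA", "SCDA", "ACDS"]  # hypothetical ungapped block
print(blosum_scores(toy_block))
```

Pairs observed more often than chance predicts get positive scores, and rarer-than-expected pairs get negative scores.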
How can we decide whether the alignment of two sequences is statistically significant?
For Global Alignments

Q: Does the score occur by chance?

Method 1: Compare with a set of non-homologous sequences.
Method 2: Compare with randomly generated sequences.
Method 3: Randomly scramble one of the sequences being compared.

Hypothesis testing with a z-test (z-score)

Cons: alignment scores of biological sequences do not follow a Gaussian (normal) distribution.
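Method 3 and the z-test can be combined in a short sketch: score the real pair, rescore after shuffling one sequence many times, and express the real score as z = (S - μ) / σ, the number of standard deviations above the shuffled mean. The scoring parameters, shuffle count, seed, and helper names are assumptions. Because the shuffled scores are not truly Gaussian, as the slide notes, the z-score is only a rough guide.

```python
import random
import statistics


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score, keeping only two DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def shuffle_z_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """z-score of the real score against scores with b randomly scrambled."""
    rng = random.Random(seed)
    observed = nw_score(a, b)
    null_scores = []
    for _ in range(n_shuffles):
        letters = list(b)
        rng.shuffle(letters)  # same composition, random order
        null_scores.append(nw_score(a, "".join(letters)))
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores)
    if sigma == 0:
        raise ValueError("shuffled scores show no variation; z is undefined")
    return (observed - mu) / sigma


print(shuffle_z_score("GATTACAGATTACA", "GATTACAGATTACA"))
```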
For Local Alignments
By Percent Identity:
Cons: where does the threshold lie?
26% vs 30%; 40% vs 60%
Percent identity also depends on alignment length: 30% identity over 20 aa is far less convincing than 30% identity over 150 aa.
For Local Alignments
By Relative Entropy:
• The relative entropy (H) of the target and background distributions measures the information available per aligned amino acid position that, on average, distinguishes a true alignment from a chance alignment.
• For each substitution matrix, with its unique target frequencies q_ij and background distributions p_i p_j, the relative entropy H can be derived as follows:

$$H = \sum_{i,j} q_{ij} \log_2 \frac{q_{ij}}{p_i p_j}$$

• where H corresponds to the information content (in bits) of the target and background distributions associated with a particular scoring matrix.
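The relative entropy formula translates directly into code. A sketch, assuming the target frequencies are given for ordered pairs (i, j) and sum to 1, and the background frequencies p_i also sum to 1; the two-letter alphabet in the example is hypothetical.

```python
import math


def relative_entropy(q: dict[tuple[str, str], float], p: dict[str, float]) -> float:
    """H = sum over (i, j) of q_ij * log2(q_ij / (p_i * p_j)), in bits."""
    h = 0.0
    for (i, j), q_ij in q.items():
        if q_ij > 0:  # terms with q_ij = 0 contribute nothing
            h += q_ij * math.log2(q_ij / (p[i] * p[j]))
    return h


background = {"A": 0.5, "B": 0.5}
target = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
print(relative_entropy(target, background))  # about 0.278 bits
```

H is 0 bits when q_ij = p_i p_j, meaning the target distribution is indistinguishable from chance, and it grows as the matrix's target frequencies diverge further from chance.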
Relative Entropy and PAM distance
Perspectives and Pitfalls
The pairwise alignment of DNA or protein sequences is one of the most fundamental operations of bioinformatics.
Pairwise alignment allows the relationship between any two sequences to be determined, and the degree of relatedness that is observed helps in forming a hypothesis about whether they are homologous.
Any two sequences can be aligned, even if they are unrelated. It is always important to assess the biological significance of a sequence alignment.
End of Lecture II
