Dynamic Programming Methods in Pairwise Alignment
I.MSc BIOINFORMATICS
3RD YEAR 5TH SEMESTER
BJB AUTONOMOUS COLLEGE, BBSR
~ OUTLINE ~
Dynamic Programming?
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
After alignment:
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
Causes for sequence (dis)similarity
• Mutation: a character at a certain location is replaced by another character
ATCC
ATTCC
GAP = an insertion or a deletion event
AAAGT
TA_GT
Global vs Local Alignment
Global alignment:
• Aligns the entire sequence, up to both ends, using all sequence characters
• Sequences: quite similar and approximately the same length
Local alignment:
• Aligns stretches of sequences with a high density of matches
• Sequences: similar along some lengths but dissimilar in others
Global vs Local Alignment
Global alignment:
• Stretched over the entire sequence to find as many matching characters as possible
Local alignment:
• Favours finding conserved subsequences
• Subalignments are considered over conserved regions
SCORING a sequence alignment
• Scoring scheme: a set of values assigned to different events in an alignment
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
After alignment:
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
SCORING a sequence alignment
• Match score: +1
• Mismatch score: 0
• Gap penalty: –1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
• Matches: 18 × (+1) = +18
• Mismatches: 2 × 0 = 0
• Gaps: 7 × (–1) = –7
TOTAL sim = +18 + 0 – 7 = +11
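The tally above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `score_alignment` and its parameter names are assumptions made for this example, and the default values are the +1 / 0 / –1 scheme from this slide.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two already-aligned strings column by column ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            counts["gap"] += 1        # insertion or deletion event
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    total = (counts["match"] * match
             + counts["mismatch"] * mismatch
             + counts["gap"] * gap)
    return total, counts


total, counts = score_alignment(
    "ACGTCTGATACGCCGTATAGTCTATCT",
    "----CTGATTCGC---ATCGTCTATCT",
)
print(counts, total)  # 18 matches, 2 mismatches, 7 gaps, total 11
```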
Optimal Alignment
• The alignment that gives the highest similarity
score.
• To assess the degree of similarity between a pair
of sequences, we need to find the optimal
alignment.
Methods of Sequence Alignment
1. DOT MATRIX:
Graphical similarity comparison
2. DYNAMIC PROGRAMMING
Algorithms for optimization
3. HEURISTIC METHODS
Fast computational methods of approximation
Dynamic Programming?
• Solving a complex problem by
• first breaking it into a collection of simpler subproblems
• solving each subproblem just once
• storing their solutions to avoid repetitive computations.
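As a small illustration of "solve each subproblem once and store the result", the sketch below computes the best alignment score of two sequences top-down, caching every subproblem with Python's `functools.lru_cache`. The function name `best_score` and the +1 / 0 / –1 default scores are assumptions chosen for this example; it is meant to show the idea, not a production aligner.

```python
from functools import lru_cache


def best_score(x, y, match=1, mismatch=0, gap=-1):
    """Best global alignment score of x and y, using memoized recursion."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Subproblem: best score for aligning the prefixes x[:i] and y[:j].
        if i == 0:
            return j * gap            # only gaps remain
        if j == 0:
            return i * gap
        pair = match if x[i - 1] == y[j - 1] else mismatch
        return max(
            solve(i - 1, j - 1) + pair,  # align x[i-1] with y[j-1]
            solve(i - 1, j) + gap,       # x[i-1] against a gap
            solve(i, j - 1) + gap,       # y[j-1] against a gap
        )

    result = solve(len(x), len(y))
    # Cache misses = number of distinct subproblems actually computed.
    return result, solve.cache_info().misses


score, solved = best_score("AAAGT", "TAGT")
print(score, solved)
```

Without the cache, the same prefixes would be re-solved many times; with it, each (i, j) pair is computed once. For long sequences the bottom-up matrix form is usually preferred, since deep recursion can hit Python's recursion limit.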
Biological Sequence Alignment and Dynamic Programming
Dynamic Programming
• The dynamic programming approach to sequence
alignment builds each step on the best result obtained
so far.
• It aligns two sequences by inserting gaps at
different locations so as to maximize the alignment
score.
• Examples:
Needleman-Wunsch (1970)
Smith-Waterman (1981)
Dynamic Programming
• The score is determined by the "match award",
"mismatch penalty" and "gap penalty".
• The higher the score, the better the alignment.
• If both penalties are set to 0, it always aims to find the
alignment with the maximum number of matches.
• It is used to compare two DNA or protein sequences
in order to predict the similarity of their functions.
Needleman-Wunsch Method
• The Needleman-Wunsch algorithm (1970)
performs an optimal global alignment of two protein
or nucleotide sequences.
• The Needleman-Wunsch algorithm is guaranteed to
find the alignment with the maximum score.
• Scores for aligned characters are specified by the
substitution scoring matrix S(i, j):
the similarity of characters i and j.
Needleman-Wunsch Method
3-STEP PROCESS
1. INITIALIZATION
2. MATRIX FILLING
3. TRACEBACK
ALIGNMENT
1. INITIALIZATION
Gap Penalty = -6
• First column: Gap Penalty × Row Number
• First row: Gap Penalty × Column Number
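The initialization rule can be sketched as follows. The helper name `init_matrix` is hypothetical; the gap penalty of -6 is the one given on this slide.

```python
def init_matrix(n_rows, n_cols, gap=-6):
    """Build an (n_rows + 1) x (n_cols + 1) matrix with gap-penalised borders."""
    F = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(n_rows + 1):
        F[i][0] = gap * i   # first column: gap penalty x row number
    for j in range(n_cols + 1):
        F[0][j] = gap * j   # first row: gap penalty x column number
    return F


for row in init_matrix(3, 2):
    print(row)
```

The extra row and column (index 0) stand for "nothing aligned yet", which is why the matrix is one larger than each sequence length.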
2. MATRIX FILLING
F(i, j) = the cell in row i and column j
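A minimal sketch of the filling step, using the standard Needleman-Wunsch rule: each cell takes the maximum of the diagonal neighbour plus the match/mismatch score, the cell above plus the gap penalty, and the cell to the left plus the gap penalty. The gap penalty of -6 comes from the slides; the match score of +5 and mismatch score of -2 are assumptions made for this example, as is the function name `nw_fill`.

```python
def nw_fill(a, b, match=5, mismatch=-2, gap=-6):
    """Fill the Needleman-Wunsch matrix F for sequences a (rows) and b (columns)."""
    rows, cols = len(a), len(b)
    F = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        F[i][0] = gap * i                      # initialization: first column
    for j in range(1, cols + 1):
        F[0][j] = gap * j                      # initialization: first row
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,           # diagonal: align a[i-1] with b[j-1]
                F[i - 1][j] + gap,             # from above: gap in b
                F[i][j - 1] + gap,             # from the left: gap in a
            )
    return F


F = nw_fill("TGCTCGTA", "TTCATA")
for row in F:
    print(row)
```

The bottom-right cell holds the score of the optimal global alignment under the chosen scoring scheme.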
Optimal Alignment:
TGCTCGTA
T__TCATA
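The traceback step that yields the optimal alignment shown above can be sketched like this. As before, the gap penalty of -6 is from the slides, while match = +5, mismatch = -2 and the name `nw_align` are assumptions for illustration; the sketch writes gaps as "-" rather than "_".

```python
def nw_align(a, b, match=5, mismatch=-2, gap=-6):
    """Global alignment: initialization, matrix filling, then traceback."""
    def s(x, y):
        return match if x == y else mismatch

    F = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        F[i][0] = gap * i
    for j in range(1, len(b) + 1):
        F[0][j] = gap * j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: walk from the bottom-right cell back to the top-left cell,
    # at each step following the move that produced the current cell's value.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), F[len(a)][len(b)]


top, bottom, score = nw_align("TGCTCGTA", "TTCATA")
print(top)
print(bottom)
print(score)
```

When several moves tie, this sketch prefers the diagonal, then the cell above; a different tie-breaking order can return a different alignment with the same score.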
Dynamic Programming Tools
• GLOBAL ALIGNMENT TOOLS
1. Needle (EMBOSS)
optimal global alignment – N-W algorithm
2. Stretcher (EMBOSS)
Modified N-W algorithm to globally align larger
sequences
3. GGSEARCH2SEQ
EMBOSS – Needle (EMBL-EBI)
https://www.ebi.ac.uk/Tools/psa/emboss_needle
Smith-Waterman Method
• The Smith-Waterman algorithm (1981) is used for
determining similar regions between two nucleotide
or protein sequences.
• Smith-Waterman is also a dynamic programming
algorithm: a variation of Needleman-Wunsch adapted
for local alignment.
• It follows the same 3-step process as the N-W
algorithm, with one addition: 0 is introduced as a
possible cell value in the 2nd step.
1. INITIALIZATION
• The first row and column are filled as per the gap
penalty of the scoring scheme.
• The negative scores are then substituted with "0", so
the first row and column contain only zeros.
2. MATRIX FILLING
• Smith-Waterman introduces "0" so that whenever a
scoring matrix value would become negative, the value
is set to zero.
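A minimal sketch of Smith-Waterman matrix filling, showing the zero floor. The scores used here (match = +5, mismatch = -2, gap = -6), the function name `sw_fill` and the two short test sequences are assumptions for illustration only.

```python
def sw_fill(a, b, match=5, mismatch=-2, gap=-6):
    """Fill the Smith-Waterman matrix H; no cell is allowed to go below 0."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]  # borders stay at 0
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                       # the extra option that N-W lacks
                H[i - 1][j - 1] + s,     # diagonal
                H[i - 1][j] + gap,       # from above
                H[i][j - 1] + gap,       # from the left
            )
            if H[i][j] > best:           # remember the highest-scoring cell
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


H, best, best_pos = sw_fill("AACDGG", "TTCDEE")
for row in H:
    print(row)
print(best, best_pos)
```

Because of the zero floor, a poorly matching stretch cannot drag down the score of a good region that follows it.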
3. TRACEBACK
• The traceback is started from the highest scoring
position in the scoring matrix.
• The path is traced back until a cell that scores zero
is reached.
• As such, it has the desirable property that it is
guaranteed to find the optimal local alignment
with respect to the scoring system being used
(which includes the substitution matrix and the
gap-scoring scheme).
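The traceback rule for local alignment can be sketched as below: start at the highest-scoring cell and stop at the first cell that scores zero. The function name `sw_align`, the scores (match = +5, mismatch = -2, gap = -6) and the example sequences are assumptions chosen for this illustration.

```python
def sw_align(a, b, match=5, mismatch=-2, gap=-6):
    """Local alignment: zero-floored matrix filling, then traceback."""
    def s(x, y):
        return match if x == y else mismatch

    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Traceback: begin at the highest-scoring cell, stop when a 0 is reached.
    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


top, bottom, score = sw_align("AACDGG", "TTCDEE")
print(top)
print(bottom)
print(score)
```

Only the conserved stretch is reported; the flanking residues that do not match are left out of the alignment entirely.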
Smith-Waterman Method
Optimal Alignment
C D
C D
+5 +5
OPTIMAL ALIGNMENT SCORE: +10
Smith-Waterman Method
• However, the Smith-Waterman algorithm is
demanding of time and memory resources.
• As a result, it has largely been replaced in
practical use by the BLAST algorithm;
although not guaranteed to find optimal
alignments, BLAST is much more efficient.
Dynamic Programming Tools
• LOCAL ALIGNMENT TOOLS
1. Water (EMBOSS)
optimal local alignment – enhanced S-W algorithm
2. Matcher (EMBOSS)
Identifies local similarities between two sequences
using a rigorous algorithm based on LALIGN
3. LALIGN
4. SSEARCH2SEQ
Optimal local alignment using S-W algorithm
EMBOSS – Water (EMBL-EBI)
https://www.ebi.ac.uk/Tools/psa/emboss_water
Dynamic Programming applications
• Sequence comparison
• Gene recognition
• RNA structure prediction and hundreds of other
problems are solved by ever new variants of DP.
• DP is computationally intensive.
• It paved the way for fast computational heuristic
methods of approximation, e.g. FASTA and
BLAST.
CONCLUSION
All of the alignment methods in use
today are related to the original
method of Needleman and Wunsch.
THANK YOU
EVERYONE