Dynamic Programming Methods in Pairwise Alignment

The document discusses various methods for aligning biological sequences including global and local alignment. It covers the Needleman-Wunsch and Smith-Waterman algorithms which use dynamic programming to find optimal sequence alignments.

Uploaded by

Priyanshu Panda

20 BI 019

I.MSc BIOINFORMATICS
3RD YEAR 5TH SEMESTER
BJB AUTONOMOUS COLLEGE, BBSR
~ OUTLINE ~

• Alignment of a Pair of Sequences
• Global vs Local Alignment
• Optimal Alignment
• Dynamic Programming?
• Steps performed by the Needleman-Wunsch and the Smith-Waterman algorithms to produce a sequence alignment
• Tools based on the algorithms: EMBOSS Needle and EMBOSS Water
Sequence Alignment
Sequence Alignment
• Comparison of two or more sequences by searching for a series of character patterns that occur in the same order in the sequences.

Sequences ALIGNED = a possible evolutionary relationship

• Sequence alignment also refers to the process of assessing the degree of similarity between the sequences.
Sequence Alignment
• PAIRWISE SEQUENCE ALIGNMENT: two sequences

• MULTIPLE SEQUENCE ALIGNMENT: three or more sequences
Comparing Two Sequences
• Point mutations, easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
• Indels are difficult, must align sequences:

ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
Causes for sequence (dis)similarity
• Mutation: a character at a certain location is replaced by another character
• GAP = an insertion or a deletion event (indel)

ATCC
ATTCC   (one T inserted)

AAAGT
TA_GT   (A replaced by T at position 1; gap at position 3)
Global vs Local Alignment
GLOBAL ALIGNMENT
• Align the entire sequence up to both ends, using all sequence characters
• Sequences – quite similar and approximately the same length

LOCAL ALIGNMENT
• Align stretches of sequences with a high density of matches
• Sequences – similar along some lengths but dissimilar in others
Global vs Local Alignment
GLOBAL ALIGNMENT
• Stretched over the entire sequence to find as many matching characters as possible

LOCAL ALIGNMENT
• Favours finding conserved subsequences
• Subalignments considered – over conserved regions
SCORING a sequence alignment
• Scoring Scheme: set of values assigned to different events in an alignment

MATCH = identity (MAXIMUM VALUE)
MISMATCH (value in between)
GAP = insertion or deletion (MINIMUM VALUE)

• GAP PENALTY: negative score assigned to indel events in Pairwise Sequence Alignment
• Scoring scheme is not universal.
Comparing Two Sequences
• Point mutations, easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
• Indels: insertions and deletions are edit operations

ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
SCORING a sequence alignment
• Match score: +1
• Mismatch score: 0
• Gap penalty: –1

ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT

• Matches: 18 × (+1)
• Mismatches: 2 × 0
• Gaps: 7 × (–1)
TOTAL sim = 18 + 0 – 7 = +11
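The tally above can be checked mechanically. The following is a minimal sketch, assuming the alignment is given as two equal-length strings with "-" marking gaps; the function name `score_alignment` and its parameters are illustrative only.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1, gap_char="-"):
    """Score a finished pairwise alignment column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            gaps += 1          # column contains an indel
        elif a == b:
            matches += 1       # identical characters
        else:
            mismatches += 1    # substitution
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # (18, 2, 7, 11)
```

Changing the three score parameters shows why the scoring scheme is not universal: the same alignment gets a different total under a different scheme.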
Optimal Alignment
• The alignment that gives the highest similarity score.
• To assess the degree of similarity between a pair of sequences, we need to find the optimal alignment.

• Number of matches = MAXIMUM
• Number of mismatches and gaps = MINIMUM
Methods of Sequence Alignment
1. DOT MATRIX
▪ Simple 2-D graphs

2. DYNAMIC PROGRAMMING
▪ Algorithms for optimization

3. HEURISTIC METHODS
▪ Fast computational methods of approximation
Methods of Sequence Alignment
1. DOT MATRIX:
▪ Graphical similarity comparison
▪ Both sequences placed along the two axes of a 2-D plot
▪ A dot is placed at every point of identity
▪ Does not produce a precise or optimal alignment
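A dot matrix is simple enough to sketch in a few lines. The example below is an illustration only (the sequences and the function names `dot_matrix` and `render` are made up): it marks every pair of positions with identical characters, so a shared region shows up as a diagonal run of stars.

```python
def dot_matrix(seq_x, seq_y):
    """Return a grid where grid[i][j] is True if seq_y[i] == seq_x[j]."""
    return [[x == y for x in seq_x] for y in seq_y]


def render(seq_x, seq_y):
    """Draw the dot matrix as text, seq_x along the top, seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for ch, row in zip(seq_y, dot_matrix(seq_x, seq_y)):
        lines.append(ch + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("ACGTAC", "CGTA"))
```

The plot only shows where identities lie; it does not score them or choose among competing diagonals, which is why it cannot by itself yield an optimal alignment.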
DYNAMIC PROGRAMMING
Dynamic Programming?
• Richard E. Bellman at the RAND Corporation – research on optimal decision-making processes in the 1950s
• 1953 – Dynamic Programming
• Large-scale system analysis and optimization
• Computer-oriented approaches for breaking problems into sub-problems
Dynamic Programming?
• Solving a complex problem by first breaking it into a collection of simpler subproblems
• Solving each subproblem just once
• Storing their solutions to avoid repetitive computations

[Figure: a problem (P) broken into subproblems (SP), each with stored solutions (s)]
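The "solve once, store the solution" idea can be sketched with memoization. This is a minimal sketch under assumed scores (match +1, mismatch 0, gap -1); the function name `best_global_score` is hypothetical. The subproblem is "best score for aligning the first i characters of x with the first j characters of y", and Python's `functools.lru_cache` stores each answer so it is computed only once.

```python
from functools import lru_cache


def best_global_score(x, y, match=1, mismatch=0, gap=-1):
    """Return (best score, number of distinct subproblems solved)."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Best score for aligning x[:i] with y[:j].
        if i == 0:
            return j * gap  # only gaps can be placed
        if j == 0:
            return i * gap
        pair = match if x[i - 1] == y[j - 1] else mismatch
        return max(
            solve(i - 1, j - 1) + pair,  # align x[i-1] with y[j-1]
            solve(i - 1, j) + gap,       # x[i-1] against a gap
            solve(i, j - 1) + gap,       # y[j-1] against a gap
        )

    score = solve(len(x), len(y))
    return score, solve.cache_info().currsize


score, solved = best_global_score("ACGTCTGATACGCCGTATAGTCTATCT",
                                  "CTGATTCGCATCGTCTATCT")
print(score, solved)
```

The number of stored subproblems never exceeds (len(x) + 1) × (len(y) + 1), whereas a recursion without the cache would revisit the same subproblems many times over.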
Biological Sequence Alignment and Dynamic Programming
Dynamic Programming
• The dynamic programming approach to sequence alignment always builds on the best prior result so far.
• It tries to align two sequences by inserting gaps at different locations, so as to maximize the score of the alignment.

• Examples:
▪ Needleman-Wunsch (1970)
▪ Smith-Waterman (1981)
Dynamic Programming
• Score measurement is determined by "match award", "mismatch penalty" and "gap penalty".
• The higher the score, the better the alignment.
• If both penalties are set to 0, it aims to always find an alignment with the maximum number of matches.
• It is used to compare the similarity between two sequences of DNA or protein, to predict the similarity of their functions.
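The zero-penalty case can be illustrated directly. The sketch below assumes a match award of +1 and both penalties set to 0; the function name `max_matches` and the two example sequences are illustrative. Under these settings the best score is simply the largest number of matching characters that can be lined up in order.

```python
def max_matches(x, y):
    """Best global score when match = +1 and mismatch = gap = 0."""
    n, m = len(x), len(y)
    # F[i][j] = most matches obtainable from x[:i] and y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = 1 if x[i - 1] == y[j - 1] else 0
            F[i][j] = max(F[i - 1][j - 1] + pair,  # align the two characters
                          F[i - 1][j],             # gap, costs nothing here
                          F[i][j - 1])             # gap, costs nothing here
    return F[n][m]


print(max_matches("TGCTCGTA", "TTCATA"))  # 5
```

Because gaps are free in this setting, the result can scatter matches across many gaps; non-zero penalties are what keep alignments compact.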
Needleman-Wunsch Method
• The Needleman-Wunsch algorithm (1970)
▪ performs an optimal global alignment on two sequences
▪ is applied to align protein or nucleotide sequences.
• The Needleman-Wunsch algorithm is guaranteed to find the alignment with the maximum score under the chosen scoring scheme.
• Scores for aligned characters are specified by the similarity (substitution) scoring matrix S(i,j): the similarity of characters i and j.
Needleman-Wunsch Method

3-STEP PROCESS

1. INITIALIZATION
2. MATRIX FILLING
3. TRACEBACK AND ALIGNMENT
1. INITIALIZATION
Gap Penalty = -6

First column: F(i,0) = Gap Penalty × Row Number
First row: F(0,j) = Gap Penalty × Column Number
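The initialization step can be sketched as follows, assuming the matrix has one extra row and column (index 0) for the empty prefix; the function name `init_matrix` is illustrative.

```python
def init_matrix(n_rows, n_cols, gap=-6):
    """Build the (n_rows+1) x (n_cols+1) matrix with its borders filled."""
    F = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(1, n_rows + 1):
        F[i][0] = gap * i  # first column: gap penalty x row number
    for j in range(1, n_cols + 1):
        F[0][j] = gap * j  # first row: gap penalty x column number
    return F


for row in init_matrix(3, 4):
    print(row)
# [0, -6, -12, -18, -24]
# [-6, 0, 0, 0, 0]
# [-12, 0, 0, 0, 0]
# [-18, 0, 0, 0, 0]
```

The inner zeros are placeholders that the matrix-filling step overwrites.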
2. MATRIX FILLING
F(i,j) = score in the cell at row i and column j
S(xi,yj) = substitution score for aligning characters xi and yj
d = gap penalty

F(i,j) = max{ F(i-1,j-1) + S(xi,yj), F(i-1,j) + d, F(i,j-1) + d }

Assigned scoring scheme:
MATCH = +5
MISMATCH = -2
GAP = -6

The matrix is filled cell by cell using the predefined scoring scheme.
The partial alignment scores are calculated at all positions of the alignment matrix.
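A single cell update can be worked through by hand. The sketch below assumes the standard Needleman-Wunsch rule, in which each cell takes the best of three moves (diagonal, from above, from the left), and it assumes the sequences TGCTCGTA (rows) and TTCATA (columns) read off the alignment example in this deck. The helper name `nw_cell` is illustrative.

```python
MATCH, MISMATCH, GAP = 5, -2, -6


def nw_cell(diag, up, left, s, d=GAP):
    """Score of one cell from its three already-filled neighbours."""
    return max(diag + s,  # align the two characters (match or mismatch)
               up + d,    # gap in the column sequence
               left + d)  # gap in the row sequence


# F(1,1): T vs T, neighbours F(0,0) = 0, F(0,1) = -6, F(1,0) = -6
print(nw_cell(0, -6, -6, MATCH))      # max(5, -12, -12) = 5

# F(2,1): G vs T, neighbours F(1,0) = -6, F(1,1) = 5, F(2,0) = -12
print(nw_cell(-6, 5, -12, MISMATCH))  # max(-8, -1, -18) = -1
```

Repeating this update row by row, left to right, fills the whole matrix, because every cell only depends on neighbours that are already known.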
3. TRACEBACK & ALIGNMENT

Optimal Alignment:

TGCTCGTA
T__TCATA
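The three steps can be put together in one short function. This is a sketch rather than a reference implementation: it assumes a linear gap penalty, the scoring scheme MATCH = +5, MISMATCH = -2, GAP = -6, and the input sequences TGCTCGTA and TTCATA inferred from the alignment shown above. The function name `needleman_wunsch` and the use of "-" for gaps are choices made for this example.

```python
def needleman_wunsch(x, y, match=5, mismatch=-2, gap=-6):
    """Global alignment of x and y; returns (aligned_x, aligned_y, score)."""
    n, m = len(x), len(y)

    # 1. Initialization: borders hold cumulative gap penalties.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # 2. Matrix filling: best of diagonal, up and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # 3. Traceback: walk from the bottom-right cell back to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(x[i - 1])
                bottom.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(x[i - 1])  # character of x against a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")       # character of y against a gap
            bottom.append(y[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), F[n][m]


print(needleman_wunsch("TGCTCGTA", "TTCATA"))
# ('TGCTCGTA', 'T--TCATA', 11)
```

The score of 11 comes from five matches (+25), one mismatch (-2) and two gaps (-12).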
Dynamic Programming Tools
• GLOBAL ALIGNMENT TOOLS
1. Needle (EMBOSS)
▪ optimal global alignment – N-W algorithm
2. Stretcher (EMBOSS)
▪ modified N-W algorithm to globally align larger sequences
3. GGSEARCH2SEQ
EMBOSS – Needle (EMBL-EBI)

https://www.ebi.ac.uk/Tools/psa/emboss_needle
Smith-Waterman Method
• The Smith-Waterman algorithm (1981) determines similar regions between two nucleotide or protein sequences.
• Smith-Waterman is also a dynamic programming algorithm; it is a variation of Needleman-Wunsch adapted for local alignment.
• Follows the same 3-step process as the N-W algorithm, adding 0 as an extra option in the 2nd step (matrix filling).
1. INITIALIZATION
• The first row and column are filled as per the gap penalty of the scoring scheme.
• The negative scores are then substituted with "0", so the first row and column contain only zeros.
• Unlike the N-W method, no negative values are allowed in the alignment scoring matrix.
2. MATRIX FILLING
F(i,j) = score in the cell at row i and column j

F(i,j) = max{ 0, F(i-1,j-1) + S(xi,yj), F(i-1,j) + d, F(i,j-1) + d }

Smith-Waterman introduces the '0' so that whenever a scoring matrix value would become negative, it is set to ZERO.
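The effect of the extra 0 is easiest to see on a single cell. The sketch below assumes the same scoring scheme as the Needleman-Wunsch example (MATCH = +5, MISMATCH = -2, GAP = -6); the helper name `sw_cell` is illustrative.

```python
MATCH, MISMATCH, GAP = 5, -2, -6


def sw_cell(diag, up, left, s, d=GAP):
    """Smith-Waterman cell update: same three moves, floored at zero."""
    return max(0, diag + s, up + d, left + d)


# A mismatch next to zeros would give -2, so the cell is reset to 0.
print(sw_cell(0, 0, 0, MISMATCH))  # 0

# A match following an earlier match keeps accumulating score.
print(sw_cell(5, 0, 0, MATCH))     # 10
```

Resetting to zero lets a new local alignment start at any cell instead of carrying forward the cost of a poorly matching region.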
3. TRACEBACK
• The traceback is started from the highest scoring position in the scoring matrix.
• The path is traced back up to a box that scores zero.
• As such, the method is guaranteed to find the optimal local alignment with respect to the scoring system being used (which includes the substitution matrix and the gap-scoring scheme).
Smith-Waterman Method
Optimal Alignment

C D
C D
+5 +5

OPTIMAL ALIGNMENT SCORE: +10
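A compact sketch of the whole local-alignment procedure follows. The input sequences for the slide's example are not given, so "ACDE" and "WCDY" are hypothetical stand-ins chosen to share only the region CD. The scoring scheme (MATCH = +5, MISMATCH = -2, GAP = -6), the linear gap penalty and the function name `smith_waterman` are likewise assumptions of this example.

```python
def smith_waterman(x, y, match=5, mismatch=-2, gap=-6):
    """Local alignment of x and y; returns (aligned_x, aligned_y, score)."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # borders stay at zero
    best, best_pos = 0, (0, 0)

    # Matrix filling with the zero floor; remember the highest cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the highest cell until a cell scoring zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


print(smith_waterman("ACDE", "WCDY"))  # ('CD', 'CD', 10)
```

Extending the alignment to include the flanking mismatches (A/W or E/Y) would lower the score to 8, which is why the traceback reports only the conserved CD region.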
Smith-Waterman Method

• However, the Smith-Waterman algorithm is demanding of time and memory resources: both grow with the product of the two sequence lengths.
• As a result, it has largely been replaced in practical use (especially large database searches) by the BLAST algorithm; although not guaranteed to find optimal alignments, BLAST is much more efficient.
Dynamic Programming Tools
• LOCAL ALIGNMENT TOOLS
1. Water (EMBOSS)
▪ optimal local alignment – enhanced S-W algorithm
2. Matcher (EMBOSS)
▪ local alignment using a rigorous algorithm based on LALIGN
3. LALIGN
4. SSEARCH2SEQ
▪ optimal local alignment using S-W algorithm
EMBOSS – Water (EMBL-EBI)

https://www.ebi.ac.uk/Tools/psa/emboss_water
Dynamic Programming applications
• Sequence comparison
• Gene recognition
• RNA structure prediction and hundreds of other problems are solved by ever new variants of DP.

• Computationally intensive
• Paved the way for fast heuristic methods of approximation, e.g. FASTA and BLAST
CONCLUSION
All of the alignment methods in use today are related to the original method of Needleman and Wunsch.

Dynamic Programming methods still have absolute relevance amongst the current fast computational approaches for biological sequence alignment.
THANK YOU, EVERYONE
