M2-Longest Common Subsequence

The document discusses the concept of the Longest Common Subsequence (LCS) in the context of comparing DNA sequences, highlighting its importance in biological applications. It outlines the definition of subsequences, the brute-force approach to finding LCS, and introduces dynamic programming as an efficient method to compute LCS length and construct the sequence. Additionally, it covers improvements to the algorithm to optimize time and space complexity.

Uploaded by

ssanjayreg

Longest Common Subsequence

Inspiration
• Biological applications often need to compare the DNA
of two (or more) different organisms
• A strand of DNA consists of a string of molecules called
bases, where the possible bases are adenine, guanine,
cytosine, and thymine
• Representing each of these bases by its initial letter, we can express a
strand of DNA as a string over the finite set {A, C, G, T}
Inspiration
• For example, the DNA of one organism may be S1=
ACCGGTCGAGTGCGCGGAAGCCGGCCGAA, and
the DNA of another organism may be S2=
GTCGTTCGGAATGCCGTTGCTCTGTAAA.
• One reason to compare two strands of DNA is to
determine how “similar” the two strands are, as some
measure of how closely related the two organisms are
Inspiration
• We can define similarity in many different ways
• First way - we can say that two DNA strands are similar
if one is a substring of the other

• In our example, neither S1 nor S2 is a substring of the
other.
• Second way - two strands are similar if the number of
changes needed to turn one into the other is small
Inspiration
• Third way - measure the similarity of strands S1 and S2 by
finding a third strand S3
• The bases in S3 must appear in each of S1 and S2; these bases
must appear in the same order, but not necessarily
consecutively
• The longer the strand S3 we can find, the more similar S1 and S2 are
Inspiration
• S1= ACCGGTCGAGTGCGCGGAAGCCGGCCGAA

• S2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA

• S3 is GTCGTCGGAAGCCGGCCGAA
Problem Statement
• A subsequence of a given sequence is just the given
sequence with zero or more elements left out

• Formally, given a sequence X = <x1, x2,..., xm>, another
sequence Z = <z1, z2,..., zk> is a subsequence of X if there
exists a strictly increasing sequence <i1, i2,..., ik> of indices
of X such that for all j = 1, 2,..., k, we have x_{i_j} = z_j
Problem Statement
• For example, Z = <B, C, D, B> is a subsequence of X = <A,
B, C, B, D, A, B> with corresponding index sequence <2, 3,
5, 7>
• Given two sequences X and Y , we say that a sequence Z is a
common subsequence of X and Y if Z is a subsequence of
both X and Y
Problem Statement
• For example, if X = <A, B, C, B, D, A, B> and Y = <B, D, C, A,
B, A>, the sequence <B, C, A> is a common subsequence of
both X and Y
• But <B, C, A> is not a longest common subsequence (LCS) of X and Y
• The sequence <B, C, B, A>, which is also common to both X and
Y, has length 4 and is an LCS of X and Y (as is <B, D, A, B>)
• This is because X and Y have no common subsequence of length 5 or
greater
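The definitions above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names `subsequence_indices` and `is_common_subsequence` are illustrative assumptions. It finds the leftmost index sequence <i1,...,ik> greedily and reports indices starting from 1 to match the slides.

```python
def subsequence_indices(Z, X):
    """Return 1-based, strictly increasing indices at which Z occurs in X,
    or None if Z is not a subsequence of X (greedy leftmost match)."""
    indices = []
    pos = 0  # next 0-based position of X that may still be used
    for z in Z:
        # Skip elements of X until one equals the current element of Z.
        while pos < len(X) and X[pos] != z:
            pos += 1
        if pos == len(X):
            return None  # ran out of X before matching all of Z
        indices.append(pos + 1)  # report 1-based, as in the slides
        pos += 1
    return indices


def is_common_subsequence(Z, X, Y):
    """True if Z is a subsequence of both X and Y."""
    return (subsequence_indices(Z, X) is not None
            and subsequence_indices(Z, Y) is not None)


X = "ABCBDAB"
Y = "BDCABA"
print(subsequence_indices("BCDB", X))        # [2, 3, 5, 7]
print(is_common_subsequence("BCA", X, Y))    # True
print(is_common_subsequence("BCBA", X, Y))   # True
```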
Step 1: Characterizing a longest common subsequence
• Brute-force approach to solve the LCS problem:
• Enumerate all subsequences of X
• Check each subsequence to see whether it is also a subsequence of Y
• Keep track of the longest subsequence we find
• Each subsequence of X corresponds to a subset of the indices
{1, 2,...,m} of X
• Because X has 2^m subsequences, this approach requires
exponential time, making it impractical for long sequences
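As an illustration of the brute-force idea, here is a small sketch (the names `is_subsequence` and `brute_force_lcs` are assumptions for this example, not from the slides). It enumerates index subsets of X from largest to smallest, so the first subset that is also a subsequence of Y is a longest one. It is only usable for very short inputs, since up to 2^m subsets are examined.

```python
from itertools import combinations


def is_subsequence(Z, Y):
    """True if Z can be obtained from Y by deleting zero or more elements."""
    it = iter(Y)
    # "z in it" advances the iterator until z is found, preserving order.
    return all(z in it for z in Z)


def brute_force_lcs(X, Y):
    """Try every subset of indices of X, longest first (exponential time)."""
    for k in range(len(X), -1, -1):
        for idx in combinations(range(len(X)), k):
            candidate = "".join(X[i] for i in idx)
            if is_subsequence(candidate, Y):
                return candidate  # first hit at this length is an LCS
    return ""


result = brute_force_lcs("ABCBDAB", "BDCABA")
print(len(result))  # 4
```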
Basis of Optimal substructure of an LCS
• Given a sequence X = <x1, x2,...,xm>, we define the ith prefix of
X , for i = 0,1,...,m, as Xi = <x1, x2,...,xi>
• For example, if X = <A, B, C, B, D, A, B>, then X4 = <A, B, C,
B> and X0 is the empty sequence
Theorem 15.1 Optimal substructure of an LCS
• Let X = <x1, x2,...,xm> and Y = <y1, y2,...,yn> be sequences, and let Z
= <z1, z2,..., zk> be any LCS of X and Y.
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1
2. If xm ≠ yn, then zk ≠ xm implies that Z is an LCS of Xm-1 and Y
3. If xm ≠ yn, then zk ≠ yn implies that Z is an LCS of X and Yn-1

Proof of Theorem 15.1
• (1) If zk ≠ xm, then we could append xm = yn to Z to obtain a
common subsequence of X and Y of length k + 1, contradicting
the supposition that Z is a longest common subsequence of X
and Y. Thus, we must have zk = xm = yn.
• Now, the prefix Zk-1 is a length (k - 1) common subsequence of
Xm-1 and Yn-1
Proof of Theorem 15.1
• We wish to show that it is an LCS of Xm-1 and Yn-1
• Suppose for the purpose of contradiction that there exists a
common subsequence W of Xm-1 and Yn-1 with length greater
than k - 1
• Then, appending xm = yn to W produces a common subsequence
of X and Y whose length is greater than k, which is a
contradiction

Proof of Theorem 15.1
• (2) If zk ≠ xm, then Z is a common subsequence of Xm-1 and Y
• If there were a common subsequence W of Xm-1 and Y with
length greater than k, then W would also be a common
subsequence of Xm and Y, contradicting the assumption that Z is
an LCS of X and Y
• (3) The proof is symmetric to (2)

Step 2: A recursive solution
• Theorem 15.1 implies that we should examine either one or two
subproblems when finding an LCS of X = <x1, x2,...,xm> and
Y = <y1, y2,...,yn>
• If xm = yn, we must find an LCS of Xm-1 and Yn-1
• Appending xm = yn to this LCS yields an LCS of X and Y
• If xm ≠ yn, then we must solve two subproblems: finding an LCS
of Xm-1 and Y and finding an LCS of X and Yn-1
Step 2: A recursive solution
• Whichever of these two LCSs is longer is an LCS of X and Y
• Because these cases exhaust all possibilities, we know that one
of the optimal subproblem solutions must appear within an LCS
of X and Y .
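The case analysis above translates directly into a recursive function. This is a sketch under the assumption that sequences are Python strings; the name `lcs_recursive` is illustrative. It takes exponential time in the worst case because it recomputes the same subproblems repeatedly.

```python
def lcs_recursive(X, Y):
    """Return one LCS of X and Y by following the two cases of Theorem 15.1."""
    if not X or not Y:
        return ""  # an empty prefix has only the empty common subsequence
    if X[-1] == Y[-1]:
        # Case xm = yn: solve (Xm-1, Yn-1) and append the shared last element.
        return lcs_recursive(X[:-1], Y[:-1]) + X[-1]
    # Case xm != yn: solve both subproblems and keep the longer result.
    without_xm = lcs_recursive(X[:-1], Y)
    without_yn = lcs_recursive(X, Y[:-1])
    return without_xm if len(without_xm) >= len(without_yn) else without_yn


print(len(lcs_recursive("ABCBDAB", "BDCABA")))  # 4
```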
Step 2: Overlapping Subproblem
• To find an LCS of X and Y, we may need to find the LCSs of X
and Yn-1 and of Xm-1 and Y
• But each of these subproblems has the subsubproblem of finding
an LCS of Xm-1 and Yn-1
• Many other subproblems share subsubproblems.
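To make the overlap concrete, the sketch below counts how many recursive calls are made with and without caching. The helper names are assumptions for this example, and `functools.lru_cache` is used here only as one convenient way to memoize; the slides themselves go on to use a bottom-up table instead.

```python
from functools import lru_cache


def lcs_length_naive(X, Y):
    """Plain recursion on prefix lengths (i, j); returns (length, calls)."""
    calls = 0

    def rec(i, j):
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0
        if X[i - 1] == Y[j - 1]:
            return rec(i - 1, j - 1) + 1
        return max(rec(i - 1, j), rec(i, j - 1))

    return rec(len(X), len(Y)), calls


def lcs_length_memo(X, Y):
    """Same recursion, but each (i, j) pair is computed at most once."""
    calls = 0

    @lru_cache(maxsize=None)
    def rec(i, j):
        nonlocal calls
        calls += 1  # counts only cache misses, i.e. distinct subproblems
        if i == 0 or j == 0:
            return 0
        if X[i - 1] == Y[j - 1]:
            return rec(i - 1, j - 1) + 1
        return max(rec(i - 1, j), rec(i, j - 1))

    return rec(len(X), len(Y)), calls


naive_len, naive_calls = lcs_length_naive("ABCBDAB", "BDCABA")
memo_len, memo_calls = lcs_length_memo("ABCBDAB", "BDCABA")
print(naive_len, memo_len)      # 4 4
print(naive_calls, memo_calls)  # memoized count is at most (m+1)*(n+1) = 56
```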

Step 2: Overlapping Subproblem
• Let us define c[i, j] to be the length of an LCS of the sequences
Xi and Yj
• If either i = 0 or j = 0, one of the sequences has length 0, and so the
LCS has length 0
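Combining this base case with the optimal substructure of Theorem 15.1 gives the standard recurrence for c[i, j]. The formula does not appear in the extracted slide text, so it is written out here as a reconstruction from the definitions above:

```latex
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max\bigl(c[i, j-1],\; c[i-1, j]\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```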
Step 3: Computing the length of an LCS
• The LCS problem has only Θ(mn) distinct subproblems, so
we can use dynamic programming to compute the solutions
bottom up
• We maintain two 2D tables c and b for dynamic programming
• The c table stores c[i, j], the length of an LCS of Xi and Yj
• The b table records which subproblem each entry came from, which helps to construct the solution

Step 3: Computing the length of an LCS
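The pseudocode for LCS-LENGTH did not survive in the extracted slide text. The following is a sketch of a bottom-up procedure in that spirit, assuming string inputs, 0-based Python strings with 1-based table indices, and arrow characters as the contents of b. On ties it prefers "↑" over "←", which is an assumed tie-breaking rule.

```python
def lcs_length(X, Y):
    """Fill tables c (LCS lengths) and b (arrows) in row-major order."""
    m, n = len(X), len(Y)
    # Row 0 and column 0 stay 0: an empty prefix has an LCS of length 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"  # xi = yj is part of the LCS
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"  # answer comes from (Xi-1, Yj)
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"  # answer comes from (Xi, Yj-1)
    return c, b


c, b = lcs_length("ABCBDAB", "BDCABA")
print(c[7][6])  # 4
```

Both loops together run in Θ(mn) time, and each table uses Θ(mn) space.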
Step 4: Constructing an LCS
• The b table returned by LCS-LENGTH enables us to quickly
construct an LCS for X = <x1, x2,...,xm> and Y = <y1, y2,...,yn>
• We simply begin at b[m, n] and trace through the table by
following the arrows
• Whenever we encounter a “↖” in entry b[i, j], it implies that xi =
yj is an element of the LCS that LCS-LENGTH found.

Step 4: Constructing an LCS
• With this method, we encounter the elements of this LCS in
reverse order.
• The recursive procedure PRINT-LCS prints out an LCS of X and Y
in the proper, forward order
• The initial call is PRINT-LCS(b, X, X.length, Y.length)
• For the b table in Figure 15.8, this procedure prints BCBA
• The procedure takes time O(m + n), since it decrements at least one
of i and j in each recursive call
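The PRINT-LCS pseudocode is also missing from the extracted text. Below is a sketch of a recursive procedure of that kind. To keep the example self-contained it includes a compact helper `build_b` (an assumed name) that fills the arrow table with the same assumed tie-breaking as above, and it returns the LCS as a string instead of printing element by element.

```python
def build_b(X, Y):
    """Compact helper: compute the arrow table b bottom up."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "↑"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "←"
    return b


def lcs_from_b(b, X, i, j):
    """Follow the arrows from b[i][j]; recursion emits elements in forward order."""
    if i == 0 or j == 0:
        return ""
    if b[i][j] == "↖":
        # xi = yj belongs to the LCS; emit it after the prefix's LCS.
        return lcs_from_b(b, X, i - 1, j - 1) + X[i - 1]
    if b[i][j] == "↑":
        return lcs_from_b(b, X, i - 1, j)
    return lcs_from_b(b, X, i, j - 1)


X, Y = "ABCBDAB", "BDCABA"
b = build_b(X, Y)
print(lcs_from_b(b, X, len(X), len(Y)))  # BCBA
```

Each call decreases i, j, or both, so at most m + n calls are made.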
Improving the code
• Once you have developed an algorithm, you will often find that
you can improve on the time or space it uses

• Some changes can simplify the code and improve constant
factors but otherwise yield no asymptotic improvement in
performance.
• Others can yield substantial asymptotic savings in time and
space.
Improving the code
• In the LCS algorithm, for example, we can eliminate the b table
altogether. Each c[i, j] entry depends on only three other c table
entries: c[i - 1, j - 1], c[i - 1, j], and c[i, j - 1].
• Given the value of c[i, j], we can determine in O(1) time which
of these three values was used to compute c[i, j], without
inspecting table b.
Improving the code
• Thus, we can reconstruct an LCS in O(m + n) time using a
procedure similar to PRINT-LCS.
• Although we save Θ(mn) space by this method, the auxiliary
space requirement for computing an LCS does not
asymptotically decrease, since we need Θ(mn) space for the c
table anyway.
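A sketch of this b-free variant follows. The function names are illustrative assumptions. The backtracking step re-derives each arrow from the c table alone: a match of xi and yj means the diagonal entry was used, otherwise the larger of the upper and left entries was used (ties go up, which is an assumed tie-breaking rule).

```python
def lcs_table(X, Y):
    """Compute only the c table of LCS lengths; no b table is stored."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def lcs_from_c(c, X, Y):
    """Walk back from c[m][n], deciding each step in O(1) from c alone."""
    i, j = len(X), len(Y)
    reversed_lcs = []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            reversed_lcs.append(X[i - 1])  # diagonal step: element of the LCS
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # value came from the entry above
        else:
            j -= 1  # value came from the entry to the left
    # Elements were collected in reverse order, so reverse once at the end.
    return "".join(reversed(reversed_lcs))


X, Y = "ABCBDAB", "BDCABA"
print(lcs_from_c(lcs_table(X, Y), X, Y))  # BCBA
```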
Improving the code
• We can, however, reduce the asymptotic space requirements for
LCS-LENGTH, since it needs only two rows of table c at a
time: the row being computed and the previous row.
• This improvement works if we need only the length of an LCS;
if we need to reconstruct the elements of an LCS, the smaller
table does not keep enough information to retrace our steps in
O(m + n) time
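The two-row idea can be sketched as follows (the name `lcs_length_two_rows` is an assumption). Only the previous and current rows of c are kept, so the extra space is Θ(n) rather than Θ(mn), at the cost of no longer being able to trace back an actual LCS.

```python
def lcs_length_two_rows(X, Y):
    """Return only the length of an LCS, keeping two rows of the c table."""
    n = len(Y)
    previous = [0] * (n + 1)  # row i-1 of c; initially row 0, all zeros
    for x in X:
        current = [0] * (n + 1)  # row i of c; current[0] stays 0
        for j in range(1, n + 1):
            if x == Y[j - 1]:
                current[j] = previous[j - 1] + 1
            else:
                current[j] = max(previous[j], current[j - 1])
        previous = current  # row i becomes the previous row for i+1
    return previous[n]


print(lcs_length_two_rows("ABCBDAB", "BDCABA"))  # 4
```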
