
CSE408 Longest Common Subsequence: Lecture # 25

The document discusses the longest common subsequence (LCS) problem and its dynamic programming solution. It begins by introducing LCS and describing how it can be used to compare DNA strings. It then explains that the naive brute force algorithm has exponential time complexity, but the problem exhibits optimal substructure allowing for a dynamic programming solution. The document proceeds to present the recursive definition of the LCS problem and develop the dynamic programming algorithm to compute the length of the LCS in quadratic time.

Uploaded by

avinash
Copyright
© All Rights Reserved

CSE408

Longest Common Subsequence
Lecture # 25
Dynamic programming

• It is used when the solution can be recursively described in terms of solutions to subproblems (optimal substructure)
• The algorithm finds solutions to subproblems and stores them in memory for later use
• More efficient than "brute-force" methods, which solve the same subproblems over and over again
Longest Common Subsequence (LCS)

Application: comparison of two DNA strings

Ex: X = A B C B D A B, Y = B D C A B A
Longest Common Subsequence (length 4, matched symbols in brackets): B C B A
X = A [B] [C] [B] D [A] B
Y = [B] D [C] A [B] [A]
A brute-force algorithm would compare each subsequence of X with the symbols in Y
7/23/2018 3
LCS Algorithm
• If |X| = m and |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons)
• So the running time of the brute-force algorithm is O(n 2^m)
• Notice that the LCS problem has optimal substructure: solutions of subproblems are parts of the final solution.
• Subproblems: "find LCS of pairs of prefixes of X and Y"
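The brute-force idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `is_subsequence` and `lcs_brute_force` are hypothetical and not from the lecture, and the sketch tries longer candidates first so it can stop at the first hit.

```python
from itertools import combinations

def is_subsequence(s, y):
    """Return True if s can be obtained from y by deleting symbols (at most len(y) comparisons)."""
    it = iter(y)
    # "ch in it" advances the iterator, so matches must appear in order
    return all(ch in it for ch in s)

def lcs_brute_force(x, y):
    """Enumerate subsequences of x, longest first, and return the first one found in y."""
    for k in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""

print(lcs_brute_force("ABCB", "BDCAB"))            # BCB
print(len(lcs_brute_force("ABCBDAB", "BDCABA")))   # 4
```

In the worst case all 2^m index subsets of X are generated, which is why this approach does not scale.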
LCS Algorithm
• First we'll find the length of the LCS. Later we'll modify the algorithm to find the LCS itself.
• Define X_i, Y_j to be the prefixes of X and Y of length i and j respectively
• Define c[i,j] to be the length of the LCS of X_i and Y_j
• Then the length of the LCS of X and Y will be c[m,n]

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise}
\end{cases}
\]
LCS recursive solution
c[i  1, j  1]  1 if x[i ]  y[ j ],
c[i, j ]  
 max( c[i, j  1], c[i  1, j ]) otherwise
• We start with i = j = 0 (empty substrings of x
and y)
• Since X0 and Y0 are empty strings, their LCS
is always empty (i.e. c[0,0] = 0)
• LCS of empty string and any other string is
empty, so for every i and j: c[0, j] = c[i,0] = 0
7/23/2018 6
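The recurrence together with the base cases c[0,j] = c[i,0] = 0 can be written directly as a memoized recursion. The following is a minimal sketch, assuming 0-based Python strings (so x[i] in the slides corresponds to `x[i - 1]` here); the name `lcs_length_recursive` is hypothetical.

```python
from functools import lru_cache

def lcs_length_recursive(x, y):
    """Length of the LCS of x and y, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)      # store each c(i, j) so it is computed only once
    def c(i, j):
        if i == 0 or j == 0:      # base case: one of the prefixes is empty
            return 0
        if x[i - 1] == y[j - 1]:  # slides use 1-based x[i], y[j]
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))

print(lcs_length_recursive("ABCB", "BDCAB"))  # 3
```

Without the cache the same (i, j) pairs would be recomputed many times; with it, each of the (m+1)*(n+1) subproblems is solved once.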
LCS recursive solution
c[i  1, j  1]  1 if x[i ]  y[ j ],
c[i, j ]  
 max( c[i, j  1], c[i  1, j ]) otherwise
• When we calculate c[i,j], we consider two
cases:
• First case: x[i]=y[j]: one more symbol in
strings X and Y matches, so the length of LCS
Xi and Yj equals to the length of LCS of
smaller strings Xi-1 and Yi-1 , plus 1
7/23/2018 7
LCS recursive solution
c[i  1, j  1]  1 if x[i ]  y[ j ],
c[i, j ]  
 max( c[i, j  1], c[i  1, j ]) otherwise

• Second case: x[i] != y[j]


• As symbols don’t match, our solution is not
improved, and the length of LCS(Xi , Yj) is the
same as before (i.e. maximum of LCS(Xi, Yj-1)
and LCS(Xi-1,Yj)

Why not just take the length of LCS(Xi-1, Yj-1) ?


7/23/2018 8
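A small counterexample makes the question above concrete. The strings X = AB and Y = BC are chosen here for illustration and are not from the lecture; the helper `lcs_len` is a hypothetical plain recursion over the same recurrence.

```python
def lcs_len(x, y):
    """Plain recursive LCS length (no memoization, fine for tiny inputs)."""
    if not x or not y:
        return 0
    if x[-1] == y[-1]:
        return lcs_len(x[:-1], y[:-1]) + 1
    return max(lcs_len(x, y[:-1]), lcs_len(x[:-1], y))

x, y = "AB", "BC"                    # last symbols differ: 'B' != 'C'
print(lcs_len(x, y))                 # 1  (the common subsequence "B")
print(lcs_len(x[:-1], y[:-1]))       # 0  (LCS of "A" and "B" is empty)
print(max(lcs_len(x, y[:-1]),        # 1  (LCS of "AB" and "B" is "B")
          lcs_len(x[:-1], y)))
```

Dropping the last symbol of both strings would give 0, while the correct answer 1 is recovered by dropping only one symbol at a time.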
LCS Length Algorithm
LCS-Length(X, Y)
m = length(X)                   // get the # of symbols in X
n = length(Y)                   // get the # of symbols in Y
for i = 0 to m: c[i,0] = 0      // special case: Y_0 (empty prefix of Y)
for j = 0 to n: c[0,j] = 0      // special case: X_0 (empty prefix of X)
for i = 1 to m                  // for all X_i
    for j = 1 to n              // for all Y_j
        if ( x[i] == y[j] )
            c[i,j] = c[i-1,j-1] + 1
        else c[i,j] = max( c[i-1,j], c[i,j-1] )
return c
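The pseudocode above translates almost line for line into Python. This is a minimal sketch, assuming the table is a list of lists and that strings are 0-based (so x[i] in the pseudocode is `x[i - 1]` below); the name `lcs_table` is hypothetical.

```python
def lcs_table(x, y):
    """Return the (m+1) x (n+1) table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: the LCS with an empty prefix is empty
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c

c = lcs_table("ABCB", "BDCAB")
for row in c:
    print(row)
print(c[4][5])  # 3
```

The bottom-right entry `c[m][n]` is the length of the LCS of the full strings.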
LCS Example
We'll see how the LCS algorithm works on the following example:
• X = ABCB
• Y = BDCAB

What is the Longest Common Subsequence of X and Y?

LCS(X, Y) = BCB (matched symbols in brackets)
X = A [B] [C] [B]
Y = [B] D [C] A [B]
LCS Example (0)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi
1    A
2    B
3    C
4    B

X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5] (5 rows, 6 columns)
LCS Example (1)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0
2    B     0
3    C     0
4    B     0

for i = 0 to m: c[i,0] = 0
for j = 0 to n: c[0,j] = 0
LCS Example (2)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0
2    B     0
3    C     0
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (3)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0
2    B     0
3    C     0
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (4)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1
2    B     0
3    C     0
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (5)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0
3    C     0
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (6)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1
3    C     0
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (7)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1
3    C     0
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (8)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (10)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (11)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1   2
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (12)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1   2   2   2
4    B     0

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (13)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1   2   2   2
4    B     0   1

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (14)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1   2   2   2
4    B     0   1   1   2   2

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (15)
X = ABCB, Y = BDCAB

j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1   2   2   2
4    B     0   1   1   2   2   3

if ( x[i] == y[j] )
    c[i,j] = c[i-1,j-1] + 1
else c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Algorithm Running Time

• The LCS algorithm calculates the value of each entry of the array c[0..m, 0..n]
• So what is the running time?

O(m*n), since each c[i,j] is calculated in constant time, and there are (m+1)*(n+1) entries in the array
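One way to see the O(m*n) bound is to count how many times the body of the inner loop runs. The sketch below is an instrumented variant of the table-filling loop; the counter `steps` and the name `lcs_length_counting` are hypothetical additions for illustration.

```python
def lcs_length_counting(x, y):
    """Return (LCS length, number of inner-loop iterations)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    steps = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            steps += 1  # one constant-time cell update
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n], steps

print(lcs_length_counting("ABCB", "BDCAB"))  # (3, 20)
```

For m = 4 and n = 5 the loop body runs exactly 4 * 5 = 20 times, regardless of the contents of the strings.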
How to find actual LCS
• So far, we have just found the length of the LCS, but not the LCS itself.
• We want to modify this algorithm to make it output the Longest Common Subsequence of X and Y

Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1].
For each c[i,j] we can say how it was acquired. For example, in this fragment of the table:

2   2
2   3

the lower-right entry is c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3
How to find actual LCS - continued
• Remember that

\[
c[i,j] =
\begin{cases}
c[i-1,j-1] + 1 & \text{if } x[i] = y[j], \\
\max(c[i,j-1],\, c[i-1,j]) & \text{otherwise}
\end{cases}
\]

• So we can start from c[m,n] and go backwards
• Whenever x[i] = y[j] (so c[i,j] = c[i-1,j-1] + 1), remember x[i] (because x[i] is a part of the LCS) and move to c[i-1,j-1]; otherwise move to the larger of c[i-1,j] and c[i,j-1]
• When i = 0 or j = 0 (i.e. we reached the beginning), output the remembered letters in reverse order
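The backward walk described above can be sketched as follows. This is one possible implementation, not necessarily the lecturer's: the name `lcs` is hypothetical, strings are 0-based, and ties between c[i-1,j] and c[i,j-1] are broken by moving up, which is an arbitrary choice (a different tie-break may return a different LCS of the same length).

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # Fill the length table exactly as in LCS-Length
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from c[m][n], remembering matched symbols
    i, j = m, n
    remembered = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            remembered.append(x[i - 1])  # this symbol is part of the LCS
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1                       # value came from the cell above
        else:
            j -= 1                       # value came from the cell to the left
    return "".join(reversed(remembered))  # letters were collected back to front

print(lcs("ABCB", "BDCAB"))  # BCB
```

The walk takes at most m + n steps, so reconstruction adds only O(m + n) time on top of filling the table.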
Finding LCS
j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1   2   2   2
4    B     0   1   1   2   2   3
Finding LCS (2)
j          0   1   2   3   4   5
i          Yj  B   D   C   A   B
0    Xi    0   0   0   0   0   0
1    A     0   0   0   0   1   1
2    B     0   1   1   1   1   2
3    C     0   1   1   2   2   2
4    B     0   1   1   2   2   3

LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)
