0% found this document useful (0 votes)
425 views4 pages

Knuth-Morris-Pratt Algorithm KENT

The Knuth-Morris-Pratt algorithm is a linear-time string matching algorithm that improves upon the naive algorithm. It preprocesses the pattern to compute a failure function that allows it to avoid re-examining characters when a mismatch occurs. This failure function encodes repeated substrings in the pattern to indicate the maximum shift possible after a mismatch. By utilizing information from previous comparisons, the Knuth-Morris-Pratt algorithm runs in O(n+m) time, which is optimal in the worst case where all characters must be examined.

Uploaded by

Grama Silviu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
425 views4 pages

Knuth-Morris-Pratt Algorithm KENT

The Knuth-Morris-Pratt algorithm is a linear-time string matching algorithm that improves upon the naive algorithm. It preprocesses the pattern to compute a failure function that allows it to avoid re-examining characters when a mismatch occurs. This failure function encodes repeated substrings in the pattern to indicate the maximum shift possible after a mismatch. By utilizing information from previous comparisons, the Knuth-Morris-Pratt algorithm runs in O(n+m) time, which is optimal in the worst case where all characters must be examined.

Uploaded by

Grama Silviu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 4

Knuth-Morris-Pratt Algorithm

http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/StringM
atch/kuthMP.htm

Knuth, Morris and Pratt discovered first linear time string-matching algorithm
by following a tight analysis of the nave algorithm. Knuth-Morris-Pratt
algorithm keeps the information that nave approach wasted gathered during the
scan of the text. By avoiding this waste of information, it achieves a running
time of O(n + m), which is optimal in the worst case sense. That is, in the
worst case Knuth-Morris-Pratt algorithm we have to examine all the characters
in the text and pattern at least once.

The Failure Function


The KMP algorithm preprocess the pattern P by computing a failure function f
that indicates the largest possible shift s using previously performed
comparisons. Specifically, the failure function f(j) is defined as the length of
the longest prefix of P that is a suffix of P[i . . j].

KNUTH-MORRIS-PRATT FAILURE (P)


Input: Pattern with m characters
Output: Failure function f for P[i . . j]
i1
j0
f(0) 0
while i < m do
if P[j] = P[i]
f(i) j +1
i i +1
j j + 1
else if
j f(j - 1)
else
f(i) 0
i i +1

Note that the failure function f for P, which maps j to the length of the longest
prefix of P that is a suffix of P[1 . . j], encodes repeated substrings inside the
pattern itself.

As an example, consider the pattern P = a b a c a b. The failure function,


f(j), using above algorithm is

a 1 2
j
3 4 5
a b a
P[j]
c a b
0 0
f(j)
1 0 1 2

By observing the above mapping we can see that the longest prefix of pattern,
P, is "a b" which is also a suffix of pattern P.
Consider an attempt to match at position i, that is when the pattern P[0 ..m
-1] is aligned with text P[i . . i + m -1].
T: a b a c a a b a c c
P: a b a c a b

Assume that the first mismatch occurs between characters T[ i+ j] and P[j]
for 0 < j < m. In the above example, the first mismatch is T[5] = a and
P[5] = b.
Then, T[i . . i + j -1] = P[0 . . j -1] = u
That is, T[ 0 . . 4] = P[0 . . 4] = u, in the example [u = a b
a c a] and
T[i + j] P[j] i.e., T[5] P[5], In the example [T[5] = a
b = P[5]].

When shifting, it is reasonable to expect that a prefix v of the pattern matches


some suffix of the portion u of the text. In our example, u = a b a c a and v
= a b a c a, therefore, 'a' a prefix of v matches with 'a' a suffix of u. Let l(j)
be the length of the longest string P[0 . . j -1] of pattern that matches with
text followed by a character c different from P[j]. Then after a shift, the
comparisons can resume between characters T[i + j] and P[l(j)], i.e., T(5)
and P(1)

T: a b a c a a b a c c
P: a b a c a b

Note that no comparison between T[4] and P[1] needed here.


KNUTH-MORRIS-PRATT (T, P)
Input: Strings T[0 . . n] and P[0 . . m]
Output: Starting index of substring of T matching P
f compute failure function of Pattern P
i0
j0
while i < length[T] do
if j m-1 then
return i- m+1 // we have a match
i i +1
j j +1
else if j > 0
j f(j -1)
else
i i +1

Analysis
The running time of Knuth-Morris-Pratt algorithm is proportional to the time
needed to read the characters in text and pattern. In other words, the worst-case
running time of the algorithm is O(m + n) and it requires O(m) extra space.
It is important to note that these quantities are independent of the size of the
underlying alphabet.

You might also like