Knuth-Morris-Pratt Algorithm KENT
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/StringMatch/kuthMP.htm
Knuth, Morris and Pratt discovered the first linear-time string-matching
algorithm by following a tight analysis of the naive algorithm. The
Knuth-Morris-Pratt algorithm keeps the information that the naive approach
gathers during the scan of the text and then wastes. By avoiding this waste of
information, it achieves a running time of O(n + m), where n is the length of
the text and m is the length of the pattern. This is optimal in the worst-case
sense: in the worst case, all the characters in the text and pattern have to
be examined at least once.
Note that the failure function f for P, which maps j to the length of the longest
prefix of P that is a suffix of P[1 . . j], encodes repeated substrings inside the
pattern itself.
j      0  1  2  3  4  5
P[j]   a  b  a  c  a  b
f(j)   0  0  1  0  1  2
From the above mapping we can see that the longest proper prefix of the pattern
P that is also a suffix of P is "ab", so f(5) = 2.
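The mapping above can be reproduced with a short sketch. It assumes 0-based indexing and the definition given earlier (the longest prefix of P that is a suffix of P[1 . . j]). The function name failure_function is only illustrative and does not come from the original text.

```python
def failure_function(pattern):
    """Return a list f where f[j] is the length of the longest prefix of
    pattern that is also a suffix of pattern[1..j] (0-based indices)."""
    m = len(pattern)
    f = [0] * m
    k = 0  # length of the prefix currently matched
    for j in range(1, m):
        # Fall back to shorter candidate prefixes until pattern[j] extends one.
        while k > 0 and pattern[j] != pattern[k]:
            k = f[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        f[j] = k
    return f


print(failure_function("abacab"))  # [0, 0, 1, 0, 1, 2]
```

For the pattern a b a c a b this prints the f(j) row of the table.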
Consider an attempt to match at position i, that is, when the pattern
P[0 . . m - 1] is aligned with the text T[i . . i + m - 1].
T: a b a c a a b a c c
P: a b a c a b
Assume that the first mismatch occurs between characters T[i + j] and P[j]
for 0 < j < m. In the above example, with i = 0, the first mismatch is between
T[5] = a and P[5] = b.
Then T[i . . i + j - 1] = P[0 . . j - 1] = u and T[i + j] ≠ P[j].
In the example, T[0 . . 4] = P[0 . . 4] = u = a b a c a, and
T[5] = a ≠ b = P[5].
T: a b a c a a b a c c
P: a b a c a b
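One way to see how the matched part u is reused after a mismatch is the following sketch of the search loop. It is a minimal illustration under stated assumptions, not the original page's code: the name kmp_search is hypothetical, indices are 0-based, and the failure function is recomputed inside so the block is self-contained.

```python
def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0

    # Failure function: f[j] = length of the longest prefix of pattern
    # that is also a suffix of pattern[1..j].
    f = [0] * m
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = f[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        f[j] = k

    j = 0  # number of pattern characters matched so far
    for pos in range(n):
        # On a mismatch, keep the longest prefix of the pattern that is
        # still a suffix of the matched part instead of restarting.
        while j > 0 and text[pos] != pattern[j]:
            j = f[j - 1]
        if text[pos] == pattern[j]:
            j += 1
        if j == m:
            return pos - m + 1
    return -1


print(kmp_search("abacaabacc", "abacab"))    # -1
print(kmp_search("abacaabacabc", "abacab"))  # 5
```

In the first call, the mismatch at T[5] makes the loop fall back from j = 5 to f(4) = 1 and then to 0, without moving backwards in the text. The second text is the example extended by hand so that a match exists.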
Analysis
The running time of the Knuth-Morris-Pratt algorithm is proportional to the
time needed to read the characters in the text and pattern. In other words, the
worst-case running time of the algorithm is O(m + n), and it requires O(m)
extra space for the failure function. It is important to note that these
quantities are independent of the size of the underlying alphabet.
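The linear bound can be checked empirically by counting character comparisons. The sketch below is an illustration only: the helper names kmp_comparisons and naive_comparisons and the test input (a text of 1000 a's and a pattern of nine a's followed by b) are assumptions chosen to stress the naive algorithm. Only text-against-pattern comparisons are counted, and the pattern is assumed to be non-empty.

```python
def kmp_comparisons(text, pattern):
    """Count text/pattern comparisons in a KMP scan of the whole text."""
    n, m = len(text), len(pattern)
    f = [0] * m
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = f[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        f[j] = k

    comparisons = 0
    j = 0
    for pos in range(n):
        while True:
            comparisons += 1
            if text[pos] == pattern[j]:
                j += 1
                break
            if j == 0:
                break
            j = f[j - 1]  # mismatch: fall back, stay on the same text character
        if j == m:
            j = f[j - 1]  # full match: continue scanning for later matches
    return comparisons


def naive_comparisons(text, pattern):
    """Count comparisons made by the naive algorithm over all alignments."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
    return comparisons


text = "a" * 1000
pattern = "a" * 9 + "b"
print(kmp_comparisons(text, pattern), naive_comparisons(text, pattern))
```

On this input the KMP count stays below 2n, while the naive count grows like (n - m + 1) * m.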