String Naive and KMP

The document outlines a C program for string manipulation that reads a main string, a pattern string, and a replacement string, performing pattern matching to replace occurrences of the pattern in the main string. It describes the algorithm and implementation details, including a naive method and the KMP (Knuth-Morris-Pratt) algorithm for efficient pattern searching. Additionally, it provides code snippets for both the naive approach and the KMP algorithm, detailing how to compute the longest prefix suffix (lps) array for improved matching efficiency.

Uploaded by colabpython39
Copyright: © All Rights Reserved

EXPERIMENT - 02

Design, develop and implement a program in C for the following
operations on strings:
• Read a main string (STR), a pattern string (PAT) and a replace string
(REP).
• Perform the pattern matching operation: find and replace all occurrences
of PAT in STR with REP if PAT exists in STR. Report a suitable message
if PAT does not exist in STR.
Support the program with functions for each of the above operations.
Don’t use built-in functions.
ALGORITHM:
Step 1: Start.
Step 2: Read main string STR, pattern string PAT and replace string REP.
Step 3: Search / find the pattern string PAT in the main string STR.
Step 4: If PAT is found then replace all occurrences of PAT in main string
STR with REP string.
Step 5: If PAT is not found give a suitable error message.
Step 6: Stop.
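#include <stdio.h>

// Declarations
// ans is sized for the worst case: up to 99 one-character matches,
// each replaced by a 49-character REP.
char str[100], pat[50], rep[50], ans[5000];
int i, j, c, m, k, flag = 0;

// Reads one line into buf (at most size-1 characters) and strips the
// trailing newline. Replaces gets(), which was removed from C11.
void readline(char buf[], int size) {
    int n = 0;
    if (fgets(buf, size, stdin) == NULL)
        buf[0] = '\0';
    while (buf[n] != '\0' && buf[n] != '\n')
        n++;
    buf[n] = '\0';
}

// Copies str into ans, replacing every occurrence of pat with rep.
// Sets flag to 1 if at least one occurrence is found.
void stringmatch() {
    i = m = c = j = 0;
    while (str[c] != '\0') {
        if (str[m] == pat[i]) { // matching
            i++;
            m++;
            if (pat[i] == '\0') { // found an occurrence
                flag = 1;
                // copy replace string into ans string
                for (k = 0; rep[k] != '\0'; k++, j++)
                    ans[j] = rep[k];
                i = 0;
                c = m;
            }
        } else { // mismatch: keep str[c] and restart from the next character
            ans[j] = str[c];
            j++;
            c++;
            m = c;
            i = 0;
        }
    } // end of while
    ans[j] = '\0';
} // end stringmatch()

int main() {
    printf("\nEnter a main string \n");
    readline(str, 100);
    printf("\nEnter a pattern string \n");
    readline(pat, 50);
    printf("\nEnter a replace string \n");
    readline(rep, 50);
    stringmatch();
    if (flag == 1)
        printf("\nThe resultant string is\n %s", ans);
    else
        printf("\nPattern string NOT found\n");
    return 0;
} // end of main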
Naïve Method:
str = a b c d e f g h
pat = d e f

str = a a a a a a a a a b
pat = a a a b

Generating lps:
pat = a b c d a b c
Proper prefixes = a, ab, abc, abcd, ...
Suffixes = c, bc, abc, dabc, ...
Longest proper prefix that is also a suffix = abc (length 3)

p1  = a b c d a b e a b f
lps = 0 0 0 0 1 2 0 1 2 0

p2  = a b c d e a b f a b c
lps = 0 0 0 0 0 1 2 0 1 2 3

p3  = a a a a b a a c d
lps = 0 1 2 3 0 1 2 0 0

Str: a b a b c a b a b a b d
Pat: a b a b d
KMP (Knuth-Morris-Pratt) Pattern Searching: The naive pattern-searching
algorithm performs poorly when many matching characters are followed by
a mismatching character, for example:
1) txt[] = “AAAAAAAAAAAAAAAAAB”, pat[] = “AAAAB”
2) txt[] = “ABABABCABABABCABABABC”, pat[] = “ABABAC”
(a bad case for the naive method, though not the worst).
The KMP matching algorithm uses the degenerating property of the
pattern (the same sub-patterns appearing more than once in the pattern)
and improves the worst-case complexity to O(n+m), where n is the length
of the text and m is the length of the pattern. The basic idea behind
KMP is: whenever we detect a mismatch (after some matches), we already
know some of the characters in the text of the next window. We use this
information to avoid re-matching characters that we know will match anyway.

Preprocessing Overview: The KMP algorithm preprocesses pat[] and
constructs an auxiliary array lps[] of size m (the same as the size of
the pattern), which is used to skip characters while matching.
• The name lps stands for the longest proper prefix which is also a suffix.
A proper prefix is any prefix other than the whole string. For example,
the prefixes of “ABC” are “”, “A”, “AB” and “ABC”; the proper prefixes
are “”, “A” and “AB”. The suffixes of the string are “”, “C”, “BC” and “ABC”.
• We search for lps in sub-patterns. More precisely, we focus on substrings
of the pattern that are both a prefix and a suffix.
• For each sub-pattern pat[0..i], where i = 0 to m-1, lps[i] stores the
length of the longest proper prefix of pat[0..i] that is also a suffix of
pat[0..i].
Note: lps[i] could equally be defined as the length of the longest prefix
that is also a proper suffix. The word "proper" is needed in at least one
of the two places so that the whole substring is not counted.
For the pattern “AAAA”, lps[] is [0, 1, 2, 3]
For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]
For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]
For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3]

In the preprocessing part:
• We calculate the values in lps[]. To do that, we keep track of the length
of the longest prefix suffix for the previous index (the variable len is
used for this purpose).
• We initialize lps[0] and len to 0.
• If pat[len] and pat[i] match, we increment len by 1, assign the
incremented value to lps[i] and move to the next i.
• If pat[i] and pat[len] do not match and len is not 0, we update len to
lps[len-1] and compare again without advancing i.
• If they do not match and len is 0, we set lps[i] to 0 and move to the
next i.
How to use lps[] to decide the next positions (or to know the number
of characters to be skipped)?
• We start comparing pat[j], with j = 0, against the characters of the
current window of text.
• We keep matching characters txt[i] and pat[j], incrementing i and j
while pat[j] and txt[i] keep matching.

When we see a mismatch:
• We know that the characters pat[0..j-1] match txt[i-j..i-1] (note that
j starts at 0 and is incremented only when there is a match).
• We also know (from the above definition) that lps[j-1] is the number of
characters of pat[0..j-1] that form both a proper prefix and a suffix.
• From the above two points, we can conclude that we do not need to
match these lps[j-1] characters against txt[i-j..i-1], because we know
they will match anyway. So we set j to lps[j-1] and continue comparing
at the same txt[i]; if j is already 0, we simply move to the next i.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Fills lps[] for given pattern pat
void computeLPSArray(const char* pat, int M, int* lps) {
    int len = 0; // Length of the previous longest prefix suffix
    lps[0] = 0;  // lps[0] is always 0
    int i = 1;   // Loop calculates lps[i] for i = 1 to M-1
    while (i < M) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        else {
            if (len != 0) {
                // Fall back to the next shorter candidate; do not advance i
                len = lps[len - 1];
            }
            else {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Returns a malloc'd array holding the 1-based start index of every
// occurrence of pat in txt; *count receives the number of occurrences.
// The caller must free the returned array (it may be NULL).
int* KMPSearch(const char* pat, const char* txt, int* count) {
    int M = (int)strlen(pat);
    int N = (int)strlen(txt);
    // Number of occurrences found
    *count = 0;
    // An empty pattern, or one longer than the text, has no occurrences
    if (M == 0 || M > N)
        return NULL;
    // Create lps[] that will hold the longest prefix suffix values for pattern
    int* lps = (int*)malloc(M * sizeof(int));
    // Preprocess the pattern (calculate lps[] array)
    computeLPSArray(pat, M, lps);
    int* result = (int*)malloc(N * sizeof(int));
    int i = 0; // index for txt
    int j = 0; // index for pat
    while ((N - i) >= (M - j)) {
        if (pat[j] == txt[i]) {
            j++;
            i++;
        }
        if (j == M) {
            // Record the occurrence (1-based index)
            result[*count] = i - j + 1;
            (*count)++;
            j = lps[j - 1];
        }
        else if (i < N && pat[j] != txt[i]) {
            // Mismatch after j matches: skip the lps[j-1] characters
            // that are already known to match
            if (j != 0) {
                j = lps[j - 1];
            }
            else {
                i = i + 1;
            }
        }
    }
    free(lps);
    return result;
}

// Driver code
int main() {
    const char txt[] = "geeksforgeeks";
    const char pat[] = "geeks";
    int count;
    // Call KMPSearch and get the array of occurrences
    int* result = KMPSearch(pat, txt, &count);
    // Print all the occurrences (1-based indices); for this input: 1 9
    for (int i = 0; i < count; i++) {
        printf("%d ", result[i]);
    }
    printf("\n");
    // Free the allocated memory
    free(result);
    return 0;
}
