String Matching Algorithms Overview

This document discusses string matching algorithms. It summarizes four existing algorithms - Brute Force, Knuth-Morris-Pratt (KMP), Boyer Moore, and Rabin Karp. It also proposes three new algorithms - Enhanced Boyer Moore, Enhanced Rabin Karp, and Enhanced KMP. The performance of these algorithms is evaluated based on search time, number of iterations, and accuracy when searching for patterns in text files. Experimental results found that the enhanced KMP algorithm provided the best accuracy compared to other string matching algorithms. These algorithms have applications in text mining, document classification, plagiarism detection, and other areas involving pattern matching in text.

Uploaded by

bishal sarma


String Matching Algorithms

Article in International Journal Of Engineering And Computer Science · March 2018

DOI: 10.18535/ijecs/v7i3.19


All content following this page was uploaded by Preeti Narooka on 04 February 2020.


www.ijecs.in
International Journal Of Engineering And Computer Science ISSN:2319-7242
Volume 7 Issue 3 March 2018, Page No. 23769-23772
Index Copernicus Value (2015): 58.10, 76.25 (2016) DOI: 10.18535/ijecs/v7i3.19

String Matching Algorithms

Mukku Bhagya Sri, Rachita Bhavsar, Preeti Narooka
Computer Department, Terna Engineering College, Nerul
Computer Department, Terna Engineering College, Nerul
Assistant Professor, Computer Department, Terna Engineering College, Nerul

Abstract:
To analyze the content of documents, pattern matching algorithms are used to find all occurrences of a limited set of patterns within an input text or document. This research work uses four existing string matching algorithms: the Brute Force algorithm, the Knuth-Morris-Pratt (KMP) algorithm, the Boyer Moore algorithm and the Rabin Karp algorithm. It also proposes three new string matching algorithms: the Enhanced Boyer Moore algorithm, the Enhanced Rabin Karp algorithm and the Enhanced Knuth-Morris-Pratt algorithm.
Findings: The experiments use two types of documents, .txt and .docx. The performance measures are search time, number of iterations and accuracy. The experimental results indicate that the enhanced KMP algorithm gives better accuracy than the other string matching algorithms.
Application/Improvements: These algorithms are normally used in text mining, document classification, content analysis and plagiarism detection. In future, the algorithms should be enhanced further to improve their performance, and more types of documents should be used for experimentation.
Keywords: Brute Force, Boyer Moore, Information Retrieval, Knuth-Morris-Pratt, Pattern Matching, Rabin Karp

I. Introduction
String searching algorithms, sometimes called string matching algorithms, are an important class of string algorithms that try to find the places where one or several strings (also called patterns) occur within a larger string or text. Let Σ be an alphabet (a finite set). Formally, both the pattern and the searched text are vectors of elements of Σ. Σ may be a usual human alphabet (for example, the letters A through Z in the Latin alphabet). Other applications may use a binary alphabet (Σ = {0,1}) or, in bioinformatics, the DNA alphabet (Σ = {A,C,G,T}).[11] We assume that the text is an array T[1..n] of length n, that the pattern is an array P[1..m] of length m, and that m <= n. The character arrays T and P are often called strings of characters. We say that pattern P occurs with shift s in text T (or, equivalently, that pattern P occurs beginning at position s+1 in text T) if 0 <= s <= n-m and T[s+1..s+m] = P[1..m]. If P occurs with shift s in T, then we call s a valid shift; otherwise we call s an invalid shift. The string matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.

A large number of algorithms are known to solve the string matching problem. Based on the number of patterns searched for, the algorithms can be classified as single-pattern and multiple-pattern algorithms. Applications may require exact or approximate string matching.

Exact String Matching Problem
We are given a text string T and a pattern string P, and we want to find all occurrences of P in T. In the exact string matching problem the pattern must be found exactly inside the text. Consider the following example:
T=AGCCTAAGCTCCTAAGTC
P=CCTA
There are two occurrences of P in T (at shifts 2 and 10), as shown below:
AGCCTAAGCTCCTAAGTC
  CCTA    CCTA

A brute force method for exact string matching:
T=ACCACTAGA
P=ACTA
  ACTA
   ACTA
    ACTA
If the brute force method is used, many characters that have already been matched are matched again, because each time a mismatch occurs the pattern is moved only one step. There are many exact string matching algorithms, and nearly all of them are concerned with how to slide the pattern. A few of them are discussed below.

Fig: Methodology
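The brute-force sliding described in the introduction can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `brute_force_search` and the comparison counter are assumptions introduced here, and shifts are reported 0-based, matching the shift s in the definition above.

```python
def brute_force_search(text, pattern):
    """Return (valid_shifts, comparisons) for a naive exact search.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n, m = len(text), len(pattern)
    shifts = []
    comparisons = 0
    # Try every shift s with 0 <= s <= n - m.
    for s in range(n - m + 1):
        j = 0
        # Compare the pattern with the window T[s..s+m-1], left to right.
        while j < m:
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j += 1
        if j == m:
            shifts.append(s)  # every pattern character matched
        # On a mismatch the pattern moves only one step, so characters
        # that were already matched may be compared again.
    return shifts, comparisons


if __name__ == "__main__":
    print(brute_force_search("AGCCTAAGCTCCTAAGTC", "CCTA")[0])  # [2, 10]
    print(brute_force_search("ACCACTAGA", "ACTA")[0])           # [3]
```

The counter makes the redundancy visible: the number of comparisons can approach n*m on repetitive inputs, which is the cost the algorithms discussed later try to avoid.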

II. Methodology:
The main goal of this research work is to match patterns of text by analyzing the contents of documents using string matching algorithms. To perform this task, this research work uses four existing string matching algorithms: the Brute Force algorithm, the Knuth-Morris-Pratt (KMP) algorithm, the Boyer Moore algorithm and the Rabin Karp algorithm. It also proposes three new string matching algorithms: the Enhanced Boyer Moore algorithm, the Enhanced Rabin Karp algorithm and the Enhanced Knuth-Morris-Pratt algorithm. The performance factors used are the time taken to search for the pattern, the number of iterations required, and the accuracy for single-word search, multiple-word search and file search. In this paper we study in detail the Knuth-Morris-Pratt (KMP) algorithm and the Rabin Karp algorithm.

Existing Algorithms:
1 Rabin-Karp Algorithm
The Rabin-Karp algorithm is one of the simpler string searching algorithms. It was developed by Michael O. Rabin and Richard M. Karp in 1987. The algorithm uses a hash function to discover potential occurrences of the pattern in the input text. For a text of length n and p patterns of combined length m, its average and best case running time is O(n+m) in space O(p), and its worst-case time is O(nm). It computes the hash value of the pattern and then the hash value of every length-m substring of the input text. If the hash values of the pattern and a text substring match, the two strings are compared character by character and the shift is reported if they are equal; otherwise the algorithm moves on to the next length-m substring.
Algorithm: Rabin-Karp
RABIN-KARP-MATCHER(T,P,d,q)
1 n = T.length
2 m = P.length
3 h = d^(m-1) mod q
4 p = 0
5 t_0 = 0
6 for i = 1 to m
7     p = (d*p + P[i]) mod q
8     t_0 = (d*t_0 + T[i]) mod q
9 for s = 0 to n-m
10     if p == t_s
11         if P[1..m] == T[s+1..s+m]
12             print "Pattern occurs with shift" s
13     if s < n-m
14         t_(s+1) = (d*(t_s - T[s+1]*h) + T[s+m+1]) mod q
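The RABIN-KARP-MATCHER pseudocode can be sketched in Python roughly as follows. This is a hedged illustration, not the authors' code: the function name `rabin_karp` and the default radix `d=256` and modulus `q=101` are assumptions chosen for the example, characters are mapped to digits with `ord`, and shifts are returned 0-based instead of being printed.

```python
def rabin_karp(text, pattern, d=256, q=101):
    """Return all valid shifts of pattern in text using a rolling hash.

    d (radix) and q (modulus) are illustrative defaults, not values
    prescribed by the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # value of the high-order digit position, mod q
    p = 0                 # hash of the pattern
    t = 0                 # hash of the current text window
    for i in range(m):    # preprocessing
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; verify to rule out collisions.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop the leading character, shift, and append the next one.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


if __name__ == "__main__":
    print(rabin_karp("AGCCTAAGCTCCTAAGTC", "CCTA"))  # [2, 10]
```

Because Python's `%` operator returns a non-negative result for a positive modulus, the rolling update needs no extra correction step when `t - ord(text[s]) * h` is negative. A small modulus such as 101 produces more spurious hash hits, which the explicit substring comparison filters out at the cost of extra work.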
The procedure works as follows. All characters are interpreted as radix-d digits. The subscripts on t are provided only for clarity; the program works correctly if all the subscripts are dropped. Line 3 initializes h to the value of the high-order digit position of an m-digit window. Lines 4-8 compute p as the value of P[1..m] mod q and t_0 as the value of T[1..m] mod q. The for loop of lines 9-14 iterates through all possible shifts s, maintaining the following invariant: whenever line 10 is executed, t_s = T[s+1..s+m] mod q.

Knuth-Morris-Pratt Algorithm
Knuth, Morris and Pratt developed a linear-time string searching algorithm by analyzing the brute force (naïve) algorithm. The algorithm was developed in 1974 by Donald Knuth and Vaughan Pratt, and independently by James H. Morris; the three published it jointly in 1977. The Knuth-Morris-Pratt algorithm reduces the total number of comparisons of the pattern against the input string. A matching time of O(n) is achieved by avoiding comparisons with elements of "S" that have already been compared. The algorithm has the following components:
1. The prefix function, Π: The prefix function Π for a pattern summarizes how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern "p". In other words, it avoids backtracking on the string "S".
2. The KMP Matcher: With the string "S", the pattern "p" and the prefix function "Π" as inputs, the matcher finds the occurrences of "p" in "S" and returns the number of shifts of "p" after which each occurrence is found.
3. Running-time analysis: The running time for computing the prefix function is Θ(m), and the running time of the matching function is Θ(n).
Algorithm: Knuth-Morris-Pratt
1 n = T.length
2 m = P.length
3 Π = Compute-Prefix-Function(P)
4 q = 0
5 for i = 1 to n
6     while q > 0 and P[q+1] ≠ T[i]
7         q = Π[q]
8     if P[q+1] == T[i]
9         q = q+1
10     if q == m
11         print "Pattern occurs with shift" i-m
12         q = Π[q]

Enhanced Algorithms:
Enhanced Rabin Karp Algorithm
This searching algorithm uses a hashing function to find any one of a set of patterns in the input text. Hashing offers a simple way to reduce the total number of character comparisons. For a text of length N and patterns of combined length M, its best-case running time is O(N+M) and its worst-case time is O(NM). The algorithm first computes the hash value of the pattern. It then checks each window of the input text against this hash value. If a mismatch occurs, it shifts the window to the next character, recalculates the hash value, and repeats the process. Otherwise it returns the index position of the match.
Algorithm: Enhanced Rabin Karp Algorithm
1 Function relation(S,P,n,m,K,q)
2 Begin
3 h ← K^(m-1) mod q;
4 p ← 0;
5 t_0 ← 0;
6 for i = 1 to m do
7     p ← (K*p + P[i]) mod q;
8     t_0 ← (K*t_0 + S[i]) mod q;
9 End for
10 For j = 0 to n-m do
    a) If p = t_j then
        i) If P = S[j+1..j+m] then
            Out j+1;
        ii) End if
    b) End if
11     If j < n-m then
12         t_(j+1) ← (K*(t_j - S[j+1]*h) + S[j+m+1]) mod q;
13 End for
14 End.

Enhanced Knuth-Morris-Pratt Algorithm
The Knuth-Morris-Pratt algorithm is one of the efficient string matching algorithms. It searches for occurrences of a pattern p within a main text t by using the observation that when a mismatch occurs, the pattern itself contains sufficient information to determine where the next match can begin, thus avoiding re-examination of previously matched characters. The enhanced version uses a bit table to detect mismatches of the pattern in the input text. It performs the comparison from left to right using the bit table: if a match is found it returns the index in the text; otherwise it checks the next bit.
Algorithm: Enhanced Knuth-Morris-Pratt Algorithm
1 KMP_search(E(p),E(T))
2 Begin
3 Preprocess E(p) to obtain the next_bit table
4 While (not end of input) do
    a) Get next bit b;
    b) If (j >= 0) & (b != E(p)[j]) do
    c) End if
    d) If (j = |E(p)|)
        i) Return a match
        ii) j ← -1
    e) End if
    f) j ← j+1
    g) End while
5 End.

Variants:
Rabin-Karp Algorithm

A. Long patterns and Σ: For long patterns and a large alphabet Σ, the Boyer-Moore algorithm gives much better efficiency than other string matching algorithms. The algorithm involves two heuristics that allow it to skip many text characters altogether. It makes successive comparisons from right to left. When a mismatch occurs, each heuristic proposes a value by which the shift can be increased without skipping any valid shift, and the maximum of the two is chosen.

B. Repetition Factors: An efficient algorithm for string matching based on repetition factors was developed by Galil and Seiferas. The algorithm has linear running time and requires only O(1) storage beyond P and T.

C. Approximate String Matching: The Bitap algorithm performs approximate string matching based on the Levenshtein distance between strings. The algorithm requires much less preprocessing and uses mostly bitwise operations, making it extremely fast.

D. Dictionary Matching: The Aho-Corasick algorithm can match multiple (but finitely many) patterns in a text in parallel, achieving linear running time.

E. Polymorphic String Matching: A combination of more than one string matching algorithm (for example, a KMP and Boyer-Moore fusion) can be used to provide a better functional algorithm with decreased space and

Knuth-Morris-Pratt
A real-time version of KMP can be implemented using a separate failure function table for each character in the alphabet. If a mismatch occurs on character x in the text, the failure function table for character x is consulted for the index i in the pattern at which the mismatch took place. This returns the length of the longest substring ending at i that matches a prefix of the pattern, with the added condition that the character after the prefix is x. With this restriction, character x in the text need not be checked again in the next phase, and so only a constant number of operations are executed between the processing of each index of the text. This satisfies the real-time computing restriction.

III. Conclusion:
This research work analyzes the performance measures of existing and enhanced string matching algorithms. The performance factors are time, number of iterations and accuracy for a single line, multiple lines and a file. From the analysis, among the existing algorithms the KMP algorithm gives the best accuracy for all the inputs, and among the enhanced algorithms the enhanced KMP algorithm gives the best accuracy. Comparing the existing and enhanced KMP algorithms, the enhanced KMP algorithm gives the better accuracy.

IV. Acknowledgement
We feel privileged to express our deepest sense of gratitude to our guide, Prof. Preeti ma'am. Her prompt and kind help led to the completion of this work.

References
[1] Verma A, Kaur I, Singh I. Comparative analysis of data mining tools and techniques for information retrieval. Indian Journal of Science and Technology.
[2] Al-Mazroi A, Rashid NA. A Fast Hybrid Algorithm for the Exact String Matching Problem. American Journal of Engineering and Applied Sciences. 2011.
[3] Algorithm book by Cormen.
