Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP)...
-
Upload
tobias-russell -
Category
Documents
-
view
255 -
download
0
Transcript of Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP)...
![Page 1: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/1.jpg)
1
Contest AlgorithmsJanuary 2016
Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp
13. String Searching
Contest Algorithms: 13. String Srch
![Page 2: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/2.jpg)
Definition: given a text string T and a search string (pattern) P, find P inside
T T: “the rain in spain stays mainly on the plain” P: “n th”
Applications: text editors, Web search engines (e.g. Google), image analysis
1. What is String Searching?
![Page 3: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/3.jpg)
Assume S is a string of size m.
A substring S[i .. j] of S is the string fragment between indexes i and j.
A prefix of S is a substring S[0 .. i] A suffix of S is a substring S[i .. m-1]
i is any index between 0 and m-1
String Concepts
"start of S"
"end of S"
![Page 4: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/4.jpg)
Substring S[1..3] == "ndr"
All possible prefixes of S: "andrew", "andre", "andr", "and", "an”, "a"
All possible suffixes of S: "andrew", "ndrew", "drew", "rew", "ew", "w"
Examplesa n d r e w
S
0 5
![Page 5: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/5.jpg)
Check each position in the text T to see if the pattern P starts in that position
2. The Brute Force Algorithm
a n d r e wT:
r e wP:
a n d r e wT:
r e wP:
. . . .P moves 1 char at a time through T
![Page 6: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/6.jpg)
Contest Algorithms:13. String Srch 6
public static int brute(String text, String pattern) { int n = text.length(); int m = pattern.length(); int j; for(int i=0; i <= (n-m); i++) { j = 0; while ((j < m) && (text.charAt(i+j) == pattern.charAt(j)) ) j++; if (j == m) return i; // match at i } return -1; // no match } // end of brute()
Code see BruteSearch.java
![Page 7: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/7.jpg)
Contest Algorithms:13. String Srch 7
Easy to code No preprocessing needs to be done on the pattern
Usually takes O(n+m) steps – not so bad n = length of text; m = length of pattern
Worst case scenario O(nm) when searching for aaabin aaaaaaaaaaaaaaaaaaaaaaaab
Properties of Brute-force Search
![Page 8: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/8.jpg)
The Knuth-Morris-Pratt (KMP) algorithm shifts the pattern more intelligently than the brute force algorithm.
steps are bigger than just 1 character move
3. The KMP Algorithm
continued
![Page 9: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/9.jpg)
If a mismatch occurs between the text and pattern P at P[j], what is the most we can shift the pattern to avoid wasteful comparisons?
Answer: the largest prefix of P[0 .. j-1] that is a suffix of P[1 .. j-1]
![Page 10: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/10.jpg)
Example
T:
P:
jnew = 2
j = 5
i
![Page 11: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/11.jpg)
Find largest prefix (start) of:"a b a a b" ( P[0..j-1] )
which is suffix (end) of:"b a a b" ( p[1 .. j-1] )
Answer: "a b" Set j = 2 // the new j value
Whyj == 5
![Page 12: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/12.jpg)
KMP preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself.
j = mismatch position in P[] k = position before the mismatch (k = j-1).
The failure function F(k) is defined as the size of the largest prefix of P[0..k] that is also a suffix of P[1..k].
KMP Failure Function
![Page 13: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/13.jpg)
P: "a b a a b a" j: 0 1 2 3 4 5
In code, F() is represented by an array, like the table.
Failure Function Example
F(k) is the size of the largest prefix.
1
3
2
4210j
100F(j)
k
F(k)
(k == j-1)
![Page 14: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/14.jpg)
F(4) means find the size of the largest prefix of P[0..4] that is also a
suffix of P[1..4]= find the size largest prefix of "abaab" that
is also a suffix of "baab"= find the size of "ab"= 2
Why is F(4) == 2?P: "abaaba"
![Page 15: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/15.jpg)
Knuth-Morris-Pratt’s algorithm modifies the brute-force algorithm.
if a mismatch occurs at P[j] (i.e. P[j] != T[i]), then k = j-1; j = F(k); // obtain the new j
Using the Failure Function
![Page 16: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/16.jpg)
int kmpMatch(String text, String pattern) { int n = text.length(); int m = pattern.length();
int fail[] = computeFail(pattern);
int i=0; int j=0; :
Code
Return index where pattern starts, or -1
see KmpSearch.java
![Page 17: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/17.jpg)
while (i < n) { if (pattern.charAt(j) == text.charAt(i)) { if (j == m - 1) return i - m + 1; // match i++; j++; } else if (j > 0) j = fail[j-1]; else i++; } return -1; // no match } // end of kmpMatch()
![Page 18: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/18.jpg)
int[] computeFail(String pattern) { int fail[] = new int[pattern.length()]; fail[0] = 0;
int m = pattern.length(); int j = 0; int i = 1; :
![Page 19: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/19.jpg)
while (i < m) { if (pattern.charAt(j) == pattern.charAt(i)) { //j+1 chars match fail[i] = j + 1; i++; j++; } else if (j > 0) // j follows matching prefix j = fail[j-1]; else { // no match fail[i] = 0; i++; } } return fail; } // end of computeFail() Similar code
to kmpMatch()
![Page 20: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/20.jpg)
Example
1
a b a c a a b a c a b a c a b a a b b
7
8
19181715
a b a c a b
1614
13
2 3 4 5 6
9
a b a c a b
a b a c a b
a b a c a b
a b a c a b
10 11 12
c
0
3
1
4210k
100F(k)
T:
P:
![Page 21: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/21.jpg)
F(4) means find the size of the largest prefix of P[0..4] that is also a suffix
of P[1..4]= find the size largest prefix of "abaca" that
is also a suffix of "baca"= find the size of "a"= 1
Why is F(4) == 1?P: "abacab"
![Page 22: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/22.jpg)
Contest Algorithms:13. String Srch 22
Time to find match is only O(n) with O(m) preprocessing time
n = length of text; m = length of the pattern
Can be modified to search for multiple patterns in a single search.
Properties of KMP
![Page 23: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/23.jpg)
KMP doesn’t work so well as the size of the alphabet increases
more chance of a mismatch (more possible mismatches) mismatches tend to occur early in the pattern, but KMP is
faster when the mismatches occur later
KMP Disadvantage
![Page 24: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/24.jpg)
The basic algorithm doesn't take into account the letter in the text that caused the mismatch.
KMP Extensions
a a ab b
a a ab b a
x
a a ab b a
T:
P:
Basic KMPdoes not do this.
![Page 25: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/25.jpg)
String search is based on a hash function applied to the pattern and substrings in the text
Look for a match by comparing the hash values, not substrings.
5. The Rabin-Karp Algorithm
![Page 26: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/26.jpg)
Contest Algorithms:13. String Srch 26
long hash(String s) { long h = 0; for (int j = 0; j < s.size(); j++) h = (R * h + key.charAt(j)) % Q; // % acts as mod return h; }
R == radix; often 10 for numeric data; 128 for ASCII, etc.
Q == a large prime number; e.g. 997
Typical hash function
hash("26535") == 613
![Page 27: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/27.jpg)
Contest Algorithms:13. String Srch 27
Hash Function explained
Tt0 t1 t2 t3 tm-2 tm-1 tm tm+1... ... ... ... ...
Pp0 p1 p2 p3 pm-2 pm-1... ... ...
pattern has m chars
hash(P)
examine m char of text at a time = Xi
hash(Xi)
![Page 28: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/28.jpg)
Contest Algorithms:13. String Srch 28
The hash function calculates: hash(Xi) = ( to*Rm-1 + t1*Rm-2 + t3*Rm-1 + ... tm-2*R + tm-1 ) mod Q
![Page 29: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/29.jpg)
Contest Algorithms:13. String Srch 29
T = "31415926535" and P = "26" R = 10; Q = 11 hash("ab") = (a*10 + b) mod 11
Example
13 14 95 62 35 5T
62P hash(P) == hash("26") == 26 mod 11 = 4
![Page 30: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/30.jpg)
Iterate through the Text
13 14 95 62 35 5
13 14 95 62 35 5
14 mod 11 = 3 not equal to 4
31 mod 11 = 9 not equal to 4
13 14 95 62 35 5
41 mod 11 = 8 not equal to 4
![Page 31: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/31.jpg)
13 14 95 62 35 5
15 mod 11 = 4 equal to 4 -> wrong match
13 14 95 62 35 5
59 mod 11 = 4 equal to 4 -> wrong match
13 14 95 62 35 5
92 mod 11 = 4 equal to 4 -> wrong match
13 14 95 62 35 5
26 mod 11 = 4 equal to 4 -> correct match
![Page 32: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/32.jpg)
Contest Algorithms:13. String Srch 32
The hash() function uses modulo Q, so the range of results is 0 to Q-1.
If Q is small then it is likely that two different strings will hash to the same result
probability is 1/Q
Solution is to make Q very big, which reduces the chance of a wrong match. (e.g. Q = 232-1 == 4.3 billion)
Also double-check the match using string operations
Why Wrong Matches?
![Page 33: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/33.jpg)
This is an example of a Monte Carlo algorithm it's fast but may output an incorrect answer with a small
probability (1/Q)
The "double-checking" approach is known as a Las Vegas algorithm
it can be slow
Gambling Names
![Page 34: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/34.jpg)
Contest Algorithms:13. String Srch 34
After the hash() of the first substring of T, there is no need to keep calling hash() for the 2nd substring, 3rd substring, etc.
It is possible to calculate the next hash (e.g. hash(Xi+1)) based on the current hash value (hash(Xi))
much faster (O(m) --> O(1) running time) less memory needed
Speeding up hash Calculation
![Page 35: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/35.jpg)
Contest Algorithms:13. String Srch 35
hash(Xi) = ( to*Rm-1 + t1*Rm-2 + t3*Rm-1 + ... tm-2*R + tm-1 ) mod Q
hash(Xi+1) = ( t1*Rm-1 + t2*Rm-2 + t3*Rm-1 + ... tm-1*R + tm ) mod Q
Connection between hash()s
T t0 t1 t2 t3 tm-2 tm-1 tm tm+1... ... ... ... ...
Xi
Xi+1
![Page 36: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/36.jpg)
Contest Algorithms:13. String Srch 36
Therefore: hash(Xi+1) = ( ( hash(Xi+1) - t0*Rm-1 ) mod Q )*R
+ tm mod Q ) mod Q
= ( ( hash(Xi+1) + ( t0*Qm-1 - t0*Rm-1 )) mod Q )*R
+ tm mod Q ) mod Q
= ( ( hash(Xi+1) + t0( Q - (Rm-1 mod Q) ) )*R
+ tm ) mod Q
old front value
new end value
include so mod value is positive
a constant,which can be pre-calculated
![Page 37: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/37.jpg)
Using:
Modulo Properties
![Page 38: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/38.jpg)
We move through the text left-to-right, one character at a time, building up the hash for an m-character substring from preceding hash values.
Creating the Hash
![Page 39: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/39.jpg)
P: "26535" R = 10, Q = 997
Hash of the Pattern
the hash value for the pattern
![Page 40: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/40.jpg)
T: "3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3" M = 5, R = 10, Q = 997
Hashing the Text Substrings
In the code RM = Rm-1 mod Q
The hash values forthe M-char substrings
![Page 41: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/41.jpg)
Contest Algorithms:13. String Srch 41
public static void main(String[] args) { if (args.length != 2) { System.out.println("Usage: java RabinKarp <text> <pattern>"); return; }
RabinKarp searcher = new RabinKarp(args[1]); int pos = searcher.search(args[0]); showPos(args[0], args[1], pos); } // end of main()
Code see RabinKarp.java
![Page 42: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/42.jpg)
Contest Algorithms:13. String Srch 42
public class RabinKarp{ private static final int R = 256; // radix
private String pat; // the pattern; needs to be global for LV checking private long patHash; // pattern hash value
private int M; // pattern length private long Q; // a large prime, small enough to avoid long overflow private long RM; // == R^(M-1) % Q
public RabinKarp(String pat) { this.pat = pat; // save pattern (needed only for Las Vegas) M = pat.length(); Q = longRandomPrime();
// precompute R^(M-1) % Q for use in removing leading digit RM = 1; for (int i = 1; i <= M - 1; i++) RM = (R * RM) % Q; patHash = hash(pat, M); } // end of RabinKarp() :
![Page 43: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/43.jpg)
Contest Algorithms:13. String Srch 43
private static long longRandomPrime() // a random 31-bit probable prime { BigInteger prime = new BigInteger(31, 20, new Random()); return prime.longValue(); }
private long hash(String key, int M) // Compute hash for key[0..M-1]. { long h = 0; for (int j = 0; j < M; j++) h = (R * h + key.charAt(j)) % Q; return h; } // end of hash()
![Page 44: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/44.jpg)
Contest Algorithms:13. String Srch 44
public int search(String txt) { int N = txt.length(); if (N < M) return -1; long txtHash = hash(txt, M);
// hash match found at offset 0, so double-check if ((patHash == txtHash) && check(txt, 0)) return 0;
// iterate through the text for (int i = M; i < N; i++) { // Calculate new hash by removing leading digit, add trailing digit txtHash = (txtHash + Q - RM * txt.charAt(i - M) % Q) % Q; txtHash = (txtHash * R + txt.charAt(i)) % Q;
// found a hash match, so double-check int offset = i - M + 1; if ((patHash == txtHash) && check(txt, offset)) return offset; } return -1; // no match found } // end of search()
![Page 45: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/45.jpg)
Contest Algorithms:13. String Srch 45
private boolean check(String txt, int i) // Las Vegas version: does pat[] match txt[i..i-M+1] ? { for (int j = 0; j < M; j++) if (pat.charAt(j) != txt.charAt(i + j)) return false; return true; } // end of check()
![Page 46: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/46.jpg)
Contest Algorithms:13. String Srch 46
Has a poor worst-case running time (O(nm)), and so KMP is probably better for string searching.
KMP's hashing technique allows the search algorithm to be used on other things than text
e.g. image, audio, video search
Rabin-Karp can be easily modified to do fast multiple pattern search.
check whether the hash of a string in the text belongs to a set of hash values of patterns
Properties of Rabin-Karp
![Page 47: Contest Algorithms January 2016 Three types of string search: brute force, Knuth-Morris-Pratt (KMP) and Rabin-Karp 13. String Searching 1Contest Algorithms:](https://reader033.fdocuments.us/reader033/viewer/2022061518/5697bff61a28abf838cbddfe/html5/thumbnails/47.jpg)
Contest Algorithms:13. String Srch 47
Algorithm Preprocessing timem = pat len.
Matching time (average, worst)
n = text len;
Brute force 0 (no preprocessing) O(n+m), O(nm)
Knuth-Morris-Pratt O(m) O(n)
Rabin-Karp O(m) O(n+m), O(nm)
6. Summary
35 algorithms with C code at http://www-igm.univ-mlv.fr/~lecroq/string/