Kmp pattern search explained simple

KMP Pattern Search: Quick Overview. Created by Arjun SK, arjunsk.com

Transcript of Kmp pattern search explained simple

Page 1: Kmp pattern search explained simple

KMP Pattern Search: Quick Overview

Created by Arjun SK, arjunsk.com

Page 2: Kmp pattern search explained simple

What is Pattern Searching?

o Suppose you are reading a text document.

o You want to search for a word.

o You press CTRL + F and search for that word.

o The word processor scans the document and shows the positions where the word occurs.

What actually happens is that the word, i.e. the pattern, is searched for inside the text of the document.

Page 3: Kmp pattern search explained simple

Implementation

Page 4: Kmp pattern search explained simple

Naïve Approach

The naïve approach is to check whether the pattern matches the text at every possible position in the text.

P = Pattern (word) of length m

T = Text (document) of length n

The naïve string matching algorithm takes O((n-m+1)m) time in the worst case: there are n-m+1 candidate positions, and each one needs up to m character comparisons.
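The naïve approach above can be sketched in a few lines of Python. This is only an illustration: the function name naive_search and the sample strings are assumptions, not part of the slides.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text, trying each position."""
    n, m = len(text), len(pattern)
    matches = []
    # There are n - m + 1 candidate start positions.
    for start in range(n - m + 1):
        # Comparing the slice costs up to m character comparisons.
        if text[start:start + m] == pattern:
            matches.append(start)
    return matches


print(naive_search("abcdabcaaabcbab", "abc"))  # [0, 4, 9]
```

Every candidate position is checked from scratch, which is where the (n-m+1)m bound comes from.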

Page 5: Kmp pattern search explained simple

Basic Idea of KMP

Text:    a b c d a b c a a a b c b a b
Pattern: a b c d a b c d

We can find the next position for comparison by looking at the pattern.
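As a rough illustration of "looking at the pattern": in the example on this slide the first seven pattern characters ("abcdabc") match before the mismatch. The brute-force helper below (longest_border is a hypothetical name, not from the slides) finds the longest prefix of that matched part that is also its suffix, which determines how far the pattern can safely shift.

```python
def longest_border(s: str) -> str:
    """Longest proper prefix of s that is also a suffix of s (brute force)."""
    for length in range(len(s) - 1, 0, -1):
        if s[:length] == s[-length:]:
            return s[:length]
    return ""


matched = "abcdabc"  # part of the pattern that matched before the mismatch
border = longest_border(matched)
print(border)                      # abc
print(len(matched) - len(border))  # 4
```

Under this reading, the pattern can move 4 positions to the right, and the leading "abc" does not need to be compared again.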

Page 6: Kmp pattern search explained simple

KMP (Knuth-Morris-Pratt String Matching Algorithm)

Why KMP?

It is the best-known linear-time algorithm for exact pattern matching.

How is it implemented?

o We find repeated sub-patterns within the search pattern, i.e. prefixes of the pattern that also appear later in it.

o When a comparison fails after a partial match, we can skip to the next occurrence of a matching prefix.

o In this way, we skip comparisons whose outcome is already known.

Page 7: Kmp pattern search explained simple

Pre-processing

Let’s say we’re matching the pattern “abababca” against the text “bacbababaabcbab”.

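Here’s our prefix match table, i.e. prefix-table[i]. Each value is the length of the longest proper prefix of the pattern that is also a suffix of the pattern up to index i.

index  0 1 2 3 4 5 6 7
char   a b a b a b c a
value  0 0 1 2 3 4 0 1

Value 1: matching prefix, i.e. “a”

Value 2: matching prefix, i.e. “ab”

Value 3: matching prefix, i.e. “aba”

Value 4: matching prefix, i.e. “abab”

Value 0: no matching prefix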
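The slides give the table but not how to compute it. One common way to build it is sketched below; the function name build_prefix_table is an assumption, and this is a standard construction rather than necessarily the author's own.

```python
def build_prefix_table(pattern: str) -> list[int]:
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the prefix currently matched
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter matching prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


print(build_prefix_table("abababca"))  # [0, 0, 1, 2, 3, 4, 0, 1]
```

The printed values agree with the table for “abababca” on this slide.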

Page 8: Kmp pattern search explained simple

Pre-processing - cont.

• partial_match_length = number of pattern characters matched at the current position before a mismatch.

• prefix-table = the pre-processed prefix table.

• If partial_match_length > 0, we may skip ahead partial_match_length - prefix-table[partial_match_length - 1] characters.

// The skip avoids re-comparing the prefix of the pattern that is already known to match.
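The skip rule can be written as a small helper. This is a sketch: skip_distance is a hypothetical name, and moving ahead by one character when nothing matched is an assumption the slide does not state explicitly.

```python
def skip_distance(partial_match_length: int, prefix_table: list[int]) -> int:
    """How many characters the pattern may move ahead after a mismatch."""
    if partial_match_length == 0:
        # Assumption: with no partial match, move ahead by one character.
        return 1
    return partial_match_length - prefix_table[partial_match_length - 1]


prefix_table = [0, 0, 1, 2, 3, 4, 0, 1]  # table for "abababca" from the slides
print(skip_distance(5, prefix_table))  # 2
print(skip_distance(3, prefix_table))  # 2
print(skip_distance(1, prefix_table))  # 1
```

A result of 1 means there is no gain over the naïve approach for that step.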

Page 9: Kmp pattern search explained simple

Searching

Text:    b a c b a b a b a a b c b a b
Pattern: a b a b a b c a

Text:    b a c b a b a b a a b c b a b
Pattern:   a b a b a b c a

This is a partial match length of 1. The value at prefix-table[partial_match_length - 1] (or prefix-table[0]) is 0, so we don’t get to skip ahead at all.

Page 10: Kmp pattern search explained simple

Text:    b a c b a b a b a a b c b a b
Pattern:     a b a b a b c a

Text:    b a c b a b a b a a b c b a b
Pattern:       a b a b a b c a

Page 11: Kmp pattern search explained simple

Text:    b a c b a b a b a a b c b a b
Pattern:         a b a b a b c a

In the naïve approach, we shift right by one and compare again:

Step 1

Text:    b a c b a b a b a a b c b a b
Pattern:           a b a b a b c a

Step 2

Text:    b a c b a b a b a a b c b a b
Pattern:             a b a b a b c a

Page 12: Kmp pattern search explained simple

But in the KMP approach, we can directly skip Step 1.

Text:    b a c b a b a b a a b c b a b
Pattern:         a b a b a b c a

This is a partial match length of 5. The value at prefix-table[partial_match_length - 1] (or prefix-table[4]) is 3.

That means we get to skip ahead partial_match_length - prefix-table[partial_match_length - 1] (or 5 - table[4] = 5 - 3 = 2) characters:

Text:    b a c b a b a b a a b c b a b
Pattern:         X X a b a b a b c a

We skip the alignment that starts at “b”. The next comparison starts at the next “aba”, i.e. the prefix that is already known to match.

Page 13: Kmp pattern search explained simple

In KMP we can directly skip comparing “ab”.

Text:    b a c b a b a b a a b c b a b
Pattern:             a b a b a b c a

This is a partial match length of 3. The value at prefix-table[partial_match_length - 1] (or prefix-table[2]) is 1.

That means we get to skip ahead partial_match_length - prefix-table[partial_match_length - 1] (or 3 - table[2] = 3 - 1 = 2) characters:

Text:    b a c b a b a b a a b c b a b
Pattern:             X X a b a b a b c a

We skip the alignment that starts at “b”. The next comparison starts at the next “a”, i.e. the prefix match.
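The walkthrough on the last few slides can be replayed with a short sketch that records each position where the pattern is placed and how many characters matched there. The names trace_alignments and build_prefix_table are assumptions, as is moving ahead by one character when nothing matches.

```python
def build_prefix_table(pattern: str) -> list[int]:
    """Standard prefix-table construction (assumed, not from the slides)."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def trace_alignments(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start position, partial match length) for each placement tried."""
    table = build_prefix_table(pattern)
    n, m = len(text), len(pattern)
    trace = []
    start, matched = 0, 0
    while start + m <= n:
        # Extend the match; characters before `matched` are already known to match.
        while matched < m and text[start + matched] == pattern[matched]:
            matched += 1
        trace.append((start, matched))
        if matched == 0:
            start += 1  # nothing matched: move ahead by one
        else:
            start += matched - table[matched - 1]  # the skip rule from the slides
            matched = table[matched - 1]           # prefix that needs no re-check
    return trace


print(trace_alignments("bacbababaabcbab", "abababca"))
# [(0, 0), (1, 1), (2, 0), (3, 0), (4, 5), (6, 3)]
```

Position 5 never appears in the trace: it is the step that KMP skips. After the last skip the pattern no longer fits inside the remaining text, so the search ends without a match.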

Page 14: Kmp pattern search explained simple

Complexity

O(m): time to compute the prefix-table values.

O(n): time to compare the pattern against the text.

Total: O(n + m) run time.
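To make the linear bound concrete, here is a sketch of a complete search that also counts character comparisons. The names kmp_search and build_prefix_table and the sample text are assumptions; the loop is a standard formulation of KMP rather than the slides' own code.

```python
def build_prefix_table(pattern: str) -> list[int]:
    """O(m) pre-processing step."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (match positions, number of text/pattern character comparisons)."""
    if not pattern:
        return [], 0
    table = build_prefix_table(pattern)
    matches, comparisons = [], 0
    i = k = 0  # i: position in text, k: characters of the pattern matched so far
    while i < len(text):
        comparisons += 1
        if text[i] == pattern[k]:
            i += 1
            k += 1
            if k == len(pattern):
                matches.append(i - k)
                k = table[k - 1]  # keep going to find overlapping matches
        elif k > 0:
            k = table[k - 1]  # fall back in the pattern; the text position stays
        else:
            i += 1
    return matches, comparisons


print(kmp_search("abababcabababca", "abababca"))  # ([0, 7], 15)
```

Each loop iteration either advances in the text or shortens the current partial match, so the search phase makes at most 2n comparisons, which is consistent with the O(n) figure above.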