UMass Lowell Computer Science 91.503 Analysis of Algorithms Prof. Karen Daniels Fall, 2008
Tuesday, 12/2/08
String Matching Algorithms, Chapter 32
Chapter Dependencies
Ch 32 String Matching
Automata
You’re responsible for the material in Sections 32.1-32.4 of this chapter.
String Matching Algorithms
Motivation & Basics
String Matching Problem
source: 91.503 textbook Cormen et al.
Motivations: text editing, pattern matching in DNA sequences
Text: array $T[1..n]$; Pattern: array $P[1..m]$, where $m \le n$
Array element: a character from a finite alphabet $\Sigma$
Pattern $P$ occurs with shift $s$ in $T$ if $0 \le s \le n - m$ and $P[1..m] = T[s+1..s+m]$ (Figure 32.1)
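As an illustration of the definition, here is a minimal Python sketch of the occurrence test (the function name `occurs_with_shift` is hypothetical, not from the textbook). It assumes Python's 0-based strings, so the slide's $T[s+1..s+m]$ becomes the slice `T[s:s+m]`.

```python
def occurs_with_shift(T: str, P: str, s: int) -> bool:
    """Return True if pattern P occurs with shift s in text T.

    The slides index T[1..n] and P[1..m] from 1; with 0-based Python strings,
    T[s+1..s+m] corresponds to the slice T[s:s+m].
    """
    n, m = len(T), len(P)
    # A shift is only meaningful when 0 <= s <= n - m.
    if not 0 <= s <= n - m:
        return False
    return T[s:s + m] == P
```

For instance, `occurs_with_shift("abcabaabcabac", "abaa", 3)` is `True`, while shift 2 is not valid.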
String Matching Algorithms
Naive Algorithm: worst-case running time in $O((n-m+1)m)$
Rabin-Karp: worst-case running time in $O((n-m+1)m)$; better than this on average and in practice
Finite Automaton-Based: worst-case running time in $O(n + m|\Sigma|)$
Knuth-Morris-Pratt: worst-case running time in $O(n + m)$
Notation & Terminology
$\Sigma^*$ = set of all finite-length strings formed using characters from alphabet $\Sigma$
Empty string: $\varepsilon$
$|x|$ = length of string $x$
$w$ is a prefix of $x$: $w \sqsubset x$
$w$ is a suffix of $x$: $w \sqsupset x$
Prefix and suffix are transitive relations.
Examples: $ab \sqsubset abcca$; $cca \sqsupset abcca$
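The two relations map directly onto Python string methods; a minimal sketch, with hypothetical helper names `is_prefix` and `is_suffix`:

```python
def is_prefix(w: str, x: str) -> bool:
    """w is a prefix of x (w ⊏ x): x = wy for some string y."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """w is a suffix of x (w ⊐ x): x = yw for some string y."""
    return x.endswith(w)
```

Because every string starts and ends with the empty string, $\varepsilon$ is both a prefix and a suffix of every string, which these helpers reflect: `is_prefix("", "abc")` is `True`.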
Overlapping Suffix Lemma
source: 91.503 textbook Cormen et al.
Lemma 32.1 (illustrated in Figure 32.3)
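For reference, the statement of Lemma 32.1 as given in the Cormen et al. textbook, written in the notation above (a math sketch, not transcribed from the slide itself):

```latex
\text{Suppose } x \sqsupset z \text{ and } y \sqsupset z. \text{ Then }
\begin{cases}
|x| \le |y| \implies x \sqsupset y, \\
|x| \ge |y| \implies y \sqsupset x, \\
|x| = |y| \implies x = y.
\end{cases}
```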
String Matching Algorithms
Naive Algorithm
Naive String Matching
source: 91.503 textbook Cormen et al.
Worst-case running time is in $\Theta((n-m+1)m)$ (Figure 32.4)
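A minimal Python sketch of the naive matcher, assuming 0-based shifts (the slides use 1-based arrays; the function name is hypothetical). The outer loop tries all $n-m+1$ shifts and each inner comparison costs at most $m$ steps, which is where the $\Theta((n-m+1)m)$ bound comes from; for example, $T = a^n$ and $P = a^m$ forces every comparison to run to completion.

```python
def naive_string_matcher(T: str, P: str) -> list[int]:
    """Return every valid 0-based shift s such that T[s:s+m] == P."""
    n, m = len(T), len(P)
    shifts = []
    # Try all n - m + 1 candidate shifts (none if m > n).
    for s in range(n - m + 1):
        # Compare character by character, stopping at the first mismatch.
        j = 0
        while j < m and T[s + j] == P[j]:
            j += 1
        if j == m:
            shifts.append(s)
    return shifts
```

Usage: `naive_string_matcher("acaabc", "aab")` returns `[2]`.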
String Matching Algorithms
Rabin-Karp
Rabin-Karp Algorithm
source: 91.503 textbook Cormen et al.
Assume each character is a digit in radix-$d$ notation (e.g., $d = 10$).
$p$ = decimal value of the pattern; $t_s$ = decimal value of substring $T[s+1..s+m]$, for $s = 0, 1, \ldots, n-m$
Strategy:
compute $p$ in $O(m)$ time (which is in $O(n)$)
compute all $t_s$ values in a total of $O(n)$ time
find all valid shifts $s$ in $O(n)$ time by comparing $p$ with each $t_s$
Compute $p$ in $O(m)$ time using Horner’s rule: $p = P[m] + d(P[m-1] + d(P[m-2] + \cdots + d(P[2] + dP[1])\cdots))$
Compute $t_0$ similarly from $T[1..m]$ in $O(m)$ time.
Compute the remaining $t_s$ values in $O(n-m)$ time: $t_{s+1} = d(t_s - d^{m-1}T[s+1]) + T[s+m+1]$
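A minimal Python sketch of the two computations above, assuming each character has already been converted to an integer digit in $[0, d)$ and that $1 \le m \le n$ (the function names `horner` and `all_window_values` are hypothetical). It uses Python's unbounded integers, so no modulus is applied yet; this is exactly why the values can grow large.

```python
def horner(digits: list[int], d: int) -> int:
    """Value of digits[0..m-1] read as an m-digit radix-d number, in O(m) steps."""
    value = 0
    for digit in digits:
        value = d * value + digit
    return value


def all_window_values(T: list[int], m: int, d: int) -> list[int]:
    """Return t_0, ..., t_{n-m}: the radix-d value of every length-m window of T."""
    n = len(T)
    if not 1 <= m <= n:          # this sketch assumes 1 <= m <= n
        return []
    high = d ** (m - 1)          # d^{m-1}: weight of the high-order digit
    t = horner(T[:m], d)         # t_0 from T[1..m]
    values = [t]
    for s in range(n - m):
        # t_{s+1} = d(t_s - d^{m-1} T[s+1]) + T[s+m+1], rewritten for 0-based indices
        t = d * (t - high * T[s]) + T[s + m]
        values.append(t)
    return values
```

Usage: `all_window_values([2, 3, 1, 4, 1, 5], 5, 10)` returns `[23141, 31415]`; the second value is obtained by dropping the leading 2 and appending 5.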
Rabin-Karp Algorithm
source: 91.503 textbook Cormen et al.
$p$ and $t_s$ may be large, so work modulo $q$ (Figure 32.5)
Rabin-Karp Algorithm (continued)
Example: $p = 31415$. A window with $t_s \equiv p \pmod q$ is a hit; if also $t_s \ne p$, it is a spurious hit.
Update rule: $t_{s+1} = d(t_s - d^{m-1}T[s+1]) + T[s+m+1]$, evaluated mod $q$ using the precomputed value $d^{m-1} \bmod q$
source: 91.503 textbook Cormen et al.
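A worked instance of the update rule, assuming $d = 10$, $m = 5$ and, for illustration, $q = 13$: the window 31415 slides one digit to the right to become 14152, and the whole computation stays in residues mod 13. The last line shows how a different window can collide with $p$ mod $q$.

```latex
\begin{aligned}
31415 \bmod 13 &= 7, \qquad 10^{4} \bmod 13 = 3 \\
14152 &\equiv (31415 - 3 \cdot 10^{4}) \cdot 10 + 2 \pmod{13} \\
      &\equiv (7 - 3 \cdot 3) \cdot 10 + 2 = -18 \equiv 8 \pmod{13} \\
67399 \bmod 13 &= 7 = p \bmod 13, \text{ yet } 67399 \ne 31415 \quad (\text{spurious hit})
\end{aligned}
```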
Rabin-Karp Algorithm (continued)
source: 91.503 textbook Cormen et al.
Rabin-Karp Algorithm (continued)
source: 91.503 textbook Cormen et al.
Worst-case running time is in $\Theta((n-m+1)m)$
Annotations on the RABIN-KARP-MATCHER pseudocode:
$d$ is the radix; $q$ is the modulus
$d^{m-1} \bmod q$: value of the high-order digit position for an $m$-digit window
Preprocessing: $\Theta(m)$, which is in $O(n)$
Try all possible shifts: $\Theta((n-m+1)m)$ in the worst case
Matching loop invariant: when line 10 is executed, $t_s = T[s+1..s+m] \bmod q$
An explicit comparison rules out a spurious hit
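A minimal Python sketch of the matcher annotated above (an illustration, not the textbook's pseudocode verbatim). Assumptions: 0-based shifts; each character's code point `ord(c)` serves as its digit; the defaults `d = 256` and `q = 101` are illustrative choices, not values from the slides. The rolling update is algebraically valid even when a code point exceeds `d`, and every hash match is confirmed by a direct comparison, so the returned shifts are always correct; only the number of spurious hits depends on `d` and `q`.

```python
def rabin_karp_matcher(T: str, P: str, d: int = 256, q: int = 101) -> list[int]:
    """Return all 0-based valid shifts of P in T using a rolling hash mod q."""
    n, m = len(T), len(P)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)              # d^{m-1} mod q: high-order digit position
    p = t = 0
    for i in range(m):                # preprocessing via Horner's rule mod q: Theta(m)
        p = (d * p + ord(P[i])) % q
        t = (d * t + ord(T[i])) % q
    shifts = []
    for s in range(n - m + 1):        # try all possible shifts
        # Invariant: t == (value of window T[s:s+m]) mod q
        if p == t and T[s:s + m] == P:   # explicit check rules out spurious hits
            shifts.append(s)
        if s < n - m:
            # Drop T[s], append T[s+m]; Python's % keeps the result in [0, q).
            t = (d * (t - ord(T[s]) * h) + ord(T[s + m])) % q
    return shifts
```

Usage: `rabin_karp_matcher("2359023141526739921", "31415", d=10, q=13)` returns `[6]`. A tiny modulus such as `q=2` still gives correct shifts, just with many more explicit comparisons.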
Rabin-Karp Algorithm (continued)
source: 91.503 textbook Cormen et al.
Average-case running time is in $\Theta(n+m)$
Assume reducing mod $q$ acts like a random mapping from $\Sigma^*$ to $\mathbb{Z}_q$.
Estimate: the chance that $t_s \equiv p \pmod q$ is $1/q$, so the number of spurious hits is in $O(n/q)$.
Expected matching time = $O(n) + O(m(v + n/q))$, where $v$ = number of valid shifts. The $O(n)$ term covers preprocessing and $t_s$ updates; the $O(m(v + n/q))$ term covers explicit matching comparisons.
If $v$ is in $O(1)$ and $q \ge m$, the expected matching time is $O(n)$.
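The last step can be spelled out term by term, using $v = O(1)$, $q \ge m$ and $m \le n$:

```latex
\begin{aligned}
O(n) + O\!\left(m\left(v + \frac{n}{q}\right)\right)
  &= O(n) + O(mv) + O\!\left(\frac{m}{q}\, n\right) \\
  &= O(n) + O(m) + O(n) && (v = O(1),\ q \ge m \Rightarrow m/q \le 1) \\
  &= O(n) && (m \le n)
\end{aligned}
```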
String Matching Algorithms
Finite Automata
Finite Automata
source: 91.503 textbook Cormen et al.
Strategy: Build an automaton for the pattern, then examine each text character once.
Worst-case running time is in $\Theta(n)$ + automaton creation time (Figure 32.6)
Finite Automata
source: 91.503 textbook Cormen et al.
String-Matching Automaton
source: 91.503 textbook Cormen et al.
Pattern $P$ = ababaca
The automaton accepts strings ending in $P$ (Figure 32.7)
String-Matching Automaton
source: 91.503 textbook Cormen et al.
Suffix function for $P$: $\sigma(x)$ = length of the longest prefix of $P$ that is a suffix of $x$, i.e., $\sigma(x) = \max\{k : P_k \sqsupset x\}$, where $P_k = P[1..k]$ (32.3)
Automaton’s operational invariant at each step: it keeps track of the longest pattern prefix that is a suffix of what has been read so far, using the transition function $\delta(q, a) = \sigma(P_q a)$ (32.4)
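A brute-force Python sketch of $\sigma$ taken straight from the definition (hypothetical name `sigma`; Python's 0-based slicing makes `P[:k]` the prefix $P_k$). It is meant for checking small cases, not for efficiency.

```python
def sigma(x: str, P: str) -> int:
    """Suffix function for P: length of the longest prefix of P that is a suffix of x."""
    # Start from the longest possible candidate; P_0 = "" is a suffix of every x,
    # so the loop always stops by k = 0.
    k = min(len(P), len(x))
    while not x.endswith(P[:k]):
        k -= 1
    return k
```

For $P$ = ab: `sigma("", "ab")` is 0, `sigma("ccaca", "ab")` is 1, and `sigma("ccab", "ab")` is 2.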
String-Matching Automaton
source: 91.503 textbook Cormen et al.
Simulate the behavior of the string-matching automaton that finds occurrences of pattern $P$ of length $m$ in $T[1..n]$.
Worst-case running time of matching is in $\Theta(n)$, assuming the automaton has already been created...
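A minimal Python sketch, assuming the alphabet is given explicitly and that every text character belongs to it (otherwise the dictionary lookup raises `KeyError`). `compute_transition_function` follows the definition-based construction $\delta(q, a) = \sigma(P_q a)$, which is what gives the $O(m^3|\Sigma|)$ creation cost; `finite_automaton_matcher` is the $\Theta(n)$ simulation. The names, the $m \ge 1$ assumption, and the 0-based shifts belong to this sketch.

```python
def compute_transition_function(P: str, alphabet: str) -> list[dict[str, int]]:
    """Build delta[q][a] = sigma(P_q a) for q = 0..m by trying each k from the top down."""
    m = len(P)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            k = min(m, q + 1)
            # Shrink k until P_k is a suffix of P_q a; k = 0 (empty prefix) always works.
            while not (P[:q] + a).endswith(P[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def finite_automaton_matcher(T: str, delta: list[dict[str, int]], m: int) -> list[int]:
    """Simulate the automaton on T; report 0-based shifts where state m is reached."""
    q = 0
    shifts = []
    for i, a in enumerate(T):
        q = delta[q][a]          # one table lookup per text character
        if q == m:
            shifts.append(i - m + 1)
    return shifts
```

Usage: with `delta = compute_transition_function("ababaca", "abc")`, the call `finite_automaton_matcher("abababacaba", delta, 7)` returns `[2]`.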
String-Matching Automaton (continued)
source: 91.503 textbook Cormen et al.
Correctness of the matching procedure...
Theorem 32.4: if $\phi$ is the final-state function of the string-matching automaton for $P$ and $T[1..n]$ is the input text, then $\phi(T_i) = \sigma(T_i)$ for $i = 0, 1, \ldots, n$.
The proof uses Lemma 32.3: $\sigma(xa) = \sigma(P_{\sigma(x)}a)$, to be proved next…
String-Matching Automaton (continued)
source: 91.503 textbook Cormen et al.
Correctness of the matching procedure...
Lemma 32.2 (suffix-function inequality): for any string $x$ and character $a$, $\sigma(xa) \le \sigma(x) + 1$ (proof illustrated in Figure 32.8)
String-Matching Automaton (continued)
source: 91.503 textbook Cormen et al.
Correctness of the matching procedure...
Lemma 32.3 (suffix-function recursion lemma): for any string $x$ and character $a$, if $q = \sigma(x)$, then $\sigma(xa) = \sigma(P_q a)$ (proof illustrated in Figure 32.9; uses Lemmas 32.2 and 32.1)
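Both lemmas can be sanity-checked by brute force on small inputs. The Python sketch below (hypothetical names; a test harness, not a proof) enumerates every string over a small alphabet up to a fixed length and checks Lemma 32.2 and Lemma 32.3 for one pattern.

```python
from itertools import product


def sigma(x: str, P: str) -> int:
    """Length of the longest prefix of P that is a suffix of x (brute force)."""
    k = min(len(P), len(x))
    while not x.endswith(P[:k]):
        k -= 1
    return k


def check_suffix_lemmas(P: str, alphabet: str, max_len: int) -> bool:
    """Check Lemmas 32.2 and 32.3 for every string x over alphabet with |x| <= max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            x = "".join(chars)
            q = sigma(x, P)
            for a in alphabet:
                s_xa = sigma(x + a, P)
                if s_xa > q + 1:                    # Lemma 32.2 would be violated
                    return False
                if s_xa != sigma(P[:q] + a, P):     # Lemma 32.3 would be violated
                    return False
    return True
```

Usage: `check_suffix_lemmas("ababaca", "abc", 6)` examines all 1093 strings of length at most 6 over {a, b, c}.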
String-Matching Automaton (continued)
source: 91.503 textbook Cormen et al.
Correctness of the matching procedure...
Theorem 32.4 now follows from Lemma 32.3: $\sigma(xa) = \sigma(P_{\sigma(x)}a)$
String-Matching Automaton (continued)
source: 91.503 textbook Cormen et al.
Worst-case running time of automaton creation is in $O(m^3|\Sigma|)$
Worst-case running time of the entire string-matching strategy is in $O(m^3|\Sigma|) + \Theta(n)$ (automaton creation time + pattern matching time)
Automaton creation can be improved to $O(m|\Sigma|)$
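One known way to reach $O(m|\Sigma|)$, suggested as an exercise in the textbook, uses the KMP prefix function $\pi$: $\delta(q, a) = q + 1$ when the next pattern character is $a$, and otherwise $\delta(q, a) = \delta(\pi[q], a)$, with $\delta(0, a) = 0$. A minimal Python sketch of that idea, with hypothetical names and `pi` stored 1-indexed (`pi[0]` unused):

```python
def prefix_function(P: str) -> list[int]:
    """pi[q] for q = 1..m, stored 1-indexed; pi[0] is unused and left as 0."""
    m = len(P)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and P[k] != P[q - 1]:
            k = pi[k]
        if P[k] == P[q - 1]:
            k += 1
        pi[q] = k
    return pi


def fast_transition_function(P: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = sigma(P_q a) for q = 0..m in O(m |alphabet|) time after pi."""
    m = len(P)
    pi = prefix_function(P)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            if q < m and P[q] == a:
                delta[q][a] = q + 1              # next pattern character matches
            elif q == 0:
                delta[q][a] = 0                  # nothing shorter to fall back to
            else:
                delta[q][a] = delta[pi[q]][a]    # reuse row pi[q], already filled
    return delta
```

Each row reuses an already-filled row because $\pi[q] < q$, so the table costs $O(m|\Sigma|)$ after the $\Theta(m)$ prefix-function computation.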
String Matching Algorithms
Knuth-Morris-Pratt
Knuth-Morris-Pratt Overview
Achieve $\Theta(n+m)$ time by shortening automaton preprocessing time below $\Theta(m|\Sigma|)$
Approach:
don’t precompute the automaton’s transition function
calculate enough transition data “on the fly”
obtain the data via “alphabet-independent” pattern preprocessing
pattern preprocessing compares the pattern against shifts of itself
Knuth-Morris-Pratt Algorithm
source: 91.503 textbook Cormen et al.
Determine how the pattern matches against itself (Figure 32.10)
Knuth-Morris-Pratt Algorithm
source: 91.503 textbook Cormen et al.
The prefix function $\pi$ shows how the pattern matches against itself.
Equivalently, what is the largest $k < q$ such that $P_k \sqsupset P_q$?
$\pi[q] = \max\{k : k < q \text{ and } P_k \sqsupset P_q\}$
$\pi[q]$ is the length of the longest prefix of $P$ that is a proper suffix of $P_q$.
Example:
32.5
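A brute-force Python sketch that evaluates $\pi$ directly from its definition (hypothetical name; far slower than the real algorithm, so only for checking). For the pattern ababaca used on the automaton slides it gives $\pi[1..7] = 0, 0, 1, 2, 3, 0, 1$.

```python
def prefix_function_by_definition(P: str) -> list[int]:
    """Return [pi[1], ..., pi[m]] with pi[q] = max{k : k < q and P_k is a suffix of P_q}."""
    result = []
    for q in range(1, len(P) + 1):
        Pq = P[:q]
        # k = 0 (the empty prefix) always qualifies, so max() never sees an empty sequence.
        result.append(max(k for k in range(q) if Pq.endswith(P[:k])))
    return result
```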
Knuth-Morris-Pratt Algorithm
source: 91.503 textbook Cormen et al.
Annotations on the KMP-MATCHER pseudocode:
Prefix-function computation: $\Theta(m)$, which is in $O(n)$, using amortized analysis
Number of characters matched
Scan the text left to right
Next character does not match
Next character matches
Is all of $P$ matched?
Look for the next match
Matching loop: $\Theta(n)$; total: $\Theta(m+n)$, using amortized analysis
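A Python sketch of the two procedures annotated above, assuming 0-based text positions and a 1-indexed `pi` array (`pi[0]` unused) so that `pi[q]` matches the slide's $\pi[q]$. The names echo the textbook's procedure names but are written in Python style; this is an illustration, not the textbook's code.

```python
def compute_prefix_function(P: str) -> list[int]:
    """pi[q] for q = 1..m, stored 1-indexed (pi[0] unused)."""
    m = len(P)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and P[k] != P[q - 1]:
            k = pi[k]
        if P[k] == P[q - 1]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(T: str, P: str) -> list[int]:
    """Return all 0-based valid shifts of P in T."""
    n, m = len(T), len(P)
    if m == 0:
        return []
    pi = compute_prefix_function(P)
    q = 0                              # number of characters matched
    shifts = []
    for i in range(n):                 # scan the text from left to right
        while q > 0 and P[q] != T[i]:
            q = pi[q]                  # next character does not match
        if P[q] == T[i]:
            q += 1                     # next character matches
        if q == m:                     # is all of P matched?
            shifts.append(i - m + 1)
            q = pi[q]                  # look for the next match
    return shifts
```

Usage: `kmp_matcher("abababacaba", "ababaca")` returns `[2]`, the same answer as the automaton, without ever building a $|\Sigma|$-wide table.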
Knuth-Morris-Pratt Algorithm
Amortized Analysis of the prefix-function computation (Potential Method)
source: 91.503 textbook Cormen et al.
Potential = $k$, where $k$ = current state of the algorithm
Initial potential value: $k = 0$
The potential decreases in each iteration of the while loop, since $\pi[k] < k$
The potential is never negative, since $\pi[k] \ge 0$ for all $k$
The potential increases by at most 1 in each execution of the for loop body
So the amortized cost of the loop body is in $O(1)$; with $\Theta(m)$ loop iterations, the total is $\Theta(m)$, which is in $O(n)$
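The potential argument can be observed directly by counting how often the inner `while` loop runs. In this Python sketch (hypothetical name `prefix_function_with_count`), each fallback lowers $k$ by at least 1, while $k$ rises by at most 1 per `for` iteration and never goes below 0, so for $m \ge 1$ the count can never exceed $m - 1$.

```python
def prefix_function_with_count(P: str) -> tuple[list[int], int]:
    """Prefix function (1-indexed, pi[0] unused) plus the number of while-loop fallbacks."""
    m = len(P)
    pi = [0] * (m + 1)
    k = 0                      # potential: starts at 0 and is never negative
    fallbacks = 0
    for q in range(2, m + 1):
        while k > 0 and P[k] != P[q - 1]:
            k = pi[k]          # pi[k] < k, so each fallback lowers the potential
            fallbacks += 1
        if P[k] == P[q - 1]:
            k += 1             # the only increase: at most 1 per for-loop iteration
        pi[q] = k
    return pi, fallbacks
```

Usage: for ababaca the inner loop runs only twice in total (both times while computing $\pi[6]$).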
Knuth-Morris-Pratt Algorithm
source: 91.503 textbook Cormen et al.
Correctness...
Knuth-Morris-Pratt Algorithm
source: 91.503 textbook Cormen et al.
Correctness...
Textbook references: Lemma 32.5, Lemma 32.1, Lemma 32.6
Knuth-Morris-Pratt Algorithm
source: 91.503 textbook Cormen et al.
Correctness...
Lemma 32.5, illustrated in Figure 32.11
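Lemma 32.5 says that iterating $\pi$ starting from $q$ enumerates exactly the lengths $k < q$ with $P_k \sqsupset P_q$. A small Python check (hypothetical names; 1-indexed `pi`) for the pattern ababaca: with $q = 5$, the chain is $\pi[5] = 3$, $\pi[3] = 1$, $\pi[1] = 0$.

```python
def compute_prefix_function(P: str) -> list[int]:
    """pi[q] for q = 1..m, stored 1-indexed (pi[0] unused)."""
    m = len(P)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and P[k] != P[q - 1]:
            k = pi[k]
        if P[k] == P[q - 1]:
            k += 1
        pi[q] = k
    return pi


def pi_star(pi: list[int], q: int) -> list[int]:
    """Return pi[q], pi[pi[q]], ..., ending with 0 (the iterated prefix function)."""
    chain = [pi[q]]
    while chain[-1] > 0:
        chain.append(pi[chain[-1]])
    return chain
```

Comparing `set(pi_star(pi, q))` with `{k for k in range(q) if P[:q].endswith(P[:k])}` for every `q` from 1 to $m$ checks the lemma on a concrete pattern.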
Knuth-Morris-Pratt Algorithm
source: 91.503 textbook Cormen et al.
Correctness...
Textbook references: Lemma 32.5, Lemma 32.6, Corollary 32.7
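For reference, the statements these correctness slides rely on, written as they appear in the Cormen et al. textbook (numbering may differ between editions). Here $\pi^*[q]$ is the chain obtained by iterating $\pi$ from $q$ down to 0.

```latex
\begin{aligned}
&\pi^*[q] = \{\pi[q], \pi^{(2)}[q], \ldots, \pi^{(t)}[q]\}, \quad
  \pi^{(i)}[q] = \pi[\pi^{(i-1)}[q]],\ \pi^{(t)}[q] = 0 \\
&\text{Lemma 32.5: } \pi^*[q] = \{k : k < q \text{ and } P_k \sqsupset P_q\} \\
&\text{Lemma 32.6: } \pi[q] > 0 \implies \pi[q] - 1 \in \pi^*[q-1] \\
&E_{q-1} = \{k \in \pi^*[q-1] : P[k+1] = P[q]\} \\
&\text{Corollary 32.7: } \pi[q] =
  \begin{cases}
    0 & \text{if } E_{q-1} = \emptyset, \\
    1 + \max\{k \in E_{q-1}\} & \text{otherwise,}
  \end{cases}
  \qquad q = 2, 3, \ldots, m
\end{aligned}
```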