Scaled Pattern Matching
Amihood Amir, Ayelet Butman, Moshe Lewenstein
Bar-Ilan University and Johns Hopkins University
Motivation
Searching for Templates in Aerial Photographs
Input: Aerial photo, Template
Task: Search for all locations where the template appears in the image
Model
• Low level (pixel level): avoid costly processing
• Asymptotically efficient solutions
• Serial exact algorithms
Types of Approximations
Local errors: Level of detail, Occlusion, Noise
Results:
O(n² log m): mismatches
O(n²k²): edit distance, k errors, rectangular patterns
O(n²k √(m log m) √(k log k)): edit distance, k errors, half-rectangular patterns
AL-88, AF-95
Types of Approximation
Orientation
Results:
O(n²m ) FU-98
O(n²m³) ACL-98
Scaling
Natural scales, results:
O(n) 1-d, EV-88
O(n² log |Σ|) 2-d, ALV-92
O(n²) dictionary, AC-96
Real scales, this result:
O(n) 1-d, truncation
It seems daunting, but…
CPM 2003, Morelia, Mexico
Problem: inherently inexact
What if the occurrence is 1½ times bigger?
What is the meaning of “½ a pixel”?
Solutions until now: Natural Scales. Consider only discrete scales 1, 2, 3, 4, 5
Definition
Text: n × n. Pattern: m × m.
Find all occurrences of the pattern in the text, in all discrete sizes
Discrete exact Scaled Matching
T =
A A A A A A A A A A A A A
A A A A A A A A A A C C A
A A A C C A A A A A C C A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A
P =
A A A
A C A
A A A
Discrete exact Scaled Matching
P =
Z U Y
K V S
X E T
P³ =
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
K K K V V V S S S
K K K V V V S S S
K K K V V V S S S
X X X E E E T T T
X X X E E E T T T
X X X E E E T T T
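Idea: Fix a scale s
Constant amount of work for each square (s-block)
(Figure: the n × n text divided into s × s blocks, n/s blocks per side.)
Algorithm time
Time for scale s: $O(n^2/s^2)$
Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$
The sum $\sum_{s} 1/s^2$ converges to a constant,
making the total time O(n²)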
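A quick numerical sanity check of the bound above can be sketched as follows. The function name `total_block_work` is hypothetical, and charging exactly one unit of work per s-block is an assumption made only for illustration.

```python
def total_block_work(n, m):
    """Sum of (n/s)^2 over the natural scales s = 1 .. n // m.

    Assumes one unit of work per s-block, so scale s costs (n/s)^2.
    """
    return sum((n / s) ** 2 for s in range(1, n // m + 1))


if __name__ == "__main__":
    n, m = 1024, 4
    work = total_block_work(n, m)
    # The ratio stays below pi^2 / 6 < 2 however many scales are summed.
    print(work / n ** 2)
```

Because the series of 1/s² is bounded by π²/6, the ratio printed stays below 2 for any n and m.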
Problem: Real scales
Was open even for strings…
How do we define it?
aabcccbb
Scaled to 2: aaaabbccccccbbbb
Scaled to 1½: aaab cccc bbb (truncate the ½ b and the ½ c)
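The truncation rule in this example can be sketched in a few lines of Python. The helper name `scale_string` is hypothetical; it scales every run of equal symbols by α and truncates the result to an integer, which is how the slide treats the ½ b and ½ c.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale_string(s, alpha):
    """Scale each run of equal symbols in s by alpha, truncating to an integer."""
    runs = [(ch, len(list(group))) for ch, group in groupby(s)]
    return "".join(ch * floor(alpha * r) for ch, r in runs)


if __name__ == "__main__":
    print(scale_string("aabcccbb", 2))
    print(scale_string("aabcccbb", Fraction(3, 2)))
```

Using `Fraction` for α keeps the arithmetic exact, so the truncation never depends on floating-point rounding.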
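Formally
Text T, $|T| = n$
Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$
P occurs scaled to α where the text contains $a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$ with $c_1 \ge \lfloor \alpha r_1 \rfloor$ and $c_j \ge \lfloor \alpha r_j \rfloor$
Denote $a^r = aa \cdots a$ (r times)
Problem Definition 1
Input: Pattern P, Text T
Output: All text locations where P appears scaled to α, for some α ≥ 1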
Remark
α ≥ 1 means we only scale “up”
Reasons:
Avoid the conceptual problem of loss of resolution: from “far enough” away, everything looks the same
By our definition, for a scale k < 1/m there is a match at every text location (every scaled run length truncates to 0)
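Simplify definition
Example: $a^2 b^1 c^3 b^4$ scaled to 3/2 is $a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor} = a^3 b^1 c^4 b^6$
Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text, where every run, including the first and the last, has exactly the scaled length
Example: P = aabcccbbbb
Match by definition 2: daaabccccbbbbbbe
Match by definition 1 but not by definition 2: daaaabccccbbbbbbbe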
Why are definitions equivalent?
Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL)
Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
Time
Time for split: O(n+m)
Finding Ps in Ts: O(n+m) (e.g., KMP)
HARD PART: Finding PL in TL
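The split into a symbol part and a length part is ordinary run-length encoding. A minimal sketch, with the hypothetical helper name `split_runs`, reproduces the Ps/PL and Ts/TL values of the example in one linear pass:

```python
from itertools import groupby


def split_runs(s):
    """Return (symbol part, length part) of the run-length encoding of s."""
    symbols, lengths = [], []
    for ch, group in groupby(s):
        symbols.append(ch)
        lengths.append(sum(1 for _ in group))
    return "".join(symbols), lengths


if __name__ == "__main__":
    print(split_runs("aabcccbbbb"))
    print(split_runs("daaabccccbbbbbbe"))
```

The symbol parts can then be searched with any exact string matcher, which leaves only the length parts to handle.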
Definitions are Equivalent
Claim: Solving def. 2 in time O(f(n)) ⇒ solving def. 1 in time O(f(n))
Why?
- Find $a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor}$ in time O(f(n))
- For each match, verify the 1st and last symbol in constant time in Ts and TL
Total time: O(f(n)+n) = O(f(n))
Naïve algorithm for matching PL in TL
For each text location, position the pattern starting at that location and calculate the interval [t/p, (t+1)/p) for each resulting <text, pattern> pair (t, p)
This is the interval of possible scales, since
⌊(t/p)·p⌋ = t, and for every α < t/p, ⌊αp⌋ < t
⌊((t+1)/p)·p⌋ = t+1, and for every α ≥ (t+1)/p, ⌊αp⌋ > t
Check intersection: if the intersection of all intervals is not empty, then there is a match
Time: O(nm)
Example
PL = 2 1 2 3 2
TL = 2 4 2 4 7 4 5 3
Location 1: [1, 3/2), [4, 5)
The intersection is empty, thus no scaled match in location 1. But…
Location 2: [2, 5/2), [2, 3), [2, 5/2), [7/3, 8/3), [2, 5/2)
The intersection is [7/3, 5/2), thus there is a scaled match in location 2
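The naïve O(nm) procedure can be sketched as below. The function name `naive_scaled_match` is hypothetical, locations are reported 0-indexed (so the slide's location 2 is index 1), and the lower end of the intersection starts at 1 to reflect the α ≥ 1 restriction. Exact fractions avoid rounding issues at the interval endpoints.

```python
from fractions import Fraction


def naive_scaled_match(PL, TL):
    """Return (index, lo, hi) for every index where some alpha in [lo, hi) works."""
    matches = []
    for i in range(len(TL) - len(PL) + 1):
        lo, hi = Fraction(1), None          # alpha >= 1 by definition
        for p, t in zip(PL, TL[i:]):
            a, b = Fraction(t, p), Fraction(t + 1, p)   # floor(alpha * p) == t
            lo = max(lo, a)
            hi = b if hi is None else min(hi, b)
            if lo >= hi:                    # intersection became empty
                break
        else:
            matches.append((i, lo, hi))
    return matches


if __name__ == "__main__":
    print(naive_scaled_match([2, 1, 2, 3, 2], [2, 4, 2, 4, 7, 4, 5, 3]))
```

On the slide's example this reports a single match at index 1 with the interval [7/3, 5/2).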
Improvement: Parameterized Matching
Introduced by Baker, 1994
Motivation: “copying” code
Parameterized Matching
Input: two strings s and t, |s| = |t|, over alphabets Σs and Σt
s parameterize matches t if there exists a bijection Π: Σs → Σt such that Π(s) = t
Example:
s = a a b b b
t = x x y y y
Π(a) = x, Π(b) = y
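The definition can be checked directly for two equal-length strings by building the mapping symbol by symbol. This is only a sketch of the definition, with the hypothetical name `p_match`; it is not the linear-time text-searching algorithm cited on the next slide.

```python
def p_match(s, t):
    """True if a one-to-one renaming of the symbols of s turns s into t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, x in zip(s, t):
        # setdefault records the first pairing and returns it on later visits.
        if forward.setdefault(a, x) != x or backward.setdefault(x, a) != a:
            return False
    return True


if __name__ == "__main__":
    print(p_match("aabbb", "xxyyy"))
    print(p_match("aabbb", "xxyyx"))
```

Keeping both a forward and a backward map is what enforces that the renaming is one-to-one rather than merely a function.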
Parameterized Matching
Claim (AFM-94):
For Σ that can be sorted in linear time (e.g., Σ = {1, …, n}), parameterized matching can be done in time O(n)
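The reduction
Lemma: There exists α for which PL matches TL at location i scaled to α only if PL p-matches TL at i
Proof: Assume PL does not p-match TL at location i
The possible situations are:
Possibility 1: the same number b appears twice in PL, against two different numbers a and c ≠ a in TL
PL: … b … b …
TL: … a … c …
W.l.o.g. c ≥ a+1
For c = a+1 (smallest possible):
[a/b, (a+1)/b) ∩ [(a+1)/b, (a+2)/b) = ∅
Possibility 2: two different numbers b and c ≠ b in PL appear against the same number a in TL
PL: … b … c …
TL: … a … a …
W.l.o.g. c ≥ b+1
For c = b+1 the intervals are [a/b, (a+1)/b) and [a/(b+1), (a+1)/(b+1))
Intersection not empty only if
(a+1)/(b+1) > a/b, i.e.,
ab + b > ab + a
b > a
But this can never happen if α ≥ 1, since then a = ⌊αb⌋ ≥ b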
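The lemma amounts to saying that, for α ≥ 1, the map p ↦ ⌊αp⌋ sends different run lengths to different values. A small brute-force check of that property, under the assumption that testing a finite range of lengths and rational scales is a useful illustration (the name `floor_scaling_is_injective` is hypothetical):

```python
from fractions import Fraction
from math import floor


def floor_scaling_is_injective(alpha, max_len):
    """True if floor(alpha * p) is strictly increasing for p = 1 .. max_len."""
    images = [floor(alpha * p) for p in range(1, max_len + 1)]
    return all(a < b for a, b in zip(images, images[1:]))


if __name__ == "__main__":
    # Scales 7/7, 8/7, ..., 39/7 are all >= 1.
    print(all(floor_scaling_is_injective(Fraction(k, 7), 50) for k in range(7, 40)))
    # A scale below 1 can merge different lengths: floor(1/2 * 2) == floor(1/2 * 3).
    print(floor_scaling_is_injective(Fraction(1, 2), 10))
```

This is a check on sample values, not a proof; the proof is the two-case argument above.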
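Algorithm for Real Scaled String Matching
Let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL
1. P-match PL in TL
2. For each match, check the intersection of the intervals between $P_{i_1}, \dots, P_{i_j}$ and the corresponding symbols in TL
End Algorithm
Example
PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$
TL = 5 6 5 6 5 6 10 6 10 6 10 7
p-match at location 1 (5 6 5 6 5): [5/2, 3) ∩ [2, 7/3) = ∅, so no scaled match
p-match at location 6 (6 10 6 10 6): [3, 7/2) ∩ [10/3, 11/3) = [10/3, 7/2), so a scaled match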
Important Fact
$\sum_{k=1}^{j} P_{i_k} \le m$
So there are at most O(√m) different $P_{i_k}$'s
Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), O(√m) verification for each location
Total: O(n√m)
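The step from the sum bound to the O(√m) bound can be written out as a short derivation. It assumes only that the distinct numbers in PL are distinct positive integers, so their sum is at least 1 + 2 + … + j:

```latex
\frac{j(j+1)}{2} \;=\; \sum_{k=1}^{j} k \;\le\; \sum_{k=1}^{j} P_{i_k} \;\le\; m
\quad\Longrightarrow\quad j^2 < 2m
\quad\Longrightarrow\quad j < \sqrt{2m} = O(\sqrt{m}).
```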
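Tighter analysis
Upper bound on the number of possible p-matches
Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL
Then there are at most 2n/j p-matches of PL in TL
Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is
O((2n/j) · j) = O(n)
Proof of Lemma
Consider the 1st appearance of each of $P_{i_1}, \dots, P_{i_j}$ in PL and the text numbers $a_1, \dots, a_j$ aligned with them in a p-match
PL: $P_{i_1}\ P_{i_2}\ \dots\ P_{i_j}$
TL: $a_1\ a_2\ \dots\ a_j$
The $a_k$ are distinct positive integers, so $\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$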
Lemma's proof (cont.)
Let x be the total number of p-matches in the text
The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is
≥ (x·j²)/2
But: there are overlaps. How many?
Lemma's proof (cont.)
For each text location, at most j matches will count it. Therefore…
Total count without overlaps ≥ $\frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$
The elements of TL sum to n, so clearly x·j/2 ≤ n, thus x ≤ 2n/j
Open Problem
Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern
It seems daunting buthellip
CPM 2003 Morelia Mexico
Problem inherently inexact
What if occurrence is 1frac12 times bigger
What is the meaning of ldquofrac12 a pixelrdquo
Solutions until now Natural Scales - Consider only discrete scales 1 2 3 4 5
DefinitionText Pattern
Find all occurrences of the pattern in the text in all discrete sizes
m
m
n
n
Discrete exact Scaled MatchingT PA A A A A A A A A A A A A A A A
A A A A A A A A A A C C A A C A
A A A C C A A A A A C C A A A A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A
Discrete exact Scaled Matching
P Z U Y K V S X E T
Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T
Idea Fix a scale s
Constant amount of work for each square (s-block)
s
s
nns
Algorithm time
Time for scale s
Total time
converges to a constant
Making the total time O(nsup2)
sn2
2
mn
mn
ss ssn n
1
2
122
2
1
Problem Real scales
Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c
Formally
nTT ||
mPrrrP aaa j
j ||21
21
aaaa crrc jjjj
121
121
1
rcrc jj
11
Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where
appears for some
r timesr
Remark
α ≥ 1 means we only scale "up".
Reasons:
Avoid the conceptual problem of loss of resolution.
From "far enough" away everything looks the same.
By our definition, for a scale k < 1/m every run length truncates to 0, so there is a match at every text location.
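Simplify definition
Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text, where the first and last runs must also have exactly the scaled length.
Example: P = aabcccbbbb = $a^2 b^1 c^3 b^4$; for $\alpha = 3/2$ the scaled pattern is $a^3 b^1 c^4 b^6$.
Match by definition 2: daaabccccbbbbbbe
Match by definition 1 but not by definition 2: daaaabccccbbbbbbbe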
Why are the definitions equivalent?
Split the text and the pattern into a symbol part (Ts, Ps) and a length part (TL, PL).
Example:
P = aabcccbbbb: Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe: Ts = dabcbe, TL = 1 3 1 4 6 1
Time
Time for the split: O(n+m)
Finding Ps in Ts: O(n+m) (e.g. KMP)
HARD PART: Finding PL in TL
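The split into a symbol part and a length part is ordinary run-length encoding. A minimal sketch follows; the helper name `split_runs` is an assumption made for this example.

```python
from itertools import groupby

def split_runs(s):
    """Return (symbol part, length part) of the run-length encoding of s."""
    symbols, lengths = [], []
    for symbol, run in groupby(s):
        symbols.append(symbol)
        lengths.append(len(list(run)))
    return "".join(symbols), lengths

print(split_runs("aabcccbbbb"))        # ('abcb', [2, 1, 3, 4])
print(split_runs("daaabccccbbbbbbe"))  # ('dabcbe', [1, 3, 1, 4, 6, 1])
```

The split is a single pass over the input, in line with the O(n+m) bound stated above.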
Definitions are Equivalent
Claim: Solving def. 2 in time O(f(n)) implies solving def. 1 in time O(f(n)).
Why?
- Find $a_2^{r_2} \cdots a_{j-1}^{r_{j-1}}$ in time O(f(n)).
- For each match, verify the 1st and last symbol in constant time in Ts and TL.
Total time: O(f(n)+n) = O(f(n))
Naïve algorithm for matching PL in TL
For each text location, position the pattern starting at that location and calculate the interval [t/p, (t+1)/p) for each resulting <text, pattern> pair (t, p).
This is the interval of possible scales, since
$\lfloor (t/p) \cdot p \rfloor = t$, and for every $\alpha < t/p$, $\lfloor \alpha p \rfloor < t$;
$\lfloor ((t+1)/p) \cdot p \rfloor = t+1$, and for every $\alpha \ge (t+1)/p$, $\lfloor \alpha p \rfloor > t$.
Check intersection
If the intersection of all intervals is not empty, then there is a match.
Time: O(nm)
Example
PL = 2 1 2 3 2
TL = 2 4 2 4 7 4 5 3
Location 1: [1, 3/2), [4, 5)
The intersection is empty, thus there is no scaled match at location 1. But…
Location 2: [2, 5/2), [2, 3), [2, 5/2), [7/3, 8/3), [2, 5/2)
The intersection is [7/3, 5/2), thus there is a scaled match at location 2.
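The naïve O(nm) interval-intersection procedure can be sketched as follows. This is an illustrative version under a few assumptions: locations are reported 0-based, PL is non-empty, the scale is restricted to α ≥ 1, and the function name `naive_scaled_match` is hypothetical.

```python
from fractions import Fraction

def naive_scaled_match(PL, TL):
    """Return (location, lo, hi) for every location whose scale interval [lo, hi) is non-empty."""
    matches = []
    for i in range(len(TL) - len(PL) + 1):
        lo, hi = Fraction(1), None  # start from alpha >= 1
        for p, t in zip(PL, TL[i:i + len(PL)]):
            lo = max(lo, Fraction(t, p))      # alpha >= t/p
            upper = Fraction(t + 1, p)        # alpha < (t+1)/p
            hi = upper if hi is None else min(hi, upper)
        if lo < hi:
            matches.append((i, lo, hi))
    return matches

PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(naive_scaled_match(PL, TL))  # one match, at 0-based location 1
```

On the example from the slide, the only match is at 0-based location 1 (location 2 when counting from 1), with scale interval [7/3, 5/2).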
Improvement - Parameterized Matching
Introduced by Baker, 1994
Motivation: "copying" code
Parameterized Matching
Input: two strings s and t, |s| = |t|, over alphabets Σs and Σt.
s parameterize-matches t if there exists a bijection Π: Σs → Σt such that Π(s) = t.
Example
s = a a b b b
t = x x y y y
Π(a) = x, Π(b) = y
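A direct check of the parameterized-match condition for two equal-length strings might look like this. It is a plain sketch that builds the bijection symbol by symbol; it is not the linear-time text-searching algorithm cited in the talk, and the name `p_match` is an assumption.

```python
def p_match(s, t):
    """True if some bijection between the symbols of s and t maps s onto t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, x in zip(s, t):
        # setdefault records the first pairing and returns the stored one afterwards
        if forward.setdefault(a, x) != x or backward.setdefault(x, a) != a:
            return False
    return True

print(p_match("aabbb", "xxyyy"))  # True: a -> x, b -> y
print(p_match("aabbb", "xxyyx"))  # False: b would map to both y and x
```

Two dictionaries are kept so that the mapping is checked in both directions, which is what makes it a bijection rather than just a function.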
Parameterized Matching
Claim (AFM-94):
For Σ that can be sorted in linear time (e.g. Σ = {1, …, n}),
parameterized matching can be done in time O(n).
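The reduction
Lemma: There exists α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.
Proof: Assume PL does not p-match TL at location i.
The possible situations are:
Possibility 1: two equal pattern lengths b, b are aligned with two different text lengths a and c ≠ a. W.l.o.g. c ≥ a+1.
For c = a+1 (the smallest possible) the two intervals are [a/b, (a+1)/b) and [(a+1)/b, (a+2)/b), so their intersection is empty.
Possibility 2: two different pattern lengths b and c ≠ b are aligned with two equal text lengths a, a. W.l.o.g. c ≥ b+1.
For c = b+1 the two intervals are [a/b, (a+1)/b) and [a/(b+1), (a+1)/(b+1)).
The intersection is not empty only if
(a+1)/(b+1) > a/b, i.e.
ab + b > ab + a
b > a
But this can never happen if α ≥ 1, because then a = ⌊αb⌋ ≥ b.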
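The two cases of the proof can be sanity-checked by brute force over small lengths. The sketch below is an illustration rather than a proof; the helper names and the search limit are assumptions.

```python
from fractions import Fraction

def scale_interval(t, p):
    """Half-open interval [t/p, (t+1)/p) of scales alpha with floor(alpha * p) == t."""
    return Fraction(t, p), Fraction(t + 1, p)

def intersects(x, y):
    """True if two half-open intervals share at least one point."""
    return max(x[0], y[0]) < min(x[1], y[1])

def lemma_holds(limit=25):
    for b in range(1, limit):
        for a in range(b, limit):  # a >= b, because alpha >= 1
            # Possibility 1: pattern lengths b, b against text lengths a, c with c > a
            for c in range(a + 1, limit):
                if intersects(scale_interval(a, b), scale_interval(c, b)):
                    return False
            # Possibility 2: pattern lengths b, c with c > b against text lengths a, a
            for c in range(b + 1, limit):
                if intersects(scale_interval(a, b), scale_interval(a, c)):
                    return False
    return True

print(lemma_holds())
```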
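Algorithm for Real Scaled String Matching
Let Pi1, Pi2, …, Pij be the different numbers in PL.
1. P-match PL in TL.
2. For each match, check the intersection of the intervals between Pi1, …, Pij and the corresponding symbols in TL.
End Algorithm
Example
PL = 2 3 2 3 2 (Pi1 = 2, Pi2 = 3)
TL = 5 6 5 6 5 6 10 6 10 6 10 7
p-matches and their intervals:
Location 1 (5 6 5 6 5): [5/2, 3) ∩ [2, 7/3) = ∅
Location 2 (6 5 6 5 6): [3, 7/2) ∩ [5/3, 2) = ∅
Location 6 (6 10 6 10 6): [3, 7/2) ∩ [10/3, 11/3) = [10/3, 7/2), a scaled match
Location 7 (10 6 10 6 10): [5, 11/2) ∩ [2, 7/3) = ∅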
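Important Fact
$\sum_{k=1}^{j} P_{i_k} \le m$
The $P_{i_k}$ are j distinct positive integers, so their sum is at least j(j+1)/2; so there are at most O(√m) different $P_{i_k}$'s.
Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), plus O(√m) verification for each location. Total: O(n√m)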
Tighter analysis
Upper bound on the number of possible p-matches
Lemma: Let |P| = m, |T| = n, and let Pi1, Pi2, …, Pij be the different numbers in PL.
Then there are at most 2n/j p-matches of PL in TL.
Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is
O((2n/j) · j) = O(n)
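Proof of Lemma
Consider a p-match and the 1st appearance of each of Pi1, …, Pij in PL:
PL: Pi1 Pi2 … Pij
TL: a1 a2 … aj
The matched text elements a1, …, aj are distinct positive integers, so
$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$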
Lemma's proof (cont.)
Let x be the total number of p-matches in the text.
The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is
$\ge \frac{x j^2}{2}$
But: there are overlaps. How many?
Lemma's proof (cont.)
Each text location is counted by at most j matches. Therefore…
Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$
Clearly $x \cdot j/2 \le n$, thus $x \le 2n/j$.
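The bound x ≤ 2n/j can be observed on small inputs. The sketch below counts p-matches by brute force and compares the count with the bound, taking n as the length of the uncompressed text, i.e. the sum of the entries of TL. That reading of n, and the function names, are assumptions made for the illustration.

```python
def p_matches(pattern, window):
    """True if a bijection maps the pattern values onto the window values."""
    forward, backward = {}, {}
    for p, t in zip(pattern, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True

def count_p_matches(PL, TL):
    """Number of locations at which PL p-matches TL (brute force)."""
    return sum(
        p_matches(PL, TL[i:i + len(PL)])
        for i in range(len(TL) - len(PL) + 1)
    )

def lemma_bound(PL, TL):
    """The bound 2n/j, with n = sum of run lengths and j = distinct numbers in PL."""
    return 2 * sum(TL) / len(set(PL))

PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(count_p_matches(PL, TL), lemma_bound(PL, TL))
```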
Open Problem
Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.
Idea Fix a scale s
Constant amount of work for each square (s-block)
s
s
nns
Algorithm time
Time for scale s
Total time
converges to a constant
Making the total time O(nsup2)
sn2
2
mn
mn
ss ssn n
1
2
122
2
1
Problem Real scales
Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c
Formally
nTT ||
mPrrrP aaa j
j ||21
21
aaaa crrc jjjj
121
121
1
rcrc jj
11
Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where
appears for some
r timesr
Remark
α ge 1 means we only scale ldquouprdquoReasons Avoid conceptual problem of
loss of resolution
From ldquofar enoughrdquo away everything looks the same
By our definition for klt1m there is a match at every text location
Simplify definition
bcba4312 2
323
23
23
aaaa rrrrjj
jj
121121
Definition 2 Look for in the textExample P=aabcccbbbb
Match by definition 2 daaabccccbbbbbbe Match by definition 1
but not by def 2 daaaabccccbbbbbbbe
Why are definitions equivalent
Split text and pattern to symbol part Ts Ps and length part TL PLExample P= aabcccbbbb Ps=abcb PL=2134 T=daaabccccbbbbbbe Ts=dabcbe TL=131461
Time
Time for split O(n+m)
Finding Ps in Ts O(n+m) (eg KMP)
HARD PART Finding PL in TL
Definitions are Equivalent
aa rrj
j
1212
Claim Solving def 2 in time O(f(n))
Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and
TLTotal time O(f(n)+n)=O(f(n))
Naïve algorithm for matching PL in TL
For each text location, position the pattern starting at that location and calculate the interval $[t/p, (t+1)/p)$ for each resulting <text, pattern> pair.
This is the interval of possible scales, since:
$\lfloor (t/p) \cdot p \rfloor = t$, and for every $\alpha < t/p$, $\lfloor \alpha p \rfloor < t$
$\lfloor ((t+1)/p) \cdot p \rfloor = t+1$, and for every $\alpha \ge (t+1)/p$, $\lfloor \alpha p \rfloor > t$
Check intersection
If the intersection of all intervals is not empty, then there is a match.
Time: O(nm)
Example
PL = 2 1 2 3 2
TL = 2 4 2 4 7 4 5 3
Location 1: $[1, 3/2)$, $[4, 5)$
The intersection is empty, thus no scaled match at location 1. But...
Location 2: $[2, 5/2)$, $[2, 3)$, $[2, 5/2)$, $[7/3, 8/3)$, $[2, 5/2)$
The intersection is $[7/3, 5/2)$, thus there is a scaled match at location 2.
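The naïve interval-intersection check can be sketched in Python as follows. This is an illustration only: the name `naive_scaled_matches` is an assumption, locations are 0-based (so the slide's location 2 is index 1), exact rational arithmetic is used to avoid floating-point rounding, and the restriction to scales of at least 1 is added as an explicit lower bound.

```python
from fractions import Fraction


def naive_scaled_matches(pl, tl):
    """Return (location, lo, hi) for every 0-based location i where
    some real scale alpha in [lo, hi), alpha >= 1, satisfies
    floor(alpha * pl[k]) == tl[i + k] for all k. Takes O(nm) time."""
    matches = []
    for i in range(len(tl) - len(pl) + 1):
        lo, hi = Fraction(1), None  # alpha >= 1; no upper bound yet
        for p, t in zip(pl, tl[i:i + len(pl)]):
            lo = max(lo, Fraction(t, p))        # alpha >= t/p
            upper = Fraction(t + 1, p)          # alpha < (t+1)/p
            hi = upper if hi is None else min(hi, upper)
        if hi is not None and lo < hi:          # non-empty intersection
            matches.append((i, lo, hi))
    return matches


pl = [2, 1, 2, 3, 2]
tl = [2, 4, 2, 4, 7, 4, 5, 3]
for loc, lo, hi in naive_scaled_matches(pl, tl):
    print(loc, lo, hi)  # 1 7/3 5/2
```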
Improvement - Parameterized Matching
Introduced: Baker, 1994
Motivation: "copying" code
Parameterized Matching
Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$
s parameterize-matches t if there exists a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$
Example
s = a a b b b
t = x x y y y
$\Pi(a) = x$, $\Pi(b) = y$
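The definition can be checked directly for two equal-length strings by building the mapping in both directions. The sketch below is only this direct check; it is not the linear-time text-searching algorithm of AFM-94, and the name `p_match` is hypothetical.

```python
def p_match(s, t):
    """Return True if s parameterize-matches t, i.e. some bijection
    between the symbols of s and the symbols of t maps s onto t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the stored one,
        # so any later conflicting pairing is detected in either direction.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("aabbb", "xxyyy"))  # True
print(p_match("aabbb", "xxxyy"))  # False
```

Because the function only compares items pairwise, it also works on lists of run lengths such as PL and a window of TL.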
Parameterized Matching
Claim (AFM-94):
For $\Sigma$ that can be sorted in linear time (e.g. $\Sigma = \{1, \ldots, n\}$), parameterized matching can be done in time O(n).
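The reduction
Lemma: There exists $\alpha \ge 1$ for which PL matches TL at location i scaled to $\alpha$ only if PL p-matches TL at i.
Proof: Assume PL does not p-match TL at location i.
The possible situations are:
Possibility 1
PL: b ... b
TL: a ... c, $c \ne a$
wlog $c \ge a+1$
For c = a+1 (smallest possible):
$[\frac{a}{b}, \frac{a+1}{b}) \cap [\frac{a+1}{b}, \frac{a+2}{b}) = \emptyset$
Possibility 2
PL: b ... c, $c \ne b$
TL: a ... a
wlog $c \ge b+1$
For c = b+1 (smallest possible) the intervals are $[\frac{a}{b}, \frac{a+1}{b})$ and $[\frac{a}{b+1}, \frac{a+1}{b+1})$
Intersection not empty only if
$\frac{a+1}{b+1} > \frac{a}{b}$, i.e.
$ab+b > ab+a$
$b > a$
But this can never happen if $\alpha \ge 1$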
Algorithm for Real Scaled String Matching
Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.
1. P-match PL in TL.
2. For each match, check the intersection of intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.
End Algorithm
Example
PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$
TL = 5 6 5 6 5 6 10 6 10 6 10 7
p-matches: locations 1, 2, 6 and 7
scaled match: location 6, where $[6/2, 7/2) \cap [10/3, 11/3) = [10/3, 7/2)$
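The two steps can be sketched in Python as below. Assumptions to note: the p-match step here is a simple per-location check costing O(m) each, standing in for the linear-time parameterized matching the slides rely on; the function names are hypothetical; locations are 0-based.

```python
from fractions import Fraction


def p_match_at(pl, tl, i):
    """Return the pattern-value -> text-value bijection if PL p-matches
    TL at 0-based location i, otherwise None."""
    fwd, bwd = {}, {}
    for p, t in zip(pl, tl[i:i + len(pl)]):
        if fwd.setdefault(p, t) != t or bwd.setdefault(t, p) != p:
            return None
    return fwd


def real_scaled_matches(pl, tl):
    """Step 1: find p-matches. Step 2: for each p-match, intersect one
    interval [t/p, (t+1)/p) per distinct pattern value p."""
    matches = []
    for i in range(len(tl) - len(pl) + 1):
        mapping = p_match_at(pl, tl, i)
        if not mapping:
            continue
        lo = max(Fraction(1), max(Fraction(t, p) for p, t in mapping.items()))
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        if lo < hi:  # some alpha >= 1 lies in every interval
            matches.append(i)
    return matches


pl = [2, 3, 2, 3, 2]
tl = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_matches(pl, tl))  # [5]
```

Index 5 corresponds to the window 6 10 6 10 6. Only the distinct pattern values need an interval, because a p-match already guarantees that equal pattern values face equal text values.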
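Important Fact
$\sum_{k=1}^{j} P_{i_k} \le m$
The $P_{i_k}$ are distinct positive integers, so their sum is at least $j(j+1)/2$.
So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s.
Time: O(n) for parameterized matching ($\Sigma = \{1, 2, \ldots, n\}$); $O(\sqrt{m})$ verification for each location. Total: $O(n\sqrt{m})$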
Tighter analysis
Upper bound on the number of possible p-matches
Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.
Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.
Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is
$O(\frac{2n}{j} \cdot j) = O(n)$
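Proof of Lemma
Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in a p-match:
PL: $P_{i_1}$ $P_{i_2}$ ... $P_{i_j}$
TL: $a_1$ $a_2$ ... $a_j$
In a p-match the $a_k$ are distinct positive integers, so
$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$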
Lemma's proof (cont.)
Let x be the total number of p-matches in the text.
The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern
$\ge \frac{x j^2}{2}$
But: there are overlaps. How many?
For each text location, at most j matches will count it. Therefore...
Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{xj}{2}$
Clearly $\frac{xj}{2} \le n$, thus $x \le \frac{2n}{j}$
Open Problem
Give 1-d algorithm linear in run-length compressed text and pattern
Definitions are Equivalent
aa rrj
j
1212
Claim Solving def 2 in time O(f(n))
Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and
TLTotal time O(f(n)+n)=O(f(n))
Naiumlve algorithm for matching PL in TL
For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair
This is the interval of possible scales since
tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t
Check intersectionIf intersection of all intervals is not
empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)
The intersection is empty thus no scaled match in location 1 Buthellip
Check intersectionIf intersection of all intervals is not
empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)
The intersection is [7352) thus there is a scaled match in location 2
Improvement ndash Parameterized Matching
Introduced Baker 1994
Motivation ldquocopyingrdquo code
Parameterized Matching
Input two strings s and t |s|=|t| over alphabets sums and sumt
s parameterize matches t if bijection sums sumt such that (s) = t
exist
(a)=x
(b)=y
Π Π
ΠΠ
a ab b b
x xy y y
Example
Parameterized Matching
Claim (AFM-94)
For Σ that can be sorted in linear time (eg Σ=1 n)
Parameterized matching can be done in time O(n)
The reduction
1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i
Proof Assume PL does not p-match TL at
location i
The possible situations are
Possibility 1wlog c ge a+1
For c = a+1 (smallest possible)
TL
PL
a
b b
cnea
b
a
b
a
b
a
b
a 211
Possibility 2
wlog c ge b+1
Intersection not empty only if
(a+1)(b+1) gt ab ie
ab+b gt ab+a
bgta
But this can never happen if α ge 1
TL
PL
a
b cneb
a
1
11
1
b
a
b
a
b
a
b
a
Algorithm for Real Scaled String Matching
Let Pi1 Pi2 Pij be the different numbers in PL
1 P-match PL in TL2 For each match chack intersection
of intervals between Pi1 Pij and corresponding symbols in TL
End Algorithm
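Example
PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$
TL = 5 6 5 6 5 6 10 6 10 6 10 7
p-matches: locations 1, 2, 6 and 7
Location 1 (2→5, 3→6): $[5/2, 3) \cap [2, 7/3) = \emptyset$
Location 2 (2→6, 3→5): $[3, 7/2) \cap [5/3, 2) = \emptyset$
Location 6 (2→6, 3→10): $[3, 7/2) \cap [10/3, 11/3) = [10/3, 7/2)$, a scaled match
Location 7 (2→10, 3→6): $[5, 11/2) \cap [2, 7/3) = \emptyset$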
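The two steps of the algorithm can be sketched as follows. This is a minimal illustration, assuming the interval formula $[t/p, (t+1)/p)$ per pair; the names `p_match_at` and `real_scaled_matches` are hypothetical, and step 1 is done here by a simple quadratic scan that stands in for the linear-time parameterized matching of AFM-94, which is not implemented.

```python
from fractions import Fraction

def p_match_at(PL, TL, start):
    """Return the map {pattern number: text number} if PL p-matches
    TL[start : start + len(PL)], else None."""
    forward, backward = {}, {}
    for k, p in enumerate(PL):
        t = TL[start + k]
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward

def real_scaled_matches(PL, TL):
    """For each 1-based p-match location, give the scale interval (lo, hi),
    or None when the intervals of the distinct pattern numbers do not intersect."""
    result = {}
    for start in range(len(TL) - len(PL) + 1):
        mapping = p_match_at(PL, TL, start)
        if mapping is None:
            continue
        # Verification touches only the j distinct pattern numbers.
        lo = max(Fraction(t, p) for p, t in mapping.items())
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        result[start + 1] = (lo, hi) if lo < hi else None
    return result

PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_matches(PL, TL))
```

On the example, the p-matches are at locations 1, 2, 6 and 7, and only location 6 passes verification, with scales in [10/3, 7/2).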
Important Fact
$\sum_{k=1}^{j} P_{i_k} \le m$
So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s.
Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), $O(\sqrt{m})$ verification for each location. Total: $O(n\sqrt{m})$.
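The step from the sum bound to the $O(\sqrt{m})$ bound can be spelled out as below. The derivation assumes only that the $P_{i_k}$ are j distinct positive integers, so their sum is at least $1 + 2 + \dots + j$.

```latex
\[
  m \;\ge\; \sum_{k=1}^{j} P_{i_k} \;\ge\; \sum_{k=1}^{j} k
    \;=\; \frac{j(j+1)}{2} \;>\; \frac{j^2}{2}
  \quad\Longrightarrow\quad
  j \;<\; \sqrt{2m} \;=\; O(\sqrt{m}).
\]
```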
Tighter analysis
Upper bound on the number of possible p-matches
Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL. Then there are at most $2n/j$ p-matches of PL in TL.
Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is
$O((2n/j) \cdot j) = O(n)$
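Proof of Lemma
Consider the 1st appearance of each of $P_{i_1}, \dots, P_{i_j}$ in PL and the text numbers aligned with them in a p-match:
PL: $P_{i_1}$ $P_{i_2}$ … $P_{i_j}$
TL: $a_1$ $a_2$ … $a_j$
The $a_k$ are distinct positive integers, so
$\sum_{k=1}^{j} a_k \ge j^2/2$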
Lemma's proof (cont.)
Let x be the total number of p-matches in the text.
The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is
$\ge x j^2 / 2$
But: there are overlaps. How many?
Lemma's proof (cont.)
For each text location, at most j matches will count it. Therefore…
Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{xj}{2}$
Clearly $x \cdot j / 2 \le n$, thus $x \le 2n/j$.
Open Problem
Give a 1-d algorithm linear in the run-length compressed text and pattern.