Function Matching
Amihood Amir
Yonatan Aumann
Moshe Lewenstein
Ely Porat
Bar Ilan University
Prog.c
int a,b;
a=1;
a = g(a)*5+f(a);
b=2;
a = func(a,b);
a = a*g(b);
b=1;
b = g(b)*5+f(b);
….
Baker’s Parameterized Matching
c=1;
c = g(c)*5+f(c);
Pattern
Baker’s work
pdup dupstat psearch
SICOMP 1997 JCSS 1996
Two dimensional parameterized matching
pattern
‘A horse is a horse, it ain’t make a difference what color it is’ John Wayne
Parameterized Matching
Input: $P = p_1 \dots p_m$ over alphabet $\Sigma_P$, $T = t_1 \dots t_n$ over alphabet $\Sigma_T$
Output: locations $i$ of $T$ for which a bijection $\pi: \Sigma_P \to \Sigma_T$ exists s.t.
$\pi(P) = \pi(p_1)\pi(p_2)\dots\pi(p_m) = t_i \dots t_{i+m-1}$
Parameterized Matching
• One dimensional
• Baker 1996, JCSS - Suffix Trees
• Baker 1997, SICOMP - Boyer Moore
• Amir, Farach, Muthu 1995, IPL - Knuth-Morris-Pratt
• Two dimensional
Regular methods fail !!
Function Matching
Input: $P = p_1 \dots p_m$ over alphabet $\Sigma_P$, $T = t_1 \dots t_n$ over alphabet $\Sigma_T$
Output: locations $i$ of $T$ where a function $f: \Sigma_P \to \Sigma_T$ exists s.t.
$f(P) = f(p_1)f(p_2)\dots f(p_m) = t_i \dots t_{i+m-1}$
Example:
P = h e h a e h
T = a b c b a c b a d a b d a d d a d
f(h) = b, f(e) = c, f(a) = a: f(P) = b c b a c b, a match at location 2
f(h) = a, f(e) = d, f(a) = b: f(P) = a d a b d a, a match at location 8
f(h) = d, f(e) = a, f(a) = d: f(P) = d a d d a d, a match at location 12
Elsewhere no function exists, e.g. at location 1 the h’s are aligned with a and c: f(h) = ??, no match!
Function Matching vs. Parameterized Matching
P p-matches $t_i \dots t_{i+m-1}$ iff
1. P f-matches $t_i \dots t_{i+m-1}$, and
2. # of distinct symbols in $t_i \dots t_{i+m-1}$ = # of distinct symbols in P
P = h e h a e h
T = a b c b a c b a d a b d a d d a d
f(h) = d, f(e) = a, f(a) = d (location 12): an f-match but not a p-match, since f is not a bijection
f(h) = b, f(e) = c, f(a) = a (location 2): an f-match and a p-match
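The relation between the two notions can be sketched in Python. This is only an illustration under assumptions: the helper names `f_image` and `p_matches` are hypothetical, and the window is passed in as an already extracted substring of T.

```python
def f_image(pattern, window):
    """Return the map pattern symbol -> text symbol, or None if no function exists."""
    image = {}
    for p, t in zip(pattern, window):
        # setdefault records the first text symbol seen beneath p
        if image.setdefault(p, t) != t:
            return None  # p is aligned with two different text symbols
    return image


def p_matches(pattern, window):
    """p-match: an f-match whose window has as many distinct symbols as the pattern."""
    image = f_image(pattern, window)
    return image is not None and len(set(window)) == len(set(pattern))


# Windows of T at locations 2 and 12 from the example (hypothetical usage).
print(p_matches("hehaeh", "bcbacb"), p_matches("hehaeh", "daddad"))
```

When the symbol counts agree, the function found by `f_image` is onto a set of the same size as its domain, so it is a bijection.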
Naïve Algorithm
At each location i of text T, check whether the pattern f-matches.
Check: for each letter ‘a’ in the pattern,
 are the text elements aligned with the pattern’s ‘a’s all the same?
 No? Declare ‘no match’.
 All letters “OK”: declare ‘match’.
Running time: O(nm), where m = |P| and n = |T|
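A minimal Python sketch of the naïve check, assuming patterns and texts are plain strings and locations are reported 1-based as on the slides. The function names are hypothetical.

```python
def f_matches_at(pattern, text, i):
    """True if pattern f-matches text at 0-based location i."""
    image = {}  # pattern symbol -> the text symbol it has to map to
    for j, p in enumerate(pattern):
        t = text[i + j]
        if image.setdefault(p, t) != t:
            return False  # the same pattern symbol sits above two different text symbols
    return True


def naive_function_matching(pattern, text):
    """All 1-based locations where pattern f-matches text; O(nm) symbol comparisons."""
    m, n = len(pattern), len(text)
    return [i + 1 for i in range(n - m + 1) if f_matches_at(pattern, text, i)]


print(naive_function_matching("hehaeh", "abcbacbadabdaddad"))  # [2, 8, 12]
```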
Function Matching with Don’t Cares
Input: $P = p_1 \dots p_m$ over alphabet $\Sigma_P \cup \{?\}$, $T = t_1 \dots t_n$ over alphabet $\Sigma_T$
Output: locations $i$ of $T$ where $f: \Sigma_P \to \Sigma_T$ exists s.t.
$f(P) = f(p_1)f(p_2)\dots f(p_m) = t_i \dots t_{i+m-1}$, where f(?) is a wildcard
P = h e ? ? e h
T = a b c b a c b c d b c d a d d a d
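With don’t cares the naïve check changes in one place only: wildcard positions are skipped. A small sketch, assuming the wildcard is the character `?` and using a hypothetical function name:

```python
def function_matching_dont_cares(pattern, text, wildcard="?"):
    """1-based locations where pattern f-matches text, ignoring wildcard positions."""
    m, n = len(pattern), len(text)
    matches = []
    for i in range(n - m + 1):
        image = {}
        ok = True
        for j, p in enumerate(pattern):
            if p == wildcard:
                continue  # a don't care may sit above any text symbol
            if image.setdefault(p, text[i + j]) != text[i + j]:
                ok = False
                break
        if ok:
            matches.append(i + 1)
    return matches


print(function_matching_dont_cares("he??eh", "abcbacbcdbcdaddad"))  # [2, 3, 6, 12]
```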
Why do we need don’t cares?
[Figure: a two-dimensional text and a two-dimensional pattern]
Linearize Text and Pattern
[Figure: the n × n text is written row by row (Line 1, Line 2, …) as one string T; the m × m pattern is written row by row as one string P, with n-m don’t cares ‘?’ between consecutive pattern rows]
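The linearization can be sketched as follows. This is an assumption-laden reading of the figure: the text is taken to have rows of equal length n, the pattern rows of equal length m, and the helper names are hypothetical.

```python
def linearize_text(rows):
    """Concatenate the rows of a two-dimensional text into one string."""
    return "".join(rows)


def linearize_pattern(rows, n, wildcard="?"):
    """Concatenate pattern rows, padding with n-m wildcards between consecutive rows.

    n is the row length of the text; m is taken from the pattern rows.
    """
    m = len(rows[0])
    return (wildcard * (n - m)).join(rows)


text_rows = ["abcd", "efgh", "ijkl"]
pattern_rows = ["xy", "zw"]
print(linearize_text(text_rows))
print(linearize_pattern(pattern_rows, n=4))
```

With this layout, a pattern placed at row r and column c of the text (0-based) corresponds to position r·n + c of the linearized text. Positions with c > n-m wrap around a row boundary and would have to be discarded by the caller.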
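Polynomial Multiplication - Convolutions
Multiply $t_1 t_2 \dots t_n$ by the reversed pattern $p_m p_{m-1} \dots p_2 p_1$ as in long multiplication without carries. Row $j$ holds $p_j t_1, p_j t_2, \dots, p_j t_n$ and is shifted one position relative to row $j-1$:
$$\begin{array}{ccccccc}
 & & p_1 t_1 & p_1 t_2 & p_1 t_3 & \cdots & p_1 t_n \\
 & p_2 t_1 & p_2 t_2 & p_2 t_3 & p_2 t_4 & \cdots & \\
p_3 t_1 & p_3 t_2 & p_3 t_3 & p_3 t_4 & p_3 t_5 & \cdots & \\
 & & \vdots & & & &
\end{array}$$
The column that contains $p_1 t_i$ sums to $\sum_{j=1}^{m} p_j t_{i+j-1}$.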
Running time: O(n log m)
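The sums produced by the polynomial product can be sketched in Python. Note the assumption: the code below evaluates the product directly in O(nm) time to keep the example dependency-free; the O(n log m) bound needs an FFT-based multiplication, which is not shown. The function names are hypothetical.

```python
def poly_mul(a, b):
    """Coefficient list of the product of two polynomials (schoolbook method)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def correlate(text_vals, pattern_vals):
    """result[i] = sum over j of pattern_vals[j] * text_vals[i + j], for each alignment i."""
    n, m = len(text_vals), len(pattern_vals)
    return [sum(pattern_vals[j] * text_vals[i + j] for j in range(m))
            for i in range(n - m + 1)]


t = [1, 2, 3, 4, 5]
p = [1, 0, 2]
product = poly_mul(t, p[::-1])  # multiply by the reversed pattern
# Entries m-1 .. n-1 (0-based) of the product are exactly the alignment sums.
print(product[len(p) - 1:len(t)] == correlate(t, p))
```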
Convolutions: Fischer-Paterson [1974]
With $p_1 p_2 p_3 p_4 \dots p_m$ aligned at location 1 of the text, the corresponding column of the product is $\sum_{i=1}^{m} p_i t_i$.
With the pattern aligned at location 2, the column is $\sum_{i=1}^{m} p_i t_{i+1}$, and so on for every location.
How does this help for Function Matching?
The property that needs to be checked is:
beneath each symbol from the pattern alphabet all text characters must be the same
Example - h in P vs. a in T
T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
$P^R$ = e ? h e a h e h
$T_a$ = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1
$P^R_h$ = 0 0 1 0 0 1 0 1
$T_a \cdot P^R_h$ = 0 0 1 0 0 1 1 1 0 2 0 2 1 0 3 0 1 2 0 1 2 0 1 1 0 1
Entries m through n of the product (here 8 through 19) count, for locations 1 through 12, the h’s of P that are aligned with an a of T:
h - a: 1 0 2 0 2 1 0 3 0 1 2 0
=> in O(n log m) time!!
Repeating the product for every text symbol:
h - a:    1 0 2 0 2 1 0 3 0 1 2 0
h - b:    0 3 0 1 1 1 1 0 1 0 1 0
h - c:    2 0 1 2 0 1 1 0 1 0 0 0
h - d:    0 0 0 0 0 0 1 0 1 2 0 3
Match(h): 0 1 0 0 0 0 0 1 0 0 0 1
Match(h)[i] = 1 where some row reaches 3, the number of h’s in P.
=> in $O(|\Sigma_T|\, n \log m)$ time!!
In general - the Algorithm
• For each character ‘a’ in $\Sigma_P$ create $P_a$
• For each character ‘b’ in $\Sigma_T$ create $T_b$
• For all $P_a$ and $T_b$ multiply them and construct Match(a) for each ‘a’ in $\Sigma_P$
• Announce each location i of T as a ‘match’ if Match(a)[i] = 1 for all a’s in P
=> in $O(|\Sigma_T||\Sigma_P|\, n \log m)$ time.
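The steps above can be sketched in Python. Assumptions: the products are evaluated directly rather than by FFT (so this sketch does not achieve the stated running time), the wildcard is `?`, and the names `match_vector` and `function_matching_indicators` are hypothetical.

```python
def correlate(text_vals, pattern_vals):
    """result[i] = sum over j of pattern_vals[j] * text_vals[i + j]."""
    n, m = len(text_vals), len(pattern_vals)
    return [sum(pattern_vals[j] * text_vals[i + j] for j in range(m))
            for i in range(n - m + 1)]


def match_vector(a, pattern, text):
    """Match(a): 1 at locations where every 'a' of the pattern sits above one text symbol."""
    p_a = [1 if p == a else 0 for p in pattern]   # indicator P_a
    count_a = sum(p_a)
    match = [0] * (len(text) - len(pattern) + 1)
    for b in set(text):
        t_b = [1 if t == b else 0 for t in text]  # indicator T_b
        for i, hits in enumerate(correlate(t_b, p_a)):
            if hits == count_a:                   # all a's aligned with b
                match[i] = 1
    return match


def function_matching_indicators(pattern, text, wildcard="?"):
    """1-based locations i with Match(a)[i] = 1 for every non-wildcard symbol a of P."""
    vectors = [match_vector(a, pattern, text) for a in set(pattern) - {wildcard}]
    return [i + 1 for i in range(len(text) - len(pattern) + 1)
            if all(v[i] for v in vectors)]


print(match_vector("h", "hehaeh?e", "abcbacbacabdaddadea"))
print(function_matching_indicators("hehaeh?e", "abcbacbacabdaddadea"))  # [2, 12]
```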
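Improvement
Lemma: Let $a_1, \dots, a_k \in \mathbb{N}$, then
$$k \sum_{h=1}^{k} a_h^2 = \left(\sum_{h=1}^{k} a_h\right)^2 \quad \text{iff} \quad a_i = a_j \text{ for all } i, j$$
Idea: Encode the text symbols as numbers, and use the pattern to compute, per location, the sum of the numbers beneath a pattern symbol and, separately, their sum of squares.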
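One way to see why the lemma holds is the following standard identity, given here as a supporting derivation rather than as part of the original slides:

```latex
k \sum_{h=1}^{k} a_h^2 - \left(\sum_{h=1}^{k} a_h\right)^2
  = (k-1)\sum_{h=1}^{k} a_h^2 - 2\sum_{1 \le i < j \le k} a_i a_j
  = \sum_{1 \le i < j \le k} (a_i - a_j)^2 \;\ge\; 0
```

The right-hand side is a sum of squares, so it is zero exactly when $a_i = a_j$ for all $i, j$.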
Example: Compute sum of text characters beneath “e”
T = a b c b a c b a c a b d a d d a d e a
T# = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1
P = h e h a e h ? e
$P_e$ = 0 1 0 0 1 0 0 1
Example: Compute sum of squares beneath “e”
T = a b c b a c b a c a b d a d d a d e a
T# = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1
T#^2 = 1 4 9 4 1 9 4 1 9 1 4 16 1 16 16 1 16 25 1
P = h e h a e h ? e
$P_e$ = 0 1 0 0 1 0 0 1
At location 2 the e’s cover c, c, c: sum 9, sum of squares 27, and 3 · 27 = 9^2. At location 8 they cover c, d, d: sum 11, sum of squares 41, and 3 · 41 ≠ 11^2.
Running Time:
Two convolutions for each pattern character.
$O(|\Sigma_P|\, n \log m)$
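A sketch of the sum and sum-of-squares test in Python. Assumptions: text symbols are numbered 1, 2, 3, … in sorted order (which gives a = 1, b = 2, … for the example), the two sums per pattern character are computed directly instead of by FFT, and the function names are hypothetical.

```python
def correlate(text_vals, pattern_vals):
    """result[i] = sum over j of pattern_vals[j] * text_vals[i + j]."""
    n, m = len(text_vals), len(pattern_vals)
    return [sum(pattern_vals[j] * text_vals[i + j] for j in range(m))
            for i in range(n - m + 1)]


def function_matching_squares(pattern, text, wildcard="?"):
    """1-based f-match locations via the lemma k * sum(a_h^2) == (sum a_h)^2."""
    code = {c: idx + 1 for idx, c in enumerate(sorted(set(text)))}
    t_num = [code[c] for c in text]   # T#
    t_sq = [v * v for v in t_num]     # T#^2
    ok = [True] * (len(text) - len(pattern) + 1)
    for a in set(pattern) - {wildcard}:
        p_a = [1 if p == a else 0 for p in pattern]
        k = sum(p_a)
        sums = correlate(t_num, p_a)      # first convolution
        squares = correlate(t_sq, p_a)    # second convolution
        for i, (s, q) in enumerate(zip(sums, squares)):
            if k * q != s * s:            # text symbols beneath 'a' are not all equal
                ok[i] = False
    return [i + 1 for i, good in enumerate(ok) if good]


print(function_matching_squares("hehaeh?e", "abcbacbacabdaddadea"))  # [2, 12]
```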
Can we do better for big alphabets?
We have seen – 2 algorithms for Function Matching
1. O(nm) - naïve algorithm
2. $O(|\Sigma_P|\, n \log m)$ - convolution based
We will see:
1. $O(n \log^2 m)$ - randomized convolutions based
2. Lower bound of $\Omega(nm)$ for deterministic convolutions based methods
Def: A pattern is 2-charactered if every character appears at most twice in the pattern.
Example: P = a b c b c c b b
 P1 = a1 b1 c1 b1 c1 c2 b2 b2 (even pairs)
 P2 = a1 b1 c1 b2 c2 c2 b2 b3 (odd pairs)
Lemma: Let P be a pattern and T a text. There exist 2-charactered patterns P1 and P2 s.t. at loc. i of T, P f-matches iff P1 and P2 f-match.
Situation: An algorithm for Function Matching with 2-charactered patterns ⇒ a general algorithm for Function Matching.
So, all that needs to be checked is that each pair in P has equal text symbols beneath it.
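The construction of P1 and P2 in the example can be sketched as follows. The representation is an assumption: each new symbol is a (character, index) tuple, and the function name is hypothetical.

```python
def split_two_charactered(pattern, wildcard="?"):
    """Return (P1, P2): P1 pairs occurrences (1,2), (3,4), ...; P2 pairs (2,3), (4,5), ..."""
    seen = {}
    p1, p2 = [], []
    for c in pattern:
        if c == wildcard:
            p1.append(c)
            p2.append(c)
            continue
        r = seen.get(c, 0)                # number of earlier occurrences of c
        seen[c] = r + 1
        p1.append((c, r // 2 + 1))
        p2.append((c, (r + 1) // 2 + 1))
    return p1, p2


def show(symbols):
    """Render (character, index) tuples in the a1 b1 ... style of the example."""
    return " ".join(s if isinstance(s, str) else f"{s[0]}{s[1]}" for s in symbols)


p1, p2 = split_two_charactered("abcbccbb")
print(show(p1))  # a1 b1 c1 b1 c1 c2 b2 b2
print(show(p2))  # a1 b1 c1 b2 c2 c2 b2 b3
```

Every two consecutive occurrences of a character share a symbol in either P1 or P2, so equality beneath all pairs of both patterns chains into equality beneath all occurrences of the character in P.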
New Randomized Algorithm
1. For each character:
 - a in T, randomly choose $r_a$ in {0, 1}; replace all a’s in T with $r_a$; get T’
 - b in P, randomly choose $s_b$ in {1, 2}; set the first b to be $s_b$ and the second b to be $-s_b$; get P’
2. Convolve T’ and $P'^R$
3. For each location i for which the convolution $T' * P'^R[i]$ equals 0, declare a ‘match’
Example:
P = v q v u q u ? s
T = a b a a b a b a c a b d a b c b d b a
f(a) = 1, f(b) = 0, f(c) = 0, f(d) = 1
g(v) = 2, g(q) = 6, g(u) = 8
f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1
g(P) = 2 6 –2 8 –6 –8 0 0
Location 1: 2+0–2+8+0–8+0+0 = 0, a match with h(v) = a, h(q) = b, h(u) = a, h(s) = a
Location 2: 0+6–2+0–6+0+0+0 = –2, no match
Location 12: 2+6+0+0+0–8+0+0 = 0, although P does not f-match there (v is aligned with d and b), so a single convolution can err
Correctness:
If P f-matches at location i of T, then $f(T) * g(P)^R[i+m-1]$ is trivially always equal to 0.
If P does not f-match at location i of T, then for each convolution <f,g>, $f(T) * g(P)^R[i+m-1]$ equals 0 with probability ½; with k rounds of amplification the probability is $(½)^k$.
Running Time:
O(nk log m) with error probability $2^{-k}$
$O(n \log^2 m)$ with error probability 1/m
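The randomized scheme can be sketched in Python under several assumptions: the pattern is already 2-charactered, characters that occur once and the wildcard `?` are encoded as 0 (as g(P) in the example suggests), the convolution is evaluated directly instead of by FFT, and the function names and the `seed` parameter are hypothetical.

```python
import random


def correlate(text_vals, pattern_vals):
    """result[i] = sum over j of pattern_vals[j] * text_vals[i + j]."""
    n, m = len(text_vals), len(pattern_vals)
    return [sum(pattern_vals[j] * text_vals[i + j] for j in range(m))
            for i in range(n - m + 1)]


def randomized_round(pattern, text, rng, wildcard="?"):
    """One random convolution; returns, per location, whether the sum is 0."""
    r = {c: rng.randint(0, 1) for c in set(text)}   # r_a in {0, 1}
    t_vals = [r[c] for c in text]                   # T'
    s, p_vals = {}, []
    for c in pattern:
        occurrences = pattern.count(c)
        if occurrences > 2:
            raise ValueError("pattern must be 2-charactered")
        if c == wildcard or occurrences == 1:
            p_vals.append(0)                        # nothing to compare against
        elif c not in s:
            s[c] = rng.randint(1, 2)                # s_b in {1, 2}
            p_vals.append(s[c])                     # first occurrence: +s_b
        else:
            p_vals.append(-s[c])                    # second occurrence: -s_b
    return [v == 0 for v in correlate(t_vals, p_vals)]


def randomized_function_matching(pattern, text, k, seed=None):
    """Locations (1-based) that survive k independent rounds; errors are one-sided."""
    rng = random.Random(seed)
    alive = [True] * (len(text) - len(pattern) + 1)
    for _ in range(k):
        zero = randomized_round(pattern, text, rng)
        alive = [a and z for a, z in zip(alive, zero)]
    return [i + 1 for i, a in enumerate(alive) if a]


print(randomized_function_matching("vqvuqu?s", "abaababacabdabcbdba", k=20, seed=0))
```

A true f-match is never discarded, because both members of each pair see the same random value and cancel. Reported locations that are not f-matches are possible, and more rounds make them less likely.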
Limitation of the Convolutions Model
Can we do the same deterministically? No!
To show this we use the model of communication complexity.
[Figure: Alice holds x, Bob holds y; they communicate to compute f(x,y)]
Known: for x, y in $\{0,1\}^k$ the communication complexity of equals(x,y) is $\Omega(k)$.
Take pattern $P = a_1 a_2 a_3 \dots a_m a_1 a_2 a_3 \dots a_m$, where $\forall i \ne j,\ a_i \ne a_j$.
Given a collection of convolutions {<g(P), f(T)>}, the convolution at location i is
$$(g(P) * f(T))[i+m-1] = \sum_{j=1}^{m} g(a_j) f(t_{i+j-1}) + \sum_{j=1}^{m} g(a_j) f(t_{i+j+m-1}).$$
Since we are in essence comparing $t_i \dots t_{i+m-1}$ to $t_{i+m} \dots t_{i+2m-1}$, we get the equality information from the convolutions.
This is lower bounded by $\Omega(m)$ for each location, in general $\Omega(nm)$.
Another Application for Function Matching
Protein Folding detection:
[Figure: a folded chain with positions numbered 1 to 10]
P = 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 11 12 … 12 11 3 2 1
Questions
1. Can Function Matching be solved deterministically in o(nm) time for big alphabets?
2. Are there special cases of Function Matching that are easier (other than Parameterized Matching and other trivial ones)?
3. Does 2-dimensional Parameterized Matching need to be solved with function matching?