Hors Pool
8/12/2019 Hors Pool
Horspool Algorithm
Source: "Practical fast searching in strings" by R. Nigel Horspool
Advisor: Prof. R. C. T. Lee
Speaker: H. M. Chen
Definition of String Matching Problem
Given a pattern string P of length m and a text string T of length n, we would like to know whether there exists an occurrence of P in T.
[Figure: the pattern P aligned under the text T.]
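The problem statement can be made concrete with a brute-force check that tries every alignment of P against T. This is only a minimal sketch to pin down the definition, not the Horspool algorithm itself; the function name occurs is an assumption made for illustration.

```python
def occurs(pattern: str, text: str) -> bool:
    """Brute-force check: does pattern occur anywhere in text?"""
    m, n = len(pattern), len(text)
    # Try every alignment pos = 0 .. n - m of the pattern against the text.
    return any(text[pos:pos + m] == pattern for pos in range(n - m + 1))


if __name__ == "__main__":
    print(occurs("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
```

This baseline needs up to m comparisons at each of the n - m + 1 alignments and always moves the pattern by one position, which is the behaviour the shift table is meant to improve on.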
Rule 2: Character Matching Rule
For any character X in T, find the nearest X in P which is to the left of X in T.
[Figure: the pattern P aligned under the text T, with the character x marked in both.]
For each position of the window, we compare its last character with the last character of the pattern.
If they match, we scan the window backward against the pattern until we either find the pattern or fail on a text character.
[Figure: suffix search. The pattern is aligned under the text window and matched from right to left.]
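The backward scan of one window can be sketched as follows. This is an illustrative sketch only: it uses 0-indexed positions, and the helper name window_matches is an assumption, not something taken from the slides.

```python
def window_matches(text: str, pattern: str, pos: int) -> bool:
    """Compare the window text[pos:pos+m] with pattern from right to left."""
    j = len(pattern) - 1
    # Start at the last character and walk left while characters agree.
    while j >= 0 and text[pos + j] == pattern[j]:
        j -= 1
    # j < 0 means every character of the window matched.
    return j < 0


if __name__ == "__main__":
    T = "GCATCGCAGAGAGTATACAGTACG"
    P = "GCAGAGAG"
    print(window_matches(T, P, 0), window_matches(T, P, 5))
```

The caller is assumed to pass only positions with pos + len(pattern) <= len(text), as the search loop does.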
Then, whether or not there was a match, we shift the window so that the last character of the previous window is aligned with its rightmost occurrence in the pattern (excluding the pattern's last position).
[Figure: safe shift. The part of the pattern skipped by the shift contains no occurrence of that character.]
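Preprocessing phase
HpBc table: the value HpBc[a] for a character a of the alphabet is the distance from the rightmost occurrence of a among the first m - 1 characters of the pattern to the last position of the pattern. A character that does not occur there (column *) gets the value m.

a        A  C  G  *
HpBc[a]  1  6  2  8

Example:
T : GCATCGCAGAGAGTATACAGTACG
P : GCAGAGAG
    7654321

The numbers under P give the distance of each of the first m - 1 pattern characters from the last position.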
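The table can be built in one left-to-right pass over the first m - 1 pattern characters, letting later occurrences overwrite earlier ones. The sketch below assumes 0-indexed strings and stores only the characters that occur in the pattern; the default value m for every other character (the * column) is applied at lookup time. The name build_shift_table is hypothetical.

```python
def build_shift_table(pattern: str) -> dict:
    """Map each character of pattern[:-1] to its distance from the last position."""
    m = len(pattern)
    table = {}
    # The last pattern character is skipped on purpose: including it would give
    # a shift of 0. Later (more rightward) occurrences overwrite earlier ones.
    for j, ch in enumerate(pattern[:-1]):
        table[ch] = m - 1 - j
    return table


if __name__ == "__main__":
    P = "GCAGAGAG"
    table = build_shift_table(P)
    for ch in "ACGT":
        # Characters missing from the table fall back to the default shift m.
        print(ch, table.get(ch, len(P)))
```

For P = GCAGAGAG this gives 1 for A, 6 for C, 2 for G and the default 8 for T, which agrees with the table above.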
Pseudo code

Horspool (P = p_1 p_2 ... p_m, T = t_1 t_2 ... t_n)
Preprocessing
    For c ∈ Σ Do d[c] ← m
    For j ∈ 1 ... m-1 Do d[p_j] ← m - j
Searching
    pos ← 0
    While pos ≤ n-m Do
        j ← m
        While j > 0 And t_{pos+j} = p_j Do j ← j-1
        If j = 0 Then report an occurrence at pos+1
        pos ← pos + d[t_{pos+m}]
    End of while
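A possible Python rendering of the pseudo code is sketched below. It is not the presenter's implementation: it assumes 0-indexed strings (so an occurrence is reported at pos rather than pos+1), keeps the shift table in a dictionary with m as the default for absent characters, and the function name horspool is chosen for illustration.

```python
def horspool(pattern: str, text: str) -> list:
    """Return the 0-indexed start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Preprocessing: d[c] = m - 1 - j for the rightmost j < m - 1 with pattern[j] == c.
    d = {}
    for j in range(m - 1):
        d[pattern[j]] = m - 1 - j

    # Searching: verify each window right to left, then shift by the
    # table entry of the window's last text character.
    occurrences = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            occurrences.append(pos)
        pos += d.get(text[pos + m - 1], m)
    return occurrences


if __name__ == "__main__":
    print(horspool("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))  # [5]
    print(horspool("ATATA", "AGATACGATATATAC"))              # [7, 9]
```

The second call shows that overlapping occurrences are found, because the shift after a match is taken from the same table as the shift after a mismatch.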
Preprocessing phase
For example:
T : GCATCGCAGAGAGTATACAGTACG
P : GCAGAGAG

Step 1: For c ∈ Σ Do d[c] ← m
c ∈ {A, C, G, T}
d[A] = 8, d[C] = 8, d[G] = 8, d[T] = 8

Step 2: For j ∈ 1 ... m-1 Do d[p_j] ← m - j
d[A] = 1, d[C] = 6, d[G] = 2, d[T] = 8
Example (1/3)

Shift rule: pos ← pos + d[t_{pos+m}]
Text positions are numbered 0 to 23, so the last character of the window at pos is t_{pos+7}.

d:  A  C  G  *
    1  6  2  8

GCATCGCAGAGAGTATACAGTACG
GCAGAGAG
pos ← 0 + d[t_{0+7}] = 0 + d[A] = 1

GCATCGCAGAGAGTATACAGTACG
 GCAGAGAG
pos ← 1 + d[t_{1+7}] = 1 + d[G] = 3

GCATCGCAGAGAGTATACAGTACG
   GCAGAGAG
pos ← 3 + d[t_{3+7}] = 3 + d[G] = 5
Example (2/3)

d:  A  C  G  *
    1  6  2  8

GCATCGCAGAGAGTATACAGTACG
     GCAGAGAG
While j > 0 And t_{pos+j} = p_j Do j ← j-1
If j = 0 Then report an occurrence at pos+1
(the whole window matches, so j reaches 0 and the occurrence is reported)
pos ← 5 + d[t_{5+7}] = 5 + d[G] = 7

GCATCGCAGAGAGTATACAGTACG
       GCAGAGAG
pos ← 7 + d[t_{7+7}] = 7 + d[A] = 8

GCATCGCAGAGAGTATACAGTACG
        GCAGAGAG
pos ← 8 + d[t_{8+7}] = 8 + d[T] = 16
Example (3/3)

d:  A  C  G  *
    1  6  2  8

GCATCGCAGAGAGTATACAGTACG
                GCAGAGAG
pos ← 16 + d[t_{16+7}] = 16 + d[G] = 18
pos > n-m (18 > 23 - 7), so the search jumps out of the While loop.
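The sequence of window positions in this example can be reproduced with a short trace. The sketch assumes 0-indexed text positions, as in the example, so the last character of the window at pos is text[pos + m - 1]; the function name window_positions is hypothetical.

```python
def window_positions(pattern: str, text: str) -> list:
    """Return every window start position visited by the Horspool search loop."""
    m, n = len(pattern), len(text)
    # Shift table over the first m - 1 pattern characters; rightmost occurrence wins.
    d = {pattern[j]: m - 1 - j for j in range(m - 1)}
    visited = []
    pos = 0
    while pos <= n - m:
        visited.append(pos)
        # The shift depends only on the last text character of the current window.
        pos += d.get(text[pos + m - 1], m)
    return visited


if __name__ == "__main__":
    print(window_positions("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
    # [0, 1, 3, 5, 7, 8, 16]
```

Only 7 of the 17 possible alignments are examined, and the next position after 16 would be 18, which exceeds n - m = 16 and ends the loop.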
Example (2/2)

d:  A  T  *
    2  1  5

AGATACGATATATAC
       ATATA
We verify the window backward and find the occurrence. We then shift by re-using the last character of the window, d[A] = 2.

AGATACGATATATAC
         ATATA
We find the pattern again. We shift by the last character of the window, d[A] = 2. Then pos > n-m and the search stops.
Time complexity
Preprocessing phase: O(m + σ) time and O(σ) space.
Searching phase: O(mn) time.
The average number of comparisons per text character is between 1/σ and 2/(σ+1), where σ is the size of the alphabet.
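The gap between the O(mn) worst case and the much smaller typical cost can be seen by counting character comparisons on two hand-picked inputs. This is a rough illustration under assumed inputs, not a measurement from the slides; count_comparisons is a hypothetical helper.

```python
def count_comparisons(pattern: str, text: str) -> int:
    """Count text/pattern character comparisons made by a Horspool search."""
    m, n = len(pattern), len(text)
    d = {pattern[j]: m - 1 - j for j in range(m - 1)}
    comparisons = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        pos += d.get(text[pos + m - 1], m)
    return comparisons


if __name__ == "__main__":
    # Bad case: every window matches m - 1 characters and the shift is always 1.
    print(count_comparisons("baaa", "a" * 20))  # 68 = 4 * (20 - 4 + 1)
    # Good case: the last window character never occurs in the pattern.
    print(count_comparisons("baaa", "c" * 20))  # 5
```

In the first call each of the n - m + 1 windows costs m comparisons, giving m(n - m + 1). In the second call each window costs a single comparison and the pattern jumps by m each time.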
References
AHO, A.V., 1990, Algorithms for finding patterns in strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and Complexity, J. van Leeuwen ed., Chapter 5, pp 255-300, Elsevier, Amsterdam.
BAEZA-YATES, R.A., RÉGNIER, M., 1992, Average running time of the Boyer-Moore-Horspool algorithm, Theoretical Computer Science 92(1):19-31.
BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris.
CROCHEMORE, M., HANCART, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp 11-1--11-28, CRC Press Inc., Boca Raton, FL.
HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph.D. Thesis, University Paris 7, France.
HORSPOOL, R.N., 1980, Practical fast searching in strings, Software - Practice & Experience 10(6):501-506.
LECROQ, T., 1995, Experimental results on string matching algorithms, Software - Practice & Experience 25(7):727-765.
STEPHEN, G.A., 1994, String Searching Algorithms, World Scientific.
THANK YOU