Introduction The New Algorithm Implementation Results Conclusion
Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform
Timo Beller, Simon Gog, Enno Ohlebusch and Thomas Schnattinger
Institute of Theoretical Computer Science, Ulm University
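Suffix array
Suffixes of S = annasanannas$ (n = 13) in text order, before sorting (SA[i] = i):

| i | SA[i] | S_SA[i] |
|---|---|---|
| 1 | 1 | annasanannas$ |
| 2 | 2 | nnasanannas$ |
| 3 | 3 | nasanannas$ |
| 4 | 4 | asanannas$ |
| 5 | 5 | sanannas$ |
| 6 | 6 | anannas$ |
| 7 | 7 | nannas$ |
| 8 | 8 | annas$ |
| 9 | 9 | nnas$ |
| 10 | 10 | nas$ |
| 11 | 11 | as$ |
| 12 | 12 | s$ |
| 13 | 13 | $ |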
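Sorted lexicographically, with $ as the smallest symbol, the start positions of the suffixes form the suffix array SA:

| i | SA[i] | S_SA[i] |
|---|---|---|
| 1 | 13 | $ |
| 2 | 6 | anannas$ |
| 3 | 8 | annas$ |
| 4 | 1 | annasanannas$ |
| 5 | 11 | as$ |
| 6 | 4 | asanannas$ |
| 7 | 7 | nannas$ |
| 8 | 10 | nas$ |
| 9 | 3 | nasanannas$ |
| 10 | 9 | nnas$ |
| 11 | 2 | nnasanannas$ |
| 12 | 12 | s$ |
| 13 | 5 | sanannas$ |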
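The table can be reproduced with a deliberately naive construction that sorts the suffixes themselves. This Python sketch (the function name is ours) is only meant for checking small examples like this one: it takes O(n² log n) time in the worst case, unlike dedicated algorithms such as DivSufSort or InducedSort. Positions are 1-based to match the table, and `$` sorts first because its character code is below that of the letters.

```python
def suffix_array(s):
    """Return the 1-based start positions of the suffixes of s in lexicographic order.

    Naive: compares whole suffixes, O(n^2 log n) time in the worst case.
    """
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


S = "annasanannas$"
for rank, pos in enumerate(suffix_array(S), start=1):
    print(rank, pos, S[pos - 1:])
```

The printed rows should match the table above, starting with `1 13 $`.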
Suffix array construction algorithms
- Many algorithms, see the survey paper of Puglisi et al. (2007): time O(n) to O(n² log n), space 5n to 18n bytes
- DivSufSort of Yuta Mori (2008): time O(n log n), space 5n bytes
- InducedSort of Nong et al. (2009): time O(n), space 5n bytes
BWT (Burrows–Wheeler transform)
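BWT[i] is the symbol that precedes the suffix S_SA[i] in S (for SA[i] = 1 it wraps around to the last symbol, $):

| i | SA[i] | BWT[i] | S_SA[i] |
|---|---|---|---|
| 1 | 13 | s | $ |
| 2 | 6 | s | anannas$ |
| 3 | 8 | n | annas$ |
| 4 | 1 | $ | annasanannas$ |
| 5 | 11 | n | as$ |
| 6 | 4 | n | asanannas$ |
| 7 | 7 | a | nannas$ |
| 8 | 10 | n | nas$ |
| 9 | 3 | n | nasanannas$ |
| 10 | 9 | a | nnas$ |
| 11 | 2 | a | nnasanannas$ |
| 12 | 12 | a | s$ |
| 13 | 5 | a | sanannas$ |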
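Once the suffix array is known, the BWT takes a single O(n) pass: row i takes the symbol in front of suffix SA[i]. A minimal Python sketch, assuming 1-based SA values as in the table (the function name is ours):

```python
def bwt_from_sa(s, sa):
    """BWT[i] = S[SA[i] - 1] for a 1-based SA; SA[i] = 1 wraps around to the final '$'.

    With 0-based Python strings, S[SA[i] - 1] is s[SA[i] - 2]; for SA[i] = 1 the
    index -1 selects the last character, which is the sentinel '$'.
    """
    return "".join(s[pos - 2] for pos in sa)


S = "annasanannas$"
SA = [13, 6, 8, 1, 11, 4, 7, 10, 3, 9, 2, 12, 5]  # copied from the table
print(bwt_from_sa(S, SA))  # ssn$nnannaaaa
```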
BWT construction algorithms
- Compute the BWT from the suffix array: time O(n), space n bytes
- Direct computation, e.g.:
  - Lippert et al. (2005): time O(n log n), space ½(1 + σ)(1 + ε) bits
  - Okanohara and Sadakane (2009): time O(n), space O(n log σ log log_σ n) ≈ 2.5n bytes
LCP array (Longest Common Prefix array)
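LCP[i] is the length of the longest common prefix of S_SA[i−1] and S_SA[i]; LCP[1] and LCP[n + 1] = LCP[14] are set to -1:

| i | SA[i] | BWT[i] | LCP[i] | S_SA[i] |
|---|---|---|---|---|
| 1 | 13 | s | -1 | $ |
| 2 | 6 | s | 0 | anannas$ |
| 3 | 8 | n | 2 | annas$ |
| 4 | 1 | $ | 5 | annasanannas$ |
| 5 | 11 | n | 1 | as$ |
| 6 | 4 | n | 2 | asanannas$ |
| 7 | 7 | a | 0 | nannas$ |
| 8 | 10 | n | 2 | nas$ |
| 9 | 3 | n | 3 | nasanannas$ |
| 10 | 9 | a | 1 | nnas$ |
| 11 | 2 | a | 4 | nnasanannas$ |
| 12 | 12 | a | 0 | s$ |
| 13 | 5 | a | 1 | sanannas$ |
| 14 | | | -1 | |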
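A brute-force check of the LCP column compares each pair of neighbouring suffixes symbol by symbol. This is only an O(n²) reference sketch for small inputs (the function name is ours), not one of the linear-time algorithms listed below; it returns LCP[1..n+1] including the -1 sentinels used in the table.

```python
def lcp_naive(s, sa):
    """Return [LCP[1], ..., LCP[n+1]] for a 1-based suffix array sa.

    LCP[1] and LCP[n+1] are the sentinel -1, as in the table. O(n^2) worst case.
    """
    n = len(sa)
    lcp = [-1] * (n + 1)
    for i in range(1, n):
        # list slot i holds LCP[i + 1]: the suffixes of rank i and i + 1 (1-based)
        a, b = s[sa[i - 1] - 1:], s[sa[i] - 1:]
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        lcp[i] = k
    return lcp


S = "annasanannas$"
SA = [13, 6, 8, 1, 11, 4, 7, 10, 3, 9, 2, 12, 5]
print(lcp_naive(S, SA))  # [-1, 0, 2, 5, 1, 2, 0, 2, 3, 1, 4, 0, 1, -1]
```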
LCP construction algorithms from suffix array
- KLAAP algorithm of Kasai et al. (2001): time O(n), space 13n bytes; Manzini (2004) improved the space to 9n bytes
- Φ algorithm of Kärkkäinen et al. (2009): time O(n), space 5n + 4n/k bytes, or n + 4n/k bytes (semi-external)
- go-Φ algorithm of Gog and Ohlebusch (2010): time O(n), space 2n bytes
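The core of the Φ algorithm fits in a few lines. The following Python sketch uses plain lists and our own names (`lcp_phi`, `phi`, `plcp`); it illustrates only the basic idea, not the sparse Φk or semi-external variants, go-Φ, or the memory layouts behind the byte counts above. Φ maps each text position to the position of its predecessor in the suffix array, the permuted values PLCP are computed in text order using PLCP[p + 1] ≥ PLCP[p] − 1, and finally LCP[i] = PLCP[SA[i]].

```python
def lcp_phi(s, sa):
    """LCP via Phi and PLCP. sa is 1-based as in the slides; returns [LCP[1..n+1]]."""
    n = len(s)
    sa0 = [p - 1 for p in sa]             # switch to 0-based text positions
    phi = [-1] * n                        # phi[p] = position of the suffix ranked just before p
    for i in range(1, n):
        phi[sa0[i]] = sa0[i - 1]
    plcp = [0] * n                        # plcp[p] = lcp(suffix p, suffix phi[p])
    l = 0
    for p in range(n):                    # text order lets l carry over between positions
        q = phi[p]
        if q == -1:                       # lexicographically smallest suffix: no predecessor
            l = 0
            continue
        while p + l < n and q + l < n and s[p + l] == s[q + l]:
            l += 1
        plcp[p] = l
        l = max(l - 1, 0)                 # PLCP[p + 1] >= PLCP[p] - 1
    return [-1] + [plcp[sa0[i]] for i in range(1, n)] + [-1]


S = "annasanannas$"
SA = [13, 6, 8, 1, 11, 4, 7, 10, 3, 9, 2, 12, 5]
print(lcp_phi(S, SA))
```

Because l grows by at most n in total and drops by at most one per position, the comparison loop runs in O(n) overall.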
Overview
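Input: a string of length n. Construction steps and their space requirements:
- Input → suffix array: 5n bytes
- Input → BWT (direct construction): 2.5n bytes
- Suffix array → BWT: n bytes
- Suffix array → LCP array: 1–2n bytes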
Task
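Input: a string of length n.
- Input → suffix array: 5n bytes
- Input → BWT (direct construction): 2.5n bytes
- Suffix array → BWT: n bytes
- Suffix array → LCP array: 1–2n bytes
- BWT → LCP array: ?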
Observation
Assume the string ω occurs t times in a string S:
- There are t suffixes of S that start with ω.
- These suffixes occur consecutively in the suffix array.
- Let j be the largest index such that the suffix S_SA[j] starts with ω. Then LCP[j + 1] < |ω|, because S_SA[j+1] does not start with ω (for j = n, LCP[n + 1] = -1).
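The observation can be checked by brute force on the running example annasanannas$. In this sketch (helper names are ours), SA and LCP are copied from the tables above; `omega_interval` collects the rows whose suffixes start with ω and asserts that they are consecutive, and `check_observation` verifies LCP[j + 1] < |ω| for every substring ω.

```python
S = "annasanannas$"
SA = [13, 6, 8, 1, 11, 4, 7, 10, 3, 9, 2, 12, 5]
# 1-based LCP[1..14]; slot 0 is a dummy so that LCP[i] matches the slides
LCP = [None, -1, 0, 2, 5, 1, 2, 0, 2, 3, 1, 4, 0, 1, -1]


def omega_interval(omega):
    """Return (first, last) 1-based rows whose suffix S_SA[i] starts with omega."""
    rows = [i for i in range(1, len(SA) + 1) if S[SA[i - 1] - 1:].startswith(omega)]
    assert rows == list(range(rows[0], rows[-1] + 1)), "rows are not consecutive"
    return rows[0], rows[-1]


def check_observation():
    """Assert LCP[j + 1] < |omega| for every substring omega; return how many were checked."""
    substrings = {S[a:b] for a in range(len(S)) for b in range(a + 1, len(S) + 1)}
    for omega in substrings:
        _, j = omega_interval(omega)
        assert LCP[j + 1] < len(omega), omega
    return len(substrings)


print(omega_interval("an"), LCP[4 + 1])  # (2, 4) 1
```

For ω = an, the block is rows 2 to 4, and LCP[5] = 1 is indeed smaller than |an| = 2.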
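Example: annasanannas$

| i | BWT[i] | LCP[i] | S_SA[i] |
|---|---|---|---|
| 1 | s | -1 | $ |
| 2 | s | 0 | anannas$ |
| 3 | n | 2 | annas$ |
| 4 | $ | 5 | annasanannas$ |
| 5 | n | 1 | as$ |
| 6 | n | 2 | asanannas$ |
| 7 | a | 0 | nannas$ |
| 8 | n | 2 | nas$ |
| 9 | n | 3 | nasanannas$ |
| 10 | a | 1 | nnas$ |
| 11 | a | 4 | nnasanannas$ |
| 12 | a | 0 | s$ |
| 13 | a | 1 | sanannas$ |
| 14 | | -1 | |

For instance, ω = an occurs t = 3 times, in rows 2 to 4; the largest such row is j = 4, and indeed LCP[5] = 1 < |an| = 2.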
Idea
- Enumerate all substrings of S in order of increasing length.
- For each substring ω, determine the corresponding suffix-array interval [lb..rb].
- If LCP[rb + 1] has not been set before, set LCP[rb + 1] = |ω| − 1.
Pseudocode
Basic version (every extension aω is enqueued):

    LCP[1] ← −1
    LCP[i] ← ⊥ for all i with 2 ≤ i ≤ n
    LCP[n + 1] ← −1
    initialize an empty queue
    enqueue(ε)
    while not all LCP values are calculated do
        ω ← dequeue()
        for each a ∈ Σ do
            enqueue(aω)
            [lb..rb] ← getIntervalBounds(aω)
            if rb ≠ ⊥ and LCP[rb + 1] = ⊥ then
                LCP[rb + 1] ← |aω| − 1
Refined version (aω is enqueued only if it set a new LCP value, and the loop runs until the queue is empty):

    LCP[1] ← −1
    LCP[i] ← ⊥ for all i with 2 ≤ i ≤ n
    LCP[n + 1] ← −1
    initialize an empty queue
    enqueue(ε)
    while the queue is not empty do
        ω ← dequeue()
        for each a ∈ Σ do
            [lb..rb] ← getIntervalBounds(aω)
            if rb ≠ ⊥ and LCP[rb + 1] = ⊥ then
                LCP[rb + 1] ← |aω| − 1
                enqueue(aω)
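A direct, unoptimized Python transcription of the refined pseudocode follows. Assumptions (ours, not the authors' implementation): the text ends with a unique smallest sentinel `$`; getIntervalBounds is realized as one backward-search step over the BWT, with rank answered by plain counting (O(n) per query instead of a wavelet tree); the queue holds (lb, rb, |ω|) triples rather than strings; and ⊥ is represented by `None`.

```python
from collections import deque


def lcp_from_bwt(bwt):
    """Compute [LCP[1], ..., LCP[n+1]] from the BWT alone by breadth-first interval search."""
    n = len(bwt)
    sigma = sorted(set(bwt))
    C, total = {}, 0                      # C[c] = number of BWT symbols smaller than c
    for c in sigma:
        C[c] = total
        total += bwt.count(c)

    def rank(c, i):                       # occurrences of c in BWT[1..i] (naive, O(n))
        return bwt[:i].count(c)

    lcp = [None] * (n + 2)                # 1-based; None plays the role of ⊥
    lcp[1] = lcp[n + 1] = -1
    queue = deque([(1, n, 0)])            # the empty string ε has interval [1..n]
    while queue:
        lb, rb, length = queue.popleft()  # interval [lb..rb] of ω, length = |ω|
        for c in sigma:                   # backward search: interval of cω
            new_lb = C[c] + rank(c, lb - 1) + 1
            new_rb = C[c] + rank(c, rb)
            if new_lb > new_rb:           # cω does not occur (rb = ⊥)
                continue
            if lcp[new_rb + 1] is None:
                lcp[new_rb + 1] = length  # |cω| - 1
                queue.append((new_lb, new_rb, length + 1))
    return lcp[1:]


print(lcp_from_bwt("ssn$nnannaaaa"))  # [-1, 0, 2, 5, 1, 2, 0, 2, 3, 1, 4, 0, 1, -1]
```

Each enqueue fills a previously unset LCP entry, so at most n − 1 intervals are ever processed; the naive rank makes this sketch quadratic, which is what the wavelet tree avoids.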
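Example: annasanannas$

| i | BWT[i] | LCP[i] | S_SA[i] |
|---|---|---|---|
| 1 | s | -1 | $ |
| 2 | s | ⊥ | anannas$ |
| 3 | n | ⊥ | annas$ |
| 4 | $ | ⊥ | annasanannas$ |
| 5 | n | ⊥ | as$ |
| 6 | n | ⊥ | asanannas$ |
| 7 | a | ⊥ | nannas$ |
| 8 | n | ⊥ | nas$ |
| 9 | n | ⊥ | nasanannas$ |
| 10 | a | ⊥ | nnas$ |
| 11 | a | ⊥ | nnasanannas$ |
| 12 | a | ⊥ | s$ |
| 13 | a | ⊥ | sanannas$ |
| 14 | | -1 | |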
Dequeue ε and extend it by $: the interval of $ is [1..1], so LCP[2] ← |$| − 1 = 0.
LCP[1..14]: -1, 0, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥, -1
Extend ε by a: the interval of a is [2..6], so LCP[7] ← |a| − 1 = 0.
LCP[1..14]: -1, 0, ⊥, ⊥, ⊥, ⊥, 0, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥, -1
Extend ε by n: the interval of n is [7..11], so LCP[12] ← 0. Extending ε by s yields [12..13], but LCP[14] is already set.
LCP[1..14]: -1, 0, ⊥, ⊥, ⊥, ⊥, 0, ⊥, ⊥, ⊥, ⊥, 0, ⊥, -1
Dequeue $: the only extension that occurs is s$, with interval [12..12], so LCP[13] ← |s$| − 1 = 1.
LCP[1..14]: -1, 0, ⊥, ⊥, ⊥, ⊥, 0, ⊥, ⊥, ⊥, ⊥, 0, 1, -1
Dequeue a: the extension na has interval [7..9], so LCP[10] ← |na| − 1 = 1; sa yields [13..13], but LCP[14] is already set.
LCP[1..14]: -1, 0, ⊥, ⊥, ⊥, ⊥, 0, ⊥, ⊥, 1, ⊥, 0, 1, -1
Saving space
- Store the interval boundaries [lb..rb] of ω, not ω itself:
  - Storing two integers costs 2 log n bits per substring.
  - Instead, mark lb and rb in two bit vectors B_lb and B_rb of length n: 2n bits for all substrings of the same length (their intervals are disjoint).
- Reserve only n bytes for the LCP array:
  - The algorithm computes the LCP values in ascending order.
  - If a new LCP value cannot be stored in the LCP array (it no longer fits in one byte), write the LCP array to disk.
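Because different substrings of the same length have disjoint suffix-array intervals, all intervals of one length can be stored as two marker bit vectors and decoded by a left-to-right scan. A minimal sketch, in which plain Python lists stand in for real bit vectors and the function names are hypothetical:

```python
def encode_level(intervals, n):
    """Mark lb and rb of pairwise disjoint 1-based intervals in two bit vectors of length n."""
    b_lb = [0] * (n + 1)                  # index 0 unused, positions 1..n
    b_rb = [0] * (n + 1)
    for lb, rb in intervals:
        b_lb[lb] = 1
        b_rb[rb] = 1
    return b_lb, b_rb


def decode_level(b_lb, b_rb):
    """Recover the intervals in ascending order: each set rb bit closes the most recent lb."""
    intervals, lb = [], None
    for i in range(1, len(b_lb)):
        if b_lb[i]:
            lb = i
        if b_rb[i]:
            intervals.append((lb, i))
    return intervals


level1 = [(1, 1), (2, 6), (7, 11), (12, 13)]   # intervals of $, a, n, s in the example
print(decode_level(*encode_level(level1, 13)) == level1)  # True
```

The round trip only works because the intervals never overlap; the common length |ω| of a level is kept once, outside the bit vectors.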
Calculation of the subintervals
Problem: given the interval [lb..rb] of ω, find the interval of aω for every a ∈ Σ such that aω is a substring of S.
Modified backward search with a wavelet tree of the BWT:
- Find all subintervals by traversing the wavelet tree in a depth-first manner.
- Use Huffman-shaped wavelet trees to save time and space.
- Time: O(σ) per interval
- Space: n bytes
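The sketch below shows the depth-first traversal on a balanced, pointer-based wavelet tree (not the Huffman-shaped, succinct one from the slides); each node keeps prefix counts of its bit vector, so rank is a lookup. `interval_symbols` reports every distinct symbol c in BWT[lb..rb] together with its ranks at both ends of the range, which is exactly what one backward-search step per child interval needs. All names here are ours.

```python
class WaveletTree:
    """Balanced wavelet tree with plain prefix counts (illustrative, not succinct)."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.is_leaf = len(self.alphabet) <= 1
        if self.is_leaf:
            return
        mid = len(self.alphabet) // 2
        left_symbols = set(self.alphabet[:mid])
        self.ones = [0]                   # ones[i] = number of 1-bits among the first i bits
        for c in seq:
            self.ones.append(self.ones[-1] + (c not in left_symbols))
        self.left = WaveletTree([c for c in seq if c in left_symbols], self.alphabet[:mid])
        self.right = WaveletTree([c for c in seq if c not in left_symbols], self.alphabet[mid:])

    def interval_symbols(self, i, j):
        """Yield (c, rank_c(i), rank_c(j)) for each distinct c in seq[i:j] (0-based, half-open)."""
        if i >= j:
            return                        # empty range: prune this subtree
        if self.is_leaf:
            yield self.alphabet[0], i, j  # at a leaf, positions are ranks of its symbol
            return
        i1, j1 = self.ones[i], self.ones[j]
        yield from self.left.interval_symbols(i - i1, j - j1)
        yield from self.right.interval_symbols(i1, j1)


def child_intervals(wt, C, lb, rb):
    """Given the 1-based interval [lb..rb] of omega, map each c with c+omega occurring to its interval."""
    return {c: (C[c] + r_i + 1, C[c] + r_j) for c, r_i, r_j in wt.interval_symbols(lb - 1, rb)}


bwt = "ssn$nnannaaaa"
C = {"$": 0, "a": 1, "n": 6, "s": 11}    # number of symbols smaller than c in the example
wt = WaveletTree(bwt)
print(child_intervals(wt, C, 2, 6))  # {'$': (1, 1), 'n': (7, 9), 's': (13, 13)}
```

Subtrees whose range becomes empty are never visited, so only symbols that actually occur in BWT[lb..rb] cost any work.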
Runtime and space
- Time complexity: O(σn)
- Practical and space-efficient implementation:
  - Time: O(n log n)
  - Space: ≈ 2.2n bytes
Experimental Results
- Test cases:
  - Pizza&Chili Corpus
  - Some DNA files from www.ensembl.org (Release 62)
- Implementation:
  - uses the sdsl library of Simon Gog (www.uni-ulm.de/in/theo/research/sdsl.html)
  - uses bit-compressed arrays (i.e. log n bits per integer, not 4 or 8 bytes)
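Pizza&Chili files, 200 MB each. For each file: time and space (in multiples of n bytes). The first table covers SA construction, BWT construction and the LCP algorithms on their own; the second gives the totals, i.e. SA construction plus LCP construction for the SA-based algorithms, and BWT construction plus LCP construction for the new algorithm.

| Step | dna time | dna space | english time | english space | proteins time | proteins space | sources time | sources space | xml time | xml space |
|---|---|---|---|---|---|---|---|---|---|---|
| SA constr. | 71 | 5 | 64 | 5 | 72 | 5 | 45 | 5 | 49 | 5 |
| BWT constr. | 93 | 1.9 | 109 | 2.2 | 150 | 2.6 | 87 | 2.2 | 83 | 2.2 |
| KLAAP | 58 | 9 | 48 | 9 | 48 | 9 | 33 | 9 | 32 | 9 |
| Φ1 | 37 | 9 | 30 | 9 | 30 | 9 | 22 | 9 | 22 | 9 |
| Φ4 | 83 | 6 | 74 | 6 | 78 | 6 | 60 | 6 | 63 | 6 |
| Φ64 | 80 | 5.1 | 76 | 5.1 | 78 | 5.1 | 64 | 5.1 | 75 | 5.1 |
| Φ4-Semi | 78 | 2 | 72 | 2 | 72 | 2 | 59 | 2 | 63 | 2 |
| Φ64-Semi | 76 | 1.1 | 70 | 1.1 | 70 | 1.1 | 56 | 1.1 | 73 | 1.1 |
| go-Φ | 53 | 2 | 74 | 2 | 70 | 2 | 51 | 2 | 49 | 2 |
| new algorithm | 66 | 1.8 | 124 | 2 | 137 | 2 | 131 | 2.2 | 99 | 2.1 |

| Total | dna time | dna space | english time | english space | proteins time | proteins space | sources time | sources space | xml time | xml space |
|---|---|---|---|---|---|---|---|---|---|---|
| KLAAP | 129 | 9 | 112 | 9 | 120 | 9 | 78 | 9 | 81 | 9 |
| Φ1 | 108 | 9 | 94 | 9 | 102 | 9 | 67 | 9 | 71 | 9 |
| Φ4 | 154 | 6 | 138 | 6 | 150 | 6 | 105 | 6 | 112 | 6 |
| Φ64 | 151 | 5.1 | 140 | 5.1 | 150 | 5.1 | 109 | 5.1 | 124 | 5.1 |
| Φ4-Semi | 149 | 5 | 136 | 5 | 144 | 5 | 104 | 5 | 112 | 5 |
| Φ64-Semi | 147 | 5 | 134 | 5 | 142 | 5 | 101 | 5 | 122 | 5 |
| go-Φ | 124 | 5 | 138 | 5 | 142 | 5 | 96 | 5 | 98 | 5 |
| new algorithm | 159 | 1.9 | 233 | 2.2 | 287 | 2.6 | 218 | 2.2 | 182 | 2.2 |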
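DNA files: Stickleback (446 MB), Chicken (1,050 MB), Sloth (2,060 MB), Orangutan (3,093 MB). Same layout as above; "–" marks entries without a value.

| Step | Stickleback time | Stickleback space | Chicken time | Chicken space | Sloth time | Sloth space | Orangutan time | Orangutan space |
|---|---|---|---|---|---|---|---|---|
| SA constr. | 171 | 5 | 471 | 5 | 1,100 | 5 | 2,013 | 9 |
| BWT constr. | 204 | 2 | 549 | 1.9 | 1,062 | 1.9 | 1,686 | 1.9 |
| KLAAP | 150 | 9 | 454 | 9 | 951 | 9 | 1,527 | 9 |
| Φ1 | 98 | 9 | 318 | 9 | 756 | 9 | 1,183 | 9 |
| Φ4 | 187 | 6 | 534 | 6 | 1,236 | 6 | – | – |
| Φ64 | 193 | 5.1 | 522 | 5.1 | 1,163 | 5.1 | – | – |
| Φ4-Semi | 182 | 2 | 523 | 2 | 1,183 | 2 | 1,786 | 2 |
| Φ64-Semi | 180 | 1.1 | 454 | 1.1 | 1,064 | 1.1 | 1,648 | 1.1 |
| go-Φ | 117 | 2 | 316 | 2 | 685 | 2 | 1,041 | 2 |
| new algorithm | 141 | 1.8 | 338 | 1.8 | 800 | 1.8 | 1,270 | 1.8 |

| Total | Stickleback time | Stickleback space | Chicken time | Chicken space | Sloth time | Sloth space | Orangutan time | Orangutan space |
|---|---|---|---|---|---|---|---|---|
| KLAAP | 321 | 9 | 925 | 9 | 2,051 | 9 | 3,540 | 9 |
| Φ1 | 269 | 9 | 789 | 9 | 1,856 | 9 | 3,196 | 9 |
| Φ4 | 358 | 6 | 1,005 | 6 | 2,336 | 6 | – | – |
| Φ64 | 364 | 5.1 | 993 | 5.1 | 2,263 | 5.1 | – | – |
| Φ4-Semi | 353 | 5 | 994 | 5 | 2,283 | 5 | 3,799 | 9 |
| Φ64-Semi | 351 | 5 | 925 | 5 | 2,164 | 5 | 3,661 | 9 |
| go-Φ | 288 | 5 | 787 | 5 | 1,785 | 5 | 3,054 | 9 |
| new algorithm | 345 | 2 | 887 | 1.9 | 1,862 | 1.9 | 2,956 | 1.9 |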
Solution
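Input: a string of length n.
- Input → suffix array: 5n bytes
- Input → BWT (direct construction): 2.5n bytes
- Suffix array → BWT: n bytes
- Suffix array → LCP array: 1–2n bytes
- BWT → LCP array: 2.2n bytes!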
Thank you! Any questions?