![Page 1: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/1.jpg)
Bioinformatics PhD. Course
1. Biological introduction
2. Comparison of short sequences (up to 10,000 bps): Dot matrix, Pairwise alignment, Multiple alignment, Hash algorithms
3. Comparison of large sequences (more than 10,000 bps): Data structures, Suffix trees, MUMs
4. String matching: Exact, Extended, Approximate
5. Sequence assembly
6. Projects: PROMO, MREPATT, …
![Page 2: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/2.jpg)
Comparison of large sequences
First part:
Alignment of large sequences
![Page 3: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/3.jpg)
Dynamic programming
• Short sequences (up to 10,000 bps) can be aligned using dynamic programming.
• Quadratic cost of space and time.
[Figure: acc…agt aligned against acc…a--, with matching columns marked | and the remaining columns marked x]
What about genomes?
[Figure: a genome-length sequence, accaccacaccacaacgagcata … acctgagcgatat, compared with acc..t]
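The quadratic cost can be made concrete with a minimal sketch of global alignment by dynamic programming. The scoring values (match, mismatch, gap) and the function name are assumptions chosen for illustration; the slides do not specify them.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best global alignment score, size of the table)."""
    rows, cols = len(a) + 1, len(b) + 1
    # The full table is kept in memory: this is the quadratic space cost.
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    # Every cell is visited once: this is the quadratic time cost.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + step,
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)
    return table[-1][-1], rows * cols
```

For two sequences of 10,000 bps each the table already has about 100 million cells, which is why this approach is reserved for short sequences.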
![Page 4: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/4.jpg)
Genomic sequences
• Genomic sequences have millions of base pairs.
• The length of the sequences is 1,000 times longer.
• The running time is 1,000,000 times higher! (1 second becomes 11 days; 1 minute becomes 2 years.)
In which cases can dynamic programming be applied?
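The two figures follow directly from the quadratic cost: a length factor of 1,000 gives a time factor of 1,000,000. A short check, with variable and function names that are only illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def scaled_time_seconds(base_seconds, length_factor):
    """Quadratic algorithm: time grows with the square of the length factor."""
    return base_seconds * length_factor ** 2

one_second = scaled_time_seconds(1, 1000)    # 1,000,000 seconds
one_minute = scaled_time_seconds(60, 1000)   # 60,000,000 seconds

days = one_second / SECONDS_PER_DAY            # roughly 11.6 days
years = one_minute / (365 * SECONDS_PER_DAY)   # roughly 1.9 years
```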
![Page 5: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/5.jpg)
First assumption
[Figure: Genome A and Genome B drawn as two sequences, and as the two axes of a comparison plot]
![Page 6: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/6.jpg)
Realistic assumption?
Unrealistic assumption!
More realistic assumption
[Figure: Genome A and Genome B drawn as two sequences and as the two axes of a comparison plot, once for each assumption]
![Page 7: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/7.jpg)
Realistic assumptions?
But is this now a real case?
Unrealistic assumption!
More realistic assumption
[Figure: Genome A and Genome B drawn as two sequences and as the two axes of a comparison plot, once for each assumption]
![Page 8: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/8.jpg)
Preview in a real case
Chlamydia muridarum: 1,084,689 bps
Chlamydia trachomatis: 1,057,413 bps
![Page 9: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/9.jpg)
Preview in a real case
Pyrococcus abyssi: 1,790,334 bps
Pyrococcus horikoshii: 1,763,341 bps
![Page 10: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/10.jpg)
Methodology of an alignment
1st: Make a preview. (Linear cost)
2nd: Identify the portions that can be aligned. (Linear cost)
3rd: Make the alignment.
![Page 11: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/11.jpg)
Methodology of an alignment
1st: Make a preview. (Linear cost)
2nd: Identify the portions that can be aligned.
3rd: Make the alignment.
?
![Page 12: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/12.jpg)
Preview-Revisited
… a a t g….c t g...
… c g t g….c c c ...
MUM: Maximal Unique Matching
Connect to MALGEN
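As an illustration of what a MUM is, the sketch below uses the common definition: a substring that occurs exactly once in each sequence and is not contained in a longer substring with the same property. It is a brute-force check meant only for tiny inputs, not the linear-cost method the course builds towards, and the function names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))

def naive_mums(s1, s2, min_len=1):
    """Return the maximal substrings that occur exactly once in s1 and in s2."""
    unique = set()
    for i in range(len(s1)):
        for j in range(i + min_len, len(s1) + 1):
            sub = s1[i:j]
            if count_occurrences(s1, sub) == 1 and count_occurrences(s2, sub) == 1:
                unique.add(sub)
    # A unique match is maximal when no longer unique match contains it.
    return sorted(u for u in unique
                  if not any(u != v and u in v for v in unique))
```

With the two small strings used later in the course, `naive_mums("ababaabb", "aabaabba")` yields the single match `abaabb`.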
![Page 13: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/13.jpg)
Methodology of an alignment
1st: Make a preview. How can MUMs be found?
2nd: Identify the portions that can be aligned. How can these portions be determined?
3rd: Make the alignment. With CLUSTALW, TCOFFEE, …
Linear cost with suffix trees.
![Page 14: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/14.jpg)
Comparison of large sequences
M-GCAT
Todd Treangen
![Page 15: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/15.jpg)
Homework
1. Javier
2. Dmitry
3. Ana Iris
4. David
5. Patricia
6. Rogeli
7. Atif
8. Aina
9. Isaac
10. Maria Merce
11. Romina
12. Guillem
13. Raul
14. Alexis
15. Ramon
![Page 16: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/16.jpg)
Bioinformatics PhD. Course
Second part:
Introducing Suffix trees
![Page 17: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/17.jpg)
Suffix trees
Given string ababaas:
Suffixes:
1: ababaas
2: babaas
3: abaas
4: baas
5: aas
6: as
7: s
[Figure: suffix tree of ababaas; each leaf is labelled with the text of its edge and the starting position of its suffix, for example "as,3" and "s,7"]
What kind of queries?
![Page 18: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/18.jpg)
Applications of Suffix trees
1. Exact string matching
• Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix tree of ababaas]
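The question on the slide can be answered by walking down from the root, one pattern character at a time. The sketch below uses an uncompressed suffix trie built from nested dictionaries, which is simpler than a real suffix tree and takes quadratic space; it is an assumption made for brevity, not the structure drawn on the slide.

```python
def build_suffix_trie(text):
    """Uncompressed suffix trie as nested dicts (illustration only)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root

def contains(trie, pattern):
    """A pattern occurs in the text iff it labels a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("ababaas")
answers = {p: contains(trie, p) for p in ("abab", "aab", "ab")}
```

Here `abab` and `ab` are found, while `aab` is not, because the only `aa` in ababaas is followed by `s`.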
![Page 19: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/19.jpg)
Quadratic insertion algorithm
Given the string … and the suffix tree …
Invariant properties:
P1: the leaves of the suffixes from … have been inserted
![Page 20: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/20.jpg)
Quadratic insertion algorithm
Given the string ababaabbs
[Figure: the suffix tree after each insertion, from suffix 1 (ababaabbs) to suffix 9 (s). Each leaf is labelled with the text of its edge and the starting position of its suffix.]
Final suffix tree:
root
├─ a
│  ├─ b
│  │  ├─ a
│  │  │  ├─ baabbs,1
│  │  │  └─ abbs,3
│  │  └─ bs,6
│  └─ abbs,5
├─ b
│  ├─ a
│  │  ├─ baabbs,2
│  │  └─ abbs,4
│  ├─ bs,7
│  └─ s,8
└─ s,9
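The insertion steps can be sketched as follows: each suffix is matched from the root, an edge is split where the first mismatch occurs, and a new leaf is attached. Every insertion may walk a path of linear length, hence the quadratic total cost. The class and function names are hypothetical, and the sketch assumes the string ends with a terminator (here `s`) that occurs nowhere else.

```python
class Node:
    def __init__(self, label="", suffix=None):
        self.label = label      # text on the edge entering this node
        self.suffix = suffix    # 1-based start of the suffix (leaves only)
        self.children = {}      # first character of an edge -> child node

def insert_suffix(root, text, start):
    node, rest = root, text[start:]
    while True:
        child = node.children.get(rest[0])
        if child is None:                       # no edge to follow: new leaf
            node.children[rest[0]] = Node(rest, start + 1)
            return
        k = 0                                   # length of the common prefix
        while k < len(child.label) and k < len(rest) and child.label[k] == rest[k]:
            k += 1
        if k < len(child.label):                # mismatch inside the edge: split it
            middle = Node(child.label[:k])
            child.label = child.label[k:]
            middle.children[child.label[0]] = child
            node.children[rest[0]] = middle
            child = middle
        node, rest = child, rest[k:]

def build_suffix_tree(text):
    assert text and text.count(text[-1]) == 1, "unique terminator assumed"
    root = Node()
    for start in range(len(text)):
        insert_suffix(root, text, start)
    return root

def leaf_labels(node):
    """Collect (edge text, suffix position) for every leaf below node."""
    if not node.children:
        return [(node.label, node.suffix)]
    found = []
    for child in node.children.values():
        found.extend(leaf_labels(child))
    return found
```

Calling `leaf_labels(build_suffix_tree("ababaabbs"))` gives the nine leaf labels of the example, such as `("baabbs", 1)` and `("s", 9)`.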
![Page 35: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/35.jpg)
Generalized suffix tree
The suffix tree of many strings is called the generalized suffix tree, and it is the suffix tree of the concatenation of the strings.
For instance, the generalized suffix tree of ababaabb and aabaat is the suffix tree of ababaabbαaabaatβ.
![Page 36: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/36.jpg)
Generalized suffix tree
Given the suffix tree of ababaabbα:
[Figure: suffix tree of ababaabbα, with leaves baabbα,1; baabbα,2; abbα,3; abbα,4; abbα,5; bα,6; bα,7; α,8; α,9]
Construction of the suffix tree of ababaabbαaabaaβ:
[Figure: the suffixes of aabaaβ, numbered 1 to 6, are inserted one at a time into the tree above, splitting edges where needed]
Generalized suffix tree of ababaabbαaabaaβ:
[Figure: the resulting tree, in which every leaf ends in either α (first string) or β (second string)]
![Page 50: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/50.jpg)
Applications of Suffix trees
1. Exact string matching
• Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix tree of ababaas]
![Page 51: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/51.jpg)
Applications of Suffix trees
2. The substring problem for a database of strings DB
• Does the DB contain any occurrence of the patterns abab, aab, and ab?
[Figure: generalized suffix tree of ababaabbαaabaaβ]
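A minimal sketch of the database query, under two simplifying assumptions: an uncompressed trie replaces the generalized suffix tree, and each node stores the set of string indices that pass through it instead of relying on the terminators α and β. The function names are hypothetical.

```python
def build_generalized_trie(strings):
    """Suffix trie of several strings; every node records which strings reach it."""
    root = {"ids": set(), "next": {}}
    for idx, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["next"].setdefault(ch, {"ids": set(), "next": {}})
                node["ids"].add(idx)
    return root

def strings_containing(trie, pattern):
    """Return the indices of the database strings that contain pattern."""
    node = trie
    for ch in pattern:
        node = node["next"].get(ch)
        if node is None:
            return set()
    return set(node["ids"])

db = build_generalized_trie(["ababaabb", "aabaa"])
hits = {p: strings_containing(db, p) for p in ("abab", "aab", "ab")}
```

The query time depends only on the pattern length, not on the size of the database.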
![Page 52: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/52.jpg)
Applications of Suffix trees
3. The longest common substring of two strings
[Figure: generalized suffix tree of ababaabbαaabaaβ]
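In a generalized suffix tree, the longest common substring is the path label of the deepest node that has leaves from both strings below it. The sketch below applies the same idea to an uncompressed trie whose nodes record which of the two strings reach them; this simplification and the function name are assumptions for illustration.

```python
def longest_common_substring(s1, s2):
    root = {"ids": set(), "next": {}}
    for idx, s in enumerate((s1, s2)):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["next"].setdefault(ch, {"ids": set(), "next": {}})
                node["ids"].add(idx)
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        # Descend only through nodes reached by suffixes of both strings.
        for ch, child in node["next"].items():
            if child["ids"] == {0, 1}:
                stack.append((child, path + ch))
    return best
```

For the strings ababaabb and aabaa of the running example, the result is `abaa`.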
![Page 53: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/53.jpg)
Applications of Suffix trees
5. Finding MUMs.
[Figure: generalized suffix tree of ababaabbαaabaaβ]
![Page 54: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/54.jpg)
Bioinformatics PhD. Course
Third part:
Suffix links
![Page 55: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/55.jpg)
Suffix links
[Figure: suffix tree of ababaabbα, shown over several animation frames annotated with the character a and a question mark]
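In the usual definition, an internal node whose path label is xw (with x a single character) has a suffix link to the node whose path label is w. The sketch below lists these links without building the tree: it relies on the fact that, when the string ends with a unique terminator, the internal nodes correspond to the substrings that are followed by at least two different characters. The function names are hypothetical and `$` stands in for α.

```python
from collections import defaultdict

def branching_substrings(text):
    """Path labels of the internal nodes of the suffix tree of text."""
    followers = defaultdict(set)
    for i in range(len(text)):
        for j in range(i, len(text)):
            followers[text[i:j]].add(text[j])   # text[i:i] is the root label ""
    return {w for w, chars in followers.items() if len(chars) >= 2}

def suffix_links(text):
    """Map the path label of each internal node (except the root) to its link target."""
    internal = branching_substrings(text)
    return {w: w[1:] for w in internal if w}

links = suffix_links("ababaabb$")
```

For ababaabb$ the links are a → root, b → root, ab → b, ba → a and aba → ba; every target is itself an internal node, which is what makes the links usable during a traversal.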
![Page 66: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/66.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a
![Page 67: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/67.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a
![Page 68: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/68.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a aa in S2 [1] Unique matchings
![Page 69: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/69.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a aa in S2 [1] Unique matchings
aab in S2 [1] =
S1[5..6-7] in S2 [1]
![Page 70: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/70.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a Unique matchings S1[5..6-7] in S2 [1]
![Page 71: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/71.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a Unique matchings S1[5..6-7] in S2 [1]
![Page 72: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/72.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a b b a Unique matchings S1[5..6-7] in S2 [1] S1[3..6-…] in S2 [2]
![Page 73: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/73.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a b b a Unique matchings S1[5..6-7] in S2 [1] S1[3..6-…] in S2 [2]
![Page 74: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/74.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a b b a Unique matchings S1[5..6-7] in S2 [1] S1[3..6-…] in S2 [2]
![Page 75: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/75.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a b b a Unique matchings S1[5..6-7] in S2 [1] S1[3..6-…] in S2 [2]
![Page 76: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/76.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a b b a
Unique matchings:
S1[5..6-7] in S2[1]
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
![Page 77: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/77.jpg)
Traversal using Suffix links
a abbα,5
ba abbα,3
baabbα,1
bα,6
a
baabbα,2
b
abbα,4
bα,7
α,8
α,9
Given S2 = a a b a a b b a
Unique matchings:
S1[5..8] in S2[4]
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
![Page 78: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/78.jpg)
From UMs to MUMs
Given S2 = a a b a a b b a
and S1 = a b a b a a b b α
Unique matchings:
S1[5..8] in S2[4]
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
Array of UMs (indexed by start position in S1):
1:
2:
3: 6-8
4: 6-8
5: 8
6: 8
7: 8
8:
9:
MUM: S1[3..6-8] in S2[2]
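The MUM in this example can be cross-checked with a brute-force sketch. This is not the suffix-tree traversal shown on the slides: it simply tries every pair of start positions, so it is only suitable for tiny inputs. The function names `find_mums` and `count_occurrences` are hypothetical, the terminator α is left out of S1, and a MUM is assumed to be a match that occurs exactly once in each sequence and cannot be extended to the left or right.

```python
def count_occurrences(text, word):
    """Count occurrences of word in text, overlapping ones included."""
    return sum(1 for p in range(len(text) - len(word) + 1)
               if text.startswith(word, p))


def find_mums(s1, s2, min_len=1):
    """Return (start in s1, start in s2, length) triples, 1-based, brute force."""
    mums = []
    for i in range(len(s1)):
        for j in range(len(s2)):
            if s1[i] != s2[j]:
                continue
            # Skip pairs that could be extended to the left.
            if i > 0 and j > 0 and s1[i - 1] == s2[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(s1) and j + k < len(s2) and s1[i + k] == s2[j + k]:
                k += 1
            word = s1[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if (k >= min_len
                    and count_occurrences(s1, word) == 1
                    and count_occurrences(s2, word) == 1):
                mums.append((i + 1, j + 1, k))
    return mums


print(find_mums("ababaabb", "aabaabba"))  # [(3, 2, 6)]
```

The single result, a match of length 6 starting at S1[3] and S2[2], corresponds to the MUM S1[3..8] in S2[2] reported on the slide.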
![Page 79: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/79.jpg)
Bioinformatics PhD. Course
Third part:
Linear insertion algorithm
![Page 80: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/80.jpg)
Quadratic insertion algorithm
Given the string … and the suffix-tree …
Invariant Properties:
P1: the leaves of suffixes from … have been inserted
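A minimal sketch of why suffix-by-suffix insertion is quadratic: each suffix is inserted by walking down from the root, so the total work is proportional to the sum of the suffix lengths. For simplicity the sketch builds an uncompressed suffix trie of nested dictionaries rather than the compressed suffix tree of the slides; `build_suffix_trie` and `leaves` are hypothetical names, and `$` stands in for the terminator α.

```python
def build_suffix_trie(text):
    """Insert every suffix of text from the root; O(n^2) character steps."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            # Follow the existing child or create a new one.
            node = node.setdefault(ch, {})
        # Mark the end of suffix i+1 (1-based, as on the slides).
        node["leaf"] = i + 1
    return root


def leaves(node):
    """Collect the suffix numbers stored below node."""
    found = []
    for key, child in node.items():
        if key == "leaf":
            found.append(child)
        else:
            found.extend(leaves(child))
    return sorted(found)


trie = build_suffix_trie("ababaabb$")
print(leaves(trie))            # all nine suffixes are present
print(leaves(trie["a"]["b"]))  # suffixes that start with "ab"
```

After the loop has processed suffix i, the leaves of suffixes 1 to i are in the structure, which is the kind of invariant property P1 describes.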
![Page 81: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/81.jpg)
Linear insertion algorithm
Given the string … and the suffix-tree …
Invariant Properties:
P1: the leaves of suffixes from … have been inserted
P2: the string … is the longest string that can be spelt through the tree.
![Page 82: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/82.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
ba
baababb...,2
a ababb...,5
ba ababb...,3
baababb...,1
ababb...,4
a
![Page 83: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/83.jpg)
Linear insertion algorithm: example
Given the string ababaababb...
ba
baababb...,2
a ababb...,5
ba ababb...,3
baababb...,1
ababb...,4
6 7 8
![Page 84: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/84.jpg)
Linear insertion algorithm: example
ba
baababb...,2
a ababb...,5
ba ababb...,3
baababb...,1
ababb...,4
6 7 8
Given the string ababaababb...
![Page 85: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/85.jpg)
Linear insertion algorithm: example
ba
baababb...,2
a ababb...,5
ba ababb...,3
baababb...,1
ababb...,4
6 7 8 9
Given the string ababaababb...
![Page 86: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/86.jpg)
Linear insertion algorithm: example
a ababb...,5
ba ababb...,3
baababb...,1
ba
baababb...,2
ababb...,4
Given the string ababaababb...
6 7 8 9
baababb...,1b
b...,6
aababb...,1
![Page 87: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/87.jpg)
Linear insertion algorithm: example
a ababb...,5
ba ababb...,3
ba
baababb...,2
ababb...,4
Given the string ababaababb...
7 8 9
b
b...,6
aababb...,1
![Page 88: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/88.jpg)
![Page 89: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/89.jpg)
![Page 90: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/90.jpg)
Linear insertion algorithm: example
a ababb...,5
ba ababb...,3
ba
baababb...,2
ababb...,4
Given the string ababaababb...
7 8 9
b
b...,6
aababb...,1
baababb...,2
b
aababb...,2
![Page 91: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/91.jpg)
Linear insertion algorithm: example
a ababb...,5
ba ababb...,3
ba
baababb...,2
ababb...,4
Given the string ababaababb...
7 8…
b
b...,6
aababb...,1
baababb...,2
b
b...,7
aababb...,2
![Page 92: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/92.jpg)
Linear insertion algorithm: example
a ababb...,5
ba ababb...,3
ba ababb...,4
Given the string ababaababb...
8 9
b
b...,6
aababb...,1
b
b...,7
aababb...,2
![Page 93: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/93.jpg)
![Page 94: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/94.jpg)
![Page 95: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/95.jpg)
![Page 96: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/96.jpg)
Linear insertion algorithm: example
a ababb...,5
b
ba ababb...,4
Given the string ababaababb...
8 9
ababb...,3
b
b...,6
aababb...,1
b
b...,7
aababb...,2
a
![Page 97: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/97.jpg)
Linear insertion algorithm: example
a ababb...,5
b
ba ababb...,4
Given the string ababaababb...
8 9
ababb...,3
b
b...,6
aababb...,1
b
b...,7
aababb...,2
a
b...,8
![Page 98: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/98.jpg)
Linear insertion algorithm: example
a ababb...,5
b
ba ababb...,4
Given the string ababaababb...
9
ababb...,3
b
b...,6
aababb...,1
b
b...,7
aababb...,2
a
b...,8
![Page 99: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/99.jpg)
![Page 100: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/100.jpg)
Linear insertion algorithm: example
a ababb...,5
b
b ababb...,4
Given the string ababaababb... 9
ababb...,3
b
b...,6
aababb...,1
b
b...,7
aababb...,2
a
b...,8
a
![Page 101: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/101.jpg)
Linear insertion algorithm: example
a ababb...,5
b
b ababb...,4
Given the string ababaababb... 9
ababb...,3
b
b...,6
aababb...,1
b
b...,7
aababb...,2
a
b...,8
a
b...,9
![Page 102: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/102.jpg)
Linear insertion algorithm: example
a ababb...,5
b
b ababb...,4
Given the string ababaababb... 9
ababb...,3
b
b...,6
ababb...,1
b
b...,7
aababb...,2
a
b...,8
a
b...,9
![Page 103: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/103.jpg)
![Page 104: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/104.jpg)
Linear insertion algorithm: example
a ababb...,5
b
b ababb...,4
Given the string ababaababb...
9
ababb...,3
b
b...,6
ababb...,1
b
b...,7
aababb...,2
a
b...,8
a
b...,9
![Page 105: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/105.jpg)
Index
Suffix arrays
Suffix-arrays: a new method for on-line string searches, G. Myers, U. Manber
![Page 106: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/106.jpg)
Suffix arrays
Given string ababaa#:
Suffixes:
1: ababaa#
2: babaa#
3: abaa#
4: baa#
5: aa#
6: a#
7: #
… but lexicographically sorted, in an array with positions 1 … 7
What is the cost? O(n log(n))
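A minimal sketch of the construction for the example string. It sorts the suffix start positions directly, which is the simplest way to see the result but not the O(n log(n)) method: comparing whole suffixes can cost more in the worst case. The name `build_suffix_array` is hypothetical, and the order assumes that `#` sorts before the letters, as it does in ASCII.

```python
def build_suffix_array(text):
    """Return the 1-based start positions of the suffixes in sorted order."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


text = "ababaa#"
for rank, start in enumerate(build_suffix_array(text), start=1):
    # rank is the position in the suffix array, start the suffix number.
    print(rank, start, text[start - 1:])
```

Under the stated ordering assumption the array for ababaa# is 7, 6, 5, 3, 1, 4, 2, that is, the suffixes #, a#, aa#, abaa#, ababaa#, baa#, babaa#.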
![Page 107: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/107.jpg)
Applications of suffix arrays
1. Exact string matching
• Does the sequence ababaa# contain any occurrence of the patterns abab, aab, and ab?
Suffix array of ababaa# (positions 1 … 7 over the suffixes):
1: ababaa#
2: babaa#
3: abaa#
4: baa#
5: aa#
6: a#
7: #
Binary search … what is the cost? O(log(n) |P|)
Can it be improved to O(log(n)+|P|)?
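The question above can be answered with a plain binary search over the suffix array, sketched here under the assumption that `#` sorts before the letters. Each of the O(log(n)) steps compares the pattern against at most |P| characters of one suffix, which gives the O(log(n) |P|) cost. The names `build_suffix_array` and `contains` are hypothetical, and positions are 0-based inside the code.

```python
def build_suffix_array(text):
    """Naive construction: sort the 0-based suffix start positions."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text, sa, pattern):
    """Binary search for the first suffix whose |P|-prefix is >= pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare only the first len(pattern) characters of the suffix.
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # The pattern occurs iff that suffix starts with it.
    return lo < len(sa) and text.startswith(pattern, sa[lo])


text = "ababaa#"
sa = build_suffix_array(text)
for pattern in ("abab", "aab", "ab"):
    print(pattern, contains(text, sa, pattern))
```

For the example, abab and ab occur in ababaa# and aab does not.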
![Page 108: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/108.jpg)
Fast search with cost O(log(n)+|P|)
Query:
Suffix array: positions 1, 2, …, n, with α and β marked
Invariant Properties:
P1: α < query ≤ β
P2: … matches pref(query)
![Page 109: Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.](https://reader035.fdocuments.us/reader035/viewer/2022062906/5a4d1b197f8b9ab059992e26/html5/thumbnails/109.jpg)
Fast search with cost O(log(n)+|P|)
Query:
Suffix array: positions 1, 2, …, n, with α, γ and β marked
Invariant Properties:
P1: α < query ≤ β
P2: … matches pref(query)
Algorithm:
If suff(γ) < query then α = γ else β = γ
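The α/β/γ loop can be sketched as follows, as an illustration rather than the exact procedure of the slides. The sketch keeps P1 as stated and, as an assumed reading of P2, remembers how many characters of the query already match the suffixes at α and β (`l` and `r`), so each comparison at γ can skip the first min(l, r) characters. This skipping alone does not guarantee O(log(n)+|P|) in the worst case; the full bound of Manber and Myers additionally needs precomputed longest-common-prefix values between suffix array entries. The names `fast_contains`, `l` and `r` are hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort the 0-based suffix start positions."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def fast_contains(text, sa, query):
    n, m = len(sa), len(query)
    alpha, beta = -1, n   # sentinels: below every suffix, above every suffix
    l = r = 0             # matched prefix lengths at alpha and at beta
    while beta - alpha > 1:
        gamma = (alpha + beta) // 2
        start = sa[gamma]
        # Every suffix between alpha and beta shares min(l, r) characters
        # with the query, so the comparison can resume from there.
        k = min(l, r)
        while k < m and start + k < len(text) and text[start + k] == query[k]:
            k += 1
        if k == m:
            beta, r = gamma, k    # query is a prefix: query <= suff(gamma)
        elif start + k == len(text) or text[start + k] < query[k]:
            alpha, l = gamma, k   # suff(gamma) < query
        else:
            beta, r = gamma, k    # suff(gamma) > query
    # beta is the first suffix >= query; it matches iff all m characters agree.
    return beta < n and r == m


text = "ababaa#"
sa = build_suffix_array(text)
for pattern in ("abab", "aab", "ab"):
    print(pattern, fast_contains(text, sa, pattern))
```

On the example string the answers agree with a plain substring test: abab and ab are found, aab is not.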