Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.
![Page 1: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/1.jpg)
Suffix Trees and Derived Applications
Carl Bergenhem and Michael Smith
![Page 2: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/2.jpg)
SimpleScalar Suite
• Linux-based cache simulator: Allows for simulation of predefined cache environments
• Cross-compiles code for simulation: Through GCC on Linux, Fortran or C code can be compiled specifically for SimpleScalar, so that the code can be executed completely while statistics are kept
![Page 3: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/3.jpg)
Sim-cache
• General sim-cache: Code run through sim-cache uses the following parameters
– Number of sets in the structure
– Block size
– Associativity
– Replacement policy
• What this lets us do: Simulate how well a program will perform on different types of CPUs with regard to cache behavior.
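As a rough illustration of how the first three parameters interact, the following sketch computes the capacity of a set-associative cache and shows how an address maps to a set. The function names and the example values (256 sets, 32-byte blocks, 4-way) are assumptions chosen for illustration, not values from the slides, and the replacement policy is not modeled.

```python
def cache_capacity(num_sets, block_size, associativity):
    """Total data capacity in bytes of a set-associative cache."""
    return num_sets * block_size * associativity


def split_address(addr, num_sets, block_size):
    """Split a byte address into (tag, set index, block offset)."""
    offset = addr % block_size
    set_index = (addr // block_size) % num_sets
    tag = addr // (block_size * num_sets)
    return tag, set_index, offset


# Hypothetical configuration: 256 sets, 32-byte blocks, 4-way associative.
print(cache_capacity(256, 32, 4))        # 32768 bytes (32 KiB)
print(split_address(0x12345, 256, 32))   # (9, 26, 5)
```

Two addresses with the same set index compete for the same set; the associativity decides how many of them can be resident at once, and the replacement policy decides which one is evicted.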
![Page 4: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/4.jpg)
Idea of a Suffix Tree
•A suffix tree is a data structure that contains a path from the root to a leaf for each suffix of the input string.
•Ex: A seven-letter string will have seven leaves, one per suffix
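To make the "one leaf per suffix" idea concrete, this small sketch lists the suffixes of a seven-symbol string. The string "banana$" is an assumption for illustration: "banana" plus an end marker "$", a common convention that keeps any suffix from being a prefix of another.

```python
def suffixes(text):
    """Return every suffix of text, longest first."""
    return [text[i:] for i in range(len(text))]


for s in suffixes("banana$"):
    print(s)

print(len(suffixes("banana$")))  # 7, one suffix (and so one leaf) per symbol
```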
![Page 5: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/5.jpg)
Idea of a Suffix Tree
•The internal nodes of a tree are created when two or more suffixes begin with the same characters
•Ex: From “banana”, the suffixes “anana” and “ana” both start with “ana”, so they can share the same path from the root up to the point after “ana” where they diverge
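The shared path can be computed directly as the longest common prefix of the two suffixes. This is a minimal sketch; the helper name `common_prefix` and the "$" end marker are assumptions for illustration.

```python
def common_prefix(a, b):
    """Longest prefix shared by strings a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


first, second = "anana$", "ana$"
shared = common_prefix(first, second)
print(shared)                                     # ana
print(first[len(shared)], second[len(shared)])    # n $
```

The two suffixes share the path "ana" and then branch on different next symbols ("n" versus the end marker), which is where the internal node sits.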
![Page 6: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/6.jpg)
Building a Tree
•Starting from an empty root, and building the suffix tree for “banana”
•The first step...
![Page 7: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/7.jpg)
Building a Tree
![Page 8: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/8.jpg)
Building a Tree
![Page 9: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/9.jpg)
Building a Tree
![Page 10: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/10.jpg)
Building a Tree
![Page 11: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/11.jpg)
Building a Tree
![Page 12: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/12.jpg)
Building a Tree
![Page 13: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/13.jpg)
Building a Tree
![Page 14: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/14.jpg)
Recap
•As seen, creating the suffix tree is a simple process that takes a number of iterations equal to the length of the input string, with each iteration inserting one suffix
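The iteration-per-suffix construction can be sketched as follows. This is a simplified, uncompressed suffix trie built from nested dictionaries, not the compressed suffix tree drawn on the slides, and it is not a linear-time construction; the "$" end marker (which adds one extra iteration) and the function names are assumptions for illustration.

```python
def build_suffix_trie(text, end="$"):
    """Insert every suffix of text + end into a trie of nested dicts."""
    text += end
    root = {}
    for i in range(len(text)):           # one iteration per suffix
        node = root
        for ch in text[i:]:              # walk or extend the path
            node = node.setdefault(ch, {})
    return root


def count_leaves(node):
    """A node with no children is a leaf."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


trie = build_suffix_trie("banana")
print(sorted(trie))          # ['$', 'a', 'b', 'n']
print(count_leaves(trie))    # 7
```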
![Page 15: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/15.jpg)
Use
•Fast string comparisons: Once the tree has been built for one string, a second string can be matched against it in a number of character comparisons at most equal to the length of the second string.
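A minimal sketch of such a lookup, assuming the comparison in question is a substring test of a second string against the indexed one. It uses an uncompressed suffix trie as a stand-in for the suffix tree, and the function names are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dicts."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return (found, steps); steps never exceeds len(pattern)."""
    node = trie
    steps = 0
    for ch in pattern:
        steps += 1
        if ch not in node:
            return False, steps
        node = node[ch]
    return True, steps


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))   # (True, 3)
print(contains(trie, "nab"))   # (False, 3)
print(contains(trie, "xyz"))   # (False, 1)
```

The cost of a lookup depends only on the length of the pattern, not on the length of the indexed string.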
![Page 16: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/16.jpg)
Example
![Page 17: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/17.jpg)
REPuter
•The REPuter algorithm is a genome-analysis algorithm that uses the suffix tree to efficiently find maximal repeats
![Page 18: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/18.jpg)
Maximal Repeats
•A maximal repeat is a substring that occurs at least twice within a string and whose length is at least a set threshold. In the strict sense, it additionally cannot be extended to the left or right without losing an occurrence.
![Page 19: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/19.jpg)
Example
•With a threshold value of 2, the word “banana” has the following repeats
•“ana” appears twice (and cannot be extended in either direction, so it is maximal in the strict sense)
•“an” appears twice
•“na” appears twice
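The counts above can be checked by brute force. This sketch simply counts every substring of at least the threshold length, with overlapping occurrences included; it does not use a suffix tree, and the function name is an assumption for illustration.

```python
from collections import Counter


def repeats(text, threshold):
    """Substrings of length >= threshold that occur at least twice."""
    counts = Counter(
        text[i:j]
        for i in range(len(text))
        for j in range(i + threshold, len(text) + 1)
    )
    return {s: c for s, c in counts.items() if c >= 2}


print(repeats("banana", 2))   # {'an': 2, 'ana': 2, 'na': 2}
```

Note that the two occurrences of "ana" overlap (positions 1 and 3, counting from 0).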
![Page 20: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/20.jpg)
Use
•Scientists use the REPuter algorithm to find repeated substrings of at least a certain length within a genome sequence.
•A useful extension of this algorithm is to find similar, rather than identical, substrings, which can account for mutations in the DNA.
![Page 21: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/21.jpg)
How It Works
•The REPuter algorithm uses the suffix tree structure by traversing the entire tree. Whenever it is on a node that represents a string at least as long as the threshold, that string is a valid repeat so long as the node has two or more child nodes, because each child continues the string with a different character and so marks a separate occurrence.
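The traversal can be sketched as below. This is an illustration of the node test described here, not the actual REPuter implementation: it walks an uncompressed suffix trie (whose branching nodes correspond to the internal nodes of the suffix tree), and the "$" end marker and function names are assumptions.

```python
def build_suffix_trie(text, end="$"):
    """Insert every suffix of text + end into a trie of nested dicts."""
    text += end
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def branching_repeats(text, threshold):
    """Strings of length >= threshold whose node has 2+ children."""
    found = []
    stack = [(build_suffix_trie(text), "")]
    while stack:                          # traverse the entire trie
        node, prefix = stack.pop()
        if len(prefix) >= threshold and len(node) >= 2:
            found.append(prefix)
        for ch, child in node.items():
            stack.append((child, prefix + ch))
    return sorted(found)


print(branching_repeats("banana", 2))   # ['ana', 'na']
```

The repeat "an" is not reported because every "an" is followed by "a", so its node has a single child. "na" is reported even though both occurrences are preceded by "a"; a stricter check would also compare the characters to the left of each occurrence.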
![Page 22: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/22.jpg)
Example
![Page 23: Suffix Trees and Derived Applications Carl Bergenhem and Michael Smith.](https://reader035.fdocuments.us/reader035/viewer/2022062320/56649d135503460f949e6e5b/html5/thumbnails/23.jpg)
PSP Algorithm
• Probe Selection Problem (PSP) Algorithm
– Relies upon the suffix tree to function.
– Takes as input a set S of genomic sequences.
– In order to find an oligonucleotide (probe) for each sequence, a suffix tree of all the sequences is used.
– Allows the probe to be identified in such a way that hybridization can occur for a specific sequence and that sequence only
– Also yields the temperature at which the hybridization can occur
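The core idea of picking a probe that occurs in one sequence only can be sketched with plain substring tests. This is a heavily simplified stand-in, not the PSP algorithm itself: it replaces the suffix tree with brute-force search, ignores complementary strands, and estimates the temperature with the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is an assumption not taken from the slides. The sequences and names are made up for illustration.

```python
def unique_probes(sequences, length):
    """For each sequence, the first substring of the given length
    that occurs in no other sequence, or None if there is none."""
    probes = {}
    for name, seq in sequences.items():
        others = [s for n, s in sequences.items() if n != name]
        probes[name] = next(
            (seq[i:i + length]
             for i in range(len(seq) - length + 1)
             if not any(seq[i:i + length] in o for o in others)),
            None,
        )
    return probes


def wallace_tm(probe):
    """Rule-of-thumb melting temperature in degrees Celsius."""
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


# Hypothetical toy sequences; real probes are much longer.
seqs = {"s1": "ACGTACGGA", "s2": "TTACGTACC", "s3": "GGACCTTAC"}
for name, probe in unique_probes(seqs, 4).items():
    print(name, probe, wallace_tm(probe))
# s1 ACGG 14
# s2 TACC 12
# s3 GGAC 14
```

A suffix tree built over all sequences answers the same "does this substring occur elsewhere?" question without rescanning every sequence for every candidate.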