Alignment-free Sequence Analysis Methods
Hector Espitia (hspitia@gatech.edu)
PhD Student | Bioinformatics | Jordan Lab
Computational Genomics 2018 – Georgia Institute of Technology, Atlanta, 8th February, 2018
Outline
• Background
  • Sequence Similarity
  • Sequence Alignment (generalities, drawbacks)
• Alignment-free Methods
  • Classification
  • NGS Data Analysis
• STing: an Alignment-free Application
  • Sequence Typing
  • Multilocus Sequence Typing (MLST)
  • Performance (Typing, Gene Detection)
• Conclusions
Alignment-free Sequence Analysis | CompGenomics 2018 | Georgia Tech | 02/08/2018
Background
Sequence Similarity
• Knowledge derived from sequence similarity.
• Similar sequences tend to share features.
• Similarity: functional, structural and evolutionary inferences.
Sequence Alignment
• Sequence alignment is a very useful tool: it provides a similarity measure.
• 1980s to 2000s: FASTA, BLAST, ClustalW, PSI-BLAST, HMMER/Pfam, MAFFT, MUSCLE, BLASTZ, Mauve, TBA.
Alignment-based Analysis Drawbacks
• Assumption of linearity and conservation in stretches of homologous sequences.
• Poor accuracy of alignment when sequence identity is below a critical point.
• Depends on multiple evolutionary assumptions about the sequences.
• Computationally expensive (RAM and processing time).
• Not ideal for the NGS era (not scalable).
The NGS era requires rapid and accurate analysis at a large scale (complete genomes, billions of sequences).
Alignment-free Methods
Alignment-free Sequence Analysis
“Any method that quantifies sequence similarity without producing/using alignment at any step of the algorithm application”
Zielezinski et al., 2017
Advantages:
• Less computationally expensive.
• Resistant to shuffling and recombination events.
• Free of evolutionary assumptions.
Classification of Alignment-free Methods
• Word frequency-based.
• Information-theory based.
• Other alignment-free methods:
  • Chaos game representation
  • Iterated maps
  • Graphical representation of DNA
Word Frequency-based Methods
Depend on the number of words (k-mers) shared between sequences.
[Figure: 4-mer example]
Three steps:
• k-mer extraction and grouping.
• Frequency quantification.
• Dissimilarity quantification.
Zielezinski et al., 2017
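The three steps of a word frequency-based comparison can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the function names are hypothetical, and Euclidean distance is just one of many possible dissimilarity measures.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Step 1: extract every overlapping k-mer and group identical ones."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seq: str, k: int) -> dict:
    """Step 2: turn raw counts into relative frequencies."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_dissimilarity(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Step 3: Euclidean distance between the two frequency vectors."""
    fa, fb = kmer_frequencies(seq_a, k), kmer_frequencies(seq_b, k)
    words = set(fa) | set(fb)  # k-mers seen in either sequence
    return sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))
```

Identical sequences give a dissimilarity of 0; sequences that share no k-mers give the largest values. No alignment is computed at any step.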
Information-theory Based Methods
Depend on the amount of shared information (complexity/entropy).
Two steps:
• Complexity calculation.
• Dissimilarity quantification.
Zielezinski et al., 2017
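One way to illustrate the two steps is the normalized compression distance (NCD), which uses compressed size as a stand-in for complexity. The slides do not name a specific measure, so the choice of NCD and of zlib as the compressor is an assumption made for this sketch.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Step 1 (complexity): size in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))


def ncd(seq_a: str, seq_b: str) -> float:
    """Step 2 (dissimilarity): normalized compression distance."""
    a, b = seq_a.encode("ascii"), seq_b.encode("ascii")
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)  # complexity of the concatenation
    return (cab - min(ca, cb)) / max(ca, cb)
```

If two sequences share a lot of information, compressing them together costs little more than compressing one alone, so the value is close to 0. Unrelated sequences give values close to 1.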
Alignment-free Methods in NGS Data Analysis
• Transcript identification (Kallisto, Sailfish, Salmon).
• Genomic variability profiling (FastGT, LAVA).
• Assembly: error correction (Quorum, Lighter, Trowel), overlapping (MHAP algorithm, Miniasm), and scaffolding (LINKS).
• Metagenomics: species identification/taxonomic profiling (Kraken, CLARK, MASH, stringMLST, STing, Taxonomer).
• Phylogenetics (AAF, NGS-MC, kSNP).
Alignment-free for Research Purposes
Sequence similarity
• CAFE (desktop, GUI)
  • 28 distance measures.
  • Dissimilarity matrices.
  • Dendrograms, heatmaps, PCA and networks.
• Alfree (Web)
  • 38 distance measures.
  • Fully automated analysis.
  • Consensus phylogenetic tree.
STing
STing (Sequence Typing)
• A lightweight, alignment- and assembly-free application for the NGS era that belongs to the group of word frequency-based methods.
• Two functionalities for NGS sample analysis:
  • Sequence Typing: prediction of the Sequence Type (ST).
  • Gene Detection: prediction of the presence/absence of a gene of interest.
Sequence Typing
• Identifying organisms within a species.
• Human pathogens of one species can comprise a very diverse set of organisms.
• A typing technique must have good discriminatory power.
Multilocus Sequence Typing (MLST)
• Pre-NGS era
• Gene-based approach (7 housekeeping genes)
• Extensive information available (PubMLST, MLST.net)
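In MLST, each distinct sequence at a locus gets an allele number, and the combination of allele numbers across the seven loci (the allelic profile) identifies the sequence type. A minimal sketch of that lookup follows. The locus names are those of the Neisseria MLST scheme; the profile table and the ST labels are made up for illustration and are not taken from PubMLST.

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the Neisseria MLST scheme.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Hypothetical profile table: allele numbers per locus -> sequence type label.
# These values are invented for this example, not real PubMLST entries.
PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 1, 2, 1, 3, 2, 3): "ST-example-1",
    (1, 1, 2, 1, 3, 2, 4): "ST-example-2",
}


def sequence_type(alleles: Dict[str, int]) -> Optional[str]:
    """Return the ST label for a full allelic profile, or None if unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(profile)
```

A single allele change at one locus (here pgm 3 versus 4) is enough to yield a different sequence type, which is where the discriminatory power of the scheme comes from.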
MLST: Computational Methods with NGS Data
Alignment- and assembly-free with minimum expertise and time required.
STing
• Addresses the shortcomings of its predecessor (stringMLST): speed and RAM consumption on larger typing schemes (rMLST, cgMLST).
• Uses enhanced suffix arrays as its core data structure:
  • Quick determination of the membership of an input string.
  • Search time depends on the query length, not on the database size.
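The membership query can be illustrated with a plain suffix array. This is a simplified sketch, not STing's implementation: the function names are hypothetical, construction is naive, and the binary search below needs O(m log n) character comparisons for a query of length m over a text of length n. Enhanced suffix arrays add auxiliary tables (such as the LCP and child tables) so that the search depends on m only.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction by sorting; fine for a demo, too slow for real databases.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text: str, suffix_array: List[int], query: str) -> bool:
    """Binary-search the suffix array for a suffix that starts with `query`."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        prefix = text[start:start + len(query)]
        if prefix < query:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(suffix_array):
        return False
    start = suffix_array[lo]
    return text[start:start + len(query)] == query
```

For example, with `text = "ACGTACGGT"` and `sa = build_suffix_array(text)`, `contains(text, sa, "TACG")` is true and `contains(text, sa, "GGA")` is false. The index is built once per database and reused for every query.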
STing – Structure
Algorithm Overview
Typer
Detector
STing: Sequence Typing
STing – Typing Dataset
| Species | Scheme | # Loci | DB size (sequences) | # Samples |
|---|---|---|---|---|
| C. jejuni | MLST | 7 | 4,117 | 10 |
| C. trachomatis | MLST | 7 | 218 | 10 |
| S. pneumoniae | MLST | 7 | 3,319 | 10 |
| N. meningitidis | MLST | 7 | 5,325 | 1,009 |
| N. meningitidis | rMLST | 53 | 461,054 | 20 |
| N. meningitidis | cgMLST | 1,605 | 639,542 | 20 |

rMLST: Ribosomal MLST; cgMLST: Core Genome MLST
STing – Typing Performance
STing: Gene Detection
STing – Gene Detection Dataset
• We evaluated whether we can detect AMR genes (n=16) from the sequence reads of 12 genomes of nine species (positive samples).
• We artificially excised the AMR genes from each of the genomes to generate negative samples.
• We simulated reads at 20x and 40x coverage from both positive and negative samples.
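Coverage relates the number of reads N, the read length L and the genome length G as C = N·L/G, so the number of reads to simulate for a target coverage follows directly. The sketch below only shows that arithmetic. The genome size and read length in the usage note are hypothetical, since the slides do not state them or name the read simulator.

```python
from math import ceil


def reads_for_coverage(genome_length: int, read_length: int, coverage: float) -> int:
    """Smallest N such that N * read_length / genome_length >= coverage."""
    return ceil(coverage * genome_length / read_length)
```

For a hypothetical 2.2 Mbp genome and 100 bp reads, `reads_for_coverage(2_200_000, 100, 20)` gives 440,000 reads and `reads_for_coverage(2_200_000, 100, 40)` gives 880,000.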
STing – Gene Detection Performance
100% accuracy
STing – Other Applications
• Virulence factor (VF) gene detection (e.g. Shiga toxin and hemolysin loci).
• Antimicrobial resistance (AMR) gene detection in fungal isolates.
• Gene detection in metagenome samples.
Sung Im, PhD Student (Binf)
Conclusions
• Faster analysis alternatives are needed to meet the challenges of the NGS era.
• Although alignment-based analyses are slow and not scalable, they are irreplaceable (e.g. for annotation, ancestral DNA reconstruction, and sequence evolution rate calculations).
• We applied the alignment-free paradigm to sequence typing and gene detection, accurately and efficiently.
• The STing algorithm scales efficiently to genome-scale typing schemes (cgMLST).
• STing performs orders of magnitude better than existing tools.
• Possible applications of STing include culture-free diagnostics as well as virulence factor and antimicrobial resistance profiling directly from NGS reads.
Lavanya Rishishwar
King Jordan
Aroon Chande
Heather Smith
Hector Espitia
STing Team!
Jordan Lab @ Georgia Tech
References
1. Chowdhury, B., & Garai, G. (2017). A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics, 109(5–6), 419–431. https://doi.org/10.1016/j.ygeno.2017.06.007
2. Zielezinski, A., Vinga, S., Almeida, J., & Karlowski, W. M. (2017). Alignment-free sequence comparison: benefits, applications, and tools. Genome Biology, 18(1), 186. https://doi.org/10.1186/s13059-017-1319-7
3. Gupta, A., Jordan, I. K., & Rishishwar, L. (2017). stringMLST: a fast k-mer based tool for multilocus sequence typing. Bioinformatics, 33(1), 119–121. https://doi.org/10.1093/bioinformatics/btw586
4. Song, K., Ren, J., Reinert, G., Deng, M., Waterman, M. S., & Sun, F. (2014). New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. Briefings in Bioinformatics, 15(3), 343–353. https://doi.org/10.1093/bib/bbt067