TreeScaper: Software to Visualize and Extract Phylogenetic Signals from Sets of Trees
Guifang Zhou 1, Wen Huang 3, Melissa Marchand 4, Jeremy Ash 2, David Morris 1, Paul Van Dooren 3, Jim C. Wilgenbusch 5, Jeremy M. Brown 1, Kyle A. Gallivan 4
1 Department of Biological Sciences, Louisiana State University
2 Bioinformatics Research Center, North Carolina State University
3 ICTEAM Institute, Université catholique de Louvain
4 Department of Mathematics, Florida State University
5 Minnesota Supercomputing Institute, University of Minnesota
June 21, 2016
Motivations
Phylogenetic analyses often produce large sets of competing trees
Summarize interesting evolutionary history: hybridization, recombination, horizontal gene transfer, incomplete lineage sorting
Identify systematic error
Shortcomings of Current Approaches
Consensus tree: discards information concerning competing trees
Dimensionality reduction: may be difficult to interpret
Clustering: based on pairwise tree-to-tree distance; only considers nonnegative links
Our Approaches
Apply graph-based methods to understand relationships among:
Tree topologies
Bipartitions within tree topologies
TreeScaper (Version 1)
NLDR
Optimization algorithms: linear iteration, majorization, Gauss-Newton, stochastic gradient descent, MCMC simulated annealing
Cost functions: Kruskal-1 stress, normalized stress, Sammon stress, curvilinear components analysis
Dimension estimators: nearest neighbor estimator, correlation dimension, maximum likelihood estimator
Visualization
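To make the cost functions listed above concrete, here is a minimal Python sketch of three of them in their common textbook forms. It assumes a symmetric matrix `delta` of original tree-to-tree distances and a list `points` of low-dimensional coordinates; the function names are hypothetical, and the exact normalizations used inside TreeScaper may differ.

```python
import math


def embedded_distances(points):
    """Pairwise Euclidean distances between the low-dimensional points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def _upper_pairs(n):
    """Index pairs (i, j) with i < j, so each pair is counted once."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def kruskal1_stress(delta, points):
    """sqrt( sum (delta - d)^2 / sum d^2 ), with d the embedded distances."""
    d = embedded_distances(points)
    pairs = _upper_pairs(len(points))
    num = sum((delta[i][j] - d[i][j]) ** 2 for i, j in pairs)
    den = sum(d[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)


def normalized_stress(delta, points):
    """sum (delta - d)^2 / sum delta^2."""
    d = embedded_distances(points)
    pairs = _upper_pairs(len(points))
    num = sum((delta[i][j] - d[i][j]) ** 2 for i, j in pairs)
    den = sum(delta[i][j] ** 2 for i, j in pairs)
    return num / den


def sammon_stress(delta, points):
    """(1 / sum delta) * sum (delta - d)^2 / delta.

    Assumes every off-diagonal delta is positive; identical trees
    (distance 0) would need special handling that is not shown here.
    """
    d = embedded_distances(points)
    pairs = _upper_pairs(len(points))
    scale = sum(delta[i][j] for i, j in pairs)
    return sum((delta[i][j] - d[i][j]) ** 2 / delta[i][j] for i, j in pairs) / scale
```

Sammon stress divides each squared error by the original distance, so it penalizes distortion of small distances more heavily than the other two measures do.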
TreeScaper (Version 2)
NLDR
Dimensionality estimation
New input data types
Distance/affinity matrix: Robinson-Foulds (unweighted/weighted), Matching, Subtree Prune and Regraft
Covariance matrix
Community detection methods: Configuration Null Model, Constant Potts Model, Erdős-Rényi Null Model, No Null Model
Interactive visualization interface
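The null models named above differ in what they subtract from each observed link weight when scoring a candidate partition. The sketch below scores a given community assignment under each of the four options. It is an illustration under stated assumptions, not TreeScaper's implementation: the function name, the `gamma` resolution parameter, and the convention of summing over all ordered pairs (diagonal included) are choices of this sketch.

```python
def community_quality(A, labels, null_model="configuration", gamma=1.0):
    """Sum of (A[i][j] - gamma * expected[i][j]) over same-community pairs.

    A      : symmetric weight matrix (list of lists)
    labels : community label for each node
    The sum runs over all ordered pairs, diagonal included (a convention
    chosen for this sketch). The result is not normalized.
    """
    n = len(A)
    strength = [sum(row) for row in A]   # weighted degree of each node
    total = sum(strength)                # equals 2m for a symmetric A
    quality = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] != labels[j]:
                continue
            if null_model == "configuration":
                # Expected weight proportional to the product of strengths.
                expected = strength[i] * strength[j] / total
            elif null_model == "erdos_renyi":
                # Same expected weight for every pair: the mean entry of A.
                expected = total / (n * n)
            elif null_model == "constant_potts":
                # Constant penalty; gamma alone sets the density threshold.
                expected = 1.0
            elif null_model == "none":
                expected = 0.0
            else:
                raise ValueError("unknown null model: " + null_model)
            quality += A[i][j] - gamma * expected
    return quality
```

Dividing the configuration-model score by the total weight gives the familiar Newman modularity. With no null model, putting all nodes in a single community always maximizes the score when weights are nonnegative, which is why a null term or a constant penalty is normally needed.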
Application
Yeast dataset with 5 species, 106 loci
106 gene trees were reconstructed using maximum parsimony
Topology-based Network Analysis
Affinity matrix: reciprocal of pairwise distances
Detect communities: discovered 11 communities
Consensus trees for each community
Top 2 communities recover the top 2 candidate species trees
Figure labels (fractions of the 106 gene trees): 62/106, 17/106, 11/106, 4/106, 3/106, 2/106, 2/106, 2/106, ...
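As an illustration of the first step, the following sketch computes unweighted Robinson-Foulds distances between tree topologies and converts them to affinities by taking reciprocals. Each tree is represented as a set of bipartitions, and each bipartition is stored as the side that does not contain a reference taxon. The helper names, the toy 5-taxon trees, and the `eps` offset (added so that identical topologies at distance 0 do not cause division by zero) are assumptions of this sketch rather than details taken from the slides.

```python
def canonical_split(side, taxa, reference):
    """Store a bipartition as the side that does not contain `reference`."""
    side = frozenset(side)
    return frozenset(taxa) - side if reference in side else side


def rf_distance(splits_a, splits_b):
    """Unweighted Robinson-Foulds distance: bipartitions found in only one tree."""
    return len(set(splits_a) ^ set(splits_b))


def distance_matrix(trees):
    """All pairwise RF distances between trees given as bipartition sets."""
    n = len(trees)
    return [[rf_distance(trees[i], trees[j]) for j in range(n)] for i in range(n)]


def reciprocal_affinity(dist, eps=1.0):
    """Affinity a_ij = 1 / (eps + d_ij); eps is a hypothetical offset.

    The diagonal is set to 0 so that the network has no self-links
    (a choice made for this sketch).
    """
    n = len(dist)
    return [[0.0 if i == j else 1.0 / (eps + dist[i][j]) for j in range(n)]
            for i in range(n)]


# Toy example: three unrooted topologies on taxa A-E, each described by
# one side of its two nontrivial bipartitions.
TAXA = "ABCDE"


def tree(*sides):
    return {canonical_split(s, TAXA, "A") for s in sides}


t1 = tree("AB", "DE")   # ((A,B),C,(D,E))
t2 = tree("AC", "DE")   # ((A,C),B,(D,E))
t3 = tree("AB", "CE")   # ((A,B),D,(C,E))

D = distance_matrix([t1, t2, t3])
W = reciprocal_affinity(D)
```

Trees that share more bipartitions end up with larger affinities, so community detection on `W` groups similar topologies together.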
Bipartition-based Network Analysis
Covariance matrix based on presence or absence of bipartitions in the gene trees
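A small sketch of how such a covariance matrix could be built: each gene tree becomes a row of 0/1 indicators, one per bipartition, and the covariance is taken between columns. The function names and the division by the number of trees (rather than by that number minus one) are assumptions of this sketch; the slides do not specify the normalization.

```python
def presence_matrix(trees, bipartitions):
    """Rows are trees, columns are bipartitions; 1 if the tree contains it."""
    return [[1 if b in t else 0 for b in bipartitions] for t in trees]


def covariance_matrix(X):
    """Covariance between columns of a 0/1 matrix (divides by the row count)."""
    n = len(X)
    m = len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(m)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / n
             for b in range(m)]
            for a in range(m)]


# Toy example: bipartitions are labelled by one side, e.g. "DE" for DE|ABC.
bipartitions = ["CDE", "DE", "BDE", "CE"]
trees = [{"CDE", "DE"}, {"BDE", "DE"}, {"CDE", "CE"}]

X = presence_matrix(trees, bipartitions)
C = covariance_matrix(X)
```

In this toy input the conflicting bipartitions "DE" and "CE" never occur in the same tree, so their covariance is negative. Links of this kind carry the negative weights that distance-based clustering cannot represent.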
TreeScaper Software
Available on GitHub: https://github.com/whuang08/TreeScaper
Acknowledgements
Computing support from FSU's Research Computing Center and HPC@LSU
The National Science Foundation for funding to support some of this work (ABI-1262476)