Open Tree of Life Phyloseminar 2014
Transcript of Open Tree of Life Phyloseminar 2014
TECHNICAL AND SOCIAL CHALLENGES IN SYNTHESIZING THE TREE OF LIFE
Karen Cranston, National Evolutionary Synthesis Center
@kcranstn http://slideshare.net/kcranstn
IF WE “HAD” A TREE OF LIFE?
complete = contains all of biodiversity
dynamic = continuously updated with new data
available digitally = browsing, querying, downloading
Produce a digitally-available phylogeny that contains all of biodiversity
Provide tools for managing, analyzing and sharing phylogenetic data
http://avatol.org
CHALLENGE: COMPLETENESS
Even if there were phylogenies for all species in GenBank, we would have only a small fraction of biodiversity
Soltis et al. APG III phylogeny (30 taxa)
NCBI taxonomy data (578 taxa)
from Stephen Smith
Dipsacales graph
Synthesized tree: contains the structure of the phylogeny, but with all 578 taxa
from Stephen Smith
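One way to picture the synthesized Dipsacales tree is as a phylogeny whose tips are higher taxa, each expanded into its taxonomic contents. The Python sketch below assumes that simplified setup: trees are nested tuples, and all names (`expand_tips`, `tips`, `GenusA`, ...) are hypothetical, not Stephen Smith's code. Taxa that the taxonomy places outside every tip of the phylogeny are not handled here.

```python
from typing import Union

# Hypothetical representation: a tree is a tip label (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def expand_tips(tree: Tree, taxonomy: dict) -> Tree:
    """Replace each tip that names a higher taxon with that taxon's contents
    from the taxonomy, as unresolved polytomies (expanded recursively)."""
    if isinstance(tree, tuple):
        return tuple(expand_tips(child, taxonomy) for child in tree)
    children = taxonomy.get(tree)
    if not children:
        return tree  # no taxonomic children: keep the tip as-is
    expanded = tuple(expand_tips(child, taxonomy) for child in children)
    # A taxon with a single child is collapsed rather than kept as a unary node.
    return expanded[0] if len(expanded) == 1 else expanded


def tips(tree: Tree) -> list:
    """All tip labels, left to right."""
    if isinstance(tree, tuple):
        return [t for child in tree for t in tips(child)]
    return [tree]


# Toy inputs (hypothetical names): a genus-level phylogeny and a taxonomy.
phylogeny = (("GenusA", "GenusB"), "GenusC")
taxonomy = {
    "GenusA": ["A_sp1", "A_sp2"],
    "GenusB": ["B_sp1"],
    "GenusC": ["C_sp1", "C_sp2", "C_sp3"],
}
synthesized = expand_tips(phylogeny, taxonomy)
print(synthesized)
# ((('A_sp1', 'A_sp2'), 'B_sp1'), ('C_sp1', 'C_sp2', 'C_sp3'))
```

Species known only from the taxonomy enter as unresolved polytomies, so the result keeps the phylogeny's branching structure while containing every taxon listed under its tips.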
Inputs: published phylogenies, taxonomies
• filter / weight input trees
• synthesize into a single data structure
• process feedback
• input new data sets
Result: complete tree of life
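The steps above do not say which synthesis algorithm is used. As one illustration, a simple rank-based greedy approach weights the input trees, then accepts each clade (cluster of tips) in rank order only if it is compatible with everything already accepted. This is a sketch under stated assumptions, not necessarily Open Tree's method: all trees are assumed to share one taxon set (real inputs do not, which requires restricting clusters to overlapping taxa), and the function names are hypothetical.

```python
def compatible(a: frozenset, b: frozenset) -> bool:
    """Two clusters can coexist in one tree if they are disjoint or nested."""
    return not (a & b) or a <= b or b <= a


def greedy_synthesis(ranked_trees):
    """ranked_trees: list of (weight, clusters) pairs, where clusters is a list
    of frozensets of tip names. Trees with higher weight are considered first,
    so they win conflicts. Returns (accepted, rejected) cluster lists."""
    accepted, rejected = [], []
    for _, clusters in sorted(ranked_trees, key=lambda pair: pair[0], reverse=True):
        for c in clusters:
            if c in accepted:
                continue  # already supported by a higher-ranked tree
            if all(compatible(c, a) for a in accepted):
                accepted.append(c)
            else:
                rejected.append(c)
    return accepted, rejected


# Hypothetical input: two trees over tips A..E, given as their clusters.
inputs = [
    (1, [frozenset({"B", "C"}), frozenset({"D", "E"})]),  # lower weight
    (2, [frozenset({"A", "B"}), frozenset({"A", "B", "C"})]),  # higher weight
]
accepted, rejected = greedy_synthesis(inputs)
```

Here {B, C} is rejected because it overlaps the higher-ranked {A, B} without nesting, while {D, E} is added. The order of processing is the design choice that matters: the weights decide which tree wins each conflict.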
CHALLENGE: ACCESS TO PUBLISHED PHYLOGENIES
* OpenTree grant proposal
“Phylogeny provides a mechanism through which to interpret the patterns and processes of evolution and to
predict the responses of life to rapid environmental change. Phylogenies and phylogenetic methods are now being used
to enhance agriculture, identify and combat diseases, conserve biodiversity, and predict responses to global
climate change and to biological invasions.” *
(tl;dr: We need trees to do cool and important science)
Expertise in phylogenetic inference
Expertise in methods that use phylogenies
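library(phytools)                  # also attaches ape, which provides read.tree()
flyTree <- read.tree("flies.tre")
contMap(flyTree, flyData)          # flyData: named numeric vector, one trait value per tip label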
Furthermore, a paraphyletic relationship of phorids and syrphids would support the hypothesis that their shared special mode of extraembryonic development (dorsal amnion closure) (26) evolved in the stem lineage of Cyclorrhapha and preceded the origin of the schizophoran amnioserosa.
To test this hypothesis, we used a relatively recent phylogenomic marker: small, noncoding, regulatory micro-RNAs (miRNAs). miRNAs exhibit a striking phylogenetic pattern of conservation across the metazoan tree of life, suggesting the accumulation and maintenance of miRNA families throughout organismal evolution
Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number of origins of each trait was estimated with reference to the phylogeny, the distribution of each trait among genera within a family, and the known biology of the organisms.
Wiegmann et al. PNAS Early Edition | 3 of 6
TREE Fig._S1 = [&R] (2,1,((3,7),(4,(6,(33,(15,((20,(47,((51,(49,50)),(46,(48,(52,16)))))),(((44,45),((18,(12,(13,(43,42)))),((41,((39,38),(40,17))),((35,9),(34,(36,37)))))),(32,(((21,19),
((30,14),(22,((11,31),((27,25),(23,((28,(24,8)),(10,(26,(5,29)))))))))),((((72,(63,57)),((65,64),((66,67),(68,(69,(70,(71,54))))))),(((82,59),(60,(61,(62,55)))),((80,(81,56)),((53,(77,78)),((75,73),(76,(58,74))))))),((88,((86,87),((85,84),
(83,89)))),(79,((91,(93,(95,(92,(96,(94,90)))))),((100,(99,98)),(97,(((168,((172,185),((159,101),(109,157)))),(((181,(179,180)),((102,(183,187)),(175,(176,(178,177))))),(212,((195,(210,211)),(199,((201,(196,202)),((194,197),((203,(192,205)),(204,(193,
((209,(208,206)),(198,(200,207))))))))))))),(113,(((154,((169,170),(103,191))),((131,126),(128,((134,135),(129,(125,
((132,130),(104,133)))))))),((((190,166),((162,171),((116,120),(115,114)))),((122,(188,(186,108))),((118,(119,105)),(117,(158,
(184,189)))))),((123,124),(((148,((165,161),(174,182))),((106,121),(163,(167,127)))),((173,(156,(155,160))),(164,
(((136,137),(139,(138,107))),((153,145),(112,(((146,143),(144,(140,141))),((142,152),(147,((110,111),(149,(150,151)))))))))))))))))))))))))))))))));
Wiegmann et al. PNAS, 2011
Only ~4% of all published phylogenetic trees are available digitally
Stoltzfus et al. 2012
Archiving sequence data is a community norm
Archiving phylogenetic data is quite rare
OPENTREE PHYLOGENY INPUTS
Surveyed >7000 phylogenetic studies of plants, fungi, animals and unicellular organisms
Result: data for >2700 studies, >4800 trees
CHALLENGE: SELECTING BACKBONE TAXONOMY
Complete? Up to date with taxonomic literature? Phylogenetically-informed?
Systematics research → online taxonomic resources: very slow…
OPEN TREE TAXONOMY
Multiple source taxonomies
+ patch files for manual edits (requires source info!)
• 3,133,028 nodes and 2,559,835 ‘species’
• https://github.com/OpenTreeOfLife/reference-taxonomy
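The slide stresses that manual patches must record their source. A minimal sketch of that rule, assuming a hypothetical patch format (the `Patch` fields and the "move"/"rename" operations are illustrative, not the format used in the reference-taxonomy repository): the taxonomy is a child-to-parent map, and any patch without source information is refused.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """One manual edit in a hypothetical patch format."""
    op: str      # "move" (value = new parent) or "rename" (value = new name)
    taxon: str
    value: str
    source: str  # citation or curator note; patches without one are refused


def apply_patches(parent_of: dict, patches: list) -> dict:
    """Apply patches to a child -> parent taxonomy map and return a new map."""
    taxonomy = dict(parent_of)  # do not modify the caller's taxonomy
    for p in patches:
        if not p.source.strip():
            raise ValueError(f"patch on {p.taxon!r} has no source information")
        if p.taxon not in taxonomy:
            raise KeyError(p.taxon)
        if p.op == "move":
            taxonomy[p.taxon] = p.value
        elif p.op == "rename":
            taxonomy[p.value] = taxonomy.pop(p.taxon)
            for child, parent in taxonomy.items():  # re-point the children
                if parent == p.taxon:
                    taxonomy[child] = p.value
        else:
            raise ValueError(f"unknown patch operation {p.op!r}")
    return taxonomy


base = {"GenusA": "FamilyX", "A_sp1": "GenusA"}  # hypothetical taxa
patched = apply_patches(base, [
    Patch("move", "GenusA", "FamilyY", "hypothetical citation 1"),
    Patch("rename", "GenusA", "GenusB", "hypothetical citation 2"),
])
```

Keeping the source on every patch is what later lets a curator answer why a taxon sits where it does.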
CHALLENGE: PHYLOGENY CURATION
How was the Fig._S1 tree inferred? What are its tip labels? Is it rooted correctly?
What clade was the focus of the study?
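Some of these questions can be partly checked by machine, others cannot. The sketch below summarizes a NEXUS `TREE name = [&R] (...);` statement: whether a rooting comment is present, how many tips there are, and whether the tips are bare numbers. Numeric tips like those in Fig._S1 only become meaningful through a separate mapping (for example a NEXUS TRANSLATE block), which is exactly what a curator has to track down; how the tree was inferred and which clade was the focus are not in the string at all. This uses regular expressions, not a full Newick parser (quoted labels and comments are not handled), and the function name is hypothetical.

```python
import re


def inspect_tree_statement(statement: str) -> dict:
    """Summarize a NEXUS 'TREE name = [&R] (...);' statement."""
    m = re.match(r"\s*TREE\s+(\S+)\s*=\s*(\[&[RU]\])?\s*(.*?);?\s*$",
                 statement, re.IGNORECASE | re.DOTALL)
    if not m:
        raise ValueError("not a NEXUS TREE statement")
    name, root_flag, newick = m.groups()
    # Tip labels follow '(' or ','; internal-node labels (after ')') and
    # branch lengths (after ':') are not captured by this pattern.
    tips = re.findall(r"[(,]\s*([^(),:;\s]+)", newick)
    return {
        "name": name,
        # [&R] = rooted, [&U] = unrooted, no comment = not stated (None)
        "rooted": {"[&R]": True, "[&U]": False}.get((root_flag or "").upper()),
        "n_tips": len(tips),
        "numeric_tips": bool(tips) and all(t.isdigit() for t in tips),
        "parentheses_balanced": newick.count("(") == newick.count(")"),
    }


# A shortened, hypothetical stand-in for the Fig._S1 statement.
info = inspect_tree_statement("TREE demo = [&R] (2,1,((3,7),(4,5)));")
print(info)
# {'name': 'demo', 'rooted': True, 'n_tips': 6, 'numeric_tips': True, 'parentheses_balanced': True}
```

Note that `[&R]` only records the author's claim that the tree is rooted; whether the root is in the right place remains a curation question.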
CURATOR TOOLS
Data curation
Tree synthesis
NexSON (NeXML as JSON)
Input names mapped to taxonomy
API layer
Common data store of NexSON files (NeXML as JSON)
Tree synthesis
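Mapping input names to the taxonomy is the step that makes trees from different studies comparable. A minimal sketch, assuming a normalized exact-match strategy and hypothetical taxon ids: underscores become spaces, whitespace and case are ignored, and anything unmatched is handed back for a curator. Real name matching also has to handle homonyms, synonyms and misspellings, which this sketch does not.

```python
import re


def normalize(name: str) -> str:
    """Treat underscores as spaces, collapse whitespace, ignore case."""
    return re.sub(r"\s+", " ", name.replace("_", " ")).strip().casefold()


def map_names(tip_labels, taxonomy_ids: dict):
    """Map tip labels to taxon ids by normalized exact match.
    Returns (mapped, unmapped); unmapped labels go to a curator."""
    index = {normalize(name): tid for name, tid in taxonomy_ids.items()}
    mapped, unmapped = {}, []
    for label in tip_labels:
        tid = index.get(normalize(label))
        if tid is None:
            unmapped.append(label)
        else:
            mapped[label] = tid
    return mapped, unmapped


# Hypothetical ids; a real run would use the reference taxonomy's identifiers.
taxonomy_ids = {"Drosophila melanogaster": 1001, "Musca domestica": 1002}
mapped, unmapped = map_names(
    ["Drosophila_melanogaster", "musca  domestica", "42"], taxonomy_ids)
```

A bare number such as "42", like the tips of the Fig._S1 tree, cannot be matched automatically and ends up in the curator's queue.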
• Open source software tools for managing open data
• Publicly accessible data store
• Full provenance data (who changed what & when?)
• Allows access & download through standard protocols (git)
• Where possible, using the Creative Commons Zero (CC0) waiver
CHALLENGE: SYNTHESIZING PHYLOGENY AND TAXONOMY
Graph databases are key
Image: Open Tree of Life
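The slide does not describe the database schema, so the following is only a toy in-memory stand-in meant to show why a graph helps: it can hold conflicting relationships from many sources at once, each tagged with the sources that support it, whereas a single tree must pick one. The class and source names are hypothetical.

```python
from collections import defaultdict


class SourcedGraph:
    """Toy stand-in for a graph database: child -> parent edges,
    each tagged with every input source (tree or taxonomy) that supports it."""

    def __init__(self):
        self.support = defaultdict(set)  # (child, parent) -> {source names}

    def add_source(self, source: str, parent_of: dict) -> None:
        for child, parent in parent_of.items():
            self.support[(child, parent)].add(source)

    def parents(self, node: str) -> dict:
        """Every parent proposed for `node`, with the sources proposing it."""
        return {p: set(srcs) for (c, p), srcs in self.support.items() if c == node}


g = SourcedGraph()
g.add_source("taxonomy", {"sp1": "GenusA", "sp2": "GenusA", "GenusA": "FamilyX"})
g.add_source("study1", {"sp1": "clade1", "sp3": "clade1"})  # hypothetical study
print(g.parents("sp1"))
# {'GenusA': {'taxonomy'}, 'clade1': {'study1'}}
```

Both placements of sp1 survive in the graph; choosing between them is deferred to the synthesis step.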
Thanks to Joseph Brown, Stephen Smith, Jonathan Rees, Jim Allman for getting the latest version up last night!
Synthesis details next week from Stephen Smith, University of Michigan
Thursday, February 13, 1 pm EST
phyloseminar.org
WHAT CAN WE DO WITH THESE DATA AND TOOLS?
Rick Ree & Lyndon Coghill
Comparing phylogeny and taxonomy
Open Tree of Life
Conflict within sets of trees
Stephen Smith
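Conflict between two trees can be quantified by counting the clades of one tree that are contradicted by some clade of the other. A minimal sketch, assuming rooted trees over the same tips written as nested tuples; unrooted trees would be compared as bipartitions instead, and differing tip sets would first need pruning. The function names are hypothetical, and this is not necessarily the method behind the analysis credited above.

```python
def clusters(tree) -> set:
    """Tip sets of the internal nodes of a nested-tuple tree, root excluded."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))  # the root cluster (all tips) never conflicts
    return found


def conflicts(a: frozenset, b: frozenset) -> bool:
    """Clusters conflict if they overlap without one containing the other."""
    return bool(a & b) and not (a <= b or b <= a)


def conflict_count(tree_a, tree_b) -> int:
    """Number of clusters in tree_a contradicted by some cluster in tree_b."""
    cb = clusters(tree_b)
    return sum(any(conflicts(a, b) for b in cb) for a in clusters(tree_a))


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(conflict_count(t1, t2))
# 2
```

Running this over every pair in a set of trees gives a simple conflict matrix that could then be visualized.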
Highlight under-studied parts of the tree
Label internal nodes on phylogenies
Test various methods for synthesis
Quantify and visualize phylogenetic conflict
Extract phylogeny given list of taxa
Infer branch lengths on synthetic trees
Organize biodiversity data phylogenetically
… and many more, enabled by phylogenetic synthesis and digitally available phylogenetic data
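Extracting a phylogeny for a list of taxa amounts to taking the induced subtree: keep the requested tips, drop everything else, and suppress nodes left with a single child. A minimal sketch on nested-tuple trees with hypothetical tip names; a real service would first map the requested names to the taxonomy.

```python
def induced_subtree(tree, keep: set):
    """Prune a nested-tuple tree to the tips in `keep`, suppressing
    nodes left with a single child. Returns None if nothing is kept."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (induced_subtree(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # unary node: replace it by its only child
    return tuple(kids)


tree = ((("A", "B"), "C"), ("D", "E"))
print(induced_subtree(tree, {"A", "C", "E"}))
# (('A', 'C'), 'E')
```

Branch lengths, if present, would have to be summed across suppressed nodes; that is left out here.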
COMING IN 2014
Hackathon, jointly with
Clade-based curation and analysis workshops
QUESTIONS? PARTICIPATE?
irc: #opentreeoflife on freenode
http://github.com/OpenTreeOfLife
Gordon Burleigh, Keith Crandall, Karl Gude, David Hibbett, Mark Holder, Laura Katz, Rick Ree,
Stephen Smith, Doug Soltis, Tiffani Williams + many postdocs, grad students and undergrads
@NESCent: Karen Cranston, Jonathan Rees, Jim Allman