Open Tree of Life Phyloseminar 2014

TECHNICAL AND SOCIAL CHALLENGES IN SYNTHESIZING THE TREE OF LIFE Karen Cranston National Evolutionary Synthesis Center @kcranstn http://slideshare.net/kcranstn


Phyloseminar about the Open Tree of Life project, given Feb 2014 by Karen Cranston http://phyloseminar.org

Transcript of Open Tree of Life Phyloseminar 2014

Page 1: Open Tree of Life Phyloseminar 2014

TECHNICAL AND SOCIAL CHALLENGES IN SYNTHESIZING THE TREE OF LIFE Karen Cranston National Evolutionary Synthesis Center

@kcranstn http://slideshare.net/kcranstn

Page 2: Open Tree of Life Phyloseminar 2014

IF WE “HAD” A TREE OF LIFE?

complete = contains all of biodiversity

dynamic = continuously updated with new data

available digitally = browsing, querying, downloading

Page 3: Open Tree of Life Phyloseminar 2014

Produce a digitally-available phylogeny that contains all of biodiversity

Provide tools for managing, analyzing and sharing phylogenetic data

http://avatol.org

Page 4: Open Tree of Life Phyloseminar 2014

CHALLENGE: COMPLETENESS

Page 5: Open Tree of Life Phyloseminar 2014

Even if there were phylogenies for all species in GenBank, we would have only a small fraction of biodiversity

Page 6: Open Tree of Life Phyloseminar 2014

Soltis et al APG III phylogeny (30 taxa)

NCBI taxonomy data (578 taxa)

from Stephen Smith

Page 7: Open Tree of Life Phyloseminar 2014

Dipsacales graph

Synthesized tree: contains the structure of the phylogeny but includes all 578 taxa

from Stephen Smith

Page 8: Open Tree of Life Phyloseminar 2014

Inputs: Published phylogenies

Taxonomies

• filter / weight input trees
• synthesize into single data structure

• process feedback
• input new data sets

complete tree of life

Page 9: Open Tree of Life Phyloseminar 2014

CHALLENGE: ACCESS TO PUBLISHED PHYLOGENIES

Page 10: Open Tree of Life Phyloseminar 2014

“Phylogeny provides a mechanism through which to interpret the patterns and processes of evolution and to predict the responses of life to rapid environmental change. Phylogenies and phylogenetic methods are now being used to enhance agriculture, identify and combat diseases, conserve biodiversity, and predict responses to global climate change and to biological invasions.” *

* OpenTree grant proposal

(tl;dr: We need trees to do cool and important science)

Page 11: Open Tree of Life Phyloseminar 2014

Expertise in phylogenetic inference

Expertise in methods that use phylogenies

Page 12: Open Tree of Life Phyloseminar 2014

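library(phytools)                   # also attaches ape, which provides read.tree
flyTree <- read.tree("flies.tre")   # read the tree from a Newick file
contMap(flyTree, flyData)           # flyData: named numeric trait vector (not shown on the slide)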

thermore, a paraphyletic relationship of phorids and syrphids would support the hypothesis that their shared special mode of extraembryonic development (dorsal amnion closure) (26) evolved in the stem lineage of Cyclorrhapha and preceded the origin of the schizophoran amnioserosa.

To test this hypothesis, we used a relatively recent phylogenomic marker: small, noncoding, regulatory micro-RNAs (miRNAs). miRNAs exhibit a striking phylogenetic pattern of conservation across the metazoan tree of life, suggesting the accumulation and maintenance of miRNA families throughout organismal evolution

Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number of origins of each trait was estimated with reference to the phylogeny, the distribution of each trait among genera within a family, and the known biology of the organisms.

Wiegmann et al. PNAS Early Edition | 3 of 6


TREE Fig._S1 = [&R] (2,1,((3,7),(4,(6,(33,(15,((20,(47,((51,(49,50)),(46,(48,(52,16)))))),(((44,45),((18,(12,(13,(43,42)))),((41,((39,38),(40,17))),((35,9),(34,(36,37)))))),(32,(((21,19),

((30,14),(22,((11,31),((27,25),(23,((28,(24,8)),(10,(26,(5,29)))))))))),((((72,(63,57)),((65,64),((66,67),(68,(69,(70,(71,54))))))),(((82,59),(60,(61,(62,55)))),((80,(81,56)),((53,(77,78)),((75,73),(76,(58,74))))))),((88,((86,87),((85,84),

(83,89)))),(79,((91,(93,(95,(92,(96,(94,90)))))),((100,(99,98)),(97,(((168,((172,185),((159,101),(109,157)))),(((181,(179,180)),((102,(183,187)),(175,(176,(178,177))))),(212,((195,(210,211)),(199,((201,(196,202)),((194,197),((203,(192,205)),(204,(193,

((209,(208,206)),(198,(200,207))))))))))))),(113,(((154,((169,170),(103,191))),((131,126),(128,((134,135),(129,(125,

((132,130),(104,133)))))))),((((190,166),((162,171),((116,120),(115,114)))),((122,(188,(186,108))),((118,(119,105)),(117,(158,

(184,189)))))),((123,124),(((148,((165,161),(174,182))),((106,121),(163,(167,127)))),((173,(156,(155,160))),(164,

(((136,137),(139,(138,107))),((153,145),(112,(((146,143),(144,(140,141))),((142,152),(147,((110,111),(149,(150,151)))))))))))))))))))))))))))))))));

Wiegmann et al. PNAS, 2011

Page 13: Open Tree of Life Phyloseminar 2014

Archived: ~4% of all published phylogenetic trees

Stoltzfus et al 2012

Archiving sequence data is a community norm

Archiving phylogenetic data is quite rare

Page 14: Open Tree of Life Phyloseminar 2014

OPENTREE PHYLOGENY INPUTS

Surveyed >7000 phylogenetic studies in plants, fungi, animals and unicellular organisms

Result: data for >2700 studies, >4800 trees

Page 15: Open Tree of Life Phyloseminar 2014

CHALLENGE: SELECTING BACKBONE TAXONOMY

Page 16: Open Tree of Life Phyloseminar 2014

Complete? Up to date with taxonomic literature? Phylogenetically-informed?

Systematics research

Online taxonomic resources

very slow…

Page 17: Open Tree of Life Phyloseminar 2014

OPEN TREE TAXONOMY

+

+

+ patch files for manual edits (requires source info!)

+

Page 18: Open Tree of Life Phyloseminar 2014

• 3,133,028 nodes and 2,559,835 ‘species’

• https://github.com/OpenTreeOfLife/reference-taxonomy

Page 19: Open Tree of Life Phyloseminar 2014

CHALLENGE: PHYLOGENY CURATION

Page 20: Open Tree of Life Phyloseminar 2014

TREE Fig._S1 = [&R] (2,1,((3,7),(4,(6,(33,(15,((20,(47,((51,(49,50)),(46,(48,(52,16)))))),(((44,45),((18,(12,(13,(43,42)))),((41,((39,38),(40,17))),((35,9),(34,(36,37)))))),(32,(((21,19),

((30,14),(22,((11,31),((27,25),(23,((28,(24,8)),(10,(26,(5,29)))))))))),((((72,(63,57)),((65,64),((66,67),(68,(69,(70,(71,54))))))),(((82,59),(60,(61,(62,55)))),((80,(81,56)),((53,(77,78)),((75,73),(76,(58,74))))))),((88,((86,87),((85,84),

(83,89)))),(79,((91,(93,(95,(92,(96,(94,90)))))),((100,(99,98)),(97,(((168,((172,185),((159,101),(109,157)))),(((181,(179,180)),((102,(183,187)),(175,(176,(178,177))))),(212,((195,(210,211)),(199,((201,(196,202)),((194,197),((203,(192,205)),(204,(193,

((209,(208,206)),(198,(200,207))))))))))))),(113,(((154,((169,170),(103,191))),((131,126),(128,((134,135),(129,(125,

((132,130),(104,133)))))))),((((190,166),((162,171),((116,120),(115,114)))),((122,(188,(186,108))),((118,(119,105)),(117,(158,

(184,189)))))),((123,124),(((148,((165,161),(174,182))),((106,121),(163,(167,127)))),((173,(156,(155,160))),(164,

(((136,137),(139,(138,107))),((153,145),(112,(((146,143),(144,(140,141))),((142,152),(147,((110,111),(149,(150,151)))))))))))))))))))))))))))))))));

How was this tree inferred? What are the tip labels? Is it rooted correctly?

What clade was the focus of the study?
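The tip-label question can be made concrete with a small sketch. It assumes a topology-only Newick string (no branch lengths or quoted labels) and uses a short made-up tree in place of the full Fig. S1 string. The function names `parse_newick` and `tip_labels` are illustrative, not part of any OpenTree tool.

```python
import re

def parse_newick(newick: str):
    """Parse a topology-only Newick string into nested lists of tip labels."""
    tokens = re.findall(r"[(),;]|[^(),;\s]+", newick)
    punctuation = {"(", ")", ",", ";"}
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
            # Skip an optional internal node label.
            if pos < len(tokens) and tokens[pos] not in punctuation:
                pos += 1
            return children
        label = tokens[pos]  # a tip
        pos += 1
        return label

    return parse_node()

def tip_labels(tree):
    """Flatten the nested lists into the tip labels, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [label for child in tree for label in tip_labels(child)]

# Made-up example in the same style as the tree on the slide.
example = "(2,1,((3,7),(4,(6,5))));"
labels = tip_labels(parse_newick(example))
print(labels)
print("all tips are bare integers:", all(label.isdigit() for label in labels))
```

When every tip is a bare integer, as here, the tree cannot be tied to taxa without the accompanying translation table from the original file, which is part of what curation has to recover.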

Page 21: Open Tree of Life Phyloseminar 2014

CURATOR TOOLS

Page 22: Open Tree of Life Phyloseminar 2014

Data curation

Tree synthesis

NexSON (NeXML as JSON)

Page 23: Open Tree of Life Phyloseminar 2014
Page 24: Open Tree of Life Phyloseminar 2014

Input names Mapped to taxonomy
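A minimal sketch of what mapping input names to a taxonomy can involve, assuming a simple exact match after normalization. The lookup table, the taxon ids and the normalization rule are all hypothetical and chosen only for illustration. This is not the curation tool's actual matching logic.

```python
import re

# Hypothetical taxonomy lookup: normalized name -> made-up taxon id.
TAXONOMY = {
    "drosophila melanogaster": 1001,
    "musca domestica": 1002,
    "anopheles gambiae": 1003,
}

def normalize(label: str) -> str:
    """Lowercase, turn underscores into spaces, keep only the first two words."""
    words = re.sub(r"[_\s]+", " ", label).strip().lower().split(" ")
    return " ".join(words[:2])

def map_labels(labels):
    """Split tip labels into those matched to a taxon id and those left unmapped."""
    mapped, unmapped = {}, []
    for label in labels:
        taxon_id = TAXONOMY.get(normalize(label))
        if taxon_id is None:
            unmapped.append(label)
        else:
            mapped[label] = taxon_id
    return mapped, unmapped

mapped, unmapped = map_labels(
    ["Drosophila_melanogaster_JX1234", "Musca domestica", "Fly sp. 7"]
)
print(mapped)
print(unmapped)
```

Labels that fail the automatic match end up in `unmapped`, which is the part a human curator would need to resolve by hand.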

Page 25: Open Tree of Life Phyloseminar 2014
Page 26: Open Tree of Life Phyloseminar 2014

API layer

Common data store of NexSON files (NeXML as JSON)

Tree synthesis

Page 27: Open Tree of Life Phyloseminar 2014

• Open source software tools for managing open data

• Publicly-accessible data store
• Full provenance data (who changed what & when?)
• Allows access & download through standard protocols (git)
• Where possible, using Creative Commons Zero (CC0) waiver

Page 28: Open Tree of Life Phyloseminar 2014

CHALLENGE: SYNTHESIZING PHYLOGENY AND TAXONOMY

Page 29: Open Tree of Life Phyloseminar 2014

Graph databases are key
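One way to see why a graph helps: several trees can share the same nodes, with each edge tagged by the source that asserts it. The toy sketch below uses a plain in-memory dictionary rather than a graph database, and it assumes every node has a name that is shared across sources, which real phylogenies do not provide for internal nodes. The class, taxon names and source names are all made up, and this is not the project's synthesis method.

```python
from collections import defaultdict

class TreeGraph:
    """Toy graph holding several trees at once; edges are tagged by source."""

    def __init__(self):
        # child -> parent -> set of sources asserting that child-parent edge
        self.parents = defaultdict(lambda: defaultdict(set))

    def add_tree(self, source, edges):
        """Load one tree given as (child, parent) pairs."""
        for child, parent in edges:
            self.parents[child][parent].add(source)

    def conflicts(self):
        """Return the nodes for which the sources disagree about the parent."""
        return {
            child: {parent: sorted(sources) for parent, sources in parents.items()}
            for child, parents in self.parents.items()
            if len(parents) > 1
        }

graph = TreeGraph()
graph.add_tree("taxonomy", [("sp_a", "Genus1"), ("sp_b", "Genus1"), ("sp_c", "Genus2")])
graph.add_tree("study_1", [("sp_a", "Genus1"), ("sp_b", "Genus2"), ("sp_c", "Genus2")])
print(graph.conflicts())
```

Because both inputs live in one structure, agreement and conflict between taxonomy and phylogeny become simple queries over the edges.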


Page 30: Open Tree of Life Phyloseminar 2014

Open Tree of Life

Page 31: Open Tree of Life Phyloseminar 2014

Thanks to Joseph Brown, Stephen Smith, Jonathan Rees, Jim Allman for getting the latest version up last night!

Page 32: Open Tree of Life Phyloseminar 2014


Page 33: Open Tree of Life Phyloseminar 2014

Synthesis details next week from Stephen Smith, University of Michigan

Thursday, February 13, 1 pm EST

phyloseminar.org

Page 34: Open Tree of Life Phyloseminar 2014

WHAT CAN WE DO WITH THESE DATA AND TOOLS?

Page 35: Open Tree of Life Phyloseminar 2014

Rick Ree & Lyndon Coghill

Comparing phylogeny and taxonomy

Page 36: Open Tree of Life Phyloseminar 2014


Open Tree of Life

Conflict within sets of trees

Stephen Smith

Page 37: Open Tree of Life Phyloseminar 2014

Highlight under-studied parts of the tree

Label internal nodes on phylogenies

Test various methods for synthesis

Quantify and visualize phylogenetic conflict

Extract phylogeny given list of taxa

Infer branch lengths on synthetic trees

Organize biodiversity data phylogenetically

… and many more, enabled by phylogenetic synthesis and digitally available phylogenetic data
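As an illustration of one item in this list, extracting a phylogeny for a given list of taxa amounts to pruning a larger tree down to the requested tips. The sketch below assumes a tree stored as nested Python lists with string tip labels. The function names and the example tree are made up and do not correspond to an OpenTree API.

```python
def induced_subtree(tree, keep):
    """Return the subtree containing only tips in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [
        sub
        for sub in (induced_subtree(child, keep) for child in tree)
        if sub is not None
    ]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return kept

def to_newick(tree):
    """Serialize the nested lists as a topology-only Newick fragment."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

full_tree = [["A", "B"], [["C", "D"], "E"]]
subtree = induced_subtree(full_tree, {"A", "C", "E"})
print(to_newick(subtree) + ";")
```

Collapsing nodes that are left with a single child keeps the result a proper tree on just the requested taxa.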

Page 38: Open Tree of Life Phyloseminar 2014

COMING IN 2014

Hackathon, jointly with

Clade-based curation and analysis workshops

Page 39: Open Tree of Life Phyloseminar 2014

QUESTIONS? PARTICIPATE?

[email protected]

[email protected]

irc: #opentreeoflife on freenode

http://github.com/OpenTreeOfLife

Page 40: Open Tree of Life Phyloseminar 2014

Gordon Burleigh, Keith Crandall, Karl Gude, David Hibbett, Mark Holder, Laura Katz, Rick Ree,

Stephen Smith, Doug Soltis, Tiffani Williams + many postdocs, grad students and undergrads

@NESCent: Karen Cranston, Jonathan Rees, Jim Allman