Open Tree of Life Phyloseminar 2014

Posted on 10-May-2015


Phyloseminar about the Open Tree of Life project, given Feb 2014 by Karen Cranston http://phyloseminar.org

Transcript of Open Tree of Life Phyloseminar 2014

TECHNICAL AND SOCIAL CHALLENGES IN SYNTHESIZING THE TREE OF LIFE

Karen Cranston, National Evolutionary Synthesis Center

@kcranstn http://slideshare.net/kcranstn

IF WE “HAD” A TREE OF LIFE?

complete = contains all of biodiversity

dynamic = continuously updated with new data

available digitally = browsing, querying, downloading

Produce a digitally-available phylogeny that contains all of biodiversity

Provide tools for managing, analyzing and sharing phylogenetic data

http://avatol.org

CHALLENGE: COMPLETENESS

Even if there were phylogenies for all species in GenBank, we would only have a small fraction of biodiversity

Soltis et al. APG III phylogeny (30 taxa)

NCBI taxonomy data (578 taxa)

from Stephen Smith

Dipsacales graph

Synthesized tree: contains the structure of the phylogeny, but all 578 taxa

from Stephen Smith
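The Dipsacales example keeps the structure of a 30-taxon phylogeny while adding every one of the 578 taxonomy names. A minimal sketch of one simple grafting rule, assuming the taxonomy is reduced to a species-to-genus map and trees are plain nested nodes: each species missing from the phylogeny is attached as an extra child of the smallest clade holding its congeners already in the tree, or of the root if there are none. The rule, the helper names (`graft`, `smallest_clade`) and the toy data are illustrative assumptions, not the method used for the slide.

```python
from typing import Dict, List, Optional


class Node:
    """A tree node; tips have a name and no children."""

    def __init__(self, name: Optional[str] = None, children: Optional[List["Node"]] = None):
        self.name = name
        self.children = children or []

    def tips(self) -> set:
        if not self.children:
            return {self.name}
        return set().union(*(child.tips() for child in self.children))


def smallest_clade(node: Node, targets: set) -> Node:
    """Descend to the smallest node whose tips include every target name."""
    for child in node.children:
        if targets <= child.tips():
            return smallest_clade(child, targets)
    return node


def graft(root: Node, genus_of: Dict[str, str]) -> Node:
    """Add every taxonomy species missing from the tree (hypothetical rule).

    A missing species becomes an extra child of the smallest clade holding all
    congeners already in the tree, or of the root if it has none, so existing
    clades only gain members and are never broken up.
    """
    for species, genus in sorted(genus_of.items()):
        present = root.tips()
        if species in present:
            continue
        congeners = {name for name in present if genus_of.get(name) == genus}
        target = smallest_clade(root, congeners) if congeners else root
        if not target.children:
            # A lone congener is a tip: turn it into a node holding that tip.
            target.children = [Node(target.name)]
            target.name = None
        target.children.append(Node(species))
    return root


def to_newick(node: Node) -> str:
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(child) for child in node.children) + ")"


# Toy inputs: a 3-tip phylogeny and a 6-species taxonomy (placeholder names).
tree = Node(children=[Node(children=[Node("A1"), Node("A2")]), Node("B1")])
genus_of = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B", "C1": "C"}
print(to_newick(graft(tree, genus_of)))  # ((A1,A2,A3),(B1,B2),C1)
```

Because grafted species are only added beneath existing nodes, pruning them and suppressing single-child nodes gives back the input phylogeny.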

Inputs: Published phylogenies

Taxonomies

• filter / weight input trees
• synthesize into single data structure
• process feedback
• input new data sets

complete tree of life
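A minimal sketch of the "filter / weight" and "synthesize" steps, assuming curators have ranked the input trees, every tree has the same tips, and a clade is just the set of tip names below a node: clades are accepted in rank order and skipped whenever they conflict with one already accepted. This greedy rule is one simple option chosen for illustration, not necessarily the procedure OpenTree uses; all function names are hypothetical.

```python
from typing import FrozenSet, List, Union

# A tree is a tip name or a tuple of child subtrees.
Tree = Union[str, tuple]
Clade = FrozenSet[str]


def clades(tree: Tree) -> List[Clade]:
    """Tip sets of all internal nodes, children before parents (root last)."""
    found: List[Clade] = []

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    walk(tree)
    return found


def compatible(a: Clade, b: Clade) -> bool:
    """Two clades fit in one tree only if nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def synthesize(ranked_trees: List[Tree]) -> List[Clade]:
    """Accept clades greedily, highest-ranked tree first; skip conflicting ones."""
    accepted: List[Clade] = []
    for tree in ranked_trees:
        for clade in clades(tree):
            if clade not in accepted and all(compatible(clade, c) for c in accepted):
                accepted.append(clade)
    return accepted


def to_newick(accepted: List[Clade]) -> str:
    """Rebuild a tree from mutually compatible clades (children sorted for stable output)."""

    def build(clade: Clade) -> str:
        children = [c for c in accepted if c < clade
                    and not any(c < d < clade for d in accepted)]
        loose_tips = sorted(clade - frozenset().union(*children))
        parts = sorted([build(c) for c in children] + loose_tips)
        return "(" + ",".join(parts) + ")"

    return build(max(accepted, key=len))


ranked = [
    (("A", "B"), "C", "D"),      # rank 1: resolves only A+B
    (("A", "C"), ("B", "D")),    # rank 2: conflicts with A+B, so it is ignored
    ((("A", "B"), "C"), "D"),    # rank 3: adds A+B+C, compatible with rank 1
]
print(to_newick(synthesize(ranked)))  # (((A,B),C),D)
```

The rank-2 tree contributes nothing because both of its clades contradict the higher-ranked A+B grouping, while the lower-ranked tree can still add resolution where nothing conflicts.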

CHALLENGE: ACCESS TO PUBLISHED PHYLOGENIES

* OpenTree grant proposal

“Phylogeny provides a mechanism through which to interpret the patterns and processes of evolution and to predict the responses of life to rapid environmental change. Phylogenies and phylogenetic methods are now being used to enhance agriculture, identify and combat diseases, conserve biodiversity, and predict responses to global climate change and to biological invasions.” *

(tl;dr: We need trees to do cool and important science)

Expertise in phylogenetic inference

Expertise in methods that use phylogenies

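library(phytools)                  # phytools attaches ape, which provides read.tree()
flyTree <- read.tree("flies.tre")  # read the tree from a Newick file
contMap(flyTree, flyData)          # map a continuous trait onto the tree; flyData (not shown on the slide) is a named numeric vector keyed by tip label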

Furthermore, a paraphyletic relationship of phorids and syrphids would support the hypothesis that their shared special mode of extraembryonic development (dorsal amnion closure) (26) evolved in the stem lineage of Cyclorrhapha and preceded the origin of the schizophoran amnioserosa.

To test this hypothesis, we used a relatively recent phylogenomic marker: small, noncoding, regulatory micro-RNAs (miRNAs). miRNAs exhibit a striking phylogenetic pattern of conservation across the metazoan tree of life, suggesting the accumulation and maintenance of miRNA families throughout organismal evolution

Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number of origins of each trait was estimated with reference to the phylogeny, the distribution of each trait among genera within a family, and the known biology of the organisms.

Wiegmann et al. PNAS Early Edition | 3 of 6


TREE Fig._S1 = [&R] (2,1,((3,7),(4,(6,(33,(15,((20,(47,((51,(49,50)),(46,(48,(52,16)))))),(((44,45),((18,(12,(13,(43,42)))),((41,((39,38),(40,17))),((35,9),(34,(36,37)))))),(32,(((21,19),

((30,14),(22,((11,31),((27,25),(23,((28,(24,8)),(10,(26,(5,29)))))))))),((((72,(63,57)),((65,64),((66,67),(68,(69,(70,(71,54))))))),(((82,59),(60,(61,(62,55)))),((80,(81,56)),((53,(77,78)),((75,73),(76,(58,74))))))),((88,((86,87),((85,84),

(83,89)))),(79,((91,(93,(95,(92,(96,(94,90)))))),((100,(99,98)),(97,(((168,((172,185),((159,101),(109,157)))),(((181,(179,180)),((102,(183,187)),(175,(176,(178,177))))),(212,((195,(210,211)),(199,((201,(196,202)),((194,197),((203,(192,205)),(204,(193,

((209,(208,206)),(198,(200,207))))))))))))),(113,(((154,((169,170),(103,191))),((131,126),(128,((134,135),(129,(125,

((132,130),(104,133)))))))),((((190,166),((162,171),((116,120),(115,114)))),((122,(188,(186,108))),((118,(119,105)),(117,(158,

(184,189)))))),((123,124),(((148,((165,161),(174,182))),((106,121),(163,(167,127)))),((173,(156,(155,160))),(164,

(((136,137),(139,(138,107))),((153,145),(112,(((146,143),(144,(140,141))),((142,152),(147,((110,111),(149,(150,151)))))))))))))))))))))))))))))))));

Wiegmann et al. PNAS, 2011

Only ~4% of all published phylogenetic trees are archived digitally (Stoltzfus et al. 2012)

Archiving sequence data is a community norm

Archiving phylogenetic data is quite rare

OPENTREE PHYLOGENY INPUTS

Surveyed >7000 phylogenetic studies of plants, fungi, animals and unicellular organisms

Result: data for >2700 studies, >4800 trees

CHALLENGE: SELECTING BACKBONE TAXONOMY

Complete? Up to date with taxonomic literature? Phylogenetically-informed?

Systematics research reaches online taxonomic resources only very slowly…

OPEN TREE TAXONOMY

Merged from several source taxonomies

+ patch files for manual edits (requires source info!)

• 3,133,028 nodes and 2,559,835 ‘species’

• https://github.com/OpenTreeOfLife/reference-taxonomy
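A minimal sketch of the "patch files for manual edits (requires source info!)" idea, assuming a hypothetical tab-separated patch format (operation, taxon, new parent, source) that supports only a `move` operation on a child-to-parent map. This is not the format used in the reference-taxonomy repository; it only shows how refusing unsourced edits and logging each change keeps a merged taxonomy traceable.

```python
from typing import Dict, List, Optional


def apply_patches(parent_of: Dict[str, Optional[str]], patch_lines: List[str]) -> List[str]:
    """Apply hypothetical 'move' edits to a child -> parent taxonomy map.

    Each line is: operation <TAB> taxon <TAB> new_parent <TAB> source.
    Edits without a source are rejected, so every change stays traceable.
    Returns a provenance log of the changes made.
    """
    log = []
    for number, line in enumerate(patch_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or not fields[3].strip():
            raise ValueError(f"patch line {number}: expected 4 fields with a non-empty source")
        operation, taxon, new_parent, source = fields
        if operation != "move":
            raise ValueError(f"patch line {number}: unsupported operation {operation!r}")
        if taxon not in parent_of or new_parent not in parent_of:
            raise ValueError(f"patch line {number}: unknown taxon")
        # Refuse moves that would place a taxon inside its own subtree.
        ancestor = new_parent
        while ancestor is not None:
            if ancestor == taxon:
                raise ValueError(f"patch line {number}: move would create a cycle")
            ancestor = parent_of[ancestor]
        log.append(f"{taxon}: {parent_of[taxon]} -> {new_parent} ({source})")
        parent_of[taxon] = new_parent
    return log


taxonomy = {"life": None, "FamilyX": "life", "FamilyY": "life", "GenusA": "FamilyX"}
print(apply_patches(taxonomy, ["move\tGenusA\tFamilyY\texample-source"]))
# ['GenusA: FamilyX -> FamilyY (example-source)']
```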

CHALLENGE: PHYLOGENY CURATION

TREE Fig._S1 = [&R] (2,1,((3,7),(4,(6,(33,(15,((20,(47,((51,(49,50)),(46,(48,(52,16)))))),(((44,45),((18,(12,(13,(43,42)))),((41,((39,38),(40,17))),((35,9),(34,(36,37)))))),(32,(((21,19),

((30,14),(22,((11,31),((27,25),(23,((28,(24,8)),(10,(26,(5,29)))))))))),((((72,(63,57)),((65,64),((66,67),(68,(69,(70,(71,54))))))),(((82,59),(60,(61,(62,55)))),((80,(81,56)),((53,(77,78)),((75,73),(76,(58,74))))))),((88,((86,87),((85,84),

(83,89)))),(79,((91,(93,(95,(92,(96,(94,90)))))),((100,(99,98)),(97,(((168,((172,185),((159,101),(109,157)))),(((181,(179,180)),((102,(183,187)),(175,(176,(178,177))))),(212,((195,(210,211)),(199,((201,(196,202)),((194,197),((203,(192,205)),(204,(193,

((209,(208,206)),(198,(200,207))))))))))))),(113,(((154,((169,170),(103,191))),((131,126),(128,((134,135),(129,(125,

((132,130),(104,133)))))))),((((190,166),((162,171),((116,120),(115,114)))),((122,(188,(186,108))),((118,(119,105)),(117,(158,

(184,189)))))),((123,124),(((148,((165,161),(174,182))),((106,121),(163,(167,127)))),((173,(156,(155,160))),(164,

(((136,137),(139,(138,107))),((153,145),(112,(((146,143),(144,(140,141))),((142,152),(147,((110,111),(149,(150,151)))))))))))))))))))))))))))))))));

How was this tree inferred? What are the tip labels? Is it rooted correctly?

What clade was the focus of the study?
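Part of what makes these questions hard is how little an archived tree says about itself. A short sketch that splits a NEXUS `TREE name = [&R] ...;` statement into its name, rooting flag and tip labels, assuming unquoted labels and no branch lengths (as in the tree on the slide). The input is a toy statement in the same style, not the real Fig._S1 tree, and `parse_tree_statement` is a hypothetical helper.

```python
import re
from typing import List, Tuple

TREE_STATEMENT = re.compile(
    r"\s*TREE\s+([^\s=]+)\s*=\s*(\[&[RU]\])?\s*(.*?);\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_tree_statement(statement: str) -> Tuple[str, bool, List[str]]:
    """Split a NEXUS 'TREE name = [&R] newick;' statement into name, rooted flag, tip labels.

    Assumes unquoted labels and no branch lengths or other comments,
    as in the Fig._S1 tree on the slide.
    """
    match = TREE_STATEMENT.match(statement)
    if match is None:
        raise ValueError("not a NEXUS TREE statement")
    name, rooting, newick = match.groups()
    rooted = rooting is not None and rooting.upper() == "[&R]"
    # Tip labels are the tokens right after '(' or ','; internal labels follow ')'.
    labels = re.findall(r"(?<=[(,])\s*([^(),;\s]+)", newick)
    return name, rooted, labels


# A toy statement in the same style as the slide (not the real Fig._S1 tree).
name, rooted, labels = parse_tree_statement("TREE Fig._S1 = [&R] (2,1,((3,7),(4,5)));")
print(name, rooted, labels)  # Fig._S1 True ['2', '1', '3', '7', '4', '5']
print(all(label.isdigit() for label in labels))  # True: no taxon names at all
```

Without a TRANSLATE block or the original data files, nothing links labels like these to taxa; that is the information curators have to supply.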

CURATOR TOOLS

Data curation

Tree synthesis

NexSON (NeXML as JSON)

Input names mapped to taxonomy

API layer

Common data store of NexSON files (NeXML as JSON)

Tree synthesis
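A minimal sketch of the "input names mapped to taxonomy" step, assuming a drastically simplified study record: the keys `otus`, `originalLabel` and `ottId`, the normalization rule and the taxonomy ids are all assumptions for illustration, not the real NexSON schema or taxonomy identifiers. Labels that do not match after normalization are returned for a curator to resolve.

```python
import json
from typing import Dict, List

# Hypothetical, much-simplified study record; real NexSON differs.
STUDY_JSON = """
{"otus": [{"originalLabel": "Homo_sapiens"},
          {"originalLabel": "pan troglodytes "},
          {"originalLabel": "taxon 17"}]}
"""


def normalize(label: str) -> str:
    """Treat underscores as spaces, collapse whitespace, ignore case."""
    return " ".join(label.replace("_", " ").split()).lower()


def map_names(study: dict, taxonomy: Dict[str, int]) -> List[str]:
    """Attach a taxonomy id to each OTU in place; return labels needing a curator."""
    lookup = {normalize(name): taxon_id for name, taxon_id in taxonomy.items()}
    unmapped = []
    for otu in study["otus"]:
        taxon_id = lookup.get(normalize(otu["originalLabel"]))
        if taxon_id is None:
            unmapped.append(otu["originalLabel"])
        else:
            otu["ottId"] = taxon_id
    return unmapped


taxonomy = {"Homo sapiens": 1, "Pan troglodytes": 2}  # made-up ids
study = json.loads(STUDY_JSON)
print(map_names(study, taxonomy))    # ['taxon 17']
print(json.dumps(study["otus"][0]))  # {"originalLabel": "Homo_sapiens", "ottId": 1}
```

Keeping `originalLabel` untouched and adding the mapped id alongside it means the curator's decision never overwrites what the authors published.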

• Open source software tools for managing open data
• Publicly accessible data store
• Full provenance data (who changed what & when?)
• Allows access & download through standard protocols (git)
• Where possible, using the Creative Commons Zero (CC0) waiver

CHALLENGE: SYNTHESIZING PHYLOGENY AND TAXONOMY

Graph databases are key
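One reason a graph fits this problem: the taxonomy and each input tree are overlapping hierarchies over the same names, and keeping every parent-child edge together with the source(s) asserting it lets synthesis ask, for any node, what each source says about it. A minimal in-memory stand-in using Python dictionaries, not a real graph database; the class, node and source names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple


class SourceGraph:
    """Child -> parent edges, each tagged with the sources that assert it."""

    def __init__(self) -> None:
        self.edges: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_hierarchy(self, source: str, parent_of: Dict[str, str]) -> None:
        """Load a taxonomy or an input tree given as a child -> parent map."""
        for child, parent in parent_of.items():
            self.edges[(child, parent)].add(source)

    def parents(self, node: str) -> Dict[str, Set[str]]:
        """All parents proposed for node, each with its supporting sources."""
        return {parent: set(sources)
                for (child, parent), sources in self.edges.items() if child == node}


graph = SourceGraph()
graph.add_hierarchy("taxonomy", {"sp_a": "Genus1", "sp_b": "Genus1", "Genus1": "Family1"})
graph.add_hierarchy("study_1", {"sp_a": "study_1:n2", "sp_b": "study_1:n2", "study_1:n2": "study_1:n1"})
print(graph.parents("sp_a"))  # {'Genus1': {'taxonomy'}, 'study_1:n2': {'study_1'}}
```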

Image:

Open Tree of Life

Thanks to Joseph Brown, Stephen Smith, Jonathan Rees, Jim Allman for getting the latest version up last night!


Synthesis details next week from Stephen Smith, University of Michigan

Thursday, February 13, 1 pm EST

phyloseminar.org

WHAT CAN WE DO WITH THESE DATA AND TOOLS?

Rick Ree & Lyndon Coghill

Comparing phylogeny and taxonomy
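One basic way to compare a phylogeny with a taxonomy is to ask which taxonomic groups are not monophyletic in the tree, i.e. whose sampled members do not form a clade. A minimal sketch, assuming trees as nested tuples and each group given as a list of species names; unsampled members are ignored, and the groups and names are placeholders, not data from the slide.

```python
from typing import Dict, FrozenSet, List, Set, Union

# A tree is a tip name or a tuple of child subtrees.
Tree = Union[str, tuple]


def clade_sets(tree: Tree) -> Set[FrozenSet[str]]:
    """Tip sets of every node in the tree, tips and root included."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def non_monophyletic(tree: Tree, members: Dict[str, List[str]]) -> List[str]:
    """Taxonomic groups whose members sampled in the tree do not form a clade."""
    clades = clade_sets(tree)
    sampled_tips = max(clades, key=len)  # the root's tip set
    failures = []
    for group, species in sorted(members.items()):
        sampled = frozenset(species) & sampled_tips
        if sampled and sampled not in clades:
            failures.append(group)
    return failures


tree = (("a1", "a2"), ("a3", ("b1", "b2")))
groups = {"GenusA": ["a1", "a2", "a3"], "GenusB": ["b1", "b2", "b3"]}
print(non_monophyletic(tree, groups))  # ['GenusA']
```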


Open Tree of Life

Conflict within sets of trees

Stephen Smith
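A minimal sketch of one common way to quantify conflict within a set of trees: for each clade of a reference tree, count the other trees that contain it and the trees that contain an incompatible clade (one that overlaps it without nesting). It assumes all trees share the same tips, so a clade can also be neither supported nor contradicted when a tree leaves that region unresolved; the trees and function names are illustrative, not Stephen Smith's analysis.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# A tree is a tip name or a tuple of child subtrees.
Tree = Union[str, tuple]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Tip sets of the internal nodes of a tree (the root included)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def conflicts(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Clades conflict when they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def support_and_conflict(reference: Tree, others: List[Tree]) -> List[Tuple[str, int, int]]:
    """For each reference clade: (label, trees containing it, trees contradicting it)."""
    other_clades = [clades(t) for t in others]
    rows = []
    for clade in sorted(clades(reference), key=lambda c: (len(c), sorted(c))):
        support = sum(clade in cs for cs in other_clades)
        conflict = sum(any(conflicts(clade, c) for c in cs) for cs in other_clades)
        rows.append((",".join(sorted(clade)), support, conflict))
    return rows


reference = ((("A", "B"), "C"), "D")
others = [((("A", "B"), "C"), "D"), ((("A", "C"), "B"), "D"), (("A", "B"), "C", "D")]
print(support_and_conflict(reference, others))
# [('A,B', 2, 1), ('A,B,C', 2, 0), ('A,B,C,D', 3, 0)]
```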

Highlight under-studied parts of the tree

Label internal nodes on phylogenies

Test various methods for synthesis

Quantify and visualize phylogenetic conflict

Extract phylogeny given list of taxa

Infer branch lengths on synthetic trees

Organize biodiversity data phylogenetically

… and many more, enabled by phylogenetic synthesis and digitally available phylogenetic data
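Several items in this list rest on small tree operations. As one example, "extract phylogeny given list of taxa" amounts to taking the induced subtree: drop every tip not in the list, then suppress internal nodes left with a single child. A minimal sketch on nested tuples, with hypothetical helper names and toy tip labels; requested names absent from the tree are simply ignored.

```python
from typing import Optional, Set, Union

# A tree is a tip name or a tuple of child subtrees.
Tree = Union[str, tuple]


def induced_subtree(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Prune tips not in keep; collapse nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [sub for sub in (induced_subtree(child, keep) for child in tree)
                if sub is not None]
    if not children:
        return None          # nothing requested below this node
    if len(children) == 1:
        return children[0]   # suppress the now-redundant node
    return tuple(children)


def to_newick(tree: Tree) -> str:
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


tree = ((("A", "B"), "C"), ("D", ("E", "F")))
print(to_newick(induced_subtree(tree, {"A", "C", "E", "F", "Z"})))  # ((A,C),(E,F))
```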

COMING IN 2014

Hackathon, jointly with

Clade-based curation and analysis workshops

QUESTIONS? PARTICIPATE?

opentreeoflife@googlegroups.com

opentreeoflife-software@googlegroups.com

irc: #opentreeoflife on freenode

http://github.com/OpenTreeOfLife

Gordon Burleigh Keith Crandall Karl Gude David Hibbett Mark Holder Laura Katz Rick Ree

Stephen Smith Doug Soltis Tiffani Williams + many postdocs, grad students and undergrads

@NESCent: Karen Cranston, Jonathan Rees, Jim Allman