Open Tree of Life @NSF

Post on 10-May-2015


Presentation about Open Tree of Life given at NSF, May 2013

Transcript of Open Tree of Life @NSF

Karen Cranston

National Evolutionary Synthesis Center

@kcranstn

http://www.slideshare.net/kcranstn

opentreeoflife.org

What does it mean to “have” the tree of life?

complete & dynamic

browse, download, query

use for research questions

implies digital access

Figure: Phylogeny papers, 1978-2008 (number of papers published per year; source: ISI Web of Science)

Rapid increase in applications of phylogeny, beginning in early 1990s

graph from David Hillis

Goals

1. Synthesize a complete draft tree of life from existing phylogenies

2. Release in year 1 with:

a. engaging public interface

b. ability to upload new data, explore conflict, see provenance

c. open data: tree, subtrees and source data

Graph databases of taxonomy + source trees

• filter / weight input trees

• combine into synthetic trees

• feedback

• input new data sets

~ 4% of all published phylogenetic trees

Stoltzfus et al 2012

Inputs: Phylogenetic data

Archiving sequence data is a community norm

assembly, alignment, inference

expertise, time, $$$

Furthermore, a paraphyletic relationship of phorids and syrphids would support the hypothesis that their shared special mode of extraembryonic development (dorsal amnion closure) (26) evolved in the stem lineage of Cyclorrhapha and preceded the origin of the schizophoran amnioserosa.

To test this hypothesis, we used a relatively recent phylogenomic marker: small, noncoding, regulatory micro-RNAs (miRNAs). miRNAs exhibit a striking phylogenetic pattern of conservation across the metazoan tree of life, suggesting the accumulation and maintenance of miRNA families throughout organismal evolution

Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (-lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number of origins of each trait was estimated with reference to the phylogeny, the distribution of each trait among genera within a family, and the known biology of the organisms.

Wiegmann et al., PNAS Early Edition

Why do we need to database phylogenetic trees?

Heroic data collection efforts

Surveyed >7000 phylogenetic studies in plants, fungi, animals and unicellular organisms

Result: repository of data for >2300 studies, >4800 trees

Remaining data not available digitally

Manuscript accepted to PLoS Biology

Inputs: Taxonomy

Large fraction of species not represented in phylogenies

taxonomy provides backbone & coverage at tips

Need name resolution services for data cleaning

Process

Source trees (Phylografter)

Taxonomies (taxomachine)

Data storage & synthesis (treemachine)

OpenTree: visualization, search, download

Source tree management

phylografter.opentreeoflife.org

Source tree & taxonomy synthesis

Novel graph database for phylogenies (treemachine) and taxonomy (taxomachine)

Allows for efficient storage and retrieval
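To make the graph idea concrete, here is a minimal sketch in plain Python. It is not treemachine or taxomachine code: the class `TreeGraph`, its methods, and the taxa `A`, `B`, `C` are all hypothetical. It assumes each input (the taxonomy or a source tree) is a list of child-to-parent edges, and that every edge is tagged with the input it came from so provenance can be looked up later.

```python
from collections import defaultdict


class TreeGraph:
    """Toy graph holding child -> parent edges from many inputs (hypothetical)."""

    def __init__(self):
        # child -> parent -> set of source labels asserting that edge
        self.parents = defaultdict(lambda: defaultdict(set))

    def add_tree(self, source, edges):
        """Load one input (taxonomy or source tree) as (child, parent) pairs."""
        for child, parent in edges:
            self.parents[child][parent].add(source)

    def provenance(self, child):
        """Return each asserted parent of `child` with the sources behind it."""
        asserted = self.parents.get(child, {})
        return {parent: sorted(sources) for parent, sources in asserted.items()}

    def conflicts(self):
        """Return nodes given more than one parent across the inputs."""
        return sorted(c for c, ps in self.parents.items() if len(ps) > 1)


graph = TreeGraph()
graph.add_tree("taxonomy", [("A", "Genus1"), ("B", "Genus1"), ("C", "Genus2")])
graph.add_tree("study_1", [("A", "cladeX"), ("C", "cladeX"), ("B", "Genus1")])

print(graph.conflicts())
print(graph.provenance("B"))
```

Nodes returned by `conflicts()` are only candidates: a second parent may reflect real disagreement between inputs or simply finer resolution in a source tree, so a synthesis step would still need to filter or weight the inputs.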

OpenTree

dev.opentreeoflife/opentree

Public tree of life

publictreeoflife.com/tree

open data: requiring CC0 license on source trees

open source software: https://github.com/OpenTreeOfLife

wiki: http://opentree.wikispaces.com/ (52 members)

public mailing list (67 members)

“Open” Tree of Life

Community engagement

~50 visitors per day to blog.opentreeoflife.org

@opentreeoflife on Twitter (~900 followers)

Tree of Life symposium: Evolution 2013

Hackathon in year 2 (joint with Arbor)

Collaborations

providing images and text for public tree

developing methods for subtree extraction

summer student providing links to ToLWeb pages

treeviz project from U Indiana MOOC, upcoming summer intern

year 2-3 plans for data archiving / harvest

Assessment: PI survey

general satisfaction with progress on data collection, synthesis and software development

more focus on incentives for users

more integration across labs

Assessment: Advisory board

Members:

David Hillis (UT Austin)

Jan Reichelt (Mendeley)

Andy Sinauer (Sinauer Associates)

Planning meeting for start of year 2

On track for year 1 release

1. Synthesize a complete draft tree of life from existing phylogenies

2. Release in year 1 with:

a. engaging public interface

b. ability to upload new data, explore conflict, see provenance

c. open data: tree, subtrees and source data

Goals for year 2

Refine draft tree based on user feedback

Empirical use cases drive development

Incentives for users / data contributors

Collaboration with external projects (AVAToL, ToLWeb, Phylotastic, Dryad)

opentreeoflife.org