OpenTree at NESCent Academy 2012

A community-assembled, continually updated evolutionary history of all life
Karen A. Cranston, National Evolutionary Synthesis Center, Duke University


Open Tree of Life talk given at the 2012 NESCent Academy NGS course

Transcript of OpenTree at NESCent Academy 2012

Page 1: OpenTree at NESCent Academy 2012

A community-assembled, continually updated evolutionary history of all life

Karen A. Cranston
National Evolutionary Synthesis Center

Duke University

Page 2: OpenTree at NESCent Academy 2012
Page 3: OpenTree at NESCent Academy 2012

[Figure: Phylogeny papers, 1978-2008. Number of papers published (0 to 12,000) by year. Source: ISI Web of Science]

Rapid increase in applications of phylogeny, beginning in the early 1990s

Page 4: OpenTree at NESCent Academy 2012

Where can I browse, search and download the tree of life?

You can’t. (Yet)

Page 5: OpenTree at NESCent Academy 2012


Page 6: OpenTree at NESCent Academy 2012
Page 7: OpenTree at NESCent Academy 2012

DATA AVAILABILITY

Only ~4% of all published phylogenetic trees are digitally available

By contrast: high archival rate of sequence data

Page 8: OpenTree at NESCent Academy 2012

[Figure: page 3 of Wiegmann et al., PNAS Early Edition, showing Fig. 1, "Combined molecular phylogenetic tree for Diptera" (partitioned ML analysis in RAxML, with bootstrap support and ecological traits marked on terminal branches)]

Most trees are published as (beautiful) figures in PDF files: not reusable!

Wiegmann et al., PNAS, 2011

Page 9: OpenTree at NESCent Academy 2012

Pictures of independent phylogenies

Page 10: OpenTree at NESCent Academy 2012

• Ideas Lab = 5-day workshop
• Self-assembly into groups
• Pitched pre-proposals at end of lab
• NSF invited full proposals

Page 11: OpenTree at NESCent Academy 2012

opentreeoflife.org

Karen Cranston, lead PI (Duke)

Gordon Burleigh (Florida)

Keith Crandall (BYU)

Karl Gude (MSU)

David Hibbett (Clark)

Mark Holder (Kansas)

Laura Katz (Smith)

Rick Ree (FMNH)

Stephen Smith (Michigan)

Doug Soltis (Florida)

Tiffani Williams (TAMU)

AVAToL: Assembling, Visualizing, and Analyzing the Tree of Life

Page 12: OpenTree at NESCent Academy 2012
Page 13: OpenTree at NESCent Academy 2012

• 1.8 million named species

• Millions more unnamed / undiscovered

Tree of life

Page 14: OpenTree at NESCent Academy 2012

COMPARATIVE BIOLOGY

Modified from Garland and Carter, 1994

Conventional statistics assume:

Evolutionary trees provide:
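One standard way trees enter comparative statistics, sketched here as an illustration rather than recovered from the slide: under a Brownian-motion model of trait evolution (an assumption; the slide names no model), the expected covariance between two species is proportional to the branch length they share from the root to their most recent common ancestor. Conventional statistics instead assume independent observations, i.e. a diagonal covariance matrix. The toy tree, node names, and branch lengths below are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rooted tree: node -> (parent, length of the branch to that parent).
# Tips A and B share the internal branch "AB"; tip C attaches directly to the root.
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "AB": ("root", 2.0),
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "C": ("root", 3.0),
}


def root_path(tree, node) -> Set[str]:
    """Nodes whose parent branches lie on the path from `node` up to the root."""
    path = set()
    while tree[node][0] is not None:
        path.add(node)
        node = tree[node][0]
    return path


def bm_covariance(tree, tips: List[str], sigma2: float = 1.0) -> List[List[float]]:
    """Brownian-motion covariance: sigma^2 times the root-to-MRCA path length shared by each pair."""
    paths = {t: root_path(tree, t) for t in tips}
    return [
        [sigma2 * sum(tree[n][1] for n in paths[a] & paths[b]) for b in tips]
        for a in tips
    ]


tips = ["A", "B", "C"]
print(bm_covariance(TREE, tips))  # [[3.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
```

A and B covary (2.0) because they share the AB branch, while C is independent of both. Methods such as phylogenetic generalized least squares use a matrix like this in place of the identity matrix assumed by ordinary least squares.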

Page 15: OpenTree at NESCent Academy 2012

PHYLOGENETIC PLACEMENT

Metagenomic reads + reference phylogeny

Kembel et al., 2011

Page 16: OpenTree at NESCent Academy 2012
Page 17: OpenTree at NESCent Academy 2012

1. Build the first complete draft tree of life

2. Engage the community in refinement and annotation

3. Promote a culture of data sharing through software products

4. Develop novel methods for phylogenetic synthesis

Page 18: OpenTree at NESCent Academy 2012

+ taxonomies of living and extinct species
+ any digital phylogenetic data we can get:
  • NSF Assembling the Tree of Life projects
  • recent high-profile phylogenies
  • ribosomal RNA trees for Bacteria and Archaea
  • TreeBASE and Dryad trees

Graph database holding a ‘cloud’ of thousands of input trees with millions of nodes
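A minimal sketch of one graph holding many input trees, assuming a toy representation that is not the project's actual schema: each input tree is a nested tuple of tip names, each graph node is a clade (the set of tips below it), and each edge records the IDs of the source trees that contain it, so trees that agree share edges instead of being stored twice. The class and method names are hypothetical. Real input trees sample different taxa, which this sketch does not reconcile: the clade {A, B} nested in {A, B, C} and in {A, B, C, D} becomes two different edges.

```python
from collections import defaultdict
from typing import FrozenSet, Iterator, Set, Tuple, Union

# A tree is either a tip name or a tuple of child subtrees, e.g. (("A", "B"), "C").
Tree = Union[str, tuple]
Clade = FrozenSet[str]


def tip_set(tree: Tree) -> Clade:
    """All tip names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tip_set(child) for child in tree))


def clade_edges(tree: Tree) -> Iterator[Tuple[Clade, Clade]]:
    """Yield (child clade, parent clade) for every edge in the tree."""
    if isinstance(tree, str):
        return
    parent = tip_set(tree)
    for child in tree:
        yield tip_set(child), parent
        yield from clade_edges(child)


class TreeGraph:
    """Hypothetical shared graph: edges are annotated with the source trees that support them."""

    def __init__(self) -> None:
        self.support = defaultdict(set)  # (child clade, parent clade) -> source tree IDs

    def add_tree(self, tree_id: str, tree: Tree) -> None:
        for edge in clade_edges(tree):
            self.support[edge].add(tree_id)

    def sources(self, child: Set[str], parent: Set[str]) -> Set[str]:
        return set(self.support.get((frozenset(child), frozenset(parent)), set()))

    def node_count(self) -> int:
        return len({clade for edge in self.support for clade in edge})


graph = TreeGraph()
graph.add_tree("study1", (("A", "B"), "C"))
graph.add_tree("study2", (("A", "B"), ("C", "D")))
print(graph.sources({"A"}, {"A", "B"}))            # supported by both studies
print(graph.sources({"A", "B"}, {"A", "B", "C"}))  # supported by study1 only
print(graph.node_count())                          # 8 distinct clades
```

Because every edge keeps its provenance, later steps can filter or weight by source tree without re-reading the original files.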

Page 19: OpenTree at NESCent Academy 2012

Graph database holding thousands of input trees with millions of nodes

Filter / weight input data (number of taxa, size of alignment, year of publication, etc.)

Synthesis (supertrees, grafting)
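The filter-and-weight step can be sketched as below. The metadata fields mirror the criteria named on the slide (number of taxa, alignment size, year of publication), but the thresholds and the scoring formula are invented for illustration; the slide does not say how inputs are weighted. The output is a ranked list that a synthesis step (supertree or grafting) could consume in priority order.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class InputTree:
    tree_id: str
    n_taxa: int            # number of tips in the input tree
    alignment_length: int  # aligned sites behind the tree
    year: int              # year of publication


def passes_filter(t: InputTree, min_taxa: int = 4, min_year: int = 2000) -> bool:
    """Hypothetical filter: drop very small trees and old studies."""
    return t.n_taxa >= min_taxa and t.year >= min_year


def weight(t: InputTree, current_year: int = 2012) -> float:
    """Hypothetical score: more taxa and sites raise the weight, age lowers it."""
    recency = 1.0 / (1 + current_year - t.year)
    return math.log2(t.n_taxa) * math.log10(t.alignment_length) * recency


def rank_inputs(trees: List[InputTree]) -> List[InputTree]:
    """Filter the input trees, then sort the survivors from highest to lowest weight."""
    return sorted((t for t in trees if passes_filter(t)), key=weight, reverse=True)


inputs = [
    InputTree("t1", n_taxa=50, alignment_length=1_000, year=2010),
    InputTree("t2", n_taxa=3, alignment_length=5_000, year=2011),    # too few taxa
    InputTree("t3", n_taxa=200, alignment_length=10_000, year=2008),
    InputTree("t4", n_taxa=80, alignment_length=2_000, year=1995),   # too old
]
print([t.tree_id for t in rank_inputs(inputs)])  # ['t3', 't1']
```

Keeping the filter and the weight as separate functions makes it easy to swap in other criteria without touching the ranking.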

Page 20: OpenTree at NESCent Academy 2012

Graph database holding a ‘cloud’ of thousands of input trees with millions of nodes

• filter input trees
• synthesize into summary trees
• compare to previous trees
• invite annotation
• input new data sets

Page 21: OpenTree at NESCent Academy 2012

Add citations

Flag as disputed

Upload alternative

Request reanalysis

Tree image modified from the Tree of Life Web Project page http://tolweb.org/Nymphalidae/12172. Pictures by Katja Schulz (queen butterfly; CC Attribution-NonCommercial) and Charles Lam (via Flickr; CC Attribution-ShareAlike)

Flag / Get citations / Annotate / Upload alternate

Ability to annotate and improve

Clear links to source data and methods

Compare your results with the synthetic tree

Page 23: OpenTree at NESCent Academy 2012

http://www.evoio.org/wiki/Phylotastic

NESCent hackathon to architect and implement a phylogenetic pruning service for megatrees
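The core operation of such a pruning service, reducing a megatree to just the species a user asks for, can be sketched as follows. This is a toy recursive version, not Phylotastic's code or API: the tree is a nested tuple of tip names, branch lengths are ignored, and internal nodes left with a single child after pruning are collapsed so the result is the induced subtree. The example tree and species list are illustrative.

```python
from typing import Optional, Union

# A tree is either a tip name or a tuple of child subtrees.
Tree = Union[str, tuple]


def prune(tree: Tree, keep: set) -> Optional[Tree]:
    """Return the subtree induced by the tips in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not children:
        return None
    if len(children) == 1:  # collapse nodes that now have a single child
        return children[0]
    return tuple(children)


def to_newick(tree: Tree) -> str:
    """Serialize a nested-tuple tree to a Newick topology string (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


MEGATREE = ((("Homo", "Pan"), "Gorilla"), ("Mus", "Rattus"))
subtree = prune(MEGATREE, {"Homo", "Gorilla", "Mus"})
print(to_newick(subtree) + ";")  # ((Homo,Gorilla),Mus);
```

A real service would likely also need to match user-supplied names to the tree's taxonomy and carry branch lengths through each collapsed node.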

Page 24: OpenTree at NESCent Academy 2012

YEAR 2 & 3: SMART GENERATION OF FIGURES FOR PUBLICATION

[Figure: Wiegmann et al., PNAS Early Edition, p. 3, Fig. 1, "Combined molecular phylogenetic tree for Diptera", shown as an example of a publication figure]

• Semantic annotation layers

• Collaborative editing

• Integrated submission of topology, branch lengths and annotations to archives

Page 25: OpenTree at NESCent Academy 2012

YEAR 2 & 3: AUTOMATIC UPDATING

• Update trees with new sequence data

• Detect and incorporate newly published trees

Page 26: OpenTree at NESCent Academy 2012

Community assembly of the tree of life (Open Tree of Life)

Next generation Phenomics (PI O’Leary)

Arbor: Comparative Analysis Workflows (PI Harmon)

Page 27: OpenTree at NESCent Academy 2012

POTENTIAL IMPACTS

• Phylogenies for any set of species easily available

• Benchmark for current state of phylogenetic knowledge

• Increasing rate of data archiving

• Placing “dark taxa” in global informatics framework

Page 28: OpenTree at NESCent Academy 2012

BIGGEST CHALLENGES?

• Lack of digitally-available trees

• Visualization

• Engaging community to annotate and update

• Producing usable and visually appealing software

Page 29: OpenTree at NESCent Academy 2012

“OPEN” TREE OF LIFE?

http://opentreeoflife.org