Open Tree of Life at Duke Futures

OPEN TREE OF LIFE: A community-assembled, continually updated evolutionary history of all life. Karen A. Cranston, National Evolutionary Synthesis Center, Duke University

description

Presentation given to the Duke Information Future committee, March 2012

Transcript of Open Tree of Life at Duke Futures

Page 1: Open Tree of Life at Duke Futures

OPEN TREE OF LIFE

A community-assembled, continually updated evolutionary history of all life

Karen A. Cranston

National Evolutionary Synthesis Center

Duke University

Page 2: Open Tree of Life at Duke Futures

• 1.8 million named species

• Millions more unnamed / undiscovered

Tree of life

All biodiversity linked through common ancestry; tree of life is a grand challenge; scale is huge

Page 3: Open Tree of Life at Duke Futures

Springer, M.S. et al. Nature (1997)

Stanhope, M.J. et al. PNAS (1998)

Hedges, B. PNAS 98 (2001)

[Slide image: page 61 of Nature 388 (3 July 1997), showing the start of Springer, M.S. et al., "Endemic African mammals shake the phylogenetic tree", with Table 1 (bootstrap support for select clades based on different methods).]

Learned amazing things about biodiversity from studying the tree; radically changed views

Page 4: Open Tree of Life at Duke Futures

EVOLUTION OF HUMAN PATHOGENS

Smith et al., Nature, 2009

Importance to human health and disease

Page 5: Open Tree of Life at Duke Futures

MODERN BIOLOGY NEEDS TREES

Modified from Garland and Carter, 1994

Conventional statistics assume:

Evolutionary trees provide:

Page 6: Open Tree of Life at Duke Futures

[Chart: Phylogeny papers, 1978–2008. X-axis: year (1978 to 2008); y-axis: number of papers published (0 to 12,000). Source: ISI Web of Science.]

Rapid increase in applications of phylogeny, beginning in early 1990s

We’ve been extremely productive

Page 7: Open Tree of Life at Duke Futures

Where can I browse, search and download a complete tree of life?

You can’t. (Yet)

Page 8: Open Tree of Life at Duke Futures

[Chart: Phylogeny papers, 1978–2008. X-axis: year (1978 to 2008); y-axis: number of papers published (0 to 12,000). Source: ISI Web of Science.]

Rapid increase in applications of phylogeny, beginning in early 1990s

It’s not because we haven’t been productive

Page 9: Open Tree of Life at Duke Futures

Reason 1: silos of phylogenetic knowledge

Page 10: Open Tree of Life at Duke Futures

DATA AVAILABILITY

~4% of all published phylogenetic trees

High archival rate of sequence data

Reason 2: can’t do synthesis without data; community norm to deposit sequences but not trees

Page 11: Open Tree of Life at Duke Futures

[Slide image: page 3 of Wiegmann et al., PNAS Early Edition, showing Fig. 1, the combined molecular phylogenetic tree for Diptera.]

Most trees published as (beautiful) figures in PDF files

not reusable!

Wiegmann et al., PNAS, 2011

Most phylogenetic knowledge is locked in PDFs; even in the best case (here), trees and alignments are in TreeBASE and sequences in GenBank, but there is so much more data in this figure!
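To make the contrast with a PDF figure concrete, here is a minimal sketch of what a reusable tree looks like to software. It assumes the tree is shared as a plain Newick string (a common text format for trees) with taxon names only, no branch lengths or quoted labels. The function names `parse_newick` and `tips` and the three-family example tree are illustrative assumptions, not part of the talk or of any Open Tree of Life tool.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, children) tuples.

    Assumption: names only, no branch lengths, comments, or quoted labels.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses in Newick string")
            pos += 1  # consume ")"
        # Read the (possibly empty) label that follows this node.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return (text[start:pos].strip(), children)

    root = node()
    if pos != len(text):
        raise ValueError("unexpected trailing text in Newick string")
    return root


def tips(tree):
    """Return the tip (leaf) names of a parsed tree, left to right."""
    name, children = tree
    if not children:
        return [name]
    return [t for child in children for t in tips(child)]


# Illustrative three-family tree; any machine-readable tree works the same way.
example = parse_newick("(Felidae,(Canidae,Ursidae));")
print(tips(example))
```

Once a tree is in this form, listing its taxa, pruning it, or merging it with other trees is a few lines of code; none of that is possible from a figure embedded in a PDF.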

Page 12: Open Tree of Life at Duke Futures

1. Build and make publicly available the first complete draft tree of life

2. Engage the community in refinement and annotation of the draft tree

3. Promote a culture of data sharing among systematists through software products that target key bottlenecks in phylogenetic workflows

4. Develop, disseminate, and evaluate novel methods for phylogenetic synthesis

Page 13: Open Tree of Life at Duke Futures

+ NSF Assembling the Tree of Life projects

+ recent high-profile phylogenies

+ ribosomal RNA trees for Bacteria and Archaea

+ unnamed environmental sequences

+ taxonomies of living and extinct species

Draft Tree of Life as seed for collaboration

Get all available data! Parts of the draft tree will be good; parts not so good

Page 14: Open Tree of Life at Duke Futures

Add citations

Flag as disputed

Upload alternative

Request reanalysis

Tree image modified from Tree of Life Web Project page http://tolweb.org/Nymphalidae/12172. Pictures by Katja Schulz (queen butterfly; CC Attribution-NonCommercial) and Charles Lam (via Flickr; CC Attribution-ShareAlike)

Tools to browse, update, monitor the draft tree of life

Tools that incentivize collaboration

publish refinement tools along with draft tree; invite comment / critique / updating

Page 15: Open Tree of Life at Duke Futures

SMART GENERATION OF FIGURES FOR PUBLICATION

[Slide image: page 3 of Wiegmann et al., PNAS Early Edition, showing Fig. 1, the combined molecular phylogenetic tree for Diptera.]

• Semantic annotation layers

• Collaborative editing

• Integrated submission of data and annotations to archives

Motivated to organize data at this point in pipeline; want to capture digitally at this point

Page 16: Open Tree of Life at Duke Futures

AUTOMATIC UPDATING

update trees with new sequence data

detect and synthesize newly published trees

Can’t rely on manual refinement alone; need to automate based on new data (and need new analytical method development)
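As a rough illustration of the first step in any automated update, the sketch below compares the taxa in a draft tree with the taxa in a newly published tree and reports which are shared and which are new. It is a hypothetical toy, not the project's method: the function name `compare_taxa` and the example taxon lists are assumptions, and real synthesis also has to reconcile names and conflicting relationships.

```python
def compare_taxa(draft_tips, new_tips):
    """Compare tip names of a draft tree and a newly published tree.

    Hypothetical helper: both inputs are plain lists of taxon names.
    """
    draft, new = set(draft_tips), set(new_tips)
    return {
        "shared": sorted(draft & new),        # taxa where the trees can be compared
        "new_to_draft": sorted(new - draft),  # taxa the draft tree does not yet contain
    }


# Illustrative taxon lists only.
report = compare_taxa(["Felidae", "Canidae"], ["Canidae", "Ursidae"])
print(report)
```

Shared taxa are where a new tree can confirm or dispute the draft; taxa new to the draft are candidates for grafting, which is where new analytical methods are needed.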

Page 17: Open Tree of Life at Duke Futures

[Slide images: Figure 5 of Flynn et al. (2005), "Sampling and resolution in phylogeny of Carnivora", p. 323: a schematic cladogram representing the major evolutionary relationships recovered in the analysis of Carnivora. The figure appears four times on the slide.]

community refinement

automatic updating

????

downstream analyses

education and outreach

Dynamic, continually updated tree of all life

The grand vision

Page 18: Open Tree of Life at Duke Futures

opentreeoflife.com

Karen Cranston, lead PI (Duke)

Gordon Burleigh (Florida)

Keith Crandall (BYU)

Karl Gude (MSU)

David Hibbett (Clark)

Mark Holder (Kansas)

Laura Katz (Smith)

Rick Ree (FMNH)

Stephen Smith (Michigan)

Doug Soltis (Florida)

Tiffani Williams (TAMU)

AVAToL: Assembling, Visualizing and Analyzing the Tree of Life, $5.76 million

The team and funding; empirical + computational biologists + comp sci + graphics / communication