Jonathan Eisen talk at ASM General Meeting 2010

A phylogeny driven genomic encyclopedia of bacteria and archaea. Jonathan A. Eisen. Talk at ASMGM, Tuesday, May 25, 2010.


Page 1: Jonathan Eisen talk at ASM General Meeting 2010

A phylogeny driven genomic encyclopedia of bacteria and archaea

Jonathan A. Eisen

Talk at ASMGM, May 25, 2010

Tuesday, May 25, 2010

Page 2: Jonathan Eisen talk at ASM General Meeting 2010

Fleischmann et al. 1995

Tuesday, May 25, 2010

Page 3: Jonathan Eisen talk at ASM General Meeting 2010

Microbial genomes

From http://genomesonline.org

Tuesday, May 25, 2010

Page 4: Jonathan Eisen talk at ASM General Meeting 2010

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Tuesday, May 25, 2010

Page 5: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla. Tip labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

2002

Based on Hugenholtz, 2002

Tuesday, May 25, 2010

Page 6: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

Based on Hugenholtz, 2002

2002

Tuesday, May 25, 2010

Page 7: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

Based on Hugenholtz, 2002

2002

Tuesday, May 25, 2010

Page 8: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

Based on Hugenholtz, 2002

2002

Tuesday, May 25, 2010

Page 9: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

Based on Hugenholtz, 2002

2002

Tuesday, May 25, 2010

Page 10: Jonathan Eisen talk at ASM General Meeting 2010

The Tree is not Happy

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Tuesday, May 25, 2010

Page 11: Jonathan Eisen talk at ASM General Meeting 2010

Why Increase Phylogenetic Coverage?

• Common approach within some eukaryotic groups

• Many small projects to fill in bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature

• Many potential benefits

Tuesday, May 25, 2010

Page 12: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tuesday, May 25, 2010

Page 13: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 14: Jonathan Eisen talk at ASM General Meeting 2010

The Tree of Life is Still Angry

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Eukaryotes

Bacteria

Archaea

Tuesday, May 25, 2010

Page 15: Jonathan Eisen talk at ASM General Meeting 2010

Major Lineages of Actinobacteria

2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.23 Streptosporangineae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

Tuesday, May 25, 2010

Page 16: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 100 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

• Solution - use tree to really fill gaps

Well sampled phyla

Tuesday, May 25, 2010

Page 17: Jonathan Eisen talk at ASM General Meeting 2010

http://www.jgi.doe.gov/programs/GEBA/pilot.html

Tuesday, May 25, 2010

Page 18: Jonathan Eisen talk at ASM General Meeting 2010

A Genomic Encyclopedia of Bacteria and Archaea (GEBA)

Tuesday, May 25, 2010

Page 19: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Pilot Project Overview

• Identify major branches in rRNA tree for which no genomes are available

• Identify branches with a cultured representative in DSMZ

• Grow > 200 of these and prep. DNA

• Sequence and finish 100 (covering breadth of bacterial/archaeal diversity)

• Annotate, analyze, release data

• Assess benefits of tree guided sequencing

Tuesday, May 25, 2010

Page 20: Jonathan Eisen talk at ASM General Meeting 2010

GEBA and Openness

• All data released as quickly as possible w/ no restrictions to IMG-GEBA, GenBank, etc.

• Data also available in Biotorrents (http://biotorrents.net)

• Individual genome reports published in OA “Standards in Genomic Sciences (SIGS)”

• 1st GEBA paper in Nature freely available and published using Creative Commons License

Tuesday, May 25, 2010

Page 21: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Lesson 1

rRNA Tree is Useful for Identifying Phylogenetically Novel Genomes

Tuesday, May 25, 2010

Page 22: Jonathan Eisen talk at ASM General Meeting 2010

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Tuesday, May 25, 2010

Page 23: Jonathan Eisen talk at ASM General Meeting 2010

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Tuesday, May 25, 2010

Page 24: Jonathan Eisen talk at ASM General Meeting 2010

Whole Genome Tree w/ AMPHORA

http://bobcat.genomecenter.ucdavis.edu/AMPHORA/

See Wu and Eisen, Genome Biology 2008 9: R151

Tuesday, May 25, 2010

Page 25: Jonathan Eisen talk at ASM General Meeting 2010

Compare PD in Trees

Tuesday, May 25, 2010

Page 26: Jonathan Eisen talk at ASM General Meeting 2010

PD of rRNA, Genome Trees Similar

From Wu et al. 2009 Nature 462, 1056-1060

Tuesday, May 25, 2010
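The PD comparison on these slides can be illustrated with a minimal sketch, assuming PD here means the summed branch length of the rooted subtree that connects a set of taxa to the root. The tree encoding (a child-to-parent dictionary), the function name, and the toy branch lengths are all hypothetical and are not taken from the talk or from Wu et al. 2009.

```python
def phylogenetic_diversity(parent, tips):
    """Sum the lengths of all branches on the paths from the given tips to the root.

    parent maps each non-root node to a (parent_node, branch_length) pair.
    """
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root; stop at the root or at a branch already counted.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical toy tree: ((A:1,B:1)X:2,(C:1.5,D:0.5)Y:1)root
TOY_TREE = {
    "A": ("X", 1.0), "B": ("X", 1.0),
    "C": ("Y", 1.5), "D": ("Y", 0.5),
    "X": ("root", 2.0), "Y": ("root", 1.0),
}

# PD gained by adding a distant taxon (C) versus a close relative (B) to {A}.
GAIN_DISTANT = phylogenetic_diversity(TOY_TREE, ["A", "C"]) - phylogenetic_diversity(TOY_TREE, ["A"])
GAIN_CLOSE = phylogenetic_diversity(TOY_TREE, ["A", "B"]) - phylogenetic_diversity(TOY_TREE, ["A"])
```

In this toy case the distant taxon adds 2.5 units of PD and the close relative adds 1.0, which is the intuition behind choosing genomes from unsampled branches.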

Page 27: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Lesson 1B

rRNA Tree topology is not perfect; Genome-based trees better

Tuesday, May 25, 2010

Page 28: Jonathan Eisen talk at ASM General Meeting 2010

16S Says Hyphomonas is in Rhodobacterales

Badger et al. 2005

Tuesday, May 25, 2010

Page 29: Jonathan Eisen talk at ASM General Meeting 2010

WGT and individual gene trees: It's Related to Caulobacterales

Badger et al. 2005

Tuesday, May 25, 2010

Page 30: Jonathan Eisen talk at ASM General Meeting 2010

Wh

Concatenated alignment “whole genome tree” built using AMPHORA

Tuesday, May 25, 2010

Page 31: Jonathan Eisen talk at ASM General Meeting 2010

Whole genome phylogeny?

• Many approaches

– Gene presence/absence

– Concatenation of phylogenetic markers

– Separate phylogeny of genes and then integration of results (e.g., networks)

– Models that incorporate gain/loss as well as gene phylogeny

• No new results from us

– However ... see Eric Alm talk Ballroom A - “Microbes in a changing world” session tomorrow AM
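Of the approaches listed on this slide, concatenation of phylogenetic markers is the simplest to show in code. The following is a minimal sketch, assuming each marker has already been aligned separately; the function name, the dictionary layout, and the toy sequences are hypothetical, and this is not the AMPHORA implementation.

```python
def concatenate_alignments(gene_alignments, gap="-"):
    """Join per-gene alignments into one supermatrix, padding absent taxa with gaps.

    gene_alignments maps gene name -> {taxon name -> aligned sequence}.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    supermatrix = {taxon: [] for taxon in taxa}
    for gene in sorted(gene_alignments):  # fixed gene order keeps columns reproducible
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing this gene contributes a run of gap characters.
            supermatrix[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Hypothetical toy input: taxonB lacks the rplB marker.
TOY_ALIGNMENTS = {
    "rpoB": {"taxonA": "MKT-L", "taxonB": "MRTAL", "taxonC": "MKTAL"},
    "rplB": {"taxonA": "GGV", "taxonC": "GAV"},
}
SUPERMATRIX = concatenate_alignments(TOY_ALIGNMENTS)
```

The concatenated alignment would then be passed to a tree-building program; that step is outside this sketch.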

Tuesday, May 25, 2010

Page 32: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Lesson 2

Phylogeny-driven genome selection helps discover new genetic diversity

Tuesday, May 25, 2010

Page 33: Jonathan Eisen talk at ASM General Meeting 2010

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Tuesday, May 25, 2010

Page 34: Jonathan Eisen talk at ASM General Meeting 2010

Protein Family Rarefaction Curves

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
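The three steps above can be sketched as follows, assuming the MCL clustering has already been run and each genome is represented by the set of protein family identifiers it contains. The function names, the random-order averaging, and the toy data are hypothetical choices for illustration, not the procedure used in the talk.

```python
import random


def rarefaction_curve(genome_families, order=None):
    """Return cumulative protein-family counts as genomes are added one at a time."""
    names = list(order) if order is not None else list(genome_families)
    seen = set()
    curve = []
    for name in names:
        seen |= set(genome_families[name])
        curve.append(len(seen))
    return curve


def mean_rarefaction(genome_families, permutations=100, seed=0):
    """Average the curve over random genome orderings to smooth out order effects."""
    rng = random.Random(seed)
    names = list(genome_families)
    totals = [0] * len(names)
    for _ in range(permutations):
        rng.shuffle(names)
        for i, count in enumerate(rarefaction_curve(genome_families, names)):
            totals[i] += count
    return [total / permutations for total in totals]


# Hypothetical toy input: genome name -> set of protein family IDs.
TOY_GENOMES = {
    "g1": {"famA", "famB"},
    "g2": {"famB", "famC"},
    "g3": {"famA", "famB", "famC", "famD"},
}
CURVE = rarefaction_curve(TOY_GENOMES, ["g1", "g2", "g3"])
MEAN_CURVE = mean_rarefaction(TOY_GENOMES)
```

Plotting the position in the list (# of genomes) against the value (# of protein families) gives the curve; a steeper curve means each added genome contributes more new families.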

Tuesday, May 25, 2010

Page 35: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 36: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 37: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 38: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 39: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 40: Jonathan Eisen talk at ASM General Meeting 2010

Synapomorphies exist

Tuesday, May 25, 2010

Page 41: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Lesson 3

Phylogeny-driven genome selection improves genome annotation

Tuesday, May 25, 2010

Page 42: Jonathan Eisen talk at ASM General Meeting 2010

Predicting Function

• Key step in genome projects

• More accurate predictions help guide experimental and computational analyses

• Many diverse approaches

• Comparative and evolutionary analysis greatly improves most predictions

Tuesday, May 25, 2010

Page 43: Jonathan Eisen talk at ASM General Meeting 2010

Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling

• Better definition of protein family sequence “patterns” (e.g., improved HMMs)

• Conversion of hypothetical into conserved hypotheticals

• Greatly improves “comparative” and “evolutionary” based predictions

• Linking distantly related members of protein families

• Improved non-homology prediction

Tuesday, May 25, 2010

Page 45: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Lesson 4

Phylogeny-driven genome selection improves analysis of genome data from uncultured organisms

Tuesday, May 25, 2010

Page 46: Jonathan Eisen talk at ASM General Meeting 2010

Metagenomics Challenge

Tuesday, May 25, 2010

Page 47: Jonathan Eisen talk at ASM General Meeting 2010

Metagenomics Challenge

1. Who is out there?

2. What are they doing?

Tuesday, May 25, 2010

Page 48: Jonathan Eisen talk at ASM General Meeting 2010

Who is out there?

• Mimic rRNA PCR based studies

• But can now do these with other genes

Tuesday, May 25, 2010

Page 49: Jonathan Eisen talk at ASM General Meeting 2010

rRNA phylotyping from metagenomics

Venter et al., 2004

Tuesday, May 25, 2010

Page 50: Jonathan Eisen talk at ASM General Meeting 2010

Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)

Venter et al., 2004

Tuesday, May 25, 2010

Page 51: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: bar chart, "Sargasso Phylotypes". X axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Y axis: Weighted % of Clones (0 to 0.5). One bar series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Shotgun Sequencing Allows Use of Other Markers

Venter et al., 2004

Tuesday, May 25, 2010

Page 52: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: bar chart, "Sargasso Phylotypes". X axis: Major Phylogenetic Group (Alphaproteobacteria through Crenarchaeota). Y axis: Weighted % of Clones (0 to 0.5). One bar series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Shotgun Sequencing Allows Use of Other Markers

Venter et al., 2004

Should improve with better genomic sampling

Tuesday, May 25, 2010

Page 53: Jonathan Eisen talk at ASM General Meeting 2010

Functional Inference from Metagenomics

• Can work well for individual genes

• Predicting “community” function is challenging because treating community as a bag of genes does not work well

• Better to “compartmentalize” data ...

Tuesday, May 25, 2010

Page 54: Jonathan Eisen talk at ASM General Meeting 2010

ABCDEFG

TUVWXYZ

Binning challenge

Tuesday, May 25, 2010

Page 55: Jonathan Eisen talk at ASM General Meeting 2010

ABCDEFG

TUVWXYZ

Binning challenge

Best binning method: reference genomes

Tuesday, May 25, 2010

Page 56: Jonathan Eisen talk at ASM General Meeting 2010

Reference Genomes Coming from Select Environment

Tuesday, May 25, 2010

Page 57: Jonathan Eisen talk at ASM General Meeting 2010

ABCDEFG

TUVWXYZ

Binning challenge

No reference genome? What do you do?

Tuesday, May 25, 2010

Page 58: Jonathan Eisen talk at ASM General Meeting 2010

ABCDEFG

TUVWXYZ

Binning challenge

No reference genome? What do you do?

Phylogeny ....

Tuesday, May 25, 2010

Page 59: Jonathan Eisen talk at ASM General Meeting 2010

AMPHORA

Guide tree

Tuesday, May 25, 2010

Page 60: Jonathan Eisen talk at ASM General Meeting 2010

Phylogenetic Binning Using AMPHORA

[Figure: bar chart of phylogenetic bins. X axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. Y axis: 0 to 0.7. One bar series per marker gene: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]

AMPHORA - each read on its own tree

Tuesday, May 25, 2010

Page 61: Jonathan Eisen talk at ASM General Meeting 2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA phylogenetic binning bar chart as on Page 60.]

AMPHORA - each read on its own tree

Should improve with better genomic sampling

Tuesday, May 25, 2010

Page 62: Jonathan Eisen talk at ASM General Meeting 2010

Metagenomic Analysis Improves w/ Phylogenetic Sampling

• Small but real improvements in

– Gene identification / confirmation

– Functional prediction

– Binning

– Phylogenetic classification

Tuesday, May 25, 2010

Page 63: Jonathan Eisen talk at ASM General Meeting 2010

Metagenomic Analysis Improves w/ Phylogenetic Sampling

• Small but real improvements in

– Gene identification / confirmation

– Functional prediction

– Binning

– Phylogenetic classification

• But not a lot ...

Tuesday, May 25, 2010

Page 64: Jonathan Eisen talk at ASM General Meeting 2010

How to improve phylogenetic analysis of metagenomic data

• Fragmented data

• Which genes to use?

• More automation

Tuesday, May 25, 2010

Page 65: Jonathan Eisen talk at ASM General Meeting 2010

iSEEM Project

Tuesday, May 25, 2010

Page 66: Jonathan Eisen talk at ASM General Meeting 2010

Phylogenetic challenge

A single tree with everything

Tuesday, May 25, 2010

Page 67: Jonathan Eisen talk at ASM General Meeting 2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA phylogenetic binning bar chart as on Page 60.]

AMPHORA - each read on its own tree

Improves with better phylogenetic methods

Tuesday, May 25, 2010

Page 68: Jonathan Eisen talk at ASM General Meeting 2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA phylogenetic binning bar chart as on Page 60.]

AMPHORA - each read on its own tree

Improves with more gene families

Tuesday, May 25, 2010

Page 69: Jonathan Eisen talk at ASM General Meeting 2010

New “Marker Genes”

• 100 representative genomes

• MCL gene families

• Identify gene families w/

– High universality

– High uniformity of copy number

– Phylogenetic tree similar to “whole genome tree”
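The first two screening criteria can be sketched as below. The slide does not define the measures, so this sketch assumes universality is the fraction of genomes carrying at least one copy of a family and uniformity is the fraction of those genomes carrying exactly one copy. The function name, thresholds, and toy counts are hypothetical, and the third criterion (tree similarity to the whole genome tree) is not covered.

```python
def screen_marker_candidates(copy_counts, n_genomes,
                             min_universality=0.9, min_uniformity=0.9):
    """Return (family, universality, uniformity) for families passing both cutoffs.

    copy_counts maps family name -> {genome name -> copy number}.
    """
    candidates = []
    for family, counts in copy_counts.items():
        present = [c for c in counts.values() if c > 0]
        # Assumed definition: share of all genomes that contain the family.
        universality = len(present) / n_genomes
        # Assumed definition: share of containing genomes with a single copy.
        uniformity = sum(1 for c in present if c == 1) / len(present) if present else 0.0
        if universality >= min_universality and uniformity >= min_uniformity:
            candidates.append((family, universality, uniformity))
    return sorted(candidates)


# Hypothetical toy input for four genomes.
TOY_COUNTS = {
    "fam_single_copy": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},
    "fam_patchy": {"g1": 5, "g2": 0, "g3": 2},
    "fam_paralogous": {"g1": 2, "g2": 2, "g3": 1, "g4": 1},
}
CANDIDATES = screen_marker_candidates(TOY_COUNTS, n_genomes=4)
```

Only the single-copy, universally present family survives the screen; the patchy family fails on universality and the paralogous one on uniformity.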

Tuesday, May 25, 2010

Page 70: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: two-panel bar chart. Panel titles: NODAL distance, SPLIT distance. Axis ticks: 0 to 6 in one panel, 0 to 0.9 in the other. Legend: Ribosomal protein; Transcription/translation related protein; DNA repair protein; Protein of other function; AMPHORA marker. Reference value: distance between the genome tree and 100 random trees (average ± standard deviation).]

Gene labels, first panel: rRNA16S, ruvB, nusA, rplB, purA, rpsJ, secY, rpsI, pyrH, rpsE, rplP, rplN, rpsC, ruvA, rplF, rplA, serS, rplK, rpsK, priA, smpB, rpsG, guaA, rpsQ, rpsL, rplU, rplO, rpsM, infC, rplS, rplV, rplC, rpsP, rplE, rplT, rplL, rplQ, rpsH, mraW, rpsO, rpsB, rplI, rplM, rplR, ttf, frr, tsf, rplD, radA, rpsS, trmD, coaE, rpmA

Gene labels, second panel: nusA, rpsC, rpsE, priA, rplB, secY, rRNA16S, rpsJ, rpsB, ruvB, guaA, rplN, serS, rplF, frr, rplA, rplE, rplC, infC, rplD, rplK, purA, radA, ruvA, rpsM, pyrH, rplI, rplM, rpsG, rpsL, mraW, rpsI, ttf, rplS, trmD, tsf, rplU, rpsK, rpsP, rplO, rplT, rplV, rpsS, rplP, rpsO, smpB, rpsH, rplQ, rplR, rpsQ, rplL, rpmA, coaE

Distances between gene trees and the AMPHORA concatenated genome tree

Tuesday, May 25, 2010

Page 71: Jonathan Eisen talk at ASM General Meeting 2010

Screen gene markers for any given taxonomic group

Phylogenetic group       Genome Number   Gene Number   Marker Candidates
Archaea                  62              145415        106
Actinobacteria           63              267783        136
Alphaproteobacteria      94              347287        121
Betaproteobacteria       56              266362        311
Gammaproteobacteria      126             483632        118
Deltaproteobacteria      25              102115        206
Epsilonproteobacteria    18              33416         455
Bacteroides              25              71531         286
Chlamydiae               13              13823         560
Chloroflexi              10              33577         323
Cyanobacteria            36              124080        590
Firmicutes               106             312309        87
Spirochaetes             18              38832         176
Thermi                   5               14160         974
Thermotogae              9               17037         684

Tuesday, May 25, 2010

Page 72: Jonathan Eisen talk at ASM General Meeting 2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA phylogenetic binning bar chart as on Page 60.]

AMPHORA - each read on its own tree

Improves with better automation

Tuesday, May 25, 2010

Page 73: Jonathan Eisen talk at ASM General Meeting 2010

Zorro

• http://sourceforge.net/projects/probmask/

• ZORRO is a probabilistic masking program that assigns confidence scores to each column in a multiple sequence alignment. These scores can then be used to account for alignment accuracy in phylogenetic inference pipelines.

• Wu, Chatterji, Eisen submitted
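One simple way such per-column scores might be used downstream is to drop low-confidence columns before tree building. The sketch below assumes the scores have already been computed by some masking program; it does not reproduce how ZORRO derives them. The function name, the score values, and the cutoff are hypothetical.

```python
def mask_alignment(alignment, column_scores, min_score):
    """Keep only alignment columns whose confidence score is at least min_score.

    alignment maps sequence name -> aligned sequence; column_scores has one
    value per alignment column.
    """
    widths = {len(seq) for seq in alignment.values()}
    if widths != {len(column_scores)}:
        raise ValueError("every sequence must have one residue per scored column")
    keep = [i for i, score in enumerate(column_scores) if score >= min_score]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy input: columns 3 and 4 (1-based) are scored as unreliable.
TOY_ALIGNMENT = {"seq1": "MK-TAL", "seq2": "MRQTAL", "seq3": "MK-SAL"}
TOY_SCORES = [9.5, 8.0, 1.2, 4.0, 9.0, 9.9]
MASKED = mask_alignment(TOY_ALIGNMENT, TOY_SCORES, min_score=5.0)
```

A hard cutoff is only one option; a pipeline could instead pass the scores along as column weights, which keeps more of the signal at the cost of needing a tree program that accepts weights.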

Tuesday, May 25, 2010

Page 74: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 75: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Phylogenomic Lesson 5

We have still only scratched the surface of microbial diversity

Tuesday, May 25, 2010

Page 76: Jonathan Eisen talk at ASM General Meeting 2010

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Tuesday, May 25, 2010

Page 77: Jonathan Eisen talk at ASM General Meeting 2010

Phylogenetic Diversity: Sequenced Bacteria & Archaea

From Wu et al. 2009

Tuesday, May 25, 2010

Page 81: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

Well sampled phyla

Poorly sampled

No cultured taxa

Tuesday, May 25, 2010

Page 82: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

Well sampled phyla

Poorly sampled

No cultured taxa

Tuesday, May 25, 2010

Page 83: Jonathan Eisen talk at ASM General Meeting 2010

Uncultured Lineages:Technical Approaches

• Get into culture

• Enrichment cultures

• If abundant in low diversity ecosystems

• Flow sorting

• Microbeads

• Microfluidic sorting

• Single cell amplification

Tuesday, May 25, 2010

Page 84: Jonathan Eisen talk at ASM General Meeting 2010

GEBA Phylogenomic Lesson 6

Need Experiments from Across the Tree of Life too

Tuesday, May 25, 2010

Page 85: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Tuesday, May 25, 2010

Page 86: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Tuesday, May 25, 2010

Page 87: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

• Some studies in other phyla

As of 2002

Based on Hugenholtz, 2002

Tuesday, May 25, 2010

Page 88: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Tuesday, May 25, 2010

Page 89: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla; same tip labels as on Page 5.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

As of 2002

Based on Hugenholtz, 2002

Tuesday, May 25, 2010

Page 90: Jonathan Eisen talk at ASM General Meeting 2010

[Figure: tree of bacterial phyla with a 0.1 scale bar; same tip labels as on Page 5.]

Tree based on Hugenholtz (2002) with some modifications.

Need experimental studies from across the tree too

Tuesday, May 25, 2010

Page 91: Jonathan Eisen talk at ASM General Meeting 2010

Tuesday, May 25, 2010

Page 92: Jonathan Eisen talk at ASM General Meeting 2010

MICROBES

Tuesday, May 25, 2010

Page 93: Jonathan Eisen talk at ASM General Meeting 2010

A Happy Tree of Life

Tuesday, May 25, 2010