Jonathan Eisen slides for #HMP2010

A phylogeny driven genomic encyclopedia of bacteria and archaea Jonathan A. Eisen UC Davis Talk for HMP2010 September 2, 2010

Transcript of Jonathan Eisen slides for #HMP2010

Page 1: Jonathan Eisen slides for #HMP2010

A phylogeny driven genomic encyclopedia of bacteria and archaea

Jonathan A. Eisen, UC Davis

Talk for HMP2010, September 2, 2010

Page 2: Jonathan Eisen slides for #HMP2010
Page 3: Jonathan Eisen slides for #HMP2010

Social Networking in Science

Page 4: Jonathan Eisen slides for #HMP2010

Bacteria evolve

Page 5: Jonathan Eisen slides for #HMP2010

Progress in Genome Sequencing

From http://genomesonline.org

Page 6: Jonathan Eisen slides for #HMP2010

Progress in Genome Sequencing

From http://genomesonline.org

Page 7: Jonathan Eisen slides for #HMP2010

Progress in Genome Sequencing

From http://genomesonline.org

Page 8: Jonathan Eisen slides for #HMP2010

Way Back Machine - 2002

Page 9: Jonathan Eisen slides for #HMP2010

Way Back Machine - 2002

454

Page 10: Jonathan Eisen slides for #HMP2010

Way Back Machine - 2002

454

Page 11: Jonathan Eisen slides for #HMP2010

Way Back Machine - 2002

454

Illumina

Page 12: Jonathan Eisen slides for #HMP2010

Way Back Machine - 2002

454

Illumina

Page 13: Jonathan Eisen slides for #HMP2010

Way Back Machine - 2002

454

Illumina

SOLiD

Page 14: Jonathan Eisen slides for #HMP2010

Way Back Machine - 2002

454

Illumina

SOLiD

Page 15: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla. Branch labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

2002

Based on Hugenholtz, 2002

Page 16: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

Based on Hugenholtz, 2002

2002

Page 17: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

Based on Hugenholtz, 2002

2002

Page 18: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

Based on Hugenholtz, 2002

2002

Page 19: Jonathan Eisen slides for #HMP2010

Why Increase Phylogenetic Coverage?

• Common approach within some eukaryotic groups (FGP, NHGRI, etc)

• Many successful small projects to fill in bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature

• Many potential benefits

Page 20: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Page 21: Jonathan Eisen slides for #HMP2010
Page 22: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Still highly biased in terms of the tree

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Page 23: Jonathan Eisen slides for #HMP2010

Major Lineages of Actinobacteria

2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.23 Streptosporangineae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

Page 24: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Page 25: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Page 26: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Page 27: Jonathan Eisen slides for #HMP2010

Progress in Genome Sequencing

From http://genomesonline.org

Page 28: Jonathan Eisen slides for #HMP2010

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution: Really Fill in the Tree

• GEBA

• A genomic encyclopedia of bacteria and archaea

Eisen & Ward, PIs

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

Page 29: Jonathan Eisen slides for #HMP2010

GEBA Pilot Project Overview

• Identify major branches in rRNA tree for which no genomes are available

• Identify branches with a cultured representative in DSMZ

• DSMZ grew > 200 of these and prepped DNA

• Sequence and finish 100 (covering breadth of bacterial/archaeal diversity)

• Annotate, analyze, release data

• Assess benefits of tree-guided sequencing

• 1st paper: Wu et al., Nature, Dec 2009
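One way to make the first step (finding branches with no genomes) concrete is to rank candidate organisms by how much branch length they would add to the part of the tree already covered by sequenced genomes. The sketch below is only an illustration under assumptions: the toy tree, the `parent` encoding, and the function names are hypothetical, and ranking by phylogenetic diversity (PD) gain is an assumed criterion, not a description of the procedure the GEBA pilot actually followed.

```python
# Toy rooted tree: each node maps to (its parent, length of the branch above it).
# The tree and its branch lengths are made up for illustration.
parent = {
    "X": ("root", 1.0),
    "Y": ("root", 1.0),
    "a": ("X", 0.5),
    "b": ("X", 0.5),
    "c": ("Y", 2.0),
}


def path_edges(leaf, parent):
    """Edges (named by their child node) on the path from a leaf to the root."""
    edges = set()
    node = leaf
    while node in parent:
        edges.add(node)
        node = parent[node][0]
    return edges


def phylo_diversity(leaves, parent):
    """Total branch length spanned by the leaves, measured down to the root."""
    edges = set()
    for leaf in leaves:
        edges |= path_edges(leaf, parent)
    return sum(parent[e][1] for e in edges)


def rank_candidates(sequenced, candidates, parent):
    """Sort unsequenced candidates by the PD they would add, largest first."""
    base = phylo_diversity(sequenced, parent)
    gains = {
        c: phylo_diversity(set(sequenced) | {c}, parent) - base
        for c in candidates
    }
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# With only "a" sequenced, "c" sits on an unsampled branch and ranks first.
print(rank_candidates({"a"}, ["b", "c"], parent))  # [('c', 3.0), ('b', 0.5)]
```

In this toy case "b" adds little because it shares most of its path with the already sequenced "a", which is the kind of bias the slides describe.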

Page 30: Jonathan Eisen slides for #HMP2010

GEBA Pilot Project: Components

• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, DSMZ, GBMF)

Page 31: Jonathan Eisen slides for #HMP2010

GEBA Lesson 1

rRNA Tree is Useful for Identifying Phylogenetically Novel Organisms

Page 32: Jonathan Eisen slides for #HMP2010

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Page 33: Jonathan Eisen slides for #HMP2010

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Page 34: Jonathan Eisen slides for #HMP2010

“Whole Genome” Tree w/ AMPHORA

http://bobcat.genomecenter.ucdavis.edu/AMPHORA/

See Wu and Eisen, Genome Biology 2008 9: R151

http://itol.embl.de/

Analogous to method of Ciccarelli et al.

Page 35: Jonathan Eisen slides for #HMP2010

Compare PD in rRNA and WGT

Page 37: Jonathan Eisen slides for #HMP2010

GEBA Lesson 2

Phylogeny-driven genome selection helps discover new genetic diversity

Page 38: Jonathan Eisen slides for #HMP2010

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Page 39: Jonathan Eisen slides for #HMP2010

Protein Family Rarefaction Curves

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
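The curve described above can be sketched as follows. This is a minimal illustration, assuming the protein families have already been assigned (for example by MCL, which is not run here); the genome names, family labels, and the `rarefaction_curve` function are hypothetical. Averaging over random genome orderings is a common way to smooth such curves and is an assumption, not something stated on the slide.

```python
import random


def rarefaction_curve(genome_families, n_orderings=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes.

    genome_families maps a genome name to the set of family IDs it contains.
    The result is averaged over n_orderings random orders of the genomes.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(n_orderings):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [t / n_orderings for t in totals]


# Made-up family assignments for three genomes.
families = {
    "genome1": {"famA", "famB"},
    "genome2": {"famB", "famC"},
    "genome3": {"famA", "famB", "famC", "famD"},
}
curve = rarefaction_curve(families)
for n, value in enumerate(curve, start=1):
    print(n, round(value, 2))
```

A curve that keeps climbing as genomes are added indicates that new genomes are still contributing new protein families; a set of closely related genomes would flatten sooner.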

Page 40: Jonathan Eisen slides for #HMP2010
Page 41: Jonathan Eisen slides for #HMP2010
Page 42: Jonathan Eisen slides for #HMP2010
Page 43: Jonathan Eisen slides for #HMP2010
Page 44: Jonathan Eisen slides for #HMP2010
Page 45: Jonathan Eisen slides for #HMP2010

Synapomorphies exist

Page 46: Jonathan Eisen slides for #HMP2010

Phylogenetic Distribution Novelty: Bacterial Actin Related Protein

Haliangium ochraceum DSM 14365

Patrik D’haeseleer, Adam Zemla, Victor Kunin

[Figure: labels and values not recoverable from the extracted text]

See also Guljamow et al. 2007 Current Biology.

Page 47: Jonathan Eisen slides for #HMP2010

GEBA Lesson 3

Phylogeny-driven genome selection improves genome annotation

Page 48: Jonathan Eisen slides for #HMP2010

Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling

• Better definition of protein family sequence “patterns”

• Greatly improves “comparative” and “evolutionary” based predictions

• Conversion of hypothetical into conserved hypotheticals

• Linking distantly related members of protein families

• Improved non-homology prediction

Kostas Mavrommatis

Natalia Ivanova

Thanos Lykidis

Nikos Kyrpides

Iain Anderson

Page 49: Jonathan Eisen slides for #HMP2010

GEBA Lesson 4

Metadata and individual genome papers important

Page 51: Jonathan Eisen slides for #HMP2010

GEBA Lesson 5

Phylogeny-driven genome selection improves analysis of metagenome data

Page 52: Jonathan Eisen slides for #HMP2010

Who is out there?

Page 53: Jonathan Eisen slides for #HMP2010

rRNA phylotyping from metagenomics

Venter et al., 2004

Page 54: Jonathan Eisen slides for #HMP2010

Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)

Venter et al., 2004

Page 55: Jonathan Eisen slides for #HMP2010

[Figure: bar chart “Sargasso Phylotypes”. y-axis: Weighted % of Clones (0 to 0.5). x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA]

Shotgun Sequencing Allows Use of Other Markers

Venter et al., 2004

Page 56: Jonathan Eisen slides for #HMP2010

ABCDEFG

TUVWXYZ

Binning challenge

Page 57: Jonathan Eisen slides for #HMP2010

ABCDEFG

TUVWXYZ

Binning challenge

Best binning method: reference genomes

Page 58: Jonathan Eisen slides for #HMP2010

Reference Genomes Coming from Select Environment

Page 59: Jonathan Eisen slides for #HMP2010

ABCDEFG

TUVWXYZ

Binning challenge

No reference genome? What do you do?

Page 60: Jonathan Eisen slides for #HMP2010

ABCDEFG

TUVWXYZ

Binning challenge

No reference genome? What do you do?

Phylogeny ....

Page 61: Jonathan Eisen slides for #HMP2010

Phylogenetic Binning Using AMPHORA

[Figure: bar chart, y-axis ticks 0 to 0.7. x-axis groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. Series (marker genes): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]

AMPHORA - each read on its own tree
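The idea of binning a read by where it falls on a tree can be illustrated with a toy example. This is a simplified sketch under assumptions, not AMPHORA's implementation: the tree is given as nested tuples with the read already placed, and the function names, taxon names, and the rule "assign the read to the group shared by all leaves of its sister clade" are hypothetical choices made for illustration.

```python
def leaves(node):
    """All leaf names under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def sister_leaves(tree, query):
    """Leaves of the clade(s) sharing a parent with `query`; None if absent."""
    if isinstance(tree, str):
        return None
    if query in tree:
        siblings = []
        for child in tree:
            if child != query:
                siblings.extend(leaves(child))
        return siblings
    for child in tree:
        found = sister_leaves(child, query)
        if found is not None:
            return found
    return None


def bin_read(tree, query, group_of):
    """Assign the read to a group only if its whole sister clade agrees."""
    siblings = sister_leaves(tree, query)
    if not siblings:
        return "Unclassified"
    groups = {group_of[name] for name in siblings}
    return groups.pop() if len(groups) == 1 else "Unclassified"


group_of = {
    "E_coli": "Gammaproteobacteria",
    "V_cholerae": "Gammaproteobacteria",
    "B_subtilis": "Firmicutes",
    "S_aureus": "Firmicutes",
}

# read1 was placed next to E_coli on this marker's tree.
tree1 = ((("E_coli", "read1"), "V_cholerae"), ("B_subtilis", "S_aureus"))
print(bin_read(tree1, "read1", group_of))  # Gammaproteobacteria

# read2 branches off before a mixed clade, so it stays unclassified.
tree2 = (("E_coli", "B_subtilis"), "read2")
print(bin_read(tree2, "read2", group_of))  # Unclassified
```

The second case shows why sparse reference sampling limits this kind of binning: a read with no close sequenced relative lands next to a mixed clade and cannot be assigned.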

Page 62: Jonathan Eisen slides for #HMP2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA binning bar chart as on Page 61]

AMPHORA - each read on its own tree

Limited in past by poor genomic sampling

Page 63: Jonathan Eisen slides for #HMP2010

Metagenomic Analysis Improves w/ Phylogenetic Sampling

• Small but real improvements in
  – Gene identification / confirmation
  – Functional prediction
  – Binning
  – Phylogenetic classification

Page 64: Jonathan Eisen slides for #HMP2010

Metagenomic Analysis Improves w/ Phylogenetic Sampling

• Small but real improvements in
  – Gene identification / confirmation
  – Functional prediction
  – Binning
  – Phylogenetic classification

• But not a lot ...

Page 65: Jonathan Eisen slides for #HMP2010

GEBA Future 1

Need to adapt genomic and metagenomic methods to make use of GEBA data

Page 66: Jonathan Eisen slides for #HMP2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA binning bar chart as on Page 61]

AMPHORA - each read on its own tree

Improves with better phylogenetic methods

Page 67: Jonathan Eisen slides for #HMP2010

Improving Phylogeny for Metagenomic Reads

• Examples using reference trees
  – AMPHORA (Wu and Eisen)
  – pplacer (Erick Matsen)
  – FastTree (Morgan Price)

• Variants
  – Use concatenated alignment of markers, not just individual genes (Steven Kembel)
  – Apply to OTU identification, not just classification (Thomas Sharpton)
  – CoBinning: look for linkage among fragments/genes (Aaron Darling)
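The first variant, building a tree from a concatenated alignment of markers rather than one gene at a time, can be sketched as below. This is an illustration only: the function name, the toy sequences, and the choice to pad a taxon that lacks a marker with gap characters are assumptions, not details taken from the talk.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments into one alignment per taxon.

    alignments maps marker name -> {taxon: aligned sequence}. Markers are
    joined in sorted order; a taxon missing a marker gets a run of gaps.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for marker in sorted(alignments):
        aln = alignments[marker]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {marker} differ in length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Made-up aligned fragments for two markers.
toy = {
    "rpoB": {"taxon1": "MK-L", "taxon2": "MKAL"},
    "rplA": {"taxon1": "GG", "taxon3": "GA"},
}
combined = concatenate_alignments(toy)
for taxon, seq in combined.items():
    print(taxon, seq)
```

The combined alignment would then be passed to whatever tree-building or placement tool is in use; that step is outside this sketch.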

Page 68: Jonathan Eisen slides for #HMP2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA binning bar chart as on Page 61]

AMPHORA - each read on its own tree

Improves with more gene families

Page 69: Jonathan Eisen slides for #HMP2010

Keep only the families with:

Universality * Evenness * monophyly >= 90*90*90

Phylogenetic group       Genome Number   Gene Number   Marker Candidates
Archaea                  62              145415        102
Actinobacteria           63              267783        136
Alphaproteobacteria      94              347287        142
Betaproteobacteria       56              266362        294
Gammaproteobacteria      126             483632        141
Deltaproteobacteria      25              102115        44
Epsilonproteobacteria    18              33416         446
Bacteroides              25              71531         179
Chlamydiae               13              13823         561
Chloroflexi              10              33577         140
Cyanobacteria            36              124080        532
Firmicutes               106             312309        80
Spirochaetes             18              38832         72
Thermi                   5               14160         727
Thermotogae              9               17037         646
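The filter stated at the top of this slide can be read literally as a threshold on the product of three percentage scores. The sketch below follows that literal reading; how universality, evenness, and monophyly are actually computed is not given in the slides, so the scores here are made-up inputs and the `FamilyScore` type and `keep_family` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FamilyScore:
    name: str
    universality: float  # percent, 0 to 100 (assumed scale)
    evenness: float      # percent, 0 to 100 (assumed scale)
    monophyly: float     # percent, 0 to 100 (assumed scale)


THRESHOLD = 90 * 90 * 90  # 729000, as written on the slide


def keep_family(score):
    """Literal reading: keep a family if the product of scores meets the cutoff."""
    product = score.universality * score.evenness * score.monophyly
    return product >= THRESHOLD


# Made-up candidate families.
candidates = [
    FamilyScore("famA", 95, 92, 90),    # product 786600, kept
    FamilyScore("famB", 100, 100, 70),  # product 700000, dropped
    FamilyScore("famC", 100, 100, 80),  # product 800000, kept
]
kept = [c.name for c in candidates if keep_family(c)]
print(kept)  # ['famA', 'famC']
```

Note that "famC" passes the product test even though its monophyly score is below 90. If the intended rule is that each score must reach 90 on its own, the condition would instead check the three values separately.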

Page 70: Jonathan Eisen slides for #HMP2010

Phylogenetic Binning Using AMPHORA

[Figure: same AMPHORA binning bar chart as on Page 61]

AMPHORA - each read on its own tree

Improves with rebuilding gene family models

Page 71: Jonathan Eisen slides for #HMP2010

Other Ways to Make Better Use of the Data

• Rebuild protein family models

• Experiments from across the tree needed

• Need better phylogenies, including HGT

• Improved tools for using distantly related genomes in metagenomic analysis

• Better recording and sharing of metadata about organisms

Page 72: Jonathan Eisen slides for #HMP2010

GEBA Future 2

The dark matter of the biological universe

Page 73: Jonathan Eisen slides for #HMP2010

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Page 78: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

Well sampled phyla

Poorly sampled

No cultured taxa

Page 79: Jonathan Eisen slides for #HMP2010

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

Well sampled phyla

Poorly sampled

No cultured taxa

Page 80: Jonathan Eisen slides for #HMP2010

Uncultured Lineages: Technical Approaches

• Get into culture

• Enrichment cultures

• If abundant in low diversity ecosystems

• Flow sorting

• Microbeads

• Microfluidic sorting

• Single cell amplification

Page 81: Jonathan Eisen slides for #HMP2010
Page 82: Jonathan Eisen slides for #HMP2010

MICROBES

Page 83: Jonathan Eisen slides for #HMP2010

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution: Really Fill in the Tree

• GEBA

• A genomic encyclopedia of bacteria and archaea

Eisen & Ward, PIs

[Figure: tree of bacterial phyla, same branch labels as on Page 15]

Page 84: Jonathan Eisen slides for #HMP2010

GEBA Pilot Project: Components

• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, DSMZ, GBMF)