Jonathan Eisen slides for #HMP2010

Post on 29-Jan-2018

Transcript of Jonathan Eisen slides for #HMP2010

A phylogeny-driven genomic encyclopedia of bacteria and archaea

Jonathan A. Eisen, UC Davis

Talk for HMP2010, September 2, 2010

Social Networking in Science

Bacteria evolve

Progress in Genome Sequencing

From http://genomesonline.org

Way Back Machine - 2002

454

Illumina

SOLiD

Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Proteobacteria
Chlorobi
Firmicutes
Fusobacteria
Actinobacteria
Cyanobacteria
Chlamydia
Spirochaetes
Deinococcus-Thermus
Aquificae
Thermotogae
TM6
OS-K
Termite Group
OP8
Marine Group A
WS3
OP9
NKB19
OP3
OP10
TM7
OP1
OP11
Nitrospira
Synergistes
Deferribacteres
Thermodesulfobacteria
Chrysiogenetes
Thermomicrobia
Dictyoglomus
Coprothermobacter

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

Based on Hugenholtz, 2002

2002

Why Increase Phylogenetic Coverage?

• Common approach within some eukaryotic groups (FGP, NHGRI, etc)

• Many successful small projects to fill in bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature

• Many potential benefits

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

• Still highly biased in terms of the tree

Major Lineages of Actinobacteria

2.5.1 Acidimicrobidae
  2.5.1.1 Unclassified
  2.5.1.2 "Microthrixineae"
  2.5.1.3 Acidimicrobineae
  2.5.1.4 BD2-10
  2.5.1.5 EB1017
2.5.2 Actinobacteridae
  2.5.2.1 Unclassified
  2.5.2.10 Ellin306/WR160
  2.5.2.11 Ellin5012
  2.5.2.12 Ellin5034
  2.5.2.13 Frankineae
  2.5.2.14 Glycomyces
  2.5.2.15 Intrasporangiaceae
  2.5.2.16 Kineosporiaceae
  2.5.2.17 Microbacteriaceae
  2.5.2.18 Micrococcaceae
  2.5.2.19 Micromonosporaceae
  2.5.2.2 Actinomyces
  2.5.2.20 Propionibacterineae
  2.5.2.21 Pseudonocardiaceae
  2.5.2.22 Streptomycineae
  2.5.2.23 Streptosporangineae
  2.5.2.3 Actinomycineae
  2.5.2.4 Actinosynnemataceae
  2.5.2.5 Bifidobacteriaceae
  2.5.2.6 Brevibacteriaceae
  2.5.2.7 Cellulomonadaceae
  2.5.2.8 Corynebacterineae
  2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
  2.5.3.1 Unclassified
  2.5.3.2 Atopobiales
  2.5.3.3 Coriobacteriales
  2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
  2.5.6.1 Unclassified
  2.5.6.2 "Thermoleiphilaceae"
  2.5.6.3 MC47
  2.5.6.4 Rubrobacteraceae

2.5 Actinobacteria
  2.5.1 Acidimicrobidae
    2.5.1.1 Unclassified
    2.5.1.2 "Microthrixineae"
    2.5.1.3 Acidimicrobineae
      2.5.1.3.1 Unclassified
      2.5.1.3.2 Acidimicrobiaceae
    2.5.1.4 BD2-10
    2.5.1.5 EB1017
  2.5.2 Actinobacteridae
    2.5.2.1 Unclassified
    2.5.2.10 Ellin306/WR160
    2.5.2.11 Ellin5012
    2.5.2.12 Ellin5034
    2.5.2.13 Frankineae
      2.5.2.13.1 Unclassified
      2.5.2.13.2 Acidothermaceae
      2.5.2.13.3 Ellin6090
      2.5.2.13.4 Frankiaceae
      2.5.2.13.5 Geodermatophilaceae
      2.5.2.13.6 Microsphaeraceae
      2.5.2.13.7 Sporichthyaceae
    2.5.2.14 Glycomyces
    2.5.2.15 Intrasporangiaceae
      2.5.2.15.1 Unclassified
      2.5.2.15.2 Dermacoccus
      2.5.2.15.3 Intrasporangiaceae
    2.5.2.16 Kineosporiaceae
    2.5.2.17 Microbacteriaceae
      2.5.2.17.1 Unclassified
      2.5.2.17.2 Agrococcus
      2.5.2.17.3 Agromyces
    2.5.2.18 Micrococcaceae
    2.5.2.19 Micromonosporaceae
    2.5.2.2 Actinomyces
    2.5.2.20 Propionibacterineae
      2.5.2.20.1 Unclassified
      2.5.2.20.2 Kribbella
      2.5.2.20.3 Nocardioidaceae
      2.5.2.20.4 Propionibacteriaceae
    2.5.2.21 Pseudonocardiaceae
    2.5.2.22 Streptomycineae
      2.5.2.22.1 Unclassified
      2.5.2.22.2 Kitasatospora
      2.5.2.22.3 Streptacidiphilus
    2.5.2.23 Streptosporangineae
      2.5.2.23.1 Unclassified
      2.5.2.23.2 Ellin5129
      2.5.2.23.3 Nocardiopsaceae
      2.5.2.23.4 Streptosporangiaceae
      2.5.2.23.5 Thermomonosporaceae
    2.5.2.3 Actinomycineae
    2.5.2.4 Actinosynnemataceae
    2.5.2.5 Bifidobacteriaceae
    2.5.2.6 Brevibacteriaceae
    2.5.2.7 Cellulomonadaceae
    2.5.2.8 Corynebacterineae
      2.5.2.8.1 Unclassified
      2.5.2.8.2 Corynebacteriaceae
      2.5.2.8.3 Dietziaceae
      2.5.2.8.4 Gordoniaceae
      2.5.2.8.5 Mycobacteriaceae
      2.5.2.8.6 Rhodococcus
      2.5.2.8.7 Rhodococcus
      2.5.2.8.8 Rhodococcus
    2.5.2.9 Dermabacteraceae
      2.5.2.9.1 Unclassified
      2.5.2.9.2 Brachybacterium
      2.5.2.9.3 Dermabacter
  2.5.3 Coriobacteridae
    2.5.3.1 Unclassified
    2.5.3.2 Atopobiales
    2.5.3.3 Coriobacteriales
    2.5.3.4 Eggerthellales
  2.5.4 OPB41
  2.5.5 PK1
  2.5.6 Rubrobacteridae
    2.5.6.1 Unclassified
    2.5.6.2 "Thermoleiphilaceae"
      2.5.6.2.1 Unclassified
      2.5.6.2.2 Conexibacter
      2.5.6.2.3 XGE514
    2.5.6.3 MC47
    2.5.6.4 Rubrobacteraceae

• Same trend in Archaea

• Same trend in Eukaryotes

• Same trend in Viruses

• Solution: Really Fill in the Tree

• GEBA
• A genomic encyclopedia of bacteria and archaea

Eisen & Ward, PIs

GEBA Pilot Project Overview

• Identify major branches in rRNA tree for which no genomes are available

• Identify branches with a cultured representative in DSMZ

• DSMZ grew > 200 of these and prepped DNA

• Sequence and finish 100 (covering breadth of bacterial/archaeal diversity)

• Annotate, analyze, release data

• Assess benefits of tree-guided sequencing

• 1st paper: Wu et al. in Nature, Dec 2009

GEBA Pilot Project: Components

• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)

• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)

• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)

• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)

• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)

• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)

• Adopt a microbe education project (Cheryl Kerfeld)

• Outreach (David Gilbert)

• $$$ (DOE, DSMZ, GBMF)

GEBA Lesson 1

rRNA Tree is Useful for Identifying Phylogenetically Novel Organisms

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

Network of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Archaea

Eukaryotes

Bacteria

“Whole Genome” Tree w/ AMPHORA

http://bobcat.genomecenter.ucdavis.edu/AMPHORA/

See Wu and Eisen, Genome Biology 2008 9: R151

http://itol.embl.de/

Analogous to method of Ciccarelli et al.

Compare PD in rRNA and WGT
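
A minimal sketch of what such a comparison could compute, assuming "PD" here means phylogenetic diversity measured as the total branch length spanned by a set of taxa on a rooted tree. The toy tree, its branch lengths, and the function name are hypothetical and are not taken from the slides; the same function would be run once on the rRNA tree and once on the whole-genome tree (WGT) for the same taxon set.

```python
def phylogenetic_diversity(parent, tips):
    """Sum branch lengths on the union of tip-to-root paths for the chosen tips.

    `parent` maps each non-root node to (parent_node, branch_length).
    """
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root, counting each branch only once.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical toy tree: ((A:0.1, B:0.2)n1:0.3, C:0.5)root
toy_tree = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("root", 0.5),
}

pd_ab = phylogenetic_diversity(toy_tree, ["A", "B"])
pd_all = phylogenetic_diversity(toy_tree, ["A", "B", "C"])
print(pd_ab, pd_all)
```

Adding taxon C raises the total by its full branch length because it shares no branches with A and B, which is the sense in which a phylogenetically novel genome adds more diversity than a close relative of one already sampled.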

GEBA Lesson 2

Phylogeny-driven genome selection helps discover new genetic diversity

Protein Family Rarefaction Curves

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
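
The plotting step can be sketched as follows, assuming the clustering has already been done and each genome is represented by the set of protein family identifiers it contains. The genome names, family identifiers, and the choice to average over random genome orderings are illustrative assumptions, not details given in the talk.

```python
import random


def rarefaction_curve(families_by_genome, n_permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes.

    Genomes are added in random order; the curve is averaged over
    `n_permutations` shuffles so it does not depend on one ordering.
    """
    rng = random.Random(seed)
    genomes = list(families_by_genome)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Hypothetical input: genome -> set of family IDs (e.g. from a clustering run)
toy = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f1", "f2", "f3", "f4"},
}
curve = rarefaction_curve(toy)
print(curve)  # y-values to plot against 1, 2, 3 genomes
```

A curve that keeps climbing as genomes are added indicates that new protein families are still being discovered; a curve that flattens indicates the sampled genomes are largely redundant.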

Synapomorphies exist

Phylogenetic Distribution Novelty: Bacterial Actin Related Protein

Haliangium ochraceum DSM 14365 Patrik D’haeseleer, Adam Zemla, Victor Kunin

[Figure for this slide; taxon labels and numeric values were not recoverable from the extraction.]

See also Guljamow et al. 2007 Current Biology.

GEBA Lesson 3

Phylogeny-driven genome selection improves genome annotation

Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling

• Better definition of protein family sequence “patterns”

• Greatly improves “comparative” and “evolutionary” based predictions

• Conversion of hypothetical into conserved hypotheticals

• Linking distantly related members of protein families

• Improved non-homology prediction

Kostas Mavrommatis

Natalia Ivanova

Thanos Lykidis

Nikos Kyrpides

Iain Anderson

GEBA Lesson 4

Metadata and individual genome papers important

GEBA Lesson 5

Phylogeny-driven genome selection improves analysis of metagenome data

Who is out there?

rRNA phylotyping from metagenomics

Venter et al., 2004

Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)

Venter et al., 2004

[Figure: Sargasso Phylotypes. Y-axis: Weighted % of Clones (0 to 0.5). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]

Shotgun Sequencing Allows Use of Other Markers

Venter et al., 2004

ABCDEFG

TUVWXYZ

Binning challenge

Best binning method: reference genomes

Reference Genomes Coming from Select Environment

No reference genome? What do you do?

Phylogeny ....

Phylogenetic Binning Using AMPHORA

[Figure: Y-axis ticks from 0 to 0.7. X-axis groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. Series (marker genes): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]

AMPHORA - each read on its own tree
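
As a rough illustration of how per-read results could be summarized into a chart like this one, the sketch below assumes each read has already been placed on a reference tree for one marker gene and assigned to a phylogenetic group. That placement step is the hard part and is not shown. The input format and function name are hypothetical and do not describe AMPHORA's actual output.

```python
from collections import Counter, defaultdict


def group_fractions(assignments):
    """For each marker gene, the fraction of its reads assigned to each group.

    `assignments` is an iterable of (marker_gene, phylogenetic_group) pairs,
    one pair per classified read.
    """
    counts = defaultdict(Counter)
    for marker, group in assignments:
        counts[marker][group] += 1
    return {
        marker: {group: n / sum(c.values()) for group, n in c.items()}
        for marker, c in counts.items()
    }


# Hypothetical classified reads
reads = [
    ("rpoB", "Alphaproteobacteria"),
    ("rpoB", "Alphaproteobacteria"),
    ("rpoB", "Cyanobacteria"),
    ("rplA", "Cyanobacteria"),
]
fractions = group_fractions(reads)
print(fractions)
```

Keeping the markers separate, rather than pooling all reads, makes it possible to see whether different genes give a consistent picture of community composition.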

Limited in past by poor genomic sampling

Metagenomic Analysis Improves w/ Phylogenetic Sampling

• Small but real improvements in

– Gene identification / confirmation

– Functional prediction

– Binning

– Phylogenetic classification

• But not a lot ...

GEBA Future 1

Need to adapt genomic and metagenomic methods to make use of GEBA data

Improves with better phylogenetic methods

Improving Phylogeny for Metagenomic Reads

• Examples using reference trees

– AMPHORA (Wu and Eisen)

– pplacer (Erick Matsen)

– FastTree (Morgan Price)

• Variants

– Use concatenated alignment of markers, not just individual genes (Steven Kembel)

– Apply to OTU identification, not just classification (Thomas Sharpton)

– CoBinning: look for linkage among fragments/genes (Aaron Darling)

Improves with more gene families

Keep only the families with:

Universality * Evenness * Monophyly >= 90 * 90 * 90

Phylogenetic group      Genome Number   Gene Number   Marker Candidates
Archaea                 62              145415        102
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        142
Betaproteobacteria      56              266362        294
Gammaproteobacteria     126             483632        141
Deltaproteobacteria     25              102115        44
Epsilonproteobacteria   18              33416         446
Bacteroides             25              71531         179
Chlamydiae              13              13823         561
Chloroflexi             10              33577         140
Cyanobacteria           36              124080        532
Firmicutes              106             312309        80
Spirochaetes            18              38832         72
Thermi                  5               14160         727
Thermotogae             9               17037         646
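
The filter stated above can be sketched as follows, assuming each of the three scores is a percentage between 0 and 100. The slide does not say how universality, evenness, or monophyly are computed, so the scores, family names, and field names below are hypothetical inputs.

```python
from dataclasses import dataclass


@dataclass
class FamilyScores:
    name: str
    universality: float  # percent, 0-100 (assumed scale)
    evenness: float      # percent, 0-100 (assumed scale)
    monophyly: float     # percent, 0-100 (assumed scale)


THRESHOLD = 90 * 90 * 90  # 729000, as written on the slide


def keep_family(f):
    """Apply the product criterion: U * E * M >= 90 * 90 * 90."""
    return f.universality * f.evenness * f.monophyly >= THRESHOLD


# Hypothetical candidate families
candidates = [
    FamilyScores("famA", 100, 95, 92),   # product 874000
    FamilyScores("famB", 85, 100, 100),  # product 850000
    FamilyScores("famC", 90, 90, 89),    # product 720900
]
kept = [f.name for f in candidates if keep_family(f)]
print(kept)
```

Read literally, the criterion is on the product, so a family with one score below 90 (famB here) can still pass if the other two are high enough. Requiring each score to be at least 90 individually would be a stricter, different rule.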

Improves with rebuilding gene family models

Other Ways to Make Better Use of the Data

• Rebuild protein family models

• Experiments from across the tree needed

• Need better phylogenies, including HGT

• Improved tools for using distantly related genomes in metagenomic analysis

• Better recording and sharing of metadata about organisms

GEBA Future 2

The dark matter of the biological universe

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

Well sampled phyla

Poorly sampled

No cultured taxa

Uncultured Lineages: Technical Approaches

• Get into culture

• Enrichment cultures

• If abundant in low diversity ecosystems

• Flow sorting

• Microbeads

• Microfluidic sorting

• Single cell amplification

MICROBES
