Jonathan Eisen talk on "Phylogenomics of Microbes" at Lake Arrowhead Small Genomes Meeting 2002
Jonathan Eisen slides for #HMP2010
Transcript of Jonathan Eisen slides for #HMP2010
A phylogeny driven genomic encyclopedia of bacteria and archaea
Jonathan A. Eisen, UC Davis
Talk for HMP2010, September 2, 2010
Social Networking in Science
Bacteria evolve
Progress in Genome Sequencing
From http://genomesonline.org
Way Back Machine - 2002
454
Illumina
Solid
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Proteobacteria
Chlorobi
Firmicutes
Fusobacteria
Actinobacteria
Cyanobacteria
Chlamydia
Spirochaetes
Deinococcus-Thermus
Aquificae
Thermotogae
TM6
OS-K
Termite Group
OP8
Marine Group A
WS3
OP9
NKB19
OP3
OP10
TM7
OP1
OP11
Nitrospira
Synergistes
Deferribacteres
Thermodesulfobacteria
Chrysiogenetes
Thermomicrobia
Dictyoglomus
Coprothermobacter
• At least 40 phyla of bacteria
2002
Based on Hugenholtz, 2002
[Figure: tree of bacterial phyla, same labels as on the previous slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
Based on Hugenholtz, 2002
2002
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
Based on Hugenholtz, 2002
2002
Why Increase Phylogenetic Coverage?
• Common approach within some eukaryotic groups (FGP, NHGRI, etc)
• Many successful small projects to fill in bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
• Many potential benefits
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
Major Lineages of Actinobacteria
2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
Progress in Genome Sequencing
From http://genomesonline.org
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
• GEBA
• A genomic encyclopedia of bacteria and archaea
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
GEBA Pilot Project Overview
• Identify major branches in rRNA tree for which no genomes are available
• Identify branches with a cultured representative in DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 100 (covering breadth of bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree-guided sequencing
• 1st paper: Wu et al. in Nature, Dec 2009
GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, DSMZ, GBMF)
GEBA Lesson 1
rRNA Tree is Useful for Identifying Phylogenetically Novel Organisms
rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Network of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
“Whole Genome” Tree w/ AMPHORA
http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
See Wu and Eisen, Genome Biology 2008 9: R151
http://itol.embl.de/
Analogous to method of Ciccarelli et al.
Compare PD in rRNA and WGT
PD of rRNA, Genome Trees Similar
From Wu et al. 2009 Nature 462, 1056-1060
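Phylogenetic diversity (PD) is commonly computed as the total branch length of the part of a tree that spans a chosen set of taxa. The sketch below illustrates that idea on a toy tree; it is not the code used for the comparison on the slide. The tree encoding (child mapped to parent and branch length), the rooted variant that counts branches back to the root, and all names and numbers are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical tree encoding: node -> (parent, length of the branch above the node).
# The root has no parent and a branch length of 0.
Tree = Dict[str, Tuple[Optional[str], float]]


def phylogenetic_diversity(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum the lengths of all branches on the paths from the taxa to the root.

    Each branch is counted once, however many of the taxa share it.
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        node: Optional[str] = taxon
        # Walk towards the root, stopping at the first branch already counted.
        while node is not None and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total


# Toy tree with made-up branch lengths.
toy_tree: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 2.0),
    "B": ("n1", 3.0),
    "C": ("root", 5.0),
}

pd_ab = phylogenetic_diversity(toy_tree, ["A", "B"])
pd_all = phylogenetic_diversity(toy_tree, ["A", "B", "C"])
print(pd_ab, pd_all, pd_ab / pd_all)
```

Running the same function on an rRNA tree and on a whole-genome tree, for the same set of sequenced organisms, gives two PD fractions that can be compared directly.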
GEBA Lesson 2
Phylogeny-driven genome selection helps discover new genetic diversity
Network of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
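The three steps above can be sketched in a few lines once the clustering is done. This minimal example assumes the MCL step has already assigned each protein to a family, so each genome is simply a set of family identifiers. The function name, the averaging over random genome orders, and the toy data are assumptions for illustration, not the procedure used in the talk.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(
    families_by_genome: Dict[str, Set[str]],
    permutations: int = 100,
    seed: int = 0,
) -> List[float]:
    """Mean number of distinct protein families after adding 1, 2, ..., N genomes.

    Genomes are added in random order; the counts are averaged over
    `permutations` random orders so the curve does not depend on one ordering.
    """
    rng = random.Random(seed)
    genomes = sorted(families_by_genome)
    totals = [0] * len(genomes)
    for _ in range(permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [t / permutations for t in totals]


# Toy input: family identifiers are made up.
toy_genomes = {
    "genome1": {"fam1", "fam2"},
    "genome2": {"fam2", "fam3"},
    "genome3": {"fam1", "fam2", "fam3", "fam4"},
}

curve = rarefaction_curve(toy_genomes)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

A curve that keeps rising as genomes are added indicates that new genomes are still contributing new protein families; a curve that flattens indicates the sampled set is close to saturation.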
Synapomorphies exist
Phylogenetic Distribution Novelty: Bacterial Actin Related Protein
Haliangium ochraceum DSM 14365
Patrik D’haeseleer, Adam Zemla, Victor Kunin
[Figure: phylogenetic tree of actin-related proteins; taxon labels and support values not recoverable from the transcript]
See also Guljamow et al. 2007 Current Biology.
GEBA Lesson 3
Phylogeny-driven genome selection improves genome annotation
Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary” based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
Kostas Mavrommatis
Natalia Ivanova
Thanos Lykidis
Nikos Kyrpides
Iain Anderson
GEBA Lesson 4
Metadata and individual genome papers important
SIGS http://standardsingenomics.org/
GEBA Lesson 5
Phylogeny-driven genome selection improves analysis of metagenome data
Who is out there?
rRNA phylotyping from metagenomics
Venter et al., 2004
Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)
Venter et al., 2004
[Figure: "Sargasso Phylotypes" chart. Y-axis: Weighted % of Clones (0 to 0.5). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]
Shotgun Sequencing Allows Use of Other Markers
Venter et al., 2004
ABCDEFG
TUVWXYZ
Binning challenge
Best binning method: reference genomes
Reference Genomes Coming from Select Environment
ABCDEFG
TUVWXYZ
Binning challenge
No reference genome? What do you do?
Phylogeny ....
Phylogenetic Binning Using AMPHORA
[Figure: chart of phylogenetic bin assignments. Y-axis: 0 to 0.7. X-axis groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. Series (marker genes): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf.]
AMPHORA - each read on its own tree
Phylogenetic Binning Using AMPHORA
[Figure: chart of phylogenetic bin assignments per marker gene, repeated from the earlier slide]
AMPHORA - each read on its own tree
Limited in past by poor genomic sampling
Metagenomic Analysis Improves w/ Phylogenetic Sampling
• Small but real improvements in
– Gene identification / confirmation
– Functional prediction
– Binning
– Phylogenetic classification
• But not a lot ...
GEBA Future 1
Need to adapt genomic and metagenomic methods to make use of GEBA data
Phylogenetic Binning Using AMPHORA
[Figure: chart of phylogenetic bin assignments per marker gene, repeated from the earlier slide]
AMPHORA - each read on its own tree
Improves with better phylogenetic methods
Improving Phylogeny for Metagenomic Reads
• Examples using reference trees
– AMPHORA (Wu and Eisen)
– PPlacer (Erick Matsen)
– FastTree (Morgan Price)
• Variants
– Use concatenated alignment of markers not just individual genes (Steven Kembel)
– Apply to OTU identification not just classification (Thomas Sharpton)
– CoBinning: look for linkage among fragments/genes (Aaron Darling)
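As a rough illustration of the "concatenated alignment of markers" variant, the sketch below joins per-marker alignments into one alignment per taxon and pads with gaps where a taxon (for example a metagenomic read) has no sequence for a marker. It assumes each marker has already been aligned on its own. The function name, data layout, and toy sequences are hypothetical and are not taken from any of the tools named above.

```python
from typing import Dict, List


def concatenate_alignments(
    alignments: Dict[str, Dict[str, str]],
    marker_order: List[str],
) -> Dict[str, str]:
    """Join per-marker alignments into one concatenated alignment per taxon.

    `alignments` maps marker name -> {taxon: aligned sequence}. Within a marker
    all sequences must have the same length. Taxa missing from a marker are
    padded with gap characters so every output row has the same length.
    """
    taxa = sorted({taxon for marker in marker_order for taxon in alignments[marker]})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for marker in marker_order:
        aligned = alignments[marker]
        lengths = {len(seq) for seq in aligned.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {marker!r} is empty or not a proper alignment")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aligned.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments with made-up sequences.
toy_alignments = {
    "rpoB": {"taxonA": "MK-L", "taxonB": "MKAL"},
    "rplA": {"taxonA": "GT", "read1": "G-"},
}

concatenated = concatenate_alignments(toy_alignments, ["rpoB", "rplA"])
for taxon, row in concatenated.items():
    print(taxon, row)
```

Fixing the marker order keeps column positions comparable between runs, which matters if the concatenated alignment is later placed on a reference tree built from the same markers.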
Phylogenetic Binning Using AMPHORA
[Figure: chart of phylogenetic bin assignments per marker gene, repeated from the earlier slide]
AMPHORA - each read on its own tree
Improves with more gene families
Keep only the families with:
Universality * Evenness * Monophyly >= 90*90*90

Phylogenetic group      Genome Number   Gene Number   Marker Candidates
Archaea                 62              145415        102
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        142
Betaproteobacteria      56              266362        294
Gammaproteobacteria     126             483632        141
Deltaproteobacteria     25              102115        44
Epsilonproteobacteria   18              33416         446
Bacteroides             25              71531         179
Chlamydiae              13              13823         561
Chloroflexi             10              33577         140
Cyanobacteria           36              124080        532
Firmicutes              106             312309        80
Spirochaetes            18              38832         72
Thermi                  5               14160         727
Thermotogae             9               17037         646
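The filter above can be written as a one-line test on the product of the three scores. The sketch below assumes each score is expressed as a percentage between 0 and 100 and has already been computed for every family; how universality, evenness, and monophyly are measured is not given on the slide, so the scores here are plain inputs. The type, function name, and family names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FamilyScores(NamedTuple):
    """Per-family scores, each assumed to be a percentage in [0, 100]."""
    universality: float
    evenness: float
    monophyly: float


# Cutoff from the slide: the product must be at least 90 * 90 * 90 = 729000.
THRESHOLD = 90 * 90 * 90


def keep_marker_candidates(
    scores: Dict[str, FamilyScores],
    threshold: float = THRESHOLD,
) -> List[str]:
    """Return the names of families whose score product meets the threshold."""
    return sorted(
        name
        for name, s in scores.items()
        if s.universality * s.evenness * s.monophyly >= threshold
    )


# Toy scores with made-up family names.
toy_scores = {
    "famA": FamilyScores(100, 95, 92),   # 874000: kept
    "famB": FamilyScores(90, 90, 90),    # 729000: kept, exactly at the cutoff
    "famC": FamilyScores(100, 100, 70),  # 700000: dropped
    "famD": FamilyScores(85, 100, 100),  # 850000: kept despite one score below 90
}

kept = keep_marker_candidates(toy_scores)
print(kept)
```

Because the cutoff applies to the product, a family can score below 90 on one criterion and still pass if the other two are high enough, as famD shows. Requiring each score to be at least 90 individually would be a stricter rule.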
Phylogenetic Binning Using AMPHORA
[Figure: chart of phylogenetic bin assignments per marker gene, repeated from the earlier slide]
AMPHORA - each read on its own tree
Improves with rebuilding gene family models
Other Ways to Make Better Use of the Data
• Rebuild protein family models
• Experiments from across the tree needed
• Need better phylogenies, including HGT
• Improved tools for using distantly related genomes in metagenomic analysis
• Better recording and sharing of metadata about organisms
GEBA Future 2
The dark matter of the biological universe
rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Phylogenetic Diversity: Sequenced Bacteria & Archaea
From Wu et al. 2009
Phylogenetic Diversity with GEBA
From Wu et al. 2009
Phylogenetic Diversity: Isolates
From Wu et al. 2009
Phylogenetic Diversity: All
From Wu et al. 2009
[Figure: tree of bacterial phyla, same labels as on the earlier slides]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
Well sampled phyla
Poorly sampled
No cultured taxa
Uncultured Lineages: Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
MICROBES