Transcript of Eisen.Geba.Jgi2009b
GEBA
A genomic encyclopedia of bacteria and archaea
Jonathan A. Eisen
JGI User Meeting 2009
“Nothing in biology makes sense except in the light of evolution.”
T. Dobzhansky (1973)
rRNA Tree of Life
The Tree is not Happy
From http://genomesonline.org
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Proteobacteria
Chlorobi
Firmicutes
Fusobacteria
Actinobacteria
Cyanobacteria
Chlamydia
Spirochaetes
Deinococcus-Thermus
Aquificae
Thermotogae
TM6
OS-K
Termite Group
OP8
Marine Group A
WS3
OP9
NKB19
OP3
OP10
TM7
OP1
OP11
Nitrospira
Synergistes
Deferribacteres
Thermodesulfobacteria
Chrysiogenetes
Thermomicrobia
Dictyoglomus
Coprothermobacter
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
As of 2002
Based on Hugenholtz, 2002
Need for Tree Guidance Well Established
• Common approach within some eukaryotic groups
• Many small projects funded to fill in some bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen, Ward, Badger, Wu, Wu, et al.
Bacterial aTOL Project AIMS
• Improve resolution of deep branches in the bacterial tree
• Launch biological studies of these phyla and discover functional novelty
• Leverage data for interpreting environmental surveys
T. roseum genome
The Tree of Life is Still Angry
Within Phyla Diversity Immense
• Each phylum represents billions of years of evolution
• Some have hundreds of major lineages
• New lineages are being discovered all the time
• Most branches within most phyla have few or no genomes
Major Lineages of Actinobacteria
2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae
Additional Impetus for Tree Guided Projects
• Suggestion to sequence all bacteria and archaea in Bergey’s Manual (Stevens et al)
• Success in sequencing genomes from across the tree in animals
• Multiple government reports suggest a more systematic approach to sequencing is needed
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
Well sampled phyla
http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA Pilot Project Overview
• Select 200 organisms using tree
• Develop high throughput pipeline for strain growth and DNA preparation
• Sequence and finish 100
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
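One way to picture "select organisms using tree" is as a search for the candidates that add the most branch length (phylogenetic diversity, PD) to what is already sequenced. The following is a minimal sketch under that assumption, using a greedy pick on a toy tree. The tree, taxon names, branch lengths, and function names are all hypothetical, and the slides do not say that GEBA used this exact procedure.

```python
# Toy rooted tree: node -> (parent, branch length to parent).
# All names and lengths are invented for illustration only.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 3.0),
    "D": ("n3", 5.0),
    "n1": ("n2", 2.0),
    "n2": ("root", 1.0),
    "n3": ("root", 1.0),
}


def path_edges(taxon, tree):
    """Return the edges (named by their child node) from taxon up to the root."""
    edges = set()
    node = taxon
    while node in tree:
        edges.add(node)
        node = tree[node][0]
    return edges


def phylogenetic_diversity(taxa, tree):
    """Sum the branch lengths covered by the paths from all taxa to the root."""
    covered = set()
    for taxon in taxa:
        covered |= path_edges(taxon, tree)
    return sum(tree[edge][1] for edge in covered)


def greedy_select(candidates, already_sequenced, tree, k):
    """Pick up to k candidates, each time taking the one that adds the most PD."""
    chosen = []
    current = set(already_sequenced)
    remaining = sorted(candidates)  # sorted for a deterministic tie-break
    for _ in range(min(k, len(remaining))):
        base = phylogenetic_diversity(current, tree)
        best = max(
            remaining,
            key=lambda t: phylogenetic_diversity(current | {t}, tree) - base,
        )
        chosen.append(best)
        current.add(best)
        remaining.remove(best)
    return chosen


picks = greedy_select(["B", "C", "D"], ["A"], TREE, k=2)
```

In this toy case the close relative "B" is picked last, because most of its path to the root is already covered by the sequenced taxon "A".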
GEBA Pilot I: Selecting Targets
GEBA Pilot Target List
[Bar chart: # of genomes (axis ticks 0 to 35) per phylum; B = Bacteria, A = Archaea. Phyla shown:]
B: Actinobacteria (High GC)
B: Aminanaerobia
B: Aquificae
B: Bacteroidetes
B: Chloroflexi
B: Deferribacteres
B: Deinococci
B: Delta Proteobacteria
B: Epsilon Proteobacteria
B: Firmicutes
B: Fusobacteria
B: Gamma Proteobacteria
B: Gemmatimonadetes
B: Haloanaerobiales
B: Planctomycetes
B: Spirochaetes
B: Thermodesulfobacteria
B: Thermodesulfobia
B: Thermovenabulae
A: Halobacteria
A: Archaeoglobi
A: Methanobacteria
A: Methanomicrobia
A: Thermococci
A: Thermoprotei
GEBA Pilot II: The Importance of Project Management
GEBA Project Flowchart
GEBA Proposal
Scientific and Technical Review [1]
Negotiate Scope of Work
Receive Starting Material [1]
OK?
Project Initiation
Sequencing
Annotation
Draft Sequencing and Assembly [1]
Finish Sequencing and Assembly [2]
IMG [1]
Finish Annotation [3]
Complete Genome GenBank Submission [1]
Draft Annotation [3]
Shotgun Genome GenBank Submission [1]
IMG-ER [1]
OK?
OK?
IMG-ER [1]
Gene-QA [1]
[1] PGF  [2] LANL  [3] ORNL
David Bruce, Lynne Goodwin et al.
GEBA Pilot III: Partnership with DSMZ
GEBA Biggest Challenge: Getting DNA
• Getting quality DNA is the biggest bottleneck
• Solution: Beg, Borrow, and Steal
• DSMZ offered to do it for free
• ATCC is doing a small number for a fee
• In discussions with PCC and other collections
Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T)
Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel Electrophoresis (PFGE). The bulk of DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
Lane 1: c(λ-Marker) = 15 ng
Lane 2: c(λ-Marker) = 30 ng
Lane 3: c(λ-Marker) = 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker) = 125 ng
Lane 15: c(λ-Marker) = 250 ng
Lane 16: c(λ-Marker) = 500 ng
GEBA Pilot IV: Sequencing, Annotation, Data Release
Current Status
• >100 in progress
• GEBA 56 (focus of first paper)
– 34 finished genomes
– 55 submitted to GenBank
– Released to IMG-GEBA page and JGI-FTP site
• All data are completely open for anyone to use
IMG/GEBA
http://img.jgi.doe.gov/cgi-bin/geba/main.cgi
Adopt a Microbe
GEBA Pilot IV: Assess Benefits of GEBA56
All genomes have some value
But what, if any, is the benefit of tree-guided sequencing over other selection methods?
Why Increase Taxonomic Coverage II?
• Gene discovery
• Annotation, functional prediction
• Metagenomic analysis
• Mechanisms of diversification
• Species phylogeny and classification
Value of diverse genomes I: Gene discovery
• Premise:
– New genomes frequently contain genetic novelty
– Phylogenetic diversity of a genome should be correlated to novelty
• Caveat:
– Does lateral gene transfer wipe out contribution of phylogenetic diversity to novelty?
Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
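The three steps above can be sketched in a few lines. This is a minimal illustration, assuming the protein families have already been assigned (for example from an MCL clustering, which is not run here) and that the curve is averaged over random genome orderings. The genome names, family IDs, and the `rarefaction_curve` function are hypothetical.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes.

    genome_families maps a genome name to the set of family IDs it contains.
    The genome order is shuffled n_permutations times and the counts averaged.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Invented toy data: two similar genomes and one distant genome.
toy = {
    "genome_1": {"fam_a", "fam_b", "fam_c"},
    "genome_2": {"fam_a", "fam_b", "fam_d"},
    "genome_3": {"fam_e", "fam_f", "fam_g"},
}
curve = rarefaction_curve(toy)
```

Plotting genome count (1 to N) against `curve` gives the rarefaction curve; a curve that keeps climbing means each added genome still contributes new families.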
[Plot: x-axis "Genome Number" (0 to 80); y-axis "Number of proteins" / "Total Gene Number" (0 to 350000). Curves: S. agalactiae, Enterobacteriaceae, Actinobacteria, Bacteria from GEBA project.]
Novelty 2 - Structural Novelty
• Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu)
• Structural modeling suggests many are structurally novel too (D'haeseleer)
• 372 being crystallized by the PSI (Kerfeld)
Novelty 3
Diversity within known families
Transporter Profiles
[Bar chart: "Number of transporters" per genome, axis ticks 0 to 700.]
Genomes: actmi, beuca, brafa, catac, celfl, conwo, dyafe, halmu, halut, krifl, nakmu, pedhe, sacvi, sphth, spili, stana, strro, sulde, theac, thete, tsupa, xylce, denac, detpe, haloc, halbo, kanko, plali, acife, meiru, meisi, rhoma, aliac, chipi, desr5, desba, geoob, thebi, thecu, anapr, atopa, bramu, desa7, jonde, sanke, sebte, slahe, capoc, crycu, eggle, gorbr, kytse, lepbu, nocda, strmo, veipa
Legend: inorganic ions; amino acids, nitro compounds and peptides; drugs/toxins; sugars; carboxylates; nucleosides/tides, bases; siderophores; other
Sebaldella termitidis ATCC 33386 has twice as many sugar PTS transporters as any other genome
Novelty 4
Unusual distribution patterns
Shotgun Sequencing Detects More Diversity than PCR-methods
First Bacterial Actin Related Protein
First found by V. Kunin, Structure Analysis by Patrik D. et al
Most Closely Related to ARP8
Value of 100 diverse genomes II: Annotation
• Premise:
– Increased phylogenetic coverage should improve our ability to annotate genes in other genomes (e.g., reference/model genomes)
Annotation Improves
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Non-homology functional prediction methods
Linking Protein Families Improved
[Bar chart "Genes -links": axis labels "Genome" and "Links", axis ticks 0 to 140. Genomes shown:]
Haliangium ochraceum SMP-2, DSM 14365
Spirosoma linguale DSM 74
Catenulispora acidiphila ID139908, DSM 44928
Streptosporangium roseum NI 9100, DSM
Sebaldella termitidis ATCC 33386
Planctomyces limnophilus DSM 3776
Dyadobacter fermentans NS 114, DSM 18053
Chitinophaga pinensis UQM 2034, DSM 2588
Actinosynnema mirum 101, DSM 43827
Stackebrandtia nassauensis LLR-40K-21, DSM
Kribbella flavida DSM 17836
Desulfotomaculum acetoxidans 5575, DSM 771
Halogeometricum borinquense DSM 11551
Meiothermus silvanus DSM 9946
Nakamurella multipartita Y-104, DSM 44233
Nocardiopsis dassonvillei dassonvillei DSM
Conexibacter woesei ID131577, DSM 14684
Gordonia bronchialis DSM 43247
Leptotrichia buccalis C-1013-b, DSM 1135
Halorhabdus utahensis AX-2, DSM 12940
Brachyspira murdochii 56-150, DSM 12563
Meiothermus ruber DSM 1279
Denitrovibrio acetiphilus N2460, DSM 12809
Slackia heliotrinireducens DSM 20476
Pedobacter heparinus HIM 762-3, DSM 2366
Alicyclobacillus acidocaldarius acidocaldarius
Capnocytophaga ochracea DSM 7271
Desulfomicrobium baculatum DSM 4028
Jonesia denitrificans DSM 20603
Saccharomonospora viridis P101, DSM 43017
Halomicrobium mukohataei arg-2, DSM 12286
Geodermatophilus obscurus G-20, DSM 43160
Thermobaculum terrenum YNP1, ATCC BAA-798
Sphaerobacter thermophilus 4ac11, DSM 20745
Beutenbergia cavernosae HKI 0122, DSM 12333
Thermomonospora curvata DSM 43183
Cellulomonas flavigena 134, DSM 20109
Dethiosulfovibrio peptidovorans SEBR 4207,
Eggerthella lenta VPI 0255, DSM 2243
Xylanimonas cellulosilytica DSM 15894
Rhodothermus marinus DSM 4252
Veillonella parvula Te3, DSM 2008
Tsukamurella paurometabola DSM 20162
Kytococcus sedentarius DSM 20547
Kangiella koreensis SW-125, DSM 16069
Sanguibacter keddieii DSM 10542
Thermobispora bispora DSM 43833
Streptobacillus moniliformis DSM 12112
Sulfurospirillum deleyianum DSM 6946
Brachybacterium faecium DSM 4810
Anaerococcus prevotii PC 1, DSM 20548
Desulfohalobium retbaense DSM 5692
Acidimicrobium ferrooxidans DSM 10331
Cryptobacterium curtum DSM 15641
Atopobium parvulum IPP 1246, DSM 20469
Thermanaerovibrio acidaminovorans Su883,
Fusion Based Predictions Improved
[Bar chart: genes harboring new fusions (COG) per genome, axis ticks 0 to 35. Genomes shown:]
Thermanaerovibrio acidaminovorans Su883, DSM 6589
Cryptobacterium curtum DSM 15641
Sphaerobacter thermophilus 4ac11, DSM 20745
Kytococcus sedentarius DSM 20547
Dethiosulfovibrio peptidovorans SEBR 4207, DSM 11002
Haliangium ochraceum SMP-2, DSM 14365
Atopobium parvulum IPP 1246, DSM 20469
Denitrovibrio acetiphilus N2460, DSM 12809
Brachybacterium faecium DSM 4810
Meiothermus silvanus DSM 9946
Slackia heliotrinireducens DSM 20476
Xylanimonas cellulosilytica DSM 15894
Stackebrandtia nassauensis LLR-40K-21, DSM 44728
Nakamurella multipartita Y-104, DSM 44233
Desulfohalobium retbaense DSM 5692
Tsukamurella paurometabola DSM 20162
Sanguibacter keddieii DSM 10542
Streptosporangium roseum NI 9100, DSM 43021
Actinosynnema mirum 101, DSM 43827
Rhodothermus marinus DSM 4252
Cellulomonas flavigena 134, DSM 20109
Brachyspira murdochii 56-150, DSM 12563
Leptotrichia buccalis C-1013-b, DSM 1135
Catenulispora acidiphila ID139908, DSM 44928
Conexibacter woesei ID131577, DSM 14684
Meiothermus ruber DSM 1279
Nocardiopsis dassonvillei dassonvillei DSM 43111
Thermobispora bispora DSM 43833
Beutenbergia cavernosae HKI 0122, DSM 12333
Acidimicrobium ferrooxidans DSM 10331
Desulfotomaculum acetoxidans 5575, DSM 771
Kribbella flavida DSM 17836
Eggerthella lenta VPI 0255, DSM 2243
Gordonia bronchialis DSM 43247
Thermobaculum terrenum YNP1, ATCC BAA-798
Desulfomicrobium baculatum DSM 4028
Thermomonospora curvata DSM 43183
Geodermatophilus obscurus G-20, DSM 43160
Planctomyces limnophilus DSM 3776
Jonesia denitrificans DSM 20603
Halogeometricum borinquense DSM 11551
Chitinophaga pinensis UQM 2034, DSM 2588
Dyadobacter fermentans NS 114, DSM 18053
Alicyclobacillus acidocaldarius acidocaldarius 104-IA,
Halomicrobium mukohataei arg-2, DSM 12286
Kangiella koreensis SW-125, DSM 16069
Anaerococcus prevotii PC 1, DSM 20548
Saccharomonospora viridis P101, DSM 43017
Halorhabdus utahensis AX-2, DSM 12940
Pedobacter heparinus HIM 762-3, DSM 2366
Sebaldella termitidis ATCC 33386
Capnocytophaga ochracea DSM 7271
Sulfurospirillum deleyianum DSM 6946
Spirosoma linguale DSM 74
Streptobacillus moniliformis DSM 12112
Improving Rosetta Stone Predictions
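For readers unfamiliar with the idea, a Rosetta Stone (gene fusion) prediction links two separate genes in one genome when their families are found fused into a single gene in some other genome. The following is a minimal sketch, assuming each gene has already been reduced to a set of family IDs. The gene names, family IDs, and the `rosetta_stone_links` function are hypothetical and are not taken from the analysis shown in the talk.

```python
from itertools import combinations


def rosetta_stone_links(query_genes, reference_genes):
    """Predict functional links between query genes from fused reference genes.

    query_genes: gene name -> set of family IDs found in that gene.
    reference_genes: iterable of sets of family IDs, one set per reference gene.
    Two query genes are linked if some multi-family reference gene contains at
    least one family found only in the first and one found only in the second.
    """
    fused = [frozenset(g) for g in reference_genes if len(g) > 1]
    links = set()
    for name_a, name_b in combinations(sorted(query_genes), 2):
        fams_a = query_genes[name_a]
        fams_b = query_genes[name_b]
        for ref in fused:
            only_a = (fams_a & ref) - fams_b
            only_b = (fams_b & ref) - fams_a
            if only_a and only_b:
                links.add((name_a, name_b))
                break
    return links


# Invented toy data: geneX and geneY are separate in the query genome,
# but their families occur together in one reference gene.
query = {"geneX": {"fam1"}, "geneY": {"fam2"}, "geneZ": {"fam3"}}
references = [{"fam1", "fam2"}, {"fam3"}]
links = rosetta_stone_links(query, references)
```

Under this sketch, adding reference genomes can only add fused genes to search, so the set of predicted links can grow but never shrink.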
Value of 100 diverse genomes III: Metagenomics
• Premise:
– Increased sampling of diverse genomes should improve many aspects of metagenomic analysis
• To test:
– Annotation
– Binning
Metagenomic Annotation Improves (Slightly)
Compositional Binning Improves (Slightly)
Phylogenetic Binning Improves Slightly
[Bar chart: axis ticks 0 to 0.7.]
Taxa: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria
Marker genes: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf
Value of 100 diverse genomes V: Phylogeny
16S Says Hyphomonas is in Rhodobacterales
Badger et al. 2005
WGT Says It's Related to Caulobacterales
Badger et al. 2005
GEBA - After the Pilot
PD of sequenced organisms
PD with GEBA
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
Well sampled phyla
Poorly sampled
No cultured taxa
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
• Same trend in Microbial Eukaryotes
As of 2002
Based on Hugenholtz, 2002
Tree based on Hugenholtz (2002) with some modifications.
Need experimental studies from across the tree too
MICROBES
A Happy Tree of Life