Post on 07-Jul-2015
Morgan Langille
Dalhousie University
July 10, 2012
16S rRNA gene
Standard marker gene for bacterial and
archaeal species identification
Recent widespread use in metagenomic
microbiome surveys
Limited to telling us: “who is there?”
Using 16S anonymously
16S reads often clustered into OTUs
Alpha diversity
Beta diversity
Rarefaction
Biogeography
Bik et al., 2012
What is in a name?
Real names vs OTU1234
Haloferax
Prochlorococcus
Bacillus
Lee et al. 2010
Extending 16S to functions
Metagenomics: “What are they doing?”
Requires WGS sequencing
More costly
Use microbial databases
~3500 genomes
IMG
NCBI
Etc.
Find genome
Functional information
• KEGG
• PFAM
• EC
• SEED
• Etc.
16S gene, or other marker gene
PICRUST: Phylogenetic Investigation of Communities by Reconstruction of Unobserved STates
http://picrust.sourceforge.net
PICRUST: Predicting genomes
Inputs:
• Reference 16S tree (Greengenes)
• Genome trait table (e.g. KEGG, 16S copy number)
Steps:
1. Prune taxa with no genome information
2. Infer ancestral genome traits
3. Predict genome compositions
PICRUST: Predicting metagenomes
Inputs:
• OTU table (16S by sample)
• 16S copy number predictions (per genome)
• Functional trait predictions (per genome)
Steps:
1. Normalize OTU table
2. Predict metagenome functional traits
Output:
• Functions by sample
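The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not PICRUST's own code: it assumes that normalizing means dividing each OTU's read count by its predicted 16S copy number, and that each predicted function abundance is the sum over OTUs of normalized abundance times predicted gene copies. The function name `predict_metagenome`, the dictionary layout, and the toy OTU and KEGG identifiers are all hypothetical.

```python
def predict_metagenome(otu_table, copy_number, traits):
    """Sketch of metagenome prediction from 16S data (assumed data layout).

    otu_table:   {sample: {otu: 16S read count}}
    copy_number: {otu: predicted 16S copies per genome}
    traits:      {otu: {function: predicted gene copies per genome}}
    Returns      {sample: {function: predicted abundance}}
    """
    result = {}
    for sample, counts in otu_table.items():
        functions = {}
        for otu, reads in counts.items():
            # Normalize: reads divided by 16S copy number approximates
            # the number of organisms behind those reads.
            organisms = reads / copy_number[otu]
            # Each organism contributes its predicted gene copies.
            for function, copies in traits[otu].items():
                functions[function] = functions.get(function, 0.0) + organisms * copies
        result[sample] = functions
    return result


# Toy data (hypothetical values, for illustration only).
otu_table = {"gut": {"OTU1": 10, "OTU2": 6}}
copy_number = {"OTU1": 5, "OTU2": 2}
traits = {"OTU1": {"K00001": 1, "K00002": 2}, "OTU2": {"K00001": 3}}

prediction = predict_metagenome(otu_table, copy_number, traits)
print(prediction)
```

In the toy data, OTU1 has 10 reads and 5 copies of 16S per genome, so it counts as 2 organisms rather than 10. Without that normalization, organisms with many 16S copies would be over-represented in the predicted functions.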
Ancestral State Reconstruction
Needs to accept continuous data
Must run fast! (8000 traits across 3500 genomes)
Wagner Parsimony (Count software; Csűrös, 2010)
ACE (APE R library; Paradis, 2004), with three methods: PIC, ML, REML
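To make the parsimony option concrete, here is a minimal sketch of the downward pass of Wagner parsimony for one numeric trait (for example, a gene copy number). It is not the Count software's implementation. It assumes a strictly bifurcating tree written as nested 2-tuples, ignores branch lengths, and only computes the set of most-parsimonious values at the root plus the total amount of change. The function name `wagner_downpass` is hypothetical.

```python
def wagner_downpass(tree, tip_values):
    """Wagner (ordered-character) parsimony, downward pass only.

    tree:       a tip name (str) or a 2-tuple (left_subtree, right_subtree)
    tip_values: {tip name: observed trait value}
    Returns ((low, high), cost): the interval of most-parsimonious values
    at this node and the total change required within its subtree.
    """
    if isinstance(tree, str):
        value = tip_values[tree]
        return (value, value), 0

    left, right = tree
    (lo1, hi1), cost1 = wagner_downpass(left, tip_values)
    (lo2, hi2), cost2 = wagner_downpass(right, tip_values)

    lo, hi = max(lo1, lo2), min(hi1, hi2)
    if lo <= hi:
        # Child intervals overlap: keep the intersection, no extra change.
        return (lo, hi), cost1 + cost2
    # No overlap: any value in the gap is optimal, and the gap width
    # is the extra change this node forces.
    return (hi, lo), cost1 + cost2 + (lo - hi)


# Toy tree ((A, B), C) with hypothetical copy numbers at the tips.
interval, cost = wagner_downpass((("A", "B"), "C"), {"A": 2, "B": 4, "C": 10})
print(interval, cost)
```

A full reconstruction would add an upward pass to pick one value per internal node; the downward pass is enough to show why the method handles count data and runs in time linear in the number of tips per trait.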
Accuracy for metagenome prediction
1. Obtain metagenomic projects with both
WGS and 16S only sequencing
2. Make functional predictions using
PICRUST with 16S only data
3. Compare predictions with WGS data
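Step 3 can be sketched as a correlation between predicted and WGS-observed function abundances for one sample. The slides report R2 values but do not state how they were computed; this sketch assumes a squared Pearson correlation. The function name `r_squared` and the toy abundances are hypothetical.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances, left unnormalized because the
    # 1/n factors cancel in the ratio below.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


# Toy abundances for four functions in one sample (hypothetical values).
predicted_abundance = [1, 2, 3, 4]
observed_abundance = [2, 4, 6, 8]
score = r_squared(predicted_abundance, observed_abundance)
print(score)
```

A rank-based measure such as Spearman correlation is a reasonable alternative when abundances are heavily skewed.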
ASR methods on metagenomics
HMP Mock Community (known organisms sequenced)
All methods give similar results except “ACE ML”, which has a known problem; the recently added “REML” method solves it.
Figure panels:
Wagner Parsimony: R2 = 0.92
ACE PIC: R2 = 0.91
ACE REML: R2 = 0.92
ACE ML: R2 = 0.72
Accuracy on metagenomes
Accuracy across various HMP sites
Accuracy for genome prediction
1. Pretend a genome has not been sequenced
2. Predict genome composition using PICRUST
3. Compare predictions to real data
4. Repeat for all genomes
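The four steps above amount to a leave-one-out loop. The sketch below shows only the loop structure: the predictor is a deliberately simple stand-in (the rounded mean of the remaining genomes' traits), not PICRUST's ancestral state reconstruction, and the score is the fraction of traits predicted exactly. All names (`leave_one_out`, `mean_predictor`, `exact_match`) and the toy genomes are hypothetical.

```python
def leave_one_out(genomes, predict, score):
    """genomes: {name: list of trait values}.
    predict(name, others) -> predicted trait list for the held-out genome.
    score(predicted, actual) -> accuracy value.
    Returns {name: accuracy}."""
    results = {}
    for name, actual in genomes.items():
        # Step 1: pretend this genome has not been sequenced.
        others = {k: v for k, v in genomes.items() if k != name}
        # Steps 2 and 3: predict it, then compare with the real data.
        results[name] = score(predict(name, others), actual)
    # Step 4: the loop has repeated this for every genome.
    return results


def mean_predictor(name, others):
    # Placeholder predictor: per-trait mean of the remaining genomes, rounded.
    columns = zip(*others.values())
    return [round(sum(col) / len(col)) for col in columns]


def exact_match(predicted, actual):
    # Fraction of traits whose predicted value equals the real value.
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Toy genomes with three traits each (hypothetical values).
genomes = {"g1": [1, 0, 2], "g2": [1, 0, 2], "g3": [1, 0, 2], "g4": [1, 1, 2]}
accuracy = leave_one_out(genomes, mean_predictor, exact_match)
print(accuracy)
```

Passing the predictor in as a function keeps the evaluation loop independent of the prediction method, so any reconstruction method can be benchmarked the same way.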
Accuracy depends on distance to
closest sequenced genome
R2=-0.72
Accuracy across the tree of life (TOL)
http://itol.embl.de/shared/mlangill
Staphylococcus aureus
E. coli
Accuracy depends on type of functional category
PICRUST Accuracy
Possible applications
1. 16S only microbiome studies
• Make hypotheses about the functions they encode
2. Complete metagenomic studies
• Compare functions we “observe” to what we would expect based on species present
3. Aid other metagenomic computational methods
• Binning
• Metabolic reconstruction
4. Insight into correlation between species & function
• For different taxonomic groups
• For different functional classes
Acknowledgements
Rob Beiko
Curtis Huttenhower
Rob Knight
Jesse Zaneveld
Greg Caporaso
Joshua Reyes
Dan Knights
Daniel McDonald