Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority...
![Page 1: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/1.jpg)
Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority
Sourav Chatterji
UC Davis Genome Center
![Page 2: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/2.jpg)
Background
![Page 3: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/3.jpg)
The Microbial World
![Page 4: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/4.jpg)
Exploring the Microbial World
• Culturing
  – Majority of microbes currently unculturable.
  – No ecological context.
• Molecular Surveys (e.g. 16S rRNA)
  – “who is out there?”
  – “what are they doing?”
![Page 5: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/5.jpg)
Environmental Shotgun Sequencing
![Page 6: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/6.jpg)
Interpreting Metagenomic Data
• Nature of Metagenomic Data
  – Mosaic
  – Fragmentary
• New Sequencing Technologies
  – Enormous amount of data
  – Short reads
![Page 7: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/7.jpg)
Overview of Talk
• Metagenomic Binning
• PhyloMetagenomics
• The Big Picture / Future Work
![Page 8: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/8.jpg)
Overview of Talk
• Metagenomic Binning
  – Background
  – CompostBin [to appear in RECOMB 2008]
• PhyloMetagenomics
• The Big Picture
![Page 9: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/9.jpg)
Metagenomic Binning
Classification of sequences by taxa
![Page 10: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/10.jpg)
Current Binning Methods
• Assembly
• Align with Reference Genome
• Database Search [MEGAN, BLAST]
• Phylogenetic Analysis
• DNA Composition [TETRA, PhyloPythia]
![Page 11: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/11.jpg)
Current Binning Methods
• Need closely related reference genomes.
• Poor performance on short fragments.
  – Sanger sequence reads are 500-1000 bp long.
  – Current assembly methods are unreliable.
• Complex communities are hard to bin.
![Page 12: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/12.jpg)
Genome Signatures
• Does genomic sequence from an organism have a unique “signature” that distinguishes it from genomic sequence of other organisms?
  – Yes [Karlin et al. 1990s]
• What is the minimum sequence length required to distinguish genomic sequence of one organism from the genomic sequence of another organism?
![Page 13: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/13.jpg)
DNA-composition metrics
The K-mer Frequency Metric
CompostBin uses hexamers
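As a rough illustration of the K-mer frequency metric, the sketch below turns a read into a normalized K-mer frequency vector (K = 6 for the hexamers mentioned on the slide). The function name `kmer_frequencies` and the choice to skip windows containing non-ACGT characters are assumptions made for this example, not details taken from the talk.

```python
from itertools import product


def kmer_frequencies(seq, k=6):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector has 4**k entries, ordered lexicographically over "ACGT".
    Windows containing any other character (e.g. N) are skipped; that
    policy is an assumption of this sketch.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

With k = 6 each read becomes a point in a 4096-dimensional space, which is where the dimensionality concern on the next slide comes from.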
![Page 14: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/14.jpg)
DNA-composition metrics
• Working with K-mers for Binning.
  – Curse of Dimensionality: O(4^K) independent dimensions.
  – Statistical noise increases with decreasing fragment lengths.
• Project data into a lower dimensional space to decrease noise.
  – Principal Component Analysis.
![Page 15: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/15.jpg)
PCA separates species
Gluconobacter oxydans [61% GC] and Rhodospirillum rubrum [65% GC]
![Page 16: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/16.jpg)
Effect of Skewed Relative Abundance
B. anthracis and L. monocytogenes
Abundance 1:1 Abundance 20:1
![Page 17: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/17.jpg)
A Weighting Scheme
For each read, find overlap with other sequences
![Page 18: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/18.jpg)
A Weighting Scheme
Calculate the redundancy of each position.
4 5 5 3
Weight is inverse of average redundancy.
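A minimal sketch of the weighting rule, assuming the per-position redundancy of a read (how many reads cover each position, as found from overlaps) has already been computed. The names `read_weight` and `normalize` are hypothetical, and rescaling the weights to sum to 1 is an added convenience rather than something the slide states.

```python
def read_weight(redundancy):
    """Weight of one read: the inverse of its average per-position redundancy."""
    if not redundancy:
        raise ValueError("redundancy must contain at least one position")
    average = sum(redundancy) / len(redundancy)
    return 1.0 / average


def normalize(weights):
    """Rescale weights so they sum to 1 (an assumption of this sketch)."""
    total = sum(weights)
    return [w / total for w in weights]
```

For the redundancies 4, 5, 5, 3 shown above, the average is 4.25, so the weight is 1 / 4.25, roughly 0.235. Reads from an abundant species overlap many other reads and are therefore down-weighted.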
![Page 19: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/19.jpg)
Weighted PCA
• Calculate the weighted mean: $\mu_w = \sum_{i=1}^{N} w_i X_i$
• Calculate the weighted covariance matrix: $M_w = \sum_{i=1}^{N} w_i (X_i - \mu_w)(X_i - \mu_w)^T$
• Principal components are the eigenvectors of $M_w$.
  – Use first three PCs for further analysis.
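The weighted PCA steps on this slide (weighted mean, weighted covariance matrix, leading eigenvectors) can be sketched with NumPy as follows. This is an illustration under stated assumptions: the function name `weighted_pca` is hypothetical, and the weights are rescaled to sum to 1 inside the function, which the slide does not spell out.

```python
import numpy as np


def weighted_pca(X, w, n_components=3):
    """Project the rows of X onto the top principal components of the
    weighted covariance matrix.

    X: (N, D) array of feature vectors (e.g. K-mer frequencies).
    w: length-N array of non-negative weights, rescaled here to sum to 1.
    Returns (projected points, component matrix, weighted mean).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mu = w @ X                                   # weighted mean
    centered = X - mu
    M = (centered * w[:, None]).T @ centered     # weighted covariance matrix
    eigvals, eigvecs = np.linalg.eigh(M)         # M is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # columns are principal components
    return centered @ components, components, mu
```

Setting all weights equal reduces this to ordinary PCA, so the two panels compared on the following slide differ only in the choice of `w`.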
![Page 20: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/20.jpg)
Weighted PCA separates species
B. anthracis and L. monocytogenes, abundance 20:1
PCA Weighted PCA
![Page 21: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/21.jpg)
Un-supervised Classification
![Page 22: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/22.jpg)
Semi-Supervised Classification
• 31 Marker Genes [courtesy Martin Wu]
  – Omnipresent
  – Relatively immune to lateral gene transfer
• Reads containing these marker genes can be classified with high reliability.
![Page 23: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/23.jpg)
Semi-supervised Classification
Use a semi-supervised version of the normalized cut algorithm
![Page 24: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/24.jpg)
The Semi-supervised Normalized Cut Algorithm
1. Calculate the K-nearest neighbor graph (KNN-graph) from the point set.
2. Update the KNN-graph with information from marker genes.
3. Bisect the graph using the normalized-cut algorithm.
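The three steps above can be sketched roughly as follows. Several details are assumptions of this example rather than statements from the talk: the function names are hypothetical, marker information is injected by giving same-taxon marker reads a strong edge and deleting edges between marker reads from different taxa, and the bisection uses the standard spectral relaxation of the normalized cut (sign of the second eigenvector of the normalized Laplacian) instead of a search over thresholds.

```python
import numpy as np


def knn_graph(points, k):
    """Symmetric 0/1 adjacency matrix linking each point to its k nearest neighbors."""
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbor
    W = np.zeros_like(dists)
    for i in range(len(points)):
        W[i, np.argsort(dists[i])[:k]] = 1.0
    return np.maximum(W, W.T)                    # keep an edge if either end chose it


def add_marker_constraints(W, labels, strong=10.0):
    """labels maps read index -> taxon for reads that carry a marker gene.

    Same-taxon pairs get a strong edge; different-taxon pairs lose their edge.
    This update rule is an assumption of the sketch.
    """
    W = np.array(W, dtype=float)
    idx = sorted(labels)
    for a, i in enumerate(idx):
        for j in idx[a + 1:]:
            W[i, j] = W[j, i] = strong if labels[i] == labels[j] else 0.0
    return W


def normalized_cut_bisect(W):
    """Split a connected graph in two using the spectral relaxation of normalized cut."""
    W = np.asarray(W, dtype=float)
    d = np.maximum(W.sum(axis=1), 1e-12)         # guard against isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)              # eigenvalues in ascending order
    z = d_inv_sqrt * vecs[:, 1]                  # second-smallest eigenvector
    return z >= 0                                # boolean side for every node
```

A typical call chain would be `normalized_cut_bisect(add_marker_constraints(knn_graph(points, k), labels))`, where `points` are the reads projected onto the first principal components. The sign split is only meaningful when the graph is connected; a disconnected KNN graph already falls apart into components without any cut.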
![Page 25: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/25.jpg)
Generalization to multiple bins
Gluconobacter oxydans [0.61], Granulibacter bethesdensis [0.59] and Nitrobacter hamburgensis [0.62]
Apply algorithm recursively
![Page 26: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/26.jpg)
Generalization to multiple bins
Gluconobacter oxydans [0.61], Granulibacter bethesdensis [0.59] and Nitrobacter hamburgensis [0.62]
![Page 27: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/27.jpg)
Testing
• Simulate Metagenomic Sequencing
  – Variables
    • Number of species
    • Relative abundance
    • GC content
    • Phylogenetic diversity
• Test on a “real” dataset where answer is well-established.
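One simple way to simulate metagenomic sequencing with a controlled number of species and relative abundance is sketched below. It is a toy model under loud assumptions: the function name `simulate_reads` is hypothetical, reads are error-free fixed-length substrings, and a genome is chosen with probability proportional to its abundance value alone (a more careful simulator would also account for genome length and sequencing errors).

```python
import random


def simulate_reads(genomes, abundance, n_reads, read_len, seed=0):
    """Sample labeled reads from a set of genomes.

    genomes:   dict name -> genome sequence (string)
    abundance: dict name -> relative abundance (non-negative number)
    Returns a list of (source name, read sequence) pairs, so the true
    bin of every read is known when scoring a binning method.
    """
    rng = random.Random(seed)                    # fixed seed for reproducibility
    names = [n for n in genomes if len(genomes[n]) >= read_len]
    weights = [abundance[n] for n in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        start = rng.randrange(len(genomes[name]) - read_len + 1)
        reads.append((name, genomes[name][start:start + read_len]))
    return reads
```

For example, `simulate_reads(genomes, {"A": 20, "B": 1}, 10000, 800)` would mimic the 20:1 abundance setting discussed earlier, with `"A"` and `"B"` standing in for two genome names.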
![Page 28: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/28.jpg)
![Page 29: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/29.jpg)
Future Directions
• Holy Grail: Complex Communities
• Semi-supervised methods
  – More marker genes
  – Semi-supervised projection?
• Hybrid Methods
  – Assembly Information
  – Population Genetic Information
![Page 30: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/30.jpg)
Overview of Talk
• Metagenomic Binning
• Phylo-Metagenomics
  – Applications
  – Incorporating Alignment Accuracy
• The Big Picture / Future Work
![Page 31: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/31.jpg)
Garcia Martin et al., Nat. Biotechnology (2006)
Population Structure of Communities
![Page 32: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/32.jpg)
Yooseph et al., PLoS Biology (2007)
Gene Family Characterization
![Page 33: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/33.jpg)
![Page 34: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/34.jpg)
Manual Masking
• Requires skilled and tedious manual intervention
• Subjective and non-reproducible
• Impractical for high-throughput data
  – Frequently ignored. “Garbage in, garbage out”
![Page 35: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/35.jpg)
Gblocks
![Page 36: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/36.jpg)
Probabilistic Masking using pair-HMMs
• Probabilistic formulation of alignment problem.
• Can answer additional questions
  – Alignment Reliability
  – Sub-optimal Alignments
Durbin et al., Cambridge University Press (1998)
![Page 37: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/37.jpg)
Probabilistic Masking
• What is the probability that residues $x_i$ and $y_j$ are homologous?
• Posterior probability that the residues $x_i$ and $y_j$ are homologous:
  $\Pr[x_i \diamond y_j] = \dfrac{\Pr[x_i \diamond y_j,\ x,\ y]}{\Pr[x,\ y]}$
• Can be calculated efficiently for all pairs (and gaps) in quadratic time.
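To make the quadratic-time claim concrete, here is a forward-backward sketch that computes the posterior probability that each residue pair is aligned. It is a deliberate simplification: a single-state alignment model with no affine gap states, whereas the pair-HMM in Durbin et al. uses separate match and gap states. The function name and the weights `match`, `mismatch` and `gap` are hypothetical values chosen for illustration.

```python
def match_posteriors(x, y, match=0.9, mismatch=0.1, gap=0.2):
    """Posterior that x[i] is aligned to y[j], for every pair (i, j).

    F[i][j] sums the weights of all partial alignments of x[:i] and y[:j];
    B[i][j] sums the weights of all alignments of x[i:] and y[j:].
    Both tables take O(len(x) * len(y)) time.
    """
    n, m = len(x), len(y)

    def w(i, j):                                 # weight of aligning x[i-1] with y[j-1]
        return match if x[i - 1] == y[j - 1] else mismatch

    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                F[i][j] += F[i - 1][j - 1] * w(i, j)
            if i > 0:
                F[i][j] += F[i - 1][j] * gap     # x[i-1] opposite a gap
            if j > 0:
                F[i][j] += F[i][j - 1] * gap     # y[j-1] opposite a gap

    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                B[i][j] += w(i + 1, j + 1) * B[i + 1][j + 1]
            if i < n:
                B[i][j] += gap * B[i + 1][j]
            if j < m:
                B[i][j] += gap * B[i][j + 1]

    total = F[n][m]                              # plays the role of Pr[x, y]
    return [[F[i - 1][j - 1] * w(i, j) * B[i][j] / total
             for j in range(1, m + 1)]
            for i in range(1, n + 1)]
```

Each entry is the total weight of alignments that pair the two residues, divided by the total weight of all alignments, which mirrors the ratio of the joint probability to Pr[x, y] on the slide.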
![Page 38: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/38.jpg)
Scoring Multiple Alignments
• Calculate the “posterior probability matrix” and distances $d_{ij}$ between every pair of sequences.
• Weighted “sum of pairs” score for column $r$:
  $\dfrac{\sum_{i,j} d_{ij}\, \Pr[r_i \diamond r_j]}{\sum_{i,j} d_{ij}}$
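The weighted sum-of-pairs score for one column can be sketched as below, assuming the pairwise posteriors for the residues in that column and the pairwise distances have already been computed. The function name `column_score` and the dictionary layout keyed by sequence-index pairs are choices made for this example; how pairs involving a gap are scored is not stated on the slide, so the caller is assumed to supply a posterior for every pair it wants counted.

```python
def column_score(pair_posteriors, distances):
    """Distance-weighted average of pairwise alignment posteriors in one column.

    pair_posteriors: dict (i, j) -> posterior that the residues of
                     sequences i and j in this column are aligned
    distances:       dict (i, j) -> distance d_ij between sequences i and j
    """
    numerator = sum(distances[p] * pair_posteriors[p] for p in pair_posteriors)
    denominator = sum(distances[p] for p in pair_posteriors)
    return numerator / denominator
```

Columns whose score falls below some cutoff would then be masked; the cutoff itself is not given in the talk.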
![Page 39: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/39.jpg)
Testing
The BAliBASE 3.0 Benchmark Database
![Page 40: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/40.jpg)
Testing
• Realign sequences using MSA programs like ClustalW.
• Sensitivity: for all correctly aligned columns, the fraction that has been masked as good
• Specificity: for all incorrectly aligned columns, the fraction that has been masked as bad
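The two definitions above can be written out directly. In this sketch, `correct` marks which columns the realignment got right according to the reference, and `kept` marks which columns the masking method labeled as good; both names and the function name `masking_performance` are hypothetical.

```python
def masking_performance(correct, kept):
    """Return (sensitivity, specificity) of an alignment mask.

    correct[c] is True if column c is correctly aligned (per the benchmark).
    kept[c]    is True if the mask labels column c as good.
    """
    good_kept = sum(1 for c, k in zip(correct, kept) if c and k)
    bad_masked = sum(1 for c, k in zip(correct, kept) if not c and not k)
    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct
    sensitivity = good_kept / n_correct if n_correct else float("nan")
    specificity = bad_masked / n_incorrect if n_incorrect else float("nan")
    return sensitivity, specificity
```

A mask that keeps every column has sensitivity 1 and specificity 0, so the two numbers only make sense when reported together.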
![Page 41: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/41.jpg)
Performance
Gblocks
Prob Mask
Sensitivity Specificity
97% 93%
53% 94%
![Page 42: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/42.jpg)
The Final Result
A Phylogenetic Database/Pipeline (with Martin Wu)
![Page 43: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/43.jpg)
Overview of Talk
• Metagenomic Binning
• Phylo-Metagenomics
• The Big Picture / Future Work
![Page 44: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/44.jpg)
Population Structure
Venter et al., Science (2004)
How to integrate information from multiple markers?
![Page 45: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/45.jpg)
Species-species Interactions
![Page 46: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/46.jpg)
Interactions in Microbial Communities
![Page 47: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/47.jpg)
Time Series Data
Ruan et al., Bioinformatics (2006)
![Page 48: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/48.jpg)
Interaction Networks in Microbial Communities
Ruan et al., Bioinformatics (2006)
![Page 49: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/49.jpg)
Functional Profiling
Prediction of Gene Function
Prediction of Metabolic Pathway
![Page 50: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/50.jpg)
Functional Profiling (with Binning)
McCutcheon and Moran, PNAS (2007)
![Page 51: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/51.jpg)
Single Cell Genomics
Hutchison and Venter, Nature Biotechnology (2006)
![Page 52: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/52.jpg)
Single Cell Genomics
Reads From Single Cell “Simulated” Contamination
![Page 53: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/53.jpg)
The Big Picture
[Diagram labels: Microbial Community; Metagenomic Sampling; Single Cell Genomics; Population Structure; Functional Profiling; Species Interaction Network; Time Series Data]
![Page 54: Computational Metagenomics: Algorithms for Understanding the "Unculturable" Microbial Majority Computational Metagenomics: Algorithms for Understanding.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e745503460f94b74bc9/html5/thumbnails/54.jpg)
Acknowledgements
UC Davis
• Jonathan Eisen
• Martin Wu
• Dongying Wu
• Ichitaro Yamazaki
• Amber Hartman
• Marcel Huntemann
UC Berkeley
• Lior Pachter
• Richard Karp
• Ambuj Tewari
• Narayanan Manikandan
Princeton University
• Simon Levin
• Josh Weitz
• Jonathan Dushoff